1
Sun X, Wu Z, Su J, Li C. A deep attention model for wide-genome protein-peptide binding affinity prediction at a sequence level. Int J Biol Macromol 2024; 276:133811. [PMID: 38996881 DOI: 10.1016/j.ijbiomac.2024.133811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 07/09/2024] [Accepted: 07/09/2024] [Indexed: 07/14/2024]
Abstract
Peptides are pivotal in numerous biological activities, engaging in up to 40% of protein-protein interactions in many cellular processes. Due to their exceptional specificity and effectiveness, peptides have emerged as promising candidates for drug design. However, accurately predicting protein-peptide binding affinity remains challenging. To address this problem, we develop PepPAP, a prediction model based on a convolutional neural network and multi-head attention that relies solely on sequence features. These features include physicochemical properties, intrinsic disorder, sequence encoding, and especially interface propensity, which is extracted from 16,689 non-redundant protein-peptide complexes. Notably, the regression stratification cross-validation scheme proposed in our previous work helps improve prediction for cases with extreme binding affinity values. On three benchmark test datasets (T100 and a series of peptides targeting the PDZ domain and CXCR4), PepPAP shows excellent performance, outperforming the existing methods and demonstrating good generalization ability. Furthermore, PepPAP achieves good results in binary interaction prediction, and visualization of the feature space distribution highlights PepPAP's effectiveness. To the best of our knowledge, PepPAP is the first sequence-based deep attention model for wide-genome protein-peptide binding affinity prediction, and it holds the potential to offer valuable insights for peptide-based drug design.
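The abstract does not describe how the regression stratification cross-validation scheme works, so the following is only a minimal sketch of one common way to stratify folds on a continuous target, not the authors' method. It assumes that stratification means every fold should cover the full range of affinity values, including the extremes. The function names and the example affinity values are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def stratified_regression_folds(y: Sequence[float], n_folds: int = 5) -> List[List[int]]:
    """Assign sample indices to folds so each fold spans the full range of y.

    Assumption: samples are ranked by target value and dealt out round-robin,
    which is one simple way to stratify a continuous target.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    if len(y) < n_folds:
        raise ValueError("need at least one sample per fold")
    # Rank samples by target value, then deal them out like cards.
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for rank, idx in enumerate(order):
        folds[rank % n_folds].append(idx)
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, one per fold."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical binding affinities, for illustration only.
    affinities = [-12.1, -5.0, -9.3, -7.7, -3.2, -10.8, -6.4, -8.9, -4.1, -11.5]
    for train, test in train_test_splits(stratified_regression_folds(affinities, 2)):
        print("train:", train, "test:", test)
```

Because neighbouring ranks land in different folds, the strongest and weakest binders are spread across folds rather than concentrated in one, which is the property the abstract credits for better predictions at extreme affinity values.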
Affiliation(s)
- Xiaohan Sun
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China

2
Ehrlich R, Glynn E, Singh M, Ghersi D. Computational Methods for Predicting Key Interactions in T Cell-Mediated Adaptive Immunity. Annu Rev Biomed Data Sci 2024; 7:295-316. [PMID: 38748864 DOI: 10.1146/annurev-biodatasci-102423-122741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Abstract
The adaptive immune system recognizes pathogen- and cancer-specific features and is endowed with memory, enabling it to respond quickly and efficiently to repeated encounters with the same antigens. T cells play a central role in the adaptive immune system by directly targeting intracellular pathogens and helping to activate B cells to secrete antibodies. Several fundamental protein interactions, including those between major histocompatibility complex (MHC) proteins and antigen-derived peptides as well as between T cell receptors and peptide-MHC complexes, underlie the ability of T cells to recognize antigens with great precision. Computational approaches to predict these interactions are increasingly being used for medically relevant applications, including vaccine design and prediction of patient response to cancer immunotherapies. We provide computational researchers with an accessible introduction to the adaptive immune system, review computational approaches to predict the key protein interactions underlying T cell-mediated adaptive immunity, and highlight remaining challenges.
Affiliation(s)
- Ryan Ehrlich
  - School of Interdisciplinary Informatics, University of Nebraska, Omaha, Nebraska, USA
- Eric Glynn
  - Lewis-Sigler Institute, Princeton University, Princeton, New Jersey, USA
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA
  - Lewis-Sigler Institute, Princeton University, Princeton, New Jersey, USA
- Dario Ghersi
  - School of Interdisciplinary Informatics, University of Nebraska, Omaha, Nebraska, USA

3
Bulashevska A, Nacsa Z, Lang F, Braun M, Machyna M, Diken M, Childs L, König R. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol 2024; 15:1394003. [PMID: 38868767 PMCID: PMC11167095 DOI: 10.3389/fimmu.2024.1394003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open
Abstract
Cancer immunotherapy has witnessed rapid advancement in recent years, with a particular focus on neoantigens as promising targets for personalized treatments. The convergence of immunogenomics, bioinformatics, and artificial intelligence (AI) has propelled the development of innovative neoantigen discovery tools and pipelines. These tools have revolutionized our ability to identify tumor-specific antigens, providing the foundation for precision cancer immunotherapy. AI-driven algorithms can process extensive amounts of data, identify patterns, and make predictions that were once challenging to achieve. However, the integration of AI comes with its own set of challenges, leaving space for further research. Focusing on computational approaches, this article explores the current landscape of neoantigen prediction, its fundamental concepts, and the challenges and their potential solutions, providing a comprehensive overview of this rapidly evolving field.
Affiliation(s)
- Alla Bulashevska
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Zsófia Nacsa
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Franziska Lang
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Markus Braun
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Martin Machyna
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Mustafa Diken
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Liam Childs
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Renate König
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany

4
Jiang M, Yu Z, Lan X. VitTCR: A deep learning method for peptide recognition prediction. iScience 2024; 27:109770. [PMID: 38711451 PMCID: PMC11070698 DOI: 10.1016/j.isci.2024.109770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 01/21/2024] [Accepted: 04/15/2024] [Indexed: 05/08/2024] Open
Abstract
This study introduces VitTCR, a predictive model based on the vision transformer (ViT) architecture, aimed at identifying interactions between T cell receptors (TCRs) and peptides, crucial for developing cancer immunotherapies and vaccines. VitTCR converts TCR-peptide interactions into numerical AtchleyMaps using Atchley factors for prediction, achieving AUROC (0.6485) and AUPR (0.6295) values. Benchmark analysis indicates VitTCR's performance is comparable to other models, with further comparative studies suggested to understand its effectiveness in varied contexts. Additionally, integrating a positional bias weight matrix (PBWM), derived from amino acid contact probabilities in structurally resolved pMHC-TCR complexes, slightly improves VitTCR's accuracy. The model's predictions show weak yet statistically significant correlations with immunological factors like T cell clonal expansion and activation percentages, underscoring the biological relevance of VitTCR's predictive capabilities. VitTCR emerges as a valuable computational tool for predicting TCR-peptide interactions, offering insights for immunotherapy and vaccine development.
Affiliation(s)
- Mengnan Jiang
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Zilan Yu
  - School of Medicine, Tsinghua University, Beijing 100084, China
  - Centre for Life Sciences, Tsinghua University, Beijing 100084, China
- Xun Lan
  - School of Medicine, Tsinghua University, Beijing 100084, China
  - Centre for Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, MOE Key Laboratory of Tsinghua University, Beijing, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China

5
Shahjahan, Dey JK, Dey SK. Translational bioinformatics approach to combat cardiovascular disease and cancers. Advances in Protein Chemistry and Structural Biology 2024; 139:221-261. [PMID: 38448136 DOI: 10.1016/bs.apcsb.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
Bioinformatics is an interdisciplinary science that draws on biology, chemistry, physics, statistics, mathematics, and computer science to answer complicated physiological problems. The key intention of bioinformatics is to store, analyze, organize, and retrieve essential information about the genome, proteome, transcriptome, metabolome, and organisms in order to investigate biological systems and their dynamics, if any. The outcome of bioinformatics depends on the type, quantity, and quality of the raw data provided and on the algorithm employed to analyze them. Despite the availability of several approved medicines, cardiovascular disorders (CVDs) and cancers are the two leading causes of human deaths. Understanding the unknown aspects of both these non-communicable disorders is essential to discover new pathways, find new drug targets, and eventually develop newer drugs to combat them successfully. Since all these goals involve complex investigation and handling of various types of macromolecules and small molecules of the human body, bioinformatics plays a key role in such processes. Results from such investigations have direct human application, and thus we call this field translational bioinformatics. The current book chapter therefore deals with the diverse scope and applications of translational bioinformatics in finding cures, improving diagnosis, and understanding the mechanisms of CVDs and cancers. Developing complex algorithms, whether short or long, to address such problems is very common in translational bioinformatics. Structure-based drug discovery and AI-guided invention of novel antibodies, with very high accuracy and speed and at considerably low investment, are some of the astonishing features of translational bioinformatics and its applications in the fields of CVDs and cancers.
Affiliation(s)
- Shahjahan
  - Laboratory for Structural Biology of Membrane Proteins, Dr. B.R. Ambedkar Center for Biomedical Research, University of Delhi, Delhi, India
- Joy Kumar Dey
  - Central Council for Research in Homoeopathy, Ministry of Ayush, Govt. of India, New Delhi, Delhi, India
- Sanjay Kumar Dey
  - Laboratory for Structural Biology of Membrane Proteins, Dr. B.R. Ambedkar Center for Biomedical Research, University of Delhi, Delhi, India

6
Conev A, Fasoulis R, Hall-Swan S, Ferreira R, Kavraki LE. HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors. iScience 2024; 27:108613. [PMID: 38188519 PMCID: PMC10770483 DOI: 10.1016/j.isci.2023.108613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 11/13/2023] [Accepted: 11/29/2023] [Indexed: 01/09/2024] Open
Abstract
Peptide-HLA (pHLA) binding prediction is essential in screening peptide candidates for personalized peptide vaccines. Machine learning (ML) pHLA binding prediction tools are trained on vast amounts of data and are effective in screening peptide candidates. Most ML models report the ability to generalize to HLA alleles unseen during training ("pan-allele" models). However, the use of datasets with imbalanced allele content raises concerns about biased model performance. First, we examine the data bias of two ML-based pan-allele pHLA binding predictors. We find that the pHLA datasets overrepresent alleles from geographic populations of high-income countries. Second, we show that the identified data bias is perpetuated within ML models, leading to algorithmic bias and subpar performance for alleles expressed in low-income geographic populations. We draw attention to the potential therapeutic consequences of this bias, and we challenge the use of the term "pan-allele" to describe models trained with currently available public datasets.
Affiliation(s)
- Anja Conev
  - Department of Computer Science, Rice University, Houston, TX, USA
- Romanos Fasoulis
  - Department of Computer Science, Rice University, Houston, TX, USA
- Sarah Hall-Swan
  - Department of Computer Science, Rice University, Houston, TX, USA
- Rodrigo Ferreira
  - Department of Computer Science, Rice University, Houston, TX, USA
- Lydia E. Kavraki
  - Department of Computer Science, Rice University, Houston, TX, USA

7
Wu J, Li J, Chen S, Zhou Z. DeepHLApan: A Deep Learning Approach for the Prediction of Peptide-HLA Binding and Immunogenicity. Methods Mol Biol 2024; 2809:237-244. [PMID: 38907901 DOI: 10.1007/978-1-0716-3874-3_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2024]
Abstract
Neoantigens are crucial in distinguishing cancer cells from normal ones and play a significant role in cancer immunotherapy. The field of bioinformatics prediction for tumor neoantigens has rapidly developed, focusing on the prediction of peptide-HLA binding affinity. In this chapter, we introduce a user-friendly tool named DeepHLApan, which utilizes deep learning techniques to predict neoantigens by considering both peptide-HLA binding affinity and immunogenicity. We provide the application of DeepHLApan, along with the source code, docker version, and web-server. These resources are freely available at https://github.com/zjupgx/deephlapan and http://pgx.zju.edu.cn/deephlapan/ .
Affiliation(s)
- Jingcheng Wu
  - Institute of Drug Metabolism and Pharmaceutical Analysis, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jiaoyang Li
  - College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Shuqing Chen
  - Institute of Drug Metabolism and Pharmaceutical Analysis, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Zhan Zhou
  - Institute of Drug Metabolism and Pharmaceutical Analysis, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China

8
Kalemati M, Darvishi S, Koohi S. CapsNet-MHC predicts peptide-MHC class I binding based on capsule neural networks. Commun Biol 2023; 6:492. [PMID: 37147498 PMCID: PMC10162658 DOI: 10.1038/s42003-023-04867-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 04/24/2023] [Indexed: 05/07/2023] Open
Abstract
The Major Histocompatibility Complex (MHC) binds peptides derived from pathogens and presents them to killer T cells on the cell surface. Developing computational methods for accurate, fast, and explainable peptide-MHC binding prediction can facilitate immunotherapies and vaccine development. Various deep learning-based methods rely on separate feature extraction from the peptide and MHC sequences and ignore their pairwise binding information. This paper develops a capsule neural network-based method that efficiently captures peptide-MHC complex features to predict peptide-MHC class I binding. Various evaluations confirmed that our method outperforms the alternative methods while providing accurate predictions when less data are available. Moreover, to provide precise insights into the results, we explored the essential features that contributed to the prediction. Since the simulation results were consistent with experimental studies, we concluded that our method can be used for accurate, rapid, and interpretable peptide-MHC binding prediction to assist biological therapies.
Affiliation(s)
- Mahmood Kalemati
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Saeid Darvishi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

9
Hasanzadeh A, Hamblin MR, Kiani J, Noori H, Hardie JM, Karimi M, Shafiee H. Could artificial intelligence revolutionize the development of nanovectors for gene therapy and mRNA vaccines? Nano Today 2022; 47:101665. [PMID: 37034382 PMCID: PMC10081506 DOI: 10.1016/j.nantod.2022.101665] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/19/2023]
Abstract
Gene therapy enables the introduction of nucleic acids like DNA and RNA into host cells, and is expected to revolutionize the treatment of a wide range of diseases. This growth has been further accelerated by the discovery of CRISPR/Cas technology, which allows accurate genomic editing in a broad range of cells and organisms in vitro and in vivo. Despite many advances in gene delivery and the development of various viral and non-viral gene delivery vectors, the lack of highly efficient non-viral systems with low cellular toxicity remains a challenge. The application of cutting-edge technologies such as artificial intelligence (AI) has great potential to find new paradigms to solve this issue. Herein, we review AI and its major subfields including machine learning (ML), neural networks (NNs), expert systems, deep learning (DL), computer vision and robotics. We discuss the potential of AI-based models and algorithms in the design of targeted gene delivery vehicles capable of crossing extracellular and intracellular barriers by viral mimicry strategies. We finally discuss the role of AI in improving the function of CRISPR/Cas systems, developing novel nanobots, and mRNA vaccine carriers.
Affiliation(s)
- Akbar Hasanzadeh
  - Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Michael R Hamblin
  - Laser Research Centre, Faculty of Health Science, University of Johannesburg, Doornfontein 2028, South Africa
  - Radiation Biology Research Center, Iran University of Medical Sciences, Tehran, Iran
- Jafar Kiani
  - Oncopathology Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Molecular Medicine, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran
- Hamid Noori
  - Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Joseph M. Hardie
  - Division of Engineering in Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02139 USA
- Mahdi Karimi
  - Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Oncopathology Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Research Center for Science and Technology in Medicine, Tehran University of Medical Sciences, Tehran 141556559, Iran
  - Applied Biotechnology Research Centre, Tehran Medical Science, Islamic Azad University, Tehran 1584743311, Iran
- Hadi Shafiee
  - Division of Engineering in Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02139 USA

10
Liu Z, Jin J, Cui Y, Xiong Z, Nasiri A, Zhao Y, Hu J. DeepSeqPanII: An Interpretable Recurrent Neural Network Model With Attention Mechanism for Peptide-HLA Class II Binding Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2188-2196. [PMID: 33886473 DOI: 10.1109/tcbb.2021.3074927] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 06/12/2023]
Abstract
Human leukocyte antigen (HLA) complex molecules play an essential role in immune interactions by presenting peptides on the cell surface to T cells. With significant progress in deep learning, a series of neural network-based models have been proposed and have demonstrated excellent performance for peptide-HLA class I binding prediction. However, there is still a lack of effective binding prediction models for HLA class II protein binding with peptides due to its inherent challenges. In this work, we present a novel sequence-based pan-specific neural network structure, DeepSeqPanII, for peptide-HLA class II binding prediction. Unlike existing pan-specific models, our model is an end-to-end neural network that needs no pre- or post-processing of input samples. Besides state-of-the-art performance in binding affinity prediction, DeepSeqPanII can also extract biological insight on the binding mechanism over the peptide through its attention mechanism-based binding core prediction capability. The leave-one-allele-out cross-validation and benchmark evaluation results show that our proposed network model achieved state-of-the-art performance in HLA-II peptide binding. The source code and trained models are freely available at https://github.com/pcpLiu/DeepSeqPanII.
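Leave-one-allele-out cross-validation, as named in the abstract above, holds out every sample of one allele for testing and trains on the rest, so the score reflects performance on an allele the model has not seen. The sketch below shows only the splitting step and is not taken from the DeepSeqPanII code base. The function name and the short allele list are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_allele_out(
    alleles: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_allele, train_indices, test_indices) for each allele."""
    by_allele: Dict[str, List[int]] = defaultdict(list)
    for idx, allele in enumerate(alleles):
        by_allele[allele].append(idx)
    # Hold out one allele at a time; all of its samples form the test set.
    for held_out in sorted(by_allele):
        test = by_allele[held_out]
        train = [i for i, a in enumerate(alleles) if a != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # One allele label per peptide-HLA sample (illustrative labels only).
    sample_alleles = ["DRB1*01:01", "DRB1*15:01", "DRB1*01:01", "DRB1*04:01"]
    for allele, train, test in leave_one_allele_out(sample_alleles):
        print(allele, "train:", train, "test:", test)
```

Splitting by allele rather than by random sample avoids leaking allele-specific binding patterns from the training set into the test set.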

11
Keller GLJ, Weiss LI, Baker BM. Physicochemical Heuristics for Identifying High Fidelity, Near-Native Structural Models of Peptide/MHC Complexes. Front Immunol 2022; 13:887759. [PMID: 35547730 PMCID: PMC9084917 DOI: 10.3389/fimmu.2022.887759] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/02/2022] [Accepted: 03/29/2022] [Indexed: 11/13/2022] Open
Abstract
There is long-standing interest in accurately modeling the structural features of peptides bound and presented by class I MHC proteins. This interest has grown with the advent of rapid genome sequencing and the prospect of personalized, peptide-based cancer vaccines, as well as the development of molecular and cellular therapeutics based on T cell receptor recognition of peptide-MHC. However, while the speed and accessibility of peptide-MHC modeling has improved substantially over the years, improvements in accuracy have been modest. Accuracy is crucial in peptide-MHC modeling, as T cell receptors are highly sensitive to peptide conformation and capturing fine details is therefore necessary for useful models. Studying nonameric peptides presented by the common class I MHC protein HLA-A*02:01, here we addressed a key question common to modern modeling efforts: from a set of models (or decoys) generated through conformational sampling, which is best? We found that the common strategy of decoy selection by lowest energy can lead to substantial errors in predicted structures. We therefore adopted a data-driven approach and trained functions capable of predicting near native decoys with exceptionally high accuracy. Although our implementation is limited to nonamer/HLA-A*02:01 complexes, our results serve as an important proof of concept from which improvements can be made and, given the significance of HLA-A*02:01 and its preference for nonameric peptides, should have immediate utility in select immunotherapeutic and other efforts for which structural information would be advantageous.
Affiliation(s)
- Grant L J Keller
  - Department of Chemistry & Biochemistry and the Harper Cancer Research Institute, University of Notre Dame, Notre Dame, IN, United States
- Laura I Weiss
  - Department of Chemistry & Biochemistry and the Harper Cancer Research Institute, University of Notre Dame, Notre Dame, IN, United States
- Brian M Baker
  - Department of Chemistry & Biochemistry and the Harper Cancer Research Institute, University of Notre Dame, Notre Dame, IN, United States

12
A transformer-based model to predict peptide–HLA class I binding and optimize mutated peptides for vaccine design. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00459-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022]

13
Cheng R, Xu Z, Luo M, Wang P, Cao H, Jin X, Zhou W, Xiao L, Jiang Q. Identification of alternative splicing-derived cancer neoantigens for mRNA vaccine development. Brief Bioinform 2022; 23:bbab553. [PMID: 35279714 DOI: 10.1093/bib/bbab553] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/05/2021] [Revised: 11/15/2021] [Accepted: 12/02/2021] [Indexed: 12/17/2023] Open
Abstract
Messenger RNA (mRNA) vaccines have shown great potential for anti-tumor therapy due to the advantages in safety, efficacy and industrial production. However, it remains a challenge to identify suitable cancer neoantigens that can be targeted for mRNA vaccines. Abnormal alternative splicing occurs in a variety of tumors, which may result in the translation of abnormal transcripts into tumor-specific proteins. High-throughput technologies make it possible for systematic characterization of alternative splicing as a source of suitable target neoantigens for mRNA vaccine development. Here, we summarized difficulties and challenges for identifying alternative splicing-derived cancer neoantigens from RNA-seq data and proposed a conceptual framework for designing personalized mRNA vaccines based on alternative splicing-derived cancer neoantigens. In addition, several points were presented to spark further discussion toward improving the identification of alternative splicing-derived cancer neoantigens.
Affiliation(s)
- Rui Cheng
  - Harbin Institute of Technology, China
- Meng Luo
  - Harbin Institute of Technology, China

14
Lantz O, Teyton L. Identification of T cell antigens in the 21st century, as difficult as ever. Semin Immunol 2022; 60:101659. [PMID: 36183497 PMCID: PMC10332289 DOI: 10.1016/j.smim.2022.101659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Identifying antigens recognized by T cells is still challenging, particularly for innate-like T cells that do not recognize peptides but small metabolites or lipids in the context of MHC-like molecules, or that see non-MHC-restricted antigens. The fundamental reason for this situation is the low affinity of T cell receptors for their ligands coupled with a level of degeneracy that makes them bind to similar surfaces on antigen presenting cells. Herein we will describe non-exhaustively some of the methods that were used to identify peptide antigens and briefly mention the high throughput methods more recently proposed for that purpose. We will then present how the molecules recognized by innate-like T cells (NKT, MAIT and γδ T cells) were discovered. We will show that serendipity was instrumental in many cases.
Affiliation(s)
- Olivier Lantz
  - INSERM U932, PSL University, Institut Curie, 75005 Paris, France
  - Laboratoire d'Immunologie Clinique, Institut Curie, Paris 75005, France
  - Centre d'investigation Clinique en Biothérapie Gustave-Roussy Institut Curie (CIC-BT1428) Institut Curie, Paris 75005, France
- Luc Teyton
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA

15
Dickinson Q, Meyer JG. Positional SHAP (PoSHAP) for Interpretation of machine learning models trained from biological sequences. PLoS Comput Biol 2022; 18:e1009736. [PMID: 35089914 PMCID: PMC8797255 DOI: 10.1371/journal.pcbi.1009736] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/16/2021] [Accepted: 12/09/2021] [Indexed: 11/29/2022] Open
Abstract
Machine learning with multi-layered artificial neural networks, also known as "deep learning," is effective for making biological predictions. However, model interpretation is challenging, especially for sequential input data used with recurrent neural network architectures. Here, we introduce a framework called "Positional SHAP" (PoSHAP) to interpret models trained from biological sequences by utilizing SHapley Additive exPlanations (SHAP) to generate positional model interpretations. We demonstrate this using three long short-term memory (LSTM) regression models that predict peptide properties, including binding affinity to major histocompatibility complexes (MHC), and collisional cross section (CCS) measured by ion mobility spectrometry. Interpretation of these models with PoSHAP reproduced MHC class I (rhesus macaque Mamu-A1*001 and human A*11:01) peptide binding motifs, reflected known properties of peptide CCS, and provided new insights into interpositional dependencies of amino acid interactions. PoSHAP should have widespread utility for interpreting a variety of models trained from biological sequences.
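The abstract does not give PoSHAP's exact procedure, so the following is only a rough sketch of what a positional interpretation could look like, under the assumption that per-residue attribution values (for example, SHAP values) have already been computed for each peptide. It averages those values for every (position, residue) pair. The function name and the toy numbers are hypothetical, and no call to the SHAP library is made.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def mean_positional_attribution(
    peptides: Sequence[str],
    attributions: Sequence[Sequence[float]],
) -> Dict[Tuple[int, str], float]:
    """Average attribution for each (position, residue) pair across peptides.

    Assumption: attributions[i][j] is the precomputed attribution of the
    residue at 0-based position j of peptides[i].
    """
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for peptide, values in zip(peptides, attributions):
        if len(peptide) != len(values):
            raise ValueError("one attribution value is required per residue")
        for pos, (residue, value) in enumerate(zip(peptide, values)):
            totals[(pos, residue)] += value
            counts[(pos, residue)] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    # Toy peptides and made-up attribution values, for illustration only.
    table = mean_positional_attribution(["AK", "AR"], [[0.2, 0.6], [0.4, -0.2]])
    for (pos, residue), value in sorted(table.items()):
        print(pos, residue, round(value, 3))
```

A table of this shape can be drawn as a position-by-residue heatmap, which is one way a binding motif would become visible.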
Affiliation(s)
- Quinn Dickinson
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin
- Jesse G. Meyer
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin

16
Fotakis G, Trajanoski Z, Rieder D. Computational cancer neoantigen prediction: current status and recent advances. Immuno-Oncology Technology 2021; 12:100052. [PMID: 35755950 PMCID: PMC9216660 DOI: 10.1016/j.iotech.2021.100052] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/13/2022]
Abstract
Over the last few decades, immunotherapy has shown significant therapeutic efficacy in a broad range of cancer types. Antitumor immune responses are contingent on the recognition of tumor-specific antigens, which are termed neoantigens. Tumor neoantigens are ideal targets for immunotherapy since they can be recognized as non-self antigens by the host immune system and thus are able to elicit an antitumor T-cell response. There are an increasing number of studies that highlight the importance of tumor neoantigens in immunoediting and in the sensitivity to immune checkpoint blockade. Therefore, one of the most fundamental tasks in the field of immuno-oncology research is the identification of patient-specific neoantigens. To this end, a plethora of computational approaches have been developed in order to predict tumor-specific aberrant peptides and quantify their likelihood of binding to patients' human leukocyte antigen molecules in order to be recognized by T cells. In this review, we systematically summarize and present the most recent advances in computational neoantigen prediction, and discuss the challenges and novel methods that are being developed to resolve them. Tumors have the ability to acquire immune escape mechanisms. Tumor-specific aberrant peptides (neoantigens) can elicit an immune response by the host immune system. The identification of neoantigens is one of the most fundamental tasks in the field of immuno-oncology research. A plethora of computational approaches have been developed in order to predict patient-specific neoantigens.
Affiliation(s)
- G Fotakis
- Institute of Bioinformatics, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- Z Trajanoski
- Institute of Bioinformatics, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- D Rieder
- Institute of Bioinformatics, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
|
17
|
Moris P, De Pauw J, Postovskaya A, Gielis S, De Neuter N, Bittremieux W, Ogunjimi B, Laukens K, Meysman P. Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification. Brief Bioinform 2021; 22:bbaa318. [PMID: 33346826 PMCID: PMC8294552 DOI: 10.1093/bib/bbaa318] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Indexed: 01/09/2023]
Abstract
The prediction of epitope recognition by T-cell receptors (TCRs) has seen many advancements in recent years, with several methods now available that can predict recognition for a specific set of epitopes. However, the generic case of evaluating all possible TCR-epitope pairs remains challenging, mainly due to the high diversity of the interacting sequences and the limited amount of currently available training data. In this work, we provide an overview of the current state of this unsolved problem. First, we examine appropriate validation strategies to accurately assess the generalization performance of generic TCR-epitope recognition models when applied to both seen and unseen epitopes. In addition, we present a novel feature representation approach, which we call ImRex (interaction map recognition). This approach is based on the pairwise combination of physicochemical properties of the individual amino acids in the CDR3 and epitope sequences, which provides a convolutional neural network with the combined representation of both sequences. Lastly, we highlight various challenges that are specific to TCR-epitope data and that can adversely affect model performance. These include the issue of selecting negative data, the imbalanced epitope distribution of curated TCR-epitope datasets and the potential exchangeability of TCR alpha and beta chains. Our results indicate that while extrapolation to unseen epitopes remains a difficult challenge, ImRex makes this feasible for a subset of epitopes that are not too dissimilar from the training data. We show that appropriate feature engineering methods and rigorous benchmark standards are required to create and validate TCR-epitope predictive models.
MESH Headings
- Animals
- Complementarity Determining Regions/genetics
- Complementarity Determining Regions/immunology
- Epitopes, T-Lymphocyte/genetics
- Epitopes, T-Lymphocyte/immunology
- Humans
- Macaca mulatta
- Mice
- Models, Genetic
- Models, Immunological
- Receptors, Antigen, T-Cell, alpha-beta/genetics
- Receptors, Antigen, T-Cell, alpha-beta/immunology
Affiliation(s)
- Pieter Meysman
- Corresponding author: Pieter Meysman, Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, 2020, Belgium.
|
18
|
Jiang L, Yu H, Li J, Tang J, Guo Y, Guo F. Predicting MHC class I binder: existing approaches and a novel recurrent neural network solution. Brief Bioinform 2021; 22:6299205. [PMID: 34131696 DOI: 10.1093/bib/bbab216] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/05/2021] [Revised: 05/14/2021] [Accepted: 05/17/2021] [Indexed: 01/04/2023]
Abstract
Major histocompatibility complex (MHC) possesses important research value in the treatment of complex human diseases. A plethora of computational tools has been developed to predict MHC class I binders. Here, we comprehensively reviewed 27 up-to-date MHC I binding prediction tools developed over the last decade, thoroughly evaluating feature representation methods, prediction algorithms and model training strategies on a benchmark dataset from Immune Epitope Database. A common limitation was identified during the review that all existing tools can only handle a fixed peptide sequence length. To overcome this limitation, we developed a bilateral and variable long short-term memory (BVLSTM)-based approach, named BVLSTM-MHC. It is the first variable-length MHC class I binding predictor. In comparison to the 10 mainstream prediction tools on an independent validation dataset, BVLSTM-MHC achieved the best performance in six out of eight evaluated metrics. A web server based on the BVLSTM-MHC model was developed to enable accurate and efficient MHC class I binder prediction in human, mouse, macaque and chimpanzee.
Affiliation(s)
- Limin Jiang
- Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Hui Yu
- Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jiawei Li
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
- Department of Computer Science, University of South Carolina, SC, USA; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yan Guo
- Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
|
19
|
Kardani K, Bolhassani A. Exploring novel and potent cell penetrating peptides in the proteome of SARS-COV-2 using bioinformatics approaches. PLoS One 2021; 16:e0247396. [PMID: 33606823 PMCID: PMC7894964 DOI: 10.1371/journal.pone.0247396] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/13/2020] [Accepted: 02/06/2021] [Indexed: 02/08/2023]
Abstract
Among various delivery systems for vaccine and drug delivery, cell-penetrating peptides (CPPs) have been known as a potent delivery system because of their capability to penetrate cell membranes and deliver some types of cargoes into cells. Several CPPs were found in the proteome of viruses such as Tat originated from human immunodeficiency virus-1 (HIV-1), and VP22 derived from herpes simplex virus-1 (HSV-1). In the current study, a wide-range of CPPs was identified in the proteome of SARS-CoV-2, a new member of coronaviruses family, using in silico analyses. These CPPs may play a main role for high penetration of virus into cells and infection of host. At first, we submitted the proteome of SARS-CoV-2 to CellPPD web server that resulted in a huge number of CPPs with ten residues in length. Afterward, we submitted the predicted CPPs to C2Pred web server for evaluation of the probability of each peptide. Then, the uptake efficiency of each peptide was investigated using CPPred-RF and MLCPP web servers. Next, the physicochemical properties of the predicted CPPs including net charge, theoretical isoelectric point (pI), amphipathicity, molecular weight, and water solubility were calculated using protparam and pepcalc tools. In addition, the probability of membrane binding potential and cellular localization of each CPP were estimated by Boman index using APD3 web server, D factor, and TMHMM web server. On the other hand, the immunogenicity, toxicity, allergenicity, hemolytic potency, and half-life of CPPs were predicted using various web servers. Finally, the tertiary structure and the helical wheel projection of some CPPs were predicted by PEP-FOLD3 and Heliquest web servers, respectively. These CPPs were divided into: a) CPP containing tumor homing motif (RGD) and/or tumor penetrating motif (RXXR); b) CPP with the highest Boman index; c) CPP with high half-life (~100 hour) in mammalian cells, and d) CPP with +5.00 net charge. 
Based on the results, we found a large number of novel CPPs with various features. Some of these CPPs possess tumor-specific motifs which can be evaluated in cancer therapy. Furthermore, the novel and potent CPPs derived from SARS-CoV-2 may be used alone or conjugated to some sequences such as nuclear localization sequence (NLS) for vaccine and drug delivery.
Affiliation(s)
- Kimia Kardani
- Department of Hepatitis and AIDS, Pasteur Institute of Iran, Tehran, Iran
- Azam Bolhassani
- Department of Hepatitis and AIDS, Pasteur Institute of Iran, Tehran, Iran
|
20
|
Jin J, Liu Z, Nasiri A, Cui Y, Louis SY, Zhang A, Zhao Y, Hu J. Deep learning pan-specific model for interpretable MHC-I peptide binding prediction with improved attention mechanism. Proteins 2021; 89:866-883. [PMID: 33594723 DOI: 10.1002/prot.26065] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/22/2020] [Revised: 01/28/2021] [Accepted: 02/08/2021] [Indexed: 11/06/2022]
Abstract
Accurate prediction of peptide binding affinity to the major histocompatibility complex (MHC) proteins has the potential to design better therapeutic vaccines. Previous work has shown that pan-specific prediction algorithms can achieve better prediction performance than other approaches. However, most of the top algorithms are neural networks based black box models. Here, we propose DeepAttentionPan, an improved pan-specific model, based on convolutional neural networks and attention mechanisms for more flexible, stable and interpretable MHC-I binding prediction. With the attention mechanism, our ensemble model consisting of 20 trained networks achieves high and more stabilized prediction performance. Extensive tests on IEDB's weekly benchmark dataset show that our method achieves state-of-the-art prediction performance on 21 test allele datasets. Analysis of the peptide positional attention weights learned by our model demonstrates its capability to capture critical binding positions of the peptides, which leads to mechanistic understanding of MHC-peptide binding with high alignment with experimentally verified results. Furthermore, we show that with transfer learning, our pan model can be fine-tuned for alleles with few samples to achieve additional performance improvement. DeepAttentionPan is freely available as an open-source software at https://github.com/jjin49/DeepAttentionPan.
Affiliation(s)
- Jing Jin
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Zhonghao Liu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Alireza Nasiri
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Yuxin Cui
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Stephen-Yves Louis
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Ansi Zhang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Yong Zhao
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Jianjun Hu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
|
21
|
Systematic auditing is essential to debiasing machine learning in biology. Commun Biol 2021; 4:183. [PMID: 33568741 PMCID: PMC7876113 DOI: 10.1038/s42003-021-01674-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/01/2020] [Accepted: 11/12/2020] [Indexed: 12/20/2022]
Abstract
Biases in data used to train machine learning (ML) models can inflate their prediction performance and confound our understanding of how and what they learn. Although biases are common in biological data, systematic auditing of ML models to identify and eliminate these biases is not a common practice when applying ML in the life sciences. Here we devise a systematic, principled, and general approach to audit ML models in the life sciences. We use this auditing framework to examine biases in three ML applications of therapeutic interest and identify unrecognized biases that hinder the ML process and result in substantially reduced model performance on new datasets. Ultimately, we show that ML models tend to learn primarily from data biases when there is insufficient signal in the data to learn from. We provide detailed protocols, guidelines, and examples of code to enable tailoring of the auditing framework to other biomedical applications. Fatma-Elzahraa Eid et al. illustrate a principled approach for identifying biases that can inflate the performance of biological machine learning models. When applied to three biomedical prediction problems, they identify previously unrecognized biases and ultimately show that models are likely to learn primarily from data biases when there is insufficient learnable signal in the data.
|
22
|
Mei S, Li F, Xiang D, Ayala R, Faridi P, Webb GI, Illing PT, Rossjohn J, Akutsu T, Croft NP, Purcell AW, Song J. Anthem: a user customised tool for fast and accurate prediction of binding between peptides and HLA class I molecules. Brief Bioinform 2021; 22:6102669. [PMID: 33454737 DOI: 10.1093/bib/bbaa415] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 09/30/2020] [Revised: 11/29/2020] [Accepted: 12/16/2020] [Indexed: 12/17/2022]
Abstract
Neopeptide-based immunotherapy has been recognised as a promising approach for the treatment of cancers. For neopeptides to be recognised by CD8+ T cells and induce an immune response, their binding to human leukocyte antigen class I (HLA-I) molecules is a necessary first step. Most epitope prediction tools thus rely on the prediction of such binding. With the use of mass spectrometry, the scale of naturally presented HLA ligands that could be used to develop such predictors has been expanded. However, there are rarely efforts that focus on the integration of these experimental data with computational algorithms to efficiently develop up-to-date predictors. Here, we present Anthem for accurate HLA-I binding prediction. In particular, we have developed a user-friendly framework to support the development of customisable HLA-I binding prediction models to meet challenges associated with the rapidly increasing availability of large amounts of immunopeptidomic data. Our extensive evaluation, using both independent and experimental datasets shows that Anthem achieves an overall similar or higher area under curve value compared with other contemporary tools. It is anticipated that Anthem will provide a unique opportunity for the non-expert user to analyse and interpret their own in-house or publicly deposited datasets.
Affiliation(s)
- Shutao Mei
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia
- Dongxu Xiang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Rochelle Ayala
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Pouya Faridi
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Patricia T Illing
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jamie Rossjohn
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Nathan P Croft
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Anthony W Purcell
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute and Biochemistry and Molecular Biology, Monash University, Australia
|
23
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen‐Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
24
|
Heng Y, Kuang Z, Huang S, Chen L, Shi T, Xu L, Mei H. A Pan-Specific GRU-Based Recurrent Neural Network for Predicting HLA-I-Binding Peptides. ACS OMEGA 2020; 5:18321-18330. [PMID: 32743207 PMCID: PMC7391852 DOI: 10.1021/acsomega.0c02039] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/03/2020] [Accepted: 07/01/2020] [Indexed: 06/11/2023]
Abstract
Human leukocyte antigens (HLAs) play a critical role in human-acquired immune responses by the recognition of non-self-peptides derived from exogenous bacteria, fungi, virus, and so forth. The accurate prediction of HLA-binding peptides is thus extremely useful for the mechanistic research of cell-mediated immunity and related epitope-based vaccine design. In this work, a simple pan-specific gated recurrent unit (GRU)-based recurrent neural network model was successfully proposed for predicting HLA-I-binding peptides. In comparison with the available six allele-specific, four pan-specific, and two ensemble-based prediction models, the GRU model achieves the highest area under the receiver operating characteristic curve (AUC) scores for 21 of 64 entries of the test benchmark datasets. Besides, the GRU model also achieves satisfactory performance on other 24 entries, of which the AUC scores differ by less than 0.1 from the highest scores. Overall, taking the advantages of the GRU network and auto-embedding techniques into account, the established pan-specific GRU model is more simple and direct and shows satisfactory prediction performance for HLA-I-binding peptides with varying lengths.
Affiliation(s)
- Yu Heng
- Key Laboratory of Biorheological Science and Technology (Ministry of Education), Chongqing University, Chongqing 400044, China
- College of Bioengineering, Chongqing University, Chongqing 400044, China
- Zuyin Kuang
- College of Bioengineering, Chongqing University, Chongqing 400044, China
- Shuheng Huang
- College of Bioengineering, Chongqing University, Chongqing 400044, China
- Linxin Chen
- College of Bioengineering, Chongqing University, Chongqing 400044, China
- Tingting Shi
- College of Bioengineering, Chongqing University, Chongqing 400044, China
- Lei Xu
- College of Bioengineering, Chongqing University, Chongqing 400044, China
- Hu Mei
- Key Laboratory of Biorheological Science and Technology (Ministry of Education), Chongqing University, Chongqing 400044, China
- College of Bioengineering, Chongqing University, Chongqing 400044, China
|
25
|
Abstract
Our immune system plays a key role in health and disease as it is capable of responding to foreign antigens as well as acquired antigens from cancer cells. The latter are caused by somatic mutations, the so-called neoepitopes, and might be recognized by T cells if they are presented by HLA molecules on the surface of cancer cells. Personalized mutanome vaccines are a class of customized immunotherapies, which is dependent on the detection of individual cancer-specific tumor mutations and neoepitope prediction, followed by a rational vaccine design, before on-demand production. The development of next generation sequencing (NGS) technologies and bioinformatic tools allows a large-scale analysis of each parameter involved in this process. Here, we provide an overview of the bioinformatic aspects involved in the design of personalized, neoantigen-based vaccines, including the detection of mutations and the subsequent prediction of potential epitopes, as well as methods for associated biomarker research, such as high-throughput sequencing of T-cell receptors (TCRs), followed by data analysis and the bioinformatics quantification of immune cell infiltration in cancer samples.
Affiliation(s)
- Christoph Holtsträter
- TRON-Translationale Onkologie an der Universitätsmedizin der Johannes Gutenberg-Universität Mainz gemeinnützige GmbH, Freiligrathstraße, Mainz, Germany
- Barbara Schrörs
- TRON-Translationale Onkologie an der Universitätsmedizin der Johannes Gutenberg-Universität Mainz gemeinnützige GmbH, Freiligrathstraße, Mainz, Germany
- Thomas Bukur
- TRON-Translationale Onkologie an der Universitätsmedizin der Johannes Gutenberg-Universität Mainz gemeinnützige GmbH, Freiligrathstraße, Mainz, Germany
- Martin Löwer
- TRON-Translationale Onkologie an der Universitätsmedizin der Johannes Gutenberg-Universität Mainz gemeinnützige GmbH, Freiligrathstraße, Mainz, Germany.
|
26
|
Wu J, Wang W, Zhang J, Zhou B, Zhao W, Su Z, Gu X, Wu J, Zhou Z, Chen S. DeepHLApan: A Deep Learning Approach for Neoantigen Prediction Considering Both HLA-Peptide Binding and Immunogenicity. Front Immunol 2019; 10:2559. [PMID: 31736974 PMCID: PMC6838785 DOI: 10.3389/fimmu.2019.02559] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 05/31/2019] [Accepted: 10/15/2019] [Indexed: 12/30/2022]
Abstract
Neoantigens play important roles in cancer immunotherapy. Current methods used for neoantigen prediction focus on the binding between human leukocyte antigens (HLAs) and peptides, which is insufficient for high-confidence neoantigen prediction. In this study, we apply deep learning techniques to predict neoantigens considering both the possibility of HLA-peptide binding (binding model) and the potential immunogenicity (immunogenicity model) of the peptide-HLA complex (pHLA). The binding model achieves comparable performance with other well-acknowledged tools on the latest Immune Epitope Database (IEDB) benchmark datasets and an independent mass spectrometry (MS) dataset. The immunogenicity model could significantly improve the prediction precision of neoantigens. The further application of our method to the mutations with pre-existing T-cell responses indicates its feasibility in clinical application. DeepHLApan is freely available at https://github.com/jiujiezz/deephlapan and http://biopharm.zju.edu.cn/deephlapan.
Affiliation(s)
- Jingcheng Wu
- Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Wenzhe Wang
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Jiucheng Zhang
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Binbin Zhou
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Wenyi Zhao
- Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Zhixi Su
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China
- Xun Gu
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
- Jian Wu
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Zhan Zhou
- Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Shuqing Chen
- Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
|