1. Kim HY, Kim S, Park WY, Kim D. TSpred: a robust prediction framework for TCR-epitope interactions using paired chain TCR sequence data. Bioinformatics 2024; 40:btae472. [PMID: 39052940 PMCID: PMC11297499 DOI: 10.1093/bioinformatics/btae472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/11/2024] [Accepted: 07/25/2024] [Indexed: 07/27/2024] Open
Abstract
MOTIVATION: Prediction of T-cell receptor (TCR)-epitope interactions is important for many applications in biomedical research, such as cancer immunotherapy and vaccine design. The prediction of TCR-epitope interactions remains challenging, especially for novel epitopes, due to the scarcity of available data.
RESULTS: We propose TSpred, a new deep learning approach for the pan-specific prediction of TCR binding specificity based on paired chain TCR data. We develop a robust model that generalizes well to unseen epitopes by combining the predictive power of CNN and the attention mechanism. In particular, we design a reciprocal attention mechanism which focuses on extracting the patterns underlying TCR-epitope interactions. Upon a comprehensive evaluation of our model, we find that TSpred achieves state-of-the-art performance in both seen and unseen epitope specificity prediction tasks. Also, compared to other predictors, TSpred is more robust to bias related to peptide imbalance in the dataset. In addition, the reciprocal attention component of our model allows for model interpretability by capturing structurally important binding regions. Results indicate that TSpred is a robust and reliable method for the task of TCR-epitope binding prediction.
AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/ha01994/TSpred.
Affiliation(s)
- Ha Young Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
- Woong-Yang Park
- GENINUS Inc., Seoul 05836, South Korea
- Samsung Genome Institute, Samsung Medical Center, Seoul 06351, South Korea
- Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon 16419, South Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
2. Pertseva M, Follonier O, Scarcella D, Reddy ST. TCR clustering by contrastive learning on antigen specificity. Brief Bioinform 2024; 25:bbae375. [PMID: 39129361 PMCID: PMC11317525 DOI: 10.1093/bib/bbae375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 07/09/2024] [Accepted: 07/25/2024] [Indexed: 08/13/2024] Open
Abstract
Effective clustering of T-cell receptor (TCR) sequences could be used to predict their antigen-specificities. TCRs with highly dissimilar sequences can bind to the same antigen, thus making their clustering into a common antigen group a central challenge. Here, we develop TouCAN, a method that relies on contrastive learning and pretrained protein language models to perform TCR sequence clustering and antigen-specificity predictions. Following training, TouCAN demonstrates the ability to cluster highly dissimilar TCRs into common antigen groups. Additionally, TouCAN demonstrates TCR clustering performance and antigen-specificity predictions comparable to other leading methods in the field.
Affiliation(s)
- Margarita Pertseva
- Department of Biosystems Science and Engineering, ETH Zurich, Schanzenstrasse 44, 4056 Basel, Switzerland
- Life Science Zurich Graduate School, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Oceane Follonier
- Department of Biosystems Science and Engineering, ETH Zurich, Schanzenstrasse 44, 4056 Basel, Switzerland
- Daniele Scarcella
- Department of Biosystems Science and Engineering, ETH Zurich, Schanzenstrasse 44, 4056 Basel, Switzerland
- Sai T Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Schanzenstrasse 44, 4056 Basel, Switzerland
3. Karnaukhov VK, Shcherbinin DS, Chugunov AO, Chudakov DM, Efremov RG, Zvyagin IV, Shugay M. Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen. Nat Comput Sci 2024; 4:510-521. [PMID: 38987378 DOI: 10.1038/s43588-024-00653-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/16/2023] [Accepted: 06/04/2024] [Indexed: 07/12/2024]
Abstract
T cell receptor (TCR) recognition of foreign peptides presented by major histocompatibility complex protein is a major event in triggering the adaptive immune response to pathogens or cancer. The prediction of TCR-peptide interactions has great importance for therapy of cancer as well as infectious and autoimmune diseases but remains a major challenge, particularly for novel (unseen) peptide epitopes. Here we present TCRen, a structure-based method for ranking candidate unseen epitopes for a given TCR. The first stage of the TCRen pipeline is modeling of the TCR-peptide-major histocompatibility complex structure. Then a TCR-peptide residue contact map is extracted from this structure and used to rank all candidate epitopes on the basis of an interaction score with the target TCR. Scoring is performed using an energy potential derived from the statistics of TCR-peptide contact preferences in existing crystal structures. We show that TCRen has high performance in discriminating cognate versus unrelated peptides and can facilitate the identification of cancer neoepitopes recognized by tumor-infiltrating lymphocytes.
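The ranking step described in this abstract (sum a pairwise contact potential over the TCR-peptide contact map, then order the candidate epitopes by score) can be illustrated with a minimal Python sketch. This is not the TCRen implementation: the function names, the toy contact map, the potential values, and the convention that a lower score means a more favorable interaction are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# A contact is (TCR residue letter, 0-based peptide position). Hypothetical format.
Contact = Tuple[str, int]
# Pairwise potential keyed by (TCR residue, peptide residue). Values below are made up.
Potential = Dict[Tuple[str, str], float]


def score_peptide(contacts: List[Contact], peptide: str, potential: Potential) -> float:
    """Sum the pairwise potential over every contact in the map."""
    total = 0.0
    for tcr_residue, peptide_pos in contacts:
        if peptide_pos >= len(peptide):
            continue  # contact falls outside this peptide; skip it
        # Residue pairs missing from the toy potential contribute 0.0
        total += potential.get((tcr_residue, peptide[peptide_pos]), 0.0)
    return total


def rank_candidates(
    contacts: List[Contact], candidates: List[str], potential: Potential
) -> List[Tuple[str, float]]:
    """Return (peptide, score) pairs sorted so the lowest score comes first.

    Assumption: lower (more negative) score = more favorable interaction.
    """
    scored = [(p, score_peptide(contacts, p, potential)) for p in candidates]
    return sorted(scored, key=lambda item: item[1])


# Toy inputs (all hypothetical): three contacts and a tiny potential table.
toy_contacts: List[Contact] = [("Y", 3), ("D", 4), ("R", 5)]
toy_potential: Potential = {
    ("Y", "L"): -1.0,
    ("D", "K"): -2.0,
    ("R", "E"): -1.5,
    ("D", "E"): 1.0,
}
toy_candidates = ["AAALKEAAA", "AAAAEAAAA", "AAAAAAAAA"]

ranking = rank_candidates(toy_contacts, toy_candidates, toy_potential)
for peptide, score in ranking:
    print(peptide, score)
```

In this sketch the contact map is held fixed while the peptide sequence varies, which mirrors the abstract's description of scoring all candidate epitopes against one modeled structure; how the real method derives the contact map and the potential is not reproduced here.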
MESH Headings
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/metabolism
- Humans
- Peptides/immunology
- Peptides/chemistry
- Epitopes/immunology
- Epitopes/chemistry
- Models, Molecular
- Neoplasms/immunology
- Epitopes, T-Lymphocyte/immunology
- Epitopes, T-Lymphocyte/chemistry
- Major Histocompatibility Complex/immunology
- Protein Conformation
- Lymphocytes, Tumor-Infiltrating/immunology
- Lymphocytes, Tumor-Infiltrating/metabolism
Affiliation(s)
- Vadim K Karnaukhov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia.
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia.
- Dmitrii S Shcherbinin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Anton O Chugunov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Dmitriy M Chudakov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia.
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia.
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia.
- Central European Institute of Technology, Brno, Czech Republic.
- Roman G Efremov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Higher School of Economics, Moscow, Russia
- Ivan V Zvyagin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Mikhail Shugay
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia.
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia.
4. Velez-Arce A, Huang K, Li MM, Lin X, Gao W, Fu T, Kellis M, Pentelute BL, Zitnik M. TDC-2: Multimodal Foundation for Therapeutic Science. bioRxiv 2024:2024.06.12.598655. [PMID: 38948789 PMCID: PMC11212894 DOI: 10.1101/2024.06.12.598655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Therapeutics Data Commons (tdcommons.ai) is an open science initiative with unified datasets, AI models, and benchmarks to support research across therapeutic modalities and drug discovery and development stages. The Commons 2.0 (TDC-2) is a comprehensive overhaul of Therapeutics Data Commons to catalyze research in multimodal models for drug discovery by unifying single-cell biology of diseases, biochemistry of molecules, and effects of drugs through multimodal datasets, AI-powered API endpoints, new multimodal tasks and model frameworks, and comprehensive benchmarks. TDC-2 introduces over 1,000 multimodal datasets spanning approximately 85 million cells, pre-calculated embeddings from 5 state-of-the-art single-cell models, and a biomedical knowledge graph. TDC-2 drastically expands the coverage of ML tasks across therapeutic pipelines and 10+ new modalities, spanning but not limited to single-cell gene expression data, clinical trial data, peptide sequence data, peptidomimetics protein-peptide interaction data regarding newly discovered ligands derived from AS-MS spectroscopy, novel 3D structural data for proteins, and cell-type-specific protein-protein interaction networks at single-cell resolution. TDC-2 introduces multimodal data access under an API-first design using the model-view-controller paradigm. TDC-2 introduces 7 novel ML tasks with fine-grained biological contexts: contextualized drug-target identification, single-cell chemical/genetic perturbation response prediction, protein-peptide binding affinity prediction task, and clinical trial outcome prediction task, which introduce antigen-processing-pathway-specific, cell-type-specific, peptide-specific, and patient-specific biological contexts. TDC-2 also releases benchmarks evaluating 15+ state-of-the-art models across 5+ new learning tasks evaluating models on diverse biological contexts and sampling approaches. Among these, TDC-2 provides the first benchmark for context-specific learning. TDC-2, to our knowledge, is also the first to introduce a protein-peptide binding interaction benchmark.
5. Meynard-Piganeau B, Feinauer C, Weigt M, Walczak AM, Mora T. TULIP: A transformer-based unsupervised language model for interacting peptides and T cell receptors that generalizes to unseen epitopes. Proc Natl Acad Sci U S A 2024; 121:e2316401121. [PMID: 38838016 PMCID: PMC11181096 DOI: 10.1073/pnas.2316401121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 04/29/2024] [Indexed: 06/07/2024] Open
Abstract
The accurate prediction of binding between T cell receptors (TCR) and their cognate epitopes is key to understanding the adaptive immune response and developing immunotherapies. Current methods face two significant limitations: the shortage of comprehensive high-quality data and the bias introduced by the selection of the negative training data commonly used in the supervised learning approaches. We propose a method, Transformer-based Unsupervised Language model for Interacting Peptides and T cell receptors (TULIP), that addresses both limitations by leveraging incomplete data and unsupervised learning and using the transformer architecture of language models. Our model is flexible and integrates all possible data sources, regardless of their quality or completeness. We demonstrate the existence of a bias introduced by the sampling procedure used in previous supervised approaches, emphasizing the need for an unsupervised approach. TULIP recognizes the specific TCRs binding an epitope, performing well on unseen epitopes. Our model outperforms state-of-the-art models and offers a promising direction for the development of more accurate TCR epitope recognition models.
Affiliation(s)
- Barthelemy Meynard-Piganeau
- Laboratory of Computational and Quantitative Biology, Institut de Biologie Paris Seine, CNRS, Sorbonne Université, Paris 75005, France
- Department of Computing Sciences, Bocconi University, Milan 20100, Italy
- Martin Weigt
- Laboratory of Computational and Quantitative Biology, Institut de Biologie Paris Seine, CNRS, Sorbonne Université, Paris 75005, France
- Aleksandra M. Walczak
- Laboratoire de Physique de l’Ecole Normale Supérieure, Université Paris Sciences et Lettres, CNRS, Sorbonne Université, Université de Paris Cité, Paris 75005, France
- Thierry Mora
- Laboratoire de Physique de l’Ecole Normale Supérieure, Université Paris Sciences et Lettres, CNRS, Sorbonne Université, Université de Paris Cité, Paris 75005, France
6. Yu Z, Jiang M, Lan X. HeteroTCR: A heterogeneous graph neural network-based method for predicting peptide-TCR interaction. Commun Biol 2024; 7:684. [PMID: 38834836 DOI: 10.1038/s42003-024-06380-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 05/23/2024] [Indexed: 06/06/2024] Open
Abstract
Identifying interactions between T-cell receptors (TCRs) and immunogenic peptides holds profound implications across diverse research domains and clinical scenarios. Unsupervised clustering models (UCMs) cannot predict peptide-TCR binding directly, while supervised predictive models (SPMs) often face challenges in identifying antigens previously unencountered by the immune system or possessing limited TCR binding repertoires. Therefore, we propose HeteroTCR, an SPM based on Heterogeneous Graph Neural Network (GNN), to accurately predict peptide-TCR binding probabilities. HeteroTCR captures within-type (TCR-TCR or peptide-peptide) similarity information and between-type (peptide-TCR) interaction insights for predictions on unseen peptides and TCRs, surpassing limitations of existing SPMs. Our evaluation shows HeteroTCR outperforms state-of-the-art models on independent datasets. Ablation studies and visual interpretation underscore the Heterogeneous GNN module's critical role in enhancing HeteroTCR's performance by capturing pivotal binding process features. We further demonstrate the robustness and reliability of HeteroTCR through validation using single-cell datasets, aligning with the expectation that pMHC-TCR complexes with higher predicted binding probabilities correspond to increased binding fractions.
Affiliation(s)
- Zilan Yu
- School of Medicine, Tsinghua University, 100084, Beijing, China
- Centre for Life Sciences, Tsinghua University, 100084, Beijing, China
- Mengnan Jiang
- School of Medicine, Tsinghua University, 100084, Beijing, China
- Xun Lan
- School of Medicine, Tsinghua University, 100084, Beijing, China.
- Centre for Life Sciences, Tsinghua University, 100084, Beijing, China.
- Tsinghua-Peking Center for Life Sciences, MOE Key Laboratory of Tsinghua University, Beijing, China.
- MOE Key Laboratory of Bioinformatics, Tsinghua University, 100084, Beijing, China.
7. Bulashevska A, Nacsa Z, Lang F, Braun M, Machyna M, Diken M, Childs L, König R. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol 2024; 15:1394003. [PMID: 38868767 PMCID: PMC11167095 DOI: 10.3389/fimmu.2024.1394003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open
Abstract
Cancer immunotherapy has witnessed rapid advancement in recent years, with a particular focus on neoantigens as promising targets for personalized treatments. The convergence of immunogenomics, bioinformatics, and artificial intelligence (AI) has propelled the development of innovative neoantigen discovery tools and pipelines. These tools have revolutionized our ability to identify tumor-specific antigens, providing the foundation for precision cancer immunotherapy. AI-driven algorithms can process extensive amounts of data, identify patterns, and make predictions that were once challenging to achieve. However, the integration of AI comes with its own set of challenges, leaving space for further research. With particular focus on the computational approaches, in this article we have explored the current landscape of neoantigen prediction, the fundamental concepts behind it, the challenges, and their potential solutions, providing a comprehensive overview of this rapidly evolving field.
Affiliation(s)
- Alla Bulashevska
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Zsófia Nacsa
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Franziska Lang
- TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Markus Braun
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Martin Machyna
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Mustafa Diken
- TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Liam Childs
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Renate König
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
8. Leary AY, Scott D, Gupta NT, Waite JC, Skokos D, Atwal GS, Hawkins PG. Designing meaningful continuous representations of T cell receptor sequences with deep generative models. Nat Commun 2024; 15:4271. [PMID: 38769289 PMCID: PMC11106309 DOI: 10.1038/s41467-024-48198-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
T Cell Receptor (TCR) antigen binding underlies a key mechanism of the adaptive immune response, yet the vast diversity of TCRs and the complexity of protein interactions limit our ability to build useful low dimensional representations of TCRs. To address the current limitations in TCR analysis we develop a capacity-controlled disentangling variational autoencoder, which we name TCR-VALID, trained using a dataset of approximately 100 million TCR sequences. We design TCR-VALID such that the model representations are low-dimensional, continuous, disentangled, and sufficiently informative to provide high-quality TCR sequence de novo generation. We thoroughly quantify these properties of the representations, providing a framework for future protein representation learning in low dimensions. The continuity of TCR-VALID representations allows fast and accurate TCR clustering and is benchmarked against other state-of-the-art TCR clustering tools and pre-trained language models.
Affiliation(s)
- Allen Y Leary
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA.
- Darius Scott
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Namita T Gupta
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Janelle C Waite
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Dimitris Skokos
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Gurinder S Atwal
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Peter G Hawkins
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA.
9. Jiang M, Yu Z, Lan X. VitTCR: A deep learning method for peptide recognition prediction. iScience 2024; 27:109770. [PMID: 38711451 PMCID: PMC11070698 DOI: 10.1016/j.isci.2024.109770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 01/21/2024] [Accepted: 04/15/2024] [Indexed: 05/08/2024] Open
Abstract
This study introduces VitTCR, a predictive model based on the vision transformer (ViT) architecture, aimed at identifying interactions between T cell receptors (TCRs) and peptides, crucial for developing cancer immunotherapies and vaccines. VitTCR converts TCR-peptide interactions into numerical AtchleyMaps using Atchley factors for prediction, achieving AUROC (0.6485) and AUPR (0.6295) values. Benchmark analysis indicates VitTCR's performance is comparable to other models, with further comparative studies suggested to understand its effectiveness in varied contexts. Additionally, integrating a positional bias weight matrix (PBWM), derived from amino acid contact probabilities in structurally resolved pMHC-TCR complexes, slightly improves VitTCR's accuracy. The model's predictions show weak yet statistically significant correlations with immunological factors like T cell clonal expansion and activation percentages, underscoring the biological relevance of VitTCR's predictive capabilities. VitTCR emerges as a valuable computational tool for predicting TCR-peptide interactions, offering insights for immunotherapy and vaccine development.
Affiliation(s)
- Mengnan Jiang
- School of Medicine, Tsinghua University, Beijing 100084, China
- Zilan Yu
- School of Medicine, Tsinghua University, Beijing 100084, China
- Centre for Life Sciences, Tsinghua University, Beijing 100084, China
- Xun Lan
- School of Medicine, Tsinghua University, Beijing 100084, China
- Centre for Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, MOE Key Laboratory of Tsinghua University, Beijing, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
10. Gao Y, Dong K, Gao Y, Jin X, Yang J, Yan G, Liu Q. Unified cross-modality integration and analysis of T cell receptors and T cell transcriptomes by low-resource-aware representation learning. Cell Genom 2024; 4:100553. [PMID: 38688285 PMCID: PMC11099349 DOI: 10.1016/j.xgen.2024.100553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/09/2024] [Accepted: 04/06/2024] [Indexed: 05/02/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) and T cell receptor sequencing (TCR-seq) are pivotal for investigating T cell heterogeneity. Integrating these modalities, which is expected to uncover profound insights in immunology that might otherwise go unnoticed with a single modality, faces computational challenges due to the low-resource characteristics of the multimodal data. Herein, we present UniTCR, a novel low-resource-aware multimodal representation learning framework designed for unified cross-modality integration, enabling comprehensive T cell analysis. By designing a dual-modality contrastive learning module and a single-modality preservation module to effectively embed each modality into a common latent space, UniTCR demonstrates versatility in connecting TCR sequences with T cell transcriptomes across various tasks, including single-modality analysis, modality gap analysis, epitope-TCR binding prediction, and TCR profile cross-modality generation, in a low-resource-aware way. Extensive evaluations conducted on multiple scRNA-seq/TCR-seq paired datasets showed the superior performance of UniTCR, demonstrating its ability to explore the complexity of the immune system.
Affiliation(s)
- Yicheng Gao
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Kejing Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Yuli Gao
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xuan Jin
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Jingya Yang
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Gang Yan
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China; Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China.
11. Eskandari A, Leow TC, Rahman MBA, Oslan SN. Advances in Therapeutic Cancer Vaccines, Their Obstacles, and Prospects Toward Tumor Immunotherapy. Mol Biotechnol 2024:10.1007/s12033-024-01144-3. [PMID: 38625508 DOI: 10.1007/s12033-024-01144-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 03/15/2024] [Indexed: 04/17/2024]
Abstract
Over the past few decades, cancer immunotherapy has experienced a significant revolution due to the advancements in immune checkpoint inhibitors (ICIs) and adoptive cell therapies (ACTs), along with their regulatory approvals. In recent times, there has been hope in the effectiveness of cancer vaccines for therapy as they have been able to stimulate de novo T-cell reactions against tumor antigens. These tumor antigens include both tumor-associated antigen (TAA) and tumor-specific antigen (TSA). Nevertheless, the constant quest to fully achieve these abilities persists. Therefore, this review offers a broad perspective on the existing status of cancer immunizations. Cancer vaccine design has been revolutionized due to the advancements made in antigen selection, the development of antigen delivery systems, and a deeper understanding of the strategic intricacies involved in effective antigen presentation. In addition, this review addresses the present condition of clinical tests and deliberates on their approaches, with a particular emphasis on the immunogenicity specific to tumors and the evaluation of effectiveness against tumors. Nevertheless, the ongoing clinical endeavors to create cancer vaccines have failed to produce remarkable clinical results as a result of substantial obstacles, such as the suppression of the tumor immune microenvironment, the identification of suitable candidates, the assessment of immune responses, and the acceleration of vaccine production. Hence, there are possibilities for the industry to overcome challenges and enhance patient results in the coming years. This can be achieved by recognizing the intricate nature of clinical issues and continuously working toward surpassing existing limitations.
Affiliation(s)
- Azadeh Eskandari
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia.
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia.
- Thean Chor Leow
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Siti Nurbaya Oslan
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
12. Croce G, Bobisse S, Moreno DL, Schmidt J, Guillame P, Harari A, Gfeller D. Deep learning predictions of TCR-epitope interactions reveal epitope-specific chains in dual alpha T cells. Nat Commun 2024; 15:3211. [PMID: 38615042 PMCID: PMC11016097 DOI: 10.1038/s41467-024-47461-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 04/03/2024] [Indexed: 04/15/2024] Open
Abstract
T cells have the ability to eliminate infected and cancer cells and play an essential role in cancer immunotherapy. T cell activation is elicited by the binding of the T cell receptor (TCR) to epitopes displayed on MHC molecules, and the TCR specificity is determined by the sequence of its α and β chains. Here, we collect and curate a dataset of 17,715 αβTCRs interacting with dozens of class I and class II epitopes. We use this curated data to develop MixTCRpred, an epitope-specific TCR-epitope interaction predictor. MixTCRpred accurately predicts TCRs recognizing several viral and cancer epitopes. MixTCRpred further provides a useful quality control tool for multiplexed single-cell TCR sequencing assays of epitope-specific T cells and pinpoints a substantial fraction of putative contaminants in public databases. Analysis of epitope-specific dual α T cells demonstrates that MixTCRpred can identify α chains mediating epitope recognition. Applying MixTCRpred to TCR repertoires from COVID-19 patients reveals enrichment of clonotypes predicted to bind an immunodominant SARS-CoV-2 epitope. Overall, MixTCRpred provides a robust tool to predict TCRs interacting with specific epitopes and interpret TCR-sequencing data from both bulk and epitope-specific T cells.
Affiliation(s)
- Giancarlo Croce
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Sara Bobisse
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- Dana Léa Moreno
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Julien Schmidt
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- Philippe Guillame
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- Alexandre Harari
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- David Gfeller
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
- Agora Cancer Research Centre, Lausanne, Switzerland.
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland.
13
Ji H, Wang XX, Zhang Q, Zhang C, Zhang HM. Predicting TCR sequences for unseen antigen epitopes using structural and sequence features. Brief Bioinform 2024; 25:bbae210. [PMID: 38711371 PMCID: PMC11074592 DOI: 10.1093/bib/bbae210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/04/2024] [Accepted: 04/22/2024] [Indexed: 05/08/2024] Open
Abstract
T-cell receptor (TCR) recognition of antigens is fundamental to the adaptive immune response. With the expansion of experimental techniques, a substantial database of matched TCR-antigen pairs has emerged, presenting opportunities for computational prediction models. However, accurately forecasting the binding affinities of unseen antigen-TCR pairs remains a major challenge. Here, we present convolutional-self-attention TCR (CATCR), a novel framework tailored to enhance the prediction of epitope and TCR interactions. Our approach utilizes convolutional neural networks to extract peptide features from residue contact matrices, as generated by OpenFold, and a transformer to encode segment-based coded sequences. We introduce CATCR-D, a discriminator that can assess binding by analyzing the structural and sequence features of epitopes and CDR3-β regions. Additionally, the framework comprises CATCR-G, a generative module designed for CDR3-β sequences, which applies the pretrained encoder to deduce epitope characteristics and a transformer decoder for predicting matching CDR3-β sequences. CATCR-D achieved an AUROC of 0.89 on previously unseen epitope-TCR pairs and outperformed four benchmark models by a margin of 17.4%. CATCR-G has demonstrated high precision, recall and F1 scores, surpassing 95% in bidirectional encoder representations from transformers score assessments. Our results indicate that CATCR is an effective tool for predicting unseen epitope-TCR interactions. Incorporating structural insights enhances our understanding of the general rules governing TCR-epitope recognition significantly. The ability to predict TCRs for novel epitopes using structural and sequence information is promising, and broadening the repository of experimental TCR-epitope data could further improve the precision of epitope-TCR binding predictions.
MESH Headings
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/metabolism
- Receptors, Antigen, T-Cell/genetics
- Humans
- Epitopes/chemistry
- Epitopes/immunology
- Computational Biology/methods
- Neural Networks, Computer
- Epitopes, T-Lymphocyte/immunology
- Epitopes, T-Lymphocyte/chemistry
- Antigens/chemistry
- Antigens/immunology
- Amino Acid Sequence
Affiliation(s)
- Hongchen Ji
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Xiang-Xu Wang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Qiong Zhang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Chengkai Zhang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Hong-Mei Zhang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
14
Hudson D, Lubbock A, Basham M, Koohy H. A comparison of clustering models for inference of T cell receptor antigen specificity. Immunoinformatics (Amsterdam, Netherlands) 2024; 13:100033. [PMID: 38525047 PMCID: PMC10955519 DOI: 10.1016/j.immuno.2024.100033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 01/18/2024] [Accepted: 01/23/2024] [Indexed: 03/26/2024]
Abstract
The vast potential sequence diversity of TCRs and their ligands has presented an historic barrier to computational prediction of TCR epitope specificity, a holy grail of quantitative immunology. One common approach is to cluster sequences together, on the assumption that similar receptors bind similar epitopes. Here, we provide the first independent evaluation of widely used clustering algorithms for TCR specificity inference, observing some variability in predictive performance between models, and marked differences in scalability. Despite these differences, we find that different algorithms produce clusters with high degrees of similarity for receptors recognising the same epitope. Our analysis strengthens the case for use of clustering models to identify signals of common specificity from large repertoires, whilst highlighting scope for improvement of complex models over simple comparators.
Affiliation(s)
- Dan Hudson
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- The Rosalind Franklin Institute, Didcot, UK
- Hashem Koohy
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Alan Turing Fellow in Health and Medicine, UK
15
Xiong P, Liang A, Cai X, Xia T. APTAnet: an atom-level peptide-TCR interaction affinity prediction model. Biophysics Reports 2024; 10:1-14. [PMID: 38737473 PMCID: PMC11079603 DOI: 10.52601/bpr.2023.230037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 01/26/2024] [Indexed: 05/14/2024] Open
Abstract
The prediction of affinity between TCRs and peptides is crucial for the further development of tumor-infiltrating lymphocyte (TIL) immunotherapy. Inspired by the broader research on drug-protein interaction (DPI), we propose APTAnet, an atom-level peptide-TCR interaction (PTI) affinity prediction model that uses natural language processing methods. The APTAnet model achieved an average ROC-AUC and PR-AUC of 0.893 and 0.877, respectively, in ten-fold cross-validation on 25,675 pairs of PTI data. Furthermore, experimental results on an independent test set from the McPAS database showed that APTAnet outperformed the current mainstream models. Finally, through validation on 11 cases of real tumor patient data, we found that the APTAnet model can effectively identify tumor peptides and screen tumor-specific TCRs.
Affiliation(s)
- Peng Xiong
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Anyi Liang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Xunhui Cai
- Institute of Pathology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Tian Xia
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Institute of Pathology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
16
Derraz B, Breda G, Kaempf C, Baenke F, Cotte F, Reiche K, Köhl U, Kather JN, Eskenazy D, Gilbert S. New regulatory thinking is needed for AI-based personalised drug and cell therapies in precision oncology. NPJ Precis Oncol 2024; 8:23. [PMID: 38291217 PMCID: PMC10828509 DOI: 10.1038/s41698-024-00517-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 01/06/2024] [Indexed: 02/01/2024] Open
Abstract
Until recently the application of artificial intelligence (AI) in precision oncology was confined to activities in drug development and had limited impact on the personalisation of therapy. Now, a number of approaches have been proposed for the personalisation of drug and cell therapies with AI applied to therapy design, planning and delivery at the patient's bedside. Some drug and cell-based therapies are already tuneable to the individual to optimise efficacy, to reduce toxicity, to adapt the dosing regime, to design combination therapy approaches and, preclinically, even to personalise the receptor design of cell therapies. Developments in AI-based healthcare are accelerating through the adoption of foundation models, and generalist medical AI models have been proposed. The application of these approaches in therapy design is already being explored and realistic short-term advances include the application to the personalised design and delivery of drugs and cell therapies. With this pace of development, the limiting step to adoption will likely be the capacity and appropriateness of regulatory frameworks. This article explores emerging concepts and new ideas for the regulation of AI-enabled personalised cancer therapies in the context of existing and in development governance frameworks.
Affiliation(s)
- Bouchra Derraz
- ProductLife Group, Paris, France
- Groupe de recherche et d'accueil en droit et économie de la santé (GRADES), Faculty of Pharmacy, University Paris-Saclay, Paris, France
- Christoph Kaempf
- Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
- Franziska Baenke
- Carl Gustav Carus University Hospital Dresden, Dresden University of Technology, Dresden, Germany
- Fabienne Cotte
- Department of Emergency Medicine, University Clinic Marburg, Philipps-University, Marburg, Germany
- Kristin Reiche
- Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany
- Institute of Clinical Immunology, University Leipzig, Leipzig, Germany
- Ulrike Köhl
- Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
- Institute of Clinical Immunology, University Leipzig, Leipzig, Germany
- Jakob Nikolas Kather
- Carl Gustav Carus University Hospital Dresden, Dresden University of Technology, Dresden, Germany
- Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Deborah Eskenazy
- Groupe de recherche et d'accueil en droit et économie de la santé (GRADES), Faculty of Pharmacy, University Paris-Saclay, Paris, France
- Stephen Gilbert
- Carl Gustav Carus University Hospital Dresden, Dresden University of Technology, Dresden, Germany.
- Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany.
17
Bravi B. Development and use of machine learning algorithms in vaccine target selection. NPJ Vaccines 2024; 9:15. [PMID: 38242890 PMCID: PMC10798987 DOI: 10.1038/s41541-023-00795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/07/2023] [Indexed: 01/21/2024] Open
Abstract
Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of data availability and method development that need to be addressed to bridge the gap between advances in ML predictions and their translational application to vaccine design.
Affiliation(s)
- Barbara Bravi
- Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.
18
Barra C, Nilsson JB, Saksager A, Carri I, Deleuran S, Garcia Alvarez HM, Høie MH, Li Y, Clifford JN, Wan YTR, Moreta LS, Nielsen M. In Silico Tools for Predicting Novel Epitopes. Methods Mol Biol 2024; 2813:245-280. [PMID: 38888783 DOI: 10.1007/978-1-0716-3890-3_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2024]
Abstract
Identifying antigens within a pathogen is a critical task to develop effective vaccines and diagnostic methods, as well as understanding the evolution and adaptation to host immune responses. Historically, antigenicity was studied with experiments that evaluate the immune response against selected fragments of pathogens. Using this approach, the scientific community has gathered abundant information regarding which pathogenic fragments are immunogenic. The systematic collection of this data has enabled unraveling many of the fundamental rules underlying the properties defining epitopes and immunogenicity, and has resulted in the creation of a large panel of immunologically relevant predictive (in silico) tools. The development and application of such tools have proven to accelerate the identification of novel epitopes within biomedical applications reducing experimental costs. This chapter introduces some basic concepts about MHC presentation, T cell and B cell epitopes, the experimental efforts to determine those, and focuses on state-of-the-art methods for epitope prediction, highlighting their strengths and limitations, and catering instructions for their rational use.
Affiliation(s)
- Carolina Barra
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark.
- Astrid Saksager
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark
- Ibel Carri
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín (UNSAM) - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, Argentina
- Sebastian Deleuran
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark
- Heli M Garcia Alvarez
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín (UNSAM) - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, Argentina
- Magnus Haraldson Høie
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark
- Yuchen Li
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark
- Yat-Tsai Richie Wan
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark
- Lys Sanz Moreta
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark
- Morten Nielsen
- Section for Bioinformatics, Health Tech, Technical University of Denmark, Lyngby, Denmark
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín (UNSAM) - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, Argentina
19
Chen J, Zhao B, Lin S, Sun H, Mao X, Wang M, Chu Y, Hong L, Wei D, Li M, Xiong Y. TEPCAM: Prediction of T-cell receptor-epitope binding specificity via interpretable deep learning. Protein Sci 2024; 33:e4841. [PMID: 37983648 PMCID: PMC10731497 DOI: 10.1002/pro.4841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/11/2023] [Accepted: 11/16/2023] [Indexed: 11/22/2023]
Abstract
Recognition by the T-cell receptor (TCR) on the T-cell surface of a specific epitope presented by the major histocompatibility complex is key to triggering the immune response. Identifying the binding rules of TCR-epitope pairs is crucial for developing immunotherapies, including neoantigen vaccines and drugs. Accurate prediction of TCR-epitope binding specificity via deep learning remains challenging, especially for test cases that are unseen in the training set. Here, we propose TEPCAM (TCR-EPitope identification based on Cross-Attention and Multi-channel convolution), a deep learning model that incorporates self-attention, a cross-attention mechanism, and multi-channel convolution to improve generalizability and enhance model interpretability. Experimental results demonstrate that our model outperformed several state-of-the-art models on two challenging tasks, including a strictly split dataset and an external dataset. Furthermore, the model can learn some interaction patterns between TCR and epitope by extracting the interpretable matrix from the cross-attention layer and mapping it to the three-dimensional structures. The source code and data are freely available at https://github.com/Chenjw99/TEPCAM.
Affiliation(s)
- Junwei Chen
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Heqi Sun
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Meng Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Yanyi Chu
- Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Liang Hong
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Artificial Intelligence Biomedical Center, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Artificial Intelligence Biomedical Center, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
20
Li X, You J, Hong L, Liu W, Guo P, Hao X. Neoantigen cancer vaccines: a new star on the horizon. Cancer Biol Med 2023; 21:j.issn.2095-3941.2023.0395. [PMID: 38164734 PMCID: PMC11033713 DOI: 10.20892/j.issn.2095-3941.2023.0395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 11/22/2023] [Indexed: 01/03/2024] Open
Abstract
Immunotherapy represents a promising strategy for cancer treatment that utilizes immune cells or drugs to activate the patient's own immune system and eliminate cancer cells. One of the most exciting advances within this field is the targeting of neoantigens, which are peptides derived from non-synonymous somatic mutations that are found exclusively within cancer cells and absent in normal cells. Although neoantigen-based therapeutic vaccines have not received approval for standard cancer treatment, early clinical trials have yielded encouraging outcomes as standalone monotherapy or when combined with checkpoint inhibitors. Progress made in high-throughput sequencing and bioinformatics has greatly facilitated the precise and efficient identification of neoantigens. Consequently, personalized neoantigen-based vaccines tailored to each patient have been developed that are capable of eliciting a robust and long-lasting immune response which effectively eliminates tumors and prevents recurrences. This review provides a concise overview consolidating the latest clinical advances in neoantigen-based therapeutic vaccines, and also discusses challenges and future perspectives for this innovative approach, particularly emphasizing the potential of neoantigen-based therapeutic vaccines to enhance clinical efficacy against advanced solid tumors.
Affiliation(s)
- Xiaoling Li
- Cell Biotechnology Laboratory, Tianjin Cancer Hospital Airport Hospital, Tianjin 300308, China
- National Clinical Research Center for Cancer, Tianjin 300060, China
- Haihe Laboratory of Synthetic Biology, Tianjin 300090, China
- Jian You
- Department of Thoracic Oncology, Tianjin Cancer Hospital Airport Hospital, Tianjin 300308, China
- Department of Thoracic Oncology Surgery, Tianjin Medical University Cancer Institute & Hospital, Tianjin 300060, China
- Liping Hong
- Cell Biotechnology Laboratory, Tianjin Cancer Hospital Airport Hospital, Tianjin 300308, China
- National Clinical Research Center for Cancer, Tianjin 300060, China
- Haihe Laboratory of Synthetic Biology, Tianjin 300090, China
- Weijiang Liu
- Cell Biotechnology Laboratory, Tianjin Cancer Hospital Airport Hospital, Tianjin 300308, China
- National Clinical Research Center for Cancer, Tianjin 300060, China
- Haihe Laboratory of Synthetic Biology, Tianjin 300090, China
- Peng Guo
- Cell Biotechnology Laboratory, Tianjin Cancer Hospital Airport Hospital, Tianjin 300308, China
- National Clinical Research Center for Cancer, Tianjin 300060, China
- Haihe Laboratory of Synthetic Biology, Tianjin 300090, China
- Xishan Hao
- Cell Biotechnology Laboratory, Tianjin Cancer Hospital Airport Hospital, Tianjin 300308, China
- National Clinical Research Center for Cancer, Tianjin 300060, China
- Haihe Laboratory of Synthetic Biology, Tianjin 300090, China
- Tianjin Medical University Cancer Institute & Hospital, Tianjin 300060, China
21
Textor J, Buytenhuijs F, Rogers D, Gauthier ÈM, Sultan S, Wortel IMN, Kalies K, Fähnrich A, Pagel R, Melichar HJ, Westermann J, Mandl JN. Machine learning analysis of the T cell receptor repertoire identifies sequence features of self-reactivity. Cell Syst 2023; 14:1059-1073.e5. [PMID: 38061355 DOI: 10.1016/j.cels.2023.11.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/27/2023] [Revised: 09/01/2023] [Accepted: 11/09/2023] [Indexed: 12/23/2023]
Abstract
The T cell receptor (TCR) determines specificity and affinity for both foreign and self-peptides presented by the major histocompatibility complex (MHC). Although the strength of TCR interactions with self-pMHC impacts T cell function, it has been challenging to identify TCR sequence features that predict T cell fate. To discern patterns distinguishing TCRs from naive CD4+ T cells with low versus high self-reactivity, we used data from 42 mice to train a machine learning (ML) algorithm that identifies population-level differences between TCRβ sequence sets. This approach revealed that weakly self-reactive T cell populations were enriched for longer CDR3β regions and acidic amino acids. We tested our ML predictions of self-reactivity using retrogenic mice with fixed TCRβ sequences. Extrapolating our analyses to independent datasets, we predicted high self-reactivity for regulatory T cells and slightly reduced self-reactivity for T cells responding to chronic infections. Our analyses suggest a potential trade-off between TCR repertoire diversity and self-reactivity. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Johannes Textor
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands; Medical BioSciences, Radboudumc, Nijmegen 6525 GA, the Netherlands.
- Franka Buytenhuijs
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands
- Dakota Rogers
- Department of Physiology, McGill University, Montreal, QC H3G 0B1, Canada; McGill Research Centre on Complex Traits, McGill University, Montreal, QC H3G 0B1, Canada
- Ève Mallet Gauthier
- Immunology-Oncology Unit, Maisonneuve-Rosemont Hospital Research Center, Montreal, QC H1T 2M4, Canada; Department of Microbiology, Infectious Diseases, and Immunology, Université de Montréal, Montréal, QC H3T 1J4, Canada
- Shabaz Sultan
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands; Medical BioSciences, Radboudumc, Nijmegen 6525 GA, the Netherlands
- Inge M N Wortel
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands; Medical BioSciences, Radboudumc, Nijmegen 6525 GA, the Netherlands
- Kathrin Kalies
- Institut für Anatomie, Universität zu Lübeck, 23562 Lübeck, Germany
- Anke Fähnrich
- Institut für Anatomie, Universität zu Lübeck, 23562 Lübeck, Germany
- René Pagel
- Institut für Anatomie, Universität zu Lübeck, 23562 Lübeck, Germany
- Heather J Melichar
- Immunology-Oncology Unit, Maisonneuve-Rosemont Hospital Research Center, Montreal, QC H1T 2M4, Canada; Department of Medicine, Université de Montréal, Montréal, QC H1T 2M4, Canada; Department of Microbiology & Immunology, McGill University, Montreal, QC H3A 1A3, Canada; Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, QC H3A 1A3, Canada
- Judith N Mandl
- Department of Physiology, McGill University, Montreal, QC H3G 0B1, Canada; Department of Microbiology & Immunology, McGill University, Montreal, QC H3A 1A3, Canada; McGill Research Centre on Complex Traits, McGill University, Montreal, QC H3G 0B1, Canada.
22
Koyama K, Hashimoto K, Nagao C, Mizuguchi K. Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties. Frontiers in Bioinformatics 2023; 3:1274599. [PMID: 38170146 PMCID: PMC10759225 DOI: 10.3389/fbinf.2023.1274599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/27/2023] [Indexed: 01/05/2024] Open
Abstract
Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining an insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated by internal retrospective validation only, and few researchers have incorporated and tested an attention layer from language models into structural information. Therefore, in this study, we developed a machine learning model based on a modified version of Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of the TCR-pMHC interaction, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the largely attended residues such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge about molecular recognition and pave the way for designing new therapeutics.
Affiliation(s)
- Kyohei Koyama
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
  - Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Kosuke Hashimoto
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
- Chioko Nagao
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
- Kenji Mizuguchi
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
  - Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan

23
Fan T, Zhang M, Yang J, Zhu Z, Cao W, Dong C. Therapeutic cancer vaccines: advancements, challenges, and prospects. Signal Transduct Target Ther 2023; 8:450. [PMID: 38086815 PMCID: PMC10716479 DOI: 10.1038/s41392-023-01674-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 05/14/2023] [Revised: 09/08/2023] [Accepted: 09/19/2023] [Indexed: 12/18/2023] Open
Abstract
With the development and regulatory approval of immune checkpoint inhibitors and adoptive cell therapies, cancer immunotherapy has undergone a profound transformation over the past decades. Recently, therapeutic cancer vaccines have shown promise by eliciting de novo T cell responses targeting tumor antigens, including tumor-associated antigens and tumor-specific antigens. The objective was to amplify and diversify the intrinsic repertoire of tumor-specific T cells. However, the complete realization of these capabilities remains an ongoing pursuit. Therefore, we provide an overview of the current landscape of cancer vaccines in this review. The range of antigen selection, antigen delivery systems development, and the strategic nuances underlying effective antigen presentation have pioneered cancer vaccine design. Furthermore, this review addresses the current status of clinical trials and discusses their strategies, focusing on tumor-specific immunogenicity and anti-tumor efficacy assessment. However, current clinical attempts toward developing cancer vaccines have not yielded breakthrough clinical outcomes due to significant challenges, including tumor immune microenvironment suppression, optimal candidate identification, immune response evaluation, and vaccine manufacturing acceleration. Therefore, the field is poised to overcome hurdles and improve patient outcomes in the future by acknowledging these clinical complexities and persistently striving to surmount inherent constraints.
Affiliation(s)
- Ting Fan
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China
- Mingna Zhang
  - Postgraduate Training Base, Shanghai East Hospital, Jinzhou Medical University, Shanghai, 200120, China
- Jingxian Yang
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China
- Zhounan Zhu
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China
- Wanlu Cao
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China
- Chunyan Dong
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China

24
Zhao M, Xu SX, Yang Y, Yuan M. GGNpTCR: A Generative Graph Structure Neural Network for Predicting Immunogenic Peptides for T-cell Immune Response. J Chem Inf Model 2023; 63:7557-7567. [PMID: 37990917 DOI: 10.1021/acs.jcim.3c01293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Identifying the interactions between T-cell receptors (TCRs) and human antigens is a crucial step in developing new vaccines, diagnostics, and immunotherapy. Current methods primarily focus on learning binding patterns from known TCR binding repertoires by using sequence information alone without considering the binding specificity of new antigens or exogenous peptides that have not appeared in the training set. Furthermore, the spatial structure of antigens plays a critical role in immune studies and immunotherapy, which should be addressed properly in the identification of interacting TCR-antigen pairs. In this study, we introduced a novel deep learning framework based on generative graph structures, GGNpTCR, for predicting interactions between TCR and peptides from sequence information. Results of real data analysis indicate that our model achieved excellent prediction for new antigens unseen in the training data set, making significant improvements compared to existing methods. We also applied the model to a large COVID-19 data set with no antigens in the training data set, and the improvement was also significant. Furthermore, through incorporation of additional supervised mechanisms, GGNpTCR demonstrated the ability to precisely forecast the locations of peptide-TCR interactions within 3D configurations. This enhancement substantially improved the model's interpretability. In summary, based on the performance on multiple data sets, GGNpTCR has made significant progress in terms of performance, universality, and interpretability.
Affiliation(s)
- Minghua Zhao
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Steven X Xu
  - Genmab US, Inc., Princeton, New Jersey 08540, United States
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Min Yuan
  - School of Public Health Administration, Anhui Medical University, Hefei 230032, China

25
Korpela D, Jokinen E, Dumitrescu A, Huuhtanen J, Mustjoki S, Lähdesmäki H. EPIC-TRACE: predicting TCR binding to unseen epitopes using attention and contextualized embeddings. Bioinformatics 2023; 39:btad743. [PMID: 38070156 PMCID: PMC10963061 DOI: 10.1093/bioinformatics/btad743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/20/2023] [Accepted: 12/07/2023] [Indexed: 12/21/2023] Open
Abstract
MOTIVATION T cells play an essential role in the adaptive immune system to fight pathogens and cancer but may also give rise to autoimmune diseases. The recognition of a peptide-MHC (pMHC) complex by a T cell receptor (TCR) is required to elicit an immune response. Many machine learning models have been developed to predict the binding, but generalizing predictions to pMHCs outside the training data remains challenging. RESULTS We have developed a new machine learning model that utilizes information about the TCR from both α and β chains, epitope sequence, and MHC. Our method uses ProtBERT embeddings for the amino acid sequences of both chains and the epitope, as well as convolution and multi-head attention architectures. We show the importance of each input feature as well as the benefit of including epitopes with only a few TCRs to the training data. We evaluate our model on existing databases and show that it compares favorably against other state-of-the-art models. AVAILABILITY AND IMPLEMENTATION https://github.com/DaniTheOrange/EPIC-TRACE.
Affiliation(s)
- Dani Korpela
  - Department of Computer Science, Aalto University, 02150 Espoo, Finland
- Emmi Jokinen
  - Department of Computer Science, Aalto University, 02150 Espoo, Finland
  - Translational Immunology Research Program, Department of Clinical Chemistry and Hematology, University of Helsinki, 00290 Helsinki, Finland
  - Hematology Research Unit Helsinki, Helsinki University Hospital Comprehensive Cancer Center, 00290 Helsinki, Finland
- Jani Huuhtanen
  - Translational Immunology Research Program, Department of Clinical Chemistry and Hematology, University of Helsinki, 00290 Helsinki, Finland
  - Hematology Research Unit Helsinki, Helsinki University Hospital Comprehensive Cancer Center, 00290 Helsinki, Finland
- Satu Mustjoki
  - Translational Immunology Research Program, Department of Clinical Chemistry and Hematology, University of Helsinki, 00290 Helsinki, Finland
  - Hematology Research Unit Helsinki, Helsinki University Hospital Comprehensive Cancer Center, 00290 Helsinki, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
- Harri Lähdesmäki
  - Department of Computer Science, Aalto University, 02150 Espoo, Finland

26
Khan AR, Reinders MJT, Khatri I. Determining epitope specificity of T-cell receptors with transformers. Bioinformatics 2023; 39:btad632. [PMID: 37847663 PMCID: PMC10636277 DOI: 10.1093/bioinformatics/btad632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 09/09/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023] Open
Abstract
SUMMARY T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex in case of an infection or cancer. However, the high diversity of TCRs, as well as their unique and complex binding mechanisms underlying epitope recognition, make it difficult to predict the binding between TCRs and epitopes. Here, we present the utility of transformers, a deep learning strategy that incorporates an attention mechanism that learns the informative features, and show that these models pre-trained on a large set of protein sequences outperform current strategies. We compared three pre-trained auto-encoder transformer models (ProtBERT, ProtAlbert, and ProtElectra) and one pre-trained auto-regressive transformer model (ProtXLNet) to predict the binding specificity of TCRs to 25 epitopes from the VDJdb database (human and murine). Two additional modifications were performed to incorporate gene usage of the TCRs in the four transformer models. Of all 12 transformer implementations (four models with three different modifications), a modified version of the ProtXLNet model could predict TCR-epitope pairs with the highest accuracy (weighted F1 score 0.55 simultaneously considering all 25 epitopes). The modification included additional features representing the gene names for the TCRs. We also showed that the basic implementation of transformers outperformed the previously available methods, i.e. TCRGP, TCRdist, and DeepTCR, developed for the same biological problem, especially for the hard-to-classify labels. We show that the proficiency of transformers in attention learning can be made operational in a complex biological setting like TCR binding prediction. Further ingenuity in utilizing the full potential of transformers, either through attention head visualization or introducing additional features, can extend T-cell research avenues. 
AVAILABILITY AND IMPLEMENTATION Data and code are available on https://github.com/InduKhatri/tcrformer.
Affiliation(s)
- Abdul Rehman Khan
  - Department of Intelligent Systems, Delft University of Technology, Delft 2600 GA, The Netherlands
- Marcel J T Reinders
  - Department of Intelligent Systems, Delft University of Technology, Delft 2600 GA, The Netherlands
  - Leiden Computational Biology Center, Department of Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZA, The Netherlands
- Indu Khatri
  - Leiden Computational Biology Center, Department of Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZA, The Netherlands
  - Department of Immunology, Leiden University Medical Center, Leiden 2333 ZA, The Netherlands

27
|
Montemurro A, Povlsen HR, Jessen LE, Nielsen M. Benchmarking data-driven filtering for denoising of TCRpMHC single-cell data. Sci Rep 2023; 13:16147. [PMID: 37752190 PMCID: PMC10522655 DOI: 10.1038/s41598-023-43048-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/18/2023] [Indexed: 09/28/2023] Open
Abstract
Pairing of the T cell receptor (TCR) with its cognate peptide-MHC (pMHC) is a cornerstone in T cell-mediated immunity. Recently, single-cell sequencing coupled with DNA-barcoded MHC multimer staining has enabled high-throughput studies of T cell specificities. However, the immense variability of TCR-pMHC interactions combined with the relatively low signal-to-noise ratio in the data generated using current technologies are complicating these studies. Several approaches have been proposed for denoising single-cell TCR-pMHC specificity data. Here, we present a benchmark evaluating two such denoising methods, ICON and ITRAP. We applied and evaluated the methods on publicly available immune profiling data provided by 10x Genomics. We find that both methods identified approximately 75% of the raw data as noise. We analyzed both internal metrics developed for the purpose and performance on independent data using machine learning methods trained on the raw and denoised 10x data. We find an increased signal-to-noise ratio comparing the denoised to the raw data for both methods, and demonstrate an overall superior performance of the ITRAP method in terms of both data consistency and performance. In conclusion, this study demonstrates that improving the data quality from high-throughput studies of TCRpMHC-specificity by denoising is paramount in increasing our understanding of T cell-mediated immunity.
Affiliation(s)
- Alessandro Montemurro
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800, Kgs. Lyngby, Denmark
- Helle Rus Povlsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800, Kgs. Lyngby, Denmark
- Leon Eyrich Jessen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800, Kgs. Lyngby, Denmark
- Morten Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800, Kgs. Lyngby, Denmark

28
Fast E, Dhar M, Chen B. TAPIR: a T-cell receptor language model for predicting rare and novel targets. bioRxiv 2023:2023.09.12.557285. [PMID: 37745475 PMCID: PMC10515850 DOI: 10.1101/2023.09.12.557285] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/26/2023]
Abstract
T-cell receptors (TCRs) are involved in most human diseases, but linking their sequences with their targets remains an unsolved grand challenge in the field. In this study, we present TAPIR (T-cell receptor and Peptide Interaction Recognizer), a T-cell receptor (TCR) language model that predicts TCR-target interactions, with a focus on novel and rare targets. TAPIR employs deep convolutional neural network (CNN) encoders to process TCR and target sequences across flexible representations (e.g., beta-chain only, unknown MHC allele, etc.) and learns patterns of interactivity via several training tasks. This flexibility allows TAPIR to train on more than 50k either paired (alpha and beta chain) or unpaired TCRs (just alpha or beta chain) from public and proprietary databases against 1933 unique targets. TAPIR demonstrates state-of-the-art performance when predicting TCR interactivity against common benchmark targets and is the first method to demonstrate strong performance when predicting TCR interactivity against novel targets, where no examples are provided in training. TAPIR is also capable of predicting TCR interaction against MHC alleles in the absence of target information. Leveraging these capabilities, we apply TAPIR to cancer patient TCR repertoires and identify and validate a novel and potent anti-cancer T-cell receptor against a shared cancer neoantigen target (PIK3CA H1047L). We further show how TAPIR, when extended with a generative neural network, is capable of directly designing T-cell receptor sequences that interact with a target of interest.
Affiliation(s)
- Ethan Fast
  - Vcreate, Inc., Menlo Park, CA, 94025, USA

29
Lee CH, Huh J, Buckley PR, Jang M, Pinho MP, Fernandes RA, Antanaviciute A, Simmons A, Koohy H. A robust deep learning workflow to predict CD8+ T-cell epitopes. Genome Med 2023; 15:70. [PMID: 37705109 PMCID: PMC10498576 DOI: 10.1186/s13073-023-01225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 08/30/2023] [Indexed: 09/15/2023] Open
Abstract
BACKGROUND T-cells play a crucial role in the adaptive immune system by triggering responses against cancer cells and pathogens, while maintaining tolerance against self-antigens, which has sparked interest in the development of various T-cell-focused immunotherapies. However, the identification of antigens recognised by T-cells is low-throughput and laborious. To overcome some of these limitations, computational methods for predicting CD8+ T-cell epitopes have emerged. Despite recent developments, most immunogenicity algorithms struggle to learn features of peptide immunogenicity from small datasets, suffer from HLA bias and are unable to reliably predict pathology-specific CD8+ T-cell epitopes. METHODS We developed TRAP (T-cell recognition potential of HLA-I presented peptides), a robust deep learning workflow for predicting CD8+ T-cell epitopes from MHC-I presented pathogenic and self-peptides. TRAP uses transfer learning, deep learning architecture and MHC binding information to make context-specific predictions of CD8+ T-cell epitopes. TRAP also detects low-confidence predictions for peptides that differ significantly from those in the training datasets to abstain from making incorrect predictions. To estimate the immunogenicity of pathogenic peptides with low-confidence predictions, we further developed a novel metric, RSAT (relative similarity to autoantigens and tumour-associated antigens), as a complement to 'dissimilarity to self' from cancer studies. RESULTS TRAP was used to identify epitopes from glioblastoma patients as well as SARS-CoV-2 peptides, and it outperformed other algorithms in both cancer and pathogenic settings. TRAP was especially effective at extracting immunogenicity-associated properties from restricted data of emerging pathogens and translating them onto related species, as well as minimising the loss of likely epitopes in imbalanced datasets. We also demonstrated that the novel metric termed RSAT was able to estimate the immunogenicity of pathogenic peptides of various lengths and species. TRAP implementation is available at: https://github.com/ChloeHJ/TRAP. CONCLUSIONS This study presents a novel computational workflow for accurately predicting CD8+ T-cell epitopes to foster a better understanding of antigen-specific T-cell response and the development of effective clinical therapeutics.
Affiliation(s)
- Chloe H Lee
  - MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
  - MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
- Jaesung Huh
  - Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, OX2 6NN, UK
- Paul R Buckley
  - MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
  - MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
- Myeongjun Jang
  - Intelligent Systems Lab, Department of Computer Science, University of Oxford, Oxford, OX1 3QG, UK
- Mariana Pereira Pinho
  - MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
- Ricardo A Fernandes
  - Chinese Academy of Medical Sciences (CAMS) Oxford Institute (COI), University of Oxford, Oxford, OX3 7BN, UK
- Agne Antanaviciute
  - MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
  - MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
- Alison Simmons
  - MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
  - Translational Gastroenterology Unit, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Hashem Koohy
  - MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
  - MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
  - Alan Turing Fellow in Health and Medicine, The Alan Turing Institute, London, UK

30
Bravi B, Di Gioacchino A, Fernandez-de-Cossio-Diaz J, Walczak AM, Mora T, Cocco S, Monasson R. A transfer-learning approach to predict antigen immunogenicity and T-cell receptor specificity. eLife 2023; 12:e85126. [PMID: 37681658 PMCID: PMC10522340 DOI: 10.7554/elife.85126] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/23/2022] [Accepted: 09/07/2023] [Indexed: 09/09/2023] Open
Abstract
Antigen immunogenicity and the specificity of binding of T-cell receptors to antigens are key properties underlying effective immune responses. Here we propose diffRBM, an approach based on transfer learning and Restricted Boltzmann Machines, to build sequence-based predictive models of these properties. DiffRBM is designed to learn the distinctive patterns in amino-acid composition that, on the one hand, underlie the antigen's probability of triggering a response, and on the other hand the T-cell receptor's ability to bind to a given antigen. We show that the patterns learnt by diffRBM allow us to predict putative contact sites of the antigen-receptor complex. We also discriminate immunogenic and non-immunogenic antigens, antigen-specific and generic receptors, reaching performances that compare favorably to existing sequence-based predictors of antigen immunogenicity and T-cell receptor specificity.
Affiliation(s)
- Barbara Bravi
  - Department of Mathematics, Imperial College London, London, United Kingdom
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Andrea Di Gioacchino
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Jorge Fernandez-de-Cossio-Diaz
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Aleksandra M Walczak
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Thierry Mora
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Simona Cocco
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Rémi Monasson
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France

31
Ghoreyshi ZS, George JT. Quantitative approaches for decoding the specificity of the human T cell repertoire. Front Immunol 2023; 14:1228873. [PMID: 37781387 PMCID: PMC10539903 DOI: 10.3389/fimmu.2023.1228873] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/25/2023] [Accepted: 08/17/2023] [Indexed: 10/03/2023] Open
Abstract
T cell receptor (TCR)-peptide-major histocompatibility complex (pMHC) interactions play a vital role in initiating immune responses against pathogens, and the specificity of TCRpMHC interactions is crucial for developing optimized therapeutic strategies. The advent of high-throughput immunological and structural evaluation of TCR and pMHC has provided an abundance of data for computational approaches that aim to predict favorable TCR-pMHC interactions. Current models are constructed using information on protein sequence, structures, or a combination of both, and utilize a variety of statistical learning-based approaches for identifying the rules governing specificity. This review examines the current theoretical, computational, and deep learning approaches for identifying TCR-pMHC recognition pairs, placing emphasis on each method's mathematical approach, predictive performance, and limitations.
Affiliation(s)
- Zahra S. Ghoreyshi
  - Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
- Jason T. George
  - Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
  - Engineering Medicine Program, Texas A&M University, Houston, TX, United States
  - Center for Theoretical Biological Physics, Rice University, Houston, TX, United States

32
Yang M, Huang ZA, Zhou W, Ji J, Zhang J, He S, Zhu Z. MIX-TPI: a flexible prediction framework for TCR-pMHC interactions based on multimodal representations. Bioinformatics 2023; 39:btad475. [PMID: 37527015 PMCID: PMC10423027 DOI: 10.1093/bioinformatics/btad475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 07/05/2023] [Accepted: 07/29/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION The interactions between T-cell receptors (TCR) and peptide-major histocompatibility complex (pMHC) are essential for the adaptive immune system. However, identifying these interactions can be challenging due to the limited availability of experimental data, sequence data heterogeneity, and high experimental validation costs. RESULTS To address this issue, we develop a novel computational framework, named MIX-TPI, to predict TCR-pMHC interactions using amino acid sequences and physicochemical properties. Based on convolutional neural networks, MIX-TPI incorporates sequence-based and physicochemical-based extractors to refine the representations of TCR-pMHC interactions. Each modality is projected into modality-invariant and modality-specific representations to capture the uniformity and diversities between different features. A self-attention fusion layer is then adopted to form the classification module. Experimental results demonstrate the effectiveness of MIX-TPI in comparison with other state-of-the-art methods. MIX-TPI also shows good generalization capability on mutually exclusive evaluation datasets and a paired TCR dataset. AVAILABILITY AND IMPLEMENTATION The source code of MIX-TPI and the test data are available at: https://github.com/Wolverinerine/MIX-TPI.
Affiliation(s)
- Minghao Yang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Zhi-An Huang
  - Research Office, City University of Hong Kong (Dongguan), Dongguan 523000, China
- Wei Zhou
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Junkai Ji
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Jun Zhang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Shan He
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Zexuan Zhu
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China

33
Hudson D, Fernandes RA, Basham M, Ogg G, Koohy H. Can we predict T cell specificity with digital biology and machine learning? Nat Rev Immunol 2023; 23:511-521. [PMID: 36755161 PMCID: PMC9908307 DOI: 10.1038/s41577-023-00835-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Accepted: 12/07/2022] [Indexed: 02/10/2023]
Abstract
Recent advances in machine learning and experimental biology have offered breakthrough solutions to problems such as protein structure prediction that were long thought to be intractable. However, despite the pivotal role of the T cell receptor (TCR) in orchestrating cellular immunity in health and disease, computational reconstruction of a reliable map from a TCR to its cognate antigens remains a holy grail of systems immunology. Current data sets are limited to a negligible fraction of the universe of possible TCR-ligand pairs, and performance of state-of-the-art predictive models wanes when applied beyond these known binders. In this Perspective article, we make the case for renewed and coordinated interdisciplinary effort to tackle the problem of predicting TCR-antigen specificity. We set out the general requirements of predictive models of antigen binding, highlight critical challenges and discuss how recent advances in digital biology such as single-cell technology and machine learning may provide possible solutions. Finally, we describe how predicting TCR specificity might contribute to our understanding of the broader puzzle of antigen immunogenicity.
Collapse
Affiliation(s)
- Dan Hudson
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- The Rosalind Franklin Institute, Didcot, UK
| | - Ricardo A Fernandes
- Chinese Academy of Medical Sciences Oxford Institute, University of Oxford, Oxford, UK
| | | | - Graham Ogg
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Chinese Academy of Medical Sciences Oxford Institute, University of Oxford, Oxford, UK
| | - Hashem Koohy
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
- Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
34
Myronov A, Mazzocco G, Król P, Plewczynski D. BERTrand-peptide:TCR binding prediction using Bidirectional Encoder Representations from Transformers augmented with random TCR pairing. Bioinformatics 2023; 39:btad468. [PMID: 37535685 PMCID: PMC10444968 DOI: 10.1093/bioinformatics/btad468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 06/28/2023] [Accepted: 08/01/2023] [Indexed: 08/05/2023]
Abstract
MOTIVATION The advent of T-cell receptor (TCR) sequencing experiments allowed for a significant increase in the amount of peptide:TCR binding data available and a number of machine-learning models appeared in recent years. High-quality prediction models for a fixed epitope sequence are feasible, provided enough known binding TCR sequences are available. However, their performance drops significantly for previously unseen peptides. RESULTS We prepare the dataset of known peptide:TCR binders and augment it with negative decoys created using healthy donors' T-cell repertoires. We employ deep learning methods commonly applied in Natural Language Processing to train a peptide:TCR binding model with a degree of cross-peptide generalization (0.69 AUROC). We demonstrate that BERTrand outperforms the published methods when evaluated on peptide sequences not used during model training. AVAILABILITY AND IMPLEMENTATION The datasets and the code for model training are available at https://github.com/SFGLab/bertrand.
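The decoy augmentation mentioned in this abstract can be illustrated with a minimal sketch. It is not BERTrand's actual pipeline: the function name `make_decoys`, the `ratio` parameter, and the tuple layout are hypothetical, and the sketch simply assumes that TCRs drawn at random from a healthy-donor reference repertoire do not bind the peptide they are paired with.

```python
import random


def make_decoys(binders, reference_tcrs, ratio=3, seed=0):
    """Label known binders 1 and add randomly paired reference TCRs as 0.

    binders: list of (peptide, tcr) positive pairs.
    reference_tcrs: TCR sequences from a reference repertoire (assumed non-binders).
    ratio: hypothetical number of decoys to draw per positive pair.
    """
    rng = random.Random(seed)
    seen = set(binders)  # never emit a known binder or a duplicate decoy
    decoys = []
    for peptide, _ in binders:
        made, attempts = 0, 0
        # Cap the attempts so a tiny reference set cannot loop forever.
        while made < ratio and attempts < 100 * ratio:
            attempts += 1
            tcr = rng.choice(reference_tcrs)
            if (peptide, tcr) in seen:
                continue
            seen.add((peptide, tcr))
            decoys.append((peptide, tcr, 0))
            made += 1
    positives = [(peptide, tcr, 1) for peptide, tcr in binders]
    return positives + decoys
```

Random pairing can occasionally produce a true binder by chance, so labels generated this way are noisy and should be treated as presumed negatives.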
Affiliation(s)
- Alexander Myronov
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Ardigen, Krakow, Poland
- Dariusz Plewczynski
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
35
Shen Y, Voigt A, Leng X, Rodriguez AA, Nguyen CQ. A current and future perspective on T cell receptor repertoire profiling. Front Genet 2023; 14:1159109. [PMID: 37408774 PMCID: PMC10319011 DOI: 10.3389/fgene.2023.1159109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 06/12/2023] [Indexed: 07/07/2023]
Abstract
T cell receptors (TCR) play a vital role in the immune system's ability to recognize and respond to foreign antigens, relying on the highly polymorphic rearrangement of TCR genes. The recognition of autologous peptides by adaptive immunity may lead to the development and progression of autoimmune diseases. Understanding the specific TCR involved in this process can provide insights into the autoimmune process. RNA-seq (RNA sequencing) is a valuable tool for studying TCR repertoires by providing a comprehensive and quantitative analysis of the RNA transcripts. With the development of RNA technology, transcriptomic data must provide valuable information to model and predict TCR and antigen interaction and, more importantly, identify or predict neoantigens. This review provides an overview of the application and development of bulk RNA-seq and single-cell (SC) RNA-seq to examine the TCR repertoires. Furthermore, discussed here are bioinformatic tools that can be applied to study the structural biology of peptide/TCR/MHC (major histocompatibility complex) and predict antigenic epitopes using advanced artificial intelligence tools.
Affiliation(s)
- Yiran Shen
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, University of Florida, Gainesville, FL, United States
- Alexandria Voigt
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, University of Florida, Gainesville, FL, United States
- Xuebing Leng
- Department of Microbiology and Immunology, Miller School of Medicine, University of Miami, Miami, FL, United States
- Amy A. Rodriguez
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, University of Florida, Gainesville, FL, United States
- Cuong Q. Nguyen
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, University of Florida, Gainesville, FL, United States
- Department of Oral Biology, College of Dentistry, University of Florida, Gainesville, FL, United States
- Center of Orphaned Autoimmune Diseases, University of Florida, Gainesville, FL, United States
36
Povlsen HR, Bentzen AK, Kadivar M, Jessen LE, Hadrup SR, Nielsen M. Improved T cell receptor antigen pairing through data-driven filtering of sequencing information from single cells. eLife 2023; 12:e81810. [PMID: 37133356 PMCID: PMC10156162 DOI: 10.7554/elife.81810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 03/13/2023] [Indexed: 05/04/2023]
Abstract
Novel single-cell-based technologies hold the promise of matching T cell receptor (TCR) sequences with their cognate peptide-MHC recognition motif in a high-throughput manner. Parallel capture of TCR transcripts and peptide-MHC is enabled through the use of reagents labeled with DNA barcodes. However, analysis and annotation of such single-cell sequencing (SCseq) data are challenged by dropout, random noise, and other technical artifacts that must be carefully handled in the downstream processing steps. We here propose a rational, data-driven method termed ITRAP (improved T cell Receptor Antigen Pairing) to deal with these challenges, filtering away likely artifacts, and enable the generation of large sets of TCR-pMHC sequence data with a high degree of specificity and sensitivity, thus outputting the most likely pMHC target per T cell. We have validated this approach across 10 different virus-specific T cell responses in 16 healthy donors. Across these samples, we have identified up to 1494 high-confidence TCR-pMHC pairs derived from 4135 single cells.
Affiliation(s)
- Helle Rus Povlsen
- Department of Health Technology at Technical University of Denmark, Kongens Lyngby, Denmark
- Amalie Kai Bentzen
- Department of Health Technology at Technical University of Denmark, Kongens Lyngby, Denmark
- Mohammad Kadivar
- Department of Health Technology at Technical University of Denmark, Kongens Lyngby, Denmark
- Leon Eyrich Jessen
- Department of Health Technology at Technical University of Denmark, Kongens Lyngby, Denmark
- Sine Reker Hadrup
- Department of Health Technology at Technical University of Denmark, Kongens Lyngby, Denmark
- Morten Nielsen
- Department of Health Technology at Technical University of Denmark, Kongens Lyngby, Denmark
37
Wu J, Qi M, Zhang F, Zheng Y. TPBTE: A model based on convolutional Transformer for predicting the binding of TCR to epitope. Mol Immunol 2023; 157:30-41. [PMID: 36966551 DOI: 10.1016/j.molimm.2023.03.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Revised: 03/12/2023] [Accepted: 03/14/2023] [Indexed: 03/29/2023]
Abstract
T cell receptors (TCRs) selectively bind to antigens to fight pathogens with specific immunity. Current tools focus on the nature of amino acids within sequences and take less into account the nature of amino acids far apart and the relationship between sequences, leading to significant differences in the results from different datasets. We propose TPBTE, a model based on convolutional Transformer for Predicting the Binding of TCR to Epitope. It takes epitope sequences and the complementarity-determining region 3 (CDR3) sequences of the TCRβ chain as inputs. It uses a convolutional attention mechanism to learn amino acid representations between different positions of the sequences based on learning local features of the sequences. At the same time, it uses cross attention to learn the interaction information between TCR sequences and epitope sequences. A comprehensive evaluation of the TCR-epitope data shows that the average area under the curve of TPBTE outperforms the baseline model, and demonstrate an intentional performance. In addition, TPBTE can give the probability of binding TCR to epitopes, which can be used as the first step of epitope screening, narrowing the scope of epitope search and reducing the time of epitope search.
38
Xu AM, Chour W, DeLucia DC, Su Y, Pavlovitch-Bedzyk AJ, Ng R, Rasheed Y, Davis MM, Lee JK, Heath JR. Entropic analysis of antigen-specific CDR3 domains identifies essential binding motifs shared by CDR3s with different antigen specificities. Cell Syst 2023; 14:273-284.e5. [PMID: 37001518 PMCID: PMC10355346 DOI: 10.1016/j.cels.2023.03.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/15/2021] [Revised: 09/01/2022] [Accepted: 03/01/2023] [Indexed: 04/22/2023]
Abstract
Antigen-specific T cell receptor (TCR) sequences can have prognostic, predictive, and therapeutic value, but decoding the specificity of TCR recognition remains challenging. Unlike DNA strands that base pair, TCRs bind to their targets with different orientations and different lengths, which complicates comparisons. We present scanning parametrized by normalized TCR length (SPAN-TCR) to analyze antigen-specific TCR CDR3 sequences and identify patterns driving TCR-pMHC specificity. Using entropic analysis, SPAN-TCR identifies 2-mer motifs that decrease the diversity (entropy) of CDR3s. These motifs are the most common patterns that can predict CDR3 composition, and we identify "essential" motifs that decrease entropy in the same CDR3 α or β chain containing the 2-mer, and "super-essential" motifs that decrease entropy in both chains. Molecular dynamics analysis further suggests that these motifs may play important roles in binding. We then employ SPAN-TCR to resolve similarities in TCR repertoires against different antigens using public databases of TCR sequences.
Affiliation(s)
- Alexander M Xu
- Institute for Systems Biology, Seattle, WA 98109, USA; Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA; Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.
- William Chour
- Institute for Systems Biology, Seattle, WA 98109, USA; Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Keck School of Medicine, University of Southern California, Los Angeles, CA 91125, USA
- Diana C DeLucia
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Yapeng Su
- Institute for Systems Biology, Seattle, WA 98109, USA; Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Rachel Ng
- Institute for Systems Biology, Seattle, WA 98109, USA
- Yusuf Rasheed
- Institute for Systems Biology, Seattle, WA 98109, USA
- Mark M Davis
- Computational and Systems Immunology Program, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA; Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- John K Lee
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Division of Medical Oncology, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- James R Heath
- Institute for Systems Biology, Seattle, WA 98109, USA.
39
Tippalagama R, Chihab LY, Kearns K, Lewis S, Panda S, Willemsen L, Burel JG, Lindestam Arlehamn CS. Antigen-specificity measurements are the key to understanding T cell responses. Front Immunol 2023; 14:1127470. [PMID: 37122719 PMCID: PMC10140422 DOI: 10.3389/fimmu.2023.1127470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 03/30/2023] [Indexed: 05/02/2023]
Abstract
Antigen-specific T cells play a central role in the adaptive immune response and come in a wide range of phenotypes. T cell receptors (TCRs) mediate the antigen-specificities found in T cells. Importantly, high-throughput TCR sequencing provides a fingerprint which allows tracking of specific T cells and their clonal expansion in response to particular antigens. As a result, many studies have leveraged TCR sequencing in an attempt to elucidate the role of antigen-specific T cells in various contexts. Here, we discuss the published approaches to studying antigen-specific T cells and their specific TCR repertoire. Further, we discuss how these methods have been applied to study the TCR repertoire in various diseases in order to characterize the antigen-specific T cells involved in the immune control of disease.
40
Buckley PR, Lee CH, Antanaviciute A, Simmons A, Koohy H. A systems approach evaluating the impact of SARS-CoV-2 variant of concern mutations on CD8+ T cell responses. Immunotherapy Advances 2023; 3:ltad005. [PMID: 37082106 PMCID: PMC10112682 DOI: 10.1093/immadv/ltad005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/24/2022] [Accepted: 03/02/2023] [Indexed: 03/17/2023]
Abstract
T cell recognition of SARS-CoV-2 antigens after vaccination and/or natural infection has played a central role in resolving SARS-CoV-2 infections and generating adaptive immune memory. However, the clinical impact of SARS-CoV-2-specific T cell responses is variable and the mechanisms underlying T cell interaction with target antigens are not fully understood. This is especially true given the virus' rapid evolution, which leads to new variants with immune escape capacity. In this study, we used the Omicron variant as a model organism and took a systems approach to evaluate the impact of mutations on CD8+ T cell immunogenicity. We computed an immunogenicity potential score for each SARS-CoV-2 peptide antigen from the ancestral strain and Omicron, capturing both antigen presentation and T cell recognition probabilities. By comparing ancestral vs. Omicron immunogenicity scores, we reveal a divergent and heterogeneous landscape of impact for CD8+ T cell recognition of mutated targets in Omicron variants. While T cell recognition of Omicron peptides is broadly preserved, we observed mutated peptides with deteriorated immunogenicity that may assist breakthrough infection in some individuals. We then combined our scoring scheme with an in silico mutagenesis, to characterise the position- and residue-specific theoretical mutational impact on immunogenicity. While we predict many escape trajectories from the theoretical landscape of substitutions, our study suggests that Omicron mutations in T cell epitopes did not develop under cell-mediated pressure. Our study provides a generalisable platform for fostering a deeper understanding of existing and novel variant impact on antigen-specific vaccine- and/or infection-induced T cell immunity.
Affiliation(s)
- Paul R Buckley
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Chloe H Lee
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Agne Antanaviciute
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Alison Simmons
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- Hashem Koohy
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Alan Turing Fellow in Health and Medicine
41
Frank ML, Lu K, Erdogan C, Han Y, Hu J, Wang T, Heymach JV, Zhang J, Reuben A. T-Cell Receptor Repertoire Sequencing in the Era of Cancer Immunotherapy. Clin Cancer Res 2023; 29:994-1008. [PMID: 36413126 PMCID: PMC10011887 DOI: 10.1158/1078-0432.ccr-22-2469] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/09/2022] [Revised: 10/07/2022] [Accepted: 11/14/2022] [Indexed: 11/23/2022]
Abstract
T cells are integral components of the adaptive immune system, and their responses are mediated by unique T-cell receptors (TCR) that recognize specific antigens from a variety of biological contexts. As a result, analyzing the T-cell repertoire offers a better understanding of immune responses and of diseases like cancer. Next-generation sequencing technologies have greatly enabled the high-throughput analysis of the TCR repertoire. On the basis of our extensive experience in the field from the past decade, we provide an overview of TCR sequencing, from the initial library preparation steps to sequencing and analysis methods and finally to functional validation techniques. With regards to data analysis, we detail important TCR repertoire metrics and present several computational tools for predicting antigen specificity. Finally, we highlight important applications of TCR sequencing and repertoire analysis to understanding tumor biology and developing cancer immunotherapies.
Affiliation(s)
- Meredith L Frank
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas; The University of Texas MD Anderson Cancer Center UT Health Houston Graduate School of Biomedical Sciences, Houston, Texas
- Kaylene Lu
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas; The University of Texas MD Anderson Cancer Center UT Health Houston Graduate School of Biomedical Sciences, Houston, Texas; Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Can Erdogan
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas; Rice University, Houston, Texas
- Yi Han
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, Texas
- Jian Hu
- The University of Texas MD Anderson Cancer Center UT Health Houston Graduate School of Biomedical Sciences, Houston, Texas; Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Tao Wang
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, Texas; Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, Texas
- John V Heymach
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas; The University of Texas MD Anderson Cancer Center UT Health Houston Graduate School of Biomedical Sciences, Houston, Texas
- Jianjun Zhang
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas; The University of Texas MD Anderson Cancer Center UT Health Houston Graduate School of Biomedical Sciences, Houston, Texas; Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Alexandre Reuben
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas; The University of Texas MD Anderson Cancer Center UT Health Houston Graduate School of Biomedical Sciences, Houston, Texas
42
Gao Y, Gao Y, Fan Y, Zhu C, Wei Z, Zhou C, Chuai G, Chen Q, Zhang H, Liu Q. Pan-Peptide Meta Learning for T-cell receptor–antigen binding recognition. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00619-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]
43
Camaglia F, Ryvkin A, Greenstein E, Reich-Zeliger S, Chain B, Mora T, Walczak AM, Friedman N. Quantifying changes in the T cell receptor repertoire during thymic development. eLife 2023; 12:81622. [PMID: 36661220 PMCID: PMC9934861 DOI: 10.7554/elife.81622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 01/18/2023] [Indexed: 01/21/2023]
Abstract
One of the feats of adaptive immunity is its ability to recognize foreign pathogens while sparing the self. During maturation in the thymus, T cells are selected through the binding properties of their antigen-specific T-cell receptor (TCR), through the elimination of both weakly (positive selection) and strongly (negative selection) self-reactive receptors. However, the impact of thymic selection on the TCR repertoire is poorly understood. Here, we use transgenic Nur77-mice expressing a T-cell activation reporter to study the repertoires of thymic T cells at various stages of their development, including cells that do not pass selection. We combine high-throughput repertoire sequencing with statistical inference techniques to characterize the selection of the TCR in these distinct subsets. We find small but significant differences in the TCR repertoire parameters between the maturation stages, which recapitulate known differentiation pathways leading to the CD4+ and CD8+ subtypes. These differences can be simulated by simple models of selection acting linearly on the sequence features. We find no evidence of specific sequences or sequence motifs or features that are suppressed by negative selection. These results favour a collective or statistical model for T-cell self non-self discrimination, where negative selection biases the repertoire away from self recognition, rather than ensuring lack of self-reactivity at the single-cell level.
Affiliation(s)
- Francesco Camaglia
- Laboratoire de physique de l’École normale supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
- Arie Ryvkin
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Erez Greenstein
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Benny Chain
- Division of Infection and Immunity, University College London, London, United Kingdom
- Thierry Mora
- Laboratoire de physique de l’École normale supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
- Aleksandra M Walczak
- Laboratoire de physique de l’École normale supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
- Nir Friedman
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
44
Valkiers S, Gielis S, Van Deuren VML, Laukens K, Meysman P. Clustering and Annotation of T Cell Receptor Repertoires. Methods Mol Biol 2023; 2673:33-51. [PMID: 37258905 DOI: 10.1007/978-1-0716-3239-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Immunological protection against a wide variety of pathogens is largely mediated by the diverse and dynamic T cell receptor (TCR) repertoire, a crucial component of the adaptive immune system. An encounter with infectious agents stimulates specific T cells to initiate a direct immune response to combat intruders. Hence, the TCR repertoire may conceal crucial information regarding current and past infections and might assist in the development and monitoring of vaccines. To unlock its knowledge, we describe a computational workflow involving both supervised and unsupervised machine learning techniques to analyze and annotate full TCR repertoire data. The method is explained using data from a published yellow fever virus (YFV) vaccination study in healthy individuals. The TCR repertoire of one individual is studied before and 2 weeks after vaccination, using an efficient clustering method and identification of YFV-specific TCRs.
Affiliation(s)
- Sebastiaan Valkiers
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- AUDACIS, Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing, University of Antwerp, Antwerp, Belgium
- Sofie Gielis
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- AUDACIS, Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing, University of Antwerp, Antwerp, Belgium
- Vincent M L Van Deuren
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- AUDACIS, Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing, University of Antwerp, Antwerp, Belgium
- Kris Laukens
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- AUDACIS, Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing, University of Antwerp, Antwerp, Belgium
- Pieter Meysman
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium.
- AUDACIS, Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing, University of Antwerp, Antwerp, Belgium.
45
Grazioli F, Machart P, Mösch A, Li K, Castorina LV, Pfeifer N, Min MR. Attentive Variational Information Bottleneck for TCR-peptide interaction prediction. Bioinformatics 2022; 39:6960920. [PMID: 36571499 PMCID: PMC9825246 DOI: 10.1093/bioinformatics/btac820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/19/2022] [Revised: 11/18/2022] [Accepted: 12/23/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental immuno-oncology problem: predicting the interactions between T-cell receptors (TCRs) and peptides. RESULTS Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art methods for TCR-peptide interaction prediction. Additionally, we show that the latent posterior distribution learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution amino acid sequences. AVAILABILITY AND IMPLEMENTATION The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pierre Machart
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg 69115, Germany
- Anja Mösch
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg 69115, Germany
- Kai Li
- Machine Learning Department, NEC Laboratories America, Princeton, NJ 08540, USA
- Nico Pfeifer
- Methods in Medical Informatics, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
46
Fang Y, Liu X, Liu H. Attention-aware contrastive learning for predicting T cell receptor-antigen binding specificity. Brief Bioinform 2022; 23:6696141. [PMID: 36094087 DOI: 10.1093/bib/bbac378] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Revised: 07/06/2022] [Accepted: 08/09/2022] [Indexed: 12/14/2022]
Abstract
MOTIVATION It has been proven that only a small fraction of the neoantigens presented by major histocompatibility complex (MHC) class I molecules on the cell surface can elicit T cells. This restriction can be attributed to the binding specificity of T cell receptor (TCR) and peptide-MHC complex (pMHC). Computational prediction of T cells binding to neoantigens is a challenging and unresolved task. RESULTS In this paper, we proposed an attention-aware contrastive learning model, ATMTCR, to infer the TCR-pMHC binding specificity. For each TCR sequence, we used a transformer encoder to transform it to latent representation, and then masked a percentage of amino acids guided by attention weights to generate its contrastive view. Compared to fully-supervised baseline model, we verified that contrastive learning-based pretraining on large-scale TCR sequences significantly improved the prediction performance of downstream tasks. Interestingly, masking a percentage of amino acids with low attention weights yielded best performance compared to other masking strategies. Comparison experiments on two independent datasets demonstrated our method achieved better performance than other existing algorithms. Moreover, we identified important amino acids and their positional preference through attention weights, which indicated the potential interpretability of our proposed model.
Affiliation(s)
- Yiming Fang
- School of Computer Science and Technology, Nanjing Tech University, 211816, Nanjing, China
- Xuejun Liu
- School of Computer Science and Technology, Nanjing Tech University, 211816, Nanjing, China
- Hui Liu
- School of Computer Science and Technology, Nanjing Tech University, 211816, Nanjing, China
47
Grazioli F, Mösch A, Machart P, Li K, Alqassem I, O’Donnell TJ, Min MR. On TCR binding predictors failing to generalize to unseen peptides. Front Immunol 2022; 13:1014256. [PMID: 36341448 PMCID: PMC9634250 DOI: 10.3389/fimmu.2022.1014256] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 08/08/2022] [Accepted: 10/05/2022] [Indexed: 11/18/2022]
Abstract
Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets, which include peptide sequences that are also included in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples TChard. We propose the hard split, a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models’ test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation of why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.
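The idea behind a peptide-disjoint split can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `hard_split`, the `(tcr, peptide, label)` tuple layout, and the `test_fraction` parameter are assumptions made for this example, and held-out peptides are chosen at random here rather than by the heuristic used for TChard.

```python
import random
from collections import defaultdict


def hard_split(samples, test_fraction=0.2, seed=0):
    """Split (tcr, peptide, label) tuples so train and test share no peptide.

    Hypothetical helper: a generic peptide-disjoint split, not the TChard code.
    """
    by_peptide = defaultdict(list)
    for sample in samples:
        by_peptide[sample[1]].append(sample)

    peptides = sorted(by_peptide)
    random.Random(seed).shuffle(peptides)

    # Hold out whole peptides until the test set reaches the target size.
    target = test_fraction * len(samples)
    train, test = [], []
    for peptide in peptides:
        bucket = test if len(test) < target else train
        bucket.extend(by_peptide[peptide])
    return train, test


# Placeholder identifiers, not real sequences (illustration only).
samples = [
    ("TCR1", "PEPA", 1), ("TCR2", "PEPA", 0),
    ("TCR3", "PEPB", 1), ("TCR4", "PEPB", 0),
    ("TCR5", "PEPC", 1), ("TCR6", "PEPC", 0),
]
train, test = hard_split(samples, test_fraction=0.3)
train_peptides = {peptide for _, peptide, _ in train}
test_peptides = {peptide for _, peptide, _ in test}
print(sorted(train_peptides), sorted(test_peptides))
```

Because all samples of a peptide move together, the realized test size can overshoot `test_fraction` when a few peptides dominate the dataset.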
Affiliation(s)
- Filippo Grazioli
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg, Germany
- *Correspondence: Filippo Grazioli; Martin Renqiang Min
- Anja Mösch
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg, Germany
- Pierre Machart
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg, Germany
- Kai Li
- Machine Learning Department, NEC Laboratories America, Princeton, NJ, United States
- Israa Alqassem
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg, Germany
- Timothy J. O’Donnell
- Division of Hematology and Medical Oncology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Martin Renqiang Min
- Machine Learning Department, NEC Laboratories America, Princeton, NJ, United States
|
48
|
Papadopoulou I, Nguyen AP, Weber A, Martínez MR. DECODE: a computational pipeline to discover T cell receptor binding rules. Bioinformatics 2022; 38:i246-i254. [PMID: 35758821 PMCID: PMC9235487 DOI: 10.1093/bioinformatics/btac257] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022] Open
Abstract
Motivation Understanding the mechanisms underlying T cell receptor (TCR) binding is of fundamental importance to understanding adaptive immune responses. A better understanding of the biochemical rules governing TCR binding can be used, e.g. to guide the design of more powerful and safer T cell-based therapies. Advances in repertoire sequencing technologies have made available millions of TCR sequences. Data abundance has, in turn, fueled the development of many computational models to predict the binding properties of TCRs from their sequences. Unfortunately, while many of these works have made great strides toward predicting TCR specificity using machine learning, the black-box nature of these models has resulted in a limited understanding of the rules that govern the binding of a TCR and an epitope. Results We present an easy-to-use and customizable computational pipeline, DECODE, to extract the binding rules from any black-box model designed to predict the TCR-epitope binding. DECODE offers a range of analytical and visualization tools to guide the user in the extraction of such rules. We demonstrate our pipeline on a recently published TCR-binding prediction model, TITAN, and show how to use the provided metrics to assess the quality of the computed rules. In conclusion, DECODE can lead to a better understanding of the sequence motifs that underlie TCR binding. Our pipeline can facilitate the investigation of current immunotherapeutic challenges, such as cross-reactive events due to off-target TCR binding. Availability and implementation Code is available publicly at https://github.com/phineasng/DECODE. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Iliana Papadopoulou
- IBM Research Europe, 8803 Rüschlikon, Switzerland; ETH Zurich, Department of Biosystems Science and Engineering (D-BSSE), 4058 Basel, Switzerland
- An-Phi Nguyen
- IBM Research Europe, 8803 Rüschlikon, Switzerland; ETH Zurich, Department of Mathematics (D-Math), 8092 Zurich, Switzerland
- Anna Weber
- IBM Research Europe, 8803 Rüschlikon, Switzerland; ETH Zurich, Department of Biosystems Science and Engineering (D-BSSE), 4058 Basel, Switzerland
|
49
|
Abondio P, De Intinis C, da Silva Gonçalves Vianez Júnior JL, Pace L. Single cell multiomic approaches to disentangle T cell heterogeneity. Immunol Lett 2022; 246:37-51. [DOI: 10.1016/j.imlet.2022.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 04/16/2022] [Accepted: 04/26/2022] [Indexed: 11/29/2022]
|
50
|
Born J, Huynh T, Stroobants A, Cornell WD, Manica M. Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model. J Chem Inf Model 2021; 62:240-257. [PMID: 34905358 DOI: 10.1021/acs.jcim.1c00889] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 discontiguous residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but consistently yields significantly higher performance than the full primary structure. This trend persists across different models, data sets, and performance metrics and holds true when predicting pIC50 for both unseen ligands and kinases. Our interpretability analysis reveals a potential explanation for the superiority of the active site models: whereas only mild statistical effects about the extraction of three-dimensional (3D) interaction sites take place in the full sequence models, the active site models are equipped with an implicit but strong inductive bias about the 3D structure stemming from the discontiguity of the active sites. Moreover, in direct comparisons, our models perform similarly to or better than previous state-of-the-art approaches in affinity prediction. We then investigate a de novo molecular design task and find that the active site provides benefits in computational efficiency, but otherwise, both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases.
Affiliation(s)
- Jannis Born
- IBM Research Europe, 8804 Rüschlikon, Switzerland; Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Tien Huynh
- IBM Research, Yorktown Heights, New York 10598, United States
- Astrid Stroobants
- Department of Chemistry, Imperial College London, SW7 2AZ London, United Kingdom
- Wendy D Cornell
- IBM Research, Yorktown Heights, New York 10598, United States
|