1
Goldner Kabeli R, Zevin S, Abargel A, Zilberberg A, Efroni S. Self-supervised learning of T cell receptor sequences exposes core properties for T cell membership. Sci Adv 2024; 10:eadk4670. [PMID: 38669334 DOI: 10.1126/sciadv.adk4670] [Received: 08/23/2023] [Accepted: 03/26/2024] [Indexed: 04/28/2024]
Abstract
The T cell receptor (TCR) repertoire is an extraordinarily diverse collection of TCRs essential for maintaining the body's homeostasis and response to threats. In this study, we compiled an extensive dataset of more than 4200 bulk TCR repertoire samples, encompassing 221,176,713 sequences, alongside 6,159,652 single-cell TCR sequences from over 400 samples. From this dataset, we then selected a representative subset of 5 million bulk sequences and 4.2 million single-cell sequences to train two specialized Transformer-based language models for bulk (CVC) and single-cell (scCVC) TCR repertoires, respectively. We show that these models successfully capture TCR core qualities, such as sharing, gene composition, and single-cell properties. These qualities are emergent in the encoded TCR latent space and enable classification into TCR-based qualities such as public sequences. These models demonstrate the potential of Transformer-based language models in TCR downstream applications.
Affiliation(s)
- Romi Goldner Kabeli
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Sarit Zevin
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Avital Abargel
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Alona Zilberberg
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Sol Efroni
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
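The abstract does not spell out the training objective of CVC or scCVC. Self-supervised Transformer models for sequences are commonly trained with BERT-style masked-language modelling, so the sketch below assumes that objective: it tokenizes a CDR3 amino-acid string into single-residue tokens and hides a random subset for the model to reconstruct. The vocabulary, special tokens, masking rate and the name `mask_cdr3` are hypothetical, not taken from the authors' models, and the usual 80/10/10 mask/random/keep split is omitted for brevity.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical vocabulary: special tokens followed by the 20 standard amino acids.
SPECIAL = ["[PAD]", "[MASK]", "[CLS]", "[SEP]"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + list(AMINO_ACIDS))}
IGNORE = -100  # label for positions the loss should not score


def mask_cdr3(seq, mask_prob=0.15, rng=None):
    """Tokenize a CDR3 amino-acid string and randomly mask residues.

    Returns (input_ids, labels): a masked position carries the [MASK] id in
    input_ids and its original token id in labels; every other label is IGNORE.
    """
    rng = rng or random.Random()
    ids = [VOCAB["[CLS]"]] + [VOCAB[aa] for aa in seq] + [VOCAB["[SEP]"]]
    labels = [IGNORE] * len(ids)
    for i in range(1, len(ids) - 1):  # never mask [CLS] or [SEP]
        if rng.random() < mask_prob:
            labels[i] = ids[i]
            ids[i] = VOCAB["[MASK]"]
    return ids, labels


# Illustrative call on a made-up CDR3 sequence; seeding makes the mask reproducible.
input_ids, labels = mask_cdr3("CASSLGQAYEQYF", rng=random.Random(0))
```

A model trained to predict the masked residues from their context never needs labelled data, which is what makes this kind of objective suitable for very large unlabelled repertoire collections.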
2
Barton J, Gaspariunas A, Galson JD, Leem J. Building Representation Learning Models for Antibody Comprehension. Cold Spring Harb Perspect Biol 2024; 16:a041462. [PMID: 38012013 PMCID: PMC10910360 DOI: 10.1101/cshperspect.a041462] [Indexed: 11/29/2023]
Abstract
Antibodies are versatile proteins with both the capacity to bind a broad range of targets and a proven track record as some of the most successful therapeutics. However, the development of novel antibody therapeutics is a lengthy and costly process. It is challenging to predict the functional and biophysical properties of antibodies from their amino acid sequence alone, requiring numerous experiments for full characterization. Machine learning, specifically deep representation learning, has emerged as a family of methods that can complement wet lab approaches and accelerate the overall discovery and engineering process. Here, we review advances in antibody sequence representation learning, and how this has improved antibody structure prediction and facilitated antibody optimization. We discuss challenges in the development and implementation of such models, such as the lack of publicly available, well-curated antibody function data, and highlight opportunities for improvement. These and future advances in machine learning for antibody sequences have the potential to increase the success rate in developing new therapeutics, resulting in broader access to transformative medicines and improved patient outcomes.
Affiliation(s)
- Justin Barton
- Alchemab Therapeutics Ltd, London N1C 4AX, United Kingdom
- Jacob D Galson
- Alchemab Therapeutics Ltd, London N1C 4AX, United Kingdom
- Jinwoo Leem
- Alchemab Therapeutics Ltd, London N1C 4AX, United Kingdom
3
Natali EN, Horst A, Meier P, Greiff V, Nuvolone M, Babrak LM, Fink K, Miho E. The dengue-specific immune response and antibody identification with machine learning. NPJ Vaccines 2024; 9:16. [PMID: 38245547 PMCID: PMC10799860 DOI: 10.1038/s41541-023-00788-7] [Received: 12/01/2022] [Accepted: 12/07/2023] [Indexed: 01/22/2024]
Abstract
Dengue virus poses a serious threat to global health and there is no specific therapeutic for it. Broadly neutralizing antibodies recognizing all serotypes may be an effective treatment. High-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) and bioinformatic analysis enable in-depth understanding of the B-cell immune response. Here, we investigate the dengue antibody response with these technologies and apply machine learning to identify rare and underrepresented broadly neutralizing antibody sequences. Dengue immunization elicited the following signatures on the antibody repertoire: (i) an increase of CDR3 and germline gene diversity; (ii) a change in the antibody repertoire architecture by eliciting power-law network distributions and CDR3 enrichment in polar amino acids; (iii) an increase in the expression of JNK/Fos transcription factors and ribosomal proteins. Furthermore, we demonstrate the applicability of computational methods and machine learning to AIRR-seq datasets for neutralizing antibody candidate sequence identification. Antibody expression and functional assays have validated the obtained results.
Affiliation(s)
- Eriberto Noel Natali
- FHNW University of Applied Sciences and Arts Northwestern Switzerland, School of Life Sciences, Muttenz, Switzerland
- Alexander Horst
- FHNW University of Applied Sciences and Arts Northwestern Switzerland, School of Life Sciences, Muttenz, Switzerland
- Patrick Meier
- FHNW University of Applied Sciences and Arts Northwestern Switzerland, School of Life Sciences, Muttenz, Switzerland
- Victor Greiff
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Oslo, Norway
- Mario Nuvolone
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Lmar Marie Babrak
- FHNW University of Applied Sciences and Arts Northwestern Switzerland, School of Life Sciences, Muttenz, Switzerland
- Enkelejda Miho
- FHNW University of Applied Sciences and Arts Northwestern Switzerland, School of Life Sciences, Muttenz, Switzerland.
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- aiNET GmbH, Basel, Switzerland.
4
Textor J, Buytenhuijs F, Rogers D, Gauthier ÈM, Sultan S, Wortel IMN, Kalies K, Fähnrich A, Pagel R, Melichar HJ, Westermann J, Mandl JN. Machine learning analysis of the T cell receptor repertoire identifies sequence features of self-reactivity. Cell Syst 2023; 14:1059-1073.e5. [PMID: 38061355 DOI: 10.1016/j.cels.2023.11.004] [Received: 05/27/2023] [Revised: 09/01/2023] [Accepted: 11/09/2023] [Indexed: 12/23/2023]
Abstract
The T cell receptor (TCR) determines specificity and affinity for both foreign and self-peptides presented by the major histocompatibility complex (MHC). Although the strength of TCR interactions with self-pMHC impacts T cell function, it has been challenging to identify TCR sequence features that predict T cell fate. To discern patterns distinguishing TCRs from naive CD4+ T cells with low versus high self-reactivity, we used data from 42 mice to train a machine learning (ML) algorithm that identifies population-level differences between TCRβ sequence sets. This approach revealed that weakly self-reactive T cell populations were enriched for longer CDR3β regions and acidic amino acids. We tested our ML predictions of self-reactivity using retrogenic mice with fixed TCRβ sequences. Extrapolating our analyses to independent datasets, we predicted high self-reactivity for regulatory T cells and slightly reduced self-reactivity for T cells responding to chronic infections. Our analyses suggest a potential trade-off between TCR repertoire diversity and self-reactivity. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Johannes Textor
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands; Medical BioSciences, Radboudumc, Nijmegen 6525 GA, the Netherlands.
- Franka Buytenhuijs
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands
- Dakota Rogers
- Department of Physiology, McGill University, Montreal, QC H3G 0B1, Canada; McGill Research Centre on Complex Traits, McGill University, Montreal, QC H3G 0B1, Canada
- Ève Mallet Gauthier
- Immunology-Oncology Unit, Maisonneuve-Rosemont Hospital Research Center, Montreal, QC H1T 2M4, Canada; Department of Microbiology, Infectious Diseases, and Immunology, Université de Montréal, Montréal, QC H3T 1J4, Canada
- Shabaz Sultan
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands; Medical BioSciences, Radboudumc, Nijmegen 6525 GA, the Netherlands
- Inge M N Wortel
- Data Science Group, Institute for Computing and Information Sciences, Radboud University, Nijmegen 6525 EC, the Netherlands; Medical BioSciences, Radboudumc, Nijmegen 6525 GA, the Netherlands
- Kathrin Kalies
- Institut für Anatomie, Universität zu Lübeck, 23562 Lübeck, Germany
- Anke Fähnrich
- Institut für Anatomie, Universität zu Lübeck, 23562 Lübeck, Germany
- René Pagel
- Institut für Anatomie, Universität zu Lübeck, 23562 Lübeck, Germany
- Heather J Melichar
- Immunology-Oncology Unit, Maisonneuve-Rosemont Hospital Research Center, Montreal, QC H1T 2M4, Canada; Department of Medicine, Université de Montréal, Montréal, QC H1T 2M4, Canada; Department of Microbiology & Immunology, McGill University, Montreal, QC H3A 1A3, Canada; Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, QC H3A 1A3, Canada
- Judith N Mandl
- Department of Physiology, McGill University, Montreal, QC H3G 0B1, Canada; Department of Microbiology & Immunology, McGill University, Montreal, QC H3A 1A3, Canada; McGill Research Centre on Complex Traits, McGill University, Montreal, QC H3G 0B1, Canada.
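The two sequence features highlighted in the abstract, CDR3β length and acidic amino-acid content, are simple per-sequence summaries. The sketch below computes them and averages them over a set of sequences so that two TCRβ sequence sets can be compared. It assumes sequences are plain CDR3β amino-acid strings and treats aspartate (D) and glutamate (E) as the acidic residues; it is not the authors' ML algorithm, and the function names are hypothetical.

```python
from statistics import mean

ACIDIC = set("DE")  # aspartate and glutamate


def cdr3_features(seq):
    """Return (length, fraction of acidic residues) for one CDR3 sequence."""
    n_acidic = sum(1 for aa in seq if aa in ACIDIC)
    return len(seq), n_acidic / len(seq)


def summarize(sequences):
    """Mean CDR3 length and mean acidic fraction over a set of sequences."""
    feats = [cdr3_features(s) for s in sequences]
    return {
        "mean_length": mean(length for length, _ in feats),
        "mean_acidic_fraction": mean(frac for _, frac in feats),
    }


# Made-up sequences for illustration only; they are not data from the study.
example = summarize(["CASSDEGQETQYF", "CASSLGGAYEQYF"])
```

A population-level classifier would use many such summaries, but even these two numbers, computed per set, make the reported direction of the effect (longer, more acidic CDR3β in weakly self-reactive populations) directly checkable on one's own data.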
5
Böttcher L, Wald S, Chou T. Mathematical Characterization of Private and Public Immune Receptor Sequences. Bull Math Biol 2023; 85:102. [PMID: 37707621 PMCID: PMC10501991 DOI: 10.1007/s11538-023-01190-z] [Received: 04/11/2023] [Accepted: 07/26/2023] [Indexed: 09/15/2023]
Abstract
Diverse T and B cell repertoires play an important role in mounting effective immune responses against a wide range of pathogens and malignant cells. Individual T and B cell clones are characterized by their T and B cell receptors (TCRs and BCRs), respectively. Although receptor sequences are generated probabilistically by recombination processes, clinical studies have found a high degree of sharing of TCRs and BCRs among different individuals. In this work, we use a general probabilistic model for T/B cell receptor clone abundances to define "publicness" or "privateness" and information-theoretic measures for comparing the frequency of sampled sequences observed across different individuals. We derive mathematical formulae to quantify the mean and the variances of clone richness and overlap. Our results can be used to evaluate the effect of different sampling protocols on abundances of clones within an individual as well as the commonality of clones across individuals. Using synthetic and empirical TCR amino acid sequence data, we perform simulations to study expected clonal commonalities across multiple individuals. Based on our formulae, we compare these simulated results with the analytically predicted mean and variances of the repertoire overlap. Complementing the results on simulated repertoires, we derive explicit expressions for the richness and its uncertainty for specific, single-parameter truncated power-law probability distributions. Finally, the information loss associated with grouping together certain receptor sequences, as is done in spectratyping, is also evaluated. Our approach can be, in principle, applied under more general and mechanistically realistic clone generation models.
Affiliation(s)
- Lucas Böttcher
- Department of Computational Science and Philosophy, Frankfurt School of Finance and Management, 60322 Frankfurt am Main, Germany
- Department of Computational Medicine, University of California, Los Angeles, 621 Charles E. Young Dr. S., Los Angeles, 90095-1766 CA USA
- Department of Medicine, University of Florida, Gainesville, 32610 FL USA
- Sascha Wald
- Statistical Physics Group, Centre for Fluid and Complex Systems, Coventry University, Priory Street, Coventry, CV1 5FB UK
- Tom Chou
- Department of Computational Medicine, University of California, Los Angeles, 621 Charles E. Young Dr. S., Los Angeles, 90095-1766 CA USA
- Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, 90095-1555 CA USA
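To make "richness" and "overlap" under sampling concrete, here is a minimal sketch that assumes each sample draws a fixed number of cells independently (with replacement) from known clone frequencies. Under that assumption, clone i is observed at least once in a sample of m cells with probability 1 - (1 - p_i)^m, and the expected richness is the sum of these probabilities. These are standard occupancy formulas, not the specific model or derivation of Böttcher et al., and the function names are hypothetical.

```python
def expected_richness(freqs, m):
    """Expected number of distinct clones seen when sampling m cells
    independently (with replacement) from clone frequencies `freqs`."""
    return sum(1.0 - (1.0 - p) ** m for p in freqs)


def richness_variance(freqs, m):
    """Variance of the observed richness under the same sampling model.

    With X_i indicating that clone i is seen, Var = sum_i Var(X_i)
    + sum_{i != j} Cov(X_i, X_j); assumes sum(freqs) <= 1.
    """
    q = [(1.0 - p) ** m for p in freqs]  # P(clone i is never sampled)
    var = sum(qi * (1.0 - qi) for qi in q)
    for i, pi in enumerate(freqs):
        for j, pj in enumerate(freqs):
            if i != j:
                # P(clones i and j both unseen) - P(i unseen) * P(j unseen)
                var += (1.0 - pi - pj) ** m - q[i] * q[j]
    return var


def expected_overlap(freqs_a, freqs_b, m, n):
    """Expected number of clones observed in both of two independent samples
    (m cells from individual A, n cells from individual B).

    Clones are aligned by index; a clone absent from an individual has
    frequency 0 there.
    """
    return sum(
        (1.0 - (1.0 - pa) ** m) * (1.0 - (1.0 - pb) ** n)
        for pa, pb in zip(freqs_a, freqs_b)
    )
```

For two equally frequent clones and m = 2 cells, the sample contains one clone with probability 0.5 and both with probability 0.5, so the richness has mean 1.5 and variance 0.25, which is what the two functions return. The quadratic loop in `richness_variance` is only meant for small illustrative inputs.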
6
Monzó C, Gkioni L, Beyer A, Valenzano DR, Grönke S, Partridge L. Dietary restriction mitigates the age-associated decline in mouse B cell receptor repertoire diversity. Cell Rep 2023; 42:112722. [PMID: 37384530 PMCID: PMC10391628 DOI: 10.1016/j.celrep.2023.112722] [Received: 01/16/2023] [Revised: 05/07/2023] [Accepted: 06/13/2023] [Indexed: 07/01/2023]
Abstract
Aging impairs the capacity to respond to novel antigens, reducing immune protection against pathogens and vaccine efficacy. Dietary restriction (DR) extends life- and health span in diverse animals. However, little is known about the capacity of DR to combat the decline in immune function. Here, we study the changes in B cell receptor (BCR) repertoire during aging in DR and control mice. By sequencing the variable region of the BCR heavy chain in the spleen, we show that DR preserves diversity and attenuates the increase in clonal expansions throughout aging. Remarkably, mice starting DR in mid-life have repertoire diversity and clonal expansion rates indistinguishable from chronic DR mice. In contrast, in the intestine, these traits are unaffected by either age or DR. Reduced within-individual B cell repertoire diversity and increased clonal expansions are correlated with higher morbidity, suggesting a potential contribution of B cell repertoire dynamics to health during aging.
Affiliation(s)
- Carolina Monzó
- Department Biological Mechanisms of Ageing, Max Planck Institute for Biology of Ageing, 50931 Cologne, North Rhine Westphalia, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Age-Associated Diseases (CECAD), Faculty of Medicine and Faculty of Mathematics and Natural Sciences, University of Cologne, 50931 Cologne, Germany
- Lisonia Gkioni
- Department Biological Mechanisms of Ageing, Max Planck Institute for Biology of Ageing, 50931 Cologne, North Rhine Westphalia, Germany
- Andreas Beyer
- Cologne Excellence Cluster on Cellular Stress Responses in Age-Associated Diseases (CECAD), Faculty of Medicine and Faculty of Mathematics and Natural Sciences, University of Cologne, 50931 Cologne, Germany
- Dario Riccardo Valenzano
- Microbiome-Host Interactions in Ageing Group, Max Planck Institute for Biology of Ageing, 50931 Cologne, North Rhine Westphalia, Germany; Evolutionary Biology/Microbiome-Host Interactions in Aging Group: Fritz Lipmann Institute - Leibniz Institute on Aging, 07745 Jena, Thuringia, Germany.
- Sebastian Grönke
- Department Biological Mechanisms of Ageing, Max Planck Institute for Biology of Ageing, 50931 Cologne, North Rhine Westphalia, Germany.
- Linda Partridge
- Department Biological Mechanisms of Ageing, Max Planck Institute for Biology of Ageing, 50931 Cologne, North Rhine Westphalia, Germany; Genetics, Evolution & Environment Group, Institute of Healthy Ageing, University College London, London WC1E 6BT, UK.
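The abstract does not state which diversity or clonal-expansion metrics were used. As a minimal sketch, assuming a repertoire is summarized as a list of clone sizes (cells or reads per clone), the snippet computes the Shannon entropy, its exponential (the Hill number of order 1, read as an "effective number of clones"), and the fraction of clones at or above a size threshold. The threshold and function names are hypothetical choices for illustration.

```python
import math


def shannon_entropy(clone_sizes):
    """Shannon entropy (natural log) of the clone-size distribution."""
    total = sum(clone_sizes)
    return -sum((c / total) * math.log(c / total) for c in clone_sizes if c > 0)


def hill_1(clone_sizes):
    """Effective number of clones: exp(Shannon entropy), the Hill number of order 1."""
    return math.exp(shannon_entropy(clone_sizes))


def expanded_fraction(clone_sizes, threshold=2):
    """Fraction of clones with size >= threshold (a hypothetical expansion cut-off)."""
    return sum(1 for c in clone_sizes if c >= threshold) / len(clone_sizes)
```

Two repertoires with the same number of clones can differ sharply here: four equally sized clones give an effective number of 4, whereas one dominant clone plus three singletons gives a much smaller value. That is the kind of diversity loss accompanying clonal expansion that the study tracks with age.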
7
Ford EE, Tieri D, Rodriguez OL, Francoeur NJ, Soto J, Kos JT, Peres A, Gibson WS, Silver CA, Deikus G, Hudson E, Woolley CR, Beckmann N, Charney A, Mitchell TC, Yaari G, Sebra RP, Watson CT, Smith ML. FLAIRR-Seq: A Method for Single-Molecule Resolution of Near Full-Length Antibody H Chain Repertoires. J Immunol 2023; 210:1607-1619. [PMID: 37027017 PMCID: PMC10152037 DOI: 10.4049/jimmunol.2200825] [Received: 11/09/2022] [Accepted: 03/14/2023] [Indexed: 04/08/2023]
Abstract
Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using short-read sequencing strategies resolves expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5' RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5' RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.
Affiliation(s)
- Easton E. Ford
- Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- David Tieri
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Oscar L. Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Nancy J. Francoeur
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Juan Soto
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Justin T. Kos
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- William S. Gibson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Catherine A. Silver
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Gintaras Deikus
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Elizabeth Hudson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Cassandra R. Woolley
- Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
- Noam Beckmann
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Alexander Charney
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Thomas C. Mitchell
- Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- Robert P. Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Melissa L. Smith
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
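For readers used to per-base quality scores, the reported 99.99% accuracy can be restated on the Phred scale. This assumes the figure corresponds to an error probability of 1 - 0.9999 per base; the abstract does not say whether it is a per-base or per-read measure.

```latex
Q = -10 \log_{10} P_{\mathrm{err}}
  = -10 \log_{10}\left(1 - 0.9999\right)
  = -10 \log_{10}\left(10^{-4}\right)
  = 40
```

Under that reading, FLAIRR-seq transcripts correspond to roughly Q40, that is, about one expected error per 10,000 bases.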
8
Neuman H, Arrouasse J, Benjamini O, Mehr R, Kedmi M. B cell M-CLL clones retain selection against replacement mutations in their immunoglobulin gene framework regions. Front Oncol 2023; 13:1115361. [PMID: 37007112 PMCID: PMC10060519 DOI: 10.3389/fonc.2023.1115361] [Received: 12/03/2022] [Accepted: 03/03/2023] [Indexed: 03/18/2023]
Abstract
INTRODUCTION Chronic lymphocytic leukemia (CLL) is the most common adult leukemia, accounting for 30–40% of all adult leukemias. The dynamics of B-lymphocyte CLL clones with mutated immunoglobulin heavy chain variable region (IgHV) genes in their tumor (M-CLL) can be studied using mutational lineage trees. METHODS Here, we used lineage tree-based analyses of somatic hypermutation (SHM) and selection in M-CLL clones, comparing the dominant (presumably malignant) clones of 15 CLL patients to their non-dominant (presumably normal) B cell clones, and to those of healthy control repertoires. This type of analysis, which had not previously been published for CLL, yielded the following novel insights. RESULTS CLL dominant clones undergo – or retain – more replacement mutations that alter amino acid properties such as charge or hydropathy. Although, as expected, CLL dominant clones undergo weaker selection for replacement mutations in the complementarity determining regions (CDRs) and against replacement mutations in the framework regions (FWRs) than non-dominant clones in the same patients or normal B cell clones in healthy controls, they surprisingly retain some of the latter selection in their FWRs. Finally, using machine learning, we show that even the non-dominant clones in CLL patients differ from healthy control clones in various features, most notably in carrying higher fractions of transition mutations. DISCUSSION Overall, CLL seems to be characterized by significant loosening – but not a complete loss – of the selection forces operating on B cell clones, and possibly also by changes in SHM mechanisms.
Affiliation(s)
- Hadas Neuman
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel
- Jessica Arrouasse
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel
- Ohad Benjamini
- Division of Hematology and Bone Marrow Transplantation, Chaim Sheba Medical Center, Ramat-Gan, Israel
- Sackler School of Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Ramit Mehr
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel
- Meirav Kedmi
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel
- Division of Hematology and Bone Marrow Transplantation, Chaim Sheba Medical Center, Ramat-Gan, Israel
- Sackler School of Medicine, Tel-Aviv University, Tel-Aviv, Israel
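The abstract refers to two mutation categories: replacement (amino-acid-changing) versus silent mutations, and transitions versus transversions. A minimal sketch of both classifications follows, assuming mutations are supplied as germline/observed base pairs or as germline/mutated codons. The lineage-tree construction and the selection statistics themselves are not reproduced; the function names are hypothetical, and the codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AA_TABLE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def transition_fraction(point_mutations):
    """Fraction of (germline_base, mutated_base) pairs that are transitions."""
    return sum(is_transition(r, a) for r, a in point_mutations) / len(point_mutations)


def is_replacement(germline_codon, mutated_codon):
    """True for a replacement (R) mutation, False for a silent (S) one."""
    return CODON_TO_AA[germline_codon] != CODON_TO_AA[mutated_codon]
```

Selection analyses such as those in the study go further and compare observed R mutation counts in CDRs and FWRs with the counts expected under neutral mutation. The classification step above is only the common starting point.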
9
Akerman O, Isakov H, Levi R, Psevkin V, Louzoun Y. Counting is almost all you need. Front Immunol 2023; 13:1031011. [PMID: 36741395 PMCID: PMC9896581 DOI: 10.3389/fimmu.2022.1031011] [Received: 08/29/2022] [Accepted: 12/27/2022] [Indexed: 01/21/2023]
Abstract
The immune memory repertoire encodes the history of present and past infections and immunological attributes of the individual. As such, multiple methods have been proposed to use T-cell receptor (TCR) repertoires to detect disease history. Here we show that the counting method outperforms two leading algorithms. We then show that counting can be further improved using a novel attention model to weigh the different TCRs. The attention model is based on the projection of TCRs using a Variational AutoEncoder (VAE). Both the counting and attention algorithms predict whether the host had CMV, as well as the host's HLA alleles, better than current leading algorithms. As an intermediate solution between the complex attention model and the very simple counting model, we propose a new Graph Convolutional Network approach that obtains the accuracy of the attention model and the simplicity of the counting model. The code for the models used in the paper is provided at: https://github.com/louzounlab/CountingIsAlmostAllYouNeed.
Affiliation(s)
- Ofek Akerman
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- Haim Isakov
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Reut Levi
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Vladimir Psevkin
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Yoram Louzoun
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
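The abstract does not define the counting method in detail. A widely used formulation selects TCRs whose incidence is enriched among positive training repertoires and then scores a new repertoire by how many of those TCRs it contains; the sketch below follows that formulation and uses a one-sided Fisher's exact test for the enrichment step. The p-value threshold, data layout and function names are hypothetical, and this is not the code from the linked repository.

```python
from collections import Counter
from math import comb


def fisher_upper_p(a, n_pos, n_neg, k):
    """One-sided Fisher's exact p-value (hypergeometric upper tail).

    Probability that at least `a` of the `k` repertoires containing a TCR are
    positive, given `n_pos` positive and `n_neg` negative repertoires.
    """
    hits = sum(comb(n_pos, x) * comb(n_neg, k - x) for x in range(a, min(k, n_pos) + 1))
    return hits / comb(n_pos + n_neg, k)


def select_associated_tcrs(repertoires, labels, p_threshold=1e-4):
    """Return TCRs whose incidence is enriched among positive repertoires.

    repertoires: list of sets of TCR identifiers (e.g. "CDR3|V-gene" strings)
    labels: list of bools, True for a positive (e.g. CMV+) repertoire
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    in_all, in_pos = Counter(), Counter()
    for rep, is_pos in zip(repertoires, labels):
        in_all.update(rep)  # each TCR counted once per repertoire (sets)
        if is_pos:
            in_pos.update(rep)
    return {
        tcr
        for tcr, k in in_all.items()
        if fisher_upper_p(in_pos[tcr], n_pos, n_neg, k) <= p_threshold
    }


def count_score(repertoire, associated):
    """Counting score: how many associated TCRs a repertoire contains."""
    return len(set(repertoire) & associated)
```

A new repertoire can then be called positive when its score exceeds a cut-off chosen on training data, or the score can feed a simple logistic regression. The appeal of this design is that it has essentially one learned quantity (the TCR set) and is hard to overfit compared with the attention model.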
10
Pennell M, Rodriguez OL, Watson CT, Greiff V. The evolutionary and functional significance of germline immunoglobulin gene variation. Trends Immunol 2023; 44:7-21. [PMID: 36470826 DOI: 10.1016/j.it.2022.11.001] [Received: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 12/04/2022]
Abstract
The recombination between immunoglobulin (IG) gene segments determines an individual's naïve antibody repertoire and, consequently, (auto)antigen recognition. Emerging evidence suggests that mammalian IG germline variation impacts humoral immune responses associated with vaccination, infection, and autoimmunity, from the molecular level of epitope specificity up to profound changes in the architecture of antibody repertoires. These links between IG germline variants and immunophenotype raise the question of the evolutionary causes and consequences of diversity within IG loci. We discuss why the extreme diversity in IG loci remains a mystery, why resolving this is important for the design of more effective vaccines and therapeutics, and how recent evidence from multiple lines of inquiry may help us do so.
Affiliation(s)
- Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.
- Oscar L Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Victor Greiff
- Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway.
11
Hayashi S, Ishikawa S. Analyzing Antibody Repertoire Using Next-Generation Sequencing and Machine Learning. Methods Mol Biol 2023; 2552:465-473. [PMID: 36346609 DOI: 10.1007/978-1-0716-2609-2_26] [Indexed: 06/16/2023]
Abstract
Advances in high-throughput sequencing technologies have enabled comprehensive sequencing of the immune repertoire. Since repertoire analysis can help to explain the relationship between the immune system and diseases, several methods have been developed for repertoire analysis. Here, using simulated and real-world datasets, we describe how to use DeepRC, a method that applies cutting-edge machine learning techniques.
Affiliation(s)
- Shuto Hayashi
- Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Shumpei Ishikawa
- Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
12
Ostmeyer J, Cowell L, Christley S. Dynamic kernel matching for non-conforming data: A case study of T cell receptor datasets. PLoS One 2023; 18:e0265313. [PMID: 36881590 PMCID: PMC9990938 DOI: 10.1371/journal.pone.0265313] [Received: 04/09/2021] [Accepted: 03/01/2022] [Indexed: 03/08/2023]
Abstract
Most statistical classifiers are designed to find patterns in data where numbers fit into rows and columns, like in a spreadsheet, but many kinds of data do not conform to this structure. To uncover patterns in non-conforming data, we describe an approach for modifying established statistical classifiers to handle non-conforming data, which we call dynamic kernel matching (DKM). As examples of non-conforming data, we consider (i) a dataset of T-cell receptor (TCR) sequences labelled by disease antigen and (ii) a dataset of sequenced TCR repertoires labelled by patient cytomegalovirus (CMV) serostatus, anticipating that both datasets contain signatures for diagnosing disease. We successfully fit statistical classifiers augmented with DKM to both datasets and report the performance on holdout data using standard metrics and metrics allowing for indeterminate diagnoses. Finally, we identify the patterns used by our statistical classifiers to generate predictions and show that these patterns agree with observations from experimental studies.
Affiliation(s)
- Jared Ostmeyer
- Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- * E-mail:
| | - Lindsay Cowell
- Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Scott Christley
- Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| |
Collapse
|
13
|
Kanduri C, Scheffer L, Pavlović M, Rand KD, Chernigovskaya M, Pirvandy O, Yaari G, Greiff V, Sandve GK. simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods. Gigascience 2023; 12:giad074. [PMID: 37848619 PMCID: PMC10580376 DOI: 10.1093/gigascience/giad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 07/20/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023]
Abstract
BACKGROUND Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. RESULTS We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state-associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.
Affiliation(s)
- Chakravarthi Kanduri
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
- Lonneke Scheffer
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Milena Pavlović
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
- Knut Dagestad Rand
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Maria Chernigovskaya
- Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
- Oz Pirvandy
- Faculty of Engineering, Bar-Ilan University, 5290002, Israel
- Gur Yaari
- Faculty of Engineering, Bar-Ilan University, 5290002, Israel
- Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
- Geir K Sandve
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
14
Weber CR, Rubio T, Wang L, Zhang W, Robert PA, Akbar R, Snapkov I, Wu J, Kuijjer ML, Tarazona S, Conesa A, Sandve GK, Liu X, Reddy ST, Greiff V. Reference-based comparison of adaptive immune receptor repertoires. CELL REPORTS METHODS 2022; 2:100269. [PMID: 36046619 PMCID: PMC9421535 DOI: 10.1016/j.crmeth.2022.100269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2020] [Revised: 04/01/2022] [Accepted: 07/19/2022] [Indexed: 11/26/2022]
Abstract
B and T cell receptor (immune) repertoires can represent an individual's immune history. While current repertoire analysis methods aim to discriminate between health and disease states, they are typically based on only a limited number of parameters. Here, we introduce immuneREF: a quantitative multidimensional measure of adaptive immune repertoire (and transcriptome) similarity that allows interpretation of immune repertoire variation by relying on both repertoire features and cross-referencing of simulated and experimental datasets. To quantify immune repertoire similarity landscapes across health and disease, we applied immuneREF to >2,400 datasets from individuals with varying immune states (healthy, [autoimmune] disease, and infection). We discovered, in contrast to the current paradigm, that blood-derived immune repertoires of healthy and diseased individuals are highly similar for certain immune states, suggesting that repertoire changes to immune perturbations are less pronounced than previously thought. In conclusion, immuneREF enables the population-wide study of adaptive immune response similarity across immune states.
Affiliation(s)
- Cédric R. Weber
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Teresa Rubio
- Laboratory of Neurobiology, Centro Investigación Príncipe Felipe, Valencia, Spain
- Longlong Wang
- BGI-Shenzhen, Shenzhen, China
- BGI-Education Center, University of Chinese Academy of Sciences, Shenzhen, China
- Wei Zhang
- BGI-Shenzhen, Shenzhen, China
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Philippe A. Robert
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
- Rahmad Akbar
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
- Igor Snapkov
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
- Marieke L. Kuijjer
- Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
- Leiden Center for Computational Oncology, Leiden University Medical Center, Leiden, the Netherlands
- Sonia Tarazona
- Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Valencia, Spain
- Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Valencia, Spain
- Geir K. Sandve
- Department of Informatics, University of Oslo, Oslo, Norway
- Xiao Liu
- BGI-Shenzhen, Shenzhen, China
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
- Sai T. Reddy
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
15
Leem J, Mitchell LS, Farmery JH, Barton J, Galson JD. Deciphering the language of antibodies using self-supervised learning. PATTERNS 2022; 3:100513. [PMID: 35845836 PMCID: PMC9278498 DOI: 10.1016/j.patter.2022.100513] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 02/03/2022] [Revised: 03/01/2022] [Accepted: 04/26/2022] [Indexed: 11/17/2022]
Abstract
An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.
Highlights
- AntiBERTa is an antibody-specific transformer model for representation learning
- AntiBERTa embeddings capture aspects of antibody function
- Attention maps of AntiBERTa correspond to structural contacts and binding sites
- AntiBERTa can be fine-tuned for state-of-the-art paratope prediction
Understanding antibody function is critical for deciphering the biology of disease and for the discovery of novel therapeutic antibodies. The challenge is the vast diversity of antibody variants compared with the limited labeled data available. We overcome this challenge by using self-supervised learning to train a large antibody-specific language model, followed by transfer learning, to fine-tune the model for predicting information related to antibody function. We initially demonstrate the success of the model by providing leading results in antibody binding site prediction. The model is amenable to further fine-tuning for diverse applications to improve our understanding of antibody function.
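AntiBERTa is a BERT-style encoder pre-trained with self-supervised learning. Assuming it follows the usual BERT masked-language-modelling recipe (select about 15% of positions, then replace 80% of those with a mask token, 10% with a random residue, and leave 10% unchanged), the input-corruption step could be sketched as below. The rates, the `<mask>` token, the helper `mask_sequence` and the example sequence are illustrative assumptions, not AntiBERTa's published configuration.

```python
import random

# Assumed vocabulary: the 20 standard amino acids plus a mask token.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK_TOKEN = "<mask>"


def mask_sequence(seq, mask_prob=0.15, seed=None):
    """Corrupt an amino-acid sequence for BERT-style masked language modelling.

    Returns (tokens, labels): tokens is the corrupted input; labels holds the
    original residue at each selected position and None elsewhere, so a loss
    would only be computed where labels is not None.
    """
    rng = random.Random(seed)
    tokens = list(seq)
    labels = [None] * len(tokens)
    for i, residue in enumerate(seq):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = residue
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK_TOKEN               # 80%: mask token
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)  # 10%: random residue
        # remaining 10%: keep the original residue
    return tokens, labels


# Hypothetical heavy-chain framework fragment as input.
tokens, labels = mask_sequence("EVQLVESGGGLVQPGGSLRLSCAAS", seed=0)
```

Keeping some selected positions unchanged or randomized (rather than always masked) is the standard BERT trick for reducing the mismatch between pre-training inputs, which contain mask tokens, and fine-tuning inputs, which do not.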
Affiliation(s)
- Jinwoo Leem
- Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK
- Corresponding author
- Laura S. Mitchell
- Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK
- James H.R. Farmery
- Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK
- Justin Barton
- Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK
- Jacob D. Galson
- Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK
16
Jackson KJL, Kos JT, Lees W, Gibson WS, Smith ML, Peres A, Yaari G, Corcoran M, Busse CE, Ohlin M, Watson CT, Collins AM. A BALB/c IGHV Reference Set, Defined by Haplotype Analysis of Long-Read VDJ-C Sequences From F1 (BALB/c x C57BL/6) Mice. Front Immunol 2022; 13:888555. [PMID: 35720344 PMCID: PMC9205180 DOI: 10.3389/fimmu.2022.888555] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/03/2022] [Accepted: 04/28/2022] [Indexed: 11/13/2022]
Abstract
The immunoglobulin genes of inbred mouse strains that are commonly used in models of antibody-mediated human diseases are poorly characterized. This compromises data analysis. To infer the immunoglobulin genes of BALB/c mice, we used long-read SMRT sequencing to amplify VDJ-C sequences from F1 (BALB/c x C57BL/6) hybrid animals. Strain variations were identified in the Ighm and Ighg2b genes, and analysis of VDJ rearrangements led to the inference of 278 germline IGHV alleles. 169 alleles are not present in the C57BL/6 genome reference sequence. To establish a set of expressed BALB/c IGHV germline gene sequences, we computationally retrieved IGHV haplotypes from the IgM dataset. Haplotyping led to the confirmation of 162 BALB/c IGHV gene sequences. A musIGHV398 pseudogene variant also appears to be present in the BALB/cByJ substrain, while a functional musIGHV398 gene is highly expressed in the BALB/cJ substrain. Only four of the BALB/c alleles were also observed in the C57BL/6 haplotype. The full set of inferred BALB/c sequences has been used to establish a BALB/c IGHV reference set, hosted at https://ogrdb.airr-community.org. We assessed whether assemblies from the Mouse Genome Project (MGP) are suitable for the determination of the genes of the IGH loci. Only 37 (43.5%) of the 85 confirmed IMGT-named BALB/c IGHV and 33 (42.9%) of the 77 confirmed non-IMGT IGHV were found in a search of the MGP BALB/cJ genome assembly. This suggests that current MGP assemblies are unsuitable for the comprehensive documentation of germline IGHVs and more efforts will be needed to establish strain-specific reference sets.
Affiliation(s)
- Justin T. Kos
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- William Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- William S. Gibson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Melissa Laird Smith
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Christian E. Busse
- Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden
- Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Corresponding author
- Andrew M. Collins
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, Australia
- Corresponding author
17
Kanduri C, Pavlović M, Scheffer L, Motwani K, Chernigovskaya M, Greiff V, Sandve GK. Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification. Gigascience 2022; 11:6593147. [PMID: 35639633 PMCID: PMC9154052 DOI: 10.1093/gigascience/giac046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/17/2021] [Revised: 12/23/2021] [Accepted: 04/08/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND Machine learning (ML) methodology development for the classification of immune states in adaptive immune receptor repertoires (AIRRs) has seen a recent surge of interest. However, so far, there does not exist a systematic evaluation of scenarios where classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This hinders investigative reorientation to those scenarios where method development of more sophisticated ML approaches may be required. RESULTS To identify those scenarios where a baseline ML method is able to perform well for AIRR classification, we generated a collection of synthetic AIRR benchmark data sets encompassing a wide range of data set architecture-associated and immune state-associated sequence patterns (signal) complexity. We trained ≈1,700 ML models with varying assumptions regarding immune signal on ≈1,000 data sets with a total of ≈250,000 AIRRs containing ≈46 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR-ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs only in 1 out of 50,000 AIR sequences. CONCLUSIONS We provide a reference benchmark to guide new AIRR-ML classification methodology by (i) identifying those scenarios characterized by immune signal and data set complexity, where baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR-ML models given training data set properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark data sets for comprehensive benchmarking of AIRR-ML methods.
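The baseline highlighted in this abstract, L1-penalized logistic regression, can be illustrated with a small from-scratch sketch. Here each repertoire is encoded as the presence or absence of amino-acid 3-mers across its CDR3 sequences (an assumed, illustrative encoding), and the penalized log-loss is minimized by proximal gradient descent (ISTA), whose soft-thresholding step pushes uninformative weights to exactly zero. The function names, toy CDR3 sequences and hyperparameters are hypothetical; this is not the study's implementation, and the toy signal is far stronger than the 1-in-50,000 witness rate reported above.

```python
import math


def kmer_set(repertoire, k=3):
    """All amino-acid k-mers occurring in any CDR3 sequence of one repertoire."""
    return {cdr3[i:i + k] for cdr3 in repertoire for i in range(len(cdr3) - k + 1)}


def encode(repertoires, k=3):
    """Binary k-mer presence matrix (one row per repertoire) and its vocabulary."""
    sets = [kmer_set(rep, k) for rep in repertoires]
    vocab = sorted(set().union(*sets))
    X = [[1.0 if kmer in s else 0.0 for kmer in vocab] for s in sets]
    return X, vocab


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of t * |w|."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logistic(X, y, lam=0.01, lr=0.1, n_iter=1000):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data: every positive repertoire carries one shared "signal" CDR3.
signal = "CASSIRSTDTQYF"
background = ["CASRPGQGNEQFF", "CASSLAGGYTF", "CSARDRGNTIYF"]
positives = [[signal, bg] for bg in background]
negatives = [[background[i], background[(i + 1) % 3]] for i in range(3)]
X, vocab = encode(positives + negatives)
y = [1] * len(positives) + [0] * len(negatives)
w, b = fit_l1_logistic(X, y)
```

In practice an optimized solver from a machine-learning library would replace the hand-written loop; the loop is kept here only to make the L1 proximal step explicit.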
Affiliation(s)
- Chakravarthi Kanduri
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
- Milena Pavlović
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
- Lonneke Scheffer
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
- Keshav Motwani
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, FL 32610, USA
- Maria Chernigovskaya
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, 0372, Norway
- Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, 0372, Norway
- Geir K Sandve
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
18
Pashova S, Balabanski L, Elmadjian G, Savov A, Stoyanova E, Shivarov V, Petrov P, Pashov A. Restriction of the Global IgM Repertoire in Antiphospholipid Syndrome. Front Immunol 2022; 13:865232. [PMID: 35493489 PMCID: PMC9043687 DOI: 10.3389/fimmu.2022.865232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 03/21/2022] [Indexed: 11/22/2022]
Abstract
The typical anti-phospholipid antibodies (APLA) in the anti-phospholipid syndrome (APS) are reactive with the phospholipid-binding protein β2GPI as well as a growing list of other protein targets. The relation of APLA to natural antibodies and the fuzzy set of autoantigens involved provoked us to study the changes in the IgM repertoire in APS. To this end, peptides selected by serum IgM from a 7-residue linear peptide phage display library (PDL) were deep sequenced. The analysis was aided by a novel formal representation of the Igome (the mimotope set reflecting the IgM specificities) in the form of a sequence graph. The study involved women with APLA and habitual abortions (n=24) compared to age-matched clinically healthy pregnant women (n=20). Their pooled Igomes (297 028 mimotope sequences) were compared also to the global public repertoire Igome of pooled donor plasma IgM (n=2 796 484) and a set of 7-mer sequences found in the J regions of human immunoglobulins (n=4 433 252). The pooled Igome was represented as a graph connecting the sequences as similar as the mimotopes of the same monoclonal antibody. The criterion was based on previously published data. In the resulting graph, identifiable clusters of vertices were considered related to the footprints of overlapping antibody cross-reactivities. A subgraph based on the clusters with a significant differential expression of APS patients’ mimotopes contained predominantly specificities underrepresented in APS. The differentially expressed IgM footprints showed also an increased cross-reactivity with immunoglobulin J regions. The specificities underexpressed in APS had a higher correlation with public specificities than those overexpressed. The APS associated specificities were strongly related also to the human peptidome with 1 072 mimotope sequences found in 7 519 human proteins. These regions were characterized by low complexity. 
Thus, the IgM repertoire of the APS patients was found to be characterized by a significant reduction of certain public specificities found in the healthy controls with targets representing low complexity linear self-epitopes homologous to human antibody J regions.
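The abstract describes representing the pooled Igome as a graph in which sufficiently similar mimotopes are connected, and then reading clusters off that graph. The study's similarity criterion was derived from published monoclonal-antibody data and is not given here, so the sketch below substitutes a simple Hamming-distance threshold between 7-mer peptides (`max_mismatches`, an assumed parameter) and treats connected components as clusters, found with union-find. The helper names and example peptides are hypothetical, and the all-pairs comparison is quadratic, so it illustrates the idea rather than scaling to hundreds of thousands of sequences.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))


def mimotope_clusters(peptides, max_mismatches=2):
    """Connected components of a peptide similarity graph.

    An edge joins two peptides that differ at no more than `max_mismatches`
    positions (an assumed criterion). Components are tracked with union-find.
    """
    unique = list(dict.fromkeys(peptides))  # drop duplicates, keep order
    parent = {p: p for p in unique}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(unique, 2):
        if hamming(a, b) <= max_mismatches:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for p in unique:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical 7-mer mimotopes.
example = ["ACDEFGH", "ACDEFGK", "ACDEFAK", "WWWWWWW", "WWWWWWY", "PQRSTVM"]
print(mimotope_clusters(example))
# [['ACDEFAK', 'ACDEFGH', 'ACDEFGK'], ['PQRSTVM'], ['WWWWWWW', 'WWWWWWY']]
```

Because clusters are connected components, two peptides can share a cluster without being directly similar, as long as a chain of similar peptides links them; that transitivity is what lets a cluster stand for a footprint of overlapping cross-reactivities.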
Affiliation(s)
- Shina Pashova
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Lubomir Balabanski
- Department of Medical Genetics, Medical University-Sofia, Sofia, Bulgaria
- Genomics Laboratory, Hospital "Malinov", Sofia, Bulgaria
- Gabriel Elmadjian
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Alexey Savov
- Department of Medical Genetics, Medical University-Sofia, Sofia, Bulgaria
- Elena Stoyanova
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Peter Petrov
- Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Anastas Pashov
- Institute of Microbiology, Bulgarian Academy of Sciences, Sofia, Bulgaria
19
Ehling RA, Weber CR, Mason DM, Friedensohn S, Wagner B, Bieberich F, Kapetanovic E, Vazquez-Lombardi R, Di Roberto RB, Hong KL, Wagner C, Pataia M, Overath MD, Sheward DJ, Murrell B, Yermanos A, Cuny AP, Savic M, Rudolf F, Reddy ST. SARS-CoV-2 reactive and neutralizing antibodies discovered by single-cell sequencing of plasma cells and mammalian display. Cell Rep 2022; 38:110242. [PMID: 34998467 PMCID: PMC8692065 DOI: 10.1016/j.celrep.2021.110242] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/15/2021] [Revised: 09/22/2021] [Accepted: 12/20/2021] [Indexed: 01/05/2023]
Abstract
Characterization of COVID-19 antibodies has largely focused on memory B cells; however, it is the antibody-secreting plasma cells that are directly responsible for the production of serum antibodies, which play a critical role in resolving SARS-CoV-2 infection. Little is known about the specificity of plasma cells, largely because plasma cells lack surface antibody expression, thereby complicating their screening. Here, we describe a technology pipeline that integrates single-cell antibody repertoire sequencing and mammalian display to interrogate the specificity of plasma cells from 16 convalescent patients. Single-cell sequencing allows us to profile antibody repertoire features and identify expanded clonal lineages. Mammalian display screening is used to reveal that 43 antibodies (of 132 candidates) derived from expanded plasma cell lineages are specific to SARS-CoV-2 antigens, including antibodies with high affinity to the SARS-CoV-2 receptor-binding domain (RBD) that exhibit potent neutralization and broad binding to the RBD of SARS-CoV-2 variants (of concern/interest).
Affiliation(s)
- Roy A Ehling
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Cédric R Weber
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; deepCDR Biologics AG, Basel, Switzerland
- Derek M Mason
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; deepCDR Biologics AG, Basel, Switzerland
- Simon Friedensohn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; deepCDR Biologics AG, Basel, Switzerland
- Bastian Wagner
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Florian Bieberich
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Edo Kapetanovic
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Raphaël B Di Roberto
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Kai-Lin Hong
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Botnar Research Centre for Child Health, Basel, Switzerland
- Michele Pataia
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; deepCDR Biologics AG, Basel, Switzerland
- Max D Overath
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Daniel J Sheward
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Ben Murrell
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Alexander Yermanos
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Botnar Research Centre for Child Health, Basel, Switzerland; Institute of Microbiology and Immunology, Department of Biology, ETH Zurich, Zurich, Switzerland; Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
- Andreas P Cuny
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics, Mattenstr. 26, 4058 Basel, Switzerland
- Miodrag Savic
- Department of Biomedical Engineering, University of Basel, Allschwil, Switzerland; Department of Surgery, Oral and Cranio-Maxillofacial Surgery, University Hospital Basel, Basel, Switzerland; Department of Health, Economics and Health Directorate, Canton Basel-Landschaft, Switzerland
- Fabian Rudolf
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics, Mattenstr. 26, 4058 Basel, Switzerland
- Sai T Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Botnar Research Centre for Child Health, Basel, Switzerland
20
Marquez S, Babrak L, Greiff V, Hoehn KB, Lees WD, Luning Prak ET, Miho E, Rosenfeld AM, Schramm CA, Stervbo U. Adaptive Immune Receptor Repertoire (AIRR) Community Guide to Repertoire Analysis. Methods Mol Biol 2022; 2453:297-316. [PMID: 35622333 PMCID: PMC9761518 DOI: 10.1007/978-1-0716-2115-8_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
Abstract
Adaptive immune receptor repertoires (AIRRs) are rich with information that can be mined for insights into the workings of the immune system. Gene usage, CDR3 properties, clonal lineage structure, and sequence diversity are all capable of revealing the dynamic immune response to perturbation by disease, vaccination, or other interventions. Here we focus on a conceptual introduction to the many aspects of repertoire analysis and orient the reader toward the uses and advantages of each. Along the way, we note some of the many software tools that have been developed for these investigations and link the ideas discussed to chapters on methods provided elsewhere in this volume.
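Sequence diversity, one of the repertoire properties listed above, is often summarized with Hill numbers, which convert a clone-frequency distribution into an effective number of clones. The order q sets how strongly abundant clones are weighted: q = 0 gives richness, q = 1 the exponential of Shannon entropy, and q = 2 the inverse Simpson index. The sketch below assumes clone counts have already been computed (for example, after clonal grouping); `hill_number` and the example counts are hypothetical and are not the API of any tool discussed in the chapter.

```python
import math


def hill_number(counts, q):
    """Hill diversity of order q from a list of clone counts.

    q = 0: richness; q = 1: exp(Shannon entropy); q = 2: inverse Simpson.
    """
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    if q == 1:
        # The q -> 1 limit is the exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


even = [25, 25, 25, 25]   # four equally expanded clones
skewed = [97, 1, 1, 1]    # one dominant clone
for q in (0, 1, 2):
    print(q, round(hill_number(even, q), 3), round(hill_number(skewed, q), 3))
```

Both example repertoires contain four clones, so their richness (q = 0) is identical; at q = 2 the even repertoire still scores 4 effective clones while the skewed one scores about 1.06, which is why diversity is usually reported across several orders rather than as a single number.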
Affiliation(s)
- Chaim A Schramm
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Ulrik Stervbo
- Center for Translational Medicine, Immunology, and Transplantation, Medical Department I, Marien Hospital Herne, University Hospital of the Ruhr-University Bochum, Herne, Germany
- Immundiagnostik, Marien Hospital Herne, University Hospital of the Ruhr-University Bochum, Herne, Germany
21
Akbar R, Robert PA, Weber CR, Widrich M, Frank R, Pavlović M, Scheffer L, Chernigovskaya M, Snapkov I, Slabodkin A, Mehta BB, Miho E, Lund-Johansen F, Andersen JT, Hochreiter S, Hobæk Haff I, Klambauer G, Sandve GK, Greiff V. In silico proof of principle of machine learning-based antibody design at unconstrained scale. MAbs 2022; 14:2031482. [PMID: 35377271 PMCID: PMC8986205 DOI: 10.1080/19420862.2022.2031482] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Indexed: 12/15/2022]
Abstract
Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.
Affiliation(s)
- Rahmad Akbar
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Philippe A Robert
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Cédric R Weber
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Michael Widrich
- ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
- Robert Frank
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Maria Chernigovskaya
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Igor Snapkov
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Andrei Slabodkin
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Brij Bhushan Mehta
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Enkelejda Miho
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz, Switzerland
- Fridtjof Lund-Johansen
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Jan Terje Andersen
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Institute of Clinical Medicine, Department of Pharmacology, University of Oslo, Oslo, Norway
- Sepp Hochreiter
- ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
- Institute of Advanced Research in Artificial Intelligence (IARAI), Austria
- Günter Klambauer
- ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
- Victor Greiff
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
22
Slabodkin A, Chernigovskaya M, Mikocziova I, Akbar R, Scheffer L, Pavlović M, Bashour H, Snapkov I, Mehta BB, Weber CR, Gutierrez-Marcos J, Sollid LM, Haff IH, Sandve GK, Robert PA, Greiff V. Individualized VDJ recombination predisposes the available Ig sequence space. Genome Res 2021; 31:2209-2224. [PMID: 34815307 PMCID: PMC8647828 DOI: 10.1101/gr.275373.121] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/19/2021] [Accepted: 10/20/2021] [Indexed: 11/25/2022]
Abstract
The process of recombination between variable (V), diversity (D), and joining (J) immunoglobulin (Ig) gene segments determines an individual's naive Ig repertoire and, consequently, (auto)antigen recognition. VDJ recombination follows probabilistic rules that can be modeled statistically. So far, it remains unknown whether VDJ recombination rules differ between individuals. If these rules differed, identical (auto)antigen-specific Ig sequences would be generated with individual-specific probabilities, signifying that the available Ig sequence space is individual specific. We devised a sensitivity-tested distance measure that enables inter-individual comparison of VDJ recombination models. Accounting for several sources of noise as well as allelic variation in Ig sequencing data, we discovered that not only unrelated individuals but also human monozygotic twins and even inbred mice possess statistically distinguishable immunoglobulin recombination models. This suggests that VDJ recombination is modulated not only genetically but also nongenetically. We demonstrate that population-wide individualized VDJ recombination can result in differences of orders of magnitude in the probability of generating (auto)antigen-specific Ig sequences. Our findings have implications for immune receptor-based individualized medicine approaches relevant to vaccination, infection, and autoimmunity.
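The abstract above compares VDJ recombination models between individuals with a distance measure. The sketch below is not the authors' measure: it assumes, purely for illustration, that each individual's model is reduced to a single categorical distribution (V-gene usage) and uses the Jensen-Shannon divergence, a common choice for comparing probability distributions. The helper names `normalize` and `jensen_shannon`, the gene names, and the counts are all hypothetical.

```python
import math


def normalize(counts: dict[str, float]) -> dict[str, float]:
    """Turn raw usage counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def jensen_shannon(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (log base 2, bounded in [0, 1]) between two
    categorical distributions given as {category: probability} dicts."""
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the union of categories
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a: dict[str, float]) -> float:
        # KL(a || m); zero-probability categories contribute nothing
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical V-gene usage counts for two individuals (toy numbers, not study data)
individual_a = normalize({"IGHV3-23": 50, "IGHV1-69": 30, "IGHV4-34": 20})
individual_b = normalize({"IGHV3-23": 40, "IGHV1-69": 40, "IGHV4-34": 15, "IGHV3-30": 5})
print(f"JSD = {jensen_shannon(individual_a, individual_b):.4f}")
```

The divergence is symmetric and equals 0 only for identical distributions, which is why it is often preferred over the raw Kullback-Leibler divergence for pairwise comparisons; a real comparison of recombination models would also cover deletion and insertion distributions, not just gene usage.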
Affiliation(s)
- Andrei Slabodkin
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Maria Chernigovskaya
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Ivana Mikocziova
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Rahmad Akbar
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Lonneke Scheffer
- Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Milena Pavlović
- Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Habib Bashour
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Igor Snapkov
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Brij Bhushan Mehta
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Cédric R Weber
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Ludvig M Sollid
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Philippe A Robert
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway
- Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, 0372 Oslo, Norway

23
The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires. NAT MACH INTELL 2021. [DOI: 10.1038/s42256-021-00413-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/11/2022]

24
Horst A, Smakaj E, Natali EN, Tosoni D, Babrak LM, Meier P, Miho E. Machine Learning Detects Anti-DENV Signatures in Antibody Repertoire Sequences. Front Artif Intell 2021; 4:715462. [PMID: 34708197 PMCID: PMC8542978 DOI: 10.3389/frai.2021.715462] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/27/2021] [Accepted: 07/30/2021] [Indexed: 11/13/2022]
Abstract
Dengue infection is a global threat. To date, there is no universal treatment for dengue fever, and no vaccine is unreservedly recommended by the World Health Organization. Investigating the specific immune response to dengue virus would support the discovery of antibodies as therapeutics for passive immunization, as well as vaccine design. High-throughput sequencing enables the identification, at the sequence level, of the multitude of antibodies elicited in response to dengue infection. Artificial intelligence can mine the complex data generated and has the potential to uncover patterns in entire antibody repertoires and to detect signatures distinctive of single virus-binding antibodies. However, machine learning methods have not yet been harnessed to determine the immune response to dengue virus. To enable the application of machine learning, we have benchmarked existing methods for encoding biological and chemical knowledge as inputs and have investigated novel encoding techniques. We have applied different machine learning methods, such as neural networks, random forests, and support vector machines, and have explored the parameter space to determine the best-performing algorithms for the detection and prediction of antibody patterns at the repertoire and antibody sequence levels in dengue-infected individuals. Our results show that immune response signatures to dengue are detectable at both the antibody repertoire and the antibody sequence levels. By combining machine learning with phylogenies and network analysis, we generated novel sequences that present dengue-binding specific signatures. These results might aid further antibody discovery and support vaccine design.
Affiliation(s)
- Alexander Horst
- School of Life Sciences, Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Erand Smakaj
- School of Life Sciences, Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Eriberto Noel Natali
- School of Life Sciences, Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Deniz Tosoni
- School of Life Sciences, Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Lmar Marie Babrak
- School of Life Sciences, Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Patrick Meier
- School of Life Sciences, Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Enkelejda Miho
- School of Life Sciences, Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; aiNET GmbH, Basel, Switzerland

25
Ghraichy M, von Niederhäusern V, Kovaltsuk A, Galson JD, Deane CM, Trück J. Different B cell subpopulations show distinct patterns in their IgH repertoire metrics. eLife 2021; 10:73111. [PMID: 34661527 PMCID: PMC8560093 DOI: 10.7554/elife.73111] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/17/2021] [Accepted: 10/17/2021] [Indexed: 12/11/2022]
Abstract
Several human B cell subpopulations that play distinct roles in the humoral immune response are recognised in the peripheral blood. These cells undergo developmental and maturational changes involving VDJ recombination, somatic hypermutation and class switch recombination, which together shape their immunoglobulin heavy chain (IgH) repertoire. Here, we sequenced the IgH repertoire of naïve, marginal zone, switched and plasma cells from 10 healthy adults, along with matched unsorted and in silico separated CD19+ bulk B cells. Using advanced bioinformatic analysis and machine learning, we show that sorted B cell subpopulations are characterised by distinct repertoire characteristics at both the individual sequence and the repertoire level. Sorted subpopulations shared similar repertoire characteristics with their corresponding in silico separated subsets. Furthermore, certain IgH repertoire characteristics correlated with the position of the constant region on the IgH locus. Overall, this study provides unprecedented insight into the mechanisms of B cell repertoire control in peripherally circulating B cell subpopulations.
Affiliation(s)
- Marie Ghraichy
- Division of Immunology, University Children's Hospital and Children's Research Center, University of Zurich (UZH), Zurich, Switzerland
- Valentin von Niederhäusern
- Division of Immunology, University Children's Hospital and Children's Research Center, University of Zurich (UZH), Zurich, Switzerland
- Jacob D Galson
- Division of Immunology, University Children's Hospital and Children's Research Center, University of Zurich (UZH), Zurich, Switzerland; Alchemab Therapeutics Ltd, London, United Kingdom
- Charlotte M Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Johannes Trück
- Division of Immunology, University Children's Hospital and Children's Research Center, University of Zurich (UZH), Zurich, Switzerland

26
Wang S, Wang L, Liu Y, Zhu Y, Liu Y. Characteristics of T-cell receptor repertoire of stem cell-like memory CD4+ T cells. PeerJ 2021; 9:e11987. [PMID: 34527440 PMCID: PMC8401816 DOI: 10.7717/peerj.11987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2021] [Accepted: 07/26/2021] [Indexed: 11/20/2022]
Abstract
Stem cell-like memory T cells (Tscm) combine phenotypes of naïve and memory T cells. However, it remains unclear how T cell receptor (TCR) characteristics contribute to heterogeneity in Tscm and other memory T cells. We compared the TCR-beta (TRB) repertoire characteristics of CD4+ Tscm with those of naïve and other CD4+ memory T cells (Tm) in 16 human subjects. Compared with Tm, Tscm had increased diversity across all stretches of TRB repertoire structure, skewed gene usage, and a shorter CDR3 length distribution. These distinctions between Tscm and Tm were more pronounced among the 1,000 most abundant clonotypes. Furthermore, the top 1,000 clonotypes in Tscm were more public than those in Tm and grouped into more clusters, implying that the top 1,000 clonotypes in Tscm recognize more epitope types. Importantly, in patients with type 1 diabetes, self-reactive clonotypes were public and enriched in Tscm rather than in Tm. Therefore, this study highlights the unique features that distinguish Tscm from other memory subsets and provides clues to understanding the physiological and pathological functions of Tscm.
Affiliation(s)
- Shiyu Wang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Longlong Wang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Yang Liu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Yonggang Zhu
- School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, Shenzhen, China
- Ya Liu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Single-Cell Omics, BGI-Shenzhen, Shenzhen, China

27
Progress and challenges in mass spectrometry-based analysis of antibody repertoires. Trends Biotechnol 2021; 40:463-481. [PMID: 34535228 DOI: 10.1016/j.tibtech.2021.08.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/23/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 12/22/2022]
Abstract
Humoral immunity is divided into the cellular B cell and protein-level antibody responses. High-throughput sequencing has advanced our understanding of both these fundamental aspects of B cell immunology as well as aspects pertaining to vaccine and therapeutics biotechnology. Although the protein-level serum and mucosal antibody repertoire makes major contributions to humoral protection, the sequence composition and dynamics of antibody repertoires remain underexplored. This limits insight into important immunological and biotechnological parameters such as the number of antigen-specific antibodies, which is relevant, for example, to pathogen neutralization, microbiota regulation, severity of autoimmunity, and therapeutic efficacy. High-resolution mass spectrometry (MS) has allowed initial insights into the antibody repertoire. We outline current challenges in MS-based sequence analysis of antibody repertoires and propose strategies for their resolution.

28
Rettig TA, Tan JC, Nishiyama NC, Chapes SK, Pecaut MJ. An Analysis of the Effects of Spaceflight and Vaccination on Antibody Repertoire Diversity. Immunohorizons 2021; 5:675-686. [PMID: 34433623 PMCID: PMC10996920 DOI: 10.4049/immunohorizons.2100056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2021] [Accepted: 07/26/2021] [Indexed: 11/19/2022]
Abstract
Ab repertoire diversity plays a critical role in the host's ability to fight pathogens. CDR3 is partially responsible for Ab-Ag binding and is a significant source of diversity in the repertoire. CDR3 diversity is generated during VDJ rearrangement because of gene segment selection, gene segment trimming and splicing, and the addition of nucleotides. We analyzed the Ab repertoire diversity across multiple experiments examining the effects of spaceflight on the Ab repertoire after vaccination. Five datasets from four experiments were analyzed using rank-abundance curves and Shannon indices as measures of diversity. We discovered a trend toward lower diversity as a result of spaceflight but did not find the same decrease in our physiological model of microgravity in either the spleen or bone marrow. However, the bone marrow repertoire showed a reduction in diversity after vaccination. We also detected differences in Shannon indices between experiments and tissues. We did not detect a pattern of CDR3 usage across the experiments. Overall, we were able to find differences in the Ab repertoire diversity across experimental groups and tissues.
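The abstract above relies on rank-abundance curves and Shannon indices as diversity measures. A minimal sketch of both follows, assuming that a clonotype is defined by an identical CDR3 sequence and that the Shannon index uses the natural logarithm; neither choice is stated in the abstract. The function names and the CDR3 strings are hypothetical toy data, not data from the study.

```python
import math
from collections import Counter


def shannon_index(clone_counts: list[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over clonotype frequencies p_i."""
    total = sum(clone_counts)
    return -sum((c / total) * math.log(c / total) for c in clone_counts if c > 0)


def rank_abundance(clone_counts: list[int]) -> list[tuple[int, float]]:
    """(rank, relative abundance) pairs, most abundant clonotype first."""
    total = sum(clone_counts)
    ordered = sorted(clone_counts, reverse=True)
    return [(rank, c / total) for rank, c in enumerate(ordered, start=1)]


# Hypothetical CDR3 reads from one sample (toy data, not from the study)
reads = ["CARDYW", "CARDYW", "CARDYW", "CASSLGW", "CASSLGW", "CTRGGW"]
counts = list(Counter(reads).values())  # reads collapsed into clonotype counts
print(f"Shannon index: {shannon_index(counts):.3f}")
print(rank_abundance(counts))
```

A repertoire dominated by a few expanded clonotypes yields a steep rank-abundance curve and a low Shannon index, while an even repertoire of N clonotypes reaches the maximum, ln N.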
Affiliation(s)
- Trisha A Rettig
- Division of Biomedical Engineering Sciences, Department of Basic Sciences, Loma Linda University, Loma Linda, CA
- Division of Biology, Kansas State University, Manhattan, KS
- John C Tan
- Division of Biomedical Engineering Sciences, Department of Basic Sciences, Loma Linda University, Loma Linda, CA
- Nina C Nishiyama
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Michael J Pecaut
- Division of Biomedical Engineering Sciences, Department of Basic Sciences, Loma Linda University, Loma Linda, CA

29
Mathew NR, Jayanthan JK, Smirnov IV, Robinson JL, Axelsson H, Nakka SS, Emmanouilidi A, Czarnewski P, Yewdell WT, Schön K, Lebrero-Fernández C, Bernasconi V, Rodin W, Harandi AM, Lycke N, Borcherding N, Yewdell JW, Greiff V, Bemark M, Angeletti D. Single-cell BCR and transcriptome analysis after influenza infection reveals spatiotemporal dynamics of antigen-specific B cells. Cell Rep 2021; 35:109286. [PMID: 34161770 PMCID: PMC7612943 DOI: 10.1016/j.celrep.2021.109286] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 02/21/2021] [Revised: 05/07/2021] [Accepted: 06/01/2021] [Indexed: 12/15/2022]
Abstract
B cell responses are critical for antiviral immunity. However, a comprehensive picture of antigen-specific B cell differentiation, clonal proliferation, and dynamics in different organs after infection is lacking. Here, by combining single-cell RNA and B cell receptor (BCR) sequencing of antigen-specific cells in lymph nodes, spleen, and lungs after influenza infection in mice, we identify several germinal center (GC) B cell subpopulations and organ-specific differences that persist over the course of the response. We discover transcriptional differences between memory cells in lungs and lymphoid organs, as well as organ-restricted clonal expansion. Remarkably, we find significant clonal overlap between GC-derived memory and plasma cells. By combining BCR-mutational analyses with monoclonal antibody (mAb) expression and affinity measurements, we find that memory B cells are highly diverse and can be selected from both low- and high-affinity precursors. By linking antigen recognition with transcriptional programming, clonal proliferation, and differentiation, these findings provide important advances in our understanding of antiviral immunity.
Affiliation(s)
- Nimitha R Mathew
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Jayalal K Jayanthan
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Ilya V Smirnov
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Jonathan L Robinson
- Department of Biology and Biological Engineering, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Chalmers University of Technology, Göteborg, Sweden
- Hannes Axelsson
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Sravya S Nakka
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Aikaterini Emmanouilidi
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Paulo Czarnewski
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Solna, Sweden
- William T Yewdell
- Immunology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Karin Schön
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Cristina Lebrero-Fernández
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Valentina Bernasconi
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- William Rodin
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Ali M Harandi
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden; Vaccine Evaluation Center, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
- Nils Lycke
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Nicholas Borcherding
- Department of Pathology and Immunology, Washington University, St. Louis, MO, USA
- Jonathan W Yewdell
- Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Victor Greiff
- Department of Immunology, University of Oslo, Oslo, Norway
- Mats Bemark
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden; Region Västra Götaland, Department of Clinical Immunology and Transfusion Medicine, Sahlgrenska University Hospital, Gothenburg, Sweden
- Davide Angeletti
- Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden

30
Natali EN, Babrak LM, Miho E. Prospective Artificial Intelligence to Dissect the Dengue Immune Response and Discover Therapeutics. Front Immunol 2021; 12:574411. [PMID: 34211454 PMCID: PMC8239437 DOI: 10.3389/fimmu.2021.574411] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/19/2020] [Accepted: 05/17/2021] [Indexed: 01/02/2023]
Abstract
Dengue virus (DENV) poses a serious threat to global health as the causative agent of dengue fever. The virus is endemic in more than 128 countries, resulting in approximately 390 million infections each year. Currently, there is neither an approved therapeutic nor a fully efficacious vaccine. The development of therapeutics is confounded and hampered by the complexity of the immune response to DENV, in particular to sequential infection with different DENV serotypes (DENV1-5). Researchers have shown that the DENV envelope (E) antigen is primarily responsible for the interaction with and subsequent invasion of host cells for all serotypes and can elicit neutralizing antibodies in humans. The advent of high-throughput sequencing and the rapid advancements in computational analysis of complex data have provided tools for the deconvolution of the DENV immune response. Several types of complex statistical analyses, machine learning models and complex visualizations can be applied to begin answering questions about the B- and T-cell immune responses to multiple infections and about antibody-dependent enhancement, to identify novel therapeutics, and to advance vaccine research.
Affiliation(s)
- Eriberto N. Natali
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Lmar M. Babrak
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- Enkelejda Miho
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- aiNET GmbH, Basel, Switzerland

31
Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat Biomed Eng 2021; 5:600-612. [PMID: 33859386 DOI: 10.1038/s41551-021-00699-9] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Received: 04/25/2019] [Accepted: 02/15/2021] [Indexed: 02/06/2023]
Abstract
The optimization of therapeutic antibodies is time-intensive and resource-demanding, largely because of the low-throughput screening of full-length antibodies (approximately 1 × 10³ variants) expressed in mammalian cells, which typically results in few optimized leads. Here we show that optimized antibody variants can be identified by predicting antigen specificity via deep learning from a massively diverse space of antibody sequences. To produce data for training deep neural networks, we deep-sequenced libraries of the therapeutic antibody trastuzumab (about 1 × 10⁴ variants), expressed in a mammalian cell line through site-directed mutagenesis via CRISPR-Cas9-mediated homology-directed repair, and screened the libraries for specificity to human epidermal growth factor receptor 2 (HER2). We then used the trained neural networks to screen a computational library of approximately 1 × 10⁸ trastuzumab variants and predict the HER2-specific subset (approximately 1 × 10⁶ variants), which can then be filtered for viscosity, clearance, solubility and immunogenicity to generate thousands of highly optimized lead candidates. Recombinant expression and experimental testing of 30 randomly selected variants from the unfiltered library showed that all 30 retained specificity for HER2. Deep learning may facilitate antibody engineering and optimization.

32
Large-scale analysis of 2,152 Ig-seq datasets reveals key features of B cell biology and the antibody repertoire. Cell Rep 2021; 35:109110. [PMID: 33979623 DOI: 10.1016/j.celrep.2021.109110] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/27/2020] [Revised: 03/09/2021] [Accepted: 04/20/2021] [Indexed: 12/20/2022]
Abstract
Antibody repertoire sequencing enables researchers to acquire millions of B cell receptor sequences and investigate these molecules at single-nucleotide resolution. This power and resolution in studying humoral responses have led to its wide application. However, most of these studies were conducted with a limited number of samples. Given the extraordinary diversity of antibody repertoires, key repertoire features need to be assessed across a large sample set. Thus, we collect and systematically analyze 2,152 high-quality heavy-chain antibody repertoires. Our study reveals that 52 core variable genes universally contribute to more than 99% of each individual's repertoire; that distal interspersed preferences characterize V gene recombination; that the number of public clones between two repertoires follows a linear model; and that positive selection dominates at RGYW motifs in somatic hypermutation. Thus, this population-level analysis resolves some critical features of the antibody repertoire and may be of considerable value to the many scientists working in this field.

33
Soto C, Bombardi RG, Kozhevnikov M, Sinkovits RS, Chen EC, Branchizio A, Kose N, Day SB, Pilkinton M, Gujral M, Mallal S, Crowe JE. High Frequency of Shared Clonotypes in Human T Cell Receptor Repertoires. Cell Rep 2021; 32:107882. [PMID: 32668251 PMCID: PMC7433715 DOI: 10.1016/j.celrep.2020.107882] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/29/2019] [Revised: 04/18/2020] [Accepted: 06/16/2020] [Indexed: 01/30/2023]
Abstract
The collection of T cell receptors (TCRs) generated by somatic recombination is large, but its size is unknown. We generate large TCR repertoire datasets as a resource to facilitate detailed studies of the role of TCR clonotypes and repertoires in health and disease. We estimate the size of individual human repertoires of recombined and expressed TCRs by sequence analysis and determine the extent of sharing between individual repertoires. Our experiments reveal that each blood sample contains between 5 million and 21 million TCR clonotypes. Three individuals share 8% of TCRβ- or 11% of TCRα-chain clonotypes. Sorting by T cell phenotypes in four individuals shows that 5% of naive CD4+ and 3.5% of naive CD8+ subsets share their TCRβ clonotypes, whereas memory CD4+ and CD8+ subsets share 2.3% and 0.4% of their clonotypes, respectively. We identify the sequences of these shared TCR clonotypes that are of interest for studies of human T cell biology. Soto et al. examine the extent to which five healthy adults share their T cell receptor (TCR) repertoire. Using sequencing and bioinformatics, they show a high prevalence of shared clonotypes even when different T cell phenotypes are considered. Possible functions for some clonotypes are inferred based on homology with TCRs in GenBank.
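The sharing percentages in the abstract above depend on how "shared" is defined, which the abstract does not spell out. As a minimal sketch, assuming clonotypes are compared as exact CDR3 strings and that the percentage is the fraction of the union of distinct clonotypes found in every individual (other denominators, such as each individual's own repertoire size, are equally plausible), the calculation could look like this. The function name and the CDR3 sequences are hypothetical.

```python
def shared_fraction(repertoires: list[set[str]]) -> float:
    """Fraction of distinct clonotypes (over the union of all repertoires)
    that are present in every repertoire."""
    if not repertoires:
        return 0.0
    union = set().union(*repertoires)
    if not union:
        return 0.0
    # Clonotypes observed in all repertoires
    shared = set.intersection(*repertoires)
    return len(shared) / len(union)


# Hypothetical TCRβ CDR3 clonotypes for three donors (toy data, not from the study)
donor_1 = {"CASSLGQGYEQYF", "CASSPDRGNTIYF", "CASRQGNYGYTF"}
donor_2 = {"CASSLGQGYEQYF", "CASSIRSSYEQYF"}
donor_3 = {"CASSLGQGYEQYF", "CASSPDRGNTIYF"}
print(f"shared: {shared_fraction([donor_1, donor_2, donor_3]):.2%}")
```

Because the union grows with every added donor while the intersection can only shrink, this fraction is sensitive to sequencing depth; with repertoires of millions of clonotypes, an undersampled donor lowers the apparent sharing.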
Affiliation(s)
- Cinque Soto
- The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Robin G Bombardi
- The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Morgan Kozhevnikov
- The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Robert S Sinkovits
- San Diego Supercomputer Center, University of California, San Diego, San Diego, CA 92093, USA
- Elaine C Chen
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37212, USA
- Andre Branchizio
- The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Nurgun Kose
- The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Samuel B Day
- The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Mark Pilkinton
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Madhusudan Gujral
- San Diego Supercomputer Center, University of California, San Diego, San Diego, CA 92093, USA
- Simon Mallal
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37212, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- James E Crowe
- The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37212, USA

34
Pertseva M, Gao B, Neumeier D, Yermanos A, Reddy ST. Applications of Machine and Deep Learning in Adaptive Immunity. Annu Rev Chem Biomol Eng 2021; 12:39-62. [PMID: 33852352 DOI: 10.1146/annurev-chembioeng-101420-125021] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/09/2022]
Abstract
Adaptive immunity is mediated by B and T lymphocytes, which express vast and diverse repertoires of B cell and T cell receptors, respectively, and which, in conjunction with peptide antigen presentation through major histocompatibility complexes (MHCs), can recognize and respond to pathogens and diseased cells. In recent years, advances in deep sequencing have led to a massive increase in the amount of adaptive immune receptor repertoire data; additionally, proteomics techniques have led to a wealth of data on peptide-MHC presentation. These large-scale data sets now make it possible to train machine and deep learning models that can identify complex, high-dimensional patterns in immune repertoires. This article introduces adaptive immune repertoires and the machine and deep learning methods relevant to biological sequence data, and then summarizes the many applications in this field, which range from predicting the immunological status of a host and the antigen specificity of individual receptors to engineering immunotherapeutics.
Affiliation(s)
- Margarita Pertseva
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; Life Science Zurich Graduate School, ETH Zurich and University of Zurich, 8006 Zurich, Switzerland
- Beichen Gao
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Daniel Neumeier
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Alexander Yermanos
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; Department of Pathology and Immunology, University of Geneva, 1205 Geneva, Switzerland; Department of Biology, Institute of Microbiology and Immunology, ETH Zurich, 8093 Zurich, Switzerland
- Sai T Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland

35
Isacchini G, Walczak AM, Mora T, Nourmohammad A. Deep generative selection models of T and B cell receptor repertoires with soNNia. Proc Natl Acad Sci U S A 2021; 118:e2023141118. [PMID: 33795515 PMCID: PMC8040596 DOI: 10.1073/pnas.2023141118] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/13/2022]
Abstract
Subclasses of lymphocytes carry different functional roles to work together and produce an immune response and lasting immunity. In addition to these functional roles, T and B lymphocytes rely on the diversity of their receptor chains to recognize different pathogens. The lymphocyte subclasses emerge from common ancestors generated with the same diversity of receptors during selection processes. Here, we leverage biophysical models of receptor generation with machine learning models of selection to identify specific sequence features characteristic of functional lymphocyte repertoires and subrepertoires. Specifically, using only repertoire-level sequence information, we classify CD4+ and CD8+ T cells, find correlations between receptor chains arising during selection, and identify T cell subsets that are targets of pathogenic epitopes. We also show examples of when simple linear classifiers do as well as more complex machine learning methods.
Affiliation(s)
- Giulio Isacchini
- Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, 37077 Göttingen, Germany
- Laboratoire de Physique de l'Ecole Normale Supérieure, Paris Sciences & Lettres (PSL) University, CNRS, Sorbonne Université and Université de Paris, 75005 Paris, France
- Aleksandra M Walczak
- Laboratoire de Physique de l'Ecole Normale Supérieure, Paris Sciences & Lettres (PSL) University, CNRS, Sorbonne Université and Université de Paris, 75005 Paris, France
- Thierry Mora
- Laboratoire de Physique de l'Ecole Normale Supérieure, Paris Sciences & Lettres (PSL) University, CNRS, Sorbonne Université and Université de Paris, 75005 Paris, France
- Armita Nourmohammad
- Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, 37077 Göttingen, Germany
- Department of Physics, University of Washington, Seattle, WA 98195
- Herbold Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109

36
Laustsen AH, Greiff V, Karatt-Vellatt A, Muyldermans S, Jenkins TP. Animal Immunization, in Vitro Display Technologies, and Machine Learning for Antibody Discovery. Trends Biotechnol 2021; 39:1263-1273. [PMID: 33775449 DOI: 10.1016/j.tibtech.2021.03.003] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 01/25/2021] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 02/07/2023]
Abstract
For years, a discussion has persisted on the benefits and drawbacks of antibody discovery using animal immunization versus in vitro selection from non-animal-derived recombinant repertoires using display technologies. While it has been argued that using recombinant display libraries can reduce animal consumption, we hold that the number of animals used in immunization campaigns is dwarfed by the number sacrificed during preclinical studies. Thus, improving quality control of antibodies before entering in vivo studies will have a larger impact on animal consumption. Both animal immunization and recombinant repertoires present unique advantages for discovering antibodies that are fit for purpose. Furthermore, we anticipate that machine learning will play a significant role within discovery workflows, refining current antibody discovery practices.
Affiliation(s)
- Andreas H Laustsen
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Greiff
- Department of Immunology, University of Oslo, Oslo, Norway
- Serge Muyldermans
- Department of Cellular and Molecular Immunology, Vrije Universiteit Brussel, Brussels, Belgium
- Timothy P Jenkins
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark

37
Yohannes DA, Kaukinen K, Kurppa K, Saavalainen P, Greco D. Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences. BMC Bioinformatics 2021; 22:159. [PMID: 33765908 PMCID: PMC7993519 DOI: 10.1186/s12859-021-04087-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/30/2020] [Accepted: 03/17/2021] [Indexed: 12/18/2022]
Abstract
BACKGROUND: Deep immune receptor sequencing (RepSeq) provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, due to the immense diversity of the immune repertoire, identification of condition-relevant TCR CDR3s from total repertoires has mostly been limited to either "public" CDR3 sequences or to comparisons of CDR3 frequencies observed in a single individual. A methodology for the identification of condition-associated TCR CDR3s by direct population-level comparison of RepSeq samples is currently lacking.
RESULTS: We present a method for direct population-level comparison of RepSeq samples using immune repertoire sub-units (or sub-repertoires) that are shared across individuals. The method first performs unsupervised clustering of CDR3s within each sample. It then finds matching clusters across samples, called immune sub-repertoires, and performs statistical differential abundance testing at the level of the identified sub-repertoires. Finally, it ranks CDR3s in differentially abundant sub-repertoires for relevance to the condition. We applied the method to total TCR CDR3β RepSeq datasets of celiac disease patients, as well as to public datasets of yellow fever vaccination. The method successfully identified celiac disease-associated CDR3β sequences, as evidenced by considerable agreement of TRBV-gene and positional amino acid usage patterns in the detected CDR3β sequences with previously known CDR3βs specific to gluten in celiac disease. It also recovered significantly more previously known CDR3β sequences relevant to each condition than would be expected by chance.
CONCLUSION: We conclude that immune sub-repertoires of similar immunogenomic features shared across unrelated individuals can serve as viable units of immune repertoire comparison, serving as a proxy for identification of condition-associated CDR3s.
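The abstract names the first step, unsupervised clustering of CDR3s within each sample, but not the similarity criterion. A minimal Python sketch of that step, assuming (hypothetically) single-linkage clustering of equal-length CDR3β sequences that differ at no more than one position; the function names and toy sequences are illustrative and not taken from the paper:

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_cdr3s(cdr3s, max_dist=1):
    """Single-linkage clustering of equal-length CDR3s within max_dist mismatches.

    Hypothetical criterion for illustration; returns a sorted list of sorted clusters.
    """
    seqs = sorted(set(cdr3s))
    parent = {s: s for s in seqs}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Hamming distance is only defined for equal lengths, so group by length first.
    by_len = defaultdict(list)
    for s in seqs:
        by_len[len(s)].append(s)
    for group in by_len.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if hamming(a, b) <= max_dist:
                    parent[find(a)] = find(b)  # merge the two clusters

    clusters = defaultdict(list)
    for s in seqs:
        clusters[find(s)].append(s)
    return sorted(sorted(c) for c in clusters.values())


sample = ["CASSLGQETQYF", "CASSLGGETQYF", "CASSPGQETQYF", "CASRDNYGYTF", "CASSQDRGYEQYF"]
print(cluster_cdr3s(sample))
```

Here the three 12-residue sequences join one cluster through single-mismatch links, while the other two remain singletons. Matching clusters across samples and differential abundance testing, the later steps described above, are not shown.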
Affiliation(s)
- Dawit A Yohannes
- Research Programs Unit, Translational Immunology, University of Helsinki, Helsinki, Finland; Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Katri Kaukinen
- Department of Internal Medicine, Faculty of Medicine and Health Technology, Tampere University Hospital, Tampere University, Tampere, Finland
- Kalle Kurppa
- Department of Pediatrics, Tampere University Hospital and Center for Child Health Research, Tampere University, Tampere, Finland
- Päivi Saavalainen
- Research Programs Unit, Translational Immunology, University of Helsinki, Helsinki, Finland; Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Institute of Biotechnology, University of Helsinki, Helsinki, Finland

38
Shemesh O, Polak P, Lundin KEA, Sollid LM, Yaari G. Machine Learning Analysis of Naïve B-Cell Receptor Repertoires Stratifies Celiac Disease Patients and Controls. Front Immunol 2021; 12:627813. [PMID: 33790900 PMCID: PMC8006302 DOI: 10.3389/fimmu.2021.627813] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 11/10/2020] [Accepted: 02/17/2021] [Indexed: 12/13/2022]
Abstract
Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deamidated gluten peptides by disease-associated HLA-DQ variants to CD4+ T cells. In addition to gluten-specific CD4+ T cells the patients have antibodies to transglutaminase 2 (autoantigen) and deamidated gluten peptides. These disease-specific antibodies recognize defined epitopes and they display common usage of specific heavy and light chains across patients. Interactions between T cells and B cells are likely central in the pathogenesis, but how the repertoires of naïve T and B cells relate to the pathogenic effector cells is unexplored. To this end, we applied machine learning classification models to naïve B cell receptor (BCR) repertoires from CeD patients and healthy controls. Strikingly, we obtained a promising classification performance with an F1 score of 85%. Clusters of heavy and light chain sequences were inferred and used as features for the model, and signatures associated with the disease were then characterized. These signatures included amino acid (AA) 3-mers with distinct biophysicochemical characteristics and enriched V and J genes. We found that CeD-associated clusters can be identified and that common motifs can be characterized from naïve BCR repertoires. The results may indicate a genetic influence by BCR encoding genes in CeD. Analysis of naïve BCRs as presented here may become an important part of assessing the risk of individuals to develop CeD. Our model demonstrates the potential of using BCR repertoires and in particular, naïve BCR repertoires, as disease susceptibility markers.
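The abstract reports amino acid (AA) 3-mers among the disease-associated signatures. As an illustration only, assuming overlapping 3-mers counted over a whole repertoire and normalized to relative frequencies, a repertoire-level k-mer feature vector could be built as below. The paper's own model used clusters of heavy and light chain sequences as features, which this sketch does not reproduce; the function names and sequences are hypothetical:

```python
from collections import Counter


def kmer_counts(sequences, k=3):
    """Count overlapping amino-acid k-mers across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        # A sequence of length L contributes L - k + 1 overlapping k-mers
        # (none if it is shorter than k).
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_frequencies(sequences, k=3):
    """Normalize k-mer counts to relative frequencies that sum to 1."""
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


repertoire = ["CARDY", "CARW"]  # toy CDR3-like sequences
print(kmer_counts(repertoire))  # CAR occurs twice, every other 3-mer once
print(kmer_frequencies(repertoire)["CAR"])
```

Frequencies rather than raw counts keep repertoires of different sequencing depth comparable when they are fed to a classifier.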
Affiliation(s)
- Or Shemesh
- Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Bar Ilan Institute of Nanotechnologies and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- Pazit Polak
- Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Bar Ilan Institute of Nanotechnologies and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- Knut E. A. Lundin
- K.G. Jebsen Center for Coeliac Disease Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Gastroenterology, Oslo University Hospital Rikshospitalet, Oslo, Norway
- Ludvig M. Sollid
- K.G. Jebsen Center for Coeliac Disease Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Immunology, Oslo University Hospital Rikshospitalet, Oslo, Norway
- Gur Yaari
- Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Bar Ilan Institute of Nanotechnologies and Advanced Materials, Bar Ilan University, Ramat Gan, Israel

39
Akbar R, Robert PA, Pavlović M, Jeliazkov JR, Snapkov I, Slabodkin A, Weber CR, Scheffer L, Miho E, Haff IH, Haug DTT, Lund-Johansen F, Safonova Y, Sandve GK, Greiff V. A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding. Cell Rep 2021; 34:108856. [PMID: 33730590 DOI: 10.1016/j.celrep.2021.108856] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Received: 03/12/2020] [Revised: 11/29/2020] [Accepted: 02/22/2021] [Indexed: 12/16/2022]
Abstract
Antibody-antigen binding relies on the specific interaction of amino acids at the paratope-epitope interface. The predictability of antibody-antigen binding is a prerequisite for de novo antibody and (neo-)epitope design. A fundamental premise for the predictability of antibody-antigen binding is the existence of paratope-epitope interaction motifs that are universally shared among antibody-antigen structures. In a dataset of non-redundant antibody-antigen structures, we identify structural interaction motifs, which together compose a commonly shared structure-based vocabulary of paratope-epitope interactions. We show that this vocabulary enables the machine learnability of antibody-antigen binding on the paratope-epitope level using generative machine learning. The vocabulary (1) is compact, less than 10^4 motifs; (2) distinct from non-immune protein-protein interactions; and (3) mediates specific oligo- and polyreactive interactions between paratope-epitope pairs. Our work leverages combined structure- and sequence-based learning to demonstrate that machine-learning-driven predictive paratope and epitope engineering is feasible.
Affiliation(s)
- Rahmad Akbar
- Department of Immunology, University of Oslo, Oslo, Norway
- Milena Pavlović
- Department of Informatics, University of Oslo, Oslo, Norway; Centre for Bioinformatics, University of Oslo, Norway; K.G. Jebsen Centre for Coeliac Disease Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Igor Snapkov
- Department of Immunology, University of Oslo, Oslo, Norway
- Cédric R Weber
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Lonneke Scheffer
- Department of Informatics, University of Oslo, Oslo, Norway; Centre for Bioinformatics, University of Oslo, Norway
- Enkelejda Miho
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz, Switzerland
- Yana Safonova
- Computer Science and Engineering Department, University of California, San Diego, La Jolla, CA, USA
- Geir K Sandve
- Department of Informatics, University of Oslo, Oslo, Norway; Centre for Bioinformatics, University of Oslo, Norway; K.G. Jebsen Centre for Coeliac Disease Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Victor Greiff
- Department of Immunology, University of Oslo, Oslo, Norway

40
Raybould MIJ, Marks C, Kovaltsuk A, Lewis AP, Shi J, Deane CM. Public Baseline and shared response structures support the theory of antibody repertoire functional commonality. PLoS Comput Biol 2021; 17:e1008781. [PMID: 33647011 PMCID: PMC7951972 DOI: 10.1371/journal.pcbi.1008781] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/21/2020] [Revised: 03/11/2021] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
The naïve antibody/B-cell receptor (BCR) repertoires of different individuals ought to exhibit significant functional commonality, given that most pathogens trigger an effective antibody response to immunodominant epitopes. Sequence-based repertoire analysis has so far offered little evidence for this phenomenon. For example, a recent study estimated the number of shared ('public') antibody clonotypes in circulating baseline repertoires to be around 0.02% across ten unrelated individuals. However, to engage the same epitope, antibodies only require a similar binding site structure and the presence of key paratope interactions, which can occur even when their sequences are dissimilar. Here, we search for evidence of geometric similarity/convergence across human antibody repertoires. We first structurally profile naïve ('baseline') antibody diversity using snapshots from 41 unrelated individuals, predicting all modellable distinct structures within each repertoire. This analysis uncovers a high (much greater than random) degree of structural commonality. For instance, around 3% of distinct structures are common to the ten most diverse individual samples ('Public Baseline' structures). Our approach is the first computational method to find levels of BCR commonality commensurate with epitope immunodominance and could therefore be harnessed to find more genetically distant antibodies with same-epitope complementarity. We then apply the same structural profiling approach to repertoire snapshots from three individuals before and after flu vaccination, detecting a convergent structural drift indicative of recognising similar epitopes ('Public Response' structures). We show that Antibody Model Libraries derived from Public Baseline and Public Response structures represent a powerful geometric basis set of low-immunogenicity candidates exploitable for general or target-focused therapeutic antibody screening.
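The sequence-level statistic that the abstract contrasts with structural commonality, the share of clonotypes found in every one of several individuals, can be sketched as follows. This assumes clonotypes are compared by exact identity and that the denominator is the number of distinct clonotypes pooled across all individuals; the study cited in the abstract may define the 0.02% figure differently, and the function name and placeholder clonotypes ("A", "B", ...) are hypothetical:

```python
from collections import Counter


def public_fraction(repertoires, min_individuals=None):
    """Fraction of distinct clonotypes present in at least `min_individuals` repertoires.

    By default a clonotype must occur in every repertoire to count as public.
    """
    if min_individuals is None:
        min_individuals = len(repertoires)
    # Count each clonotype once per individual, however abundant it is there.
    occurrence = Counter(c for rep in repertoires for c in set(rep))
    if not occurrence:
        return 0.0
    public = sum(1 for n in occurrence.values() if n >= min_individuals)
    return public / len(occurrence)


donors = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
print(public_fraction(donors))                     # shared by all three donors
print(public_fraction(donors, min_individuals=2))  # shared by at least two
```

Because the denominator grows with every private clonotype, this fraction shrinks quickly as more individuals are added, which is consistent with the low sequence-level sharing the abstract describes.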
Affiliation(s)
- Matthew I. J. Raybould
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom
- Claire Marks
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom
- Aleksandr Kovaltsuk
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom
- Alan P. Lewis
- Data and Computational Sciences, GlaxoSmithKline Research and Development, Stevenage, United Kingdom
- Jiye Shi
- Chemistry Department, UCB Pharma, Slough, United Kingdom
- Charlotte M. Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom

41
Shoukat MS, Foers AD, Woodmansey S, Evans SC, Fowler A, Soilleux EJ. Use of machine learning to identify a T cell response to SARS-CoV-2. Cell Rep Med 2021; 2:100192. [PMID: 33495756 PMCID: PMC7816879 DOI: 10.1016/j.xcrm.2021.100192] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/02/2020] [Revised: 12/08/2020] [Accepted: 01/12/2021] [Indexed: 12/29/2022]
Abstract
The identification of SARS-CoV-2-specific T cell receptor (TCR) sequences is critical for understanding T cell responses to SARS-CoV-2. Accordingly, we reanalyze publicly available data from SARS-CoV-2-recovered patients who had low-severity disease (n = 17) and SARS-CoV-2 infection-naive (control) individuals (n = 39). Applying a machine learning approach to TCR beta (TRB) repertoire data, we can classify patient/control samples with a training sensitivity, specificity, and accuracy of 88.2%, 100%, and 96.4% and a testing sensitivity, specificity, and accuracy of 82.4%, 97.4%, and 92.9%, respectively. Interestingly, the same machine learning approach cannot separate SARS-CoV-2 recovered from SARS-CoV-2 infection-naive individual samples on the basis of B cell receptor (immunoglobulin heavy chain; IGH) repertoire data, suggesting that the T cell response to SARS-CoV-2 may be more stereotyped and longer lived. Following validation in larger cohorts, our method may be useful in detecting protective immunity acquired through natural infection or in determining the longevity of vaccine-induced immunity.
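The reported percentages can be checked against the cohort sizes. Assuming each metric is computed over all 17 recovered patients and 39 controls (the abstract does not describe the training/testing split, so the confusion counts below are inferred rather than reported), 15/17, 39/39 and 54/56 reproduce the training figures, and 14/17, 38/39 and 52/56 reproduce the testing figures:

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # patients correctly called positive
    specificity = tn / (tn + fp)  # controls correctly called negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


N_PATIENTS, N_CONTROLS = 17, 39
# Inferred (not reported) counts of correctly classified patients and controls.
inferred = {"training": (15, 39), "testing": (14, 38)}
for split, (tp, tn) in inferred.items():
    sens, spec, acc = classification_metrics(tp, N_PATIENTS - tp, tn, N_CONTROLS - tn)
    print(f"{split}: sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
# training: sensitivity 88.2%, specificity 100.0%, accuracy 96.4%
# testing: sensitivity 82.4%, specificity 97.4%, accuracy 92.9%
```

Read this way, the testing results correspond to three misclassified patients and one misclassified control out of 56 samples.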
Affiliation(s)
- M. Saad Shoukat
- Department of Pathology, University of Cambridge, Cambridge, UK
- Andrew D. Foers
- Department of Pathology, University of Cambridge, Cambridge, UK
- Anna Fowler
- Department of Health Data Science, Institute of Population Health, University of Liverpool, Liverpool, UK

42
Yiu HH, Schoettle LN, Garcia-Neuer M, Blattman JN, Johnson PLF. Selection influences naive CD8+ TCR-β repertoire sharing. Immunology 2021; 162:464-475. [PMID: 33345304 DOI: 10.1111/imm.13299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2020] [Revised: 11/22/2020] [Accepted: 11/29/2020] [Indexed: 11/28/2022]
Abstract
Within each individual, the adaptive immune system generates a repertoire of cells expressing receptors capable of recognizing diverse potential pathogens. The theoretical diversity of the T-cell receptor (TCR) repertoire exceeds the actual size of the T-cell population in an individual by several orders of magnitude, making the observation of identical TCRs in different individuals extremely improbable if all receptors were equally likely. Despite this disparity between the theoretical and the realized diversity of the repertoire, these 'public' receptor sequences have been identified in autoimmune, cancer and pathogen interaction contexts. Biased generation processes explain the presence of public TCRs in the naive repertoire, but do not adequately explain the different abundances of these public TCRs. We investigate and characterize the distribution of genomic TCR-β sequences of naive CD8+ T cells from three genetically identical mice, comparing non-productive (non-functional sequences) and productive sequences. We find public TCR-β sequences at higher abundances compared with unshared sequences in the productive, but not in the non-productive, repertoire. We show that neutral processes such as recombination biases, codon degeneracy and generation probability do not fully account for these differences, and conclude that thymic or peripheral selection plays an important role in increasing the abundances of public TCR-β sequences.
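The central comparison above, abundances of public versus unshared TCR-β sequences, can be illustrated with a small sketch. It assumes each repertoire is a mapping from sequence to read or cell count and calls a sequence public if it appears in at least two individuals. It computes only the raw comparison, not the neutral-process baselines (recombination biases, codon degeneracy, generation probability) the authors tested against; all names and the placeholder sequences are hypothetical:

```python
from statistics import mean


def public_sequences(repertoires, min_individuals=2):
    """Return sequences observed in at least `min_individuals` repertoires."""
    owners = {}
    for idx, rep in enumerate(repertoires):
        for seq in rep:
            owners.setdefault(seq, set()).add(idx)
    return {seq for seq, who in owners.items() if len(who) >= min_individuals}


def mean_abundance_by_sharing(repertoires, min_individuals=2):
    """Mean abundance of public vs. private sequences, pooled over all repertoires."""
    public = public_sequences(repertoires, min_individuals)
    pub_counts, priv_counts = [], []
    for rep in repertoires:
        for seq, count in rep.items():
            (pub_counts if seq in public else priv_counts).append(count)
    nan = float("nan")  # returned when a category is empty
    return (mean(pub_counts) if pub_counts else nan,
            mean(priv_counts) if priv_counts else nan)


mice = [{"CASSA": 10, "CASSB": 1}, {"CASSA": 6, "CASSC": 2}, {"CASSD": 1}]
print(mean_abundance_by_sharing(mice))
```

Running the same function separately on productive and non-productive sequences would mirror the contrast the abstract draws between the two repertoires.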
Affiliation(s)
- Hao H Yiu
- Department of Biology, University of Maryland, College Park, MD, USA
- Louis N Schoettle
- School of Life Sciences, The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Marlene Garcia-Neuer
- School of Life Sciences, The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Joseph N Blattman
- School of Life Sciences, The Biodesign Institute, Arizona State University, Tempe, AZ, USA

43
Foers AD, Shoukat MS, Welsh OE, Donovan K, Petry R, Evans SC, FitzPatrick ME, Collins N, Klenerman P, Fowler A, Soilleux EJ. Classification of intestinal T-cell receptor repertoires using machine learning methods can identify patients with coeliac disease regardless of dietary gluten status. J Pathol 2021; 253:279-291. [PMID: 33225446 PMCID: PMC7898595 DOI: 10.1002/path.5592] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/09/2020] [Revised: 10/29/2020] [Accepted: 11/12/2020] [Indexed: 12/17/2022]
Abstract
In coeliac disease (CeD), immune-mediated small intestinal damage is precipitated by gluten, leading to variable symptoms and complications, occasionally including aggressive T-cell lymphoma. Diagnosis, based primarily on histopathological examination of duodenal biopsies, is confounded by poor concordance between pathologists and minimal histological abnormality if insufficient gluten is consumed. CeD pathogenesis involves both CD4+ T-cell-mediated gluten recognition and CD8+ and γδ T-cell-mediated inflammation, with a previous study demonstrating a permanent change in γδ T-cell populations in CeD. We leveraged this understanding and explored the diagnostic utility of bulk T-cell receptor (TCR) sequencing in assessing duodenal biopsies in CeD. Genomic DNA extracted from duodenal biopsies underwent sequencing for TCR-δ (TRD) (CeD, n = 11; non-CeD, n = 11) and TCR-γ (TRG) (CeD, n = 33; non-CeD, n = 21). We developed a novel machine learning-based analysis of the TCR repertoire, clustering samples by diagnosis. Leave-one-out cross-validation (LOOCV) was performed to validate the classification algorithm. Using the TRD repertoire, 100% (22/22) of duodenal biopsies were correctly classified, with a LOOCV accuracy of 91%. Using the TRG repertoire, 94.4% (51/54) of duodenal biopsies were correctly classified, with a LOOCV accuracy of 87%. Duodenal biopsy TRG repertoire analysis permitted accurate classification of biopsies from patients with CeD following a strict gluten-free diet for at least 6 months, who would be misclassified by current tests. This result reflects permanent changes to the duodenal γδ TCR repertoire in CeD, even in the absence of gluten consumption. Our method could complement or replace histopathological diagnosis in CeD and might have particular clinical utility in the diagnostic testing of patients unable to tolerate dietary gluten, and for assessing duodenal biopsies with equivocal features. This approach is generalisable to any TCR/BCR locus and any sequencing platform, with potential to predict diagnosis or prognosis in conditions mediated or modulated by the adaptive immune response.
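Leave-one-out cross-validation, which the abstract uses to validate its classifier, holds out each biopsy in turn, trains on the remaining n - 1 samples, and scores the held-out prediction. Held-out estimates like this are typically lower than the fraction classified correctly when the model has seen every sample, which is presumably what the 100% (22/22) and 94.4% (51/54) figures report. The sketch shows the LOOCV loop with a nearest-centroid classifier standing in for the paper's method, which the abstract does not specify; the classifier, the two-dimensional features and the labels are hypothetical:

```python
from statistics import fmean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean distance)."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, lab in zip(train_X, train_y) if lab == label]
        # Feature-wise mean of all training rows carrying this label.
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def loocv_accuracy(X, y):
    """Leave-one-out CV: fit on n - 1 samples, predict the held-out one, repeat n times."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


# Hypothetical per-biopsy features (e.g. two repertoire summary statistics).
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8], [0.6, 0.4]]
y = ["CeD", "CeD", "CeD", "control", "control", "control"]
print(loocv_accuracy(X, y))
```

In this toy set the last control lies closer to the CeD centroid once it is held out, so LOOCV scores 5 of 6 samples correctly even though it would be fitted correctly if it were included in training.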
Affiliation(s)
- Andrew D Foers
- Department of Pathology, University of Cambridge, Cambridge, UK
- M Saad Shoukat
- Department of Pathology, University of Cambridge, Cambridge, UK
- Oliver E Welsh
- Department of Pathology, University of Cambridge, Cambridge, UK; Centre for Mathematical Sciences, University of Cambridge, Cambridge, UK
- Russell Petry
- Department of Pathology, University of Cambridge, Cambridge, UK; Centre for Mathematical Sciences, University of Cambridge, Cambridge, UK
- Shelley C Evans
- Department of Pathology, University of Cambridge, Cambridge, UK
- Michael Eb FitzPatrick
- Translational Gastroenterology Unit, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Nadine Collins
- Department of Molecular Pathology, Royal Surrey NHS Foundation Trust, Guildford, UK
- Paul Klenerman
- Translational Gastroenterology Unit, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Peter Medawar Building for Pathogen Research, University of Oxford, Oxford, UK
- Anna Fowler
- Department of Health Data Science, Institute of Population Health, University of Liverpool, Liverpool, UK
- Elizabeth J Soilleux
- Department of Pathology, University of Cambridge, Cambridge, UK; Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK

44
Greiff V, Yaari G, Cowell LG. Mining adaptive immune receptor repertoires for biological and clinical information using machine learning. CURRENT OPINION IN SYSTEMS BIOLOGY 2020. [DOI: 10.1016/j.coisb.2020.10.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]

45
Collins AM, Yaari G, Shepherd AJ, Lees W, Watson CT. Germline immunoglobulin genes: Disease susceptibility genes hidden in plain sight? CURRENT OPINION IN SYSTEMS BIOLOGY 2020; 24:100-108. [PMID: 37008538 PMCID: PMC10062056 DOI: 10.1016/j.coisb.2020.10.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 02/06/2023]
Abstract
Immunoglobulin genes are rarely considered as disease susceptibility genes despite their obvious and central contributions to immune function. This appears to be a consequence of historical views on antibody repertoire formation that no longer stand, and of difficulties that until recently surrounded the documentation of the suite of antibody genes in any individual. If these important genes are to be accessible to GWAS studies, allelic variation within the human population needs to be better documented, and a curated set of genomic variations associated with antibody genes needs to be formulated. Repertoire studies arising from the COVID-19 pandemic provide an opportunity to meet these needs, and may provide insights into the profound variability that is seen in outcomes to this infection.

46
Weber CR, Akbar R, Yermanos A, Pavlović M, Snapkov I, Sandve GK, Reddy ST, Greiff V. immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking. Bioinformatics 2020; 36:3594-3596. [PMID: 32154832 PMCID: PMC7334888 DOI: 10.1093/bioinformatics/btaa158] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/09/2019] [Revised: 02/03/2020] [Accepted: 03/04/2020] [Indexed: 11/14/2022]
Abstract
Summary: B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full-length variable region immune receptor sequences by tuning the following immune receptor features: (i) species and chain type (BCR, TCR, single and paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis, such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis and machine learning methods for motif detection.
Availability and implementation: The package is available via https://github.com/GreiffLab/immuneSIM and on CRAN at https://cran.r-project.org/web/packages/immuneSIM. The documentation is hosted at https://immuneSIM.readthedocs.io.
Contact: sai.reddy@ethz.ch or victor.greiff@medisin.uio.no
Supplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Cédric R Weber
- Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland
- Rahmad Akbar
- Department of Immunology, University of Oslo, 0372 Oslo, Norway
- Alexander Yermanos
- Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland
- Milena Pavlović
- Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Igor Snapkov
- Department of Immunology, University of Oslo, 0372 Oslo, Norway
- Geir K Sandve
- Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Sai T Reddy
- Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland
- Victor Greiff
- Department of Immunology, University of Oslo, 0372 Oslo, Norway
47
Abstract
Advances in reading, writing, and editing DNA are providing unprecedented insights into the complexity of immunological systems. This combination of systems and synthetic biology methods is enabling the quantitative and precise understanding of molecular recognition in adaptive immunity, thus providing a framework for reprogramming immune responses for translational medicine. In this review, we will highlight state-of-the-art methods such as immune repertoire sequencing, immunoinformatics, and immunogenomic engineering and their application toward adaptive immunity. We showcase novel and interdisciplinary approaches that have the promise of transforming the design and breadth of molecular and cellular immunotherapies.
Affiliation(s)
- Lucia Csepregi
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Roy A. Ehling
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Bastian Wagner
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Sai T. Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
48
Smakaj E, Babrak L, Ohlin M, Shugay M, Briney B, Tosoni D, Galli C, Grobelsek V, D'Angelo I, Olson B, Reddy S, Greiff V, Trück J, Marquez S, Lees W, Miho E. Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences. Bioinformatics 2020; 36:1731-1739. [PMID: 31873728 PMCID: PMC7075533 DOI: 10.1093/bioinformatics/btz845] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 06/21/2019] [Revised: 10/21/2019] [Accepted: 12/19/2019] [Indexed: 01/01/2023]
Abstract
Summary: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. There are currently multiple tools available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) human V, D and J genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average gene mishit frequency (0.02) and IgBLAST the lowest (0.004). Reproducibility of the complementarity-determining region 3 (CDR3) amino acid output ranged from 4.3% to 77.6% with preprocessed data. In addition, the run time of the tools was assessed: MiXCR processed the most sequences per unit of time. These results indicate that immunoinformatic analyses greatly depend on the choice of bioinformatics tool. Our results support informed decision-making by immunoinformaticians based on repertoire composition and sequencing platform. Availability and implementation: All tools utilized in the paper are free for academic use. Supplementary information: Supplementary data are available at Bioinformatics online.
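The two accuracy figures in this abstract can be made concrete with a short sketch. The abstract does not state the exact formulas, so the definitions here are assumptions: mishit frequency is taken as the fraction of sequences whose gene call differs from the known (simulated) gene, and CDR3 agreement as the fraction of sequences annotated by both tools that receive an identical CDR3 amino acid string. The function names and toy calls are hypothetical, not output from IMGT/HighV-QUEST, IgBLAST or MiXCR.

```python
from typing import Mapping


def mishit_frequency(truth: Mapping[str, str], calls: Mapping[str, str]) -> float:
    """Fraction of sequences (present in both mappings) whose gene call differs from the truth."""
    shared = truth.keys() & calls.keys()
    if not shared:
        raise ValueError("no sequence IDs in common")
    return sum(calls[sid] != truth[sid] for sid in shared) / len(shared)


def cdr3_agreement(tool_a: Mapping[str, str], tool_b: Mapping[str, str]) -> float:
    """Fraction of sequences annotated by both tools that received the identical CDR3 string."""
    shared = tool_a.keys() & tool_b.keys()
    if not shared:
        return 0.0
    return sum(tool_a[sid] == tool_b[sid] for sid in shared) / len(shared)


# Hypothetical toy data: sequence ID -> V gene call, or sequence ID -> CDR3 amino acids.
truth = {"s1": "IGHV1-2", "s2": "IGHV3-23", "s3": "IGHV4-34", "s4": "IGHV1-69"}
tool_x = {"s1": "IGHV1-2", "s2": "IGHV3-23", "s3": "IGHV4-59", "s4": "IGHV1-69"}
cdr3_x = {"s1": "ARDRGYFDY", "s2": "AKGGYFDY", "s3": "ARSSGWYFDL"}
cdr3_y = {"s1": "ARDRGYFDY", "s2": "AKGGYFD", "s3": "ARSSGWYFDL", "s4": "TTDPYYFDY"}

print(mishit_frequency(truth, tool_x))          # 0.25 (s3 is a mishit)
print(round(cdr3_agreement(cdr3_x, cdr3_y), 3)) # 0.667 (s1 and s3 agree out of 3 shared)
print(f"{73 / 183:.1%}")                        # 39.9%, the shared-gene fraction reported as 40%
```

Restricting both measures to shared sequence IDs keeps sequences that one tool failed to annotate from being counted as disagreements; whether the original benchmark did the same is not stated in the abstract.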
Affiliation(s)
- Erand Smakaj
- Institute of Biomedical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz 4132, Switzerland
- Lmar Babrak
- Institute of Biomedical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz 4132, Switzerland
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund 223, Sweden
- Mikhail Shugay
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- Bryan Briney
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Deniz Tosoni
- Institute of Biomedical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz 4132, Switzerland
- Christopher Galli
- Institute of Biomedical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz 4132, Switzerland
- Vendi Grobelsek
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Igor D'Angelo
- One Amgen Center Drive, Amgen, Inc., Therapeutic Discovery/Molecular Engineering, Thousand Oaks, CA 91320, USA
- Branden Olson
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Sai Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Victor Greiff
- Department of Immunology, University of Oslo, Oslo 0372, Norway
- Johannes Trück
- Paediatric Immunology, Children's Research Center, University Children's Hospital, University of Zurich, Zurich 8032, Switzerland
- Susanna Marquez
- Department of Pathology, Yale School of Medicine, New Haven, CT 06511, USA
- William Lees
- Department of Biological Sciences and Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, UK
- Enkelejda Miho
- Institute of Biomedical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz 4132, Switzerland; aiNET GmbH, Switzerland Innovation Park Basel Area AG, Basel 4057, Switzerland
49
Wollacott AM, Xue C, Qin Q, Hua J, Bohnuud T, Viswanathan K, Kolachalama VB. Quantifying the nativeness of antibody sequences using long short-term memory networks. Protein Eng Des Sel 2020; 32:347-354. [PMID: 31504835 PMCID: PMC7372931 DOI: 10.1093/protein/gzz031] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/11/2019] [Revised: 06/11/2019] [Accepted: 07/07/2019] [Indexed: 11/12/2022]
Abstract
Antibodies often undergo substantial engineering en route to the generation of a therapeutic candidate with good developability properties. Characterization of antibody libraries has shown that retaining native-like sequences improves the overall quality of the library. Motivated by recent advances in deep learning, we developed a bidirectional long short-term memory (LSTM) network model to make use of the large amount of available antibody sequence information, and used this model to quantify the nativeness of antibody sequences. The model scores sequences for their similarity to naturally occurring antibodies, and these scores can be taken into account during the design and engineering of libraries. We demonstrate the performance of this approach by training a model on human antibody sequences and show that our method outperforms other approaches at distinguishing human antibodies from those of other species. We also show the applicability of this method to the evaluation of synthesized antibody libraries and to the humanization of mouse antibodies.
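The core idea of scoring nativeness as likelihood under a model trained on native sequences can be sketched without a neural network. The block below deliberately swaps the paper's bidirectional LSTM for a much simpler first-order (bigram) Markov model with additive smoothing; it is an illustration of likelihood-based scoring, not the authors' method. The score is the mean log-probability per transition, so higher means more similar to the training set. The three training strings are toy heavy-chain framework-like fragments chosen for illustration, not the authors' data, and sequences are assumed to contain only the 20 standard amino acids.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"  # sequence boundary tokens

Model = Dict[str, Dict[str, float]]


def train_bigram(sequences: Iterable[str], alpha: float = 1.0) -> Model:
    """Estimate P(next residue | previous residue) with additive (Laplace) smoothing."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    next_vocab = [*AMINO_ACIDS, END]
    model: Model = {}
    for prev in [START, *AMINO_ACIDS]:
        total = sum(counts[prev].values()) + alpha * len(next_vocab)
        model[prev] = {nxt: (counts[prev][nxt] + alpha) / total for nxt in next_vocab}
    return model


def nativeness(model: Model, seq: str) -> float:
    """Mean log-probability per transition; higher means more similar to the training set."""
    tokens = [START, *seq, END]
    log_p = sum(math.log(model[prev][nxt]) for prev, nxt in zip(tokens, tokens[1:]))
    return log_p / (len(tokens) - 1)


# Toy "native" training set (illustrative only).
native = [
    "EVQLVESGGGLVQPGGSLRLSCAAS",
    "QVQLVQSGAEVKKPGASVKVSCKAS",
    "EVQLLESGGGLVQPGGSLRLSCAAS",
]
model = train_bigram(native)
print(round(nativeness(model, "EVQLVESGGGLVQPGGSLRLSCAAS"), 2))
print(round(nativeness(model, "W" * 25), 2))
```

Averaging per transition rather than summing keeps scores comparable across sequences of different lengths; a bidirectional LSTM replaces the one-residue context with context from the whole sequence in both directions.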
Affiliation(s)
- Chonghua Xue
- Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Qiuyuan Qin
- Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- June Hua
- Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Vijaya B Kolachalama
- Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA; Hariri Institute of Computing and Computational Science & Engineering, Boston University, Boston, MA 02115, USA; Whitaker Cardiovascular Institute, Boston University School of Medicine, Boston, MA 02118, USA; Boston University Alzheimer's Disease Center, Boston, MA 02118, USA
50
Meysman P, De Neuter N, Gielis S, Bui Thi D, Ogunjimi B, Laukens K. On the viability of unsupervised T-cell receptor sequence clustering for epitope preference. Bioinformatics 2020; 35:1461-1468. [PMID: 30247624 DOI: 10.1093/bioinformatics/bty821] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 06/12/2018] [Revised: 08/29/2018] [Accepted: 09/20/2018] [Indexed: 12/18/2022]
Abstract
MOTIVATION: The T-cell receptor (TCR) is responsible for recognizing epitopes presented on cell surfaces. Linking TCR sequences to their ability to target specific epitopes is currently an unsolved problem, yet one of great interest. Indeed, it is currently unknown how dissimilar TCR sequences can be before they no longer bind the same epitope. This question is confounded by the fact that there are many ways to define the similarity between two TCR sequences. Here we investigate both issues in the context of unsupervised TCR sequence clustering. RESULTS: We provide an overview of the performance of various distance metrics on two large independent datasets with 412 and 2835 TCR sequences, respectively. Our results confirm the presence of structurally distinct TCR groups that target identical epitopes. In addition, we put forward several recommendations for performing unsupervised T-cell receptor sequence clustering. AVAILABILITY AND IMPLEMENTATION: Source code, implemented in Python 3, is available at https://github.com/pmeysman/TCRclusteringPaper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
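A minimal sketch of the kind of unsupervised TCR clustering this abstract discusses, assuming CDR3 amino acid sequences as input and using Levenshtein (edit) distance as one example metric. The paper compares several distance metrics, and neither the 1-edit threshold nor the single-linkage grouping (connected components via union-find) used here is taken from the authors' code at the repository above. The CDR3 strings are hypothetical examples.

```python
from itertools import combinations
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance where substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cluster_cdr3s(seqs: Sequence[str], max_dist: int = 1) -> List[List[str]]:
    """Single-linkage clusters: sequences within max_dist edits are joined, transitively."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if levenshtein(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)

    groups: dict = {}
    for i, seq in enumerate(seqs):
        groups.setdefault(find(i), []).append(seq)
    # Largest clusters first; ties broken alphabetically for reproducible output.
    return sorted(groups.values(), key=lambda c: (-len(c), c))


cdr3s = ["CASSIRSSYEQYF", "CASSIRSAYEQYF", "CASSIRSSYEQY",
         "CASSLAPGATNEKLFF", "CSARDRTGNGYTF"]
for cluster in cluster_cdr3s(cdr3s):
    print(cluster)
```

Single linkage makes clusters transitive: the second and third sequences are 2 edits apart but end up together because each is 1 edit from the first. This is one reason the choice of metric and threshold matters when asking how dissimilar two TCRs can be while still sharing an epitope.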
Affiliation(s)
- Pieter Meysman
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS); Department of Computer Science and Mathematics, ADREM Data Lab; Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
- Nicolas De Neuter
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS); Department of Computer Science and Mathematics, ADREM Data Lab; Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
- Sofie Gielis
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS); Department of Computer Science and Mathematics, ADREM Data Lab; Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
- Danh Bui Thi
- Department of Computer Science and Mathematics, ADREM Data Lab; Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
- Benson Ogunjimi
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS); Antwerp Center for Translational Immunology and Virology (ACTIV), Vaccine & Infectious Disease Institute (VAXINFECTIO); Centre for Health Economics Research & Modeling Infectious Diseases (CHERMID), Vaccine & Infectious Disease Institute (VAXINFECTIO), University of Antwerp, Wilrijk, Belgium; Department of Pediatrics, Antwerp University Hospital, Edegem, Belgium
- Kris Laukens
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS); Department of Computer Science and Mathematics, ADREM Data Lab; Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium