1
Collins AM, Ohlin M, Corcoran M, Heather JM, Ralph D, Law M, Martínez-Barnetche J, Ye J, Richardson E, Gibson WS, Rodriguez OL, Peres A, Yaari G, Watson CT, Lees WD. AIRR-C IG Reference Sets: curated sets of immunoglobulin heavy and light chain germline genes. Front Immunol 2024; 14:1330153. [PMID: 38406579 PMCID: PMC10884231 DOI: 10.3389/fimmu.2023.1330153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 12/27/2023] [Indexed: 02/27/2024] Open
Abstract
Introduction: Analysis of an individual's immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate, and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire are therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3' or 5' truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include fewer than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite their smaller sizes, erroneous calls were eliminated, and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals was analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website (https://ogrdb.airr-community.org/germline_sets/Human) and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.
Affiliation(s)
- Andrew M. Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Mats Ohlin
  - Department of Immunotechnology, and SciLifeLab, Lund University, Lund, Sweden
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- James M. Heather
  - Mass General Cancer Center, Massachusetts General Hospital, Charlestown, MA, United States
  - Department of Medicine, Harvard Medical School, Boston, MA, United States
- Duncan Ralph
  - Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Mansun Law
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, United States
- Jesus Martínez-Barnetche
  - Centro de Investigación Sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, Mexico
- Jian Ye
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Eve Richardson
  - La Jolla Institute for Immunology, San Diego, CA, United States
- William S. Gibson
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, United States
- Oscar L. Rodriguez
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, United States
- Ayelet Peres
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Gur Yaari
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, United States
- William D. Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, London, United Kingdom
  - Human-Centered Computing and Information Science, Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
2
Rodriguez OL, Safonova Y, Silver CA, Shields K, Gibson WS, Kos JT, Tieri D, Ke H, Jackson KJL, Boyd SD, Smith ML, Marasco WA, Watson CT. Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire. Nat Commun 2023; 14:4419. [PMID: 37479682 PMCID: PMC10362067 DOI: 10.1038/s41467-023-40070-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 08/29/2022] [Accepted: 07/11/2023] [Indexed: 07/23/2023] Open
Abstract
Variation in the antibody response has been linked to differential outcomes in disease, and suboptimal vaccine and therapeutic responsiveness, the determinants of which have not been fully elucidated. Countering models that presume antibodies are generated largely by stochastic processes, we demonstrate that polymorphisms within the immunoglobulin heavy chain locus (IGH) impact the naive and antigen-experienced antibody repertoire, indicating that genetics predisposes individuals to mount qualitatively and quantitatively different antibody responses. We pair recently developed long-read genomic sequencing methods with antibody repertoire profiling to comprehensively resolve IGH genetic variation, including novel structural variants, single nucleotide variants, and genes and alleles. We show that IGH germline variants determine the presence and frequency of antibody genes in the expressed repertoire, including those enriched in functional elements linked to V(D)J recombination, and overlapping disease-associated variants. These results illuminate the power of leveraging IGH genetics to better understand the regulation, function, and dynamics of the antibody response in disease.
Affiliation(s)
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Yana Safonova
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Catherine A Silver
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Kaitlyn Shields
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- William S Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Justin T Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- David Tieri
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Hanzhong Ke
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Scott D Boyd
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Melissa L Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Wayne A Marasco
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
3
Lees WD, Christley S, Peres A, Kos JT, Corrie B, Ralph D, Breden F, Cowell LG, Yaari G, Corcoran M, Karlsson Hedestam GB, Ohlin M, Collins AM, Watson CT, Busse CE. AIRR community curation and standardised representation for immunoglobulin and T cell receptor germline sets. Immunoinformatics (Amsterdam, Netherlands) 2023; 10:100025. [PMID: 37388275 PMCID: PMC10310305 DOI: 10.1016/j.immuno.2023.100025] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/01/2023]
Abstract
Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.
Affiliation(s)
- William D. Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, London, England
  - Human-Centered Computing and Information Science, Institute for Systems and Computer Engineering Technology and Science, Porto, Portugal
- Scott Christley
  - Peter O’Donnell Jr. School of Public Health, UT Southwestern Medical Center, Dallas, TX, USA
- Ayelet Peres
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Justin T. Kos
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, KY, USA
- Brian Corrie
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Duncan Ralph
  - Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Felix Breden
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Lindsay G. Cowell
  - Peter O’Donnell Jr. School of Public Health, Department of Immunology, School of Biomedical Sciences, UT Southwestern Medical Center, Dallas, TX, USA
- Gur Yaari
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Mats Ohlin
  - Department of Immunotechnology and SciLifeLab, Lund University, Lund, Sweden
- Andrew M. Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, KY, USA
- Christian E. Busse
  - Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
4
Ford EE, Tieri D, Rodriguez OL, Francoeur NJ, Soto J, Kos JT, Peres A, Gibson WS, Silver CA, Deikus G, Hudson E, Woolley CR, Beckmann N, Charney A, Mitchell TC, Yaari G, Sebra RP, Watson CT, Smith ML. FLAIRR-Seq: A Method for Single-Molecule Resolution of Near Full-Length Antibody H Chain Repertoires. Journal of Immunology (Baltimore, Md. : 1950) 2023; 210:1607-1619. [PMID: 37027017 PMCID: PMC10152037 DOI: 10.4049/jimmunol.2200825] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/09/2022] [Accepted: 03/14/2023] [Indexed: 04/08/2023]
Abstract
Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using short-read sequencing strategies resolves expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5' RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5' RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.
Affiliation(s)
- Easton E. Ford
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- David Tieri
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Oscar L. Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Nancy J. Francoeur
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Juan Soto
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Justin T. Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- William S. Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Catherine A. Silver
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Gintaras Deikus
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Elizabeth Hudson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Cassandra R. Woolley
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
- Noam Beckmann
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Alexander Charney
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Thomas C. Mitchell
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- Robert P. Sebra
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Melissa L. Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
5
Hardt U, Corcoran MM, Narang S, Malmström V, Padyukov L, Karlsson Hedestam GB. Analysis of IGH allele content in a sample group of rheumatoid arthritis patients demonstrates unrevealed population heterogeneity. Front Immunol 2023; 14:1073414. [PMID: 36798124 PMCID: PMC9927645 DOI: 10.3389/fimmu.2023.1073414] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/18/2022] [Accepted: 01/09/2023] [Indexed: 02/04/2023] Open
Abstract
Immunoglobulin heavy chain (IGH) germline gene variations influence the B cell receptor repertoire, with resulting biological consequences such as shaping our response to infections and altering disease susceptibilities. However, the lack of information on polymorphism frequencies in the IGH loci at the population level makes association studies challenging. Here, we genotyped a pilot group of 30 individuals with rheumatoid arthritis (RA) to examine IGH allele content and frequencies in this group. Eight novel IGHV alleles and one novel IGHJ allele were identified in the study. 15 cases were haplotypable using heterozygous IGHJ6 or IGHD anchors. One variant, IGHV4-34*01_S0742, was found in three out of 30 cases and included a single nucleotide change resulting in a non-canonical recombination signal sequence (RSS) heptamer. This variant allele, shown by haplotype analysis to be non-expressed, was also found in three out of 30 healthy controls and matched a single nucleotide polymorphism (SNP) described in the 1000 Genomes Project (1KGP) collection with frequencies that varied between population groups. Our finding of previously unreported alleles in a relatively small group of individuals with RA illustrates the need for baseline information about IG allelic frequencies in targeted study groups in preparation for future analysis of these genes in disease association studies.
Affiliation(s)
- Uta Hardt
  - Division of Rheumatology, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden and Karolinska University Hospital, Stockholm, Sweden
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Martin M. Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Sanjana Narang
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Vivianne Malmström
  - Division of Rheumatology, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden and Karolinska University Hospital, Stockholm, Sweden
- Leonid Padyukov
  - Division of Rheumatology, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden and Karolinska University Hospital, Stockholm, Sweden
6
Pushparaj P, Nicoletto A, Sheward DJ, Das H, Castro Dopico X, Perez Vidakovics L, Hanke L, Chernyshev M, Narang S, Kim S, Fischbach J, Ekström S, McInerney G, Hällberg BM, Murrell B, Corcoran M, Karlsson Hedestam GB. Immunoglobulin germline gene polymorphisms influence the function of SARS-CoV-2 neutralizing antibodies. Immunity 2023; 56:193-206.e7. [PMID: 36574772 PMCID: PMC9742198 DOI: 10.1016/j.immuni.2022.12.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 05/28/2022] [Revised: 09/23/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]
Abstract
The human immunoglobulin heavy-chain (IGH) locus is exceptionally polymorphic, with high levels of allelic and structural variation. Thus, germline IGH genotypes are personal, which may influence responses to infection and vaccination. For an improved understanding of inter-individual differences in antibody responses, we isolated SARS-CoV-2 spike-specific monoclonal antibodies from convalescent health care workers, focusing on the IGHV1-69 gene, which has the highest level of allelic variation of all IGHV genes. The IGHV1-69*20-using CAB-I47 antibody and two similar antibodies isolated from an independent donor were critically dependent on allele usage. Neutralization was retained when reverting the V region to the germline IGHV1-69*20 allele but lost when reverting to other IGHV1-69 alleles. Structural data confirmed that two germline-encoded polymorphisms, R50 and F55, in the IGHV1-69 gene were required for high-affinity receptor-binding domain interaction. These results demonstrate that polymorphisms in IGH genes can influence the function of SARS-CoV-2 neutralizing antibodies.
Affiliation(s)
- Pradeepa Pushparaj
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Andrea Nicoletto
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Daniel J Sheward
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Hrishikesh Das
  - Department of Cell and Molecular Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Xaquin Castro Dopico
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Laura Perez Vidakovics
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Leo Hanke
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Mark Chernyshev
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Sanjana Narang
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Sungyong Kim
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Julian Fischbach
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Simon Ekström
  - Department of Biomedical Engineering, Lund University, 221 84 Lund, Sweden
- Gerald McInerney
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- B Martin Hällberg
  - Department of Cell and Molecular Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Ben Murrell
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
7
Collins AM, Watson CT, Breden F. Immunoglobulin genes, reproductive isolation and vertebrate speciation. Immunol Cell Biol 2022; 100:497-506. [PMID: 35781330 PMCID: PMC9545137 DOI: 10.1111/imcb.12567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2022] [Revised: 06/19/2022] [Accepted: 06/21/2022] [Indexed: 12/15/2022]
Abstract
Reproductive isolation drives the formation of new species, and many genes contribute to this through Dobzhansky–Muller incompatibilities (DMIs). These incompatibilities occur when gene divergence affects loci encoding interacting products such as receptors and their ligands. We suggest here that the nature of vertebrate immunoglobulin (IG) genes must make them prone to DMIs. The genes of these complex loci form functional genes through the process of recombination, giving rise to a repertoire of heterodimeric receptors of incredible diversity. This repertoire, within individuals and within species, must defend against pathogens but must also avoid pathogenic self‐reactivity. We suggest that this avoidance of autoimmunity is only achieved through a coordination of evolution between heavy‐ and light‐chain genes, and between these genes and the rest of the genome. Without coordinated evolution, the hybrid offspring of two diverging populations will carry a heavy burden of DMIs, resulting in a loss of fitness. Critical incompatibilities could manifest as incompatibilities between a mother and her divergent offspring. During fetal development, biochemical differences between the parents of hybrid offspring could make their offspring a target of the maternal immune system. This hypothesis was conceived in the light of recent insights into the population genetics of IG genes. This has suggested that antibody genes are probably as susceptible to evolutionary forces as other parts of the genome. Further repertoire studies in human and nonhuman species should now help determine whether antibody genes have been part of the evolutionary forces that drive the development of species.
Affiliation(s)
- Andrew M Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Felix Breden
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
8
Jackson KJL, Kos JT, Lees W, Gibson WS, Smith ML, Peres A, Yaari G, Corcoran M, Busse CE, Ohlin M, Watson CT, Collins AM. A BALB/c IGHV Reference Set, Defined by Haplotype Analysis of Long-Read VDJ-C Sequences From F1 (BALB/c x C57BL/6) Mice. Front Immunol 2022; 13:888555. [PMID: 35720344 PMCID: PMC9205180 DOI: 10.3389/fimmu.2022.888555] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/03/2022] [Accepted: 04/28/2022] [Indexed: 11/13/2022] Open
Abstract
The immunoglobulin genes of inbred mouse strains that are commonly used in models of antibody-mediated human diseases are poorly characterized. This compromises data analysis. To infer the immunoglobulin genes of BALB/c mice, we used long-read SMRT sequencing to amplify VDJ-C sequences from F1 (BALB/c x C57BL/6) hybrid animals. Strain variations were identified in the Ighm and Ighg2b genes, and analysis of VDJ rearrangements led to the inference of 278 germline IGHV alleles. 169 alleles are not present in the C57BL/6 genome reference sequence. To establish a set of expressed BALB/c IGHV germline gene sequences, we computationally retrieved IGHV haplotypes from the IgM dataset. Haplotyping led to the confirmation of 162 BALB/c IGHV gene sequences. A musIGHV398 pseudogene variant also appears to be present in the BALB/cByJ substrain, while a functional musIGHV398 gene is highly expressed in the BALB/cJ substrain. Only four of the BALB/c alleles were also observed in the C57BL/6 haplotype. The full set of inferred BALB/c sequences has been used to establish a BALB/c IGHV reference set, hosted at https://ogrdb.airr-community.org. We assessed whether assemblies from the Mouse Genome Project (MGP) are suitable for the determination of the genes of the IGH loci. Only 37 (43.5%) of the 85 confirmed IMGT-named BALB/c IGHV and 33 (42.9%) of the 77 confirmed non-IMGT IGHV were found in a search of the MGP BALB/cJ genome assembly. This suggests that current MGP assemblies are unsuitable for the comprehensive documentation of germline IGHVs and more efforts will be needed to establish strain-specific reference sets.
Affiliation(s)
- Justin T. Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- William Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- William S. Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Melissa Laird Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Christian E. Busse
  - Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
- Mats Ohlin
  - Department of Immunotechnology, Lund University, Lund, Sweden
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Andrew M. Collins
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, Australia
9
Kaduk M, Corcoran M, Karlsson Hedestam GB. Addressing IGHV Gene Structural Diversity Enhances Immunoglobulin Repertoire Analysis: Lessons From Rhesus Macaque. Front Immunol 2022; 13:818440. [PMID: 35419009 PMCID: PMC8995469 DOI: 10.3389/fimmu.2022.818440] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/19/2021] [Accepted: 03/01/2022] [Indexed: 11/13/2022] Open
Abstract
The accurate germline gene assignment and assessment of somatic hypermutation in antibodies induced by immunization or infection are important in immunological studies. Here, we illustrate issues specific to the construction of comprehensive immunoglobulin (IG) germline gene reference databases for outbred animal species using rhesus macaques, a frequently used non-human primate model, as a model test case. We demonstrate that the genotypic variation found in macaque germline inference studies is reflected in similar levels of gene diversity in genomic assemblies. We show that the high frequency of IG heavy chain V (IGHV) region structural and gene copy number variation between subjects means that individual animals lack genes that are present in other animals. Therefore, gene databases compiled from a single or too few animals will inevitably result in inaccurate gene assignment and erroneous SHM level assessment for those genes it lacks. We demonstrate this by assigning a test macaque IgG library to the KIMDB, a database compiled of germline IGHV sequences from 27 rhesus macaques, and, alternatively, to the IMGT rhesus macaque database, based on IGHV genes inferred primarily from the genomic sequence of the rheMac10 reference assembly, supplemented with 10 genes from the Mmul_051212 assembly. We found that the use of a gene-restricted database led to overestimations of SHM by up to 5% due to misassignments. The principles described in the current study provide a model for the creation of comprehensive immunoglobulin reference databases from outbred species to ensure accurate gene assignment, lineage tracing and SHM calculations.
Affiliation(s)
- Mateusz Kaduk
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden

10
Omer A, Peres A, Rodriguez OL, Watson CT, Lees W, Polak P, Collins AM, Yaari G. T cell receptor beta germline variability is revealed by inference from repertoire data. Genome Med 2022; 14:2. [PMID: 34991709 PMCID: PMC8740489 DOI: 10.1186/s13073-021-01008-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/17/2021] [Accepted: 12/08/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding for the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants. METHODS To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence a small fraction of that region. Here, we adapted a B cell pipeline for undocumented alleles, genotype, and haplotype inference for full and partial AIRR-seq TCR data sets. The pipeline also deals with gene assignment ambiguities, which is especially important in the analysis of data sets of partial sequences. RESULTS From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor Beta V (TRBV) and 31 undocumented 5' UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor Beta D2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire. CONCLUSIONS We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.
Affiliation(s)
- Aviv Omer
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Oscar L Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- William Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- Pazit Polak
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Andrew M Collins
- School of Biotechnology and Biomedical Sciences, University of New South Wales, Sydney, Australia
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel

11
Huang Y, Thörnqvist L, Ohlin M. Computational Inference, Validation, and Analysis of 5'UTR-Leader Sequences of Alleles of Immunoglobulin Heavy Chain Variable Genes. Front Immunol 2021; 12:730105. [PMID: 34671351 PMCID: PMC8521166 DOI: 10.3389/fimmu.2021.730105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 09/06/2021] [Indexed: 12/05/2022]
Abstract
Upstream and downstream sequences of immunoglobulin genes may affect the expression of such genes. However, these sequences are rarely studied or characterized in most studies of immunoglobulin repertoires. Inference from large, rearranged immunoglobulin transcriptome data sets offers an opportunity to define the upstream regions (5'-untranslated regions and leader sequences). We have now established a new data pre-processing procedure to eliminate artifacts caused by a 5'-RACE library generation process, reanalyzed a previously studied data set defining human immunoglobulin heavy chain genes, and identified novel upstream regions, as well as previously identified upstream regions that may have been identified in error. Upstream sequences were also identified for a set of previously uncharacterized germline gene alleles. Several novel upstream region variants were validated, for instance by their segregation to a single haplotype in heterozygotic subjects. SNPs representing several sequence variants were identified from population data. Finally, based on the outcomes of the analysis, we define a set of testable hypotheses with respect to the placement of particular alleles in complex IGHV locus haplotypes, and discuss the evolutionary relatedness of particular heavy chain variable genes based on sequences of their upstream regions.
Affiliation(s)
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden

12
Ohlin M. Poorly Expressed Alleles of Several Human Immunoglobulin Heavy Chain Variable Genes are Common in the Human Population. Front Immunol 2021; 11:603980. [PMID: 33717051 PMCID: PMC7943739 DOI: 10.3389/fimmu.2020.603980] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/08/2020] [Accepted: 12/08/2020] [Indexed: 12/23/2022]
Abstract
Extensive diversity has been identified in the human heavy chain immunoglobulin locus, including allelic variation, gene duplication, and insertion/deletion events. Several genes have been suggested to be deleted in many haplotypes. Such findings have commonly been based on inference of the germline repertoire from data sets covering antibody heavy chain encoding transcripts. The inference process operates under conditions that may limit identification of genes transcribed at low levels. The presence of rare transcripts that would indicate the existence of poorly expressed alleles in haplotypes that otherwise appear to have deleted these genes has been assessed in the present study. Alleles IGHV1-2*05, IGHV1-3*02, IGHV4-4*01, and IGHV7-4-1*01 were all identified as being expressed from multiple haplotypes, but only at low levels, haplotypes that by inference often appeared not to express these genes at all. These genes are thus not as commonly deleted as previously thought. An assessment of the 5' untranslated region (up to and including the TATA-box), the signal peptide-encoding part of the gene, and the 3'-heptamer suggests that the alleles have no or minimal sequence difference in these regions in comparison to highly expressed alleles. This suggests that they may be able to participate in immunoglobulin gene rearrangement, transcription and translation. However, all four poorly expressed alleles harbor unusual sequence variants within their coding region that may compromise the functionality of the encoded products, thereby limiting their incorporation into the immunoglobulin repertoire. Transcripts based on IGHV7-4-1*01 that had undergone somatic hypermutation and class switch had mutated the codon that encoded the unusual residue in framework region 3 (cysteine 92; located far from the antigen binding site). This finding further supports the poor compatibility of this unusual residue in a fully functional protein product. Indications of a linkage disequilibrium were identified as IGHV1-2*05 and IGHV4-4*01 co-localized to the same haplotypes. Furthermore, transcripts of two of the poorly expressed alleles (IGHV1-3*02 and IGHV4-4*01) mostly do not encode in-frame, functional products, suggesting that these alleles might be essentially non-functional. It is proposed that the functionality status of immunoglobulin genes should also include assessment of their ability to encode functional protein products.
Affiliation(s)
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden

13
Collins AM, Peres A, Corcoran MM, Watson CT, Yaari G, Lees WD, Ohlin M. Commentary on Population matched (pm) germline allelic variants of immunoglobulin (IG) loci: relevance in infectious diseases and vaccination studies in human populations. Genes Immun 2021; 22:335-338. [PMID: 34667305 PMCID: PMC8674141 DOI: 10.1038/s41435-021-00152-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/26/2021] [Revised: 09/29/2021] [Accepted: 10/05/2021] [Indexed: 12/11/2022]
Affiliation(s)
- Andrew M. Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Ayelet Peres
- Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Bar Ilan Institute of Nanotechnologies and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- Martin M. Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Gur Yaari
- Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Bar Ilan Institute of Nanotechnologies and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- William D. Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden

14
Mikocziova I, Gidoni M, Lindeman I, Peres A, Snir O, Yaari G, Sollid LM. Polymorphisms in human immunoglobulin heavy chain variable genes and their upstream regions. Nucleic Acids Res 2020; 48:5499-5510. [PMID: 32365177 PMCID: PMC7261178 DOI: 10.1093/nar/gkaa310] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/09/2020] [Accepted: 04/20/2020] [Indexed: 01/13/2023]
Abstract
Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, the knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 potential novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, out of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-REGION in the 5′UTR, L-PART1 and L-PART2 sequences, and found that identical V-REGION alleles can differ in upstream sequences. Thus, we have identified a large genetic variation not only in the V-REGION but also in the upstream sequences of IGHV genes. Our findings provide a new perspective for annotating immunoglobulin repertoire sequencing data.
Affiliation(s)
- Ivana Mikocziova
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Moriah Gidoni
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Ida Lindeman
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Omri Snir
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Ludvig M Sollid
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway

15
Peres A, Gidoni M, Polak P, Yaari G. RAbHIT: R Antibody Haplotype Inference Tool. Bioinformatics 2020; 35:4840-4842. [PMID: 31173062 DOI: 10.1093/bioinformatics/btz481] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/04/2019] [Revised: 05/11/2019] [Accepted: 06/04/2019] [Indexed: 12/11/2022]
Abstract
SUMMARY Antibody haplotype inference (chromosomal phasing) may have clinical implications for the identification of genetic predispositions to diseases. Yet, our knowledge of the genomic loci encoding for the variable regions of the antibody is only partial, mostly due to the challenge of aligning short reads from genome sequencing to these highly repetitive loci. A powerful approach to infer the content of these loci relies on analyzing repertoires of rearranged V(D)J sequences. We present here RAbHIT, an R Haplotype Antibody Inference Tool, that implements a novel algorithm to infer V(D)J haplotypes by adapting a Bayesian framework. RAbHIT offers inference of haplotype and gene deletions. It may be applied to sequences from naïve and non-naïve B-cells, sequenced by different library preparation protocols. AVAILABILITY AND IMPLEMENTATION RAbHIT is freely available for academic use from comprehensive R archive network (CRAN) (https://cran.r-project.org/web/packages/rabhit/) under CC BY-SA 4.0 license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Moriah Gidoni
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Pazit Polak
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel

16
Inter- and intraspecies comparison of phylogenetic fingerprints and sequence diversity of immunoglobulin variable genes. Immunogenetics 2020; 72:279-294. [PMID: 32367185 DOI: 10.1007/s00251-020-01164-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2020] [Accepted: 04/13/2020] [Indexed: 10/24/2022]
Abstract
Protection and neutralization of a vast array of pathogens is accomplished by the tremendous diversity of the B cell receptor (BCR) repertoire. For jawed vertebrates, this diversity is initiated via the somatic recombination of immunoglobulin (Ig) germline elements. While it is clear that the number of these germline segments differs from species to species, the extent of cross-species sequence diversity remains largely uncharacterized. Here we use extensive computational and statistical methods to investigate the sequence diversity and evolutionary relationship between Ig variable (V), diversity (D), and joining (J) germline segments across nine commonly studied species ranging from zebrafish to human. Metrics such as guanine-cytosine (GC) content showed low redundancy across Ig germline genes within a given species. Other comparisons, including amino acid motifs, evolutionary selection, and sequence diversity, revealed species-specific properties. Additionally, we showed that the germline-encoded diversity differs across antibody (recombined V-D-J) repertoires of various B cell subsets. To facilitate future comparative immunogenomics analysis, we created VDJgermlines, an R package that contains the germline sequences from multiple species. Our study informs strategies for the humanization and engineering of therapeutic antibodies.

17
Bhardwaj V, Franceschetti M, Rao R, Pevzner PA, Safonova Y. Automated analysis of immunosequencing datasets reveals novel immunoglobulin D genes across diverse species. PLoS Comput Biol 2020; 16:e1007837. [PMID: 32339161 PMCID: PMC7295240 DOI: 10.1371/journal.pcbi.1007837] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/20/2019] [Revised: 06/15/2020] [Accepted: 04/01/2020] [Indexed: 12/30/2022]
Abstract
Immunoglobulin genes are formed through V(D)J recombination, which joins the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics focuses on finding alleles of germline genes across various patients. Although reconstruction of V and J genes is a well-studied problem, the more challenging task of reconstructing D genes remained open until the IgScout algorithm was developed in 2019. In this work, we address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction, apply it to hundreds of immunosequencing datasets from multiple species, and validate the newly inferred D genes by analyzing diverse whole genome sequencing datasets and haplotyping heterozygous V genes. Antibodies provide specific binding to an enormous range of antigens and represent a key component of the adaptive immune system. Immunosequencing has emerged as a method of choice for generating millions of reads that sample antibody repertoires and provides insights into monitoring immune response to disease and vaccination. Most of the previous immunogenomics studies rely on the reference germline genes in the immunoglobulin locus rather than the germline genes in a specific patient. This approach is deficient since the set of known germline genes is incomplete (particularly for non-European humans and non-human species) and contains alleles that resulted from sequencing and annotation errors. The problem of de novo inference of diversity (D) genes from immunosequencing data remained open until the IgScout algorithm was developed in 2019. We address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction and infer multiple D genes across multiple species that are not present in standard databases.
Affiliation(s)
- Vinnu Bhardwaj
- Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Massimo Franceschetti
- Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Ramesh Rao
- Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Qualcomm Institute, University of California San Diego, San Diego, California, United States of America
- Pavel A. Pevzner
- Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
- Yana Safonova
- Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
- Center for Information Theory and Applications, University of California San Diego, San Diego, California, United States of America

18
Ford M, Haghshenas E, Watson CT, Sahinalp SC. Genotyping and Copy Number Analysis of Immunoglobin Heavy Chain Variable Genes Using Long Reads. iScience 2020; 23:100883. [PMID: 32109676 PMCID: PMC7044747 DOI: 10.1016/j.isci.2020.100883] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/15/2019] [Revised: 11/08/2019] [Accepted: 01/29/2020] [Indexed: 11/22/2022]
Abstract
One of the remaining challenges to describing an individual's genetic variation lies in the highly heterogeneous and complex genomic regions that impede the use of classical reference-guided mapping and assembly approaches. One such region is the Immunoglobulin heavy chain locus (IGH), which is critical for the development of antibodies and the adaptive immune system. We describe ImmunoTyper, the first PacBio-based genotyping and copy number calling tool specifically designed for IGH V genes (IGHV). We demonstrate that ImmunoTyper's multi-stage clustering and combinatorial optimization approach represents the most comprehensive IGHV genotyping approach published to date, through validation using gold-standard IGH reference sequence. This preliminary work establishes the feasibility of fine-grained genotype and copy number analysis using error-prone long reads in complex multi-gene loci and opens the door for in-depth investigation into IGHV heterogeneity using accessible and increasingly common whole-genome sequence.
Affiliation(s)
- Michael Ford
- School of Computing Science, Simon Fraser University, Burnaby V5A 1S6, Canada
- Ehsan Haghshenas
- School of Computing Science, Simon Fraser University, Burnaby V5A 1S6, Canada
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville 40292, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, Bethesda, MD 20892, USA

19
Omer A, Shemesh O, Peres A, Polak P, Shepherd AJ, Watson C, Boyd SD, Collins AM, Lees W, Yaari G. VDJbase: an adaptive immune receptor genotype and haplotype database. Nucleic Acids Res 2020; 48:D1051-D1056. [PMID: 31602484 PMCID: PMC6943044 DOI: 10.1093/nar/gkz872] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/11/2019] [Revised: 09/19/2019] [Accepted: 10/01/2019] [Indexed: 12/14/2022]
Abstract
VDJbase is a publicly available database that offers easy searching of data describing the complete sets of gene sequences (genotypes and haplotypes) inferred from adaptive immune receptor repertoire sequencing datasets. VDJbase is designed to act as a resource that will allow the scientific community to explore the genetic variability of the immunoglobulin (Ig) and T cell receptor (TR) gene loci. It can also assist in the investigation of Ig- and TR-related genetic predispositions to diseases. Our database includes web-based query and online tools to assist in visualization and analysis of the genotype and haplotype data. It enables users to detect those alleles and genes that are significantly over-represented in a particular population, in terms of genotype, haplotype and gene expression. The database website can be freely accessed at https://www.vdjbase.org/, and no login is required. The data and code use creative common licenses and are freely downloadable from https://bitbucket.org/account/user/yaarilab/projects/GPHP.
Affiliation(s)
- Aviv Omer
- Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Or Shemesh
- Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Ayelet Peres
- Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Pazit Polak
- Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Adrian J Shepherd
- Institute of Structural and Molecular Biology, Birkbeck, University of London, London, UK
- Corey T Watson
- University of Louisville School of Medicine, Biochemistry and Molecular Genetics, Louisville, KY 40292, USA
- Scott D Boyd
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Andrew M Collins
- School of Biotechnology and Biomolecular Sciences, University of NSW, Kensington, Sydney, NSW 2052, Australia
- William Lees
- Institute of Structural and Molecular Biology, Birkbeck, University of London, London, UK
- Gur Yaari
- Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel

20
Ohlin M, Scheepers C, Corcoran M, Lees WD, Busse CE, Bagnara D, Thörnqvist L, Bürckert JP, Jackson KJL, Ralph D, Schramm CA, Marthandan N, Breden F, Scott J, Matsen IV FA, Greiff V, Yaari G, Kleinstein SH, Christley S, Sherkow JS, Kossida S, Lefranc MP, van Zelm MC, Watson CT, Collins AM. Inferred Allelic Variants of Immunoglobulin Receptor Genes: A System for Their Evaluation, Documentation, and Naming. Front Immunol 2019; 10:435. [PMID: 30936866 PMCID: PMC6431624 DOI: 10.3389/fimmu.2019.00435] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 12/10/2018] [Accepted: 02/19/2019] [Indexed: 11/13/2022]
Abstract
Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. Formalized evaluations of novel inferred sequences have now begun and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC (focusing, to begin with, on human IGHV genes) with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that in time, it will expand to the documentation of TR and IG genes in many vertebrate species.
Affiliation(s)
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden
- Cathrine Scheepers
- Center for HIV and STIs, National Institute for Communicable Diseases, Johannesburg, South Africa
- Faculty of Health Sciences, School of Pathology, University of the Witwatersrand, Johannesburg, South Africa
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- William D. Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- Christian E. Busse
- Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
- Davide Bagnara
- Department of Experimental Medicine, University of Genoa, Genoa, Italy
- Duncan Ralph
- Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Chaim A. Schramm
- Vaccine Research Center, National Institutes of Health, Washington, DC, United States
- Nishanth Marthandan
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Felix Breden
- Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Jamie Scott
- Department of Molecular Biology and Biochemistry, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
- Victor Greiff
- Department of Immunology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Scott Christley
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Jacob S. Sherkow
- Innovation Center for Law and Technology, New York Law School, New York, NY, United States
- Sofia Kossida
- IMGT, The International ImMunoGenetics information system (IMGT), Laboratoire d'ImmunoGénétique Moléculaire (LIGM), CNRS, Institut de Génétique Humaine, Université de Montpellier, Montpellier, France
- Marie-Paule Lefranc
- IMGT, The International ImMunoGenetics information system (IMGT), Laboratoire d'ImmunoGénétique Moléculaire (LIGM), CNRS, Institut de Génétique Humaine, Université de Montpellier, Montpellier, France
- Menno C. van Zelm
- Department of Immunology and Pathology, Central Clinical School, The Alfred Hospital, Monash University, Melbourne, VIC, Australia
- Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, United States
- Andrew M. Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia

21
Gidoni M, Snir O, Peres A, Polak P, Lindeman I, Mikocziova I, Sarna VK, Lundin KEA, Clouser C, Vigneault F, Collins AM, Sollid LM, Yaari G. Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping. Nat Commun 2019; 10:628. [PMID: 30733445 PMCID: PMC6367474 DOI: 10.1038/s41467-019-08489-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 05/17/2018] [Accepted: 01/10/2019] [Indexed: 12/11/2022]
Abstract
Analysis of antibody repertoires by high-throughput sequencing is of major importance in understanding adaptive immune responses. Our knowledge of variations in the genomic loci encoding immunoglobulin genes is incomplete, resulting in conflicting VDJ gene assignments and biased genotype and haplotype inference. Haplotypes can be inferred using IGHJ6 heterozygosity, observed in one third of the people. Here, we propose a robust novel method for determining VDJ haplotypes by adapting a Bayesian framework. Our method extends haplotype inference to IGHD- and IGHV-based analysis, enabling inference of deletions and copy number variations in the entire population. To test this method, we generated a multi-individual data set of naive B-cell repertoires, and found allele usage bias, as well as a mosaic, tiled pattern of deleted IGHD and IGHV genes. The inferred haplotypes may have clinical implications for genetic disease predispositions. Our findings expand the knowledge that can be extracted from antibody repertoire sequencing data.
Affiliation(s)
- Moriah Gidoni
- Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
- Omri Snir
- KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
- Pazit Polak
- Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
- Ida Lindeman
- KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Ivana Mikocziova
- KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Vikas Kumar Sarna
- KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Knut E A Lundin
- KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Andrew M Collins
- School of Biotechnology and Biomolecular Sciences, University of NSW, Kensington, Sydney, NSW, 2052, Australia
- Ludvig M Sollid
- KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
|
22
|
Thörnqvist L, Ohlin M. Critical steps for computational inference of the 3'-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7. Mol Immunol 2018; 103:1-6. [PMID: 30172112 DOI: 10.1016/j.molimm.2018.08.018] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/04/2018] [Revised: 08/10/2018] [Accepted: 08/18/2018] [Indexed: 01/16/2023]
Abstract
Sequencing of immunoglobulin germline gene loci is a challenging process, e.g. due to their repetitiveness and complexity, hence limiting the insight in the germline gene repertoire of humans and other species. Through next generation sequencing technology, it is possible to generate immunoglobulin transcript data sets large enough to computationally infer the germline genes from which the transcripts originate. Multiple tools for such inference have been developed and they can be used for construction of individual germline gene databases, and for discovery of new immunoglobulin germline genes and alleles. However, there are challenges associated with these methods, many of them related to the biological process through which immunoglobulin coding genes are generated. The junctional diversity introduced during rearrangement of the immunoglobulin heavy chain variable (IGHV), diversity and joining genes specifically complicates the inference of the junction regions, with implications for inference of the 3'-end of IGHV genes. With the aim of coping with such diversity, an inference software package may not be able to identify novel alleles harbouring a difference in these regions compared to their closest relatives in the starting database. In this study, we were able to computationally infer one such previously uncharacterized allele, IGHV3-7*02 A318G. However, this was possible only if a strategy was used in which different variants of IGHV3-7*02 were included in the inference-initiating database. Importantly, the presence of the novel allele, but not the standard IGHV3-7*02 sequence, in the genotype was strongly supported by the actual sequences that were assigned to the allele. We thus showed that the starting database used will impact the germline gene inference process, and that difference in the 3'-end of IGHV genes may remain undetected unless specific, non-standard procedures are used to address this matter. 
We suggest that inferred genes/alleles should be confirmed e.g. by examination of the nucleotide composition of the 3'-bases of the inference-supporting sequence reads.
Affiliation(s)
- Mats Ohlin
- Dept. of Immunotechnology, Lund University, Lund, Sweden
|
23
|
Abstract
Probabilistic modeling is fundamental to the statistical analysis of complex data. In addition to forming a coherent description of the data-generating process, probabilistic models enable parameter inference about given datasets. This procedure is well developed in the Bayesian perspective, in which one infers probability distributions describing to what extent various possible parameters agree with the data. In this paper, we motivate and review probabilistic modeling for adaptive immune receptor repertoire data then describe progress and prospects for future work, from germline haplotyping to adaptive immune system deployment across tissues. The relevant quantities in immune sequence analysis include not only continuous parameters such as gene use frequency but also discrete objects such as B-cell clusters and lineages. Throughout this review, we unravel the many opportunities for probabilistic modeling in adaptive immune receptor analysis, including settings for which the Bayesian approach holds substantial promise (especially if one is optimistic about new computational methods). From our perspective, the greatest prospects for progress in probabilistic modeling for repertoires concern ancestral sequence estimation for B-cell receptor lineages, including uncertainty from germline genotype, rearrangement, and lineage development.
Affiliation(s)
- Branden Olson
- Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Mail stop: M1-B514, Seattle, WA 98109-1024; phone: +1 206 667 7318
- Frederick A. Matsen
- Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Mail stop: M1-B514, Seattle, WA 98109-1024; phone: +1 206 667 7318
|
24
|
Thörnqvist L, Ohlin M. Data on the nucleotide composition of the first codons encoding the complementary determining region 3 (CDR3) in immunoglobulin heavy chains. Data Brief 2018; 19:337-352. [PMID: 29892656 PMCID: PMC5992955 DOI: 10.1016/j.dib.2018.04.125] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/22/2018] [Accepted: 04/30/2018] [Indexed: 01/13/2023] Open
Abstract
The highly variable complementary determining region 3 (CDR3) of antibodies is generated through recombination of immunoglobulin heavy chain variable (IGHV), diversity, and joining genes. The codons encoding the first residues of CDR3 may be derived directly from the IGHV germline gene but they may also be generated as part of the rearrangement process. Data of the nucleotide composition of these codons of rearranged genes, an indicator of the degree of contribution of the IGHV gene to CDR3 diversity, are presented in this article. Analyzed data are presented for two unrelated sets of raw sequence data. The raw data sets consisted of sequences of antibody heavy chain-encoding transcripts of six allergic subjects (European Nucleotide Archive accession number PRJEB18926), and paired antibody heavy and light chain variable region-encoding transcripts of memory B cells of three subjects (European Nucleotide Archive accession numbers SRX709625, SRX709626, and SRX709627). The nucleotide compositions of the corresponding 5′-ends of sequences encoding the CDR3 are presented for transcripts with an origin in 47 different IGHV alleles. These data have been used (Thörnqvist and Ohlin, 2018) [1] to demonstrate the extent of incorporation of the 3′ most bases of IGHV germline genes into rearranged immunoglobulin encoding sequences, and the extent whereby any difference in incorporation affects the specificity of inference of the 3′-end of IGHV genes from immunoglobulin-encoding transcripts. They have also been used to assess the effect of observed gene differences on the composition of the ascending strand of CDR3 associated to antibodies with an origin in different IGHV genes (Thörnqvist and Ohlin, 2018) [1].
Affiliation(s)
- Mats Ohlin
- Dept. of Immunotechnology, Lund University, Lund, Sweden
|
25
|
Thörnqvist L, Ohlin M. The functional 3'-end of immunoglobulin heavy chain variable (IGHV) genes. Mol Immunol 2018; 96:61-68. [PMID: 29499482 DOI: 10.1016/j.molimm.2018.02.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/01/2017] [Revised: 02/01/2018] [Accepted: 02/18/2018] [Indexed: 12/15/2022]
Abstract
Inference of antibody gene repertoires using transcriptome data has emerged as an alternative approach to the complex process of sequencing of adaptive immune receptor germline gene loci. The diversity introduced during rearrangement of immunoglobulin heavy chain variable (IGHV), diversity, and joining genes has however been identified as potentially affecting inference specificity. In this study, we have addressed this issue by analysing the nucleotide composition of unmutated human immunoglobulin heavy chains-encoding transcripts, focusing on the 3' most bases of 47 IGHV germline genes. Although transcripts derived from some of the germline genes predominately incorporated the germline encoded base even at position 320, the last base of most IGHV genes, transcripts originating in other genes presented other nucleotides to the same extent at this position. In transcripts derived from two of the germline genes, IGHV3-13*01 and IGHV4-30-2*01, the predominating nucleotide (G) was in fact not that of the gene (A). Hence, we suggest that inference of IGHV genes should be limited to bases preceding nucleotide 320, as inference beyond this would jeopardize the specificity of the inference process. The different degree of incorporation of the final base of the IGHV gene directly influences the distribution of amino acids of the ascending strand of the third complementarity determining region of the heavy chain. Thereby it influences the nature of this specificity-determining part of the antibody population. In addition, we also present data that indicate the existence of a common so far un-recognized allelic variant of IGHV3-7 that carries an A318G difference in relation to IGHV3-7*02.
Affiliation(s)
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden
|
26
|
Miho E, Yermanos A, Weber CR, Berger CT, Reddy ST, Greiff V. Computational Strategies for Dissecting the High-Dimensional Complexity of Adaptive Immune Repertoires. Front Immunol 2018; 9:224. [PMID: 29515569 PMCID: PMC5826328 DOI: 10.3389/fimmu.2018.00224] [Citation(s) in RCA: 118] [Impact Index Per Article: 19.7] [Received: 11/22/2017] [Accepted: 01/26/2018] [Indexed: 12/21/2022] Open
Abstract
The adaptive immune system recognizes antigens via an immense array of antigen-binding antibodies and T-cell receptors, the immune repertoire. The interrogation of immune repertoires is of high relevance for understanding the adaptive immune response in disease and infection (e.g., autoimmunity, cancer, HIV). Adaptive immune receptor repertoire sequencing (AIRR-seq) has driven the quantitative and molecular-level profiling of immune repertoires, thereby revealing the high-dimensional complexity of the immune receptor sequence landscape. Several methods for the computational and statistical analysis of large-scale AIRR-seq data have been developed to resolve immune repertoire complexity and to understand the dynamics of adaptive immunity. Here, we review the current research on (i) diversity, (ii) clustering and network, (iii) phylogenetic, and (iv) machine learning methods applied to dissect, quantify, and compare the architecture, evolution, and specificity of immune repertoires. We summarize outstanding questions in computational immunology and propose future directions for systems immunology toward coupling AIRR-seq with the computational discovery of immunotherapeutics, vaccines, and immunodiagnostics.
Affiliation(s)
- Enkelejda Miho
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- aiNET GmbH, ETH Zürich, Basel, Switzerland
- Alexander Yermanos
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Cédric R. Weber
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Christoph T. Berger
- Department of Biomedicine, University Hospital Basel, Basel, Switzerland
- Department of Internal Medicine, Clinical Immunology, University Hospital Basel, Basel, Switzerland
- Sai T. Reddy
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Victor Greiff
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Department of Immunology, University of Oslo, Oslo, Norway
|
27
|
Kirik U, Persson H, Levander F, Greiff L, Ohlin M. Antibody Heavy Chain Variable Domains of Different Germline Gene Origins Diversify through Different Paths. Front Immunol 2017; 8:1433. [PMID: 29180996 PMCID: PMC5694033 DOI: 10.3389/fimmu.2017.01433] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 06/28/2017] [Accepted: 10/16/2017] [Indexed: 02/04/2023] Open
Abstract
B cells produce antibodies, key effector molecules in health and disease. They mature their properties, including their affinity for antigen, through hypermutation events; processes that involve, e.g., base substitution, codon insertion and deletion, often in association with an isotype switch. Investigations of antibody evolution define modes whereby particular antibody responses are able to form, and such studies provide insight important for instance for development of efficient vaccines. Antibody evolution is also used in vitro for the design of antibodies with improved properties. To better understand the basic concepts of antibody evolution, we analyzed the mutational paths, both in terms of amino acid substitution and insertions and deletions, taken by antibodies of the IgG isotype. The analysis focused on the evolution of the heavy chain variable domain of sets of antibodies, each with an origin in 1 of 11 different germline genes representing six human heavy chain germline gene subgroups. Investigated genes were isolated from cells of human bone marrow, a major site of antibody production, and characterized by next-generation sequencing and an in-house bioinformatics pipeline. Apart from substitutions within the complementarity determining regions, multiple framework residues including those in protein cores were targets of extensive diversification. Diversity, both in terms of substitutions, and insertions and deletions, in antibodies is focused to different positions in the sequence in a germline gene-unique manner. Altogether, our findings create a framework for understanding patterns of evolution of antibodies from defined germline genes.
Affiliation(s)
- Ufuk Kirik
- Department of Immunotechnology, Lund University, Lund, Sweden
- Helena Persson
- Science for Life Laboratory, Drug Discovery and Development Platform, School of Biotechnology, KTH Royal Institute of Technology, Stockholm, Sweden
- Fredrik Levander
- Department of Immunotechnology, Lund University, Lund, Sweden; National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Immunotechnology, Lund University, Lund, Sweden
- Lennart Greiff
- Department of Clinical Sciences, Lund University, Lund, Sweden; Department of Otorhinolaryngology, Head and Neck Surgery, Skåne University Hospital, Lund, Sweden
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden; Science for Life Laboratory, Drug Discovery and Development Platform, Human Antibody Therapeutics, Lund University, Lund, Sweden; U-READ, Lund School of Technology, Lund University, Lund, Sweden
|
28
|
Kirik U, Greiff L, Levander F, Ohlin M. Data on haplotype-supported immunoglobulin germline gene inference. Data Brief 2017; 13:620-640. [PMID: 28725665 PMCID: PMC5502703 DOI: 10.1016/j.dib.2017.06.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/21/2017] [Revised: 05/26/2017] [Accepted: 06/19/2017] [Indexed: 01/03/2023] Open
Abstract
Data that defines IGHV (immunoglobulin heavy chain variable) germline gene inference using sequences of IgM-encoding transcriptomes obtained by Illumina MiSeq sequencing technology are described. Such inference is used to establish personalized germline gene sets for in-depth antibody repertoire studies and to detect new antibody germline genes from widely available immunoglobulin-encoding transcriptome data sets. Specifically, the data has been used to validate (Parallel antibody germline gene and haplotype analyses support the validity of immunoglobulin germline gene inference and discovery (DOI: 10.1016/j.molimm.2017.03.012) (Kirik et al., 2017) [1]) the inference process. This was accomplished based on analysis of the inferred germline genes’ association to the donors’ different haplotypes as defined by their different, expressed IGHJ alleles and/or IGHD genes/alleles. The data is important for development of validated germline gene databases containing entries inferred from immunoglobulin-encoding transcriptome sequencing data sets, and for generation of valid, personalized antibody germline gene repertoires.
Affiliation(s)
- Ufuk Kirik
- Dept. of Immunotechnology, Lund University, Lund, Sweden
- Lennart Greiff
- Dept. of Clinical Sciences, Division of Otorhinolaryngology, Head and Neck Cancer, Lund University, Sweden; Dept. of Otorhinolaryngology, Skåne University Hospital, Lund, Sweden
- Mats Ohlin
- Dept. of Immunotechnology, Lund University, Lund, Sweden
|