1. Peres A, Lees WD, Rodriguez OL, Lee NY, Polak P, Hope R, Kedmi M, Collins AM, Ohlin M, Kleinstein S, Watson C, Yaari G. IGHV allele similarity clustering improves genotype inference from adaptive immune receptor repertoire sequencing data. Nucleic Acids Res 2023; 51:e86. [PMID: 37548401 PMCID: PMC10484671 DOI: 10.1093/nar/gkad603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 06/26/2023] [Accepted: 08/03/2023] [Indexed: 08/08/2023] Open
Abstract
In adaptive immune receptor repertoire analysis, determining the germline variable (V) allele associated with each T- and B-cell receptor sequence is a crucial step. This process is highly impacted by allele annotations. Aligning sequences, assigning them to specific germline alleles, and inferring individual genotypes are challenging when the repertoire is highly mutated, or sequence reads do not cover the whole V region. Here, we propose an alternative naming scheme for the V alleles, as well as a novel method to infer individual genotypes. We demonstrate the strengths of the two by comparing their outcomes to other genotype inference methods. We validate the genotype approach with independent genomic long-read data. The naming scheme is compatible with current annotation tools and pipelines. Analysis results can be converted from the proposed naming scheme to the nomenclature determined by the International Union of Immunological Societies (IUIS). Both the naming scheme and the genotype procedure are implemented in a freely available R package (PIgLET https://bitbucket.org/yaarilab/piglet). To allow researchers to further explore the approach on real data and to adapt it for their uses, we also created an interactive website (https://yaarilab.github.io/IGHV_reference_book).
Affiliation(s)
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel
- William D Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, WC1E 7JE, UK
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, 40202, USA
- Noah Y Lee
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, 06520, USA
- Pazit Polak
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel
- Ronen Hope
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
- Meirav Kedmi
  - Department of Pathology, Yale School of Medicine, New Haven, CT, 06520, USA
  - Division of Hematology and Bone Marrow Transplantation, Chaim Sheba Medical Center, Tel-Hashomer, 5262000, Israel
  - Sackler School of Medicine, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Andrew M Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Mats Ohlin
  - Department of Immunotechnology, Lund University, Lund, 221 00, Sweden
- Steven H Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, 06520, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, 40202, USA
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel
2. Lees WD, Christley S, Peres A, Kos JT, Corrie B, Ralph D, Breden F, Cowell LG, Yaari G, Corcoran M, Karlsson Hedestam GB, Ohlin M, Collins AM, Watson CT, Busse CE. AIRR community curation and standardised representation for immunoglobulin and T cell receptor germline sets. Immunoinformatics (Amsterdam, Netherlands) 2023; 10:100025. [PMID: 37388275 PMCID: PMC10310305 DOI: 10.1016/j.immuno.2023.100025] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/01/2023]
Abstract
Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.
Affiliation(s)
- William D. Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, London, England
  - Human-Centered Computing and Information Science, Institute for Systems and Computer Engineering Technology and Science, Porto, Portugal
- Scott Christley
  - Peter O’Donnell Jr. School of Public Health, UT Southwestern Medical Center, Dallas, TX, USA
- Ayelet Peres
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Justin T. Kos
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, KY, USA
- Brian Corrie
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Duncan Ralph
  - Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Felix Breden
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Lindsay G. Cowell
  - Peter O’Donnell Jr. School of Public Health, Department of Immunology, School of Biomedical Sciences, UT Southwestern Medical Center, Dallas, TX, USA
- Gur Yaari
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Mats Ohlin
  - Department of Immunotechnology and SciLifeLab, Lund University, Lund, Sweden
- Andrew M. Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, KY, USA
- Christian E. Busse
  - Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
3. Narang S, Kaduk M, Chernyshev M, Karlsson Hedestam GB, Corcoran MM. Adaptive immune receptor genotyping using the corecount program. Front Immunol 2023; 14:1125884. [PMID: 37114042 PMCID: PMC10126697 DOI: 10.3389/fimmu.2023.1125884] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/16/2022] [Accepted: 02/27/2023] [Indexed: 04/29/2023] Open
Abstract
We present a new Rep-Seq analysis tool called corecount, for analyzing genotypic variation in immunoglobulin (IG) and T cell receptor (TCR) genes. corecount is highly efficient at identifying V alleles, including those that are infrequently used in expressed repertoires and those that contain 3' end variation that are otherwise refractory to reliable identification during germline inference from expressed libraries. Furthermore, corecount facilitates accurate D and J gene genotyping. The output is highly reproducible and facilitates the comparison of genotypes from multiple individuals, such as those from clinical cohorts. Here, we applied corecount to the genotypic analysis of IgM libraries from 16 individuals. To demonstrate the accuracy of corecount, we Sanger sequenced all the heavy chain IG alleles (65 IGHV, 27 IGHD and 7 IGHJ) from one individual from whom we also produced two independent IgM Rep-seq datasets. Genomic analysis revealed that 5 known IGHV and 2 IGHJ sequences are truncated in current reference databases. This dataset of genomically validated alleles and IgM libraries from the same individual provides a useful resource for benchmarking other bioinformatic programs that involve V, D and J assignments and germline inference, and may facilitate the development of AIRR-Seq analysis tools that can take benefit from the availability of more comprehensive reference databases.
4. Yang X, Zhu Y, Chen S, Zeng H, Guan J, Wang Q, Lan C, Sun D, Yu X, Zhang Z. Novel Allele Detection Tool Benchmark and Application With Antibody Repertoire Sequencing Dataset. Front Immunol 2021; 12:739179. [PMID: 34764956 PMCID: PMC8576399 DOI: 10.3389/fimmu.2021.739179] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/10/2021] [Accepted: 10/11/2021] [Indexed: 11/29/2022] Open
Abstract
Detailed knowledge of the diverse immunoglobulin germline genes is critical for the study of humoral immunity. Hundreds of alleles have been discovered by analyzing antibody repertoire sequencing (Rep-seq or Ig-seq) data via multiple novel allele detection tools (NADTs). However, the performance of these NADTs through antibody sequences with intrinsic somatic hypermutations (SHMs) is unclear. Here, we developed a tool to simulate repertoires by integrating the full spectrum features of an antibody repertoire such as germline gene usage, junctional modification, position-specific SHM and clonal expansion based on 2152 high-quality datasets. We then systematically evaluated these NADTs using both simulated and genuine Ig-seq datasets. Finally, we applied these NADTs to 687 Ig-seq datasets and identified 43 novel allele candidates (NACs) using defined criteria. Twenty-five alleles were validated through findings of other sources. In addition to the NACs detected, our simulation tool, the results of our comparison, and the streamline of this process may benefit further humoral immunity studies via Ig-seq.
Affiliation(s)
- Xiujia Yang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, China
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Yan Zhu
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Sen Chen
  - State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, China
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huikun Zeng
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Junjie Guan
  - State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, China
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Qilong Wang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Chunhong Lan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Deqiang Sun
  - Department of Center Laboratory, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xueqing Yu
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Division of Nephrology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhenhai Zhang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, China
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou, China
5. Huang Y, Thörnqvist L, Ohlin M. Computational Inference, Validation, and Analysis of 5'UTR-Leader Sequences of Alleles of Immunoglobulin Heavy Chain Variable Genes. Front Immunol 2021; 12:730105. [PMID: 34671351 PMCID: PMC8521166 DOI: 10.3389/fimmu.2021.730105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 09/06/2021] [Indexed: 12/05/2022] Open
Abstract
Upstream and downstream sequences of immunoglobulin genes may affect the expression of such genes. However, these sequences are rarely studied or characterized in most studies of immunoglobulin repertoires. Inference from large, rearranged immunoglobulin transcriptome data sets offers an opportunity to define the upstream regions (5'-untranslated regions and leader sequences). We have now established a new data pre-processing procedure to eliminate artifacts caused by a 5'-RACE library generation process, reanalyzed a previously studied data set defining human immunoglobulin heavy chain genes, and identified novel upstream regions, as well as previously identified upstream regions that may have been identified in error. Upstream sequences were also identified for a set of previously uncharacterized germline gene alleles. Several novel upstream region variants were validated, for instance by their segregation to a single haplotype in heterozygotic subjects. SNPs representing several sequence variants were identified from population data. Finally, based on the outcomes of the analysis, we define a set of testable hypotheses with respect to the placement of particular alleles in complex IGHV locus haplotypes, and discuss the evolutionary relatedness of particular heavy chain variable genes based on sequences of their upstream regions.
Affiliation(s)
- Mats Ohlin
  - Department of Immunotechnology, Lund University, Lund, Sweden
6. Rhesus and cynomolgus macaque immunoglobulin heavy-chain genotyping yields comprehensive databases of germline VDJ alleles. Immunity 2021; 54:355-366.e4. [PMID: 33484642 DOI: 10.1016/j.immuni.2020.12.018] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 06/29/2020] [Revised: 10/19/2020] [Accepted: 12/30/2020] [Indexed: 12/20/2022]
Abstract
Definition of the specific germline immunoglobulin (Ig) alleles present in an individual is a critical first step to delineate the ontogeny and evolution of antigen-specific antibody responses. Rhesus and cynomolgus macaques are important animal models for pre-clinical studies, with four main sub-groups being used: Indian- and Chinese-origin rhesus macaques and Mauritian and Indonesian cynomolgus macaques. We applied the Ig gene inference tool IgDiscover and performed extensive Sanger sequencing-based genomic validation to define germline VDJ alleles in these 4 sub-groups, comprising 45 macaques in total. There was allelic overlap between Chinese- and Indian-origin rhesus macaques and also between the two macaque species, which is consistent with substantial admixture. The island-restricted Mauritian cynomolgus population displayed the lowest number of alleles of the sub-groups, yet maintained high individual allelic diversity. These comprehensive databases of germline IGH alleles for rhesus and cynomolgus macaques provide a resource toward the study of B cell responses in these important pre-clinical models.
7. Mikocziova I, Gidoni M, Lindeman I, Peres A, Snir O, Yaari G, Sollid LM. Polymorphisms in human immunoglobulin heavy chain variable genes and their upstream regions. Nucleic Acids Res 2020; 48:5499-5510. [PMID: 32365177 PMCID: PMC7261178 DOI: 10.1093/nar/gkaa310] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/09/2020] [Accepted: 04/20/2020] [Indexed: 01/13/2023] Open
Abstract
Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, the knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 potential novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, out of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-REGION in the 5′UTR, L-PART1 and L-PART2 sequences, and found that identical V-REGION alleles can differ in upstream sequences. Thus, we have identified a large genetic variation not only in the V-REGION but also in the upstream sequences of IGHV genes. Our findings provide a new perspective for annotating immunoglobulin repertoire sequencing data.
Affiliation(s)
- Ivana Mikocziova
  - K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Moriah Gidoni
  - Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Ida Lindeman
  - K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Omri Snir
  - K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Ludvig M Sollid
  - K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
8. Lees W, Busse CE, Corcoran M, Ohlin M, Scheepers C, Matsen FA, Yaari G, Watson CT, Collins A, Shepherd AJ. OGRDB: a reference database of inferred immune receptor genes. Nucleic Acids Res 2020; 48:D964-D970. [PMID: 31566225 PMCID: PMC6943078 DOI: 10.1093/nar/gkz822] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 08/11/2019] [Revised: 09/05/2019] [Accepted: 09/16/2019] [Indexed: 12/20/2022] Open
Abstract
High-throughput sequencing of the adaptive immune receptor repertoire (AIRR-seq) is providing unprecedented insights into the immune response to disease and into the development of immune disorders. The accurate interpretation of AIRR-seq data depends on the existence of comprehensive germline gene reference sets. Current sets are known to be incomplete and unrepresentative of the degree of polymorphism and diversity in human and animal populations. A key issue is the complexity of the genomic regions in which they lie, which, because of the presence of multiple repeats, insertions and deletions, have not proved tractable with short-read whole genome sequencing. Recently, tools and methods for inferring such gene sequences from AIRR-seq datasets have become available, and a community approach has been developed for the expert review and publication of such inferences. Here, we present OGRDB, the Open Germline Receptor Database (https://ogrdb.airr-community.org), a public resource for the submission, review and publication of previously unknown receptor germline sequences together with supporting evidence.
Affiliation(s)
- William Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, UK
- Christian E Busse
  - Division of B Cell Immunology, German Cancer Research Center, 69120 Heidelberg, Germany
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Box 280, 171 77 Stockholm, Sweden
- Mats Ohlin
  - Department of Immunotechnology, Lund University, Medicon Village, S-223 81 Lund, Sweden
- Cathrine Scheepers
  - Center for HIV and STIs, National Institute for Communicable Diseases of the National Health Laboratory Service, Sandringham, Gauteng 2131, South Africa
  - Antibody Immunity Research Unit, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Frederick A Matsen
  - Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
- Andrew Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Adrian J Shepherd
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, UK
9. Bhardwaj V, Franceschetti M, Rao R, Pevzner PA, Safonova Y. Automated analysis of immunosequencing datasets reveals novel immunoglobulin D genes across diverse species. PLoS Comput Biol 2020; 16:e1007837. [PMID: 32339161 PMCID: PMC7295240 DOI: 10.1371/journal.pcbi.1007837] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/20/2019] [Revised: 06/15/2020] [Accepted: 04/01/2020] [Indexed: 12/30/2022] Open
Abstract
Immunoglobulin genes are formed through V(D)J recombination, which joins the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics focuses on finding alleles of germline genes across various patients. Although reconstruction of V and J genes is a well-studied problem, the more challenging task of reconstructing D genes remained open until the IgScout algorithm was developed in 2019. In this work, we address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction, apply it to hundreds of immunosequencing datasets from multiple species, and validate the newly inferred D genes by analyzing diverse whole genome sequencing datasets and haplotyping heterozygous V genes.
Antibodies provide specific binding to an enormous range of antigens and represent a key component of the adaptive immune system. Immunosequencing has emerged as a method of choice for generating millions of reads that sample antibody repertoires and provides insights into monitoring immune response to disease and vaccination. Most of the previous immunogenomics studies rely on the reference germline genes in the immunoglobulin locus rather than the germline genes in a specific patient. This approach is deficient since the set of known germline genes is incomplete (particularly for non-European humans and non-human species) and contains alleles that resulted from sequencing and annotation errors. The problem of de novo inference of diversity (D) genes from immunosequencing data remained open until the IgScout algorithm was developed in 2019. We address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction and infer multiple D genes across multiple species that are not present in standard databases.
Affiliation(s)
- Vinnu Bhardwaj
  - Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Massimo Franceschetti
  - Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Ramesh Rao
  - Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
  - Qualcomm Institute, University of California San Diego, San Diego, California, United States of America
- Pavel A. Pevzner
  - Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
- Yana Safonova
  - Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
  - Center for Information Theory and Applications, University of California San Diego, San Diego, California, United States of America
10. Watson CT, Kos JT, Gibson WS, Newman L, Deikus G, Busse CE, Smith ML, Jackson KJ, Collins AM. A comparison of immunoglobulin IGHV, IGHD and IGHJ genes in wild-derived and classical inbred mouse strains. Immunol Cell Biol 2019; 97:888-901. [PMID: 31441114 DOI: 10.1111/imcb.12288] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/08/2019] [Revised: 08/05/2019] [Accepted: 08/20/2019] [Indexed: 01/20/2023]
Abstract
The genomes of classical inbred mouse strains include genes derived from all three major subspecies of the house mouse, Mus musculus. We recently posited that genetic diversity in the immunoglobulin heavy chain (IGH) gene loci of C57BL/6 and BALB/c mice reflects differences in subspecies origin. To investigate this hypothesis, we conducted high-throughput sequencing of IGH gene rearrangements to document IGH variable (IGHV), joining (IGHJ) and diversity (IGHD) genes in four inbred wild-derived mouse strains (CAST/EiJ, LEWES/EiJ, MSM/MsJ and PWD/PhJ) and a single disease model strain (NOD/ShiLtJ), collectively representing genetic backgrounds of several major mouse subspecies. A total of 341 germline IGHV sequences were inferred in the wild-derived strains, including 247 not curated in the international ImMunoGeneTics information system. By contrast, 83/84 inferred NOD IGHV genes had previously been observed in C57BL/6 mice. Variability among the strains examined was observed for only a single IGHJ gene, involving a description of a novel allele. By contrast, unexpected variation was found in the IGHD gene loci, with four previously unreported IGHD gene sequences being documented. Very few IGHV sequences of C57BL/6 and BALB/c mice were shared with strains representing major subspecies, suggesting that their IGH loci may be complex mosaics of genes of disparate origins. This suggests a similar level of diversity is likely present in the IGH loci of other classical inbred strains. This must now be documented if we are to properly understand interstrain variation in models of antibody-mediated disease.
Affiliation(s)
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, 40202, USA
- Justin T Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, 40202, USA
- William S Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, 40202, USA
- Leah Newman
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Gintaras Deikus
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Christian E Busse
  - Division of B Cell Immunology, German Cancer Research Center, 69120, Heidelberg, Germany
- Melissa L Smith
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Katherine JL Jackson
  - Immunology Division, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
- Andrew M Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
11. Safonova Y, Pevzner PA. De novo Inference of Diversity Genes and Analysis of Non-canonical V(DD)J Recombination in Immunoglobulins. Front Immunol 2019; 10:987. [PMID: 31134072 PMCID: PMC6516046 DOI: 10.3389/fimmu.2019.00987] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/17/2019] [Accepted: 04/16/2019] [Indexed: 12/03/2022] Open
Abstract
The V(D)J recombination forms the immunoglobulin genes by joining the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics aims at finding alleles of germline genes across various patients. Although recent studies described algorithms for de novo inference of V and J genes from immunosequencing data, they stopped short of solving a more difficult problem of reconstructing D genes that form the highly divergent CDR3 regions and provide the most important contribution to the antigen binding. We present the IgScout algorithm for de novo D gene reconstruction and apply it to reveal new alleles of human D genes and previously unknown D genes in camel, an important model organism in immunology. We further analyze non-canonical V(DD)J recombination that results in unusually long CDR3s with tandem fused IGHD genes and thus expands the diversity of the antibody repertoires. We demonstrate that tandem CDR3s represent a consistent and functional feature of all analyzed immunosequencing datasets, reveal ultra-long CDR3s, and shed light on the mechanism responsible for their formation.
Affiliation(s)
- Yana Safonova
- Center for Information Theory and Applications, University of California, San Diego, San Diego, CA, United States
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, United States
12
Vázquez Bernat N, Corcoran M, Hardt U, Kaduk M, Phad GE, Martin M, Karlsson Hedestam GB. High-Quality Library Preparation for NGS-Based Immunoglobulin Germline Gene Inference and Repertoire Expression Analysis. Front Immunol 2019; 10:660. [PMID: 31024532 PMCID: PMC6459949 DOI: 10.3389/fimmu.2019.00660] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 12/20/2018] [Accepted: 03/11/2019] [Indexed: 12/13/2022]
Abstract
Next generation sequencing (NGS) of immunoglobulin (Ig) repertoires (Rep-seq) enables examination of the adaptive immune system at an unprecedented level. Applications include studies of expressed repertoires, gene usage, somatic hypermutation levels, Ig lineage tracing and identification of genetic variation within the Ig loci through inference methods. All these applications require starting libraries that allow the generation of sequence data with low error rate and optimal representation of the expressed repertoire. Here, we provide detailed protocols for the production of libraries suitable for human Ig germline gene inference and Ig repertoire studies. Various parameters used in the process were tested in order to demonstrate factors that are critical to obtain high quality libraries. We demonstrate an improved 5'RACE technique that reduces the length constraints of Illumina MiSeq based Rep-seq analysis but allows for the acquisition of sequences upstream of Ig V genes, useful for primer design. We then describe a 5' multiplex method for library preparation, which yields full length V(D)J sequences suitable for genotype identification and novel gene inference. We provide comprehensive sets of primers targeting IGHV, IGKV, and IGLV genes. Using the optimized protocol, we produced IgM, IgG, IgK, and IgL libraries and analyzed them using the germline inference tool IgDiscover to identify expressed germline V alleles. This process additionally uncovered three IGHV, one IGKV, and six IGLV novel alleles in a single individual, which are absent from the IMGT reference database, highlighting the need for further study of Ig genetic variation. The library generation protocols presented here enable a robust means of analyzing expressed Ig repertoires, identifying novel alleles and producing individualized germline gene databases from humans.
Affiliation(s)
- Néstor Vázquez Bernat
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Uta Hardt
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Division of Rheumatology, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Mateusz Kaduk
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Ganesh E. Phad
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Marcel Martin
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
13
Ohlin M, Scheepers C, Corcoran M, Lees WD, Busse CE, Bagnara D, Thörnqvist L, Bürckert JP, Jackson KJL, Ralph D, Schramm CA, Marthandan N, Breden F, Scott J, Matsen IV FA, Greiff V, Yaari G, Kleinstein SH, Christley S, Sherkow JS, Kossida S, Lefranc MP, van Zelm MC, Watson CT, Collins AM. Inferred Allelic Variants of Immunoglobulin Receptor Genes: A System for Their Evaluation, Documentation, and Naming. Front Immunol 2019; 10:435. [PMID: 30936866 PMCID: PMC6431624 DOI: 10.3389/fimmu.2019.00435] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 12/10/2018] [Accepted: 02/19/2019] [Indexed: 11/13/2022]
Abstract
Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. 
Formalized evaluations of novel inferred sequences have now begun and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC (focusing, to begin with, on human IGHV genes) with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that in time, it will expand to the documentation of TR and IG genes in many vertebrate species.
Affiliation(s)
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden
- Cathrine Scheepers
- Center for HIV and STIs, National Institute for Communicable Diseases, Johannesburg, South Africa
- Faculty of Health Sciences, School of Pathology, University of the Witwatersrand, Johannesburg, South Africa
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- William D. Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- Christian E. Busse
- Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
- Davide Bagnara
- Department of Experimental Medicine, University of Genoa, Genoa, Italy
- Duncan Ralph
- Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Chaim A. Schramm
- Vaccine Research Center, National Institutes of Health, Washington, DC, United States
- Nishanth Marthandan
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Felix Breden
- Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Jamie Scott
- Department of Molecular Biology and Biochemistry, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
- Victor Greiff
- Department of Immunology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Scott Christley
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Jacob S. Sherkow
- Innovation Center for Law and Technology, New York Law School, New York, NY, United States
- Sofia Kossida
- IMGT, The International ImMunoGeneTics information system (IMGT), Laboratoire d'ImmunoGénétique Moléculaire (LIGM), CNRS, Institut de Génétique Humaine, Université de Montpellier, Montpellier, France
- Marie-Paule Lefranc
- IMGT, The International ImMunoGeneTics information system (IMGT), Laboratoire d'ImmunoGénétique Moléculaire (LIGM), CNRS, Institut de Génétique Humaine, Université de Montpellier, Montpellier, France
- Menno C. van Zelm
- Department of Immunology and Pathology, Central Clinical School, The Alfred Hospital, Monash University, Melbourne, VIC, Australia
- Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, United States
- Andrew M. Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia