1
Balashova D, van Schaik BDC, Stratigopoulou M, Guikema JEJ, Caniels TG, Claireaux M, van Gils MJ, Musters A, Anang DC, de Vries N, Greiff V, van Kampen AHC. Systematic evaluation of B-cell clonal family inference approaches. BMC Immunol 2024; 25:13. [PMID: 38331731 DOI: 10.1186/s12865-024-00600-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 01/18/2024] [Indexed: 02/10/2024]
Abstract
The reconstruction of clonal families (CFs) in B-cell receptor (BCR) repertoire analysis is a crucial step to understand the adaptive immune system and how it responds to antigens. The BCR repertoire of an individual is formed throughout life and is diverse due to several factors such as gene recombination and somatic hypermutation. The use of Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using next generation sequencing enabled the generation of full BCR repertoires that also include rare CFs. The reconstruction of CFs from AIRR-seq data is challenging and several approaches have been developed to solve this problem. Currently, most methods use the heavy chain (HC) only, as it is more variable than the light chain (LC). CF reconstruction options include the definition of appropriate sequence similarity measures, the use of shared mutations among sequences, and the possibility of reconstruction without preliminary clustering based on V- and J-gene annotation. In this study, we aimed to systematically evaluate different approaches for CF reconstruction and to determine their impact on various outcome measures such as the number of CFs derived, the size of the CFs, and the accuracy of the reconstruction. The methods were compared to each other and to a method that groups sequences based on identical junction sequences and another method that only determines subclones. We found that after accounting for data set variability, in particular sequencing depth and mutation load, the reconstruction approach has an impact on part of the outcome measures, including the number of CFs. Simulations indicate that unique junctions and subclones should not be used as substitutes for CF and that more complex methods do not outperform simpler methods. Also, we conclude that different approaches differ in their ability to correctly reconstruct CFs when not considering the LC and to identify shared CFs. 
The results showed the effect of different approaches on the reconstruction of CFs and highlighted the importance of choosing an appropriate method.
Affiliation(s)
- Daria Balashova
  - Amsterdam UMC location University of Amsterdam, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, The Netherlands
  - Amsterdam Infection and Immunity, Inflammatory Diseases, Amsterdam, The Netherlands
- Barbera D C van Schaik
  - Amsterdam UMC location University of Amsterdam, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, The Netherlands
  - Amsterdam Infection and Immunity, Inflammatory Diseases, Amsterdam, The Netherlands
- Maria Stratigopoulou
  - Cancer Center Amsterdam, Amsterdam, The Netherlands
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
- Jeroen E J Guikema
  - Cancer Center Amsterdam, Amsterdam, The Netherlands
  - Amsterdam UMC location University of Amsterdam, Pathology, Lymphoma and Myeloma Center Amsterdam, Meibergdreef 9, Amsterdam, Netherlands
- Tom G Caniels
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Infection and Immunity, Infectious Diseases, Amsterdam, The Netherlands
- Mathieu Claireaux
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Infection and Immunity, Infectious Diseases, Amsterdam, The Netherlands
- Marit J van Gils
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Infection and Immunity, Infectious Diseases, Amsterdam, The Netherlands
- Anne Musters
  - Amsterdam UMC location University of Amsterdam, Experimental Immunology, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Rheumatology & Immunology Center, Amsterdam, The Netherlands
- Dornatien C Anang
  - Amsterdam UMC location University of Amsterdam, Experimental Immunology, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Rheumatology & Immunology Center, Amsterdam, The Netherlands
- Niek de Vries
  - Amsterdam UMC location University of Amsterdam, Experimental Immunology, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Rheumatology & Immunology Center, Amsterdam, The Netherlands
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Antoine H C van Kampen
  - Amsterdam UMC location University of Amsterdam, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, The Netherlands
  - Amsterdam Infection and Immunity, Inflammatory Diseases, Amsterdam, The Netherlands
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
2
Narang S, Kaduk M, Chernyshev M, Karlsson Hedestam GB, Corcoran MM. Adaptive immune receptor genotyping using the corecount program. Front Immunol 2023; 14:1125884. [PMID: 37114042 PMCID: PMC10126697 DOI: 10.3389/fimmu.2023.1125884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 02/27/2023] [Indexed: 04/29/2023]
Abstract
We present a new Rep-Seq analysis tool called corecount, for analyzing genotypic variation in immunoglobulin (IG) and T cell receptor (TCR) genes. corecount is highly efficient at identifying V alleles, including those that are infrequently used in expressed repertoires and those that contain 3' end variation and are otherwise refractory to reliable identification during germline inference from expressed libraries. Furthermore, corecount facilitates accurate D and J gene genotyping. The output is highly reproducible and facilitates the comparison of genotypes from multiple individuals, such as those from clinical cohorts. Here, we applied corecount to the genotypic analysis of IgM libraries from 16 individuals. To demonstrate the accuracy of corecount, we Sanger sequenced all the heavy chain IG alleles (65 IGHV, 27 IGHD and 7 IGHJ) from one individual from whom we also produced two independent IgM Rep-seq datasets. Genomic analysis revealed that 5 known IGHV and 2 IGHJ sequences are truncated in current reference databases. This dataset of genomically validated alleles and IgM libraries from the same individual provides a useful resource for benchmarking other bioinformatic programs that involve V, D and J assignments and germline inference, and may facilitate the development of AIRR-Seq analysis tools that can benefit from the availability of more comprehensive reference databases.
3
Burnum-Johnson KE, Conrads TP, Drake RR, Herr AE, Iyengar R, Kelly RT, Lundberg E, MacCoss MJ, Naba A, Nolan GP, Pevzner PA, Rodland KD, Sechi S, Slavov N, Spraggins JM, Van Eyk JE, Vidal M, Vogel C, Walt DR, Kelleher NL. New Views of Old Proteins: Clarifying the Enigmatic Proteome. Mol Cell Proteomics 2022; 21:100254. [PMID: 35654359 PMCID: PMC9256833 DOI: 10.1016/j.mcpro.2022.100254] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/21/2022] [Revised: 05/09/2022] [Accepted: 05/27/2022] [Indexed: 11/23/2022]
Abstract
All human diseases involve proteins, yet our current tools to characterize and quantify them are limited. To better elucidate proteins across space, time, and molecular composition, we provide a >10-year projection for technologies to meet the challenges that protein biology presents. With a broad perspective, we discuss grand opportunities to transition the science of proteomics into a more propulsive enterprise. Extrapolating recent trends, we describe a next generation of approaches to define, quantify, and visualize the multiple dimensions of the proteome, thereby transforming our understanding of and interactions with human disease in the coming decade.
Affiliation(s)
- Kristin E Burnum-Johnson
  - The Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA
- Thomas P Conrads
  - Inova Women's Service Line, Inova Health System, Falls Church, Virginia, USA
- Richard R Drake
  - Cell and Molecular Pharmacology and Experimental Therapeutics, Medical University of South Carolina, Charleston, South Carolina, USA
- Amy E Herr
  - Department of Bioengineering, University of California, Berkeley, California, USA
- Ravi Iyengar
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ryan T Kelly
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah, USA
- Emma Lundberg
  - Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- Michael J MacCoss
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Alexandra Naba
  - Department of Physiology and Biophysics, University of Illinois at Chicago, Chicago, Illinois, USA
- Garry P Nolan
  - Department of Pathology, Stanford University, Stanford, California, USA
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California at San Diego, San Diego, California, USA
- Karin D Rodland
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Salvatore Sechi
  - National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Nikolai Slavov
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts, USA
- Jeffrey M Spraggins
  - Department of Cell and Developmental Biology, Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee, USA
- Jennifer E Van Eyk
  - Advanced Clinical Biosystems Institute in the Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Marc Vidal
  - Department of Genetics, Harvard University, Cambridge, Massachusetts, USA
- Christine Vogel
  - New York University Center for Genomics and Systems Biology, New York University, New York, New York, USA
- David R Walt
  - Department of Pathology, Harvard Medical School, Brigham and Women's Hospital, Wyss Institute at Harvard University, Boston, Massachusetts, USA
- Neil L Kelleher
  - Department of Chemistry, Northwestern University, Evanston, Illinois, USA
4
Sirupurapu V, Safonova Y, Pevzner P. Gene prediction in the immunoglobulin loci. Genome Res 2022; 32:1152-1169. [PMID: 35545447 DOI: 10.1101/gr.276676.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/09/2022] [Accepted: 05/06/2022] [Indexed: 11/25/2022]
Abstract
The V(D)J recombination process rearranges the variable (V), diversity (D), and joining (J) genes in the immunoglobulin loci to generate antibody repertoires. Annotating these loci across various species and predicting the V, D, and J genes (IG genes) is critical for studies of the adaptive immune system. However, since the standard gene finding algorithms are not suitable for predicting IG genes, they have been semi-manually annotated in very few species. We developed the IGDetective algorithm for predicting IG genes and applied it to species with assembled IG loci. IGDetective generated the first large collection of IG genes across many species and enabled their evolutionary analysis, including the analysis of the "bat IG diversity" hypothesis. This analysis revealed extremely conserved V genes in evolutionarily distant species, indicating that these genes may be subjected to the same selective pressure, e.g., pressure driven by common pathogens. IGDetective also revealed extremely diverged V genes and a new family of evolutionarily conserved V genes in bats with unusual noncanonical cysteines. Moreover, unlike in all previously reported antibodies, these cysteines are located within complementarity-determining regions. Since cysteines form disulfide bonds, we hypothesize that these cysteine-rich V genes might generate antibodies with noncanonical conformations and could potentially form a unique part of the immune repertoire in bats. We also analyzed the diversity landscape of the recombination signal sequences and revealed the features that trigger the high/low usage of the IG genes.
5
Safonova Y, Shin SB, Kramer L, Reecy J, Watson CT, Smith TPL, Pevzner PA. Variations in antibody repertoires correlate with vaccine responses. Genome Res 2022; 32:791-804. [PMID: 35361626 PMCID: PMC8997358 DOI: 10.1101/gr.276027.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/25/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022]
Abstract
An important challenge in vaccine development is to figure out why a vaccine succeeds in some individuals and fails in others. Although antibody repertoires hold the key to answering this question, there have been very few personalized immunogenomics studies so far aimed at revealing how variations in immunoglobulin genes affect a vaccine response. We conducted an immunosequencing study of 204 calves vaccinated against bovine respiratory disease (BRD) with the goal to reveal variations in immunoglobulin genes and somatic hypermutations that impact the efficacy of vaccine response. Our study represents the largest longitudinal personalized immunogenomics study reported to date across all species, including humans. To analyze the generated data set, we developed an algorithm for identifying variations of the immunoglobulin genes (as well as frequent somatic hypermutations) that affect various features of the antibody repertoire and titers of neutralizing antibodies. In contrast to relatively short human antibodies, cattle have a large fraction of ultralong antibodies that have opened new therapeutic opportunities. Our study reveals that ultralong antibodies are a key component of the immune response against the costliest disease of beef cattle in North America. The detected variants of the cattle immunoglobulin genes, which are implicated in the success/failure of the BRD vaccine, have the potential to direct the selection of individual cattle for ongoing breeding programs.
Affiliation(s)
- Yana Safonova
  - Computer Science and Engineering Department, University of California at San Diego, San Diego, California 92093, USA
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, Kentucky 40202, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Sung Bong Shin
  - U.S. Meat Animal Research Center, USDA-ARS, Clay Center, Nebraska 68933, USA
- Luke Kramer
  - Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- James Reecy
  - Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, Kentucky 40202, USA
- Timothy P L Smith
  - U.S. Meat Animal Research Center, USDA-ARS, Clay Center, Nebraska 68933, USA
- Pavel A Pevzner
  - Computer Science and Engineering Department, University of California at San Diego, San Diego, California 92093, USA
6
Characterization of human IgM and IgG repertoires in individuals with chronic HIV-1 infection. Virol Sin 2022; 37:370-379. [PMID: 35247647 PMCID: PMC9243603 DOI: 10.1016/j.virs.2022.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2021] [Accepted: 02/24/2022] [Indexed: 11/23/2022]
Abstract
Advancements in high-throughput sequencing (HTS) of antibody repertoires (Ig-Seq) have greatly improved our ability to characterize antibody repertoires on a large scale. However, only a few studies have explored the influence of chronic HIV-1 infection on human antibody repertoires, and many of them reached contradictory conclusions, possibly limited by inadequate sequencing depth and throughput. To better understand how HIV-1 infection impacts the humoral immune system, in this study, we systematically analyzed the differences between the IgM (HIV-IgM) and IgG (HIV-IgG) heavy chain repertoires of HIV-1 infected patients, as well as between antibody repertoires of HIV-1 patients and healthy donors (HH). Notably, public unique clones accounted for only a negligible proportion of the HIV-IgM and HIV-IgG repertoire libraries, and the diversity of unique clones in HIV-IgG was remarkably reduced. In terms of somatic mutation rates of CDR1 and CDR2, the HIV-IgG repertoire was higher than HIV-IgM. In addition, the average length of the CDR3 region in HIV-IgM was significantly longer than that in the HH repertoire, presumably caused by the great number of novel VDJ rearrangement patterns, especially a massive use of IGHJ6. Moreover, some of the B cell clonotypes had numerous clones, and somatic variants were detected within the clonotype lineage in HIV-IgG, indicating HIV-1 neutralizing activities. The in-depth characterization of HIV-IgG and HIV-IgM repertoires enriches our knowledge of the profound effect of HIV-1 infection on human antibody repertoires and may have practical value for the discovery of therapeutic antibodies. Ultra-deep sequencing of both IgM and IgG repertoires in chronic HIV-1 infection. VDJ gene rearrangement patterns can be dramatically changed by HIV-1 infection. Multiple mechanisms cause the high complexity of HIV-1-experienced antibodies. Discovery of promising neutralizing HIV-1 antibodies from antibody repertoires.
7
Qiu Q, Zhang P, Zhang N, Shen Y, Lou S, Deng J. Development of a Prognostic Nomogram for Acute Myeloid Leukemia on IGHD Gene Family. Int J Gen Med 2021; 14:4303-4316. [PMID: 34408473 PMCID: PMC8364394 DOI: 10.2147/ijgm.s317528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Accepted: 07/15/2021] [Indexed: 11/29/2022]
Abstract
Purpose: Acute myeloid leukaemia (AML) is a common haematological disease in adults. The overall survival (OS) remains unsatisfactory. It is critical to identify potential prognostic biomarkers and develop a nomogram that predicts overall survival in patients with AML. Patients and Methods: We used gene expression datasets and clinical data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) to perform differential expression analysis and survival analysis and to assess the prognostic value of the IGHD gene family (IGHDs) in AML patients. A risk score model was built through Lasso analysis and multivariate Cox regression. We also developed a nomogram and evaluated its accuracy with Harrell's concordance index (C-index) and a calibration curve. Last, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database was used for external validation. Results: IGHD1-20 mRNA expression level was an independent prognostic factor for patients with AML by multivariate analysis. After Lasso analysis and multivariate Cox regression, we constructed a 3-gene model (IGHD1-1, IGHD1-20, IGHD3-16) associated with OS in AML. Risk score and age were validated as independent risk factors for prognosis and were used to build a nomogram. The C-index and calibration curve results show that its ability to predict 1-year, 3-year and 5-year overall survival is accurate. Conclusion: The mRNA level of IGHDs was increased in AML patients. IGHD1-20 was an independent risk factor for OS in AML patients. The IGHDs risk model (IGHD1-1, IGHD1-20, IGHD3-16) relates to the OS of AML patients. The nomogram, including risk score and age, can conveniently and effectively predict the overall survival rate of patients.
Affiliation(s)
- Qunxiang Qiu
  - Department of Hematology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, People's Republic of China
- Ping Zhang
  - Hematology Laboratory, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, People's Republic of China
- Nan Zhang
  - Department of Hematology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, People's Republic of China
- Yan Shen
  - Department of Hematology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, People's Republic of China
- Shifeng Lou
  - Department of Hematology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, People's Republic of China
- Jianchuan Deng
  - Department of Hematology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, People's Republic of China
8
Bhardwaj V, Pevzner PA, Rashtchian C, Safonova Y. Trace Reconstruction Problems in Computational Biology. IEEE Trans Inf Theory 2021; 67:3295-3314. [PMID: 34176957 PMCID: PMC8224466 DOI: 10.1109/tit.2020.3030569] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
The problem of reconstructing a string from its error-prone copies, the trace reconstruction problem, was introduced by Vladimir Levenshtein two decades ago. While there has been considerable theoretical work on trace reconstruction, practical solutions have only recently started to emerge in the context of two rapidly developing research areas: immunogenomics and DNA data storage. In immunogenomics, traces correspond to mutated copies of genes, with mutations generated naturally by the adaptive immune system. In DNA data storage, traces correspond to noisy copies of DNA molecules that encode digital data, with errors being artifacts of the data retrieval process. In this paper, we introduce several new trace generation models and open questions relevant to trace reconstruction for immunogenomics and DNA data storage, survey theoretical results on trace reconstruction, and highlight their connections to computational biology. Throughout, we discuss the applicability and shortcomings of known solutions and suggest future research directions.
Affiliation(s)
- Vinnu Bhardwaj
  - Electrical and Computer Engineering Department, University of California San Diego, La Jolla, USA
- Pavel A. Pevzner
  - Computer Science and Engineering Department, University of California San Diego, La Jolla, USA
- Cyrus Rashtchian
  - Computer Science and Engineering Department, University of California San Diego, La Jolla, USA
  - Qualcomm Institute, University of California San Diego, La Jolla, USA
- Yana Safonova
  - Computer Science and Engineering Department, University of California San Diego, La Jolla, USA
9
Abstract
Immunogenomics studies have been largely limited to individuals of European ancestry, restricting the ability to identify variation in human adaptive immune responses across populations. Inclusion of a greater diversity of individuals in immunogenomics studies will substantially enhance our understanding of human immunology.
10
Large-scale analysis of 2,152 Ig-seq datasets reveals key features of B cell biology and the antibody repertoire. Cell Rep 2021; 35:109110. [PMID: 33979623 DOI: 10.1016/j.celrep.2021.109110] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/27/2020] [Revised: 03/09/2021] [Accepted: 04/20/2021] [Indexed: 12/20/2022]
Abstract
Antibody repertoire sequencing enables researchers to acquire millions of B cell receptors and investigate these molecules at the single-nucleotide level. This power and resolution in studying humoral responses have led to its wide application. However, most of these studies were conducted with a limited number of samples. Given the extraordinary diversity of the repertoire, an assessment of its key features with a large sample set is needed. Thus, we collect and systematically analyze 2,152 high-quality heavy-chain antibody repertoires. Our study reveals that 52 core variable genes universally contribute to more than 99% of each individual's repertoire; distal interspersed preferences characterize V gene recombination; the number of public clones between two repertoires follows a linear model; and positive selection dominates at the RGYW motif in somatic hypermutations. Thus, this population-level analysis resolves some critical features of the antibody repertoire and may have significant value to the large cadre of scientists.
11
Safonova Y, Pevzner PA. V(DD)J recombination is an important and evolutionarily conserved mechanism for generating antibodies with unusually long CDR3s. Genome Res 2020; 30:1547-1558. [PMID: 32948615 PMCID: PMC7605257 DOI: 10.1101/gr.259598.119] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/25/2019] [Accepted: 09/15/2020] [Indexed: 12/20/2022]
Abstract
The V(DD)J recombination is currently viewed as an aberrant and inconsequential variant of the canonical V(D)J recombination. Moreover, since the classical 12/23 rule for the V(D)J recombination fails to explain the V(DD)J recombination, the molecular mechanism of tandem D-D fusions has remained unknown since they were discovered three decades ago. Revealing this mechanism is a biomedically important goal since tandem fusions contribute to broadly neutralizing antibodies with ultralong CDR3s. We reveal previously overlooked cryptic nonamers in the recombination signal sequences of human IGHD genes and demonstrate that these nonamers explain the vast majority of tandem fusions in human repertoires. We further reveal large clonal lineages formed by tandem fusions in antigen-stimulated immunosequencing data sets, suggesting that such data sets contain many more tandem fusions than previously thought and that about a quarter of large clonal lineages with unusually long CDR3s are generated through tandem fusions. Finally, we developed the SEARCH-D algorithm for identifying D genes in mammalian genomes and applied it to the recently completed Vertebrate Genomes Project assemblies, nearly doubling the number of mammalian species with known D genes. Our analysis revealed cryptic nonamers in RSSs of many mammalian genomes, thus demonstrating that the V(DD)J recombination is not a "bug" but an important feature preserved throughout mammalian evolution.
Affiliation(s)
- Yana Safonova
  - Computer Science and Engineering Department, University of California San Diego, La Jolla, California 92093, USA
- Pavel A Pevzner
  - Computer Science and Engineering Department, University of California San Diego, La Jolla, California 92093, USA
12
Abstract
Advances in reading, writing, and editing DNA are providing unprecedented insights into the complexity of immunological systems. This combination of systems and synthetic biology methods is enabling the quantitative and precise understanding of molecular recognition in adaptive immunity, thus providing a framework for reprogramming immune responses for translational medicine. In this review, we will highlight state-of-the-art methods such as immune repertoire sequencing, immunoinformatics, and immunogenomic engineering and their application toward adaptive immunity. We showcase novel and interdisciplinary approaches that have the promise of transforming the design and breadth of molecular and cellular immunotherapies.
Affiliation(s)
- Lucia Csepregi
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Roy A. Ehling
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Bastian Wagner
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Sai T. Reddy
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
13
Shukla N, Siva N, Malik B, Suravajhala P. Current Challenges and Implications of Proteogenomic Approaches in Prostate Cancer. Curr Top Med Chem 2020; 20:1968-1980. [PMID: 32703135 DOI: 10.2174/1568026620666200722112450] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/17/2020] [Revised: 05/30/2020] [Accepted: 06/29/2020] [Indexed: 12/16/2022]
Abstract
In the recent past, next-generation sequencing (NGS) approaches have heralded the omics era. With NGS data burgeoning, there arose a need to disseminate omics data better. Proteogenomics has been widely used for characterising the functions of candidate genes and is applied in ascertaining various disease phenotypes, including cancers. However, not much is known about the role and application of proteogenomics, especially in prostate cancer (PCa). In this review, we outline the need for proteogenomic approaches, their applications and their role in PCa.
Affiliation(s)
- Nidhi Shukla
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, Jaipur 302001, RJ, India
  - Department of Chemistry, School of Basic Sciences, Manipal University Jaipur, Jaipur, India
- Narmadhaa Siva
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, Jaipur 302001, RJ, India
- Babita Malik
  - Department of Chemistry, School of Basic Sciences, Manipal University Jaipur, Jaipur, India
- Prashanth Suravajhala
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, Jaipur 302001, RJ, India
14
Brochu HN, Tseng E, Smith E, Thomas MJ, Jones AM, Diveley KR, Law L, Hansen SG, Picker LJ, Gale M, Peng X. Systematic Profiling of Full-Length Ig and TCR Repertoire Diversity in Rhesus Macaque through Long Read Transcriptome Sequencing. J Immunol 2020; 204:3434-3444. [PMID: 32376650 PMCID: PMC7276939 DOI: 10.4049/jimmunol.1901256] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/22/2019] [Accepted: 04/13/2020] [Indexed: 12/19/2022]
Abstract
The diversity of Ig and TCR repertoires is a focal point of immunological studies. Rhesus macaques (Macaca mulatta) are key for modeling human immune responses, placing critical importance on the accurate annotation and quantification of their Ig and TCR repertoires. However, because of incomplete reference resources, the coverage and accuracy of the traditional targeted amplification strategies for profiling rhesus Ig and TCR repertoires are largely unknown. In this study, using long read sequencing, we sequenced four Indian-origin rhesus macaque tissues and obtained high-quality, full-length sequences for over 6000 unique Ig and TCR transcripts, without the need for sequence assembly. We constructed, to our knowledge, the first complete reference set for the constant regions of all known isotypes and chain types of rhesus Ig and TCR repertoires. We show that sequence diversity exists across the entire variable regions of rhesus Ig and TCR transcripts. Consequently, existing strategies using targeted amplification of rearranged variable regions comprised of V(D)J gene segments miss a significant fraction (27-53% and 42-49%) of rhesus Ig/TCR diversity. To overcome these limitations, we designed new rhesus-specific assays that remove the need for primers conventionally targeting variable regions and allow single cell level Ig and TCR repertoire analysis. Our improved approach will enable future studies to fully capture rhesus Ig and TCR repertoire diversity and is applicable for improving annotations in any model organism.
Affiliation(s)
- Hayden N Brochu
- Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC 27607
- Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695
- Elise Smith
- Department of Immunology, University of Washington, Seattle, WA 98109
- Matthew J Thomas
- Department of Immunology, University of Washington, Seattle, WA 98109
- Center for Innate Immunity and Immune Diseases, University of Washington, Seattle, WA 98109
- Aiden M Jones
- Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC 27607
- Genetics Graduate Program, North Carolina State University, Raleigh, NC 27695
- Kayleigh R Diveley
- Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC 27607
- Genetics Graduate Program, North Carolina State University, Raleigh, NC 27695
- Lynn Law
- Department of Immunology, University of Washington, Seattle, WA 98109
- Center for Innate Immunity and Immune Diseases, University of Washington, Seattle, WA 98109
- Scott G Hansen
- Vaccine and Gene Therapy Institute, Oregon Health & Science University, Beaverton, OR 97006
- Louis J Picker
- Vaccine and Gene Therapy Institute, Oregon Health & Science University, Beaverton, OR 97006
- Michael Gale
- Department of Immunology, University of Washington, Seattle, WA 98109
- Center for Innate Immunity and Immune Diseases, University of Washington, Seattle, WA 98109
- Washington National Primate Research Center, University of Washington, Seattle, WA 98121
- Xinxia Peng
- Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC 27607
- Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695
15
Lees W, Busse CE, Corcoran M, Ohlin M, Scheepers C, Matsen FA, Yaari G, Watson CT, Collins A, Shepherd AJ. OGRDB: a reference database of inferred immune receptor genes. Nucleic Acids Res 2020; 48:D964-D970. [PMID: 31566225 PMCID: PMC6943078 DOI: 10.1093/nar/gkz822] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/11/2019] [Revised: 09/05/2019] [Accepted: 09/16/2019] [Indexed: 12/20/2022] Open
Abstract
High-throughput sequencing of the adaptive immune receptor repertoire (AIRR-seq) is providing unprecedented insights into the immune response to disease and into the development of immune disorders. The accurate interpretation of AIRR-seq data depends on the existence of comprehensive germline gene reference sets. Current sets are known to be incomplete and unrepresentative of the degree of polymorphism and diversity in human and animal populations. A key issue is the complexity of the genomic regions in which they lie, which, because of the presence of multiple repeats, insertions and deletions, have not proved tractable with short-read whole genome sequencing. Recently, tools and methods for inferring such gene sequences from AIRR-seq datasets have become available, and a community approach has been developed for the expert review and publication of such inferences. Here, we present OGRDB, the Open Germline Receptor Database (https://ogrdb.airr-community.org), a public resource for the submission, review and publication of previously unknown receptor germline sequences together with supporting evidence.
Affiliation(s)
- William Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, UK
- Christian E Busse
- Division of B Cell Immunology, German Cancer Research Center, 69120 Heidelberg, Germany
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Box 280, 171 77 Stockholm, Sweden
- Mats Ohlin
- Department of Immunotechnology, Lund University, Medicon Village, S-223 81 Lund, Sweden
- Cathrine Scheepers
- Center for HIV and STIs, National Institute for Communicable Diseases of the National Health Laboratory Service, Sandringham, Gauteng 2131, South Africa
- Antibody Immunity Research Unit, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Frederick A Matsen
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
- Andrew Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Adrian J Shepherd
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, UK
16
Bhardwaj V, Franceschetti M, Rao R, Pevzner PA, Safonova Y. Automated analysis of immunosequencing datasets reveals novel immunoglobulin D genes across diverse species. PLoS Comput Biol 2020; 16:e1007837. [PMID: 32339161 PMCID: PMC7295240 DOI: 10.1371/journal.pcbi.1007837] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/20/2019] [Revised: 06/15/2020] [Accepted: 04/01/2020] [Indexed: 12/30/2022] Open
Abstract
Immunoglobulin genes are formed through V(D)J recombination, which joins the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics focuses on finding alleles of germline genes across various patients. Although reconstruction of V and J genes is a well-studied problem, the more challenging task of reconstructing D genes remained open until the IgScout algorithm was developed in 2019. In this work, we address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction, apply it to hundreds of immunosequencing datasets from multiple species, and validate the newly inferred D genes by analyzing diverse whole genome sequencing datasets and haplotyping heterozygous V genes.
Antibodies provide specific binding to an enormous range of antigens and represent a key component of the adaptive immune system. Immunosequencing has emerged as a method of choice for generating millions of reads that sample antibody repertoires and provides insights into monitoring immune response to disease and vaccination. Most of the previous immunogenomics studies rely on the reference germline genes in the immunoglobulin locus rather than the germline genes in a specific patient. This approach is deficient since the set of known germline genes is incomplete (particularly for non-European humans and non-human species) and contains alleles that resulted from sequencing and annotation errors. The problem of de novo inference of diversity (D) genes from immunosequencing data remained open until the IgScout algorithm was developed in 2019. We address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction and infer multiple D genes across multiple species that are not present in standard databases.
Affiliation(s)
- Vinnu Bhardwaj
- Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Massimo Franceschetti
- Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Ramesh Rao
- Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Qualcomm Institute, University of California San Diego, San Diego, California, United States of America
- Pavel A. Pevzner
- Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
- Yana Safonova
- Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
- Center for Information Theory and Applications, University of California San Diego, San Diego, California, United States of America