Olson BJ, Moghimi P, Schramm CA, Obraztsova A, Ralph D, Vander Heiden JA, Shugay M, Shepherd AJ, Lees W, Matsen FA. sumrep: A Summary Statistic Framework for Immune Receptor Repertoire Comparison and Model Validation. Front Immunol 2019; 10:2533. [PMID: 31736960 PMCID: PMC6838214 DOI: 10.3389/fimmu.2019.02533] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/19/2019] [Accepted: 10/11/2019] [Indexed: 12/28/2022]
Abstract
The adaptive immune system generates an incredible diversity of antigen receptors for B and T cells to keep dangerous pathogens at bay. The DNA sequences coding for these receptors arise by a complex recombination process followed by a series of productivity-based filters, as well as affinity maturation for B cells, giving considerable diversity to the circulating pool of receptor sequences. Although these datasets hold considerable promise for medical and public health applications, the complex structure of the resulting adaptive immune receptor repertoire sequencing (AIRR-seq) datasets makes analysis difficult. In this paper we introduce sumrep, an R package that efficiently performs a wide variety of repertoire summaries and comparisons, and show how sumrep can be used to perform model validation. We find that summaries vary in their ability to differentiate between datasets, although many are able to distinguish between covariates such as donor, timepoint, and cell type for BCR and TCR repertoires. We show that deletion and insertion lengths resulting from V(D)J recombination tend to be more discriminative characterizations of a repertoire than summaries that describe the amino acid composition of the CDR3 region. We also find that state-of-the-art generative models excel at recapitulating gene usage and recombination statistics in a given experimental repertoire, but struggle to capture many physiochemical properties of real repertoires.
Affiliation(s)
- Branden J Olson: Fred Hutchinson Cancer Research Center, Seattle, WA, United States; Department of Statistics, University of Washington, Seattle, WA, United States
- Pejvak Moghimi: Department of Biological Sciences, Institute of Structural and Molecular Biology, Birkbeck, University of London, London, United Kingdom
- Chaim A Schramm: Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, United States
- Anna Obraztsova: Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia; Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Duncan Ralph: Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Jason A Vander Heiden: Department of Bioinformatics and Computational Biology, Genentech, Inc., South San Francisco, CA, United States
- Mikhail Shugay: Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia; Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia
- Adrian J Shepherd: Department of Biological Sciences, Institute of Structural and Molecular Biology, Birkbeck, University of London, London, United Kingdom
- William Lees: Department of Biological Sciences, Institute of Structural and Molecular Biology, Birkbeck, University of London, London, United Kingdom