1. Pelissier A, Luo S, Stratigopoulou M, Guikema JEJ, Rodríguez Martínez M. Exploring the impact of clonal definition on B-cell diversity: implications for the analysis of immune repertoires. Front Immunol 2023; 14:1123968. [PMID: 37138881 PMCID: PMC10150052 DOI: 10.3389/fimmu.2023.1123968] [Received: 12/14/2022] [Accepted: 03/13/2023]
Abstract
The adaptive immune system has the extraordinary ability to produce a broad range of immunoglobulins that can bind a wide variety of antigens. During adaptive immune responses, activated B cells proliferate and undergo somatic hypermutation in their B-cell receptor (BCR) genes, resulting in clonal families of diversified B cells that can be traced back to a common ancestor. Advances in sequencing technologies have enabled the high-throughput characterization of B-cell repertoires; however, the accurate identification of clonally related BCR sequences remains a major challenge. In this study, we compare three different clone identification methods on both simulated and experimental data and investigate their impact on the characterization of B-cell diversity. We observe that different methods lead to different clonal definitions, which affects the quantification of clonal diversity in repertoire data. Our analyses show that the clonal clusterings and clonal diversity of different repertoires should not be compared directly if different clone identification methods were used to define the clones. Despite this variability, the diversity indices inferred from the repertoires' clonal characterization across samples show similar patterns of variation regardless of the clonal identification method used. We find the Shannon entropy to be the most robust in terms of the variability of diversity rank across samples. Our analysis also suggests that the traditional germline gene alignment-based method for clonal identification remains the most accurate when the complete sequence is known, but that alignment-free methods may be preferred for shorter sequencing read lengths. We make our implementation freely available as the Python library cdiversity.
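To make the diversity comparison concrete, here is a minimal Python sketch of the Shannon entropy of a clone-size distribution, computed from one clone label per sequence. It is not taken from cdiversity; the function name `shannon_entropy`, the natural-log base, and the two toy clusterings are assumptions for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(clone_labels):
    """Shannon entropy H = -sum(p_i * ln p_i) over clone frequencies p_i.

    clone_labels: one clone identifier per sequence (hypothetical input layout).
    """
    counts = Counter(clone_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Two hypothetical clonal assignments of the same six sequences,
# standing in for the output of two different clone identification methods.
method_a = ["c1", "c1", "c2", "c2", "c3", "c3"]
method_b = ["c1", "c1", "c1", "c1", "c2", "c3"]
print(round(shannon_entropy(method_a), 3))  # prints 1.099 (= ln 3)
print(round(shannon_entropy(method_b), 3))  # prints 0.868
```

The same six sequences score differently under the two hypothetical clusterings, which is why diversity values obtained with different clone definitions are hard to compare directly.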
Affiliation(s)
- Aurelien Pelissier
  - IBM Research Europe, Rüschlikon, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Siyuan Luo
  - IBM Research Europe, Rüschlikon, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Maria Stratigopoulou
  - Department of Pathology, Amsterdam University Medical Centers, location AMC, Lymphoma and Myeloma Center Amsterdam (LYMMCARE), Amsterdam, Netherlands
- Jeroen E. J. Guikema
  - Department of Pathology, Amsterdam University Medical Centers, location AMC, Lymphoma and Myeloma Center Amsterdam (LYMMCARE), Amsterdam, Netherlands
2. Musen MA, O'Connor MJ, Schultes E, Martínez-Romero M, Hardi J, Graybeal J. Modeling community standards for metadata as templates makes data FAIR. Sci Data 2022; 9:696. [PMID: 36371407 PMCID: PMC9653497 DOI: 10.1038/s41597-022-01815-3] [Received: 08/05/2022] [Accepted: 11/01/2022]
Abstract
It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets, both because the templates serve as a community reference for what constitutes FAIR data and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.
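A toy illustration of the template idea, assuming a made-up template format (a dict of required flags and optional controlled vocabularies) rather than the actual CEDAR template model: each metadata record is scored by the share of template fields it satisfies, loosely in the spirit of checking adherence to a community standard. The names `TEMPLATE` and `evaluate` and all field values are hypothetical.

```python
# Hypothetical template format; NOT the CEDAR template model.
# Each field states whether it is required and, optionally, a controlled vocabulary.
TEMPLATE = {
    "organism": {"required": True, "allowed": {"Homo sapiens", "Mus musculus"}},
    "tissue": {"required": True, "allowed": None},  # free text, but must be present
    "sequencing_platform": {"required": False, "allowed": {"Illumina MiSeq", "Illumina NovaSeq"}},
}


def evaluate(metadata, template):
    """Return (fraction of template fields that pass, list of problems) for one record."""
    problems = []
    for field, rule in template.items():
        value = metadata.get(field)
        if value in (None, ""):
            # Missing values only count as a problem for required fields.
            if rule["required"]:
                problems.append(f"missing required field '{field}'")
            continue
        if rule["allowed"] is not None and value not in rule["allowed"]:
            problems.append(f"'{field}' has non-standard value '{value}'")
    # Each field contributes at most one problem, so this counts passing fields.
    passed = len(template) - len(problems)
    return passed / len(template), problems


score, issues = evaluate({"organism": "human", "tissue": "blood"}, TEMPLATE)
print(score, issues)
```

Here the free-text value "human" is flagged because it is outside the template's controlled vocabulary, while the absent optional field is not penalized.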
Affiliation(s)
- Mark A Musen
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
- Martin J O'Connor
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
- Erik Schultes
  - GO FAIR Foundation, Rijnsburgerweg 10, 2333 AA, Leiden, Netherlands
- Marcos Martínez-Romero
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
  - Acubed Innovation Center, 601 West California Avenue, Sunnyvale, CA, 94086, USA
- Josef Hardi
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
- John Graybeal
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
3. Corrie BD, Christley S, Busse CE, Cowell LG, Neller KCM, Rubelt F, Schwab N. Data Sharing and Reuse: A Method by the AIRR Community. Methods Mol Biol 2022; 2453:447-476. [PMID: 35622339 PMCID: PMC9761493 DOI: 10.1007/978-1-0716-2115-8_23]
Abstract
High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to study the adaptive immune response via large-scale experiments. Since 2009, AIRR sequencing (AIRR-seq) has been widely applied to survey the immune state of individuals (see "The AIRR Community Guide to Repertoire Analysis" chapter for details). One of the goals of the AIRR Community is to make the resulting AIRR-seq data FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. Sci Data 3:1-9, 2016), with a primary goal of making it easy for the research community to reuse AIRR-seq data (Breden et al. Front Immunol 8:1418, 2017; Scott and Breden. Curr Opin Syst Biol 24:71-77, 2020). The basis for this is the MiAIRR data standard (Rubelt et al. Nat Immunol 18:1274-1278, 2017). For long-term preservation, it is recommended that researchers store their sequence read data in an INSDC repository. At the same time, the AIRR Community has established the AIRR Data Commons (Christley et al. Front Big Data 3:22, 2020), a distributed set of AIRR-compliant repositories that store the critically important annotated AIRR-seq data based on the MiAIRR standard, making the data findable, interoperable, and, because the data are annotated, more valuable for reuse. Here, we build on the other AIRR Community chapters and illustrate how these principles and standards can be incorporated into AIRR-seq data analysis workflows. We discuss the importance of careful curation of metadata to ensure reproducibility and facilitate data sharing and reuse, and we illustrate how data can be shared via the AIRR Data Commons.
Affiliation(s)
- Brian D Corrie
  - Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Scott Christley
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA
- Lindsay G Cowell
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA
  - Department of Immunology, UT Southwestern Medical Center, Dallas, TX, USA
- Kira C M Neller
  - Health Sciences, Simon Fraser University, Burnaby, BC, Canada
- Nicholas Schwab
  - Department of Neurology with Institute of Translational Neurology, University of Muenster, Muenster, Germany
4. Lindenbaum O, Nouri N, Kluger Y, Kleinstein SH. Alignment free identification of clones in B cell receptor repertoires. Nucleic Acids Res 2021; 49:e21. [PMID: 33330933 PMCID: PMC7913774 DOI: 10.1093/nar/gkaa1160] [Received: 04/06/2020] [Revised: 11/10/2020] [Accepted: 11/13/2020]
Abstract
Following antigenic challenge, activated B cells rapidly expand and undergo somatic hypermutation, yielding groups of clonally related B cells with diversified immunoglobulin receptors. Inference of clonal relationships based on the receptor sequence is an essential step in many adaptive immune receptor repertoire sequencing studies. These relationships are typically identified by a multi-step process that involves (i) grouping sequences by shared V and J gene assignments and junction length, and (ii) clustering these sequences using a junction-based distance. However, this approach is sensitive to the initial gene assignments, which are error-prone, and fails to identify clonal relatives whose junction length has changed through the accumulation of indels. By defining a translation-invariant feature space in which the sequences are clustered, we develop an alignment-free clonal identification method that does not require gene assignments and is not restricted to a fixed junction length. This alignment-free approach has higher sensitivity than a typical junction-based distance method, without loss of specificity or positive predictive value (PPV). While the alignment-free procedure identifies clones that are broadly consistent with those of the junction-based distance method, it also identifies clones with characteristics (multiple V or J gene assignments or junction lengths) that the junction-based distance method cannot detect.
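The conventional two-step junction-based method that the abstract uses as its baseline (not the alignment-free method it proposes) can be sketched as follows. The input tuple layout, the 0.15 normalized Hamming threshold, single-linkage clustering, and the gene names and junctions are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length junctions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find(parent, i):
    """Return the root of i in a union-find forest, halving paths as it goes."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def junction_clones(records, threshold=0.15):
    """Assign clone IDs with a conventional two-step junction-based approach.

    records: list of (seq_id, v_gene, j_gene, junction) tuples (hypothetical layout).
    threshold: maximum normalized Hamming distance for linking two sequences (assumed value).
    Returns a dict mapping seq_id -> integer clone ID.
    """
    # Step (i): partition sequences by shared V gene, J gene and junction length.
    groups = defaultdict(list)
    for seq_id, v_gene, j_gene, junction in records:
        groups[(v_gene, j_gene, len(junction))].append((seq_id, junction))

    clone_of = {}
    next_clone = 0
    for members in groups.values():
        # Step (ii): single-linkage clustering inside each group via union-find.
        parent = list(range(len(members)))
        for i, k in combinations(range(len(members)), 2):
            if hamming_fraction(members[i][1], members[k][1]) <= threshold:
                parent[find(parent, i)] = find(parent, k)

        # Give each connected component its own clone ID.
        root_to_clone = {}
        for idx, (seq_id, _) in enumerate(members):
            root = find(parent, idx)
            if root not in root_to_clone:
                root_to_clone[root] = next_clone
                next_clone += 1
            clone_of[seq_id] = root_to_clone[root]
    return clone_of


# Invented example data; gene names and junctions are placeholders.
records = [
    ("s1", "IGHV1-2", "IGHJ4", "TGTGCGAGAT"),
    ("s2", "IGHV1-2", "IGHJ4", "TGTGCGAGTT"),     # 1 mismatch vs s1 (distance 0.1)
    ("s3", "IGHV3-23", "IGHJ4", "TGTGCGAGAT"),    # same junction, different V call
    ("s4", "IGHV1-2", "IGHJ4", "TGTGCGGGCAGAT"),  # s1 with a hypothetical 3-nt insertion
]
print(junction_clones(records))  # {'s1': 0, 's2': 0, 's3': 1, 's4': 2}
```

Sequence s4 stands in for a relative of s1 that acquired an insertion; because its junction length differs, step (i) places it in a separate group and it can never join s1's clone. Likewise, s3 is split off purely by its different V call. These are the two failure modes the abstract attributes to the junction-based approach.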
Affiliation(s)
- Ofir Lindenbaum
  - Program in Applied Mathematics, Yale University, New Haven, CT, USA
- Nima Nouri
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Center for Medical Informatics, Yale University, New Haven, CT 06511, USA
- Yuval Kluger
  - Program in Applied Mathematics, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Steven H Kleinstein
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
5. Prechl J. Network Organization of Antibody Interactions in Sequence and Structure Space: the RADARS Model. Antibodies (Basel) 2020; 9:antib9020013. [PMID: 32384800 PMCID: PMC7345901 DOI: 10.3390/antib9020013] [Received: 03/16/2020] [Revised: 04/09/2020] [Accepted: 04/15/2020]
Abstract
Adaptive immunity in vertebrates is a complex self-organizing network of molecular interactions. While deep sequencing of the immune-receptor repertoire may reveal clonal relationships, functional interpretation of such data is hampered by the inherent limitations of converting sequence to structure to function. In this paper, a novel model of antibody interaction space and network, termed RAdial ADjustment of System Resolution (RADARS), is proposed. The model is based on the radial growth of the interaction affinity of antibodies towards an infinite number of directions in structure space, each direction corresponding to particular shapes of antigen epitopes. Levels of interaction affinity appear as free energy shells of the system, where hierarchical B-cell development and differentiation take place. Equilibrium in this immunological thermodynamic system can be described by a power law distribution of antibody free energies with an ideal network degree exponent of phi squared (φ²), representing a scale-free fractal network of antibody interactions. Plasma cells are network hubs, memory B cells are nodes with intermediate degrees, and B1 cells function as nodes with minimal degree. Overall, the RADARS model implies that a finite number of antibody structures can interact with an infinite number of antigens by immunologically controlled adjustment of the interaction energy distribution. Understanding the quantitative network properties of the system should help organize sequence-derived predicted structural data.
Affiliation(s)
- József Prechl
  - Diagnosticum Zrt., 126. Attila u., 1047 Budapest, Hungary
6. Davis MM, Boyd SD. Recent progress in the analysis of αβT cell and B cell receptor repertoires. Curr Opin Immunol 2019; 59:109-114. [PMID: 31326777 PMCID: PMC7075470 DOI: 10.1016/j.coi.2019.05.012] [Received: 05/20/2019] [Accepted: 05/28/2019]
Abstract
T cell receptors (TCRs) and B cell receptors (BCRs) are vertebrate evolution's best answer to the threat of microbial pathogens that can evolve much faster than ourselves. These antigen receptors are generated during T cell or B cell development by combinatorial rearrangement of germline genome V, D and J gene segments, and with junctional residues capable of enormous diversity. For decades the complexity of these receptor repertoires has limited their analysis, but advances in DNA sequencing technology and an array of complementary tools have now made their study much more tractable, filling a major gap in our ability to understand immunology as a system. Here, we summarize the recent approaches and discoveries that are enabling these advances, with some suggestions as to what may lie ahead.
Affiliation(s)
- Mark M Davis
  - Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
  - The Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Scott D Boyd
  - Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
  - The Sean N. Parker Center for Allergy and Asthma Research at Stanford University, Stanford, CA, USA
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
7. Martínez-Romero M, O'Connor MJ, Egyedi AL, Willrett D, Hardi J, Graybeal J, Musen MA. Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases. Database (Oxford) 2019; 2019:baz059. [PMID: 31210270 PMCID: PMC6866600 DOI: 10.1093/database/baz059] [Received: 01/09/2019] [Revised: 03/21/2019] [Accepted: 04/15/2019]
Abstract
Metadata, the machine-readable descriptions of data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform and evaluated it using metadata from two public biomedical repositories: the US-based National Center for Biotechnology Information (NCBI) BioSample and the European Bioinformatics Institute (EBI) BioSamples. The results show that our approach is able to use analyses of previously entered metadata, coupled with ontology-based mappings, to present users with accurate recommendations when authoring metadata.
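The core association-rule quantities, support and confidence, can be sketched in a few lines. This is a deliberately simplified, hypothetical version rather than the CEDAR service: rules have a single field=value antecedent, there is no frequent-itemset mining (such as Apriori) and no ontology mapping, and the metadata records are invented.

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule `antecedent -> consequent`.

    records: list of dicts mapping metadata field -> value (invented data).
    antecedent, consequent: (field, value) pairs.
    """
    matches_a = [r for r in records if r.get(antecedent[0]) == antecedent[1]]
    matches_both = [r for r in matches_a if r.get(consequent[0]) == consequent[1]]
    support = len(matches_both) / len(records) if records else 0.0
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence


def recommend(records, field, context, min_confidence=0.5):
    """Rank candidate values for `field` given (field, value) pairs already entered.

    Each candidate is scored by its best single-antecedent rule confidence
    (a simplification; real systems may mine multi-item rules and combine them).
    """
    candidates = {r[field] for r in records if field in r}
    ranked = []
    for value in candidates:
        best = max(
            (rule_stats(records, pair, (field, value))[1] for pair in context),
            default=0.0,
        )
        if best >= min_confidence:
            ranked.append((value, best))
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Invented, BioSample-like metadata records.
records = [
    {"organism": "Homo sapiens", "tissue": "blood", "cell_type": "B cell"},
    {"organism": "Homo sapiens", "tissue": "blood", "cell_type": "T cell"},
    {"organism": "Homo sapiens", "tissue": "blood", "cell_type": "B cell"},
    {"organism": "Mus musculus", "tissue": "spleen", "cell_type": "B cell"},
]
print(rule_stats(records, ("tissue", "blood"), ("cell_type", "B cell")))
print(recommend(records, "cell_type", context=[("tissue", "blood")]))
```

With "tissue = blood" already entered, "B cell" is suggested (confidence 2/3), while "T cell" falls below the assumed 0.5 cut-off.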
Affiliation(s)
- Marcos Martínez-Romero
  - Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Stanford University School of Medicine, Stanford, CA 94305-5479, USA
- Martin J O'Connor
  - Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Stanford University School of Medicine, Stanford, CA 94305-5479, USA
- Attila L Egyedi
  - Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Stanford University School of Medicine, Stanford, CA 94305-5479, USA
- Debra Willrett
  - Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Stanford University School of Medicine, Stanford, CA 94305-5479, USA
- Josef Hardi
  - Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Stanford University School of Medicine, Stanford, CA 94305-5479, USA
- John Graybeal
  - Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Stanford University School of Medicine, Stanford, CA 94305-5479, USA
- Mark A Musen
  - Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Stanford University School of Medicine, Stanford, CA 94305-5479, USA
8. Vander Heiden JA, Marquez S, Marthandan N, Bukhari SAC, Busse CE, Corrie B, Hershberg U, Kleinstein SH, Matsen IV FA, Ralph DK, Rosenfeld AM, Schramm CA, Christley S, Laserson U. AIRR Community Standardized Representations for Annotated Immune Repertoires. Front Immunol 2018; 9:2206. [PMID: 30323809 PMCID: PMC6173121 DOI: 10.3389/fimmu.2018.02206] [Received: 05/30/2018] [Accepted: 09/05/2018]
Abstract
Increased interest in the immune system's involvement in pathophysiological phenomena, coupled with decreased DNA sequencing costs, has led to an explosion of antibody and T cell receptor sequencing data collectively termed "adaptive immune receptor repertoire sequencing" (AIRR-seq or Rep-Seq). The AIRR Community has been actively working to standardize protocols, metadata, formats, APIs, and other guidelines to promote open and reproducible studies of the immune repertoire. In this paper, we describe the work of the AIRR Community's Data Representation Working Group to develop standardized data representations for storing and sharing annotated antibody and T cell receptor data. Our file format emphasizes ease of use, accessibility, scalability to large data sets, and a commitment to open and transparent science. It is a tab-delimited format with a specific schema. Several popular repertoire analysis tools and data repositories already use this AIRR-seq data format. We hope that others will follow suit in the interest of promoting interoperable standards.
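Because the format is plain tab-delimited text with a header row, the Python standard library is enough to read it. The excerpt below is invented; the column names `sequence_id`, `v_call`, `j_call`, `junction` and `productive` are, to our understanding, fields of the AIRR Rearrangement schema, but the schema itself should be consulted for the full field list and encoding rules. The AIRR Community also distributes reference libraries (for example the `airr` Python package) that parse and validate these files.

```python
import csv
import io

# A tiny, made-up excerpt: one header row, then one rearrangement per row.
AIRR_TSV = (
    "sequence_id\tv_call\tj_call\tjunction\tproductive\n"
    "seq1\tIGHV1-2*02\tIGHJ4*02\tTGTGCGAGAGATTGG\tT\n"
    "seq2\tIGHV3-23*01\tIGHJ6*02\tTGTGCGAAAGATTGG\tF\n"
)


def read_rearrangements(handle):
    """Yield one dict per row of a tab-delimited, AIRR-style rearrangement file."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Booleans are written as T/F in this toy excerpt; convert them to Python bools.
        row["productive"] = row["productive"] == "T"
        yield row


rows = list(read_rearrangements(io.StringIO(AIRR_TSV)))
productive = [r["sequence_id"] for r in rows if r["productive"]]
print(productive)  # ['seq1']
```

Reading from `io.StringIO` keeps the sketch self-contained; for a real file, pass `open(path, newline="")` instead.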
Affiliation(s)
- Susanna Marquez
  - Department of Pathology, Yale School of Medicine, New Haven, CT, United States
- Nishanth Marthandan
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Christian E. Busse
  - Division of B Cell Immunology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Brian Corrie
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Uri Hershberg
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
  - Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, United States
  - Department of Human Biology, Faculty of Sciences, University of Haifa, Haifa, Israel
- Steven H. Kleinstein
  - Department of Pathology, Yale School of Medicine, New Haven, CT, United States
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Duncan K. Ralph
  - Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Aaron M. Rosenfeld
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
- Chaim A. Schramm
  - Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, United States
- Scott Christley
  - Department of Clinical Sciences, UT Southwestern Medical Center, Dallas, TX, United States
- Uri Laserson
  - Department of Genetics and Genomic Sciences and Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, United States