1
Chomicz D, Kończak J, Wróbel S, Satława T, Dudzic P, Janusz B, Tarkowski M, Deszyński P, Gawłowski T, Kostyn A, Orłowski M, Klaus T, Schulte L, Martin K, Comeau SR, Krawczyk K. Benchmarking antibody clustering methods using sequence, structural, and machine learning similarity measures for antibody discovery applications. Front Mol Biosci 2024; 11:1352508. [PMID: 38606289 PMCID: PMC11008471 DOI: 10.3389/fmolb.2024.1352508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 02/09/2024] [Indexed: 04/13/2024]
Abstract
Antibodies are proteins produced by our immune system that have been harnessed as biotherapeutics. The discovery of antibody-based therapeutics relies on analyzing large volumes of diverse sequences coming from phage display or animal immunizations. Identification of suitable therapeutic candidates is achieved by grouping the sequences by their similarity and subsequent selection of a diverse set of antibodies for further tests. Such groupings are typically created using sequence-similarity measures alone. Maximizing diversity in selected candidates is crucial to reducing the number of tests of molecules with near-identical properties. With the advances in structural modeling and machine learning, antibodies can now be grouped across other diversity dimensions, such as predicted paratopes or three-dimensional structures. Here we benchmarked antibody grouping methods using clonotype, sequence, paratope prediction, structure prediction, and embedding information. The results were benchmarked on two tasks: binder detection and epitope mapping. We demonstrate that on binder detection no method appears to outperform the others, while on epitope mapping, clonotype, paratope, and embedding clusterings are top performers. Most importantly, all the methods propose orthogonal groupings, offering more diverse pools of candidates when using multiple methods than any single method alone. To facilitate exploring the diversity of antibodies using different methods, we have created an online tool, CLAP, available at clap.naturalantibody.com, that allows users to group, contrast, and visualize antibodies using the different grouping methods.
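As a rough illustration of the clonotype grouping mentioned in the abstract above, the sketch below buckets antibodies by V gene and CDR-H3 length and then greedily groups CDR-H3s by sequence identity. The abstract does not define clonotype, so this definition, the 0.8 threshold, the function names, and the toy records are all assumptions, and the code is not the procedure used in the paper or in CLAP.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Antibody = Tuple[str, str, str]  # (name, V gene, CDR-H3); hypothetical record layout


def cdr3_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length CDR-H3s."""
    if not a:
        return 1.0  # two empty strings are treated as identical
    return sum(x == y for x, y in zip(a, b)) / len(a)


def clonotype_clusters(antibodies: List[Antibody], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping: same V gene, same CDR-H3 length, identity >= threshold."""
    # Only antibodies sharing a V gene and CDR-H3 length are ever compared.
    buckets: Dict[Tuple[str, int], List[Antibody]] = defaultdict(list)
    for ab in antibodies:
        buckets[(ab[1], len(ab[2]))].append(ab)

    clusters: List[List[str]] = []
    for members in buckets.values():
        reps: List[Tuple[str, List[str]]] = []  # (representative CDR-H3, member names)
        for name, _v_gene, cdr3 in members:
            for rep_cdr3, names in reps:
                if cdr3_identity(cdr3, rep_cdr3) >= threshold:
                    names.append(name)
                    break
            else:
                # No existing representative is close enough: start a new cluster.
                reps.append((cdr3, [name]))
        clusters.extend(names for _rep, names in reps)
    return clusters


# Toy input (hypothetical sequences, for illustration only).
toy_antibodies = [
    ("ab1", "IGHV3-23", "ARDYYGMDVW"),
    ("ab2", "IGHV3-23", "ARDYYGLDVW"),  # one mismatch vs ab1
    ("ab3", "IGHV1-69", "ARDYYGMDVW"),  # same CDR-H3, different V gene
    ("ab4", "IGHV3-23", "AKGGSSWFDP"),  # same V gene and length, dissimilar CDR-H3
]
toy_clusters = clonotype_clusters(toy_antibodies)
```

Greedy assignment to the first matching representative depends on input order; a production pipeline would more likely use a proper linkage-based clustering.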
Affiliation(s)
- Sonia Wróbel
- NaturalAntibody, Szczecin, West Pomeranian, Poland
- Paweł Dudzic
- NaturalAntibody, Szczecin, West Pomeranian, Poland
- Marek Orłowski
- Pure Biologics, Wrocław, Poland
- Department of Biochemistry, Molecular Biology and Biotechnology, Faculty of Chemistry, Wrocław University of Science and Technology, Wrocław, Poland
- Lukas Schulte
- Global Computational Biology & Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany
- Kyle Martin
- Biotherapeutics Discovery, Boehringer Ingelheim, Biberach, Germany
2
Dudzic P, Chomicz D, Kończak J, Satława T, Janusz B, Wrobel S, Gawłowski T, Jaszczyszyn I, Bielska W, Demharter S, Spreafico R, Schulte L, Martin K, Comeau SR, Krawczyk K. Large-scale data mining of four billion human antibody variable regions reveals convergence between therapeutic and natural antibodies that constrains search space for biologics drug discovery. MAbs 2024; 16:2361928. [PMID: 38844871 PMCID: PMC11164219 DOI: 10.1080/19420862.2024.2361928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024]
Abstract
The naïve human antibody repertoire has theoretical access to an estimated > 10^15 antibodies. Identifying subsets of this prohibitively large space where therapeutically relevant antibodies may be found is useful for development of these agents. It was previously demonstrated that, despite the immense sequence space, different individuals can produce the same antibodies. It was also shown that therapeutic antibodies, which typically follow seemingly unnatural development processes, can arise independently naturally. To check for biases in how the sequence space is explored, we data mined public repositories to identify 220 bioprojects with a combined seven billion reads. Of these, we created a subset of human bioprojects that we make available as the AbNGS database (https://naturalantibody.com/ngs/). AbNGS contains 135 bioprojects with four billion productive human heavy variable region sequences and 385 million unique complementarity-determining region (CDR)-H3s. We find that 270,000 (0.07% of 385 million) unique CDR-H3s are highly public in that they occur in at least five of 135 bioprojects. Of 700 unique therapeutic CDR-H3, a total of 6% has direct matches in the small set of 270,000. This observation extends to a match between CDR-H3 and V-gene call as well. Thus, the subspace of shared ('public') CDR-H3s shows utility for serving as a starting point for therapeutic antibody design.
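The "highly public" criterion in the abstract above (a CDR-H3 that occurs in at least five of the 135 bioprojects) amounts to a counting step, sketched below together with the quoted 0.07% figure. This is a minimal illustration and not the AbNGS pipeline: the function name `public_cdrh3s`, the input layout (bioproject ID mapped to CDR-H3 strings), and the toy sequences are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def public_cdrh3s(projects: Dict[str, Iterable[str]], min_projects: int = 5) -> Set[str]:
    """Return CDR-H3s seen in at least `min_projects` distinct bioprojects."""
    counts: Counter = Counter()
    for cdrh3s in projects.values():
        # De-duplicate within a bioproject so each project counts at most once.
        counts.update(set(cdrh3s))
    return {seq for seq, n in counts.items() if n >= min_projects}


# Toy data (hypothetical project IDs and sequences); threshold lowered to 3.
toy_projects = {
    "P1": ["ARDY", "ARGGW", "ARDY"],
    "P2": ["ARDY", "AKVV"],
    "P3": ["ARDY", "ARGGW"],
}
shared = public_cdrh3s(toy_projects, min_projects=3)

# Figure quoted in the abstract: 270,000 of 385 million unique CDR-H3s.
public_percent = 100 * 270_000 / 385_000_000  # roughly 0.07
```

At the scale of hundreds of millions of sequences, an in-memory `Counter` would likely be replaced by an on-disk or distributed aggregation; the logic stays the same.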
Affiliation(s)
- Lukas Schulte
- Global Computational Biology & Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Kyle Martin
- Biotherapeutics Discovery, Boehringer Ingelheim, Ridgefield, CT, USA
- Stephen R. Comeau
- Biotherapeutics Discovery, Boehringer Ingelheim, Ridgefield, CT, USA
3
Olsen TH, Abanades B, Moal IH, Deane CM. KA-Search, a method for rapid and exhaustive sequence identity search of known antibodies. Sci Rep 2023; 13:11612. [PMID: 37463925 DOI: 10.1038/s41598-023-38108-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/17/2023] [Accepted: 07/03/2023] [Indexed: 07/20/2023]
Abstract
Antibodies with similar amino acid sequences, especially across their complementarity-determining regions, often share properties. Finding that an antibody of interest has a similar sequence to naturally expressed antibodies in healthy or diseased repertoires is a powerful approach for the prediction of antibody properties, such as immunogenicity or antigen specificity. However, as the number of available antibody sequences is now in the billions and continuing to grow, repertoire mining for similar sequences has become increasingly computationally expensive. Existing approaches are limited by either being low-throughput, non-exhaustive, not antibody specific, or only searching against entire chain sequences. Therefore, there is a need for a specialized tool, optimized for a rapid and exhaustive search of any antibody region against all known antibodies, to better utilize the full breadth of available repertoire sequences. We introduce Known Antibody Search (KA-Search), a tool that allows for the rapid search of billions of antibody variable domains by amino acid sequence identity across either the variable domain, the complementarity-determining regions, or a user defined antibody region. We show KA-Search in operation on the approximately 2.4 billion antibody sequences available in the OAS database. KA-Search can be used to find the most similar sequences from OAS within 30 minutes and a representative subset of 10 million sequences in less than 9 seconds. We give examples of how KA-Search can be used to obtain new insights about an antibody of interest. KA-Search is freely available at https://github.com/oxpig/kasearch.
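To make "exhaustive search by amino acid sequence identity" concrete, the sketch below scores a query against every entry of a small database and returns the closest hits. It is a brute-force illustration under stated assumptions: sequences are taken to be pre-aligned to a common numbering and therefore of equal length, and the names `identity` and `top_matches` are hypothetical. It does not use or reproduce the KA-Search API.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def top_matches(query: str, database: List[str], n: int = 2) -> List[Tuple[str, float]]:
    """Score every database entry (exhaustive) and return the n best hits."""
    scored = [(seq, identity(query, seq)) for seq in database]
    # sorted() is stable, so ties keep their database order.
    return sorted(scored, key=lambda hit: hit[1], reverse=True)[:n]


# Toy aligned fragments (hypothetical, not real antibody sequences).
toy_db = ["ARDYW", "ARDFW", "GKTTV"]
hits = top_matches("ARDYW", toy_db)
```

Restricting the comparison to a region (for example the CDRs) would, under the same assumption of a shared numbering, reduce to slicing both sequences to the relevant positions before calling `identity`.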
Affiliation(s)
- Tobias H Olsen
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
- Brennan Abanades
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
- Iain H Moal
- GSK Medicines Research Centre, GlaxoSmithKline plc, Stevenage, SG1 2NY, UK
- Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
- Exscientia plc, Oxford, OX4 4GE, UK
4
Jaszczyszyn I, Bielska W, Gawlowski T, Dudzic P, Satława T, Kończak J, Wilman W, Janusz B, Wróbel S, Chomicz D, Galson JD, Leem J, Kelm S, Krawczyk K. Structural modeling of antibody variable regions using deep learning - progress and perspectives on drug discovery. Front Mol Biosci 2023; 10:1214424. [PMID: 37484529 PMCID: PMC10361724 DOI: 10.3389/fmolb.2023.1214424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023]
Abstract
AlphaFold2 has hallmarked a generational improvement in protein structure prediction. In particular, advances in antibody structure prediction have provided a highly translatable impact on drug discovery. Though AlphaFold2 laid the groundwork for all proteins, antibody-specific applications require adjustments tailored to these molecules, which has resulted in a handful of deep learning antibody structure predictors. Herein, we review the recent advances in antibody structure prediction and relate them to their role in advancing biologics discovery.
Affiliation(s)
- Igor Jaszczyszyn
- NaturalAntibody, Kraków, Poland
- Medical University of Warsaw, Warsaw, Poland
- Weronika Bielska
- NaturalAntibody, Kraków, Poland
- Medical University of Lodz, Lodz, Poland
- Jinwoo Leem
- Alchemab Therapeutics Ltd., London, United Kingdom
5
Wilman W, Wróbel S, Bielska W, Deszynski P, Dudzic P, Jaszczyszyn I, Kaniewski J, Młokosiewicz J, Rouyan A, Satława T, Kumar S, Greiff V, Krawczyk K. Machine-designed biotherapeutics: opportunities, feasibility and advantages of deep learning in computational antibody discovery. Brief Bioinform 2022; 23:6643456. [PMID: 35830864 PMCID: PMC9294429 DOI: 10.1093/bib/bbac267] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/14/2022] [Revised: 05/09/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022]
Abstract
Antibodies are versatile molecular binders with an established and growing role as therapeutics. Computational approaches to developing and designing these molecules are being increasingly used to complement traditional lab-based processes. Nowadays, in silico methods fill multiple elements of the discovery stage, such as characterizing antibody–antigen interactions and identifying developability liabilities. Recently, computational methods tackling such problems have begun to follow machine learning paradigms, in many cases deep learning specifically. This paradigm shift offers improvements in established areas such as structure or binding prediction and opens up new possibilities such as language-based modeling of antibody repertoires or machine-learning-based generation of novel sequences. In this review, we critically examine the recent developments in (deep) machine learning approaches to therapeutic antibody design with implications for fully computational antibody design.
6
Khetan R, Curtis R, Deane CM, Hadsund JT, Kar U, Krawczyk K, Kuroda D, Robinson SA, Sormanni P, Tsumoto K, Warwicker J, Martin ACR. Current advances in biopharmaceutical informatics: guidelines, impact and challenges in the computational developability assessment of antibody therapeutics. MAbs 2022; 14:2020082. [PMID: 35104168 PMCID: PMC8812776 DOI: 10.1080/19420862.2021.2020082] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 01/03/2023]
Abstract
Therapeutic monoclonal antibodies and their derivatives are key components of clinical pipelines in the global biopharmaceutical industry. The availability of large datasets of antibody sequences, structures, and biophysical properties is increasingly enabling the development of predictive models and computational tools for the "developability assessment" of antibody drug candidates. Here, we provide an overview of the antibody informatics tools applicable to the prediction of developability issues such as stability, aggregation, immunogenicity, and chemical degradation. We further evaluate the opportunities and challenges of using biopharmaceutical informatics for drug discovery and optimization. Finally, we discuss the potential of developability guidelines based on in silico metrics that can be used for the assessment of antibody stability and manufacturability.
Affiliation(s)
- Rahul Khetan
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Robin Curtis
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Uddipan Kar
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Daisuke Kuroda
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan; Medical Device Development and Regulation Research Center, School of Engineering, The University of Tokyo, Tokyo, Japan; Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, Tokyo, Japan
- Pietro Sormanni
- Chemistry of Health, Yusuf Hamied Department of Chemistry, University of Cambridge
- Kouhei Tsumoto
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan; Medical Device Development and Regulation Research Center, School of Engineering, The University of Tokyo, Tokyo, Japan; Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, Tokyo, Japan; The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Jim Warwicker
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Andrew C R Martin
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
7
Robustification of RosettaAntibody and Rosetta SnugDock. PLoS One 2021; 16:e0234282. [PMID: 33764990 PMCID: PMC7993800 DOI: 10.1371/journal.pone.0234282] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/18/2020] [Accepted: 01/11/2021] [Indexed: 11/19/2022]
Abstract
In recent years, the observed antibody sequence space has grown exponentially due to advances in high-throughput sequencing of immune receptors. The rise in sequences has not been mirrored by a rise in structures, as experimental structure determination techniques have remained low-throughput. Computational modeling, however, has the potential to close the sequence–structure gap. To achieve this goal, computational methods must be robust, fast, easy to use, and accurate. Here we report on the latest advances made in RosettaAntibody and Rosetta SnugDock—methods for antibody structure prediction and antibody–antigen docking. We simplified the user interface, expanded and automated the template database, generalized the kinematics of antibody–antigen docking (which enabled modeling of single-domain antibodies) and incorporated new loop modeling techniques. To evaluate the effects of our updates on modeling accuracy, we developed rigorous tests under a new scientific benchmarking framework within Rosetta. Benchmarking revealed that more structurally similar templates could be identified in the updated database and that SnugDock broadened its applicability without losing accuracy. However, there are further advances to be made, including increasing the accuracy and speed of CDR-H3 loop modeling, before computational approaches can accurately model any antibody.
8
Raybould MIJ, Marks C, Kovaltsuk A, Lewis AP, Shi J, Deane CM. Public Baseline and shared response structures support the theory of antibody repertoire functional commonality. PLoS Comput Biol 2021; 17:e1008781. [PMID: 33647011 PMCID: PMC7951972 DOI: 10.1371/journal.pcbi.1008781] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/21/2020] [Revised: 03/11/2021] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
The naïve antibody/B-cell receptor (BCR) repertoires of different individuals ought to exhibit significant functional commonality, given that most pathogens trigger an effective antibody response to immunodominant epitopes. Sequence-based repertoire analysis has so far offered little evidence for this phenomenon. For example, a recent study estimated the number of shared ('public') antibody clonotypes in circulating baseline repertoires to be around 0.02% across ten unrelated individuals. However, to engage the same epitope, antibodies only require a similar binding site structure and the presence of key paratope interactions, which can occur even when their sequences are dissimilar. Here, we search for evidence of geometric similarity/convergence across human antibody repertoires. We first structurally profile naïve ('baseline') antibody diversity using snapshots from 41 unrelated individuals, predicting all modellable distinct structures within each repertoire. This analysis uncovers a high (much greater than random) degree of structural commonality. For instance, around 3% of distinct structures are common to the ten most diverse individual samples ('Public Baseline' structures). Our approach is the first computational method to find levels of BCR commonality commensurate with epitope immunodominance and could therefore be harnessed to find more genetically distant antibodies with same-epitope complementarity. We then apply the same structural profiling approach to repertoire snapshots from three individuals before and after flu vaccination, detecting a convergent structural drift indicative of recognising similar epitopes ('Public Response' structures). We show that Antibody Model Libraries derived from Public Baseline and Public Response structures represent a powerful geometric basis set of low-immunogenicity candidates exploitable for general or target-focused therapeutic antibody screening.
Affiliation(s)
- Matthew I. J. Raybould
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom
- Claire Marks
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom
- Aleksandr Kovaltsuk
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom
- Alan P. Lewis
- Data and Computational Sciences, GlaxoSmithKline Research and Development, Stevenage, United Kingdom
- Jiye Shi
- Chemistry Department, UCB Pharma, Slough, United Kingdom
- Charlotte M. Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom
9
Raybould MIJ, Rees AR, Deane CM. Current strategies for detecting functional convergence across B-cell receptor repertoires. MAbs 2021; 13:1996732. [PMID: 34781829 PMCID: PMC8604390 DOI: 10.1080/19420862.2021.1996732] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/29/2021] [Revised: 10/10/2021] [Accepted: 10/12/2021] [Indexed: 12/11/2022]
Abstract
Convergence across B-cell receptor (BCR) and antibody repertoires has become instrumental in prioritizing candidates in recent rapid therapeutic antibody discovery campaigns. It has also increased our understanding of the immune system, providing evidence for the preferential selection of BCRs to particular (immunodominant) epitopes post vaccination/infection. These important implications for both drug discovery and immunology mean that it is essential to consider the optimal way to combine experimental and computational technology when probing BCR repertoires for convergence signatures. Here, we discuss the theoretical basis for observing BCR repertoire functional convergence and explore factors of study design that can impact functional signal. We also review the computational arsenal available to detect antibodies with similar functional properties, highlighting opportunities enabled by recent clustering algorithms that exploit structural similarities between BCRs. Finally, we suggest future areas of development that should increase the power of BCR repertoire functional clustering.
Affiliation(s)
- Matthew I. J. Raybould
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Charlotte M. Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
10
Norman RA, Ambrosetti F, Bonvin AMJJ, Colwell LJ, Kelm S, Kumar S, Krawczyk K. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief Bioinform 2020; 21:1549-1567. [PMID: 31626279 PMCID: PMC7947987 DOI: 10.1093/bib/bbz095] [Citation(s) in RCA: 106] [Impact Index Per Article: 26.5] [Received: 04/25/2019] [Revised: 06/07/2019] [Accepted: 07/05/2019] [Indexed: 12/31/2022]
Abstract
Antibodies are proteins that recognize the molecular surfaces of potentially noxious molecules to mount an adaptive immune response or, in the case of autoimmune diseases, molecules that are part of healthy cells and tissues. Due to their binding versatility, antibodies are currently the largest class of biotherapeutics, with five monoclonal antibodies ranked in the top 10 blockbuster drugs. Computational advances in protein modelling and design can have a tangible impact on antibody-based therapeutic development. Antibody-specific computational protocols currently benefit from an increasing volume of data provided by next generation sequencing and application to related drug modalities based on traditional antibodies, such as nanobodies. Here we present a structured overview of available databases, methods and emerging trends in computational antibody analysis and contextualize them towards the engineering of candidate antibody therapeutics.
11
Ghraichy M, Galson JD, Kovaltsuk A, von Niederhäusern V, Pachlopnik Schmid J, Recher M, Jauch AJ, Miho E, Kelly DF, Deane CM, Trück J. Maturation of the Human Immunoglobulin Heavy Chain Repertoire With Age. Front Immunol 2020; 11:1734. [PMID: 32849618 PMCID: PMC7424015 DOI: 10.3389/fimmu.2020.01734] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/06/2020] [Accepted: 06/29/2020] [Indexed: 01/01/2023]
Abstract
B cells play a central role in adaptive immune processes, mainly through the production of antibodies. The maturation of the B cell system with age is poorly studied. We extensively investigated age-related alterations of naïve and antigen-experienced immunoglobulin heavy chain (IgH) repertoires. The most significant changes were observed in the first 10 years of life, and were characterized by altered immunoglobulin gene usage and an increased frequency of mutated antibodies structurally diverging from their germline precursors. Older age was associated with an increased usage of downstream IgH constant region genes and fewer antibodies with self-reactive properties. As mutations accumulated with age, the frequency of germline-encoded self-reactive antibodies decreased, indicating a possible beneficial role of self-reactive B cells in the developing immune system. Our results suggest a continuous process of change through childhood across a broad range of parameters characterizing IgH repertoires and stress the importance of using well-selected, age-appropriate controls in IgH studies.
Affiliation(s)
- Marie Ghraichy
- Division of Immunology, University Children's Hospital, University of Zurich, Zurich, Switzerland; Children's Research Center, University of Zurich, Zurich, Switzerland
- Jacob D Galson
- Children's Research Center, University of Zurich, Zurich, Switzerland; Alchemab Therapeutics Ltd, London, United Kingdom
- Valentin von Niederhäusern
- Division of Immunology, University Children's Hospital, University of Zurich, Zurich, Switzerland; Children's Research Center, University of Zurich, Zurich, Switzerland
- Jana Pachlopnik Schmid
- Division of Immunology, University Children's Hospital, University of Zurich, Zurich, Switzerland; Children's Research Center, University of Zurich, Zurich, Switzerland
- Mike Recher
- Immunodeficiency Laboratory, Department of Biomedicine, University and University Hospital of Basel, Basel, Switzerland
- Annaïse J Jauch
- Immunodeficiency Laboratory, Department of Biomedicine, University and University Hospital of Basel, Basel, Switzerland
- Enkelejda Miho
- Institute of Medical Engineering and Medical Informatics, University of Applied Sciences and Arts Northwestern Switzerland FHNW, Muttenz, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; aiNET GmbH, Basel, Switzerland
- Dominic F Kelly
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford, Oxford, United Kingdom; Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Charlotte M Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Johannes Trück
- Division of Immunology, University Children's Hospital, University of Zurich, Zurich, Switzerland; Children's Research Center, University of Zurich, Zurich, Switzerland
12
Wong WK, Georges G, Ros F, Kelm S, Lewis AP, Taddese B, Leem J, Deane CM. SCALOP: sequence-based antibody canonical loop structure annotation. Bioinformatics 2020; 35:1774-1776. [PMID: 30321295 PMCID: PMC6513161 DOI: 10.1093/bioinformatics/bty877] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/08/2018] [Revised: 09/17/2018] [Accepted: 10/13/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Canonical forms of the antibody complementarity-determining regions (CDRs) were first described in 1987 and have been redefined on multiple occasions since. The canonical forms are often used to approximate the antibody binding site shape as they can be predicted from sequence. A rapid predictor would facilitate the annotation of CDR structures in the large amounts of repertoire data now becoming available from next generation sequencing experiments.
RESULTS: SCALOP annotates CDR canonical forms for antibody sequences, supported by an auto-updating database to capture the latest cluster information. Its accuracy is comparable to that of a standard structural predictor but it is 800 times faster. The auto-updating nature of SCALOP ensures that it always attains the best possible coverage.
AVAILABILITY AND IMPLEMENTATION: SCALOP is available as a web application and for download under a GPLv3 license at opig.stats.ox.ac.uk/webapps/scalop.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wing Ki Wong
- Department of Statistics, University of Oxford, Oxford, UK
- Guy Georges
- Roche Pharma Research and Early Development, Large Molecule Research Roche Innovation Center Munich, Penzberg, Germany
- Francesca Ros
- Roche Pharma Research and Early Development, Large Molecule Research Roche Innovation Center Munich, Penzberg, Germany
- Alan P Lewis
- Computational and Modelling Sciences, GlaxoSmithKline Research and Development, Stevenage, UK
- Bruck Taddese
- Antibody Discovery and Protein Engineering, MedImmune, Granta Park, Cambridge, UK
- Jinwoo Leem
- Department of Statistics, University of Oxford, Oxford, UK
13
Marks C, Deane CM. How repertoire data are changing antibody science. J Biol Chem 2020; 295:9823-9837. [PMID: 32409582 DOI: 10.1074/jbc.rev120.010181] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/14/2020] [Revised: 04/28/2020] [Indexed: 12/13/2022]
Abstract
Antibodies are vital proteins of the immune system that recognize potentially harmful molecules and initiate their removal. Mammals can efficiently create vast numbers of antibodies with different sequences capable of binding to any antigen with high affinity and specificity. Because they can be developed to bind to many disease agents, antibodies can be used as therapeutics. In an organism, after antigen exposure, antibodies specific to that antigen are enriched through clonal selection, expansion, and somatic hypermutation. The antibodies present in an organism therefore report on its immune status, describe its innate ability to deal with harmful substances, and reveal how it has previously responded. Next-generation sequencing technologies are being increasingly used to query the antibody, or B-cell receptor (BCR), sequence repertoire, and the amount of BCR data in public repositories is growing. The Observed Antibody Space database, for example, currently contains over a billion sequences from 68 different studies. Repertoires are available that represent both the naive state (i.e. antigen-inexperienced) and that after immunization. This wealth of data has created opportunities to learn more about our immune system. In this review, we discuss the many ways in which BCR repertoire data have been or could be exploited. We highlight its utility for providing insights into how the naive immune repertoire is generated and how it responds to antigens. We also consider how structural information can be used to enhance these data and may lead to more accurate depictions of the sequence space and to applications in the discovery of new therapeutics.
Affiliation(s)
- Claire Marks
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Charlotte M Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
14
Prechl J. Network Organization of Antibody Interactions in Sequence and Structure Space: the RADARS Model. Antibodies (Basel) 2020; 9:antib9020013. [PMID: 32384800 PMCID: PMC7345901 DOI: 10.3390/antib9020013] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/16/2020] [Revised: 04/09/2020] [Accepted: 04/15/2020] [Indexed: 02/06/2023]
Abstract
Adaptive immunity in vertebrates is a complex self-organizing network of molecular interactions. While deep sequencing of the immune-receptor repertoire may reveal clonal relationships, functional interpretation of such data is hampered by the inherent limitations of converting sequence to structure to function. In this paper, a novel model of antibody interaction space and network, termed RAdial ADjustment of System Resolution (RADARS), is proposed. The model is based on the radial growth of interaction affinity of antibodies towards an infinity of directions in structure space, each direction corresponding to particular shapes of antigen epitopes. Levels of interaction affinity appear as free energy shells of the system, where hierarchical B-cell development and differentiation takes place. Equilibrium in this immunological thermodynamic system can be described by a power law distribution of antibody-free energies with an ideal network degree exponent of phi square, representing a scale-free fractal network of antibody interactions. Plasma cells are network hubs, memory B cells are nodes with intermediate degrees, and B1 cells function as nodes with minimal degree. Overall, the RADARS model implies that a finite number of antibody structures can interact with an infinite number of antigens by immunologically controlled adjustment of interaction energy distribution. Understanding quantitative network properties of the system should help the organization of sequence-derived predicted structural data.
Affiliation(s)
- József Prechl
- Diagnosticum Zrt., 126. Attila u., 1047 Budapest, Hungary
|
15
|
Kovaltsuk A, Raybould MIJ, Wong WK, Marks C, Kelm S, Snowden J, Trück J, Deane CM. Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice. PLoS Comput Biol 2020; 16:e1007636. [PMID: 32069281 PMCID: PMC7048297 DOI: 10.1371/journal.pcbi.1007636] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/06/2019] [Revised: 02/28/2020] [Accepted: 01/07/2020] [Indexed: 01/18/2023] Open
Abstract
Most current analysis tools for antibody next-generation sequencing data work with primary sequence descriptors, leaving accompanying structural information unharnessed. We have used novel rapid methods to structurally characterize the complementarity-determining regions (CDRs) of more than 180 million human and mouse B-cell receptor (BCR) repertoire sequences. These structurally annotated CDRs provide unprecedented insights into both the structural predetermination and dynamics of the adaptive immune response. We show that B-cell types can be distinguished based solely on these structural properties. Antigen-unexperienced BCR repertoires use the highest number and diversity of CDR structures and these patterns of naïve repertoire paratope usage are highly conserved across subjects. In contrast, more differentiated B-cells are more personalized in terms of CDR structure usage. Our results establish the CDR structure differences in BCR repertoires and have applications for many fields including immunodiagnostics, phage display library generation, and “humanness” assessment of BCR repertoires from transgenic animals. The software tool for structural annotation of BCR repertoires, SAAB+, is available at https://github.com/oxpig/saab_plus.
B-cell receptors (BCR) are the major components of the adaptive immune system. These are immunoglobulin molecules that bind to foreign substances known as antigens. Each individual has a huge BCR repertoire, where each individual BCR has a specific binding site composed of the complementarity-determining regions (CDRs) capable of recognising a specific antigen. Drug discovery and immunodiagnostics inspired by the adaptive immune system rely on our ability to accurately interrogate the structural diversity of the binding sites of the BCR repertoire. Here we report our novel rapid pipeline, SAAB+, which has enabled us to interrogate how the structure of the CDR changes in BCR repertoires along the B-cell differentiation axis.
By analysing human and mouse BCR repertoires at an unprecedented scale, we observed species-specific structural predetermination and detected CDR dynamics across multiple stages of B-cell differentiation. We showed that naïve repertoires share the highest number and diversity of CDR structures, a pattern which was highly conserved in all B-cell donors. Our results suggest that increased B-cell differentiation is associated with a personalization of CDR structure usages. Finally, we established the differences in CDR usages between humans and mice, analysis with immediate relevance for BCR repertoire “humanness” assessment and rational immunotherapeutic engineering.
Affiliation(s)
- Wing Ki Wong
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Claire Marks
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Johannes Trück
- Division of Immunology, University Children's Hospital, University of Zurich, Zurich, Switzerland
- Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
16
|
Abstract
The origins of the various elements in the human antibody repertoire have been, and still are, subject to considerable uncertainty: it is unclear whether the various elements have always served a specific defense function or whether they were co-opted from other organismal roles to form a crude naïve repertoire that then became more complex as combinatorial mechanisms were added. Estimates of the current size of the human naïve antibody repertoire are also widely debated, with numbers ranging from 10 million members, based on experimentally derived numbers, to in excess of one thousand trillion members, based on the different sequences derived from theoretical combinatorial calculations. There are questions that are relevant at both ends of this number spectrum. At the lower bound it could be questioned whether this repertoire size is sufficient to counter all the potential antigen-bearing pathogens. At the upper bound the question is rather simpler: How can any individual interrogate such an astronomical number of antibody-bearing B cells in a timeframe that is meaningful? This review evaluates the evolutionary aspects of the adaptive immune system, the calculations that lead to the large repertoire estimates, and some of the experimental evidence pointing to a more restricted repertoire whose variation appears to derive from convergent 'structure and specificity features', and it includes a theoretical model that seems to support this view. Finally, a solution that may reconcile the size difference anomaly, which is still a hot subject of debate, is suggested.
|
17
|
Jensen KK, Rantos V, Jappe EC, Olsen TH, Jespersen MC, Jurtz V, Jessen LE, Lanzarotti E, Mahajan S, Peters B, Nielsen M, Marcatili P. TCRpMHCmodels: Structural modelling of TCR-pMHC class I complexes. Sci Rep 2019; 9:14530. [PMID: 31601838 PMCID: PMC6787230 DOI: 10.1038/s41598-019-50932-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 02/06/2019] [Accepted: 09/09/2019] [Indexed: 01/30/2023] Open
Abstract
The interaction between the class I major histocompatibility complex (MHC), the peptide presented by the MHC and the T-cell receptor (TCR) is a key determinant of the cellular immune response. Here, we present TCRpMHCmodels, a method for accurate structural modelling of the TCR-peptide-MHC (TCR-pMHC) complex. This TCR-pMHC modelling pipeline takes as input the amino acid sequence and generates models of the TCR-pMHC complex, with a median Cα RMSD of 2.31 Å. TCRpMHCmodels significantly outperforms TCRFlexDock, a specialised method for docking pMHC and TCR structures. TCRpMHCmodels is simple to use and the modelling pipeline takes, on average, only two minutes. Thanks to its ease of use and high modelling accuracy, we expect TCRpMHCmodels to provide insights into the underlying mechanisms of TCR and pMHC interactions and aid in the development of advanced T-cell-based immunotherapies and rational design of vaccines. The TCRpMHCmodels tool is available at http://www.cbs.dtu.dk/services/TCRpMHCmodels/.
Affiliation(s)
- Vasileios Rantos
- Department of Bio and Health Informatics, Technical University of Denmark, Kgs. Lyngby, Denmark; Centre for Structural Systems Biology (CSSB), DESY and European Molecular Biology Laboratory, Notkestrasse 85, 22607, Hamburg, Germany
- Emma Christine Jappe
- Department of Bio and Health Informatics, Technical University of Denmark, Kgs. Lyngby, Denmark; Evaxion Biotech, Bredgade 34E, 1260, Copenhagen, Denmark
- Tobias Hegelund Olsen
- Department of Bio and Health Informatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- Vanessa Jurtz
- Department of Bioinformatics and Data Mining, Novo Nordisk A/S, 2760, Måløv, Denmark
- Leon Eyrich Jessen
- Department of Bio and Health Informatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- Esteban Lanzarotti
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina
- Swapnil Mahajan
- Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037, USA
- Bjoern Peters
- Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037, USA; University of California San Diego, Department of Medicine, La Jolla, CA 92037, USA
- Morten Nielsen
- Department of Bio and Health Informatics, Technical University of Denmark, Kgs. Lyngby, Denmark; Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina
- Paolo Marcatili
- Department of Bio and Health Informatics, Technical University of Denmark, Kgs. Lyngby, Denmark
|
18
|
Krawczyk K, Raybould MIJ, Kovaltsuk A, Deane CM. Looking for therapeutic antibodies in next-generation sequencing repositories. MAbs 2019; 11:1197-1205. [PMID: 31216939 PMCID: PMC6748601 DOI: 10.1080/19420862.2019.1633884] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/09/2019] [Revised: 06/14/2019] [Accepted: 06/14/2019] [Indexed: 12/20/2022] Open
Abstract
Recently it has become possible to query the great diversity of natural antibody repertoires using next-generation sequencing (NGS). These methods are capable of producing millions of sequences in a single experiment. Here we compare clinical-stage therapeutic antibodies to the ~1 billion sequences from 60 independent sequencing studies in the Observed Antibody Space database, which includes antibody sequences from NGS analysis of immunoglobulin gene repertoires. Of 242 post-Phase 1 antibodies, we found 16 with sequence identity matches of 95% or better for both heavy and light chains. There are also 54 perfect matches to therapeutic CDR-H3 regions in the NGS outputs, suggesting a nontrivial amount of convergence between naturally observed sequences and those developed artificially. This has potential implications for both the legal protection of commercial antibodies and the discovery of antibody therapeutics.
|
19
|
Li L, Chen S, Miao Z, Liu Y, Liu X, Xiao ZX, Cao Y. AbRSA: A robust tool for antibody numbering. Protein Sci 2019; 28:1524-1531. [PMID: 31020723 DOI: 10.1002/pro.3633] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/17/2019] [Accepted: 04/18/2019] [Indexed: 12/25/2022]
Abstract
The remarkable progress in cancer immunotherapy in recent years has spurred intense development of therapeutic antibodies. Antibody numbering, which standardizes a residue index at each position of an antibody variable domain, is an important step in immunoinformatic analysis. It provides an equivalent index for the comparison of sequences or structures, which is particularly valuable for antibody modeling and engineering. However, due to the extremely high diversity of antibody sequences, existing antibody-numbering tools do not work in all cases. This article introduces a new antibody-numbering tool named AbRSA, which integrates heuristic knowledge of region-specific features into sequence mapping to enhance robustness. The benchmarks demonstrate that AbRSA exhibits robust performance in numbering sequences with diverse lengths and patterns compared with state-of-the-art tools. AbRSA offers a user-friendly interface for antibody numbering, complementarity-determining region delimitation, and 3D structure rendering. It is freely available at http://cao.labshare.cn/AbRSA.
Affiliation(s)
- Lei Li
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, People's Republic of China
- Shuang Chen
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, People's Republic of China
- Zhichao Miao
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, CB10 1SD, United Kingdom; Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, United Kingdom
- Yang Liu
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, People's Republic of China
- Xu Liu
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, People's Republic of China
- Zhi-Xiong Xiao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, People's Republic of China
- Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, People's Republic of China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109-2218
|
20
|
Fink K. Can We Improve Vaccine Efficacy by Targeting T and B Cell Repertoire Convergence? Front Immunol 2019; 10:110. [PMID: 30814993 PMCID: PMC6381292 DOI: 10.3389/fimmu.2019.00110] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/30/2018] [Accepted: 01/15/2019] [Indexed: 01/31/2023] Open
Abstract
Traditional vaccine development builds on the assumption that healthy individuals have virtually unlimited antigen recognition repertoires of receptors in B cells and T cells [the B cell receptor (BCR) and T cell receptor (TCR), respectively]. However, there are indications that there are "holes" in the breadth of repertoire diversity, where no or few B or T cells are able to bind to a given antigen. Repertoire diversity may in these cases be a limiting factor for vaccine efficacy. Assuming that it is possible to predict which B and T cell receptors will respond to a given immunogen, vaccine strategies could be optimized and personalized. In addition, vaccine testing could be simplified if we could predict responses through sequencing BCRs and TCRs. Bulk sequencing has shown putatively specific converging sequences after infection or vaccination. However, only single cell technologies have made it possible to capture the sequence of both heavy and light chains of a BCR or the alpha and beta chains of the TCR. This has enabled the cloning of receptors and the functional validation of a predicted specificity. This review summarizes recent evidence of converging sequences in infectious diseases. Current and potential future applications of single cell technology in immune repertoire analysis are then discussed. Finally, possible short- and long-term implications for vaccine research are highlighted.
Affiliation(s)
- Katja Fink
- Singapore Immunology Network, Agency for Science, Technology and Research, Singapore, Singapore
|
21
|
Kovaltsuk A, Krawczyk K, Kelm S, Snowden J, Deane CM. Filtering Next-Generation Sequencing of the Ig Gene Repertoire Data Using Antibody Structural Information. J Immunol 2018; 201:3694-3704. [PMID: 30397033 PMCID: PMC6485405 DOI: 10.4049/jimmunol.1800669] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/11/2018] [Accepted: 10/02/2018] [Indexed: 01/29/2023]
Abstract
Next-generation sequencing of the Ig gene repertoire (Ig-seq) produces large volumes of information at the nucleotide sequence level. Such data have improved our understanding of immune systems across numerous species and have already been successfully applied in vaccine development and drug discovery. However, the high-throughput nature of Ig-seq means that it is afflicted by high error rates. This has led to the development of error-correction approaches. Computational error-correction methods use sequence information alone, primarily designating sequences as likely to be correct if they are observed frequently. In this work, we describe an orthogonal method for filtering Ig-seq data, which considers the structural viability of each sequence. A typical natural Ab structure requires the presence of a disulfide bridge within each of its variable chains to maintain the fold. Our Ab Sequence Selector (ABOSS) uses the presence/absence of this bridge as a way of both identifying structurally viable sequences and estimating the sequencing error rate. On simulated Ig-seq datasets, ABOSS is able to identify more than 99% of structurally viable sequences. Applying our method to six independent Ig-seq datasets (one mouse and five human), we show that our error calculations are in line with previous experimental and computational error estimates. We also show how ABOSS is able to identify structurally impossible sequences missed by other error-correction methods.
Affiliation(s)
- Aleksandr Kovaltsuk
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Konrad Krawczyk
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Charlotte M Deane
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
|