1. Curion F, Theis FJ. Machine learning integrative approaches to advance computational immunology. Genome Med 2024; 16:80. [PMID: 38862979 PMCID: PMC11165829 DOI: 10.1186/s13073-024-01350-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 05/23/2024] [Indexed: 06/13/2024] [Open access]
Abstract
The study of immunology, traditionally reliant on proteomics to evaluate individual immune cells, has been revolutionized by single-cell RNA sequencing. Computational immunologists play a crucial role in analysing these datasets, moving beyond traditional protein marker identification to encompass a more detailed view of cellular phenotypes and their functional roles. Recent technological advancements allow the simultaneous measurement of multiple cellular components (transcriptome, proteome, chromatin, epigenetic modifications and metabolites) within single cells, including in spatial contexts within tissues. This has led to the generation of complex multiscale datasets that can include multimodal measurements from the same cells or a mix of paired and unpaired modalities. Modern machine learning (ML) techniques allow for the integration of multiple "omics" data without the need for extensive independent modelling of each modality. This review focuses on recent advancements in ML integrative approaches applied to immunological studies. We highlight the importance of these methods in creating a unified representation of multiscale data collections, particularly for single-cell and spatial profiling technologies. Finally, we discuss the challenges of these holistic approaches and how they will be instrumental in the development of a common coordinate framework for multiscale studies, thereby accelerating research and enabling discoveries in the computational immunology field.
Affiliation(s)
- Fabiola Curion
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
2. Boone K, Tjokro N, Chu KN, Chen C, Snead ML, Tamerler C. Machine learning enabled design features of antimicrobial peptides selectively targeting peri-implant disease progression. Frontiers in Dental Medicine 2024; 5:1372534. [PMID: 38846578 PMCID: PMC11155447 DOI: 10.3389/fdmed.2024.1372534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024] [Open access]
Abstract
Peri-implantitis is a complex infectious disease that manifests as progressive loss of alveolar bone around the dental implants and hyper-inflammation associated with microbial dysbiosis. Using antibiotics in treating peri-implantitis is controversial because of antibiotic resistance threats, the non-selective suppression of pathogens and commensals within the microbial community, and potentially serious systemic sequelae. Therefore, conventional treatment for peri-implantitis comprises mechanical debridement by nonsurgical or surgical approaches with adjunct local microbicidal agents. Consequently, current treatment options may not prevent relapses, as the pathogens either remain unaffected or quickly re-emerge after treatment. Successful mitigation of disease progression in peri-implantitis requires a specific mode of treatment capable of targeting keystone pathogens and restoring bacterial community balance toward commensal species. Antimicrobial peptides (AMPs) hold promise as alternative therapeutics through their bacterial specificity and targeted inhibitory activity. However, peptide sequence space exhibits complex relationships such as sparse vector encoding of sequences, including combinatorial and discrete functions describing peptide antimicrobial activity. In this paper, we generated a transparent Machine Learning (ML) model that identifies sequence-function relationships based on rough set theory using simple summaries of the hydropathic features of AMPs. Comparing the hydropathic features of peptides according to their differential activity for different classes of bacteria empowered predictability of antimicrobial targeting. Enriching the sequence diversity by a genetic algorithm, we generated numerous candidate AMPs designed for selectively targeting pathogens and predicted their activity using classifying rough sets. 
Empirical growth inhibition data is iteratively fed back into our ML training to generate new peptides, resulting in increasingly more rigorous rules for which peptides match targeted inhibition levels for specific bacterial strains. The subsequent top scoring candidates were empirically tested for their inhibition against keystone and accessory peri-implantitis pathogens as well as an oral commensal bacterium. A novel peptide, VL-13, was confirmed to be selectively active against a keystone pathogen. Considering the continually increasing number of oral implants placed each year and the complexity of the disease progression, prevalence of peri-implant diseases continues to rise. Our approach offers transparent ML-enabled paths towards developing antimicrobial peptide-based therapies targeting the changes in the microbial communities that can beneficially impact disease progression.
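The "simple summaries of the hydropathic features" mentioned in this abstract can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale and two summary statistics (mean hydropathy and hydrophobic fraction); the abstract does not state which scale or summaries the authors used, and the peptide strings below are hypothetical examples, not sequences from the study. The rough set classifier and the genetic algorithm are not reproduced here.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
# Assumption: the abstract does not name the scale the authors used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_summary(peptide):
    """Return (mean hydropathy, fraction of residues with positive hydropathy).

    Raises KeyError for non-standard residues and ZeroDivisionError for an
    empty string; a real pipeline would validate its input first.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in peptide.upper()]
    mean = sum(values) / len(values)
    hydrophobic_fraction = sum(v > 0 for v in values) / len(values)
    return mean, hydrophobic_fraction


# Hypothetical peptides, for illustration only.
for pep in ("KLLK", "GIGKFLKK"):
    mean, frac = hydropathy_summary(pep)
    print(f"{pep}: mean hydropathy {mean:.3f}, hydrophobic fraction {frac:.3f}")
```

Summaries of this kind turn a variable-length sequence into a small, fixed set of numbers, which is one way a rule-based classifier could compare peptides with different activity profiles.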
Affiliation(s)
- Kyle Boone
  - Institute for Bioengineering Research, University of Kansas, Lawrence, KS, United States
  - Department of Mechanical Engineering, University of Kansas, Lawrence, KS, United States
- Natalia Tjokro
  - Center for Craniofacial Molecular Biology, Herman Ostrow School of Dentistry of USC, University of Southern California, Los Angeles, CA, United States
- Kalea N. Chu
  - Institute for Bioengineering Research, University of Kansas, Lawrence, KS, United States
  - Bioengineering Program, University of Kansas, Lawrence, KS, United States
- Casey Chen
  - Center for Craniofacial Molecular Biology, Herman Ostrow School of Dentistry of USC, University of Southern California, Los Angeles, CA, United States
- Malcolm L. Snead
  - Center for Craniofacial Molecular Biology, Herman Ostrow School of Dentistry of USC, University of Southern California, Los Angeles, CA, United States
  - Bioengineering Program, University of Kansas, Lawrence, KS, United States
- Candan Tamerler
  - Institute for Bioengineering Research, University of Kansas, Lawrence, KS, United States
  - Department of Mechanical Engineering, University of Kansas, Lawrence, KS, United States
  - Bioengineering Program, University of Kansas, Lawrence, KS, United States
3. This S, Costantino S, Melichar HJ. Machine learning predictions of T cell antigen specificity from intracellular calcium dynamics. Science Advances 2024; 10:eadk2298. [PMID: 38446885 PMCID: PMC10917351 DOI: 10.1126/sciadv.adk2298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Adoptive T cell therapies rely on the production of T cells with an antigen receptor that directs their specificity toward tumor-specific antigens. Methods for identifying relevant T cell receptor (TCR) sequences, predominantly achieved through the enrichment of antigen-specific T cells, represent a major bottleneck in the production of TCR-engineered cell therapies. Fluctuation of intracellular calcium is a proximal readout of TCR signaling and candidate marker for antigen-specific T cell identification that does not require T cell expansion; however, calcium fluctuations downstream of TCR engagement are highly variable. We propose that machine learning algorithms may allow for T cell classification from complex datasets such as polyclonal T cell signaling events. Using deep learning tools, we demonstrate accurate prediction of TCR-transgenic CD8+ T cell activation based on calcium fluctuations and test the algorithm against T cells bearing a distinct TCR as well as polyclonal T cells. This provides the foundation for an antigen-specific TCR sequence identification pipeline for adoptive T cell therapies.
Affiliation(s)
- Sébastien This
  - Centre de recherche de l'Hôpital Maisonneuve-Rosemont, Montréal, Québec, Canada
  - Département de Microbiologie, Infectiologie et Immunologie, Université de Montréal, Montréal, Québec, Canada
  - Department of Microbiology and Immunology, Goodman Cancer Institute, McGill University, Montréal, Québec, Canada
- Santiago Costantino
  - Centre de recherche de l'Hôpital Maisonneuve-Rosemont, Montréal, Québec, Canada
  - Département d’Ophtalmologie, Université de Montréal, Montréal, Québec, Canada
- Heather J. Melichar
  - Centre de recherche de l'Hôpital Maisonneuve-Rosemont, Montréal, Québec, Canada
  - Department of Microbiology and Immunology, Goodman Cancer Institute, McGill University, Montréal, Québec, Canada
  - Département de Médecine, Université de Montréal, Montréal, Québec, Canada
4. Irvine EB, Reddy ST. Advancing Antibody Engineering through Synthetic Evolution and Machine Learning. Journal of Immunology (Baltimore, Md. : 1950) 2024; 212:235-243. [PMID: 38166249 DOI: 10.4049/jimmunol.2300492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 10/20/2023] [Indexed: 01/04/2024]
Abstract
Abs are versatile molecules with the potential to achieve exceptional binding to target Ags, while also possessing biophysical properties suitable for therapeutic drug development. Protein display and directed evolution systems have transformed synthetic Ab discovery, engineering, and optimization, vastly expanding the number of Ab clones able to be experimentally screened for binding. Moreover, the burgeoning integration of high-throughput screening, deep sequencing, and machine learning has further augmented in vitro Ab optimization, promising to accelerate the design process and massively expand the Ab sequence space interrogated. In this Brief Review, we discuss the experimental and computational tools employed in synthetic Ab engineering and optimization. We also explore the therapeutic challenges posed by developing Abs for infectious diseases, and the prospects for leveraging machine learning-guided protein engineering to prospectively design Abs resistant to viral escape.
Affiliation(s)
- Edward B Irvine
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Sai T Reddy
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
5. Li T, Li Y, Zhu X, He Y, Wu Y, Ying T, Xie Z. Artificial intelligence in cancer immunotherapy: Applications in neoantigen recognition, antibody design and immunotherapy response prediction. Semin Cancer Biol 2023; 91:50-69. [PMID: 36870459 DOI: 10.1016/j.semcancer.2023.02.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 11/10/2022] [Revised: 02/13/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Cancer immunotherapy is a method of controlling and eliminating tumors by reactivating the body's cancer-immunity cycle and restoring its antitumor immune response. The increased availability of data, combined with advancements in high-performance computing and innovative artificial intelligence (AI) technology, has resulted in a rise in the use of AI in oncology research. State-of-the-art AI models for functional classification and prediction in immunotherapy research are increasingly used to support laboratory-based experiments. This review offers a glimpse of the current AI applications in immunotherapy, including neoantigen recognition, antibody design, and prediction of immunotherapy response. Advancing in this direction will result in more robust predictive models for developing better targets, drugs, and treatments, and these advancements will eventually make their way into the clinical setting, pushing AI forward in the field of precision oncology.
Affiliation(s)
- Tong Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yupeng Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Xiaoyi Zhu
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Yao He
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yanling Wu
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Tianlei Ying
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China; Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
6. Kanduri C, Scheffer L, Pavlović M, Rand KD, Chernigovskaya M, Pirvandy O, Yaari G, Greiff V, Sandve GK. simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods. Gigascience 2023; 12:giad074. [PMID: 37848619 PMCID: PMC10580376 DOI: 10.1093/gigascience/giad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 07/20/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023] [Open access]
Abstract
BACKGROUND: Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. RESULTS: We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state-associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS: This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.
Affiliation(s)
- Chakravarthi Kanduri
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
  - UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
- Lonneke Scheffer
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Milena Pavlović
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
  - UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
- Knut Dagestad Rand
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Maria Chernigovskaya
  - Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
- Oz Pirvandy
  - Faculty of Engineering, Bar-Ilan University, 5290002, Israel
- Gur Yaari
  - Faculty of Engineering, Bar-Ilan University, 5290002, Israel
- Victor Greiff
  - Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
- Geir K Sandve
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
  - UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
7. Weber CR, Rubio T, Wang L, Zhang W, Robert PA, Akbar R, Snapkov I, Wu J, Kuijjer ML, Tarazona S, Conesa A, Sandve GK, Liu X, Reddy ST, Greiff V. Reference-based comparison of adaptive immune receptor repertoires. Cell Reports Methods 2022; 2:100269. [PMID: 36046619 PMCID: PMC9421535 DOI: 10.1016/j.crmeth.2022.100269] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/16/2020] [Revised: 04/01/2022] [Accepted: 07/19/2022] [Indexed: 11/26/2022]
Abstract
B and T cell receptor (immune) repertoires can represent an individual's immune history. While current repertoire analysis methods aim to discriminate between health and disease states, they are typically based on only a limited number of parameters. Here, we introduce immuneREF: a quantitative multidimensional measure of adaptive immune repertoire (and transcriptome) similarity that allows interpretation of immune repertoire variation by relying on both repertoire features and cross-referencing of simulated and experimental datasets. To quantify immune repertoire similarity landscapes across health and disease, we applied immuneREF to >2,400 datasets from individuals with varying immune states (healthy, [autoimmune] disease, and infection). We discovered, in contrast to the current paradigm, that blood-derived immune repertoires of healthy and diseased individuals are highly similar for certain immune states, suggesting that repertoire changes to immune perturbations are less pronounced than previously thought. In conclusion, immuneREF enables the population-wide study of adaptive immune response similarity across immune states.
Affiliation(s)
- Cédric R. Weber
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Teresa Rubio
  - Laboratory of Neurobiology, Centro Investigación Príncipe Felipe, Valencia, Spain
- Longlong Wang
  - BGI-Shenzhen, Shenzhen, China
  - BGI-Education Center, University of Chinese Academy of Sciences, Shenzhen, China
- Wei Zhang
  - BGI-Shenzhen, Shenzhen, China
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Philippe A. Robert
  - Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
- Rahmad Akbar
  - Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
- Igor Snapkov
  - Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
- Marieke L. Kuijjer
  - Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
  - Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
  - Leiden Center for Computational Oncology, Leiden University Medical Center, Leiden, the Netherlands
- Sonia Tarazona
  - Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Valencia, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Valencia, Spain
- Geir K. Sandve
  - Department of Informatics, University of Oslo, Oslo, Norway
- Xiao Liu
  - BGI-Shenzhen, Shenzhen, China
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
- Sai T. Reddy
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Victor Greiff
  - Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
8. Designing antibodies as therapeutics. Cell 2022; 185:2789-2805. [PMID: 35868279 DOI: 10.1016/j.cell.2022.05.029] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Received: 03/03/2022] [Revised: 05/18/2022] [Accepted: 05/31/2022] [Indexed: 12/25/2022]
Abstract
Antibody therapeutics are a large and rapidly expanding drug class providing major health benefits. We provide a snapshot of current antibody therapeutics including their formats, common targets, therapeutic areas, and routes of administration. Our focus is on selected emerging directions in antibody design where progress may provide a broad benefit. These topics include enhancing antibodies for cancer, antibody delivery to organs such as the brain, gastrointestinal tract, and lungs, plus antibody developability challenges including immunogenicity risk assessment and mitigation and subcutaneous delivery. Machine learning has the potential, albeit as yet largely unrealized, for a transformative future impact on antibody discovery and engineering.
9. Wilman W, Wróbel S, Bielska W, Deszynski P, Dudzic P, Jaszczyszyn I, Kaniewski J, Młokosiewicz J, Rouyan A, Satława T, Kumar S, Greiff V, Krawczyk K. Machine-designed biotherapeutics: opportunities, feasibility and advantages of deep learning in computational antibody discovery. Brief Bioinform 2022; 23:bbac267. [PMID: 35830864 PMCID: PMC9294429 DOI: 10.1093/bib/bbac267] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 03/14/2022] [Revised: 05/09/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Antibodies are versatile molecular binders with an established and growing role as therapeutics. Computational approaches to developing and designing these molecules are being increasingly used to complement traditional lab-based processes. Nowadays, in silico methods fill multiple elements of the discovery stage, such as characterizing antibody-antigen interactions and identifying developability liabilities. Recently, computational methods tackling such problems have begun to follow machine learning paradigms, in many cases deep learning specifically. This paradigm shift offers improvements in established areas such as structure or binding prediction and opens up new possibilities such as language-based modeling of antibody repertoires or machine-learning-based generation of novel sequences. In this review, we critically examine the recent developments in (deep) machine learning approaches to therapeutic antibody design with implications for fully computational antibody design.
10. Gao B, Han J, Reddy ST. Learning what not to select for in antibody drug discovery. Cell Reports Methods 2022; 2:100258. [PMID: 35880020 PMCID: PMC9308151 DOI: 10.1016/j.crmeth.2022.100258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Identifying antibodies with high affinity and target specificity is crucial for drug discovery and development; however, filtering out antibody candidates with nonspecific or polyspecific binding profiles is also important. In this issue of Cell Reports Methods, Saksena et al. report a computational counterselection method combining deep sequencing and machine learning for identifying nonspecific antibody candidates and demonstrate that it has advantages over more established molecular counterselection methods.
Affiliation(s)
- Beichen Gao
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Jiami Han
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Sai T. Reddy
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
11. Chen Y, Ye Z, Zhang Y, Xie W, Chen Q, Lan C, Yang X, Zeng H, Zhu Y, Ma C, Tang H, Wang Q, Guan J, Chen S, Li F, Yang W, Yan H, Yu X, Zhang Z. A Deep Learning Model for Accurate Diagnosis of Infection Using Antibody Repertoires. Journal of Immunology (Baltimore, Md. : 1950) 2022; 208:2675-2685. [PMID: 35606050 DOI: 10.4049/jimmunol.2200063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/20/2022] [Accepted: 04/11/2022] [Indexed: 06/15/2023]
Abstract
The adaptive immune receptor repertoire consists of the entire set of an individual's BCRs and TCRs and is believed to contain a record of prior immune responses and the potential for future immunity. Analyses of TCR repertoires via deep learning (DL) methods have successfully diagnosed cancers and infectious diseases, including coronavirus disease 2019. However, few studies have used DL to analyze BCR repertoires. In this study, we collected IgG H chain Ab repertoires from 276 healthy control subjects and 326 patients with various infections. We then extracted a comprehensive feature set consisting of 10 subsets of repertoire-level features and 160 sequence-level features and tested whether these features can distinguish between infected individuals and healthy control subjects. Finally, we developed an ensemble DL model, namely, DL method for infection diagnosis (https://github.com/chenyuan0510/DeepID), and used this model to differentiate between the infected and healthy individuals. Four subsets of repertoire-level features and four sequence-level features were selected because of their excellent predictive performance. The DL method for infection diagnosis outperformed traditional machine learning methods in distinguishing between healthy and infected samples (area under the curve = 0.9883) and achieved a multiclassification accuracy of 0.9104. We also observed differences between the healthy and infected groups in V genes usage, clonal expansion, the complexity of reads within clone, the physical properties in the α region, and the local flexibility of the CDR3 amino acid sequence. Our results suggest that the Ab repertoire is a promising biomarker for the diagnosis of various infections.
Affiliation(s)
- Yuan Chen
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhiming Ye
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Division of Nephrology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yanfang Zhang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Wenxi Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Qingyun Chen
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Chunhong Lan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Xiujia Yang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huikun Zeng
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yan Zhu
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Cuiyu Ma
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Haipei Tang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Qilong Wang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Junjie Guan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Sen Chen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Fenxiang Li
  - Department of Infectious Disease Control and Prevention, Center for Disease Control and Prevention of Southern Theatre Command, Guangzhou, China
- Wei Yang
  - Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huacheng Yan
  - Department of Infectious Disease Control and Prevention, Center for Disease Control and Prevention of Southern Theatre Command, Guangzhou, China
- Xueqing Yu
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Division of Nephrology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhenhai Zhang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - State Key Laboratory of Organ Failure Research, Division of Nephrology, Southern Medical University, Guangzhou, China
  - Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou, China
12. Kanduri C, Pavlović M, Scheffer L, Motwani K, Chernigovskaya M, Greiff V, Sandve GK. Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification. Gigascience 2022; 11:giac046. [PMID: 35639633 PMCID: PMC9154052 DOI: 10.1093/gigascience/giac046] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/17/2021] [Revised: 12/23/2021] [Accepted: 04/08/2022] [Indexed: 12/14/2022] [Open access]
Abstract
BACKGROUND: Machine learning (ML) methodology development for the classification of immune states in adaptive immune receptor repertoires (AIRRs) has seen a recent surge of interest. However, so far, there does not exist a systematic evaluation of scenarios where classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This hinders investigative reorientation to those scenarios where method development of more sophisticated ML approaches may be required. RESULTS: To identify those scenarios where a baseline ML method is able to perform well for AIRR classification, we generated a collection of synthetic AIRR benchmark data sets encompassing a wide range of data set architecture-associated and immune state-associated sequence patterns (signal) complexity. We trained ≈1,700 ML models with varying assumptions regarding immune signal on ≈1,000 data sets with a total of ≈250,000 AIRRs containing ≈46 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR-ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs only in 1 out of 50,000 AIR sequences. CONCLUSIONS: We provide a reference benchmark to guide new AIRR-ML classification methodology by (i) identifying those scenarios characterized by immune signal and data set complexity, where baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR-ML models given training data set properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark data sets for comprehensive benchmarking of AIRR-ML methods.
Affiliation(s)
- Chakravarthi Kanduri
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
| | - Milena Pavlović
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
| | - Lonneke Scheffer
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
| | - Keshav Motwani
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, FL 32610, USA
| | - Maria Chernigovskaya
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, 0372, Norway
| | - Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, 0372, Norway
| | - Geir K Sandve
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
| |
|
13
|
Dahal-Koirala S, Balaban G, Neumann RS, Scheffer L, Lundin KEA, Greiff V, Sollid LM, Qiao SW, Sandve GK. TCRpower: quantifying the detection power of T-cell receptor sequencing with a novel computational pipeline calibrated by spike-in sequences. Brief Bioinform 2022; 23:bbab566. [PMID: 35062022 PMCID: PMC8921636 DOI: 10.1093/bib/bbab566] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/16/2021] [Revised: 12/02/2021] [Accepted: 12/11/2021] [Indexed: 01/19/2023] Open
Abstract
T-cell receptor (TCR) sequencing has enabled the development of innovative diagnostic tests for cancers, autoimmune diseases and other applications. However, the rarity of many T-cell clonotypes presents a detection challenge, which may lead to misdiagnosis if diagnostically relevant TCRs remain undetected. To address this issue, we developed TCRpower, a novel computational pipeline for quantifying the statistical detection power of TCR sequencing methods. TCRpower calculates the probability of detecting a TCR sequence as a function of several key parameters: in-vivo TCR frequency, T-cell sample count, read sequencing depth and read cutoff. To calibrate TCRpower, we selected unique TCRs of 45 T-cell clones (TCCs) as spike-in TCRs. We sequenced the spike-in TCRs from TCCs, together with TCRs from peripheral blood, using a 5' RACE protocol. The 45 spike-in TCRs covered a wide range of sample frequencies, ranging from 5 per 100 to 1 per 1 million. The resulting spike-in TCR read counts and ground truth frequencies allowed us to calibrate TCRpower. In our TCR sequencing data, we observed a consistent linear relationship between sample and sequencing read frequencies. We were also able to reliably detect spike-in TCRs with frequencies as low as one per million. By implementing an optimized read cutoff, we eliminated most of the falsely detected sequences in our data (TCR α-chain 99.0% and TCR β-chain 92.4%), thereby improving diagnostic specificity. TCRpower is publicly available and can be used to optimize future TCR sequencing experiments, and thereby enable reliable detection of disease-relevant TCRs for diagnostic applications.
Affiliation(s)
- Shiva Dahal-Koirala
- K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, 0372, Norway
- Department of Immunology, University of Oslo and Oslo University Hospital-Rikshospitalet, Oslo, 0372, Norway
| | - Gabriel Balaban
- Biomedical Informatics, Department of Informatics, University of Oslo, 0373, Oslo, Norway
- Department of Computational Physiology, Simula Research Laboratory, 1364, Fornebu, Norway
- PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, 0373, Oslo, Norway
| | - Ralf Stefan Neumann
- K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, 0372, Norway
| | - Lonneke Scheffer
- Biomedical Informatics, Department of Informatics, University of Oslo, 0373, Oslo, Norway
| | - Knut Erik Aslaksen Lundin
- K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, 0372, Norway
- Department of Gastroenterology, Oslo University Hospital-Rikshospitalet, 0372, Oslo, Norway
| | - Victor Greiff
- Department of Immunology, University of Oslo and Oslo University Hospital-Rikshospitalet, Oslo, 0372, Norway
| | - Ludvig Magne Sollid
- K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, 0372, Norway
- Department of Immunology, University of Oslo and Oslo University Hospital-Rikshospitalet, Oslo, 0372, Norway
| | - Shuo-Wang Qiao
- K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, 0372, Norway
- Department of Immunology, University of Oslo and Oslo University Hospital-Rikshospitalet, Oslo, 0372, Norway
| | - Geir Kjetil Sandve
- Biomedical Informatics, Department of Informatics, University of Oslo, 0373, Oslo, Norway
- PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, 0373, Oslo, Norway
| |
|
14
|
Varga JK, Diffley K, Welker Leng KR, Fierke CA, Schueler-Furman O. Structure-based prediction of HDAC6 substrates validated by enzymatic assay reveals determinants of promiscuity and detects new potential substrates. Sci Rep 2022; 12:1788. [PMID: 35110592 PMCID: PMC8810773 DOI: 10.1038/s41598-022-05681-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Accepted: 01/17/2022] [Indexed: 01/25/2023] Open
Abstract
Histone deacetylases play important biological roles well beyond the deacetylation of histone tails. In particular, HDAC6 is involved in multiple cellular processes such as apoptosis, cytoskeleton reorganization, and protein folding, affecting substrates such as α-tubulin, Hsp90 and cortactin proteins. We have applied a biochemical enzymatic assay to measure the activity of HDAC6 on a set of candidate unlabeled peptides. These served for the calibration of a structure-based substrate prediction protocol, Rosetta FlexPepBind, previously used for the successful substrate prediction of HDAC8 and other enzymes. A proteome-wide screen of reported acetylation sites using our calibrated protocol together with the enzymatic assay provides new peptide substrates and avenues to novel potential functional regulatory roles of this promiscuous, multi-faceted enzyme. In particular, we propose novel regulatory roles of HDAC6 in tumorigenesis and cancer cell survival via the regulation of EGFR/Akt pathway activation. The calibration process and comparison of the results between HDAC6 and HDAC8 highlight structural differences that explain the established promiscuity of HDAC6.
Affiliation(s)
- Julia K Varga
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada (IMRIC), The Hebrew University of Jerusalem, Faculty of Medicine, POB 12272, 9112102, Jerusalem, Israel
| | - Kelsey Diffley
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109, USA
| | - Katherine R Welker Leng
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109, USA
| | - Carol A Fierke
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109, USA
- Department of Biochemistry, Brandeis University, 415 South Street, Waltham, MA, 02453, USA
| | - Ora Schueler-Furman
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada (IMRIC), The Hebrew University of Jerusalem, Faculty of Medicine, POB 12272, 9112102, Jerusalem, Israel
| |
|
15
|
Hofer S, Hofstätter N, Punz B, Hasenkopf I, Johnson L, Himly M. Immunotoxicity of nanomaterials in health and disease: Current challenges and emerging approaches for identifying immune modifiers in susceptible populations. Wiley Interdiscip Rev Nanomed Nanobiotechnol 2022; 14:e1804. [PMID: 36416020 PMCID: PMC9787548 DOI: 10.1002/wnan.1804] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/11/2022] [Revised: 03/24/2022] [Accepted: 03/30/2022] [Indexed: 11/24/2022]
Abstract
Nanosafety assessment has experienced an intense era of research during the past decades driven by a vivid interest of regulators, industry, and society. Toxicological assays based on in vitro cellular models have undergone an evolution from experimentation using nanoparticulate systems on singular epithelial cell models to employing advanced complex models more realistically mimicking the respective body barriers for analyzing their capacity to alter the immune state of exposed individuals. During this phase, a number of lessons were learned. We have thus arrived at a state where the next chapters have to be opened, pursuing the following objectives: (1) to elucidate underlying mechanisms, (2) to address effects on vulnerable groups, (3) to test material mixtures, and (4) to use realistic doses on (5) sophisticated models. Moreover, data reproducibility has become a significant demand. In this context, we studied the emerging concept of adverse outcome pathways (AOPs) from the perspective of immune activation and modulation resulting in pro-inflammatory versus tolerogenic responses. When considering the interaction of nanomaterials with biological systems, protein corona formation represents the relevant molecular initiating event (e.g., by potential alterations of nanomaterial-adsorbed proteins). Using this as an example, we illustrate how integrated experimental-computational workflows combining in vitro assays with in silico models aid in data enrichment and upon comprehensive ontology-annotated (meta)data upload to online repositories assure FAIRness (Findability, Accessibility, Interoperability, Reusability). Such digital twinning may, in future, assist in early-stage decision-making during therapeutic development, and hence, promote safe-by-design innovation in nanomedicine. 
Moreover, it may, in combination with in silico-based exposure-relevant dose-finding, serve for risk monitoring in particularly loaded areas, for example, workplaces, taking into account pre-existing health conditions. This article is categorized under: Toxicology and Regulatory Issues in Nanomedicine > Toxicology of Nanomaterials.
Affiliation(s)
- Sabine Hofer
- Division of Allergy & Immunology, Department of Biosciences & Medical Biology, Paris Lodron University of Salzburg, Salzburg, Austria
| | - Norbert Hofstätter
- Division of Allergy & Immunology, Department of Biosciences & Medical Biology, Paris Lodron University of Salzburg, Salzburg, Austria
| | - Benjamin Punz
- Division of Allergy & Immunology, Department of Biosciences & Medical Biology, Paris Lodron University of Salzburg, Salzburg, Austria
| | - Ingrid Hasenkopf
- Division of Allergy & Immunology, Department of Biosciences & Medical Biology, Paris Lodron University of Salzburg, Salzburg, Austria
| | - Litty Johnson
- Division of Allergy & Immunology, Department of Biosciences & Medical Biology, Paris Lodron University of Salzburg, Salzburg, Austria
| | - Martin Himly
- Division of Allergy & Immunology, Department of Biosciences & Medical Biology, Paris Lodron University of Salzburg, Salzburg, Austria
| |
|
16
|
Akbar R, Robert PA, Weber CR, Widrich M, Frank R, Pavlović M, Scheffer L, Chernigovskaya M, Snapkov I, Slabodkin A, Mehta BB, Miho E, Lund-Johansen F, Andersen JT, Hochreiter S, Hobæk Haff I, Klambauer G, Sandve GK, Greiff V. In silico proof of principle of machine learning-based antibody design at unconstrained scale. MAbs 2022; 14:2031482. [PMID: 35377271 PMCID: PMC8986205 DOI: 10.1080/19420862.2022.2031482] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 08/17/2021] [Accepted: 01/17/2022] [Indexed: 12/15/2022] Open
Abstract
Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.
Affiliation(s)
- Rahmad Akbar
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | - Philippe A. Robert
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | - Cédric R. Weber
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
| | - Michael Widrich
- ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
| | - Robert Frank
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | | | | | - Maria Chernigovskaya
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | - Igor Snapkov
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | - Andrei Slabodkin
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | - Brij Bhushan Mehta
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | - Enkelejda Miho
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz, Switzerland
| | - Fridtjof Lund-Johansen
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| | - Jan Terje Andersen
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Institute of Clinical Medicine, Department of Pharmacology, University of Oslo, Oslo, Norway
| | - Sepp Hochreiter
- ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
- Institute of Advanced Research in Artificial Intelligence (IARAI), Austria
| | | | - Günter Klambauer
- ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
| | | | - Victor Greiff
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
| |
|