1
Ostmeyer J, Cowell L, Christley S. Dynamic kernel matching for non-conforming data: A case study of T cell receptor datasets. PLoS One 2023; 18:e0265313. [PMID: 36881590 PMCID: PMC9990938 DOI: 10.1371/journal.pone.0265313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Accepted: 03/01/2022] [Indexed: 03/08/2023]
Abstract
Most statistical classifiers are designed to find patterns in data where numbers fit into rows and columns, like in a spreadsheet, but many kinds of data do not conform to this structure. To uncover patterns in non-conforming data, we describe an approach for modifying established statistical classifiers to handle non-conforming data, which we call dynamic kernel matching (DKM). As examples of non-conforming data, we consider (i) a dataset of T-cell receptor (TCR) sequences labelled by disease antigen and (ii) a dataset of sequenced TCR repertoires labelled by patient cytomegalovirus (CMV) serostatus, anticipating that both datasets contain signatures for diagnosing disease. We successfully fit statistical classifiers augmented with DKM to both datasets and report the performance on holdout data using standard metrics and metrics allowing for indeterminate diagnoses. Finally, we identify the patterns used by our statistical classifiers to generate predictions and show that these patterns agree with observations from experimental studies.
Affiliation(s)
- Jared Ostmeyer
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lindsay Cowell
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Scott Christley
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
2
Chen Y, Ye Z, Zhang Y, Xie W, Chen Q, Lan C, Yang X, Zeng H, Zhu Y, Ma C, Tang H, Wang Q, Guan J, Chen S, Li F, Yang W, Yan H, Yu X, Zhang Z. A Deep Learning Model for Accurate Diagnosis of Infection Using Antibody Repertoires. Journal of Immunology (Baltimore, Md. : 1950) 2022; 208:2675-2685. [PMID: 35606050 DOI: 10.4049/jimmunol.2200063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/20/2022] [Accepted: 04/11/2022] [Indexed: 06/15/2023]
Abstract
The adaptive immune receptor repertoire consists of the entire set of an individual's BCRs and TCRs and is believed to contain a record of prior immune responses and the potential for future immunity. Analyses of TCR repertoires via deep learning (DL) methods have successfully diagnosed cancers and infectious diseases, including coronavirus disease 2019. However, few studies have used DL to analyze BCR repertoires. In this study, we collected IgG H chain Ab repertoires from 276 healthy control subjects and 326 patients with various infections. We then extracted a comprehensive feature set consisting of 10 subsets of repertoire-level features and 160 sequence-level features and tested whether these features can distinguish between infected individuals and healthy control subjects. Finally, we developed an ensemble DL model, namely, DL method for infection diagnosis (https://github.com/chenyuan0510/DeepID), and used this model to differentiate between the infected and healthy individuals. Four subsets of repertoire-level features and four sequence-level features were selected because of their excellent predictive performance. The DL method for infection diagnosis outperformed traditional machine learning methods in distinguishing between healthy and infected samples (area under the curve = 0.9883) and achieved a multiclassification accuracy of 0.9104. We also observed differences between the healthy and infected groups in V gene usage, clonal expansion, the complexity of reads within a clone, the physical properties in the α region, and the local flexibility of the CDR3 amino acid sequence. Our results suggest that the Ab repertoire is a promising biomarker for the diagnosis of various infections.
Affiliation(s)
- Yuan Chen
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhiming Ye
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Division of Nephrology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yanfang Zhang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Wenxi Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Qingyun Chen
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Chunhong Lan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Xiujia Yang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huikun Zeng
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yan Zhu
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Cuiyu Ma
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Haipei Tang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Qilong Wang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Junjie Guan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Sen Chen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Fenxiang Li
  - Department of Infectious Disease Control and Prevention, Center for Disease Control and Prevention of Southern Theatre Command, Guangzhou, China
- Wei Yang
  - Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huacheng Yan
  - Department of Infectious Disease Control and Prevention, Center for Disease Control and Prevention of Southern Theatre Command, Guangzhou, China
- Xueqing Yu
  - Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Division of Nephrology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhenhai Zhang
  - Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - State Key Laboratory of Organ Failure Research, Division of Nephrology, Southern Medical University, Guangzhou, China
  - Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou, China
3
Ostmeyer J, Cowell L, Greenberg B, Christley S. Reconstituting T cell receptor selection in-silico. Genes Immun 2021; 22:187-193. [PMID: 34127826 DOI: 10.1038/s41435-021-00141-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 05/13/2021] [Accepted: 05/26/2021] [Indexed: 11/09/2022]
Abstract
Each T cell receptor (TCR) gene is created without regard for which substances (antigens) the receptor can recognize. T cell selection culls developing T cells when their TCRs (i) fail to recognize major histocompatibility complexes (MHCs) that act as antigen presenting platforms or (ii) recognize with high affinity self-antigens derived from healthy cells and tissue. While T cell selection has been thoroughly studied, little is known about which TCRs are retained or removed by this process. Therefore, we develop an approach using TCR gene sequencing and machine learning to identify patterns in TCR protein sequences influencing the outcome of T cell receptor selection. We verify that the trained models classify TCRs from developing T cells as being before selection and TCRs from mature T cells as being after selection. Our approach may provide future avenues for studying the relationship between T cell selection and conditions like autoimmune diseases.
Affiliation(s)
- Jared Ostmeyer
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA
- Lindsay Cowell
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA
- Benjamin Greenberg
  - Department of Neurology, UT Southwestern Medical Center, Dallas, TX, USA
- Scott Christley
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA