1
Li F, Qian X, Zhu X, Lai X, Zhang X, Wang J. TCRcost: a deep learning model utilizing TCR 3D structure for enhanced of TCR-peptide binding. Front Genet 2024; 15:1346784. [PMID: 39415981 PMCID: PMC11479912 DOI: 10.3389/fgene.2024.1346784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 09/05/2024] [Indexed: 10/19/2024]
Abstract
Introduction: Predicting TCR-peptide binding is a complex and significant computational problem in systems immunology. During the past decade, a series of computational methods have been developed for better predicting TCR-peptide binding from amino acid sequences. However, the performance of sequence-based methods appears to have hit a bottleneck. Considering the 3D structures of TCR-peptide complexes, which provide much more information, could potentially lead to better prediction outcomes.
Methods: In this study, we developed TCRcost, a deep learning method, to predict TCR-peptide binding by incorporating 3D structures. TCRcost overcomes two significant challenges: acquiring a sufficient number of high-quality TCR-peptide structures and effectively extracting information from these structures for binding prediction. TCRcost corrects TCR 3D structures generated by protein structure tools, significantly extending the available datasets. The main and side chains of a TCR structure are separately corrected using a long short-term memory (LSTM) model. This approach prevents interference between the chains and accurately extracts interactions among both adjacent and global atoms. A 3D convolutional neural network (CNN) is designed to extract the atomic features relevant to TCR-peptide binding. The spatial features extracted by the 3DCNN are then processed through a fully connected layer to estimate the probability of TCR-peptide binding.
Results: Test results demonstrated that predicting TCR-peptide binding from 3D TCR structures is both efficient and highly accurate with an average accuracy of 0.974 on precise structures. Furthermore, the average accuracy on corrected structures was 0.762, significantly higher than the average accuracy of 0.375 on uncorrected original structures. Additionally, the average root mean square distance (RMSD) to precise structures was significantly reduced from 12.753 Å for predicted structures to 8.785 Å for corrected structures.
Discussion: Thus, utilizing structural information of TCR-peptide complexes is a promising approach to improve the accuracy of binding predictions.
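The RMSD figures quoted in this abstract compare atom coordinates of a predicted or corrected structure against a reference structure. The sketch below shows the basic quantity only. It assumes the two structures list the same atoms in the same order and are already superimposed; the abstract does not say how the authors aligned structures, and the function name `rmsd` and the toy coordinates are illustrative, not taken from TCRcost.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square distance between two equally sized lists of (x, y, z) atoms.

    Assumption: both lists describe the same atoms in the same order and
    are already superimposed (no alignment step is performed here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example: three atoms, each displaced by exactly 2 angstroms along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0), (3.0, 0.0, 2.0)]
print(rmsd(reference, predicted))  # 2.0
```

A lower value means the compared structure sits closer to the reference, which is the sense in which the abstract reports an improvement from 12.753 Å to 8.785 Å.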
Affiliation(s)
- Fan Li
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xinyang Qian
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xiaoyan Zhu
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xin Lai
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xuanping Zhang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Jiayin Wang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
2
Qian X, Yang G, Li F, Zhang X, Zhu X, Lai X, Xiao X, Wang T, Wang J. DeepLION2: deep multi-instance contrastive learning framework enhancing the prediction of cancer-associated T cell receptors by attention strategy on motifs. Front Immunol 2024; 15:1345586. [PMID: 38515756 PMCID: PMC10956474 DOI: 10.3389/fimmu.2024.1345586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 02/19/2024] [Indexed: 03/23/2024]
Abstract
Introduction: T cell receptor (TCR) repertoires provide valuable insights into complex human diseases, including cancers. Recent advancements in immune sequencing technology have significantly improved our understanding of TCR repertoire. Some computational methods have been devised to identify cancer-associated TCRs and enable cancer detection using TCR sequencing data. However, the existing methods are often limited by their inadequate consideration of the correlations among TCRs within a repertoire, hindering the identification of crucial TCRs. Additionally, the sparsity of cancer-associated TCR distribution presents a challenge in accurate prediction.
Methods: To address these issues, we presented DeepLION2, an innovative deep multi-instance contrastive learning framework specifically designed to enhance cancer-associated TCR prediction. DeepLION2 leveraged content-based sparse self-attention, focusing on the top k related TCRs for each TCR, to effectively model inter-TCR correlations. Furthermore, it adopted a contrastive learning strategy for bootstrapping parameter updates of the attention matrix, preventing the model from fixating on non-cancer-associated TCRs.
Results: Extensive experimentation on diverse patient cohorts, encompassing over ten cancer types, demonstrated that DeepLION2 significantly outperformed current state-of-the-art methods in terms of accuracy, sensitivity, specificity, Matthews correlation coefficient, and area under the curve (AUC). Notably, DeepLION2 achieved impressive AUC values of 0.933, 0.880, and 0.763 on thyroid, lung, and gastrointestinal cancer cohorts, respectively. Furthermore, it effectively identified cancer-associated TCRs along with their key motifs, highlighting the amino acids that play a crucial role in TCR-peptide binding.
Conclusion: These compelling results underscore DeepLION2's potential for enhancing cancer detection and facilitating personalized cancer immunotherapy.
DeepLION2 is publicly available on GitHub, at https://github.com/Bioinformatics7181/DeepLION2, for academic use only.
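The "top k related TCRs for each TCR" idea in this abstract can be illustrated with a small, generic top-k self-attention routine. This is a minimal sketch under stated assumptions, not DeepLION2's implementation: it scores pairs with a plain dot product, has no learned projections, and the function name `topk_sparse_attention` and the toy embeddings are hypothetical. The actual model is in the GitHub repository cited above.

```python
import math

def topk_sparse_attention(embeddings, k):
    """For each row, attend only to its k highest-scoring rows.

    Assumptions: scores are raw dot products between embeddings, and
    k >= 1. All other rows receive exactly zero attention weight.
    """
    n = len(embeddings)
    outputs = []
    for i in range(n):
        # Similarity of item i to every item in the set (itself included).
        scores = [sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
                  for j in range(n)]
        # Indices of the k most related items.
        keep = sorted(range(n), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax restricted to the kept indices (shifted for stability).
        top = max(scores[j] for j in keep)
        weights = {j: math.exp(scores[j] - top) for j in keep}
        norm = sum(weights.values())
        dim = len(embeddings[i])
        outputs.append([
            sum(weights[j] / norm * embeddings[j][d] for j in keep)
            for d in range(dim)
        ])
    return outputs

# Toy "repertoire" of three 2-dimensional embeddings.
toy = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
print(topk_sparse_attention(toy, k=1))  # [[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
```

With k = 1 each item simply copies its single most similar item; larger k blends several neighbours, while still ignoring the rest of the repertoire.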
Affiliation(s)
- Xinyang Qian
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Guang Yang
  - Department of Clinical Oncology, The Second Affiliated Hospital of Air Force Medical University, Xi’an, China
- Fan Li
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xuanping Zhang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xiaoyan Zhu
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xin Lai
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xiao Xiao
  - Genomics Institute, Geneplus-Shenzhen, Shenzhen, China
- Tao Wang
  - Department of Thoracic Surgery, The Second Affiliated Hospital of Air Force Medical University, Xi’an, China
- Jiayin Wang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
3
Chen C, Liu Y, Yao J, Wang K, Zhang M, Shi F, Tian Y, Gao L, Ying Y, Pan Q, Wang H, Wu J, Qi X, Wang Y, Xu D. Deep learning approaches for differentiating thyroid nodules with calcification: a two-center study. BMC Cancer 2023; 23:1139. [PMID: 37996814 PMCID: PMC10668439 DOI: 10.1186/s12885-023-11456-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 09/27/2023] [Indexed: 11/25/2023]
Abstract
BACKGROUND: Calcification is a common phenomenon in both benign and malignant thyroid nodules. However, the clinical significance of calcification remains unclear. Therefore, we explored a more objective method for distinguishing between benign and malignant thyroid calcified nodules.
METHODS: This retrospective study, conducted at two centers, involved a total of 631 thyroid nodules, all of which were pathologically confirmed. Ultrasound image sets were employed for analysis. The primary evaluation index was the area under the receiver-operator characteristic curve (AUROC). We compared the diagnostic performance of deep learning (DL) methods with that of radiologists and determined whether DL could enhance the diagnostic capabilities of radiologists.
RESULTS: The Xception classification model exhibited the highest performance, achieving an AUROC of up to 0.970, followed by the DenseNet169 model, which attained an AUROC of up to 0.959. Notably, both DL models outperformed radiologists (P < 0.05). The success of the Xception model can be attributed to its incorporation of deep separable convolution, which effectively reduces the model's parameter count. This feature enables the model to capture features more effectively during the feature extraction process, resulting in superior performance, particularly when dealing with limited data.
CONCLUSIONS: This study conclusively demonstrated that DL outperformed radiologists in differentiating between benign and malignant calcified thyroid nodules. Additionally, the diagnostic capabilities of radiologists could be enhanced with the aid of DL.
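The parameter saving this abstract attributes to "deep separable convolution" (usually called depthwise separable convolution, the building block of Xception) can be checked with simple arithmetic. The sketch below counts weights for one layer, ignoring biases. The layer sizes (3x3 kernel, 128 input channels, 256 output channels) are illustrative assumptions, not values reported in the study.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2D convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution.

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer sizes chosen only for illustration.
std = standard_conv_params(3, 128, 256)
sep = separable_conv_params(3, 128, 256)
print(std, sep)  # 294912 33920
```

For these sizes the separable layer uses roughly one ninth of the weights, which is consistent with the abstract's point that fewer parameters can help when training data are limited.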
Affiliation(s)
- Chen Chen
  - Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, China
  - Wenling Big Data and Artificial Intelligence Institute in Medicine, Taizhou, 317502, China
  - Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
- Yuanzhen Liu
  - Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, China
  - Wenling Big Data and Artificial Intelligence Institute in Medicine, Taizhou, 317502, China
  - Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
- Jincao Yao
  - Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, China
  - Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, Hangzhou, 310022, China
  - Key Laboratory of Head & Neck Cancer Translational Research of Zhejiang Province, Hangzhou, 310022, China
- Kai Wang
  - Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University, Dongyang, 317502, China
- Maoliang Zhang
  - Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University, Dongyang, 317502, China
- Fang Shi
  - Capacity Building and Continuing Education Center of National Health Commission, Beijing, 100098, China
- Yuan Tian
  - Capacity Building and Continuing Education Center of National Health Commission, Beijing, 100098, China
- Lu Gao
  - Capacity Building and Continuing Education Center of National Health Commission, Beijing, 100098, China
- Yajun Ying
  - Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
- Qianmeng Pan
  - Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
- Hui Wang
  - Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
- Jinxin Wu
  - Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
- Xiaoqing Qi
  - Department of Ultrasound, Hangzhou Ninth People's Hospital, Hangzhou, 311225, China
- Yifan Wang
  - Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, China
  - Wenling Big Data and Artificial Intelligence Institute in Medicine, Taizhou, 317502, China
  - Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
- Dong Xu
  - Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, China
  - Wenling Big Data and Artificial Intelligence Institute in Medicine, Taizhou, 317502, China
  - Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China
4
Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences. BMC Bioinformatics 2022; 23:469. [PMID: 36348271 PMCID: PMC9644450 DOI: 10.1186/s12859-022-05012-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2022] [Accepted: 10/26/2022] [Indexed: 11/11/2022]
Abstract
Early detection of cancers has been much explored due to its paramount importance in biomedical fields. Among different types of data used to answer this biological question, studies based on T cell receptors (TCRs) are under recent spotlight due to the growing appreciation of the roles of the host immunity system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences hinders researchers from simply adopting classical statistical/machine learning methods. There were recent attempts to model this type of data in the context of multiple instance learning (MIL). Despite the novel application of MIL to cancer detection using TCR sequences and the demonstrated adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models are not fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to enhance the performance in cancer detection and explainability. The sparse attention structure drops out uninformative instances in each bag, achieving both interpretability and better predictive performance in combination with the skip connection. Our experiments show that MINN-SA yields the highest area under the ROC curve scores on average measured across 10 different types of cancers, compared to existing MIL approaches. Moreover, we observe from the estimated attentions that MINN-SA can identify the TCRs that are specific for tumor antigens in the same T cell repertoire.
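One generic way to make attention "drop out" uninformative instances, in the sense that they receive exactly zero weight, is the sparsemax transformation. The abstract does not state which sparse attention function MINN-SA uses, so sparsemax is swapped in here purely as an illustration, and the names `sparsemax` and `pool_bag` as well as the toy bag are hypothetical.

```python
def sparsemax(scores):
    """Project scores onto the probability simplex; low scores become exactly 0."""
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        # The support is the largest k for which this inequality holds.
        if 1 + k * value > cumulative:
            tau = (cumulative - 1) / k
    return [max(s - tau, 0.0) for s in scores]

def pool_bag(instances, scores):
    """Bag-level representation: sparsemax-weighted sum of instance vectors."""
    weights = sparsemax(scores)
    dim = len(instances[0])
    return [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]

# Toy bag of three instances (e.g. three TCR embeddings) with attention scores.
bag = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(sparsemax([2.0, 1.0, 0.1]))      # [1.0, 0.0, 0.0]
print(pool_bag(bag, [2.0, 1.0, 0.1]))  # [1.0, 0.0]
```

Instances with zero weight contribute nothing to the bag representation, and the non-zero weights can be read directly as which instances the prediction relied on.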
5
Xu Y, Qian X, Tong Y, Li F, Wang K, Zhang X, Liu T, Wang J. AttnTAP: A Dual-input Framework Incorporating the Attention Mechanism for Accurately Predicting TCR-peptide Binding. Front Genet 2022; 13:942491. [PMID: 36072653 PMCID: PMC9441555 DOI: 10.3389/fgene.2022.942491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2022] [Accepted: 06/28/2022] [Indexed: 11/30/2022]
Abstract
T-cell receptors (TCRs) are formed by random recombination of genomic precursor elements, some of which mediate the recognition of cancer-associated antigens. Due to the complicated process of T-cell immune response and limited biological empirical evidence, the practical strategy for identifying TCRs and their recognized peptides is the computational prediction from population and/or individual TCR repertoires. In recent years, several machine/deep learning-based approaches have been proposed for TCR-peptide binding prediction. However, the predictive performances of these methods can be further improved by overcoming several significant flaws in neural network design. The interrelationship between amino acids in TCRs is critical for TCR antigen recognition, which was not properly considered by the existing methods. They also did not pay more attention to the amino acids that play a significant role in antigen-binding specificity. Moreover, complex networks tended to increase the risk of overfitting and computational costs. In this study, we developed a dual-input deep learning framework, named AttnTAP, to improve the TCR-peptide binding prediction. It used the bi-directional long short-term memory model for robust feature extraction of TCR sequences, which considered the interrelationships between amino acids and their precursors and postcursors. We also introduced the attention mechanism to give amino acids different weights and pay more attention to the contributing ones. In addition, we used the multilayer perceptron model instead of complex networks to extract peptide features to reduce overfitting and computational costs. AttnTAP achieved high areas under the curves (AUCs) in TCR-peptide binding prediction on both balanced and unbalanced datasets (higher than 0.838 on McPAS-TCR and 0.908 on VDJdb). 
Furthermore, it had the highest average AUCs in TPP-I and TPP-II tasks compared with the other five popular models (TPP-I: 0.84 on McPAS-TCR and 0.894 on VDJdb; TPP-II: 0.837 on McPAS-TCR and 0.893 on VDJdb). In conclusion, AttnTAP is a reasonable and practical framework for predicting TCR-peptide binding, which can accelerate identifying neoantigens and activated T cells for immunotherapy to meet urgent clinical needs.
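The attention step described in this abstract, giving amino acids different weights and emphasising the contributing ones, amounts to a softmax-weighted sum over per-residue feature vectors. The sketch below shows that pooling step in isolation. It assumes the per-residue vectors (for example BiLSTM outputs) and their attention scores are already computed; how AttnTAP derives the scores is not given in the abstract, and the name `attention_pool` and the toy numbers are hypothetical.

```python
import math

def attention_pool(hidden_states, scores):
    """Softmax the scores, then return (weights, weighted sum of hidden states).

    Assumption: hidden_states holds one feature vector per residue and
    scores holds one unnormalised attention score per residue.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shifted for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled

# Two residues with equal scores: the pooled vector is their plain average.
weights, pooled = attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
print(weights, pooled)  # [0.5, 0.5] [2.0, 1.0]
```

Raising one residue's score shifts the pooled vector toward that residue's features, which is how the weighting lets a model focus on positions that matter for binding.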
Affiliation(s)
- Ying Xu
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Xinyang Qian
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Yao Tong
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Fan Li
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Ke Wang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
  - Geneplus Beijing Institute, Beijing, China
- Xuanping Zhang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Tao Liu
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
  - Geneplus Beijing Institute, Beijing, China
- Jiayin Wang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
  - *Correspondence: Jiayin Wang,