1
Song K, Xu H, Shi Y, Zou X, Da LT, Hao J. Investigating TCR-pMHC interactions for TCRs without identified epitopes by constructing a computational pipeline. Int J Biol Macromol 2024; 282:136502. [PMID: 39423970 DOI: 10.1016/j.ijbiomac.2024.136502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 10/04/2024] [Accepted: 10/09/2024] [Indexed: 10/21/2024]
Abstract
The molecular mechanisms underlying epitope recognition by T cell receptors (TCRs) are critical for activating T cell immune responses and rationally designing TCR-based therapeutics. Single-cell sequencing techniques vastly boost the accumulation of TCR sequences, while the limitation of available TCR-pMHC structures hampers further investigations. In this study, we proposed a computational pipeline that incorporates structural information and single-cell sequencing data to investigate the epitope-recognition mechanisms for TCRs without identified epitopes. By antigen specificity clustering, we mapped the epitope sequences between epitope-known and epitope-unknown TCRs from COVID-19 patients. One reported SARS-CoV-2 epitope, NQKLIANQF (S 919-927), was identified for a TCR expressed by 614 T cells (TCR-614). Epitope screening also identified a potential cross-reactive epitope, KLKTLVATA (NSP3 1790-1798), for a TCR expressed by 204 T cells (TCR-204). By molecular dynamics (MD) simulations, we revealed the detailed epitope-recognition mechanisms for both TCRs. The structural motifs responsible for epitope recognition revealed by the MD simulations are consistent with the sequential features recognized by the sequence-based clustering method. We hope that this strategy could facilitate the discovery and optimization of TCR-based therapeutics.
Affiliation(s)
- Kaiyuan Song
- Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Honglin Xu
- School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai 200030, China; Shanghai Key Laboratory of Psychotic Disorders, Brain Science and Technology Research Center, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai 200030, China
- Xin Zou
- Digital Diagnosis and Treatment Innovation Center for Cancer, Institute of Translational Medicine, Shanghai Jiao Tong University, Shanghai 200240, China; Ninth People's Hospital, Shanghai Key Laboratory of Stomatology and Shanghai Research Institute of Stomatology, National Clinical Research Center of Stomatology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200011, China.
- Lin-Tai Da
- Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
- Jie Hao
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai 200032, China.
2
Luo Z, Yu L, Xu Z, Liu K, Gu L. Comprehensive Review and Assessment of Computational Methods for Prediction of N6-Methyladenosine Sites. Biology 2024; 13:777. [PMID: 39452086 PMCID: PMC11504118 DOI: 10.3390/biology13100777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Revised: 09/19/2024] [Accepted: 09/23/2024] [Indexed: 10/26/2024]
Abstract
N6-methyladenosine (m6A) plays a crucial regulatory role in the control of cellular functions and gene expression. Recent advances in sequencing techniques for transcriptome-wide m6A mapping have accelerated the accumulation of m6A site information at a single-nucleotide level, providing more high-confidence training data to develop computational approaches for m6A site prediction. However, it is still a major challenge to precisely predict m6A sites using in silico approaches. To advance the computational support for m6A site identification, here, we curated 13 up-to-date benchmark datasets from nine different species (i.e., H. sapiens, M. musculus, Rat, S. cerevisiae, Zebrafish, A. thaliana, Pig, Rhesus, and Chimpanzee). This will assist the research community in conducting an unbiased evaluation of alternative approaches and support future research on m6A modification. We revisited 52 computational approaches published since 2015 for m6A site identification, including 30 traditional machine learning-based, 14 deep learning-based, and 8 ensemble learning-based methods. We comprehensively reviewed these computational approaches in terms of their training datasets, calculated features, computational methodologies, performance evaluation strategy, and webserver/software usability. Using these benchmark datasets, we benchmarked nine predictors with available online websites or stand-alone software and assessed their prediction performance. We found that deep learning and traditional machine learning approaches generally outperformed scoring function-based approaches. In summary, the curated benchmark dataset repository and the systematic assessment in this study serve to inform the design and implementation of state-of-the-art computational approaches for m6A identification and facilitate more rigorous comparisons of new methods in the future.
Affiliation(s)
- Zhengtao Luo
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Key Laboratory of Smart Agriculture Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- Liyi Yu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin 150076, China
- Kening Liu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Lichuan Gu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Key Laboratory of Smart Agriculture Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
3
Meynard-Piganeau B, Feinauer C, Weigt M, Walczak AM, Mora T. TULIP: A transformer-based unsupervised language model for interacting peptides and T cell receptors that generalizes to unseen epitopes. Proc Natl Acad Sci U S A 2024; 121:e2316401121. [PMID: 38838016 PMCID: PMC11181096 DOI: 10.1073/pnas.2316401121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 04/29/2024] [Indexed: 06/07/2024] Open
Abstract
The accurate prediction of binding between T cell receptors (TCR) and their cognate epitopes is key to understanding the adaptive immune response and developing immunotherapies. Current methods face two significant limitations: the shortage of comprehensive high-quality data and the bias introduced by the selection of the negative training data commonly used in the supervised learning approaches. We propose a method, Transformer-based Unsupervised Language model for Interacting Peptides and T cell receptors (TULIP), that addresses both limitations by leveraging incomplete data and unsupervised learning and using the transformer architecture of language models. Our model is flexible and integrates all possible data sources, regardless of their quality or completeness. We demonstrate the existence of a bias introduced by the sampling procedure used in previous supervised approaches, emphasizing the need for an unsupervised approach. TULIP recognizes the specific TCRs binding an epitope, performing well on unseen epitopes. Our model outperforms state-of-the-art models and offers a promising direction for the development of more accurate TCR epitope recognition models.
Affiliation(s)
- Barthelemy Meynard-Piganeau
- Laboratory of Computational and Quantitative Biology, Institut de Biologie Paris Seine, CNRS, Sorbonne Université, Paris 75005, France
- Department of Computing Sciences, Bocconi University, Milan 20100, Italy
- Martin Weigt
- Laboratory of Computational and Quantitative Biology, Institut de Biologie Paris Seine, CNRS, Sorbonne Université, Paris 75005, France
- Aleksandra M. Walczak
- Laboratoire de Physique de l’Ecole Normale Supérieure, Université Paris Sciences et Lettres, CNRS, Sorbonne Université, Université de Paris Cité, Paris 75005, France
- Thierry Mora
- Laboratoire de Physique de l’Ecole Normale Supérieure, Université Paris Sciences et Lettres, CNRS, Sorbonne Université, Université de Paris Cité, Paris 75005, France
4
Cai Y, Luo M, Yang W, Xu C, Wang P, Xue G, Jin X, Cheng R, Que J, Zhou W, Pang B, Xu S, Li Y, Jiang Q, Xu Z. The Deep Learning Framework iCanTCR Enables Early Cancer Detection Using the T-cell Receptor Repertoire in Peripheral Blood. Cancer Res 2024; 84:1915-1928. [PMID: 38536129 DOI: 10.1158/0008-5472.can-23-0860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 07/20/2023] [Accepted: 03/19/2024] [Indexed: 06/05/2024]
Abstract
T cells recognize tumor antigens and initiate an anticancer immune response in the very early stages of tumor development, and the antigen specificity of T cells is determined by the T-cell receptor (TCR). Therefore, monitoring changes in the TCR repertoire in peripheral blood may offer a strategy to detect various cancers at a relatively early stage. Here, we developed the deep learning framework iCanTCR to identify patients with cancer based on the TCR repertoire. The iCanTCR framework uses TCRβ sequences from an individual as an input and outputs the predicted cancer probability. The model was trained on over 2,000 publicly available TCR repertoires from 11 types of cancer and healthy controls. Analysis of several additional publicly available datasets validated the ability of iCanTCR to distinguish patients with cancer from noncancer individuals and demonstrated the capability of iCanTCR for the accurate classification of multiple cancers. Importantly, iCanTCR precisely identified individuals with early-stage cancer with an AUC of 86%. Altogether, this work provides a liquid biopsy approach to capture immune signals from peripheral blood for noninvasive cancer diagnosis. Significance: Development of a deep learning-based method for multicancer detection using the TCR repertoire in the peripheral blood establishes the potential of evaluating circulating immune signals for noninvasive early cancer detection.
Affiliation(s)
- Yideng Cai
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Wenyi Yang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Chang Xu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Pingping Wang
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Guangfu Xue
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiyun Jin
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Rui Cheng
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Jinhao Que
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Boran Pang
- Center for Difficult and Complicated Abdominal Surgery, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
- Shouping Xu
- Department of Breast Cancer, Harbin Medical University Cancer Hospital, Harbin, China
- Yu Li
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Zhaochun Xu
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
5
Yu Z, Jiang M, Lan X. HeteroTCR: A heterogeneous graph neural network-based method for predicting peptide-TCR interaction. Commun Biol 2024; 7:684. [PMID: 38834836 DOI: 10.1038/s42003-024-06380-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 05/23/2024] [Indexed: 06/06/2024] Open
Abstract
Identifying interactions between T-cell receptors (TCRs) and immunogenic peptides holds profound implications across diverse research domains and clinical scenarios. Unsupervised clustering models (UCMs) cannot predict peptide-TCR binding directly, while supervised predictive models (SPMs) often face challenges in identifying antigens previously unencountered by the immune system or possessing limited TCR binding repertoires. Therefore, we propose HeteroTCR, an SPM based on Heterogeneous Graph Neural Network (GNN), to accurately predict peptide-TCR binding probabilities. HeteroTCR captures within-type (TCR-TCR or peptide-peptide) similarity information and between-type (peptide-TCR) interaction insights for predictions on unseen peptides and TCRs, surpassing limitations of existing SPMs. Our evaluation shows HeteroTCR outperforms state-of-the-art models on independent datasets. Ablation studies and visual interpretation underscore the Heterogeneous GNN module's critical role in enhancing HeteroTCR's performance by capturing pivotal binding process features. We further demonstrate the robustness and reliability of HeteroTCR through validation using single-cell datasets, aligning with the expectation that pMHC-TCR complexes with higher predicted binding probabilities correspond to increased binding fractions.
Affiliation(s)
- Zilan Yu
- School of Medicine, Tsinghua University, 100084, Beijing, China
- Centre for Life Sciences, Tsinghua University, 100084, Beijing, China
- Mengnan Jiang
- School of Medicine, Tsinghua University, 100084, Beijing, China
- Xun Lan
- School of Medicine, Tsinghua University, 100084, Beijing, China.
- Centre for Life Sciences, Tsinghua University, 100084, Beijing, China.
- Tsinghua-Peking Center for Life Sciences, MOE Key Laboratory of Tsinghua University, Beijing, China.
- MOE Key Laboratory of Bioinformatics, Tsinghua University, 100084, Beijing, China.
6
Machaca V, Goyzueta V, Cruz MG, Sejje E, Pilco LM, López J, Túpac Y. Transformers meets neoantigen detection: a systematic literature review. J Integr Bioinform 2024; 21:jib-2023-0043. [PMID: 38960869 PMCID: PMC11377031 DOI: 10.1515/jib-2023-0043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 03/20/2024] [Indexed: 07/05/2024] Open
Abstract
Cancer immunology offers a new alternative to traditional cancer treatments, such as radiotherapy and chemotherapy. One notable alternative is the development of personalized vaccines based on cancer neoantigens. Moreover, Transformers are considered a revolutionary development in artificial intelligence with a significant impact on natural language processing (NLP) tasks and have been utilized in proteomics studies in recent years. In this context, we conducted a systematic literature review to investigate how Transformers are applied in each stage of the neoantigen detection process. Additionally, we mapped current pipelines and examined the results of clinical trials involving cancer vaccines.
Affiliation(s)
- Erika Sejje
- Universidad Nacional de San Agustín, Arequipa, Perú
- Yván Túpac
- Universidad Católica San Pablo, Arequipa, Perú
7
Nie Z, Gao M, Jin X, Rao Y, Zhang X. MFPINC: prediction of plant ncRNAs based on multi-source feature fusion. BMC Genomics 2024; 25:531. [PMID: 38816689 PMCID: PMC11137975 DOI: 10.1186/s12864-024-10439-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 05/21/2024] [Indexed: 06/01/2024] Open
Abstract
Non-coding RNAs (ncRNAs) are recognized as pivotal players in the regulation of essential physiological processes such as nutrient homeostasis, development, and stress responses in plants. Common methods for predicting ncRNAs are strongly affected by experimental conditions and computational methods, and therefore require a significant investment of time and resources. We therefore constructed an ncRNA predictor (MFPINC) to predict potential ncRNAs in plants, based on the PINC tool proposed in our previous studies. Specifically, sequence features were carefully refined using variance thresholding and F-test methods, while deep features were extracted and feature fusion was performed by applying the GRU model. A comprehensive evaluation on multiple standard datasets shows that MFPINC not only achieves more comprehensive and accurate identification of gene sequences, but also significantly improves the expressive and generalization performance of the model, and that MFPINC significantly outperforms the existing competing methods in ncRNA identification. The tool is available on GitHub ( https://github.com/Zhenj-Nie/MFPINC ), where the data and source code can also be downloaded for free.
Affiliation(s)
- Zhenjun Nie
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Mengqing Gao
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Xiu Jin
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Key Laboratory of Agricultural Sensors, Ministry of Agriculture and Rural Affairs, Hefei, 230036, China
- Yuan Rao
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Key Laboratory of Agricultural Sensors, Ministry of Agriculture and Rural Affairs, Hefei, 230036, China
- Xiaodan Zhang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.
- Key Laboratory of Agricultural Sensors, Ministry of Agriculture and Rural Affairs, Hefei, 230036, China.
8
Bulashevska A, Nacsa Z, Lang F, Braun M, Machyna M, Diken M, Childs L, König R. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol 2024; 15:1394003. [PMID: 38868767 PMCID: PMC11167095 DOI: 10.3389/fimmu.2024.1394003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open
Abstract
Cancer immunotherapy has witnessed rapid advancement in recent years, with a particular focus on neoantigens as promising targets for personalized treatments. The convergence of immunogenomics, bioinformatics, and artificial intelligence (AI) has propelled the development of innovative neoantigen discovery tools and pipelines. These tools have revolutionized our ability to identify tumor-specific antigens, providing the foundation for precision cancer immunotherapy. AI-driven algorithms can process extensive amounts of data, identify patterns, and make predictions that were once challenging to achieve. However, the integration of AI comes with its own set of challenges, leaving space for further research. With a particular focus on computational approaches, this article explores the current landscape of neoantigen prediction, the fundamental concepts behind it, and the challenges and their potential solutions, providing a comprehensive overview of this rapidly evolving field.
Affiliation(s)
- Alla Bulashevska
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Zsófia Nacsa
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Franziska Lang
- TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Markus Braun
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Martin Machyna
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Mustafa Diken
- TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Liam Childs
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Renate König
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
9
Leary AY, Scott D, Gupta NT, Waite JC, Skokos D, Atwal GS, Hawkins PG. Designing meaningful continuous representations of T cell receptor sequences with deep generative models. Nat Commun 2024; 15:4271. [PMID: 38769289 PMCID: PMC11106309 DOI: 10.1038/s41467-024-48198-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
T cell receptor (TCR) antigen binding underlies a key mechanism of the adaptive immune response, yet the vast diversity of TCRs and the complexity of protein interactions limit our ability to build useful low-dimensional representations of TCRs. To address the current limitations in TCR analysis, we develop a capacity-controlled disentangling variational autoencoder, trained using a dataset of approximately 100 million TCR sequences, which we name TCR-VALID. We design TCR-VALID such that the model representations are low-dimensional, continuous, disentangled, and sufficiently informative to provide high-quality de novo TCR sequence generation. We thoroughly quantify these properties of the representations, providing a framework for future protein representation learning in low dimensions. The continuity of TCR-VALID representations allows fast and accurate TCR clustering, and the model is benchmarked against other state-of-the-art TCR clustering tools and pre-trained language models.
Affiliation(s)
- Allen Y Leary
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA.
- Darius Scott
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Namita T Gupta
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Janelle C Waite
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Dimitris Skokos
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Gurinder S Atwal
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Peter G Hawkins
- Regeneron Pharmaceuticals Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA.
10
Zhu X, Ma E, Ning K, Feng X, Quan W, Wang F, Zhu C, Ma Y, Dong Y, Jiang Q. A comparative analysis of TCR immune repertoire in COVID-19 patients. Hum Immunol 2024; 85:110795. [PMID: 38582657 DOI: 10.1016/j.humimm.2024.110795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 03/22/2024] [Accepted: 03/25/2024] [Indexed: 04/08/2024]
Abstract
The coronavirus disease 2019 (COVID-19) has emerged as a global health threat since its outbreak in December 2019. Despite widespread recognition, there has been a paucity of studies focusing on the T cell receptor (TCR) bias in adaptive immunity induced by SARS-CoV-2. This research conducted a comparative analysis of the TCR immune repertoire to identify notable αβ TCR bias sequences associated with the SARS-CoV-2 virus antigen. The present study encompassed 73 symptomatic COVID-19 patients, categorized as moderate/mild or severe/critical, along with 9 healthy controls. Our findings revealed specific TCR chains prominently utilized by moderate and severe patients, identified as TRAV30-J34-TRBV3-1-J2-7 and TRAV12-3-J6-TRBV28-J1-1, respectively. Additionally, our research explored critical TCR preferences in the bronchoalveolar lavage fluid (BALF) of COVID-19 patients at various disease stages. Indeed, monitoring the dynamics of immune repertoire changes in COVID-19 patients could serve as a crucial biomarker for predicting disease progression and recovery. Furthermore, the study explored TCR bias in both peripheral blood mononuclear cells (PBMCs) and BALF. The most common αβ VJ pair observed in BALF was TRAV12-3-J18-TRBV7-6-J2-7. In addition, a comparative analysis with the VDJdb database indicated that the HLA-A*02:01 allele exhibited the widest distribution and highest frequency in COVID-19 patients across different periods. This comprehensive examination provided a global characterization of the TCR immune repertoire in COVID-19 patients, contributing significantly to our understanding of TCR bias induced by SARS-CoV-2.
MESH Headings
- Humans
- COVID-19/immunology
- SARS-CoV-2/immunology
- Male
- Female
- Middle Aged
- Receptors, Antigen, T-Cell, alpha-beta/genetics
- Receptors, Antigen, T-Cell, alpha-beta/immunology
- Receptors, Antigen, T-Cell, alpha-beta/metabolism
- Adult
- Bronchoalveolar Lavage Fluid/immunology
- Aged
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/metabolism
- Adaptive Immunity/immunology
- Severity of Illness Index
Affiliation(s)
- Xiao Zhu
- School of Computer and Control Engineering, Yantai University, Yantai, Shandong, China; Lead Contact.
- Enze Ma
- School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China
- Ke Ning
- School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China
- Xiangyan Feng
- Department of Hematology, Yantai Yuhuangding Hospital Affiliated to Qingdao University, Yantai, Shandong, China.
- Wei Quan
- School of Computer and Control Engineering, Yantai University, Yantai, Shandong, China
- Fei Wang
- School of Computer and Control Engineering, Yantai University, Yantai, Shandong, China
- Chaoqun Zhu
- School of Computer and Control Engineering, Yantai University, Yantai, Shandong, China
- Yuanjun Ma
- School of Computer and Control Engineering, Yantai University, Yantai, Shandong, China
- Yucui Dong
- Department of Immunology, Binzhou Medical University, Yantai, Shandong, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.
11
Eskandari A, Leow TC, Rahman MBA, Oslan SN. Advances in Therapeutic Cancer Vaccines, Their Obstacles, and Prospects Toward Tumor Immunotherapy. Mol Biotechnol 2024:10.1007/s12033-024-01144-3. [PMID: 38625508 DOI: 10.1007/s12033-024-01144-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 03/15/2024] [Indexed: 04/17/2024]
Abstract
Over the past few decades, cancer immunotherapy has experienced a significant revolution due to the advancements in immune checkpoint inhibitors (ICIs) and adoptive cell therapies (ACTs), along with their regulatory approvals. In recent times, there has been hope in the effectiveness of cancer vaccines for therapy as they have been able to stimulate de novo T-cell reactions against tumor antigens. These tumor antigens include both tumor-associated antigen (TAA) and tumor-specific antigen (TSA). Nevertheless, the constant quest to fully achieve these abilities persists. Therefore, this review offers a broad perspective on the existing status of cancer immunizations. Cancer vaccine design has been revolutionized due to the advancements made in antigen selection, the development of antigen delivery systems, and a deeper understanding of the strategic intricacies involved in effective antigen presentation. In addition, this review addresses the present condition of clinical tests and deliberates on their approaches, with a particular emphasis on the immunogenicity specific to tumors and the evaluation of effectiveness against tumors. Nevertheless, the ongoing clinical endeavors to create cancer vaccines have failed to produce remarkable clinical results as a result of substantial obstacles, such as the suppression of the tumor immune microenvironment, the identification of suitable candidates, the assessment of immune responses, and the acceleration of vaccine production. Hence, there are possibilities for the industry to overcome challenges and enhance patient results in the coming years. This can be achieved by recognizing the intricate nature of clinical issues and continuously working toward surpassing existing limitations.
Affiliation(s)
- Azadeh Eskandari
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia.
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia.
- Thean Chor Leow
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Siti Nurbaya Oslan
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
12
Li Y, Wu X, Fang D, Luo Y. Informing immunotherapy with multi-omics driven machine learning. NPJ Digit Med 2024; 7:67. [PMID: 38486092 PMCID: PMC10940614 DOI: 10.1038/s41746-024-01043-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 02/14/2024] [Indexed: 03/18/2024] Open
Abstract
Progress in sequencing technologies and clinical experiments has revolutionized immunotherapy for solid and hematologic malignancies. However, the benefits of immunotherapy are limited to specific patient subsets, posing challenges for broader application. To improve its effectiveness, identifying biomarkers that can predict patient response is crucial. Machine learning (ML) plays a pivotal role in harnessing multi-omic cancer datasets and unlocking new insights into immunotherapy. This review provides an overview of cutting-edge ML models applied to omics data for immunotherapy analysis, including immunotherapy response prediction and identification of the immunotherapy-relevant tumor microenvironment. We elucidate how ML leverages diverse data types to identify significant biomarkers, enhance our understanding of immunotherapy mechanisms, and optimize the decision-making process. Additionally, we discuss current limitations and challenges of ML in this rapidly evolving field. Finally, we outline future directions aimed at overcoming these barriers and improving the efficiency of ML in immunotherapy research.
Affiliation(s)
- Yawei Li
  - Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Center for Collaborative AI in Healthcare, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA
- Xin Wu
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Deyu Fang
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA.
  - Center for Collaborative AI in Healthcare, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA.
13
Bravi B. Development and use of machine learning algorithms in vaccine target selection. NPJ Vaccines 2024; 9:15. [PMID: 38242890 PMCID: PMC10798987 DOI: 10.1038/s41541-023-00795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/07/2023] [Indexed: 01/21/2024] Open
Abstract
Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of data availability and method development that need to be addressed to bridge the gap between advances in ML predictions and their translational application to vaccine design.
Affiliation(s)
- Barbara Bravi
  - Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.
14
Chen J, Zhao B, Lin S, Sun H, Mao X, Wang M, Chu Y, Hong L, Wei D, Li M, Xiong Y. TEPCAM: Prediction of T-cell receptor-epitope binding specificity via interpretable deep learning. Protein Sci 2024; 33:e4841. [PMID: 37983648 PMCID: PMC10731497 DOI: 10.1002/pro.4841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/11/2023] [Accepted: 11/16/2023] [Indexed: 11/22/2023]
Abstract
Recognition of a specific epitope presented by the major histocompatibility complex by the T-cell receptor (TCR) on the T-cell surface is the key to triggering the immune response. Identifying the binding rules of TCR-epitope pairs is crucial for developing immunotherapies, including neoantigen vaccines and drugs. Accurate prediction of TCR-epitope binding specificity via deep learning remains challenging, especially for test cases that are unseen in the training set. Here, we propose TEPCAM (TCR-EPitope identification based on Cross-Attention and Multi-channel convolution), a deep learning model that incorporates self-attention, cross-attention, and multi-channel convolution to improve generalizability and enhance model interpretability. Experimental results demonstrate that our model outperformed several state-of-the-art models on two challenging tasks, a strictly split dataset and an external dataset. Furthermore, the model can learn some interaction patterns between TCR and epitope, as shown by extracting the interpretable matrix from the cross-attention layer and mapping it to three-dimensional structures. The source code and data are freely available at https://github.com/Chenjw99/TEPCAM.
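As a rough illustration of the cross-attention idea mentioned in this abstract, the following minimal sketch computes scaled dot-product attention with TCR residues as queries and epitope residues as keys and values. It is not the TEPCAM architecture: the one-hot embeddings, the absence of learned projections, the function names, and both example sequences are assumptions made purely for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a protein sequence as a list of 20-dimensional one-hot vectors."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    attended = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        for row in weights
    ]
    return attended, weights


# Illustrative inputs only (assumed, not taken from the paper).
tcr = one_hot("CASSLGQAYEQYF")
epitope = one_hot("GILGFVFTL")
attended, weights = cross_attention(tcr, epitope, epitope)
```

Here `weights` has one row per TCR residue and one column per epitope residue; a matrix of this shape is the kind of object that could be inspected for residue-level interaction patterns.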
Affiliation(s)
- Junwei Chen
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Bowen Zhao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Shenggeng Lin
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Heqi Sun
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xueying Mao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Meng Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Yanyi Chu
  - Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Liang Hong
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  - Artificial Intelligence Biomedical Center, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
- Dong‐Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Artificial Intelligence Biomedical Center, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
15
Koyama K, Hashimoto K, Nagao C, Mizuguchi K. Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties. Front Bioinform 2023; 3:1274599. [PMID: 38170146 PMCID: PMC10759225 DOI: 10.3389/fbinf.2023.1274599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/27/2023] [Indexed: 01/05/2024] Open
Abstract
Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining an insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated by internal retrospective validation only, and few researchers have incorporated and tested an attention layer from language models into structural information. Therefore, in this study, we developed a machine learning model based on a modified version of Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of the TCR-pMHC interaction, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the largely attended residues such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge about molecular recognition and pave the way for designing new therapeutics.
Affiliation(s)
- Kyohei Koyama
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
  - Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
- Kosuke Hashimoto
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
- Chioko Nagao
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
- Kenji Mizuguchi
  - Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
  - Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
16
Fan T, Zhang M, Yang J, Zhu Z, Cao W, Dong C. Therapeutic cancer vaccines: advancements, challenges, and prospects. Signal Transduct Target Ther 2023; 8:450. [PMID: 38086815 PMCID: PMC10716479 DOI: 10.1038/s41392-023-01674-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 05/14/2023] [Revised: 09/08/2023] [Accepted: 09/19/2023] [Indexed: 12/18/2023] Open
Abstract
With the development and regulatory approval of immune checkpoint inhibitors and adoptive cell therapies, cancer immunotherapy has undergone a profound transformation over the past decades. Recently, therapeutic cancer vaccines have shown promise by eliciting de novo T cell responses targeting tumor antigens, including tumor-associated antigens and tumor-specific antigens. The objective was to amplify and diversify the intrinsic repertoire of tumor-specific T cells. However, the complete realization of these capabilities remains an ongoing pursuit. Therefore, we provide an overview of the current landscape of cancer vaccines in this review. The range of antigen selection, the development of antigen delivery systems, and the strategic nuances underlying effective antigen presentation have pioneered cancer vaccine design. Furthermore, this review addresses the current status of clinical trials and discusses their strategies, focusing on tumor-specific immunogenicity and anti-tumor efficacy assessment. However, current clinical attempts toward developing cancer vaccines have not yielded breakthrough clinical outcomes due to significant challenges, including tumor immune microenvironment suppression, optimal candidate identification, immune response evaluation, and vaccine manufacturing acceleration. Therefore, the field is poised to overcome hurdles and improve patient outcomes in the future by acknowledging these clinical complexities and persistently striving to surmount inherent constraints.
Affiliation(s)
- Ting Fan
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China
- Mingna Zhang
  - Postgraduate Training Base, Shanghai East Hospital, Jinzhou Medical University, Shanghai, 200120, China
- Jingxian Yang
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China
- Zhounan Zhu
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China
- Wanlu Cao
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China.
- Chunyan Dong
  - Department of Oncology, East Hospital Affiliated to Tongji University, Tongji University School of Medicine, Shanghai, China.
17
Zhao M, Xu SX, Yang Y, Yuan M. GGNpTCR: A Generative Graph Structure Neural Network for Predicting Immunogenic Peptides for T-cell Immune Response. J Chem Inf Model 2023; 63:7557-7567. [PMID: 37990917 DOI: 10.1021/acs.jcim.3c01293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Identifying the interactions between T-cell receptors (TCRs) and human antigens is a crucial step in developing new vaccines, diagnostics, and immunotherapy. Current methods primarily focus on learning binding patterns from known TCR binding repertoires by using sequence information alone, without considering the binding specificity of new antigens or exogenous peptides that have not appeared in the training set. Furthermore, the spatial structure of antigens plays a critical role in immune studies and immunotherapy, and should be addressed properly in the identification of interacting TCR-antigen pairs. In this study, we introduced a novel deep learning framework based on generative graph structures, GGNpTCR, for predicting interactions between TCRs and peptides from sequence information. Results of real data analysis indicate that our model achieved excellent prediction for new antigens unseen in the training data set, making significant improvements compared to existing methods. We also applied the model to a large COVID-19 data set with no antigens in the training data set, and the improvement was also significant. Furthermore, through incorporation of additional supervised mechanisms, GGNpTCR demonstrated the ability to precisely forecast the locations of peptide-TCR interactions within 3D configurations. This enhancement substantially improved the model's interpretability. In summary, based on the performance on multiple data sets, GGNpTCR has made significant progress in terms of performance, universality, and interpretability.
Affiliation(s)
- Minghua Zhao
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Steven X Xu
  - Genmab US, Inc., Princeton, New Jersey 08540, United States
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Min Yuan
  - School of Public Health Administration, Anhui Medical University, Hefei 230032, China
18
Pang Z, Lu MM, Zhang Y, Gao Y, Bai JJ, Gu JY, Xie L, Wu WZ. Neoantigen-targeted TCR-engineered T cell immunotherapy: current advances and challenges. Biomark Res 2023; 11:104. [PMID: 38037114 PMCID: PMC10690996 DOI: 10.1186/s40364-023-00534-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/26/2023] [Accepted: 10/22/2023] [Indexed: 12/02/2023] Open
Abstract
Adoptive cell therapy using T cell receptor-engineered T cells (TCR-T) is a promising approach for cancer therapy with an expectation of no significant side effects. In the human body, mature T cells are armed with an incredible diversity of T cell receptors (TCRs) that theoretically react to the variety of random mutations generated by tumor cells. However, current clinical trials of TCR-T cell therapies have not been very successful, especially in solid tumors. The therapy still faces numerous challenges in the efficient screening of tumor-specific antigens and their cognate TCRs. In this review, we first introduce TCR structure-based antigen recognition and signaling, then describe recent advances in neoantigens and their specific TCR screening technologies, and finally summarize ongoing clinical trials of TCR-T therapies against neoantigens. More importantly, we also present the current challenges of TCR-T cell-based immunotherapies, e.g., the safety of viral vectors, T cell receptor mismatch, and the impediment of the suppressive tumor microenvironment. Finally, we highlight new insights and directions for personalized TCR-T therapy.
Affiliation(s)
- Zhi Pang
  - Liver Cancer Institute, Key Laboratory of Carcinogenesis and Cancer Invasion, Ministry of Education, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
  - Clinical Center for Biotherapy, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
- Man-Man Lu
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, 200237, China
- Yu Zhang
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, 200237, China
- Yuan Gao
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, 200237, China
- Jin-Jin Bai
  - Liver Cancer Institute, Key Laboratory of Carcinogenesis and Cancer Invasion, Ministry of Education, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
  - Clinical Center for Biotherapy, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
- Jian-Ying Gu
  - Clinical Center for Biotherapy, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
- Lu Xie
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, 200237, China.
- Wei-Zhong Wu
  - Liver Cancer Institute, Key Laboratory of Carcinogenesis and Cancer Invasion, Ministry of Education, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.
  - Clinical Center for Biotherapy, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.
19
Zhang J, Ma W, Yao H. Accurate TCR-pMHC interaction prediction using a BERT-based transfer learning method. Brief Bioinform 2023; 25:bbad436. [PMID: 38040492 PMCID: PMC10783865 DOI: 10.1093/bib/bbad436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 10/05/2023] [Accepted: 11/06/2023] [Indexed: 12/03/2023] Open
Abstract
Accurate prediction of TCR-pMHC binding is important for the development of cancer immunotherapies, especially TCR-based agents. Existing algorithms often perform worse on unseen epitopes, primarily because of the complexity of TCR-pMHC recognition patterns and the scarcity of available training data. We have developed a novel BERT-based deep learning model for TCR antigen binding recognition, named TABR-BERT. Leveraging BERT's potent representation learning capabilities, TABR-BERT effectively captures essential information regarding TCR-pMHC interactions from TCR sequences, antigen epitope sequences, and epitope-MHC binding. By transferring this knowledge to predict TCR-pMHC recognition, TABR-BERT demonstrated better results in benchmark tests than existing methods, particularly for unseen epitopes.
Affiliation(s)
- Jiawei Zhang
  - Fresh Wind Biotechnologies Inc. (Tianjin), Tianjin, China
- Wang Ma
  - Fresh Wind Biotechnologies Inc. (Tianjin), Tianjin, China
- Hui Yao
  - Fresh Wind Biotechnologies USA Inc., Houston, TX, USA
20
Li B, Jing P, Zheng G, Pi C, Zhang L, Yin Z, Xu L, Qiu J, Gu H, Qiu T, Fang J. Neo-intline: integrated pipeline enables neoantigen design through the in-silico presentation of T-cell epitope. Signal Transduct Target Ther 2023; 8:397. [PMID: 37848417 PMCID: PMC10582007 DOI: 10.1038/s41392-023-01644-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 08/22/2023] [Accepted: 09/14/2023] [Indexed: 10/19/2023] Open
Abstract
Neoantigen vaccines are one of the most effective immunotherapies for personalized tumour treatment. The current immunogen design of neoantigen vaccines is usually based on whole-genome sequencing (WGS) and bioinformatics prediction that focuses on the binding affinity between peptide and MHC molecules, ignoring other steps related to peptide presentation. This may result in a gap between high prediction accuracy and relatively low clinical effectiveness. In this study, we designed an integrated in-silico pipeline, Neo-intline, which starts from the SNPs and indels of the tumour samples and simulates the in-vivo presentation process of peptides through an integrated calculation model. Validation on the TESLA benchmark dataset and on clinically validated neoantigens illustrated that Neo-intline could outperform current state-of-the-art tools at both the sample level and the melanoma level. Furthermore, by taking the mouse melanoma model as an example, we verified the effectiveness of 20 neoantigens, including 10 MHC-I and 10 MHC-II peptides. The in-vitro and in-vivo experiments showed that both types of peptides predicted by Neo-intline could recruit corresponding CD4+ T cells and CD8+ T cells to induce a T-cell-mediated cellular immune response. Moreover, although the therapeutic effect of neoantigen vaccines alone is not sufficient, combinations with other specific therapies, such as the broad-spectrum immune-enhancing adjuvants granulocyte-macrophage colony-stimulating factor (GM-CSF) and polyinosinic-polycytidylic acid (poly(I:C)), or immune checkpoint inhibitors such as PD-1/PD-L1 antibodies, can produce significant anticancer effects on melanoma. Neo-intline can be used as a benchmark process for the design and screening of immunogenic targets for neoantigen vaccines.
Affiliation(s)
- Bingyu Li
  - Laboratory of Molecular Medicine, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji Hospital, Tongji University Suzhou Institute, Tongji University, Shanghai, China
  - School of Basic Medical Sciences, Henan University of Science and Technology, Luoyang, Henan, China
- Ping Jing
  - Laboratory of Molecular Medicine, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji Hospital, Tongji University Suzhou Institute, Tongji University, Shanghai, China
- Genhui Zheng
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
  - Oden Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, Austin, TX, USA
- Chenyu Pi
  - Laboratory of Molecular Medicine, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji Hospital, Tongji University Suzhou Institute, Tongji University, Shanghai, China
- Lu Zhang
  - Laboratory of Molecular Medicine, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji Hospital, Tongji University Suzhou Institute, Tongji University, Shanghai, China
- Zuojing Yin
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Lijun Xu
  - Laboratory of Molecular Medicine, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji Hospital, Tongji University Suzhou Institute, Tongji University, Shanghai, China
  - School of Basic Medical Sciences, Henan University of Science and Technology, Luoyang, Henan, China
- Jingxuan Qiu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Hua Gu
  - Laboratory of Molecular Medicine, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji Hospital, Tongji University Suzhou Institute, Tongji University, Shanghai, China
- Tianyi Qiu
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China.
  - Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, 200032, China.
- Jianmin Fang
  - Laboratory of Molecular Medicine, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji Hospital, Tongji University Suzhou Institute, Tongji University, Shanghai, China.
21
Myronov A, Mazzocco G, Król P, Plewczynski D. BERTrand-peptide:TCR binding prediction using Bidirectional Encoder Representations from Transformers augmented with random TCR pairing. Bioinformatics 2023; 39:btad468. [PMID: 37535685 PMCID: PMC10444968 DOI: 10.1093/bioinformatics/btad468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 06/28/2023] [Accepted: 08/01/2023] [Indexed: 08/05/2023] Open
Abstract
MOTIVATION: The advent of T-cell receptor (TCR) sequencing experiments allowed for a significant increase in the amount of peptide:TCR binding data available, and a number of machine-learning models have appeared in recent years. High-quality prediction models for a fixed epitope sequence are feasible, provided enough known binding TCR sequences are available. However, their performance drops significantly for previously unseen peptides. RESULTS: We prepare a dataset of known peptide:TCR binders and augment it with negative decoys created using healthy donors' T-cell repertoires. We employ deep learning methods commonly applied in Natural Language Processing to train a peptide:TCR binding model with a degree of cross-peptide generalization (0.69 AUROC). We demonstrate that BERTrand outperforms the published methods when evaluated on peptide sequences not used during model training. AVAILABILITY AND IMPLEMENTATION: The datasets and the code for model training are available at https://github.com/SFGLab/bertrand.
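The decoy-generation step described in this abstract can be pictured with a minimal sketch that pairs known peptides with randomly drawn background TCRs and discards any pair that is already a known binder. This is an assumed simplification, not the BERTrand implementation: the function name, the `ratio` and `seed` parameters, and all sequences below are hypothetical placeholders.

```python
import random


def make_negative_decoys(positives, background_tcrs, ratio=1, seed=0):
    """Pair known peptides with random background TCRs to form presumed non-binders."""
    rng = random.Random(seed)
    known = set(positives)
    peptides = [peptide for peptide, _ in positives]
    target = ratio * len(positives)
    decoys = set()
    attempts = 0
    # Cap the number of attempts so a small background pool cannot loop forever.
    while len(decoys) < target and attempts < 100 * target:
        attempts += 1
        pair = (rng.choice(peptides), rng.choice(background_tcrs))
        if pair not in known:
            decoys.add(pair)
    return sorted(decoys)


# Hypothetical example data: (peptide, CDR3) binder pairs and a background pool.
positives = [
    ("GILGFVFTL", "CASSIRSSYEQYF"),
    ("GILGFVFTL", "CASSLGQAYEQYF"),
    ("NLVPMVATV", "CASSPVTGGIYGYTF"),
]
background = ["CASSLAPGATNEKLFF", "CASSQDRGSYEQYF", "CASRPGQGAYEQYF", "CASSYSTGELFF"]
decoys = make_negative_decoys(positives, background)
```

Random pairing treats an unobserved pair as a non-binder, which is an assumption rather than a measurement, so some decoys may in fact bind.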
Affiliation(s)
- Alexander Myronov
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Ardigen, Krakow, Poland
- Dariusz Plewczynski
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
22
Zhang Y, Jian X, Xu L, Zhao J, Lu M, Lin Y, Xie L. iTCep: a deep learning framework for identification of T cell epitopes by harnessing fusion features. Front Genet 2023; 14:1141535. [PMID: 37229205 PMCID: PMC10203616 DOI: 10.3389/fgene.2023.1141535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 04/20/2023] [Indexed: 05/27/2023] Open
Abstract
Neoantigens recognized by cytotoxic T cells are effective targets for tumor-specific immune responses in personalized cancer immunotherapy. Quite a few neoantigen identification pipelines and computational strategies have been developed to improve the accuracy of the peptide selection process. However, these methods mainly consider the neoantigen end and ignore the peptide-TCR interaction and the preference of each residue in TCRs, so the filtered peptides often fail to truly elicit an immune response. Here, we propose a novel encoding approach for peptide-TCR representation. Subsequently, a deep learning framework, namely iTCep, was developed to predict the interactions between peptides and TCRs using fusion features derived from a feature-level fusion strategy. iTCep achieved high predictive performance, with an AUC of up to 0.96 on the testing dataset and above 0.86 on independent datasets, outperforming other predictors. Our results provide strong evidence that iTCep can be a reliable and robust method for predicting TCR binding specificities of given antigen peptides. iTCep can be accessed through a user-friendly web server at http://biostatistics.online/iTCep/, which supports peptide-TCR pair and peptide-only prediction modes. A stand-alone software program for T cell epitope prediction is also available for convenient installation at https://github.com/kbvstmd/iTCep/.
Affiliation(s)
- Yu Zhang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Institute of Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
- Xingxing Jian
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Institute of Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
  - Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Linfeng Xu
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Institute of Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
  - Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Institute of Bio-Diversity Science, School of Life Sciences, Fudan University, Shanghai, China
- Jingjing Zhao
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Institute of Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
- Manman Lu
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Institute of Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
- Yong Lin
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Lu Xie
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Institute of Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
23
Deng L, Ly C, Abdollahi S, Zhao Y, Prinz I, Bonn S. Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency. Front Immunol 2023; 14:1128326. [PMID: 37143667 PMCID: PMC10152969 DOI: 10.3389/fimmu.2023.1128326] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 12/20/2022] [Accepted: 03/24/2023] [Indexed: 05/06/2023] Open
Abstract
The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.
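One of the splitting strategies this abstract alludes to, holding out whole peptides so that test peptides never appear in training, can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' benchmark code: the function name, the `test_fraction` and `seed` parameters, and the placeholder data are all hypothetical.

```python
import random


def split_by_peptide(pairs, test_fraction=0.2, seed=0):
    """Split (peptide, tcr) pairs so that no peptide occurs in both train and test."""
    rng = random.Random(seed)
    peptides = sorted({peptide for peptide, _ in pairs})
    rng.shuffle(peptides)
    # Hold out at least one peptide so the test set is never empty.
    n_test = max(1, round(test_fraction * len(peptides)))
    test_peptides = set(peptides[:n_test])
    train = [pair for pair in pairs if pair[0] not in test_peptides]
    test = [pair for pair in pairs if pair[0] in test_peptides]
    return train, test


# Hypothetical placeholder data: five peptides with two TCRs each.
pairs = [(f"PEP{i}", f"TCR{i}{j}") for i in range(5) for j in range(2)]
train, test = split_by_peptide(pairs)
```

A purely random split over pairs would usually place the same peptide on both sides, so it measures something easier than generalization to unseen peptides.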
Affiliation(s)
- Lihua Deng
- Institute of Systems Immunology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Cedric Ly
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Sina Abdollahi
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Yu Zhao
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Immo Prinz
- Institute of Systems Immunology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Stefan Bonn
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

24
Wu P, Nie Z, Huang Z, Zhang X. CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model. Plants (Basel, Switzerland) 2023; 12:1652. [PMID: 37111874 PMCID: PMC10143888 DOI: 10.3390/plants12081652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 04/10/2023] [Accepted: 04/13/2023] [Indexed: 06/19/2023]
Abstract
Circular RNAs (circRNAs), which are produced post-splicing of pre-mRNAs, are strongly linked to the emergence of several tumor types. The initial stage in conducting follow-up studies involves identifying circRNAs. Currently, animals are the primary target of most established circRNA recognition technologies. However, the sequence features of plant circRNAs differ from those of animal circRNAs, making it impossible to detect plant circRNAs with these tools. For example, there are non-GT/AG splicing signals at circRNA junction sites and few reverse complementary sequences and repetitive elements in the flanking intron sequences of plant circRNAs. In addition, there have been few studies on circRNAs in plants, and thus it is urgent to create a plant-specific method for identifying circRNAs. In this study, we propose CircPCBL, a deep-learning approach that only uses raw sequences to distinguish between circRNAs found in plants and other lncRNAs. CircPCBL comprises two separate detectors: a CNN-BiGRU detector and a GLT detector. The CNN-BiGRU detector takes in the one-hot encoding of the RNA sequence as the input, while the GLT detector uses k-mer (k = 1-4) features. The output matrices of the two submodels are then concatenated and passed through a fully connected layer to produce the final output. To verify the generalization performance of the model, we evaluated CircPCBL using several datasets, and the results revealed that it had an F1 of 85.40% on the validation dataset composed of six different plant species and 85.88%, 75.87%, and 86.83% on the three cross-species independent test sets composed of Cucumis sativus, Populus trichocarpa, and Gossypium raimondii, respectively. With an accuracy of 90.9% and 90%, respectively, CircPCBL successfully predicted ten of the eleven circRNAs of experimentally reported Poncirus trifoliata and nine of the ten lncRNAs of rice on the real set. CircPCBL could potentially contribute to the identification of circRNAs in plants. In addition, CircPCBL achieved an average accuracy of 94.08% on the human datasets, implying its potential application to animal datasets. Ultimately, CircPCBL is available as a web server, from which the data and source code can also be downloaded free of charge.
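The two input encodings mentioned in this abstract, one-hot encoding and k-mer features for k = 1 to 4, can be sketched in plain Python. This is an illustration under stated assumptions rather than the CircPCBL code: the abstract does not say whether k-mer counts are normalized, so per-k frequencies are assumed, and `one_hot`, `kmer_frequencies`, and the `ACGU` column order are hypothetical choices.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed column order for the one-hot matrix


def one_hot(seq):
    """Return one row per base; bases outside ALPHABET (e.g. N) stay all-zero."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq, k_values=(1, 2, 3, 4)):
    """Concatenate k-mer frequency vectors: 4 + 16 + 64 + 256 = 340 features."""
    seq = seq.upper().replace("T", "U")
    features = []
    for k in k_values:
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            kmer = seq[i:i + k]
            if kmer in counts:  # windows containing unknown bases are skipped
                counts[kmer] += 1
        total = sum(counts.values())
        # Assumption: normalize counts to frequencies within each k.
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features
```

The one-hot matrix keeps positional information for a sequence model, while the k-mer vector is a fixed-length summary that does not depend on sequence length.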
Affiliation(s)
- Pengpeng Wu
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Life Science, Anhui Agricultural University, Hefei 230036, China
- Zhenjun Nie
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
- Zhiqiang Huang
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
- Xiaodan Zhang
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China

25
Zulfiqar H, Guo Z, Grace-Mercure BK, Zhang ZY, Gao H, Lin H, Wu Y. Empirical comparison and recent advances of computational prediction of hormone binding proteins using machine learning methods. Comput Struct Biotechnol J 2023; 21:2253-2261. [PMID: 37035551 PMCID: PMC10073991 DOI: 10.1016/j.csbj.2023.03.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/21/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Hormone binding proteins (HBPs) belong to the group of soluble carrier proteins. These proteins selectively and non-covalently interact with hormones and promote growth hormone signaling in humans and other animals. HBPs are useful in many medical and commercial fields. Thus, the identification of HBPs is very important because it can help to discover more details about hormone binding proteins. Meanwhile, experimental methods for recognizing hormone binding proteins are time-consuming and expensive. Computational prediction methods have played significant roles in the correct recognition of hormone binding proteins using sequence information and ML algorithms. In this review, we compared and assessed the implementation of ML-based tools for recognizing HBPs in a unique way. We hope that this study will provide sufficient awareness and knowledge for research on HBPs.
Affiliation(s)
- Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang 313001, China
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhiling Guo
- Beidahuang Industry Group General Hospital, Harbin, China
- Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Gao
- School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang 313001, China
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yun Wu
- College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China

26
Gao Y, Gao Y, Fan Y, Zhu C, Wei Z, Zhou C, Chuai G, Chen Q, Zhang H, Liu Q. Pan-Peptide Meta Learning for T-cell receptor–antigen binding recognition. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00619-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]

27
Zhang YF, Wang YH, Gu ZF, Pan XR, Li J, Ding H, Zhang Y, Deng KJ. Bitter-RF: A random forest machine model for recognizing bitter peptides. Front Med (Lausanne) 2023; 10:1052923. [PMID: 36778738 PMCID: PMC9909039 DOI: 10.3389/fmed.2023.1052923] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 09/24/2022] [Accepted: 01/05/2023] [Indexed: 01/27/2023] Open
Abstract
Introduction: Bitter peptides are short peptides with potential medical applications. The huge potential behind their bitter taste remains to be tapped. To better explore the value of bitter peptides in practice, we need a more effective classification method for identifying bitter peptides. Methods: In this study, we developed a random forest (RF)-based model, called Bitter-RF, using sequence information of the bitter peptide. Bitter-RF covers more comprehensive and extensive information by integrating 10 features extracted from the bitter peptides and achieves better results than the latest generation model on an independent validation set. Results: The proposed model improves the classification accuracy for bitter peptides (AUROC = 0.98 on the independent test set) and extends the practical application of the RF method in protein classification tasks, as RF had not previously been used to build a prediction model for bitter peptides. Discussion: We hope that Bitter-RF will make bitter peptide research more convenient for scholars.
Affiliation(s)
- Yu-Fei Zhang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yu-Hao Wang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhi-Feng Gu
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xian-Run Pan
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Jian Li
- School of Basic Medical Sciences, Chengdu University, Chengdu, China
- Hui Ding
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Ke-Jun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China

28
Su W, Deng S, Gu Z, Yang K, Ding H, Chen H, Zhang Z. Prediction of apoptosis protein subcellular location based on amphiphilic pseudo amino acid composition. Front Genet 2023; 14:1157021. [PMID: 36926588 PMCID: PMC10011625 DOI: 10.3389/fgene.2023.1157021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 02/20/2023] [Indexed: 03/08/2023] Open
Abstract
Introduction: Apoptosis proteins play an important role in the process of cell apoptosis, which makes the rate of cell proliferation and death reach a relative balance. Because the function of an apoptosis protein is closely related to its subcellular location, it is of great significance to study the subcellular locations of apoptosis proteins. Many efforts in bioinformatics research have been aimed at predicting their subcellular location. However, the subcellular localization of apoptotic proteins needs to be carefully studied. Methods: In this paper, based on amphiphilic pseudo amino acid composition and the support vector machine algorithm, a new method was proposed for the prediction of apoptosis proteins' subcellular location. Results and Discussion: The method achieved good performance on three data sets. The Jackknife test accuracy of the three data sets reached 90.5%, 93.9% and 84.0%, respectively. Compared with previous methods, the prediction accuracies of APACC_SVM were improved.
Affiliation(s)
- Wenxia Su
- College of Science, Inner Mongolia Agriculture University, Hohhot, China
- Shuyi Deng
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhifeng Gu
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- Keli Yang
- Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
- Hui Ding
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hui Chen
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Zhaoyue Zhang
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China

29
Zhou B, Ding M, Feng J, Ji B, Huang P, Zhang J, Yu X, Cao Z, Yang Y, Zhou Y, Wang J. EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning. Brief Bioinform 2022; 24:6961472. [PMID: 36573492 PMCID: PMC9851331 DOI: 10.1093/bib/bbac583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 11/02/2022] [Accepted: 11/29/2022] [Indexed: 12/28/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) play essential roles in nearly every biological process and disease. Many algorithms were developed to distinguish lncRNAs from mRNAs in transcriptomic data and facilitated discoveries of more than 600 000 lncRNAs. However, only a tiny fraction (<1%) of lncRNA transcripts (~4000) were further validated by low-throughput experiments (EVlncRNAs). Given the cost and labor-intensive nature of experimental validations, it is necessary to develop computational tools to prioritize those potentially functional lncRNAs because many lncRNAs from high-throughput sequencing (HTlncRNAs) could result from transcriptional noise. Here, we employed deep learning algorithms to separate EVlncRNAs from HTlncRNAs and mRNAs. To overcome the challenge of small datasets, we employed a three-layer deep-learning neural network (DNN) with a K-mer feature as the input and a small convolutional neural network (CNN) with one-hot encoding as the input. Three separate models were trained for human (h), mouse (m) and plant (p), respectively. The final concatenated models (EVlncRNA-Dpred (h), EVlncRNA-Dpred (m) and EVlncRNA-Dpred (p)) provided substantial improvement over a previous model based on support-vector-machines (EVlncRNA-pred). For example, EVlncRNA-Dpred (h) achieved 0.896 for the area under receiver-operating characteristic curve, compared with 0.582 given by the sequence-based EVlncRNA-pred model. The models developed here should be useful for screening lncRNA transcripts for experimental validations. EVlncRNA-Dpred is available as a web server at https://www.sdklab-biophysics-dzu.net/EVlncRNA-Dpred/index.html, and the data and source code are freely available along with the web server.
Affiliation(s)
- Bailing Zhou
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Jing Feng
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Baohua Ji
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Pingping Huang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Junye Zhang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Xue Yu
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Zanxia Cao
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yaoqi Zhou
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China (corresponding author; Tel.: +86 (755) 6275 2684)
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China (corresponding author; Tel.: +86 (534) 898 5933)

30
Luo Z, Lou L, Qiu W, Xu Z, Xiao X. Predicting N6-Methyladenosine Sites in Multiple Tissues of Mammals through Ensemble Deep Learning. Int J Mol Sci 2022; 23:15490. [PMID: 36555143 PMCID: PMC9778682 DOI: 10.3390/ijms232415490] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/03/2022] [Revised: 12/03/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022] Open
Abstract
N6-methyladenosine (m6A) is the most abundant modification within eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, the existing computational models usually predict m6A sites in a single species only, without considering the tissue level, and most of them are built on low-confidence data generated by an m6A antibody immunoprecipitation (IP)-based sequencing method, which restricts the reliability and generalizability of the proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred using ensemble deep learning to accurately identify m6A sites based on high-confidence level data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation test to achieve an effective stacked model. Our model im6APred can produce the area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that our model has the ability to learn general methylation rules on RNA bases and generalize to m6A transcriptome-wide identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
Affiliation(s)
- Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China

31
Identification of adaptor proteins using the ANOVA feature selection technique. Methods 2022; 208:42-47. [DOI: 10.1016/j.ymeth.2022.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 10/01/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022] Open

32
Dong B, Li M, Jiang B, Gao B, Li D, Zhang T. Antimicrobial Peptides Prediction method based on sequence multidimensional feature embedding. Front Genet 2022; 13:1069558. [PMID: 36468005 PMCID: PMC9714691 DOI: 10.3389/fgene.2022.1069558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 11/02/2022] [Indexed: 09/10/2024] Open
Abstract
Antimicrobial peptides (AMPs) are alkaline substances with efficient bactericidal activity produced in living organisms. As the best substitute for antibiotics, they have received increasing attention in scientific research and clinical application. AMPs can be produced by almost all organisms and are capable of killing a wide variety of pathogenic microorganisms. In addition to being antibacterial, natural AMPs have many other therapeutically important activities, such as wound healing, antioxidant and immunomodulatory effects. Discovering new AMPs with wet-lab experimental methods is expensive and difficult, and bioinformatics technology can effectively solve this problem. Recently, some deep learning methods have been applied to the prediction of AMPs and achieved good results. To further improve the prediction accuracy of AMPs, this paper designs a new deep learning method based on sequence multidimensional representation. By encoding and embedding sequence features and then feeding them into the model to identify AMPs, high-precision classification of AMPs and Non-AMPs with lengths of 10-200 is achieved. The results show that our method improved accuracy by 1.05% compared to the most advanced model in independent data validation without decreasing other indicators.
Affiliation(s)
- Benzhi Dong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Mengna Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Bei Jiang
- Tianjin Second People's Hospital, Tianjin Institute of Hepatology, Tianjin, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Dan Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China

33
Xiao J, Liu M, Huang Q, Sun Z, Ning L, Duan J, Zhu S, Huang J, Lin H, Yang H. Analysis and modeling of myopia-related factors based on questionnaire survey. Comput Biol Med 2022; 150:106162. [PMID: 36252365 DOI: 10.1016/j.compbiomed.2022.106162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/12/2022] [Accepted: 10/01/2022] [Indexed: 11/03/2022]
Abstract
With the rapid development of science and technology, the trend toward myopia at younger ages is becoming increasingly significant. The latest national survey done by the Chinese government found that more than 80% of Chinese teenagers suffer from myopia. Adolescent myopia is closely related to living environment, heredity, and living habits. Quantifying the relationship between myopia and living environment, heredity, and living habits is conducive to the prevention and intervention of adolescent myopia. In this study, we investigated the relationships between four main factors (environment, habits, parental vision, and demographic) and myopia status by analyzing the questionnaire data. Data were collected from Chengdu, China in 2021, including 2808 myopia samples and 5693 non-myopia samples, with a total of 22 features. Then, these 22 features were input into three machine learning algorithms to discriminate the two classes of samples. Results show that the computational model could produce an AUC of 0.768. To pick out the features that matter most for classification, we used an incremental feature selection strategy to screen the 22 features. As a result, we found that the 4 most influential features with XGBoost could achieve a competitive AUC of 0.764. To further investigate the risk and protective factors affecting adolescent myopia, we used OR values derived from MLE-LR to analyze the relationship between 22 features and adolescent myopia. Results showed that the age variable was the most significant risk factor for myopia, followed by the myopia status of parents. The most protective factor for eyesight is the measure taken by the children, followed by the distance between books and eyes when reading. These discoveries can guide the prevention and control of myopia in children and adolescents.
Affiliation(s)
- Jianqiang Xiao
- Eye School, Chengdu University of Traditional Chinese Medicine, Ineye Hospital of Chengdu University of TCM, China
- Mujiexin Liu
- Eye School, Chengdu University of Traditional Chinese Medicine, Ineye Hospital of Chengdu University of TCM, China
- Qinlai Huang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Zijie Sun
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Lin Ning
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, 611844, China
- Junguo Duan
- Eye School, Chengdu University of Traditional Chinese Medicine, Ineye Hospital of Chengdu University of TCM, China
- Siquan Zhu
- Eye School, Chengdu University of Traditional Chinese Medicine, Ineye Hospital of Chengdu University of TCM, China
- Jian Huang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Hui Yang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 611844, China

34
PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability. Int J Mol Sci 2022; 23:ijms232012385. [PMID: 36293242 PMCID: PMC9604182 DOI: 10.3390/ijms232012385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 12/03/2022] Open
Abstract
Peptide detectability is defined as the probability of identifying a peptide from a mixture of standard samples, which is a key step in protein identification and analysis. Exploring effective methods for predicting peptide detectability is helpful for disease treatment and clinical research. However, most existing computational methods for predicting peptide detectability rely on a single type of information. With the increasing complexity of feature representation, it is necessary to explore the influence of multivariate information on peptide detectability. Thus, we propose an ensemble deep learning method, PD-BertEDL. Bidirectional encoder representations from transformers (BERT) is introduced to capture the context information of peptides. Context information, sequence information, and physicochemical information of peptides were combined to construct the multivariate feature space of peptides. We use different deep learning methods to capture high-quality features from the different categories of peptide information and use an average fusion strategy to integrate the three model prediction results, which addresses the heterogeneity problem and enhances the robustness and adaptability of the model. The experimental results show that PD-BertEDL is superior to the existing prediction methods; it can effectively predict peptide detectability and provide strong support for protein identification and quantitative analysis, as well as disease treatment.
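The average fusion strategy named in this abstract amounts to taking the mean of the per-peptide scores from the three models. A minimal sketch follows, assuming each model outputs one probability per peptide; `average_fusion` and the example scores are hypothetical and not taken from the paper.

```python
def average_fusion(*model_probs):
    """Average per-sample probabilities across any number of models."""
    if not model_probs:
        raise ValueError("need at least one model's predictions")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same samples")
    return [sum(sample) / len(model_probs) for sample in zip(*model_probs)]


# Invented scores for two peptides from three hypothetical sub-models
# (context, sequence, and physicochemical branches).
context_scores = [0.9, 0.2]
sequence_scores = [0.6, 0.4]
physchem_scores = [0.6, 0.3]
fused = average_fusion(context_scores, sequence_scores, physchem_scores)
```

Unweighted averaging gives each information source equal influence; a weighted mean or a stacked meta-classifier would be alternatives, but the abstract only mentions averaging.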

35
Yuan SS, Gao D, Xie XQ, Ma CY, Su W, Zhang ZY, Zheng Y, Ding H. IBPred: A sequence-based predictor for identifying ion binding protein in phage. Comput Struct Biotechnol J 2022; 20:4942-4951. [PMID: 36147670 PMCID: PMC9474292 DOI: 10.1016/j.csbj.2022.08.053] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/03/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/16/2022] Open
Abstract
Ion binding proteins (IBPs) can selectively and non-covalently interact with ions. IBPs in phages also play an important role in biological processes. Therefore, accurate identification of IBPs is necessary for understanding their biological functions and molecular mechanisms that involve binding to ions. Since molecular biology experimental methods are still labor-intensive and cost-ineffective in identifying IBPs, it is helpful to develop computational methods to identify IBPs quickly and efficiently. In this work, a random forest (RF)-based model was constructed to quickly identify IBPs. Based on the protein sequence information and residues' physicochemical properties, the dipeptide composition combined with the physicochemical correlation between two residues was proposed for the extraction of features. A feature selection technique called analysis of variance (ANOVA) was used to exclude redundant information. By comparing with other classification methods, we demonstrated that our method could identify IBPs accurately. Based on the model, a Python package named IBPred was built with the source code which can be accessed at https://github.com/ShishiYuan/IBPred.
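Two of the steps named in this abstract, dipeptide composition and ANOVA-based feature scoring, can be illustrated with the standard library alone. This is a sketch under assumptions, not the IBPred implementation: the physicochemical correlation features are omitted, and `dipeptide_composition` and `anova_f` are hypothetical helper names.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(seq):
    """Frequency of each of the 400 adjacent residue pairs in a protein sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    if g < 2 or n <= g:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    between = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
                  for xs in groups.values())
    within = sum((x - sum(xs) / len(xs)) ** 2
                 for xs in groups.values() for x in xs)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (g - 1)) / (within / (n - g))
```

Features would then be ranked by their F-statistic and the top-scoring ones passed to the classifier; how many to keep is a tuning choice that the abstract does not specify.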
Collapse
Affiliation(s)
- Shi-Shi Yuan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dong Gao
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xue-Qin Xie
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Cai-Yi Ma
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Su
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhao-Yue Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
| | - Yan Zheng
- Baotou Medical College, Baotou 014040, China
| | - Hui Ding
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
36
|
Xu Y, Qian X, Tong Y, Li F, Wang K, Zhang X, Liu T, Wang J. AttnTAP: A Dual-input Framework Incorporating the Attention Mechanism for Accurately Predicting TCR-peptide Binding. Front Genet 2022; 13:942491. [PMID: 36072653 PMCID: PMC9441555 DOI: 10.3389/fgene.2022.942491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2022] [Accepted: 06/28/2022] [Indexed: 11/30/2022]
Abstract
T-cell receptors (TCRs) are formed by random recombination of genomic precursor elements, some of which mediate the recognition of cancer-associated antigens. Due to the complicated process of T-cell immune response and limited biological empirical evidence, the practical strategy for identifying TCRs and their recognized peptides is the computational prediction from population and/or individual TCR repertoires. In recent years, several machine/deep learning-based approaches have been proposed for TCR-peptide binding prediction. However, the predictive performances of these methods can be further improved by overcoming several significant flaws in neural network design. The interrelationship between amino acids in TCRs is critical for TCR antigen recognition, which was not properly considered by the existing methods. They also did not pay more attention to the amino acids that play a significant role in antigen-binding specificity. Moreover, complex networks tended to increase the risk of overfitting and computational costs. In this study, we developed a dual-input deep learning framework, named AttnTAP, to improve the TCR-peptide binding prediction. It used the bi-directional long short-term memory model for robust feature extraction of TCR sequences, which considered the interrelationships between amino acids and their precursors and postcursors. We also introduced the attention mechanism to give amino acids different weights and pay more attention to the contributing ones. In addition, we used the multilayer perceptron model instead of complex networks to extract peptide features to reduce overfitting and computational costs. AttnTAP achieved high areas under the curves (AUCs) in TCR-peptide binding prediction on both balanced and unbalanced datasets (higher than 0.838 on McPAS-TCR and 0.908 on VDJdb). 
Furthermore, it had the highest average AUCs in TPP-I and TPP-II tasks compared with the other five popular models (TPP-I: 0.84 on McPAS-TCR and 0.894 on VDJdb; TPP-II: 0.837 on McPAS-TCR and 0.893 on VDJdb). In conclusion, AttnTAP is a reasonable and practical framework for predicting TCR-peptide binding, which can accelerate identifying neoantigens and activated T cells for immunotherapy to meet urgent clinical needs.
Affiliation(s)
- Ying Xu
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Xinyang Qian
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Yao Tong
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Fan Li
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Ke Wang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Geneplus Beijing Institute, Beijing, China
| | - Xuanping Zhang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Tao Liu
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Geneplus Beijing Institute, Beijing, China
| | - Jiayin Wang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- *Correspondence: Jiayin Wang
| |
|
37
|
Ding Q, Yang W, Luo M, Xu C, Xu Z, Pang F, Cai Y, Anashkina AA, Su X, Chen N, Jiang Q. CBLRR: a cauchy-based bounded constraint low-rank representation method to cluster single-cell RNA-seq data. Brief Bioinform 2022; 23:6649282. [DOI: 10.1093/bib/bbac300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 06/17/2022] [Accepted: 07/02/2022] [Indexed: 11/14/2022]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for exploring biological phenomena at the single-cell level. The discovery of cell types is one of the major applications for researchers to explore the heterogeneity of cells. Some computational methods have been proposed to solve the problem of scRNA-seq data clustering. However, unavoidable technical noise and notorious dropouts reduce the accuracy of clustering methods. Here, we propose the Cauchy-based bounded constraint low-rank representation (CBLRR), a low-rank representation-based method that introduces a Cauchy loss function (CLF) and bounded nuclear norm regulation to alleviate these issues. Specifically, as an effective loss function, the CLF is proven to enhance the robustness of the identification of cell types. Then, we adopt the bounded constraint to keep the entry values of single-cell data within a restricted interval. Finally, the performance of CBLRR is evaluated on 15 scRNA-seq datasets and compared with other state-of-the-art methods. The experimental results demonstrate that CBLRR performs accurately and robustly on clustering scRNA-seq data. Furthermore, CBLRR is an effective tool to cluster cells and provides great potential for downstream analysis of single-cell data. The source code of CBLRR is available online at https://github.com/Ginnay/CBLRR.
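To show why a Cauchy loss is regarded as robust to noise, the sketch below compares one common textbook form of the Cauchy loss with the squared loss. The exact formulation used in CBLRR is not given in this abstract, so the formula, the scale parameter `c`, and the function names are assumptions made here for illustration.

```python
import math


def cauchy_loss(residual, c=1.0):
    """Common form of the Cauchy loss: (c^2 / 2) * log(1 + (r / c)^2)."""
    return (c * c / 2.0) * math.log1p((residual / c) ** 2)


def squared_loss(residual):
    """Ordinary squared loss, 0.5 * r^2, for comparison."""
    return 0.5 * residual * residual


# For small residuals the two losses nearly coincide; for large residuals
# the Cauchy loss grows only logarithmically, so outliers weigh less.
small = (cauchy_loss(0.01), squared_loss(0.01))
large = (cauchy_loss(10.0), squared_loss(10.0))
```

Because the penalty on large residuals grows slowly, a few extreme entries (for example, dropout-affected values) influence the fit far less than they would under a squared loss.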
Affiliation(s)
- Qian Ding
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| | - Wenyi Yang
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| | - Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| | - Chang Xu
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| | - Zhaochun Xu
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| | - Fenglan Pang
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| | - Yideng Cai
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| | - Anastasia A Anashkina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences , Moscow, Russia
| | - Xi Su
- Foshan Maternity & Child Healthcare Hospital, Southern Medical University , Foshan, Guangdong, China
| | - Na Chen
- Department of Hematology, Shandong Provincial Hospital Affiliated to Shandong First Medical University , Jinan, Shandong, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology , Harbin, Heilongjiang, China
| |
|
38
|
Hu RS, Wu J, Zhang L, Zhou X, Zhang Y. CD8TCEI-EukPath: A Novel Predictor to Rapidly Identify CD8+ T-Cell Epitopes of Eukaryotic Pathogens Using a Hybrid Feature Selection Approach. Front Genet 2022; 13:935989. [PMID: 35937988 PMCID: PMC9354802 DOI: 10.3389/fgene.2022.935989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 05/24/2022] [Indexed: 12/02/2022]
Abstract
Computational prediction to screen potential vaccine candidates has been proven to be a reliable way to provide guarantees for vaccine discovery in infectious diseases. As an important class of organisms causing infectious diseases, pathogenic eukaryotes (such as parasitic protozoans) have evolved the ability to colonize a wide range of hosts, including humans and animals; meanwhile, protective vaccines are urgently needed. Inspired by the immunological idea that pathogen-derived epitopes are able to mediate the CD8+ T-cell-related host adaptive immune response and with the available positive and negative CD8+ T-cell epitopes (TCEs), we proposed a novel predictor called CD8TCEI-EukPath to detect CD8+ TCEs of eukaryotic pathogens. Our method integrated multiple amino acid sequence-based hybrid features, employed a well-established feature selection technique, and eventually built an efficient machine learning classifier to differentiate CD8+ TCEs from non-CD8+ TCEs. Based on the feature selection results, 520 optimal hybrid features were used for modeling by utilizing the LightGBM algorithm. CD8TCEI-EukPath achieved impressive performance, with an accuracy of 79.255% in ten-fold cross-validation and an accuracy of 78.169% in the independent test. Collectively, CD8TCEI-EukPath will contribute to rapidly screening epitope-based vaccine candidates, particularly from large peptide-coding datasets. To conduct the prediction of CD8+ TCEs conveniently, an online web server is freely accessible (http://lab.malab.cn/~hrs/CD8TCEI-EukPath/).
Affiliation(s)
- Rui-Si Hu
- Yangtze Delta Region Institute, University of Electronic Science and Technology of China, Quzhou, China
| | - Jin Wu
- School of Management, Shenzhen Polytechnic, Shenzhen, China
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China
| | - Xun Zhou
- Beidahuang Industry Group General Hospital, Harbin, China
- *Correspondence: Xun Zhou; Ying Zhang
| | - Ying Zhang
- Department of Anesthesiology, Hospital (T.C.M) Affiliated of Southwest Medical University, Luzhou, China
- *Correspondence: Xun Zhou; Ying Zhang
| |
|
39
|
Neoantigens in precision cancer immunotherapy: from identification to clinical applications. Chin Med J (Engl) 2022; 135:1285-1298. [PMID: 35838545 PMCID: PMC9433083 DOI: 10.1097/cm9.0000000000002181] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 12/20/2022]
Abstract
Immunotherapies targeting cancer neoantigens are safe, effective, and precise. Neoantigens can be identified mainly by genomic techniques such as next-generation sequencing and high-throughput single-cell sequencing; proteomic techniques such as mass spectrometry; and bioinformatics tools based on high-throughput sequencing data, mass spectrometry data, and biological databases. Neoantigen-related therapies are widely used in clinical practice and include neoantigen vaccines, neoantigen-specific CD8+ and CD4+ T cells, and neoantigen-pulsed dendritic cells. In addition, neoantigens can be used as biomarkers to assess immunotherapy response, resistance, and prognosis. Therapies based on neoantigens are an important and promising branch of cancer immunotherapy. Unremitting efforts are needed to unravel the comprehensive role of neoantigens in anti-tumor immunity and to extend their clinical application. This review aimed to summarize the progress in neoantigen research and to discuss its opportunities and challenges in precision cancer immunotherapy.
|
40
|
Liu P, Ding Y, Rong Y, Chen D. Prediction of cell penetrating peptides and their uptake efficiency using random forest‐based feature selections. AIChE J 2022. [DOI: 10.1002/aic.17781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Peng Liu
- Institute of Fundamental and Frontier Sciences University of Electronic Science and Technology of China Chengdu China
- Institute of Yangtze Delta Region (Quzhou) University of Electronic Science and Technology of China Quzhou China
| | - Yijie Ding
- Institute of Yangtze Delta Region (Quzhou) University of Electronic Science and Technology of China Quzhou China
| | - Ying Rong
- Beidahuang Industry Group General Hospital Harbin China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University Quzhou China
| |
|
41
|
Lu M, Xu L, Jian X, Tan X, Zhao J, Liu Z, Zhang Y, Liu C, Chen L, Lin Y, Xie L. dbPepNeo2.0: A Database for Human Tumor Neoantigen Peptides From Mass Spectrometry and TCR Recognition. Front Immunol 2022; 13:855976. [PMID: 35493528 PMCID: PMC9043652 DOI: 10.3389/fimmu.2022.855976] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2022] [Accepted: 03/17/2022] [Indexed: 12/04/2022]
Abstract
Neoantigens are widely reported to induce T-cell responses and lead to tumor regression, indicating promising potential for immunotherapy. Previously, we constructed an open-access database, dbPepNeo, providing a systematic resource for storing and querying human tumor neoantigens. To expand its data volume and application scope, we updated dbPepNeo to version 2.0 (http://www.biostatistics.online/dbPepNeo2). Here, we provide about 801 high-confidence (HC) neoantigens (increased by 170%) and 842,289 low-confidence (LC) HLA immunopeptidomes (increased by 107%). Notably, 55 class II HC neoantigens and 630 neoantigen-reactive T-cell receptor-β (TCRβ) sequences were included for the first time. In addition, two new analytical tools were developed, DeepCNN-Ineo and BLASTdb. DeepCNN-Ineo predicts the immunogenicity of class I neoantigens, and BLASTdb performs local alignments to look for sequence similarities in dbPepNeo2.0. The web features and interface have also been greatly improved.
Affiliation(s)
- Manman Lu
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China.,Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
| | - Linfeng Xu
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China.,School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Xingxing Jian
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China.,Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
| | - Xiaoxiu Tan
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China.,Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Jingjing Zhao
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China.,Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
| | - Zhenhao Liu
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
| | - Yu Zhang
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China.,School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Chunyu Liu
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China.,Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
| | - Lanming Chen
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China
| | - Yong Lin
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Lu Xie
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China.,Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China.,Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
| |
|
42
|
Peng L, Tan J, Tian X, Zhou L. EnANNDeep: An Ensemble-based lncRNA-protein Interaction Prediction Framework with Adaptive k-Nearest Neighbor Classifier and Deep Models. Interdiscip Sci 2022; 14:209-232. [PMID: 35006529 DOI: 10.1007/s12539-021-00483-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/06/2021] [Revised: 09/14/2021] [Accepted: 09/15/2021] [Indexed: 01/08/2023]
Abstract
Prediction of lncRNA-protein interactions (LPIs) can deepen the understanding of many important biological processes. Artificial intelligence methods have reported many possible LPIs. However, most computational techniques were evaluated mainly on one dataset, which may produce prediction bias. More importantly, they were validated only under cross validation on lncRNA-protein pairs and did not consider performance under cross validation on lncRNAs and proteins, and thus fail to find related proteins/lncRNAs for a new lncRNA/protein. Under an ensemble learning framework (EnANNDeep) composed of an adaptive k-nearest neighbor classifier and deep models, this study focuses on systematically finding underlying linkages between lncRNAs and proteins. First, five LPI-related datasets are arranged. Second, multiple source features are integrated to depict an lncRNA-protein pair. Third, an adaptive k-nearest neighbor classifier, a deep neural network, and a deep forest are each designed to score unknown lncRNA-protein pairs. Finally, interaction probabilities from the three predictors are integrated based on a soft voting technique. Compared with five classical LPI identification models (SFPEL, PMDKN, CatBoost, PLIPCOM, and LPI-SKF) under fivefold cross validation on lncRNAs, proteins, and LPIs, EnANNDeep computes the best average AUCs of 0.8660, 0.8775, and 0.9166, respectively, and the best average AUPRs of 0.8545, 0.8595, and 0.9054, respectively, indicating its superior LPI prediction ability. Case study analyses indicate that SNHG10 may have dense linkage with Q15717. In the ensemble framework, the adaptive k-nearest neighbor classifier can separately pick the most appropriate k for each query lncRNA-protein pair. More importantly, deep models including the deep neural network and deep forest can effectively learn the representative features of lncRNAs and proteins.
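The soft voting step described in this abstract can be sketched as a (possibly weighted) average of the probabilities returned by the individual predictors. The sketch below is not the EnANNDeep implementation: the function name `soft_vote`, the equal default weights, and the example probabilities are assumptions made here for illustration.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-model interaction probabilities by weighted averaging."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    if weights is None:
        weights = [1.0] * len(probabilities)  # assumption: equal weights
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have equal length")
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total_weight


# Hypothetical scores for one lncRNA-protein pair from three predictors.
scores = {"adaptive_knn": 0.9, "deep_neural_network": 0.6, "deep_forest": 0.75}
combined = soft_vote(list(scores.values()))
```

Averaging probabilities rather than hard labels lets a confident predictor outweigh two marginal ones, which is the usual motivation for soft over hard voting.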
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China. .,College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China.
| | - Jingwei Tan
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Xiongfei Tian
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China.
| |
|
43
|
Ma D, Chen Z, He Z, Huang X. A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem. Front Genet 2022; 12:818841. [PMID: 35154261 PMCID: PMC8832978 DOI: 10.3389/fgene.2021.818841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2021] [Accepted: 12/14/2021] [Indexed: 11/13/2022]
Abstract
Machine learning has been widely used to solve complex problems in engineering applications and scientific fields, and many machine learning-based methods have achieved good results in different fields. SNAREs are key elements of membrane fusion and required for the fusion process of stable intermediates. They are also associated with the formation of some psychiatric disorders. This study processes the original sequence data with the synthetic minority oversampling technique (SMOTE) to solve the problem of data imbalance and produces the most suitable machine learning model with the iLearnPlus platform for the identification of SNARE proteins. Ultimately, a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the cross-validation dataset, and a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the independent dataset (the adaptive skip dipeptide composition descriptor was used for feature extraction, and LightGBM with proper parameters was used as the classifier). These results demonstrate that this combination can perform well in the classification of SNARE proteins and is superior to other methods.
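The four metrics reported in this abstract (sensitivity, specificity, accuracy, and MCC) all follow from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not the counts behind the figures reported above.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy, and MCC from counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical imbalanced test set: 12 positives and 100 negatives.
metrics = binary_metrics(tp=8, fp=10, tn=90, fn=4)
```

On imbalanced data such as this, accuracy can remain high while sensitivity is modest, which is why MCC is commonly reported alongside it.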
|
44
|
Chen Y, Juan L, Lv X, Shi L. Bioinformatics Research on Drug Sensitivity Prediction. Front Pharmacol 2021; 12:799712. [PMID: 34955863 PMCID: PMC8696280 DOI: 10.3389/fphar.2021.799712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 11/18/2021] [Indexed: 11/28/2022]
Abstract
Modeling-based anti-cancer drug sensitivity prediction has been extensively studied in recent years. Most drug sensitivity prediction models use only gene expression data and neglect the remarkable impacts of gene mutation, methylation, and copy number variation on drug sensitivity. Drug sensitivity prediction can both help protect patients from some adverse drug reactions and improve the efficacy of treatment. Genomics data are extremely useful for drug sensitivity prediction tasks. This article reviews the role of drug sensitivity prediction and describes a variety of methods for predicting drug sensitivity. Moreover, the research significance of drug sensitivity prediction and its existing problems are discussed.
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Xiao Lv
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Lei Shi
- Department of Spine Surgery Changzheng Hospital, Naval Medical University, Shanghai, China
| |
|
45
|
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022]
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic unit controlling biological traits. They are the guarantee of the basic structures and functions in organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, there is a class of genes that exist in a conserved form in multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The protein encoded by it usually either performs very important functions or is responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, the irreplaceable part of the body’s life activities is the essential gene. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful to assess the potential risks of disease, and to help determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve the quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes. 
In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.
Affiliation(s)
- Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Lihong Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| |
|
46
|
Gu X, Guo L, Liao B, Jiang Q. Pseudo-188D: Phage Protein Prediction Based on a Model of Pseudo-188D. Front Genet 2021; 12:796327. [PMID: 34925468 PMCID: PMC8672092 DOI: 10.3389/fgene.2021.796327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022]
Abstract
Phages have seriously affected the biochemical systems of the world, and not only are phages related to our health, but medical treatments for many cancers and skin infections are related to phages; therefore, this paper sought to identify phage proteins. In this paper, a Pseudo-188D model was established. The digital features of the phage were extracted by PseudoKNC, an appropriate vector was selected by the AdaBoost tool, and features were extracted by 188D. Then, the extracted digital features were combined together, and finally, the viral proteins of the phage were predicted by a stochastic gradient descent algorithm. Our model effect reached 93.4853%. To verify the stability of our model, we randomly selected 80% of the downloaded data to train the model and used the remaining 20% of the data to verify the robustness of our model.
Affiliation(s)
- Xiaomei Gu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Institute of Yangtze River Delta, University of Electronic Science and Technology of China, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lina Guo
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Qinghua Jiang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| |
|
47
|
Gong Y, Liao B, Wang P, Zou Q. DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins. Front Pharmacol 2021; 12:771808. [PMID: 34916947 PMCID: PMC8669608 DOI: 10.3389/fphar.2021.771808] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/07/2021] [Accepted: 11/15/2021] [Indexed: 01/09/2023]
Abstract
Drug targets are biological macromolecules or biomolecular structures that can specifically bind a particular drug to produce a therapeutic effect or regulate physiological functions. Because of the importance of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. The first key step in the research and development of new drugs is to identify potential drug targets. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. This method combines three features: monoDiKGap (k = 2), cross-covariance, and grouped amino acid composition. It removes redundant features and analyses key features through MRMD and MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reached 96.5665%. These results suggest that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features combined with Bagging-SVM can identify 80.0343% of the potentially druggable proteins, which indicates the significance of this subset of features for research.
Affiliation(s)
- Yuxin Gong
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Peng Wang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|