1
Tran TX, Khanh Le NQ, Nguyen VN. Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique. Comput Biol Med 2025; 186:109664. [PMID: 39798505 DOI: 10.1016/j.compbiomed.2025.109664] [Received: 09/04/2024] [Revised: 12/10/2024] [Accepted: 01/06/2025] [Indexed: 01/15/2025]
Abstract
Protein succinylation, a post-translational modification in which a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging. To address this, we introduce CBiLSuccSite, an approach that integrates Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM) networks for the accurate prediction of protein succinylation sites. Our approach employs a word embedding layer to encode protein sequences, enabling the model to learn intricate patterns and dependencies automatically without manual feature extraction. In 10-fold cross-validation, CBiLSuccSite achieved superior predictive performance, with an Area Under the Curve (AUC) of 0.826 and a Matthews Correlation Coefficient (MCC) of 0.502. Independent testing further validated its robustness, yielding an AUC of 0.818 and an MCC of 0.53. The integration of CNN and Bi-LSTM leverages the strengths of both architectures, establishing CBiLSuccSite as an effective tool for protein language processing and succinylation site prediction. Our model and code are publicly accessible at: https://github.com/nuinvtnu/CBiLSuccSite.
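The abstract states that CBiLSuccSite feeds protein sequences through a word embedding layer instead of hand-crafted features. As a minimal sketch of the preprocessing such a layer typically needs, the snippet below cuts a fixed-length window centred on each lysine (K) and maps residues to integer token IDs. The half-width of 15, the alphabetical vocabulary, the padding ID 0, and the function name `lysine_windows` are all hypothetical choices for illustration, not values taken from the paper; the embedding, CNN, and Bi-LSTM layers themselves are not shown.

```python
# Hypothetical vocabulary: the 20 standard amino acids mapped to IDs 1..20,
# with 0 reserved for padding and for any non-standard residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD = 0


def lysine_windows(sequence: str, half_width: int = 15):
    """Yield (position, token_ids) for every lysine (K) in `sequence`.

    Each window has length 2 * half_width + 1 and is centred on the lysine;
    positions that fall outside the protein are filled with PAD.
    """
    seq = sequence.upper()
    for centre, residue in enumerate(seq):
        if residue != "K":
            continue
        ids = []
        for offset in range(-half_width, half_width + 1):
            j = centre + offset
            if 0 <= j < len(seq):
                ids.append(TOKEN.get(seq[j], PAD))
            else:
                ids.append(PAD)
        yield centre, ids


if __name__ == "__main__":
    for pos, ids in lysine_windows("MAKLVK", half_width=2):
        print(pos, ids)
    # 2 [11, 1, 9, 10, 18]
    # 5 [10, 18, 9, 0, 0]
```

Each integer sequence produced this way would be the input an embedding layer looks up, one vector per token, before any convolutional or recurrent layers.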
Affiliation(s)
- Thi-Xuan Tran: Thai Nguyen University of Economics and Business Administration, Thai Nguyen City, Viet Nam.
- Nguyen Quoc Khanh Le: In-Service Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taiwan; AIBioMed Research Group, Taipei Medical University, Taiwan.
- Van-Nui Nguyen: Thai Nguyen University of Information and Communication Technology, Thai Nguyen City, Viet Nam.
2
Lin L, Long Y, Liu J, Deng D, Yuan Y, Liu L, Tan B, Qi H. FRP-XGBoost: Identification of ferroptosis-related proteins based on multi-view features. Int J Biol Macromol 2024; 262:130180. [PMID: 38360239 DOI: 10.1016/j.ijbiomac.2024.130180] [Received: 12/05/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 02/17/2024]
Abstract
Ferroptosis represents a novel form of programmed cell death. Pan-cancer bioinformatics analysis indicates that identifying and modulating ferroptosis offers innovative approaches for preventing and treating diverse tumor pathologies. However, precisely detecting ferroptosis-related proteins with conventional wet-laboratory techniques remains a formidable challenge, largely because of the constraints of existing methodologies. These traditional approaches are not only labor-intensive but also costly. Consequently, more sophisticated and efficient computational tools are urgently needed to detect these proteins. In this paper, we present FRP-XGBoost, a machine learning method based on XGBoost and multi-view features for predicting ferroptosis-related proteins. We explored four types of protein feature extraction methods and evaluated their effectiveness in predicting ferroptosis-related proteins using six of the most commonly used traditional classifiers. To enhance the representational power of the hybrid features, we employed a two-step feature selection technique to identify the optimal feature subset. We then constructed a prediction model using the XGBoost algorithm. FRP-XGBoost achieved an accuracy of 96.74% in 10-fold cross-validation and 91.52% on an independent test set. The implementation source code of FRP-XGBoost is available at https://github.com/linli5417/FRP-XGBoost.
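Both FRP-XGBoost and CBiLSuccSite report results from 10-fold cross-validation. As a hedged illustration of that evaluation protocol only, the sketch below builds stratified folds with the Python standard library and pools the held-out predictions into a single accuracy figure. The round-robin fold assignment, the seed, and the `fit`/`predict` callables are hypothetical; FRP-XGBoost's feature extraction, two-step feature selection, and XGBoost classifier are not reproduced here.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Return a list of k (train_idx, test_idx) pairs.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every fold keeps roughly the original class ratio.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


def cross_validated_accuracy(features, labels, fit, predict, k=10):
    """Pool predictions from all k held-out folds and return overall accuracy.

    `fit(X, y)` must return a model and `predict(model, X)` a list of labels;
    both are placeholders for whatever classifier is being evaluated.
    """
    correct = 0
    for train, test in stratified_kfold(labels, k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = predict(model, [features[i] for i in test])
        correct += sum(p == labels[i] for p, i in zip(preds, test))
    return correct / len(labels)


if __name__ == "__main__":
    y = [0] * 70 + [1] * 30
    X = [[0.0]] * 100  # dummy features; a real study would use extracted features
    majority_fit = lambda X_tr, y_tr: max(set(y_tr), key=y_tr.count)
    constant_predict = lambda model, X_te: [model] * len(X_te)
    print(cross_validated_accuracy(X, y, majority_fit, constant_predict))  # 0.7
```

The majority-class baseline in the usage example is a useful sanity check: any reported cross-validation accuracy should be compared against the class ratio it would achieve by always predicting the larger class.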
Affiliation(s)
- Li Lin: Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China.
- Yao Long: Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
- Jinkai Liu: Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
- Dongliang Deng: Department of Oncology, Chongqing Traditional Chinese Medicine Hospital, Chongqing 400021, China.
- Yu Yuan: Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China.
- Lubin Liu: Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China.
- Bin Tan: Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
- Hongbo Qi: Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China; Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China.
3
Ahmed SS, Rifat ZT, Rahman MS, Rahman MS. Succinylated lysine residue prediction revisited. Brief Bioinform 2023; 24:6865109. [PMID: 36460620 DOI: 10.1093/bib/bbac510] [Received: 07/12/2022] [Revised: 09/30/2022] [Accepted: 10/25/2022] [Indexed: 12/04/2022]
Abstract
Lysine succinylation is a post-translational modification (PTM) that plays a crucial role in regulating cellular processes. Aberrant succinylation may cause inflammation, cancers, metabolic diseases, and nervous system diseases. Experimental methods for detecting succinylation sites are time-consuming and costly, which calls for highly effective computational models. Several such models have been proposed in the literature, albeit with only moderate success across different evaluation metrics. One crucial aspect in this context is the biochemical and physicochemical properties of amino acids, which appear to be useful features for such predictors. However, some existing computational models did not use these properties at all, while others used them without considering the inter-dependency among them. The combinations of biochemical and physicochemical properties derived through our optimization process achieve better results than combining all the properties. We propose three deep learning architectures: CNN+Bi-LSTM (CBL), Bi-LSTM+CNN (BLC), and their combination (CBL_BLC), and find that CBL_BLC outperforms the other two. Ensembling different models improves the results, and tuning the threshold of the ensemble classifiers improves them further. Compared with other existing works on two datasets, our approach achieves better sensitivity and specificity when the threshold value is varied.
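Ahmed et al. report that tuning the decision threshold of their ensemble improves results and shifts the balance between sensitivity and specificity. The sketch below illustrates the general idea under two assumptions that the abstract does not confirm: the ensemble is a plain average of each model's positive-class probability, and the threshold is chosen to maximise the Matthews correlation coefficient (MCC). All function names are hypothetical. In practice the threshold should be selected on validation data, not on the test set used for reporting.

```python
import math


def average_ensemble(prob_lists):
    """Average per-sample positive-class probabilities from several models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def confusion(probs, labels, threshold):
    """Count TP, TN, FP, FN when probabilities >= threshold are called positive."""
    tp = tn = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity and MCC (0.0 where undefined)."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc


def tune_threshold(probs, labels, candidates):
    """Return (best_threshold, best_mcc) over the candidate thresholds."""
    best = max(candidates, key=lambda t: metrics(*confusion(probs, labels, t))[2])
    return best, metrics(*confusion(probs, labels, best))[2]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    model_a = [0.9, 0.4, 0.35, 0.1]
    model_b = [0.7, 0.5, 0.45, 0.3]
    probs = average_ensemble([model_a, model_b])
    print(tune_threshold(probs, labels, [0.3, 0.42, 0.5]))  # (0.42, 1.0)
```

In the toy example, both 0.3 and 0.5 misclassify one sample each (MCC about 0.577), while 0.42 separates the classes perfectly; lowering the threshold raises sensitivity at the cost of specificity, and raising it does the reverse.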
Affiliation(s)
- Shehab Sarar Ahmed: Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh.
- Zaara Tasnim Rifat: Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh.
- M Saifur Rahman: Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh.
- M Sohel Rahman: Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh.
4
Liu X, Xu LL, Lu YP, Yang T, Gu XY, Wang L, Liu Y. Deep_KsuccSite: A novel deep learning method for the identification of lysine succinylation sites. Front Genet 2022; 13:1007618. [PMID: 36246655 PMCID: PMC9557156 DOI: 10.3389/fgene.2022.1007618] [Received: 08/01/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022]
Abstract
Identifying lysine (symbol Lys or K) succinylation (Ksucc) sites provides the basis for elucidating the mechanism and function of lysine succinylation modifications. Traditional experimental methods for Ksucc site identification are often costly and time-consuming, so an efficient computational method is needed to predict the presence of Ksucc sites in protein sequences. In this study, we proposed a novel and effective deep learning-based predictor for identifying Ksucc sites, termed Deep_KsuccSite. The predictor encoded peptides using Composition, Transition, and Distribution (CTD) Composition (CTDC), Enhanced Grouped Amino Acid Composition (EGAAC), Amphiphilic Pseudo-Amino Acid Composition (APAAC), and Embedding Encoding methods, then constructed three base classifiers using one-dimensional (1D) convolutional neural networks (CNN) and 2D-CNN, and finally used a voting method to obtain the final results. K-fold cross-validation and independent testing showed that Deep_KsuccSite could serve as an effective tool for identifying Ksucc sites in protein sequences. In addition, ablation experiments on voting, feature combination, and model architecture showed that Deep_KsuccSite makes full use of the information in different features to construct an effective classifier. Taken together, Deep_KsuccSite, which is based on deep learning, achieves better prediction accuracy for lysine succinylation sites than current methods. The code and dataset involved in this methodological study are permanently available at https://github.com/flyinsky6/Deep_KsuccSite.
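Deep_KsuccSite combines three base classifiers by voting. Assuming simple hard majority voting over binary labels (the abstract does not say whether the vote is hard or soft, or weighted), the combination step reduces to the sketch below. The base classifiers are represented only by their predicted label lists, and `majority_vote` is a hypothetical helper, not code from the Deep_KsuccSite repository.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine per-sample 0/1 labels from an odd number of classifiers.

    predictions[m][i] is classifier m's label for sample i; the final label
    for sample i is whichever label most classifiers assigned to it.
    """
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    clf_a = [1, 0, 1, 0]  # e.g. a 1D-CNN on one encoding (illustrative labels)
    clf_b = [1, 1, 0, 0]  # e.g. a 2D-CNN on another encoding
    clf_c = [0, 1, 1, 0]
    print(majority_vote(clf_a, clf_b, clf_c))  # [1, 1, 1, 0]
```

With three voters and binary labels a tie is impossible, which is one practical reason to use an odd number of base classifiers.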
Affiliation(s)
- Xin Liu (corresponding author): School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China.
- Lin-Lin Xu: School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China.
- Ya-Ping Lu: College of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China.
- Ting Yang: School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China.
- Xin-Yu Gu: School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China.
- Liang Wang (corresponding author): Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China.
- Yong Liu (corresponding author): Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, China.