1
Su W, Xie XQ, Liu XW, Gao D, Ma CY, Zulfiqar H, Yang H, Lin H, Yu XL, Li YW. iRNA-ac4C: A novel computational method for effectively detecting N4-acetylcytidine sites in human mRNA. Int J Biol Macromol 2023; 227:1174-1181. [PMID: 36470433 DOI: 10.1016/j.ijbiomac.2022.11.299] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/22/2022] [Revised: 11/10/2022] [Accepted: 11/25/2022] [Indexed: 12/07/2022]
Abstract
RNA N4-acetylcytidine (ac4C) is the acetylation of cytidine at the nitrogen-4 position. It is a highly conserved RNA modification that is involved in a variety of biological processes. Hence, accurate identification of genome-wide ac4C sites is vital for understanding the regulatory mechanisms of gene expression. In this work, a novel predictor, named iRNA-ac4C, was established to identify ac4C sites in human mRNA based on three feature extraction methods: nucleotide composition, nucleotide chemical property, and accumulated nucleotide frequency. Subsequently, minimum-redundancy maximum-relevance (mRMR) combined with an incremental feature selection strategy was used to select the optimal feature subset. Using this subset, the best ac4C classification model was trained with a gradient boosting decision tree under 10-fold cross-validation. Results on the independent test set indicated that the proposed method generalizes encouragingly well. For the convenience of other researchers, we established a user-friendly web server, which is freely available at http://lin-group.cn/server/iRNA-ac4C/. We hope that the tool can serve as a guide for wet-lab researchers.
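To make two of the feature types named above more concrete, here is a minimal sketch of how nucleotide chemical property (NCP) and accumulated nucleotide frequency (ANF) are commonly combined into a per-position encoding. The abstract does not give the exact encoding used by iRNA-ac4C, so the 3-bit property table (ring structure, functional group, hydrogen bonding) and the function name `encode_ncp_anf` are assumptions for illustration only.

```python
# Assumed NCP table: (ring structure, functional group, hydrogen bonding).
# This is a widely used convention, not necessarily the one used by iRNA-ac4C.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp_anf(seq):
    """Return a flat feature list with 4 values per nucleotide:
    three NCP bits followed by the accumulated frequency of that
    nucleotide in the prefix ending at the current position."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = {}
    features = []
    for position, nt in enumerate(seq, start=1):
        if nt not in NCP:
            raise ValueError(f"unsupported nucleotide: {nt!r}")
        counts[nt] = counts.get(nt, 0) + 1
        features.extend(NCP[nt])
        # ANF: occurrences of this nucleotide so far / prefix length
        features.append(counts[nt] / position)
    return features


if __name__ == "__main__":
    print(encode_ncp_anf("ACCA"))
```

A sequence of length n yields a vector of length 4n, which could then be passed to a feature-selection step and a classifier.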
Affiliation(s)
- Wei Su
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xue-Qin Xie
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiao-Wei Liu
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Dong Gao
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Cai-Yi Ma
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hasan Zulfiqar
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hui Yang
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hao Lin
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiao-Long Yu
  - School of Materials Science and Engineering, Hainan University, Haikou 570228, China
- Yan-Wen Li
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; Key Laboratory of Intelligent Information Processing of Jilin Province, Northeast Normal University, Changchun 130117, China; Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
2
Zulfiqar H, Yuan SS, Huang QL, Sun ZJ, Dao FY, Yu XL, Lin H. Identification of cyclin protein using gradient boost decision tree algorithm. Comput Struct Biotechnol J 2021; 19:4123-4131. [PMID: 34527186 PMCID: PMC8346528 DOI: 10.1016/j.csbj.2021.07.013] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 03/31/2021] [Revised: 07/15/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022]
Abstract
Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, which activates cell-cycle progression. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, so methods based on sequence similarity predict them poorly. A machine learning model for identifying cyclin proteins is therefore needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. Afterward, these features were optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that our model identified cyclins with an accuracy of 93.06% and an AUC of 0.971, both higher than the values reported by two recent studies on the same data.
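The incremental feature selection (IFS) step mentioned above can be sketched as a simple loop: features are first ranked (for example by ANOVA or mRMR), then the top-1, top-2, ..., top-k prefixes of that ranking are each evaluated, and the best-scoring prefix is kept. The sketch below is illustrative only; the function name `incremental_feature_selection` and the toy scoring function are hypothetical, and in the study the score would be a cross-validated metric of the GBDT classifier rather than this stand-in.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Evaluate every prefix of a ranked feature list and return the
    prefix with the highest score (ties keep the smaller subset)."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # strict '>' prefers fewer features on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    # Hypothetical scorer: rewards informative features, penalizes subset size.
    informative = {"f1", "f3"}

    def toy_score(subset: List[str]) -> float:
        hits = sum(1 for name in subset if name in informative)
        return hits - 0.1 * len(subset)

    ranking = ["f1", "f3", "f2", "f4"]  # assumed output of a ranking step
    print(incremental_feature_selection(ranking, toy_score))
```

Because each prefix requires a full model evaluation, the cost grows linearly with the number of ranked features, which is why IFS is usually applied after a cheap ranking step rather than to all feature subsets.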
Affiliation(s)
- Hasan Zulfiqar
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Qin-Lai Huang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Jie Sun
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiao-Long Yu
  - School of Materials Science and Engineering, Hainan University, Haikou 570228, China
- Hao Lin
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
3
LC-MS/MS-Based Quantitative Proteomics Analysis of Different Stages of Non-Small-Cell Lung Cancer. Biomed Res Int 2021; 2021:5561569. [PMID: 33728331 PMCID: PMC7937045 DOI: 10.1155/2021/5561569] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/20/2021] [Revised: 01/25/2021] [Accepted: 02/07/2021] [Indexed: 12/11/2022]
Abstract
Lung cancer has a higher incidence rate and mortality rate than all other cancers. Early diagnosis and treatment of lung cancer remain a major challenge, and the 5-year survival rate of patients is only 15%. Basic and clinical research, especially the discovery of biomarkers, is crucial for improving the diagnosis and treatment of lung cancer patients. To identify novel biomarkers for lung cancer, we used iTRAQ 8-plex labeling combined with liquid chromatography-tandem mass spectrometry (LC-MS/MS) to analyze the serum and urine of patients with different stages of lung adenocarcinoma and of healthy individuals. A total of 441 proteins were identified in the serum, and 1,161 proteins were identified in the urine. The levels of elongation factor 1-alpha 2, proteasome subunit alpha type, and spermatogenesis-associated protein increased significantly in the serum of patients with lung cancer compared with healthy controls. The levels of transmembrane protein 143, cadherin 5, fibronectin 1, and collectin-11 decreased significantly in the serum of patients with metastases compared with nonmetastatic lung cancer patients. In the urine of stage III and IV lung cancer patients, prostate-specific antigen and prostatic acid phosphatase decreased significantly, whereas neutrophil defensin 1 increased significantly. The LC-MS/MS results for transmembrane protein 143, cadherin 5, fibronectin 1, and collectin-11 in the serum were confirmed by enzyme-linked immunosorbent assay (ELISA). These proteins may be potential biomarkers for the early diagnosis and metastasis of lung adenocarcinoma. Furthermore, the relative content of these markers in the serum and urine could be used to determine the progression of lung adenocarcinoma and to achieve accurate staging and diagnosis.