1
|
Ding H, Xing F, Zou L, Zhao L. QSAR analysis of VEGFR-2 inhibitors based on machine learning, Topomer CoMFA and molecule docking. BMC Chem 2024; 18:59. [PMID: 38555462 PMCID: PMC10981835 DOI: 10.1186/s13065-024-01165-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Accepted: 03/12/2024] [Indexed: 04/02/2024] Open
Abstract
VEGFR-2 kinase inhibitors are clinically approved drugs that can effectively target cancer angiogenesis. However, such inhibitors have adverse effects such as skin toxicity, gastrointestinal reactions and hepatic impairment. In this study, machine learning and Topomer CoMFA, which is an alignment-dependent, descriptor-based method, were employed to build structural activity relationship models of potentially new VEGFR-2 inhibitors. The prediction ac-curacy of the training and test sets of the 2D-SAR model were 82.4 and 80.1%, respectively, with KNN. Topomer CoMFA approach was then used for 3D-QSAR modeling of VEGFR-2 inhibitors. The coefficient of q2 for cross-validation of the model 1 was greater than 0.5, suggesting that a stable drug activity-prediction model was obtained. Molecular docking was further performed to simulate the interactions between the five most promising compounds and VEGFR-2 target protein and the Total Scores were all greater than 6, indicating that they had a strong hydrogen bond interactions were present. This study successfully used machine learning to obtain five potentially novel VEGFR-2 inhibitors to increase our arsenal of drugs to combat cancer.
Collapse
Affiliation(s)
- Hao Ding
- Department of Ultrasound, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning, China
| | - Fei Xing
- Department of Oncology, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning, China
| | - Lin Zou
- Medical College of Guangxi University, Nanning, 530004, Guangxi, China
| | - Liang Zhao
- Hepatobiliary and Splenic Surgery Ward, Department of General Surgery, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning, China.
| |
Collapse
|
2
|
Zhang ZY, Sun ZJ, Gao D, Hao YD, Lin H, Liu F. Excavation of gene markers associated with pancreatic ductal adenocarcinoma based on interrelationships of gene expression. IET Syst Biol 2024. [PMID: 38530028 DOI: 10.1049/syb2.12090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2023] [Revised: 02/06/2024] [Accepted: 03/10/2024] [Indexed: 03/27/2024] Open
Abstract
Pancreatic ductal adenocarcinoma (PDAC) accounts for 95% of all pancreatic cancer cases, posing grave challenges to its diagnosis and treatment. Timely diagnosis is pivotal for improving patient survival, necessitating the discovery of precise biomarkers. An innovative approach was introduced to identify gene markers for precision PDAC detection. The core idea of our method is to discover gene pairs that display consistent opposite relative expression and differential co-expression patterns between PDAC and normal samples. Reversal gene pair analysis and differential partial correlation analysis were performed to determine reversal differential partial correlation (RDC) gene pairs. Using incremental feature selection, the authors refined the selected gene set and constructed a machine-learning model for PDAC recognition. As a result, the approach identified 10 RDC gene pairs. And the model could achieve a remarkable accuracy of 96.1% during cross-validation, surpassing gene expression-based models. The experiment on independent validation data confirmed the model's performance. Enrichment analysis revealed the involvement of these genes in essential biological processes and shed light on their potential roles in PDAC pathogenesis. Overall, the findings highlight the potential of these 10 RDC gene pairs as effective diagnostic markers for early PDAC detection, bringing hope for improving patient prognosis and survival.
Collapse
Affiliation(s)
- Zhao-Yue Zhang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| | - Zi-Jie Sun
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Dong Gao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Yu-Duo Hao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fen Liu
- Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China
| |
Collapse
|
3
|
Meng C, Yuan Y, Zhao H, Pei Y, Li Z. IIFS: An improved incremental feature selection method for protein sequence processing. Comput Biol Med 2023; 167:107654. [PMID: 37944304 DOI: 10.1016/j.compbiomed.2023.107654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 10/09/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
MOTIVATION Discrete features can be obtained from protein sequences using a feature extraction method. These features are the basis of downstream processing of protein data, but it is necessary to screen and select some important features from them as they generally have data redundancy. RESULT Here, we report IIFS, an improved incremental feature selection method that exploits a new subset search strategy to find the optimal feature set. IIFS combines nonadjacent sorting features to prevent the drawbacks of data explosion and excessive reliance on feature sorting results. The comparative experimental results on 27 feature sorting data show that IIFS can find more accurate and important features compared to existing methods.The IIFS approach also handles data redundancy more efficiently and finds more representative and discriminatory features while ensuring minimal feature dimensionality and good evaluation metrics. Moreover, we wrap this method and deploy it on a web server for access at http://112.124.26.17:8005/.
Collapse
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China
| | - Ye Yuan
- Beidahuang Industry Group General Hospital, Harbin, 150001, China
| | - Haiyan Zhao
- College of Integration of Traditional Chinese and Western Medicine to Southwest Medical University, Luzhou, Sichuan, 646000, China
| | - Yue Pei
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
| | - Zhi Li
- Department of Spleen and Stomach Diseases, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, 646000, China.
| |
Collapse
|
4
|
Liu B, Yang Z, Liu Q, Zhang Y, Ding H, Lai H, Li Q. Computational prediction of allergenic proteins based on multi-feature fusion. Front Genet 2023; 14:1294159. [PMID: 37928245 PMCID: PMC10622758 DOI: 10.3389/fgene.2023.1294159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Accepted: 10/11/2023] [Indexed: 11/07/2023] Open
Abstract
Allergy is an autoimmune disorder described as an undesirable response of the immune system to typically innocuous substance in the environment. Studies have shown that the ability of proteins to trigger allergic reactions in susceptible individuals can be evaluated by bioinformatics tools. However, developing computational methods to accurately identify new allergenic proteins remains a vital challenge. This work aims to propose a machine learning model based on multi-feature fusion for predicting allergenic proteins efficiently. Firstly, we prepared a benchmark dataset of allergenic and non-allergenic protein sequences and pretested on it with a machine-learning platform. Then, three preferable feature extraction methods, including amino acid composition (AAC), dipeptide composition (DPC) and composition of k-spaced amino acid pairs (CKSAAP) were chosen to extract protein sequence features. Subsequently, these features were fused and optimized by Pearson correlation coefficient (PCC) and principal component analysis (PCA). Finally, the most representative features were picked out to build the optimal predictor based on random forest (RF) algorithm. Performance evaluation results via 5-fold cross-validation showed that the final model, called iAller (https://github.com/laihongyan/iAller), could precisely distinguish allergenic proteins from non-allergenic proteins. The prediction accuracy and AUC value for validation dataset achieved 91.4% and 0.97%, respectively. This model will provide guide for users to identify more allergenic proteins.
Collapse
Affiliation(s)
- Bin Liu
- Department of Anesthesiology, The Fourth People’s Hospital of Sichuan Province, Chengdu, Sichuan, China
| | - Ziman Yang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Qing Liu
- Department of Pain, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, China
| | - Ying Zhang
- Department of Anesthesiology, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, China
| | - Hui Ding
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hongyan Lai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Qun Li
- Department of Pain, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, China
- Research Center of Integrated Traditional Chinese and Western Medicine, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, China
| |
Collapse
|
5
|
Chen XG, Yang X, Li C, Lin X, Zhang W. Non-coding RNA identification with pseudo RNA sequences and feature representation learning. Comput Biol Med 2023; 165:107355. [PMID: 37639767 DOI: 10.1016/j.compbiomed.2023.107355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Revised: 07/16/2023] [Accepted: 08/12/2023] [Indexed: 08/31/2023]
Abstract
Distinguishing non-coding RNAs (ncRNAs) from coding RNAs is very important in bioinformatics. Although many methods have been proposed for solving this task, it remains highly challenging to further improve the accuracy of ncRNA identification. In this paper, we propose a coding potential predictor using feature representation learning based on pseudo RNA sequences named CPPFLPS. In this method, we use the pseudo RNA sequences generated by simulating RNA sequence mutations as new samples for data augmentation, and six string operations simulating RNA sequence mutations are considered: base replacement, base insertion, base deletion, subsequence reversion, subsequence repetition and subsequence deletion. In the feature representation learning framework, different types of pseudo RNA sequences are added to the training set to form new training sets that can be used to train baseline classifiers, thus obtaining baseline models. The resulting labels of these baseline models are used as feature vectors to represent RNA sequences, and the resulting feature vectors acquired after feature selection are used to train a predictive model for distinguishing ncRNAs from coding RNAs. Our method achieves better performance compared with that of existing state-of-the-art methods. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPPFLPS.
Collapse
Affiliation(s)
- Xian-Gan Chen
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Xiaofei Yang
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Chenhong Li
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Xianguang Lin
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
| |
Collapse
|
6
|
Zhanga S, Yao Y, Wang J, Liang Y. Identification of DNA N4-methylcytosine sites based on multi-source features and gradient boosting decision tree. Anal Biochem 2022; 652:114746. [DOI: 10.1016/j.ab.2022.114746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 05/13/2022] [Accepted: 05/18/2022] [Indexed: 11/16/2022]
|