1
Chen M, Zhang X, Ju Y, Liu Q, Ding Y. iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:13829-13850. [PMID: 36654069 DOI: 10.3934/mbe.2022644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Biological sequence analysis is a fundamental research area in bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. By constructing a classifier, input sequence feature vectors are predicted and evaluated, and knowledge of gene structure, function and evolution is obtained from large amounts of sequence information, which lays a foundation for in-depth research. Many machine learning methods have already been applied to biological sequence analysis tasks such as RNA gene recognition and protein secondary structure prediction. RNA plays an important biological role in the encoding, decoding, regulation and expression of genes. RNA data are currently analysed in terms of structure and function, including secondary structure prediction, non-coding RNA identification and functional site prediction. Pseudouridine (Ψ) is the most widespread and abundant RNA modification and has been discovered in a variety of RNAs. Accurately identifying Ψ sites in RNA sequences is essential for studying the related functional mechanisms and for disease diagnosis. Several computational approaches have been proposed as alternatives to experimental methods for detecting Ψ sites, but their performance still leaves room for improvement. In this study, we present a model based on the twin support vector machine (TWSVM) for Ψ site identification. The model combines a variety of feature representation techniques and uses the max-relevance and min-redundancy method to obtain the optimum feature subset for training. The independent testing accuracy is improved by 3.4% in comparison to current advanced Ψ site predictors. The outcomes demonstrate that our model has better generalization performance and improves the accuracy of Ψ site identification. iPseU-TWSVM can be a helpful tool to identify Ψ sites.
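The max-relevance and min-redundancy selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes discrete (already binned) features and the common "relevance minus mean redundancy" criterion based on mutual information; the abstract does not say which variant the authors used, and the names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping a feature name to its list of discrete values.
    This is an illustrative criterion, not necessarily the paper's.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a duplicated feature in the candidate set, the redundancy term penalises the copy, so a weaker but complementary feature is chosen second.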
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Xin Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Qing Liu
  - Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
2
3
HKAM-MKM: A hybrid kernel alignment maximization-based multiple kernel model for identifying DNA-binding proteins. Comput Biol Med 2022; 145:105395. [PMID: 35334314 DOI: 10.1016/j.compbiomed.2022.105395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/07/2022] [Revised: 03/08/2022] [Accepted: 03/08/2022] [Indexed: 12/24/2022]
Abstract
The identification of DNA-binding proteins (DBPs) has long been a central problem in sequence classification. Because experimental identification is very resource-intensive, constructing a computational prediction model is worthwhile. This study developed and evaluated a hybrid kernel alignment maximization-based multiple kernel model (HKAM-MKM) for predicting DBPs. First, we collected two datasets and performed feature extraction on the sequences to obtain six feature groups, and then constructed the corresponding kernels. To ensure the effective utilisation of the base kernels and avoid ignoring the difference between a sample and its neighbours, we proposed local kernel alignment, which calculates the kernel between each sample and its neighbours, with the sample as the centre. We combined the global and local kernel alignments into a hybrid kernel alignment model and balanced the relationship between the two through parameters. By maximising the hybrid kernel alignment value, we obtained the weight of each kernel and then linearly combined the kernels using these weights. Finally, the fused kernel was input into a support vector machine for training and prediction. On the independent test sets PDB186 and PDB2272, we obtained the highest Matthews correlation coefficient (MCC) (0.768 and 0.5962, respectively) and the highest accuracy (87.1% and 78.43%, respectively), which were superior to those of the other predictors. Therefore, HKAM-MKM is an efficient prediction tool for DBPs.
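The idea of weighting base kernels by kernel alignment and fusing them linearly can be sketched as follows. This is a deliberately simplified heuristic, not the hybrid global/local optimisation described in the abstract: each kernel is weighted in proportion to its global alignment with the ideal kernel y yᵀ, and all function names are hypothetical.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(k1, k2):
    """Kernel alignment: <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return frobenius_inner(k1, k2) / math.sqrt(
        frobenius_inner(k1, k1) * frobenius_inner(k2, k2)
    )


def ideal_kernel(labels):
    """Target kernel y y^T for labels in {-1, +1}."""
    return [[yi * yj for yj in labels] for yi in labels]


def alignment_weights(kernels, labels):
    """Assumed heuristic: weights proportional to non-negative alignment, summing to 1."""
    target = ideal_kernel(labels)
    scores = [max(alignment(k, target), 0.0) for k in kernels]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)  # fall back to uniform weights
    return [s / total for s in scores]


def fuse_kernels(kernels, weights):
    """Linear combination sum_m w_m * K_m of the base kernels."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]
```

The fused matrix could then be passed to any SVM implementation that accepts a precomputed kernel.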
4
Wang X, Li Q, Liu Y, Du Z, Jin R. Drug repositioning of COVID-19 based on mixed graph network and ion channel. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:3269-3284. [PMID: 35341251 DOI: 10.3934/mbe.2022151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Research on the relationship between drugs and targets is the key to precision medicine, and ion channels are an important class of drug targets. To address the urgent needs of coronavirus disease 2019 (COVID-19) treatment and drug development, this paper designed a mixed graph network model to predict the affinity between ion channel targets of COVID-19 and drugs. Starting from the simplified molecular input line entry specification (SMILES) code of each drug, atomic features were first extracted to construct the vertex set, and the edge set was constructed from the atomic bonds. An undirected graph with atomic features was then generated with the RDKit tool, and a graph attention layer was used to extract the drug feature information. Five ion channel target proteins were screened from the whole SARS-CoV-2 genome sequences in the NCBI database, and the protein features were extracted by a convolutional neural network (CNN). The extracted drug and target features were connected using an attention mechanism and a graph convolutional network (GCN). After two fully connected layers, the model output the drug-target affinity. The KIBA dataset was used to train the model and determine the model parameters. Compared with the DeepDTA, WideDTA, graph attention network (GAT), GCN and graph isomorphism network (GIN) models, the mean square error (MSE) of the proposed model decreased by 0.055, 0.04, 0.001, 0.046 and 0.013, and the consistency index (CI) increased by 0.028, 0.016, 0.003, 0.03 and 0.01, respectively. The model can therefore predict drug-target affinity more accurately.
According to the predicted drug-target affinities for the SARS-CoV-2 ion channel targets, seven small-molecule drugs acting on the five ion channel targets were obtained, namely SCH-47112, Dehydroaltenusin, alternariol 5-o-sulfate, LPA1 antagonist 1, alternariol, butin, and AT-9283. These drugs provide a reference for drug repositioning and precise treatment of COVID-19.
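The two evaluation metrics quoted above can be computed as in the following sketch. It assumes that the abstract's "consistency index (CI)" is the pairwise concordance index commonly reported for drug-target affinity models; the tie handling (a predicted tie counts as 0.5) is likewise an assumption, and the function names are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; a tie in the predictions
    counts as half-concordant (assumed convention).
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / comparable if comparable else float("nan")
```

A lower MSE and a higher CI both indicate better affinity prediction, which is the direction of the improvements reported in the abstract.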
Affiliation(s)
- Xianfang Wang
  - Henan Institute of Technology, Xinxiang 453003, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Qimeng Li
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Yifeng Liu
  - Henan Institute of Technology, Xinxiang 453003, China
- Zhiyong Du
  - Henan Institute of Technology, Xinxiang 453003, China
- Ruixia Jin
  - SanQuan Medical College, Xinxiang 453003, China
5
Guo X, Zhou W, Yu Y, Cai Y, Zhang Y, Du A, Lu Q, Ding Y, Li C. Multiple Laplacian Regularized RBF Neural Network for Assessing Dry Weight of Patients With End-Stage Renal Disease. Front Physiol 2021; 12:790086. [PMID: 34966294 PMCID: PMC8711098 DOI: 10.3389/fphys.2021.790086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/25/2021] [Accepted: 11/17/2021] [Indexed: 11/28/2022] Open
Abstract
Dry weight (DW) is an important dialysis index for patients with end-stage renal disease and can guide clinical hemodialysis. Brain natriuretic peptide, chest computed tomography images, ultrasound, and bioelectrical impedance analysis are key indicators (multisource information) for assessing DW. With these approaches, DW is assessed by trial and error (the traditional measurement method), and assessment by a clinician is time-consuming. In this study, we developed a method based on artificial intelligence technology to estimate patient DW. Building on the conventional radial basis function neural network (RBFN), we propose a multiple Laplacian-regularized RBFN (MLapRBFN) model to predict patient DW. MLapRBFN integrates multiple Laplacian regularization terms and employs an efficient iterative algorithm to solve the model. Compared with other models and the body composition monitor, our method achieves the lowest root mean square error (1.3226). In the Bland-Altman analysis, MLapRBFN has the fewest samples outside the agreement interval (17 samples); the proportion outside the interval is 3.57%, which is lower than 5%. Therefore, our method can be tentatively applied for clinical evaluation of DW in hemodialysis patients.
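The root mean square error and the Bland-Altman "outside the agreement interval" ratio reported above can be reproduced on any paired data with a sketch like the one below. It assumes the conventional limits of agreement (mean difference ± 1.96 sample standard deviations); the abstract does not state how the interval was defined, and the function names are illustrative.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )


def bland_altman(reference, predicted):
    """Bias, limits of agreement, and share of samples outside the limits."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return {
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "outside": outside,
        "ratio": outside / len(diffs),
    }
```

As a plausibility check on the reported figures, 17 samples at 3.57% would correspond to a cohort of roughly 476 samples (17/476 ≈ 0.0357), although the abstract does not state the cohort size.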
Affiliation(s)
- Xiaoyi Guo
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Wei Zhou
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yan Yu
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yinghua Cai
  - Department of Nursing, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yuan Zhang
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Aiyan Du
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Qun Lu
  - Department of Nursing, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Chao Li
  - General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
6
Jia Y, Huang S, Zhang T. KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest. Front Genet 2021; 12:811158. [PMID: 34912382 PMCID: PMC8667860 DOI: 10.3389/fgene.2021.811158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2021] [Accepted: 11/15/2021] [Indexed: 02/04/2023] Open
Abstract
A DNA-binding protein (DBP) is a protein with a special DNA-binding domain that is associated with many important molecular biological mechanisms. The rapid development of computational methods has made it possible to predict DBPs on a large scale; however, existing methods do not fully integrate DBP-related features, resulting in imprecise predictions. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. The experimental results show a prediction accuracy of 81.22% on the independent test dataset PDB186, the highest among existing methods.
Affiliation(s)
- Yuran Jia
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Shan Huang
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Tianjiao Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
7
Liu X, Zhang X, Zhang Y, Ding Y, Shan W, Huang Y, Wang L, Guo X. Kernelized k-Local Hyperplane Distance Nearest-Neighbor Model for Predicting Cerebrovascular Disease in Patients With End-Stage Renal Disease. Front Neurosci 2021; 15:773208. [PMID: 34759797 PMCID: PMC8573245 DOI: 10.3389/fnins.2021.773208] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Accepted: 10/04/2021] [Indexed: 11/30/2022] Open
Abstract
Detecting and treating cerebrovascular diseases are essential for the survival of patients with chronic kidney disease (CKD). Machine learning algorithms can be used to effectively predict stroke risk in patients with end-stage renal disease (ESRD). An imbalance in the amount of collected data associated with different risk levels can influence the classification task. Therefore, we propose the use of a kernelized k-local hyperplane nearest-neighbor model (KHKNN) for the effective prediction of stroke risk in patients with ESRD. We compared our proposed method with other conventional machine learning methods, which revealed that our method could effectively perform the task of classifying stroke risk.
Affiliation(s)
- Xiaobin Liu
  - Department of Nephrology, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Xiran Zhang
  - Department of Nephrology, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yi Zhang
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Weiwei Shan
  - Department of Nephrology, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yiqing Huang
  - Department of Nephrology, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Liang Wang
  - Department of Nephrology, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Xiaoyi Guo
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
8
A PLA2R-IgG4 Antibody-Based Predictive Model for Assessing Risk Stratification of Idiopathic Membranous Nephropathy. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:1521013. [PMID: 34512932 PMCID: PMC8424241 DOI: 10.1155/2021/1521013] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/06/2021] [Revised: 08/19/2021] [Accepted: 08/20/2021] [Indexed: 11/22/2022]
Abstract
Background: Idiopathic membranous nephropathy (IMN) is an autoimmune glomerular disease whose main pathogenesis is considered to be associated with the phospholipase A2 receptor (PLA2R). In this study, serum PLA2R-IgG and PLA2R-IgG4 antibodies were quantified by time-resolved fluoroimmunoassay (TRFIA), and the value of both in the clinical prediction of risk stratification in IMN was examined. Methods: A total of 95 patients with biopsy-proven IMN who had tested positive for serum PLA2R antibodies by ELISA were enrolled, and serum PLA2R-IgG and PLA2R-IgG4 antibodies were quantified by TRFIA. According to proteinuria and renal function, the patients were divided into low-, medium-, and high-risk groups, which served as the dependent variable. Random forest (RF) was used to estimate the value of serum PLA2R-IgG and PLA2R-IgG4 in predicting the risk stratification of progression in IMN. Results: Out-of-bag estimates of variable importance in RF were employed to evaluate the impact of each input variable on the final classification accuracy. The variables albumin, PLA2R-IgG, and PLA2R-IgG4 had high importance values (>0.3) of 0.3156, 0.3981, and 0.7682, respectively, indicating that these three were more important for the risk stratification of progression in IMN. To further assess the contribution of PLA2R-IgG and PLA2R-IgG4 to the model, we built four different models and found that PLA2R-IgG4 played an important role in improving the predictive ability of the model. Conclusions: We established a random forest model to evaluate the value of serum PLA2R-IgG4 antibodies in predicting risk stratification of IMN. Compared with PLA2R-IgG, PLA2R-IgG4 is a more efficient biomarker for predicting the risk of progression in IMN.
9
Assessing the Adequacy of Hemodialysis Patients via the Graph-Based Takagi-Sugeno-Kang Fuzzy System. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:9036322. [PMID: 34367320 PMCID: PMC8337127 DOI: 10.1155/2021/9036322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Accepted: 07/10/2021] [Indexed: 01/09/2023]
Abstract
Maintenance hemodialysis is the main treatment for end-stage renal disease in China. The Kt/V value is the gold standard of hemodialysis adequacy. However, because Kt/V requires repeated blood drawing and evaluation, dialysis adequacy is hard to monitor frequently. To meet the need for repeated clinical assessment, we sought a noninvasive way to assess dialysis adequacy. We therefore collected clinically relevant data and developed a machine learning- (ML-) based model to predict dialysis adequacy for clinical hemodialysis patients. Data were collected from 250 patients, including gender, age, ultrafiltration (UF), predialysis body weight (preBW), postdialysis body weight (postBW), blood pressure (BP), heart rate (HR), and blood flow (BF). An efficient graph-based Takagi-Sugeno-Kang Fuzzy System (G-TSK-FS) model is proposed to predict the dialysis adequacy of hemodialysis patients. The root mean square error (RMSE) of our model is 0.1578. Our G-TSK-FS model could be used as a feasible method to predict dialysis adequacy, providing a new way for clinical practice.
10
A Self-Representation-Based Fuzzy SVM Model for Predicting Vascular Calcification of Hemodialysis Patients. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:2464821. [PMID: 34367315 PMCID: PMC8337133 DOI: 10.1155/2021/2464821] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 06/30/2021] [Accepted: 07/08/2021] [Indexed: 01/09/2023]
Abstract
Vascular calcification risk factors are critical to the survival of hemodialysis patients with end-stage renal disease (ESRD). To assess the level of vascular calcification effectively, machine learning algorithms can be used to predict the vascular calcification risk in ESRD patients. Because the amount of collected data is imbalanced across risk levels, the classification task is affected. We therefore propose an effective fuzzy support vector machine based on self-representation (FSVM-SR) to predict vascular calcification risk. We also compare our method with other conventional machine learning methods, and the results show that it performs the vascular calcification risk classification task better.
11
Assessing Dry Weight of Hemodialysis Patients via Sparse Laplacian Regularized RVFL Neural Network with L2,1-Norm. BIOMED RESEARCH INTERNATIONAL 2021; 2021:6627650. [PMID: 33628794 PMCID: PMC7880720 DOI: 10.1155/2021/6627650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2021] [Revised: 01/21/2021] [Accepted: 01/25/2021] [Indexed: 11/28/2022]
Abstract
Dry weight is the normal weight of hemodialysis patients after hemodialysis. If too much water is removed during hemodialysis, the patient will experience hypotension and shock symptoms. Therefore, the correct assessment of the patient's dry weight is clinically important. Existing assessment methods all rely on professional instruments and technicians and are time-consuming and labor-intensive. To avoid this limitation, we apply machine learning methods to patient data. This study collected demographic and anthropometric data of 476 hemodialysis patients, including age, gender, blood pressure (BP), body mass index (BMI), years of dialysis (YD), and heart rate (HR). Building on previous work, we propose a Sparse Laplacian regularized Random Vector Functional Link (SLapRVFL) neural network model. To evaluate the prediction performance of the model, we compare SLapRVFL with the Body Composition Monitor (BCM) instrument and other models. The Root Mean Square Error (RMSE) of SLapRVFL is 1.3136, which is better than that of the other methods. The SLapRVFL neural network model could be a viable alternative for dry weight assessment.