1
|
Zhang HQ, Liu SH, Li R, Yu JW, Ye DX, Yuan SS, Lin H, Huang CB, Tang H. MIBPred: Ensemble Learning-Based Metal Ion-Binding Protein Classifier. ACS OMEGA 2024; 9:8439-8447. [PMID: 38405489 PMCID: PMC10882704 DOI: 10.1021/acsomega.3c09587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 02/27/2024]
Abstract
In biological organisms, metal ion-binding proteins participate in numerous metabolic activities and are closely associated with various diseases. To accurately predict whether a protein binds to metal ions and the type of metal ion-binding protein, this study proposed a classifier named MIBPred. The classifier incorporated advanced Word2Vec technology from the field of natural language processing to extract semantic features of the protein sequence language and combined them with position-specific score matrix (PSSM) features. Furthermore, an ensemble learning model was employed for the metal ion-binding protein classification task. In the model, we independently trained XGBoost, LightGBM, and CatBoost algorithms and integrated the output results through an SVM voting mechanism. This innovative combination has led to a significant breakthrough in the predictive performance of our model. As a result, we achieved accuracies of 95.13% and 85.19%, respectively, in predicting metal ion-binding proteins and their types. Our research not only confirms the effectiveness of Word2Vec technology in extracting semantic information from protein sequences but also highlights the outstanding performance of the MIBPred classifier in the problem of metal ion-binding protein types. This study provides a reliable tool and method for the in-depth exploration of the structure and function of metal ion-binding proteins.
Collapse
Affiliation(s)
- Hong-Qi Zhang
- School
of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of
China, Chengdu 610054, China
| | - Shang-Hua Liu
- School
of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of
China, Chengdu 610054, China
| | - Rui Li
- School
of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of
China, Chengdu 610054, China
| | - Jun-Wen Yu
- School
of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of
China, Chengdu 610054, China
| | - Dong-Xin Ye
- School
of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of
China, Chengdu 610054, China
| | - Shi-Shi Yuan
- School
of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of
China, Chengdu 610054, China
| | - Hao Lin
- School
of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of
China, Chengdu 610054, China
| | - Cheng-Bing Huang
- School
of Computer Science and Technology, Aba Teachers University, Aba 623002, China
| | - Hua Tang
- School
of Basic Medical Sciences, Southwest Medical
University, Luzhou 646000, China
- Central
Nervous System Drug Key Laboratory of Sichuan Province, Luzhou 646000, China
| |
Collapse
|
2
|
Ding Y, Zhou H, Zou Q, Yuan L. Identification of drug-side effect association via correntropy-loss based matrix factorization with neural tangent kernel. Methods 2023; 219:73-81. [PMID: 37783242 DOI: 10.1016/j.ymeth.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023] Open
Abstract
Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. The monitoring of drug side effects is an important support for post marketing safety supervision of drugs, and an important basis for revising drug instructions. Its purpose is to timely detect and control drug safety risks. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine learning based method, called correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to solve the prediction of drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.
Collapse
Affiliation(s)
- Yijie Ding
- Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Hongmei Zhou
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China.
| | - Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100# Minjiang Main Road, Quzhou 324000, China.
| |
Collapse
|
3
|
Su W, Qian X, Yang K, Ding H, Huang C, Zhang Z. Recognition of outer membrane proteins using multiple feature fusion. Front Genet 2023; 14:1211020. [PMID: 37351347 PMCID: PMC10284346 DOI: 10.3389/fgene.2023.1211020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Accepted: 05/24/2023] [Indexed: 06/24/2023] Open
Abstract
Introduction: Outer membrane proteins are crucial in maintaining the structural stability and permeability of the outer membrane. Outer membrane proteins exhibit several functions such as antigenicity and strong immunogenicity, which have potential applications in clinical diagnosis and disease prevention. However, wet experiments for studying OMPs are time and capital-intensive, thereby necessitating the use of computational methods for their identification. Methods: In this study, we developed a computational model to predict outer membrane proteins. The non-redundant dataset consists of a positive set of 208 outer membrane proteins and a negative set of 876 non-outer membrane proteins. In this study, we employed the pseudo amino acid composition method to extract feature vectors and subsequently utilized the support vector machine for prediction. Results and Discussion: In the Jackknife cross-validation, the overall accuracy and the area under receiver operating characteristic curve were observed to be 93.19% and 0.966, respectively. These results demonstrate that our model can produce accurate predictions, and could serve as a valuable guide for experimental research on outer membrane proteins.
Collapse
Affiliation(s)
- Wenxia Su
- College of Science, Inner Mongolia Agriculture University, Hohhot, China
| | - Xiaojun Qian
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Keli Yang
- Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
| | - Hui Ding
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Chengbing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
| | - Zhaoyue Zhang
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| |
Collapse
|
4
|
Ming Y, Liu H, Cui Y, Guo S, Ding Y, Liu R. Identification of DNA-binding proteins by Kernel Sparse Representation via L 2,1-matrix norm. Comput Biol Med 2023; 159:106849. [PMID: 37060772 DOI: 10.1016/j.compbiomed.2023.106849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
An understanding of DNA-binding proteins is helpful in exploring the role that proteins play in cell biology. Furthermore, the prediction of DNA-binding proteins is essential for the chemical modification and structural composition of DNA, and is of great importance in protein functional analysis and drug design. In recent years, DNA-binding protein prediction has typically used machine learning-based methods. The prediction accuracy of various classifiers has improved considerably, but researchers continue to spend time and effort on improving prediction performance. In this paper, we combine protein sequence evolutionary information with a classification method based on kernel sparse representation for the prediction of DNA-binding proteins, and based on the field of machine learning, a model for the identification of DNA-binding proteins by sequence information was finally proposed. Based on the confirmation of the final experimental results, we achieved good prediction accuracy on both the PDB1075 and PDB186 datasets. Our training result for cross-validation on PDB1075 was 81.37%, and our independent test result on PDB186 was 83.9%, both of which outperformed the other methods to some extent. Therefore, the proposed method in this paper is proven to be effective and feasible for predicting DNA-binding proteins.
Collapse
Affiliation(s)
- Yutong Ming
- School of Computer Science and Engineering, Beijing Technology and Business University, China
| | - Hongzhi Liu
- School of Computer Science and Engineering, Beijing Technology and Business University, China
| | - Yizhi Cui
- School of Computer Science and Engineering, Beijing Technology and Business University, China
| | - Shaoyong Guo
- Beijing University of Posts and Telecommunications, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China.
| | - Ruijun Liu
- School of Computer Science and Engineering, Beijing Technology and Business University, China.
| |
Collapse
|
5
|
Li Y, Ma D, Chen D, Chen Y. ACP-GBDT: An improved anticancer peptide identification method with gradient boosting decision tree. Front Genet 2023; 14:1165765. [PMID: 37065496 PMCID: PMC10090421 DOI: 10.3389/fgene.2023.1165765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Accepted: 03/09/2023] [Indexed: 03/31/2023] Open
Abstract
Cancer is one of the most dangerous diseases in the world, killing millions of people every year. Drugs composed of anticancer peptides have been used to treat cancer with low side effects in recent years. Therefore, identifying anticancer peptides has become a focus of research. In this study, an improved anticancer peptide predictor named ACP-GBDT, based on gradient boosting decision tree (GBDT) and sequence information, is proposed. To encode the peptide sequences included in the anticancer peptide dataset, ACP-GBDT uses a merged-feature composed of AAIndex and SVMProt-188D. A GBDT is adopted to train the prediction model in ACP-GBDT. Independent testing and ten-fold cross-validation show that ACP-GBDT can effectively distinguish anticancer peptides from non-anticancer ones. The comparison results of the benchmark dataset show that ACP-GBDT is simpler and more effective than other existing anticancer peptide prediction methods.
Collapse
Affiliation(s)
- Yanjuan Li
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Di Ma
- College of Computer, Hangzhou Dianzi University, Hangzhou, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- *Correspondence: Dong Chen, ; Yu Chen,
| | - Yu Chen
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- *Correspondence: Dong Chen, ; Yu Chen,
| |
Collapse
|
6
|
Guan S, Qian Y, Jiang T, Jiang M, Ding Y, Wu H. MV-H-RKM: A Multiple View-Based Hypergraph Regularized Restricted Kernel Machine for Predicting DNA-Binding Proteins. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1246-1256. [PMID: 35731758 DOI: 10.1109/tcbb.2022.3183191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
DNA-binding proteins (DBPs) have a significant impact on many life activities, so identification of DBPs is a crucial issue. And it is greatly helpful to understand the mechanism of protein-DNA interactions. In traditional experimental methods, it is significant time-consuming and labor-consuming to identify DBPs. In recent years, many researchers have proposed lots of different DBP identification methods based on machine learning algorithm to overcome shortcomings mentioned above. However, most existing methods cannot get satisfactory results. In this paper, we focus on developing a new predictor of DBPs, called Multi-View Hypergraph Restricted Kernel Machines (MV-H-RKM). In this method, we extract five features from the three views of the proteins. To fuse these features, we couple them by means of the shared hidden vector. Besides, we employ the hypergraph regularization to enforce the structure consistency between original features and the hidden vector. Experimental results show that the accuracy of MV-H-RKM is 84.09% and 85.48% on PDB1075 and PDB186 data set respectively, and demonstrate that our proposed method performs better than other state-of-the-art approaches. The code is publicly available at https://github.com/ShixuanGG/MV-H-RKM.
Collapse
|
7
|
Su W, Deng S, Gu Z, Yang K, Ding H, Chen H, Zhang Z. Prediction of apoptosis protein subcellular location based on amphiphilic pseudo amino acid composition. Front Genet 2023; 14:1157021. [PMID: 36926588 PMCID: PMC10011625 DOI: 10.3389/fgene.2023.1157021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2023] [Accepted: 02/20/2023] [Indexed: 03/08/2023] Open
Abstract
Introduction: Apoptosis proteins play an important role in the process of cell apoptosis, which makes the rate of cell proliferation and death reach a relative balance. The function of apoptosis protein is closely related to its subcellular location, it is of great significance to study the subcellular locations of apoptosis proteins. Many efforts in bioinformatics research have been aimed at predicting their subcellular location. However, the subcellular localization of apoptotic proteins needs to be carefully studied. Methods: In this paper, based on amphiphilic pseudo amino acid composition and support vector machine algorithm, a new method was proposed for the prediction of apoptosis proteins\x{2019} subcellular location. Results and Discussion: The method achieved good performance on three data sets. The Jackknife test accuracy of the three data sets reached 90.5%, 93.9% and 84.0%, respectively. Compared with previous methods, the prediction accuracies of APACC_SVM were improved.
Collapse
Affiliation(s)
- Wenxia Su
- College of Science, Inner Mongolia Agriculture University, Hohhot, China
| | - Shuyi Deng
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zhifeng Gu
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Keli Yang
- Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
| | - Hui Ding
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Chen
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| | - Zhaoyue Zhang
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| |
Collapse
|
8
|
Chen M, Zhang X, Ju Y, Liu Q, Ding Y. iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:13829-13850. [PMID: 36654069 DOI: 10.3934/mbe.2022644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
Biological sequence analysis is an important basic research work in the field of bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. By constructing a classifier for prediction, the input sequence feature vector is predicted and evaluated, and the knowledge of gene structure, function and evolution is obtained from a large amount of sequence information, which lays a foundation for researchers to carry out in-depth research. At present, many machine learning methods have been applied to biological sequence analysis such as RNA gene recognition and protein secondary structure prediction. As a biological sequence, RNA plays an important biological role in the encoding, decoding, regulation and expression of genes. The analysis of RNA data is currently carried out from the aspects of structure and function, including secondary structure prediction, non-coding RNA identification and functional site prediction. Pseudouridine (У) is the most widespread and rich RNA modification and has been discovered in a variety of RNAs. It is highly essential for the study of related functional mechanisms and disease diagnosis to accurately identify У sites in RNA sequences. At present, several computational approaches have been suggested as an alternative to experimental methods to detect У sites, but there is still potential for improvement in their performance. In this study, we present a model based on twin support vector machine (TWSVM) for У site identification. The model combines a variety of feature representation techniques and uses the max-relevance and min-redundancy methods to obtain the optimum feature subset for training. The independent testing accuracy is improved by 3.4% in comparison to current advanced У site predictors. The outcomes demonstrate that our model has better generalization performance and improves the accuracy of У site identification. iPseU-TWSVM can be a helpful tool to identify У sites.
Collapse
Affiliation(s)
- Mingshuai Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Xin Zhang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Qing Liu
- Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| |
Collapse
|
9
|
Lu W, Shen J, Zhang Y, Wu H, Qian Y, Chen X, Fu Q. Identifying Membrane Protein Types Based on Lifelong Learning With Dynamically Scalable Networks. Front Genet 2022; 12:834488. [PMID: 35371189 PMCID: PMC8964460 DOI: 10.3389/fgene.2021.834488] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Accepted: 12/21/2021] [Indexed: 11/13/2022] Open
Abstract
Membrane proteins are an essential part of the body's ability to maintain normal life activities. Further research into membrane proteins, which are present in all aspects of life science research, will help to advance the development of cells and drugs. The current methods for predicting proteins are usually based on machine learning, but further improvements in prediction effectiveness and accuracy are needed. In this paper, we propose a dynamic deep network architecture based on lifelong learning in order to use computers to classify membrane proteins more effectively. The model extends the application area of lifelong learning and provides new ideas for multiple classification problems in bioinformatics. To demonstrate the performance of our model, we conducted experiments on top of two datasets and compared them with other classification methods. The results show that our model achieves high accuracy (95.3 and 93.5%) on benchmark datasets and is more effective compared to other methods.
Collapse
Affiliation(s)
- Weizhong Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China.,Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou University of Science and Technology, Suzhou, China.,Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
| | - Jiawei Shen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Yu Zhang
- Suzhou Industrial Park Institute of Services Outsourcing, Suzhou, China
| | - Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China.,Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou University of Science and Technology, Suzhou, China
| | - Yuqing Qian
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Xiaoyi Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Qiming Fu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| |
Collapse
|
10
|
RFLMDA: A Novel Reinforcement Learning-Based Computational Model for Human MicroRNA-Disease Association Prediction. Biomolecules 2021; 11:biom11121835. [PMID: 34944479 PMCID: PMC8699433 DOI: 10.3390/biom11121835] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 11/23/2022] Open
Abstract
Numerous studies have confirmed that microRNAs play a crucial role in the research of complex human diseases. Identifying the relationship between miRNAs and diseases is important for improving the treatment of complex diseases. However, traditional biological experiments are not without restrictions. It is an urgent necessity for computational simulation to predict unknown miRNA-disease associations. In this work, we combine Q-learning algorithm of reinforcement learning to propose a RFLMDA model, three submodels CMF, NRLMF, and LapRLS are fused via Q-learning algorithm to obtain the optimal weight S. The performance of RFLMDA was evaluated through five-fold cross-validation and local validation. As a result, the optimal weight is obtained as S (0.1735, 0.2913, 0.5352), and the AUC is 0.9416. By comparing the experiments with other methods, it is proved that RFLMDA model has better performance. For better validate the predictive performance of RFLMDA, we use eight diseases for local verification and carry out case study on three common human diseases. Consequently, all the top 50 miRNAs related to Colorectal Neoplasms and Breast Neoplasms have been confirmed. Among the top 50 miRNAs related to Colon Neoplasms, Gastric Neoplasms, Pancreatic Neoplasms, Kidney Neoplasms, Esophageal Neoplasms, and Lymphoma, we confirm 47, 41, 49, 46, 46 and 48 miRNAs respectively.
Collapse
|