1
|
Chu DH, An JY, Nie XM. An Effective Computational Method for Predicting Self-Interacting Proteins Based on VGGNet Convolutional Neural Network and Gray-Level Co-occurrence Matrix. Evol Bioinform Online 2024; 20:11769343241292224. [PMID: 39464790 PMCID: PMC11503870 DOI: 10.1177/11769343241292224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2024] [Accepted: 10/01/2024] [Indexed: 10/29/2024] Open
Abstract
Introduction Predicting Self-interacting proteins (SIPs) is a crucial area of research in predicting protein functions, as well as in understanding gene-disease and disease-drug associations. These interactions are integral to numerous cellular processes and play pivotal roles within cells. However, traditional methods for identifying SIPs through biological experiments are often expensive, time-consuming, and have long cycles. Therefore, the development of effective computational methods for accurately predicting SIPs is not only necessary but also presents a significant challenge. Results In this research, we introduce a novel computational prediction technique, VGGNGLCM, which leverages protein sequence data. This method integrates the VGGNet deep convolutional neural network (VGGN) with the Gray-Level Co-occurrence Matrix (GLCM) to detect Self-interacting proteins associations. Specifically, we initially utilized Position Specific Scoring Matrix (PSSM) to capture protein evolutionary information and integrated key features from PSSM using GLCM. We then employed VGGNet as a predictive classifier, leveraging its capabilities for powerful learning and classification prediction. Subsequently, the extracted features were input into the VGGNet deep convolutional neural network to identify Self-interacting proteins. To evaluate the performance of the VGGNGLCM model, we conducted experiments using yeast and human datasets, achieving average accuracies of 95.68% and 97.72% respectively. Additionally, we compared the prediction performance of the VGGNet classifier with that of the Convolutional Neural Network (CNN) and the state-of-the-art Support Vector Machine (SVM) using the same feature extraction method. We also compared the prediction ability of VGGNGLCM with other existing approaches. The comparison results further demonstrate the superior performance of VGGNGLCM over other prediction models in this domain. Conclusion The experimental verification further strengthens the evidence that VGGNGLCM is effective and robust compared to existing methods. It also highlights the high accuracy and robustness of the VGGNGLCM model in predicting Self-interacting proteins (SIPs). Consequently, we believe that the VGGNGLCM method serves as a valuable computational tool and can catalyze extensive bioinformatics research related to SIPs prediction.
Collapse
Affiliation(s)
- Dan-Hua Chu
- School of Mathematics, China University of Mining and Technology, Xuzhou, Jiangsu, China
| | - Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, China
| | - Xiao-Mei Nie
- The Library of China University of Mining and Technology, Xuzhou, Jiangsu, China
| |
Collapse
|
2
|
Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier. BMC Bioinformatics 2022; 23:518. [PMID: 36457083 PMCID: PMC9713954 DOI: 10.1186/s12859-022-04880-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2022] [Accepted: 08/03/2022] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Self-interacting proteins (SIPs), two or more copies of the protein that can interact with each other expressed by one gene, play a central role in the regulation of most living cells and cellular functions. Although numerous SIPs data can be provided by using high-throughput experimental techniques, there are still several shortcomings such as in time-consuming, costly, inefficient, and inherently high in false-positive rates, for the experimental identification of SIPs even nowadays. Therefore, it is more and more significant how to develop efficient and accurate automatic approaches as a supplement of experimental methods for assisting and accelerating the study of predicting SIPs from protein sequence information. RESULTS In this paper, we present a novel framework, termed GLCM-WSRC (gray level co-occurrence matrix-weighted sparse representation based classification), for predicting SIPs automatically based on protein evolutionary information from protein primary sequences. More specifically, we firstly convert the protein sequence into Position Specific Scoring Matrix (PSSM) containing protein sequence evolutionary information, exploiting the Position Specific Iterated BLAST (PSI-BLAST) tool. Secondly, using an efficient feature extraction approach, i.e., GLCM, we extract abstract salient and invariant feature vectors from the PSSM, and then perform a pre-processing operation, the adaptive synthetic (ADASYN) technique, to balance the SIPs dataset to generate new feature vectors for classification. Finally, we employ an efficient and reliable WSRC model to identify SIPs according to the known information of self-interacting and non-interacting proteins. CONCLUSIONS Extensive experimental results show that the proposed approach exhibits high prediction performance with 98.10% accuracy on the yeast dataset, and 91.51% accuracy on the human dataset, which further reveals that the proposed model could be a useful tool for large-scale self-interacting protein prediction and other bioinformatics tasks detection in the future.
Collapse
|
3
|
Qin X, Zhang L, Liu M, Xu Z, Liu G. ASFold-DNN: Protein Fold Recognition Based on Evolutionary Features With Variable Parameters Using Full Connected Neural Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2712-2722. [PMID: 34133282 DOI: 10.1109/tcbb.2021.3089168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Protein fold recognition contribute to comprehend the function of proteins, which is of great help to the gene therapy of diseases and the development of new drugs. Researchers have been working in this direction and have made considerable achievements, but challenges still exist on low sequence similarity datasets. In this study, we propose the ASFold-DNN framework for protein fold recognition research. Above all, four groups of evolutionary features are extracted from the primary structures of proteins, and a preliminary selection of variable parameter is made for two groups of features including ACC _HMM and SXG _HMM, respectively. Then several feature selection algorithms are selected for comparison and the best feature selection scheme is obtained by changing their internal threshold values. Finally, multiple hyper-parameters of Full Connected Neural Network are fully optimized to construct the best model. DD, EDD and TG datasets with low sequence similarities are chosen to evaluate the performance of the models constructed by the framework, and the final prediction accuracy are 85.28, 95.00 and 88.84 percent, respectively. Furthermore, the ASTRAL186 and LE datasets are introduced to further verify the generalization ability of our proposed framework. Comprehensive experimental results prove that the ASFold-DNN framework is more prominent than the state-of-the-art studies on protein fold recognition. The source code and data of ASFold-DNN can be downloaded from https://github.com/Bioinformatics-Laboratory/project/tree/master/ASFold.
Collapse
|
4
|
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:biomedicines10071543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
Protein is the basic organic substance that constitutes the cell and is the material condition for the life activity and the guarantee of the biological function activity. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. As an important protein interaction, self-interacting protein (SIP) has a critical role. The fast growth of high-throughput experimental techniques among biomolecules has led to a massive influx of available SIP data. How to conduct scientific research using the massive amount of SIP data has become a new challenge that is being faced in related research fields such as biology and medicine. In this work, we design an SIP prediction method SIPGCN using a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which is able to describe the biological evolutionary message, then their hidden features are extracted by the deep learning method GCN, and, finally, the random forest is utilized to predict whether there are interrelationships between proteins. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity in the human data set. SIPGCN achieved 90.69% and 99.08% of these two indicators in the yeast data set, respectively. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIP and may be a reliable candidate for future wet experiments.
Collapse
Affiliation(s)
- Ying Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China;
| | - Lin-Lin Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China;
- Correspondence: (L.-L.W.); (L.W.)
| | - Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; (L.W.); (Z.-H.Y.)
| | - Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China;
| | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China;
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; (L.W.); (Z.-H.Y.)
- Correspondence: (L.-L.W.); (L.W.)
| | - Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; (L.W.); (Z.-H.Y.)
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
| |
Collapse
|
5
|
Jia LN, Yan X, You ZH, Zhou X, Li LP, Wang L, Song KJ. NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320984171. [PMID: 33488064 PMCID: PMC7768313 DOI: 10.1177/1176934320984171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022] Open
Abstract
The study of protein self-interactions (SIPs) can not only reveal the function of proteins at the molecular level, but is also crucial to understand activities such as growth, development, differentiation, and apoptosis, providing an important theoretical basis for exploring the mechanism of major diseases. With the rapid advances in biotechnology, a large number of SIPs have been discovered. However, due to the long period and high cost inherent to biological experiments, the gap between the identification of SIPs and the accumulation of data is growing. Therefore, fast and accurate computational methods are needed to effectively predict SIPs. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first understand the protein sequence as natural language and use natural language processing algorithms to extract its features. Then, we use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features through the Stacked Auto-Encoder (SAE) algorithm of deep learning. Finally, we fuse the natural language features of proteins with evolutionary features and make accurate predictions by Extreme Learning Machine (ELM) classifier. In the SIPs gold standard data sets of human and yeast, NLPEI achieved 94.19% and 91.29% prediction accuracy. Compared with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicated that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
Collapse
Affiliation(s)
- Li-Na Jia
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
| | - Xin Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- School of Foreign Languages, Zaozhuang University, Zaozhuang, China
| | - Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Xi Zhou
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Li-Ping Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lei Wang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
| | - Ke-Jian Song
- School of information engineering, Jiangxi University of Science and Technology, Ganzhou, China
| |
Collapse
|
6
|
An JY, Zhou Y, Yan ZJ, Zhao YJ. Predicting Self-Interacting Proteins Using a Recurrent Neural Network and Protein Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320924674. [PMID: 32550764 PMCID: PMC7278102 DOI: 10.1177/1176934320924674] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2020] [Accepted: 04/16/2020] [Indexed: 11/15/2022] Open
Abstract
Self-interacting proteins (SIPs) play crucial roles in biological activities of organisms. Many high-throughput methods can be used to identify SIPs. However, these methods are both time-consuming and expensive. How to develop effective computational approaches for identifying SIPs is a challenging task. In the article, we present a novel computational method called RRN-SIFT, which combines the recurrent neural network (RNN) with scale invariant feature transform (SIFT) to predict SIPs based on protein evolutionary information. The main advantage of the proposed RNN-SIFT model is that it uses SIFT for extracting key feature by exploring the evolutionary information embedded in Position-Specific Iterated BLAST-constructed position-specific scoring matrix and employs an RNN classifier to perform classification based on extracted features. Extensive experiments show that the RRN-SIFT obtained average accuracy of 94.34% and 97.12% on the yeast and human dataset, respectively. We also compared our performance with the back propagation neural network (BPNN), the state-of-the-art support vector machine (SVM), and other existing methods. By comparing with experimental results, the performance of RNN-SIFT is significantly better than that of the BPNN, SVM, and other previous methods in the domain. Therefore, we conclude that the proposed RNN-SIFT model is a useful tool for predicting SIPs, as well to solve other bioinformatics tasks. To facilitate widely studies and encourage future proteomics research, a freely available web server called RNN-SIFT-SIPs was developed at http://219.219.62.123:8888/RNNSIFT/ including the source code and the SIP datasets.
Collapse
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China
University of Mining and Technology, Xuzhou, China
| | - Yong Zhou
- School of Computer Science and Technology, China
University of Mining and Technology, Xuzhou, China
| | - Zi-Ji Yan
- School of Computer Science and Technology, China
University of Mining and Technology, Xuzhou, China
| | - Yu-Jun Zhao
- School of Computer Science and Technology, China
University of Mining and Technology, Xuzhou, China
| |
Collapse
|
7
|
Qu J, Zhao Y, Zhang L, Cai SB, Ming Z, Wang CC. Computational Models for Self-Interacting Proteins Prediction. Protein Pept Lett 2019; 27:392-399. [PMID: 31880240 DOI: 10.2174/0929866527666191227141713] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 11/22/2022]
Abstract
Self-Interacting Proteins (SIPs), whose two or more copies can interact with each other, have significant roles in cellular functions and evolution of Protein Interaction Networks (PINs). Knowing whether a protein can act on itself is important to understand its functions. Previous studies on SIPs have focused on their structures and functions, while their whole properties are less emphasized. Not surprisingly, identifying SIPs is one of the most important works in biomedical research, which will help to understanding the function and mechanism of proteins. It is worth noting that high throughput methods can be used for SIPs prediction, but can be costly, time consuming and challenging. Therefore, it is urgent to design computational models for the identification of SIPs. In this review, the concept and function of SIPs were introduced in detail. We further introduced SIPs data and some excellent computational models that have been designed for SIPs prediction. Specially, the most existing approaches were developed based on machine learning through carrying out different extract feature methods. Finally, we discussed several difficult problems in developing computational models for SIPs prediction.
Collapse
Affiliation(s)
- Jia Qu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Shu-Bin Cai
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Zhong Ming
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| |
Collapse
|
8
|
Chen ZH, You ZH, Li LP, Wang YB, Qiu Y, Hu PW. Identification of self-interacting proteins by integrating random projection classifier and finite impulse response filter. BMC Genomics 2019; 20:928. [PMID: 31881833 PMCID: PMC6933882 DOI: 10.1186/s12864-019-6301-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
Background Identification of protein-protein interactions (PPIs) is crucial for understanding biological processes and investigating the cellular functions of genes. Self-interacting proteins (SIPs) are those in which more than two identical proteins can interact with each other and they are the specific type of PPIs. More and more researchers draw attention to the SIPs detection, and several prediction model have been proposed, but there are still some problems. Hence, there is an urgent need to explore a efficient computational model for SIPs prediction. Results In this study, we developed an effective model to predict SIPs, called RP-FIRF, which merges the Random Projection (RP) classifier and Finite Impulse Response Filter (FIRF) together. More specifically, each protein sequence was firstly transformed into the Position Specific Scoring Matrix (PSSM) by exploiting Position Specific Iterated BLAST (PSI-BLAST). Then, to effectively extract the discriminary SIPs feature to improve the performance of SIPs prediction, a FIRF method was used on PSSM. The R’classifier was proposed to execute the classification and predict novel SIPs. We evaluated the performance of the proposed RP-FIRF model and compared it with the state-of-the-art support vector machine (SVM) on human and yeast datasets, respectively. The proposed model can achieve high average accuracies of 97.89 and 97.35% using five-fold cross-validation. To further evaluate the high performance of the proposed method, we also compared it with other six exiting methods, the experimental results demonstrated that the capacity of our model surpass that of the other previous approaches. Conclusion Experimental results show that self-interacting proteins are accurately well-predicted by the proposed model on human and yeast datasets, respectively. It fully show that the proposed model can predict the SIPs effectively and sufficiently. Thus, RP-FIRF model is an automatic decision support method which should provide useful insights into the recognition of SIPs.
Collapse
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China.,University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China. .,University of Chinese Academy of Sciences, Beijing, 100049, China.
| | - Li-Ping Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
| | - Yan-Bin Wang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
| | - Yu Qiu
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China.,University of Chinese Academy of Sciences, Beijing, 100049, China
| | | |
Collapse
|
9
|
Chen ZH, You ZH, Zhang WB, Wang YB, Cheng L, Alghazzawi D. Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model. Genes (Basel) 2019; 10:genes10110924. [PMID: 31726752 PMCID: PMC6896115 DOI: 10.3390/genes10110924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2019] [Revised: 11/05/2019] [Accepted: 11/06/2019] [Indexed: 11/22/2022] Open
Abstract
Self-interacting proteins (SIPs) is of paramount importance in current molecular biology. There have been developed a number of traditional biological experiment methods for predicting SIPs in the past few years. However, these methods are costly, time-consuming and inefficient, and often limit their usage for predicting SIPs. Therefore, the development of computational method emerges at the times require. In this paper, we for the first time proposed a novel deep learning model which combined natural language processing (NLP) method for potential SIPs prediction from the protein sequence information. More specifically, the protein sequence is de novo assembled by k-mers. Then, we obtained the global vectors representation for each protein sequences by using natural language processing (NLP) technique. Finally, based on the knowledge of known self-interacting and non-interacting proteins, a multi-grained cascade forest model is trained to predict SIPs. Comprehensive experiments were performed on yeast and human datasets, which obtained an accuracy rate of 91.45% and 93.12%, respectively. From our evaluations, the experimental results show that the use of amino acid semantics information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work would have potential applications for various biological classification problems.
Collapse
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; (Z.-H.C.); (W.-B.Z.); (Y.-B.W.); (L.C.)
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; (Z.-H.C.); (W.-B.Z.); (Y.-B.W.); (L.C.)
- University of Chinese Academy of Sciences, Beijing 100049, China
- Correspondence: or ; Tel.: +86-991-3835-823
| | - Wen-Bo Zhang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; (Z.-H.C.); (W.-B.Z.); (Y.-B.W.); (L.C.)
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yan-Bin Wang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; (Z.-H.C.); (W.-B.Z.); (Y.-B.W.); (L.C.)
| | - Li Cheng
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; (Z.-H.C.); (W.-B.Z.); (Y.-B.W.); (L.C.)
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Daniyal Alghazzawi
- Department of Information Systems, King Abdulaziz University, Jeddah 21589, Saudi Arabia;
| |
Collapse
|
10
|
Chen ZH, Li LP, He Z, Zhou JR, Li Y, Wong L. An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation. Front Genet 2019; 10:90. [PMID: 30881376 PMCID: PMC6405691 DOI: 10.3389/fgene.2019.00090] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2018] [Accepted: 01/29/2019] [Indexed: 12/23/2022] Open
Abstract
Self-interacting proteins (SIPs), whose more than two identities can interact with each other, play significant roles in the understanding of cellular process and cell functions. Although a number of experimental methods have been designed to detect the SIPs, they remain to be extremely time-consuming, expensive, and challenging even nowadays. Therefore, there is an urgent need to develop the computational methods for predicting SIPs. In this study, we propose a deep forest based predictor for accurate prediction of SIPs using protein sequence information. More specifically, a novel feature representation method, which integrate position-specific scoring matrix (PSSM) with wavelet transform, is introduced. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The experimental results show that the proposed model achieved high accuracies of 95.43 and 93.65% on human and yeast datasets, respectively. The AUC value for evaluating the performance of the proposed method was also reported. The AUC value for yeast and human datasets are 0.9203 and 0.9586, respectively. To further show the advantage of the proposed method, it is compared with several existing methods. The results demonstrate that the proposed model is better than other SIPs prediction methods. This work can offer an effective architecture to biologists in detecting new SIPs.
Collapse
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Li-Ping Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Zhou He
- College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
| | - Ji-Ren Zhou
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Yangming Li
- ECTET, Rochester Institute of Technology, Rochester, NY, United States
| | - Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
| |
Collapse
|
11
|
Chen ZH, You ZH, Li LP, Wang YB, Wong L, Yi HC. Prediction of Self-Interacting Proteins from Protein Sequence Information Based on Random Projection Model and Fast Fourier Transform. Int J Mol Sci 2019; 20:ijms20040930. [PMID: 30795499 PMCID: PMC6412412 DOI: 10.3390/ijms20040930] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2018] [Revised: 01/06/2019] [Accepted: 01/07/2019] [Indexed: 12/30/2022] Open
Abstract
It is significant for biological cells to predict self-interacting proteins (SIPs) in the field of bioinformatics. SIPs mean that two or more identical proteins can interact with each other by one gene expression. This plays a major role in the evolution of protein‒protein interactions (PPIs) and cellular functions. Owing to the limitation of the experimental identification of self-interacting proteins, it is more and more significant to develop a useful biological tool for the prediction of SIPs from protein sequence information. Therefore, we propose a novel prediction model called RP-FFT that merges the Random Projection (RP) model and Fast Fourier Transform (FFT) for detecting SIPs. First, each protein sequence was transformed into a Position Specific Scoring Matrix (PSSM) using the Position Specific Iterated BLAST (PSI-BLAST). Second, the features of protein sequences were extracted by the FFT method on PSSM. Lastly, we evaluated the performance of RP-FFT and compared the RP classifier with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets; after the five-fold cross-validation, the RP-FFT model can obtain high average accuracies of 96.28% and 91.87% on the human and yeast datasets, respectively. The experimental results demonstrated that our RP-FFT prediction model is reasonable and robust.
Collapse
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Li-Ping Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
| | - Yan-Bin Wang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
| | - Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Hai-Cheng Yi
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| |
Collapse
|
12
|
Yan K, Fang X, Xu Y, Liu B. Protein fold recognition based on multi-view modeling. Bioinformatics 2019; 35:2982-2990. [DOI: 10.1093/bioinformatics/btz040] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2018] [Revised: 12/29/2018] [Accepted: 01/16/2019] [Indexed: 12/22/2022] Open
Abstract
Abstract
Motivation
Protein fold recognition has attracted increasing attention because it is critical for studies of the 3D structures of proteins and drug design. Researchers have been extensively studying this important task, and several features with high discriminative power have been proposed. However, the development of methods that efficiently combine these features to improve the predictive performance remains a challenging problem.
Results
In this study, we proposed two algorithms: MV-fold and MT-fold. MV-fold is a new computational predictor based on the multi-view learning model for fold recognition. Different features of proteins were treated as different views of proteins, including the evolutionary information, secondary structure information and physicochemical properties. These different views constituted the latent space. The ε-dragging technique was employed to enlarge the margins between different protein folds, improving the predictive performance of MV-fold. Then, MV-fold was combined with two template-based methods: HHblits and HMMER. The ensemble method is called MT-fold incorporating the advantages of both discriminative methods and template-based methods. Experimental results on five widely used benchmark datasets (DD, RDD, EDD, TG and LE) showed that the proposed methods outperformed some state-of-the-art methods in this field, indicating that MV-fold and MT-fold are useful computational tools for protein fold recognition and protein homology detection and would be efficient tools for protein sequence analysis. Finally, we constructed an update and rigorous benchmark dataset based on SCOPe (version 2.07) to fairly evaluate the performance of the proposed method, and our method achieved stable performance on this new dataset. This new benchmark dataset will become a widely used benchmark dataset to fairly evaluate the performance of different methods for fold recognition.
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Xiaozhao Fang
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
| | - Yong Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
Collapse
|
13
|
An JY, Zhou Y, Zhang L, Niu Q, Wang DF. Improving Self-interacting Proteins Prediction Accuracy Using Protein Evolutionary Information and Weighed-Extreme Learning Machine. Curr Bioinform 2019. [DOI: 10.2174/1574893613666180209161152] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Background:
Self Interacting Proteins (SIPs) play an essential role in various aspects of the
structural and functional organization of the cell.
Objective:
In the study, we presented a novelty sequence-based computational approach for predicting
Self-interacting proteins using Weighed-Extreme Learning Machine (WELM) model combined with an
Autocorrelation (AC) descriptor protein feature representation.
Method:
The major advantage of the proposed method mainly lies in adopting an effective feature
extraction method to represent candidate self-interacting proteins by using the evolutionary information
embedded in PSI-BLAST-constructed Position Specific Scoring Matrix (PSSM); and then employing a
reliable and effective WELM classifier to perform classify.
</P><P>
Result: In order to evaluate the performance, the proposed approach is applied to yeast and human SIP
datasets. The experimental results show that our method obtained 93.43% and 98.15% prediction
accuracies on yeast and human dataset, respectively. Extensive experiments are carried out to compare
our approach with the SVM classifier and existing sequence-based method on yeast and human dataset.
Experimental results show that the performance of our method is better than several other state-of-theart
methods.
Conclusion:
It is demonstrated that the proposed method is suitable for SIPs detection and can execute
incredibly well for identifying Sips. In order to facilitate extensive studies for future proteomics
research, we developed a freely available web server called WELM-AC-SIPs in Hypertext Preprocessor
(PHP) for predicting SIPs. The web server including source code and the datasets are available at
http://219.219.62.123:8888/WELMAC/.
Collapse
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
| | - Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
| | - Lei Zhang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
| | - Qiang Niu
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
| | - Da-Fu Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
| |
Collapse
|
14
|
Wang YB, You ZH, Li X, Jiang TH, Cheng L, Chen ZH. Prediction of protein self-interactions using stacked long short-term memory from protein sequences information. BMC SYSTEMS BIOLOGY 2018; 12:129. [PMID: 30577794 PMCID: PMC6302371 DOI: 10.1186/s12918-018-0647-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
BACKGROUND Self-interacting Proteins (SIPs) plays a critical role in a series of life function in most living cells. Researches on SIPs are important part of molecular biology. Although numerous SIPs data be provided, traditional experimental methods are labor-intensive, time-consuming and costly and can only yield limited results in real-world needs. Hence,it's urgent to develop an efficient computational SIPs prediction method to fill the gap. Deep learning technologies have proven to produce subversive performance improvements in many areas, but the effectiveness of deep learning methods for SIPs prediction has not been verified. RESULTS We developed a deep learning model for predicting SIPs by constructing a Stacked Long Short-Term Memory (SLSTM) neural network that contains "dropout". We extracted features from protein sequences using a novel feature extraction scheme that combined Zernike Moments (ZMs) with Position Specific Weight Matrix (PSWM). The capability of the proposed approach was assessed on S.erevisiae and Human SIPs datasets. The result indicates that the approach based on deep learning can effectively resist data skew and achieve good accuracies of 95.69 and 97.88%, respectively. To demonstrate the progressiveness of deep learning, we compared the results of the SLSTM-based method and the celebrated Support Vector Machine (SVM) method and several other well-known methods on the same datasets. CONCLUSION The results show that our method is overall superior to any of the other existing state-of-the-art techniques. As far as we know, this study first applies deep learning method to predict SIPs, and practical experimental results reveal its potential in SIPs identification.
Collapse
Affiliation(s)
- Yan-Bin Wang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
- University of Chinese Academy of Sciences, Beijing, 100049 China
| | - Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
| | - Xiao Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
| | - Tong-Hai Jiang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
| | - Li Cheng
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
| | - Zhan-Heng Chen
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
- University of Chinese Academy of Sciences, Beijing, 100049 China
| |
Collapse
|
15
|
Wang YB, You ZH, Li LP, Huang DS, Zhou FF, Yang S. Improving Prediction of Self-interacting Proteins Using Stacked Sparse Auto-Encoder with PSSM profiles. Int J Biol Sci 2018; 14:983-991. [PMID: 29989064 PMCID: PMC6036743 DOI: 10.7150/ijbs.23817] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2017] [Accepted: 03/29/2018] [Indexed: 12/05/2022] Open
Abstract
Self-interacting proteins (SIPs) play a significant role in the execution of most important molecular processes in cells, such as signal transduction, gene expression regulation, immune response and enzyme activation. Although the traditional experimental methods can be used to generate SIPs data, it is very expensive and time-consuming based only on biological technique. Therefore, it is important and urgent to develop an efficient computational method for SIPs detection. In this study, we present a novel SIPs identification method based on machine learning technology by combing the Zernike Moments (ZMs) descriptor on Position Specific Scoring Matrix (PSSM) with Probabilistic Classification Vector Machines (PCVM) and Stacked Sparse Auto-Encoder (SSAE). More specifically, an efficient feature extraction technique called ZMs is firstly utilized to generate feature vectors on Position Specific Scoring Matrix (PSSM); Then, Deep neural network is employed for reducing the feature dimensions and noise; Finally, the Probabilistic Classification Vector Machine is used to execute the classification. The prediction performance of the proposed method is evaluated on S.erevisiae and Human SIPs datasets via cross-validation. The experimental results indicate that the proposed method can achieve good accuracies of 92.55% and 97.47%, respectively. To further evaluate the advantage of our scheme for SIPs prediction, we also compared the PCVM classifier with the Support Vector Machine (SVM) and other existing techniques on the same data sets. Comparison results reveal that the proposed strategy is outperforms other methods and could be a used tool for identifying SIPs.
Collapse
Affiliation(s)
- Yan-Bin Wang
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
| | - Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
| | - Li-Ping Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Caoan Road 4800, Shanghai 201804, China
| | - Feng-Feng Zhou
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
| | - Shan Yang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
| |
Collapse
|
16
|
Li JQ, You ZH, Li X, Ming Z, Chen X. PSPEL: In Silico Prediction of Self-Interacting Proteins from Amino Acids Sequences Using Ensemble Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1165-1172. [PMID: 28092572 DOI: 10.1109/tcbb.2017.2649529] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Self interacting proteins (SIPs) play an important role in various aspects of the structural and functional organization of the cell. Detecting SIPs is one of the most important issues in current molecular biology. Although a large number of SIPs data has been generated by experimental methods, wet laboratory approaches are both time-consuming and costly. In addition, they yield high false negative and positive rates. Thus, there is a great need for in silico methods to predict SIPs accurately and efficiently. In this study, a new sequence-based method is proposed to predict SIPs. The evolutionary information contained in Position-Specific Scoring Matrix (PSSM) is extracted from of protein with known sequence. Then, features are fed to an ensemble classifier to distinguish the self-interacting and non-self-interacting proteins. When performed on Saccharomyces cerevisiae and Human SIPs data sets, the proposed method can achieve high accuracies of 86.86 and 91.30 percent, respectively. Our method also shows a good performance when compared with the SVM classifier and previous methods. Consequently, the proposed method can be considered to be a novel promising tool to predict SIPs.
Collapse
|
17
|
An JY, Zhang L, Zhou Y, Zhao YJ, Wang DF. Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information. J Cheminform 2017; 9:47. [PMID: 29086182 PMCID: PMC5561767 DOI: 10.1186/s13321-017-0233-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2017] [Accepted: 08/05/2017] [Indexed: 02/07/2023] Open
Abstract
Self-interactions Proteins (SIPs) is important for their biological activity owing to the inherent interaction amongst their secondary structures or domains. However, due to the limitations of experimental Self-interactions detection, one major challenge in the study of prediction SIPs is how to exploit computational approaches for SIPs detection based on evolutionary information contained protein sequence. In the work, we presented a novel computational approach named WELM-LAG, which combined the Weighed-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs based on protein sequence. The major improvement of our method lies in presenting an effective feature extraction method used to represent candidate Self-interactions proteins by exploring the evolutionary information embedded in PSI-BLAST-constructed position specific scoring matrix (PSSM); and then employing a reliable and robust WELM classifier to carry out classification. In addition, the Principal Component Analysis (PCA) approach is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94 and 96.74% on yeast and human datasets, respectively. Meanwhile, we compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on human and yeast datasets, respectively. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/ .
Collapse
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
| | - Lei Zhang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
| | - Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
| | - Yu-Jun Zhao
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
| | - Da-Fu Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
| |
Collapse
|
18
|
An JY, You ZH, Chen X, Huang DS, Li ZW, Liu G, Wang Y. Identification of self-interacting proteins by exploring evolutionary information embedded in PSI-BLAST-constructed position specific scoring matrix. Oncotarget 2016; 7:82440-82449. [PMID: 27732957 PMCID: PMC5347703 DOI: 10.18632/oncotarget.12517] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2016] [Accepted: 09/28/2016] [Indexed: 01/31/2023] Open
Abstract
Self-interacting Proteins (SIPs) play an essential role in a wide range of biological processes, such as gene expression regulation, signal transduction, enzyme activation and immune response. Because of the limitations for experimental self-interaction proteins identification, developing an effective computational method based on protein sequence to detect SIPs is much important. In the study, we proposed a novel computational approach called RVMBIGP that combines the Relevance Vector Machine (RVM) model and Bi-gram probability (BIGP) to predict SIPs based on protein sequence. The proposed prediction model includes as following steps: (1) an effective feature extraction method named BIGP is used to represent protein sequences on Position Specific Scoring Matrix (PSSM); (2) Principal Component Analysis (PCA) method is employed for integrating the useful information and reducing the influence of noise; (3) the robust classifier Relevance Vector Machine (RVM) is used to carry out classification. When performed on yeast and human datasets, the proposed RVMBIGP model can achieve very high accuracies of 95.48% and 98.80%, respectively. The experimental results show that our proposed method is very promising and may provide a cost-effective alternative for SIPs identification. In addition, to facilitate extensive studies for future proteomics research, the RVMBIGP server is freely available for academic use at http://219.219.62.123:8888/RVMBIGP.
Collapse
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
| | - Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
| | - Xing Chen
- School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - De-Shuang Huang
- School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
| | - Zheng-Wei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
| | - Gang Liu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
| | - Yin Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
| |
Collapse
|