1
Mohammadi A, Zahiri J, Mohammadi S, Khodarahmi M, Arab SS. PSSMCOOL: A Comprehensive R Package for Generating Evolutionary-based Descriptors of Protein Sequences from PSSM Profiles. Biology Methods and Protocols 2022; 7:bpac008. [PMID: 35388370 PMCID: PMC8977839 DOI: 10.1093/biomethods/bpac008] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/03/2021] [Revised: 01/21/2022] [Indexed: 11/14/2022]
Abstract
A position-specific scoring matrix (PSSM), also called a profile, is broadly used to represent the evolutionary history of a given protein sequence. Several investigations have reported that PSSM-based feature descriptors can improve the prediction of various protein attributes such as interaction, function, subcellular localization, secondary structure, disorder regions, and accessible surface area. While plenty of algorithms have been suggested for extracting evolutionary features from PSSMs in recent years, there is no integrated standalone tool for providing these descriptors. Here, we introduce PSSMCOOL, a flexible, comprehensive R package that generates 38 PSSM-based feature vectors. To the best of our knowledge, PSSMCOOL is the first PSSM-based feature extraction tool implemented in R. With the growing demand for machine-learning algorithms in computational biology, this package would be a practical tool for machine-learning predictions.
Affiliation(s)
- Alireza Mohammadi
  - Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Javad Zahiri
  - Department of Neuroscience, University of California San Diego, California, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Saber Mohammadi
  - Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Mohsen Khodarahmi
  - Department of Radiology, Shahid Madani Hospital, Karaj, Iran
  - Bahar Medical Imaging Center, Karaj, Iran
  - Dr. Khodarahmi Medical Imaging Center, Karaj, Iran
- Seyed Shahriar Arab
  - Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
2
Xu H, Xu D, Zhang N, Zhang Y, Gao R. Protein-Protein Interaction Prediction Based on Spectral Radius and General Regression Neural Network. J Proteome Res 2021; 20:1657-1665. [PMID: 33555893 DOI: 10.1021/acs.jproteome.0c00871] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022]
Abstract
Protein-protein interaction (PPI) plays a critical role not only in cellular life activities but also in discovering the mechanisms of biological activity, protein function, and disease states. Developing computational methods is of great significance for PPI prediction, since experimental methods are time-consuming and laborious. In this paper, we propose a PPI prediction algorithm called GRNN-PPI that uses only amino acid sequence information and is based on a general regression neural network and two feature extraction methods. Specifically, we designed a new feature extraction method named Mutation Spectral Radius (MSR) to extract evolutionary information using the BLOSUM62 matrix. We also integrated another feature extraction method, autocorrelation description, which can completely extract information on physicochemical properties and protein sequences. Principal component analysis was applied to eliminate noise, and the general regression neural network was adopted as the classifier. The prediction accuracies on the yeast, human, and Helicobacter pylori1 (H. pylori1) data sets were 97.47%, 99.63%, and 99.97%, respectively. In addition, we conducted experiments on two important PPI networks and six independent data sets. All results were significantly higher than those of the state-of-the-art methods used for comparison, showing that our method is feasible and robust.
Affiliation(s)
- Hanxiao Xu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Da Xu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Naiqian Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan 250061, China
3
An JY, Meng FR, Yan ZJ. An efficient computational method for predicting drug-target interactions using weighted extreme learning machine and speed up robot features. BioData Min 2021; 14:3. [PMID: 33472664 PMCID: PMC7816443 DOI: 10.1186/s13040-021-00242-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2020] [Accepted: 01/10/2021] [Indexed: 01/09/2023] Open
Abstract
Background: Prediction of novel drug–target interactions (DTIs) plays an important role in discovering new drug candidates and finding new proteins to target. Because experimental methods are time-consuming and expensive, developing efficient computational approaches that accurately predict potential associations between drugs and targets is a challenging task. Results: In this paper, we propose a novel computational method called WELM-SURF, based on drug fingerprints and protein evolutionary information, for identifying DTIs. More specifically, to exploit protein sequence features, a Position Specific Scoring Matrix (PSSM) is applied to capture protein evolutionary information, and Speed up robot features (SURF) is employed to extract key sequence features from the PSSM. For drug fingerprints, the chemical structure of molecular substructure fingerprints was used to represent each drug as a feature vector. The Weighted Extreme Learning Machine (WELM) has a short training time, good generalization ability, and, most importantly, the ability to execute classification efficiently by optimizing the loss function of the weight matrix. Therefore, the WELM classifier is used to carry out classification based on the extracted features for predicting DTIs. The performance of the WELM-SURF model was evaluated on enzyme, ion channel, GPCR, and nuclear receptor datasets using a fivefold cross-validation test. WELM-SURF obtained average accuracies of 93.54%, 90.58%, 85.43%, and 77.45% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. We also compared our performance with the Extreme Learning Machine (ELM) and the state-of-the-art Support Vector Machine (SVM) on the enzyme and ion channel datasets, and with other existing methods on all four datasets. According to the experimental results, the performance of WELM-SURF is significantly better than that of ELM, SVM, and other previous methods in the domain. Conclusion: The results demonstrate that the proposed WELM-SURF model is competent for predicting DTIs with high accuracy and robustness. It is anticipated that the WELM-SURF method will be a useful computational tool for a wide range of bioinformatics studies related to DTI prediction.
Affiliation(s)
- Ji-Yong An
  - Engineering Research Center of Mine Digitalization (China University of Mining and Technology), Ministry of Education, Xuzhou, China; School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
- Fan-Rong Meng
  - Engineering Research Center of Mine Digitalization (China University of Mining and Technology), Ministry of Education, Xuzhou, China; School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
- Zi-Ji Yan
  - Engineering Research Center of Mine Digitalization (China University of Mining and Technology), Ministry of Education, Xuzhou, China; School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
4
An JY, Zhou Y, Yan ZJ, Zhao YJ. Predicting Self-Interacting Proteins Using a Recurrent Neural Network and Protein Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320924674. [PMID: 32550764 PMCID: PMC7278102 DOI: 10.1177/1176934320924674] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/14/2020] [Accepted: 04/16/2020] [Indexed: 11/15/2022] Open
Abstract
Self-interacting proteins (SIPs) play crucial roles in the biological activities of organisms. Many high-throughput methods can be used to identify SIPs; however, these methods are both time-consuming and expensive, so developing effective computational approaches for identifying SIPs remains a challenging task. In this article, we present a novel computational method called RNN-SIFT, which combines a recurrent neural network (RNN) with the scale-invariant feature transform (SIFT) to predict SIPs based on protein evolutionary information. The main advantage of the proposed RNN-SIFT model is that it uses SIFT to extract key features from the evolutionary information embedded in the position-specific scoring matrix constructed by Position-Specific Iterated BLAST and employs an RNN classifier to perform classification based on the extracted features. Extensive experiments show that RNN-SIFT obtained average accuracies of 94.34% and 97.12% on the yeast and human datasets, respectively. We also compared our performance with the back propagation neural network (BPNN), the state-of-the-art support vector machine (SVM), and other existing methods. According to the experimental results, the performance of RNN-SIFT is significantly better than that of the BPNN, SVM, and other previous methods in the domain. Therefore, we conclude that the proposed RNN-SIFT model is a useful tool for predicting SIPs, as well as for solving other bioinformatics tasks. To facilitate wider study and encourage future proteomics research, a freely available web server called RNN-SIFT-SIPs was developed at http://219.219.62.123:8888/RNNSIFT/, including the source code and the SIP datasets.
Affiliation(s)
- Ji-Yong An
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Zi-Ji Yan
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Yu-Jun Zhao
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
5
Chen Y, Wang W, Liu J, Feng J, Gong X. Protein Interface Complementarity and Gene Duplication Improve Link Prediction of Protein-Protein Interaction Network. Front Genet 2020; 11:291. [PMID: 32300358 PMCID: PMC7142252 DOI: 10.3389/fgene.2020.00291] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/24/2020] [Accepted: 03/10/2020] [Indexed: 12/20/2022] Open
Abstract
Protein-protein interactions (PPIs) are the foundation of cellular life activities. At present, the known protein-protein interactions account for only a small part of the total. With the development of experimental and computing technology, more and more PPI data are being mined, and PPI networks are becoming denser. This makes it possible to predict protein-protein interactions from the perspective of network structure. Although there are many high-throughput experimental methods for detecting protein-protein interactions, such experiments are costly and time-consuming and have a certain error rate. Network-based approaches can provide candidate protein pairs for high-throughput experiments and improve the accuracy rate. This paper presents a new link prediction approach, "Sim," for PPI networks from the perspectives of proteins' complementary interfaces and gene duplication. Integrating "Sim" with the state-of-the-art network-based approach "L3" improves prediction accuracy and robustness.
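As an illustration of the kind of network-based scoring the abstract refers to, the following Python sketch computes a degree-normalized count of length-3 paths between two proteins, which is the idea generally associated with the "L3" approach. The function name `l3_score`, the adjacency-dictionary representation, and the toy network are assumptions made for this example; the "Sim" score is not described in the abstract and is not reproduced here.

```python
from math import sqrt


def l3_score(adj, x, y):
    """Degree-normalized number of length-3 paths x - u - v - y.

    adj maps each node to the set of its neighbours in an undirected
    network without self-loops (a hypothetical representation chosen
    for this sketch).
    """
    score = 0.0
    for u in adj[x]:
        for v in adj[u]:
            if y in adj[v]:
                # Each path is down-weighted by the degrees of its two
                # intermediate nodes so that hubs do not dominate.
                score += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return score


# Toy network: a simple chain x - a - b - y.
toy_network = {
    "x": {"a"},
    "a": {"x", "b"},
    "b": {"a", "y"},
    "y": {"b"},
}
candidate_score = l3_score(toy_network, "x", "y")
```

Candidate pairs that are not yet linked can then be ranked by this score, with higher values suggesting more plausible interactions.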
Affiliation(s)
- Yu Chen
  - School of Mathematics, Renmin University of China, Beijing, China; School of Mathematics and Statistics, Minnan Normal University, Zhangzhou, China; Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Wei Wang
  - School of Mathematics, Renmin University of China, Beijing, China
- Jiale Liu
  - School of Mathematics, Renmin University of China, Beijing, China; Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jinping Feng
  - School of Mathematics and Statistics, Henan University, Kaifeng, China
- Xinqi Gong
  - School of Mathematics, Renmin University of China, Beijing, China; Institute for Mathematical Sciences, Renmin University of China, Beijing, China
6
Qu J, Zhao Y, Zhang L, Cai SB, Ming Z, Wang CC. Computational Models for Self-Interacting Proteins Prediction. Protein Pept Lett 2019; 27:392-399. [PMID: 31880240 DOI: 10.2174/0929866527666191227141713] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/30/2019] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 11/22/2022]
Abstract
Self-Interacting Proteins (SIPs), whose two or more copies can interact with each other, have significant roles in cellular functions and the evolution of Protein Interaction Networks (PINs). Knowing whether a protein can act on itself is important for understanding its functions. Previous studies on SIPs have focused on their structures and functions, while their overall properties have received less emphasis. Identifying SIPs is one of the most important tasks in biomedical research and will help in understanding the function and mechanism of proteins. High-throughput methods can be used for SIP prediction but can be costly, time-consuming, and challenging. Therefore, it is urgent to design computational models for the identification of SIPs. In this review, the concept and function of SIPs are introduced in detail. We further introduce SIP data and some excellent computational models that have been designed for SIP prediction. In particular, most existing approaches were developed based on machine learning using different feature extraction methods. Finally, we discuss several difficult problems in developing computational models for SIP prediction.
Affiliation(s)
- Jia Qu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yan Zhao
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Shu-Bin Cai
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
- Zhong Ming
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
- Chun-Chun Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
7
An JY, Zhou Y, Zhao YJ, Yan ZJ. An Efficient Feature Extraction Technique Based on Local Coding PSSM and Multifeatures Fusion for Predicting Protein-Protein Interactions. Evol Bioinform Online 2019; 15:1176934319879920. [PMID: 31619921 PMCID: PMC6777060 DOI: 10.1177/1176934319879920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/04/2019] [Accepted: 09/11/2019] [Indexed: 12/20/2022] Open
Abstract
Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifiers to identify PPIs. Method: In this study, we propose a sequence-based feature extraction method called LCPSSMMF, which combines a local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on the PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporates global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple key features from the evolutionary information embedded in the CPSSM matrix. Finally, feature vectors were obtained using a multifeatures fusion method. Result: To evaluate the performance of the proposed feature extraction approach, we employed a support vector machine (SVM) as the prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. We also compared the proposed method with previous sequence-based approaches on the yeast dataset using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods. The results show that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and can function remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM.
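To make the Local Average Group idea more concrete, the Python sketch below splits the rows of a PSSM into consecutive groups and averages each column within each group. It is a minimal sketch under stated assumptions: the function name `local_average_group`, the default of 20 groups, and the integer-division group boundaries are choices made for this illustration, and the local coding step that builds the CPSSM is not described in enough detail in the abstract to reproduce here.

```python
def local_average_group(pssm, n_groups=20):
    """Average each PSSM column within consecutive groups of rows.

    pssm is a list of rows (one per residue), each holding one score per
    amino-acid column. The result has n_groups * n_columns values.
    Group boundaries use integer division, which is an assumption of
    this sketch rather than the published definition.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if length < n_groups:
        raise ValueError("sequence is shorter than the number of groups")
    features = []
    for g in range(n_groups):
        start = g * length // n_groups
        end = (g + 1) * length // n_groups
        block = pssm[start:end]
        for j in range(n_cols):
            features.append(sum(row[j] for row in block) / len(block))
    return features


# Toy matrix with 4 residues and 2 columns, split into 2 groups.
toy_pssm = [[1, 2], [3, 4], [5, 6], [7, 8]]
toy_features = local_average_group(toy_pssm, n_groups=2)
```

For a real PSSM with 20 columns and the default 20 groups, this yields a fixed-length vector of 400 values regardless of sequence length, which is what makes such descriptors convenient inputs for a classifier such as an SVM.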
Affiliation(s)
- Ji-Yong An
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Mine Digitization Engineering Research Center, Ministry of Education, Xuzhou, People's Republic of China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Mine Digitization Engineering Research Center, Ministry of Education, Xuzhou, People's Republic of China
- Yu-Jun Zhao
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Mine Digitization Engineering Research Center, Ministry of Education, Xuzhou, People's Republic of China
- Zi-Ji Yan
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Mine Digitization Engineering Research Center, Ministry of Education, Xuzhou, People's Republic of China
8
An JY, You ZH, Zhou Y, Wang DF. Sequence-based Prediction of Protein-Protein Interactions Using Gray Wolf Optimizer-Based Relevance Vector Machine. Evol Bioinform Online 2019; 15:1176934319844522. [PMID: 31080346 PMCID: PMC6498782 DOI: 10.1177/1176934319844522] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/07/2019] [Accepted: 03/20/2019] [Indexed: 12/18/2022] Open
Abstract
Protein-protein interactions (PPIs) are essential to a number of biological processes. Identifying PPIs by biological experiment is both time-consuming and expensive. Therefore, many computational methods have been proposed to identify PPIs. However, most of these methods are limited because they are difficult to compute and rely on a large number of homologous proteins. Accordingly, it is urgent to develop effective computational methods to detect PPIs using only protein sequence information. The kernel parameter of the relevance vector machine (RVM) is usually set by experience, which may not yield the optimal solution and can affect the prediction performance of the RVM. In this work, we present a novel computational approach called GWORVM-BIG, which uses Bi-gram (BIG) to represent protein sequences on a position-specific scoring matrix (PSSM) and a GWORVM classifier to perform classification for predicting PPIs. More specifically, the proposed GWORVM model obtains the optimum kernel parameters using the gray wolf optimizer, which has the advantages of fewer control parameters, strong global optimization ability, and ease of implementation compared with other optimization algorithms. The experimental results on yeast and human data sets demonstrated the good accuracy and efficiency of the proposed GWORVM-BIG method. The results showed that the proposed GWORVM classifier can significantly improve prediction performance compared with the RVM model using other optimization algorithms, including grid search (GS), genetic algorithm (GA), and particle swarm optimization (PSO). In addition, the proposed method was compared with other existing algorithms, and the experimental results further indicated that the proposed GWORVM-BIG model yields excellent prediction performance. To facilitate extensive future proteomics research, the GWORVMBIG server is freely available for academic use at http://219.219.62.123:8888/GWORVMBIG.
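The Bi-gram representation mentioned above can be sketched in a few lines of Python. In the form commonly used in the PSSM literature, entry (m, n) of the bi-gram matrix sums, over consecutive residue positions i, the product of the score for amino acid m at position i and the score for amino acid n at position i+1. The function name `pssm_bigram`, the assumption that the PSSM has already been normalized, and the toy matrix are illustrative choices, not details taken from the GWORVM-BIG paper.

```python
def pssm_bigram(pssm):
    """Flattened bi-gram feature vector of a PSSM.

    pssm is a list of rows (one per residue position), each with one
    value per amino-acid column. For a standard 20-column PSSM the
    result has 20 * 20 = 400 values.
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over consecutive pairs of positions (i, i + 1).
    for row, next_row in zip(pssm, pssm[1:]):
        for m, p in enumerate(row):
            for n, q in enumerate(next_row):
                bigram[m][n] += p * q
    # Flatten row by row into a single feature vector.
    return [value for bigram_row in bigram for value in bigram_row]


# Toy 3-column matrix standing in for a normalized PSSM.
toy_pssm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_vector = pssm_bigram(toy_pssm)
```

Because the output length depends only on the number of columns, proteins of different lengths map to vectors of the same size, which is what allows them to be fed to a classifier such as an RVM.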
Affiliation(s)
- Ji-Yong An
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Mine Digitization Engineering Research Center of Ministry of Education of the People's Republic of China
- Zhu-Hong You
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Mine Digitization Engineering Research Center of Ministry of Education of the People's Republic of China
- Da-Fu Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Mine Digitization Engineering Research Center of Ministry of Education of the People's Republic of China