1
Zou HT, Ji BY, Xie XL. A multi-source molecular network representation model for protein-protein interactions prediction. Sci Rep 2024; 14:6184. [PMID: 38485942 PMCID: PMC10940665 DOI: 10.1038/s41598-024-56286-w] [Received: 11/07/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024]
Abstract
The prediction of potential protein-protein interactions (PPIs) is a critical step in decoding diseases and understanding cellular mechanisms. Traditional biological experiments have identified many potential PPIs in recent years, but the problem is still far from solved. There is therefore an urgent need for computational models that predict potential PPIs accurately and efficiently. In this study, we propose a multi-source molecular network representation learning model (called MultiPPIs) to predict potential protein-protein interactions. Specifically, we first extract protein sequence features from the physicochemical properties of amino acids using the auto covariance method. Second, a multi-source association network is constructed by integrating the known associations among miRNAs, proteins, lncRNAs, drugs, and diseases. The graph representation learning method DeepWalk is adopted to extract the multi-source association information of proteins with other biomolecules. In this way, each known protein-protein interaction pair can be represented as a concatenation of the sequence features and the multi-source association features of its proteins. Finally, a Random Forest classifier with optimized parameters is used for training and prediction. Under five-fold cross-validation, MultiPPIs obtains an average prediction accuracy of 86.03%, a sensitivity of 82.69%, and an AUC of 93.03%. These results indicate that MultiPPIs has good prediction performance and provides valuable insights into the field of PPI prediction. MultiPPIs is freely available at https://github.com/jiboyalab/multiPPIs.
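The auto covariance (AC) step can be illustrated with a minimal sketch. For a lag d, AC averages the product of the deviations (from the sequence mean) of a property value at residues i and i + d. As an assumption for illustration, only one property scale (Kyte-Doolittle hydropathy) is used and the values are not normalized. MultiPPIs draws on several physicochemical properties, and its exact property set, normalization, and lag range are not given in this abstract.

```python
# Minimal auto covariance (AC) sketch for a single property scale.
# Assumption: Kyte-Doolittle hydropathy stands in for the (unstated) property set.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, max_lag, scale=HYDROPATHY):
    """Return [AC(1), ..., AC(max_lag)] for one property scale.

    AC(d) = 1/(n-d) * sum_i (x_i - mean) * (x_{i+d} - mean)
    """
    values = [scale[aa] for aa in seq]
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    return [
        sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


print(auto_covariance("ARAR", 2))  # alternating values: negative AC(1), positive AC(2)
```

In a pipeline like the one described, the AC vectors of both proteins would be concatenated with their DeepWalk embeddings before being passed to the Random Forest; that wiring is not shown here.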
Affiliation(s)
- Hai-Tao Zou
- College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China
- Bo-Ya Ji
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
- Xiao-Lan Xie
- College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China
2
Bao W, Liu Y, Chen B. Oral_voting_transfer: classification of oral microorganisms' function proteins with voting transfer model. Front Microbiol 2024; 14:1277121. [PMID: 38384719 PMCID: PMC10879614 DOI: 10.3389/fmicb.2023.1277121] [Received: 08/14/2023] [Accepted: 12/19/2023] [Indexed: 02/23/2024]
Abstract
Introduction: The oral microbiota is a typical example of the highly complex microbial ecosystems of the human body. Oral microorganisms take part in human diseases, including oral cavity inflammation, mucosal disease, periodontal disease, tooth decay, and oral cancer. Oral microbes can also cause endocrine, digestive, and nervous system disorders, such as diabetes, digestive system diseases, and Alzheimer's disease. The proteins of oral microbes have been noted to play significant roles in these serious diseases, so a good knowledge of oral microbes can help in analyzing the progression of related diseases. However, high-dimensional features and imbalanced data make oral microbial problems complex and hard to solve with traditional experimental methods. Methods: To deal with these challenges, we proposed a novel method, oral_voting_transfer, for classification problems in the field of oral microorganisms. The method employed three features and was applied to five oral microorganisms: Streptococcus mutans, Staphylococcus aureus, abiotrophy adjacent, bifidobacterial, and Capnocytophaga. First, a model that had successfully classified organelle proteins was transferred to the oral microorganism data. Next, several classification methods were used as local classifiers. Finally, the results were obtained by voting over the transferred classifiers and the local classifiers. Results and discussion: The proposed method performed well on all five oral microorganisms. oral_voting_transfer is a standalone tool, and all of its source code is publicly available at https://github.com/baowz12345/voting_transfer.
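The final voting step can be sketched as plain hard majority voting over the label predictions of several classifiers. This is a generic illustration under assumptions: the abstract does not say whether oral_voting_transfer weights the transferred and local classifiers differently or how it resolves ties, so the function name majority_vote and the earliest-vote tie rule below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Hard majority vote over several classifiers' label predictions.

    predictions_per_classifier: one list of labels per classifier, all the same length.
    Ties go to the label seen first, i.e. the earliest classifier's vote
    (Counter.most_common keeps first-encountered order for equal counts).
    """
    if not predictions_per_classifier:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions_per_classifier[0])
    if any(len(p) != n_samples for p in predictions_per_classifier):
        raise ValueError("all classifiers must predict the same number of samples")
    return [
        Counter(p[i] for p in predictions_per_classifier).most_common(1)[0][0]
        for i in range(n_samples)
    ]


transferred = ["x", "y", "y"]  # hypothetical transferred-model predictions
local_a = ["x", "x", "z"]      # hypothetical local classifier outputs
local_b = ["z", "y", "z"]
print(majority_vote([transferred, local_a, local_b]))  # ['x', 'y', 'z']
```

Listing the transferred model first means it wins ties, which is one plausible design choice when that model is trusted most; a weighted vote would be an alternative.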
Affiliation(s)
- Wenzheng Bao
- School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Yujun Liu
- School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Baitong Chen
- The Affiliated Xuzhou Municipal Hospital of Xuzhou Medical University, Xuzhou, China
- Department of Stomatology, Xuzhou First People’s Hospital, Xuzhou, China
3
Raj SS, Chandra SSV. Significance of Sequence Features in Classification of Protein-Protein Interactions Using Machine Learning. Protein J 2024; 43:72-83. [PMID: 38114669 DOI: 10.1007/s10930-023-10168-8] [Accepted: 10/30/2023] [Indexed: 12/21/2023]
Abstract
Protein-protein interactions are crucial for the entry of viruses into the cell. Understanding the mechanism of these interactions is essential for studying human-virus associations, viral infections, and antiviral responses, and for developing new biologics and drug candidates. Experimental methods for analyzing human-virus protein-protein interactions based on protein sequence data are time-consuming and labor-intensive, so machine learning models are being developed to predict interactions and determine large-scale interactomes between species. The present work highlights the importance of sequence features in classifying interacting and non-interacting proteins from protein sequence data. High-dimensional amino acid sequence features such as Amino Acid Composition (AAC), Dipeptide Composition (DPC), Grouped Amino Acid Composition (GAAC), and Pseudo-Amino Acid Composition (PAAC) were extracted. Following feature extraction, three datasets were created: Dataset 1 contains all of the extracted features, while Datasets 2 and 3 contain the most relevant features obtained through dimensionality reduction. To analyze the importance of high-dimensional features and their role in protein-protein interactions, a random forest classifier was trained on each of the three datasets. Without dimensionality reduction, the model exhibited exceptional accuracy, indicating that dimensionality reduction fails to capture the complexity of interactions and the underlying relationships between human and viral proteins. Retaining high-dimensional features therefore makes it possible to capture all the characteristics of protein-protein interactions that resemble host-pathogen associations, leading to biologically meaningful models. Our proposed approach is a more realistic and comprehensive classification model, leading to deeper insights and better applications in virology and drug development.
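Two of the listed descriptors, AAC and DPC, are simple frequency counts and can be sketched as follows. This is a minimal version under stated assumptions: only the 20 standard residues are counted, and pairs containing non-standard residues are skipped for DPC but still included in the length-based denominator. GAAC and PAAC are not shown, and the study's exact implementation is not described in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 keys


def aac(seq):
    """Amino Acid Composition: frequency of each of the 20 standard residues."""
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dpc(seq):
    """Dipeptide Composition: frequency of each of the 400 adjacent residue pairs."""
    if len(seq) < 2:
        raise ValueError("DPC needs a sequence of length >= 2")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    total = len(seq) - 1
    return {dp: c / total for dp, c in counts.items()}


# One protein's 420-dimensional AAC + DPC vector; a pair could concatenate two of these.
features = list(aac("MKVL").values()) + list(dpc("MKVL").values())
print(len(features))  # 420
```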
Affiliation(s)
- Sini S Raj
- Machine Intelligence Research Lab, Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
- S S Vinod Chandra
- Machine Intelligence Research Lab, Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
4
Bernett J, Blumenthal DB, List M. Cracking the black box of deep sequence-based protein-protein interaction prediction. Brief Bioinform 2024; 25:bbae076. [PMID: 38446741 PMCID: PMC10939362 DOI: 10.1093/bib/bbae076] [Received: 11/23/2023] [Revised: 01/09/2024] [Indexed: 03/08/2024]
Abstract
Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how strongly reproducible deep learning models depend on data leakage, sequence similarities, and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performance. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between the training and test sets, performance drops to the level of random guessing. Moreover, baseline models that directly leverage sequence similarity and network topology perform well at a fraction of the computational cost. We therefore advocate that future improvements be reported relative to such baseline methods. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting the need for further experimental research into the 'dark' protein interactome and for better computational methods.
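The idea of keeping similar sequences out of both the training and the test set can be sketched as a cluster-based split. This is an assumption-laden illustration, not the authors' procedure: it presumes proteins have already been grouped into sequence-similarity clusters by some external tool, and the function name cluster_split is hypothetical. Pairs that straddle the two sides are dropped so that no protein or cluster is shared.

```python
def cluster_split(pairs, cluster_of, test_clusters):
    """Split labelled protein pairs so no sequence cluster spans train and test.

    pairs: iterable of (protein_a, protein_b, label)
    cluster_of: dict protein ID -> cluster ID (assumed precomputed by an
        external sequence-clustering step at some identity threshold)
    test_clusters: set of cluster IDs reserved for the test side
    Returns (train, test, discarded); pairs with one protein on each side
    are discarded.
    """
    train, test, discarded = [], [], []
    for a, b, label in pairs:
        a_test = cluster_of[a] in test_clusters
        b_test = cluster_of[b] in test_clusters
        if a_test and b_test:
            test.append((a, b, label))
        elif not a_test and not b_test:
            train.append((a, b, label))
        else:
            discarded.append((a, b, label))
    return train, test, discarded


pairs = [("P1", "P2", 1), ("P3", "P4", 0), ("P1", "P3", 1), ("P5", "P4", 1)]
cluster_of = {"P1": "c1", "P2": "c1", "P3": "c2", "P4": "c3", "P5": "c2"}
train, test, dropped = cluster_split(pairs, cluster_of, test_clusters={"c2", "c3"})
print(len(train), len(test), len(dropped))  # 1 2 1
```

Because whole clusters move together, a test protein has no close homologue in training as long as the clustering threshold is strict enough; the cost is that cross-boundary pairs are lost.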
Affiliation(s)
- Judith Bernett
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
- David B Blumenthal
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Werner-von-Siemens-Str. 61, 91052, Erlangen, Germany
- Markus List
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
5
Zhang F, Zhang Y, Zhu X, Chen X, Lu F, Zhang X. DeepSG2PPI: A Protein-Protein Interaction Prediction Method Based on Deep Learning. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2907-2919. [PMID: 37079417 DOI: 10.1109/tcbb.2023.3268661] [Indexed: 05/03/2023]
Abstract
Protein-protein interaction (PPI) plays an important role in almost all life activities. Many protein interaction sites have been confirmed by biological experiments, but these experimental methods for identifying PPI sites are time-consuming and expensive. In this study, a deep learning-based PPI prediction method named DeepSG2PPI is developed. First, the protein sequence information is retrieved and the local context information of each amino acid residue is calculated. A two-dimensional convolutional neural network (2D-CNN) model is employed to extract features from a two-channel coding structure, in which an attention mechanism is embedded to assign higher weights to key features. Second, the global statistical information of each amino acid residue and a relationship graph between the protein and its GO (Gene Ontology) function annotations are built, and a graph embedding vector is constructed to represent the biological features of the protein. Finally, a 2D-CNN model and two 1D-CNN models are combined for PPI prediction. Comparison with existing algorithms shows that DeepSG2PPI performs better, providing more accurate and effective PPI site prediction that should help reduce the cost and failure rate of biological experiments.
6
Jha K, Karmakar S, Saha S. Graph-BERT and language model-based framework for protein-protein interaction identification. Sci Rep 2023; 13:5663. [PMID: 37024543 PMCID: PMC10079975 DOI: 10.1038/s41598-023-31612-w] [Received: 11/28/2022] [Accepted: 03/14/2023] [Indexed: 04/08/2023]
Abstract
Identifying protein-protein interactions (PPIs) is one of the critical problems in bioinformatics. With advances in artificial intelligence (AI), previous studies have used various AI-based models for PPI classification. The inputs to these models are features extracted from different sources of protein information, mainly sequence-derived features. In this work, we present an AI-based PPI identification model that uses a PPI network and protein sequences. The PPI network is represented as a graph in which each node is a protein pair, and an edge is defined between two nodes if they share a common protein. Each node in the graph has a feature vector. In this work, we used a language model to extract feature vectors directly from protein sequences. The feature vectors of the two proteins in a pair are concatenated and used as the node feature vector of the PPI network graph. Finally, we used the Graph-BERT model to encode the PPI network graph with sequence-based features and learn a hidden representation of the feature vector for each node. The learned node representations are then fed into a fully connected layer, whose output is passed to a softmax layer to classify the protein interactions. To assess the efficacy of the proposed PPI model, we performed experiments on several PPI datasets. The experimental results demonstrate that the proposed approach surpasses existing PPI methods and the designed baselines in classifying PPIs.
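The graph construction in this abstract (nodes are protein pairs, edges join pairs that share a protein) can be sketched in a few lines. This minimal version builds only the edge set; the node features (concatenated language-model embeddings of the two proteins) and the Graph-BERT encoder are not shown, and the function name build_pair_graph is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_pair_graph(pairs):
    """Build a graph whose nodes are protein pairs.

    pairs: list of (protein_a, protein_b) tuples; node i is pairs[i].
    Returns a set of undirected edges (i, j) with i < j, one for every two
    pairs that share at least one protein.
    """
    pairs_with = defaultdict(set)  # protein ID -> indices of pairs containing it
    for idx, (a, b) in enumerate(pairs):
        pairs_with[a].add(idx)
        pairs_with[b].add(idx)
    edges = set()
    for idxs in pairs_with.values():
        edges.update(combinations(sorted(idxs), 2))
    return edges


pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
print(sorted(build_pair_graph(pairs)))  # [(0, 1)]: only the first two pairs share P2
```

Indexing pairs by protein avoids comparing every pair with every other pair, which matters for large interaction datasets.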
Affiliation(s)
- Kanchan Jha
- Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
- Sourav Karmakar
- Department of Computer Science and Engineering, National Institute of Technology Durgapur, Durgapur, West Bengal, 713209, India
- Sriparna Saha
- Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
7
Wang XW, Madeddu L, Spirohn K, Martini L, Fazzone A, Becchetti L, Wytock TP, Kovács IA, Balogh OM, Benczik B, Pétervári M, Ágg B, Ferdinandy P, Vulliard L, Menche J, Colonnese S, Petti M, Scarano G, Cuomo F, Hao T, Laval F, Willems L, Twizere JC, Vidal M, Calderwood MA, Petrillo E, Barabási AL, Silverman EK, Loscalzo J, Velardi P, Liu YY. Assessment of community efforts to advance network-based prediction of protein-protein interactions. Nat Commun 2023; 14:1582. [PMID: 36949045 PMCID: PMC10033937 DOI: 10.1038/s41467-023-37079-7] [Received: 02/24/2022] [Accepted: 03/02/2023] [Indexed: 03/24/2023]
Abstract
A comprehensive understanding of the human protein-protein interaction (PPI) network, also known as the human interactome, can provide important insights into the molecular mechanisms of complex biological processes and diseases. Despite the remarkable experimental efforts undertaken to date to determine the structure of the human interactome, many PPIs remain unmapped. Computational approaches, especially network-based methods, can facilitate the identification of previously uncharacterized PPIs. Many such methods have been proposed, yet a systematic evaluation of existing network-based methods for predicting PPIs is still lacking. Here, we report community efforts initiated by the International Network Medicine Consortium to benchmark the ability of 26 representative network-based methods to predict PPIs across six different interactomes of four organisms: A. thaliana, C. elegans, S. cerevisiae, and H. sapiens. Through extensive computational and experimental validation, we found that advanced similarity-based methods, which leverage the underlying network characteristics of PPIs, outperform other general link prediction methods in the interactomes we considered.
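As an example of a similarity-based link predictor, the sketch below computes the L3 score, which sums over length-3 paths u-a-b-v the weight 1/sqrt(k_a k_b), down-weighting paths through high-degree intermediates. It is shown next to the simpler common-neighbours count. The abstract does not list which methods were among the 26 benchmarked, and the graph representation (a dict of neighbour sets) and function names are assumptions.

```python
import math


def common_neighbours(adj, u, v):
    """Number of proteins adjacent to both u and v."""
    return len(adj[u] & adj[v])


def l3_score(adj, u, v):
    """Degree-normalised count of length-3 paths u-a-b-v.

    adj maps each protein to the set of its interaction partners (undirected).
    Intended for scoring candidate pairs (u, v) that are not yet linked.
    """
    score = 0.0
    for a in adj[u]:
        for b in adj[v]:
            if b in adj[a]:
                score += 1.0 / math.sqrt(len(adj[a]) * len(adj[b]))
    return score


# Toy chain u - a - b - v: no shared neighbour, but one length-3 path.
adj = {"u": {"a"}, "a": {"u", "b"}, "b": {"a", "v"}, "v": {"b"}}
print(common_neighbours(adj, "u", "v"), l3_score(adj, "u", "v"))  # 0 0.5
```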
Affiliation(s)
- Xu-Wen Wang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Lorenzo Madeddu
- Translational and Precision Medicine Department, Sapienza University of Rome, Rome, Italy
- Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Leonardo Martini
- Department of Computer, Control, and Management Engineering "Antonio Ruberti", Sapienza University of Rome, Rome, Italy
- Luca Becchetti
- Department of Computer, Control, and Management Engineering "Antonio Ruberti", Sapienza University of Rome, Rome, Italy
- Thomas P Wytock
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
- István A Kovács
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, 60208, USA
- Olivér M Balogh
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Bettina Benczik
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, 6722, Szeged, Hungary
- Mátyás Pétervári
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Bence Ágg
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, 6722, Szeged, Hungary
- Péter Ferdinandy
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, 6722, Szeged, Hungary
- Loan Vulliard
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Jörg Menche
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Faculty of Mathematics, University of Vienna, Vienna, Austria
- Stefania Colonnese
- Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
- Manuela Petti
- Department of Computer, Control, and Management Engineering "Antonio Ruberti", Sapienza University of Rome, Rome, Italy
- Gaetano Scarano
- Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
- Francesca Cuomo
- Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
- Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Florent Laval
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Laboratory of Molecular and Cellular Epigenetic, GIGA Institute, University of Liège, Liège, Belgium
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Luc Willems
- Laboratory of Molecular and Cellular Epigenetic, GIGA Institute, University of Liège, Liège, Belgium
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Jean-Claude Twizere
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Enrico Petrillo
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Department of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Albert-László Barabási
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Network Science Institute and Department of Physics, Northeastern University, Boston, MA, 02115, USA
- Department of Network and Data Science, Central European University, Budapest, H-1051, Hungary
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Joseph Loscalzo
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Paola Velardi
- Translational and Precision Medicine Department, Sapienza University of Rome, Rome, Italy
- Yang-Yu Liu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Center for Artificial Intelligence and Modeling, The Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
8
Zhang Y, Li Z. RF_phage virion: Classification of phage virion proteins with a random forest model. Front Genet 2023; 13:1103783. [PMID: 36846294 PMCID: PMC9945117 DOI: 10.3389/fgene.2022.1103783] [Received: 11/21/2022] [Accepted: 12/30/2022] [Indexed: 02/10/2023]
Abstract
Introduction: Phages play essential roles in biological processes, and the virion proteins encoded by the phage genome constitute critical elements of the assembled phage particle. Methods: This study uses machine learning methods to classify phage virion proteins. We propose a novel approach, RF_phage virion, for the effective classification of virion and non-virion proteins. The model uses four protein sequence coding methods as features, and the random forest algorithm is employed to solve the classification problem. Results: The performance of the RF_phage virion model was analyzed by comparing it with classical machine learning methods. The proposed method achieved a specificity (Sp) of 93.37%, sensitivity (Sn) of 90.30%, accuracy (Acc) of 91.84%, Matthews correlation coefficient (MCC) of 0.8371, and an F1 score of 0.9196.
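All five reported metrics derive from the binary confusion matrix (true/false positives and negatives). The helper below is a minimal sketch of the standard definitions; the counts in the usage example are hypothetical and are not taken from the study. It assumes both classes are present, otherwise Sn or Sp divides by zero.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sn, Sp, Acc, MCC and F1 from binary confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)         # equals 2PR / (P + R)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F1": f1}


# Hypothetical counts, not the study's results.
m = binary_metrics(tp=8, tn=6, fp=2, fn=4)
print(m["Acc"], m["Sp"])  # 0.7 0.75
```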
Affiliation(s)
- Yanqing Zhang
- School of Finance, Xuzhou University of Technology, Xuzhou, China
- Zhiyuan Li
- School of Artificial Intelligence and Software College, Jiangsu Normal University Kewen College, Xuzhou, China
9
Protein-protein interaction and non-interaction predictions using gene sequence natural vector. Commun Biol 2022; 5:652. [PMID: 35780196 PMCID: PMC9250521 DOI: 10.1038/s42003-022-03617-0] [Received: 02/04/2022] [Accepted: 06/21/2022] [Indexed: 12/02/2022]
Abstract
Predicting protein-protein interactions and predicting non-interactions are two important but distinct aspects of multi-body structure prediction, and both provide vital information about protein function. Several computational methods have recently been developed to complement experimental methods, but they still cannot effectively detect truly non-interacting protein pairs. We proposed a gene sequence-based method, named NVDT (Natural Vector combine with Dinucleotide and Triplet nucleotide), for predicting both interactions and non-interactions. For protein-protein non-interactions (PPNIs), the proposed method obtained accuracies of 86.23% for Homo sapiens and 85.34% for Mus musculus, and it performed well on three types of non-interaction networks. For protein-protein interactions (PPIs), we obtained accuracies of 99.20%, 94.94%, 98.56%, 95.41%, and 94.83% for Saccharomyces cerevisiae, Drosophila melanogaster, Helicobacter pylori, Homo sapiens, and Mus musculus, respectively. Furthermore, NVDT outperformed established sequence-based methods and achieved high prediction performance for cross-species interactions. NVDT is expected to be an effective approach for predicting PPIs and PPNIs. In summary, protein-protein non-interactions and interactions are distinguished and predicted from gene sequences by combining single-nucleotide and contiguous-nucleotide features with machine learning models.
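The natural vector underlying NVDT can be illustrated with its basic 12-dimensional form: for each base, the count, the mean position, and a normalized second central moment of the positions. This sketch covers only the single-nucleotide part; the dinucleotide and triplet extensions and the way two genes' vectors are combined for a protein pair are not shown and would be assumptions. Using zeros for absent bases is also a convention chosen here.

```python
def natural_vector(seq):
    """12-dimensional natural vector of a DNA sequence.

    For each base k in A, C, G, T: count n_k, mean position mu_k (1-based),
    and normalised second central moment D2_k = sum((p - mu_k)^2) / (n_k * n).
    Layout: [n_A..n_T, mu_A..mu_T, D2_A..D2_T].
    """
    n = len(seq)
    counts, means, moments = [], [], []
    for base in "ACGT":
        positions = [i + 1 for i, ch in enumerate(seq.upper()) if ch == base]
        n_k = len(positions)
        if n_k == 0:  # base absent: use zeros by convention here
            counts.append(0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        counts.append(n_k)
        means.append(mu)
        moments.append(sum((p - mu) ** 2 for p in positions) / (n_k * n))
    return counts + means + moments


print(natural_vector("ACGTA"))  # [2, 1, 1, 1, 3.0, 2.0, 3.0, 4.0, 0.8, 0.0, 0.0, 0.0]
```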
10
Song B, Luo X, Luo X, Liu Y, Niu Z, Zeng X. Learning spatial structures of proteins improves protein-protein interaction prediction. Brief Bioinform 2022; 23:6501351. [PMID: 35018418 DOI: 10.1093/bib/bbab558] [Received: 08/27/2021] [Revised: 12/07/2021] [Accepted: 12/07/2021] [Indexed: 01/09/2023]
Abstract
Spatial structures of proteins are closely related to protein functions. Integrating protein structures improves the performance of protein-protein interaction (PPI) prediction. However, the limited number of known protein structures restricts the application of structure-based prediction methods. Using predicted protein structure information is a promising way to improve the performance of sequence-based prediction methods. We propose a novel end-to-end framework, TAGPPI, to predict PPIs using protein sequence alone. TAGPPI extracts multi-dimensional features by applying 1D convolutions to protein sequences and a graph learning method to contact maps constructed from AlphaFold. A contact map contains abundant spatial structure information that is difficult to obtain directly from 1D sequence data. We further demonstrate that the spatial information learned from contact maps improves the ability of TAGPPI in PPI prediction tasks. We compare the performance of TAGPPI with that of nine state-of-the-art sequence-based methods, and TAGPPI outperforms them on all metrics. To the best of our knowledge, this is the first method to use the predicted protein topology structure graph for sequence-based PPI prediction. More importantly, our proposed architecture could be extended to other protein-related prediction tasks.
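A residue contact map of the kind fed to a graph learner can be sketched from per-residue 3D coordinates. The choice of C-alpha atoms and the 8.0 angstrom cutoff are common conventions assumed here; the abstract does not state which atoms or threshold TAGPPI uses, and the helper names are hypothetical.

```python
import numpy as np


def contact_map(coords, threshold=8.0):
    """Binary residue contact map from per-residue 3D coordinates.

    coords: (L, 3) array-like, e.g. C-alpha positions from a predicted structure.
    threshold: distance cutoff in angstroms (8.0 is an assumed, common choice).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # (L, L, 3) pairwise offsets
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # (L, L) distances
    contacts = (dist < threshold).astype(np.int8)
    np.fill_diagonal(contacts, 0)                    # no self-contacts
    return contacts


def to_edge_list(contacts):
    """Undirected residue-graph edges (i, j), i < j, for a graph learner."""
    i, j = np.nonzero(np.triu(contacts, k=1))
    return list(zip(i.tolist(), j.tolist()))


cm = contact_map([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [20.0, 0.0, 0.0]])
print(to_edge_list(cm))  # [(0, 1)]
```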
Affiliation(s)
- Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
- Xiaoyan Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
- MindRank AI ltd., Hangzhou, 311113, Zhejiang, China
- Xiaoli Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
- BioMap, Haidian, 100089, Beijing, China
- Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
11
Yang F, Fan K, Song D, Lin H. Graph-based prediction of Protein-protein interactions with attributed signed graph embedding. BMC Bioinformatics 2020; 21:323. [PMID: 32693790 PMCID: PMC7372763 DOI: 10.1186/s12859-020-03646-8] [Received: 08/22/2019] [Accepted: 07/08/2020] [Indexed: 12/12/2022]
Abstract
Background: Protein-protein interactions (PPIs) are central to many biological processes. Because experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed, including a sequence-based deep learning technique that has achieved promising results. However, it focuses only on sequence information and ignores the structural information of PPI networks. Structural information of PPI networks, such as node degree, position, and neighboring nodes in a graph, has been shown to be informative in PPI prediction. Results: Facing the challenge of representing graph information, we introduce an improved graph representation learning method. Our model predicts PPIs based on both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Our method achieves a state-of-the-art accuracy of 99.15% on the Human Protein Reference Database (HPRD) dataset and also obtains the best results on the Database of Interacting Proteins (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans) datasets. Conclusion: Here, we introduce the signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method, to automatically learn to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms other existing sequence-based methods on several datasets. We also demonstrate the robustness of our model on very sparse networks and its generalization to a new dataset that consists of four datasets: HPRD, E. coli, C. elegans, and Drosophila.
Affiliation(s)
- Fang Yang
- School of Computer Science and Technology, Beijing Institute of Technology, 5 South Zhongguancun Street, Haidian District, Beijing, 100081, China
- Kunjie Fan
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, 43210, USA
- Dandan Song
- School of Computer Science and Technology, Beijing Institute of Technology, 5 South Zhongguancun Street, Haidian District, Beijing, 100081, China
- Huakang Lin
- School of Computer Science and Technology, Beijing Institute of Technology, 5 South Zhongguancun Street, Haidian District, Beijing, 100081, China
12
Kong M, Zhang Y, Xu D, Chen W, Dehmer M. FCTP-WSRC: Protein-Protein Interactions Prediction via Weighted Sparse Representation Based Classification. Front Genet 2020; 11:18. [PMID: 32117437 PMCID: PMC7010952 DOI: 10.3389/fgene.2020.00018] [Received: 07/09/2019] [Accepted: 01/07/2020]
Abstract
The task of predicting protein–protein interactions (PPIs) is essential for understanding biological processes. This paper proposes a novel computational model, FCTP-WSRC, to predict PPIs effectively. First, combinations of the F-vector, composition (C), and transition (T) descriptors are used to map each protein sequence onto a numeric feature vector. Next, principal component analysis (PCA) is employed to reconstruct the most discriminative feature subspace, which is then used as input to a weighted sparse representation based classifier (WSRC) for prediction. The FCTP-WSRC model achieves accuracies of 96.67%, 99.82%, and 98.09% on the H. pylori, Human, and Yeast datasets, respectively. Furthermore, the model performs well when predicting three significant PPI networks: the single-core network (CD9), the multiple-core network (Ras-Raf-Mek-Erk-Elk-Srf pathway), and the cross-connection network (Wnt-related network). These promising results show that the proposed method can be a powerful tool for PPI prediction, with excellent performance and shorter running time.
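The PCA step mentioned in this abstract can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions, not the FCTP-WSRC authors' code: each row of the input matrix is assumed to be one protein pair's raw feature vector, the number of retained components `k` is a free (hypothetical) parameter, and the F-vector/C/T feature extraction and the WSRC classifier are omitted.

```python
import numpy as np

def pca_project(features: np.ndarray, k: int) -> np.ndarray:
    """Project rows of `features` onto their top-k principal components.

    features: (n_samples, n_features) matrix, one row per protein pair (assumed layout).
    k: number of components to keep (hypothetical tuning parameter).
    """
    # PCA is defined on mean-centred data, so centre each feature column first.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))   # 50 synthetic pairs with 20 raw features each
Z = pca_project(X, k=5)
print(Z.shape)  # (50, 5)
```

The projected columns are mutually uncorrelated and ordered by variance, which is what makes truncating to the first `k` columns a reasonable way to keep the most informative directions.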
Affiliation(s)
- Meng Kong
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Yusen Zhang
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Da Xu
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Wei Chen
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Matthias Dehmer
- University of Applied Sciences Upper Austria, School of Management, Steyr, Austria; College of Artificial Intelligence, Nankai University, Tianjin, China; Department of Biomedical Computer Science and Mechatronics, UMIT Hall, Tyrol, Austria
13
Sarkar D, Saha S. Machine-learning techniques for the prediction of protein-protein interactions. J Biosci 2019; 44:104. [PMID: 31502581]
Abstract
Protein-protein interactions (PPIs) are important for studying protein functions and the pathways involved in different biological processes, as well as for understanding the cause and progression of diseases. Several high-throughput experimental techniques have been employed to identify PPIs in a few model organisms, but there is still a huge gap in identifying all possible binary PPIs in an organism. Therefore, machine-learning-based PPI prediction has been used in conjunction with experimental methods to discover novel protein interactions. The two most popular supervised machine-learning techniques for PPI prediction are support vector machines and random forest classifiers. Bayesian probabilistic inference has also been used, but mainly to score confidence in high-throughput PPI datasets. Recently, deep-learning algorithms have been used for sequence-based prediction of PPIs. Clustering methods such as hierarchical and k-means clustering are useful unsupervised machine-learning algorithms for predicting interacting protein pairs without explicit data labelling. In summary, machine-learning techniques have been widely used to predict PPIs, thus allowing experimental researchers to study cellular PPI networks.
14
Prediction of Protein-Protein Interactions Based on Domain. Comput Math Methods Med 2019; 2019:5238406. [PMID: 31531123 PMCID: PMC6720845 DOI: 10.1155/2019/5238406] [Received: 05/20/2019] [Revised: 07/09/2019] [Accepted: 07/30/2019]
Abstract
Protein-protein interactions (PPIs) play a crucial role in various biological processes. To better understand the pathogenesis and treatment of various diseases, it is necessary to understand these interactions in detail. However, current experimental methods still suffer from many false positives and false negatives. Computational prediction of protein-protein interactions has therefore become an increasingly important approach that can overcome the limitations of experimental methods. In this work, we propose a novel domain-based computational method for PPI prediction, in which an SVM model is built on the physicochemical properties of domains. The SVM outputs and domain-domain scores are then combined to construct the prediction model for protein-protein interactions. The results demonstrate that domain-based prediction can enhance the ability to predict protein interactions.
15
Li Y, Li LP, Wang L, Yu CQ, Wang Z, You ZH. An Ensemble Classifier to Predict Protein-Protein Interactions by Combining PSSM-based Evolutionary Information with Local Binary Pattern Model. Int J Mol Sci 2019; 20:ijms20143511. [PMID: 31319578 PMCID: PMC6679202 DOI: 10.3390/ijms20143511] [Received: 06/02/2019] [Revised: 07/04/2019] [Accepted: 07/15/2019]
Abstract
Proteins play a critical role in regulating cellular functions. Whether two proteins interact has become a fundamental question, because proteins usually perform their functions by interacting with other proteins. Although high-throughput biotechnology has produced a large amount of protein–protein interaction (PPI) data, biological experimental techniques are time-consuming and costly. Thus, computational methods for predicting protein interactions have become a research hot spot. In this research, we propose an efficient computational method that combines a Rotation Forest (RF) classifier with the Local Binary Pattern (LBP) feature extraction method to predict PPIs from Position-Specific Scoring Matrices (PSSMs). The proposed method achieves superior performance on the Yeast, Human, and H. pylori datasets, with average accuracies of 92.12%, 96.21%, and 86.59%, respectively. In addition, we evaluated the proposed method on four independent datasets: C. elegans, H. pylori, H. sapiens, and M. musculus. These results demonstrate that our model is feasible and robust for predicting PPIs.
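LBP comes from image analysis, so applying it to a PSSM means treating the L × 20 matrix as a small grayscale image. The sketch below shows the basic 3 × 3 LBP operator and its 256-bin histogram; it is an illustration under assumptions (neighbour order clockwise from the top-left, a `>=` comparison, and a normalised histogram), and the abstract does not say which LBP variant or normalisation the authors used.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from the top-left cell (assumed ordering).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(matrix: np.ndarray) -> np.ndarray:
    """Basic 3x3 local binary pattern histogram of a 2-D matrix (e.g. an L x 20 PSSM).

    Each interior cell gets an 8-bit code: bit b is set when neighbour b is
    >= the centre value. Returns a normalised 256-bin histogram of the codes.
    """
    rows, cols = matrix.shape
    center = matrix[1:rows - 1, 1:cols - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # Shifted view so that neighbour[r, c] is the (dr, dc) neighbour of center[r, c].
        neighbour = matrix[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
pssm = rng.integers(-10, 11, size=(30, 20))  # synthetic stand-in for a real PSSM
features = lbp_histogram(pssm)
print(features.shape)  # (256,)
```

Because the histogram length is fixed at 256 regardless of sequence length L, every protein maps to a vector of the same size, which is what a downstream classifier such as Rotation Forest needs.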
Affiliation(s)
- Yang Li
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Li-Ping Li
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Zheng Wang
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Zhu-Hong You
- School of Information Engineering, Xijing University, Xi'an 710123, China
16
Wang L, Wang HF, Liu SR, Yan X, Song KJ. Predicting Protein-Protein Interactions from Matrix-Based Protein Sequence Using Convolution Neural Network and Feature-Selective Rotation Forest. Sci Rep 2019; 9:9848. [PMID: 31285519 PMCID: PMC6614364 DOI: 10.1038/s41598-019-46369-4] [Received: 03/05/2019] [Accepted: 06/10/2019]
Abstract
Proteins are essential components of living organisms. The prediction of protein-protein interactions (PPIs) has important implications for understanding life processes, preventing diseases, and developing new drugs. Although high-throughput technology makes it possible to identify PPIs in large-scale biological experiments, the extensive use of experimental methods is restricted by time, cost, false-positive rates, and other constraints. Therefore, there is an urgent need for computational methods that complement experimental methods and predict PPIs rapidly and accurately. In this paper, we propose a novel approach, CNN-FSRF, for predicting PPIs from protein sequences by combining a deep-learning convolutional neural network (CNN) with a Feature-Selective Rotation Forest (FSRF). The proposed method first converts each protein sequence into a Position-Specific Scoring Matrix (PSSM) containing evolutionary information, then uses the CNN to objectively and efficiently extract deeply hidden features of the protein, and finally uses FSRF to remove redundant noise and produce accurate predictions. On the Yeast and Helicobacter pylori PPI datasets, CNN-FSRF achieved prediction accuracies of 97.75% and 88.96%, respectively. To further evaluate its prediction performance, we compared CNN-FSRF with SVM and other existing methods, and we also verified its performance on independent datasets. These excellent results indicate that CNN-FSRF can serve as a useful complement to biological experiments for identifying protein interactions.
Affiliation(s)
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China; Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, P.R. China
- Hai-Feng Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- San-Rong Liu
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Xin Yan
- School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Ke-Jian Song
- School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi, 341000, P.R. China
17
Chen ZH, Li LP, He Z, Zhou JR, Li Y, Wong L. An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation. Front Genet 2019; 10:90. [PMID: 30881376 PMCID: PMC6405691 DOI: 10.3389/fgene.2019.00090] [Received: 10/11/2018] [Accepted: 01/29/2019]
Abstract
Self-interacting proteins (SIPs), of which two or more identical copies can interact with each other, play significant roles in understanding cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. Therefore, there is an urgent need to develop computational methods for predicting SIPs. In this study, we propose a deep-forest-based predictor for accurate prediction of SIPs from protein sequence information. More specifically, we introduce a novel feature representation method that integrates the position-specific scoring matrix (PSSM) with the wavelet transform. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The proposed model achieved high accuracies of 95.43% and 93.65% on the human and yeast datasets, respectively, with AUC values of 0.9586 and 0.9203 on the human and yeast datasets, respectively. To further show the advantages of the proposed method, we compared it with several existing methods; the results demonstrate that it outperforms other SIP prediction methods. This work offers an effective framework for biologists to detect new SIPs.
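To show what "integrating the PSSM with the wavelet transform" can look like in practice, here is a minimal sketch using a column-wise orthonormal Haar transform. The wavelet family, the number of levels, the padding rule, and the pooling of coefficients into a fixed-length vector (mean absolute value per band and column) are all assumptions made for illustration; the abstract does not specify them, and `wavelet_features` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def haar_step(x: np.ndarray):
    """One level of the orthonormal Haar DWT along axis 0.

    Odd-length inputs are padded by repeating the last row (assumed rule).
    """
    if x.shape[0] % 2:
        x = np.concatenate([x, x[-1:]], axis=0)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass: local averages
    detail = (even - odd) / np.sqrt(2.0)   # high-pass: local differences
    return approx, detail

def wavelet_features(pssm: np.ndarray, levels: int = 2) -> np.ndarray:
    """Hypothetical fixed-length descriptor: per-column mean |coefficient| of
    each detail band, followed by the final approximation band."""
    feats = []
    approx = pssm.astype(float)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(np.abs(detail).mean(axis=0))
    feats.append(np.abs(approx).mean(axis=0))
    return np.concatenate(feats)

pssm = np.random.default_rng(0).normal(size=(37, 20))  # synthetic L x 20 matrix
print(wavelet_features(pssm).shape)  # (60,)
```

With `levels=2` and 20 PSSM columns the output has 3 × 20 = 60 entries for any sequence length, since each level contributes one pooled detail band and the last approximation band is added once.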
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Li-Ping Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zhou He
- College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
- Ji-Ren Zhou
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yangming Li
- ECTET, Rochester Institute of Technology, Rochester, NY, United States
- Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
18
Tian B, Wu X, Chen C, Qiu W, Ma Q, Yu B. Predicting protein–protein interactions by fusing various Chou's pseudo components and using wavelet denoising approach. J Theor Biol 2019; 462:329-346. [DOI: 10.1016/j.jtbi.2018.11.011] [Received: 09/13/2018] [Revised: 11/08/2018] [Accepted: 11/15/2018]
19
Zhang L, Yu G, Guo M, Wang J. Predicting protein-protein interactions using high-quality non-interacting pairs. BMC Bioinformatics 2018; 19:525. [PMID: 30598096 PMCID: PMC6311908 DOI: 10.1186/s12859-018-2525-3]
Abstract
BACKGROUND Identifying protein-protein interactions (PPIs) is of paramount importance for understanding cellular processes. Machine learning-based approaches have been developed to predict PPIs, but their effectiveness is unsatisfactory. One major reason is that they choose non-interacting protein pairs (negative samples) randomly or select low-quality non-interacting pairs heuristically. RESULTS To boost the effectiveness of PPI prediction, we propose two novel approaches (NIP-SS and NIP-RW) that generate high-quality non-interacting pairs based on sequence similarity and random walk, respectively. Specifically, known PPIs collected from public databases are used to generate the positive samples. NIP-SS then selects the top-m most dissimilar protein pairs as negative examples and controls the degree distribution of the selected proteins to construct the negative dataset. NIP-RW performs a random walk on the PPI network to update the adjacency matrix of the network, and then selects protein pairs that are not connected in the updated network as negative samples. Next, we use the auto covariance (AC) descriptor to encode the feature information of amino acid sequences. After that, we employ deep neural networks (DNNs) to predict PPIs from the extracted features and the positive and negative examples. Extensive experiments show that NIP-SS and NIP-RW generate negative samples of higher quality than existing strategies and thus enable more accurate prediction. CONCLUSIONS The experimental results show that negative datasets constructed by NIP-SS and NIP-RW reduce bias and generalize well. NIP-SS and NIP-RW can be used as plugins to boost the effectiveness of PPI prediction. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NIP .
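The NIP-RW idea (random walk on the PPI network, then take pairs that remain unconnected as negatives) can be sketched as follows. The abstract does not give the exact adjacency update rule, so this sketch makes an explicit assumption: the "updated" network scores a pair by the summed probability of reaching one protein from the other within `max_steps` random-walk steps, and only pairs with a score of exactly zero in both directions are kept as candidate negatives. The function name and parameter are hypothetical.

```python
import numpy as np

def random_walk_negatives(adj: np.ndarray, max_steps: int = 3):
    """Candidate non-interacting pairs from a symmetric PPI adjacency matrix.

    Assumption (illustrative, not the published NIP-RW update rule): pair (i, j)
    is scored by the summed probability of a random walk from i reaching j in
    1..max_steps steps; pairs scoring exactly zero both ways are returned.
    """
    degree = adj.sum(axis=1, keepdims=True)
    # Row-normalise to a transition matrix; isolated proteins keep all-zero rows.
    trans = np.divide(adj, degree, out=np.zeros(adj.shape), where=degree > 0)
    step = np.eye(adj.shape[0])
    reach = np.zeros(adj.shape)
    for _ in range(max_steps):
        step = step @ trans          # k-step transition probabilities
        reach += step
    n = adj.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if reach[i, j] == 0 and reach[j, i] == 0]

# Toy network with two components: 0-1-2 and 3-4.
edges = [(0, 1), (1, 2), (3, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
print(random_walk_negatives(adj, max_steps=3))
# [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]
```

Raising `max_steps` makes the filter stricter: more distant pairs inside a connected component acquire a non-zero score and are no longer treated as negatives.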
Affiliation(s)
- Long Zhang
- College of Computer and Information Sciences, Southwest University, Chongqing, China
- Guoxian Yu
- College of Computer and Information Sciences, Southwest University, Chongqing, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Jun Wang
- College of Computer and Information Sciences, Southwest University, Chongqing, China
20
Molecular Skin Surface-Based Transformation Visualization between Biological Macromolecules. J Healthc Eng 2017; 2017:4818604. [PMID: 29065609 PMCID: PMC5415869 DOI: 10.1155/2017/4818604] [Received: 11/03/2016] [Accepted: 01/10/2017]
Abstract
The molecular skin surface (MSS), proposed by Edelsbrunner, is a C²-continuous smooth surface model of biological macromolecules. Compared with traditional molecular surface representations (e.g., the solvent excluded surface), MSS has distinctive advantages, including freedom from self-intersection and being decomposable and transformable. To further promote MSS in bioinformatics, transformations between different MSS representations that mimic macromolecular dynamics are needed. Such transformations help biologists visually understand macromolecular dynamic processes at the atomic level, which is important for studying protein structures and binding sites to optimize drug design. However, modeling the transformation between different MSSs suffers from high computational cost, because traditional approaches reconstruct every intermediate MSS from the respective intermediate union of balls. In this study, we propose a novel computational framework, the general MSS transformation framework (GMSSTF), that transforms between two MSSs without the assistance of a union of balls. To evaluate the effectiveness of GMSSTF, we applied it to the popular public Protein Data Bank (PDB) database and compared existing MSS algorithms with and without GMSSTF. The simulation results show that GMSSTF effectively improves computational efficiency and is potentially useful for macromolecular dynamic simulations.
21
Accelerating smooth molecular surface calculation. J Math Biol 2017; 76:779-793. [PMID: 28689219 DOI: 10.1007/s00285-017-1156-z] [Received: 12/25/2015] [Revised: 04/06/2017]
Abstract
This study proposes a novel approach, the skin flow complex algorithm (SFCA), to decompose the molecular skin surface into topological disks. The main contributions of SFCA are a simple decomposition and fast calculation of the molecular skin surface. Unlike most existing works, which partition the molecular skin surface into sphere and hyperboloid patches, SFCA partitions it into triangular and rectangular quadratic patches. Each quadratic patch is proven to be a topological disk and is rendered as a rational Bézier patch, and the skin surface is constructed by assembling all the rational Bézier patches. Experimental results show that SFCA is more efficient than most existing algorithms and produces a triangulation of the molecular skin surface that is decomposable, deformable, smooth, watertight, and feature-preserving.
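Rendering a quadratic patch as a rational Bézier patch means evaluating a weighted ratio of Bernstein-basis combinations. The sketch below covers only the rectangular (tensor-product) case; SFCA's triangular patches use a different (barycentric) basis, and how SFCA derives control points and weights from the skin surface is not shown. The control points and weights in the usage example are hypothetical inputs chosen because they produce a known exact shape, a quarter of a unit cylinder.

```python
import numpy as np

def bernstein2(t: float) -> np.ndarray:
    """Quadratic Bernstein basis [B0, B1, B2] evaluated at t in [0, 1]."""
    return np.array([(1 - t) ** 2, 2 * t * (1 - t), t ** 2])

def rational_quadratic_patch(ctrl: np.ndarray, weights: np.ndarray,
                             u: float, v: float) -> np.ndarray:
    """Evaluate a tensor-product rational quadratic Bezier patch at (u, v).

    ctrl: (3, 3, 3) array of 3-D control points P_ij.
    weights: (3, 3) array of positive weights w_ij.
    S(u, v) = sum w_ij B_i(u) B_j(v) P_ij / sum w_ij B_i(u) B_j(v)
    """
    basis = np.outer(bernstein2(u), bernstein2(v)) * weights  # w_ij * B_i(u) * B_j(v)
    numerator = np.einsum("ij,ijk->k", basis, ctrl)
    return numerator / basis.sum()

# Quarter of a unit cylinder: along u, weights (1, sqrt(2)/2, 1) make the
# cross-section an exact circular arc; along v the arc is extruded linearly in z.
arc = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
ctrl = np.array([[[x, y, j / 2] for j in range(3)] for (x, y) in arc])
weights = np.outer([1.0, np.sqrt(2) / 2, 1.0], np.ones(3))
p = rational_quadratic_patch(ctrl, weights, 0.5, 0.25)
print(p)  # x**2 + y**2 equals 1 up to rounding, and z equals 0.25
```

Rational weights are what let a quadratic patch represent circular and other conic cross-sections exactly, which a polynomial (unit-weight) Bézier patch cannot do.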
22
Wang Y, You Z, Li X, Chen X, Jiang T, Zhang J. PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences. Int J Mol Sci 2017; 18:ijms18051029. [PMID: 28492483 PMCID: PMC5454941 DOI: 10.3390/ijms18051029] [Received: 03/24/2017] [Revised: 04/24/2017] [Accepted: 04/29/2017]
Abstract
Protein–protein interactions (PPIs) are essential to the processes of most living organisms. Thus, detecting PPIs is extremely important for understanding the molecular mechanisms of biological systems. Although high-throughput technologies have generated large amounts of PPI data for a variety of organisms, the whole interactome is still far from complete. In addition, high-throughput technologies for detecting PPIs have some unavoidable drawbacks, including long processing times, high cost, and high error rates. In recent years, with the development of machine learning, computational methods have been broadly used to predict PPIs and can achieve good prediction rates. In this paper, we present PCVMZM, a computational method based on a Probabilistic Classification Vector Machines (PCVM) model and a Zernike moments (ZM) descriptor for predicting PPIs from protein amino acid sequences. Specifically, the ZM descriptor is used to extract protein evolutionary information from the Position-Specific Scoring Matrix (PSSM) generated by the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then, the PCVM classifier is used to infer the interactions among proteins. On the Yeast and H. pylori PPI datasets, the proposed method achieves average prediction accuracies of 94.48% and 91.25%, respectively. To further evaluate the proposed method, we compared the PCVM model with a state-of-the-art support vector machine (SVM) classifier. Experimental results on the Yeast dataset show that the PCVM classifier performs better than the SVM classifier. These results indicate that the proposed method is robust, powerful, and feasible, and can serve as a helpful tool for proteomics research.
Affiliation(s)
- Yanbin Wang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Zhuhong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Xiao Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Tonghai Jiang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Jingting Zhang
- Department of Mathematics and Statistics, Henan University, Kaifeng 100190, China
23
Zhang J, Kurgan L. Review and comparative assessment of sequence-based predictors of protein-binding residues. Brief Bioinform 2017; 19:821-837. [DOI: 10.1093/bib/bbx022] [Received: 12/05/2016]
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
24
Translocation of a Polymer through a Crowded Channel under Electrical Force. Biomed Res Int 2017; 2017:5267185. [PMID: 28459062 PMCID: PMC5385253 DOI: 10.1155/2017/5267185] [Received: 02/04/2017] [Accepted: 03/09/2017]
Abstract
The translocation of a polymer chain through a crowded cylindrical channel is studied using Langevin dynamics simulations. The influences of the field strength F, the chain length N, and the crowding extent ρ on the translocation time τ are evaluated. The scaling relation τ ~ F^(−α) is observed, and the scaling exponent α becomes larger as the crowding extent ρ increases. For a noncrowded channel, the translocation probability drops as the field strength increases; for a highly crowded channel, the opposite holds. Moreover, both the translocation time and the average translocation time for all segments grow exponentially with the crowding extent. The shape factor 〈δ〉 reaches a maximum as the number of segments outside the channel, s, increases. Finally, the number of segments inside the channel, N_in, during translocation is calculated, and a peak is observed. These findings may benefit studies of protein translocation.
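A scaling relation such as τ ~ F^(−α) becomes a straight line in log-log coordinates, log τ = log c − α log F, so α can be estimated as the negative slope of a least-squares line fit. The sketch below checks this on synthetic, noise-free data; the numbers (prefactor 40, exponent 1.3, the force values) are hypothetical and are not results from the study.

```python
import numpy as np

def fit_scaling_exponent(force: np.ndarray, tau: np.ndarray) -> float:
    """Estimate alpha in tau ~ F**(-alpha) from a straight-line fit in log-log space."""
    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    slope, _intercept = np.polyfit(np.log(force), np.log(tau), deg=1)
    return float(-slope)

# Synthetic check (hypothetical numbers, not simulation output): tau = 40 * F**-1.3
F = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
tau = 40.0 * F ** -1.3
print(round(fit_scaling_exponent(F, tau), 6))  # 1.3
```

With real simulation data the points scatter around the line, and comparing fitted α values across crowding levels ρ is one way to quantify the reported increase of the exponent with crowding.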
25
Computational Approaches for Predicting Binding Partners, Interface Residues, and Binding Affinity of Protein-Protein Complexes. Methods Mol Biol 2017; 1484:237-253. [PMID: 27787830 DOI: 10.1007/978-1-4939-6406-2_16]
Abstract
Studying protein-protein interactions leads to a better understanding of the underlying principles of several biological pathways. Costly and labor-intensive experimental techniques motivate computational methods to complement them. Several state-of-the-art methods have been reported that analyze diverse aspects, such as binding partners, interface residues, and binding affinity of protein-protein complexes, with reliable performance. However, different methods have specific drawbacks that indicate the need for improvement. This review highlights available computational algorithms for analyzing diverse aspects of protein-protein interactions and underscores the need to develop new robust methods for gaining deep insights into protein-protein interactions.
26
Ens-PPI: A Novel Ensemble Classifier for Predicting the Interactions of Proteins Using Autocovariance Transformation from PSSM. Biomed Res Int 2016; 2016:4563524. [PMID: 27437399 PMCID: PMC4942601 DOI: 10.1155/2016/4563524] [Received: 04/09/2016] [Accepted: 05/08/2016]
Abstract
Protein-Protein Interactions (PPIs) play vital roles in most biological activities. Although high-throughput biological technologies have generated considerable PPI data for various organisms, many problems are still far from solved. A number of machine-learning-based computational methods have been developed to facilitate the identification of novel PPIs. In this study, a novel predictor was designed using the Rotation Forest (RF) algorithm combined with Autocovariance (AC) features extracted from the Position-Specific Scoring Matrix (PSSM). More specifically, PSSMs are generated from protein amino acid sequences. Then, an effective sequence-based feature representation, Autocovariance, is employed to extract features from the PSSMs. Finally, the RF model is used as a classifier to distinguish interacting from noninteracting protein pairs. The proposed method achieves promising prediction performance on the Yeast, H. pylori, and independent PPI datasets. These results show that the proposed model is suitable for PPI prediction and could also provide a useful supplementary tool for other bioinformatics problems.
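The auto covariance transformation turns a variable-length L × 20 PSSM into a fixed-length vector by measuring, for each column j and each lag, how strongly scores at positions i and i + lag co-vary: AC(lag, j) = 1/(L − lag) · Σ_i (P[i, j] − mean_j)(P[i + lag, j] − mean_j). The sketch below implements that formula; the maximum lag is a hypothetical parameter (the abstract does not state the value used), and the same formula applies when the matrix columns are physicochemical property values instead of PSSM scores.

```python
import numpy as np

def autocovariance(pssm: np.ndarray, max_lag: int) -> np.ndarray:
    """Auto covariance descriptor of an L x C matrix (C = 20 for a PSSM).

    AC(lag, j) = 1/(L - lag) * sum_i (P[i, j] - mean_j) * (P[i + lag, j] - mean_j)
    for lag = 1..max_lag and every column j; returns a vector of length C * max_lag.
    """
    length = pssm.shape[0]
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    centered = pssm - pssm.mean(axis=0)
    features = [
        # Element-wise product of the matrix with itself shifted by `lag` rows.
        (centered[:-lag] * centered[lag:]).sum(axis=0) / (length - lag)
        for lag in range(1, max_lag + 1)
    ]
    return np.concatenate(features)

pssm = np.random.default_rng(0).normal(size=(40, 20))  # synthetic stand-in for a PSSM
print(autocovariance(pssm, max_lag=5).shape)  # (100,)
```

For a protein pair, the two proteins' vectors are commonly concatenated before classification; that pairing step is an assumption here, since the abstract does not describe it.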
27
Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences. Biomed Res Int 2016; 2016:4783801. [PMID: 27314023 PMCID: PMC4893571 DOI: 10.1155/2016/4783801] [Received: 03/05/2016] [Accepted: 04/12/2016]
Abstract
We propose a novel computational method, RVM-LPQ, that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements come from representing protein sequences with LPQ features computed on a Position-Specific Scoring Matrix (PSSM), reducing the influence of noise with Principal Component Analysis (PCA), and using an RVM-based classifier. In 5-fold cross-validation experiments on the Yeast and Human datasets, we achieve high accuracies of 92.65% and 97.62%, respectively, which are significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that RVM-LPQ clearly outperforms the SVM-based method. These promising results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool for future proteomics research.
28
Huang YA, You ZH, Chen X, Chan K, Luo X. Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding. BMC Bioinformatics 2016; 17:184. [PMID: 27112932 PMCID: PMC4845433 DOI: 10.1186/s12859-016-1035-4] [Received: 07/21/2015] [Accepted: 04/12/2016]
Abstract
BACKGROUND Proteins are important molecules that participate in virtually every aspect of cellular function within an organism, often acting in pairs. Although high-throughput technologies have generated considerable protein-protein interaction (PPI) data for various species, experimental methods are both time-consuming and expensive. In addition, they are usually associated with high rates of both false-positive and false-negative results. Accordingly, a number of computational approaches have been developed to predict protein interactions effectively and accurately. However, most of these methods perform worse when other biological data sources (e.g., protein structure, protein domain, or gene neighborhood information) are not available. Therefore, there is an urgent need for effective computational methods that predict PPIs solely from protein sequence information. RESULTS In this study, we present a novel computational model that combines a weighted sparse representation based classifier (WSRC) with global encoding (GE) of amino acid sequences. Two kinds of protein descriptors, composition and transition, are extracted to represent each protein sequence. On the basis of this feature representation, a novel weighted sparse representation based classifier is introduced to predict the protein interaction class. When evaluated on the PPI data of S. cerevisiae, Human, and H. pylori, the proposed method achieved high prediction accuracies of 96.82%, 97.66%, and 92.83%, respectively. Extensive experiments were also performed for cross-species PPI prediction, and the prediction accuracies were very promising. CONCLUSIONS To further evaluate the proposed method, we compared its performance with that of a support vector machine (SVM)-based method. The results show that the proposed method achieved a significant improvement.
Thus, the proposed method is a very efficient method to predict PPIs and may be a useful supplementary tool for future proteomics studies.
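The composition and transition descriptors named in this abstract can be illustrated with a short Python sketch. It assumes a single three-class hydrophobicity grouping of the 20 standard amino acids (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW) purely for illustration; the grouping and the global encoding used in the paper may differ and are not reproduced here. Composition is the fraction of residues in each class, and transition is the fraction of adjacent residue pairs that switch between two different classes. All function names are hypothetical, and the classifier itself (WSRC) is not shown.

```python
from itertools import combinations

# Assumed three-class hydrophobicity grouping, used here for illustration only.
GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}
GROUP_NAMES = list(GROUPS)
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def to_group_sequence(seq):
    """Map each residue of a protein sequence to its (assumed) class name."""
    try:
        return [RESIDUE_TO_GROUP[aa] for aa in seq.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]!r}") from None


def composition(groups):
    """Fraction of residues falling into each class."""
    n = len(groups)
    return [groups.count(g) / n for g in GROUP_NAMES]


def transition(groups):
    """Fraction of adjacent residue pairs that switch between two classes (either direction)."""
    pairs = list(zip(groups, groups[1:]))
    denom = len(pairs) or 1  # a single residue has no adjacent pairs
    return [
        sum(1 for x, y in pairs if {x, y} == {a, b}) / denom
        for a, b in combinations(GROUP_NAMES, 2)
    ]


def ct_descriptor(seq):
    """Concatenated composition (3 values) and transition (3 values) descriptor."""
    if not seq:
        raise ValueError("empty sequence")
    groups = to_group_sequence(seq)
    return composition(groups) + transition(groups)


if __name__ == "__main__":
    # Hypothetical pair representation: concatenate the two proteins' descriptors.
    pair_vector = ct_descriptor("MKTAYIAK") + ct_descriptor("GSHMLE")
    print(len(pair_vector))  # 12
```

Concatenating the two proteins' vectors is one simple way to turn per-protein descriptors into a pair representation for a classifier; it is an assumption of this sketch rather than a detail taken from the abstract.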
Affiliation(s)
- Yu-An Huang: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Zhu-Hong You: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Xing Chen: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Keith Chan: Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong 999077, China
- Xin Luo: Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong 999077, China
29
Pai PP, Mondal S. MOWGLI: prediction of protein-MannOse interacting residues With ensemble classifiers usinG evoLutionary Information. J Biomol Struct Dyn 2015; 34:2069-83. [PMID: 26457920 DOI: 10.1080/07391102.2015.1106978] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
Abstract
Proteins interact with carbohydrates to carry out various cellular functions. Among the many carbohydrate ligands that proteins bind, mannose constitutes an important class and plays key roles in host defense mechanisms. Accurate identification of mannose-interacting residues (MIR) may provide important clues for deciphering the mechanisms underlying protein-mannose interactions during infection. This study proposes an ensemble of base classifiers that predicts MIR from their evolutionary information in the form of position-specific scoring matrices (PSSMs). The base classifiers are random forests trained on different subsets of the training data set Dset128 using 10-fold cross-validation. The optimized ensemble of base classifiers, MOWGLI, is then used to predict MIR on the protein chains of the test data set Dtestset29, where it showed promising performance with 92.0% prediction accuracy. Compared with the state of the art, precision improved by 26.6% overall. It is hoped that this approach, with its enhanced predictions, could eventually be used for applications in drug design and vaccine development.
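The abstract does not say how residue-level inputs are built from the PSSM or how the base classifiers' outputs are combined. The Python sketch below assumes two common choices, both hypothetical here: a centered sliding window of PSSM rows (zero-padded past the chain termini) as the feature vector for each residue, and a simple majority vote over the base classifiers' predicted labels. It is not MOWGLI's actual optimized ensemble, and the random forests themselves are not shown.

```python
from collections import Counter


def window_features(pssm, center, half_width):
    """Flatten the PSSM rows in a window centered on one residue.

    pssm: one row of scores per residue (typically 20 columns).
    Positions outside the chain are zero-padded (an assumption of this sketch).
    """
    if not pssm:
        raise ValueError("empty PSSM")
    n_cols = len(pssm[0])
    feats = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            feats.extend(float(x) for x in pssm[i])
        else:
            feats.extend([0.0] * n_cols)
    return feats


def majority_vote(labels):
    """Combine one predicted label per base classifier by simple majority.

    Ties are broken toward the smallest label (e.g. 0 = non-interacting),
    a conservative choice assumed for this sketch.
    """
    if not labels:
        raise ValueError("no base-classifier predictions")
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


if __name__ == "__main__":
    # Toy "PSSM" with 2 columns instead of 20, for readability.
    toy_pssm = [[1, 2], [3, 4], [5, 6]]
    print(window_features(toy_pssm, 0, 1))  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
    print(majority_vote([1, 0, 1]))  # 1
```

Zero padding keeps every window the same length, so residues near either terminus still yield fixed-size feature vectors that a random forest can consume.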
Affiliation(s)
- Priyadarshini P Pai: Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa 403726, India
- Sukanta Mondal: Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa 403726, India