1
|
Akbarzadeh S, Coşkun Ö, Günçer B. Studying protein-protein interactions: Latest and most popular approaches. J Struct Biol 2024; 216:108118. [PMID: 39214321 DOI: 10.1016/j.jsb.2024.108118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 08/20/2024] [Accepted: 08/23/2024] [Indexed: 09/04/2024]
Abstract
Protein-protein interactions (PPIs) are essential for many biological processes. Abnormal PPIs have been linked to several diseases, including cancer and infectious and neurological disorders. Consequently, targeting PPIs is a path toward disease treatment and a crucial tool for developing novel medications. Many methods exist to investigate PPIs, including low- and high-throughput studies. Although many PPIs have been discovered using in vitro and in vivo experimental approaches, the use of computational methods to predict PPIs has grown with the expanding scale of PPI data and the intrinsic complexity of interaction mechanisms. Mapping PPI networks offers a systematic means of predicting protein functions and the pathways in which the proteins take part. These investigations can help uncover the molecular mechanisms underlying complex phenotypes and clarify the biological processes related to health and disease. Our goal in this study is therefore to provide an overview of the latest and most popular approaches for investigating PPIs. We also review some important clinical approaches based on PPIs and how these interactions can be targeted.
Affiliation(s)
- Sama Akbarzadeh
- Department of Biophysics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye; Institute of Graduate Studies in Health Sciences, Istanbul University, Istanbul, Türkiye
| | - Özlem Coşkun
- Department of Biophysics, Faculty of Medicine, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
| | - Başak Günçer
- Department of Biophysics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye.
| |
|
2
|
Ko YS, Parkinson J, Liu C, Wang W. TUnA: an uncertainty-aware transformer model for sequence-based protein-protein interaction prediction. Brief Bioinform 2024; 25:bbae359. [PMID: 39051117 PMCID: PMC11269822 DOI: 10.1093/bib/bbae359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/31/2024] [Accepted: 07/10/2024] [Indexed: 07/27/2024] Open
Abstract
Protein-protein interactions (PPIs) are important for many biological processes, but predicting them from sequence data remains challenging. Existing deep learning models often cannot generalize to proteins not present in the training set and do not provide uncertainty estimates for their predictions. To address these limitations, we present TUnA, a Transformer-based uncertainty-aware model for PPI prediction. TUnA uses ESM-2 embeddings with Transformer encoders and incorporates a Spectral-normalized Neural Gaussian Process. TUnA achieves state-of-the-art performance and, importantly, evaluates uncertainty for unseen sequences. We demonstrate that TUnA's uncertainty estimates can effectively identify the most reliable predictions, significantly reducing false positives. This capability is crucial in bridging the gap between computational predictions and experimental validation.
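The idea that uncertainty estimates can pick out the most reliable predictions can be pictured with a simple filter. The sketch below is not TUnA's code: the `Prediction` record, the example scores, and both cut-offs are hypothetical, and it only assumes that each prediction comes with a probability and an uncertainty value in [0, 1].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    pair: Tuple[str, str]  # hypothetical protein identifiers
    probability: float     # predicted interaction probability
    uncertainty: float     # model-reported uncertainty, assumed to lie in [0, 1]


def confident_positives(preds: List[Prediction],
                        prob_cutoff: float = 0.5,
                        max_uncertainty: float = 0.2) -> List[Prediction]:
    """Keep predicted interactions whose uncertainty is at most max_uncertainty."""
    return [
        p for p in preds
        if p.probability >= prob_cutoff and p.uncertainty <= max_uncertainty
    ]


# Made-up predictions for illustration only.
preds = [
    Prediction(("P1", "P2"), 0.93, 0.05),  # confident positive: kept
    Prediction(("P1", "P3"), 0.71, 0.40),  # positive but uncertain: dropped
    Prediction(("P2", "P4"), 0.30, 0.10),  # predicted negative: dropped
]
kept = confident_positives(preds)
```

Tightening `max_uncertainty` trades coverage for reliability: fewer pairs are passed on to experimental validation, but a larger share of them is expected to be true interactions.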
Affiliation(s)
- Young Su Ko
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0359, United States
| | - Jonathan Parkinson
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0359, United States
| | - Cong Liu
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0359, United States
| | - Wei Wang
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0359, United States
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093-0359, United States
| |
|
3
|
Hao T, Zhang M, Song Z, Gou Y, Wang B, Sun J. Reconstruction of Eriocheir sinensis Protein-Protein Interaction Network Based on DGO-SVM Method. Curr Issues Mol Biol 2024; 46:7353-7372. [PMID: 39057077 PMCID: PMC11276262 DOI: 10.3390/cimb46070436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 06/25/2024] [Accepted: 07/10/2024] [Indexed: 07/28/2024] Open
Abstract
Eriocheir sinensis is an economically important aquatic animal. Its regulatory mechanisms underlying many biological processes are still vague due to the lack of systematic analysis tools. The protein-protein interaction network (PIN) is an important tool for the systematic analysis of regulatory mechanisms. In this work, a novel machine learning method, DGO-SVM, was applied to predict the protein-protein interaction (PPI) in E. sinensis, and its PIN was reconstructed. With the domain, biological process, molecular functions and subcellular locations of proteins as the features, DGO-SVM showed excellent performance in Bombyx mori, humans and five aquatic crustaceans, with 92-96% accuracy. With DGO-SVM, the PIN of E. sinensis was reconstructed, containing 14,703 proteins and 7,243,597 interactions, in which 35,604 interactions were associated with 566 novel proteins mainly involved in the response to exogenous stimuli, cellular macromolecular metabolism and regulation. The DGO-SVM demonstrated that the biological process, molecular functions and subcellular locations of proteins are significant factors for the precise prediction of PPIs. We reconstructed the largest PIN for E. sinensis, which provides a systematic tool for the regulatory mechanism analysis. Furthermore, the novel-protein-related PPIs in the PIN may provide important clues for the mechanism analysis of the underlying specific physiological processes in E. sinensis.
Affiliation(s)
| | | | | | | | - Bin Wang
- Tianjin Key Laboratory of Animal and Plant Resistance, College of Life Sciences, Tianjin Normal University, Tianjin 300387, China; (T.H.); (M.Z.); (Z.S.); (Y.G.)
| | - Jinsheng Sun
- Tianjin Key Laboratory of Animal and Plant Resistance, College of Life Sciences, Tianjin Normal University, Tianjin 300387, China; (T.H.); (M.Z.); (Z.S.); (Y.G.)
| |
|
4
|
Wang XW, Madeddu L, Spirohn K, Martini L, Fazzone A, Becchetti L, Wytock TP, Kovács IA, Balogh OM, Benczik B, Pétervári M, Ágg B, Ferdinandy P, Vulliard L, Menche J, Colonnese S, Petti M, Scarano G, Cuomo F, Hao T, Laval F, Willems L, Twizere JC, Vidal M, Calderwood MA, Petrillo E, Barabási AL, Silverman EK, Loscalzo J, Velardi P, Liu YY. Assessment of community efforts to advance network-based prediction of protein-protein interactions. Nat Commun 2023; 14:1582. [PMID: 36949045 PMCID: PMC10033937 DOI: 10.1038/s41467-023-37079-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/24/2022] [Accepted: 03/02/2023] [Indexed: 03/24/2023] Open
Abstract
Comprehensive understanding of the human protein-protein interaction (PPI) network, aka the human interactome, can provide important insights into the molecular mechanisms of complex biological processes and diseases. Despite the remarkable experimental efforts undertaken to date to determine the structure of the human interactome, many PPIs remain unmapped. Computational approaches, especially network-based methods, can facilitate the identification of previously uncharacterized PPIs. Many such methods have been proposed. Yet, a systematic evaluation of existing network-based methods in predicting PPIs is still lacking. Here, we report community efforts initiated by the International Network Medicine Consortium to benchmark the ability of 26 representative network-based methods to predict PPIs across six different interactomes of four different organisms: A. thaliana, C. elegans, S. cerevisiae, and H. sapiens. Through extensive computational and experimental validations, we found that advanced similarity-based methods, which leverage the underlying network characteristics of PPIs, show superior performance over other general link prediction methods in the interactomes we considered.
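To make "similarity-based link prediction" concrete, the sketch below ranks protein pairs that are not yet connected by how much their interaction neighborhoods overlap (a Jaccard score). This is a generic textbook baseline chosen for illustration, not one of the 26 benchmarked methods, and the toy network is made up.

```python
from itertools import combinations


def rank_candidate_links(edges):
    """Rank non-adjacent node pairs by the Jaccard overlap of their neighbor sets."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already a known interaction
        shared = neighbors[u] & neighbors[v]
        if shared:
            union = neighbors[u] | neighbors[v]
            scores[(u, v)] = len(shared) / len(union)

    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy interactome with hypothetical protein names.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
ranking = rank_candidate_links(toy_edges)
```

Here the pair (B, C) ranks first because the two proteins share all of their neighbors. The methods favored in the benchmark are described as exploiting network characteristics specific to PPIs, which a plain neighborhood-overlap score like this one does not attempt to capture.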
Affiliation(s)
- Xu-Wen Wang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
| | - Lorenzo Madeddu
- Translational and Precision Medicine Department Sapienza University of Rome, Rome, Italy
| | - Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Leonardo Martini
- Department of Computer, Control, and Management Engineering "Antonio Rubert", Sapienza University of Rome, Rome, Italy
| | | | - Luca Becchetti
- Department of Computer, Control, and Management Engineering "Antonio Rubert", Sapienza University of Rome, Rome, Italy
| | - Thomas P Wytock
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
| | - István A Kovács
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, 60208, USA
| | - Olivér M Balogh
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
| | - Bettina Benczik
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, 6722, Szeged, Hungary
| | - Mátyás Pétervári
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
| | - Bence Ágg
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, 6722, Szeged, Hungary
| | - Péter Ferdinandy
- Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, 6722, Szeged, Hungary
| | - Loan Vulliard
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Jörg Menche
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Faculty of Mathematics, University of Vienna, Vienna, Austria
| | - Stefania Colonnese
- Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
| | - Manuela Petti
- Department of Computer, Control, and Management Engineering "Antonio Rubert", Sapienza University of Rome, Rome, Italy
| | - Gaetano Scarano
- Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
| | - Francesca Cuomo
- Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
| | - Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Florent Laval
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Laboratory of Molecular and Cellular Epigenetic, GIGA Institute, University of Liège, Liège, Belgium
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
| | - Luc Willems
- Laboratory of Molecular and Cellular Epigenetic, GIGA Institute, University of Liège, Liège, Belgium
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
| | - Jean-Claude Twizere
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
| | - Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
| | - Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Enrico Petrillo
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Department of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA
| | - Albert-László Barabási
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Network Science Institute and Department of Physics, Northeastern University, Boston, MA, 02115, USA
- Department of Network and Data Science, Central European University, Budapest, H-1051, Hungary
| | - Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
| | - Joseph Loscalzo
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
| | - Paola Velardi
- Translational and Precision Medicine Department Sapienza University of Rome, Rome, Italy.
| | - Yang-Yu Liu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA.
- Center for Artificial Intelligence and Modeling, The Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA.
| |
|
5
|
Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023; 21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open
Abstract
Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada. Corresponding author.
| | - Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON K1N 6N5, Canada
| | | | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| |
|
6
|
Nambiar A, Liu S, Heflin M, Forsyth JM, Maslov S, Hopkins M, Ritz A. Transformer Neural Networks for Protein Family and Interaction Prediction Tasks. J Comput Biol 2023; 30:95-111. [PMID: 35950958 DOI: 10.1089/cmb.2022.0132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification while being much more general than other architectures. Further, our method outperforms other approaches for protein interaction prediction for two out of three different scenarios that we generated. These results offer a promising framework for fine-tuning the pre-trained sequence representations for other protein prediction tasks.
Affiliation(s)
- Ananthan Nambiar
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Simon Liu
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Maeve Heflin
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - John Malcolm Forsyth
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Sergei Maslov
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Mark Hopkins
- Department of Computer Science, Reed College, Portland, Oregon, USA
| | - Anna Ritz
- Department of Biology, Reed College, Portland, Oregon, USA
| |
|
7
|
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022] Open
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
| | - Herna Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
| | | | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
| |
|
8
|
Jiang Y, Wang Y, Shen L, Adjeroh DA, Liu Z, Lin J. Identification of all-against-all protein-protein interactions based on deep hash learning. BMC Bioinformatics 2022; 23:266. [PMID: 35804303 PMCID: PMC9264577 DOI: 10.1186/s12859-022-04811-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 06/17/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein-protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner. This is time-consuming. RESULTS In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with M proteins can be transformed into a much simpler problem: to find a number inside a sorted array of length M. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship. CONCLUSIONS The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from [Formula: see text] to [Formula: see text] for performing an all-against-all PPI prediction for a database with M proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.
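The two search steps described above (locating a number in a sorted array, then checking Hamming distance) can be sketched as follows. This is an illustration under stated assumptions, not the DHL-PPI implementation: the 8-bit codes, the `window` of neighboring entries, and the `max_distance` threshold are all hypothetical, and the deep network that would produce the hash codes is omitted.

```python
from bisect import bisect_left


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def prescreen(sorted_codes, query, window=2):
    """Binary-search the query's position and return the codes around it."""
    i = bisect_left(sorted_codes, query)  # O(log M) lookup in the sorted array
    return sorted_codes[max(0, i - window): i + window]


def verify(candidates, query, max_distance=1):
    """Keep only the candidates within max_distance bits of the query."""
    return [c for c in candidates if hamming(c, query) <= max_distance]


# Hypothetical 8-bit hash codes for a tiny protein database.
codes = sorted([0b10110010, 0b10110011, 0b01001100, 0b11110000, 0b10110110])
query = 0b10110010
hits = verify(prescreen(codes, query), query)
```

Numerically adjacent codes are not always close in Hamming distance, which is why the sketch keeps the verification step separate from the pre-screen; how DHL-PPI actually selects its candidate set is not specified in the abstract.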
Affiliation(s)
- Yue Jiang
- College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350108, People's Republic of China
| | - Yuxuan Wang
- No. 2 Thoracic Surgery Department Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, 101149, People's Republic of China
| | - Lin Shen
- College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350108, People's Republic of China
| | - Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, 26506, USA
| | - Zhidong Liu
- No. 2 Thoracic Surgery Department Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, 101149, People's Republic of China.
| | - Jie Lin
- College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350108, People's Republic of China.
| |
|
9
|
Martins YC, Ziviani A, Nicolás MF, de Vasconcelos ATR. Large-Scale Protein Interactions Prediction by Multiple Evidence Analysis Associated With an In-Silico Curation Strategy. Front Bioinform 2021; 1:731345. [PMID: 36303787 PMCID: PMC9581021 DOI: 10.3389/fbinf.2021.731345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2021] [Accepted: 08/23/2021] [Indexed: 11/17/2022] Open
Abstract
Predicting the physical or functional associations through protein-protein interactions (PPIs) represents an integral approach for inferring novel protein functions and discovering new drug targets during repositioning analysis. Recent advances in high-throughput data generation and multi-omics techniques have enabled large-scale PPI predictions, thus promoting several computational methods based on different levels of biological evidence. However, integrating multiple results and strategies to optimize the process, extract interaction features automatically, and scale up the entire PPI prediction process is still challenging. Most procedures do not offer an in-silico validation process to evaluate the predicted PPIs. In this context, this paper presents the PredPrIn scientific workflow, which enables PPI prediction based on multiple lines of evidence, including the structure, sequence, and functional annotation categories, by combining boosting and stacking machine learning techniques. We also present a pipeline (PPIVPro) for the validation process based on cellular co-localization filtering and a focused search of PPI evidence in scientific publications. Thus, our combined approach provides a means for large-scale training or prediction of new PPIs and a strategy to evaluate the prediction quality. PredPrIn and PPIVPro are publicly available at https://github.com/YasCoMa/predprin and https://github.com/YasCoMa/ppi_validation_process.
Affiliation(s)
- Yasmmin Côrtes Martins
- Bioinformatics Laboratory, National Laboratory of Scientific Computing, Petrópolis, Brazil
| | - Artur Ziviani
- Data Extreme Lab (DEXL), National Laboratory of Scientific Computing, Petrópolis, Brazil
| | - Marisa Fabiana Nicolás
- Bioinformatics Laboratory, National Laboratory of Scientific Computing, Petrópolis, Brazil
| | - Ana Tereza Ribeiro de Vasconcelos
- Bioinformatics Laboratory, National Laboratory of Scientific Computing, Petrópolis, Brazil
- *Correspondence: Ana Tereza Ribeiro de Vasconcelos,
| |
|
10
|
Pei F, Shi Q, Zhang H, Bahar I. Predicting Protein-Protein Interactions Using Symmetric Logistic Matrix Factorization. J Chem Inf Model 2021; 61:1670-1682. [PMID: 33831302 DOI: 10.1021/acs.jcim.1c00173] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Abstract
Accurate assessment of protein-protein interactions (PPIs) is critical to deciphering disease mechanisms and developing novel drugs, and with rapidly growing PPI data, the need for more efficient predictive methods is emerging. We propose here a symmetric logistic matrix factorization (symLMF)-based approach to predict PPIs, especially useful for large PPI networks. Benchmarked against two widely used datasets (Saccharomyces cerevisiae and Homo sapiens benchmarks) and their extended versions, the symLMF-based method proves to outperform most of the state-of-the-art data-driven methods applied to human PPIs, and it shows a performance comparable to those of deep learning methods despite its conceptual and technical simplicity and efficiency. Tests performed on humans, yeast, and tissue (brain and liver)- and disease (neurodegenerative and metabolic disorders)-specific datasets further demonstrate the high capability to capture the hidden interactions. Notably, many "de novo predictions" made by symLMF are verified to exist in PPI databases other than those used for training/testing the method, indicating that the method could be of broad utility as a simple, yet efficient and accurate, tool applicable to PPI datasets.
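A minimal sketch of the scoring rule that a symmetric logistic matrix factorization suggests: each protein gets one latent vector, and the interaction probability is the logistic function of the inner product of two such vectors. This assumes the generic logistic matrix-factorization form; the abstract does not give the paper's actual objective, bias terms, or regularization, and the latent vectors below are invented rather than learned.

```python
import math


def interaction_probability(u_i, u_j):
    """Logistic link applied to the inner product of two latent vectors."""
    score = sum(a * b for a, b in zip(u_i, u_j))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical 2-dimensional latent vectors for three proteins.
latent = {"P1": [1.2, -0.3], "P2": [0.9, 0.1], "P3": [-1.0, 0.8]}

p12 = interaction_probability(latent["P1"], latent["P2"])
p21 = interaction_probability(latent["P2"], latent["P1"])
p13 = interaction_probability(latent["P1"], latent["P3"])
```

Because both proteins are drawn from the same latent space, the score for (P1, P2) equals the score for (P2, P1), which matches the undirected nature of a PPI network. In a fitted model the vectors would be learned from known interactions instead of being set by hand.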
Affiliation(s)
| | - Qingya Shi
- School of Medicine, Tsinghua University, Beijing 100084, China
| | | | | |
|
11
|
Chen YM, Zu XP, Li D. Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction. Front Genet 2020; 11:569100. [PMID: 33193664 PMCID: PMC7581905 DOI: 10.3389/fgene.2020.569100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/03/2020] [Accepted: 09/09/2020] [Indexed: 12/03/2022] Open
Abstract
Tobacco mosaic virus (TMV) is widely distributed in the global tobacco industry and has a significant impact on tobacco production. It can reduce the amount of tobacco grown by 50-70%. In this study, we aimed to distinguish tobacco mosaic virus proteins from healthy tobacco leaf proteins by using machine learning approaches. The experimental results showed that the support vector machine (SVM) algorithm achieved high accuracy with different feature extraction methods, and that the 188-dimension feature extraction method improved the classification accuracy. The SVM algorithm and the 188-dimension feature extraction method were therefore selected as the final experimental methods. In 10-fold cross-validation, the SVM combined with the 188-dimension features achieved 93.5% accuracy on the training set and 92.7% accuracy on the independent validation set. In addition, the evaluation indices of the experimental results indicate that the method we developed is valid and robust.
Collapse
Affiliation(s)
| | | | - Dan Li
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
|
12
|
Zhang SW, Zhang XX, Fan XN, Li WN. LPI-CNNCP: Prediction of lncRNA-protein interactions by using convolutional neural network with the copy-padding trick. Anal Biochem 2020; 601:113767. [PMID: 32454029 DOI: 10.1016/j.ab.2020.113767] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2020] [Revised: 04/27/2020] [Accepted: 05/01/2020] [Indexed: 11/17/2022]
Abstract
Long noncoding RNAs (lncRNAs) play critical roles in many pathological and biological processes, such as post-transcription, cell differentiation and gene regulation. A growing number of studies have shown that lncRNAs function mainly through interactions with specific RNA binding proteins (RBPs). However, experimental identification of potential lncRNA-protein interactions is costly and time-consuming. In this work, we propose a novel convolutional neural network-based method with the copy-padding trick (named LPI-CNNCP) to predict lncRNA-protein interactions. The copy-padding trick of LPI-CNNCP converts variable-length protein/RNA sequences into fixed-length sequences, thus enabling the construction of the CNN model. A high-order one-hot encoding is also applied to transform the protein/RNA sequences into image-like inputs that capture the dependencies among amino acids (or nucleotides). Finally, these encoded protein/RNA sequences are fed into a CNN to predict the lncRNA-protein interactions. Compared with other state-of-the-art methods in a 10-fold cross-validation (10CV) test, LPI-CNNCP shows the best performance. Results in the independent test demonstrate that LPI-CNNCP can effectively predict potential lncRNA-protein interactions. We also compared the copy-padding trick with two other existing tricks (i.e., zero-padding and cropping), and the results show that our copy-padding trick outperforms the zero-padding and cropping tricks in predicting lncRNA-protein interactions. The source code of LPI-CNNCP and the datasets used in this work are available at https://github.com/NWPU-903PR/LPI-CNNCP for academic users.
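The difference between copy-padding and zero-padding can be sketched in a few lines. This is a minimal illustration that assumes copy-padding means repeating the sequence cyclically until the target length is reached and truncating the remainder; the function names and the pad symbol are hypothetical, and the exact scheme used by LPI-CNNCP may differ.

```python
def copy_pad(seq: str, target_len: int) -> str:
    """Repeat the sequence cyclically until it fills target_len (assumed scheme)."""
    if not seq:
        raise ValueError("cannot copy-pad an empty sequence")
    if len(seq) >= target_len:
        return seq[:target_len]  # longer sequences are simply cropped here
    repeats = -(-target_len // len(seq))  # ceiling division
    return (seq * repeats)[:target_len]


def zero_pad(seq: str, target_len: int, pad_char: str = "0") -> str:
    """Fill the tail with a placeholder symbol that carries no sequence information."""
    return seq[:target_len].ljust(target_len, pad_char)


print(copy_pad("ACGU", 10))  # ACGUACGUAC
print(zero_pad("ACGU", 10))  # ACGU000000
```

With copy-padding every position of the fixed-length input holds a real residue, whereas zero-padding leaves a block of uninformative positions at the end of short sequences.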
Affiliation(s)
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Xi-Xi Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Xiao-Nan Fan
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Wei-Na Li
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
|
13
|
Prediction of Protein-Protein Interactions Based on Domain. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2019; 2019:5238406. [PMID: 31531123 PMCID: PMC6720845 DOI: 10.1155/2019/5238406] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/20/2019] [Revised: 07/09/2019] [Accepted: 07/30/2019] [Indexed: 11/17/2022]
Abstract
Protein-protein interactions (PPIs) play a crucial role in various biological processes. To better understand the pathogenesis and treatment of various diseases, it is necessary to learn the details of these interactions. However, current experimental methods still suffer from many false positives and false negatives. Computational prediction of protein-protein interactions has therefore become an increasingly important approach that can overcome the obstacles of the experimental methods. In this work, we propose a novel computational domain-based method for PPI prediction, in which an SVM model is built on the physicochemical properties of the domains. The outputs of the SVM and the domain-domain score are used to construct the prediction model for protein-protein interactions. The predicted results demonstrate that the domain-based approach can enhance the ability to predict protein interactions.
|
14
|
Abstract
Background:
Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for efficiently and effectively identifying protein subcellular locations. In particular, the rapidly increasing number of protein sequences entering the genome databases has called for the development of automated analysis methods.
Methods:
In this review, we describe the recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) Protein subcellular location benchmark dataset construction, ii) Protein feature representation and feature descriptors, iii) Common machine learning algorithms, iv) Cross-validation test methods and assessment metrics, v) Web servers.
Result & Conclusion:
Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. One direction is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. Another is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.
Affiliation(s)
- Ting-He Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
|
15
|
Wang X, Wu Y, Wang R, Wei Y, Gui Y. A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences. PLoS One 2019; 14:e0217312. [PMID: 31173605 PMCID: PMC6555512 DOI: 10.1371/journal.pone.0217312] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2019] [Accepted: 05/08/2019] [Indexed: 12/20/2022] Open
Abstract
Protein-protein interactions (PPIs) play an important role in the life activities of organisms. With the availability of large amounts of protein sequence data, PPIs prediction methods have attracted increasing attention. A variety of protein sequence coding methods have emerged, but the training of these methods is particularly time consuming. To solve this issue, we have proposed a novel matrix sequence coding method. Based on deep neural network (DNN) and a novel matrix protein sequence descriptor, we constructed a protein interaction prediction model for predicting PPIs. When performed on human PPIs data, the method achieved an accuracy of 94.34%, a recall of 98.28%, an area under the curve (AUC) of 97.79% and a loss of 23.25%. A non-redundant dataset was used to evaluate this prediction model, and the prediction accuracy is 88.29%. These results indicate that the matrix of sequence (MOS) descriptor can enhance the predictive power of PPIs and reduce training time, which can be a useful complement for future proteomics research. The experimental code and experimental results can be found at https://github.com/smalltalkman/hppi-tensorflow.
Affiliation(s)
- Xue Wang
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- University of Science and Technology of China, HeFei City, AnHui Province, China
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
| | - Yuejin Wu
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- University of Science and Technology of China, HeFei City, AnHui Province, China
| | - Rujing Wang
- University of Science and Technology of China, HeFei City, AnHui Province, China
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
| | - Yuanyuan Wei
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
| | - Yuanmiao Gui
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- University of Science and Technology of China, HeFei City, AnHui Province, China
|
16
|
Exploring the Potential of Spherical Harmonics and PCVM for Compounds Activity Prediction. Int J Mol Sci 2019; 20:ijms20092175. [PMID: 31052500 PMCID: PMC6539940 DOI: 10.3390/ijms20092175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Revised: 04/14/2019] [Accepted: 04/29/2019] [Indexed: 01/11/2023] Open
Abstract
Biologically active chemical compounds may provide remedies for several diseases. Meanwhile, Machine Learning techniques applied to Drug Discovery, which are cheaper and faster than wet-lab experiments, have the capability to more effectively identify molecules with the expected pharmacological activity. Therefore, it is urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics fed into a Probabilistic Classification Vector Machines classifier, namely SHPCVM, for the compound activity prediction task. We make use of representation learning to acquire features that describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). Experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews’ Correlation Coefficient and Cohen’s kappa, reveal that combining our relatively short Spherical Harmonics-based representation with Probabilistic Classification Vector Machines can achieve very satisfactory performance for GPCRs.
|
17
|
Wang X, Wang R, Wei Y, Gui Y. A novel conjoint triad auto covariance (CTAC) coding method for predicting protein-protein interaction based on amino acid sequence. Math Biosci 2019; 313:41-47. [PMID: 31029609 DOI: 10.1016/j.mbs.2019.04.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2018] [Revised: 03/19/2019] [Accepted: 04/18/2019] [Indexed: 01/07/2023]
Abstract
Protein-protein interactions (PPIs) play a crucial role in the life-sustaining activities of organisms. Although various methods for the prediction of PPIs have been developed in the past decades, their robustness and prediction accuracy need to be improved. Therefore, it is necessary to develop an effective and accurate method to predict PPIs. In this paper, we propose a new sequence-based approach that combines a deep neural network (DNN) with conjoint triad auto covariance (CTAC) to improve the effectiveness of PPI prediction. The CTAC coding method combines the advantages of conjoint triad and auto covariance, and can therefore extract more PPI information from the amino acid sequence. The DNNCTAC model achieved an accuracy of 98.37%, a recall of 99.41%, an area under the curve (AUC) of 99.24% and a loss of 22.7% on the human dataset. These results indicate that DNNCTAC can enhance the predictive power for PPIs and significantly improve prediction accuracy, making it a useful complement to future proteomics research. The source code and all datasets are available at https://github.com/smalltalkman/hppi-tensorflow.
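The auto covariance component can be sketched as follows. This is a minimal illustration of the standard lagged auto covariance of one numeric per-residue profile; the function name and the example profile are hypothetical, and how CTAC combines this with conjoint triads is not reproduced here.

```python
def auto_covariance(profile, lag):
    """Average product of mean-centred values that are `lag` positions apart."""
    n = len(profile)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(profile)")
    mean = sum(profile) / n
    total = sum(
        (profile[i] - mean) * (profile[i + lag] - mean)
        for i in range(n - lag)
    )
    return total / (n - lag)


# Hypothetical per-residue property values (for example a hydrophobicity scale).
profile = [1.0, 2.0, 3.0, 4.0]
features = [auto_covariance(profile, lag) for lag in (1, 2)]
print(features)
```

Computing the value for a fixed set of lags yields a feature vector whose length does not depend on the sequence length, which is what makes such descriptors convenient inputs for a neural network.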
Affiliation(s)
- Xue Wang
- Institute of Technical Biology & Agriculture Engineering, Chinese Academy of Sciences, Science Island, HeFei City, AnHui Province 230031, China; Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Science Island, HeFei City, AnHui Province 230031, China; University of Science and Technology of China, Hefei City, Anhui Province 230026, China.
| | - Rujing Wang
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Science Island, HeFei City, AnHui Province 230031, China.
| | - Yuanyuan Wei
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Science Island, HeFei City, AnHui Province 230031, China.
| | - Yuanmiao Gui
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Science Island, HeFei City, AnHui Province 230031, China; University of Science and Technology of China, Hefei City, Anhui Province 230026, China.
|
18
|
GUI YUANMIAO, WANG RUJING, WEI YUANYUAN, WANG XUE. DNN-PPI: A LARGE-SCALE PREDICTION OF PROTEIN–PROTEIN INTERACTIONS BASED ON DEEP NEURAL NETWORKS. J BIOL SYST 2019. [DOI: 10.1142/s0218339019500013] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Protein–protein interaction (PPI) is very important for various biological processes and has given rise to a series of prediction-computing methods. In spite of different computing methods in relation to PPI prediction, PPI network projects fail to perform on a large scale. Aiming at ensuring that PPI can be predicted effectively, we used a deep neural network (DNN) for the study of PPI prediction that is based on an amino acid sequence. We present a novel DNN-PPI model with an auto covariance (AC) descriptor and a conjoint triad (CT) descriptor for the prediction of PPI that is based only on the protein sequence information. The 10-fold cross-validation indicated that the best DNN-PPI model with CT achieved 97.65% accuracy, 98.96% recall and a 98.51% area under the curve (AUC). The model exhibits a prediction accuracy of 94.20–97.10% for other external datasets. All of these suggest the high validity of the proposed algorithm in relation to various species.
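A conjoint triad descriptor can be sketched as below. The sketch assumes the commonly used grouping of the 20 amino acids into seven classes and a simple min-max normalization of the 343 triad counts; the function name is hypothetical, and the preprocessing used in DNN-PPI may differ.

```python
# Seven residue classes commonly used for conjoint triads (assumed grouping).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def conjoint_triad(seq: str) -> list:
    """Count every window of three consecutive residues by class triple
    (7 * 7 * 7 = 343 bins) and min-max normalize the counts."""
    counts = [0] * 343
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if all(aa in CLASS_OF for aa in window):  # skip non-standard residues
            a, b, c = (CLASS_OF[aa] for aa in window)
            counts[a * 49 + b * 7 + c] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * 343
    return [(value - lo) / (hi - lo) for value in counts]


vector = conjoint_triad("AGVAGV")
print(len(vector), vector[0])  # 343 1.0
```

A protein pair would then typically be represented by concatenating the two 343-dimensional vectors before passing them to the classifier.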
Affiliation(s)
- YUANMIAO GUI
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province 230031, P. R. China
- University of Science and Technology of China, HeFei City, AnHui Province 230026, P. R. China
| | - RUJING WANG
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province 230031, P. R. China
- University of Science and Technology of China, HeFei City, AnHui Province 230026, P. R. China
| | - YUANYUAN WEI
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province 230031, P. R. China
| | - XUE WANG
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province 230031, P. R. China
- University of Science and Technology of China, HeFei City, AnHui Province 230026, P. R. China
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province 230031, P. R. China
|
19
|
Niu Y, Wu H, Wang Y. Protein-Protein Interaction Identification Using a Similarity-Constrained Graph Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:607-616. [PMID: 29989990 DOI: 10.1109/tcbb.2017.2777448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Protein-protein interaction (PPI) identification is an important task in text mining. Most PPI detection systems make predictions solely based on evidence within a single sentence and often suffer from the heavy burden of manual annotation. This paper approaches PPI detection task from a different paradigm by investigating the context of protein pairs collected from a large corpus and their relations. First, crucial cues in the context are exploited to make initial predictions. Then, relational similarity between protein pairs is calculated. Finally, evidence from the two views is integrated in the framework of minimum cuts algorithm. Experimental results show that the graph model achieves better performance than standard supervised approaches. Using 20 percent data as the training set, our algorithm achieves higher accuracy than support vector machine (SVM) using 80 percent data as training data. Moreover, the semi-supervised settings reveal promising directions for PPI identification exploiting unlabeled data.
|
20
|
Prediction of drug-target interaction by integrating diverse heterogeneous information source with multiple kernel learning and clustering methods. Comput Biol Chem 2019; 78:460-467. [DOI: 10.1016/j.compbiolchem.2018.11.028] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2018] [Revised: 11/30/2018] [Accepted: 11/30/2018] [Indexed: 02/08/2023]
|
21
|
Xi J, Li A, Wang M. A novel unsupervised learning model for detecting driver genes from pan-cancer data through matrix tri-factorization framework with pairwise similarities constraints. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.03.026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
|
22
|
Peng X, Wang J, Peng W, Wu FX, Pan Y. Protein-protein interactions: detection, reliability assessment and applications. Brief Bioinform 2017; 18:798-819. [PMID: 27444371 DOI: 10.1093/bib/bbw066] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2016] [Indexed: 01/06/2023] Open
Abstract
Protein-protein interactions (PPIs) participate in all important biological processes in living organisms, such as catalyzing metabolic reactions, DNA replication, DNA transcription, responding to stimuli and transporting molecules from one location to another. To reveal the functional mechanisms in cells, it is important to identify the PPIs that take place in the living organism. A large number of PPIs have been discovered by high-throughput experiments and computational methods. However, false-positive PPIs have been introduced too. Therefore, to obtain reliable PPIs, many computational methods have been proposed. Generally, these methods can be classified into two categories. One category includes the methods that are designed to determine new reliable PPIs. The other is designed to assess the reliability of existing PPIs and filter out the unreliable ones. In this article, we review the two kinds of methods for detecting reliable PPIs, and then focus on evaluating the performance of some typical methods. We also enumerate several PPI network-based applications that take the reliability assessment of the PPI data into consideration. Finally, we discuss the challenges in obtaining reliable PPIs and future directions for the construction of reliable PPI networks. Our review provides readers with some guidance for choosing appropriate methods and features for obtaining reliable PPIs.
|
23
|
Li Y, Ilie L. SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome. BMC Bioinformatics 2017; 18:485. [PMID: 29141584 PMCID: PMC5688644 DOI: 10.1186/s12859-017-1871-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2017] [Accepted: 10/17/2017] [Indexed: 12/30/2022] Open
Abstract
Background Proteins usually perform their functions by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed, among which sequence-based ones are very promising. However, so far no such method is able to effectively predict the entire human interactome: they require too much time or memory. Results We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on the seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. Conclusion SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. Availability The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1871-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yiwei Li
- Department of Computer Science, The University of Western Ontario, London, N6A 5B7, Ontario, Canada
| | - Lucian Ilie
- Department of Computer Science, The University of Western Ontario, London, N6A 5B7, Ontario, Canada.
|
24
|
Gene Prediction in Metagenomic Fragments with Deep Learning. BIOMED RESEARCH INTERNATIONAL 2017; 2017:4740354. [PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 10/08/2017] [Indexed: 01/14/2023]
Abstract
Next generation sequencing technologies used in metagenomics yield numerous sequencing fragments which come from thousands of different species. Accurately identifying genes from metagenomics fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multifeatures (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using deep stacking networks learning model, we present a novel method (called Meta-MFDL) to predict the metagenomic genes. The results with 10 CV and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.
|
25
|
Sun T, Zhou B, Lai L, Pei J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 2017; 18:277. [PMID: 28545462 PMCID: PMC5445391 DOI: 10.1186/s12859-017-1700-2] [Citation(s) in RCA: 186] [Impact Index Per Article: 26.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2017] [Accepted: 05/18/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPI to better understand protein function, disease occurrence, and therapy design. Though various computational methods for predicting PPI have been developed, their robustness for prediction with external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested. RESULTS We used a stacked autoencoder, a type of deep-learning algorithm, to study the sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies for various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods. CONCLUSIONS To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.
Affiliation(s)
- Tanlin Sun
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
| | - Bo Zhou
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
| | - Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China.,Beijing National Laboratory for Molecular Science, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China.,Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
| | - Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China.
|
26
|
Chai J, Ju J, Zhang SW, Shen ZY, Liang L, Yang XM, Ma C, Ni QW, Sun MY. p12CDK2-AP1 interacts with CD82 to regulate the proliferation and survival of human oral squamous cell carcinoma cells. Oncol Rep 2016; 36:737-44. [PMID: 27349208 DOI: 10.3892/or.2016.4893] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2016] [Accepted: 02/20/2016] [Indexed: 11/05/2022] Open
Abstract
p12 cyclin-dependent kinase 2 (CDK2)-associating protein 1 (p12CDK2-AP1) has been demonstrated to negatively regulate the activity of CDK2. However, the underlying molecular mechanism remains largely unknown. We aimed to determine the potential binding proteins of p12CDK2-AP1 and to elucidate the role of p12CDK2-AP1 in the regulation of the proliferation, invasion, apoptosis, and in vivo growth of human oral squamous cell carcinoma cells. The protein-protein interaction was predicted using computational decision templates. The predicted p12CDK2‑AP1 interacting proteins were overexpressed in human oral squamous cell carcinoma OSCC-15 cells, and the protein binding was examined using co-precipitation (Co-IP). Cell proliferation and invasion were determined via MTT assay and Transwell system, respectively. Cell apoptosis was evaluated using Annexin V-FITC/PI double staining followed by flow cytometric analysis. The in vivo growth of OSCC-15 cells was examined in nude mouse tumor xenografts. We found that overexpression of either p12CDK2-AP1 or CD82 significantly suppressed the proliferation and invasion but promoted the apoptosis of OSCC-15 cells (P<0.05). Importantly, combined overexpression of p12CDK2-AP1 and CD82 showed synergistic antitumor activity compared with the overexpression of a single protein alone (P<0.05). Additionally, the simultaneous overexpression of p12CDK2-AP1 and CD82 significantly suppressed the in vivo tumor growth of OSCC-15 cells in nude mice compared with the negative control (P<0.05). Our findings indicate that p12CDK2-AP1 interacts with CD82 to play a functional role in suppressing the in vitro and in vivo growth of OSCC-15 cells.
Affiliation(s)
- Juan Chai
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
| | - Jun Ju
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
| | - Shao-Wu Zhang
- College of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, P.R. China
| | - Zhi-Yuan Shen
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
| | - Liang Liang
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
| | - Xiang-Ming Yang
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
| | - Chao Ma
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
| | - Qian-Wei Ni
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
| | - Mo-Yi Sun
- State Key Laboratory of Military Stomatology, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, P.R. China
|
27
|
Prediction of human protein–protein interaction by a domain-based approach. J Theor Biol 2016; 396:144-53. [DOI: 10.1016/j.jtbi.2016.02.026] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2015] [Revised: 01/29/2016] [Accepted: 02/20/2016] [Indexed: 02/04/2023]
|
28
|
Rigid-Docking Approaches to Explore Protein-Protein Interaction Space. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2016; 160:33-55. [PMID: 27830312 DOI: 10.1007/10_2016_41] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.
|
29
|
Yan XY, Zhang SW, Zhang SY. Prediction of drug–target interaction by label propagation with mutual interaction information derived from heterogeneous network. Molecular BioSystems 2016; 12:520-31. [DOI: 10.1039/c5mb00615e] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 12/29/2022]
Abstract
By implementing label propagation on a drug/target similarity network with mutual interaction information derived from the drug–target heterogeneous network, the LPMIHN algorithm identifies potential drug–target interactions.
Affiliation(s)
- Xiao-Ying Yan
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Song-Yao Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| |
|
30
|
Niu Y, Wang Y. Protein-protein interaction identification using a hybrid model. Artif Intell Med 2015; 64:185-93. [PMID: 26054427 DOI: 10.1016/j.artmed.2015.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/15/2014] [Revised: 05/13/2015] [Accepted: 05/15/2015] [Indexed: 11/26/2022]
Abstract
BACKGROUND: Most existing systems that identify protein-protein interaction (PPI) in literature make decisions solely on evidence within a single sentence and ignore the rich context of PPI descriptions in large corpora. Moreover, they often suffer from the heavy burden of manual annotation.
METHODS: To address these problems, a new relational-similarity (RS)-based approach exploiting context in large-scale text is proposed. A basic RS model is first established to make initial predictions. Then word similarity matrices that are sensitive to the PPI identification task are constructed using a corpus-based approach. Finally, a hybrid model is developed to integrate the word similarity model with the basic RS model.
RESULTS: The experimental results show that the basic RS model achieves F-scores much higher than a baseline of random guessing on interactions (from 50.6% to 75.0%) and non-interactions (from 49.4% to 74.2%). The hybrid model further improves F-score by about 2% on interactions and 3% on non-interactions.
CONCLUSION: The experimental evaluations conducted with PPIs in well-known databases showed the effectiveness of our approach that explores context information in PPI identification. This investigation confirmed that within the framework of relational similarity, the word similarity model relieves the data sparseness problem in similarity calculation.
Affiliation(s)
- Yun Niu
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao Street, Qinhuaiqu, Nanjing, Jiangsu 210016, China
| | - Yuwei Wang
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao Street, Qinhuaiqu, Nanjing, Jiangsu 210016, China
| |
|
31
|
Zhong WZ, Zhou SF. Molecular science for drug development and biomedicine. Int J Mol Sci 2014; 15:20072-8. [PMID: 25375190 PMCID: PMC4264156 DOI: 10.3390/ijms151120072] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 09/30/2014] [Accepted: 10/24/2014] [Indexed: 01/21/2023] Open
Affiliation(s)
- Wei-Zhu Zhong
- Gordon Life Science Institute, Belmont, MA 02478, USA.
| | - Shu-Feng Zhou
- Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL 33620, USA.
| |
|
32
|
Folador EL, Hassan SS, Lemke N, Barh D, Silva A, Ferreira RS, Azevedo V. An improved interolog mapping-based computational prediction of protein–protein interactions with increased network coverage. Integr Biol (Camb) 2014; 6:1080-7. [DOI: 10.1039/c4ib00136b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Automated and efficient methods that map ortholog interactions from several organisms and public databases (pDB) are needed to identify new interactions in an organism of interest (interolog mapping).
Affiliation(s)
- Edson Luiz Folador
- Department of General Biology, Instituto de Ciências Biológicas (ICB), Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Syed Shah Hassan
- Department of General Biology, Instituto de Ciências Biológicas (ICB), Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Ney Lemke
- Laboratory of Bioinformatic and Computational Biofisic, Instituto de Biociência, Universidade Estadual de São Paulo (UNESP), Botucatu, Brazil
| | - Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Purba Medinipur, India
| | - Artur Silva
- Instituto de Ciências Biológicas, Universidade Federal do Para, Belém, Brazil
| | - Rafaela Salgado Ferreira
- Department of Biochemistry and Immunology, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Vasco Azevedo
- Department of General Biology, Instituto de Ciências Biológicas (ICB), Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
| |
|