1. Jia X, Luo W, Li J, Xing J, Sun H, Wu S, Su X. A deep learning framework for predicting disease-gene associations with functional modules and graph augmentation. BMC Bioinformatics 2024; 25:214. [PMID: 38877401 DOI: 10.1186/s12859-024-05841-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 06/12/2024] [Indexed: 06/16/2024] Open
Abstract
BACKGROUND: The exploration of gene-disease associations is crucial for understanding the mechanisms underlying disease onset and progression, with significant implications for prevention and treatment strategies. Advances in high-throughput biotechnology have generated a wealth of data linking diseases to specific genes. While graph representation learning has recently introduced groundbreaking approaches for predicting novel associations, existing studies have consistently overlooked the cumulative impact of functional modules such as protein complexes and the incompleteness of important data such as protein interactions, which limits detection performance. RESULTS: To address these limitations, we introduce a deep learning framework called ModulePred for predicting disease-gene associations. ModulePred performs graph augmentation on the protein interaction network using L3 link prediction algorithms. It builds a heterogeneous module network by integrating disease-gene associations, protein complexes and augmented protein interactions, and develops a novel graph embedding for the heterogeneous module network. Subsequently, a graph neural network is constructed to learn node representations by collectively aggregating information from the topological structure, and gene prioritization is carried out using the disease and gene embeddings obtained from the graph neural network. Experimental results underscore the superiority of ModulePred, showcasing the effectiveness of incorporating functional modules and graph augmentation in predicting disease-gene associations. This research introduces innovative ideas and directions, enhancing the understanding and prediction of gene-disease relationships.
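The graph augmentation step mentioned in this abstract can be illustrated with a degree-normalized L3 score, which ranks unconnected protein pairs by the length-3 paths between them. This is only a minimal sketch: the abstract does not say which L3 variant ModulePred uses, so the normalization by the square root of the intermediate degrees is an assumption, and the function name `l3_scores` and the toy edge list are hypothetical.

```python
from itertools import combinations
from math import sqrt


def l3_scores(edges):
    """Score every unconnected node pair by degree-normalized length-3 paths.

    Assumed formulation: score(x, y) = sum over paths x-u-v-y of
    1 / sqrt(deg(u) * deg(v)). The input is assumed to have no self-loops.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # only score pairs that are not already linked
        total = 0.0
        for u in adj[x]:
            for v in adj[y]:
                if v in adj[u]:  # x-u-v-y is a path of length 3
                    total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
        if total > 0:
            scores[(x, y)] = total
    return scores


# Toy protein interaction network: a simple chain A-B-C-D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_scores = l3_scores(toy_edges)
```

In an augmentation setting, the top-scoring pairs would be added to the interaction network as predicted edges; how many to add is a design choice the abstract does not specify.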
Affiliation(s)
- Xianghu Jia
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China
- Weiwen Luo
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China
- Jiaqi Li
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China
- Jieqi Xing
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China
- Hongjie Sun
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China
- Shunyao Wu
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China.
- Xiaoquan Su
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China.
2. Jang YH, Han J, Shim SK, Cheong S, Lee SH, Han JK, Hwang CS. Cross-Wired Memristive Crossbar Array for Effective Graph Data Analysis. Advanced Materials (Deerfield Beach, Fla.) 2023:e2311040. [PMID: 38145578 DOI: 10.1002/adma.202311040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/22/2023] [Revised: 12/06/2023] [Indexed: 12/27/2023]
Abstract
Graphs adequately represent the enormous interconnections among numerous entities in big data, incurring high computational costs in analyzing them with conventional hardware. Physical graph representation (PGR) is an approach that replicates the graph within a physical system, allowing for efficient analysis. This study introduces a cross-wired crossbar array (cwCBA), uniquely connecting diagonal and non-diagonal components in a CBA by a cross-wiring process. The cross-wired diagonal cells enable cwCBA to achieve precise PGR and dynamic node state control. For this purpose, a cwCBA is fabricated using Pt/Ta2O5/HfO2/TiN (PTHT) memristor with high on/off and self-rectifying characteristics. The structural and device benefits of PTHT cwCBA for enhanced PGR precision are highlighted, and the practical efficacy is demonstrated for two applications. First, it executes a dynamic path-finding algorithm, identifying the shortest paths in a dynamic graph. PTHT cwCBA shows a more accurate inferred distance and ≈1/3800 lower processing complexity than the conventional method. Second, it analyzes the protein-protein interaction (PPI) networks containing self-interacting proteins, which possess intricate characteristics compared to typical graphs. The PPI prediction results exhibit an average of 30.5% and 21.3% improvement in area under the curve and F1-score, respectively, compared to existing algorithms.
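For orientation, shortest-path search in a dynamic graph can be sketched in software as recomputing distances after each edge update. The abstract does not name the conventional method it compares against, so the use of Dijkstra's algorithm here is an assumption, and `shortest_distances`, the toy graph, and its weights are hypothetical. The sketch does not model the memristive hardware in any way.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Toy directed graph; the edge B->C changes weight to mimic a dynamic update.
graph = {"A": {"B": 1.0, "C": 4.0}, "B": {"C": 1.0}, "C": {}}
before = shortest_distances(graph, "A")
graph["B"]["C"] = 5.0  # dynamic edge update
after = shortest_distances(graph, "A")
```

Recomputing from scratch after every update is the simplest baseline; its per-update cost is what makes dynamic graphs expensive on conventional hardware.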
Affiliation(s)
- Yoon Ho Jang
  - Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Janguk Han
  - Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Sung Keun Shim
  - Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Sunwoo Cheong
  - Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Soo Hyung Lee
  - Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Joon-Kyu Han
  - Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Cheol Seong Hwang
  - Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
3. Wang XW, Madeddu L, Spirohn K, Martini L, Fazzone A, Becchetti L, Wytock TP, Kovács IA, Balogh OM, Benczik B, Pétervári M, Ágg B, Ferdinandy P, Vulliard L, Menche J, Colonnese S, Petti M, Scarano G, Cuomo F, Hao T, Laval F, Willems L, Twizere JC, Vidal M, Calderwood MA, Petrillo E, Barabási AL, Silverman EK, Loscalzo J, Velardi P, Liu YY. Assessment of community efforts to advance network-based prediction of protein-protein interactions. Nat Commun 2023; 14:1582. [PMID: 36949045 PMCID: PMC10033937 DOI: 10.1038/s41467-023-37079-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/24/2022] [Accepted: 03/02/2023] [Indexed: 03/24/2023] Open
Abstract
Comprehensive understanding of the human protein-protein interaction (PPI) network, aka the human interactome, can provide important insights into the molecular mechanisms of complex biological processes and diseases. Despite the remarkable experimental efforts undertaken to date to determine the structure of the human interactome, many PPIs remain unmapped. Computational approaches, especially network-based methods, can facilitate the identification of previously uncharacterized PPIs. Many such methods have been proposed. Yet, a systematic evaluation of existing network-based methods in predicting PPIs is still lacking. Here, we report community efforts initiated by the International Network Medicine Consortium to benchmark the ability of 26 representative network-based methods to predict PPIs across six different interactomes of four different organisms: A. thaliana, C. elegans, S. cerevisiae, and H. sapiens. Through extensive computational and experimental validations, we found that advanced similarity-based methods, which leverage the underlying network characteristics of PPIs, show superior performance over other general link prediction methods in the interactomes we considered.
Affiliation(s)
- Xu-Wen Wang
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Lorenzo Madeddu
  - Translational and Precision Medicine Department Sapienza University of Rome, Rome, Italy
- Kerstin Spirohn
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Leonardo Martini
  - Department of Computer, Control, and Management Engineering "Antonio Rubert", Sapienza University of Rome, Rome, Italy
- Luca Becchetti
  - Department of Computer, Control, and Management Engineering "Antonio Rubert", Sapienza University of Rome, Rome, Italy
- Thomas P Wytock
  - Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
- István A Kovács
  - Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, 60208, USA
- Olivér M Balogh
  - Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Bettina Benczik
  - Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Pharmahungary Group, 6722, Szeged, Hungary
- Mátyás Pétervári
  - Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Bence Ágg
  - Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Pharmahungary Group, 6722, Szeged, Hungary
- Péter Ferdinandy
  - Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Pharmahungary Group, 6722, Szeged, Hungary
- Loan Vulliard
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
  - Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Jörg Menche
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
  - Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
  - Faculty of Mathematics, University of Vienna, Vienna, Austria
- Stefania Colonnese
  - Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
- Manuela Petti
  - Department of Computer, Control, and Management Engineering "Antonio Rubert", Sapienza University of Rome, Rome, Italy
- Gaetano Scarano
  - Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
- Francesca Cuomo
  - Department of Information Engineering, Electronics, and Telecommunications (DIET), University of Rome "Sapienza", Rome, Italy
- Tong Hao
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Florent Laval
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Laboratory of Molecular and Cellular Epigenetic, GIGA Institute, University of Liège, Liège, Belgium
  - Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
  - TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Luc Willems
  - Laboratory of Molecular and Cellular Epigenetic, GIGA Institute, University of Liège, Liège, Belgium
  - TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Jean-Claude Twizere
  - Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
  - TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Marc Vidal
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Michael A Calderwood
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Enrico Petrillo
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Department of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Albert-László Barabási
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Network Science Institute and Department of Physics, Northeastern University, Boston, MA, 02115, USA
  - Department of Network and Data Science, Central European University, Budapest, H-1051, Hungary
- Edwin K Silverman
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Joseph Loscalzo
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Paola Velardi
  - Translational and Precision Medicine Department Sapienza University of Rome, Rome, Italy.
- Yang-Yu Liu
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA.
  - Center for Artificial Intelligence and Modeling, The Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA.
4. Protein Integrated Network Analysis to Reveal Potential Drug Targets Against Extended Drug-Resistant Mycobacterium tuberculosis XDR1219. Mol Biotechnol 2021; 63:1252-1267. [PMID: 34382159 DOI: 10.1007/s12033-021-00377-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
The reconstruction and analysis of the protein-protein interaction (PPI) network is a powerful approach to understand the complex biological and molecular functions in normal and disease states of the cell. The interactome of most organisms is largely unidentified, except for some model organisms. The current study focused on the construction of a PPI network for the human pathogen Mycobacterium tuberculosis (MTB)-resistant strain XDR1219 using computational methods. In this work, a bioinformatics approach was employed to reveal potential drug targets. The pipeline combined several steps of an extensive integrated network analysis, which led to the identification of 22 key proteins involved in drug resistance, resistant metabolic pathways, virulence, pathogenesis and persistency of the infection. The MTB XDR1219 interactome consists of 11,383 non-redundant PPIs among 1499 proteins covering 38% of the entire MTB XDR1219 proteome. The overall quality of the network was assessed and topological parameters of the PPI network were calculated. The predicted interactions were functionally annotated and their relevance was assessed using functional similarity. The study presents the interactome of the previously uncharacterized MTB XDR1219 and reveals potential drug targets that can be further explored by the scientific community.
5. Salmanian S, Pezeshk H, Sadeghi M. Inter-protein residue covariation information unravels physically interacting protein dimers. BMC Bioinformatics 2020; 21:584. [PMID: 33334319 PMCID: PMC7745481 DOI: 10.1186/s12859-020-03930-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/12/2020] [Accepted: 12/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: Predicting physical interaction between proteins is one of the greatest challenges in computational biology. There is a considerable variety of protein interactions and a huge number of protein sequences and synthetic peptides with unknown interacting counterparts. Most co-evolutionary methods discover a combination of physical interplays and functional associations. However, only a handful of approaches specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to unravel specific physically interacting proteins. In this study, we introduce a hybrid co-evolutionary-based approach to predict physical interplays between pairs of protein families, starting from protein sequences only. RESULTS: In the present analysis, pairs of multiple sequence alignments are constructed for each dimer, and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. Then, all residue couplings between the proteins of each dimer are unified into a single Frobenius norm value. Norms of residue contact matrices of all dimers at different accessible surface area thresholds are fed into a support vector machine as single or multiple feature models. Training the classifiers on single features shows no apparent difference in accuracy between methods across accessible surface area thresholds. Nevertheless, the mutual information product and context likelihood of relatedness procedures may have roughly higher and lower overall performance, respectively, than the other two methods across accessible surface area cut-offs. The results also demonstrate that training a support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement in prediction performance. In this context, CCMpred achieves roughly better overall performance than mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. CONCLUSIONS: In this paper, by feeding norm values of protein dimers into support vector machines at different accessible surface area thresholds, we demonstrate that even a small number of proteins in pairs of multiple alignments can allow one to accurately discriminate between positive and negative dimers.
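The feature construction described in this abstract reduces each dimer's inter-protein coupling matrix to a single Frobenius norm, and the classifier is then judged by five standard metrics. The sketch below shows both calculations under stated assumptions: the coupling scores are taken as an already computed nested list, the function names `frobenius_norm` and `classification_metrics` and all toy values are hypothetical, and the covariation scoring (CCMpred or mutual information) and the support vector machine itself are not reproduced.

```python
from math import sqrt


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a residue-coupling matrix."""
    return sqrt(sum(value * value for row in matrix for value in row))


def classification_metrics(tp, fp, tn, fn):
    """Standard definitions of the five metrics named in the abstract."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Toy 2x2 coupling matrix between residues of two hypothetical proteins.
toy_couplings = [[3.0, 4.0], [0.0, 0.0]]
toy_feature = frobenius_norm(toy_couplings)

# Toy confusion counts, unrelated to the results reported in the abstract.
toy_metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
```

A multiple-feature model in the sense of the abstract would collect one such norm per accessible surface area threshold into a feature vector for each dimer.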
Affiliation(s)
- Sara Salmanian
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Hamid Pezeshk
  - School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
  - Present Address: Department of Mathematics and Statistics, Concordia University, Montreal, Canada
  - School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
- Mehdi Sadeghi
  - National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
6. Kundrotas PJ, Kotthoff I, Choi SW, Copeland MM, Vakser IA. Dockground Tool for Development and Benchmarking of Protein Docking Procedures. Methods Mol Biol 2020; 2165:289-300. [PMID: 32621232 DOI: 10.1007/978-1-0716-0708-4_17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
Databases of protein-protein complexes are essential for the development of protein modeling/docking techniques. Such databases provide a knowledge base for docking algorithms, intermolecular potentials, search procedures, scoring functions, and refinement protocols. Development of docking techniques requires systematic validation of the modeling protocols on carefully curated benchmark sets of complexes. We present a description and a guide to the DOCKGROUND resource (http://dockground.compbio.ku.edu) for structural modeling of protein interactions. The resource integrates various datasets of protein complexes and other data for the development and testing of protein docking techniques. The sets include bound complexes, experimentally determined unbound, simulated unbound, model-model complexes, and docking decoys. The datasets are available to the user community through a Web interface.
Affiliation(s)
- Petras J Kundrotas
  - Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA.
- Ian Kotthoff
  - Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
- Sherman W Choi
  - Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
- Matthew M Copeland
  - Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
- Ilya A Vakser
  - Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA.
7. V K MA, Chandrasekaran VM, Pandurangan S. Protein Domain Level Cancer Drug Targets in the Network of MAPK Pathways. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:2057-2065. [PMID: 29993692 DOI: 10.1109/tcbb.2018.2829507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Proteins in the MAPK pathways are considered potential drug targets for cancer treatment. Considering the pathways together with their cross-talk broadens the scope to a network of MAPK pathways. Side effect causing targeted domains act as a proxy for drug targets due to their structural similarity and the frequent reuse of their variants. We propose identifying non-repeatable protein domains as drug targets to disrupt signal transduction, rather than targeting the whole protein. A network-based approach is used to understand the contribution of 52 domains in non-hub, non-essential, and intra-pathway cancerous nodes and to identify potential drug target domains. Thirty-four distinct domains in the cancerous proteins play vital roles in making cancer a complex disease and pose challenges for identifying potential drug targets. The distribution of domain families follows a power law in the network. Single promiscuous domains contribute to the formation of hubs like Pkinase, Pkinase Tyr, and Ras. Hub nodes are positively correlated with the domain coverage, and targeting them would disrupt functional properties of the proteins. EIF 4EBP, alpha Kinase, Sel1, ROKNT, and KH 1 are the domains identified as potential domain targets for the disruption of the signaling mechanism involved in cancer.
8. Sim EUH, Talwar SP. In silico evidence of de novo interactions between ribosomal and Epstein-Barr virus proteins. BMC Mol Cell Biol 2019; 20:34. [PMID: 31416416 PMCID: PMC6694676 DOI: 10.1186/s12860-019-0219-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/28/2019] [Accepted: 08/08/2019] [Indexed: 12/29/2022] Open
Abstract
Background Association of Epstein-Barr virus (EBV) encoded latent gene products with host ribosomal proteins (RPs) has not been fully explored, despite their involvement in the aetiology of several human cancers. To gain an insight into their plausible interactions, we employed a computational approach that encompasses structural alignment, gene ontology analysis, pathway analysis, and molecular docking. Results In this study, the alignment analysis based on structural similarity allows the prediction of 48 potential interactions between 27 human RPs and the EBV proteins EBNA1, LMP1, LMP2A, and LMP2B. Gene ontology analysis of the putative protein-protein interactions (PPIs) reveals their probable involvement in RNA binding, ribosome biogenesis, metabolic and biosynthetic processes, and gene regulation. Pathway analysis shows their possible participation in viral infection strategies (viral translation), as well as oncogenesis (Wnt and EGFR signalling pathways). Finally, our molecular docking assay predicts the functional interactions of EBNA1 with four RPs individually: EBNA1-eS10, EBNA1-eS25, EBNA1-uL10 and EBNA1-uL11. Conclusion These interactions have never been revealed previously via either experimental or in silico approach. We envisage that the calculated interactions between the ribosomal and EBV proteins herein would provide a hypothetical model for future experimental studies on the functional relationship between ribosomal proteins and EBV infection.
Affiliation(s)
- Edmund Ui-Hang Sim
  - Faculty of Resource Science and Technology, Universiti Malaysia Sarawak, 94300, Kota Samarahan, Sarawak, Malaysia.
- Shruti Prashant Talwar
  - Faculty of Resource Science and Technology, Universiti Malaysia Sarawak, 94300, Kota Samarahan, Sarawak, Malaysia
9. Pearce R, Huang X, Setiawan D, Zhang Y. EvoDesign: Designing Protein-Protein Binding Interactions Using Evolutionary Interface Profiles in Conjunction with an Optimized Physical Energy Function. J Mol Biol 2019; 431:2467-2476. [PMID: 30851277 DOI: 10.1016/j.jmb.2019.02.028] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 11/03/2018] [Revised: 02/10/2019] [Accepted: 02/26/2019] [Indexed: 01/19/2023]
Abstract
EvoDesign (https://zhanglab.ccmb.med.umich.edu/EvoDesign) is an online server system for protein design. The method uses evolutionary profiles to guide the sequence search simulation and demonstrated significant advantages over physics-based approaches in terms of more accurately designing proteins that adopt desired target folds. Despite the success, the previous EvoDesign program focused only on monomer protein design, which limited its ability and usefulness in terms of designing functional proteins. In this work, we propose a new EvoDesign server, which extends the principles of evolution-based design to design protein-protein interactions. Starting from a two-chain complex structure, structurally similar interfaces are identified from known protein-protein interaction databases. An interface evolutionary profile is then constructed from a multiple sequence alignment of the interface analogies, which is combined with a newly developed, atomic-level physical energy function to guide the replica-exchange Monte Carlo simulation search. The purpose of the server is to redesign the specified complex chain to increase its stability and binding affinity for the other chain in the complex. With the improved scope and accuracy of the methodology, the new EvoDesign pipeline should become a useful online tool for functional protein design and drug discovery studies.
Affiliation(s)
- Robin Pearce
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiaoqiang Huang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Dani Setiawan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
10. Tian B, Wu X, Chen C, Qiu W, Ma Q, Yu B. Predicting protein–protein interactions by fusing various Chou's pseudo components and using wavelet denoising approach. J Theor Biol 2019; 462:329-346. [DOI: 10.1016/j.jtbi.2018.11.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/13/2018] [Revised: 11/08/2018] [Accepted: 11/15/2018] [Indexed: 12/26/2022]
11. Kundrotas PJ, Anishchenko I, Dauzhenka T, Kotthoff I, Mnevets D, Copeland MM, Vakser IA. Dockground: A comprehensive data resource for modeling of protein complexes. Protein Sci 2017; 27:172-181. [PMID: 28891124 DOI: 10.1002/pro.3295] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 06/30/2017] [Revised: 09/06/2017] [Accepted: 09/07/2017] [Indexed: 12/28/2022]
Abstract
Characterization of life processes at the molecular level requires structural details of protein interactions. The number of experimentally determined structures of protein-protein complexes accounts only for a fraction of known protein interactions. This gap in structural description of the interactome has to be bridged by modeling. An essential part of the development of structural modeling/docking techniques for protein interactions is databases of protein-protein complexes. They are necessary for studying protein interfaces, providing a knowledge base for docking algorithms, and developing intermolecular potentials, search procedures, and scoring functions. Development of protein-protein docking techniques requires thorough benchmarking of different parts of the docking protocols on carefully curated sets of protein-protein complexes. We present a comprehensive description of the Dockground resource (http://dockground.compbio.ku.edu) for structural modeling of protein interactions, including previously unpublished unbound docking benchmark set 4, and the X-ray docking decoy set 2. The resource offers a variety of interconnected datasets of protein-protein complexes and other data for the development and testing of different aspects of protein docking methodologies. Based on protein-protein complexes extracted from the PDB biounit files, Dockground offers sets of X-ray unbound, simulated unbound, model, and docking decoy structures. All datasets are freely available for download, as a whole or selecting specific structures, through a user-friendly interface on one integrated website.
Collapse
Affiliation(s)
- Petras J Kundrotas
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66045
| | - Ivan Anishchenko
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66045
| | - Taras Dauzhenka
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66045
| | - Ian Kotthoff
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66045
| | - Daniil Mnevets
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66045
| | - Matthew M Copeland
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66045
| | - Ilya A Vakser
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66045; Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66045
| |
|
12
|
Tian B, Zhao C, Gu F, He Z. A two-step framework for inferring direct protein-protein interaction network from AP-MS data. BMC SYSTEMS BIOLOGY 2017; 11:82. [PMID: 28950876 PMCID: PMC5615237 DOI: 10.1186/s12918-017-0452-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Abstract
BACKGROUND Affinity purification-mass spectrometry (AP-MS) has been widely used for generating bait-prey data sets so as to identify underlying protein-protein interactions and protein complexes. However, the AP-MS data sets in terms of bait-prey pairs are highly noisy, where candidate pairs contain many false positives. Recently, numerous computational methods have been developed to identify genuine interactions from AP-MS data sets. However, most of these methods aim at removing false positives that contain contaminants, ignoring the distinction between direct interactions and indirect interactions. RESULTS In this paper, we present an initialization-and-refinement framework for inferring direct PPI networks from AP-MS data, in which an initial network is first generated with existing scoring methods and then a refined network is constructed by the application of indirect association removal methods. Experimental results on several real AP-MS data sets show that our method is capable of identifying more direct interactions than traditional scoring methods. CONCLUSIONS The proposed framework is sufficiently general to incorporate any feasible methods in each step so as to have potential for handling different types of AP-MS data in future applications.
Affiliation(s)
- Bo Tian
- School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
| | - Can Zhao
- School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
| | - Feiyang Gu
- School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
| | - Zengyou He
- School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China; Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Tuqiang Road 321, Dalian, 116600, China
| |
|
13
|
Abstract
Attachment of ubiquitin to proteins relies on a sophisticated enzyme cascade that is tightly regulated. The machinery of ubiquitylation responds to a range of signals, which remarkably includes ubiquitin itself. Thus, ubiquitin is not only the central player in the ubiquitylation cascade but also a key regulator. The ubiquitin E3 ligases provide specificity to the cascade and often bind the substrate, while the ubiquitin-conjugating enzymes (E2s) have a pivotal role in determining chain linkage and length. Interaction of ubiquitin with the E2 is important for activity, but the weak nature of these contacts has made them hard to identify and study. By reviewing available crystal structures, we identify putative ubiquitin binding sites on E2s, which may enhance E2 processivity and the assembly of chains of a defined linkage. The implications of these new sites are discussed in the context of known E2-ubiquitin interactions.
|
14
|
A Comprehensive Guide for Performing Sample Preparation and Top-Down Protein Analysis. Proteomes 2017; 5:proteomes5020011. [PMID: 28387712 PMCID: PMC5489772 DOI: 10.3390/proteomes5020011] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2016] [Revised: 04/04/2017] [Accepted: 04/04/2017] [Indexed: 12/21/2022] Open
Abstract
Methodologies for the global analysis of proteins in a sample, or proteome analysis, have been available since 1975 when Patrick O’Farrell published the first paper describing two-dimensional gel electrophoresis (2D-PAGE). This technique allowed the resolution of single protein isoforms, or proteoforms, into single ‘spots’ in a polyacrylamide gel, allowing the quantitation of changes in a proteoform’s abundance to ascertain changes in an organism’s phenotype when conditions change. In pursuit of the comprehensive profiling of the proteome, significant advances in technology have made the identification and quantitation of intact proteoforms from complex mixtures of proteins more routine, allowing analysis of the proteome from the ‘Top-Down’. However, the number of proteoforms detected by Top-Down methodologies such as 2D-PAGE or mass spectrometry has not significantly increased since O’Farrell’s paper when compared to Bottom-Up, peptide-centric techniques. This article explores and explains the numerous methodologies and technologies available to analyse the proteome from the Top-Down with a strong emphasis on the necessity to analyse intact proteoforms as a better indicator of changes in biology and phenotype. We arrive at the conclusion that the complete and comprehensive profiling of an organism’s proteome is still, at present, beyond our reach but the continuing evolution of protein fractionation techniques and mass spectrometry brings comprehensive Top-Down proteome profiling closer.
|
15
|
Etrych T, Boustta M, Leclercq L, Vert M. Release of Polyanions from Polyelectrolyte Complexes by Selective Degradation of the Polycation. J BIOACT COMPAT POL 2016. [DOI: 10.1177/0883911506062974] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
One of the major problems associated with analyzing polyelectrolyte complexes is the separation of strongly bound oppositely charged polymeric components. As part of a work aimed at better understanding the factors that affect polyelectrolyte complex formation and stability, an investigation of the possibility to release and analyze the polyanion, after hydrolytic or enzymatic degradation of the partner polycation, was made. Mixtures of poly(acrylic acid) or poly(L-lysine citramide) polyanions with poly(L-lysine) or poly(amino serinate) polycations were investigated. For each polycation-polyanion couple, four complex fractions were obtained by adding the polycation to the polyanion according to a titration protocol. The selective degradation of the polycation within the different complex fractions was investigated after the complex was disrupted with a NaCl solution. The molecular weights of the recovered polyanionic macromolecules were assessed by both static light scattering and size exclusion chromatography. The data supported previous findings that complexation was selective according to the molecular weight of the polyanion for a given polycation. The lower the degree of neutralization of the polyanion negative charges by the polycation positive charges, the greater the molecular weight of the complexed polyanionic macromolecules.
Affiliation(s)
- T. Etrych
- Institute of Macromolecular Chemistry, Academy of Sciences of the Czech Republic, Heyrovsky sq. 2, Prague 6, 162 06, Czech Republic
| | - M. Boustta
- Research Centre for Artificial Biopolymers - UMR CNRS 5473, University of Montpellier 1 - Faculty of Pharmacy, 15 Avenue Charles Flahault - BP 14491, F-34093 Montpellier Cedex 5, France
| | - L. Leclercq
- Research Centre for Artificial Biopolymers - UMR CNRS 5473, University of Montpellier 1 - Faculty of Pharmacy, 15 Avenue Charles Flahault - BP 14491, F-34093 Montpellier Cedex 5, France
| | - M. Vert
- Research Centre for Artificial Biopolymers - UMR CNRS 5473, University of Montpellier 1 - Faculty of Pharmacy, 15 Avenue Charles Flahault - BP 14491, F-34093 Montpellier Cedex 5, France
| |
|
16
|
Affiliation(s)
- Fangqiang Zhu
- Department of Physics, Indiana University - Purdue University, Indianapolis, Indiana 46202, United States
| | - Bo Chen
- Department of Physics, University of Central Florida, Orlando, Florida 32816, United States
| |
|
17
|
Anishchenko I, Kundrotas PJ, Tuzikov AV, Vakser IA. Structural templates for comparative protein docking. Proteins 2015; 83:1563-70. [PMID: 25488330 DOI: 10.1002/prot.24736] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2014] [Revised: 11/15/2014] [Accepted: 11/26/2014] [Indexed: 11/07/2022]
Abstract
Structural characterization of protein-protein interactions is important for understanding life processes. Because of the inherent limitations of experimental techniques, such characterization requires computational approaches. Along with the traditional protein-protein docking (free search for a match between two proteins), comparative (template-based) modeling of protein-protein complexes has been gaining popularity. Its development puts an emphasis on full and partial structural similarity between the target protein monomers and the protein-protein complexes previously determined by experimental techniques (templates). The template-based docking relies on the quality and diversity of the template set. We present a carefully curated, nonredundant library of templates containing 4950 full structures of binary complexes and 5936 protein-protein interfaces extracted from the full structures at 12 Å distance cut-off. Redundancy in the libraries was removed by clustering the PDB structures based on structural similarity. The value of the clustering threshold was determined from the analysis of the clusters and the docking performance on a benchmark set. High structural quality of the interfaces in the template and validation sets was achieved by automated procedures and manual curation. The library is included in the Dockground resource for molecular recognition studies at http://dockground.bioinformatics.ku.edu.
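The interface extraction mentioned in this abstract can be illustrated with a minimal sketch. It assumes each residue is represented by one atom (for example Cα) and that a residue belongs to the interface when that atom lies within the cut-off of any residue in the partner chain. The abstract gives only the 12 Å value, so the choice of atoms, the function name `interface_residues`, and the toy coordinates are all hypothetical.

```python
from math import dist

CUTOFF_ANGSTROM = 12.0  # distance cut-off quoted in the abstract


def interface_residues(chain_a, chain_b, cutoff=CUTOFF_ANGSTROM):
    """Return the residue ids of each chain that lie within `cutoff` of the other chain.

    chain_a, chain_b: dict mapping residue id -> (x, y, z) of one
    representative atom (an assumption of this sketch).
    """
    near_a, near_b = set(), set()
    for res_a, xyz_a in chain_a.items():
        for res_b, xyz_b in chain_b.items():
            if dist(xyz_a, xyz_b) <= cutoff:
                near_a.add(res_a)
                near_b.add(res_b)
    return near_a, near_b


# Toy coordinates in angstroms, invented for illustration only.
chain_a = {1: (0.0, 0.0, 0.0), 2: (30.0, 0.0, 0.0)}
chain_b = {101: (5.0, 0.0, 0.0), 102: (50.0, 0.0, 0.0)}

interface_a, interface_b = interface_residues(chain_a, chain_b)
```

The all-against-all loop is quadratic in the number of residues, which is acceptable for a binary complex; a spatial index would be the usual choice for larger sets.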
Affiliation(s)
- Ivan Anishchenko
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047; United Institute of Informatics Problems, National Academy of Sciences, Minsk, 220012, Belarus
| | - Petras J Kundrotas
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047
| | - Alexander V Tuzikov
- United Institute of Informatics Problems, National Academy of Sciences, Minsk, 220012, Belarus
| | - Ilya A Vakser
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047; Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66045
| |
|
18
|
Malhotra S, Mathew OK, Sowdhamini R. DOCKSCORE: a webserver for ranking protein-protein docked poses. BMC Bioinformatics 2015; 16:127. [PMID: 25902779 PMCID: PMC4414291 DOI: 10.1186/s12859-015-0572-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2014] [Accepted: 04/13/2015] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND Proteins interact with a variety of other molecules such as nucleic acids, small molecules and other proteins inside the cell. Structure determination of protein-protein complexes is challenging for several reasons, such as the large molecular weights of these macromolecular complexes, their dynamic nature, and difficulty in purification and sample preparation. Computational docking permits an early understanding of the feasibility and mode of protein-protein interactions. However, docking algorithms propose a number of solutions and it is a challenging task to select the native or near-native pose(s) from this pool. DockScore is an objective scoring scheme that can be used to rank protein-protein docked poses. It considers several interface parameters, namely, surface area, evolutionary conservation, hydrophobicity, short contacts and spatial clustering at the interface for scoring. RESULTS We have implemented DockScore in the form of a webserver for use by the scientific community. The DockScore webserver can be employed, subsequent to docking, to perform scoring of the docked solutions, starting from multiple poses as inputs. The results, on scores and ranks for all the poses, can be downloaded as a csv file, and a graphical view of the interface of the best-ranking poses is possible. CONCLUSIONS The webserver for DockScore is made freely available for the scientific community at: http://caps.ncbs.res.in/dockscore/.
Affiliation(s)
- Sony Malhotra
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore, 560 065, India.
| | - Oommen K Mathew
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore, 560 065, India; SASTRA University, Tirumalaisamudram, Thanjavur, 613 401, Tamil Nadu, India.
| | - Ramanathan Sowdhamini
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore, 560 065, India.
| |
|
19
|
Maheshwari S, Brylinski M. Predicting protein interface residues using easily accessible on-line resources. Brief Bioinform 2015; 16:1025-34. [PMID: 25797794 DOI: 10.1093/bib/bbv009] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2014] [Indexed: 01/20/2023] Open
Abstract
It has been more than a decade since the completion of the Human Genome Project that provided us with a complete list of human proteins. The next obvious task is to figure out how various parts interact with each other. On that account, we review 10 methods for protein interface prediction, which are freely available as web servers. In addition, we comparatively evaluate their performance on a common data set comprising different quality target structures. We find that using experimental structures and high-quality homology models, structure-based methods outperform those using only protein sequences, with global template-based approaches providing the best performance. For moderate-quality models, sequence-based methods often perform better than those structure-based techniques that rely on fine atomic details. We note that post-processing protocols implemented in several methods quantitatively improve the results only for experimental structures, suggesting that these procedures should be tuned up for computer-generated models. Finally, we anticipate that advanced meta-prediction protocols are likely to enhance interface residue prediction. Notwithstanding further improvements, easily accessible web servers already provide the scientific community with convenient resources for the identification of protein-protein interaction sites.
|
20
|
|
21
|
Nayarisseri A, Yadav M, Wishard R. Computational evaluation of new homologous down regulators of translationally controlled tumor protein (TCTP) targeted for tumor reversion. Interdiscip Sci 2014; 5:274-9. [DOI: 10.1007/s12539-013-0183-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2012] [Revised: 06/18/2012] [Accepted: 06/25/2012] [Indexed: 01/13/2023]
|
22
|
Shi H, Xu J, Zhang G, Xu L, Li C, Wang L, Zhao Z, Jiang W, Guo Z, Li X. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC SYSTEMS BIOLOGY 2013; 7:101. [PMID: 24103777 PMCID: PMC4124764 DOI: 10.1186/1752-0509-7-101] [Citation(s) in RCA: 183] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/18/2013] [Accepted: 10/03/2013] [Indexed: 12/19/2022]
Abstract
BACKGROUND MicroRNAs (miRNAs) are important post-transcriptional regulators that have been demonstrated to play an important role in human diseases. Elucidating the associations between miRNAs and diseases at the systematic level will deepen our understanding of the molecular mechanisms of diseases. However, miRNA-disease associations identified by previous computational methods are far from complete and more effort is needed. RESULTS We developed a computational framework to identify miRNA-disease associations by performing random walk analysis, and focused on the functional link between miRNA targets and disease genes in protein-protein interaction (PPI) networks. Furthermore, a bipartite miRNA-disease network was constructed, from which several miRNA-disease co-regulated modules were identified by hierarchical clustering analysis. Our approach achieved satisfactory performance in identifying known cancer-related miRNAs for nine human cancers with an area under the ROC curve (AUC) ranging from 71.3% to 91.3%. By systematically analyzing the global properties of the miRNA-disease network, we found that only a small number of miRNAs regulated genes involved in various diseases, genes associated with neurological diseases were preferentially regulated by miRNAs and some immunological diseases were associated with several specific miRNAs. We also observed that most diseases in the same co-regulated module tended to belong to the same disease category, indicating that these diseases might share similar miRNA regulatory mechanisms. CONCLUSIONS In this study, we present a computational framework to identify miRNA-disease associations, and further construct a bipartite miRNA-disease network for systematically analyzing the global properties of miRNA regulation of disease genes. Our findings provide a broad perspective on the relationships between miRNAs and diseases and could potentially aid future research efforts concerning miRNA involvement in disease pathogenesis.
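As an illustration of what random walk analysis on a PPI network can look like, the sketch below implements a random walk with restart, one common variant. The abstract does not say which variant the authors used, so the restart formulation, the restart probability of 0.5, the function name, and the five-gene toy network are all assumptions made for this example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk that restarts at `seeds`.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected graph).
    seeds: non-empty set of start nodes (e.g. known disease genes).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction of the mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if neighbours:
                # Spread the remaining mass evenly over the neighbours.
                share = (1.0 - restart) * p[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Isolated node: keep its mass so probabilities still sum to 1.
                nxt[n] += (1.0 - restart) * p[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a simple chain g1 - g2 - g3 - g4 - g5.
toy_ppi = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4"},
}
scores = random_walk_with_restart(toy_ppi, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy chain the score falls with network distance from the seed, which is the property that lets such a walk prioritise candidates near known disease genes.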
Affiliation(s)
- Hongbo Shi
- College of Bioinformatics Science and Technology and State-Province Key Laboratories of Biomedicine-Pharmaceutics of China, Harbin Medical University, Harbin, Heilongjiang 150081, PR China.
| |
|
23
|
Zhang QC, Petrey D, Garzón JI, Deng L, Honig B. PrePPI: a structure-informed database of protein-protein interactions. Nucleic Acids Res 2013; 41:D828-33. [PMID: 23193263 PMCID: PMC3531098 DOI: 10.1093/nar/gks1231] [Citation(s) in RCA: 182] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open
Abstract
PrePPI (http://bhapp.c2b2.columbia.edu/PrePPI) is a database that combines predicted and experimentally determined protein-protein interactions (PPIs) using a Bayesian framework. Predicted interactions are assigned probabilities of being correct, which are derived from calculated likelihood ratios (LRs) by combining structural, functional, evolutionary and expression information, with the most important contribution coming from structure. Experimentally determined interactions are compiled from a set of public databases that manually collect PPIs from the literature and are also assigned LRs. A final probability is then assigned to every interaction by combining the LRs for both predicted and experimentally determined interactions. The current version of PrePPI contains ∼2 million PPIs that have a probability more than ∼0.1 of which ∼60 000 PPIs for yeast and ∼370 000 PPIs for human are considered high confidence (probability > 0.5). The PrePPI database constitutes an integrated resource that enables users to examine aggregate information on PPIs, including both known and potentially novel interactions, and that provides structural models for many of the PPIs.
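The step from likelihood ratios to a final probability can be sketched with the standard odds form of Bayes' rule. The sketch assumes the evidence sources are treated as independent, so posterior odds equal prior odds times the product of the LRs. The prior odds and the individual LR values below are invented for illustration and are not taken from PrePPI; only the 0.5 high-confidence threshold comes from the abstract.

```python
from math import prod


def posterior_probability(likelihood_ratios, prior_odds):
    """Combine likelihood ratios into a posterior probability.

    posterior odds = prior odds * product of LRs (independence assumed);
    probability = odds / (1 + odds).
    """
    lrs = list(likelihood_ratios)
    if prior_odds <= 0 or any(lr <= 0 for lr in lrs):
        raise ValueError("prior odds and likelihood ratios must be positive")
    odds = prior_odds * prod(lrs)
    return odds / (1.0 + odds)


# Hypothetical evidence for one candidate protein pair (illustrative values).
evidence = {"structural": 100.0, "coexpression": 4.0, "functional": 3.0}
ILLUSTRATIVE_PRIOR_ODDS = 1.0 / 600.0  # assumed, not from the abstract

p = posterior_probability(evidence.values(), ILLUSTRATIVE_PRIOR_ODDS)
is_high_confidence = p > 0.5  # threshold quoted in the abstract
```

With these made-up numbers the combined LR is 1200, the posterior odds are 2, and the probability is about 0.67, so the pair would pass the 0.5 threshold.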
Affiliation(s)
- Qiangfeng Cliff Zhang
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
| | - Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
| | - José Ignacio Garzón
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
| | - Lei Deng
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
| | - Barry Honig
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- *To whom correspondence should be addressed. Tel: +1 212 851 4651; Fax: +1 212 851 4650
| |
|
24
|
Proteome-wide prediction of protein-protein interactions from high-throughput data. Protein Cell 2012; 3:508-20. [PMID: 22729399 DOI: 10.1007/s13238-012-2945-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2012] [Accepted: 05/30/2012] [Indexed: 12/15/2022] Open
Abstract
In this paper, we present a brief review of the existing computational methods for predicting proteome-wide protein-protein interaction networks from high-throughput data. The availability of various types of omics data provides great opportunity and also unprecedented challenge to infer the interactome in cells. Reconstructing the interactome or interaction network is a crucial step for studying the functional relationship among proteins and the involved biological processes. The protein interaction network will provide valuable resources and alternatives to decipher the mechanisms of these functionally interacting elements as well as the running system of cellular operations. In this paper, we describe the main steps of predicting protein-protein interaction networks and categorize the available approaches to couple the physical and functional linkages. The future topics and the analyses beyond prediction are also discussed and concluded.
|
25
|
Garma L, Mukherjee S, Mitra P, Zhang Y. How many protein-protein interactions types exist in nature? PLoS One 2012; 7:e38913. [PMID: 22719985 PMCID: PMC3374795 DOI: 10.1371/journal.pone.0038913] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2012] [Accepted: 05/14/2012] [Indexed: 11/18/2022] Open
Abstract
“Protein quaternary structure universe” refers to the ensemble of all protein-protein complexes across all organisms in nature. The number of quaternary folds thus corresponds to the number of ways proteins physically interact with other proteins. This study focuses on answering two basic questions: Whether the number of protein-protein interactions is limited and, if yes, how many different quaternary folds exist in nature. By all-to-all sequence and structure comparisons, we grouped the protein complexes in the protein data bank (PDB) into 3,629 families and 1,761 folds. A statistical model was introduced to obtain the quantitative relation between the numbers of quaternary families and quaternary folds in nature. The total number of possible protein-protein interactions was estimated around 4,000, which indicates that the current protein repository contains only 42% of quaternary folds in nature and a full coverage needs approximately a quarter century of experimental effort. The results have important implications to the protein complex structural modeling and the structure genomics of protein-protein interactions.
Affiliation(s)
- Leonardo Garma
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Biocenter Oulu and Department of Biochemistry, University of Oulu, Oulu, Finland
| | - Srayanta Mukherjee
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Pralay Mitra
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
26
|
Abstract
One important challenge in the post-genomic era is uncovering the relationships among distinct pathophenotypes by using molecular signatures. Given the complex functional interdependencies between cellular components, a disease is seldom the consequence of a defect in a single gene product, instead reflecting the perturbations of a group of closely related gene products that carry out specific functions together. Therefore, it is meaningful to explore how the community of protein complexes impacts disease associations. Here, by integrating a large amount of information from protein complexes and the cellular basis of diseases, we built a human disease network in which two diseases are linked if they share a common disease-related protein complex. A systemic analysis revealed that linked disease pairs exhibit higher comorbidity than those that have no links, and that the stronger the association between two diseases based on protein complexes, the higher the comorbidity they are prone to display. Moreover, more connected diseases tend to be malignant and to have high prevalence. We provide novel disease associations that cannot be identified through previous analysis. These findings will potentially provide biologists and clinicians new insights into the etiology, classification and treatment of diseases.
|
27
|
Li B, Kihara D. Protein docking prediction using predicted protein-protein interface. BMC Bioinformatics 2012; 13:7. [PMID: 22233443 PMCID: PMC3287255 DOI: 10.1186/1471-2105-13-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2011] [Accepted: 01/10/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. RESULTS We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies from case to case, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts by performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by a second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. CONCLUSION We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
Affiliation(s)
- Bin Li
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
| |
|
28
|
Generation and Analysis of Large-Scale Data-Driven Mycobacterium tuberculosis Functional Networks for Drug Target Identification. Adv Bioinformatics 2011; 2011:801478. [PMID: 22190924 PMCID: PMC3235424 DOI: 10.1155/2011/801478] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2011] [Accepted: 08/28/2011] [Indexed: 11/18/2022] Open
Abstract
Technological developments in large-scale biological experiments, coupled with bioinformatics tools, have opened the doors to computational approaches for the global analysis of whole genomes. This has provided the opportunity to look at genes within their context in the cell. The integration of vast amounts of data generated by these technologies provides a strategy for identifying potential drug targets within microbial pathogens, the causative agents of infectious diseases. As proteins are druggable targets, functional interaction networks between proteins are used to identify proteins essential to the survival, growth, and virulence of these microbial pathogens. Here we have integrated functional genomics data to generate functional interaction networks between Mycobacterium tuberculosis proteins and carried out computational analyses to dissect the functional interaction network produced for identifying drug targets using network topological properties. This study has provided the opportunity to expand the range of potential drug targets and to move towards optimal target-based strategies.
|
29
|
Jessulat M, Pitre S, Gui Y, Hooshyar M, Omidi K, Samanfar B, Tan LH, Alamgir M, Green J, Dehne F, Golshani A. Recent advances in protein-protein interaction prediction: experimental and computational methods. Expert Opin Drug Discov 2011; 6:921-35. [PMID: 22646215 DOI: 10.1517/17460441.2011.603722] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
INTRODUCTION Proteins within the cell act as part of complex networks, which allow pathways and processes to function. Therefore, understanding how proteins interact is a significant area of current research. AREAS COVERED This review aims to present an overview of key experimental techniques (yeast two-hybrid, tandem affinity purification and protein microarrays) used to discover protein-protein interactions (PPIs), as well as to briefly discuss certain computational methods for predicting protein interactions based on gene localization, phylogenetic information, 3D structural modeling or primary protein sequence data. Due to the large-scale applicability of primary sequence-based methods, the authors have chosen to focus on this strategy for this review. There is an emphasis on a recent algorithm called Protein Interaction Prediction Engine (PIPE) that can predict global PPIs. The readers will discover recent advances both in the practical determination of protein interaction and the strategies that are available to attempt to anticipate interactions without the time and costs of experimental work. EXPERT OPINION Global PPI maps can help understand the biology of complex diseases and facilitate the identification of novel drug target sites. This study describes different techniques used for PPI prediction that the authors believe will significantly impact the development of the field in the near future. The authors expect to see a growing number of similar techniques capable of large-scale PPI predictions.
Affiliation(s)
- Matthew Jessulat
- Carleton University , Department of Biology , 209 Nesbitt Building, 1125 Colonel By Drive, Ottawa, Ontario K1S 5B6 , Canada
|
30
|
Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 2011; 12:151. [PMID: 21569468 PMCID: PMC3113940 DOI: 10.1186/1471-2105-12-151] [Citation(s) in RCA: 367] [Impact Index Per Article: 28.2] [Received: 12/27/2010] [Accepted: 05/13/2011] [Indexed: 12/31/2022] Open
Abstract
Background The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level. Results PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC shows good prediction performance (correlation coefficient of 0.8 between predicted and measured stability changes, in cross validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium size protein in less than a minute. This unique functionality is implemented in a user-friendly way in PoPMuSiC and is particularly easy to exploit. Another new functionality of the server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability. Conclusion The freely available PoPMuSiC-2.1 web server is highly useful for rapidly identifying a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
Affiliation(s)
- Yves Dehouck
- Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av, Fr, Roosevelt 50, CP165/61, 1050 Brussels, Belgium.
|
31
|
Maulik U, Bhattacharyya M, Mukhopadhyay A, Bandyopadhyay S. Identifying the immunodeficiency gateway proteins in humans and their involvement in microRNA regulation. Mol Biosyst 2011; 7:1842-51. [PMID: 21437347 DOI: 10.1039/c1mb05026e] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
Very little is known to date about the regulation protocol between transcription factors (TFs), genes and microRNAs (miRNAs) associated with diseases in various organisms. In this paper, we focus on finding the activity of miRNAs through the HIV-1 regulatory pathway in humans at the systems level. For this, we integrate and study the characteristics of the interaction information between HIV-1 and human proteins obtained from literature and prediction analysis. This information, realized in the form of a bipartite network, is subsequently mined with an exhaustive graph search technique to identify the strong significant biclusters, which are effectively the bicliques. They are unified further to form the core bipartite subnetwork. Many of the known HIV-1 associated kinase proteins (including LCK) are found in this core module. From this, the secondary significant proteins are identified by mapping these gateway proteins to the human protein-protein interaction network. Finally, these proteins are mapped onto the TF-to-miRNA and miRNA-to-gene regulatory networks derived from a couple of current studies to obtain a global view of the HIV-1 mediated TF-gene-miRNA inter-regulatory network. Interestingly, a few miRNAs participating in this pathway at the secondary level are found to have oncogenic involvement.
Affiliation(s)
- Ujjwal Maulik
- Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India.
|
32
|
Mitra RC, Zhang Z, Alexov E. In silico modeling of pH-optimum of protein-protein binding. Proteins 2010; 79:925-36. [PMID: 21287623 DOI: 10.1002/prot.22931] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 08/10/2010] [Revised: 10/12/2010] [Accepted: 10/29/2010] [Indexed: 01/05/2023]
Abstract
Protein-protein association is a pH-dependent process and thus the binding affinity depends on the local pH. In vivo the association occurs in a particular cellular compartment, where the individual monomers are supposed to meet and form a complex. Since the monomers and the complex exist in the same microenvironment, it is plausible that they coevolved toward its properties, in particular, toward the characteristic subcellular pH. Here we show that the pH at which the monomers are most stable (pH-optimum), or the pH interval over which their stability is almost pH-independent (pH-flat), is correlated with the pH of maximal affinity (pH-optimum of binding), or the pH interval over which affinity is almost pH-independent (pH-flat of binding), of the complexes made of the corresponding monomers. The analysis of interfacial properties of protein complexes demonstrates that pH-dependent properties can be roughly estimated using the interface charge alone. In addition, we introduce a parameter beta, proportional to the square root of the absolute product of the net charges of monomers, and show that protein complexes characterized by small or very large beta tend to have neutral pH-optimum. Furthermore, protein complexes made of monomers carrying the same polarity net charge at neutral pH have either very low or very high pH-optimum of binding. These findings are used to propose an empirical rule for predicting the pH-optimum of binding, provided that the amino acid compositions of the corresponding monomers are available.
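The beta parameter described in this abstract can be illustrated with a minimal Python sketch. Everything beyond the stated proportionality is an assumption made for illustration: the net charge at neutral pH is approximated by counting Lys/Arg as +1 and Asp/Glu as -1 (histidine and termini are ignored), the proportionality constant `scale` is set to 1, and the function names `net_charge`, `beta` and `same_polarity` are hypothetical rather than taken from the paper.

```python
import math

# Assumption: crude neutral-pH charge model, not the paper's calculation.
POSITIVE = set("KR")  # Lys, Arg counted as +1
NEGATIVE = set("DE")  # Asp, Glu counted as -1


def net_charge(sequence):
    """Approximate net charge of a monomer from its amino acid composition."""
    seq = sequence.upper()
    positives = sum(1 for residue in seq if residue in POSITIVE)
    negatives = sum(1 for residue in seq if residue in NEGATIVE)
    return positives - negatives


def beta(seq_a, seq_b, scale=1.0):
    """scale * sqrt(|q_a * q_b|); `scale` is a hypothetical proportionality constant."""
    return scale * math.sqrt(abs(net_charge(seq_a) * net_charge(seq_b)))


def same_polarity(seq_a, seq_b):
    """True when both monomers carry a net charge of the same sign."""
    return net_charge(seq_a) * net_charge(seq_b) > 0


if __name__ == "__main__":
    a, b = "KKKKAG", "DDDDAG"  # toy sequences, not real proteins
    print(net_charge(a), net_charge(b), beta(a, b), same_polarity(a, b))
```

Only the composition of each monomer enters the calculation, which matches the abstract's remark that the rule needs amino acid compositions rather than structures.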
Affiliation(s)
- Rooplekha C Mitra
- Physics Department, Computational Biophysics and Bioinformatics, Clemson University, Clemson, South Carolina 29634, USA
|
33
|
Leclercq L, Boustta M, Rixte J, Vert M. Degradability of poly(l-lysine) and poly(dl-aminoserinate) complexed with a polyanion under conditions modelling physico-chemical characteristics of body fluids. J Colloid Interface Sci 2010; 350:459-64. [DOI: 10.1016/j.jcis.2010.07.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/02/2010] [Revised: 07/05/2010] [Accepted: 07/09/2010] [Indexed: 10/19/2022]
|
34
|
Venkatraman V, Yang YD, Sael L, Kihara D. Protein-protein docking using region-based 3D Zernike descriptors. BMC Bioinformatics 2009; 10:407. [PMID: 20003235 PMCID: PMC2800122 DOI: 10.1186/1471-2105-10-407] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.2] [Received: 05/02/2009] [Accepted: 12/09/2009] [Indexed: 12/02/2022] Open
Abstract
Background Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing and are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
Affiliation(s)
- Vishwesh Venkatraman
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA.
|
35
|
Shin CJ, Davis MJ, Ragan MA. Towards the mammalian interactome: Inference of a core mammalian interaction set in mouse. Proteomics 2009; 9:5256-66. [DOI: 10.1002/pmic.200900262] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022]
|
36
|
Dong X, Yang B, Li Y, Zhong C, Ding J. Molecular basis of the acceleration of the GDP-GTP exchange of human ras homolog enriched in brain by human translationally controlled tumor protein. J Biol Chem 2009; 284:23754-64. [PMID: 19570981 PMCID: PMC2749149 DOI: 10.1074/jbc.m109.012823] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 04/25/2009] [Revised: 06/16/2009] [Indexed: 01/07/2023] Open
Abstract
Ras homolog enriched in brain (Rheb), a small GTPase, positively regulates the mTORC1 pathway. The GDP-GTP exchange of Rheb has been suggested to be facilitated by translationally controlled tumor protein (TCTP). Here we demonstrate that human TCTP (hTCTP) interacts with human Rheb (hRheb) and accelerates its GDP release in vitro and that hTCTP activates the mTORC1 pathway in vivo. To investigate the underlying mechanism, we built structure models of GDP- and GTP-bound hRheb in complexes with hTCTP and performed molecular dynamics simulations of the models, which predict key residues involved in the interactions and the region of hRheb that undergoes conformational change during the GDP-GTP exchange. These results are verified with site-directed mutagenesis and in vitro biochemical and in vivo cell biological analyses. Furthermore, a crystal structure of the E12V mutant hTCTP, which lacks the guanine nucleotide exchange factor activity, shows that the deficiency appears to be caused by loss of a salt-bridging interaction with Lys-45 of hRheb. These data collectively provide insights into the molecular mechanisms of how hTCTP interacts with hRheb and activates the mTORC1 pathway.
Affiliation(s)
- Xianchi Dong
- State Key Laboratory of Molecular Biology and Research Center for Structural Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
- Graduate School of Chinese Academy of Sciences, 320 Yue-Yang Road, Shanghai 200031, China
- Bei Yang
- State Key Laboratory of Molecular Biology and Research Center for Structural Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
- Graduate School of Chinese Academy of Sciences, 320 Yue-Yang Road, Shanghai 200031, China
- Yingjie Li
- State Key Laboratory of Molecular Biology and Research Center for Structural Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
- Graduate School of Chinese Academy of Sciences, 320 Yue-Yang Road, Shanghai 200031, China
- Chen Zhong
- State Key Laboratory of Molecular Biology and Research Center for Structural Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
- Jianping Ding
- State Key Laboratory of Molecular Biology and Research Center for Structural Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
|
37
|
Chakicherla A, Ecale Zhou CL, Dang ML, Rodriguez V, Hansen JN, Zemla A. SpaK/SpaR two-component system characterized by a structure-driven domain-fusion method and in vitro phosphorylation studies. PLoS Comput Biol 2009; 5:e1000401. [PMID: 19503843 PMCID: PMC2686270 DOI: 10.1371/journal.pcbi.1000401] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/11/2008] [Accepted: 05/04/2009] [Indexed: 12/23/2022] Open
Abstract
Here we introduce a quantitative structure-driven computational domain-fusion method, which we used to predict the structures of proteins believed to be involved in regulation of the subtilin pathway in Bacillus subtilis, and used to predict a protein-protein complex formed by interaction between the proteins. Homology modeling of SpaK and SpaR yielded preliminary structural models based on a best template for SpaK comprising a dimer of a histidine kinase, and for SpaR a response regulator protein. Our LGA code was used to identify multi-domain proteins with structure homology to both modeled structures, yielding a set of domain-fusion templates then used to model a hypothetical SpaK/SpaR complex. The models were used to identify putative functional residues and residues at the protein-protein interface, and bioinformatics was used to compare functionally and structurally relevant residues in corresponding positions among proteins with structural homology to the templates. Models of the complex were evaluated in light of known properties of the functional residues within two-component systems involving His-Asp phosphorelays. Based on this analysis, a phosphotransferase complexed with a beryllofluoride was selected as the optimal template for modeling a SpaK/SpaR complex conformation. In vitro phosphorylation studies performed using wild type and site-directed SpaK mutant proteins validated the predictions derived from application of the structure-driven domain-fusion method: SpaK was phosphorylated in the presence of 32P-ATP and the phosphate moiety was subsequently transferred to SpaR, supporting the hypothesis that SpaK and SpaR function as sensor and response regulator, respectively, in a two-component signal transduction system, and furthermore suggesting that the structure-driven domain-fusion approach correctly predicted a physical interaction between SpaK and SpaR. Our domain-fusion algorithm leverages quantitative structure information and provides a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods.
Because proteins so frequently function in coordination with other proteins, identification and characterization of the interactions among proteins are essential for understanding how proteins work. Computational methods for identification of protein-protein interactions have been limited by the degree to which proteins are similar in sequence. However, methods that leverage structure information can overcome this limitation of sequence-based methods; the three-dimensional information provided by structure enables identification of related proteins even when their sequences are dissimilar. In this work we present a quantitative method for identification of protein interacting partners, and we demonstrate its use in modeling the structure of a hypothetical complex between two proteins that function in a bacterial signaling system. This quantitative approach comprises a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods, and provides a basis for high-throughput prediction of protein-protein interactions, which could be applied on a whole-genome scale.
Affiliation(s)
- Anu Chakicherla
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Carol L. Ecale Zhou
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Virginia Rodriguez
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- J. Norman Hansen
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, United States of America
- Adam Zemla
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
|
38
|
Zaki N, Lazarova-Molnar S, El-Hajj W, Campbell P. Protein-protein interaction based on pairwise similarity. BMC Bioinformatics 2009; 10:150. [PMID: 19445721 PMCID: PMC2701420 DOI: 10.1186/1471-2105-10-150] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 08/27/2008] [Accepted: 05/17/2009] [Indexed: 11/28/2022] Open
Abstract
Background Protein-protein interaction (PPI) is essential to most biological processes. Abnormal interactions may have implications in a number of neurological syndromes. Given that the association and dissociation of protein molecules is crucial, computational tools capable of effectively identifying PPI are desirable. In this paper, we propose a simple yet effective method to detect PPI based on pairwise similarity and using only the primary structure of the protein. The PPI based on Pairwise Similarity (PPI-PS) method represents each protein sequence by a vector of pairwise similarities against large subsequences of amino acids created by a shifting window which passes over concatenated protein training sequences. Each coordinate of this vector is typically the E-value of the Smith-Waterman score. These vectors are then used to compute the kernel matrix, which is exploited in conjunction with support vector machines. Results To assess the ability of the proposed method to distinguish "interacting" from "non-interacting" protein pairs, we applied it to different datasets from the available yeast Saccharomyces cerevisiae protein interaction data. The proposed method achieved reasonable improvement over the existing state-of-the-art methods for PPI prediction. Conclusion The pairwise similarity score provides a relevant measure of similarity between protein sequences. This similarity incorporates biological knowledge about proteins and is extremely powerful when combined with support vector machines to predict PPI.
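The representation described in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the raw Smith-Waterman local alignment score with a toy match/mismatch/linear-gap scheme instead of an E-value and a substitution matrix, a plain dot product as the kernel, and no support vector machine step. The function names, window size and step are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (toy scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def sliding_windows(sequence, size, step):
    """Subsequences produced by a window shifted over the sequence."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


def similarity_vector(protein, windows):
    """One coordinate per window: the local alignment score against it."""
    return [smith_waterman_score(protein, w) for w in windows]


def linear_kernel(vectors):
    """Gram matrix of dot products between similarity vectors."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors] for u in vectors]


if __name__ == "__main__":
    training = ["MKTAYIAK", "GSHMKTAY"]  # toy sequences, not real proteins
    windows = sliding_windows("".join(training), size=4, step=4)
    vectors = [similarity_vector(p, windows) for p in training]
    print(windows)
    print(vectors)
    print(linear_kernel(vectors))
```

A kernel matrix of this shape is what an SVM implementation with a precomputed kernel would consume; how protein pairs are encoded from the per-protein vectors is not specified in the abstract and is left out here.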
Affiliation(s)
- Nazar Zaki
- Bioinformatics Laboratory, Department of Computer Science, College of Information Technology, UAE University, Al Ain 17551, UAE.
|
39
|
Identifying protein–protein interaction sites in transient complexes with temperature factor, sequence profile and accessible surface area. Amino Acids 2009; 38:263-70. [DOI: 10.1007/s00726-009-0245-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 10/05/2008] [Accepted: 01/21/2009] [Indexed: 11/26/2022]
|
40
|
Li M, Huang Y, Xiao Y. Effects of external interactions on protein sequence-structure relations of beta-trefoil fold. Proteins 2009; 72:1161-70. [PMID: 18320584 DOI: 10.1002/prot.22010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Proteins with symmetric structures are ideal models to investigate the sequence-structure relations. We investigate proteins with beta-trefoil fold and find they have different degrees of sequence symmetries although they show similar symmetric structures. To understand this, we calculate the strength of interactions of the beta-trefoil folds with surrounding environments and find the low degrees of sequence symmetries are often correlated with large external interactions. Our results give an additional confirmation of Anfinsen's thermodynamic hypothesis that protein structures are not only determined by their sequences but also by their surrounding environments. We suggest the external interactions should be considered additionally in protein structure prediction through ab initio folding.
Affiliation(s)
- Mingfeng Li
- Department of Physics, Biomolecular Physics and Modeling Group, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
|
41
|
Zhu Z, Tovchigrechko A, Baronova T, Gao Y, Douguet D, O'Toole N, Vakser IA. Large-scale structural modeling of protein complexes at low resolution. J Bioinform Comput Biol 2008; 6:789-810. [PMID: 18763743 DOI: 10.1142/s0219720008003679] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/16/2007] [Revised: 11/20/2007] [Accepted: 01/04/2008] [Indexed: 11/18/2022]
Abstract
Structural aspects of protein-protein interactions provided by large-scale, genome-wide studies are essential for the description of life processes at the molecular level. A methodology is developed that applies the protein docking approach (GRAMM), based on the knowledge of experimentally determined protein-protein structures (DOCKGROUND resource) and properties of intermolecular energy landscapes, to genome-wide systems of protein interactions. The full sequence-to-structure-of-complex modeling pipeline is implemented in the Genome Wide Docking Database (GWIDD) resource. Protein interaction data are imported to GWIDD from external datasets of experimentally determined interaction networks. Essential information is extracted and unified to form the GWIDD database. Structures of individual interacting proteins in the database are retrieved (if available) or modeled, and protein complex structures are predicted by the docking program. All protein sequence, structure, and docking information is conveniently accessible through a Web interface.
Affiliation(s)
- Zhengwei Zhu
- Center for Bioinformatics, The University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA
|
42
|
Hunjan J, Tovchigrechko A, Gao Y, Vakser IA. The size of the intermolecular energy funnel in protein-protein interactions. Proteins 2008; 72:344-52. [PMID: 18214966 DOI: 10.1002/prot.21930] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Abstract
Revealing the fundamental principles of protein interactions is essential for the basic knowledge of molecular processes and designing better predictive tools. Protein docking procedures allow systematic sampling of intermolecular energy landscapes, revealing the distribution of energy basins and their characteristics. A systematic search docking procedure GRAMM-X was applied to a comprehensive nonredundant database of nonobligate protein-protein complexes to determine the size of the intermolecular energy funnel. The unbound structures were simulated using a rotamer library. The procedure generated grid-based matches, based on a smoothed Lennard-Jones potential, and minimized them off the grid with the same potential. The minimization generated a distribution of distances, based on a variety of metrics, between the grid-based and the minimized matches. The metric selected for the analysis, ligand interface RMSD, provided three independent estimates of the funnel size: based on the distribution amplitude for the near-native matches, deviation from random, and correlation with the energy values. The three methods converge to similar estimates of approximately 6-8 Å ligand interface RMSD. The results indicated dependence of the funnel size on the type of the complex (smaller for antigen-antibody, medium for enzyme-inhibitor, and larger for the rest of the complexes) and the funnel size correlation with the size of the interface. Guidelines for the optimal sampling of docking coordinates, based on the funnel size estimates, were explored.
Affiliation(s)
- Jagtar Hunjan
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas 66047, USA
|
43
|
Clauset A, Moore C, Newman MEJ. Hierarchical structure and the prediction of missing links in networks. Nature 2008; 453:98-101. [PMID: 18451861 DOI: 10.1038/nature06830] [Citation(s) in RCA: 537] [Impact Index Per Article: 33.6] [Received: 08/13/2007] [Accepted: 02/07/2008] [Indexed: 11/09/2022]
|
44
|
O'Toole N, Vakser IA. Large-scale characteristics of the energy landscape in protein-protein interactions. Proteins 2008; 71:144-52. [PMID: 17932937 DOI: 10.1002/prot.21665] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
Characterization of intermolecular energy landscapes in protein-protein interactions is important for understanding the mechanisms of these interactions as well as for designing better protein docking methods. A simplified representation of the landscape was used for a systematic study of its large-scale characteristics in a large nonredundant dataset of protein complexes. The focus of the study is on the basic features of the low-resolution energy basins and their distribution on the landscape. The results clearly show that, in general, the number of such basins is small, these basins are well formed, correlated with actual binding modes, and the pattern of basins distribution depends on the type of the complex. For docking studies, the results suggest that adequate starting points for the structural refinement are detectable by low-resolution procedures and the number of such starting points is relatively small.
Affiliation(s)
- Nicholas O'Toole
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Australia
|
45
|
Ruvinsky AM, Vakser IA. Interaction cutoff effect on ruggedness of protein-protein energy landscape. Proteins 2008; 70:1498-505. [PMID: 17910068 DOI: 10.1002/prot.21644] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
The concept of the energy landscape is important for better understanding of protein-protein interactions and for designing adequate docking procedures. The intermolecular landscape has a rugged terrain that impedes search procedures. Its inherent ruggedness is related to the conformational characteristics of the molecules and to the form of the potential function: it is more rugged for short-range potentials and less rugged for "soft," typically long-range potentials. Our study determined that the landscape ruggedness is further substantially exacerbated by truncation of the potentials. This additional ruggedness appears below certain critical interaction ranges that depend on the form of the potential. The theoretical model describing the cutoff effect on the landscape ruggedness is confirmed by the energy calculation on a dataset of protein-protein complexes. The negative effect of truncating the potentials is well known. However, revealing its physical basis in terms of the energy landscape is important for better understanding of intermolecular interactions.
Affiliation(s)
- Anatoly M Ruvinsky
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, USA
|
46
|
Gao Y, Douguet D, Tovchigrechko A, Vakser IA. DOCKGROUND system of databases for protein recognition studies: unbound structures for docking. Proteins 2008; 69:845-51. [PMID: 17803215 DOI: 10.1002/prot.21714] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 11/10/2022]
Abstract
Computational docking approaches are important as a source of protein-protein complexes structures and as a means to understand the principles of protein association. A key element in designing better docking approaches, including search procedures, potentials, and scoring functions is their validation on experimentally determined structures. Thus, the databases of such structures (benchmark sets) are important. The previous, first release of the DOCKGROUND resource (Douguet et al., Bioinformatics 2006; 22:2612-2618) implemented a comprehensive database of cocrystallized (bound) protein-protein complexes in a relational database of annotated structures. The current release adds important features to the set of bound structures, such as regularly updated downloadable datasets: automatically generated nonredundant set, built according to most common criteria, and a manually curated set that includes only biological nonobligate complexes along with a number of additional useful characteristics. The main focus of the current release is unbound (experimental and simulated) protein-protein complexes. Complexes from the bound dataset are used to identify crystallized unbound analogs. If such analogs do not exist, the unbound structures are simulated by rotamer library optimization. Thus, the database contains comprehensive sets of complexes suitable for large scale benchmarking of docking algorithms. Advanced methodologies for simulating unbound conformations are being explored for the next release. The future releases will include datasets of modeled protein-protein complexes, and systematic sets of docking decoys obtained by different docking algorithms. The growing DOCKGROUND resource is designed to become a comprehensive public environment for developing and validating new docking methodologies.
Affiliation(s)
- Ying Gao
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas 66047-1620, USA
47
Wang RS, Wang Y, Wu LY, Zhang XS, Chen L. Analysis on multi-domain cooperation for predicting protein-protein interactions. BMC Bioinformatics 2007; 8:391. [PMID: 17937822 PMCID: PMC2222654 DOI: 10.1186/1471-2105-8-391] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 03/18/2007] [Accepted: 10/16/2007] [Indexed: 11/17/2022] Open
Abstract
Background Domains are the basic functional units of proteins. It is believed that protein-protein interactions are realized through domain interactions. Revealing multi-domain cooperation can provide deep insights into the essential mechanism of protein-protein interactions at the domain level and be further exploited to improve the accuracy of protein interaction prediction. Results In this paper, we aim to identify cooperative domains for protein interactions by extending two-domain interactions to multi-domain interactions. Based on the high-throughput experimental data from multiple organisms with different reliabilities, the interactions of domains were inferred by a Linear Programming algorithm with Multi-domain pairs (LPM) and an Association Probabilistic Method with Multi-domain pairs (APMM). Experimental results demonstrate that our approach not only can find cooperative domains effectively but also has a higher accuracy for predicting protein interaction than the existing methods. Cooperative domains, including strongly cooperative domains and superdomains, were detected from major interaction databases MIPS and DIP, and many of them were verified by physical interactions from the crystal structures of protein complexes in PDB which provide intuitive evidences for such cooperation. Comparison experiments in terms of protein/domain interaction prediction justified the benefit of considering multi-domain cooperation. Conclusion From the computational viewpoint, this paper gives a general framework to predict protein interactions in a more accurate manner by considering the information of both multi-domains and multiple organisms, which can also be applied to identify cooperative domains, to reconstruct large complexes and further to annotate functions of domains. Supplementary information and software are provided in and .
Affiliation(s)
- Rui-Sheng Wang
- School of Information, Renmin University of China, Beijing 100872, China.
48
Abstract
In a cell, it has been estimated that each protein on average interacts with roughly 10 others, resulting in tens of thousands of proteins known or suspected to have interaction partners; of these, only a tiny fraction have solved protein structures. To partially address this problem, we have developed M-TASSER, a hierarchical method to predict protein quaternary structure from sequence that involves template identification by multimeric threading, followed by multimer model assembly and refinement. The final models are selected by structure clustering. M-TASSER has been tested on a benchmark set comprising 241 dimers having templates with weak sequence similarity and 246 without multimeric templates in the dimer library. Of the total of 207 targets predicted to interact as dimers, 165 (80%) were correctly assigned as interacting with a true positive rate of 68% and a false positive rate of 17%. The initial best template structures have an average root mean-square deviation to native of 5.3, 6.7, and 7.4 Å for the monomer, interface, and dimer structures. The final model shows on average a root mean-square deviation improvement of 1.3, 1.3, and 1.5 Å over the initial template structure for the monomer, interface, and dimer structures, with refinement evident for 87% of the cases. Thus, we have developed a promising approach to predict full-length quaternary structure for proteins that have weak sequence similarity to proteins of solved quaternary structure.
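The percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 241 dimers with templates form the positive set and the 246 targets without multimeric templates form the negative set. The abstract does not state this explicitly, but the reported rates are consistent with that reading. All variable names are illustrative.

```python
# Counts quoted in the abstract.
dimers_with_templates = 241      # assumed here to be the positive set
targets_without_templates = 246  # assumed here to be the negative set
predicted_dimers = 207           # targets predicted to interact as dimers
correctly_predicted = 165        # predictions that were correct

# Predictions that were not correct are treated as false positives.
false_positives = predicted_dimers - correctly_predicted

precision = correctly_predicted / predicted_dimers
true_positive_rate = correctly_predicted / dimers_with_templates
false_positive_rate = false_positives / targets_without_templates

print(f"precision           = {precision:.1%}")
print(f"true positive rate  = {true_positive_rate:.1%}")
print(f"false positive rate = {false_positive_rate:.1%}")
```

Under these assumptions the three values round to the 80%, 68%, and 17% given in the abstract.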
Affiliation(s)
- Jeffrey Skolnick
- Address reprint requests to Jeffrey Skolnick, Tel.: 404-407-8975; Fax: 404-385-7478.
49
Zhong S, MacKerell AD. Binding response: a descriptor for selecting ligand binding site on protein surfaces. J Chem Inf Model 2007; 47:2303-15. [PMID: 17900106 DOI: 10.1021/ci700149k] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
The identification of ligand binding sites on a protein is an essential step in the selection of inhibitors of protein-ligand or protein-protein interactions via virtual database screening. To facilitate binding site identification, a novel descriptor, the binding response, is proposed in the present paper to quantitatively evaluate putative binding sites on the basis of their response to a test set of probe compounds. The binding response is determined on the basis of contributions from both the ligand-protein interaction energy and the geometry of binding poses for a database of test ligands. A favorable binding response is obtained for binding sites with favorable ligand binding energies and with ligand geometries within the putative site for the majority of compounds in the test set. The utility of this descriptor is illustrated by applying it to a number of known protein-ligand complexes, showing the approach to identify the experimental binding sites as the highest scoring site in 26 out of 29 cases; in the remaining three cases, it was among the top three scoring sites. This method is combined with sphere-based site identification and clustering methods to yield an automated approach for the identification of binding sites on proteins suitable for database screen or de novo drug design.
Affiliation(s)
- Shijun Zhong
- Computer-Aided Drug Design Center, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, MD 21201, USA
50
Arga KY, Onsan ZI, Kirdar B, Ulgen KO, Nielsen J. Understanding signaling in yeast: Insights from network analysis. Biotechnol Bioeng 2007; 97:1246-58. [PMID: 17252576 DOI: 10.1002/bit.21317] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
Reconstruction of protein interaction networks that represent groups of proteins contributing to the same cellular function is a key step towards quantitative studies of signal transduction pathways. Here we present a novel approach to reconstruct a highly correlated protein interaction network and to identify previously unknown components of a signaling pathway through integration of protein-protein interaction data, gene expression data, and Gene Ontology annotations. A novel algorithm is designed to reconstruct a highly correlated protein interaction network which is composed of the candidate proteins for signal transduction mechanisms in yeast Saccharomyces cerevisiae. The high efficiency of the reconstruction process is proved by a Receiver Operating Characteristic curve analysis. Identification and scoring of the possible linear pathways enables reconstruction of specific sub-networks for glucose-induction signaling and high osmolarity MAPK signaling in S. cerevisiae. All of the known components of these pathways are identified together with several new "candidate" proteins, indicating the successful reconstructions of two model pathways involved in S. cerevisiae. The integrated approach is hence shown useful for (i) prediction of new signaling pathways, (ii) identification of unknown members of documented pathways, and (iii) identification of network modules consisting of a group of related components that often incorporate the same functional mechanism.
Affiliation(s)
- K Yalçin Arga
- Department of Chemical Engineering, Boğaziçi University, 34342 Istanbul, Turkey