1
Das JK, Chakraborty S, Roy S. A scheme for inferring viral-host associations based on codon usage patterns identifies the most affected signaling pathways during COVID-19. J Biomed Inform 2021; 118:103801. [PMID: 33965637 PMCID: PMC8102073 DOI: 10.1016/j.jbi.2021.103801] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/24/2020] [Revised: 05/02/2021] [Accepted: 05/03/2021] [Indexed: 12/16/2022]
Abstract
Understanding the molecular mechanism of COVID-19 pathogenesis helps in rapid therapeutic target identification. Usually, viral proteins target host proteins in an organized fashion. The expression of any viral gene depends mostly on the host translational machinery. Recent studies report the great significance of codon usage biases in establishing host-viral protein–protein interactions (PPI). Exploring the codon usage patterns between a pair of co-evolved host and viral proteins may present novel insight into the host-viral protein interactomes during disease pathogenesis. Leveraging the similarity in codon usage patterns, we propose a computational scheme to recreate the host-viral protein–protein interaction network. We use host proteins from seventeen (17) essential signaling pathways for our current work towards understanding the possible targeting mechanism of SARS-CoV-2 proteins. We infer both negatively and positively interacting edges in the network. Further, extensive analysis is performed to understand the host PPI network topologically and the attacking behavior of the viral proteins. Our study reveals that viral proteins mostly utilize codons that are rare in the targeted host proteins (negatively correlated interaction). Among them, the non-structural protein NSP3 and the structural protein Spike (S) are the most influential in interacting with multiple host proteins. In the ranking of the most affected pathways, MAPK pathways are observed to be the worst affected during SARS-CoV-2 infection. Several proteins participating in multiple pathways are highly central in the host PPI network and are mostly targeted by multiple viral proteins. We observe many potential targets (host proteins) from the affected pathways associated with various drug molecules, including Arsenic trioxide, Dexamethasone, Hydroxychloroquine, Ritonavir, and Interferon beta, which are either under clinical trial or in use during COVID-19.
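The idea of scoring a host-viral pair by the similarity of their codon usage can be illustrated with a small sketch. The abstract does not state which similarity measure the authors use, so the Pearson correlation over 64-codon relative frequency vectors below is an assumption, and the function names `codon_frequencies` and `codon_usage_similarity` are hypothetical. A negative score corresponds loosely to the "negatively correlated interaction" described above.

```python
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so frequency vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    counts = dict.fromkeys(CODONS, 0)
    # Walk the sequence in frame, ignoring a trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codons found")
    return [counts[c] / total for c in CODONS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def codon_usage_similarity(viral_cds, host_cds):
    """Assumed similarity score: Pearson correlation of codon frequency vectors."""
    return pearson(codon_frequencies(viral_cds), codon_frequencies(host_cds))
```

In such a sketch, an edge between a viral and a host gene would be kept only when the score passes some threshold in either direction; any threshold value would be a further assumption not given in the abstract.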
Affiliation(s)
- Jayanta Kumar Das
- Department of Pediatrics, Johns Hopkins University, School of Medicine, MD, USA
- Swarup Roy
- Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India.
2
Yakubu RR, Nieves E, Weiss LM. The Methods Employed in Mass Spectrometric Analysis of Posttranslational Modifications (PTMs) and Protein-Protein Interactions (PPIs). ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2019; 1140:169-198. [PMID: 31347048 DOI: 10.1007/978-3-030-15950-4_10] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/15/2022]
Abstract
Mass Spectrometry (MS) has revolutionized the way we study biomolecules, especially proteins, their interactions and posttranslational modifications (PTMs). As such, MS has established itself as the leading tool for the analysis of PTMs, mainly because this approach is highly sensitive, amenable to high throughput and capable of assigning PTMs to specific sites in the amino acid sequence of proteins and peptides. Along with the advances in MS methodology there have been improvements in biochemical, genetic and cell biological approaches to mapping the interactome, which are discussed with attention to both the practical and technical aspects of these techniques. The interactome of a species is generally understood to represent the sum of all potential protein-protein interactions. There are still a number of barriers to the elucidation of the human interactome, or that of any other species, as physical contacts between protein pairs that occur by selective molecular docking in a particular spatiotemporal biological context are not easily captured and measured. PTMs massively increase the complexity of organismal proteomes and play a role in almost all aspects of cell biology, allowing for fine-tuning of protein structure, function and localization. There are an estimated 300 PTMs, with a predicted 5% of the eukaryotic genome coding for enzymes involved in protein modification; however, we have not yet been able to reliably map PTM proteomes due to limitations in sample preparation, analytical techniques, data analysis, and the substoichiometric and transient nature of some PTMs. Improvements in proteomic and mass spectrometry methods, as well as sample preparation, have been exploited in a large number of proteome-wide surveys of PTMs in many different organisms. Here we focus on previously published global PTM proteome studies in the Apicomplexan parasites T. gondii and P. falciparum, which offer numerous insights into the abundance and function of each of the studied PTMs in the Apicomplexa. Integration of these datasets provides a more complete picture of the relative importance of PTMs and the crosstalk between them, and of how together PTMs globally change the cellular biology of the Apicomplexan protozoa. A multitude of techniques used to investigate PTMs, mostly techniques in MS-based proteomics, are discussed for their ability to uncover relevant biological function.
Affiliation(s)
- Rama R Yakubu
- Department of Pathology, Albert Einstein College of Medicine, Bronx, NY, USA
- Edward Nieves
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, USA; Department of Developmental and Molecular Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Louis M Weiss
- Department of Pathology, Albert Einstein College of Medicine, Bronx, NY, USA; Department of Medicine, Albert Einstein College of Medicine, Bronx, NY, USA.
3
Liluashvili V, Kalayci S, Fluder E, Wilson M, Gabow A, Gümüs ZH. iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D. Gigascience 2018; 6:1-13. [PMID: 28814063 PMCID: PMC5554349 DOI: 10.1093/gigascience/gix054] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/06/2017] [Accepted: 07/05/2017] [Indexed: 02/02/2023] Open
Abstract
Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field.
Affiliation(s)
- Vaja Liluashvili
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Selim Kalayci
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eugene Fluder
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Manda Wilson
- Computational Biology Center, Memorial-Sloan Kettering Cancer Center, New York, NY 10065, USA
- Aaron Gabow
- Computational Biology Center, Memorial-Sloan Kettering Cancer Center, New York, NY 10065, USA
- Zeynep H Gümüs
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
4
Huang L, Liao L, Wu CH. Completing sparse and disconnected protein-protein network by deep learning. BMC Bioinformatics 2018; 19:103. [PMID: 29566671 PMCID: PMC5863833 DOI: 10.1186/s12859-018-2112-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/16/2017] [Accepted: 03/12/2018] [Indexed: 12/01/2022] Open
Abstract
Background: Protein-protein interaction (PPI) prediction remains a central task in systems biology to achieve a better and holistic understanding of cellular and intracellular processes. Recently, an increasing number of computational methods have shifted from pair-wise prediction to network level prediction. Many of the existing network level methods predict PPIs under the assumption that the training network should be connected. However, this assumption greatly affects the prediction power and limits the application area because the current gold standard PPI networks are usually very sparse and disconnected. Therefore, how to effectively predict PPIs based on a training network that is sparse and disconnected remains a challenge. Results: In this work, we developed a novel PPI prediction method based on a deep learning neural network and the regularized Laplacian kernel. We use a neural network with an autoencoder-like architecture to implicitly simulate the evolutionary processes of a PPI network. Neurons of the output layer correspond to proteins and are labeled with values (1 for interaction and 0 otherwise) from the adjacency matrix of a sparse disconnected training PPI network. Unlike an autoencoder, neurons at the input layer are given all-zero input, reflecting an assumption of no a priori knowledge about PPIs, and hidden layers of smaller sizes mimic the ancient interactome at different times during evolution. After the training step, an evolved PPI network whose rows are outputs of the neural network can be obtained. We then predict PPIs by applying the regularized Laplacian kernel to the transition matrix that is built upon the evolved PPI network. The results from cross-validation experiments show that the PPI prediction accuracies for yeast data and human data, measured as AUC, are increased by up to 8.4% and 14.9% respectively, as compared to the baseline. Moreover, the evolved PPI network can also help us leverage complementary information from the disconnected training network and multiple heterogeneous data sources. Tested on the yeast data with six heterogeneous feature kernels, the results show our method can further improve the prediction performance by up to 2%, which is very close to an upper bound that is obtained by an Approximate Bayesian Computation based sampling method. Conclusions: The proposed evolution deep neural network, coupled with the regularized Laplacian kernel, is an effective tool in completing sparse and disconnected PPI networks and in facilitating integration of heterogeneous data sources.
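The regularized Laplacian kernel named in this abstract has a standard closed form, K = (I + αL)^(-1) with L = D - A, which a short sketch can make concrete. This is not the authors' pipeline: the abstract applies the kernel to a transition matrix built from the evolved network, whereas the sketch below assumes a plain symmetric adjacency matrix, and the regularization value `alpha=1.0` and the helper names are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def regularized_laplacian_kernel(adjacency, alpha=1.0):
    """Return K = (I + alpha * L)^-1, where L = D - A is the graph Laplacian.

    `adjacency` is assumed symmetric with a zero diagonal; `alpha` is a
    hypothetical regularization strength, not a value taken from the paper.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    system = [
        [
            (1.0 if i == j else 0.0)
            + alpha * ((degrees[i] if i == j else 0.0) - adjacency[i][j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    return invert(system)


# Path graph 0 - 1 - 2: nodes 0 and 2 share no edge but get a nonzero score.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
kernel = regularized_laplacian_kernel(path_graph)
```

The off-diagonal kernel entries act as link scores: for the path graph, the unconnected pair (0, 2) still receives a positive value because the kernel sums over indirect paths, which is why such kernels are used to propose missing edges.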
Affiliation(s)
- Lei Huang
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Avenue, Newark, 19716, Delaware, USA
- Li Liao
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Avenue, Newark, 19716, Delaware, USA.
- Cathy H Wu
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Avenue, Newark, 19716, Delaware, USA; Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, 19711, Delaware, USA
5
Ur Rehman H, Bari I, Ali A, Mahmood H. A Bayesian approach for estimating protein-protein interactions by integrating structural and non-structural biological data. MOLECULAR BIOSYSTEMS 2017; 13:2592-2602. [PMID: 29028065 DOI: 10.1039/c7mb00484b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
Accurate elucidation of genome wide protein-protein interactions is crucial for understanding the regulatory processes of the cell. High-throughput techniques, such as the yeast-2-hybrid (Y2H) assay, co-immunoprecipitation (co-IP), mass spectrometric (MS) protein complex identification, affinity purification (AP) etc., are generally relied upon to determine protein interactions. Unfortunately, each type of method is inherently subject to different types of noise and results in false positive interactions. On the other hand, precise understanding of proteins, especially knowledge of their functional associations is necessary for understanding how complex molecular machines function. To solve this problem, computational techniques are generally relied upon to precisely predict protein interactions. In this work, we present a novel method that combines structural and non-structural biological data to precisely predict protein interactions. The conceptual novelty of our approach lies in identifying and precisely associating biological information that provides substantial interaction clues. Our model combines structural and non-structural information using Bayesian statistics to calculate the likelihood of each interaction. The proposed model is tested on Saccharomyces cerevisiae's interactions extracted from the DIP and IntAct databases and provides substantial improvements in terms of accuracy, precision, recall and F1 score, as compared with the most widely used related state-of-the-art techniques.
Affiliation(s)
- Hafeez Ur Rehman
- Department of Computer Science, FAST National University of Computer & Emerging Sciences, Peshawar, Pakistan.
6
Computational Resources for Predicting Protein-Protein Interactions. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2017; 110:251-275. [PMID: 29412998 DOI: 10.1016/bs.apcsb.2017.07.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022]
Abstract
Proteins are the essential building blocks and functional components of a cell. They account for the vital functions of an organism. Proteins interact with each other and form protein interaction networks. These protein interactions play a major role in all biological processes and pathways. Previous methods of predicting protein interactions were experimental and focused on a small set of proteins or a particular protein. However, these experimental approaches are low-throughput, as they are time-consuming and require a significant amount of human effort. This led to the development of computational techniques that use high-throughput experimental data for analyzing protein-protein interactions. The main purpose of this review is to provide an overview of the computational advancements and tools for the prediction of protein interactions. The major databases for the deposition of these interactions are also described. The advantages, as well as the specific limitations, of these tools are highlighted, shedding light on the computational aspects that can help biologists and other researchers in their research.
7
Mutations at protein-protein interfaces: Small changes over big surfaces have large impacts on human health. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2017; 128:3-13. [DOI: 10.1016/j.pbiomolbio.2016.10.002] [Citation(s) in RCA: 107] [Impact Index Per Article: 15.3] [Received: 04/30/2016] [Revised: 10/15/2016] [Accepted: 10/19/2016] [Indexed: 12/22/2022]
8
Dubovenko A, Nikolsky Y, Rakhmatulin E, Nikolskaya T. Functional Analysis of OMICs Data and Small Molecule Compounds in an Integrated "Knowledge-Based" Platform. Methods Mol Biol 2017; 1613:101-124. [PMID: 28849560 DOI: 10.1007/978-1-4939-7027-8_6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022]
Abstract
Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here, we present MetaCore™ and Key Pathway Advisor (KPA), an integrated platform for functional data analysis. On the content side, MetaCore and KPA encompass a comprehensive database of molecular interactions of different types, pathways, network models, and ten functional ontologies covering human, mouse, and rat genes. The analytical toolkit includes tools for gene/protein list enrichment analysis, statistical "interactome" tool for the identification of over- and under-connected proteins in the dataset, and a biological network analysis module made up of network generation algorithms and filters. The suite also features Advanced Search, an application for combinatorial search of the database content, as well as a Java-based tool called Pathway Map Creator for drawing and editing custom pathway maps. Applications of MetaCore and KPA include molecular mode of action of disease research, identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of biological effects for novel small molecule compounds and clinical applications (analysis of large cohorts of patients, and translational and personalized medicine).
Affiliation(s)
- Alexey Dubovenko
- Clarivate Analytics, 1500 Spring Garden Street, Fourth Floor, Philadelphia, PA, 19130, USA.
- Yuri Nikolsky
- Prosapia Genetics, Solana Beach, CA, 92075, USA; School of Systems Biology, George Mason University, Fairfax, VA, USA
- Eugene Rakhmatulin
- Clarivate Analytics, 1500 Spring Garden Street, Fourth Floor, Philadelphia, PA, 19130, USA
9
Huang L, Liao L, Wu CH. Protein-protein interaction prediction based on multiple kernels and partial network with linear programming. BMC SYSTEMS BIOLOGY 2016. [PMCID: PMC4977483 DOI: 10.1186/s12918-016-0296-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 08/30/2023]
Abstract
Background: Prediction of de novo protein-protein interactions is a critical step toward reconstructing PPI networks, which is a central task in systems biology. Recent computational approaches have shifted from making PPI predictions based on individual pairs and a single data source to leveraging complementary information from multiple heterogeneous data sources and partial network structure. However, how to quickly learn weights for heterogeneous data sources remains a challenge. In this work, we developed a method to infer de novo PPIs by combining multiple data sources represented in kernel format and obtaining optimal weights based on a random walk over the existing partial networks. Results: Our proposed method utilizes the Barker algorithm and the training data to construct a transition matrix which constrains how a random walk would traverse the partial network. Multiple heterogeneous features for the proteins in the network are then combined into the form of a weighted kernel fusion, which provides a new "adjacency matrix" for the whole network that may consist of disconnected components but is required to comply with the transition matrix on the training subnetwork. This requirement is met by adjusting the weights to minimize the element-wise difference between the transition matrix and the weighted kernels. The minimization problem is solved by linear programming. The weighted kernel fusion is then transformed to a regularized Laplacian (RL) kernel to infer missing or new edges in the PPI network, which can potentially connect the previously disconnected components. Conclusions: The results on synthetic data demonstrated the soundness and robustness of the proposed algorithms under various conditions. The results on real data show that the accuracies of PPI prediction for yeast data and human data, measured as AUC, are increased by up to 19% and 11% respectively, as compared to a control method without using optimal weights. Moreover, the weights learned by our method, Weight Optimization by Linear Programming (WOLP), are very consistent with those learned by sampling, and can provide insights into the relations between PPIs and various feature kernels, thereby improving PPI prediction even for disconnected PPI networks.
10
Huang L, Liao L, Wu CH. Inference of protein-protein interaction networks from multiple heterogeneous data. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2016; 2016:8. [PMID: 26941784 PMCID: PMC4761017 DOI: 10.1186/s13637-016-0040-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/06/2015] [Accepted: 02/09/2016] [Indexed: 11/29/2022]
Abstract
Protein-protein interaction (PPI) prediction is a central task in achieving a better understanding of cellular and intracellular processes. Because high-throughput experimental methods are both expensive and time-consuming, and are also known to suffer from incompleteness and noise, many computational methods have been developed, with varied degrees of success. However, the inference of PPI networks from multiple heterogeneous data sources remains a great challenge. In this work, we developed a novel method based on approximate Bayesian computation and modified differential evolution sampling (ABC-DEP) and the regularized Laplacian (RL) kernel. The method enables inference of PPI networks from topological properties and multiple heterogeneous features, including gene expression and Pfam domain profiles, in the form of weighted kernels. The optimal weights are obtained by ABC-DEP, and the kernel fusion built on the optimal weights serves as input to RL to infer missing or new edges in the PPI network. Detailed comparisons with control methods have been made, and the results show that the accuracy of PPI prediction measured by AUC is increased by up to 23%, as compared to a baseline without using optimal weights. The method can provide insights into the relations between PPIs and various feature kernels and demonstrates a strong capability of predicting faraway interactions that cannot be well detected by the traditional RL method.
Affiliation(s)
- Lei Huang
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Avenue, Newark, 19716 DE USA
- Li Liao
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Avenue, Newark, 19716 DE USA
- Cathy H Wu
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Avenue, Newark, 19716 DE USA ; Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, 19711 DE USA
11
Ramakrishnan G, Chandra NR, Srinivasan N. From workstations to workbenches: Towards predicting physicochemically viable protein-protein interactions across a host and a pathogen. IUBMB Life 2014; 66:759-74. [DOI: 10.1002/iub.1331] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/29/2014] [Revised: 11/06/2014] [Accepted: 11/16/2014] [Indexed: 01/03/2023]
Affiliation(s)
- Gayatri Ramakrishnan
- Indian Institute of Science Mathematics Initiative; Indian Institute of Science; Bangalore Karnataka India
- Molecular Biophysics Unit; Indian Institute of Science; Bangalore Karnataka India
- Nagasuma R. Chandra
- Department of Biochemistry; Indian Institute of Science; Bangalore Karnataka India
12
Haga SW, Wu HF. Overview of software options for processing, analysis and interpretation of mass spectrometric proteomic data. JOURNAL OF MASS SPECTROMETRY : JMS 2014; 49:959-969. [PMID: 25303385 DOI: 10.1002/jms.3414] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/17/2014] [Revised: 05/23/2014] [Accepted: 06/13/2014] [Indexed: 06/04/2023]
Abstract
Recently, interest in proteomics has increased intensively, and proteomic methods have been widely applied to many problems in cell biology. If the 1990s are considered the decade of genomics, we can claim that the following years of the new century are a decade of proteomics. The rapid evolution of proteomics has continued through these years, with a series of innovations in separation techniques and the core technologies of two-dimensional gel electrophoresis and MS. Both technologies are fueled by automation and high throughput computation for profiling of proteins from biological systems. As Patterson once noted, 'data analysis is the Achilles heel of proteomics and our ability to generate data now outstrips our ability to analyze it'. The development of automatic and high throughput technologies for rapid identification of proteins is essential for large-scale proteome projects, and automatic protein identification and characterization is essential for high throughput proteomics. This review provides a snapshot of the tools and applications that are available for mass spectrometric high throughput biocomputation. The review starts with a brief introduction to proteomics and MS. Computational tools that can be employed at various stages of analysis are presented, including those for data processing, identification, quantification, and the understanding of the biological functions of individual proteins and their dynamic interactions. The challenges of computational software development and its future trends in MS-based proteomics are also discussed.
Affiliation(s)
- Steve W Haga
- Department of Computer Science and Engineering, National Sun Yat Sen University, Kaohsiung, 804, Taiwan
13
Hall BA, Halim KA, Buyan A, Emmanouil B, Sansom MSP. Sidekick for Membrane Simulations: Automated Ensemble Molecular Dynamics Simulations of Transmembrane Helices. J Chem Theory Comput 2014; 10:2165-75. [PMID: 26580541 PMCID: PMC4871227 DOI: 10.1021/ct500003g] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 01/01/2023]
Abstract
The interactions of transmembrane (TM) α-helices with the phospholipid membrane and with one another are central to understanding the structure and stability of integral membrane proteins. These interactions may be analyzed via coarse grained molecular dynamics (CGMD) simulations. To obtain statistically meaningful analysis of TM helix interactions, large (N ca. 100) ensembles of CGMD simulations are needed. To facilitate the running and analysis of such ensembles of simulations, we have developed Sidekick, an automated pipeline software for performing high throughput CGMD simulations of α-helical peptides in lipid bilayer membranes. Through an end-to-end approach, which takes as input a helix sequence and outputs analytical metrics derived from CGMD simulations, we are able to predict the orientation and likelihood of insertion into a lipid bilayer of a given helix of a family of helix sequences. We illustrate this software via analyses of insertion into a membrane of short hydrophobic TM helices containing a single cationic arginine residue positioned at different positions along the length of the helix. From analyses of these ensembles of simulations, we estimate apparent energy barriers to insertion which are comparable to experimentally determined values. In a second application, we use CGMD simulations to examine the self-assembly of dimers of TM helices from the ErbB1 receptor tyrosine kinase and analyze the numbers of simulation repeats necessary to obtain convergence of simple descriptors of the mode of packing of the two helices within a dimer. Our approach offers a proof-of-principle platform for the further employment of automation in large ensemble CGMD simulations of membrane proteins.
Affiliation(s)
- Benjamin A Hall
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU
- current address: Microsoft Research Cambridge, 21 Station Road, Cambridge, CB1 2FB
- Khairul Abd Halim
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU
- Amanda Buyan
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU
- Beatrice Emmanouil
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU
- Mark S P Sansom
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU
14
Mosca R, Pons T, Céol A, Valencia A, Aloy P. Towards a detailed atlas of protein–protein interactions. Curr Opin Struct Biol 2013; 23:929-40. [DOI: 10.1016/j.sbi.2013.07.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Received: 06/27/2013] [Revised: 07/04/2013] [Accepted: 07/08/2013] [Indexed: 12/30/2022]
15
Zhang QC, Petrey D, Garzón JI, Deng L, Honig B. PrePPI: a structure-informed database of protein-protein interactions. Nucleic Acids Res 2013; 41:D828-33. [PMID: 23193263 PMCID: PMC3531098 DOI: 10.1093/nar/gks1231] [Citation(s) in RCA: 182] [Impact Index Per Article: 16.5] [Indexed: 01/26/2023] Open
Abstract
PrePPI (http://bhapp.c2b2.columbia.edu/PrePPI) is a database that combines predicted and experimentally determined protein-protein interactions (PPIs) using a Bayesian framework. Predicted interactions are assigned probabilities of being correct, which are derived from calculated likelihood ratios (LRs) by combining structural, functional, evolutionary and expression information, with the most important contribution coming from structure. Experimentally determined interactions are compiled from a set of public databases that manually collect PPIs from the literature and are also assigned LRs. A final probability is then assigned to every interaction by combining the LRs for both predicted and experimentally determined interactions. The current version of PrePPI contains ∼2 million PPIs that have a probability more than ∼0.1 of which ∼60 000 PPIs for yeast and ∼370 000 PPIs for human are considered high confidence (probability > 0.5). The PrePPI database constitutes an integrated resource that enables users to examine aggregate information on PPIs, including both known and potentially novel interactions, and that provides structural models for many of the PPIs.
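How several likelihood ratios can be turned into a single probability is easy to show with a naive Bayes style sketch. The abstract says only that LRs from different evidence types are combined within a Bayesian framework, so the product rule below (which assumes the evidence sources are independent), the `prior_odds` parameter and the function name `interaction_probability` are illustrative assumptions and not PrePPI's documented procedure.

```python
from math import prod


def interaction_probability(likelihood_ratios, prior_odds):
    """Combine independent likelihood ratios into a posterior probability.

    posterior odds = prior odds * product of LRs   (independence is assumed)
    probability    = odds / (1 + odds)
    `prior_odds` is a hypothetical value supplied by the caller.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    if any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Two made-up evidence sources (for example structural and expression) for one pair.
example_probability = interaction_probability([10.0, 6.0], prior_odds=1 / 60)
```

With these made-up numbers the combined LR of 60 exactly offsets the prior odds of 1/60, so the posterior odds are even; adding any further evidence with an LR above 1 would push the probability past the 0.5 level that the abstract describes as high confidence.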
Affiliation(s)
- Qiangfeng Cliff Zhang
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- José Ignacio Garzón
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Lei Deng
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Barry Honig
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- *To whom correspondence should be addressed. Tel: +1 212 851 4651; Fax: +1 212 851 4650,
|
16
|
Swapna LS, Srinivasan N, Robertson DL, Lovell SC. The origins of the evolutionary signal used to predict protein-protein interactions. BMC Evol Biol 2012; 12:238. [PMID: 23217198 PMCID: PMC3537733 DOI: 10.1186/1471-2148-12-238] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/08/2011] [Accepted: 11/17/2012] [Indexed: 12/02/2022] Open
Abstract
Background The correlation of genetic distances between pairs of protein sequence alignments has been used to infer protein-protein interactions. It has been suggested that these correlations are based on the signal of co-evolution between interacting proteins. However, although mutations in different proteins associated with maintaining an interaction clearly occur (particularly in binding interfaces and neighbourhoods), many other factors contribute to correlated rates of sequence evolution. Proteins in the same genome are usually linked by shared evolutionary history and so it would be expected that there would be topological similarities in their phylogenetic trees, whether they are interacting or not. For this reason the underlying species tree is often corrected for. Moreover, processes such as expression level are known to affect evolutionary rates. However, it has been argued that the correlated rates of evolution used to predict protein interaction explicitly include shared evolutionary history; here we test this hypothesis. Results In order to identify the evolutionary mechanisms giving rise to the correlations between interacting proteins, we use phylogenetic methods to distinguish similarities in tree topologies from similarities in genetic distances. We use a range of datasets of interacting and non-interacting proteins from Saccharomyces cerevisiae. We find that the signal of correlated evolution between interacting proteins is predominantly a result of shared evolutionary rates, rather than similarities in tree topology, independent of evolutionary divergence. Conclusions Since interacting proteins do not have tree topologies that are more similar than the control group of non-interacting proteins, it is likely that coevolution contributes little, if anything, to the observed correlations.
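The correlation of genetic distances described in the Background can be sketched as a Pearson correlation over matching species pairs from two distance matrices. This is a minimal illustration only: the species labels, distance values and function names are hypothetical, and it does not reproduce the authors' analysis, which goes further by separating tree topology from evolutionary rate.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def genetic_distance_correlation(dist_a, dist_b, species):
    """Correlate pairwise genetic distances of two protein families.

    dist_a and dist_b map frozenset({species1, species2}) -> distance.
    """
    pairs = [frozenset(p) for p in combinations(species, 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


# Made-up distances for four species; family B evolves at twice the rate of A.
species = ["sp1", "sp2", "sp3", "sp4"]
values = [0.10, 0.25, 0.40, 0.20, 0.35, 0.30]
dist_a = {frozenset(p): d for p, d in zip(combinations(species, 2), values)}
dist_b = {pair: 2.0 * d + 0.05 for pair, d in dist_a.items()}

r = genetic_distance_correlation(dist_a, dist_b, species)
print(f"correlation of genetic distances: {r:.3f}")
```

Because the second family is a linear rescaling of the first, the correlation is close to 1 even though nothing in the toy data encodes a physical interaction, which is the kind of confounding by shared evolutionary rate that the abstract discusses.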
|
17
|
Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012; 490:556-60. [PMID: 23023127 PMCID: PMC3482288 DOI: 10.1038/nature11503] [Citation(s) in RCA: 489] [Impact Index Per Article: 40.8] [Received: 06/29/2011] [Accepted: 08/10/2012] [Indexed: 12/23/2022]
Abstract
The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms [1,2]. Much of our current knowledge derives from high-throughput techniques such as yeast two hybrid and affinity purification [3], as well as from manual curation of experiments on individual systems [4]. A variety of computational approaches based, for example, on sequence homology, gene co-expression, and phylogenetic profiles have also been developed for the genome-wide inference of protein-protein interactions (PPIs) [5,6]. Yet, comparative studies suggest that the development of accurate and complete repertoires of PPIs is still in its early stages [7–9]. Here we show that three-dimensional structural information can be used to predict PPIs with an accuracy and coverage that are superior to predictions based on non-structural evidence. Moreover, an algorithm, PrePPI, that combines structural information with other functional clues is comparable in accuracy to high-throughput experiments, yielding over 30,000 high confidence interactions for yeast and over 300,000 for human. Experimental tests of a number of predictions demonstrate the ability of the PrePPI algorithm to identify unexpected PPIs of significant biological interest. The surprising effectiveness of three-dimensional structural information can be attributed to the use of homology models combined with the exploitation of both close and remote geometric relationships between proteins.
|
18
|
Randhawa V, Bagler G. Identification of SRC as a potent drug target for asthma, using an integrative approach of protein interactome analysis and in silico drug discovery. OMICS: A Journal of Integrative Biology 2012; 16:513-26. [PMID: 22775150 DOI: 10.1089/omi.2011.0160] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
Network-biology inspired modeling of interactome data and computational chemistry have the potential to revolutionize drug discovery by complementing conventional methods. We consider asthma, a complex disease characterized by intricate molecular mechanisms, for our study. We aim to integrate prediction of potent drug targets using graph-theoretical methods and subsequent identification of small molecules capable of modulating activity of the best target. In this work, we construct the protein interactome underlying this disease: Asthma Protein Interactome (API). Using a strategy based on network analysis of the interactome, we identify a set of potential drug targets for asthma. Topologically and dynamically, v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (SRC) emerges as the most central target in API. SRC is known to play an important role in promoting airway smooth muscle cell growth and facilitating migration in airway remodeling. From interactome analysis, and with the reported role in respiratory mechanisms, SRC emerges as a promising drug target for asthma. Further, we proceed to identify leads for SRC from a public database of small molecules. We predict two potential leads for SRC using ligand-based virtual screening methodology.
Affiliation(s)
- Vinay Randhawa
- Biotechnology Division, Institute of Himalayan Bioresource Technology, Council of Scientific and Industrial Research (CSIR-IHBT), Palampur, India
|
19
|
|
20
|
Hawkins T, Kihara D. Function prediction of uncharacterized proteins. J Bioinform Comput Biol 2011; 5:1-30. [PMID: 17477489 DOI: 10.1142/s0219720007002503] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Received: 06/20/2006] [Revised: 09/23/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022]
Abstract
Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that utilize the overwhelmingly large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because they incorporate structural and experimental data that is not used in sequence-based methods, they can provide additional accuracy and reliability to protein function prediction. Here, first we review the definition of protein function. Then the recent developments of these methods are introduced with special focus on the type of predictions that can be made. The need for further development of comprehensive systems biology techniques that can utilize the ever-increasing data presented by the genomics and proteomics communities is emphasized. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.
Affiliation(s)
- Troy Hawkins
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
|
21
|
Monji H, Koizumi S, Ozaki T, Ohkawa T. Interaction site prediction by structural similarity to neighboring clusters in protein-protein interaction networks. BMC Bioinformatics 2011; 12 Suppl 1:S39. [PMID: 21342570 PMCID: PMC3044295 DOI: 10.1186/1471-2105-12-s1-s39] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Revealing the function of proteins through protein-protein interaction (PPI) networks is regarded as one of the important issues in bioinformatics. With the development of experimental methods such as the yeast two-hybrid method, the volume of protein interaction data has been increasing rapidly. Many databases that handle these data comprehensively have been constructed and applied to analyzing PPI networks. However, little research has explored the prediction of interaction sites using PPI networks and 3D protein structures in a complementary way. RESULTS We propose a method of predicting interaction sites in proteins with unknown function by using both PPI networks and protein structures. For a target protein with unknown function, several clusters are extracted from the neighboring proteins based on their structural similarity. Then, interaction sites are predicted by extracting similar sites from the group formed by a protein cluster and the target protein. Moreover, the proposed method can improve the prediction accuracy by introducing a repetitive prediction process. CONCLUSIONS The proposed method has been applied to a small-scale dataset, and its effectiveness has been confirmed. The challenge will now be to apply the method to large-scale datasets.
Affiliation(s)
- Hiroyuki Monji
- Graduate School of System Informatics, Kobe University, Rokkodai, Nada, Kobe 657-8501, Japan.
|
22
|
Kelly WP, Stumpf MPH. Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins. BMC Bioinformatics 2010; 11:470. [PMID: 20854660 PMCID: PMC2955699 DOI: 10.1186/1471-2105-11-470] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/28/2010] [Accepted: 09/20/2010] [Indexed: 11/28/2022] Open
Abstract
Background Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenetic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting genes or their protein products. Results We develop suitable statistical resampling schemes that can incorporate these two potential sources of correlation into a single inferential framework. To illustrate our approach we apply it to protein interaction data in yeast and investigate whether the phylogenetic trees of interacting proteins in a panel of yeast species are more similar than would be expected by chance. Conclusions While we find only negligible evidence for such increased levels of similarities, our statistical approach allows us to resolve the previously reported contradictory results on the levels of co-evolution induced by protein-protein interactions. We conclude with a discussion as to how we may employ the statistical framework developed here in further functional and evolutionary analyses of biological networks and systems.
|
23
|
Amoutzias GD, Robertson DL, Bornberg-Bauer E. The evolution of protein interaction networks in regulatory proteins. Comp Funct Genomics 2010; 5:79-84. [PMID: 18629034 PMCID: PMC2447317 DOI: 10.1002/cfg.365] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 11/13/2003] [Revised: 11/18/2003] [Accepted: 11/25/2003] [Indexed: 12/05/2022] Open
Abstract
Interactions between proteins are essential for intracellular communication. They form complex networks which have become an important source for functional analysis of proteins. Combining phylogenies with network analysis, we investigate the evolutionary history of interaction networks from the bHLH, NR and bZIP transcription-factor families. The bHLH and NR networks show a hub-like structure with varying γ values. Mutation and gene duplication play an important role in adding and removing interactions. We conclude that in several of the protein families that we have studied, networks have primarily arisen by the development of heterodimerizing transcription factors, from an ancestral gene which interacts with any of the newly emerging proteins but also homodimerizes.
Affiliation(s)
- Gregory D Amoutzias
- School of Biological Sciences, University of Manchester, 2.205 Stopford Building, Oxford Road, Manchester M13 9PT, UK
|
24
|
Abstract
With the advent of Systems Biology, the prediction of whether two proteins form a complex has become a problem of increased importance. A variety of experimental techniques have been applied to the problem, but three-dimensional structural information has not been widely exploited. Here we explore the range of applicability of such information by analyzing the extent to which the location of binding sites on protein surfaces is conserved among structural neighbors. We find, as expected, that interface conservation is most significant among proteins that have a clear evolutionary relationship, but that there is a significant level of conservation even among remote structural neighbors. This finding is consistent with recent evidence that information available from structural neighbors, independent of classification, should be exploited in the search for functional insights. The value of such structural information is highlighted through the development of a new protein interface prediction method, PredUs, that identifies what residues on protein surfaces are likely to participate in complexes with other proteins. The performance of PredUs, as measured through comparisons with other methods, suggests that relationships across protein structure space can be successfully exploited in the prediction of protein-protein interactions.
|
25
|
Development of a Novel Bioinformatics Tool for In Silico Validation of Protein Interactions. J Biomed Biotechnol 2010; 2010:670125. [PMID: 20625507 PMCID: PMC2896714 DOI: 10.1155/2010/670125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/19/2009] [Revised: 03/10/2010] [Accepted: 03/30/2010] [Indexed: 11/17/2022] Open
Abstract
Protein interactions are crucial in most biological processes. Several in silico methods have been recently developed to predict them. This paper describes a bioinformatics method that combines sequence similarity and structural information to support experimental studies on protein interactions. Given a target protein, the approach selects the most likely interactors among the candidates revealed by experimental techniques, but not yet in vivo validated. The sequence and the structural information of the in vivo confirmed proteins and complexes are exploited to evaluate the candidate interactors. Finally, a score is calculated to suggest the most likely interactors of the target protein. As an example, we searched for GRB2 interactors. We ranked a set of 46 candidate interactors by the presented method. These candidates were then reduced to 21, through a score threshold chosen by means of a cross-validation strategy. Among them, the isoform 1 of MAPK14 was in silico confirmed as a GRB2 interactor. Finally, given a set of already confirmed interactors of GRB2, the accuracy and the precision of the approach were 75% and 86%, respectively. In conclusion, the proposed method can be conveniently exploited to select the proteins to be experimentally investigated within a set of potential interactors.
|
26
|
Venkatraman V, Yang YD, Sael L, Kihara D. Protein-protein docking using region-based 3D Zernike descriptors. BMC Bioinformatics 2009; 10:407. [PMID: 20003235 PMCID: PMC2800122 DOI: 10.1186/1471-2105-10-407] [Citation(s) in RCA: 126] [Impact Index Per Article: 8.4] [Received: 05/02/2009] [Accepted: 12/09/2009] [Indexed: 12/02/2022] Open
Abstract
Background Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
Affiliation(s)
- Vishwesh Venkatraman
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA.
|
27
|
Kushwaha SK, Shakya M. PINAT1.0: protein interaction network analysis tool. Bioinformation 2009; 3:419-21. [PMID: 19759862 PMCID: PMC2737494 DOI: 10.6026/97320630003419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/01/2009] [Revised: 04/01/2009] [Accepted: 04/08/2009] [Indexed: 11/28/2022] Open
Abstract
Cellular processes are regulated by interactions among various proteins, i.e. multiprotein complexes, and the absence of these interactions is often the cause of disorder or disease. Such protein interactions are of great interest for drug design. In host-parasite diseases like tuberculosis, non-homologous proteins are the first preference as drug targets. The most potent drug target can be identified among a large number of non-homologous proteins through protein interaction network analysis. The drug target should be the non-homologous protein that is associated with the maximum number of functional proteins, i.e. has the highest number of interactants, so that maximum harm is caused to the pathogen only. In the present work, the Protein Interaction Network Analysis Tool (PINAT) has been developed to identify potential protein interactions for drug target identification. PINAT is a standalone GUI application for protein-protein interaction (PPI) analysis and network building using coevolutionary profiles. PINAT is very useful for large-scale PPI studies and is the easiest to handle among available software. PINAT provides excellent facilities for assembling data for network building, with visual presentation of the results and interaction scores. The software is written in Java and provides reliability through transparency with the user.
|
28
|
De Bodt S, Proost S, Vandepoele K, Rouzé P, Van de Peer Y. Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics 2009; 10:288. [PMID: 19563678 PMCID: PMC2719670 DOI: 10.1186/1471-2164-10-288] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Received: 02/11/2009] [Accepted: 06/29/2009] [Indexed: 12/31/2022] Open
Abstract
Background Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. Conclusion We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.
Affiliation(s)
- Stefanie De Bodt
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Technologiepark 927, B-9052 Gent, Belgium.
|
29
|
Chakicherla A, Ecale Zhou CL, Dang ML, Rodriguez V, Hansen JN, Zemla A. SpaK/SpaR two-component system characterized by a structure-driven domain-fusion method and in vitro phosphorylation studies. PLoS Comput Biol 2009; 5:e1000401. [PMID: 19503843 PMCID: PMC2686270 DOI: 10.1371/journal.pcbi.1000401] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/11/2008] [Accepted: 05/04/2009] [Indexed: 12/23/2022] Open
Abstract
Here we introduce a quantitative structure-driven computational domain-fusion method, which we used to predict the structures of proteins believed to be involved in regulation of the subtilin pathway in Bacillus subtilis, and used to predict a protein-protein complex formed by interaction between the proteins. Homology modeling of SpaK and SpaR yielded preliminary structural models based on a best template for SpaK comprising a dimer of a histidine kinase, and for SpaR a response regulator protein. Our LGA code was used to identify multi-domain proteins with structure homology to both modeled structures, yielding a set of domain-fusion templates then used to model a hypothetical SpaK/SpaR complex. The models were used to identify putative functional residues and residues at the protein-protein interface, and bioinformatics was used to compare functionally and structurally relevant residues in corresponding positions among proteins with structural homology to the templates. Models of the complex were evaluated in light of known properties of the functional residues within two-component systems involving His-Asp phosphorelays. Based on this analysis, a phosphotransferase complexed with a beryllofluoride was selected as the optimal template for modeling a SpaK/SpaR complex conformation. In vitro phosphorylation studies performed using wild type and site-directed SpaK mutant proteins validated the predictions derived from application of the structure-driven domain-fusion method: SpaK was phosphorylated in the presence of 32P-ATP and the phosphate moiety was subsequently transferred to SpaR, supporting the hypothesis that SpaK and SpaR function as sensor and response regulator, respectively, in a two-component signal transduction system, and furthermore suggesting that the structure-driven domain-fusion approach correctly predicted a physical interaction between SpaK and SpaR. Our domain-fusion algorithm leverages quantitative structure information and provides a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods.
Because proteins so frequently function in coordination with other proteins, identification and characterization of the interactions among proteins are essential for understanding how proteins work. Computational methods for identification of protein-protein interactions have been limited by the degree to which proteins are similar in sequence. However, methods that leverage structure information can overcome this limitation of sequence-based methods; the three-dimensional information provided by structure enables identification of related proteins even when their sequences are dissimilar. In this work we present a quantitative method for identification of protein interacting partners, and we demonstrate its use in modeling the structure of a hypothetical complex between two proteins that function in a bacterial signaling system. This quantitative approach comprises a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods, and provides a basis for high-throughput prediction of protein-protein interactions, which could be applied on a whole-genome scale.
Affiliation(s)
- Anu Chakicherla
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Carol L. Ecale Zhou
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Virginia Rodriguez
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- J. Norman Hansen
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, United States of America
- Adam Zemla
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
|
30
|
Cho KI, Kim D, Lee D. A feature-based approach to modeling protein-protein interaction hot spots. Nucleic Acids Res 2009; 37:2672-87. [PMID: 19273533 PMCID: PMC2677884 DOI: 10.1093/nar/gkp132] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Indexed: 11/14/2022] Open
Abstract
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions.
Affiliation(s)
- Kyu-il Cho
- Department of Bio and Brain Engineering, KAIST, 305-701, Daejeon, South Korea
|
31
|
Gadkari RA, Varughese D, Srinivasan N. Recognition of interaction interface residues in low-resolution structures of protein assemblies solely from the positions of C(alpha) atoms. PLoS One 2009; 4:e4476. [PMID: 19214247 PMCID: PMC2641018 DOI: 10.1371/journal.pone.0004476] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/04/2008] [Accepted: 12/22/2008] [Indexed: 11/18/2022]
Abstract
BACKGROUND The number of available structures of large multi-protein assemblies is quite small. Such structures provide phenomenal insights on the organization, mechanism of formation and functional properties of the assembly. Hence detailed analysis of such structures is highly rewarding. However, the common problem in such analyses is the low resolution of these structures. In recent times a number of attempts that combine low resolution cryo-EM data with higher resolution structures determined using X-ray analysis or NMR or generated using comparative modeling have been reported. Even in such attempts the best result one arrives at is a very coarse idea about the assembly structure in terms of a trace of the C(alpha) atoms, which are modeled with modest accuracy. METHODOLOGY/PRINCIPAL FINDINGS In this paper we first present an objective approach to identify potentially solvent exposed and buried residues solely from the position of C(alpha) atoms and amino acid sequence using residue type-dependent thresholds for accessible surface areas of C(alpha). We extend the method further to recognize potential protein-protein interface residues. CONCLUSION/SIGNIFICANCE Our approach to identify buried and exposed residues solely from the positions of C(alpha) atoms resulted in an accuracy of 84%, sensitivity of 83-89% and specificity of 67-94%, while recognition of interfacial residues corresponded to an accuracy of 94%, sensitivity of 70-96% and specificity of 58-94%. Interestingly, detailed analysis of cases of mismatch between recognition of interface residues from C(alpha) positions and all-atom models suggested that recognition of interfacial residues using C(alpha) atoms only corresponds better with the intuitive notion of what an interfacial residue is. Our method should be useful in the objective analysis of structures of protein assemblies when only C(alpha) positions are available as, for example, in the cases of integration of cryo-EM data and high resolution structures of the components of the assembly.
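The accuracy, sensitivity and specificity figures quoted in this abstract follow the usual confusion-matrix definitions. As a minimal sketch of how such figures could be computed when C(alpha)-only interface calls are compared against all-atom calls, the Python below may help; the function name `interface_metrics` and the eight residue labels are hypothetical illustrations, not data or code from the paper.

```python
def interface_metrics(predicted, actual):
    """Compare predicted interface labels against reference labels.

    Both arguments are equal-length sequences of booleans, one per residue:
    True means "interface residue", False means "not an interface residue".
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label sequences must be non-empty and equal in length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels for eight residues (assumed values for illustration).
calpha_only = [True, True, False, False, True, False, False, False]
all_atom = [True, False, False, False, True, True, False, False]
metrics = interface_metrics(calpha_only, all_atom)
```

Sensitivity is the fraction of reference interface residues that are recovered, and specificity is the fraction of non-interface residues that are correctly rejected, which is why the abstract reports them as separate ranges.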
Affiliation(s)
- Rupali A. Gadkari
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- * E-mail: (RAG); (NS)
- Deepthi Varughese
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- N. Srinivasan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- * E-mail: (RAG); (NS)
|
32
|
Li M, Huang Y, Xiao Y. Effects of external interactions on protein sequence-structure relations of beta-trefoil fold. Proteins 2009; 72:1161-70. [PMID: 18320584 DOI: 10.1002/prot.22010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Proteins with symmetric structures are ideal models to investigate the sequence-structure relations. We investigate proteins with beta-trefoil fold and find they have different degrees of sequence symmetries although they show similar symmetric structures. To understand this, we calculate the strength of interactions of the beta-trefoil folds with surrounding environments and find the low degrees of sequence symmetries are often correlated with large external interactions. Our results give an additional confirmation of Anfinsen's thermodynamic hypothesis that protein structures are not only determined by their sequences but also by their surrounding environments. We suggest the external interactions should be considered additionally in protein structure prediction through ab initio folding.
Affiliation(s)
- Mingfeng Li
- Department of Physics, Biomolecular Physics and Modeling Group, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
|
33
|
Functional analysis of OMICs data and small molecule compounds in an integrated "knowledge-based" platform. Methods Mol Biol 2009; 563:177-96. [PMID: 19597786 DOI: 10.1007/978-1-60761-175-2_10] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 12/24/2022]
Abstract
Analysis of microarray, SNPs, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high-fidelity annotated knowledge base of protein interactions, pathways, and functional ontologies. This knowledge base has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here we present MetaDiscovery, an integrated platform for functional data analysis which has been under development at GeneGo for the past 8 years. On the content side, MetaDiscovery encompasses a comprehensive database of protein interactions of different types, pathways, network models and 10 functional ontologies covering human, mouse, and rat proteins. The analytical toolkit includes tools for gene/protein list enrichment analysis, a statistical "interactome" tool for identification of over- and under-connected proteins in the data set, and a network module made up of network generation algorithms and filters. The suite also features MetaSearch, an application for combinatorial search of the database content, as well as a Java-based tool called MapEditor for drawing and editing custom pathway maps. Applications of MetaDiscovery include identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of biological effects for novel small molecule compounds, and clinical applications (analysis of large cohorts of patients and translational and personalized medicine).
|
34
|
|
35
|
Zhu Z, Tovchigrechko A, Baronova T, Gao Y, Douguet D, O'Toole N, Vakser IA. Large-scale structural modeling of protein complexes at low resolution. J Bioinform Comput Biol 2008; 6:789-810. [PMID: 18763743 DOI: 10.1142/s0219720008003679] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/16/2007] [Revised: 11/20/2007] [Accepted: 01/04/2008] [Indexed: 11/18/2022]
Abstract
Structural aspects of protein-protein interactions provided by large-scale, genome-wide studies are essential for the description of life processes at the molecular level. A methodology is developed that applies the protein docking approach (GRAMM), based on the knowledge of experimentally determined protein-protein structures (DOCKGROUND resource) and properties of intermolecular energy landscapes, to genome-wide systems of protein interactions. The full sequence-to-structure-of-complex modeling pipeline is implemented in the Genome Wide Docking Database (GWIDD) resource. Protein interaction data are imported to GWIDD from external datasets of experimentally determined interaction networks. Essential information is extracted and unified to form the GWIDD database. Structures of individual interacting proteins in the database are retrieved (if available) or modeled, and protein complex structures are predicted by the docking program. All protein sequence, structure, and docking information is conveniently accessible through a Web interface.
Affiliation(s)
- Zhengwei Zhu
- Center for Bioinformatics, The University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA
|
36
|
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008; 35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify locally structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100080, Beijing, China
|
37
|
Juan D, Pazos F, Valencia A. Co-evolution and co-adaptation in protein networks. FEBS Lett 2008; 582:1225-30. [DOI: 10.1016/j.febslet.2008.02.017] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Received: 01/09/2008] [Accepted: 02/08/2008] [Indexed: 10/22/2022]
|
38
|
Abstract
Computational methods for predicting protein interaction partners are becoming increasingly popular. Many of them are mature enough to be widely used by molecular biologists who can look for proteins related to the protein of interest in order to infer information about its context in the cell. In this chapter we describe the use of the mirrortree set of programs and related software for predicting protein interactions. They are all based on the idea that interacting or functionally related proteins tend to show similar phylogenetic trees due to coevolution. The basic mirrortree program can be used to calculate the similarity between the phylogenetic trees implicit in the multiple sequence alignments of two protein families. The ECID database contains protein interactions and relationships from different computational and experimental sources for the model organism Escherichia coli, including the ones generated with mirrortree. Finally, the TSEMA server uses the concept of tree similarity between interacting families to look for the best mapping between two families of interacting proteins: which member in one family interacts with which member in the other.
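The tree-similarity idea in this abstract can be illustrated with a short sketch. One common way to quantify how alike two phylogenetic trees are, and the assumption made here, is the linear (Pearson) correlation between the pairwise distance matrices of the two protein families, with rows and columns ordered by the same species. The function names and the toy matrices below are hypothetical and are not taken from the mirrortree software.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant matrix carries no tree signal
    return cov / (sd_x * sd_y)


def tree_similarity(dist_a, dist_b):
    """Correlate two distance matrices built over the same species order."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy distance matrices over three shared species (assumed values).
family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]  # same shape of tree, scaled
family_c = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]  # distances in reverse order
score_ab = tree_similarity(family_a, family_b)
score_ac = tree_similarity(family_a, family_c)
```

A score near 1 suggests the two families have diverged in parallel across the shared species, which is the signal interpreted as possible coevolution; a low or negative score gives no such support.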
|
39
|
Pitre S, Alamgir M, Green JR, Dumontier M, Dehne F, Golshani A. Computational methods for predicting protein-protein interactions. Adv Biochem Eng Biotechnol 2008; 110:247-67. [PMID: 18202838 DOI: 10.1007/10_2007_089] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 01/07/2023]
Abstract
Protein-protein interactions (PPIs) play a critical role in many cellular functions. A number of experimental techniques have been applied to discover PPIs; however, these techniques are expensive in terms of time, money, and expertise. There are also large discrepancies between the PPI data collected by the same or different techniques in the same organism. We therefore turn to computational techniques for the prediction of PPIs. Computational techniques have been applied to the collection, indexing, validation, analysis, and extrapolation of PPI data. This chapter will focus on computational prediction of PPI, reviewing a number of techniques including PIPE, developed in our own laboratory. For comparison, the conventional large-scale approaches to predict PPIs are also briefly discussed. The chapter concludes with a discussion of the limitations of both experimental and computational methods of determining PPIs.
Affiliation(s)
- Sylvain Pitre
- School of Computer Science, Carleton University, 5304 Herzberg Building, 1125 Colonel By Drive, K1S 5B6, Ottawa, Ontario, Canada
|
40
|
Kim YC, Hummer G. Coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding. J Mol Biol 2007; 375:1416-33. [PMID: 18083189 DOI: 10.1016/j.jmb.2007.11.063] [Citation(s) in RCA: 207] [Impact Index Per Article: 12.2] [Received: 06/06/2007] [Revised: 11/19/2007] [Accepted: 11/19/2007] [Indexed: 10/22/2022]
Abstract
We develop coarse-grained models and effective energy functions for simulating thermodynamic and structural properties of multiprotein complexes with relatively low binding affinity (K(d) > 1 μM) and apply them to binding of Vps27 to membrane-tethered ubiquitin. Folded protein domains are represented as rigid bodies. The interactions between the domains are treated at the residue level with amino-acid-dependent pair potentials and Debye-Hückel-type electrostatic interactions. Flexible linker peptides connecting rigid protein domains are represented as amino acid beads on a polymer with appropriate stretching, bending, and torsion-angle potentials. In simulations of membrane-attached protein complexes, interactions between amino acids and the membrane are described by residue-dependent short-range potentials and long-range electrostatics. We parameterize the energy functions by fitting the osmotic second virial coefficient of lysozyme and the binding affinity of the ubiquitin-CUE complex. For validation, extensive replica-exchange Monte Carlo simulations are performed of various protein complexes. Binding affinities for these complexes are in good agreement with the experimental data. The simulated structures are clustered on the basis of distance matrices between two proteins and ranked according to cluster population. In approximately 70% of the complexes, the distance root-mean-square is less than 5 Å from the experimental structures. In approximately 90% of the complexes, the binding interfaces on both proteins are predicted correctly, and in all other cases at least one interface is correct. Transient and nonspecifically bound structures are also observed. With the validated model, we simulate the interaction between the Vps27 multiprotein complex and a membrane-tethered ubiquitin. Ubiquitin is found to bind preferentially to the two UIM domains of Vps27, but transient interactions between ubiquitin and the VHS and FYVE domains are observed as well. These specific and nonspecific interactions are found to be positively cooperative, resulting in a substantial enhancement of the overall binding affinity beyond the approximately 300 μM of the specific domains. We also find that the interactions between ubiquitin and Vps27 are highly dynamic, with conformational rearrangements enabling binding of Vps27 to diverse targets as part of the multivesicular-body protein-sorting pathway.
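The "Debye-Hückel-type electrostatic interactions" named in this abstract refer to a screened Coulomb potential between charged residues. The sketch below shows the standard textbook form of that potential, U(r) = q_i q_j exp(-r/ξ) / (4π ε0 D r). The Debye length of 10 Å and dielectric constant of 80 are assumed illustrative values for a dilute aqueous salt solution, not the parameters fitted in the paper, and the function name is hypothetical.

```python
import math

# Coulomb prefactor e^2 / (4*pi*eps0) expressed in kcal*Å/(mol*e^2).
COULOMB_KCAL_ANGSTROM = 332.06


def debye_huckel_energy(q_i, q_j, r, debye_length=10.0, dielectric=80.0):
    """Screened Coulomb energy (kcal/mol) between two point charges.

    q_i, q_j     : charges in units of the elementary charge
    r            : separation in Å (must be positive)
    debye_length : screening length in Å (assumed value, not from the paper)
    dielectric   : relative dielectric constant (assumed value)
    """
    if r <= 0:
        raise ValueError("separation must be positive")
    screening = math.exp(-r / debye_length)
    return COULOMB_KCAL_ANGSTROM * q_i * q_j * screening / (dielectric * r)


# A hypothetical Lys(+1)/Glu(-1) pair at two residue-residue separations.
near = debye_huckel_energy(+1, -1, 5.0)
far = debye_huckel_energy(+1, -1, 20.0)
```

Because of the exponential factor, the interaction decays faster than a bare 1/r Coulomb term, so distant charges contribute little. That short effective range is one reason this form suits residue-level coarse-grained models.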
Affiliation(s)
- Young C Kim
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520, USA
|
41
|
Sun J, Sun Y, Ding G, Liu Q, Wang C, He Y, Shi T, Li Y, Zhao Z. InPrePPI: an integrated evaluation method based on genomic context for predicting protein-protein interactions in prokaryotic genomes. BMC Bioinformatics 2007; 8:414. [PMID: 17963500 PMCID: PMC2238723 DOI: 10.1186/1471-2105-8-414] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 03/30/2007] [Accepted: 10/26/2007] [Indexed: 01/04/2023]
Abstract
Background Although many genomic features have been used in the prediction of protein-protein interactions (PPIs), frequently only one is used in a computational method. After realizing the limited power in the prediction using only one genomic feature, investigators are now moving toward integration. So far, there have been few integration studies for PPI prediction; one failed to yield appreciable improvement of prediction and the others did not conduct performance comparison. It remains unclear whether an integration of multiple genomic features can improve the PPI prediction and, if it can, how to integrate these features. Results In this study, we first performed a systematic evaluation on the PPI prediction in Escherichia coli (E. coli) by four genomic context based methods: the phylogenetic profile method, the gene cluster method, the gene fusion method, and the gene neighbor method. The number of predicted PPIs and the average degree in the predicted PPI networks varied greatly among the four methods. Further, no method outperformed the others when we tested using three well-defined positive datasets from the KEGG, EcoCyc, and DIP databases. Based on these comparisons, we developed a novel integrated method, named InPrePPI. InPrePPI first normalizes the AC value (an integrated value of the accuracy and coverage) of each method using three positive datasets, then calculates a weight for each method, and finally uses the weight to calculate an integrated score for each protein pair predicted by the four genomic context based methods. We demonstrate that InPrePPI outperforms each of the four individual methods and, in general, the other two existing integrated methods: the joint observation method and the integrated prediction method in STRING. These four methods and InPrePPI are implemented in a user-friendly web interface. 
Conclusion This study evaluated the PPI prediction by four genomic context based methods, and presents an integrated evaluation method that shows better performance in E. coli.
Affiliation(s)
- Jingchun Sun
- Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA.
|
42
|
Wang Y, Stieglitz KA, Bubunenko M, Court DL, Stec B, Roberts MF. The structure of the R184A mutant of the inositol monophosphatase encoded by suhB and implications for its functional interactions in Escherichia coli. J Biol Chem 2007; 282:26989-26996. [PMID: 17652087 DOI: 10.1074/jbc.m701210200] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/06/2022]
Abstract
The Escherichia coli product of the suhB gene, SuhB, is an inositol monophosphatase (IMPase) that is best known as a suppressor of temperature-sensitive growth phenotypes in E. coli. To gain insights into these biologically diverse effects, we determined the structure of the SuhB R184A mutant protein. The structure showed a dimer organization similar to other IMPases, but with an altered interface suggesting that the presence of Arg-184 in the wild-type protein could shift the monomer-dimer equilibrium toward monomer. In parallel, a gel shift assay showed that SuhB forms a tight complex with RNA polymerase (RNA pol) that inhibits the IMPase catalytic activity of SuhB. A variety of SuhB mutant proteins designed to stabilize the dimer interface did not show a clear correlation with the ability of a specific mutant protein to complement the ΔsuhB mutation when introduced extragenically despite being active IMPases. However, the loss of sensitivity to RNA pol binding, i.e. in G173V, R184I, and L96F/R184I, did correlate strongly with loss of complementation of ΔsuhB. Because residue 184 forms the core of the SuhB dimer, it is likely that the interaction with RNA polymerase requires monomeric SuhB. The exposure of specific residues facilitates the interaction of SuhB with RNA pol (or another target with a similar binding surface) and it is this heterodimer formation that is critical to the ability of SuhB to rescue temperature-sensitive phenotypes in E. coli.
Affiliation(s)
- Yanling Wang
- Department of Chemistry, Merkert Chemistry Center, Boston College, Chestnut Hill, Massachusetts 02467
- Kimberly A Stieglitz
- Department of Chemistry, Merkert Chemistry Center, Boston College, Chestnut Hill, Massachusetts 02467
- Mikhail Bubunenko
- Basic Research Program, SAIC-Frederick, Inc; Molecular Control and Genetics Section, Gene Regulation and Chromosome Biology Laboratory, Center for Cancer Research, NCI-Frederick, National Institutes of Health, Frederick, Maryland 21702
- Donald L Court
- Molecular Control and Genetics Section, Gene Regulation and Chromosome Biology Laboratory, Center for Cancer Research, NCI-Frederick, National Institutes of Health, Frederick, Maryland 21702
- Boguslaw Stec
- The Burnham Institute for Medical Research, La Jolla, California 92037
- Mary F Roberts
- Department of Chemistry, Merkert Chemistry Center, Boston College, Chestnut Hill, Massachusetts 02467
|
43
|
Fukuhara N, Go N, Kawabata T. Prediction of interacting proteins from homology-modeled complex structures using sequence and structure scores. Biophysics (Nagoya-shi) 2007; 3:13-26. [PMID: 27857563 PMCID: PMC5036659 DOI: 10.2142/biophysics.3.13] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/15/2007] [Accepted: 05/31/2007] [Indexed: 12/01/2022]
Abstract
Protein-protein interactions support most biological processes, and it is important to find specifically interacting partner proteins among homologous proteins in order to elucidate cellular functions such as signal transduction systems. Various high-throughput experimental methods for identifying these interactions have been invented, and used to generate a huge amount of data. Because these experiments have been applied to only a few organisms, and their accuracy is believed to be limited, it would be valuable to develop computational methods for predicting protein-protein interactions from their amino acid sequences or tertiary structural information. In this study, we describe a prediction method of interacting proteins based on homology-modeled complex structures. We employed the statistical residue-residue contact energy used in a previous study, and two types of new scores, simple electrostatic energy and sequence similarity between target sequences and template structures. The validity of each protein-protein complex model was measured using their single and combined scores. We applied our method to all the protein heterodimers of Saccharomyces cerevisiae. To evaluate the prediction performance of our method, we prepared two types of protein-protein interaction dataset: a complete dataset and high confidence dataset. The complete dataset (10,325 protein dimer models) contains all the yeast protein heterodimers whose complex structures can be modeled. Among them, pairs registered in the DIP database are defined as interacting pairs, and those not registered are defined as non-interacting protein pairs. The high confidence dataset (3,219 protein dimer models) is a more reliable subset of the complete dataset extracted using the criteria of the common subcellular localization. 
Both datasets show that sequence similarity has a much higher discrimination power than the other structure-based scores, but that the inclusion of contact energy results in significant improvement over predictions using sequence similarity alone. These results suggest that the sequence similarity is indispensable for the prediction, whereas structure scores can play supporting roles.
Affiliation(s)
- Naoshi Fukuhara
- Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Nobuhiro Go
- Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan; Neutron Biology Research Center, Quantum Beam Science Directorate, Japan Atomic Energy Agency, 8-1 Umemidai, Kizu, Soraku, Kyoto, 619-0215, Japan
- Takeshi Kawabata
- Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan; CREST, JST
|
44
|
Anbarasu A, Sethumadhavan R. Exploring the role of cation–π interactions in glycoproteins lipid-binding proteins and RNA-binding proteins. J Theor Biol 2007; 247:346-53. [PMID: 17451749 DOI: 10.1016/j.jtbi.2007.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/07/2006] [Revised: 01/30/2007] [Accepted: 02/27/2007] [Indexed: 11/28/2022]
Abstract
We have analyzed and compared the influence of cation-pi interactions in glycoproteins (GPs), lipid-binding proteins (LBPs) and RNA-binding proteins (RBPs) in this study. We observed that all the proteins included in the study had profound cation-pi interactions. There is an average of one energetically significant cation-pi interaction for every 71 residues in GPs, for every 58 residues in LBPs and for every 64 residues in RBPs. Long-range contacts are predominant in all the three types of proteins studied. The pair-wise cation-pi interaction energy between the positively charged and aromatic residues shows that Arg-Trp pair energy was the strongest among all six possible pairs in all the three types of proteins studied. There were considerable differences in the preference of cation-pi interacting residues to different secondary structure elements and ASA and these might contribute to differences in biochemical functions of GPs, LBPs and RBPs. It was interesting to note that all the five residues involved in cation-pi interactions were found to have stabilization centers in GPs, LBPs and RBPs. The majority of the cation-pi interacting residues investigated in the present study had a conservation score of 6, the cutoff value used to identify the stabilizing residues. A small percentage of cation-pi interacting residues were also present as stabilizing residues. The cation-pi interaction-forming residues play an important role in the structural stability of GPs, LBPs and RBPs. The results obtained in this study will be helpful in further understanding the stability, specificity and differences in the biochemical functions of GPs, LBPs and RBPs.
Affiliation(s)
- Anand Anbarasu
- School of Bio-Technology Chemical and Bio-Medical Engineering, VIT University, Vellore 632014, India
|
45
|
Abstract
Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.
Affiliation(s)
- András Szilágyi
- Center of Excellence in Bioinformatics, University at Buffalo, State University of New York, 901 Washington St, Buffalo, NY 14203, USA
|
46
|
Sun J, Zhao Z. Construction of phylogenetic profiles based on the genetic distance of hundreds of genomes. Biochem Biophys Res Commun 2007; 355:849-53. [PMID: 17320815 DOI: 10.1016/j.bbrc.2007.02.048] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/04/2007] [Accepted: 02/12/2007] [Indexed: 11/28/2022]
Abstract
Phylogenetic profiles have been widely applied in functional genomics research, especially in the prediction of protein-protein interactions (PPIs). A key issue in phylogenetic profiling is how to effectively select reference organisms from the available hundreds of genomes. In this study, we performed an assessment of reference organism selection based on the genetic distance between the target organism and 167 reference organisms. We found that inclusion of reference organisms from all distance levels had better performance in the prediction of PPIs than that at each distance level. The PPI prediction reached an optimal level when 70% of the reference organisms at all distance levels were selected; and this performance was similar to that in the optimal condition based on the taxonomy tree in our previous study. Because measurement of genetic distance is direct and simple compared to the topology of the taxonomy tree, we suggest selecting reference organisms based on genetic distance in the construction of phylogenetic profiles.
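For readers unfamiliar with phylogenetic profiling, a minimal sketch may help. A profile is commonly represented as a presence/absence vector recording in which reference genomes a protein has a homolog, and proteins with similar profiles are treated as candidate interaction partners. The Jaccard similarity used below is an assumption chosen for illustration; the abstract does not state which similarity measure the authors used, and all genome and protein names are hypothetical.

```python
def build_profile(genomes_with_homolog, reference_genomes):
    """Return a 0/1 tuple: 1 where the protein has a homolog in that genome."""
    return tuple(1 if g in genomes_with_homolog else 0 for g in reference_genomes)


def jaccard(profile_a, profile_b):
    """Shared presences divided by genomes where either protein is present."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical reference set; in practice it would be chosen with care,
# for example by sampling genomes across genetic distance levels.
references = ["genome1", "genome2", "genome3", "genome4"]

protein_a = build_profile({"genome1", "genome2", "genome4"}, references)
protein_b = build_profile({"genome1", "genome2"}, references)
protein_c = build_profile({"genome3"}, references)

sim_ab = jaccard(protein_a, protein_b)  # overlapping profiles
sim_ac = jaccard(protein_a, protein_c)  # disjoint profiles
```

The choice of `references` determines how informative each vector position is, which is why reference organism selection is the key issue this abstract addresses.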
Affiliation(s)
- Jingchun Sun
- Bioinformatics Laboratory, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA
|
47
|
Bi R, Zhou Y, Lu F, Wang W. Predicting Gene Ontology functions based on support vector machines and statistical significance estimation. Neurocomputing 2007. [DOI: 10.1016/j.neucom.2006.10.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
|
48
|
Sun J, Li Y, Zhao Z. Phylogenetic profiles for the prediction of protein-protein interactions: how to select reference organisms? Biochem Biophys Res Commun 2006; 353:985-91. [PMID: 17207465 DOI: 10.1016/j.bbrc.2006.12.146] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Received: 12/05/2006] [Accepted: 12/18/2006] [Indexed: 10/23/2022]
Abstract
The phylogenetic profile method has been widely applied in the prediction of protein-protein interactions (PPIs). Studies often use all of the available complete genomes for this method. With more than 400 genomes complete and new ones on the horizon, it remains unclear how to select reference organisms for profile construction and how that selection influences PPI prediction. Here, we performed a systematic assessment of reference organism selection from 225 complete genomes using their evolutionary tree. Our results suggest that reference organisms should be selected from moderately and highly genetically distant organisms, from all three domains (Bacteria, Archaea, and Eukarya), and so that they are evenly distributed at the fifth hierarchical level of the evolutionary tree. Our study provides important guidance on the construction of phylogenetic profiles for PPI prediction and functional genomics, which has become challenging due to the large and increasing number of available candidate organisms.
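For readers unfamiliar with the phylogenetic profile method, the sketch below shows the general idea under simple assumptions: each protein is represented by a presence/absence vector across reference organisms, and protein pairs with similar vectors are proposed as candidate interactions. The Jaccard score, the 0.8 threshold, and the toy profiles are illustrative choices, not details taken from the paper.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two 0/1 presence vectors of equal length."""
    both = sum(a and b for a, b in zip(profile_a, profile_b))
    either = sum(a or b for a, b in zip(profile_a, profile_b))
    return both / either if either else 0.0


def predict_pairs(profiles, threshold=0.8):
    """Return (protein, protein, score) for pairs whose profiles are similar."""
    pairs = []
    for (p, prof_p), (q, prof_q) in combinations(sorted(profiles.items()), 2):
        score = jaccard(prof_p, prof_q)
        if score >= threshold:
            pairs.append((p, q, score))
    return pairs


# Toy profiles: one column per reference organism (1 = homolog present).
profiles = {
    "protA": [1, 1, 0, 1, 0, 1],
    "protB": [1, 1, 0, 1, 0, 1],
    "protC": [0, 0, 1, 0, 1, 0],
}
print(predict_pairs(profiles))
```

The choice of reference organisms determines the columns of these vectors, which is why the selection question raised in the abstract affects the predictions.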
Affiliation(s)
- Jingchun Sun
- Bioinformatics Laboratory, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA
|
49
|
Morozova N, Allers J, Myers J, Shamoo Y. Protein-RNA interactions: exploring binding patterns with a three-dimensional superposition analysis of high resolution structures. Bioinformatics 2006; 22:2746-52. [PMID: 16966360 DOI: 10.1093/bioinformatics/btl470] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The recognition of specific RNA sequences and structures by proteins is critical to our understanding of RNA processing, gene expression and viral replication. The diversity of RNA structures suggests that RNA recognition is substantially different than that of DNA. RESULTS: The atomic coordinates of 41 protein-RNA complexes have been used to probe composite nucleoside binding pockets that form the structural and chemical underpinnings of base recognition. Composite nucleoside binding pockets were constructed using three-dimensional superpositions of each RNA nucleoside. Unlike protein-DNA interactions which are dominated by accessibility, RNA recognition frequently occurs in non-canonical and single-strand-like structures that allow interactions to occur from a much wider set of geometries and make fuller use of unique base shapes and hydrogen-bonding ability. By constructing composites that include all van der Waals, hydrogen-bonding, stacking and general non-polar interactions made to a particular nucleoside, the strategies employed are made readily visible. Protein-RNA interactions can result in the formation of a glove-like tight binding pocket around RNA bases, but the size, shape and non-polar binding patterns differ between specific RNA bases. We show that adenine can be distinguished from guanine based on the size and shape of the binding pocket and steric exclusion of the guanine N2 exocyclic amino group. The unique shape and hydrogen-bonding pattern for each RNA base allow proteins to make specific interactions through a very small number of contacts, as few as two in some cases. AVAILABILITY: The program ENTANGLE is available from http://www.bioc.rice.edu/~shamoo
Affiliation(s)
- N Morozova
- Department of Biochemistry and Cell Biology, Rice University 6100 Main Street, Houston, TX 77005, USA
|
50
|
Morrison JL, Breitling R, Higham DJ, Gilbert DR. A lock-and-key model for protein-protein interactions. Bioinformatics 2006; 22:2012-9. [PMID: 16787977 DOI: 10.1093/bioinformatics/btl338] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Protein-protein interaction networks are one of the major post-genomic data sources available to molecular biologists. They provide a comprehensive view of the global interaction structure of an organism's proteome, as well as detailed information on specific interactions. Here we suggest a physical model of protein interactions that can be used to extract additional information at an intermediate level: it enables us to identify proteins which share biological interaction motifs, and also to identify potentially missing or spurious interactions. RESULTS: Our new graph model explains observed interactions between proteins by an underlying interaction of complementary binding domains (lock-and-key model). This leads to a novel graph-theoretical algorithm to identify bipartite subgraphs within protein-protein interaction networks where the underlying data are taken from yeast two-hybrid experimental results. By testing on synthetic data, we demonstrate that under certain modelling assumptions, the algorithm will return correct domain information about each protein in the network. Tests on data from various model organisms show that the local and global patterns predicted by the model are indeed found in experimental data. Using functional and protein structure annotations, we show that bipartite subnetworks can be identified that correspond to biologically relevant interaction motifs. Some of these are novel and we discuss an example involving SH3 domains from the Saccharomyces cerevisiae interactome. AVAILABILITY: The algorithm (in Matlab format) is available (see http://www.maths.strath.ac.uk/~aas96106/lock_key.html).
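To make the lock-and-key idea concrete, here is a minimal sketch that runs in the forward direction only: given a hypothetical assignment of "lock" and "key" domains to proteins, it lists the interactions the model would imply and compares them with an observed edge list. It is not the authors' algorithm, which infers the bipartite structure from the network itself; the names `explained_edges` and `compare`, the domain label `d1`, and the toy data are all assumptions made for illustration.

```python
def explained_edges(locks, keys):
    """Edges implied by the model: p--q whenever p carries a key whose
    matching lock is carried by q. Self-interactions are ignored here."""
    edges = set()
    for p, p_keys in keys.items():
        for q, q_locks in locks.items():
            if p != q and p_keys & q_locks:
                edges.add(frozenset((p, q)))
    return edges


def compare(observed, locks, keys):
    """Return (candidate missing edges, potentially spurious edges)."""
    predicted = explained_edges(locks, keys)
    seen = {frozenset(edge) for edge in observed}
    return predicted - seen, seen - predicted


# Hypothetical assignment: P1 and P2 carry key d1, P3 and P4 carry lock d1,
# so the model implies a complete bipartite subgraph between the two groups.
keys = {"P1": {"d1"}, "P2": {"d1"}, "P3": set(), "P4": set()}
locks = {"P1": set(), "P2": set(), "P3": {"d1"}, "P4": {"d1"}}
observed = [("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P3", "P4")]

missing, spurious = compare(observed, locks, keys)
print("candidate missing:", [sorted(e) for e in missing])
print("potentially spurious:", [sorted(e) for e in spurious])
```

In this toy case the edge between P2 and P4 is implied but unobserved, and the edge between P3 and P4 is observed but unexplained by the assignment, which mirrors the "missing or spurious interactions" use described in the abstract.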
Affiliation(s)
- Julie L Morrison
- Bioinformatics Research Centre, Department of Computing Science, University of Glasgow G12 8QQ, UK.
|