151
Hangasky JA, Taabazuing CY, Valliere MA, Knapp MJ. Imposing function down a (cupin)-barrel: secondary structure and metal stereochemistry in the αKG-dependent oxygenases. Metallomics 2013; 5:287-301. [PMID: 23446356 PMCID: PMC4109655 DOI: 10.1039/c3mt20153h] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]
Abstract
The Fe(II)/α-ketoglutarate (αKG)-dependent oxygenases catalyze a diverse range of reactions significant in biological processes such as antibiotic biosynthesis, lipid metabolism, oxygen sensing, and DNA and RNA repair. Although these enzymes are functionally diverse, the eight-stranded β-barrel (cupin) fold and the HX(D/E)XnH facial triad motif are conserved across this superfamily. Crystal structure analysis of 25 αKG oxygenases reveals two stereoisomers of the Fe cofactor, Anti and Clock, which differ in the position of the exchangeable ligand site relative to the primary substrate. Herein, we discuss the relationship between the chemical mechanism and the secondary coordination sphere of the αKG oxygenases, within the constraints imposed by the stereochemistry of the Fe cofactor. Sequence analysis of the cupin barrel indicates that a small subset of positions constitutes the second coordination sphere, which has significant ramifications for the structure of the ferryl intermediate. The competence of both the Anti and Clock stereoisomers of Fe points to a five-coordinate ferryl intermediate. The small number of conserved close contacts within the active sites of αKG oxygenases can be extended to chemically related enzymes, such as the αKG-dependent halogenases SyrB2 and CytC3, and the non-αKG-dependent dioxygenases isopenicillin N synthase (IPNS) and cysteine dioxygenase (CDO).
Affiliation(s)
- John A. Hangasky
- Department of Chemistry, University of Massachusetts, Amherst, MA 01003, USA
- Meaghan A. Valliere
- Department of Chemistry, University of Massachusetts, Amherst, MA 01003, USA
- Michael J. Knapp
- Department of Chemistry, University of Massachusetts, Amherst, MA 01003, USA
152
Thomas JC, O'Hara JM, Hu L, Gao FP, Joshi SB, Volkin DB, Brey RN, Fang J, Karanicolas J, Mantis NJ, Middaugh CR. Effect of single-point mutations on the stability and immunogenicity of a recombinant ricin A chain subunit vaccine antigen. Hum Vaccin Immunother 2013; 9:744-52. [PMID: 23563512 DOI: 10.4161/hv.22998] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/21/2022]
Abstract
There is great interest in the design and development of highly thermostable and immunogenic protein subunit vaccines for biodefense. In this study, we used two orthogonal and complementary computational protein design approaches to generate a series of single-point mutants of RiVax, an attenuated recombinant ricin A chain (RTA) protein subunit vaccine antigen. As assessed by differential scanning calorimetry, the conformational stabilities of the designed mutants ranged from 4°C less stable to 4.5°C more stable than RiVax, depending on solution pH. Two more thermostable (V18P, C171L) and two less thermostable (T13V, S89T) mutants that displayed native-like secondary and tertiary structures (as determined by circular dichroism and fluorescence spectral analysis, respectively) were tested for their capacity to elicit RTA-specific antibodies and toxin-neutralizing activity. Following a prime-boost regimen, we found qualitative differences with respect to specific antibody titers and toxin neutralizing antibody levels induced by the different mutants. Upon a second boost with the more thermostable mutant C171L, a statistically significant increase in RTA-specific antibody titers was observed when compared with RiVax-immunized mice. Notably, the results indicate that single residue changes can be made to the RiVax antigen that increase its thermal stability without adversely impacting the efficacy of the vaccine.
Affiliation(s)
- Justin C Thomas
- Macromolecule and Vaccine Stabilization Center; Department of Pharmaceutical Chemistry; University of Kansas; Lawrence, KS USA
153
Description of local and global shape properties of protein helices. J Mol Model 2013; 19:2901-11. [PMID: 23529181 DOI: 10.1007/s00894-013-1819-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/15/2012] [Accepted: 03/05/2013] [Indexed: 10/27/2022]
Abstract
A new method, dubbed "HAXIS", is introduced to describe local and global shape properties of a protein helix via its axis. HAXIS is based on coarse-graining and spline-fitting of the helix backbone. At each Cα anchor point of the backbone, a Frenet frame is calculated, which directly provides the local vector representation of the helix. After cubic spline-fitting of the axis line, its curvature and torsion are calculated. This makes it possible to compare different helix forms rapidly and to determine helix similarity. Distortions of the helix caused by individual residues are projected onto the helix axis and presented either by the rise parameter per residue or by the local curvature of the axis. From a non-redundant set of 2,017 proteins, 15,068 helices were investigated in this way. Helix start and helix end, as well as bending and kinking of the helix, are accurately described. The global properties of the helix are assessed by a polynomial fit of the helix axis and the determination of its overall curving and twisting. Long helices are more regularly shaped and linear, whereas short helices are often strongly bent and twisted. The distribution of different helix forms as a function of helix length is analyzed.
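As a hedged illustration of the curvature and torsion quantities mentioned above (not the HAXIS implementation, which fits cubic splines to a coarse-grained axis), the textbook Frenet formulas κ = |r′ × r″| / |r′|³ and τ = (r′ × r″) · r‴ / |r′ × r″|² can be approximated with central finite differences on a curve sampled at a uniform parameter step h. The function name `curvature_torsion` and the uniform-sampling assumption are ours; in practice the input would be points along a fitted helix axis rather than raw Cα coordinates.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _comb(coeffs: Sequence[float], pts: Sequence[Vec], scale: float) -> Vec:
    # Weighted sum of points divided by scale, i.e. one finite-difference stencil.
    return tuple(sum(c * p[k] for c, p in zip(coeffs, pts)) / scale for k in range(3))


def curvature_torsion(points: Sequence[Vec], h: float) -> List[Tuple[float, float]]:
    """Approximate (curvature, torsion) at interior samples of a curve sampled with step h.

    Hypothetical helper: assumes uniform parameter spacing and a non-degenerate curve.
    """
    out = []
    for i in range(2, len(points) - 2):
        w = points[i - 2:i + 3]                        # window p[i-2] .. p[i+2]
        d1 = _comb([0, -1, 0, 1, 0], w, 2 * h)         # central difference for r'
        d2 = _comb([0, 1, -2, 1, 0], w, h * h)         # central difference for r''
        d3 = _comb([-1, 2, 0, -2, 1], w, 2 * h ** 3)   # central difference for r'''
        c = _cross(d1, d2)
        c2 = _dot(c, c)
        kappa = math.sqrt(c2) / math.sqrt(_dot(d1, d1)) ** 3
        tau = _dot(c, d3) / c2 if c2 > 0 else 0.0      # torsion undefined on straight segments
        out.append((kappa, tau))
    return out
```

On an ideal helix (cos t, sin t, 0.5 t), every interior sample should return values close to the analytic κ = 0.8 and τ = 0.4, which is a convenient sanity check before applying the sketch to a fitted axis.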
154
Implementation of a parallel protein structure alignment service on cloud. Int J Genomics 2013; 2013:439681. [PMID: 23671842 PMCID: PMC3647543 DOI: 10.1155/2013/439681] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/25/2013] [Accepted: 02/20/2013] [Indexed: 12/20/2022]
Abstract
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
155
Ashby C, Johnson D, Walker K, Kanj IA, Xia G, Huang X. New enumeration algorithm for protein structure comparison and classification. BMC Genomics 2013; 14 Suppl 2:S1. [PMID: 23445440 PMCID: PMC3582452 DOI: 10.1186/1471-2164-14-s2-s1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein structure comparison and classification are effective methods for exploring protein structure-function relations. This problem is computationally challenging. Many different computational approaches for protein structure comparison represent protein structures in terms of their secondary structure elements (SSEs). RESULTS We study the complexity of the protein structure comparison problem based on a mixed-graph model with respect to different computational frameworks. We develop an effective approach for protein structure comparison based on a novel independent set enumeration algorithm. Our approach (named ePC, for efficient enumeration-based Protein structure Comparison) is tested for general-purpose protein structure comparison as well as on specific protein examples. Compared with other graph-based approaches for protein structure comparison, the theoretical running time $O(1.47^{rn} n^2)$ of our approach ePC is significantly better, where n is the smaller number of SSEs of the two proteins and r is a parameter of small value. CONCLUSION Through the enumeration algorithm, our approach can identify different substructures of biological interest from a list of high-scoring solutions. Our approach is flexible enough to compare protein structures with the SSEs in either sequential or non-sequential order. Supplementary data of additional testing and the source of ePC will be available at http://bioinformatics.astate.edu/.
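The abstract frames comparison as enumerating independent sets in a graph built over SSEs. As a generic, hedged illustration only (not the ePC algorithm, and without its stated running-time bound), the sketch below enumerates all maximal independent sets of a small conflict graph with a Bron–Kerbosch-style recursion run on the complement graph. The framing in which vertices are candidate SSE pairings and edges mark pairings that cannot coexist in one alignment is our assumption for illustration, as is the function name.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def maximal_independent_sets(adj: Dict[Hashable, Set[Hashable]]) -> List[FrozenSet[Hashable]]:
    """Enumerate all maximal independent sets of an undirected graph.

    adj maps each vertex to the (symmetric) set of vertices it conflicts with.
    This is Bron-Kerbosch without pivoting, applied to the complement graph.
    """
    results: List[FrozenSet[Hashable]] = []

    def expand(chosen: FrozenSet, candidates: Set, excluded: Set) -> None:
        if not candidates and not excluded:
            results.append(chosen)  # nothing more can be added: chosen is maximal
            return
        for v in list(candidates):
            closed = adj[v] | {v}  # v together with everything that conflicts with it
            expand(chosen | {v}, candidates - closed, excluded - closed)
            candidates = candidates - {v}  # sets containing v are now fully explored
            excluded = excluded | {v}

    expand(frozenset(), set(adj), set())
    return results
```

For the path graph 0–1–2, the two maximal independent sets are {0, 2} and {1}; a real comparison method would additionally score each set and keep only the high-scoring ones.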
Affiliation(s)
- Cody Ashby
- Molecular Bioscience Graduate Program, Arkansas State University, Arkansas, USA
156
Abstract
MOTIVATION To recognize remote relationships between RNA molecules, one must be able to align structures without regard to sequence similarity. We have implemented a method that is swift ($O(n^2)$), sensitive and tolerant of large gaps and insertions. Molecules are broken into overlapping fragments, which are characterized by their memberships in a probabilistic classification based on local geometry and H-bonding descriptors. This leads to a probabilistic similarity measure that is used in a conventional dynamic programming method. RESULTS Examples are given of database searching, of the detection of structural similarities that would not be found using sequence-based methods, and of comparisons with a previously published approach. AVAILABILITY AND IMPLEMENTATION Source code (C and Perl) and binaries for Linux are freely available at www.zbh.uni-hamburg.de/fries.
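The step in which a fragment similarity measure feeds "a conventional dynamic programming method" can be sketched as a Needleman–Wunsch-style global alignment over a precomputed similarity matrix. This is a hedged sketch, assuming the per-fragment similarities are already available as `sim[i][j]` and using a hypothetical linear gap penalty; it is not the authors' C/Perl implementation, whose scoring and gap handling may differ. The double loop fills an (n + 1) × (m + 1) table, which is the source of the quadratic running time.

```python
from typing import List, Sequence, Tuple


def global_align(sim: Sequence[Sequence[float]],
                 gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment over a precomputed fragment similarity matrix.

    sim[i][j] is the (assumed) similarity of fragment i of molecule A to fragment j of B.
    gap is a hypothetical linear penalty per unaligned fragment (should be <= 0).
    Returns the optimal score and the aligned (i, j) fragment pairs.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    # score[i][j]: best score aligning the first i fragments of A with the first j of B
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sim[i - 1][j - 1],  # align i-1 with j-1
                              score[i - 1][j] + gap,                   # skip a fragment of A
                              score[i][j - 1] + gap)                   # skip a fragment of B
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

With `sim = [[2, -1, -1], [-1, -1, 2]]`, the best alignment pairs fragment 0 with 0 and fragment 1 with 2, skipping the middle fragment of the second molecule at the cost of one gap.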
Affiliation(s)
- Tim Wiegels
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, D-20146 Hamburg, Germany.
157
Amela I, Delicado P, Gómez A, Querol E, Cedano J. A dynamic model of the proteins that form the initial iron-sulfur cluster biogenesis machinery in yeast mitochondria. Protein J 2013; 32:183-96. [PMID: 23463383 DOI: 10.1007/s10930-013-9475-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
The assembly of iron-sulfur clusters (ISCs) in eukaryotes involves the protein Frataxin. Deficits in this protein have been associated with iron accumulation inside the mitochondria and with impaired ISC biogenesis, as Frataxin is postulated to act as the iron donor for ISC assembly in this organelle. A pronounced lack of Frataxin causes Friedreich's Ataxia, a hereditary human neurodegenerative disease that mainly affects balance, coordination, muscles and the heart; it is also the most common autosomal recessive ataxia. High similarities between the human and yeast molecular mechanisms involving Frataxin have been suggested, making yeast a good model for studying this process. In yeast, the protein complex that forms the central assembly platform for the initial step of ISC biogenesis is composed of the yeast frataxin homolog, Nfs1-Isd11 and Isu. It is commonly accepted that protein function involves interaction with other protein partners, but in this case not enough is known about the structure of the protein complex and, therefore, about exactly how it functions. The objective of this work is to model the protein complex in order to gain insight into the structural details that underlie its biological function. To achieve this goal, several bioinformatics tools, modeling techniques and protein docking programs have been used. As a result, the structure of the protein complex and the dynamic behavior of its components, along with that of the iron and sulfur atoms required for ISC assembly, have been modeled. This structural hypothesis will help to better understand the function and molecular properties of Frataxin as well as those of its ISC assembly protein partners.
Affiliation(s)
- I Amela
- Departament de Bioquímica i Biologia Molecular, Institut de Biotecnologia i de Biomedicina, Parc de Recerca Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Catalonia, Spain
158
von Behren MM, Volkamer A, Henzler AM, Schomburg KT, Urbaczek S, Rarey M. Fast protein binding site comparison via an index-based screening technology. J Chem Inf Model 2013; 53:411-22. [PMID: 23390978 DOI: 10.1021/ci300469h] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/25/2023]
Abstract
We present TrixP, a new index-based method for fast protein binding site comparison and function prediction. TrixP determines binding site similarities by comparing descriptors that encode pharmacophoric and spatial features. To this end, it adopts the efficient core components of TrixX, a structure-based virtual screening technology for large compound libraries, and extends this technology with new components that allow protein libraries to be screened. TrixP accounts for the inherent flexibility of proteins by employing a partial shape matching routine. After identifying structures with matching pharmacophoric features and geometric shape, TrixP superimposes the binding sites and, finally, assesses their similarity according to the fit of their pharmacophoric properties. TrixP is able to find analogies between both closely and distantly related binding sites. Recovery rates of 81.8% for similar binding site pairs, together with rejection rates of 99.5% for dissimilar pairs, on a test data set containing 1331 pairs confirm this ability. When screening a library of 9802 binding sites, TrixP places only members of the same protein family at the top-ranking positions. Furthermore, 30 predicted kinase binding sites can be classified almost perfectly into their known subfamilies.
Affiliation(s)
- Mathias M von Behren
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
159
Torshin IY, Esipova NG, Tumanyan VG. Alternatingly twisted β-hairpins and nonglycine residues in the disallowed II′ region of the Ramachandran plot. J Biomol Struct Dyn 2013; 32:198-208. [DOI: 10.1080/07391102.2012.759451] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
160
Awwad K, Desai A, Smith C, Sommerhalter M. Structural and functional characterization of a noncanonical nucleoside triphosphate pyrophosphatase from Thermotoga maritima. Acta Crystallogr D Biol Crystallogr 2013; 69:184-93. [PMID: 23385455 PMCID: PMC3565439 DOI: 10.1107/s0907444912044630] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/26/2012] [Accepted: 10/29/2012] [Indexed: 11/11/2022]
Abstract
The hyperthermophilic bacterium Thermotoga maritima has a noncanonical nucleoside triphosphatase that catalyzes the conversion of inosine triphosphate (ITP), deoxyinosine triphosphate (dITP) and xanthosine triphosphate (XTP) into inosine monophosphate (IMP), deoxyinosine monophosphate (dIMP) and xanthosine monophosphate (XMP), respectively. The kcat/Km values determined at 323 and 353 K fall between 1.31 × 10⁴ and 7.80 × 10⁴ M⁻¹ s⁻¹. ITP and dITP are slightly preferred over XTP. Activity towards canonical nucleoside triphosphates (ATP and GTP) was not detected. The enzyme has an absolute requirement for Mg²⁺ as a cofactor and prefers alkaline conditions. An X-ray structure of the enzyme with bound IMP was obtained at 2.15 Å resolution. The active site houses a well-conserved network of residues that are critical for substrate recognition and catalysis. The crystal structure shows a tetramer with two possible dimer interfaces. One of these interfaces strongly resembles the dimer interface found in the structures of other noncanonical nucleoside triphosphate pyrophosphatases from human (human ITPase) and archaea (Mj0226 and PhNTPase).
Affiliation(s)
- Khaldeyah Awwad
- Chemistry and Biochemistry, California State University East Bay, 25800 Carlos Bee Boulevard, Hayward, CA 94542, USA
- Anna Desai
- Chemistry and Biochemistry, California State University East Bay, 25800 Carlos Bee Boulevard, Hayward, CA 94542, USA
- Clyde Smith
- Stanford Synchrotron Radiation Lightsource, 2575 Sand Hill Road, Menlo Park, CA 94025, USA
- Monika Sommerhalter
- Chemistry and Biochemistry, California State University East Bay, 25800 Carlos Bee Boulevard, Hayward, CA 94542, USA
161
Devi PP, Adhikari S. Homology modeling and functional sites prediction of azoreductase enzyme from the cyanobacterium Nostoc sp. PCC7120. Interdiscip Sci 2013; 4:310-8. [DOI: 10.1007/s12539-012-0140-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/11/2012] [Revised: 05/02/2012] [Accepted: 07/30/2012] [Indexed: 10/27/2022]
162
Li SC. The difficulty of protein structure alignment under the RMSD. Algorithms Mol Biol 2013; 8:1. [PMID: 23286762 PMCID: PMC3599502 DOI: 10.1186/1748-7188-8-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/08/2011] [Accepted: 12/17/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein structure alignment is often modeled as the largest common point set (LCP) problem based on the Root Mean Square Deviation (RMSD), a measure commonly used to evaluate structural similarity. In this problem, each residue is represented by the coordinates of its Cα atom, and a structure is modeled as a sequence of 3D points. Given two such sequences, the task is to find two equal-sized subsequences of maximum length, together with a bijection between their points, such that the RMSD is within a given threshold. The problem is considered difficult in terms of time complexity, but the reasons for its difficulty are not well understood. Improving this time complexity is considered important in protein structure prediction and structural comparison, where very large numbers of structures commonly have to be compared. RESULTS To study why the LCP problem is difficult, we define a natural variant of the problem, called the minimum aligned distance (MAD). In the MAD problem, the length of the subsequences is specified in the input, and instead of satisfying a threshold, the RMSD between the points of the two subsequences is to be minimized. Our results show that the difficulty of the two problems does not lie solely in the combinatorial complexity of finding the optimal subsequences, or in the task of superimposing the structures. By placing a limit on the distance between consecutive points, and assuming that the points are specified as integral values, we show that both problems are equally difficult, in the sense that they are reducible to each other. In this case, both problems can be solved exactly in polynomial time, although the time complexity remains high. CONCLUSIONS We present insights and techniques that we hope will lead to practical algorithms for the LCP problem for protein structures.
The study identified two important factors in the problem's complexity: (1) the lack of a limit on the distance between consecutive points of a structure; (2) the arbitrary precision allowed in the input values. Both issues are of little practical concern for the purpose of protein structure alignment. When these factors are removed, the LCP problem is as hard as that of minimizing the RMSD (the MAD problem), and can be solved exactly in polynomial time.
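The quantity at the heart of both the LCP and MAD problems is the RMSD of a fixed bijection after optimal rigid superposition, RMSD = sqrt((1/n) Σᵢ ‖R pᵢ + t − qᵢ‖²), minimized over rotations R and translations t. A minimal sketch using the standard Kabsch (SVD) solution is shown below, assuming NumPy is available and that the two point lists are already paired one-to-one; it evaluates a single candidate correspondence, whereas the hard part of LCP is searching over which subsequences to pair. The function name `superposed_rmsd` is ours.

```python
import numpy as np


def superposed_rmsd(P, Q) -> float:
    """RMSD between paired 3D point sets after optimal rigid superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("P and Q must both be (n, 3) arrays with a one-to-one pairing")
    # The optimal translation superimposes the centroids, so centre both sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc  # rotated P minus Q, row by row
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a point set returns an RMSD of essentially zero, while stretching a two-point segment from length 1 to length 2 gives 0.5, since each endpoint remains 0.5 units away after centring.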
163
Hitaoka S, Shibata Y, Matoba H, Kawano A, Harada M, Rahman MM, Tsuji D, Hirokawa T, Itoh K, Yoshida T, Chuman H. Modeling of Human Neuraminidase-1 and Its Validation by LERE-Correlation Analysis. Chem-Bio Informatics Journal 2013. [DOI: 10.1273/cbij.13.30] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Affiliation(s)
- Seiji Hitaoka
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Yuto Shibata
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Hiroshi Matoba
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Akihiro Kawano
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Masataka Harada
- Institute of Health Biosciences, The University of Tokushima Graduate School
- M Motiur Rahman
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Daisuke Tsuji
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Takatsugu Hirokawa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST)
- Kohji Itoh
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Tatsusada Yoshida
- Institute of Health Biosciences, The University of Tokushima Graduate School
- Hiroshi Chuman
- Institute of Health Biosciences, The University of Tokushima Graduate School
164
Production of bulk chemicals via novel metabolic pathways in microorganisms. Biotechnol Adv 2012; 31:925-35. [PMID: 23280013 DOI: 10.1016/j.biotechadv.2012.12.008] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 10/04/2012] [Revised: 12/09/2012] [Accepted: 12/23/2012] [Indexed: 02/05/2023]
Abstract
Metabolic engineering has played an important role in developing high-performance microorganisms capable of producing various chemicals and materials sustainably from renewable biomass. Synthetic biology and systems biology are also contributing significantly to the creation of novel pathways and to the cell-wide optimization of metabolic performance, respectively. To expand the spectrum of chemicals that can be produced biotechnologically, it is necessary to broaden the metabolic capacities of microorganisms. Expanding the metabolic pathways for biosynthesizing target chemicals requires not only the enumeration of a series of known enzymes, but also the identification of biochemical gaps for which the corresponding enzymes might not exist in nature; this issue is the focus of this paper. First, pathway prediction tools, which combine reactions that lead to the production of a target chemical, are analyzed in terms of how they represent chemical information and how they design and rank the proposed metabolic pathways. Then, several approaches for filling the gaps in novel metabolic pathways are suggested, with relevant examples, including the use of promiscuous enzymes that flexibly utilize different substrates, the design of novel enzymes for non-natural reactions, and the exploration of hypothetical proteins. Finally, strain optimization by systems metabolic engineering in the context of the newly constructed metabolic pathways is briefly described. It is hoped that this review will provide logical ways of efficiently using 'big' biological data to design and develop novel metabolic pathways for the production of various bulk chemicals that are currently produced from fossil resources.
165
Volkamer A, Kuhn D, Rippmann F, Rarey M. Predicting enzymatic function from global binding site descriptors. Proteins 2012; 81:479-89. [DOI: 10.1002/prot.24205] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/25/2012] [Revised: 09/21/2012] [Accepted: 10/11/2012] [Indexed: 11/09/2022]
166
Abstract
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page.
Affiliation(s)
- NCBI Resource Coordinators
- *To whom correspondence should be addressed. Eric W. Sayers. Tel: +30 1 49 62 475; Fax: +30 1 48 09 241;
167
Structure of the type III secretion effector protein ExoU in complex with its chaperone SpcU. PLoS One 2012; 7:e49388. [PMID: 23166655 PMCID: PMC3498133 DOI: 10.1371/journal.pone.0049388] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 03/14/2012] [Accepted: 10/10/2012] [Indexed: 11/21/2022]
Abstract
Disease-causing bacteria often manipulate host cells in ways that facilitate infection. Many pathogenic Gram-negative bacteria accomplish this by using type III secretion systems. In these complex secretion pathways, bacterial chaperones direct effector proteins to a needle-like secretion apparatus, which then delivers the effector protein into the host cell cytosol. The effector protein ExoU and its chaperone SpcU are components of the Pseudomonas aeruginosa type III secretion system. Secretion of ExoU has been associated with more severe infections in both humans and animal models. Here we describe the 1.92 Å X-ray structure of the ExoU–SpcU complex, a full-length type III effector in complex with its full-length cognate chaperone. Our crystallographic data allow a better understanding of the mechanism by which ExoU kills host cells and provide a foundation for future studies aimed at designing inhibitors of this potent toxin.
168
Chen BY, Bandyopadhyay S. A regionalizable statistical model of intersecting regions in protein-ligand binding cavities. J Bioinform Comput Biol 2012; 10:1242004. [PMID: 22809380 DOI: 10.1142/s0219720012420048] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Finding elements of proteins that influence ligand binding specificity is an essential aspect of research in many fields. To assist in this effort, this paper presents two statistical models, based on the same theoretical foundation, for evaluating structural similarity among binding cavities. The first model specializes in the "unified" comparison of whole cavities, enabling the selection of cavities that are too dissimilar to have similar binding specificity. The second model enables a "regionalized" comparison of cavities within a user-defined region, enabling the selection of cavities that are too dissimilar to bind the same molecular fragments in the given region. We applied these models to analyze the ligand binding cavities of the serine protease and enolase superfamilies. We observed that our unified model correctly separated sets of cavities with identical binding preferences from other sets with varying binding preferences, and that our regionalized model correctly distinguished cavity regions that are too dissimilar to bind similar molecular fragments in the user-defined region. These observations point to applications of statistical modeling that can be used to examine and, more importantly, identify influential structural similarities within binding sites in order to better detect influences on protein-ligand binding specificity.
Affiliation(s)
- Brian Y Chen
- Department of Computer Science and Engineering, Lehigh University, 19 Memorial Drive West, Bethlehem, PA 18015, USA.
169
Structure and function of a unique pore-forming protein from a pathogenic acanthamoeba. Nat Chem Biol 2012; 9:37-42. [PMID: 23143413 DOI: 10.1038/nchembio.1116] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 03/14/2012] [Accepted: 10/15/2012] [Indexed: 11/08/2022]
Abstract
Human pathogens often produce soluble protein toxins that form pores in target membranes, resulting in the death of target cells and tissue damage. In pathogenic amoebae, this has been exemplified by the amoebapores of the enteric protozoan parasite Entamoeba histolytica. Here we characterize acanthaporin, to our knowledge the first pore-forming toxin to be described from acanthamoebae, which are free-living, bacteria-feeding, unicellular organisms that are opportunistic pathogens of increasing importance and cause severe and often fatal diseases. We isolated acanthaporin from extracts of virulent Acanthamoeba culbertsoni by tracking its pore-forming activity, molecularly cloned the gene of its precursor and recombinantly expressed the mature protein in bacteria. Acanthaporin was cytotoxic for human neuronal cells and exerted antimicrobial activity against a variety of bacterial strains by permeabilizing their membranes. The tertiary structures of acanthaporin's active monomeric form and inactive dimeric form, both solved by NMR spectroscopy, revealed a previously unknown protein fold and a pH-dependent trigger mechanism of activation.
170
Xu D. Protein databases on the internet. Curr Protoc Protein Sci 2012; Chapter 2:2.6.1-2.6.17. [PMID: 23151744 DOI: 10.1002/0471140864.ps0206s70] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Protein databases have become a crucial part of modern biology. Huge amounts of data for protein structures, functions, and particularly sequences are being generated. Searching databases is often the first step in the study of a new protein. Comparison between proteins or between protein families provides information about the relationship between proteins within a genome or across different species, and hence offers much more information than can be obtained by studying only an isolated protein. In addition, secondary databases derived from experimental databases are also widely available. These databases reorganize and annotate the data or provide predictions. The use of multiple databases often helps researchers understand the structure and function of a protein. Although some protein databases are widely known, they are far from being fully utilized in the protein science community. This unit provides a starting point for readers to explore the potential of protein databases on the Internet.
Affiliation(s)
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri
171
Feldman HJ. Identifying structural domains of proteins using clustering. BMC Bioinformatics 2012; 13:286. [PMID: 23116496 PMCID: PMC3534501 DOI: 10.1186/1471-2105-13-286] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 01/26/2012] [Accepted: 10/29/2012] [Indexed: 11/16/2022]
Abstract
Background Protein structures are composed of modular elements known as domains. These units are reused repeatedly in nature and usually serve some particular function in the structure. Thus it is useful to be able to break up a protein of interest into its component domains, for example prior to similarity searching. Numerous computational methods exist for doing so, but most operate only on a single protein chain and many are limited to making a series of cuts to the sequence, while domains can and do span multiple chains. Results This study presents a novel clustering-based approach to domain identification, which works equally well on individual chains or entire complexes. The method is simple and fast, taking only a few milliseconds to run, and works by clustering either vectors representing secondary structure elements, or buried alpha-carbon positions, using average-linkage clustering. Each resulting cluster corresponds to a domain of the structure. The method is competitive with others, achieving 70% agreement with SCOP on a large non-redundant data set, and 80% on a set more heavily weighted in multi-domain proteins on which both SCOP and CATH agree. Conclusions It is encouraging that a basic method such as this performs nearly as well as, or better than, some far more complex approaches. This suggests that protein domains are indeed for the most part simply compact regions of structure with a higher density of buried contacts within themselves than between one another. Representing the structure as a set of points or vectors in space frees the method from any artificial limitations that other approaches may depend upon.
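The abstract names average-linkage clustering as the core step. Below is a minimal sketch of plain average-linkage agglomerative clustering over 3D points. It assumes the input is a list of coordinates (for example buried alpha-carbon positions) and uses a hypothetical distance cutoff `threshold` as the stopping rule. The function name and the cutoff are illustrative: this is not the paper's own stopping criterion, and it does not implement its secondary-structure-vector representation.

```python
import math
from itertools import combinations

def average_linkage(points, threshold):
    """Agglomerative average-linkage clustering of 3D points (illustrative sketch).

    Repeatedly merges the two clusters with the smallest mean inter-point
    distance, stopping once that distance exceeds `threshold` (a hypothetical
    cutoff in the same units as the coordinates). Returns a sorted list of
    clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean of all pairwise distances between members of clusters a and b.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (ia, ib), best = min(
            (((x, y), avg_dist(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = sorted(clusters[ia] + clusters[ib])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted(clusters)

points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 0, 0), (21, 0, 0), (20, 1, 0)]
print(average_linkage(points, 5.0))
```

With two well-separated groups of points, a 5-unit cutoff leaves two clusters, while a very large cutoff merges everything into one. The quadratic search over cluster pairs keeps the sketch short; a practical implementation would cache inter-cluster distances.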
172
Ritchie DW, Ghoorah AW, Mavridis L, Venkatraman V. Fast protein structure alignment using Gaussian overlap scoring of backbone peptide fragment similarity. Bioinformatics 2012; 28:3274-81. [DOI: 10.1093/bioinformatics/bts618] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022]
173
Meux E, Prosper P, Masai E, Mulliert G, Dumarçay S, Morel M, Didierjean C, Gelhaye E, Favier F. Sphingobium sp. SYK-6 LigG involved in lignin degradation is structurally and biochemically related to the glutathione transferase ω class. FEBS Lett 2012; 586:3944-50. [PMID: 23058289 DOI: 10.1016/j.febslet.2012.09.036] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 07/10/2012] [Revised: 09/13/2012] [Accepted: 09/21/2012] [Indexed: 10/27/2022]
Abstract
SpLigG is one of the three glutathione transferases (GSTs) involved in the process of lignin breakdown in the soil bacterium Sphingobium sp. SYK-6. Sequence comparisons showed that SpLigG and several proteobacteria homologues form an independent cluster within cysteine-containing GSTs. The relationship between SpLigG and other GSTs was investigated. The X-ray structure and biochemical properties of SpLigG indicate that this enzyme belongs to the omega class of glutathione transferases. However, the hydrophilic substrate binding site of SpLigG, together with its known ability to stereoselectively deglutathionylate the physiological substrate α-glutathionyl-β-hydroxypropiovanillone, argues for broadening the definition of the omega class.
Affiliation(s)
- Edgar Meux
- Université de Lorraine, IAM, UMR 1136, IFR 110 EFABA, Vandoeuvre-les-Nancy, F-54506, France
174
Santini G, Soldano H, Pothier J. Automatic classification of protein structures relying on similarities between alignments. BMC Bioinformatics 2012; 13:233. [PMID: 22974051 PMCID: PMC3534633 DOI: 10.1186/1471-2105-13-233] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/21/2011] [Accepted: 08/20/2012] [Indexed: 11/10/2022]
Abstract
Background Identification of protein structural cores requires isolating sets of proteins that all share the same subset of structural motifs. With an ever-growing number of available 3D protein structures, standard, automatic clustering algorithms require adaptations to allow efficient identification of such sets of proteins. Results A pair of 3D structures is deemed similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented as a graph of similarities, in which a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Classifying proteins into structural families can therefore be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may place in the same cluster a subset of 3D structures that do not share a common substructure. To overcome this drawback, we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three protein structures involved do share some common substructure. We then propose a modification algorithm that eliminates edges from the original graph of similarities and yields a reduced graph in which no ternary constraints are violated. Our approach is thus to build a graph of similarities, reduce it with the modification algorithm, and finally apply a standard graph clustering algorithm to the reduced graph. This method was used to classify ASTRAL-40 non-redundant protein domains, with significant pairwise similarities identified by Yakusa, a program devised for rapid 3D structure alignment.
Conclusions We show that filtering similarities by applying ternary similarity constraints prior to standard graph-based clustering (i) improves the separation of proteins of different classes and consequently (ii) improves the classification quality of standard graph-based clustering algorithms according to the reference classification SCOP.
Affiliation(s)
- Guillaume Santini
- Université Paris 13, Sorbonne Paris Cité, Laboratoire d'Informatique de Paris-Nord (LIPN), CNRS (UMR 7030), Villetaneuse, F-93430, France.
175
Manjasetty BA, Yu XH, Panjikar S, Taguchi G, Chance MR, Liu CJ. Structural basis for modification of flavonol and naphthol glucoconjugates by Nicotiana tabacum malonyltransferase (NtMaT1). Planta 2012; 236:781-93. [PMID: 22610270 DOI: 10.1007/s00425-012-1660-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 02/28/2012] [Accepted: 04/23/2012] [Indexed: 06/01/2023]
Abstract
Plant HXXXD acyltransferase-catalyzed malonylation is an important modification reaction in elaborating the structural diversity of flavonoids and anthocyanins, and a universal adaptive mechanism to detoxify xenobiotics. Nicotiana tabacum malonyltransferase 1 (NtMaT1) is a member of the anthocyanin acyltransferase subfamily that uses malonyl-CoA (MLC) as the donor to catalyze transacylation of a range of flavonoid and naphthol glucosides. To gain insight into the molecular basis of its catalytic mechanism and versatile substrate specificity, we solved the X-ray crystal structure of NtMaT1 at 3.1 Å resolution. The structure comprises two α/β mixed subdomains, as typically found in the HXXXD acyltransferases. The partial electron density map of malonyl-CoA allowed us to reliably dock the entire molecule into the solvent channel and subsequently define the binding sites for both donor and acceptor substrates. MLC bound to NtMaT1 occupies one end of the long solvent channel between the two subdomains. Superimposing and comparing the structure of NtMaT1 with that of an enzyme of the anthocyanin acyltransferase subfamily from red chrysanthemum (Dm3Mat3) revealed large architectural variation in the binding sites, both for the acyl donor and for the acceptor, although their overall protein folds are structurally conserved. Consequently, the shape and the interactions of malonyl-CoA with the binding-site amino acid residues differ substantially. These major local architectural disparities point to the independent, divergent evolution of plant HXXXD acyltransferases in different species. The structural flexibility of the enzyme and the adaptable binding pattern of the substrates provide a basis for the evolution of the distinct, versatile substrate specificity of plant HXXXD acyltransferases.
Affiliation(s)
- Babu A Manjasetty
- European Molecular Biology Laboratory, Grenoble Outstation and Unit of Virus Host-Cell Interactions, UJF-EMBL-CNRS, UMI 3265, 6 Rue Jules Horowitz, 38042, Grenoble Cedex 9, France
176
Bonnel N, Marteau PF. LNA: fast protein structural comparison using a Laplacian characterization of tertiary structure. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1451-1458. [PMID: 22547433 DOI: 10.1109/tcbb.2012.64] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 05/31/2023]
Abstract
In the last two decades, many protein 3D shapes have been discovered, characterized, and made available through the Protein Data Bank (PDB), which continues to grow very quickly. New scalable methods are thus urgently required to search the PDB efficiently. This paper presents an approach entitled LNA (Laplacian Norm Alignment) that performs a structural comparison of two proteins with dynamic programming algorithms. This is achieved by characterizing each residue in the protein with scalar features. The feature values are calculated using a Laplacian operator applied to the graph corresponding to the adjacency matrix of the residues. The weighted Laplacian operator we use estimates, at various scales, local deformations of the topology where each residue is located. On some benchmarks widely shared by the community, we obtain results qualitatively similar to those of competing approaches, but with an algorithm one or two orders of magnitude faster. 180,000 protein comparisons can be done within 1 second on a single recent graphics processing unit (GPU), which makes our algorithm very scalable and suitable for real-time database querying across the web.
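The abstract describes a weighted Laplacian that characterizes each residue at several scales, but it does not give the weighting. One possible reading is sketched below under that assumption. It applies a Gaussian-weighted graph Laplacian to the C-alpha coordinates and takes the norm of the result per residue. The kernel, the scales in `sigmas`, and the function names are hypothetical, and the dynamic-programming comparison step is not shown.

```python
import math

def laplacian_norms(coords, sigma):
    """Per-residue norm of a Gaussian-weighted graph Laplacian (hypothetical weighting).

    coords: list of (x, y, z) C-alpha positions.
    sigma:  length scale controlling how far the neighbourhood reaches.
    For residue i this returns |sum_j w_ij (x_i - x_j)| / sum_j w_ij,
    with w_ij = exp(-d_ij^2 / (2 sigma^2)).
    """
    norms = []
    for i, xi in enumerate(coords):
        acc = [0.0, 0.0, 0.0]
        wsum = 0.0
        for j, xj in enumerate(coords):
            if i == j:
                continue
            w = math.exp(-math.dist(xi, xj) ** 2 / (2 * sigma ** 2))
            wsum += w
            for k in range(3):
                acc[k] += w * (xi[k] - xj[k])
        # Guard against all weights underflowing to zero for isolated residues.
        norms.append(math.hypot(*acc) / wsum if wsum > 0 else 0.0)
    return norms

def multiscale_features(coords, sigmas=(4.0, 8.0, 16.0)):
    """One feature vector per residue: Laplacian norms at several assumed scales."""
    per_scale = [laplacian_norms(coords, s) for s in sigmas]
    return [list(vals) for vals in zip(*per_scale)]
```

For a residue in the middle of a straight, evenly spaced segment, the neighbour contributions cancel and the feature is zero. Residues at chain ends or bends get nonzero values, which is the kind of local deformation such a feature is meant to pick up.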
Affiliation(s)
- Nicolas Bonnel
- IRISA, Université de Bretagne Sud, Campus de Tohannic, Vannes 56000, France.
177
Burrell M, Hanfrey CC, Kinch LN, Elliott KA, Michael AJ. Evolution of a novel lysine decarboxylase in siderophore biosynthesis. Mol Microbiol 2012; 86:485-99. [PMID: 22906379 DOI: 10.1111/j.1365-2958.2012.08208.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Accepted: 08/07/2012] [Indexed: 12/30/2022]
Abstract
Structural backbones of iron-scavenging siderophore molecules include the polyamines 1,3-diaminopropane and 1,5-diaminopentane (cadaverine). For the cadaverine-based desferrioxamine E siderophore in Streptomyces coelicolor, the corresponding biosynthetic gene cluster contains an ORF, desA, that was suspected of producing the cadaverine (decarboxylated lysine) backbone. However, desA encodes an l-2,4-diaminobutyrate decarboxylase (DABA DC) homologue and not any known form of lysine decarboxylase (LDC). The only known function of DABA DC is, together with l-2,4-diaminobutyrate aminotransferase (DABA AT), to synthesize 1,3-diaminopropane. We show here that S. coelicolor desA encodes a novel LDC, and we hypothesized that DABA DC homologues present in siderophore biosynthetic clusters in the absence of DABA AT ORFs would be novel LDCs. We confirmed this by correctly predicting the LDC activity of a DABA DC homologue from a Yersinia pestis siderophore biosynthetic pathway. The corollary was confirmed for a DABA DC homologue adjacent to a DABA AT ORF in a siderophore pathway in the cyanobacterium Anabaena variabilis, which was shown to be a bona fide DABA DC. These findings enable prediction of whether a siderophore pathway will utilize 1,3-diaminopropane or cadaverine, and suggest that the majority of bacteria use DABA AT and DABA DC for siderophore, rather than norspermidine/polyamine, biosynthesis.
178
Sousounis K, Haney CE, Cao J, Sunchu B, Tsonis PA. Conservation of the three-dimensional structure in non-homologous or unrelated proteins. Hum Genomics 2012; 6:10. [PMID: 23244440 PMCID: PMC3500211 DOI: 10.1186/1479-7364-6-10] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 05/10/2012] [Accepted: 05/14/2012] [Indexed: 12/12/2022]
Abstract
In this review, we examine examples of conservation of protein structural motifs in unrelated or non-homologous proteins. For this, we have selected three DNA-binding motifs: the histone fold, the helix-turn-helix motif, and the zinc finger, as well as the globin-like fold. We show that indeed similar structures exist in unrelated proteins, strengthening the concept that three-dimensional conservation might be more important than the primary amino acid sequence.
179
Horton JR, Mabuchi MY, Cohen-Karni D, Zhang X, Griggs RM, Samaranayake M, Roberts RJ, Zheng Y, Cheng X. Structure and cleavage activity of the tetrameric MspJI DNA modification-dependent restriction endonuclease. Nucleic Acids Res 2012; 40:9763-73. [PMID: 22848107 PMCID: PMC3479186 DOI: 10.1093/nar/gks719] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 02/05/2023]
Abstract
The MspJI modification-dependent restriction endonuclease recognizes 5-methylcytosine or 5-hydroxymethylcytosine in the context of CNN(G/A) and cleaves both strands at fixed distances (N12/N16) away from the modified cytosine at the 3′-side. We determined the crystal structure of MspJI of Mycobacterium sp. JLS at 2.05-Å resolution. Each protein monomer harbors two domains: an N-terminal DNA-binding domain and a C-terminal endonuclease. The N-terminal domain is structurally similar to that of the eukaryotic SET and RING-associated domain, which is known to bind to a hemi-methylated CpG dinucleotide. Four protein monomers are found in the crystallographic asymmetric unit. Analytical gel-filtration and ultracentrifugation measurements confirm that the protein exists as a tetramer in solution. Two monomers form a back-to-back dimer mediated by their C-terminal endonuclease domains. Two back-to-back dimers interact to generate a tetramer with two double-stranded DNA cleavage modules. Each cleavage module contains two active sites facing each other, enabling double-strand DNA cuts. Biochemical, mutagenesis and structural characterization suggest three different monomers of the tetramer may be involved respectively in binding the modified cytosine, making the first proximal N12 cleavage in the same strand and then the second distal N16 cleavage in the opposite strand. Both cleavage events require binding of at least a second recognition site either in cis or in trans.
Affiliation(s)
- John R Horton
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
180
Mitin N, Rossman KL, Der CJ. Identification of a novel actin-binding domain within the Rho guanine nucleotide exchange factor TEM4. PLoS One 2012; 7:e41876. [PMID: 22911862 PMCID: PMC3404065 DOI: 10.1371/journal.pone.0041876] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 05/25/2012] [Accepted: 06/27/2012] [Indexed: 11/19/2022]
Abstract
Spatio-temporal activation of Rho GTPases is essential for their function in a variety of biological processes and is achieved in part by regulating the localization of their activators, the Rho guanine nucleotide exchange factors (RhoGEFs). In this study, we provide the first characterization of the full-length protein encoded by RhoGEF TEM4 and delineate its domain structure, catalytic activity, and subcellular localization. First, we determined that TEM4 can stimulate guanine nucleotide exchange on RhoA and the related RhoB and RhoC isoforms. Second, we determined that TEM4, like other Dbl RhoGEFs, contains a functional pleckstrin homology (PH) domain immediately C-terminal to the catalytic Dbl homology (DH) domain. Third, using immunofluorescence analysis, we showed that TEM4 localizes to the actin cytoskeleton through sequences in the N-terminus of TEM4 independently of the DH/PH domains. Using site-directed mutagenesis and deletion analysis, we identified a minimal region between residues 81 and 135 that binds directly to F-actin and has an ∼90-fold higher affinity for ATP-loaded F-actin. Finally, we demonstrated that a single point mutation (R130D) within full-length TEM4 abolishes actin binding and localization of TEM4 to the actin cytoskeleton, and dampens the in vivo activity of TEM4 towards RhoC. Taken together, our data demonstrate that TEM4 contains a novel actin-binding domain and that binding to actin is essential for TEM4 subcellular localization and activity. The unique subcellular localization of TEM4 suggests a spatially restricted activity and expands the diversity of mechanisms by which RhoGEF function can be regulated.
Affiliation(s)
- Natalia Mitin
- Lineberger Comprehensive Cancer Center and Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
181
Aiello D, Caffrey DR. Evolution of specific protein-protein interaction sites following gene duplication. J Mol Biol 2012; 423:257-72. [PMID: 22789570 DOI: 10.1016/j.jmb.2012.06.039] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/22/2011] [Revised: 05/16/2012] [Accepted: 06/29/2012] [Indexed: 11/15/2022]
Abstract
Gene duplication is a common evolutionary process that leads to the expansion and functional diversification of protein subfamilies. The evolutionary events that cause paralogous proteins to bind different protein ligands (functionally diverged interfaces) are investigated and compared to paralogous proteins that bind the same protein ligand (functionally preserved interfaces). We find that functionally diverged interfaces possess more subfamily-specific residues than functionally preserved interfaces. These subfamily-specific residues are usually partially buried at the interface rim and achieve specific binding through optimized hydrogen bond geometries. In addition to optimized hydrogen bond geometries, side-chain modeling experiments suggest that steric effects are also important for binding specificity. Residues that are completely buried at the interface hub are also less conserved in functionally diverged interfaces than in functionally preserved interfaces. Consistent with this finding, hub residues contribute less to free energy of binding in functionally diverged interfaces than in functionally preserved interfaces. Therefore, we propose that protein binding is a delicate balance between binding affinity that primarily occurs at the interface hub and binding specificity that primarily occurs at the interface rim.
Affiliation(s)
- Daniel Aiello
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
182
Mirceva G, Cingovska I, Dimov Z, Davcev D. Efficient approaches for retrieving protein tertiary structures. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1166-1179. [PMID: 22025763 DOI: 10.1109/tcbb.2011.138] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
The 3D conformation of a protein is the main factor that determines its function in living organisms. Due to the huge number of newly discovered proteins, there is a need for fast and accurate computational methods for retrieving protein structures. Their purpose is to speed up the process of understanding the structure-to-function relationship, which is crucial in the development of new drugs. There are many algorithms addressing the problem of protein structure retrieval. In this paper, we present several novel approaches for retrieving protein tertiary structures. We first present our voxel-based descriptor, then our protein ray-based descriptors, which are applied to the interpolated protein backbone. We introduce five novel wavelet descriptors that perform wavelet transforms on the protein distance matrix. We also propose an efficient algorithm for distance matrix alignment named Matrix Alignment by Sequence Alignment within Sliding Window (MASASW), which has been shown to be much faster than DALI, CE, and MatAlign. We compared our approaches with each other and with several existing algorithms, and they generally prove to be fast and accurate. MASASW achieves the highest accuracy. The ray- and wavelet-based descriptors, as well as MASASW, are more accurate than CE.
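The wavelet descriptors are said to operate on the protein distance matrix. As a hedged illustration only, the sketch below builds a C-alpha distance matrix and applies a one-dimensional orthonormal Haar transform to each row for a few levels. The choice of the Haar wavelet, the row-wise application, and the helper names (`haar_level`, `haar_rows`) are assumptions. They are not the authors' five descriptors.

```python
import math

def distance_matrix(coords):
    """Pairwise distance matrix for a list of (x, y, z) C-alpha positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def haar_level(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficient lists; a trailing odd
    sample is dropped.
    """
    s = 1 / math.sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def haar_rows(coords, levels=2):
    """Hypothetical descriptor: coarse Haar approximation of each distance-matrix row."""
    rows = []
    for row in distance_matrix(coords):
        approx = row
        for _ in range(levels):
            if len(approx) < 2:
                break
            approx, _ = haar_level(approx)
        rows.append(approx)
    return rows
```

The approximation coefficients summarize each row at a coarser resolution, which is one generic way a wavelet transform can shrink a distance matrix into a smaller descriptor for fast comparison.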
Affiliation(s)
- Georgina Mirceva
- Department of Computer Science and Computer Engineering, Faculty of Electrical Engineering and Information Technologies, Ss. Cyril and Methodius University in Skopje, PO Box 574, 1000 Skopje, Macedonia.
183
Hung K, Wang JC, Chen CW, Chuang CL, Tsai KN, Chen CM. Enhancement of initial equivalency for protein structure alignment based on encoded local structures. IEEE Trans Inf Technol Biomed 2012; 16:1185-92. [PMID: 22717522 DOI: 10.1109/titb.2012.2204892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
Most alignment algorithms first find an initial set of equivalent residue pairs and then apply an iterative optimization process to explore better near-optimal alignments in the solution space surrounding the initial alignment. The initial alignment plays a decisive role in determining alignment quality, since a poor initial alignment may leave the final alignment trapped in an undesirable local optimum even with iterative optimization. We propose MIRAGE-align, a vector-based alignment algorithm with a new initial-alignment approach that accounts for local structural features. The key idea is to use encoded local structural alphabets to improve the quality of the initial alignment for protein structure pairs whose sequence identity falls in or below the twilight zone. Statistical analysis of alignment quality, based on the Match Index (MI), and of computation time demonstrated that MIRAGE-align outperformed four previously published algorithms: the residue-based algorithm (CE), the vector-based algorithm (SSM), TM-align, and Fr-TM-align. MIRAGE-align yields a better estimate of the initial solution, which enhances the quality of the initial alignment and enables the use of a non-iterative optimization process to achieve a better alignment.
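The sketch below illustrates what seeding an alignment from encoded local structures can look like. It assumes each protein has already been encoded as a string over some structural alphabet. It then searches for the highest-scoring ungapped window of length `k` and uses that window as the initial equivalence. The alphabet, the substitution function `score`, the window length, and the name `best_seed` are hypothetical. MIRAGE-align's actual encoding and scoring are not given in the abstract.

```python
def best_seed(codes_a, codes_b, k, score):
    """Highest-scoring ungapped window of length k between two encoded structures.

    codes_a, codes_b: strings over a (hypothetical) structural alphabet,
                      one letter per residue.
    score(x, y):      substitution score between two alphabet letters.
    Returns (i, j, s): start positions in a and b and the window score,
    or None if k exceeds either length. Ties keep the first window found.
    """
    best = None
    for i in range(len(codes_a) - k + 1):
        for j in range(len(codes_b) - k + 1):
            s = sum(score(codes_a[i + t], codes_b[j + t]) for t in range(k))
            if best is None or s > best[2]:
                best = (i, j, s)
    return best

identity = lambda x, y: 1 if x == y else -1
print(best_seed("HHHEEEC", "CCEEECH", 3, identity))  # (3, 2, 3)
```

The returned window pairs residues i..i+k-1 of one protein with j..j+k-1 of the other. In a full aligner, such a seed would then be superposed and extended, which is outside this sketch.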
184
Joseph AP, Valadié H, Srinivasan N, de Brevern AG. Local structural differences in homologous proteins: specificities in different SCOP classes. PLoS One 2012; 7:e38805. [PMID: 22745680 PMCID: PMC3382195 DOI: 10.1371/journal.pone.0038805] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/26/2011] [Accepted: 05/10/2012] [Indexed: 11/19/2022]
Abstract
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches to structure analysis can be applied directly to the study of protein function and to drug design. The backbone of a protein structure favours certain local conformations, which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet, which gives a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be utilized to improve tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class, with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that using knowledge of the class-specific variations can improve the performance of a PB-based structure comparison approach. The preferred indel sites also seem to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Hélène Valadié
- INSERM UMR-S 726, DSIMB, Université Paris Diderot - Paris 7, Paris, France
- Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
185
Chen BY, Bandyopadhyay S. Modeling regionalized volumetric differences in protein-ligand binding cavities. Proteome Sci 2012; 10 Suppl 1:S6. [PMID: 22759583 PMCID: PMC3390949 DOI: 10.1186/1477-5956-10-s1-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023]
Abstract
Identifying elements of protein structures that create differences in protein-ligand binding specificity is an essential method for explaining the molecular mechanisms underlying preferential binding. In some cases, influential mechanisms can be visually identified by experts in structural biology, but subtler mechanisms, whose significance may only be apparent from the analysis of many structures, are harder to find. To assist this process, we present a geometric algorithm and two statistical models for identifying significant structural differences in protein-ligand binding cavities. We demonstrate these methods in an analysis of sequentially nonredundant structural representatives of the canonical serine proteases and the enolase superfamily. Here, we observed that statistically significant structural variations identified experimentally established determinants of specificity. We also observed that an analysis of individual regions inside cavities can reveal areas where small differences in shape can correspond to differences in specificity.
Affiliation(s)
- Brian Y Chen
- Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA.
186
Launay G, Téletchéa S, Wade F, Pajot-Augy E, Gibrat JF, Sanz G. Automatic modeling of mammalian olfactory receptors and docking of odorants. Protein Eng Des Sel 2012; 25:377-86. [PMID: 22691703 DOI: 10.1093/protein/gzs037] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 12/20/2022]
Abstract
We present a procedure that (i) automates the homology modeling of mammalian olfactory receptors (ORs) based on the six three-dimensional (3D) structures of G protein-coupled receptors (GPCRs) available so far and (ii) performs the docking of odorants on these models, using the concept of colony energy to score the complexes. ORs share low sequence similarity with other GPCRs, and current alignment methods often fail to provide a reliable alignment. Here, we use a fold recognition technique to obtain a robust initial alignment. We then apply our procedure to a human OR that we have previously functionally characterized. The analysis of the resulting in silico complexes, supported by receptor mutagenesis and functional assays in a heterologous expression system, suggests that antagonists dock in the upper part of the binding pocket whereas agonists dock in the narrow lower part. We propose that the potency of agonists in activating receptors depends on their ability to establish tight interactions with the floor of the binding pocket. We developed a web site that allows the user to upload a GPCR sequence, choose a ligand from a library and obtain the 3D structure of the free receptor and of the ligand-receptor complex (http://genome.jouy.inra.fr/GPCRautomodel).
Affiliation(s)
- Guillaume Launay
- INRA, Mathématique, Informatique et Génome UR1077, 78350 Jouy-en-Josas, France
187
Shah SB, Sahinidis NV. SAS-Pro: simultaneous residue assignment and structure superposition for protein structure alignment. PLoS One 2012; 7:e37493. [PMID: 22662161 PMCID: PMC3360771 DOI: 10.1371/journal.pone.0037493] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/29/2011] [Accepted: 04/24/2012] [Indexed: 11/19/2022]
Abstract
Protein structure alignment is the problem of determining an assignment between the amino-acid residues of two given proteins in a way that maximizes a measure of similarity between the two superimposed protein structures. By identifying geometric similarities, structure alignment algorithms provide critical insights into protein functional similarities. Existing structure alignment tools adopt a two-stage approach, decoupling the assignment evaluation and structure superposition problems and iterating between them. We introduce a novel approach, SAS-Pro, which addresses assignment evaluation and structure superposition simultaneously by formulating the alignment problem as a single bilevel optimization problem. The new formulation does not require sequentiality constraints, thus extending the scope of the alignment methodology to non-sequential protein alignments. We employ derivative-free optimization methods to search for the global optimum of the highly nonlinear and non-differentiable RMSD function encountered in the proposed model. Alignments obtained with SAS-Pro have better RMSD values and greater lengths than those obtained from other alignment tools. For non-sequential alignment problems, SAS-Pro produces alignments with a high degree of similarity to known reference alignments. The source code of SAS-Pro is available for download at http://eudoxus.cheme.cmu.edu/saspro/SAS-Pro.html.
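SAS-Pro optimizes residue assignment and superposition together. For a fixed assignment, however, the superposition half of the problem has a closed-form solution (the Kabsch algorithm), and it is this inner RMSD computation that two-stage tools alternate with re-assignment. The Python sketch below, assuming NumPy and two paired (N, 3) coordinate arrays, computes that RMSD; it is an illustration of the objective only, not SAS-Pro's derivative-free bilevel search, and the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between paired (N, 3) coordinate sets after optimal rigid superposition.

    Row i of P is assumed to correspond to row i of Q (a fixed residue assignment).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring each set on its centroid.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc  # rotate every row of Pc by R, then compare with Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because the reflection check forces a proper rotation, a mirror image of a structure does not superimpose to zero RMSD, which is the behaviour one wants when comparing protein backbones.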
Affiliation(s)
- Shweta B. Shah
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Nikolaos V. Sahinidis
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
188
Abstract
A computational pipeline, PocketAnnotate, for functional annotation of proteins at the level of binding sites has been proposed in this study. The pipeline integrates three in-house algorithms for site-based function annotation: PocketDepth, for prediction of binding sites in protein structures; PocketMatch, for rapid comparison of binding sites; and PocketAlign, to obtain a detailed alignment between a pair of binding sites. A novel scheme has been developed to rapidly generate a database of non-redundant binding sites. For a given input protein structure, putative ligand-binding sites are identified, matched in real time against the database, and the query substructure is aligned with the promising hits to obtain a set of possible ligands that the given protein could bind. The input can be either whole protein structures or merely the substructures corresponding to possible binding sites. Structure-based function annotation at the level of binding sites thus achieved could prove very useful for cases where no obvious functional inference can be obtained based purely on sequence or fold-level analyses. An attempt has also been made to analyse proteins of unknown function from the Protein Data Bank. PocketAnnotate would be a valuable tool for the scientific community and contribute towards structure-based functional inference. The web server can be freely accessed at http://proline.biochem.iisc.ernet.in/pocketannotate/.
Affiliation(s)
- Praveen Anand
- Department of Biochemistry, Indian Institute of Science, Bangalore 560012, Karnataka, India
189
Wang J, Gao X, Wang Q, Li Y. ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval. BMC Bioinformatics 2012; 13 Suppl 7:S2. [PMID: 22594999 PMCID: PMC3348016 DOI: 10.1186/1471-2105-13-s7-s2] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/29/2022]
Abstract
BACKGROUND The need to retrieve or classify protein molecules using structure- or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of the other proteins in the database and thus cannot satisfy the high-accuracy requirements of retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner and do not use the known class labels of proteins in the database. RESULTS In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure dij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing dij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and obtain a contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (pairs of proteins with the same class label) and irrelevant (pairs with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor.
Finally, with the new supervised learned dissimilarity measure, we update the protein hierarchical context coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that plugging our supervised contextual dissimilarity measures into the retrieval systems significantly outperforms the context-free dissimilarity/similarity measures and other unsupervised contextual dissimilarity measures that do not use the class label information. CONCLUSIONS Using the contextual proteins with their class labels in the database, we can dramatically improve the accuracy of pairwise dissimilarity/similarity measures for protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.
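The core idea of contextual regularization (proteins whose neighbourhoods N(i) and N(j) look alike should be judged closer) can be illustrated with a deliberately simple, unsupervised stand-in. In the Python sketch below, the regularizing factor is assumed to be 1 - lam * J, where J is the Jaccard overlap of the two proteins' k-nearest-neighbour sets; k, lam and both function names are hypothetical, and the paper's hierarchical sub-contexts and SVM-learned supervised factor are not reproduced.

```python
from typing import List, Sequence, Set


def knn_context(d: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """For each protein i, the indices of its k nearest neighbours (i itself excluded)."""
    n = len(d)
    contexts = []
    for i in range(n):
        # Ties in distance are broken by index so the result is deterministic.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (d[i][j], j))
        contexts.append(set(others[:k]))
    return contexts


def contextual_dissimilarity(
    d: Sequence[Sequence[float]], k: int = 2, lam: float = 0.5
) -> List[List[float]]:
    """Shrink d[i][j] in proportion to the Jaccard overlap of the k-NN contexts of i and j."""
    ctx = knn_context(d, k)
    n = len(d)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            union = ctx[i] | ctx[j]
            overlap = len(ctx[i] & ctx[j]) / len(union) if union else 0.0
            out[i][j] = d[i][j] * (1.0 - lam * overlap)
    return out
```

With two well-separated groups of proteins, pairs inside a group share neighbours and are pulled closer, while cross-group distances are left unchanged because their contexts do not overlap.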
Affiliation(s)
- Jingyan Wang
- King Abdullah University of Science and Technology (KAUST), Mathematical and Computer Sciences and Engineering Division, Thuwal, 23955-6900, Saudi Arabia
190
Schlenker C, Goel A, Tripet BP, Menon S, Willi T, Dlakić M, Young MJ, Lawrence CM, Copié V. Structural studies of E73 from a hyperthermophilic archaeal virus identify the "RH3" domain, an elaborated ribbon-helix-helix motif involved in DNA recognition. Biochemistry 2012; 51:2899-910. [PMID: 22409376 PMCID: PMC3326356 DOI: 10.1021/bi201791s] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Abstract
Hyperthermophilic archaeal viruses, including Sulfolobus spindle-shaped viruses (SSVs) such as SSV-1 and SSV-Ragged Hills, exhibit remarkable morphological and genetic diversity. However, they remain poorly understood, in part because their genomes exhibit limited or unrecognizable sequence similarity to genes of known function. Here we report structural and functional studies of E73, a 73-residue homodimeric protein encoded within the SSV-Ragged Hills genome. Despite the lack of significant sequence similarity, the nuclear magnetic resonance (NMR) structure reveals clear similarity to ribbon-helix-helix (RHH) domains present in numerous proteins involved in transcriptional regulation. In vitro double-stranded DNA (dsDNA) binding experiments confirm the ability of E73 to bind dsDNA in a nonspecific manner with micromolar affinity, and characterization of the K11E variant confirms the location of the predicted DNA binding surface. E73 is distinct, however, from known RHH domains. The RHH motif is elaborated upon by the insertion of a third helix that is tightly integrated into the structural domain, giving rise to the "RH3" fold. Within the homodimer, this helix results in the formation of a conserved, symmetric cleft distal to the DNA binding surface, where it may mediate protein-protein interactions or contribute to the high thermal stability of E73. Analysis of backbone amide dynamics by NMR provides evidence of a rigid core, fast picosecond-to-nanosecond time scale NH bond vector motions for residues located within the antiparallel β-sheet region of the proposed DNA-binding surface, and slower microsecond-to-millisecond time scale motions for residues in the α1-α2 loop. The roles of E73 and its SSV homologues in the viral life cycle are discussed.
Affiliation(s)
- Casey Schlenker
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Anupam Goel
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Brian P. Tripet
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Smita Menon
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Taylor Willi
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Mensur Dlakić
- Department of Microbiology, Montana State University, Bozeman, MT 59717
- Mark J. Young
- Department of Microbiology, Montana State University, Bozeman, MT 59717
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717
- Thermal Biology Institute, Montana State University, Bozeman, MT 59717
- C Martin Lawrence
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Thermal Biology Institute, Montana State University, Bozeman, MT 59717
- Valérie Copié
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Thermal Biology Institute, Montana State University, Bozeman, MT 59717
191
Derbyshire MK, Lanczycki CJ, Bryant SH, Marchler-Bauer A. Annotation of functional sites with the Conserved Domain Database. Database (Oxford) 2012; 2012:bar058. [PMID: 22434827 PMCID: PMC3308149 DOI: 10.1093/database/bar058] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 10/21/2011] [Revised: 11/21/2011] [Accepted: 11/23/2011] [Indexed: 11/13/2022]
Abstract
The overwhelming majority of proteins whose sequences have been collected in comprehensive databases may never be assessed for function experimentally. Commonly, putative function is assigned based on similarity to experimentally characterized homologs, either at the level of the entire protein or for single evolutionarily conserved domains. The annotation of individual sites provides more detailed insights into the correspondence between sequence and function, as well as context for interpreting sequence variation and experimental outcomes. In general, site annotation has to be extracted from the published literature, and can often be transferred to closely related sequence neighbors. The National Center for Biotechnology Information's Conserved Domain Database (CDD) provides a system for curators to record functional sites (such as active sites or binding sites for cofactors) or characteristic sites (such as signature motifs) that are conserved across domain families, and for transferring that annotation to protein database sequences via high-confidence domain matches. Recently, CDD curators have begun to sort site annotations into seven categories (active, polypeptide binding, nucleic acid binding, ion binding, chemical binding, post-translational modification and other), and here we present a first comparative analysis of sites obtained via domain model matches, juxtaposed with existing site annotation found in high-quality data sets. Site annotation derived from domain annotation has the potential to cover large fractions of protein sequences, and we observe that CDD-based site annotation complements existing site annotation in many cases, which may, in part, originate from CDD's curation practice of collecting sites conserved across diverse taxa and supported by evidence from multiple 3D structures.
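Transferring a site annotation "via a domain match" ultimately means carrying annotated residue positions through an alignment onto the matched sequence. The Python sketch below shows only that bookkeeping step, assuming a pairwise alignment given as two equal-length strings with '-' for gaps and 1-based residue numbering; the function name is hypothetical, and CDD's own match-confidence and transfer rules are not reproduced.

```python
from typing import Dict, Iterable, Optional


def transfer_sites(
    ref_aln: str, query_aln: str, ref_sites: Iterable[int]
) -> Dict[int, Optional[int]]:
    """Map 1-based site positions on the annotated reference onto the query.

    Both strings are rows of the same pairwise alignment ('-' marks a gap).
    A site that falls opposite a gap in the query maps to None.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_to_query: Dict[int, Optional[int]] = {}
    r = q = 0  # residue counters for reference and query
    for a, b in zip(ref_aln, query_aln):
        if a != "-":
            r += 1
        if b != "-":
            q += 1
        if a != "-":
            ref_to_query[r] = q if b != "-" else None
    return {s: ref_to_query.get(s) for s in ref_sites}
```

For example, with reference "MK-TAYIA" aligned to query "MKQT-YLA", reference sites 3, 4 and 5 map to query residues 4, None (deleted in the query) and 5.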
Affiliation(s)
- Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38 A, Room 8N805, 8600 Rockville Pike, Bethesda, MD 20894, USA
192
Abstract
Motivation: Structural alignment methods are widely used to generate gold standard alignments for improving multiple sequence alignments and transferring functional annotations, as well as for assigning structural distances between proteins. However, the correctness of the alignments generated by these methods is difficult to assess objectively, since little is known about the exact evolutionary history of most proteins. Since homology is an equivalence relation, an upper bound on alignment quality can be found by assessing the consistency of alignments. Measuring the consistency of current structure alignment methods and determining the causes of inconsistencies can, therefore, provide information on the quality of current methods and suggest possibilities for further improvement. Results: We analyze the self-consistency of seven widely used structural alignment methods (SAP, TM-align, Fr-TM-align, MAMMOTH, DALI, CE and FATCAT) on a diverse, non-redundant set of 1863 domains from the SCOP database and demonstrate that even for relatively similar proteins the degree of inconsistency of the alignments at the residue level is high (30%). We further show that levels of consistency vary substantially between methods, with two methods (SAP and Fr-TM-align) producing more consistent alignments than the rest. Inconsistency is found to be higher near gaps, for proteins of low structural complexity, and for helices. The ability of the methods to identify good structural alignments is also assessed using geometric measures, for which FATCAT (flexible mode) is found to be the best performer despite being highly inconsistent. We conclude that there is substantial scope for improving the consistency of structural alignment methods. Contact: msadows@nimr.mrc.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
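The consistency argument rests on transitivity: if residue a of A is aligned to b of B, and b to c of C, then a direct A-to-C alignment should also pair a with c. The Python sketch below scores one triplet of structures that way. The representation (dicts from residue index to aligned residue index) and the scoring rule (fraction of transitively implied pairs that the direct alignment reproduces) are assumptions for illustration; the paper's exact consistency measure may differ.

```python
from typing import Dict

# Residue index in the first structure -> aligned residue index in the second.
Alignment = Dict[int, int]


def compose(ab: Alignment, bc: Alignment) -> Alignment:
    """Transitive A->C alignment obtained by chaining A->B and B->C."""
    return {a: bc[b] for a, b in ab.items() if b in bc}


def triplet_consistency(ab: Alignment, bc: Alignment, ac: Alignment) -> float:
    """Fraction of transitively implied A->C pairs that the direct A->C alignment reproduces."""
    implied = compose(ab, bc)
    if not implied:
        return float("nan")  # nothing to compare
    agree = sum(1 for a, c in implied.items() if ac.get(a) == c)
    return agree / len(implied)
```

Averaging this score over many triplets drawn from a domain set gives a per-method consistency figure; a value below 1 signals that at least one of the three pairwise alignments must be wrong.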
Affiliation(s)
- M I Sadowski
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, UK
193
Alvarez MA, Yan C. A new protein graph model for function prediction. Comput Biol Chem 2012; 37:6-10. [PMID: 22381922 DOI: 10.1016/j.compbiolchem.2012.01.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/11/2011] [Revised: 01/02/2012] [Accepted: 01/04/2012] [Indexed: 11/27/2022]
Abstract
As several structural proteomics projects are producing an increasing number of protein structures of unknown function, methods that can reliably predict protein function from structure are urgently needed. In this paper, we present a method that exploits the clustering patterns of amino acids in 3-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented as a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the proposed method outperformed other state-of-the-art methods.
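The pipeline described above (spatial clustering of residues, a graph whose nodes are clusters, and a shortest-path graph kernel) can be sketched in plain Python. Several assumptions are made for illustration: single-linkage clustering cut at a distance threshold stands in for the paper's hierarchical agglomerative clustering (the abstract does not state the linkage criterion), clusters are linked when their centroids lie within a hypothetical edge_cutoff, and the evolutionary-profile node labels are omitted, so the kernel compares shortest-path lengths only.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def single_linkage_clusters(coords: Sequence[Point], cutoff: float) -> List[List[int]]:
    """Single-linkage clustering cut at `cutoff`, i.e. connected components of d <= cutoff."""
    parent = list(range(len(coords)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in itertools.combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_graph(coords, clusters, edge_cutoff: float) -> List[List[bool]]:
    """Adjacency matrix: clusters are nodes, linked if their centroids are within edge_cutoff."""
    cents = [tuple(sum(coords[i][k] for i in c) / len(c) for k in range(3)) for c in clusters]
    n = len(cents)
    return [[i != j and math.dist(cents[i], cents[j]) <= edge_cutoff for j in range(n)]
            for i in range(n)]


def shortest_paths(adj) -> List[List[float]]:
    """All-pairs hop counts via Floyd-Warshall (math.inf where unreachable)."""
    n = len(adj)
    d = [[0.0 if i == j else (1.0 if adj[i][j] else math.inf) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def shortest_path_kernel(adj1, adj2) -> int:
    """Count pairs of node pairs (one from each graph) whose finite shortest-path lengths match."""
    def lengths(adj):
        d = shortest_paths(adj)
        return [d[i][j] for i, j in itertools.combinations(range(len(adj)), 2) if d[i][j] < math.inf]

    l1, l2 = lengths(adj1), lengths(adj2)
    return sum(1 for a in l1 for b in l2 if a == b)
```

A matrix of such kernel values between proteins could then be handed to an SVM that accepts precomputed kernels (for example scikit-learn's SVC with kernel="precomputed"); in practice the raw counts would normally be normalized first.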
Affiliation(s)
- Marco A Alvarez
- Department of Computer Science, Utah State University, Logan, UT 84322, USA
194
Abstract
An overwhelming array of structural variants has evolved from a comparatively small number of protein structural domains, which has in turn given rise to a vast range of functional derivatives. Herein, I review the primary mechanisms that have contributed to the vastness of our existing, and expanding, protein repertoires. Protein function prediction strategies, both sequence- and structure-based, are also discussed and their associated strengths and weaknesses assessed.
Affiliation(s)
- Roy D Sleator
- Department of Biological Sciences, Cork Institute of Technology, Cork, Ireland.
195
Tyagi M, Hashimoto K, Shoemaker BA, Wuchty S, Panchenko AR. Large-scale mapping of human protein interactome using structural complexes. EMBO Rep 2012; 13:266-71. [PMID: 22261719 PMCID: PMC3296913 DOI: 10.1038/embor.2011.261] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 09/08/2011] [Revised: 11/23/2011] [Accepted: 12/09/2011] [Indexed: 11/09/2022]
Abstract
Although the identification of protein interactions by high-throughput (HTP) methods progresses at a fast pace, 'interactome' data sets still suffer from high rates of false positives and low coverage. To map the human protein interactome, we describe a new framework that uses experimental evidence on structural complexes, the atomic details of binding interfaces and evolutionary conservation. The structurally inferred interaction network is highly modular and more functionally coherent compared with experimental interaction networks derived from multiple literature citations. Moreover, structurally inferred and high-confidence HTP networks complement each other well, allowing us to construct a merged network to generate testable hypotheses and provide valuable experimental leads.
Affiliation(s)
- Manoj Tyagi
- National Center for Biotechnology Information, US National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894, USA
196
Samson F, Shrager R, Tai CH, Sam V, Lee B, Munson PJ, Gibrat JF, Garnier J. DOMIRE: a web server for identifying structural domains and their neighbors in proteins. Bioinformatics 2012; 28:1040-1. [PMID: 22345617 DOI: 10.1093/bioinformatics/bts076] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
SUMMARY The DOMIRE web server implements a novel, automatic, protein structural domain assignment procedure based on 3D substructures of the query protein which are also found within structures of a non-redundant protein database. These common 3D substructures are transformed into a co-occurrence matrix that offers a global view of the protein domain organization. Three different algorithms are employed to define structural domain boundaries from this co-occurrence matrix. For each query, a list of structural neighbors and their alignments are provided. DOMIRE, by displaying the protein structural domain organization, can be a useful tool for defining protein common cores and for unravelling the evolutionary relationship between different proteins. AVAILABILITY http://genome.jouy.inra.fr/domire CONTACT jean.garnier@jouy.inra.fr.
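The co-occurrence matrix at the heart of DOMIRE can be illustrated directly: each shared 3D substructure is a set of query residues, and entry (i, j) counts how many of those substructures contain both residues. The Python sketch below builds such a matrix, assuming 0-based residue indices and that the substructures have already been found; how DOMIRE detects them and its three boundary-assignment algorithms are not reproduced, and the function name is hypothetical.

```python
from typing import Iterable, List, Set


def cooccurrence_matrix(n_residues: int, substructures: Iterable[Set[int]]) -> List[List[int]]:
    """M[i][j] = number of shared 3D substructures containing both residue i and residue j.

    Residues are 0-based indices into the query chain; the diagonal M[i][i]
    counts the substructures that contain residue i at all.
    """
    m = [[0] * n_residues for _ in range(n_residues)]
    for sub in substructures:
        idx = sorted(sub)
        if idx and (idx[0] < 0 or idx[-1] >= n_residues):
            raise ValueError("residue index out of range")
        for a in idx:
            for b in idx:
                m[a][b] += 1
    return m
```

Residues belonging to the same structural domain tend to recur together across database hits, so domains appear as high-count blocks along the diagonal, which is the "global view" of domain organization that boundary-finding algorithms can then partition.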
Affiliation(s)
- Franck Samson
- Institut National de la Recherche Agronomique, UR1077, Unité Mathématique, Informatique et Génome, 78350 Jouy-en-Josas, France
197
Tyagi M, Thangudu RR, Zhang D, Bryant SH, Madej T, Panchenko AR. Homology inference of protein-protein interactions via conserved binding sites. PLoS One 2012; 7:e28896. [PMID: 22303436 PMCID: PMC3269416 DOI: 10.1371/journal.pone.0028896] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 08/15/2011] [Accepted: 11/16/2011] [Indexed: 11/18/2022]
Abstract
The coverage and reliability of protein-protein interactions determined by high-throughput experiments still need to be improved, especially for higher organisms; therefore, the question persists of how interactions can be verified and predicted by computational approaches using available data on protein structural complexes. Recently we developed an approach called IBIS (Inferred Biomolecular Interaction Server) to predict and annotate protein-protein binding sites and interaction partners, which is based on the assumption that the structural location and sequence patterns of protein-protein binding sites are conserved between close homologs. In this study, we first confirmed the high accuracy of our method and found that its accuracy depends critically on using all available data on structures of homologous complexes, compared with approaches where only a non-redundant set of complexes is employed. Second, we showed that there is a trade-off between specificity and sensitivity if we employ in the prediction only evolutionarily conserved binding site clusters or clusters supported by only one observation (singletons). Finally, we addressed the question of identifying the biologically relevant interactions using the homology inference approach and demonstrated that a large majority of crystal packing interactions can be correctly identified and filtered by our algorithm. At the same time, about half of the biological interfaces that are not present in the protein crystallographic asymmetric unit can be reconstructed by IBIS from homologous complexes without prior knowledge of the crystal parameters of the query protein.
Affiliation(s)
- Manoj Tyagi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Ratna R. Thangudu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Dachuan Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Stephen H. Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Thomas Madej
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
198
Tomii K, Sawada Y, Honda S. Convergent evolution in structural elements of proteins investigated using cross profile analysis. BMC Bioinformatics 2012; 13:11. [PMID: 22244085 PMCID: PMC3398312 DOI: 10.1186/1471-2105-13-11] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 06/20/2011] [Accepted: 01/16/2012] [Indexed: 11/10/2022]
Abstract
Background Evolutionary relations of similar segments shared by different protein folds remain controversial, even though many examples of such segments have been found. To date, several methods, such as those based on the results of structure comparisons, sequence-based classifications, and sequence-based profile-profile comparisons, have been applied to identify protein segments that possess local similarities in both sequence and structure across protein folds. However, no method reported to date combines structure-based profiles with sequence-based profiles based on evolutionary information to capture more precise sequence-structure relations. The former are generally regarded as representing the amino acid preferences at each position of a specific conformation of a protein segment. Built from structural classifications of protein segments, they might reflect the nature of ancient short-peptide ancestors. Results This report describes the development and use of "Cross Profile Analysis" to compare sequence-based profiles and structure-based profiles based on amino acid occurrences at each position within a protein segment cluster. Using systematic cross profile analysis, we found structural clusters of 9-residue and 15-residue segments showing remarkably strong correlation with particular sequence profiles. These correlations reflect structural similarities among constituent segments of both sequence-based and structure-based profiles. We also report previously undetectable sequence-structure patterns that transcend protein family and fold boundaries, and present results of the conformational analysis of the deduced peptide of a segment cluster. These results suggest the existence of ancient short-peptide ancestors. Conclusions Cross profile analysis reveals the polyphyletic and convergent evolution of β-hairpin-like structures, which were verified both experimentally and computationally.
The results presented here give us new insights into the evolution of short protein segments.
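Both kinds of profile compared in cross profile analysis are tables of amino acid occurrences at each position of a fixed-length segment, so comparing them amounts to correlating two such tables. The Python sketch below builds a frequency profile from equal-length segments and scores two profiles with a Pearson correlation over the flattened position-by-residue table. The choice of Pearson correlation, the absence of pseudocounts and gap handling, and both function names are assumptions for illustration; the abstract does not give the paper's exact scoring.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Profile = List[Dict[str, float]]  # one amino-acid frequency table per segment position


def profile(segments: List[str]) -> Profile:
    """Per-position amino-acid frequencies for a cluster of equal-length segments."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must have equal length")
    prof = []
    for pos in range(length):
        col = [s[pos] for s in segments]
        prof.append({aa: col.count(aa) / len(col) for aa in AMINO_ACIDS})
    return prof


def profile_correlation(p: Profile, q: Profile) -> float:
    """Pearson correlation between two equal-length profiles, flattened over positions x residues."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    x = [col[aa] for col in p for aa in AMINO_ACIDS]
    y = [col[aa] for col in q for aa in AMINO_ACIDS]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In this setting one profile would come from a structural cluster of segments (for example the 9- or 15-residue clusters mentioned above) and the other from an evolutionary sequence profile aligned to the same positions.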
199
Gibney G, Baxevanis AD. Searching NCBI Databases Using Entrez. Curr Protoc Hum Genet 2012; Chapter 6:Unit6.10. [PMID: 21975942 DOI: 10.1002/0471142905.hg0610s71] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
Abstract
One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.
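The protocols above work through the Entrez web interface. The same text queries can also be issued programmatically through NCBI's E-utilities, whose ESearch endpoint returns the identifiers (UIDs) matching a query; this is useful when a query must be rerun regularly. The Python sketch below only builds the ESearch URL and shows how the JSON response could be read; the helper names are hypothetical, the example query is illustrative, and the network call is left commented out. NCBI's E-utilities usage guidelines (request-rate limits, API keys) apply to real use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL that returns matching UIDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def fetch_ids(url: str) -> list:
    """Run the query over the network and return the list of UIDs (requires internet access)."""
    with urlopen(url) as resp:
        return json.load(resp)["esearchresult"]["idlist"]


if __name__ == "__main__":
    url = esearch_url("pubmed", "olfactory receptor[Title] AND 2012[PDAT]")
    print(url)
    # print(fetch_ids(url))  # uncomment to query NCBI; observe NCBI's rate-limit guidelines
```

Field tags such as [Title] are written exactly as in the web search box; urlencode takes care of escaping spaces and brackets.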
200
Abstract
The wealth of available protein structural data provides an unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years, several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas others utilize the notion of protein evolution and thus provide a discrete rather than continuous view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process, along with basic definitions, are introduced. Examples illustrating some fundamental concepts of protein folding and evolution, with a special focus on the exceptions to them, are presented.