1
Abdollahi S, Raoufi Z. A novel vaccine candidate against A. baumannii based on a new OmpW family protein (OmpW2); structural characterization, antigenicity and epitope investigation, and in-vivo analysis. Microb Pathog 2023; 183:106317. [PMID: 37611777 DOI: 10.1016/j.micpath.2023.106317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/06/2023] [Accepted: 08/20/2023] [Indexed: 08/25/2023]
Abstract
A. baumannii is a multidrug-resistant (MDR) pathogen whose mortality rate in hospitalized patients has recently been increased by SARS-CoV-2. Investigating its virulence factors and designing a vaccine against this bacterium therefore appear critical. To this end, the OmpW2 protein was structurally characterized in this study, and its B- and T-cell epitopes were mapped with bioinformatic tools. In-vivo analyses were used to verify the immunogenicity of the protein and its selected epitopes. The results indicated that OmpW2 is a conserved virulent antigen that is not toxic to the host and not similar to the human or mouse proteome. A putative interaction between OmpW2 and a Fe-S-cluster redox enzyme was detected. Based on the results, OmpW2 belongs to the OmpW superfamily, and eight beta sheets were predicted in its tight beta-barrel structure. Several exposed epitopes were detected in the OmpW2 sequence and structure, and a potential subunit vaccine was generated based on these epitopes. The ELISA results indicated that after the second booster vaccination of BALB/c mice with the whole OmpW2 protein or its subunit fragment, the IgG titer rose significantly (p < 0.05). The mortality rate and the bacterial burden in the lung, liver, kidney, and spleen were significantly decreased in both passively and actively immunized mice (p ≤ 0.001). In-vivo experiments confirmed that the whole OmpW2 protein and its subunit fragment stimulate the host immune system and can be applied to design a commercial vaccine or diagnostic kit.
Affiliation(s)
- Sajad Abdollahi
- Department of Biology, Faculty of Basic Science, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran.
- Zeinab Raoufi
- Department of Biology, Faculty of Basic Science, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran
2
Abdollahi S, Raoufi Z, Fakoor MH. Physicochemical and structural characterization, epitope mapping and vaccine potential investigation of a new protein containing Tetratrico Peptide Repeats of Acinetobacter baumannii: An in-silico and in-vivo approach. Mol Immunol 2021; 140:22-34. [PMID: 34649027 DOI: 10.1016/j.molimm.2021.10.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/11/2021] [Revised: 09/13/2021] [Accepted: 10/04/2021] [Indexed: 11/24/2022]
Abstract
Acinetobacter baumannii is an opportunistic multidrug-resistant pathogen that causes significant mortality. Proteins containing Tetratrico Peptide Repeats (TPRs) are involved in the pathogenicity and virulence of bacteria and have different roles, such as transfer of bacterial virulence factors to host cells, binding to host cells, and inhibition of phagolysosomal maturation. In this study, the physicochemical properties of a new TPR-containing protein of A. baumannii, named PcTPRs1 here, were characterized and its 3D structure was predicted with in-silico tools. The B and T cell epitopes of the protein were mapped, and its vaccine potential was investigated in silico and in vivo. Domain analysis indicated that the protein contains the Flp pilus assembly protein TadD domain, which has three TPRs. Helices dominate the protein structure, and the protein is an outer membrane antigen that is extremely conserved among A. baumannii strains; it therefore has good properties for use as a recombinant vaccine. The best-predicted and refined model was used to predict ligand-binding sites and conformational epitopes. Based on the epitope mapping results, several epitopes were characterized that could stimulate both immune systems. BLAST results showed that the introduced epitopes are completely conserved among A. baumannii strains. The in-vivo analysis indicates that a 101-amino-acid fragment of the protein, which contains the best selected epitope, can provide protection against A. baumannii as good as that of the whole TPR protein, and thus could be investigated as an effective subunit and potential vaccine.
Affiliation(s)
- Sajad Abdollahi
- Department of Biology, Faculty of Basic Science, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran.
- Zeinab Raoufi
- Department of Biology, Faculty of Basic Science, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran.
3
Selim KA, Tremiño L, Marco-Marín C, Alva V, Espinosa J, Contreras A, Hartmann MD, Forchhammer K, Rubio V. Functional and structural characterization of PII-like protein CutA does not support involvement in heavy metal tolerance and hints at a small-molecule carrying/signaling role. FEBS J 2020; 288:1142-1162. [PMID: 32599651 DOI: 10.1111/febs.15464] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/24/2019] [Revised: 04/26/2020] [Accepted: 06/01/2020] [Indexed: 12/23/2022]
Abstract
The PII-like protein CutA is annotated as being involved in Cu2+ tolerance, based on analysis of Escherichia coli mutants. However, the precise cellular function of CutA remains unclear. Our bioinformatic analysis reveals that CutA proteins are universally distributed across all domains of life. Based on sequence-based clustering, we chose representative cyanobacterial CutA proteins for physiological, biochemical, and structural characterization and examined their involvement in heavy metal tolerance, by generating CutA mutants in filamentous Nostoc sp. and in unicellular Synechococcus elongatus. However, we were unable to find any involvement of cyanobacterial CutA in metal tolerance under various conditions. This prompted us to re-examine experimentally the role of CutA in protecting E. coli from Cu2+. Since we found no effect on copper tolerance, we conclude that CutA plays a different role that is not involved in metal protection. We resolved high-resolution CutA structures from Nostoc and S. elongatus. Similarly to their counterpart from E. coli and to canonical PII proteins, cyanobacterial CutA proteins are trimeric in solution and in crystal structure; however, no binding affinity for small signaling molecules or for Cu2+ could be detected. The clefts between the CutA subunits, corresponding to the binding pockets of PII proteins, are formed by conserved aromatic and charged residues, suggesting a conserved binding/signaling function for CutA. In fact, we find binding of organic Bis-Tris/MES molecules in CutA crystal structures, revealing a strong tendency of these pockets to accommodate cargo. This highlights the need to search for the potential physiological ligands and for their signaling functions upon binding to CutA.
DATABASES: Structural data are available in Protein Data Bank (PDB) under the accession numbers 6GDU, 6GDV, 6GDW, 6GDX, 6T76, and 6T7E.
Affiliation(s)
- Khaled A Selim
- Interfaculty Institute for Microbiology and Infection Medicine, Organismic Interactions Department, Tübingen University, Germany
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Lorena Tremiño
- Instituto de Biomedicina de Valencia (IBV-CSIC), CIBER de Enfermedades Raras (CIBERER-ISCIII), Valencia, Spain
- Clara Marco-Marín
- Instituto de Biomedicina de Valencia (IBV-CSIC), CIBER de Enfermedades Raras (CIBERER-ISCIII), Valencia, Spain
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Javier Espinosa
- Departamento de Fisiología, Genética y Microbiología, Universidad de Alicante, Spain
- Asunción Contreras
- Departamento de Fisiología, Genética y Microbiología, Universidad de Alicante, Spain
- Marcus D Hartmann
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Karl Forchhammer
- Interfaculty Institute for Microbiology and Infection Medicine, Organismic Interactions Department, Tübingen University, Germany
- Vicente Rubio
- Instituto de Biomedicina de Valencia (IBV-CSIC), CIBER de Enfermedades Raras (CIBERER-ISCIII), Valencia, Spain
4
An in silico structural and physicochemical characterization of TonB-dependent copper receptor in A. baumannii. Microb Pathog 2018. [DOI: 10.1016/j.micpath.2018.03.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
5
Ingale AG. Prediction of Structural and Functional Aspects of Protein. PHARMACEUTICAL SCIENCES 2017. [DOI: 10.4018/978-1-5225-1762-7.ch021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
To predict the structure of protein from a primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. This chapter covers the applications of modeled protein structures and unravels the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. With this resource, it presents an all-encompassing examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving unique insight into the future applications of the modeled protein structures. In this chapter, current protein structure prediction methods are reviewed for a milieu on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional imminent. The basic ideas and advances of these directions are discussed in detail.
6
In silico determination and validation of baumannii acinetobactin utilization A structure and ligand binding site. BIOMED RESEARCH INTERNATIONAL 2013; 2013:172784. [PMID: 24106696 PMCID: PMC3780550 DOI: 10.1155/2013/172784] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 04/27/2013] [Revised: 07/21/2013] [Accepted: 07/31/2013] [Indexed: 01/21/2023]
Abstract
Acinetobacter baumannii is a deadly nosocomial pathogen. Iron is an essential element for the pathogen. Under iron-restricted conditions, the bacterium expresses iron-regulated outer membrane proteins (IROMPs). Baumannii acinetobactin utilization A (BauA) is the most important member of the IROMPs in A. baumannii. Determining its tertiary structure could help deduce its functions and its interactions with ligands. The present study unveils the 3D structure of BauA via in silico approaches. Apart from ab initio modeling, other rational methods such as homology modeling and threading were used. For homology modeling, BLAST was run on the sequence to find the best template, which was then used to model the 3D structure. All the models built were evaluated qualitatively. The best model predicted by LOMETS was selected for analysis. The 3D structure was refined, and its clefts and ligand binding sites were determined. In contrast to the typical trimeric arrangement found in porins, BauA is monomeric. The barrel is formed by 22 antiparallel transmembrane β-strands. There are short periplasmic turns and longer surface-located loops. An N-terminal domain, referred to as the cork, the plug, or the hatch domain, occludes the β-barrel.
7
Wang Z, Yin P, Lee JS, Parasuram R, Somarowthu S, Ondrechen MJ. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs). BMC Bioinformatics 2013; 14 Suppl 3:S13. [PMID: 23514271 PMCID: PMC3584854 DOI: 10.1186/1471-2105-14-s3-s13] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Background: The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site.
Results: Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC), is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase.
Conclusions: A local site match provides a more compelling function prediction than that obtainable from a simple 3D structure match. The present method can confirm putative annotations, identify misannotation, and in some cases suggest a more probable annotation.
Affiliation(s)
- Zhouxi Wang
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
8
Sureshan V, Deshpande CN, Boucher Y, Koenig JE, Stokes HW, Harrop SJ, Curmi PMG, Mabbutt BC. Integron gene cassettes: a repository of novel protein folds with distinct interaction sites. PLoS One 2013; 8:e52934. [PMID: 23349695 PMCID: PMC3548836 DOI: 10.1371/journal.pone.0052934] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 09/16/2012] [Accepted: 11/26/2012] [Indexed: 11/19/2022]
Abstract
Mobile gene cassettes captured within integron arrays encompass a vast and diverse pool of genetic novelty. In most cases, functional annotation of gene cassettes directly recovered by cassette-PCR is obscured by their characteristically high sequence novelty. This inhibits identification of those specific functions or biological features that might constitute preferential factors for lateral gene transfer via the integron system. A structural genomics approach incorporating x-ray crystallography has been utilised on a selection of cassettes to investigate evolutionary relationships hidden at the sequence level. Gene cassettes were accessed from marine sediments (pristine and contaminated sites), as well as a range of Vibrio spp. We present six crystal structures, a remarkably high proportion of our survey of soluble proteins, which were found to possess novel folds. These entirely new structures are diverse, encompassing all-α, α+β and α/β fold classes, and many contain clear binding pocket features for small molecule substrates. The new structures emphasise the large repertoire of protein families encoded within the integron cassette metagenome and which remain to be characterised. Oligomeric association is a notable recurring property common to these new integron-derived proteins. In some cases, the protein–protein contact sites utilised in homomeric assembly could instead form suitable contact points for heterogeneous regulator/activator proteins or domains. Such functional features are ideal for a flexible molecular componentry needed to ensure responsive and adaptive bacterial functions.
Affiliation(s)
- Visaahini Sureshan
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Chandrika N. Deshpande
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Yan Boucher
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Jeremy E. Koenig
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- H. W. Stokes
- ithree institute, University of Technology, Sydney, New South Wales, Australia
- Stephen J. Harrop
- School of Physics, University of New South Wales, New South Wales, Australia
- Paul M. G. Curmi
- School of Physics, University of New South Wales, New South Wales, Australia
- Centre for Applied Medical Research, St Vincent's Hospital, Sydney, New South Wales, Australia
- Bridget C. Mabbutt
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
9
Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012; 7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022]
Abstract
Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed as MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, provided that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a ≤ 10% false positive rate control, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
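The two ingredients named in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the pair-likelihood table, the neighbor definition, or the nonlinear integration formula, so `pair_likelihood`, the normalization of the distance score, and the function names are all hypothetical and are not the authors' implementation.

```python
import math

def centroid(coords):
    """Geometric centre of a list of (x, y, z) residue coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def dscore(coords):
    """Relative distance of each residue to the protein centre.

    Assumption: distances are normalized by the largest one, so 0 means
    "at the centre" and 1 means "farthest residue".
    """
    centre = centroid(coords)
    dists = [math.dist(c, centre) for c in coords]
    farthest = max(dists)
    return [d / farthest if farthest > 0 else 0.0 for d in dists]

def mescore(neighbour_pairs, pair_likelihood):
    """Sum the likelihoods of all spatially neighbouring residue pairs.

    pair_likelihood is a hypothetical lookup table keyed by a sorted
    pair of residue types; unknown pairs contribute 0.
    """
    return sum(pair_likelihood.get(tuple(sorted(p)), 0.0) for p in neighbour_pairs)
```

A combined score would then be some nonlinear function of the two values; its form is deliberately left out here because the abstract does not specify it.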
Affiliation(s)
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- Yong-Jun Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, People's Republic of China
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Ming S. Liu
- CSIRO - Mathematics, Informatics and Statistics, Clayton, Victoria, Australia
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
10
Abstract
An overwhelming array of structural variants has evolved from a comparatively small number of protein structural domains, which has in turn facilitated an expanse of functional derivatives. Herein, I review the primary mechanisms that have contributed to the vastness of our existing, and expanding, protein repertoires. Protein function prediction strategies, both sequence and structure based, are also discussed and their associated strengths and weaknesses assessed.
Affiliation(s)
- Roy D Sleator
- Department of Biological Sciences, Cork Institute of Technology, Cork, Ireland.
11
Buchko GW, Robinson H. Crystal structure of cce_0566 from Cyanothece 51142, a protein associated with nitrogen fixation in the DUF269 family. FEBS Lett 2012; 586:350-5. [PMID: 22289180 PMCID: PMC3641832 DOI: 10.1016/j.febslet.2012.01.037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/06/2012] [Revised: 01/16/2012] [Accepted: 01/16/2012] [Indexed: 11/24/2022]
Abstract
The crystal structure of cce_0566 (171 aa, 19.4 kDa), a DUF269-annotated protein from the diazotrophic cyanobacterium Cyanothece sp. ATCC 51142, was determined to 1.60 Å resolution. Cce_0566 is a homodimer with each molecule composed of eight α-helices folded on one side of a three-stranded anti-parallel β-sheet. Hydrophobic interactions between the side chains of largely conserved residues on the surface of each β-sheet hold the dimer together. The fold observed for cce_0566 may be unique to proteins in the DUF269 family; hence, the protein may also have a function unique to nitrogen fixation. A solvent-accessible cleft containing conserved charged residues near the dimer interface could represent the active site or ligand-binding surface for the protein's biological function.
Affiliation(s)
- Garry W Buchko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA.
12
Abstract
The recent explosion in the number and diversity of novel proteins identified by the large-scale "omics" technologies poses new and important questions to the blossoming field of systems biology: what are all these proteins, how did they come about, and most importantly, what do they do? From a comparatively small number of protein structural domains a staggering array of structural variants has evolved, which has in turn facilitated an expanse of functional derivatives. This review considers the primary mechanisms that have contributed to the vastness of our existing, and expanding, protein repertoires, while also outlining the protocols available for elucidating their true biological function. The various function prediction programs available, both sequence and structure based, are discussed and their associated strengths and weaknesses outlined.
Affiliation(s)
- Roy D Sleator
- Department of Biological Sciences, Cork Institute of Technology, Bishopstown, Cork, Ireland.
13
Doppelt-Azeroual O, Delfaud F, Moriaud F, de Brevern AG. Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins. Protein Sci 2010; 19:847-67. [PMID: 20162627 DOI: 10.1002/pro.364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.
Affiliation(s)
- Olivia Doppelt-Azeroual
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France.
14
Vorobjev YN. Blind docking method combining search of low-resolution binding sites with ligand pose refinement by molecular dynamics-based global optimization. J Comput Chem 2010; 31:1080-92. [PMID: 19821514 DOI: 10.1002/jcc.21394] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
Abstract
This study describes the development of a new blind hierarchical docking method, bhDock, its implementation, and its accuracy assessment. The bhDock method uses a two-step algorithm. First, a comprehensive set of low-resolution binding sites is determined by analyzing the entire protein surface and ranked by a simple score function. Second, the ligand position is determined via a molecular dynamics-based method of global optimization, starting from a small set of high-ranked low-resolution binding sites. The refinement of the ligand binding pose starts from multiple uniformly distributed initial ligand orientations and uses simulated annealing molecular dynamics coupled with guided force-field deformation of protein-ligand interactions to find the global minimum. Assessment of the bhDock method on a set of 37 protein-ligand complexes showed a prediction success rate of 78%, which is better than the rate reported for the most cited docking methods, such as AutoDock, DOCK, GOLD, and FlexX, on the same set of complexes.
Affiliation(s)
- Yury N Vorobjev
- Institute of Chemical Biology and Fundamental Medicine of the Siberian Branch of the Russian Academy of Science, Novosibirsk, Russia.
15
Buchko GW, Robinson H, Abendroth J, Staker BL, Myler PJ. Structural characterization of Burkholderia pseudomallei adenylate kinase (Adk): profound asymmetry in the crystal structure of the 'open' state. Biochem Biophys Res Commun 2010; 394:1012-7. [PMID: 20331978 DOI: 10.1016/j.bbrc.2010.03.112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/10/2010] [Accepted: 03/17/2010] [Indexed: 02/02/2023]
Abstract
In all organisms adenylate kinases (Adks) play a vital role in cellular energy metabolism and nucleic acid synthesis. Due to differences in catalytic properties between the Adks found in prokaryotes and in the cytoplasm of eukaryotes, there is interest in targeting this enzyme for new drug therapies against infectious bacterial agents. Here we report the 2.1 Å resolution crystal structure for the 220-residue Adk from Burkholderia pseudomallei (BpAdk), the etiological agent responsible for the infectious disease melioidosis. The general structure of apo BpAdk is similar to other Adk structures, composed of a CORE subdomain with peripheral ATP-binding (ATP(bd)) and LID subdomains. The two molecules in the asymmetric unit have significantly different conformations, with a backbone RMSD of 1.46 Å. These two BpAdk conformations may represent 'open' Adk sub-states along the preferential pathway to the 'closed' substrate-bound state.
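For readers unfamiliar with the backbone RMSD quoted above, the quantity is the root of the mean squared distance between equivalent atoms in two structures. The sketch below is a minimal illustration, assuming the two coordinate sets are already superposed and listed in the same atom order; it performs no structural alignment, and the function name is hypothetical rather than taken from the cited work.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists.

    Assumes the structures are already superposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

In practice the two chains would first be optimally superposed (for example with a least-squares fit) before this value is computed.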
Affiliation(s)
- Garry W Buchko
- Biological Sciences Division and Seattle Structural Genomics Center for Infectious Disease, Pacific Northwest National Laboratory, Richland, WA 99352, USA.
16
An overview of in silico protein function prediction. Arch Microbiol 2010; 192:151-5. [DOI: 10.1007/s00203-010-0549-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 12/02/2009] [Revised: 01/08/2010] [Accepted: 01/10/2010] [Indexed: 12/12/2022]
17
Buchko GW, Robinson H, Addlagatta A. Structural characterization of the protein cce_0567 from Cyanothece 51142, a metalloprotein associated with nitrogen fixation in the DUF683 family. BIOCHIMICA ET BIOPHYSICA ACTA 2009; 1794:627-33. [PMID: 19336042 PMCID: PMC3707797 DOI: 10.1016/j.bbapap.2009.01.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/19/2008] [Revised: 01/06/2009] [Accepted: 01/07/2009] [Indexed: 11/26/2022]
Abstract
The genomes of many cyanobacteria contain the sequence for a small protein with a common "Domain of Unknown Function" grouped into the DUF683 protein family. While the biological function of DUF683 is still not known, their genomic location within nitrogen fixation clusters suggests that DUF683 proteins may play a role in the process. The diurnal cyanobacterium Cyanothece sp. PCC 51142 contains a gene for a protein that falls into the DUF683 family, cce_0567 (78 aa, 9.0 kDa). In an effort to elucidate the biochemical role DUF683 proteins may play in nitrogen fixation, we have determined the first crystal structure for a protein in this family, cce_0567, to 1.84 Å resolution. Cce_0567 crystallized in space group P2₁ with two protein molecules and one Ni²⁺ cation per asymmetric unit. The protein is composed of two α-helices, residues P11 to G41 (α1) and L49-E74 (α2), with the second α-helix containing a short 3₁₀-helix (Y46-N48). A four-residue linker (L42-D45) between the helices allows them to form an anti-parallel bundle and cross over each other towards their termini. In solution it is likely that two molecules of cce_0567 form a rod-like dimer by the stacking interactions of approximately 1/2 of the protein. Histidine-36 is highly conserved in all known DUF683 proteins and the N2 nitrogen of the H36 side chain of each molecule in the dimer is coordinated with Ni²⁺ in the crystal structure. The divalent cation Ni²⁺ was titrated into ¹⁵N-labeled cce_0567 and chemical shift perturbations were observed only in the ¹H-¹⁵N HSQC spectra for residues at, or near, the site of Ni²⁺ binding observed in the crystal structure. There was no evidence for an increase in the size of cce_0567 upon binding Ni²⁺, even in large molar excess of Ni²⁺, indicating that a metal was not required for dimer formation. Circular dichroism spectroscopy indicated that cce_0567 was extremely robust, with a melting temperature of approximately 62 °C that was reversible.
Affiliation(s)
- Garry W Buchko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA.
|
18
|
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008; 35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify local structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100080, Beijing, China
|
19
|
Peisach E, Wang L, Burroughs AM, Aravind L, Dunaway-Mariano D, Allen KN. The X-ray crystallographic structure and activity analysis of a Pseudomonas-specific subfamily of the HAD enzyme superfamily evidences a novel biochemical function. Proteins 2008; 70:197-207. [PMID: 17654544 DOI: 10.1002/prot.21583] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The haloacid dehalogenase (HAD) superfamily is a large family of proteins dominated by phosphotransferases. Thirty-three sequence families within the HAD superfamily (HADSF) have been identified to assist in function assignment. One such family includes the enzyme phosphonoacetaldehyde hydrolase (phosphonatase). Phosphonatase possesses the conserved Rossmannoid core domain and a C1-type cap domain. Other members of this family do not possess a cap domain and because the cap domain of phosphonatase plays an important role in active site desolvation and catalysis, the function of the capless family members must be unique. A representative of the capless subfamily, PSPTO_2114, from the plant pathogen Pseudomonas syringae, was targeted for catalytic activity and structure analyses. The X-ray structure of PSPTO_2114 reveals a capless homodimer that conserves some but not all of the intersubunit contacts contributed by the core domains of the phosphonatase homodimer. The region of PSPTO_2114 that corresponds to the catalytic scaffold of phosphonatase (and other HAD phosphotransferases) positions amino acid residues that are ill suited for Mg²⁺ cofactor binding and mediation of phosphoryl group transfer between donor and acceptor substrates. The absence of phosphotransferase activity in PSPTO_2114 was confirmed by kinetic assays. To explore PSPTO_2114 function, the conservation of sequence motifs extending outside of the HADSF catalytic scaffold was examined. The stringently conserved residues among PSPTO_2114 homologs were mapped onto the PSPTO_2114 three-dimensional structure to identify a surface region unique to the family members that do not possess a cap domain. The hypothesis that this region is used in protein-protein recognition is explored to define, for the first time, HADSF proteins which have acquired a function other than that of a catalyst.
Affiliation(s)
- Ezra Peisach
- Department of Physiology and Biophysics, Boston University School of Medicine, 715 Albany Street, Boston, Massachusetts 02118-2394, USA
|
20
|
Cheng J, Tegge A, Baldi P. Machine Learning Methods for Protein Structure Prediction. IEEE Rev Biomed Eng 2008; 1:41-9. [DOI: 10.1109/rbme.2008.2008239] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Indexed: 11/08/2022]
|
21
|
Abstract
We present a method, termed AutoLigand, for the prediction of ligand-binding sites in proteins of known structure. The method searches the space surrounding the protein and finds the contiguous envelope with the specified volume of atoms, which has the largest possible interaction energy with the protein. It uses a full atomic representation, with atom types for carbon, hydrogen, oxygen, nitrogen and sulfur (and others, if desired), and is designed to minimize the need for artificial geometry. Testing on a set of 187 diverse protein-ligand complexes has shown that the method is successful in predicting the location and approximate volume of the binding site in 73% of cases. Additional testing was performed on a set of 96 protein-ligand complexes with crystallographic structures of apo and holo forms, and AutoLigand was able to predict the binding site in 80% of the apo structures.
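The idea of a contiguous envelope of fixed volume with the most favourable interaction energy can be illustrated with a small sketch. This is not the AutoLigand implementation: it swaps in a simple greedy flood fill on a toy grid, and the function name `grow_envelope`, the grid indexing, the seed choice and the energy values are all assumptions made for illustration.

```python
import heapq

# The six face-sharing neighbours of a grid point.
NEIGHBOUR_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def grow_envelope(energy, seed, n_points):
    """Greedily grow a contiguous set of grid points starting from `seed`.

    energy: dict mapping (i, j, k) grid indices to an interaction energy
            (hypothetical units, lower means more favourable).
    Returns the chosen set of points and their summed energy.
    """
    envelope = {seed}
    frontier = []  # min-heap of (energy, point) candidates adjacent to the envelope

    def push_neighbours(point):
        i, j, k = point
        for di, dj, dk in NEIGHBOUR_STEPS:
            q = (i + di, j + dj, k + dk)
            if q in energy and q not in envelope:
                heapq.heappush(frontier, (energy[q], q))

    push_neighbours(seed)
    while len(envelope) < n_points and frontier:
        _, q = heapq.heappop(frontier)
        if q in envelope:  # the same point may have been queued more than once
            continue
        envelope.add(q)
        push_neighbours(q)
    return envelope, sum(energy[p] for p in envelope)

# Toy grid with made-up energies.
toy_energy = {(0, 0, 0): -1.0, (1, 0, 0): -5.0, (2, 0, 0): -0.5, (0, 1, 0): -2.0}
points, total = grow_envelope(toy_energy, seed=(0, 0, 0), n_points=3)
print(sorted(points), total)
```

A greedy fill can miss the global optimum, which is one reason a real method would need a more thorough search than this sketch performs.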
Affiliation(s)
- Rodney Harris
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
22
|
Koclega KD, Chruszcz M, Zimmerman MD, Cymborowski M, Evdokimova E, Minor W. Crystal structure of a transcriptional regulator TM1030 from Thermotoga maritima solved by an unusual MAD experiment. J Struct Biol 2007; 159:424-32. [PMID: 17588774 PMCID: PMC2093942 DOI: 10.1016/j.jsb.2007.04.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 01/16/2007] [Revised: 04/20/2007] [Accepted: 04/30/2007] [Indexed: 01/07/2023]
Abstract
The crystal structure of a putative transcriptional regulator protein TM1030 from Thermotoga maritima, a hyperthermophilic bacterium, was determined by an unusual multi-wavelength anomalous dispersion method at 2.0 Å resolution, in which data from two different crystals and two different beamlines were used. The protein belongs to the tetracycline repressor TetR superfamily. The three-dimensional structure of TM1030 is similar to the structures of proteins that function as multidrug-binding transcriptional repressors, and contains a large solvent-exposed pocket similar to the drug-binding pockets present in those repressors. The asymmetric unit in the crystal structure contains a single protein chain and the twofold symmetry of the dimer is adopted by the crystal symmetry. The structure described in this paper is an apo form of TM1030. Although it is known that the protein is significantly overexpressed during heat shock, its detailed function cannot yet be explained.
Affiliation(s)
- Katarzyna D. Koclega
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
- Institute of Technical Biochemistry, Faculty of Biotechnology and Food Sciences, Technical University of Lodz, Lodz, Poland
- Midwest Center for Structural Genomics
- Maksymilian Chruszcz
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- Matthew D. Zimmerman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- Marcin Cymborowski
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- Elena Evdokimova
- Department of Medicinal Biophysics, University of Toronto, and Ontario Center for Structural Proteomics, Ontario Cancer Institute, Toronto, Ontario M5G 2C4, Canada
- Midwest Center for Structural Genomics
- Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- *Correspondence e-mail: , University of Virginia, Department of Molecular Physiology and Biological Physics, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA, Phone: +1-434-243-0033, Fax: +1-434-982-1616
|
23
|
Sangar V, Blankenberg DJ, Altman N, Lesk AM. Quantitative sequence-function relationships in proteins based on gene ontology. BMC Bioinformatics 2007; 8:294. [PMID: 17686158 PMCID: PMC1976327 DOI: 10.1186/1471-2105-8-294] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 02/05/2007] [Accepted: 08/08/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function--the basis of transfer of annotations in databases--must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity from very closely related proteins--for instance, orthologs in different mammals--to very distantly related proteins at the limit of reliable recognition of homology. RESULTS We correlated the divergence in sequences determined from pairwise alignments, and the divergence in function determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. CONCLUSION Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.
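The 50% residue-identity threshold reported in this abstract can be turned into a simple check, sketched below. The sketch assumes a pre-computed pairwise alignment given as two gapped strings, and it defines identity over columns where neither sequence has a gap; that denominator, the function names and the example sequences are assumptions for illustration, not the authors' procedure.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def annotation_transfer_is_low_risk(aligned_a, aligned_b, threshold=50.0):
    """True when identity exceeds the threshold quoted in the abstract (about 50%)."""
    return percent_identity(aligned_a, aligned_b) > threshold

# Made-up toy alignment: 4 identical residues out of 5 gap-free columns.
print(percent_identity("ACDE-G", "ACDFHG"))
print(annotation_transfer_is_low_risk("ACDE-G", "ACDFHG"))
```

Even above the threshold the abstract reports a non-zero error rate, so a check like this is a screening aid rather than a guarantee.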
Affiliation(s)
- Vineet Sangar
- Department of Biochemistry and Molecular Biology, Center of Computational Biology and Genomics, The Huck Institute for Genomics, Proteomics and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
- Daniel J Blankenberg
- Department of Biochemistry and Molecular Biology, Center of Computational Biology and Genomics, The Huck Institute for Genomics, Proteomics and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
- Naomi Altman
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Arthur M Lesk
- Department of Biochemistry and Molecular Biology, Center of Computational Biology and Genomics, The Huck Institute for Genomics, Proteomics and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
|
24
|
Affiliation(s)
- Ivano Bertini
- Magnetic Resonance Center (CERM) and Department of Chemistry – University of Florence, via L. Sacconi 6, 50019 Sesto Fiorentino, Italy, Fax: +39‐055‐457‐4271
- Antonio Rosato
- Magnetic Resonance Center (CERM) and Department of Chemistry – University of Florence, via L. Sacconi 6, 50019 Sesto Fiorentino, Italy, Fax: +39‐055‐457‐4271
|
25
|
Marti-Renom MA, Rossi A, Al-Shahrour F, Davis FP, Pieper U, Dopazo J, Sali A. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics 2007; 8 Suppl 4:S4. [PMID: 17570147 PMCID: PMC1892083 DOI: 10.1186/1471-2105-8-s4-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022] Open
Abstract
Background Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. Description AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of ~90% and average precision of ~80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of ~70% and average precision of ~30%, correctly localizing binding sites for small molecules in ~95% of its predictions. Conclusion The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at .
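The sensitivity and precision figures quoted in this abstract follow the usual definitions, which the short sketch below makes concrete. The function name and the annotation labels are hypothetical; they are not AnnoLite output.

```python
def sensitivity_and_precision(predicted, actual):
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP), computed over sets of labels."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    sensitivity = true_positives / len(actual) if actual else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision

# Hypothetical annotation terms for one structure.
predicted_terms = {"term_A", "term_B", "term_C", "term_D"}
actual_terms = {"term_A", "term_B", "term_E"}
print(sensitivity_and_precision(predicted_terms, actual_terms))
```

High sensitivity with lower precision, as reported for AnnoLyze, means most true sites are recovered while a sizeable share of predictions are false positives.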
Affiliation(s)
- Marc A Marti-Renom
- Structural Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Andrea Rossi
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Fátima Al-Shahrour
- Functional Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Fred P Davis
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Ursula Pieper
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Joaquín Dopazo
- Functional Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
|
26
|
Marti-Renom MA, Pieper U, Madhusudhan MS, Rossi A, Eswar N, Davis FP, Al-Shahrour F, Dopazo J, Sali A. DBAli tools: mining the protein structure space. Nucleic Acids Res 2007; 35:W393-7. [PMID: 17478513 PMCID: PMC1933139 DOI: 10.1093/nar/gkm236] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions.
Affiliation(s)
- Marc A Marti-Renom
- Structural Genomics Unit, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94158-2330, USA.
|
27
|
Kawabata T, Go N. Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites. Proteins 2007; 68:516-29. [PMID: 17444522 DOI: 10.1002/prot.21283] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Indexed: 11/07/2022]
Abstract
One of the simplest ways to predict ligand binding sites is to identify pocket-shaped regions on the protein surface. Many programs have already been proposed to identify these pocket regions. Examination of their algorithms revealed that a pocket intrinsically has two arbitrary properties, "size" and "depth". We proposed a new definition for pockets using two explicit adjustable parameters that correspond to these two arbitrary properties. A pocket region is defined as a space into which a small probe can enter, but a large probe cannot. The radii of small and large probe spheres are the two parameters that correspond to the "size" and "depth" of the pockets, respectively. These values can be adjusted for each individual putative ligand molecule. To determine the optimal value of the large probe sphere radius, we generated pockets for thousands of protein structures in the database, using several sizes of large probe spheres, and examined the correspondence of these pockets with known binding site positions. A new measure of shallowness, a minimum inaccessible radius, R(inaccess), indicated that binding sites of coenzymes are very deep, while those for adenine/guanine mononucleotides have only medium shallowness and those for short peptides and oligosaccharides are shallow. The optimal radius of large probe spheres was 3-4 Å for the coenzymes, 4 Å for adenine/guanine mononucleotides, and 5 Å or more for peptides/oligosaccharides. Comparison of our program with two other popular pocket-finding programs showed that our program had a higher performance in detecting binding pockets, although it required more computational time.
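The two-probe definition of a pocket in this abstract lends itself to a compact sketch. The version below is a simplification for illustration, not the authors' program: a point counts as pocket space if a small probe centred on it clashes with no atom and no clash-free large probe (taken from a supplied list of candidate centres) covers it. All function names, radii and coordinates are assumptions.

```python
import math

def probe_fits(center, atoms, probe_radius):
    """True if a probe sphere at `center` overlaps no atom.

    atoms: list of ((x, y, z), atom_radius) pairs.
    """
    return all(math.dist(center, pos) >= radius + probe_radius for pos, radius in atoms)

def is_pocket_point(point, atoms, large_centers, r_small, r_large):
    """Small probe can sit at `point`, but no clash-free large probe reaches it."""
    if not probe_fits(point, atoms, r_small):
        return False
    for center in large_centers:
        if probe_fits(center, atoms, r_large) and math.dist(point, center) <= r_large:
            return False
    return True

# Toy system with made-up values: two atoms of radius 1.5 flanking the origin.
toy_atoms = [((-3.0, 0.0, 0.0), 1.5), ((3.0, 0.0, 0.0), 1.5)]
toy_large_centers = [(0.0, 0.0, 0.0), (0.0, 6.0, 0.0)]

for p in [(0.0, 0.0, 0.0), (0.0, 8.0, 0.0), (3.0, 0.0, 0.0)]:
    print(p, is_pocket_point(p, toy_atoms, toy_large_centers, r_small=1.0, r_large=4.0))
```

Raising `r_large` in this sketch makes more open regions count as pocket space, which mirrors the abstract's observation that shallower sites call for larger large-probe radii.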
Affiliation(s)
- Takeshi Kawabata
- Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan.
|
28
|
Marsden RL, Lewis TA, Orengo CA. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics 2007; 8:86. [PMID: 17349043 PMCID: PMC1829165 DOI: 10.1186/1471-2105-8-86] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Received: 10/31/2006] [Accepted: 03/09/2007] [Indexed: 11/25/2022] Open
Abstract
Background Structural genomics initiatives were established with the aim of solving protein structures on a large-scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is focussed towards structurally characterising protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families. Results In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterised families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from both structurally uncharacterised domain families, whilst in addition, pursuing additional targets from large structurally characterised protein superfamilies. Conclusion This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
| | - Tony A Lewis
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
| | - Christine A Orengo
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
|
29
|
Lesnyak DV, Osipiuk J, Skarina T, Sergiev PV, Bogdanov AA, Edwards A, Savchenko A, Joachimiak A, Dontsova OA. Methyltransferase that modifies guanine 966 of the 16 S rRNA: functional identification and tertiary structure. J Biol Chem 2007; 282:5880-7. [PMID: 17189261 PMCID: PMC2885967 DOI: 10.1074/jbc.m608214200] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Indexed: 11/06/2022] Open
Abstract
N²-Methylguanine 966 is located in the loop of Escherichia coli 16 S rRNA helix 31, forming a part of the P-site tRNA-binding pocket. We found yhhF to be a gene encoding the m²G966-specific 16 S rRNA methyltransferase. Disruption of the yhhF gene by a kanamycin resistance marker leads to a loss of modification at G966. The modification could be rescued by expression of recombinant protein from the plasmid carrying the yhhF gene. Moreover, purified m²G966 methyltransferase, in the presence of S-adenosylmethionine (AdoMet), is able to methylate in vitro 30 S ribosomal subunits that were purified from the yhhF knock-out strain. The methylation is specific for the G966 base of the 16 S rRNA. The m²G966 methyltransferase was crystallized, and its structure has been determined and refined to 2.05 Å. The structure closely resembles RsmC rRNA methyltransferase, specific for m²G1207 of the 16 S rRNA. Structural comparisons and analysis of the enzyme active site suggest modes for binding AdoMet and rRNA to m²G966 methyltransferase. Based on the experimental data and current nomenclature, the protein expressed from the yhhF gene was renamed RsmD. A model for interaction of RsmD with the ribosome has been proposed.
Affiliation(s)
- Dmitry V. Lesnyak
- Department of Bioinformatics and Bioengineering, Moscow State University, Moscow 119992, Russia
- Jerzy Osipiuk
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439
- Tatiana Skarina
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5G 1L6, Canada
- Petr V. Sergiev
- Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow 119992, Russia
- Alexey A. Bogdanov
- Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow 119992, Russia
- Aled Edwards
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5G 1L6, Canada
- Alexei Savchenko
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5G 1L6, Canada
- Andrzej Joachimiak
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439
- Olga A. Dontsova
- Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow 119992, Russia
|
30
|
Abstract
Zinc is one of the metal ions essential for life, as it is required for the proper functioning of a large number of proteins. Despite its importance, the annotation of zinc-binding proteins in gene banks or protein domain databases still has significant room for improvement. In the present work, we compiled a list of known zinc-binding protein domains and of known zinc-binding sequence motifs (zinc-binding patterns), and then used them jointly to analyze the proteome of 57 different organisms to obtain an overview of zinc usage by archaeal, bacterial, and eukaryotic organisms. Zinc-binding proteins are an abundant fraction of these proteomes, ranging between 4% and 10%. The number of zinc-binding proteins correlates linearly with the total number of proteins encoded by the genome of an organism, but the proportionality constant of Eukaryota (8.8%) is significantly higher than that observed in Bacteria and Archaea (from 5% to 6%). Most of this enrichment is due to the larger portfolio of regulatory proteins in Eukaryota.
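The linear relationship described in this abstract amounts to fitting a line through the origin, where the slope is the fraction of the proteome that binds zinc. The sketch below shows that calculation; the function name and the proteome counts are made up for illustration and are not data from the study.

```python
def proportionality_constant(proteome_sizes, zinc_protein_counts):
    """Least-squares slope of a line through the origin: k = sum(x * y) / sum(x * x)."""
    numerator = sum(x * y for x, y in zip(proteome_sizes, zinc_protein_counts))
    denominator = sum(x * x for x in proteome_sizes)
    if denominator == 0:
        raise ValueError("at least one proteome size must be non-zero")
    return numerator / denominator

# Made-up toy data constructed so that exactly 8.8% of each proteome binds zinc.
sizes = [1000, 2000, 4000]
zinc_counts = [88, 176, 352]
print(proportionality_constant(sizes, zinc_counts))
```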
Affiliation(s)
- Claudia Andreini
- Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy
|
31
|
Rossi A, Marti-Renom MA, Sali A. Localization of binding sites in protein structures by optimization of a composite scoring function. Protein Sci 2006; 15:2366-80. [PMID: 16963645 PMCID: PMC2242385 DOI: 10.1110/ps.062247506] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
The rise in the number of functionally uncharacterized protein structures is increasing the demand for structure-based methods for functional annotation. Here, we describe a method for predicting the location of a binding site of a given type on a target protein structure. The method begins by constructing a scoring function, followed by a Monte Carlo optimization, to find a good scoring patch on the protein surface. The scoring function is a weighted linear combination of the z-scores of various properties of protein structure and sequence, including amino acid residue conservation, compactness, protrusion, convexity, rigidity, hydrophobicity, and charge density; the weights are calculated from a set of previously identified instances of the binding-site type on known protein structures. The scoring function can easily incorporate different types of information useful in localization, thus increasing the applicability and accuracy of the approach. To test the method, 1008 known protein structures were split into 20 different groups according to the type of the bound ligand. For nonsugar ligands, such as various nucleotides, binding sites were correctly identified in 55%-73% of the cases. The method is completely automated (http://salilab.org/patcher) and can be applied on a large scale in a structural genomics setting.
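The composite score described in this abstract, a weighted linear combination of per-property z-scores, can be written down in a few lines. The sketch below covers only the scoring step and omits the Monte Carlo search; the property names, weights and values are hypothetical, and population standard deviation is an assumed choice for the z-score.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise raw property values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def patch_score(z_by_property, weights):
    """Weighted linear combination: score = sum over properties of weight * z-score."""
    return sum(weights[name] * z for name, z in z_by_property.items())

# Hypothetical z-scores for one surface patch and hypothetical weights.
patch = {"conservation": 2.0, "hydrophobicity": -1.0}
toy_weights = {"conservation": 0.5, "hydrophobicity": 0.25}
print(z_scores([1.0, 2.0, 3.0]))
print(patch_score(patch, toy_weights))
```

In the method described, the weights are derived from known instances of a binding-site type, so a separate weight set would be needed per ligand class.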
Affiliation(s)
- Andrea Rossi
- Department of Biopharmaceutical Sciences and Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California, San Francisco, California 94143-2552, USA.
|
32
|
Osipiuk J, Maltseva N, Dementieva I, Clancy S, Collart F, Joachimiak A. Structure of YidB protein from Shigella flexneri shows a new fold with homeodomain motif. Proteins 2006; 65:509-13. [PMID: 16927377 PMCID: PMC2885951 DOI: 10.1002/prot.21054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Andrzej Joachimiak
- Correspondence to: Dr. Andrzej Joachimiak, Biosciences Division, Midwest Center for Structural Genomics and Structural Biology Center, Argonne National Laboratory, 9700 S Cass Ave. Argonne, IL 60439.
|
33
|
Deng H, Chen G, Yang W, Yang JJ. Predicting calcium-binding sites in proteins - a graph theory and geometry approach. Proteins 2006; 64:34-42. [PMID: 16617426 DOI: 10.1002/prot.20973] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Indexed: 11/05/2022]
Abstract
Identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems for protein structure and function studies. Due to the complexity and irregularity of calcium-binding sites, a fast and accurate method for predicting and identifying calcium-binding protein is needed. Here we report our development of a new fast algorithm (GG) to detect calcium-binding sites. The GG algorithm uses a graph theory algorithm to find oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. A cluster of four or more oxygen atoms has a high potential for calcium binding. High performance with about 90% site sensitivity and 80% site selectivity has been obtained for three datasets containing a total of 123 proteins. The results suggest that a sphere of a certain size with four or more oxygen atoms on the surface and without other atoms inside is necessary and sufficient for quickly identifying the majority of the calcium-binding sites with high accuracy. Our finding opens a new avenue to visualize and analyze calcium-binding sites in proteins facilitating the prediction of functions from structural genomic information.
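The two stages named in this abstract, a graph step that finds oxygen clusters and a geometric step that locates their centre, can be sketched as follows. This is a simplified illustration rather than the GG algorithm itself: clusters are found by brute-force clique enumeration, the centre is taken as the plain centroid, the check for other atoms inside the sphere is omitted, and the distance cutoff and coordinates are made-up values.

```python
import math
from itertools import combinations

def oxygen_clusters(oxygens, cutoff, size=4):
    """Return (indices, centroid) for each group of `size` oxygens that are all
    within `cutoff` of one another, i.e. a clique in the distance graph."""
    n = len(oxygens)
    # Graph step: an edge joins two oxygens closer than the cutoff.
    edges = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(oxygens[i], oxygens[j]) <= cutoff
    }
    clusters = []
    for group in combinations(range(n), size):
        if all(pair in edges for pair in combinations(group, 2)):
            # Geometry step (simplified): use the centroid as the putative site centre.
            centroid = tuple(sum(oxygens[i][k] for i in group) / size for k in range(3))
            clusters.append((group, centroid))
    return clusters

# Toy coordinates: four oxygens on a tetrahedron plus one distant oxygen.
toy_oxygens = [
    (1.0, 1.0, 1.0),
    (1.0, -1.0, -1.0),
    (-1.0, 1.0, -1.0),
    (-1.0, -1.0, 1.0),
    (20.0, 0.0, 0.0),
]
print(oxygen_clusters(toy_oxygens, cutoff=3.5))
```

Brute-force enumeration scales poorly with the number of oxygens, so a practical implementation would restrict the search to neighbours in the graph.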
Affiliation(s)
- Hai Deng
- Department of Computer Science, Georgia State University, Atlanta, Georgia 30302, USA
|
34
|
Kim Y, Maltseva N, Dementieva I, Collart F, Holzle D, Joachimiak A. Crystal structure of hypothetical protein YfiH from Shigella flexneri at 2 Å resolution. Proteins 2006; 63:1097-101. [PMID: 16498617 PMCID: PMC2792012 DOI: 10.1002/prot.20589] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Affiliation(s)
- Andrzej Joachimiak
- Correspondence to: Andrzej Joachimiak, Biosciences Division, Midwest Center for Structural Genomics and Structural Biology Center, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439.
|
35
|
Nocek B, Cuff M, Evdokimova E, Edwards A, Joachimiak A, Savchenko A. 1.6 Å crystal structure of a PA2721 protein from Pseudomonas aeruginosa: a potential drug-resistance protein. Proteins 2006; 63:1102-5. [PMID: 16493657 PMCID: PMC2792011 DOI: 10.1002/prot.20659] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Affiliation(s)
- B. Nocek
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois
- M. Cuff
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois
- E. Evdokimova
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- A. Edwards
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Clinical Genomics Centre/Proteomics, University Health Network, Toronto, Ontario, Canada
- Correspondence to: Andrzej Joachimiak, Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, 9700 South Cass Avenue, Building 202, Argonne, IL 60439.
- A. Joachimiak
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois
- The University of Chicago, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois
- Correspondence to: Andrzej Joachimiak, Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, 9700 South Cass Avenue, Building 202, Argonne, IL 60439.
- A. Savchenko
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
|
36
|
Yura K, Yamaguchi A, Go M. Coverage of whole proteome by structural genomics observed through protein homology modeling database. J Struct Funct Genomics 2006; 7:65-76. [PMID: 17146617 PMCID: PMC1769342 DOI: 10.1007/s10969-006-9010-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 05/11/2006] [Accepted: 08/08/2006] [Indexed: 11/07/2022]
Abstract
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers the whole of an ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins has increased by 5% for archaebacteria and by 7% for eubacteria in the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics.
Affiliation(s)
- Kei Yura
- Quantum Bioinformatics Team, Center for Computational Science and Engineering, Japan Atomic Energy Agency, Kyoto 619-0215, Japan.
|
37
|
Zhang R, Minh T, Lezondra L, Korolev S, Moy S, Collart F, Joachimiak A. 1.6 Å crystal structure of YteR protein from Bacillus subtilis, a predicted lyase. Proteins 2006; 60:561-5. [PMID: 15906318 PMCID: PMC2792013 DOI: 10.1002/prot.20410] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Affiliation(s)
- R. Zhang
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory
- T. Minh
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory
- College of Chemistry, University of California at Berkeley, Berkeley, CA
- L. Lezondra
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory
- S. Korolev
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory
- S.F. Moy
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory
- F. Collart
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory
- A. Joachimiak
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory
- Correspondence to: Andrzej Joachimiak, Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory, 9700 South Cass Ave., Bldg. 202, Argonne, IL 60439.
|
38
|
Fleming K, Kelley LA, Islam SA, MacCallum RM, Muller A, Pazos F, Sternberg MJ. The proteome: structure, function and evolution. Philos Trans R Soc Lond B Biol Sci 2006; 361:441-51. [PMID: 16524832 PMCID: PMC1609342 DOI: 10.1098/rstb.2005.1802] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Abstract
This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family.
Affiliation(s)
- Keiran Fleming
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Lawrence A Kelley
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Suhail A Islam
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Robert M MacCallum
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Arne Muller
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Florencio Pazos
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Michael J.E Sternberg
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Author for correspondence
|
39
|
Ferré S, King RD. Finding Motifs in Protein Secondary Structure for Use in Function Prediction. J Comput Biol 2006; 13:719-31. [PMID: 16706721 DOI: 10.1089/cmb.2006.13.719] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
This paper presents a novel algorithm for the discovery of biological sequence motifs. Our motivation is the prediction of gene function. We seek to discover motifs and combinations of motifs in the secondary structure of proteins for application to the understanding and prediction of functional classes. The motifs found by our algorithm allow both flexible length structural elements and flexible length gaps and can be of arbitrary length. The algorithm is based on neither top-down nor bottom-up search, but rather is dichotomic. It is also "anytime," so that fixed termination of the search is not necessary. We have applied our algorithm to yeast sequence data to discover rules predicting function classes from secondary structure. These resultant rules are informative, consistent with known biology, and a contribution to scientific knowledge. Surprisingly, the rules also demonstrate that secondary structure prediction algorithms are effective for membrane proteins and suggest that the association between secondary structure and function is stronger in membrane proteins than globular ones. We demonstrate that our algorithm can successfully predict gene function directly from predicted secondary structure; e.g., we correctly predict the gene YGL124c to be involved in the functional class "cytoplasmic and nuclear degradation." Datasets and detailed results (generated motifs, rules, evaluation on test dataset, and predictions on unknown dataset) are available at www.aber.ac.uk/compsci/Research/bio/dss/yeast.ss.mips/, and www.genepredictions.org.
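To make the notion of a motif with flexible-length structural elements and flexible-length gaps concrete, here is a small sketch using Python regular expressions. It only illustrates what such a motif can look like when matched against a secondary-structure string; it does not reproduce the dichotomic, anytime search described in the abstract. The three-letter alphabet (H for helix, E for strand, C for coil), the motif encoding, and the helper names `motif_to_regex` and `has_motif` are assumptions.

```python
import re


def motif_to_regex(elements):
    """Compile a motif given as (state, min_len, max_len) triples.

    state is 'H', 'E', 'C', or '.' for a gap that may contain any state.
    """
    parts = []
    for state, lo, hi in elements:
        symbol = "." if state == "." else re.escape(state)
        # Each element becomes a bounded repetition, e.g. H{4,8}.
        parts.append(f"{symbol}{{{lo},{hi}}}")
    return re.compile("".join(parts))


# Hypothetical motif: helix of 4-8 residues, gap of 0-3 residues, strand of 3-6 residues.
motif = motif_to_regex([("H", 4, 8), (".", 0, 3), ("E", 3, 6)])


def has_motif(secondary_structure):
    """Return True if the motif occurs anywhere in the secondary-structure string."""
    return motif.search(secondary_structure) is not None
```

For example, `has_motif("CCHHHHHCCEEEECC")` is true because a five-residue helix is followed, after a two-residue gap, by a four-residue strand.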
Affiliation(s)
- Sébastien Ferré
- Irisa/Université de Rennes 1, Campus de Beaulieu, 35042 Rennes cedex, France.
|
40
|
Ofran Y, Punta M, Schneider R, Rost B. Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discov Today 2006; 10:1475-82. [PMID: 16243268 DOI: 10.1016/s1359-6446(05)03621-4] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
Every entirely sequenced genome reveals 100s to 1000s of protein sequences for which the only annotation available is 'hypothetical protein'. Thus, in the human genome and in the genomes of pathogenic agents there could be 1000s of potential, unexplored drug targets. Computational prediction of protein function can play a role in studying these targets. We shall review the challenges, research approaches and recently developed tools in the field of computational function-prediction and we will discuss the ways these issues can change the process of drug discovery.
Affiliation(s)
- Yanay Ofran
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
|
41
|
Abstract
The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data.
Affiliation(s)
- Domenico Cozzetto
- Department of Biochemical Sciences, University ‘La Sapienza’, P.le Aldo Moro, 5, I-00185 Rome, Italy
- Ivano Giuseppe Talamo
- Department of Biochemical Sciences, University ‘La Sapienza’, P.le Aldo Moro, 5, I-00185 Rome, Italy
- Anna Tramontano
- Department of Biochemical Sciences, University ‘La Sapienza’, P.le Aldo Moro, 5, I-00185 Rome, Italy
- Istituto Pasteur-Fondazione Cenci Bolognetti, University ‘La Sapienza’, P.le Aldo Moro, 5, I-00185 Rome, Italy
- To whom correspondence should be addressed. Tel: +39 0649910556; Fax: +39 0649910717
|
42
|
Marsden RL, Lee D, Maibaum M, Yeats C, Orengo CA. Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space. Nucleic Acids Res 2006; 34:1066-80. [PMID: 16481312 PMCID: PMC1373602 DOI: 10.1093/nar/gkj494] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022] Open
Abstract
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
43
|
Gold ND, Jackson RM. Fold Independent Structural Comparisons of Protein–Ligand Binding Sites for Exploring Functional Relationships. J Mol Biol 2006; 355:1112-24. [PMID: 16359705 DOI: 10.1016/j.jmb.2005.11.044] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.6] [Received: 05/20/2005] [Revised: 11/11/2005] [Accepted: 11/15/2005] [Indexed: 11/23/2022]
Abstract
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
Affiliation(s)
- Nicola D Gold
- Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK
|
44
|
Abstract
Increasing attention has been dedicated to the characterization of complex networks within the protein world. This work reports how we uncovered networked structures that reflect the structural similarities among protein binding sites. First, a dataset of 211 binding sites was compiled by removing the redundant proteins in the Protein Ligand Database (PLD) (http://www-mitchell.ch.cam.ac.uk/pld/). Using a clique detection algorithm, we performed all-against-all comparisons among the 211 binding sites. Within the set of nodes representing the binding sites, an edge was added whenever a pair of binding sites had a similarity higher than a threshold value. The generated similarity networks revealed that many nodes had few links and only a few were highly connected, but due to the limited data available it was not possible to definitively prove a scale-free architecture. Within the same dataset, the binding site similarity networks were compared with the sequence and fold similarity networks. In the protein world, indications were found that structure is better conserved than sequence, but on its own, sequence was better conserved than the subset of functional residues forming the binding site. Because a binding site is strongly linked with protein function, the identification of protein binding site similarity networks could accelerate the functional annotation of newly identified genes. In view of this, we have discussed several potential applications of binding site similarity networks, such as the construction of novel binding site classification databases, as well as the implications for protein molecular design in general and computational chemogenomics in particular.
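The network construction described in this abstract (an edge wherever a pair of binding sites exceeds a similarity threshold) can be sketched in Python as follows. The similarity scores, the 0.5 threshold, the site labels and the function names are invented for illustration; the clique-detection comparison that produces the scores in the study is not reproduced here.

```python
from collections import Counter


def build_similarity_network(scores, threshold):
    """Return an adjacency dict built from pairwise similarity scores.

    scores: dict mapping (site_a, site_b) -> similarity (hypothetical values).
    An edge is added only when the similarity is strictly above the threshold.
    """
    adjacency = {}
    for (a, b), similarity in scores.items():
        # Register both nodes even if they end up with no edges.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if similarity > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_distribution(adjacency):
    """Count how many nodes have each degree."""
    return Counter(len(neighbours) for neighbours in adjacency.values())


# Toy all-against-all scores for four made-up binding sites.
scores = {
    ("s1", "s2"): 0.9, ("s1", "s3"): 0.8, ("s1", "s4"): 0.7,
    ("s2", "s3"): 0.2, ("s2", "s4"): 0.1, ("s3", "s4"): 0.3,
}
network = build_similarity_network(scores, threshold=0.5)
distribution = degree_distribution(network)
```

In this toy input one site is linked to all others while the remaining three have a single link each, a small-scale picture of the "few highly connected nodes" pattern the abstract reports.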
Affiliation(s)
- Ziding Zhang
- Nestlé Research Center, Nestec Ltd, BioAnalytical Science, CH-1000 Lausanne 26, Switzerland
|
45
|
Najmanovich RJ, Torrance JW, Thornton JM. Prediction of protein function from structure: insights from methods for the detection of local structural similarities. Biotechniques 2005; 38:847, 849, 851. [PMID: 16018542 DOI: 10.2144/05386te01] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/23/2022] Open
|
46
|
Takeuchi H, Rigden DJ, Ebrahimi B, Turner PC, Rees HH. Regulation of ecdysteroid signalling during Drosophila development: identification, characterization and modelling of ecdysone oxidase, an enzyme involved in control of ligand concentration. Biochem J 2005; 389:637-45. [PMID: 15813704 PMCID: PMC1180713 DOI: 10.1042/bj20050498] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Abstract
The steroidal moulting hormones (ecdysteroids) mediate developmental transitions in insects, and their regulation is mainly controlled by the production and inactivation of these steroid hormones at the appropriate developmental times. One route of metabolism of ecdysteroids in insects involves EO (ecdysone oxidase)-catalysed conversion into 3-dehydroecdysteroid, which undergoes reduction to the corresponding 3-epiecdysteroid. By a twin-stranded bioinformatics approach, employing both phylogenomics and model structure-based analysis, we first predicted that DmEO (the EO of Drosophila melanogaster) corresponds to the protein product of gene CG9504. When CG9504 was expressed in COS7 cells, significant conversion of ecdysone into 3-dehydroecdysone was observed. Quantitative PCR and enzyme assay showed that DmEO was mainly expressed in the midgut during the late instars at a time corresponding to a hormone titre peak. DmEO shares only 27% amino acid sequence identity with Spodoptera littoralis (Lepidoptera) EO, yet key substrate-binding residues are well conserved. A model of DmEO is consistent with an inability to catalyse reaction of cholesterol derivatives. The significance of DmEO in ligand activation is discussed in relation to new evidence suggesting that 3-dehydro- and 3-epiecdysteroids may be functionally active as ligands in a novel, atypical ecdysteroid signalling pathway involving the Drosophila orphan nuclear receptor, DHR38, rather than being merely hormone inactivation products.
Affiliation(s)
- Hajime Takeuchi
- Cellular Regulation and Signalling Division, School of Biological Sciences, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB, UK.
|
47
|
Valencia A. Automatic annotation of protein function. Curr Opin Struct Biol 2005; 15:267-74. [PMID: 15922590 DOI: 10.1016/j.sbi.2005.05.010] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.3] [Received: 04/03/2005] [Revised: 04/29/2005] [Accepted: 05/10/2005] [Indexed: 11/22/2022]
Abstract
The annotation of protein function at genomic scale is essential for day-to-day work in biology and for any systematic approach to the modeling of biological systems. Currently, functional annotation is essentially based on the expansion of the relatively small number of experimentally determined functions to large collections of proteins. The task of systematic annotation faces formidable practical problems related to the accuracy of the input experimental information, the reliability of current systems for transferring information between related sequences, and the reproducibility of the links between database information and the original experiments reported in publications. These technical difficulties merely lie on the surface of the deeper problem of the evolution of protein function in the context of protein sequences and structures. Given the mixture of technical and scientific challenges, it is not surprising that errors are introduced, and expanded, in database annotations. In this situation, a more realistic option is the development of a reliability index for database annotations, instead of depending exclusively on efforts to correct databases. Several groups have attempted to compare the database annotations of similar proteins, which constitutes the first steps toward the calibration of the relationship between sequence and annotation space.
Affiliation(s)
- Alfonso Valencia
- Protein Design Group, National Center for Biotechnology, CNB-CSIC, Darwin 3, Cantoblanco, 28049 Madrid, Spain.
|
48
|
Laskowski RA, Watson JD, Thornton JM. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res 2005; 33:W89-93. [PMID: 15980588 PMCID: PMC1160175 DOI: 10.1093/nar/gki414] [Citation(s) in RCA: 471] [Impact Index Per Article: 23.6] [Indexed: 12/05/2022] Open
Abstract
ProFunc is a web server for predicting the likely function of proteins whose 3D structure is known but whose function is not. Users submit the coordinates of their structure to the server in PDB format. ProFunc makes use of both existing and novel methods to analyse the protein's sequence and structure, identifying functional motifs or close relationships to functionally characterized proteins. A summary of the analyses provides an at-a-glance view of what each of the different methods has found. More detailed results are available on separate pages. Often where one method has failed to find anything useful another may be more forthcoming. The server is likely to be of most use in structural genomics, where a large proportion of the proteins whose structures are solved are hypothetical proteins of unknown function. However, it may also find use in a comparative analysis of members of large protein families. It provides a convenient compendium of sequence and structural information that often holds vital functional clues to be followed up experimentally.
Affiliation(s)
- Roman A Laskowski
- European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
49
|
Watson JD, Laskowski RA, Thornton JM. Predicting protein function from sequence and structural data. Curr Opin Struct Biol 2005; 15:275-84. [PMID: 15963890 DOI: 10.1016/j.sbi.2005.04.003] [Citation(s) in RCA: 203] [Impact Index Per Article: 10.2] [Received: 02/04/2005] [Revised: 02/04/2005] [Accepted: 04/18/2005] [Indexed: 10/25/2022]
Abstract
When a protein's function cannot be experimentally determined, it can often be inferred from sequence similarity. Should this process fail, analysis of the protein structure can provide functional clues or confirm tentative functional assignments inferred from the sequence. Many structure-based approaches exist (e.g. fold similarity, three-dimensional templates), but as no single method can be expected to be successful in all cases, a more prudent approach involves combining multiple methods. Several automated servers that integrate evidence from multiple sources have been released this year and particular improvements have been seen with methods utilizing the Gene Ontology functional annotation schema.
Affiliation(s)
- James D Watson
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
50
|
Crystal structure of the hypothetical protein TA1238 from Thermoplasma acidophilum: a new type of helical super-bundle. J Struct Funct Genomics 2005. [DOI: 10.1007/s10969-004-3789-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
|