1
Aina A, Hsueh SCC, Plotkin SS. PROTHON: A Local Order Parameter-Based Method for Efficient Comparison of Protein Ensembles. J Chem Inf Model 2023. [PMID: 37178169 DOI: 10.1021/acs.jcim.3c00145] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/15/2023]
Abstract
The comparison of protein conformational ensembles is of central importance in structural biology. However, there are few computational methods for ensemble comparison, and those that are readily available, such as ENCORE, utilize methods that are sufficiently computationally expensive to be prohibitive for large ensembles. Here, a new method is presented for efficient representation and comparison of protein conformational ensembles. The method is based on the representation of a protein ensemble as a vector of probability distribution functions (pdfs), with each pdf representing the distribution of a local structural property such as the number of contacts between Cβ atoms. Dissimilarity between two conformational ensembles is quantified by the Jensen-Shannon distance between the corresponding set of probability distribution functions. The method is validated for conformational ensembles generated by molecular dynamics simulations of ubiquitin, as well as experimentally derived conformational ensembles of a 130 amino acid truncated form of human tau protein. In the ubiquitin ensemble data set, the method was up to 88 times faster than the existing ENCORE software, while simultaneously utilizing 48 times fewer computing cores. We make the method available as a Python package, called PROTHON, and provide a GitHub page with the Python source code at https://github.com/PlotkinLab/Prothon.
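The comparison described in this abstract can be illustrated with a minimal sketch; it is not PROTHON's own code. It assumes each ensemble is supplied as a list of per-residue samples of one local property (for example, a Cβ contact count per conformation), turns each sample set into a normalized histogram that serves as a discrete pdf, and computes the Jensen-Shannon distance residue by residue. The function names, the bin settings, and the choice to average the per-residue distances into one number are assumptions made for illustration.

```python
import math

def histogram(values, bins, lo, hi):
    """Return a normalized histogram (discrete pdf) of values over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp out-of-range samples into the first or last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = sum(counts)  # assumes at least one sample per residue
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs, so the range is 0 to 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; empty bins of x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))

def ensemble_dissimilarity(ens_a, ens_b, bins=10, lo=0.0, hi=20.0):
    """Average per-residue JS distance (the averaging is an assumption).

    ens_a[i] and ens_b[i] hold the sampled values of the local property
    at residue i across all conformations of each ensemble.
    """
    dists = [
        js_distance(histogram(a, bins, lo, hi), histogram(b, bins, lo, hi))
        for a, b in zip(ens_a, ens_b)
    ]
    return sum(dists) / len(dists)

# Toy data (hypothetical): residue 0 is identical in both ensembles,
# residue 1 occupies non-overlapping bins.
ens_a = [[1, 1, 2, 2], [5, 5, 5, 5]]
ens_b = [[1, 1, 2, 2], [15, 15, 15, 15]]
score = ensemble_dissimilarity(ens_a, ens_b)
```

Because each residue is reduced to a one-dimensional histogram, the cost of this kind of comparison grows with the number of residues and conformations rather than with the number of conformation pairs.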
Affiliation(s)
- Adekunle Aina
- Department of Physics and Astronomy, The University of British Columbia, Vancouver, BC V6T 1Z1, Canada
- Shawn C C Hsueh
- Department of Physics and Astronomy, The University of British Columbia, Vancouver, BC V6T 1Z1, Canada
- Steven S Plotkin
- Department of Physics and Astronomy, The University of British Columbia, Vancouver, BC V6T 1Z1, Canada
- Genome Science and Technology Program, The University of British Columbia, Vancouver, BC V6T 1Z1, Canada
2
Nair A, Chauhan P, Saha B, Kubatzky KF. Conceptual Evolution of Cell Signaling. Int J Mol Sci 2019; 20:E3292. [PMID: 31277491 PMCID: PMC6651758 DOI: 10.3390/ijms20133292] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 04/11/2019] [Revised: 06/26/2019] [Accepted: 06/28/2019] [Indexed: 12/27/2022] Open
Abstract
During the last 100 years, cell signaling has evolved into a common mechanism for most physiological processes across systems. Although the majority of cell signaling principles were initially derived from hormonal studies, its exponential growth has been supported by interdisciplinary inputs, e.g., from physics, chemistry, mathematics, statistics, and computational fields. As a result, cell signaling has grown out of scope for any general review. Here, we review how the messages are transferred from the first messenger (the ligand) to the receptor, and then decoded with the help of cascades of second messengers (kinases, phosphatases, GTPases, ions, and small molecules such as cAMP, cGMP, diacylglycerol, etc.). The message is thus relayed from the membrane to the nucleus, where gene expression, subsequent translation, and protein targeting to the cell membrane and other organelles are triggered. Although there are limited numbers of intracellular messengers, the specificity of the response profiles to the ligands is generated by the involvement of a combination of selected intracellular signaling intermediates. Other crucial parameters in cell signaling are its directionality and the distribution of signaling strengths in different pathways that may crosstalk to adjust the amplitude and quality of the final effector output. Finally, we have reflected upon its possible developments during the coming years.
Affiliation(s)
- Arathi Nair
- National Center for Cell Science (NCCS), Ganeshkhind, Pune 411007, India
- Prashant Chauhan
- National Center for Cell Science (NCCS), Ganeshkhind, Pune 411007, India
- Bhaskar Saha
- National Center for Cell Science (NCCS), Ganeshkhind, Pune 411007, India.
- Katharina F Kubatzky
- Zentrum für Infektiologie, Medizinische Mikrobiologie und Hygiene, Universitätsklinikum Heidelberg, Im Neuenheimer Feld 324, 69120 Heidelberg, Germany.
3
Identification of Lutzomyia longipalpis Odorant Binding Protein Modulators by Comparative Modeling, Hierarchical Virtual Screening, and Molecular Dynamics. J CHEM-NY 2018. [DOI: 10.1155/2018/4173479] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
Visceral leishmaniasis (VL) is the second most important vector-borne disease in the world. It is transmitted by Lutzomyia longipalpis in America; therefore, controlling the vector is essential to prevent the disease, especially using traps with chemical attractants. It is known that odorant binding proteins (OBPs) act at the first odor selection level, so an in silico methodology was used to identify putative vector chemical modulators based on OBPs on known ligand structures. Therefore, 3D structures of L. longipalpis OBP were predicted through different comparative modeling methods. The best model was subjected to molecular dynamics studies. Then, a hierarchical virtual screening approach filtered OBP modulator-like compounds from the ZINC12 biogenic database based on global chemical space, using principal components from the ChemGPS-NP server. These compounds were then evaluated and ranked according to their affinity with the OBP orthosteric site by molecular docking in DOCK 6.7. The compounds were scored by the Grid Score function, and the five top-ranked poses had their intermolecular complex interactions analyzed in the PLIP server. Most top-ranked ligands were lysophospholipids, which could potentially interact with the OBP hydrophobic pocket through Phe72, Tyr76, Ile79, Ala87, Lys88, Asp92, Phe61, Leu75, Trp113, His120, and Phe122 residues and H-bonding with His120 and Phe122. Next, the top-ranked compounds were evaluated by 50 ns MD and the results showed that the phosphate group of these compounds could form a salt bridge with His110. Additionally, Tyr76, Ala87, Met91, Trp113, and Phe122 were important for hydrophobic interactions with the ligand. These results highlight the importance of accurate assessments such as MD studies in order to analyze the docking results in the identification of new odorant modulators.
4
Similarity/dissimilarity analysis of protein structures based on Markov random fields. Comput Biol Chem 2018; 75:45-53. [PMID: 29747075 DOI: 10.1016/j.compbiolchem.2018.04.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/06/2017] [Revised: 03/08/2018] [Accepted: 04/23/2018] [Indexed: 11/21/2022]
Abstract
Protein structure similarity plays an important role in the study of the functional properties of proteins and in evolutionary studies. Many efficient methods have been proposed to advance protein structural comparison, but there are still some challenges in the contact strength definitions and similarity measures. In this work, we devised a new method to analyze the similarity/dissimilarity of protein structures based on Markov random fields. We evaluated the proposed method with two experiments and compared it with competing methods. The results indicate that the proposed method exhibits a strong ability to detect the similarities/dissimilarities among the conformations of different cyclic peptides and protein structures. We also found that the alpha-C, oxygen O and N allow us to extract more conserved structures of the proteins, and Markov random fields with 2-point cliques (V) and orders 3 and 1 are more efficient in detecting the similarities/dissimilarities among different protein structures. This understanding can be used to design more powerful methods for similarity/dissimilarity analysis of different protein structures.
5
From cheminformatics to structure-based design: Web services and desktop applications based on the NAOMI library. J Biotechnol 2017; 261:207-214. [DOI: 10.1016/j.jbiotec.2017.06.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/04/2017] [Revised: 05/31/2017] [Accepted: 06/07/2017] [Indexed: 02/06/2023]
6
Adeola HA, Van Wyk JC, Arowolo A, Ngwanya RM, Mkentane K, Khumalo NP. Emerging Diagnostic and Therapeutic Potentials of Human Hair Proteomics. Proteomics Clin Appl 2017; 12. [PMID: 28960873 DOI: 10.1002/prca.201700048] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/17/2017] [Revised: 06/09/2017] [Indexed: 01/22/2023]
Abstract
The use of noninvasive human substrates to interrogate pathophysiological conditions has become essential in the post-Human Genome Project era. Due to its high turnover rate, and its long-term capability to incorporate exogenous and endogenous substances from the circulation, hair testing is emerging as a key player in monitoring long-term drug compliance, chronic alcohol abuse, forensic toxicology, and biomarker discovery, among other things. Novel high-throughput 'omics-based approaches like proteomics have been underutilized globally in comprehending human hair morphology and its evolving use as a diagnostic testing substrate in the era of precision medicine. There is a paucity of scientific evidence that evaluates the difference in drug incorporation into hair based on lipid content, and very few studies have addressed hair growth rates, hair forms, and the biological consequences of hair grooming or bleaching. It is apparent that protein-based identification using the human hair proteome would play a major role in understanding these parameters akin to DNA single nucleotide polymorphism profiling, up to single amino acid polymorphism resolution. Hence, this work seeks to identify and discuss the progress made thus far in the field of molecular hair testing using proteomic approaches, and identify ways in which proteomics would improve the field of hair research, considering that the human hair is mostly composed of proteins. Gaps in hair proteomics research are identified and the potential of hair proteomics in establishing a historic medical repository of normal and disease-specific proteomes is also discussed.
Affiliation(s)
- Henry A Adeola
- Division of Dermatology, Department of Medicine, Faculty of Health Sciences and Groote Schuur Hospital, University of Cape Town, Cape Town, South Africa; Hair and Skin Research Laboratory, Groote Schuur Hospital, Cape Town, South Africa
- Jennifer C Van Wyk
- Division of Dermatology, Department of Medicine, Faculty of Health Sciences and Groote Schuur Hospital, University of Cape Town, Cape Town, South Africa; Hair and Skin Research Laboratory, Groote Schuur Hospital, Cape Town, South Africa
- Afolake Arowolo
- Division of Dermatology, Department of Medicine, Faculty of Health Sciences and Groote Schuur Hospital, University of Cape Town, Cape Town, South Africa; Hair and Skin Research Laboratory, Groote Schuur Hospital, Cape Town, South Africa
- Reginald M Ngwanya
- Division of Dermatology, Department of Medicine, Faculty of Health Sciences and Groote Schuur Hospital, University of Cape Town, Cape Town, South Africa
- Khwezikazi Mkentane
- Division of Dermatology, Department of Medicine, Faculty of Health Sciences and Groote Schuur Hospital, University of Cape Town, Cape Town, South Africa; Hair and Skin Research Laboratory, Groote Schuur Hospital, Cape Town, South Africa
- Nonhlanhla P Khumalo
- Division of Dermatology, Department of Medicine, Faculty of Health Sciences and Groote Schuur Hospital, University of Cape Town, Cape Town, South Africa; Hair and Skin Research Laboratory, Groote Schuur Hospital, Cape Town, South Africa
7
Mahajan S, de Brevern AG, Sanejouand YH, Srinivasan N, Offmann B. Use of a structural alphabet to find compatible folds for amino acid sequences. Protein Sci 2014; 24:145-53. [PMID: 25297700 DOI: 10.1002/pro.2581] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/15/2014] [Accepted: 10/06/2014] [Indexed: 01/01/2023]
Abstract
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence-to-structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of the hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotations at the genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa.
Affiliation(s)
- Swapnil Mahajan
- Université de La Réunion, DSIMB, UMR-S S1134, Saint Denis Messag Cedex 09, La Réunion, F-97715, France; INSERM, UMR-S 1134, DSIMB, F-75739, Paris, France; Laboratoire d'Excellence, GR-Ex, Paris, F-75739, France; Université de Nantes, UFIP CNRS UMR 6286 Faculté des Sciences et Techniques, 2 rue de la Houssinière, 44392, Nantes Cedex 03, France
8
Wang Z, Yin P, Lee JS, Parasuram R, Somarowthu S, Ondrechen MJ. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs). BMC Bioinformatics 2013; 14 Suppl 3:S13. [PMID: 23514271 PMCID: PMC3584854 DOI: 10.1186/1471-2105-14-s3-s13] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022] Open
Abstract
Background: The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site.
Results: Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC), is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase.
Conclusions: A local site match provides a more compelling function prediction than that obtainable from a simple 3D structure match. The present method can confirm putative annotations, identify misannotation, and in some cases suggest a more probable annotation.
Affiliation(s)
- Zhouxi Wang
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
9
Does Computational Biology Help us to Understand the Molecular Phylogenetics and Evolution of Cluster of Differentiation (CD) Proteins? Protein J 2013; 32:143-54. [DOI: 10.1007/s10930-013-9466-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
10
Abstract
An overwhelming array of structural variants has evolved from a comparatively small number of protein structural domains, which has in turn facilitated an expanse of functional derivatives. Herein, I review the primary mechanisms which have contributed to the vastness of our existing, and expanding, protein repertoires. Protein function prediction strategies, both sequence and structure based, are also discussed and their associated strengths and weaknesses assessed.
Affiliation(s)
- Roy D Sleator
- Department of Biological Sciences, Cork Institute of Technology, Cork, Ireland.
11
Abstract
The recent explosion in the number and diversity of novel proteins identified by the large-scale "omics" technologies poses new and important questions to the blossoming field of systems biology--what are all these proteins, how did they come about, and most importantly, what do they do? From a comparatively small number of protein structural domains a staggering array of structural variants has evolved, which has in turn facilitated an expanse of functional derivatives. This review considers the primary mechanisms that have contributed to the vastness of our existing, and expanding, protein repertoires, while also outlining the protocols available for elucidating their true biological function. The various function prediction programs available, both sequence and structure based, are discussed and their associated strengths and weaknesses outlined.
Affiliation(s)
- Roy D Sleator
- Department of Biological Sciences, Cork Institute of Technology, Bishopstown, Cork, Ireland.
12
Poleksic A. Optimizing a widely used protein structure alignment measure in expected polynomial time. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1716-1720. [PMID: 21904019 DOI: 10.1109/tcbb.2011.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Protein structure alignment is an important tool in many biological applications, such as protein evolution studies, protein structure modeling, and structure-based, computer-aided drug design. Protein structure alignment is also one of the most challenging problems in computational molecular biology, due to an infinite number of possible spatial orientations of any two protein structures. We study one of the most commonly used measures of pairwise protein structure similarity, defined as the number of pairs of atoms in two proteins that can be superimposed under a predefined distance cutoff. We prove that the expected running time of a recently published algorithm for optimizing this (and some other, derived measures of protein structure similarity) is polynomial.
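The similarity measure studied here can be sketched for its simplest case. The sketch below assumes a fixed correspondence (the i-th atom of one structure is paired with the i-th atom of the other) and a superposition that has already been applied. It only evaluates the score for that one placement; it does not search over rigid-body transformations, which is the optimization problem the paper analyzes. The function name and the toy coordinates are hypothetical.

```python
import math

def count_superimposed_pairs(coords_a, coords_b, cutoff):
    """Count paired atoms whose distance is at most `cutoff`.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples,
    already expressed in the same reference frame.
    """
    return sum(
        1 for a, b in zip(coords_a, coords_b) if math.dist(a, b) <= cutoff
    )

# Toy coordinates in angstroms (hypothetical): the pair distances are
# 0.5, 1.0 and 6.0, so two pairs fall within a 2.0 cutoff.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 6.0)]
score = count_superimposed_pairs(structure_a, structure_b, cutoff=2.0)
```

Evaluating the score for a given placement is cheap; the difficulty discussed in the abstract comes from the infinite number of spatial orientations over which this count would have to be maximized.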
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, 305 ITTC, Cedar Falls, IA 50614-0507, USA.
13
Hetényi C, van der Spoel D. Toward prediction of functional protein pockets using blind docking and pocket search algorithms. Protein Sci 2011; 20:880-93. [PMID: 21413095 DOI: 10.1002/pro.618] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 11/11/2010] [Revised: 03/06/2011] [Accepted: 03/07/2011] [Indexed: 11/09/2022]
Abstract
Location of functional binding pockets of bioactive ligands on protein molecules is essential in structural genomics and drug design projects. If the experimental determination of ligand-protein complex structures is complicated, blind docking (BD) and pocket search (PS) calculations can help in the prediction of atomic resolution binding mode and the location of the pocket of a ligand on the entire protein surface. Whereas the number of successful predictions by these methods is increasing even for the complicated cases of exosites or allosteric binding sites, their reliability has not been fully established. For a critical assessment of reliability, we use a set of ligand-protein complexes, which were found to be problematic in previous studies. The robustness of BD and PS methods is addressed in terms of success of the selection of truly functional pockets from among the many putative ones identified on the surfaces of ligand-bound and ligand-free (holo and apo) protein forms. Issues related to BD such as effect of hydration, existence of multiple pockets, and competition of subsidiary ligands are considered. Practical cases of PS are discussed, categorized and strategies are recommended for handling the different situations. PS can be used in conjunction with BD, as we find that a consensus approach combining the techniques improves predictive power.
Affiliation(s)
- Csaba Hetényi
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden.
14
Day R, Qu X, Swanson R, Bohannan Z, Bliss R, Tsai J. Relative Packing Groups in Template-Based Structure Prediction: Cooperative Effects of True Positive Constraints. J Comput Biol 2011; 18:17-26. [DOI: 10.1089/cmb.2010.0078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ryan Day
- Chemistry Department, University of the Pacific, Stockton, California
- Rosemarie Swanson
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Zach Bohannan
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California
- Robert Bliss
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Jerry Tsai
- Chemistry Department, University of the Pacific, Stockton, California
15
Doppelt-Azeroual O, Delfaud F, Moriaud F, de Brevern AG. Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins. Protein Sci 2010; 19:847-67. [PMID: 20162627 DOI: 10.1002/pro.364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.
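The abstract names the Markov Clustering (MCL) algorithm but gives no implementation details, so what follows is a generic, minimal sketch of MCL on a small similarity graph, not MED-SMA's code. The added self-loops, the inflation value of 2.0, the fixed iteration count, the 1e-6 threshold, and all function names are illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s else 0.0
    return out

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def markov_cluster(adj, inflation=2.0, iterations=50):
    """Cluster a graph given as a symmetric weighted adjacency matrix."""
    n = len(adj)
    # Add self-loops, then turn edge weights into transition probabilities.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along two-step walks
        # inflation: strengthen strong flows, weaken weak ones
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Rows that keep non-zero entries act as attractors; the columns they
    # cover form one cluster each.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy similarity graph (hypothetical): sites 0 and 1 resemble each other,
# as do sites 2 and 3, with no similarity across the two pairs.
similarity = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
clusters = markov_cluster(similarity)
```

Raising the inflation parameter generally produces more, smaller clusters, which is the main knob when tuning the granularity of a classification like the one described.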
Affiliation(s)
- Olivia Doppelt-Azeroual
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France.
16
Veeramalai M, Gilbert D, Valiente G. An optimized TOPS+ comparison method for enhanced TOPS models. BMC Bioinformatics 2010; 11:138. [PMID: 20236520 PMCID: PMC2858036 DOI: 10.1186/1471-2105-11-138] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/11/2009] [Accepted: 03/17/2010] [Indexed: 11/28/2022] Open
Abstract
Background: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimized method the advanced TOPS+ comparison method (advTOPS+).
Results: We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first class objects, and describing interactions between SSEs, and SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method.
Conclusions: Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90% accuracy for SCOP alpha+beta; a 6% increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98% accuracy.
Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.
Affiliation(s)
- Mallika Veeramalai
- Joint Center for Molecular Modeling, Sanford-Burnham Medical Research Institute, La Jolla, CA 92037, USA.
17
An overview of in silico protein function prediction. Arch Microbiol 2010; 192:151-5. [DOI: 10.1007/s00203-010-0549-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 12/02/2009] [Revised: 01/08/2010] [Accepted: 01/10/2010] [Indexed: 12/12/2022]
18
Abstract
MOTIVATION: Structural alignment is an important tool for understanding the evolutionary relationships between proteins. However, finding the best pairwise structural alignment is difficult, due to the infinite number of possible superpositions of two structures. Unlike the sequence alignment problem, which has a polynomial time solution, the structural alignment problem has not even been classified as solvable.
RESULTS: We study one of the most widely used measures of protein structural similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. We prove that, for any two proteins, this measure can be optimized for all but finitely many distance cutoffs. Our method leads to a series of algorithms for optimizing other structure similarity measures, including the measures commonly used in protein structure prediction experiments. We also present a polynomial time algorithm for finding a near-optimal superposition of two proteins. Aside from having a relatively low cost, the algorithm for near-optimal solution returns a superposition of provable quality. In other words, the difference between the score of the returned superposition and the score of an optimal superposition can be explicitly computed and used to determine whether the returned superposition is, in fact, the best superposition.
CONTACT: poleksic@cs.uni.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, IA 50614, USA.
19
Pascual-García A, Abia D, Ortiz ÁR, Bastolla U. Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures. PLoS Comput Biol 2009; 5:e1000331. [PMID: 19325884 PMCID: PMC2654728 DOI: 10.1371/journal.pcbi.1000331] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Received: 08/20/2008] [Accepted: 02/11/2009] [Indexed: 11/19/2022] Open
Abstract
Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. 
As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.php.
Making order of the fast-growing information on proteins is essential for gaining evolutionary and functional knowledge. The most successful approaches to this task are based on classifications of protein structures, such as SCOP and CATH, which assume a discrete view of the protein structure space as a collection of separated equivalence classes (folds). However, several authors proposed that protein domains should be regarded as assemblies of polypeptide fragments, which implies that the protein–structure space is continuous. Here, we assess these views of domain space through the concept of transitivity; i.e., we test whether structure similarity of A with B and B with C implies that A and C are similar, as required for consistent classification. We find that the domain space is approximately transitive and discrete at high similarity and continuous at low similarity, where transitivity is severely violated. Comparing our classification at the cross-over similarity with CATH and SCOP, we find that they join proteins at low similarity where classification is inconsistent. Part of this discrepancy is due to structural divergence of homologous domains, which are forced to be in a single cluster in CATH and SCOP. Structural and evolutionary relationships between consistent clusters are represented as a network in our approach, going beyond current protein classification schemes. We conjecture that our results are related to a change of evolutionary regime, from uniparental divergent evolution for highly related domains to assembly of large fragments for which the classical tree representation is unsuitable.
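The transitivity test described in this abstract (similarity of A with B and of B with C should imply similarity of A with C) can be illustrated with a simple count of violating triples. This is only a sketch of the idea: the abstract does not give the authors' actual violation measure, so the thresholded count, the function name, and the toy similarity matrix below are all assumptions.

```python
from itertools import permutations

def transitivity_violations(sim, threshold):
    """Count triples (A, B, C) where A~B and B~C but A and C are not similar.

    `sim` is a symmetric matrix of pairwise similarity scores; two items are
    treated as similar when their score is at least `threshold` (an assumed
    criterion). Each unordered pair {A, C} is counted once per bridging B.
    """
    n = len(sim)
    violations = 0
    for a, b, c in permutations(range(n), 3):
        if a < c and sim[a][b] >= threshold and sim[b][c] >= threshold and sim[a][c] < threshold:
            violations += 1
    return violations

# Toy matrix: domain 1 resembles both 0 and 2, but 0 and 2 are dissimilar.
toy_sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.8],
    [0.2, 0.8, 1.0],
]
n_violations = transitivity_violations(toy_sim, threshold=0.5)
```

Lowering the threshold far enough makes every pair similar and the violation disappears, which loosely mirrors the threshold dependence the abstract discusses.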
Affiliation(s)
- David Abia
- Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
- Ángel R. Ortiz
- Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
- Ugo Bastolla
- Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
|
20
|
Lindorff-Larsen K, Ferkinghoff-Borg J. Similarity measures for protein ensembles. PLoS One 2009; 4:e4203. [PMID: 19145244 PMCID: PMC2615214 DOI: 10.1371/journal.pone.0004203] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Received: 10/28/2008] [Accepted: 11/25/2008] [Indexed: 11/29/2022] Open
Abstract
Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement.
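The general strategy in this abstract (estimate a probability distribution for each ensemble, then compare the distributions) can be sketched with histograms and a divergence between them. The abstract does not say which comparison is used, so the Jensen-Shannon divergence below is an assumed choice for illustration, and the function name and toy histograms are hypothetical.

```python
import math

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    `p` and `q` are equal-length lists of probabilities that each sum to 1,
    for example normalized histograms of a structural property over two
    ensembles. The result lies between 0 (identical) and 1 (disjoint).
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; bins with zero probability contribute 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy histograms over two bins.
same = jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])
```

Taking the square root of this value gives the Jensen-Shannon distance, which is a metric.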
|
21
|
Haddadian EJ, Cheng MH, Coalson RD, Xu Y, Tang P. In silico models for the human alpha4beta2 nicotinic acetylcholine receptor. J Phys Chem B 2008; 112:13981-90. [PMID: 18847252 DOI: 10.1021/jp804868s] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 02/07/2023]
Abstract
The neuronal alpha4beta2 nicotinic acetylcholine receptor (nAChR) is one of the most widely expressed nAChR subtypes in the brain. Its subunits have high sequence identity (54 and 46% for alpha4 and beta2, respectively) with alpha and beta subunits in Torpedo nAChR. Using the known structure of the Torpedo nAChR as a template, the closed-channel structure of the alpha4beta2 nAChR was constructed through homology modeling. Normal-mode analysis was performed on this closed structure and the resulting lowest frequency mode was applied to it for a "twist-to-open" motion, which increased the minimum pore radius from 2.7 to 3.4 Å and generated an open-channel model. Nicotine could bind to the predicted agonist binding sites in the open-channel model but not in the closed one. Both models were subsequently equilibrated in a ternary lipid mixture via extensive molecular dynamics (MD) simulations. Over the course of 11 ns MD simulations, the open channel remained open and water-filled, but the closed channel showed a much lower water density at its hydrophobic gate comprised of residues alpha4-V259 and alpha4-L263 and their homologous residues in the beta2 subunits. Brownian dynamics simulations of Na+ permeation through the open channel demonstrated a current-voltage relationship that was consistent with experimental data on the conducting state of alpha4beta2 nAChR. Besides establishment of the well-equilibrated closed- and open-channel alpha4beta2 structural models, the MD simulations on these models provided valuable insights into critical factors that potentially modulate channel gating. Rotation and tilting of TM2 helices led to changes in orientations of pore-lining residue side chains. Without concerted movement, the reorientation of one or two hydrophobic side chains could be enough for channel opening. 
The closed- and open-channel structures exhibited distinct patterns of electrostatic interactions at the interface of extracellular and transmembrane domains that might regulate the signal propagation of agonist binding to channel opening. A potential prominent role of the beta2 subunit in channel gating was also elucidated in the study.
Collapse
Affiliation(s)
- Esmael J Haddadian
- Department of Anesthesiology, University of Pittsburgh School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
|
22
|
Kuhn D, Weskamp N, Hüllermeier E, Klebe G. Functional classification of protein kinase binding sites using Cavbase. ChemMedChem 2008; 2:1432-47. [PMID: 17694525 DOI: 10.1002/cmdc.200700075] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022]
Abstract
Increasingly, drug-discovery processes focus on complete gene families. Tools for analyzing similarities and differences across protein families are important for the understanding of key functional features of proteins. Herein we present a method for classifying protein families on the basis of the properties of their active sites. We have developed Cavbase, a method for describing and comparing protein binding pockets, and show its application to the functional classification of the binding pockets of the protein family of protein kinases. A diverse set of kinase cavities is mutually compared and analyzed in terms of recurring functional recognition patterns in the active sites. We are able to propose a relevant classification based on the binding motifs in the active sites. The obtained classification provides a novel perspective on functional properties across protein space. The classification of the MAP and the c-Abl kinases is analyzed in detail, showing a clear separation of the respective kinase subfamilies. Remarkable cross-relations among protein kinases are detected, in contrast to sequence-based classifications, which are not able to detect these relations. Furthermore, our classification is able to highlight features important in the optimization of protein kinase inhibitors. Using small-molecule inhibition data we could rationalize cross-reactivities between unrelated kinases which become apparent in the structural comparison of their binding sites. This procedure helps in the identification of other possible kinase targets that behave similarly in "binding pocket space" to the kinase under consideration.
Collapse
Affiliation(s)
- Daniel Kuhn
- Department of Pharmaceutical Chemistry, University of Marburg, Marbacher Weg 6, 35032 Marburg, Germany
|
23
|
Ahola V, Aittokallio T, Vihinen M, Uusipaikka E. Model-based prediction of sequence alignment quality. Bioinformatics 2008; 24:2165-71. [DOI: 10.1093/bioinformatics/btn414] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
|
24
|
Structural refinement of membrane proteins by restrained molecular dynamics and solvent accessibility data. Biophys J 2008; 95:5349-61. [PMID: 18676641 DOI: 10.1529/biophysj.108.142984] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 01/24/2023] Open
Abstract
We present an approach for incorporating solvent accessibility data from electron paramagnetic resonance experiments in the structural refinement of membrane proteins through restrained molecular dynamics simulations. The restraints have been parameterized from oxygen (PiO(2)) and nickel-ethylenediaminediacetic acid (PiNiEdda) collision frequencies, as indicators of lipid or aqueous exposed spin-label sites. These are enforced through interactions between a pseudoatom representation of the covalently attached nitroxide spin-label and virtual "solvent" particles corresponding to O(2) and NiEdda in the surrounding environment. Interactions were computed using an empirical potential function, where the parameters have been optimized to account for the different accessibilities of the spin-label pseudoatoms to the surrounding environment. This approach, "pseudoatom-driven solvent accessibility refinement", was validated by refolding distorted conformations of the Streptomyces lividans potassium channel (KcsA), corresponding to a range of 2-30 Å root mean-square deviations away from the native structure. Molecular dynamics simulations based on up to 58 electron paramagnetic resonance restraints derived from spin-label mutants were able to converge toward the native structure within 1-3 Å root mean-square deviations with minimal computational cost. The use of energy-based ranking and structure similarity clustering as selection criteria helped in the convergence and identification of correctly folded structures from a large number of simulations. This approach can be applied to a variety of integral membrane protein systems, regardless of oligomeric state, and should be particularly useful in calculating conformational changes from a known reference crystal structure.
|
25
|
Kiel C, Beltrao P, Serrano L. Analyzing Protein Interaction Networks Using Structural Information. Annu Rev Biochem 2008; 77:415-41. [DOI: 10.1146/annurev.biochem.77.062706.133317] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Affiliation(s)
- Christina Kiel
- EMBL-CRG Systems Biology Unit, Center de Regulacio Genomica, Barcelona 08003, Spain
- Pedro Beltrao
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Luis Serrano
- EMBL-CRG Systems Biology Unit, Center de Regulacio Genomica, Barcelona 08003, Spain
|
26
|
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008; 35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify local structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Collapse
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100080, Beijing, China
|
27
|
Campagna A, Serrano L, Kiel C. Shaping dots and lines: adding modularity into protein interaction networks using structural information. FEBS Lett 2008; 582:1231-6. [PMID: 18282473 DOI: 10.1016/j.febslet.2008.02.019] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 12/14/2007] [Revised: 02/07/2008] [Accepted: 02/08/2008] [Indexed: 12/12/2022]
Abstract
Determining protein interaction networks and generating models to simulate network changes in time and space are crucial for understanding a biological system and for predicting the effect of mutants found in diseases. In this review we discuss the great potential of using structural information together with computational tools towards reaching this goal: the prediction of new protein interactions, the estimation of affinities and kinetic rate constants between protein complexes, and finally the determination of which interactions are compatible with each other and which interactions are exclusive. The latter one will be important to reorganize large scale networks into functional modular networks.
Collapse
Affiliation(s)
- Anne Campagna
- EMBL-CRG Systems Biology Unit, CRG-Centre de Regulacio Genomica, Dr. Aiguader 88, 08003 Barcelona, Spain
|
28
|
Shatsky M, Nussinov R, Wolfson HJ. Algorithms for multiple protein structure alignment and structure-derived multiple sequence alignment. Methods Mol Biol 2008; 413:125-46. [PMID: 18075164 PMCID: PMC10773980 DOI: 10.1007/978-1-59745-574-9_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/25/2023]
Abstract
Primary amino acid content and the geometry of the folded protein 3D structure are major parameters of protein function. During the course of evolution the protein 3D structure is more preserved than its primary sequence. Thus, analysis of protein structures is expected to lead to a deep insight into protein function. Recognition of a structural core common to a set of protein structures serves as a basic tool for the studies of protein evolution and classification, analysis of similar structural motifs and functional binding sites, and for homology modeling and threading. In this chapter, we discuss several biologically related computational aspects of the multiple structure alignment and propose a method that provides solutions to these problems. Finally, we address the problem of structure-based multiple sequence alignment and propose an optimization method that unifies primary sequence and 3D structure information.
Collapse
Affiliation(s)
- Maxim Shatsky
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
29
|
Liu ZP, Wu LY, Wang Y, Chen L, Zhang XS. Predicting gene ontology functions from protein's regional surface structures. BMC Bioinformatics 2007; 8:475. [PMID: 18070366 PMCID: PMC2233648 DOI: 10.1186/1471-2105-8-475] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 02/27/2007] [Accepted: 12/11/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Annotation of protein functions is an important task in the post-genomic era. Most early approaches for this task exploit only the sequence or global structure information. However, protein surfaces are believed to be crucial to protein functions because they are the main interfaces to facilitate biological interactions. Recently, several databases related to structural surfaces, such as pockets and cavities, have been constructed with a comprehensive library of identified surface structures. For example, CASTp provides identification and measurements of surface accessible pockets as well as interior inaccessible cavities. RESULTS A novel method was proposed to predict the Gene Ontology (GO) functions of proteins from the pocket similarity network, which is constructed according to the structure similarities of pockets. The statistics of the networks were presented to explore the relationship between the similar pockets and GO functions of proteins. Cross-validation experiments were conducted to evaluate the performance of the proposed method. Results and codes are available at: http://zhangroup.aporc.org/bioinfo/PSN/. CONCLUSION The computational results demonstrate that the proposed method based on the pocket similarity network is effective and efficient for predicting GO functions of proteins in terms of both computational complexity and prediction accuracy. The proposed method revealed a strong relationship between small surface patterns (or pockets) and GO functions, which can be further used to identify active sites or functional motifs. The high quality performance of the prediction method together with the statistics also indicates that pockets play essential roles in biological interactions or the GO functions. Moreover, in addition to pockets, the proposed network framework can also be used for adopting other protein spatial surface patterns to predict the protein functions.
Collapse
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China.
|
30
|
Liu S, Zhang C, Liang S, Zhou Y. Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins 2007; 68:636-45. [PMID: 17510969 DOI: 10.1002/prot.21459] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.3] [Indexed: 11/11/2022]
Abstract
Recognizing the structural similarity without significant sequence identity (called fold recognition) is the key for bridging the gap between the number of known protein sequences and the number of structures solved. Previously, we developed a fold-recognition method called SP(3) which combines sequence-derived sequence profiles, secondary-structure profiles and residue-depth dependent, structure-derived sequence profiles. The use of residue-depth-dependent profiles makes SP(3) one of the best automatic predictors in CASP 6. Because residue depth (RD) and solvent accessible surface area (solvent accessibility) are complementary in describing the exposure of a residue to solvent, we test whether or not incorporation of solvent-accessibility profiles into SP(3) could further increase the accuracy of fold recognition. The resulting method, called SP(4), was tested in SALIGN benchmark for alignment accuracy and Lindahl, LiveBench 8 and CASP7 blind prediction for fold recognition sensitivity and model-structure accuracy. For remote homologs, SP(4) is found to consistently improve over SP(3) in the accuracy of sequence alignment and predicted structural models as well as in the sensitivity of fold recognition. Our result suggests that RD and solvent accessibility can be used concurrently for improving the accuracy and sensitivity of fold recognition. The SP(4) server and its local usage package are available on http://sparks.informatics.iupui.edu/SP4.
Collapse
Affiliation(s)
- Song Liu
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
31
|
Mirkovic N, Li Z, Parnassa A, Murray D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 2007; 66:766-77. [PMID: 17154423 DOI: 10.1002/prot.21191] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family.
Collapse
Affiliation(s)
- Nebojsa Mirkovic
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, New York 10021, USA
|
32
|
Mayer KL, Qu Y, Bansal S, LeBlond PD, Jenney FE, Brereton PS, Adams MWW, Xu Y, Prestegard JH. Structure determination of a new protein from backbone-centered NMR data and NMR-assisted structure prediction. Proteins 2006; 65:480-9. [PMID: 16927360 DOI: 10.1002/prot.21119] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/19/2022]
Abstract
Targeting of proteins for structure determination in structural genomic programs often includes the use of threading and fold recognition methods to exclude proteins belonging to well-populated fold families, but such methods can still fail to recognize preexisting folds. The authors illustrate here a method in which limited amounts of structural data are used to improve an initial homology search and the data are subsequently used to produce a structure by data-constrained refinement of an identified structural template. The data used are primarily NMR-based residual dipolar couplings, but they also include additional chemical shift and backbone-nuclear Overhauser effect data. Using this methodology, a backbone structure was efficiently produced for a 10 kDa protein (PF1455) from Pyrococcus furiosus. Its relationship to existing structures and its probable function are discussed.
Collapse
Affiliation(s)
- K L Mayer
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, USA
|
33
|
Reis LFL, Van Sluys MA, Garratt RC, Pereira HM, Teixeira MM. GMOs: building the future on the basis of past experience. An Acad Bras Cienc 2006; 78:667-86. [PMID: 17143405 DOI: 10.1590/s0001-37652006000400005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 08/29/2006] [Accepted: 10/24/2006] [Indexed: 11/22/2022] Open
Abstract
Biosafety of genetically modified organisms (GMOs) and their derivatives is still a major topic on the agenda of governments and societies worldwide. The aim of this review is to bring to light the data that supported the decision taken back in 1998, as an exercise to stimulate criticism from the scientific community for upcoming discussions and to avoid emotional and senseless arguments that could jeopardize future development in the field. It must be emphasized that Roundup Ready soybean is just one example of how biotechnology can bring in significant advances for society, not only through increased productivity, but also with beneficial environmental impact, thereby allowing more rational use of agricultural pesticides for improvement of the soil conditions. The adoption of agricultural practices with higher yield will also allow better distribution of income among small farmers. New species of genetically modified plants will soon be available and society should be capable of making decisions in an objective and well-informed manner, through collegiate bodies that are qualified in all aspects of biosafety and environmental impact.
Collapse
Affiliation(s)
- Luiz F L Reis
- Ludwig Institute for Cancer Research, São Paulo, SP, Brazil.
|
34
|
Carraro R, Búa J, Ruiz A, Paulino M. Modelling and study of cyclosporin A and related compounds in complexes with a Trypanosoma cruzi cyclophilin. J Mol Graph Model 2006; 26:48-61. [PMID: 17174582 DOI: 10.1016/j.jmgm.2006.09.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 03/10/2006] [Revised: 09/08/2006] [Accepted: 09/20/2006] [Indexed: 10/24/2022]
Abstract
Cyclophilins (CyPs) are enzymes involved in protein folding, catalyzing the isomerisation of peptidyl prolyl bonds in proteins and peptides between the cis- and trans-conformations. They are also the major cellular target for the immunosuppressive drug Cyclosporin A (CsA). In Trypanosoma cruzi, the most abundantly expressed CyP is an isoform of 19 kDa, TcCyP19, in which the enzymatic activity is inhibited by CsA. Among a reported set of CsA analogues, two non-immunosuppressive compounds, H-7-94 and F-7-62, proved to be the best inhibitors of TcCyP19 enzymatic activity as well as the most efficient trypanocidal drugs. With the objective of analysing, at the molecular level, how the structural differences between the three above-mentioned inhibitors justify their different inhibitory activity on TcCyP19, three-dimensional molecular modelling structures were generated to computationally simulate behaviours and interactions. An energy-minimized model of each binary complex in water with ions was obtained. These models were then used as the starting point for molecular dynamics simulations, performed with the GROMOS96 program. With the resulting set of co-ordinates and energies, a comparison of the interaction between CsA and both CsA analogues in T. cruzi and human cyclophilins was performed. Within the different magnitudes analysed, the total potential complex energy exhibited the best correlation with the experimental data. The results obtained in this study support the use of this methodology when designing new lead inhibitor compounds.
Affiliation(s)
- Roberto Carraro
- Physical Chemistry and Mathematics Department, Molecular Pharmacology and Biomodelling Laboratory, Facultad de Química, Universidad de la República, General Flores 2124, 11800 Montevideo, Uruguay
35
Rossi A, Marti-Renom MA, Sali A. Localization of binding sites in protein structures by optimization of a composite scoring function. Protein Sci 2006; 15:2366-80. [PMID: 16963645 PMCID: PMC2242385 DOI: 10.1110/ps.062247506] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
The rise in the number of functionally uncharacterized protein structures is increasing the demand for structure-based methods for functional annotation. Here, we describe a method for predicting the location of a binding site of a given type on a target protein structure. The method begins by constructing a scoring function, followed by a Monte Carlo optimization, to find a good scoring patch on the protein surface. The scoring function is a weighted linear combination of the z-scores of various properties of protein structure and sequence, including amino acid residue conservation, compactness, protrusion, convexity, rigidity, hydrophobicity, and charge density; the weights are calculated from a set of previously identified instances of the binding-site type on known protein structures. The scoring function can easily incorporate different types of information useful in localization, thus increasing the applicability and accuracy of the approach. To test the method, 1008 known protein structures were split into 20 different groups according to the type of the bound ligand. For nonsugar ligands, such as various nucleotides, binding sites were correctly identified in 55%-73% of the cases. The method is completely automated (http://salilab.org/patcher) and can be applied on a large scale in a structural genomics setting.
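The composite score described in this abstract, a weighted linear combination of z-scores, can be illustrated with a minimal sketch. The property names, the toy values, the weights, and the helper names `z_scores` and `patch_scores` are all assumptions made for illustration; the abstract states that the real weights are derived from known binding sites and that patches are found by Monte Carlo optimization, which this sketch replaces with a direct comparison of three pre-defined candidate patches.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw property values to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant property carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def patch_scores(properties, weights):
    """Score each candidate patch as a weighted sum of per-property z-scores.

    properties: dict mapping property name -> list of raw values, one per patch
    weights:    dict mapping property name -> weight (hypothetical values)
    """
    standardized = {name: z_scores(vals) for name, vals in properties.items()}
    n_patches = len(next(iter(properties.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in properties)
        for i in range(n_patches)
    ]


# Toy data for three candidate surface patches (illustrative numbers only).
properties = {
    "conservation": [0.2, 0.9, 0.4],
    "hydrophobicity": [0.1, 0.7, 0.5],
}
weights = {"conservation": 2.0, "hydrophobicity": 1.0}

scores = patch_scores(properties, weights)
# Index of the highest-scoring patch.
best_patch = max(range(len(scores)), key=scores.__getitem__)
```

Because every property is standardized before weighting, a new property can be added by supplying one more list of values and one more weight, which is the extensibility the abstract points to.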
Affiliation(s)
- Andrea Rossi
- Department of Biopharmaceutical Sciences and Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California, San Francisco, California 94143-2552, USA.
36
Weissman KJ, Hong H, Popovic B, Meersman F. Evidence for a protein-protein interaction motif on an acyl carrier protein domain from a modular polyketide synthase. Chem Biol 2006; 13:625-36. [PMID: 16793520 DOI: 10.1016/j.chembiol.2006.04.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Received: 12/13/2005] [Revised: 04/24/2006] [Accepted: 04/25/2006] [Indexed: 11/19/2022]
Abstract
During biosynthesis on modular polyketide synthases (PKSs), chain extension intermediates are tethered to acyl carrier protein (ACP) domains through phosphopantetheinyl prosthetic groups. Each ACP must therefore interact with every other domain within the module, and also with a downstream acceptor domain. The nature of these interactions is key to our understanding of the topology and operation of these multienzymes. Sequence analysis and homology modeling implicate a potential helical region (helix II) on the ACPs as a protein-protein interaction motif. Using site-directed mutagenesis, we show that residues along this putative helix lie at the interface between the ACP and the phosphopantetheinyl transferase that catalyzes its activation. Our results accord with previous studies of discrete ACP proteins from fatty acid and aromatic polyketide biosynthesis, suggesting that helix II may also serve as a universal interaction motif in modular PKSs.
Affiliation(s)
- Kira J Weissman
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, United Kingdom.
37
Abstract
Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function.
Affiliation(s)
- Zhexin Xiang
- Center for Molecular Modeling, Center for Information Technology, National Institutes of Health, Building 12A Room 2051, 12 South Drive, Bethesda, Maryland 20892-5624, USA.
38
Yura K, Yamaguchi A, Go M. Coverage of whole proteome by structural genomics observed through protein homology modeling database. J Struct Funct Genomics 2006; 7:65-76. [PMID: 17146617 PMCID: PMC1769342 DOI: 10.1007/s10969-006-9010-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 05/11/2006] [Accepted: 08/08/2006] [Indexed: 11/07/2022]
Abstract
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins has increased by 5% for archaebacteria and by 7% for eubacteria over the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, model structures for all soluble proteins of prokaryotes, excluding the putative disordered regions, will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures available at those times will not be structures of the entire proteins encoded in single ORFs, but structures of separate structural domains. Measuring or predicting the spatial arrangement of structural domains in an ORF will then be the next issue for structural genomics.
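The headline figures in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the abstract; the variable names are illustrative.

```python
# Genome counts by category, as quoted in the abstract.
genomes = {
    "archaebacterial": 17,
    "eubacterial": 130,
    "eukaryotic": 18,
    "phage": 111,
}
total_genomes = sum(genomes.values())

# ORF counts quoted in the abstract.
modeled_orfs = 368_724
predicted_orfs = 734_193

# Fraction of predicted ORFs that have at least a partial modeled structure.
coverage = modeled_orfs / predicted_orfs

print(total_genomes)        # 276
print(f"{coverage:.1%}")    # 50.2%
```

The category counts sum to the stated 276 genomes, and the ORF ratio matches the "approximately 50%" reported.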
Affiliation(s)
- Kei Yura
- Quantum Bioinformatics Team, Center for Computational Science and Engineering, Japan Atomic Energy Agency, Kyoto 619-0215, Japan.
39
Fleming K, Kelley LA, Islam SA, MacCallum RM, Muller A, Pazos F, Sternberg MJ. The proteome: structure, function and evolution. Philos Trans R Soc Lond B Biol Sci 2006; 361:441-51. [PMID: 16524832 PMCID: PMC1609342 DOI: 10.1098/rstb.2005.1802] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Abstract
This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach, PHUNCTIONER, that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO, we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family.
Affiliation(s)
- Keiran Fleming
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Lawrence A Kelley
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Suhail A Islam
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Robert M MacCallum
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Arne Muller
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Florencio Pazos
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Michael J.E. Sternberg
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Author for correspondence
40
Sam V, Tai CH, Garnier J, Gibrat JF, Lee B, Munson PJ. ROC and confusion analysis of structure comparison methods identify the main causes of divergence from manual protein classification. BMC Bioinformatics 2006; 7:206. [PMID: 16613604 PMCID: PMC1513609 DOI: 10.1186/1471-2105-7-206] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 11/08/2005] [Accepted: 04/13/2006] [Indexed: 11/30/2022] Open
Abstract
Background: Current classifications of protein folds are based, ultimately, on visual inspection of similarities. Previous attempts to use computerized structure comparison methods show only partial agreement with curated databases, but have failed to provide detailed statistical and structural analysis of the causes of these divergences. Results: We construct a map of similarities/dissimilarities among manually defined protein folds, using a score cutoff value determined by means of the Receiver Operating Characteristics curve. It identifies folds which appear to overlap or to be "confused" with each other by two distinct similarity measures. It also identifies folds which appear inhomogeneous in that they contain apparently dissimilar domains, as measured by both similarity measures. At a low (1%) false positive rate, 25 to 38% of domain pairs in the same SCOP folds do not appear similar. Our results suggest either that some of these folds are defined using criteria other than purely structural consideration or that the similarity measures used do not recognize some relevant aspects of structural similarity in certain cases. Specifically, variations of the "common core" of some folds are severe enough to defeat attempts to automatically detect structural similarity and/or to lead to false detection of similarity between domains in distinct folds. Structures in some folds vary greatly in size because they contain varying numbers of a repeating unit, while similarity scores are quite sensitive to size differences. Structures in different folds may contain similar substructures, which produce false positives. Finally, the common core within a structure may be too small, relative to the entire structure, to be recognized as the basis of similarity to another.
Conclusion: A detailed analysis of the entire available protein fold space by two automated similarity methods reveals the extent and the nature of the divergence between the automatically determined similarity/dissimilarity and the manual fold type classifications. Some of the observed divergences can probably be addressed with better structure comparison methods and better automatic, intelligent classification procedures. Others may be intrinsic to the problem, suggesting a continuous rather than discrete protein fold space.
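The idea of fixing a score cutoff at a chosen false positive rate and then counting the same-fold pairs that fall below it can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names `cutoff_at_fpr` and `missed_fraction`, the convention that a pair counts as similar when its score is strictly above the cutoff, and the toy scores are all hypothetical.

```python
def cutoff_at_fpr(negative_scores, max_fpr):
    """Return the lowest observed score cutoff whose false positive rate is <= max_fpr.

    negative_scores: similarity scores of pairs from different folds.
    A pair is called "similar" when its score is strictly greater than the cutoff.
    """
    ranked = sorted(negative_scores, reverse=True)
    # Number of different-fold pairs allowed to score above the cutoff.
    allowed = min(int(max_fpr * len(ranked)), len(ranked) - 1)
    return ranked[allowed]


def missed_fraction(positive_scores, cutoff):
    """Fraction of same-fold pairs that do not appear similar at this cutoff."""
    missed = sum(1 for s in positive_scores if s <= cutoff)
    return missed / len(positive_scores)


# Toy scores (illustrative only): 100 different-fold pairs, 4 same-fold pairs.
negatives = [float(i) for i in range(100)]
positives = [99.5, 120.0, 97.0, 150.0]

cutoff = cutoff_at_fpr(negatives, max_fpr=0.01)
false_positive_rate = sum(1 for s in negatives if s > cutoff) / len(negatives)
missed = missed_fraction(positives, cutoff)
```

With real data, `missed` plays the role of the "25 to 38% of domain pairs in the same SCOP folds" reported at the 1% false positive rate.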
Affiliation(s)
- Vichetra Sam
- Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA
- Chin-Hsien Tai
- Laboratory of Molecular Biology, CCR, NCI, NIH, DHHS, Bethesda, MD, USA
- Jean Garnier
- Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA
- Mathematique Informatique et Genome, INRA, Jouy-en-Josas, France
- Byungkook Lee
- Laboratory of Molecular Biology, CCR, NCI, NIH, DHHS, Bethesda, MD, USA
- Peter J Munson
- Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA
41
Ofran Y, Punta M, Schneider R, Rost B. Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discov Today 2006; 10:1475-82. [PMID: 16243268 DOI: 10.1016/s1359-6446(05)03621-4] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
Every entirely sequenced genome reveals 100s to 1000s of protein sequences for which the only annotation available is 'hypothetical protein'. Thus, in the human genome and in the genomes of pathogenic agents there could be 1000s of potential, unexplored drug targets. Computational prediction of protein function can play a role in studying these targets. We shall review the challenges, research approaches and recently developed tools in the field of computational function-prediction and we will discuss the ways these issues can change the process of drug discovery.
Affiliation(s)
- Yanay Ofran
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
42
Chandonia JM, Brenner SE. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches. Proteins 2006; 58:166-79. [PMID: 15521074 DOI: 10.1002/prot.20298] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.9] [Indexed: 11/11/2022]
Abstract
Structural genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the "Pfam5000" strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These strategies include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at the European Bioinformatics Institute (EBI). Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68% of all prokaryotic proteins (covering 59% of residues) and 61% of eukaryotic proteins (40% of residues). More fine-grained coverage that would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example, to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. 
In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: A significant fraction (about 30-40% of the proteins and 40-60% of the residues) of each proteome is classified in small families, which may have little overlap with other species of interest. Random selection of targets from one or more genomes is similar to the Pfam5000 strategy in that proteins from larger families are more likely to be chosen, but substantial effort would be spent on small families.
Affiliation(s)
- John-Marc Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
43
Abstract
This article describes the development of a new method for multiple sequence alignment based on fold-level protein structure alignments, which provides an improvement in accuracy compared with the most commonly used sequence-only-based techniques. This method integrates the widely used, progressive multiple sequence alignment approach ClustalW with the Topology of Protein Structure (TOPS) topology-based alignment algorithm. The TOPS approach produces a structural alignment for the input protein set by using a topology-based pattern discovery program, providing a set of matched sequence regions that can be used to guide a sequence alignment using ClustalW. The resulting alignments are more reliable than a sequence-only alignment, as determined by 20-fold cross-validation with a set of 106 protein examples from the CATH database, distributed in seven superfold families. The method is particularly effective for sets of proteins that have similar structures at the fold level but low sequence identity. The aim of this research is to contribute towards bridging the gap between protein sequence and structure analysis, in the hope that this can be used to assist the understanding of the relationship between sequence, structure and function. The tool is available at http://balabio.dcs.gla.ac.uk/msat/.
Affiliation(s)
- Te Ren
- Department of Computer Science, Bioinformatics Research Centre, University of Glasgow, Glasgow G12 8QQ, Scotland, UK
44
Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM. A method for localizing ligand binding pockets in protein structures. Proteins 2005; 62:479-88. [PMID: 16304646 DOI: 10.1002/prot.20769] [Citation(s) in RCA: 166] [Impact Index Per Article: 8.3] [Indexed: 11/09/2022]
Abstract
The accurate identification of ligand binding sites in protein structures can be valuable in determining protein function. Once the binding site is known, it becomes easier to perform in silico and experimental procedures that may allow the ligand type and the protein function to be determined. For example, binding pocket shape analysis relies heavily on the correct localization of the ligand binding site. We have developed SURFNET-ConSurf, a modular, two-stage method for identifying the location and shape of potential ligand binding pockets in protein structures. In the first stage, the SURFNET program identifies clefts in the protein surface that are potential binding sites. In the second stage, these clefts are trimmed in size by cutting away regions distant from highly conserved residues, as defined by the ConSurf-HSSP database. The largest clefts that remain tend to be those where ligands bind. To test the approach, we analyzed a nonredundant set of 244 protein structures from the PDB and found that SURFNET-ConSurf identifies a ligand binding pocket in 75% of them. The trimming procedure reduces the original cleft volumes by 30% on average, while still encompassing an average 87% of the ligand volume. From the analysis of the results we conclude that for those cases in which the ligands are found in large, highly conserved clefts, the combined SURFNET-ConSurf method gives pockets that are a better match to the ligand shape and location. We also show that this approach works better for enzymes than for nonenzyme proteins.
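The second-stage trimming described in this abstract, cutting away cleft regions distant from highly conserved residues, can be sketched geometrically. This is a simplified illustration and not the SURFNET-ConSurf implementation: the cleft is assumed to be a list of 3D points, conserved residues are assumed to be given as atom coordinates, and the function name `trim_cleft`, the distance threshold, and the use of point counts as a stand-in for volume are all hypothetical.

```python
from math import dist


def trim_cleft(cleft_points, conserved_atoms, max_distance):
    """Keep only cleft points lying within max_distance of any conserved atom."""
    return [
        point
        for point in cleft_points
        if any(dist(point, atom) <= max_distance for atom in conserved_atoms)
    ]


# Toy cleft of four points along the x axis (coordinates in arbitrary units).
cleft = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (6.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
# One conserved atom near the start of the cleft.
conserved = [(1.0, 0.0, 0.0)]

trimmed = trim_cleft(cleft, conserved, max_distance=3.0)
# Treating each point as an equal volume element, estimate the volume removed.
volume_reduction = 1 - len(trimmed) / len(cleft)
```

In this toy case half of the points are removed; the abstract reports an average reduction of 30% of the original cleft volume on real structures.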
Affiliation(s)
- Fabian Glaser
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
45
Shatsky M, Nussinov R, Wolfson HJ. Optimization of multiple-sequence alignment based on multiple-structure alignment. Proteins 2005; 62:209-17. [PMID: 16294339 DOI: 10.1002/prot.20665] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Indexed: 11/11/2022]
Abstract
Routinely used multiple-sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computation of amino acid structure-sequence conservation with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.
Affiliation(s)
- Maxim Shatsky
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel.
46
Lubec G, Afjehi-Sadat L, Yang JW, John JPP. Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog Neurobiol 2005; 77:90-127. [PMID: 16271823 DOI: 10.1016/j.pneurobio.2005.10.001] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.0] [Received: 05/23/2005] [Revised: 09/18/2005] [Accepted: 10/02/2005] [Indexed: 12/29/2022]
Abstract
A large part of mammalian proteomes is represented by hypothetical proteins (HP), i.e. proteins predicted from nucleic acid sequences only and protein sequences with unknown function. Databases are far from complete and errors are expected. The legion of HP awaits experiments to show their existence at the protein level, and subsequent bioinformatic handling to assign the proteins a tentative function is mandatory. Two-dimensional gel electrophoresis with subsequent mass spectrometric identification of protein spots is an appropriate tool to search for HP in high-throughput mode. Spots are identified by MS or MS/MS measurements (MALDI-TOF, MALDI-TOF-TOF) followed by analysis with software such as Mascot or ProFound. In many cases proteins can thus be unambiguously identified and characterised; if this is not the case, de novo sequencing or Q-TOF analysis is warranted. If the protein is not identified, the sequence is sent to databases for BLAST searches to determine identities/similarities or homologies to known proteins. If no significant identity to known structures is observed, the protein sequence is examined for the presence of functional domains (databases PROSITE, PRINTS, InterPro, ProDom, Pfam and SMART) and subjected to searches for motifs (ELM); finally, protein-protein interaction databases (InterWeaver, STRING) are consulted or predictions from conformations are performed. We here provide information about hypothetical proteins in terms of protein chemical analysis (independent of antibody availability and specificity) and bioinformatic handling, to contribute to the extension/completion of protein databases, and include original work on HP in the brain to illustrate the processes of HP identification and functional assignment.
Affiliation(s)
- Gert Lubec
- Department of Pediatrics, Division of Basic Sciences, Medical University of Vienna, Waehringer Guertel 18-20, A-1090, Vienna, Austria.
47
Forrest LR, Honig B. An assessment of the accuracy of methods for predicting hydrogen positions in protein structures. Proteins 2005; 61:296-309. [PMID: 16114036 DOI: 10.1002/prot.20601] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Indexed: 12/25/2022]
Abstract
The addition of hydrogen atoms to models or experimental structures of proteins that contain only non-hydrogen atoms is a common step in crystallographic structure refinement, in theoretical studies of proteins, and in protein structure prediction. Accurate prediction of the hydrogen positions is essential, since they constitute around half of the atoms in proteins and hence contribute significantly to their energetics. Many computational tools exist for predicting hydrogen positions, although to date no quantitative comparison has been made of their accuracy or efficiency. Here we take advantage of the recent increase in ultra-high-resolution X-ray crystal structures (< 0.9 Å resolution), as well as of a number of relatively high-resolution neutron diffraction structures (< 1.8 Å resolution), to compare the quality of the predictions generated by a large set of commonly used methods. These include CHARMM, CNS, GROMACS, MCCE, MolProbity, WHAT IF, and X-PLOR. The hydrogen atoms that lack a rotational degree of freedom are mostly, but not always, accurately predicted. For hydrogens with a rotational degree of freedom, all the methods give much less accurate predictions. The predictions for the hydroxyl hydrogens are analyzed in detail, particularly those buried within the protein, and some explanation is provided for the errors observed. The results provide a means to make informed decisions regarding the choice and implementation of methodologies for placing hydrogens on structures of proteins. They also point to shortcomings in current force fields and suggest the need for improved descriptions of hydrogen bonding energetics.
Affiliation(s)
- Lucy R Forrest
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
48
McLaughlin WA, Kulp DW, de la Cruz J, Lu XJ, Lawson CL, Berman HM. A structure-based method for identifying DNA-binding proteins and their sites of DNA-interaction. J Struct Funct Genomics 2005; 5:255-65. [PMID: 15704013 DOI: 10.1007/s10969-005-4902-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/08/2004] [Accepted: 11/17/2004] [Indexed: 01/11/2023]
Abstract
A classification model of a DNA-binding protein chain was created based on identification of alpha helices within the chain likely to bind to DNA. Using the model, all chains in the Protein Data Bank were classified. For many of the chains classified with high confidence, previous documentation for DNA-binding was found, yet no sequence homology to the structures used to train the model was detected. The result indicates that the chain model can be used to supplement sequence based methods for annotating the function of DNA-binding. Four new candidates for DNA-binding were found, including two structures solved through structural genomics efforts. For each of the candidate structures, possible sites of DNA-binding are indicated by listing the residue ranges of alpha helices likely to interact with DNA.
Affiliation(s)
- William A McLaughlin
- Department of Chemistry and Chemical Biology, Rutgers-The State University of New Jersey, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
49
Centeno NB, Planas-Iglesias J, Oliva B. Comparative modelling of protein structure and its impact on microbial cell factories. Microb Cell Fact 2005; 4:20. [PMID: 15989691 PMCID: PMC1183243 DOI: 10.1186/1475-2859-4-20] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 05/03/2005] [Accepted: 06/30/2005] [Indexed: 11/22/2022] Open
Abstract
Comparative modeling is becoming an increasingly helpful technique in microbial cell factories, as knowledge of the three-dimensional structure of a protein is an invaluable aid in solving problems of protein production. For this reason, an introduction to comparative modeling is presented, with special emphasis on the basic concepts, opportunities and challenges of protein structure prediction. This review is intended to serve as a guide for the biologist who has no special expertise and who is not involved in the determination of protein structure. Selected applications of comparative modeling in microbial cell factories are outlined, and the role of microbial cell factories in the structural genomics initiative is discussed.
Affiliation(s)
- Nuria B Centeno
- Structural Bioinformatics Laboratory, Research Group on Biomedical Informatics (GRIB), IMIM/UPF. c/ Dr. Aiguader 80. 08003 Barcelona, Spain
- Joan Planas-Iglesias
- Structural Bioinformatics Laboratory, Research Group on Biomedical Informatics (GRIB), IMIM/UPF. c/ Dr. Aiguader 80. 08003 Barcelona, Spain
- Baldomero Oliva
- Structural Bioinformatics Laboratory, Research Group on Biomedical Informatics (GRIB), IMIM/UPF. c/ Dr. Aiguader 80. 08003 Barcelona, Spain
|
50
|
Noel JP, Austin MB, Bomati EK. Structure-function relationships in plant phenylpropanoid biosynthesis. Curr Opin Plant Biol 2005; 8:249-53. [PMID: 15860421 PMCID: PMC2861907 DOI: 10.1016/j.pbi.2005.03.013] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Indexed: 05/20/2023]
Abstract
Plants, as sessile organisms, evolve and exploit metabolic systems to create a rich repertoire of complex natural products that hold adaptive significance for their survival in challenging ecological niches on earth. As an experimental tool set, structural biology provides a high-resolution means to uncover detailed information about the structure-function relationships of metabolic enzymes at the atomic level. Together with genomic and biochemical approaches and an appreciation of molecular evolution, structural enzymology holds great promise for addressing a number of questions relating to secondary or, more appropriately, specialized metabolism. Why is secondary metabolism so adaptable? How are reactivity, regio-chemistry and stereo-chemistry steered during the multi-step conversion of substrates into products? What are the vestigial structural and mechanistic traits that remain in biosynthetic enzymes during the diversification of substrate and product selectivity? What does the catalytic landscape look like as an enzyme family traverses all possible lineages en route to the acquisition of new substrate and/or product specificities? And how can one rationally engineer biosynthesis using the unique perspectives of evolution and structural biology to create novel chemicals for human use?
Affiliation(s)
- Joseph P Noel
- Jack Skirball Chemical Biology and Proteomics Laboratory, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, USA.
|