1
|
Medrano-Soto A, Pal D, Eisenberg D. Inferring molecular function: contributions from functional linkages. Trends Genet 2008; 24:587-90. [PMID: 18951645 DOI: 10.1016/j.tig.2008.10.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/18/2008] [Revised: 10/03/2008] [Accepted: 10/03/2008] [Indexed: 10/21/2022]
Abstract
In the current era of high-throughput sequencing and structure determination, functional annotation has become a bottleneck in biomedical science. Here, we show that automated inference of molecular function using functional linkages among genes increases the accuracy of functional assignments by ≥8% and enriches functional descriptions in ≥34% of top assignments. Furthermore, biochemical literature supports >80% of automated inferences for previously unannotated proteins. These results emphasize the benefit of incorporating functional linkages in protein annotation.
Affiliation(s)
- Arturo Medrano-Soto
- Howard Hughes Medical Institute (HHMI), 675 C. E. Young Drive South, Los Angeles, CA 90095, USA
|
2
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci 2008; Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 757] [Impact Index Per Article: 44.5] [Indexed: 12/25/2022]
Abstract
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California, USA
|
3
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2008; Chapter 5:Unit 5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1792] [Impact Index Per Article: 105.4] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software are also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California
- Ben Webb
- University of California at San Francisco, San Francisco, California
- M S Madhusudhan
- University of California at San Francisco, San Francisco, California
- David Eramian
- University of California at San Francisco, San Francisco, California
- Min-Yi Shen
- University of California at San Francisco, San Francisco, California
- Ursula Pieper
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
4
|
Li J, Wang W. Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids. Sci China C Life Sci 2007; 50:392-402. [PMID: 17609897 DOI: 10.1007/s11427-007-0023-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 05/23/2006] [Accepted: 09/19/2006] [Indexed: 10/23/2022]
Abstract
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence alignment is often inaccurate when the identity between the sequences to be aligned is below 30%. This is because, for such sequences, different residues may play similar structural roles and are aligned incorrectly when a substitution matrix over all 20 residue types is used. Based on the similarity of their physicochemical features, residues can be clustered into a few groups. With such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, using a database of aligned protein structures (DAPS), a new clustering method based on substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to the recognition of structurally conserved or similar protein regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet, with N around 9.
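The effect of a reduced alphabet can be illustrated with a minimal Python sketch. The nine-group partition below is a hypothetical grouping by rough physicochemical similarity, chosen only for illustration; it is not the alphabet derived from DAPS in the paper, and the two pre-aligned sequences are made up.

```python
# Hypothetical 9-group alphabet (illustrative only, not the grouping from the paper).
GROUPS = ["C", "G", "P", "AST", "ILMV", "FWY", "DE", "KR", "HNQ"]

# Map every residue to the first letter of its group.
REDUCE = {res: group[0] for group in GROUPS for res in group}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet; gaps ('-') are kept."""
    return "".join(REDUCE.get(res, res) for res in seq)


def identity(a, b):
    """Fraction of aligned, non-gap columns with identical letters."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Two made-up, pre-aligned sequences.
s1 = "MKVLAFDE"
s2 = "MRILSYEE"
full = identity(s1, s2)
reduced = identity(reduce_sequence(s1), reduce_sequence(s2))
```

In this toy case the two sequences differ at most positions in the 20-letter alphabet but become identical after reduction, which is the kind of signal a reduced-alphabet substitution matrix is meant to expose.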
Affiliation(s)
- Jing Li
- National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing, 210093, China
|
5
|
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art by a number of specific examples.
|
6
|
Miyazawa S, Jernigan RL. How effective for fold recognition is a potential of mean force that includes relative orientations between contacting residues in proteins? J Chem Phys 2006; 122:024901. [PMID: 15638624 DOI: 10.1063/1.1824012] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022] Open
Abstract
We estimate the statistical distribution of relative orientations between contacting residues from a database of protein structures and evaluate the potential of mean force for relative orientations between contacting residues. Polar angles and Euler angles are used to specify two degrees of directional freedom and three degrees of rotational freedom for the orientation of one residue relative to another in contacting residues, respectively. A local coordinate system affixed to each residue based only on main chain atoms is defined for fold recognition. The number of contacting residue pairs in the database will severely limit the resolution of the statistical distribution of relative orientations, if it is estimated by dividing space into cells and counting samples observed in each cell. To overcome such problems and to evaluate the fully anisotropic distributions of relative orientations as a function of polar and Euler angles, we choose a method in which the observed distribution is represented as a sum of delta functions each of which represents the observed orientation of a contacting residue, and is evaluated as a series expansion of spherical harmonics functions. The sample size limits the frequencies of modes whose expansion coefficients can be reliably estimated. High frequency modes are statistically less reliable than low frequency modes. Each expansion coefficient is separately corrected for the sample size according to suggestions from a Bayesian statistical analysis. As a result, many expansion terms can be utilized to evaluate orientational distributions. Also, unlike other orientational potentials, the uniform distribution is used for a reference distribution in evaluating a potential of mean force for each type of contacting residue pair from its orientational distribution, so that residue-residue orientations can be fully evaluated. 
It is shown by using decoy sets that the discrimination power of the orientational potential in fold recognition increases by taking account of the Euler angle dependencies and becomes comparable to that of a simple contact potential, and that the total energy potential taken as a simple sum of contact, orientation, and (phi,psi) potentials performs well to identify the native folds.
Affiliation(s)
- Sanzo Miyazawa
- Faculty of Technology, Gunma University, Kiryu, Gunma 376-8515, Japan.
|
7
|
Abstract
Recently, we developed a pairwise structural alignment algorithm using realistic structural and environmental information (SAUCE). In this paper, we first present an automatic hierarchical fold classification based on SAUCE alignments. This classification enables us to build a fold tree containing different levels of multiple structural profiles. We then describe a tree-based fold search algorithm. We applied this method to a group of structures with sequence identity below 35% and performed a series of leave-one-out tests. These tests are approximately comparable to fold recognition tests at the superfamily level. The results show that fold recognition via a fold tree can be faster and better at detecting distant homologues than classic fold recognition methods.
Affiliation(s)
- Yu Chen
- Bioinformatics Program, University of Michigan, Ann Arbor, Michigan 48109-1065, USA
|
8
|
Warren CM, Kani K, Landgraf R. The N-terminal domains of neuregulin 1 confer signal attenuation. J Biol Chem 2006; 281:27306-16. [PMID: 16825199 DOI: 10.1074/jbc.m512887200] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022] Open
Abstract
Degradation of activated ERBB receptors is an important mechanism for signal attenuation. However, compared with epidermal growth factor (EGF) receptor, the ERBB2/ERBB3 signaling pair is considered to be attenuation-deficient. The ERBB2/ERBB3 ligands of the neuregulin family rely on an EGF-like domain for signaling and are generated from larger membrane-bound precursors. In contrast to EGF, which is processed to yield a 6-kDa peptide ligand, mature neuregulins retain a variety of segments N-terminal to the EGF-like domain. Here we evaluate the role of the N-terminal domain of neuregulin 1 in signaling and turnover of ERBB2/ERBB3. Our data suggest that whereas the EGF-like domain of neuregulin 1 is required and sufficient for the formation of active receptor heterodimers, the presence of the N-terminal Ig-like domain is required for efficient signal attenuation. This manifests itself for both ERBB2 and ERBB3 but is more pronounced and coupled directly to degradation for ERBB3. When stimulated with only the EGF-like domain, ERBB3 shows degradation rates comparable with constitutive turnover, but stimulation with full-length neuregulin 1 resulted in receptor degradation at rates that are comparable with activated EGF receptor. Most of the enhancement in down-regulation was maintained after replacing the Ig-like domain with a thioredoxin protein of comparable size but different amino acid composition, suggesting that the physical presence but not specific properties of the Ig-like domain are needed. This sequence-independent effect of the N-terminal domain correlates with an enhanced ability of full-size neuregulin 1 to disrupt higher order oligomers of the ERBB3 extracellular domains in vitro.
Affiliation(s)
- Carmen M Warren
- Department of Medicine, Molecular Biology Institute, UCLA, Los Angeles, California 90095-1678, USA
|
9
|
Affiliation(s)
- Ninad Prabhu
- Johnson Research Foundation, Dept. of Biochemistry and Biophysics, University of Pennsylvania
- Kim Sharp
- Johnson Research Foundation, Dept. of Biochemistry and Biophysics, University of Pennsylvania
|
10
|
Larson SA, Hilser VJ. Analysis of the "thermodynamic information content" of a Homo sapiens structural database reveals hierarchical thermodynamic organization. Protein Sci 2005; 13:1787-801. [PMID: 15215522 PMCID: PMC2279918 DOI: 10.1110/ps.04706204] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022]
Abstract
Classification of the amounts and types of lower order structural elements in proteins is a prerequisite to effective comparisons between protein folds. In an effort to provide an additional vehicle for fold comparison, we present an alternative classification scheme whereby protein folds are represented in statistical thermodynamic terms in such a way as to illuminate the energetic building blocks within protein structures. The thermodynamic relationship is examined between amino acid sequences and the conformational ensembles for a database of 159 Homo sapiens protein structures ranging from 50 to 250 amino acids. Using hierarchical clustering, it is shown through fold-recognition experiments that (1) eight thermodynamic environmental descriptors sufficiently account for the energetic variation within the native state ensembles of the H. sapiens structural database, (2) an amino acid library of only six residue types is sufficient to encode >90% of the thermodynamic information required for fold specificity in the entire database, and (3) structural resolution of the statistically derived environments reveals sequential cooperative segments throughout the protein, which are independent of secondary structure. As the first level of thermodynamic organization in proteins, these segments represent the thermodynamic counterpart to secondary structure.
Affiliation(s)
- Scott A Larson
- Department of Human Biological Chemistry and Genetics, 5.162 Medical Research Bldg., University of Texas Medical Branch, Galveston, TX 77555-1068, USA
|
11
|
EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments. BMC Bioinformatics 2005; 6:4. [PMID: 15638949 PMCID: PMC545998 DOI: 10.1186/1471-2105-6-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 10/22/2004] [Accepted: 01/10/2005] [Indexed: 12/04/2022] Open
Abstract
Background: Structure-dependent substitution matrices increase the accuracy of sequence alignments when the 3D structure of one sequence is known, and are successful e.g. in fold recognition. We propose a new automated method, EvDTree, based on a decision tree algorithm, for automatic derivation of amino acid substitution probabilities from a set of sequence-structure alignments. The main advantage over other approaches is an unbiased automatic selection of the most informative structural descriptors and associated values or thresholds. This feature allows automatic derivation of structure-dependent substitution scores for any specific set of structures, without the need to empirically determine best descriptors and parameters. Results: Decision trees for residue substitutions were constructed for each residue type from sequence-structure alignments extracted from the HOMSTRAD database. For each tree cluster, environment-dependent substitution profiles were derived. The resulting structure-dependent substitution scores were assessed using a criterion based on the mean ranking of observed substitution among all possible substitutions and in sequence-structure alignments. The automatically built EvDTree substitution scores provide significantly better results than conventional matrices and similar or slightly better results than other structure-dependent matrices. EvDTree has been applied to small disulfide-rich proteins as a test case to automatically derive specific substitution scores providing better results than non-specific substitution scores. Analyses of the decision tree classifications provide useful information on the relative importance of different structural descriptors. Conclusions: We propose a fully automatic method for the classification of structural environments and inference of structure-dependent substitution profiles. We show that this approach is more accurate than existing methods for various applications. The easy adaptation of EvDTree to any specific data set opens the way for class-specific structure-dependent substitution scores which can be used in threading-based remote homology searches.
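The assessment criterion mentioned in the Results, the mean rank of the observed substitution among all possible substitutions, can be sketched roughly as follows. The score tables and observed residues are hypothetical, and counting ties against the observed residue is an assumption of this sketch rather than a detail given in the abstract.

```python
def rank_of_observed(scores, observed):
    """1-based rank of the observed residue when candidates are sorted by
    descending score; ties count against the observed residue (assumption)."""
    target = scores[observed]
    return sum(1 for s in scores.values() if s >= target)


def mean_rank(cases):
    """Average rank over (scores, observed) pairs; lower is better."""
    ranks = [rank_of_observed(scores, obs) for scores, obs in cases]
    return sum(ranks) / len(ranks)


# Hypothetical substitution scores for two aligned positions.
cases = [
    ({"A": 2.0, "S": 1.5, "G": 0.5, "W": -3.0}, "S"),  # observed ranks 2nd
    ({"L": 3.0, "I": 2.5, "V": 2.0, "D": -4.0}, "L"),  # observed ranks 1st
]
result = mean_rank(cases)  # (2 + 1) / 2
```

A scoring scheme that consistently places the observed substitution near the top of the ranking yields a mean rank close to 1, which is how two sets of substitution scores could be compared under this criterion.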
|
12
|
Pal D, Eisenberg D. Inference of Protein Function from Protein Structure. Structure 2005; 13:121-30. [PMID: 15642267 DOI: 10.1016/j.str.2004.10.015] [Citation(s) in RCA: 152] [Impact Index Per Article: 7.6] [Received: 08/05/2004] [Revised: 10/18/2004] [Accepted: 10/20/2004] [Indexed: 11/28/2022]
Abstract
Structural genomics has brought us three-dimensional structures of proteins with unknown functions. To shed light on such structures, we have developed ProKnow (http://www.doe-mbi.ucla.edu/Services/ProKnow/), which annotates proteins with Gene Ontology functional terms. The method extracts features from the protein such as 3D fold, sequence, motif, and functional linkages and relates them to function via the ProKnow knowledgebase of features, which links features to annotated functions via annotation profiles. Bayes' theorem is used to compute weights of the functions assigned, using likelihoods based on the extracted features. The description level of the assigned function is quantified by the ontology depth (from 1 = general to 9 = specific). Jackknife tests show approximately 89% correct assignments at ontology depth 1 and 40% at depth 9, with 93% coverage of 1507 distinct folded proteins. Overall, about 70% of the assignments were inferred correctly. This level of performance suggests that ProKnow is a useful resource in functional assessments of novel proteins.
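The Bayesian weighting step described above can be sketched as follows. This is a minimal illustration that assumes the features are conditionally independent given the function (a naive Bayes simplification that the abstract does not state); the function name `posterior_weights`, the candidate terms, the priors, and the likelihood values are all hypothetical and are not taken from the ProKnow knowledgebase.

```python
def posterior_weights(priors, likelihoods, observed):
    """Return P(function | observed features) for each candidate function.

    priors:      {function: P(function)}
    likelihoods: {function: {feature: P(feature | function)}}
    observed:    iterable of feature names extracted from the protein
    Assumes features are conditionally independent given the function.
    """
    unnormalized = {}
    for func, prior in priors.items():
        weight = prior
        for feat in observed:
            # Small floor for features never seen with this function (assumption).
            weight *= likelihoods[func].get(feat, 1e-6)
        unnormalized[func] = weight
    total = sum(unnormalized.values())
    return {func: w / total for func, w in unnormalized.items()}


# Hypothetical numbers for two candidate GO-style terms.
priors = {"kinase activity": 0.5, "hydrolase activity": 0.5}
likelihoods = {
    "kinase activity": {"fold:protein-kinase-like": 0.8, "motif:P-loop": 0.5},
    "hydrolase activity": {"fold:protein-kinase-like": 0.2, "motif:P-loop": 0.5},
}
weights = posterior_weights(
    priors, likelihoods, ["fold:protein-kinase-like", "motif:P-loop"]
)
```

With these made-up inputs the fold feature separates the two candidates while the motif feature does not, so the normalized weights come out at roughly 0.8 and 0.2.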
Affiliation(s)
- Debnath Pal
- UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA 90095, USA
|
13
|
Mikyas Y, Makabi M, Raval-Fernandes S, Harrington L, Kickhoefer VA, Rome LH, Stewart PL. Cryoelectron microscopy imaging of recombinant and tissue derived vaults: localization of the MVP N termini and VPARP. J Mol Biol 2004; 344:91-105. [PMID: 15504404 DOI: 10.1016/j.jmb.2004.09.021] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.0] [Received: 06/28/2004] [Revised: 09/03/2004] [Accepted: 09/14/2004] [Indexed: 10/26/2022]
Abstract
The vault is a highly conserved ribonucleoprotein particle found in all higher eukaryotes. It has a barrel-shaped structure and is composed of the major vault protein (MVP); vault poly(ADP-ribose) polymerase (VPARP); telomerase-associated protein 1 (TEP1); and small untranslated RNA (vRNA). Although its strong conservation and high abundance indicate an important cellular role, the function of the vault is unknown. In humans, vaults have been implicated in multidrug resistance during chemotherapy. Recently, assembly of recombinant vaults has been established in insect cells expressing only MVP. Here, we demonstrate that co-expression of MVP with one or both of the other two vault proteins results in their co-assembly into regularly shaped vaults. Particles assembled from MVP with N-terminal peptide tags of various length are compared. Cryoelectron microscopy (cryoEM) and single-particle image reconstruction methods were used to determine the structure of nine recombinant vaults of various composition, as well as wild-type and TEP1-deficient mouse vaults. Recombinant vaults with MVP N-terminal peptide tags showed internal density that varied in size with the length of the tag. Reconstruction of a recombinant vault with a cysteine-rich tag revealed 48-fold rotational symmetry for the vault. A model is proposed for the organization of MVP within the vault with all of the MVP N termini interacting non-covalently at the vault midsection and 48 copies of MVP forming each half vault. CryoEM difference mapping localized VPARP to three density bands lining the inner surface of the vault. Difference maps designed to localize TEP1 showed only weak density inside of the caps, suggesting that TEP1 may interact with MVP via a small interaction region. In the absence of atomic-resolution structures for either VPARP or TEP1, fold recognition methods were applied. A total of 21 repeats were predicted for the TEP1 WD-repeat domain, suggesting an unusually large beta-propeller fold.
Affiliation(s)
- Yeshi Mikyas
- Department of Molecular and Medical Pharmacology, Crump Institute for Molecular Imaging, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
|
14
|
Becker T, Hritz J, Vogel M, Caliebe A, Bukau B, Soll J, Schleiff E. Toc12, a novel subunit of the intermembrane space preprotein translocon of chloroplasts. Mol Biol Cell 2004; 15:5130-44. [PMID: 15317846 PMCID: PMC524789 DOI: 10.1091/mbc.e04-05-0405] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.9] [Received: 05/17/2004] [Accepted: 08/09/2004] [Indexed: 11/11/2022] Open
Abstract
Translocation of proteins across membranes is essential for the biogenesis of each cell and is achieved by proteinaceous complexes. We analyzed the translocation complex of the intermembrane space from chloroplasts and identified a 12-kDa protein associated with the Toc machinery. Toc12 is an outer envelope protein exposing a soluble domain into the intermembrane space. Toc12 contains a J-domain and stimulates the ATPase activity of DnaK. The conformational stability and the ability to stimulate Hsp70 are dependent on a disulfide bridge within the loop region of the J-domain, suggesting a redox-regulated activation of the chaperone. Toc12 is associated with Toc64 and Tic22. Its J-domain recruits the Hsp70 of the outer envelope membrane to the intermembrane space translocon and facilitates its interaction with the preprotein.
Affiliation(s)
- Thomas Becker
- Botanisches Institut, LMU München, 80638 Munich, Germany
|
15
|
Nishimura AL, Mitne-Neto M, Silva HCA, Richieri-Costa A, Middleton S, Cascio D, Kok F, Oliveira JRM, Gillingwater T, Webb J, Skehel P, Zatz M. A mutation in the vesicle-trafficking protein VAPB causes late-onset spinal muscular atrophy and amyotrophic lateral sclerosis. Am J Hum Genet 2004; 75:822-31. [PMID: 15372378 PMCID: PMC1182111 DOI: 10.1086/425287] [Citation(s) in RCA: 718] [Impact Index Per Article: 34.2] [Received: 07/19/2004] [Accepted: 08/20/2004] [Indexed: 12/11/2022] Open
Abstract
Motor neuron diseases (MNDs) are a group of neurodegenerative disorders with involvement of upper and/or lower motor neurons, such as amyotrophic lateral sclerosis (ALS), spinal muscular atrophy (SMA), progressive bulbar palsy, and primary lateral sclerosis. Recently, we have mapped a new locus for an atypical form of ALS/MND (atypical amyotrophic lateral sclerosis [ALS8]) at 20q13.3 in a large white Brazilian family. Here, we report the finding of a novel missense mutation in the vesicle-associated membrane protein/synaptobrevin-associated membrane protein B (VAPB) gene in patients from this family. Subsequently, the same mutation was identified in patients from six additional kindreds but with different clinical courses, such as ALS8, late-onset SMA, and typical severe ALS with rapid progression. Although it was not possible to link all these families, haplotype analysis suggests a founder effect. Members of the vesicle-associated proteins are intracellular membrane proteins that can associate with microtubules and that have been shown to have a function in membrane transport. These data suggest that clinically variable MNDs may be caused by a dysfunction in intracellular membrane trafficking.
Affiliation(s)
- Agnes L. Nishimura
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Miguel Mitne-Neto
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Helga C. A. Silva
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Antônio Richieri-Costa
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Susan Middleton
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Duilio Cascio
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Fernando Kok
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - João R. M. Oliveira
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Tom Gillingwater
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Jeanette Webb
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Paul Skehel
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
| | - Mayana Zatz
- Human Genome Research Center, Department of Biology, Biosciences Institute, São Paulo University, and Anesthesiology, Pain, and Intensive Care Department, Medical School of the Federal University of São Paulo, São Paulo; Genetics Service, Hospital of Rehabilitation of Craniofacial Anomalies, São Paulo University, Bauru, Brazil; Division of Neuroscience, University of Edinburgh, Edinburgh; and Institute for Genomics and Proteomics, Molecular Biology Institute, University of California–Los Angeles Department of Energy (UCLA-DOE), Los Angeles
|
16
|
Leary RH, Rosen JB, Jambeck P. An optimal structure-discriminative amino acid index for protein fold recognition. Biophys J 2004; 86:411-9. [PMID: 14695283 PMCID: PMC1303806 DOI: 10.1016/s0006-3495(04)74117-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/21/2022] Open
Abstract
Identifying the fold class of a protein sequence of unknown structure is a fundamental problem in modern biology. We apply a supervised learning algorithm to the classification of protein sequences with low sequence identity from a library of 174 structural classes created with the Combinatorial Extension structural alignment methodology. A class of rules is considered that assigns test sequences to structural classes based on the closest match of an amino acid index profile of the test sequence to a profile centroid for each class. A mathematical optimization procedure is applied to determine an amino acid index of maximal structural discriminatory power by maximizing the ratio of between-class to within-class profile variation. The optimal index is computed as the solution to a generalized eigenvalue problem, and its performance for fold classification is compared to that of other published indices. The optimal index has significantly more structural discriminatory power than all currently known indices, including average surrounding hydrophobicity, which it most closely resembles. It demonstrates >70% classification accuracy over all folds and nearly 100% accuracy on several folds with distinctive conserved structural features. Finally, there is a compelling universality to the optimal index in that it does not appear to depend strongly on the specific structural classes used in its computation.
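The criterion described in this abstract, the ratio of between-class to within-class profile variation, can be illustrated with a small sketch. It assumes, purely for illustration, that each sequence is summarized by a single scalar (its mean index value); the paper's actual profile construction and its generalized eigenvalue solution are not reproduced here. The toy sequences, class names, and both candidate indices are hypothetical.

```python
from statistics import mean


def mean_index(sequence, index):
    """Average index value over the residues of one sequence."""
    return mean(index[aa] for aa in sequence)


def discrimination_ratio(classes, index):
    """Between-class over within-class variation of per-sequence scores.

    classes: dict mapping a class name to a list of sequences.
    index:   dict mapping each amino acid letter to a number.
    """
    scores = {
        name: [mean_index(seq, index) for seq in seqs]
        for name, seqs in classes.items()
    }
    grand_mean = mean(x for xs in scores.values() for x in xs)
    between = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in scores.values())
    within = sum((x - mean(xs)) ** 2 for xs in scores.values() for x in xs)
    return between / within


# Hypothetical toy data: two structural classes, four-letter alphabet.
toy_classes = {
    "class_1": ["LLLA", "LLAA"],
    "class_2": ["KKEA", "KEEE"],
}
# Two hypothetical candidate indices to compare.
index_a = {"L": 1.0, "A": 0.5, "K": -1.0, "E": -1.0}
index_b = {"L": 0.0, "A": 1.0, "K": 0.0, "E": 0.5}

ratio_a = discrimination_ratio(toy_classes, index_a)
ratio_b = discrimination_ratio(toy_classes, index_b)
print(ratio_a > ratio_b)  # index_a separates the toy classes better
```

An index with a larger ratio spreads the class centroids apart relative to the scatter inside each class, which is the property the abstract's optimization seeks to maximize over all possible indices.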
Affiliation(s)
- R H Leary
- San Diego Supercomputer Center, La Jolla, California 92093-0505, USA.
|
17
|
Goonesekere NCW, Lee B. Frequency of gaps observed in a structurally aligned protein pair database suggests a simple gap penalty function. Nucleic Acids Res 2004; 32:2838-43. [PMID: 15155852 PMCID: PMC419611 DOI: 10.1093/nar/gkh610] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
Gap penalty is an important component of the scoring scheme that is needed when searching for homologous proteins and for accurate alignment of protein sequences. Most homology search and sequence alignment algorithms employ a heuristic 'affine gap penalty' scheme q + r × n, in which q is the penalty for opening a gap, r the penalty for extending it and n the gap length. In order to devise a more rational scoring scheme, we examined the pattern of gaps that occur in a database of structurally aligned protein domain pairs. We find that the logarithm of the frequency of gaps varies linearly with the length of the gap, but with a break at a gap of length 3, and is well approximated by two linear regression lines with R² values of 1.0 and 0.99. The bilinear behavior is retained when gaps are categorized by secondary structures of the two residues flanking the gap. Similar results were obtained when another, totally independent, structurally aligned protein pair database was used. These results suggest a modification of the affine gap penalty function.
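The affine scheme in this abstract, and one possible reading of the suggested modification, can be sketched as follows. The piecewise-linear form below is an assumption: it simply changes the extension slope at gap length 3, mirroring the reported break in log-frequency. The function names and all numeric parameter values are hypothetical and are not taken from the paper.

```python
def affine_gap_penalty(n, q, r):
    """Affine penalty q + r * n for a gap of length n (n >= 1)."""
    if n < 1:
        raise ValueError("gap length must be at least 1")
    return q + r * n


def bilinear_gap_penalty(n, q, r_short, r_long, break_len=3):
    """Piecewise-linear penalty whose extension slope changes at break_len.

    Gaps up to break_len cost r_short per residue; each residue beyond
    break_len costs r_long. The two pieces meet at n == break_len.
    """
    if n < 1:
        raise ValueError("gap length must be at least 1")
    if n <= break_len:
        return q + r_short * n
    return q + r_short * break_len + r_long * (n - break_len)


# Hypothetical parameters, chosen only to show the change of slope.
for n in (1, 3, 5):
    print(n, affine_gap_penalty(n, q=10, r=1),
          bilinear_gap_penalty(n, q=10, r_short=2, r_long=0.5))
```

Keeping the two pieces continuous at the break avoids a jump in cost between gaps of length 3 and 4, which is a design choice of this sketch rather than something stated in the abstract.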
Affiliation(s)
- Nalin C W Goonesekere
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 37, Room 5120, 37 Convent Drive MSC 4264, Bethesda, MD 20892-4264, USA
|
18
|
Reinhardt A, Eisenberg D. DPANN: Improved sequence to structure alignments following fold recognition. Proteins 2004; 56:528-38. [PMID: 15229885 DOI: 10.1002/prot.20144] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Abstract
In fold recognition (FR), a protein sequence of unknown structure is assigned to the closest known three-dimensional (3D) fold. Although FR programs can often identify among all possible folds the one a sequence adopts, they frequently fail to align the sequence to the equivalent residue positions in that fold. Such failures frustrate the next step in structure prediction, protein model building. Hence it is desirable to improve the quality of the alignments between the sequence and the identified structure. We have used artificial neural networks (ANN) to derive a substitution matrix to create alignments between a protein sequence and a protein structure through dynamic programming (DPANN: Dynamic Programming meets Artificial Neural Networks). The matrix is based on the amino acid type and the secondary structure state of each residue. In a database of protein pairs that have the same fold but lack sequence similarity, DPANN aligns over 30% of all sequences to the paired structure, resembling closely the structural superposition of the pair. In over half of these cases the DPANN alignment is close to the structural superposition, although the initial alignment from the step of fold recognition is not close. Conversely, the alignment created during fold recognition outperforms DPANN in only 10% of all cases. Thus application of DPANN after fold recognition leads to substantial improvements in alignment accuracy, which in turn provides more useful templates for the modeling of protein structures. In the artificial case of using actual instead of predicted secondary structures for the probe protein, over 50% of the alignments are successful.
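The general idea of dynamic programming over residues described by both amino acid type and secondary structure state can be sketched as below. This is a plain Needleman-Wunsch score computation with a linear gap penalty, not the DPANN implementation; `toy_score` is a hypothetical stand-in for the neural-network-derived substitution matrix, and the gap value and example residues are likewise assumptions.

```python
def global_alignment_score(probe, template, score, gap=-2.0):
    """Optimal global alignment score (Needleman-Wunsch, linear gaps).

    probe, template: lists of (amino_acid, secondary_structure) tuples.
    score: callable returning the substitution score for two such tuples.
    """
    rows, cols = len(probe) + 1, len(template) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # probe prefix aligned entirely to gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # template prefix aligned entirely to gaps
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(probe[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,   # gap in the template
                dp[i][j - 1] + gap,   # gap in the probe
            )
    return dp[-1][-1]


def toy_score(a, b):
    """Hypothetical scores: reward matching residue type and matching state."""
    residue_term = 1.0 if a[0] == b[0] else -1.0
    structure_term = 2.0 if a[1] == b[1] else -1.0
    return residue_term + structure_term


# Residues paired with secondary structure states (H = helix, C = coil).
probe = list(zip("ALK", "HHC"))
template = list(zip("AVLK", "HHHC"))
print(global_alignment_score(probe, template, toy_score))
```

Because the score depends on the secondary structure state as well as the residue type, two positions with different amino acids can still score favorably when their states agree, which is the property that lets such a matrix help when sequence similarity is absent.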
|
19
|
Kleiger G, Panina EM, Mallick P, Eisenberg D. PFIT and PFRIT: bioinformatic algorithms for detecting glycosidase function from structure and sequence. Protein Sci 2003; 13:221-9. [PMID: 14691237 PMCID: PMC2286537 DOI: 10.1110/ps.03274104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/26/2022]
Abstract
The identification of the enzymes involved in the metabolism of simple and complex carbohydrates presents one bioinformatic challenge in the post-genomic era. Here, we present the PFIT and PFRIT algorithms for identifying those proteins adopting the alpha/beta barrel fold that function as glycosidases. These algorithms are based on the observation that proteins adopting the alpha/beta barrel fold share positions in their tertiary structures having equivalent sets of atomic interactions. These are conserved tertiary interaction positions, which have been implicated in both structure and function. Glycosidases adopting the alpha/beta barrel fold share more conserved tertiary interactions than alpha/beta barrel proteins having other functions. The enrichment pattern of conserved tertiary interactions in the glycosidases is the information that PFIT and PFRIT use to predict whether any given alpha/beta barrel will function as a glycosidase or not. Using as a test set a database of 19 glycosidase and 45 nonglycosidase alpha/beta barrel proteins with low sequence similarity, PFIT and PFRIT can correctly predict glycosidase function for 84% of the proteins known to function as glycosidases. PFIT and PFRIT incorrectly predict glycosidase function for 25% of the nonglycosidases. The program PSI-BLAST can also correctly identify 84% of the 19 glycosidases; however, it incorrectly predicts glycosidase function for 50% of the nonglycosidases (twofold greater than PFIT and PFRIT). Overall, we demonstrate that the structure-based PFIT and PFRIT algorithms are both more selective and sensitive for predicting glycosidase function than the sequence-based PSI-BLAST algorithm.
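The practical effect of the two false-positive rates quoted in this abstract can be worked through with a short calculation. The sketch below derives an expected precision (the fraction of predicted glycosidases that are real) from the reported percentages and test-set sizes. Precision is not reported in the abstract, so the resulting figures are a derived illustration under the assumption that the rounded percentages apply exactly; the function name is hypothetical.

```python
def expected_precision(sensitivity, false_positive_rate, n_pos, n_neg):
    """Expected fraction of positive predictions that are true positives."""
    true_positives = sensitivity * n_pos
    false_positives = false_positive_rate * n_neg
    return true_positives / (true_positives + false_positives)


# Rates and test-set sizes as quoted in the abstract.
N_GLYCOSIDASES, N_NONGLYCOSIDASES = 19, 45
pfit = expected_precision(0.84, 0.25, N_GLYCOSIDASES, N_NONGLYCOSIDASES)
psiblast = expected_precision(0.84, 0.50, N_GLYCOSIDASES, N_NONGLYCOSIDASES)

print(f"PFIT/PFRIT expected precision: {pfit:.2f}")
print(f"PSI-BLAST expected precision:  {psiblast:.2f}")
```

With equal sensitivity, halving the false-positive rate raises the share of correct positive calls, which is the sense in which the abstract describes the structure-based algorithms as more selective.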
Affiliation(s)
- Gary Kleiger
- Howard Hughes Medical Institute, University of California, Los Angeles-Department Of Energy, Institute of Genomics and Proteomics, UCLA, Los Angeles, California 90095, USA
|
20
|
Kong Y, Ma J. A structural-informatics approach for mining beta-sheets: locating sheets in intermediate-resolution density maps. J Mol Biol 2003; 332:399-413. [PMID: 12948490 DOI: 10.1016/s0022-2836(03)00859-3] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.0] [Indexed: 11/21/2022]
Abstract
Here, we report a new computational method, called sheetminer, for mining beta-sheets in the density maps at intermediate resolutions of 6 to 10 Å. The method employs a multi-step ad hoc morphological analysis of density maps to identify the unique characteristics of beta-sheets. It was tested on density maps from 12 protein crystal structures that were artificially blurred to intermediate resolutions. There are a total of 35 independent beta-sheets with a wide distribution of morphology. The method successfully located 34 of them and missed only one. The method was also applied to an experimental 9 Å electron cryomicroscopic structure and an 8 Å X-ray density map. In both cases, the sheet-searching results were found to agree very well with known high-resolution crystal structures. Collectively, these results demonstrate clearly the robustness of sheetminer in locating the regions belonging to beta-sheets in the intermediate-resolution density maps. Furthermore, sheetminer is completely complementary to all other existing computational methods, including helixhunter and threading algorithms. Their combined usage has the potential to significantly enhance the computational modeling capacity for a much more complete interpretation of structural data at intermediate resolutions, from which extraction of functional information would be more effective. This is particularly important in the field of structural genomics, in which the fast screening approach may not always yield crystals that diffract to atomic resolution. An exciting future application of sheetminer is as a valuable tool for revealing the structures of amyloid fibrils that are rich in beta-motifs.
Affiliation(s)
- Yifei Kong
- Graduate Program of Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
|