1
Xu J, Ji Z, Wang C, Xu F, Wang F, Zheng Y, Tang Y, Wei Z, Zhao T, Zhao K. WATER-SOAKED SPOT1 Controls Chloroplast Development and Leaf Senescence via Regulating Reactive Oxygen Species Homeostasis in Rice. FRONTIERS IN PLANT SCIENCE 2022; 13:918673. [PMID: 35693165 PMCID: PMC9178249 DOI: 10.3389/fpls.2022.918673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Accepted: 04/26/2022] [Indexed: 06/15/2023]
Abstract
Transmembrane kinases (TMKs) play important roles in plant growth and in the signaling cascades of phytohormones. However, their function in the regulation of early leaf senescence (ELS) in plants remains unknown. Here, we report the molecular cloning and functional characterization of the WATER-SOAKED SPOT1 gene, which encodes a protein belonging to the TMK family and controls chloroplast development and leaf senescence in rice (Oryza sativa L.). The water-soaked spot1 (oswss1) mutant displays water-soaked spots that subsequently develop into necrotic symptoms at the tillering stage. Moreover, oswss1 exhibits slightly rolled leaves with irregular epidermal cells, decreased chlorophyll contents, and defective stomata and chloroplasts compared with the wild type. Map-based cloning revealed that OsWSS1 encodes transmembrane kinase TMK1. Genetic complementation experiments verified that a Leu396Pro amino acid substitution, residing in the highly conserved region of the leucine-rich repeat (LRR) domain, was responsible for the phenotypes of oswss1. OsWSS1 was constitutively expressed in all tissues, and its encoded protein is localized to the plasma membrane. Mutation of OsWSS1 led to hyper-accumulation of reactive oxygen species (ROS) and to more severe DNA fragmentation and cell death than in the wild-type control. In addition, we found that the expression of senescence-associated genes (SAGs) was significantly higher, while the expression of genes associated with chloroplast development and photosynthesis was significantly lower, in oswss1 than in the wild type. Taken together, our results demonstrate that OsWSS1, a member of the TMK family, plays a vital role in the regulation of ROS homeostasis, chloroplast development, and leaf senescence in rice.
Affiliation(s)
- Jiangmin Xu
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Life Sciences, Northwest A&F University, Xianyang, China
- Zhiyuan Ji
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Chunlian Wang
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Feifei Xu
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Fujun Wang
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Yuhan Zheng
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yongchao Tang
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Zheng Wei
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Tianyong Zhao
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Life Sciences, Northwest A&F University, Xianyang, China
- Kaijun Zhao
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
2
Miotto M, Di Rienzo L, Corsi P, Ruocco G, Raimondo D, Milanetti E. Simulated Epidemics in 3D Protein Structures to Detect Functional Properties. J Chem Inf Model 2020; 60:1884-1891. [PMID: 32011881 DOI: 10.1021/acs.jcim.9b01027] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
The outcome of an epidemic is closely related to the network of interactions between individuals. Likewise, protein functions depend on the 3D arrangement of their residues and the underlying energetic interaction network. Borrowing ideas from the theoretical framework that has been developed to address the spreading of real diseases, we study for the first time the diffusion of a fictitious epidemic inside the protein nonbonded interaction network, aiming to study network features and properties. Our approach allows us to probe the overall stability and the capability of propagating information in complex 3D structures, proving to be very efficient in addressing different problems, from the assessment of thermal stability to the identification of functional sites.
Affiliation(s)
- Mattia Miotto
- Department of Physics, Sapienza University, Rome 00185, Italy; Center for Life Nanoscience, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Pietro Corsi
- Department of Science, Roma Tre University, Rome 00154, Italy
- Giancarlo Ruocco
- Department of Physics, Sapienza University, Rome 00185, Italy; Center for Life Nanoscience, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Domenico Raimondo
- Department of Molecular Medicine, Sapienza University, Rome 00161, Italy
- Edoardo Milanetti
- Department of Physics, Sapienza University, Rome 00185, Italy; Center for Life Nanoscience, Istituto Italiano di Tecnologia, Rome 00161, Italy
3
The history of the CATH structural classification of protein domains. Biochimie 2015; 119:209-17. [PMID: 26253692 PMCID: PMC4678953 DOI: 10.1016/j.biochi.2015.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 07/01/2015] [Accepted: 08/01/2015] [Indexed: 11/21/2022]
Abstract
This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families.
4
He Z, Alazmi M, Zhang J, Xu D. Protein structural model selection by combining consensus and single scoring methods. PLoS One 2013; 8:e74006. [PMID: 24023923 PMCID: PMC3759460 DOI: 10.1371/journal.pone.0074006] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 05/05/2013] [Accepted: 07/26/2013] [Indexed: 01/28/2023]
Abstract
Quality assessment (QA) for predicted protein structural models is an important and challenging research problem in protein structure prediction. Consensus Global Distance Test (CGDT) methods assess each decoy (predicted structural model) based on its structural similarity to all others in a decoy set and have been shown to work well when good decoys are in a majority cluster. Scoring functions evaluate each single decoy based on its structural properties. Both methods have their merits and limitations. In this paper, we present a novel method called PWCom, which applies two neural networks sequentially to combine CGDT and single-model scoring methods such as RW, DDFire and OPUS-Ca. Specifically, for every pair of decoys, the difference of the corresponding feature vectors is input to the first neural network, which predicts whether the two decoys differ significantly in terms of their GDT scores to the native. If so, the second neural network is used to decide which of the two is closer to the native structure. The quality score for each decoy in the pool is based on the number of times it wins during the pairwise comparisons. Test results on three benchmark datasets from different model generation methods showed that PWCom significantly improves over consensus GDT and single scoring methods. The QA server (MUFOLD-Server) applying this method in the CASP 10 QA category was ranked second in terms of Pearson and Spearman correlation performance.
Affiliation(s)
- Zhiquan He
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
- Meshari Alazmi
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
- Jingfen Zhang
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
5
Sippl MJ. Fold space unlimited. Curr Opin Struct Biol 2009; 19:312-20. [DOI: 10.1016/j.sbi.2009.03.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 01/07/2009] [Revised: 02/16/2009] [Accepted: 03/16/2009] [Indexed: 11/25/2022]
6
Pieper U, Eswar N, Webb BM, Eramian D, Kelly L, Barkan DT, Carter H, Mankoo P, Karchin R, Marti-Renom MA, Davis FP, Sali A. MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2009; 37:D347-54. [PMID: 18948282 PMCID: PMC2686492 DOI: 10.1093/nar/gkn791] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Received: 09/15/2008] [Accepted: 10/08/2008] [Indexed: 11/14/2022]
Abstract
MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).
Affiliation(s)
- Ursula Pieper
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Narayanan Eswar
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Ben M. Webb
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- David Eramian
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Libusha Kelly
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- David T. Barkan
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Hannah Carter
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Parminder Mankoo
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Rachel Karchin
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Marc A. Marti-Renom
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Fred P. Davis
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
7
Personalizing foods: is genotype necessary? Curr Opin Biotechnol 2008; 19:121-8. [DOI: 10.1016/j.copbio.2008.02.010] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 12/21/2007] [Revised: 02/15/2008] [Accepted: 02/18/2008] [Indexed: 01/18/2023]
8
Yura K, Yamaguchi A, Go M. Coverage of whole proteome by structural genomics observed through protein homology modeling database. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2006; 7:65-76. [PMID: 17146617 PMCID: PMC1769342 DOI: 10.1007/s10969-006-9010-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 05/11/2006] [Accepted: 08/08/2006] [Indexed: 11/07/2022]
Abstract
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins increased by 5% for archaebacteria and by 7% for eubacteria over the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, model structures of all soluble proteins of prokaryotes, excluding the putative disordered regions, will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting the spatial arrangement of structural domains in an ORF will then be the next issue for structural genomics.
Affiliation(s)
- Kei Yura
- Quantum Bioinformatics Team, Center for Computational Science and Engineering, Japan Atomic Energy Agency, Kyoto 619-0215, Japan.
9
Sam V, Tai CH, Garnier J, Gibrat JF, Lee B, Munson PJ. ROC and confusion analysis of structure comparison methods identify the main causes of divergence from manual protein classification. BMC Bioinformatics 2006; 7:206. [PMID: 16613604 PMCID: PMC1513609 DOI: 10.1186/1471-2105-7-206] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 11/08/2005] [Accepted: 04/13/2006] [Indexed: 11/30/2022]
Abstract
Background: Current classifications of protein folds are based, ultimately, on visual inspection of similarities. Previous attempts to use computerized structure comparison methods show only partial agreement with curated databases, but have failed to provide detailed statistical and structural analysis of the causes of these divergences.
Results: We construct a map of similarities/dissimilarities among manually defined protein folds, using a score cutoff value determined by means of the Receiver Operating Characteristics curve. It identifies folds which appear to overlap or to be "confused" with each other by two distinct similarity measures. It also identifies folds which appear inhomogeneous in that they contain apparently dissimilar domains, as measured by both similarity measures. At a low (1%) false positive rate, 25 to 38% of domain pairs in the same SCOP folds do not appear similar. Our results suggest either that some of these folds are defined using criteria other than purely structural consideration or that the similarity measures used do not recognize some relevant aspects of structural similarity in certain cases. Specifically, variations of the "common core" of some folds are severe enough to defeat attempts to automatically detect structural similarity and/or to lead to false detection of similarity between domains in distinct folds. Structures in some folds vary greatly in size because they contain varying numbers of a repeating unit, while similarity scores are quite sensitive to size differences. Structures in different folds may contain similar substructures, which produce false positives. Finally, the common core within a structure may be too small, relative to the entire structure, to be recognized as the basis of similarity to another.
Conclusion: A detailed analysis of the entire available protein fold space by two automated similarity methods reveals the extent and the nature of the divergence between the automatically determined similarity/dissimilarity and the manual fold type classifications. Some of the observed divergences can probably be addressed with better structure comparison methods and better automatic, intelligent classification procedures. Others may be intrinsic to the problem, suggesting a continuous rather than discrete protein fold space.
Affiliation(s)
- Vichetra Sam
- Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA
- Chin-Hsien Tai
- Laboratory of Molecular Biology, CCR, NCI, NIH, DHHS, Bethesda, MD, USA
- Jean Garnier
- Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA
- Mathematique Informatique et Genome, INRA, Jouy-en-Josas, France
- Byungkook Lee
- Laboratory of Molecular Biology, CCR, NCI, NIH, DHHS, Bethesda, MD, USA
- Peter J Munson
- Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA
10
Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D, Shen MY, Kelly L, Melo F, Sali A. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2006; 34:D291-5. [PMID: 16381869 PMCID: PMC1347422 DOI: 10.1093/nar/gkj059] [Citation(s) in RCA: 202] [Impact Index Per Article: 11.2] [Indexed: 11/24/2022]
Abstract
MODBASE is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment. MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB. Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP).
Affiliation(s)
- Ursula Pieper
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Narayanan Eswar
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Fred P. Davis
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Hannes Braberg
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- M. S. Madhusudhan
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Andrea Rossi
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Marc Marti-Renom
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Rachel Karchin
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department Pharmaceutical Chemistry, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
| | - Ben M. Webb
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- Department Pharmaceutical Chemistry, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
| | - David Eramian
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- Department Pharmaceutical Chemistry, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- Graduate Group in Biophysics, University of CaliforniaSan Francisco, CA, USA
| | - Min-Yi Shen
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- Department Pharmaceutical Chemistry, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
| | - Libusha Kelly
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- Department Pharmaceutical Chemistry, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- Graduate Group in Biological and Medical Informatics, University of CaliforniaSan Francisco, CA, USA
| | - Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de ChileAlameda 340, Santiago, Chile
| | - Andrej Sali
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- Department Pharmaceutical Chemistry, California Institute for Quantitative Biomedical ResearchQB3 at Mission Bay, Office 503BUniversity of California at San Francisco1700 4th Street, San Francisco, CA 94158, USA
- To whom correspondence should be addressed. Tel: +1 415 514 4227; Fax: +1 415 514 4231;
| |
Collapse
|
11
|
Yang J, Dong XC, Leng Y. Conformation biases of amino acids based on tripeptide microenvironment from PDB database. J Theor Biol 2005; 240:374-84. [PMID: 16290902 DOI: 10.1016/j.jtbi.2005.09.025] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/13/2005] [Revised: 09/28/2005] [Accepted: 09/29/2005] [Indexed: 11/30/2022]
Abstract
We have constructed a bank (FTTP) of tendency factors for the three conformational states of tripeptide units, derived from the PDB using a library of conformational dihedral angles, and demonstrated that amino acid biases toward protein secondary structure are present in natural protein sequences. Our results reveal that the 20 standard amino acids fall into three groups: nine residues inclined to alpha-helix, sharing a common character (e.g. straight side chain aliphatic residues or positively/negatively charged residues), arranged in three grades, viz. EA, QKRLD, and MN, in turn; seven residues apt to form beta-strand, with 2'-branched aliphatic side chains or benzyl-containing side chains, namely PV, IYTC, and F, in three ranks; and four residues, SHWG, that show a double tendency toward both alpha and beta. Notably, proline has the strongest ability to form extended conformation, with an Re value of up to 9.5298 at position 3 (Table 3). Thus, biases of codons show an evident tendency in protein folding: GC-rich codons are mainly responsible for forming contracted conformation, and the codon's first letter plays a dominant role in translating the genomic GC signature into protein sequences and structures. Biases of amino acids will therefore play an important role in protein folding, folding codons, domain refinement, structure prediction, and structural genomics/proteomics.
Affiliation(s)
- Jie Yang
- Life Science College, State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing 210093, PR China.
|
12
|
Vlahoviček K, Pintar A, Parthasarathi L, Carugo O, Pongor S. CX, DPX and PRIDE: WWW servers for the analysis and comparison of protein 3D structures. Nucleic Acids Res 2005; 33:W252-4. [PMID: 15980464 PMCID: PMC1160123 DOI: 10.1093/nar/gki362] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
The WWW servers at http://www.icgeb.org/protein/ are dedicated to the analysis of protein 3D structures submitted by the users as the Protein Data Bank (PDB) files. CX computes an atomic protrusion index that makes it possible to highlight the protruding atoms within a protein 3D structure. DPX calculates a depth index for the buried atoms and makes it possible to analyze the distribution of buried residues. CX and DPX return PDB files containing the calculated indices that can then be visualized using standard programs, such as Swiss-PDBviewer and Rasmol. PRIDE compares 3D structures using a fast algorithm based on the distribution of inter-atomic distances. The options include pairwise as well as multiple comparisons, and fold recognition based on searching the CATH fold database.
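The idea of comparing structures through their distributions of inter-atomic distances can be illustrated with a minimal Python sketch. It is not the PRIDE algorithm itself: the single sequence offset, the bin width, and the histogram-intersection score are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def ca_distance_histogram(ca_coords, offset=3, bin_width=1.0, n_bins=30):
    """Normalized histogram of distances between C-alpha atoms i and i+offset.

    offset, bin_width and n_bins are illustrative defaults, not values
    taken from the PRIDE paper.
    """
    counts = [0] * n_bins
    for i in range(len(ca_coords) - offset):
        d = math.dist(ca_coords[i], ca_coords[i + offset])
        # Distances beyond the last bin are clamped into it.
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def histogram_overlap(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Because only internal distances enter the histogram, the comparison needs no superposition: translating or rotating one structure leaves its histogram unchanged.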
Affiliation(s)
- Sándor Pongor
- To whom correspondence should be addressed. Fax: +39 04 226 555;
|
13
|
Standley DM, Toh H, Nakamura H. Detecting local structural similarity in proteins by maximizing number of equivalent residues. Proteins 2005; 57:381-91. [PMID: 15340925 DOI: 10.1002/prot.20211] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
A new algorithm for superimposing protein structures based on maximizing the number of spatially equivalent residues is introduced. The algorithm works in three distinct steps. First, the optimal residue map is calculated by structural alignment. By default, the double dynamic programming algorithm, as implemented in the program ASH, was used for the structure alignment step, but we also present results based on alignments imported from three other programs (Dali, CE, and VAST). Second, the structures are spatially superimposed such that the effective number of equivalent residues (NER), that is, the number of aligned residue pairs that can be spatially overlapped, is maximized. The NER score is an analytic, differentiable similarity function that rewards spatially equivalent residues but ignores non-equivalent ones. Maximization of the NER score results in accurate superpositions in cases where root mean square deviation (RMSD) minimization fails. Third, the NER function is used in conjunction with traditional dynamic programming to realign the structures based on the proximity of residues in the superposition. Results are presented for a wide range of superposition problems and compared to results from Dali, CE, and VAST. In addition, several structure-structure pairs that show only partial similarity are discussed, and results are compared to those from the LGA, SARF2, and ThreeCa programs.
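The contrast between an RMSD and a score that "rewards spatially equivalent residues but ignores non-equivalent ones" can be sketched as follows. The Gaussian form and the 4 Å length scale `d0` are assumptions made for illustration; the abstract does not give the actual NER function, and the function names are hypothetical.

```python
import math

def soft_equivalent_count(distances, d0=4.0):
    """Smooth, differentiable count of aligned pairs lying within roughly d0 angstroms.

    Each pair contributes a value between 0 and 1; far-apart pairs
    contribute almost nothing instead of being penalized.
    The Gaussian shape and d0 are illustrative assumptions.
    """
    return sum(math.exp(-(d / d0) ** 2) for d in distances)

def rmsd_from_distances(distances):
    """Root mean square of the per-pair distances after superposition."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

With nine pairs at 0.5 Å and one pair at 30 Å, the RMSD rises from 0.5 Å to about 9.5 Å, whereas the smooth count stays at roughly 8.9 whether or not the outlier is included. This is the sense in which maximizing such a score can succeed where RMSD minimization is dominated by a few non-equivalent residues.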
Affiliation(s)
- Daron M Standley
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka, Japan.
|
14
|
Zhang Z, Kochhar S, Grigorov M. Exploring the sequence-structure protein landscape in the glycosyltransferase family. Protein Sci 2004; 12:2291-302. [PMID: 14500887 PMCID: PMC2366918 DOI: 10.1110/ps.03131303] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.
Affiliation(s)
- Ziding Zhang
- Nestlé Research Center, CH-1000 Lausanne 26, Switzerland.
|
15
|
Marti‐Renom MA, Madhusudhan M, Eswar N, Pieper U, Shen M, Sali A, Fiser A, Mirkovic N, John B, Stuart A. Modeling Protein Structure from its Sequence. Curr Protoc Bioinformatics 2003. [DOI: 10.1002/0471250953.bi0501s03] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Affiliation(s)
- Marc A. Marti‐Renom, M.S. Madhusudhan, Narayanan Eswar, Ursula Pieper, Min‐yi Shen, Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Andras Fiser
- Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York
- Nebojsa Mirkovic, Bino John, Ashley Stuart
- Laboratory of Molecular Biophysics, The Rockefeller University, New York, New York
|
16
|
Wingren C, Edmundson AB, Borrebaeck CAK. Designing proteins to crystallize through beta-strand pairing. Protein Eng Des Sel 2003; 16:255-64. [PMID: 12736368 DOI: 10.1093/proeng/gzg038] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
Inherent difficulties in growing protein crystals are major concerns within structural biology and particularly in structural proteomics. Here, we describe a novel approach of engineering target proteins by surface mutagenesis to increase the odds of crystallizing the molecules. To this end, we have exploited our recent triad-hypothesis using proteins with crystallographically defined beta-structures as the principal models. Crystal packing analyses of 182 protein structures belonging to 21 different superfamilies implied that the propensities to crystallize could be engineered into target proteins by replacing short segments, 5-6 residues, of their beta-strands with 'cassettes' of suitable packing motifs. These packing motifs will generate specific crystal packing interactions that promote crystallization. Key features of the primary and tertiary structures of such packing motifs have been identified for immunoglobulins. Further, packing motifs have been engineered successfully into six model antibodies without disturbing their capabilities to be produced, their immunoreactivity and their overall structure. Preliminary crystallization analyses have also been performed. Taken together, the procedures outline a rational protocol for crystallizing proteins by surface mutagenesis. The importance of these findings is discussed in relation to the crystallization of proteins in general.
Affiliation(s)
- Christer Wingren
- Department of Immunotechnology, Lund University, P.O. Box 7031, SE-220 07 Lund, Sweden.
|
17
|
Yamaguchi A, Iwadate M, Suzuki EI, Yura K, Kawakita S, Umeyama H, Go M. Enlarged FAMSBASE: protein 3D structure models of genome sequences for 41 species. Nucleic Acids Res 2003; 31:463-8. [PMID: 12520053 PMCID: PMC165564 DOI: 10.1093/nar/gkg117] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
Enlarged FAMSBASE is a relational database of comparative protein structure models for the whole genome of 41 species, presented in the GTOP database. The models are calculated by Full Automatic Modeling System (FAMS). Enlarged FAMSBASE provides a wide range of query keys, such as name of ORF (open reading frame), ORF keywords, Protein Data Bank (PDB) ID, PDB heterogen atoms and sequence similarity. Heterogen atoms in PDB include cofactors, ligands and other factors that interact with proteins, and are a good starting point for analyzing interactions between proteins and other molecules. The data may also work as a template for drug design. The present number of ORFs with protein 3D models in FAMSBASE is 183 805, and the database includes an average of three models for each ORF. FAMSBASE is available at http://famsbase.bio.nagoya-u.ac.jp/famsbase/.
Affiliation(s)
- Akihiro Yamaguchi
- Division of Biological Science, Graduate School of Science, Nagoya University, Nagoya 464-8602, Japan
|
18
|
Ortiz AR, Strauss CEM, Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci 2002; 11:2606-21. [PMID: 12381844 PMCID: PMC2373724 DOI: 10.1110/ps.0215902] [Citation(s) in RCA: 320] [Impact Index Per Article: 14.5] [Indexed: 10/27/2022]
Abstract
Advances in structural genomics and protein structure prediction require the design of automatic, fast, objective, and well benchmarked methods capable of comparing and assessing the similarity of low-resolution three-dimensional structures, via experimental or theoretical approaches. Here, a new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low-resolution protein tertiary model. The heuristic algorithm is given and then used to show that it can describe random structural alignments of proteins with different folds with good accuracy by an extreme value distribution. From this observation, a structural similarity score between two proteins or two different conformations of the same protein is derived from the likelihood of obtaining a given structural alignment by chance. The performance of the derived score is then compared with well established, consensus manual-based scores and data sets. We found that the new approach correlates better than other tools with the gold standard provided by a human evaluator. Timings indicate that the algorithm is fast enough for routine use with large databases of protein models. Overall, our results indicate that the new program (MAMMOTH) will be a good tool for protein structure comparisons in structural genomics applications. MAMMOTH is available from our web site at http://physbio.mssm.edu/~ortizg/.
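The step from an extreme value distribution of random alignment scores to a chance-based similarity score can be sketched in Python as below. This is a generic Gumbel (type I extreme value) calculation, not MAMMOTH's implementation: the location `mu` and scale `sigma` are hypothetical parameters that would have to be fitted to alignments of unrelated folds, and taking the negative logarithm of the P-value is an assumed convention.

```python
import math

def gumbel_p_value(raw_score, mu, sigma):
    """Probability that a random alignment scores at least raw_score.

    Assumes random scores follow a Gumbel distribution with location mu
    and scale sigma (both hypothetical, to be fitted on random pairs).
    """
    z = (raw_score - mu) / sigma
    # 1 - exp(-exp(-z)), written with expm1 for accuracy when the result is tiny.
    return -math.expm1(-math.exp(-z))

def significance(raw_score, mu, sigma):
    """Similarity score as -ln(P); larger means less likely to arise by chance."""
    p = gumbel_p_value(raw_score, mu, sigma)
    return math.inf if p == 0.0 else -math.log(p)
```

A raw score equal to `mu` gives a P-value of about 0.63, so its significance is close to zero; scores several scale units above `mu` grow roughly linearly in significance.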
Affiliation(s)
- Angel R Ortiz
- Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York University, New York, New York 10029, USA.
|
19
|
Williams MG, Shirai H, Shi J, Nagendra HG, Mueller J, Mizuguchi K, Miguel RN, Lovell SC, Innis CA, Deane CM, Chen L, Campillo N, Burke DF, Blundell TL, de Bakker PI. Sequence-structure homology recognition by iterative alignment refinement and comparative modeling. Proteins 2002; Suppl 5:92-7. [PMID: 11835486 DOI: 10.1002/prot.1169] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Our approach to fold recognition for the fourth critical assessment of techniques for protein structure prediction (CASP4) experiment involved the use of the FUGUE sequence-structure homology recognition program (http://www-cryst.bioc.cam.ac.uk/fugue), followed by model building. We treat models as hypotheses and examine these to determine whether they explain the available data. Our method depends heavily on environment-specific substitution tables derived from our database of structural alignments of homologous proteins (HOMSTRAD, http://www-cryst.bioc.cam.ac.uk/homstrad/). FUGUE uses these tables to incorporate structural information into profiles created from HOMSTRAD alignments that are matched against a profile created for the target from multiple sequence alignment. In addition, environment-specific substitution tables are used throughout the modeling procedure and as part of the model evaluation. Annotation of sequence alignments with JOY, to reflect local structural features, proved valuable, both for modifying hypotheses, and for rejecting predictions when the expected pattern of conservation is not observed. Our stringency in rejecting incorrect predictions led us to submit a relatively small number of models, including only a low number of false positives, resulting in a high average score.
Affiliation(s)
- M G Williams
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
|
20
|
Pieper U, Eswar N, Stuart AC, Ilyin VA, Sali A. MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res 2002; 30:255-9. [PMID: 11752309 PMCID: PMC99112 DOI: 10.1093/nar/30.1.255] [Citation(s) in RCA: 84] [Impact Index Per Article: 3.8] [Received: 09/18/2001] [Revised: 10/02/2001] [Accepted: 10/02/2001] [Indexed: 11/12/2022] Open
Abstract
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server.
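The inclusion rule and the coverage figure quoted in this abstract can be made concrete with a small Python sketch. The `Model` record and its field names are hypothetical and do not reflect the MODBASE schema; only the E-value cutoff and the sequence counts come from the abstract.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-4  # PSI-BLAST significance threshold quoted in the abstract

@dataclass
class Model:
    # Hypothetical record layout, for illustration only.
    sequence_id: str
    e_value: float
    fold_assessed_correct: bool

def is_reliable(model):
    """Keep a model if its alignment is significant or its fold was assessed as correct."""
    return model.e_value < E_VALUE_CUTOFF or model.fold_assessed_correct

def coverage(n_modeled, n_total):
    """Fraction of unique sequences with at least one reliable model."""
    return n_modeled / n_total
```

For the largest dataset described above, `coverage(304517, 539171)` is about 0.565, so slightly more than half of the TrEMBL sequences had at least one modeled domain.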
Affiliation(s)
- Ursula Pieper
- Laboratories of Molecular Biophysics, The Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
|
21
|
Abstract
Conventional fold recognition techniques rely mainly on the analysis of the entire sequence of a protein. We present an MBA method to improve the performance of any conventional sequence-based fold assignment. The method uses sequence motifs, such as those defined in the Prosite database, and the SwissProt annotation of the fold library. When combined with a simple SDP method, the coverage of MBA is comparable to the results obtained with PSI-BLAST. However, the set of MBA predictions is significantly different from that of PSI-BLAST, leading to a 40% increase in coverage for the combined MBA/PSI-BLAST method. The MBA approach can easily be adapted to include the results of sequence-independent function prediction methods and alternative motif and annotation databases. The method is available through the web server located at http://www.doe-mbi.ucla.edu/mba.
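Why two methods with comparable coverage can still give a large combined gain is a matter of set overlap, which a short Python sketch makes explicit. The function name and the toy identifiers in the usage note are hypothetical; the abstract reports only the resulting 40% figure, not the underlying sets.

```python
def combined_coverage_gain(baseline_hits, extra_hits):
    """Relative increase in coverage when extra_hits are added to baseline_hits.

    Both arguments are collections of identifiers for proteins that
    received a fold assignment from the respective method.
    """
    baseline = set(baseline_hits)
    if not baseline:
        raise ValueError("baseline method assigned no folds")
    combined = baseline | set(extra_hits)
    return (len(combined) - len(baseline)) / len(baseline)
```

If a baseline method assigns folds to ten proteins and a second method also assigns ten, of which four are new, the gain is 0.4, that is, a 40% increase even though neither method alone covers more than the other.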
Affiliation(s)
- L Salwinski
- Department of Chemistry, UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, UCLA, Los Angeles, California 90095-1570, USA
|
22
|
Abstract
Structural genomics projects aim to provide an experimental or computational three-dimensional model structure for all of the tractable macromolecules that are encoded by complete genomes. To this end, pilot centres worldwide are now exploring the feasibility of large-scale structure determination. Their experimental structures and computational models are expected to yield insight into the molecular function and mechanism of thousands of proteins. The pervasiveness of this information is likely to change the use of structure in molecular biology and biochemistry.
Affiliation(s)
- S E Brenner
- Department of Plant and Microbial Biology, University of California, 461A Koshland Hall, Berkeley, California 94720-3102, USA.
|
23
|
Affiliation(s)
- R T Lee
- Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
|
24
|
Carugo O, Pongor S. A normalized root-mean-square distance for comparing protein three-dimensional structures. Protein Sci 2001; 10:1470-3. [PMID: 11420449 PMCID: PMC2374114 DOI: 10.1110/ps.690101] [Citation(s) in RCA: 245] [Impact Index Per Article: 10.7] [Indexed: 10/14/2022]
Abstract
The degree of similarity of two protein three-dimensional structures is usually measured with the root-mean-square distance between equivalent atom pairs. Such a similarity measure depends on the dimension of the proteins, that is, on the number of equivalent atom pairs. The present communication presents a simple procedure to make the root-mean-square distances between pairs of three-dimensional structures independent of their dimensions. This normalization may be useful in evolutionary and fold classification studies as well as in simple comparisons between different structural models.
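The size dependence described here, and one way to remove it, can be sketched in Python. The `rmsd` function is the standard definition; the `rmsd100` rescaling, RMSD / (1 + ln sqrt(N/100)), is the form commonly quoted for this normalization and is stated here as an assumption, since the abstract itself gives no formula.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square distance between equivalent, already superposed atom pairs."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def rmsd100(rmsd_value, n_pairs):
    """Rescale an RMSD to a reference size of 100 atom pairs (assumed formula)."""
    scale = 1.0 + math.log(math.sqrt(n_pairs / 100.0))
    if scale <= 0.0:
        # The logarithmic correction breaks down for very small N (about 13 pairs or fewer).
        raise ValueError("too few pairs for this normalization")
    return rmsd_value / scale
```

Under this rescaling an RMSD of 2.0 Å is unchanged for 100 pairs, shrinks to about 1.18 Å for 400 pairs, and grows to about 6.5 Å for 25 pairs, reflecting that a given RMSD is more impressive for a larger structure.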
Affiliation(s)
- O Carugo
- Protein Structure and Function Group, International Centre for Genetic Engineering and Biotechnology, Area Science Park, Padriciano 99, 34012 Trieste, Italy.
|
25
|
D'Alfonso G, Tramontano A, Lahm A. Structural conservation in single-domain proteins: implications for homology modeling. J Struct Biol 2001; 134:246-56. [PMID: 11551183 DOI: 10.1006/jsbi.2001.4351] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Large-scale sequencing projects are widening the gap between the known protein universe and the fraction for which structural information has been experimentally obtained. Through the application of homology (comparative) modeling and more general structure prediction techniques, this gap can, however, be narrowed, providing indirect structural information for a considerable number of proteins. Moreover, the estimated number of existing protein folds seems to be limited and many of these yet unknown folds should be discovered by dedicated large-scale structural genomics projects. Within this perspective, homology (comparative) modeling will gain in importance, as will the use of models derived by this technique. Here we discuss how well a sequence alignment, the most common starting point for generating a model, reflects the structural conservation between homologous proteins and we show that sequence information is able to direct construction of acceptable models as far as the structural core is concerned. We also show here that the regions surrounding insertions and deletions are much less conserved than the core and discuss the implications of this observation for loop modeling.
|
26
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447185 DOI: 10.1002/cfg.55] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
|
27
|
Erlandsen H, Abola EE, Stevens RC. Combining structural genomics and enzymology: completing the picture in metabolic pathways and enzyme active sites. Curr Opin Struct Biol 2000; 10:719-30. [PMID: 11114510 DOI: 10.1016/s0959-440x(00)00154-8] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/19/2022]
Abstract
An important goal of structural genomics is to complete the structural analysis of all the enzymes in metabolic pathways and to understand the structural similarities and differences. A preliminary glimpse of this type of analysis was achieved before structural genomics efforts with the glycolytic pathway and efforts are underway for many other pathways, including that of catecholamine metabolism. Structural enzymology necessitates a complete structural characterization, even for highly homologous proteins (greater than 80% sequence homology), as every active site has distinct structural features and it is these active site differences that distinguish one enzyme from another. Short cuts with homology modeling cannot be taken with our current knowledge base. Each enzyme structure in a pathway needs to be determined, including structures containing bound substrates, cofactors, products and transition state analogs, in order to obtain a complete structural and functional understanding of pathway-related enzymes.
Affiliation(s)
- H Erlandsen
- The Scripps Research Institute, Department of Molecular Biology, La Jolla, CA 92037, USA
|