1
The First Complete Genome Sequence of a Novel Tetrastichus brontispae RNA Virus-1 (TbRV-1). Viruses 2019; 11:v11030257. [PMID: 30871248 PMCID: PMC6466307 DOI: 10.3390/v11030257] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/19/2019] [Revised: 03/09/2019] [Accepted: 03/10/2019] [Indexed: 11/17/2022] Open
Abstract
The complete sequence of a novel RNA virus isolated from Tetrastichus brontispae (TbRV-1) was determined to be 12,239 nucleotides in length with five non-overlapping, linearly arranged coding sequences (CDS), potentially encoding nucleoproteins, hypothetical proteins, matrix proteins, glycoproteins, and RNA-dependent RNA polymerases. Sequence analysis indicated that the RNA-dependent RNA polymerase of TbRV-1 shares 65% nucleotide and 67% amino acid sequence identity with Hubei dimarhabdovirus 2, suggesting that TbRV-1 is a member of the dimarhabdovirus supergroup. This was consistent with the result of the phylogenetic analysis. The affiliation of TbRV-1 with members of the family Rhabdoviridae was further supported by transcription termination motifs (GGAACUUUUUUU) similar to those of the Drosophila sigmavirus. The prevalence of TbRV-1 in all tissues suggested that the virus was present constitutively and was not specific to any wasp tissue. To our knowledge, this is the first report on the complete genome sequence of a dimarhabdovirus in parasitoids.
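The percent identity figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length and that columns containing a gap are excluded from the denominator; the abstract does not say which convention or tool the authors used, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: both inputs are rows of the same alignment (equal length),
    and columns where either row has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped column: not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy RNA fragments, not taken from the TbRV-1 genome.
    print(percent_identity("ACGU", "ACGA"))
```

Other definitions divide by the full alignment length or by the shorter sequence length, so reported percentages are only comparable when the same convention is used.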
2
Fletcher K, Klosterman SJ, Derevnina L, Martin F, Bertier LD, Koike S, Reyes-Chin-Wo S, Mou B, Michelmore R. Comparative genomics of downy mildews reveals potential adaptations to biotrophy. BMC Genomics 2018; 19:851. [PMID: 30486780 PMCID: PMC6264045 DOI: 10.1186/s12864-018-5214-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 04/30/2018] [Accepted: 10/31/2018] [Indexed: 12/05/2022] Open
Abstract
BACKGROUND: Spinach downy mildew caused by the oomycete Peronospora effusa is a significant burden on the expanding spinach production industry, especially for organic farms where synthetic fungicides cannot be deployed to control the pathogen. P. effusa is highly variable and 15 new races have been recognized in the past 30 years. RESULTS: We virulence phenotyped, sequenced, and assembled two isolates of P. effusa from the Salinas Valley, California, U.S.A. that were identified as races 13 and 14. These assemblies are high quality in comparison to assemblies of other downy mildews: they have low total scaffold counts (784 & 880), high contig N50s (48 kb & 52 kb), high BUSCO completion and low BUSCO duplication scores, and they share many syntenic blocks with Phytophthora species. Comparative analysis of four downy mildew and three Phytophthora species revealed parallel absences of genes encoding conserved domains linked to transporters, pathogenesis, and carbohydrate activity in the biotrophic species. Downy mildews surveyed that have lost the ability to produce zoospores have a common loss of flagella/motor and calcium domain encoding genes. Our phylogenomic data support multiple origins of downy mildews from hemibiotrophic progenitors and suggest that common gene losses in these downy mildews may be of genes involved in the necrotrophic stages of Phytophthora spp. CONCLUSIONS: We present a high-quality draft genome of Peronospora effusa that will serve as a reference for Peronospora spp. We identified several Pfam domains as under-represented in the downy mildews consistent with the loss of zoosporegenesis and necrotrophy. Phylogenomics provides further support for a polyphyletic origin of downy mildews.
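The contig N50 values cited in this abstract follow the standard assembly statistic: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The sketch below is a minimal illustration of that definition; the function name `n50` and the example lengths are hypothetical and are not the P. effusa data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length covers half the assembly.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Made-up contig lengths in kb, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because the statistic depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract reports it alongside BUSCO completion and duplication scores.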
Affiliation(s)
- Kyle Fletcher
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, 451 East Health Sciences Drive, Davis, CA 95616 USA
- Steven J. Klosterman
- United States Department of Agriculture, Agricultural Research Service, Salinas, CA 93905 USA
- Lida Derevnina
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, 451 East Health Sciences Drive, Davis, CA 95616 USA
- Present Address: The Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH UK
- Frank Martin
- United States Department of Agriculture, Agricultural Research Service, Salinas, CA 93905 USA
- Lien D. Bertier
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, 451 East Health Sciences Drive, Davis, CA 95616 USA
- Steven Koike
- UC Davis Cooperative Extension Monterey County, Salinas, CA 93901 USA
- Present Address: TriCal Diagnostics, Hollister, CA 95023 USA
- Sebastian Reyes-Chin-Wo
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, 451 East Health Sciences Drive, Davis, CA 95616 USA
- Beiquan Mou
- United States Department of Agriculture, Agricultural Research Service, Salinas, CA 93905 USA
- Richard Michelmore
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, 451 East Health Sciences Drive, Davis, CA 95616 USA
- Departments of Plant Sciences, Molecular & Cellular Biology, Medical Microbiology & Immunology, University of California, Davis, 95616 USA
3
Siragusa L, Cross S, Baroni M, Goracci L, Cruciani G. BioGPS: Navigating biological space to predict polypharmacology, off-targeting, and selectivity. Proteins 2015; 83:517-32. [DOI: 10.1002/prot.24753] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 07/18/2014] [Revised: 12/09/2014] [Accepted: 12/13/2014] [Indexed: 12/12/2022]
Affiliation(s)
- Lydia Siragusa
- Laboratory for Chemometrics and Molecular Modeling, Department of Chemistry, Biology and Biotechnology; University of Perugia; Perugia 06123 Italy
- Simon Cross
- Molecular Discovery Limited; Pinner, Middlesex, London HA5 5NE United Kingdom
- Massimo Baroni
- Molecular Discovery Limited; Pinner, Middlesex, London HA5 5NE United Kingdom
- Laura Goracci
- Laboratory for Chemometrics and Molecular Modeling, Department of Chemistry, Biology and Biotechnology; University of Perugia; Perugia 06123 Italy
- Gabriele Cruciani
- Laboratory for Chemometrics and Molecular Modeling, Department of Chemistry, Biology and Biotechnology; University of Perugia; Perugia 06123 Italy
4
Ribeiro JV, Cerqueira NMFSA, Fernandes PA, Ramos MJ. chem-path-tracker: An Automated Tool to Analyze Chemical Motifs in Molecular Structures. Chem Biol Drug Des 2014; 84:44-53. [DOI: 10.1111/cbdd.12349] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/05/2014] [Revised: 04/16/2014] [Accepted: 04/20/2014] [Indexed: 01/25/2023]
Affiliation(s)
- João V. Ribeiro
- REQUIMTE; Departamento de Química e Bioquímica; Faculdade de Ciências; Universidade do Porto; Rua do Campo Alegre s/n Porto 4169-007 Portugal
- N. M. F. S. A. Cerqueira
- REQUIMTE; Departamento de Química e Bioquímica; Faculdade de Ciências; Universidade do Porto; Rua do Campo Alegre s/n Porto 4169-007 Portugal
- Pedro A. Fernandes
- REQUIMTE; Departamento de Química e Bioquímica; Faculdade de Ciências; Universidade do Porto; Rua do Campo Alegre s/n Porto 4169-007 Portugal
- Maria J. Ramos
- REQUIMTE; Departamento de Química e Bioquímica; Faculdade de Ciências; Universidade do Porto; Rua do Campo Alegre s/n Porto 4169-007 Portugal
5
Doppelt-Azeroual O, Delfaud F, Moriaud F, de Brevern AG. Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins. Protein Sci 2010; 19:847-67. [PMID: 20162627 DOI: 10.1002/pro.364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein Data Bank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins, which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures lacking cocrystallized ligands with the purine clusters, enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.
Affiliation(s)
- Olivia Doppelt-Azeroual
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France.
6
Wu CY, Chen YC, Lim C. A structural-alphabet-based strategy for finding structural motifs across protein families. Nucleic Acids Res 2010; 38:e150. [PMID: 20525797 PMCID: PMC2919736 DOI: 10.1093/nar/gkq478] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023] Open
Abstract
Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign.
Affiliation(s)
- Chih Yuan Wu
- Department of Chemistry, National Tsing Hua University, Hsinchu, Taiwan
7
Tendulkar AV, Krallinger M, de la Torre V, López G, Wangikar PP, Valencia A. FragKB: structural and literature annotation resource of conserved peptide fragments and residues. PLoS One 2010; 5:e9679. [PMID: 20305778 PMCID: PMC2841175 DOI: 10.1371/journal.pone.0009679] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/01/2009] [Accepted: 02/12/2010] [Indexed: 01/21/2023] Open
Abstract
Background FragKB (Fragment Knowledgebase) is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. Methodology FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. Significance The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.
Affiliation(s)
- Ashish V Tendulkar
- Structural Biology and Biocomputing Programme, Spanish National Cancer Center, Madrid, Spain.
8
Hvidsten TR, Lægreid A, Kryshtafovych A, Andersson G, Fidelis K, Komorowski J. A comprehensive analysis of the structure-function relationship in proteins based on local structure similarity. PLoS One 2009; 4:e6266. [PMID: 19603073 PMCID: PMC2705683 DOI: 10.1371/journal.pone.0006266] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Received: 10/07/2008] [Accepted: 06/10/2009] [Indexed: 12/22/2022] Open
Abstract
Background Sequence similarity to characterized proteins provides testable functional hypotheses for less than 50% of the proteins identified by genome sequencing projects. With structural genomics it is believed that structural similarities may give functional hypotheses for many of the remaining proteins. Methodology/Principal Findings We provide a systematic analysis of the structure-function relationship in proteins using the novel concept of local descriptors of protein structure. A local descriptor is a small substructure of a protein which includes both short- and long-range interactions. We employ a library of commonly reoccurring local descriptors general enough to assemble most existing protein structures. We then model the relationship between these local shapes and Gene Ontology using rule-based learning. Our IF-THEN rule model offers legible, high resolution descriptions that combine local substructures and is able to discriminate functions even for functionally versatile folds such as the frequently occurring TIM barrel and Rossmann fold. By evaluating the predictive performance of the model, we provide a comprehensive quantification of the structure-function relationship based only on local structure similarity. Our findings are, among others, that conserved structure is a stronger prerequisite for enzymatic activity than for binding specificity, and that structure-based predictions complement sequence-based predictions. The model is capable of generating correct hypotheses, as confirmed by a literature study, even when no significant sequence similarity to characterized proteins exists. Conclusions/Significance Our approach offers a new and complete description and quantification of the structure-function relationship in proteins. By demonstrating how our predictions offer higher sensitivity than using global structure, and complement the use of sequence, we show that the presented ideas could advance the development of meta-servers in function prediction.
Affiliation(s)
- Torgeir R. Hvidsten
- The Linnaeus Centre for Bioinformatics, Uppsala University and The Swedish University for Agricultural Sciences, Uppsala, Sweden
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
- Astrid Lægreid
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, St. Olavs Hospital HF, Trondheim, Norway
- Gunnar Andersson
- The Linnaeus Centre for Bioinformatics, Uppsala University and The Swedish University for Agricultural Sciences, Uppsala, Sweden
- Department of Chemistry, Environment and Feed Hygiene, National Veterinary Institute, Uppsala, Sweden
- Jan Komorowski
- The Linnaeus Centre for Bioinformatics, Uppsala University and The Swedish University for Agricultural Sciences, Uppsala, Sweden
- Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warszawa, Poland
9
Fox-Erlich S, Schiller MR, Gryk MR. Structural conservation of a short, functional, peptide-sequence motif. Front Biosci (Landmark Ed) 2009; 14:1143-51. [PMID: 19273121 DOI: 10.2741/3299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
Abstract
Full-length eukaryotic proteins generally consist of several autonomously folding and functioning domains. Many of these domains are known to function by binding and/or modifying other partner proteins based on the recognition of a short, linear amino acid sequence contained within the target protein. This article reviews the many bioinformatic tools and resources which discover, define and catalogue the various known protein domains as well as assist users by identifying domain signatures within proteins of interest. We also review the smaller subset of bioinformatic tools which catalogue and help identify the short linear motifs used for domain targeting. It has been suggested that these short, functional, peptide-sequence motifs are normally found in unstructured regions of the target. The role of protein structure in the activity of one representative of these short, functional motifs is explored through an examination of known structures deposited in the Protein Data Bank.
Affiliation(s)
- Susan Fox-Erlich
- Department of Molecular, Microbial and Structural Biology, University of Connecticut Health Center, 263 Farmington Avenue, Farmington, CT 06030-3305, USA
10
Chien TY, Chang DTH, Chen CY, Weng YZ, Hsu CM. E1DS: catalytic site prediction based on 1D signatures of concurrent conservation. Nucleic Acids Res 2008; 36:W291-6. [PMID: 18524800 PMCID: PMC2447799 DOI: 10.1093/nar/gkn324] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/03/2008] [Revised: 04/25/2008] [Accepted: 05/07/2008] [Indexed: 11/21/2022] Open
Abstract
Large-scale automatic annotation of protein sequences remains challenging in the postgenomic era. E1DS is designed for annotating enzyme sequences based on a repository of 1D signatures. The employed sequence signatures are derived using a novel pattern mining approach that discovers long motifs consisting of several sequential blocks (conserved segments). Each of the sequential blocks is considerably conserved among the protein members of an EC group. Moreover, a signature includes at least three sequential blocks that are concurrently conserved, i.e. frequently observed together in sequences. In other words, a sequence signature consists of residues from multiple regions of the protein sequence, which echoes the observation that an enzyme catalytic site is usually constituted of residues that are widely separated in the sequence. E1DS currently contains 5421 sequence signatures that in total cover 932 four-digit EC numbers. E1DS is evaluated based on a collection of enzymes with catalytic sites annotated in the Catalytic Site Atlas. When compared to the well-known pattern database PROSITE, predictions based on E1DS signatures are more sensitive in identifying catalytic sites and the involved residues. E1DS is available at http://e1ds.ee.ncku.edu.tw/ and a mirror site can be found at http://e1ds.csbb.ntu.edu.tw/.
Affiliation(s)
- Ting-Ying Chien
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Darby Tien-Hao Chang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Chien-Yu Chen
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Yi-Zhong Weng
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Chen-Ming Hsu
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
11
Rossi KA, Weigelt CA, Nayeem A, Krystek SR. Loopholes and missing links in protein modeling. Protein Sci 2007; 16:1999-2012. [PMID: 17660258 PMCID: PMC2206982 DOI: 10.1110/ps.072887807] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 03/21/2007] [Revised: 06/08/2007] [Accepted: 06/09/2007] [Indexed: 10/23/2022]
Abstract
This paper provides an unbiased comparison of four commercially available programs for loop sampling, Prime, Modeler, ICM, and Sybyl, each of which uses a different modeling protocol. The study assesses the quality of results and examines the relative strengths and weaknesses of each method. The set of loops to be modeled varied in length from 4 to 12 amino acids. The approaches used for loop modeling can be classified into two methodologies: ab initio loop generation (Modeler and Prime) and database searches (Sybyl and ICM). Comparison of the modeled loops to the native structures was used to determine the accuracy of each method. All of the protocols returned similar results for short loop lengths (four to six residues), but as loop length increased, the quality of the results varied among the programs. Prime generated loops with RMSDs <2.5 Å for loops up to 10 residues, while the other three methods met the 2.5 Å criterion at seven-residue loops. Additionally, the ability of the software to utilize disulfide bonds and X-ray crystal packing influenced the quality of the results. In the final analysis, the top-ranking loop from each program was rarely the loop with the lowest RMSD with respect to the native template, revealing a weakness of all programs in correctly ranking the modeled loops.
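The RMSD measure used in this abstract to compare modeled loops with native structures can be sketched as follows. This is a minimal illustration that assumes both coordinate sets list the same atoms in the same order and already share a reference frame, as when a loop is rebuilt on a fixed protein framework; it does no superposition, and it is not the evaluation code of the paper. The function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumption: the structures are already in a common frame, so no
    optimal superposition (for example Kabsch alignment) is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two made-up atom positions per structure, in angstroms.
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
    value = rmsd(native, model)
    print(value, value < 2.5)  # compare against the 2.5 angstrom cutoff
```

Ranking candidate loops by this value requires the native structure, which is why the abstract treats agreement between each program's own scoring and the lowest-RMSD loop as a separate question.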
Affiliation(s)
- Karen A Rossi
- Computer-Assisted Drug Design, Pharmaceutical Research Institute, Bristol-Myers Squibb Company, Princeton, New Jersey 08543, USA.
12
Griko NB, Rose-Young L, Zhang X, Carpenter L, Candas M, Ibrahim MA, Junker M, Bulla LA. Univalent Binding of the Cry1Ab Toxin of Bacillus thuringiensis to a Conserved Structural Motif in the Cadherin Receptor BT-R1. Biochemistry 2007; 46:10001-7. [PMID: 17696320 DOI: 10.1021/bi700769s] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
The Cry1Ab toxin produced by Bacillus thuringiensis (Bt) exerts insecticidal action upon binding to BT-R1, a cadherin receptor localized in the midgut epithelium of the tobacco hornworm Manduca sexta [Dorsch, J. A., Candas, M., Griko, N. B., Maaty, W. S., Midboe, E. G., Vadlamudi, R. K., and Bulla, L. A., Jr. (2002) Cry1A toxins of Bacillus thuringiensis bind specifically to a region adjacent to the membrane-proximal extracellular domain of BT-R1 in Manduca sexta: involvement of a cadherin in the entomopathogenicity of Bacillus thuringiensis, Insect Biochem. Mol. Biol. 32, 1025-1036]. BT-R1 represents a family of invertebrate cadherins whose ectodomains (ECs) are composed of multiple cadherin repeats (EC1 through EC12). In the present work, we determined the Cry1Ab toxin binding site in BT-R1 in the context of cadherin structural determinants. Our studies revealed a conserved structural motif for toxin binding that includes two distinct regions within the N- and C-termini of EC12. These regions are characterized by unique sequence signatures that mark the toxin-binding function in BT-R1 as well as in homologous lepidopteran cadherins. Structure modeling of EC12 discloses the conserved motif as a single broad interface that holds the N- and C-termini in close proximity. Binding of toxin to BT-R1, which is univalent, and the subsequent downstream molecular events responsible for cell death depend on the conserved motif in EC12.
13
Abstract
It has long been recognized that knowledge of the 3D structures of proteins has the potential to accelerate drug discovery, but recent developments in genome sequencing, robotics and bioinformatics have radically transformed the opportunities. Many new protein targets have been identified from genome analyses and studied by X-ray analysis or NMR spectroscopy. Structural biology has been instrumental in directing not only lead optimization and target identification, where it has well-established roles, but also lead discovery, now that high-throughput methods of structure determination can provide powerful approaches to screening.
Affiliation(s)
- Miles Congreve
- Astex Technology, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, UK
14
Blundell TL, Sibanda BL, Montalvão RW, Brewerton S, Chelliah V, Worth CL, Harmer NJ, Davies O, Burke D. Structural biology and bioinformatics in drug design: opportunities and challenges for target identification and lead discovery. Philos Trans R Soc Lond B Biol Sci 2006; 361:413-23. [PMID: 16524830 PMCID: PMC1609333 DOI: 10.1098/rstb.2005.1800] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.2] [Indexed: 11/12/2022] Open
Abstract
Impressive progress in genome sequencing, protein expression and high-throughput crystallography and NMR has radically transformed the opportunities to use protein three-dimensional structures to accelerate drug discovery, but the quantity and complexity of the data have ensured a central place for informatics. Structural biology and bioinformatics have assisted in lead optimization and target identification where they have well established roles; they can now contribute to lead discovery, exploiting high-throughput methods of structure determination that provide powerful approaches to screening of fragment binding.
Affiliation(s)
- Tom L Blundell
- Department of Biochemistry, University of Cambridge 80 Tennis Court Road, Cambridge CB2 1GA, UK.
15
Kristensen DM, Chen BY, Fofanov VY, Ward RM, Lisewski AM, Kimmel M, Kavraki LE, Lichtarge O. Recurrent use of evolutionary importance for functional annotation of proteins based on local structural similarity. Protein Sci 2006; 15:1530-6. [PMID: 16672239 PMCID: PMC2242527 DOI: 10.1110/ps.062152706] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
Abstract
The annotation of protein function has not kept pace with the exponential growth of raw sequence and structure data. An emerging solution to this problem is to identify 3D motifs or templates in protein structures that are necessary and sufficient determinants of function. Here, we demonstrate the recurrent use of evolutionary trace information to construct such 3D templates for enzymes, search for them in other structures, and distinguish true from spurious matches. Serine protease templates built from evolutionarily important residues distinguish between proteases and other proteins nearly as well as the classic Ser-His-Asp catalytic triad. In 53 enzymes spanning 33 distinct functions, an automated pipeline identifies functionally related proteins with an average positive predictive power of 62%, including correct matches to proteins with the same function but with low sequence identity (the average identity for some templates is only 17%). Although these template building, searching, and match classification strategies are not yet optimized, their sequential implementation demonstrates a functional annotation pipeline which does not require experimental information, but only local molecular mimicry among a small number of evolutionarily important residues.
Affiliation(s)
- David M Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
16
Marsden RL, Lee D, Maibaum M, Yeats C, Orengo CA. Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space. Nucleic Acids Res 2006; 34:1066-80. [PMID: 16481312 PMCID: PMC1373602 DOI: 10.1093/nar/gkj494] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022] Open
Abstract
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
17
|
Zhong W, Altun G, Harrison R, Tai PC, Pan Y. Improved K-means clustering algorithm for exploring local protein sequence motifs representing common structural property. IEEE Trans Nanobioscience 2005; 4:255-65. [PMID: 16220690 DOI: 10.1109/tnb.2005.853667] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.0] [Indexed: 11/06/2022]
Abstract
Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformation and activities of proteins. In this work, recurring sequence motifs of proteins are explored with an improved K-means clustering algorithm on a new dataset. The structural similarity of these recurring sequence clusters to produce sequence motifs is studied in order to evaluate the relationship between sequence motifs and their structures. To the best of our knowledge, the dataset used by our research is the most updated dataset among similar studies for sequence motifs. A new greedy initialization method for the K-means algorithm is proposed to improve traditional K-means clustering techniques. The new initialization method tries to choose suitable initial points, which are well separated and have the potential to form high-quality clusters. Our experiments indicate that the improved K-means algorithm satisfactorily increases the percentage of sequence segments belonging to clusters with high structural similarity. Careful comparison of sequence motifs obtained by the improved and traditional algorithms also suggests that the improved K-means clustering algorithm may discover some relatively weak and subtle sequence motifs, which are undetectable by the traditional K-means algorithms. Many biochemical tests reported in the literature show that these sequence motifs are biologically meaningful. Experimental results also indicate that the improved K-means algorithm generates more detailed sequence motifs representing common structures than previous research. Furthermore, these motifs are universally conserved sequence patterns across protein families, overcoming some weak points of other popular sequence motifs. 
The satisfactory result of the experiment suggests that this new K-means algorithm may be applied to other areas of bioinformatics research in order to explore the underlying relationships between data samples more effectively.
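The abstract describes the initialization only in outline (initial points that are well separated), so the following is a minimal sketch under stated assumptions rather than the authors' procedure. The function names, the `min_dist` threshold, and the toy 2D points are all hypothetical; the paper's quality check on candidate clusters is not reproduced here.

```python
import math
import random

def greedy_separated_seeds(points, k, min_dist, rng=None):
    """Pick up to k seeds so that every pair is at least min_dist apart.

    Hypothetical helper: candidates are visited in random order and
    accepted greedily. A stand-in for the paper's initialization.
    """
    rng = rng or random.Random(0)
    order = list(points)
    rng.shuffle(order)
    seeds = []
    for p in order:
        if all(math.dist(p, s) >= min_dist for s in seeds):
            seeds.append(p)
            if len(seeds) == k:
                break
    return seeds

def kmeans(points, seeds, iterations=20):
    """Plain Lloyd iterations starting from the given seeds."""
    centers = [tuple(s) for s in seeds]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

# Toy data: three tight groups, far apart from each other.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
seeds = greedy_separated_seeds(points, k=3, min_dist=3.0)
centers = kmeans(points, seeds)
```

Because the separation threshold exceeds the spread within each toy group, every seed lands in a different group whatever the visiting order, which is the property the abstract attributes to well-separated initial points.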
Affiliation(s)
- Wei Zhong
- Computer Science Department, Georgia State University, Atlanta, GA 30303-4110, USA.
|
18
|
Wang K, Samudrala R. FSSA: a novel method for identifying functional signatures from structural alignments. Bioinformatics 2005; 21:2969-77. [PMID: 15860561 DOI: 10.1093/bioinformatics/bti471] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: It is commonly believed that sequence determines structure, which in turn determines function. However, the presence of many proteins with the same structural fold but different functions suggests that global structure and function do not always correlate well. RESULTS: We propose a method for accurate functional annotation, based on identification of functional signatures from structural alignments (FSSA) using the Structural Classification of Proteins (SCOP) database. The FSSA method is superior at function discrimination and classification compared with several methods that directly inherit functional annotation information from homology inference, such as Smith-Waterman, PSI-BLAST, hidden Markov models and structure comparison methods, for a large number of structural fold families. Our results indicate that the contributions of amino acid residue types and positions to structure and function are largely separable for proteins in multi-functional fold families.
Affiliation(s)
- Kai Wang
- Computational Genomics Group, Department of Microbiology, University of Washington Seattle, WA 98195, USA
|
19
|
Laskowski RA, Chistyakov VV, Thornton JM. PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res 2005; 33:D266-8. [PMID: 15608193 PMCID: PMC539955 DOI: 10.1093/nar/gki001] [Citation(s) in RCA: 326] [Impact Index Per Article: 16.3] [Indexed: 12/04/2022] Open
Abstract
PDBsum is a database of mainly pictorial summaries of the 3D structures of proteins and nucleic acids in the Protein Data Bank. Its pages aim to provide an at-a-glance view of the contents of every 3D structure, plus detailed structural analyses of each protein chain, DNA–RNA chain and any bound ligands and metals. In the past year, the database has been significantly improved, in terms of both appearance and new content. Moreover, it has moved to its new address at http://www.ebi.ac.uk/thornton-srv/databases/pdbsum.
Affiliation(s)
- Roman A Laskowski
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
20
|
Vendruscolo M, Dobson CM. A glimpse at the organization of the protein universe. Proc Natl Acad Sci U S A 2005; 102:5641-2. [PMID: 15827120 PMCID: PMC556289 DOI: 10.1073/pnas.0500274102] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022] Open
Affiliation(s)
- Michele Vendruscolo
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom.
|
21
|
Hoffmann B, Eichmüller C, Steinhauser O, Konrat R. Rapid Assessment of Protein Structural Stability and Fold Validation via NMR. Methods Enzymol 2005; 394:142-75. [PMID: 15808220 DOI: 10.1016/s0076-6879(05)94006-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 02/12/2023]
Abstract
In structural proteomics, it is necessary to efficiently screen in a high-throughput manner for the presence of stable structures in proteins that can be subjected to subsequent structure determination by X-ray or NMR spectroscopy. Here we illustrate that the ¹H chemical distribution in a protein as detected by ¹H NMR spectroscopy can be used to probe protein structural stability (e.g., the presence of stable protein structures) of proteins in solution. Based on experimental data obtained on well-structured proteins and proteins that exist in a molten globule state or a partially folded alpha-helical state, a well-defined threshold exists that can be used as a quantitative benchmark for protein structural stability (e.g., foldedness) in solution. Additionally, in this chapter we describe a largely automated strategy for rapid fold validation and structure-based backbone signal assignment. Our methodology is based on a limited number of NMR experiments (e.g., HNCA and 3D NOESY-HSQC) and performs a Monte Carlo-type optimization. The novel feature of the method is the opportunity to screen for structural fragments (e.g., template scanning). The performance of this new validation tool is demonstrated with applications to a diverse set of proteins.
Affiliation(s)
- Bernd Hoffmann
- Institute of Theoretical Chemistry and Molecular Structural Biology, University of Vienna, Austria
|
22
|
Barzilai A, Kumar S, Wolfson H, Nussinov R. Potential folding-function interrelationship in proteins. Proteins 2004; 56:635-49. [PMID: 15281117 DOI: 10.1002/prot.20132] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
The possibility is addressed that protein folding and function may be related via regions that are critical for both folding and function. This approach is based on the building blocks folding model that describes protein folding as binding events of conformationally fluctuating building blocks. Within these, we identify building block fragments that are critical for achieving the native fold. A library of such critical building blocks (CBBs) is constructed. Then, it is asked whether the functionally important residues fall in these CBB fragments. We find that for over two-thirds of the proteins in our library with available functional information, the catalytic or binding site residues lie within the CBB regions. From the evolutionary standpoint, a folding-function relationship is advantageous, since the need to guard against mutations is limited to one region. Furthermore, conformationally similar CBBs are found in globally unrelated proteins with different functions. Hence, substituting CBBs may lead to designed proteins with altered functions. We further find that the CBBs in our library are conformationally unstable.
Affiliation(s)
- Adi Barzilai
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
23
|
Chelliah V, Chen L, Blundell TL, Lovell SC. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol 2004; 342:1487-504. [PMID: 15364576 DOI: 10.1016/j.jmb.2004.08.022] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.9] [Received: 11/27/2003] [Revised: 07/20/2004] [Accepted: 08/09/2004] [Indexed: 11/29/2022]
Abstract
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions.
Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.
Affiliation(s)
- Vijayalakshmi Chelliah
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
|
24
|
Brakoulias A, Jackson RM. Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching. Proteins 2004; 56:250-60. [PMID: 15211509 DOI: 10.1002/prot.20123] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.4] [Indexed: 11/11/2022]
Abstract
A method is described for the rapid comparison of protein binding sites using geometric matching to detect similar three-dimensional structure. The geometric matching detects common atomic features through identification of the maximum common sub-graph or clique. These features are not necessarily evident from sequence or from global structural similarity, giving additional insight into molecular recognition not evident from current sequence or structural classification schemes. Here we use the method to produce an all-against-all comparison of phosphate binding sites in a number of different nucleotide phosphate-binding proteins. The similarity search is combined with clustering of similar sites to allow a preliminary structural classification. Clustering by site similarity produces a classification of binding sites for the 476 representative local environments producing ten main clusters representing half of the representative environments. The similarities make sense in terms of both structural and functional classification schemes. The ten main clusters represent a very limited number of unique structural binding motifs for phosphate. These are the structural P-loop, di-nucleotide binding motif [FAD/NAD(P)-binding and Rossmann-like fold] and FAD-binding motif. Similar classification schemes for nucleotide binding proteins have also been arrived at independently by others using different methods.
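The clique formulation mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the atom labels, the distance tolerance `tol`, the toy coordinates, and both function names are assumptions made for illustration, and the search is a basic Bron-Kerbosch enumeration that only suits small sites.

```python
import math
from itertools import combinations

def correspondence_graph(site_a, site_b, tol=0.5):
    """Nodes pair an atom of A with an atom of B that share a label.

    Two nodes are joined when the A-A and B-B distances agree within tol,
    so a clique is a set of pairings with mutually consistent geometry.
    """
    nodes = [(i, j) for i, (la, _) in enumerate(site_a)
             for j, (lb, _) in enumerate(site_b) if la == lb]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # an atom may be matched only once
        da = math.dist(site_a[i1][1], site_a[i2][1])
        db = math.dist(site_b[j1][1], site_b[j2][1])
        if abs(da - db) <= tol:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj

def max_clique(adj):
    """Basic Bron-Kerbosch search without pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best

# Toy sites: B is A translated by (10, 10, 10), reordered, plus a stray oxygen.
site_a = [("P", (0, 0, 0)), ("O", (1.5, 0, 0)), ("N", (0, 3, 0))]
site_b = [("O", (11.5, 10, 10)), ("P", (10, 10, 10)),
          ("N", (10, 13, 10)), ("O", (20, 20, 20))]
match = sorted(max_clique(correspondence_graph(site_a, site_b)))
```

Here `match` holds index pairs (atom in A, atom in B). Matching on inter-atomic distances alone cannot tell a site from its mirror image, so a practical tool would need an additional chirality or superposition check.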
Affiliation(s)
- Andreas Brakoulias
- Department of Biochemistry & Molecular Biology, University College London, Gower Street, London, England
|
25
|
Wei L, Altman RB. Recognizing complex, asymmetric functional sites in protein structures using a Bayesian scoring function. J Bioinform Comput Biol 2004; 1:119-38. [PMID: 15290784 DOI: 10.1142/s0219720003000150] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Received: 12/03/2002] [Revised: 03/03/2003] [Accepted: 03/03/2003] [Indexed: 11/18/2022]
Abstract
The increase in known three-dimensional protein structures enables us to build statistical profiles of important functional sites in protein molecules. These profiles can then be used to recognize sites in large-scale automated annotations of new protein structures. We report an improved FEATURE system which recognizes functional sites in protein structures. FEATURE defines multi-level physico-chemical properties and recognizes sites based on the spatial distribution of these properties in the sites' microenvironments. It uses a Bayesian scoring function to compare a query region with the statistical profile built from known examples of sites and control nonsites. We have previously shown that FEATURE can accurately recognize calcium-binding sites and have reported interesting results scanning for calcium-binding sites in the entire Protein Data Bank. Here we report the ability of the improved FEATURE to characterize and recognize geometrically complex and asymmetric sites such as ATP-binding sites and disulfide bond-forming sites. FEATURE does not rely on conserved residues or conserved residue geometry of the sites. We also demonstrate that, in the absence of a statistical profile of the sites, FEATURE can use an artificially constructed profile based on a priori knowledge to recognize the sites in new structures, using redoxin active sites as an example.
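To make the idea of scoring a query region against site and nonsite profiles concrete, here is a simplified naive-Bayes log-odds sketch. It is an assumption-laden illustration, not the FEATURE scoring function: the property names, the two-bin histograms, the pseudocount, and the independence assumption between properties are all hypothetical.

```python
import math

def log_odds_score(query_bins, site_counts, nonsite_counts, pseudocount=1.0):
    """Sum log P(bin | site) / P(bin | nonsite) over all properties.

    query_bins: {property: observed bin index for the query region}
    site_counts / nonsite_counts: {property: [training count per bin]}
    Properties are treated as independent, a simplifying assumption.
    """
    score = 0.0
    for prop, b in query_bins.items():
        s, n = site_counts[prop], nonsite_counts[prop]
        p_site = (s[b] + pseudocount) / (sum(s) + pseudocount * len(s))
        p_non = (n[b] + pseudocount) / (sum(n) + pseudocount * len(n))
        score += math.log(p_site / p_non)
    return score

# Made-up training histograms: bin 0 = "low", bin 1 = "high".
site_counts = {"oxygen_shell1": [2, 18], "charge_shell2": [5, 15]}
nonsite_counts = {"oxygen_shell1": [15, 5], "charge_shell2": [10, 10]}

site_like = log_odds_score({"oxygen_shell1": 1, "charge_shell2": 1},
                           site_counts, nonsite_counts)
background_like = log_odds_score({"oxygen_shell1": 0, "charge_shell2": 0},
                                 site_counts, nonsite_counts)
```

A positive score means the query looks more like the known sites than the control nonsites under these toy counts; the pseudocount keeps an empty bin from producing a zero probability.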
Affiliation(s)
- Liping Wei
- Nexus Genomics, Inc., 229 Polaris Ave., Suite 6, Mountain View, CA 94043, USA.
|
26
|
Selvarani P, Shanthi V, Rajesh CK, Saravanan S, Sekar K. BSDD: Biomolecules Segment Display Device--a web-based interactive display tool. Nucleic Acids Res 2004; 32:W645-8. [PMID: 15215468 PMCID: PMC441558 DOI: 10.1093/nar/gkh420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
An interactive web-based display tool, Biomolecules Segment Display Device (BSDD), has been developed to search for and visualize a user-defined motif or fragment among the protein structures available in the Protein Data Bank (PDB). In addition, the tool works for the structures available in a selected sub-set of non-homologous protein structures (25% and 90% sequence identity). The graphics package RASMOL has been incorporated as an interface to visualize the three-dimensional structure of the user-defined motif. In addition, the software can be used to extract the atomic coordinates of the required fragment and save them to the client system. The atomic coordinates are updated every week from the RCSB-PDB server, and hence the results produced by BSDD are up to date at any given time. The software BSDD is available over the World Wide Web at http://iris.physics.iisc.ernet.in/bsdd or http://144.16.71.2/bsdd.
Affiliation(s)
- P Selvarani
- Bioinformatics Centre, Indian Institute of Science, Bangalore 560 012, India
|
27
|
Shapiro J, Brutlag D. FoldMiner: structural motif discovery using an improved superposition algorithm. Protein Sci 2004; 13:278-94. [PMID: 14691242 PMCID: PMC2286532 DOI: 10.1110/ps.03239404] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Indexed: 10/26/2022]
Abstract
We report an unsupervised structural motif discovery algorithm, FoldMiner, which is able to detect global and local motifs in a database of proteins without the need for multiple structure or sequence alignments and without relying on prior classification of proteins into families. Motifs, which are discovered from pairwise superpositions of a query structure to a database of targets, are described probabilistically in terms of the conservation of each secondary structure element's position and are used to improve detection of distant structural relationships. During each iteration of the algorithm, the motif is defined from the current set of homologs and is used both to recruit additional homologous structures and to discard false positives. FoldMiner thus achieves high specificity and sensitivity by distinguishing between homologous and nonhomologous structures by the regions of the query to which they align. We find that when two proteins of the same fold are aligned, highly conserved secondary structure elements in one protein tend to align to highly conserved elements in the second protein, suggesting that FoldMiner consistently identifies the same motif in members of a fold. Structural alignments are performed by an improved superposition algorithm, LOCK 2, which detects distant structural relationships by placing increased emphasis on the alignment of secondary structure elements. LOCK 2 obeys several properties essential in automated analysis of protein structure: It is symmetric, its alignments of secondary structure elements are transitive, its alignments of residues display a high degree of transitivity, and its scoring system is empirically found to behave as a metric.
Affiliation(s)
- Jessica Shapiro
- Biophysics Program and Department of Biochemistry, Stanford University, Stanford, California 94305-5307, USA
|
28
|
Abstract
Here we describe various methods currently under development aimed at identifying a protein's function from its three-dimensional structure. We are combining a number of these methods to create a pipeline of applications, called ProFunc, which will take a given 3D structure, run all the applications on it and compile and summarise the results obtained. The aim is to provide a best guess as to the protein's function from the evidence provided by the different methods. Here we present three examples, using structures solved by the Midwest Center for Structural Genomics consortium, illustrating the strengths and weaknesses of current approaches.
Affiliation(s)
- Roman A Laskowski
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
29
|
Via A, Helmer-Citterich M. A structural study for the optimisation of functional motifs encoded in protein sequences. BMC Bioinformatics 2004; 5:50. [PMID: 15119965 PMCID: PMC420233 DOI: 10.1186/1471-2105-5-50] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 02/05/2004] [Accepted: 04/30/2004] [Indexed: 11/23/2022] Open
Abstract
Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins that are true positives for a pattern are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available.
The computational technique for the identification of structurally conserved residues is already available on request and will be soon accessible on our web server. The procedure is intended for the use of pattern database curators and of scientists interested in a specific protein family for which no specific or selective patterns are yet available.
Affiliation(s)
- Allegra Via
- Centre for Molecular Bioinformatics, Dept. of Biology, University of Rome Tor Vergata, Rome (Italy)
- Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Dept. of Biology, University of Rome Tor Vergata, Rome (Italy)
|
30
|
Tendulkar AV, Joshi AA, Sohoni MA, Wangikar PP. Clustering of Protein Structural Fragments Reveals Modular Building Block Approach of Nature. J Mol Biol 2004; 338:611-29. [PMID: 15081817 DOI: 10.1016/j.jmb.2004.02.047] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Received: 12/05/2003] [Revised: 02/11/2004] [Accepted: 02/17/2004] [Indexed: 11/29/2022]
Abstract
Structures of peptide fragments drawn from a protein can potentially occupy a vast conformational continuum. We co-ordinatize this conformational space with the help of geometric invariants and demonstrate that the peptide conformations of the currently available protein structures are heavily biased in favor of a finite number of conformational types or structural building blocks. This is achieved by representing a peptide's backbone structure with geometric invariants and then clustering peptides based on closeness of the geometric invariants. This results in 12,903 clusters, of which 2207 are made up of peptides drawn from functionally and/or structurally related proteins. These are termed "functional" clusters and provide clues about potential functional sites. The rest of the clusters, including the largest few, are made up of peptides drawn from unrelated proteins and are termed "structural" clusters. The largest clusters are of regular secondary structures such as helices and beta strands as well as of beta hairpins. Several categories of helices and strands are discovered based on geometric differences. In addition to the known classes of loops, we discover several new classes, which will be useful in protein structure modeling. Our algorithm does not require assignment of secondary structure and, therefore, overcomes the limitations in loop classification due to ambiguity in secondary structure assignment at loop boundaries.
Affiliation(s)
- Ashish V Tendulkar
- Kanwal Rekhi School of Information Technology, Indian Institute of Technology, Bombay, Powai, Mumbai 400 076, India
|
31
|
Tang CL, Xie L, Koh IYY, Posy S, Alexov E, Honig B. On the Role of Structural Information in Remote Homology Detection and Sequence Alignment: New Methods Using Hybrid Sequence Profiles. J Mol Biol 2003; 334:1043-62. [PMID: 14643665 DOI: 10.1016/j.jmb.2003.10.025] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.2] [Indexed: 11/19/2022]
Abstract
Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based entirely on structural alignments have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary and tertiary structure information. Sequence-based profiles are profiles whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles that use combined sequence-and-structure-based profiles, where sequence-based profiles are used in loop/motif regions and structural information is used in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.
Affiliation(s)
- Christopher L Tang
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, New York, NY 10032, USA
|
32
|
Tendulkar AV, Wangikar PP, Sohoni MA, Samant VV, Mone CY. Parameterization and Classification of the Protein Universe via Geometric Techniques. J Mol Biol 2003; 334:157-72. [PMID: 14596807 DOI: 10.1016/j.jmb.2003.09.021] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
We present a scheme for the classification of 3487 non-redundant protein structures into 1207 non-hierarchical clusters by using recurring structural patterns of three to six amino acids as keys of classification. This results in several signature patterns, which seem to decide membership of a protein in a functional category. The patterns provide clues to the key residues involved in functional sites as well as in protein-protein interaction. The discovered patterns include a "glutamate double bridge" of superoxide dismutase, the functional interface of the serine protease and inhibitor, interface of homo/hetero dimers, and functional sites of several enzyme families. We use geometric invariants to decide superimposability of structural patterns. This allows the parameterization of patterns and discovery of recurring patterns via clustering. The geometric invariant-based approach eliminates the computationally explosive step of pair-wise comparison of structures. The results provide a vast resource for the biologists for experimental validation of the proposed functional sites, and for the design of synthetic enzymes, inhibitors and drugs.
Affiliation(s)
- Ashish V Tendulkar
- Kanwal Rekhi School of Information Technology, Indian Institute of Technology, Bombay, Powai, Mumbai 400 076, India
|
33
|
Abstract
The success of structural genomics initiatives requires the development and application of tools for structure analysis, prediction, and annotation. In this paper we review recent developments in these areas; specifically structure alignment, the detection of remote homologs and analogs, homology modeling and the use of structures to predict function. We also discuss various rationales for structural genomics initiatives. These include the structure-based clustering of sequence space and genome-wide function assignment. It is also argued that structural genomics can be integrated into more traditional biological research if specific biological questions are included in target selection strategies.
Affiliation(s)
- Sharon Goldsmith-Fischman
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
|
34
|
Mondal S, Jaishankar SP, Ramakumar S. Role of context in the relationship between form and function: structural plasticity of some PROSITE patterns. Biochem Biophys Res Commun 2003; 305:1078-84. [PMID: 12767941 DOI: 10.1016/s0006-291x(03)00882-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/27/2022]
Abstract
True positive hits of PROSITE sequence pattern are expected to have a characteristic three-dimensional structure. The combined sequence-structure attributes of PROSITE patterns can be used for function prediction of an uncharacterized protein with known primary and 3D structure, a situation that might arise in structural genomics projects. We have found specific examples of true hits of PROSITE patterns displaying structural plasticity by assuming significantly different local conformation, depending upon the context. Our work highlights the importance of taking into account all the known distinct conformations of PROSITE patterns, while creating a sensitive 3D template for the pattern, for use in functional annotation.
Affiliation(s)
- Sukanta Mondal
- Department of Physics, Indian Institute of Science, Bangalore 560 012, India
|
35
|
Chan CH, Lyu PC, Hwang JK. Computation of the Protein Structure Entropy and Its Applications to Protein Folding Processes. J CHIN CHEM SOC-TAIP 2003. [DOI: 10.1002/jccs.200300097] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/10/2022]
|
36
|
Wangikar PP, Tendulkar AV, Ramya S, Mali DN, Sarawagi S. Functional sites in protein families uncovered via an objective and automated graph theoretic approach. J Mol Biol 2003; 326:955-78. [PMID: 12581652 DOI: 10.1016/s0022-2836(02)01384-0] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.0] [Indexed: 11/24/2022]
Abstract
We report a method for detection of recurring side-chain patterns (DRESPAT) using an unbiased and automated graph theoretic approach. We first list all structural patterns as sub-graphs where the protein is represented as a graph. The patterns from proteins are compared pair-wise to detect patterns common to a protein pair based on content and geometry criteria. The recurring pattern is then detected using an automated search algorithm from the all-against-all pair-wise comparison data of proteins. Intra-protein pattern comparison data are used to enable detection of patterns recurring within a protein. A method has been proposed for empirical calculation of statistical significance of recurring pattern. The method was tested on 17 protein sets of varying size, composed of non-redundant representatives from SCOP superfamilies. Recurring patterns in serine proteases, cysteine proteases, lipases, cupredoxin, ferredoxin, ferritin, cytochrome c, aspartoyl proteases, peroxidases, phospholipase A2, endonuclease, SH3 domain, EF-hand and lectins show additional residues conserved in the vicinity of the known functional sites. On the basis of the recurring patterns in ferritin, EF-hand and lectins, we could separate proteins or domains that are structurally similar yet different in metal ion-binding characteristics. In addition, novel recurring patterns were observed in glutathione-S-transferase, phospholipase A2 and ferredoxin with potential structural/functional roles. The results are discussed in relation to the known functional sites in each family. Between 2000 and 50,000 patterns were enumerated from each protein with between ten and 500 patterns detected as common to an evolutionarily related protein pair. Our results show that unbiased extraction of functional site pattern is not feasible from an evolutionarily related protein pair but is feasible from protein sets comprising five or more proteins. 
The DRESPAT method does not require a user-defined pattern, size or location of the pattern and therefore has the potential to uncover new functional sites in protein families.
Affiliation(s)
- Pramod P Wangikar
- Department of Chemical Engineering, Indian Institute of Technology, Bombay, Powai Mumbai 400 076, India.
|
37
|
Asaoka T, Ando T, Meguro T, Yamato I. Development of a structure based protein function prediction method: Calcium binding protein. CHEM-BIO INFORMATICS JOURNAL 2003. [DOI: 10.1273/cbij.3.96] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
|
38
|
Zhang ZL, Harrison PM, Gerstein M. Digging deep for ancient relics: a survey of protein motifs in the intergenic sequences of four eukaryotic genomes. J Mol Biol 2002; 323:811-22. [PMID: 12417195 DOI: 10.1016/s0022-2836(02)01035-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
We have examined conserved protein motifs in the non-coding, intergenic regions ("pseudomotif patterns") and surveyed their occurrence in the fly, worm, yeast and human genomes (chromosomes 21 and 22 only). To identify these patterns, we masked out annotated genes, pseudogenes and repeat regions from the raw genomic sequence and then compared the remaining sequence, in six-frame translation, against 1319 patterns from the PROSITE database. For each pseudomotif pattern, the absolute number of occurrences is not very informative unless compared against a statistical expectation; consequently, we calculated the expected occurrence of each pattern using a Poisson model and verified this with simulations. Using a p-value cut-off of 0.01, we found 67 pseudomotif patterns over-represented in fly intergenic regions, 34 in worm, 21 in human and six in yeast. These include the zinc finger, leucine zipper, nucleotide-binding motif and EGF domain. Many of the over-represented patterns were common to two or more organisms, but there were a few that were unique to specific ones. Furthermore, we found more over-represented patterns in the fly than in the worm, although the fly has fewer pseudogenes. This puzzling observation can be explained by a higher deletion rate in the fly genome. We also surveyed under-represented patterns, finding 23 in the fly, 12 in the worm, 18 in human and two in yeast. If intergenic sequences were truly random, we would expect an equal number of over- and under-represented patterns. The fact that for each organism the number of over-represented patterns is greater than the number of under-represented ones implies that a fraction of the intergenic regions consists of ancient protein fragments that, due to accumulated disablements, have become unrecognizable by conventional techniques for gene and pseudogene identification.
Moreover, we find that in aggregate the over-represented pseudomotif patterns occupy a substantial fraction of the intergenic regions. Further information is available at http://pseudogene.org
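The over- and under-representation test described in this abstract can be illustrated with a minimal sketch. It assumes that the expected count of a pattern is already known (the abstract does not say how it was derived, so it is passed in as a plain number), and the function names and example counts are hypothetical rather than taken from the paper. Only the Poisson tail probabilities and the 0.01 cut-off come from the text.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    # Accumulate P(X = k) for k = 1 .. observed - 1.
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def poisson_lower_tail(observed: int, expected: float) -> float:
    """Return P(X <= observed) for X ~ Poisson(expected)."""
    if observed < 0:
        return 0.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed + 1):
        term *= expected / k
        cdf += term
    return min(1.0, cdf)


def classify_pattern(observed: int, expected: float, cutoff: float = 0.01) -> str:
    """Label a pattern as over-, under-, or normally represented.

    The cutoff of 0.01 mirrors the p-value cut-off quoted in the abstract;
    the three labels are an illustrative convention, not the authors'.
    """
    if poisson_upper_tail(observed, expected) < cutoff:
        return "over"
    if poisson_lower_tail(observed, expected) < cutoff:
        return "under"
    return "neither"


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classify_pattern(observed=10, expected=2.0))
    print(classify_pattern(observed=0, expected=8.0))
    print(classify_pattern(observed=3, expected=2.0))
```

Under a purely random model the two tails would be expected to flag similar numbers of patterns, which is the comparison the abstract uses to argue for ancient protein fragments in intergenic sequence.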
Affiliation(s)
- Zhao Lei Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass Center 432A, 266 Whitney Avenue, P.O. Box 208114, New Haven, CT 06520-8114, USA
|
39
|
Norin M, Sundström M. Structural proteomics: lessons learnt from the early case studies. FARMACO (SOCIETA CHIMICA ITALIANA : 1989) 2002; 57:947-51. [PMID: 12484544 DOI: 10.1016/s0014-827x(02)01212-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/27/2022]
Abstract
The genomics efforts have identified a large number of novel genes and thus provided a pool of interesting but not functionally characterized target proteins. It has been suggested that structural proteomics will significantly impact the success rate of functional characterization of such identified genes and proteins by providing structure-function hypotheses by fold and feature recognition and analysis. Structural proteomics initiatives, both in academic and industrial settings, are today generating protein structures at an unprecedented rate although relatively few large-scale efforts have been displayed in the public domain. However, a number of individual studies have provided a 'road-map' for selected approaches that hold the promise to significantly impact the process of deriving function from structure.
Affiliation(s)
- Martin Norin
- Biovitrum, Department of Structural Chemistry, Nordenflychtsvägen 62:6, SE-112 76 Stockholm, Sweden.
|
40
|
Bonneau R, Strauss CEM, Rohl CA, Chivian D, Bradley P, Malmström L, Robertson T, Baker D. De novo prediction of three-dimensional structures for major protein families. J Mol Biol 2002; 322:65-78. [PMID: 12215415 DOI: 10.1016/s0022-2836(02)00698-8] [Citation(s) in RCA: 180] [Impact Index Per Article: 7.8] [Indexed: 11/20/2022]
Abstract
We use the Rosetta de novo structure prediction method to produce three-dimensional structure models for all Pfam-A sequence families with average length under 150 residues and no link to any protein of known structure. To estimate the reliability of the predictions, the method was calibrated on 131 proteins of known structure. For approximately 60% of the proteins one of the top five models was correctly predicted for 50 or more residues, and for approximately 35%, the correct SCOP superfamily was identified in a structure-based search of the Protein Data Bank using one of the models. This performance is consistent with results from the fourth critical assessment of structure prediction (CASP4). Correct and incorrect predictions could be partially distinguished using a confidence function based on a combination of simulation convergence, protein length and the similarity of a given structure prediction to known protein structures. While the limited accuracy and reliability of the method preclude definitive conclusions, the Pfam models provide the only tertiary structure information available for the 12% of publicly available sequences represented by these large protein families.
Affiliation(s)
- Richard Bonneau
- Department of Biochemistry, University of Washington, Seattle, WA 98195-7350, USA
|
41
|
Reva B, Finkelstein A, Topiol S. Threading with chemostructural restrictions method for predicting fold and functionally significant residues: application to dipeptidylpeptidase IV (DPP-IV). Proteins 2002; 47:180-93. [PMID: 11933065 DOI: 10.1002/prot.10076] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
We present a new method for more accurate modeling of protein structure, called threading with chemostructural restrictions. This method addresses those cases in which a target sequence has only remote homologues of known structure for which sequence comparison methods cannot provide accurate alignments. Although remote homologues cannot provide an accurate model for the whole chain, they can be used in constructing practically useful models for the most conserved (and often the most interesting) part of the structure. For many proteins of interest, one can suggest certain chemostructural patterns for the native structure based on the available information on the structural superfamily of the protein, the type of activity, the sequence location of the functionally significant residues, and other factors. We use such patterns to restrict (1) the number of possible templates and (2) the number of allowed chain conformations on a template. The latter restrictions are imposed in the form of additional template potentials (including terms acting as sequence anchors) that act on certain residues. This approach is tested on remote homologues of alpha/beta-hydrolases that have significant structural similarity in the positions of their catalytic triads. The study shows that, in spite of significant deviations between the model and the native structures, the surroundings of the catalytic triad (positions of C(alpha) atoms of 20-30 nearby residues) can be reproduced with an accuracy of 2-3 Å. We then apply the approach to predict the structure of dipeptidylpeptidase IV (DPP-IV). Using experimentally available data identifying the catalytic triad residues of DPP-IV (David et al., J Biol Chem 1993;268:17247-17252), we predict a model structure of the catalytic domain of DPP-IV based on the 3D fold of prolyl oligopeptidase (Fulop et al., Cell 1998;94:161-170) and use this structure for modeling the interaction of DPP-IV with an inhibitor.
Affiliation(s)
- Boris Reva
- Novartis Institute for Biomedical Research, Summit, New Jersey, USA.
|
42
|
Abstract
The major challenge for post-genomic research is to functionally assign and validate a large number of novel target genes and their corresponding proteins. Functional genomics approaches have, therefore, gained considerable attention in the quest to convert this massive data set into useful information. One of the crucial components for the functional understanding of unassigned proteins is the analysis of their experimental or modeled 3D structures. Structural proteomics initiatives are generating protein structures at an unprecedented rate but our current knowledge of 3D-structural space is still limited. Estimates on the completeness of the 3D-structural coverage of proteins vary but it is generally accepted that only a minority of the structural proteome has a template structure from which reliable conclusions can be drawn. Thus, structural proteomics has set out to build a map of protein structures that will represent all protein folds included in the 'global proteome'.
Affiliation(s)
- Martin Norin
- Biovitrum, Department of Structural Chemistry, Stockholm, Sweden
|
43
|
Jackson RM, Russell RB. Predicting function from structure: examples of the serine protease inhibitor canonical loop conformation found in extracellular proteins. COMPUTERS & CHEMISTRY 2001; 26:31-9. [PMID: 11765849 DOI: 10.1016/s0097-8485(01)00097-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
The prediction of protein function from structure is of growing importance in the age of structural genomics. We have focused on the problem of identifying sites of potential serine protease inhibitor interactions on the surface of proteins of known structure. Given that there is no sequence conservation within canonical loops from different inhibitor families, we first compare representative loops to all fragments of equal length among proteins of known structure by calculating main-chain RMS deviation. Fragments with RMS deviation below a certain threshold (hits) are removed if residues have solvent accessibilities appreciably lower than those observed in the search structure. The remaining hits are further filtered to remove those occurring largely within secondary structure elements. Likely functional significance is restricted further by considering only extracellular protein domains. A test is also performed to see if the loop can dock into the binding site of the serine protease trypsin without unacceptable steric clashes. By comparing different canonical loop structures to the protein structure database we show that the method was able to detect previously known inhibitors. In addition, we discuss potentially new canonical loop structures found in secreted hydrolases, toxins, viral proteins, cytokines and other proteins. We discuss the possible functional significance of several of the examples found.
Affiliation(s)
- R M Jackson
- Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT, UK.
|
44
|
Abstract
Conventional fold recognition techniques rely mainly on the analysis of the entire sequence of a protein. We present an MBA method to improve the performance of any conventional sequence-based fold assignment. The method uses sequence motifs, such as those defined in the Prosite database, and the SwissProt annotation of the fold library. When combined with a simple SDP method, the coverage of MBA is comparable to the results obtained with PSI-BLAST. However, the set of MBA predictions is significantly different from that of PSI-BLAST, leading to a 40% increase in coverage for the combined MBA/PSI-BLAST method. The MBA approach can be easily adapted to include the results of sequence-independent function prediction methods and alternative motif and annotation databases. The method is available through the web server located at http://www.doe-mbi.ucla.edu/mba.
Affiliation(s)
- L Salwinski
- Department of Chemistry, UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, UCLA, Los Angeles, California 90095-1570, USA
|
45
|
Olsson B, Laurio K, Gudjonsson L. A hybrid method for protein sequence modeling with improved accuracy. Inf Sci (N Y) 2001. [DOI: 10.1016/s0020-0255(01)00161-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
46
|
Abstract
A pattern for the structure-function superfamily of cytochromes P450 is proposed, built by combining conserved amino acid motifs. The sizes of P450 cytochromes were estimated from their sequence lengths, and empirical coefficients reflecting the peculiarities of the primary structure of these enzymes were calculated. We propose an approach for assigning novel protein sequences to this superfamily on the basis of these parameters taken together. Using our pattern, a number of hypothetical proteins from the international databases were assigned to the cytochromes P450.
Affiliation(s)
- A G Buchatskii
- Institute of Molecular Genetics, Russian Academy of Science, Moscow.
|
47
|
Bonneau R, Baker D. Ab initio protein structure prediction: progress and prospects. ANNUAL REVIEW OF BIOPHYSICS AND BIOMOLECULAR STRUCTURE 2001; 30:173-89. [PMID: 11340057 DOI: 10.1146/annurev.biophys.30.1.173] [Citation(s) in RCA: 226] [Impact Index Per Article: 9.4] [Indexed: 11/09/2022]
Abstract
Considerable recent progress has been made in the field of ab initio protein structure prediction, as witnessed by the third Critical Assessment of Structure Prediction (CASP3). In spite of this progress, much work remains, for the field has yet to produce consistently reliable ab initio structure prediction protocols. In this work, we review the features of current ab initio protocols in an attempt to highlight the foundations of recent progress in the field and suggest promising directions for future work.
Affiliation(s)
- R Bonneau
- Department of Biochemistry, University of Washington, Seattle, Washington, Box 357350, 98195, USA.
|
48
|
Abstract
This article investigates aspects of pairwise and multiple structure comparison, and the problem of automatically discovering common patterns in a set of structures. Descriptions and representations of structures and patterns are presented, as well as scoring and algorithms for comparison and discovery. A framework and nomenclature are developed for classifying different methods, and many of these are reviewed and placed into this framework.
Affiliation(s)
- I Eidhammer
- Department of Informatics, University of Bergen, Høyteknologisentret, N-5020 Bergen, Norway.
|
49
|
Linial M, Yona G. Methodologies for target selection in structural genomics. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2001; 73:297-320. [PMID: 11063777 DOI: 10.1016/s0079-6107(00)00011-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
As the number of complete genomes that have been sequenced keeps growing, unknown areas of the protein space are revealed and new horizons open up. Most of this information will be fully appreciated only when the structural information about the encoded proteins becomes available. The goal of structural genomics is to direct large-scale efforts of protein structure determination, so as to increase the impact of these efforts. This review focuses on current approaches in structural genomics aimed at selecting representative proteins as targets for structure determination. We will discuss the concept of representative structures/folds, the current methodologies for identifying those proteins, and computational techniques for identifying proteins which are expected to adopt new structural folds.
Affiliation(s)
- M Linial
- Department of Biological Chemistry, Institute of Life Sciences, Hebrew University, 91904, Jerusalem, Israel.
|
50
|
Turcotte M, Muggleton SH, Sternberg MJ. Automated discovery of structural signatures of protein fold and function. J Mol Biol 2001; 306:591-605. [PMID: 11178916 DOI: 10.1006/jmbi.2000.4414] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
There are constraints on a protein sequence/structure for it to adopt a particular fold. These constraints could be either a local signature involving particular sequences or arrangements of secondary structure or a global signature involving features along the entire chain. To search systematically for protein fold signatures, we have explored the use of Inductive Logic Programming (ILP). ILP is a machine learning technique which derives rules from observation and encoded principles. The derived rules are readily interpreted in terms of concepts used by experts. For 20 populated folds in SCOP, 59 rules were found automatically. The accuracy of these rules, which is defined as the number of true positives plus true negatives over the total number of examples, is 74% (cross-validated value). Further analysis was carried out for 23 signatures covering 30% or more positive examples of a particular fold. The work showed that signatures of protein folds exist; about half of the rules discovered automatically coincide with the level of fold in the SCOP classification. Other signatures correspond to homologous family and may be the consequence of a functional requirement. Examination of the rules shows that many correspond to established principles published in specific literature. However, in general, the list of signatures is not part of standard biological databases of protein patterns. We find that the length of the loops makes an important contribution to the signatures, suggesting that this is an important determinant of the identity of protein folds. With the expansion in the number of determined protein structures, stimulated by structural genomics initiatives, there will be an increased need for automated methods to extract principles of protein folding from coordinates.
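The accuracy measure defined in this abstract (true positives plus true negatives over the total number of examples) can be written out as a short sketch. The function name and the confusion-matrix counts below are hypothetical and chosen only so that the result matches the 74% figure; the abstract does not report the underlying counts.

```python
def rule_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (true positives + true negatives) / total examples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty example set")
    return (tp + tn) / total


if __name__ == "__main__":
    # Hypothetical counts for one cross-validation fold, for illustration only.
    print(f"{rule_accuracy(tp=37, tn=37, fp=13, fn=13):.0%}")
```

A cross-validated value such as the one quoted would typically be obtained by evaluating this ratio on held-out examples in each fold and then averaging or pooling the results; which of the two the authors used is not stated in the abstract.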
Affiliation(s)
- M Turcotte
- Imperial Cancer Research Fund, Biomolecular Modelling Laboratory, London, WC2A 3PX, UK
|