101
102
Pang CNI, Lin K, Wouters MA, Heringa J, George RA. Identifying foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids Res 2007; 36:578-88. [PMID: 18056079 PMCID: PMC2241846 DOI: 10.1093/nar/gkm1070] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Structural genomics initiatives aim to elucidate representative 3D structures for the majority of protein families over the next decade, but many obstacles must be overcome. The correct design of constructs is extremely important since many proteins will be too large or contain unstructured regions and will not be amenable to crystallization. It is therefore essential to identify regions in protein sequences that are likely to be suitable for structural study. Scooby-Domain is a fast and simple method to identify globular domains in protein sequences. Domains are compact units of protein structure and their correct delineation will aid structural elucidation through a divide-and-conquer approach. Scooby-Domain predictions are based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method employs an A*-search to identify sequence regions that form a globular structure and those that are unstructured. On a test set of 173 proteins with consensus CATH and SCOP domain definitions, Scooby-Domain has a sensitivity of 50% and an accuracy of 29%, which is better than current state-of-the-art methods. The method does not rely on homology searches and, therefore, can identify previously unknown domains.
Affiliation(s)
- Chi N I Pang
- Structural & Computational Biology Program, Victor Chang Cardiac Research Institute, Sydney, Australia
103
Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007; 8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 352] [Impact Index Per Article: 20.7] [Indexed: 11/09/2022]
104
Abstract
Alternative splicing is thought to be one of the major sources for functional diversity in higher eukaryotes. Interestingly, when mapping splicing events onto protein structures, about half of the events affect structured and even highly conserved regions i.e. are non-trivial on the structure level. This has led to the controversial hypothesis that such splice variants result in nonsense-mediated mRNA decay or non-functional, unstructured proteins, which do not contribute to the functional diversity of an organism. Here we show in a comprehensive study on alternative splicing that proteins appear to be much more tolerant to structural deletions, insertions and replacements than previously thought. We find literature evidence that such non-trivial splicing isoforms exhibit different functional properties compared to their native counterparts and allow for interesting regulatory patterns on the protein network level. We provide examples that splicing events may represent transitions between different folds in the protein sequence–structure space and explain these links by a common genetic mechanism. Taken together, those findings hint to a more prominent role of splicing in protein structure evolution and to a different view of phenotypic plasticity of protein structures.
Affiliation(s)
- Fabian Birzele
- Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University, Amalienstrasse 17, D-80333 Munich, Germany.
105
Copper binding to the Alzheimer's disease amyloid precursor protein. European Biophysics Journal: EBJ 2007; 37:269-79. [PMID: 18030462 PMCID: PMC2921068 DOI: 10.1007/s00249-007-0234-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 08/30/2007] [Revised: 10/24/2007] [Accepted: 10/26/2007] [Indexed: 12/25/2022]
Abstract
Alzheimer’s disease is the fourth biggest killer in developed countries. Amyloid precursor protein (APP) plays a central role in the development of the disease, through the generation of a peptide called Aβ by proteolysis of the precursor protein. APP can function as a metalloprotein and modulate copper transport via its extracellular copper binding domain (CuBD). Copper binding to this domain has been shown to reduce Aβ levels and hence a molecular understanding of the interaction between metal and protein could lead to the development of novel therapeutics to treat the disease. We have recently determined the three-dimensional structures of apo and copper bound forms of CuBD. The structures provide a mechanism by which CuBD could readily transfer copper ions to other proteins. Importantly, the lack of significant conformational changes to CuBD on copper binding suggests a model in which copper binding affects the dimerisation state of APP leading to reduction in Aβ production. We thus predict that disruption of APP dimers may be a novel therapeutic approach to treat Alzheimer’s disease.
106
Abyzov A, Ilyin VA. A comprehensive analysis of non-sequential alignments between all protein structures. BMC Structural Biology 2007; 7:78. [PMID: 18005453 PMCID: PMC2213659 DOI: 10.1186/1472-6807-7-78] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 05/14/2007] [Accepted: 11/16/2007] [Indexed: 05/02/2023]
Abstract
Background The majority of relations between proteins can be represented as a conventional sequential alignment. Nevertheless, unusual non-sequential alignments with different connectivity of the aligned fragments in compared proteins have been reported by many researchers. It is of interest to understand these non-sequential alignments: are they unique, sporadic cases or do they occur frequently; do they belong to a few specific folds or are they spread among many different folds, as a common feature of protein structure? We present here a comprehensive large-scale study of non-sequential alignments between available protein structures in the Protein Data Bank. Results The study has been conducted on a non-redundant set of 8,865 protein structures aligned with the aid of the TOPOFIT method. It has been estimated that between 17.4% and 35.2% of all alignments are non-sequential depending on variations in the parameters. Analysis of the data revealed that non-sequential relations between proteins do occur systematically and in large quantities. Various sizes and numbers of non-sequential fragments have been observed with all possible complexities of fragment rearrangements found for alignments consisting of up to 12 fragments. It has been found that non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred of them. Moreover, many of them are found between proteins with different fold assignments. It has been shown that protein structure symmetry does not explain non-sequential alignments. Therefore, compelling evidence has been provided that non-sequential alignments between proteins are systematic and widespread across the protein universe. Conclusion The phenomenon of the widespread occurrence of non-sequential alignments between proteins might represent a missing rule of protein structure organization. More detailed study of this phenomenon will enhance our understanding of protein stability, folding, and evolution.
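To illustrate the distinction this abstract draws, an alignment can be treated as sequential when its aligned fragments occur in the same order along both chains, and as non-sequential otherwise. The Python sketch below assumes a hypothetical fragment representation (start in chain A, start in chain B, length). It is not the TOPOFIT method and does not address how the fragments are found.

```python
from typing import List, Tuple

# Hypothetical representation of one aligned fragment:
# (start index in chain A, start index in chain B, fragment length).
Fragment = Tuple[int, int, int]


def is_sequential(fragments: List[Fragment]) -> bool:
    """Return True if the fragments keep the same order in both chains."""
    # Order the fragments by their position in chain A ...
    ordered = sorted(fragments, key=lambda f: f[0])
    # ... then check that their positions in chain B are strictly increasing.
    b_starts = [f[1] for f in ordered]
    return all(x < y for x, y in zip(b_starts, b_starts[1:]))


# Same fragment order in both chains: a conventional sequential alignment.
conventional = [(1, 5, 10), (20, 30, 8), (40, 55, 12)]
# The first fragment of chain A maps to the end of chain B: non-sequential.
rearranged = [(1, 50, 10), (20, 5, 8)]
```

Here `is_sequential(conventional)` is true and `is_sequential(rearranged)` is false. A circular permutation is one simple case that fails the check.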
Affiliation(s)
- Alexej Abyzov
- Department of Biology, Northeastern University 360 Huntington Avenue, Boston, MA 02115, USA.
107
ProCKSI: a decision support system for Protein (structure) Comparison, Knowledge, Similarity and Information. BMC Bioinformatics 2007; 8:416. [PMID: 17963510 PMCID: PMC2222653 DOI: 10.1186/1471-2105-8-416] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 06/08/2007] [Accepted: 10/26/2007] [Indexed: 11/19/2022]
Abstract
Background We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Similarity Metric (USM), the Maximum Contact Map Overlap (MaxCMO) of protein structures and other external methods such as the DaliLite and the TM-align methods, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix supplementing the methods mentioned, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures. Results We present ProCKSI's architecture and workflow describing its intuitive user interface, and show its potential on three distinct test-cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures could have a large deviation from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study deals with the verification of a classification scheme for protein kinases, originally derived by sequence comparison by Hanks and Hunter, but here we use a consensus similarity measure based on structures. In the third experiment using the Rost and Sander dataset (RS126), we investigate how a combination of different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. 
Furthermore, combining different similarity measures is usually more robust than relying on one single, unique measure. Conclusion Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface. ProCKSI is publicly available at for academic and non-commercial use.
108
Gherardini PF, Wass MN, Helmer-Citterich M, Sternberg MJE. Convergent Evolution of Enzyme Active Sites Is not a Rare Phenomenon. J Mol Biol 2007; 372:817-45. [PMID: 17681532 DOI: 10.1016/j.jmb.2007.06.017] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Received: 11/16/2006] [Revised: 05/14/2007] [Accepted: 06/08/2007] [Indexed: 02/03/2023]
Abstract
Since convergent evolution of enzyme active sites was first identified in serine proteases, other individual instances of this phenomenon have been documented. However, a systematic analysis assessing the frequency of this phenomenon across enzyme space is still lacking. This work uses the Query3d structural comparison algorithm to integrate for the first time detailed knowledge about catalytic residues, available through the Catalytic Site Atlas (CSA), with the evolutionary information provided by the Structural Classification of Proteins (SCOP) database. This study considers two modes of convergent evolution: (i) mechanistic analogues which are enzymes that use the same mechanism to perform related, but possibly different, reactions (considered here as sharing the first three digits of the EC number); and (ii) transformational analogues which catalyse exactly the same reaction (identical EC numbers), but may use different mechanisms. Mechanistic analogues were identified in 15% (26 out of 169) of the three-digit EC groups considered, showing that this phenomenon is not rare. Furthermore, 11 of these groups also contain transformational analogues. The catalytic triad is the most widespread active site; the results of the structural comparison show that this mechanism, or variations thereof, is present in 23 superfamilies. Transformational analogues were identified for 45 of the 951 four-digit EC numbers present within the CSA and about half of these were also mechanistic analogues exhibiting convergence of their active sites. This analysis has also been extended to the whole Protein Data Bank to provide a complete and manually curated list of all the transformational analogues whose structure is classified in SCOP. The results of this work show that the phenomenon of convergent evolution is not rare, especially when considering large enzymatic families.
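The two EC-number criteria stated in this abstract can be written as simple string comparisons. The Python sketch below covers only that part: it flags pairs that share the first three EC digits (candidate mechanistic analogues) or all four (candidate transformational analogues). The function names are hypothetical, and the sketch does not attempt the structural comparison of catalytic residues or the check for unrelated evolutionary origin that the study relies on.

```python
from typing import Optional, Tuple


def parse_ec(ec: str) -> Optional[Tuple[int, ...]]:
    """Parse a fully specified EC number such as '3.4.21.4'.

    Returns None for incomplete entries such as '3.4.21.-'.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def shares_three_digit_group(ec_a: str, ec_b: str) -> bool:
    """EC criterion used for mechanistic analogues: same first three digits."""
    a, b = parse_ec(ec_a), parse_ec(ec_b)
    return a is not None and b is not None and a[:3] == b[:3]


def shares_reaction(ec_a: str, ec_b: str) -> bool:
    """EC criterion used for transformational analogues: identical EC number."""
    a, b = parse_ec(ec_a), parse_ec(ec_b)
    return a is not None and a == b
```

For example, trypsin (EC 3.4.21.4) and subtilisin (EC 3.4.21.62) share the three-digit group 3.4.21 but not the full EC number, so only the first criterion holds for that pair.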
Affiliation(s)
- Pier Federico Gherardini
- Biochemistry Building, Division of Molecular Biosciences, Imperial College London, London SW7 2AZ, UK
109
Shaw N, Tempel W, Chang J, Yang H, Cheng C, Ng J, Rose J, Rao Z, Wang BC, Liu ZJ. Crystal structure solution of a ParB-like nuclease at atomic resolution. Proteins 2007; 70:263-7. [PMID: 17729285 DOI: 10.1002/prot.21641] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Affiliation(s)
- Neil Shaw
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
110
Rasteiro R, Pereira-Leal JB. Multiple domain insertions and losses in the evolution of the Rab prenylation complex. BMC Evol Biol 2007; 7:140. [PMID: 17705859 PMCID: PMC1994686 DOI: 10.1186/1471-2148-7-140] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 06/16/2007] [Accepted: 08/17/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Rab proteins are regulators of vesicular trafficking that require a lipid modification, prenylation of C-terminal cysteines, for proper function. This is catalysed by a complex of a catalytic heterodimer (Rab Geranylgeranyl Transferase - RabGGTase) and an accessory protein (Rab Escort Protein - REP). Components of this complex display domain insertions relative to paralogous proteins. The function of these inserted domains is unclear. RESULTS We profiled the domain architecture of the components of the Rab prenylation complex in evolution. We identified the orthologues of the components of the Rab prenylation machinery in 43 organisms, representing the crown eukaryotic groups. We characterize in detail the domain structure of all these components and the phylogenetic relationships between the individual domains. CONCLUSION We found different domain insertions in different taxa, in alpha-subunits of RabGGTase and REP. Our results suggest that there were multiple insertions, expansions and contractions in the evolution of this prenylation complex.
Affiliation(s)
- Rita Rasteiro
- Instituto Gulbenkian de Ciência, Apartado 14, P-2781-901 Oeiras Portugal
111
Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment. BMC Bioinformatics 2007; 8:252. [PMID: 17629909 PMCID: PMC1939857 DOI: 10.1186/1471-2105-8-252] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 05/23/2007] [Accepted: 07/13/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. RESULTS We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. CONCLUSION UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results with respect to NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA, on sequence data, is very close in performance to, although worse than, the alignment methods (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. 
In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page.
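As a rough illustration of how a compressor can stand in for Kolmogorov complexity, the Python sketch below computes the normalized compression dissimilarity in its commonly used form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It assumes `zlib` as the compressor purely for convenience; the abstract names PPMd and Gencompress as the best performers, and nothing here reproduces the paper's software or settings.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression dissimilarity between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy usage: a sequence compared with itself and with an unrelated one.
seq_a = b"ACGT" * 200
seq_b = b"TTGACCATGGCATAGCCGTA" * 40
self_score = ncd(seq_a, seq_a)
cross_score = ncd(seq_a, seq_b)
```

Values near 0 indicate that one string adds little information to the other, and values near 1 indicate little shared structure. Real compressors only approximate this, so small negative values or values slightly above 1 can occur.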
112
Andrade J, Karmali A, Carrondo MA, Frazão C. Structure of Amidase from Pseudomonas aeruginosa Showing a Trapped Acyl Transfer Reaction Intermediate State. J Biol Chem 2007; 282:19598-605. [PMID: 17442671 DOI: 10.1074/jbc.m701039200] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
Abstract
Microbial amidases belong to the thiol nitrilases family and have potential biotechnological applications in chemical and pharmaceutical industries as well as in bioremediation. The amidase from Pseudomonas aeruginosa is a 6 × 38-kDa enzyme that catalyzes the hydrolysis of a small range of short aliphatic amides. The high-resolution crystallographic structure reported here shows that each amidase monomer is formed by a globular four-layer αββα sandwich domain with an additional 81-residue-long C-terminal segment. This wraps arm-in-arm with a homologous C-terminal chain of another monomer, producing a strongly packed dimer. In the crystal, the biologically active homo-hexameric amidase is built by grouping three such dimers around a crystallographic 3-fold axis. The structure also elucidates the structural basis for the enzyme activity, with the nitrilase catalytic triad at the bottom of a 13-Å-deep, funnel-shaped pocket, accessible from the solvent through a narrow neck of 3-Å diameter. An acyl transfer intermediate, resulting from the purification protocol, was found bound to the amidase nucleophilic agent, Cys166. These results suggest that some pocket-defining residues should undergo conformational shifts to allow substrates and products to access and leave the catalytic pocket, for turnover to occur.
Affiliation(s)
- Jorge Andrade
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Avenida da República, Apartado 127, 2781-901 Oeiras, Portugal
113
Tung CH, Huang JW, Yang JM. Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database. Genome Biol 2007; 8:R31. [PMID: 17335583 PMCID: PMC1868941 DOI: 10.1186/gb-2007-8-3-r31] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 11/21/2006] [Revised: 01/05/2007] [Accepted: 03/03/2007] [Indexed: 11/23/2022]
Abstract
We present a novel protein structure database search tool, 3D-BLAST, that is useful for analyzing novel structures and can return a list of aligned structures ranked by E-value. This tool has the features of BLAST (for example, robust statistical basis, and effective and reliable search capabilities) and employs a kappa-alpha (κ, α) plot derived structural alphabet and a new substitution matrix. 3D-BLAST searches more than 12,000 protein structures in 1.2 s and yields good results in zones with low sequence similarity.
Affiliation(s)
- Chi-Hua Tung
- Institute of Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Jhang-Wei Huang
- Institute of Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Jinn-Moon Yang
- Institute of Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Core Facility for Structural Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, Taiwan
114
Chiang YS, Gelfand TI, Kister AE, Gelfand IM. New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage. Proteins 2007; 68:915-21. [PMID: 17557333 DOI: 10.1002/prot.21473] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
Abstract
To describe the supersecondary structure (SSS) of beta sandwich-like proteins (SPs), we introduce a structural unit called the "strandon." A strandon is defined as a set of sequentially consecutive strands connected by hydrogen bonds in 3D structures. Representing beta-proteins as the assembly of strandons exposes the underlying similarities in their SSS and enables us to construct a novel classification scheme of SPs. Classification of all known SPs is based on shared supersecondary structural features and is presented in the SSS database (http://binfs.umdnj.edu/sssdb/). Analysis of the SSS reveals two common specific patterns. The first pattern defines the arrangement of strandons and was found in 95% of all examined SPs. The second pattern establishes the ordering of strands in the protein domain and was observed in 82% of the analyzed SPs. Knowledge of these two patterns that uncover the spatial arrangement of strands will likely prove useful in protein structure prediction.
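One possible reading of the strandon definition in this abstract is a maximal run of strands that are consecutive in sequence and in which each strand is hydrogen-bonded to the next. The Python sketch below groups strands under that assumption. The input format (strand numbers plus a set of hydrogen-bonded strand pairs) and the function name are hypothetical, and the authors' exact criteria may differ.

```python
from typing import List, Set, Tuple


def group_strandons(n_strands: int,
                    bonded: Set[Tuple[int, int]]) -> List[List[int]]:
    """Split strands 1..n_strands into runs of consecutive bonded strands.

    `bonded` holds pairs of strand numbers joined by hydrogen bonds;
    the order of the two numbers within a pair does not matter.
    """
    strandons: List[List[int]] = []
    current: List[int] = []
    for strand in range(1, n_strands + 1):
        if current:
            prev = current[-1]
            linked = (prev, strand) in bonded or (strand, prev) in bonded
            if not linked:
                # The run is broken: close the current group, start a new one.
                strandons.append(current)
                current = []
        current.append(strand)
    if current:
        strandons.append(current)
    return strandons


# Toy example: strands 1-2-3 form one run and 4-5 another. The 1-5 bond
# joins the two groups in 3D but does not link consecutive strands.
example = group_strandons(5, {(1, 2), (2, 3), (4, 5), (1, 5)})
```

In this toy example the result is `[[1, 2, 3], [4, 5]]`.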
Affiliation(s)
- Yih-Shien Chiang
- Department of Health Informatics, SHRP, University of Medicine and Dentistry of New Jersey, Newark, New Jersey 07107, USA
115
Torrance JW, Holliday GL, Mitchell JB, Thornton JM. The Geometry of Interactions between Catalytic Residues and their Substrates. J Mol Biol 2007; 369:1140-52. [DOI: 10.1016/j.jmb.2007.03.055] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/29/2007] [Revised: 03/14/2007] [Accepted: 03/20/2007] [Indexed: 10/23/2022]
116
Marti-Renom MA, Rossi A, Al-Shahrour F, Davis FP, Pieper U, Dopazo J, Sali A. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics 2007; 8 Suppl 4:S4. [PMID: 17570147 PMCID: PMC1892083 DOI: 10.1186/1471-2105-8-s4-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
Background Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. Description AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of ~90% and average precision of ~80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of ~70% and average precision of ~30%, correctly localizing binding sites for small molecules in ~95% of its predictions. Conclusion The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at .
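The sensitivity and precision figures quoted in this abstract follow the standard definitions: sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). The Python sketch below applies them to one structure's predicted and reference annotation sets. The set-based input and the identifiers are made up for illustration, and the paper's averaging scheme may differ.

```python
from typing import Set, Tuple


def sensitivity_precision(predicted: Set[str],
                          reference: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for one set of predicted annotations."""
    tp = len(predicted & reference)   # correctly predicted terms
    fn = len(reference - predicted)   # reference terms that were missed
    fp = len(predicted - reference)   # predicted terms not in the reference
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Made-up term identifiers, for illustration only.
predicted_terms = {"term_a", "term_b", "term_c"}
reference_terms = {"term_a", "term_b", "term_d", "term_e"}
sens, prec = sensitivity_precision(predicted_terms, reference_terms)
```

With two of four reference terms recovered and two of three predictions correct, this gives a sensitivity of 0.5 and a precision of about 0.67.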
Affiliation(s)
- Marc A Marti-Renom
- Structural Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Andrea Rossi
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Fátima Al-Shahrour
- Functional Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Fred P Davis
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Ursula Pieper
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Joaquín Dopazo
- Functional Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
117
Zheng X, Dai X, Zhao Y, Chen Q, Lu F, Yao D, Yu Q, Liu X, Zhang C, Gu X, Luo M. Restructuring of the dinucleotide-binding fold in an NADP(H) sensor protein. Proc Natl Acad Sci U S A 2007; 104:8809-14. [PMID: 17496144 PMCID: PMC1885584 DOI: 10.1073/pnas.0700480104] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 01/07/2023]
Abstract
NAD(P) has long been known as an essential energy-carrying molecule in cells. Recent data, however, indicate that NAD(P) also plays critical signaling roles in regulating cellular functions. The crystal structure of a human protein, HSCARG, with functions previously unknown, has been determined to 2.4-Å resolution. The structure reveals that HSCARG can form an asymmetrical dimer with one subunit occupied by one NADP molecule and the other empty. Restructuring of its NAD(P)-binding Rossmann fold upon NADP binding changes an extended loop to an alpha-helix to restore the integrity of the Rossmann fold. The previously unobserved restructuring suggests that HSCARG may assume a resting state when the level of NADP(H) is normal within the cell. When the NADP(H) level passes a threshold, an extensive restructuring of HSCARG would result in the activation of its regulatory functions. Immunofluorescent imaging shows that HSCARG redistributes from being associated with intermediate filaments in the resting state to being dispersed in the nucleus and the cytoplasm. The structural change of HSCARG upon NADP(H) binding could be a new regulatory mechanism that responds only to a significant change of NADP(H) levels. One of the functions regulated by HSCARG may be argininosuccinate synthetase, which is involved in NO synthesis.
Affiliation(s)
- Xiaofeng Zheng
- *National Laboratory of Protein Engineering and Plant Genetic Engineering
- Departments of Biochemistry and Molecular Biology and
- To whom correspondence may be addressed. E-mail: or
| | - Xueyu Dai
- *National Laboratory of Protein Engineering and Plant Genetic Engineering
- Departments of Biochemistry and Molecular Biology and
| | - Yanmei Zhao
- *National Laboratory of Protein Engineering and Plant Genetic Engineering
- Departments of Biochemistry and Molecular Biology and
| | - Qiang Chen
- *National Laboratory of Protein Engineering and Plant Genetic Engineering
- Departments of Biochemistry and Molecular Biology and
| | - Fei Lu
- Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing 100871, China
| | - Deqiang Yao
- Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, 100049, China
| | - Quan Yu
- *National Laboratory of Protein Engineering and Plant Genetic Engineering
- Departments of Biochemistry and Molecular Biology and
| | - Xinping Liu
- *National Laboratory of Protein Engineering and Plant Genetic Engineering
- Departments of Biochemistry and Molecular Biology and
| | - Chuanmao Zhang
- Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing 100871, China
| | - Xiaocheng Gu
- *National Laboratory of Protein Engineering and Plant Genetic Engineering
| | - Ming Luo
- Department of Microbiology, University of Alabama, Birmingham, AL 35294; and
- To whom correspondence may be addressed. E-mail: or
|
118
|
Rodrigues APC, Grant BJ, Godzik A, Friedberg I. The 2006 automated function prediction meeting. BMC Bioinformatics 2007; 8 Suppl 4:S1-4. [PMID: 17570143 PMCID: PMC1892079 DOI: 10.1186/1471-2105-8-s4-s1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Affiliation(s)
- Ana PC Rodrigues
- Burnham Institute for Medical Research, 10901 N. Torrey Pines Rd., La Jolla, CA 92037 USA
| | - Barry J Grant
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093, USA
| | - Adam Godzik
- Burnham Institute for Medical Research, 10901 N. Torrey Pines Rd., La Jolla, CA 92037 USA
- Center for Research in Biological Systems (CRBS), University of California, San Diego, 9500 Gilman Drive La Jolla, MC 0446 CA 92093, USA
| | - Iddo Friedberg
- Burnham Institute for Medical Research, 10901 N. Torrey Pines Rd., La Jolla, CA 92037 USA
|
119
|
Tung CH, Yang JM. fastSCOP: a fast web server for recognizing protein structural domains and SCOP superfamilies. Nucleic Acids Res 2007; 35:W438-43. [PMID: 17485476 PMCID: PMC1933144 DOI: 10.1093/nar/gkm288] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 01/01/2023] Open
Abstract
fastSCOP is a web server that rapidly identifies the structural domains and determines the evolutionary superfamilies of a query protein structure. This server uses 3D-BLAST to quickly scan a large structural classification database (SCOP1.71 with <95% identity with each other) and the top 10 hit domains, which have different superfamily classifications, are obtained from the hit lists. MAMMOTH, a detailed structural alignment tool, is adopted to align these top 10 structures to refine domain boundaries and to identify evolutionary superfamilies. Our previous works demonstrated that 3D-BLAST is as fast as BLAST, and has the characteristics of BLAST (e.g. a robust statistical basis, effective search and reliable database search capabilities) in large structural database searches based on a structural alphabet database and a structural alphabet substitution matrix. The classification accuracy of this server is ∼98% for 586 query structures and the average execution time is ∼5 s. This server was also evaluated on 8700 structures, which have no annotations in the SCOP; the server can automatically assign 7311 (84%) proteins (9420 domains) to the SCOP superfamilies in 9.6 h. These results suggest that fastSCOP is robust and can be a useful server for recognizing the evolutionary classifications and the protein functions of novel structures. The server is accessible at http://fastSCOP.life.nctu.edu.tw.
Affiliation(s)
- Chi-Hua Tung
- Institute of Bioinformatics, Department of Biological Science and Technology and Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsinchu, 30050 Taiwan
| | - Jinn-Moon Yang
- Institute of Bioinformatics, Department of Biological Science and Technology and Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsinchu, 30050 Taiwan
- *To whom correspondence should be addressed. +886 3 571212 56942; +886 3 5729288
|
120
|
Marti-Renom MA, Pieper U, Madhusudhan MS, Rossi A, Eswar N, Davis FP, Al-Shahrour F, Dopazo J, Sali A. DBAli tools: mining the protein structure space. Nucleic Acids Res 2007; 35:W393-7. [PMID: 17478513 PMCID: PMC1933139 DOI: 10.1093/nar/gkm236] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022] Open
Abstract
The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions.
Affiliation(s)
- Marc A Marti-Renom
- Structural Genomics Unit, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94158-2330, USA.
|
121
|
Jefferson ER, Walsh TP, Roberts TJ, Barton GJ. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein-Protein Interactions. Nucleic Acids Res 2007; 35:D580-9. [PMID: 17202171 PMCID: PMC1899103 DOI: 10.1093/nar/gkl836] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022] Open
Abstract
SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: .
Affiliation(s)
| | | | | | - Geoffrey J. Barton
- To whom correspondence should be addressed. Tel: +44 01382 385860; Fax: +44 01382 385764;
|
122
|
Kong GKW, Adams JJ, Harris HH, Boas JF, Curtain CC, Galatis D, Masters CL, Barnham KJ, McKinstry WJ, Cappai R, Parker MW. Structural Studies of the Alzheimer’s Amyloid Precursor Protein Copper-binding Domain Reveal How it Binds Copper Ions. J Mol Biol 2007; 367:148-61. [PMID: 17239395 DOI: 10.1016/j.jmb.2006.12.041] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Received: 10/10/2006] [Revised: 12/11/2006] [Accepted: 12/15/2006] [Indexed: 11/30/2022]
Abstract
Alzheimer's disease (AD) is the major cause of dementia. Amyloid beta peptide (Abeta), generated by proteolytic cleavage of the amyloid precursor protein (APP), is central to AD pathogenesis. APP can function as a metalloprotein and modulate copper (Cu) transport, presumably via its extracellular Cu-binding domain (CuBD). Cu binding to the CuBD reduces Abeta levels, suggesting that a Cu mimetic may have therapeutic potential. We describe here the atomic structures of apo CuBD from three crystal forms and find that they have identical Cu-binding sites despite the different crystal lattices. The structure of Cu(2+)-bound CuBD reveals that the metal ligands are His147, His151, Tyr168 and two water molecules, which are arranged in a square pyramidal geometry. The site resembles a Type 2 non-blue Cu center and is supported by electron paramagnetic resonance and extended X-ray absorption fine structure studies. A previous study suggested that Met170 might be a ligand, but we suggest that this residue plays a critical role as an electron donor in CuBD's ability to reduce Cu ions. The structure of Cu(+)-bound CuBD is almost identical to the Cu(2+)-bound structure except for the loss of one of the water ligands. The geometry of the site is unfavorable for Cu(+), thus providing a mechanism by which CuBD could readily transfer Cu ions to other proteins.
Affiliation(s)
- Geoffrey K-W Kong
- Biota Structural Biology Laboratory, St. Vincent's Institute, 9 Princes Street, Fitzroy, Victoria 3065, Australia
|
123
|
Macías JR, Jiménez-Lozano N, Carazo JM. Integrating electron microscopy information into existing Distributed Annotation Systems. J Struct Biol 2007; 158:205-13. [PMID: 17400476 DOI: 10.1016/j.jsb.2007.02.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/19/2006] [Revised: 12/19/2006] [Accepted: 02/13/2007] [Indexed: 10/23/2022]
Abstract
The increase of daily released bioinformatic data has generated new ways of organising and disseminating information. Specifically, in the field of sequence data, many efforts have been made not only to store information in databases, but also to annotate it and then share these annotations through a standard XML (eXtensible Markup Language) protocol and appropriate integration clients. This is the context in which the Distributed Annotation System (DAS) has emerged in genomics. Additionally, initiatives in the field of structural data, such as the extension of DAS to atomic resolution data, which generated the SPICE client, have also occurred. This paper presents 3D-EM DAS, a further extension of the DAS protocol that allows sharing annotations about hybrid models. This annotation system has been built on the basis of the EMDB, which stores Three-dimensional Electron Microscopy (3D-EM) volumes, PDB, which houses atomic coordinates, and UniProt (for protein sequences) databases. In this way, annotations for sequences, atomic coordinates, and 3D-EM volumes are collected and displayed through a single graphical visualization client. Thus, users have an integrated view of all the annotations together with the whole macromolecule (3D-EM map coming from EMDB), the atomic resolution structures fitted into it (coordinates coming from PDB) and the sequences corresponding to each of the structures (from UniProt).
Affiliation(s)
- J R Macías
- Unidad de Biocomputación, Centro Nacional de Biotecnología-CSIC, Campus de Cantoblanco UAM, c/ Darwin 3, 28049 Madrid, Spain.
|
124
|
Bateman A, Finn RD. SCOOP: a simple method for identification of novel protein superfamily relationships. Bioinformatics 2007; 23:809-14. [PMID: 17277330 PMCID: PMC2603044 DOI: 10.1093/bioinformatics/btm034] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Profile searches of sequence databases are a sensitive way to detect sequence relationships. Sophisticated profile-profile comparison algorithms that have been recently introduced increase search sensitivity even further. RESULTS In this article, a simpler approach than profile-profile comparison is presented that has a comparable performance to state-of-the-art tools such as COMPASS, HHsearch and PRC. This approach is called SCOOP (Simple Comparison Of Outputs Program), and is shown to find known relationships between families in the Pfam database as well as detect novel distant relationships between families. Several novel discoveries are presented including the discovery that a domain of unknown function (DUF283) found in Dicer proteins is related to double-stranded RNA-binding domains. AVAILABILITY SCOOP is freely available under a GNU GPL license from http://www.sanger.ac.uk/Users/agb/SCOOP/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.
|
125
|
Rueda M, Ferrer-Costa C, Meyer T, Pérez A, Camps J, Hospital A, Gelpí JL, Orozco M. A consensus view of protein dynamics. Proc Natl Acad Sci U S A 2007; 104:796-801. [PMID: 17215349 PMCID: PMC1783393 DOI: 10.1073/pnas.0605534104] [Citation(s) in RCA: 189] [Impact Index Per Article: 11.1] [Received: 07/06/2006] [Indexed: 11/18/2022] Open
Abstract
The dynamics of proteins in aqueous solution has been investigated through a massive approach based on "state of the art" molecular dynamics simulations performed for all protein metafolds using the four most popular force fields (OPLS, CHARMM, AMBER, and GROMOS). A detailed analysis of the massive database of trajectories (>1.5 terabytes of data obtained using approximately 50 years of CPU) allowed us to obtain a robust-consensus picture of protein dynamics in aqueous solution.
Affiliation(s)
- Manuel Rueda
- *Molecular Modelling and Bioinformatics Unit and
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
| | - Carles Ferrer-Costa
- *Molecular Modelling and Bioinformatics Unit and
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
| | - Tim Meyer
- *Molecular Modelling and Bioinformatics Unit and
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
- Departament de Bioquímica i Biologia Molecular, Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 645, 08028 Barcelona, Spain
| | - Alberto Pérez
- *Molecular Modelling and Bioinformatics Unit and
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
| | - Jordi Camps
- Structural Biology Node, Institut de Recerca Biomèdica, Parc Científic de Barcelona, Josep Samitier 1-5, 08028 Barcelona, Spain
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
| | - Adam Hospital
- *Molecular Modelling and Bioinformatics Unit and
- Structural Biology Node, Institut de Recerca Biomèdica, Parc Científic de Barcelona, Josep Samitier 1-5, 08028 Barcelona, Spain
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
| | - Josep Lluis Gelpí
- *Molecular Modelling and Bioinformatics Unit and
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
- Departament de Bioquímica i Biologia Molecular, Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 645, 08028 Barcelona, Spain
| | - Modesto Orozco
- *Molecular Modelling and Bioinformatics Unit and
- Structural Biology Node, Institut de Recerca Biomèdica, Parc Científic de Barcelona, Josep Samitier 1-5, 08028 Barcelona, Spain
- Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; and
- Departament de Bioquímica i Biologia Molecular, Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 645, 08028 Barcelona, Spain
|
126
|
Cho KI, Lee K, Lee KH, Kim D, Lee D. Specificity of molecular interactions in transient protein-protein interaction interfaces. Proteins 2007; 65:593-606. [PMID: 16948160 DOI: 10.1002/prot.21056] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
In this study, we investigate what types of interactions are specific to their biological function, and what types of interactions are persistent regardless of their functional category in transient protein-protein heterocomplexes. This is the first approach to analyze protein-protein interfaces systematically at the molecular interaction level in the context of protein functions. We perform systematic analysis at the molecular interaction level using classification and feature subset selection techniques prevalent in the field of pattern recognition. To represent the physicochemical properties of protein-protein interfaces, we design 18 molecular interaction types using canonical and noncanonical interactions. Then, we construct an input vector using the frequency of each interaction type in a protein-protein interface. We analyze the 131 interfaces of transient protein-protein heterocomplexes in PDB: 33 protease-inhibitors, 52 antibody-antigens, 46 signaling proteins including 4 cyclin dependent kinase and 26 G-protein. Using kNN classification and feature subset selection, we show that there are specific interaction types based on their functional category, and such interaction types are conserved through the common binding mechanism, rather than through the sequence or structure conservation. The extracted interaction types are C(alpha)--H...O==C interaction, cation...anion interaction, amine...amine interaction, and amine...cation interaction. With these four interaction types, we achieve a classification success rate of up to 83.2% with leave-one-out cross-validation at k = 15. Of these four interaction types, C(alpha)--H...O==C shows binding specificity for protease-inhibitor complexes, while cation...anion interaction is predominant in signaling complexes. The amine...amine and amine...cation interactions make a minor contribution to the classification accuracy. When combined with the other two interactions, they increase the accuracy by 3.8%. In the case of antibody-antigen complexes, the sign is somewhat ambiguous. From the evolutionary perspective, while protease-inhibitors and signaling proteins have optimized their interfaces to suit their biological functions, antibody-antigen interactions are the happenstance, implying that antibody-antigen complexes do not show distinctive interaction types. Persistent interaction types such as pi...pi, amide-carbonyl, and hydroxyl-carbonyl interaction, are also investigated. Analyzing the structural orientations of the pi...pi stacking interactions, we find that herringbone shape is a major configuration in transient protein-protein interfaces. This result is different from that of protein core, where parallel-displaced configurations are the major configuration. We also analyze the overall trend of amide-carbonyl and hydroxyl-carbonyl interactions. It is noticeable that nearly 82% of the interfaces have at least one hydroxyl-carbonyl interaction.
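The classification scheme in this abstract (frequency vectors, kNN, leave-one-out cross-validation) can be illustrated with a minimal sketch. The function names, the Euclidean distance metric, and the toy vectors below are assumptions made for illustration; none of them come from the paper, which used 131 interfaces and k = 15.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training vectors nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k):
    """Hold each sample out once, classify it from the rest, return the hit rate."""
    hits = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, vector, k) == label:
            hits += 1
    return hits / len(samples)


# Toy data invented for illustration: each vector holds the relative
# frequencies of four hypothetical interaction types in one interface.
toy_interfaces = [
    ((0.60, 0.10, 0.20, 0.10), "protease-inhibitor"),
    ((0.55, 0.15, 0.20, 0.10), "protease-inhibitor"),
    ((0.58, 0.12, 0.18, 0.12), "protease-inhibitor"),
    ((0.10, 0.60, 0.15, 0.15), "signaling"),
    ((0.15, 0.55, 0.15, 0.15), "signaling"),
    ((0.12, 0.58, 0.20, 0.10), "signaling"),
]

accuracy = leave_one_out_accuracy(toy_interfaces, k=3)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Leave-one-out is a natural fit for a data set of this size, since every interface is used for testing exactly once while the rest serve as training data.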
Affiliation(s)
- Kyu-il Cho
- Bio-Information System Laboratory, Department of BioSystems, KAIST, Guseong-dong, Yuseong-gu, 305-701, Daejeon, Korea
|
127
|
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art by a number of specific examples.
|
128
|
Abstract
Protein sequence classification and comparison has become increasingly important in the current "omics" revolution, where scientists are working on functional genomics and proteomics technologies for large-scale protein function prediction. However, functional classification is also important for the bench scientist wanting to analyze single or small sets of proteins, or even a single genome. A number of tools are available for sequence classification, such as sequence similarity searches, motif- or pattern-finding software, and protein signatures for identifying protein families and domains. One such tool, InterPro, is a documentation resource that integrates the major players in the protein signature field to provide a valuable tool for annotation of proteins. Protein sequences are searched using the InterProScan software to identify signatures from the InterPro member databases: Pfam, PROSITE, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D, and PANTHER. The InterPro database can be searched to retrieve precalculated matches for UniProtKB proteins, or to find additional information on protein families and domains. For completely sequenced genomes, the user can retrieve InterPro-based analyses on all nonredundant proteins in the proteome, and can execute user-selected proteome comparisons. This chapter will describe how to use InterPro and InterProScan for protein sequence classification and comparative proteomics.
|
129
|
Pratelli R, Pilot G. The plant-specific VIMAG domain of Glutamine Dumper1 is necessary for the function of the protein in Arabidopsis. FEBS Lett 2006; 580:6961-6. [PMID: 17157837 DOI: 10.1016/j.febslet.2006.11.064] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/08/2006] [Revised: 11/21/2006] [Accepted: 11/21/2006] [Indexed: 11/23/2022]
Abstract
The over-expression of the Arabidopsis GLUTAMINE DUMPER1 gene (GDU1) leads to increased amino acid content and transport. In a screening for mutations suppressing this phenotype, a mutant was isolated. The mutation leads to a glycine to arginine substitution in one of the two conserved domains of the protein, the VIMAG domain. More detailed structure-function relationship analyses showed that the presence of this domain and the membrane localisation are both necessary for the function of the GDU1 protein. These results shed light on the function of the GDU1 protein, whose family is specific to plants.
Affiliation(s)
- Réjane Pratelli
- Institute for Cellular and Molecular Botany (IZMB), Kirschallee 1, 53115 Bonn, Germany
|
130
|
Smialowski P, Martin-Galiano AJ, Mikolajka A, Girschick T, Holak TA, Frishman D. Protein solubility: sequence based prediction and experimental verification. Bioinformatics 2006; 23:2536-42. [PMID: 17150993 DOI: 10.1093/bioinformatics/btl623] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Obtaining soluble proteins in sufficient concentrations is a recurring limiting factor in various experimental studies. Solubility is an individual trait of proteins which, under a given set of experimental conditions, is determined by their amino acid sequence. Accurate theoretical prediction of solubility from sequence is instrumental for setting priorities on targets in large-scale proteomics projects. RESULTS We present a machine-learning approach called PROSO to assess the chance of a protein to be soluble upon heterologous expression in Escherichia coli based on its amino acid composition. The classification algorithm is organized as a two-layered structure in which the output of primary support vector machine (SVM) classifiers serves as input for a secondary Naive Bayes classifier. Experimental progress information from the TargetDB database as well as previously published datasets were used as the source of training data. In comparison with previously published methods our classification algorithm possesses improved discriminatory capacity characterized by the Matthews Correlation Coefficient (MCC) of 0.434 between predicted and known solubility states and the overall prediction accuracy of 72% (75 and 68% for positive and negative class, respectively). We also provide experimental verification of our predictions using solubility measurements for 31 mutational variants of two different proteins.
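The Matthews Correlation Coefficient quoted in this abstract is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the counts are hypothetical, chosen on the assumption that the 75% and 68% figures are per-class recall on a balanced set of 100 soluble and 100 insoluble proteins. They are not the paper's data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced test set (an assumption, not taken from the paper).
tp, fn = 75, 25  # soluble proteins predicted soluble / insoluble
tn, fp = 68, 32  # insoluble proteins predicted insoluble / soluble

mcc = matthews_corrcoef(tp, tn, fp, fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"MCC = {mcc:.3f}, accuracy = {accuracy:.3f}")
```

Under these assumed counts the MCC comes out near 0.43 and the accuracy near 0.72, which is in the same range as the reported values, although the actual class balance of the test set is not stated in the abstract.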
Affiliation(s)
- Pawel Smialowski
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
131
|
Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res 2006; 35:D291-7. [PMID: 17135200 PMCID: PMC1751535 DOI: 10.1093/nar/gkl959] [Citation(s) in RCA: 239] [Impact Index Per Article: 13.3] [Indexed: 11/12/2022] Open
Abstract
We report the latest release (version 3.0) of the CATH protein domain database. There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps, which have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt.
Affiliation(s)
| | | | | | - Alison Cuff
- To whom correspondence should be addressed: Tel: +1 44 207 679 3890; Fax: +1 44 207 679 7193;
| | | | | | | | | | | | | | | | | | - Janet M. Thornton
- European Bioinformatics Institute, Hinxton Hall, Hinxton, Cambridge CB 10 IRQ, UK
|
132
|
Sonego P, Pacurar M, Dhir S, Kertész-Farkas A, Kocsor A, Gáspári Z, Leunissen JA, Pongor S. A Protein Classification Benchmark collection for machine learning. Nucleic Acids Res 2006; 35:D232-6. [PMID: 17142240 PMCID: PMC1669728 DOI: 10.1093/nar/gkl812] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022] Open
Abstract
Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection was created in order to provide standard datasets on which the performance of machine learning methods can be compared. It is primarily meant for method developers and users interested in comparing methods under standardized conditions. The collection contains datasets of sequences and structures, and each set is subdivided into positive/negative, training/test sets in several ways. There is a total of 6405 classification tasks, 3297 on protein sequences, 3095 on protein structures and 10 on protein coding regions in DNA. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems. In the case of hierarchical classification schemes, the classification tasks can be defined at various levels of the hierarchy (such as classes, folds, superfamilies, etc.). For each dataset there are distance matrices available that contain all vs. all comparison of the data, based on various sequence or structure comparison methods, as well as a set of classification performance measures computed with various classifier algorithms.
Affiliation(s)
| | | | | | - Attila Kertész-Farkas
- Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Aradi vértanúk tere 1., H-6720 Szeged, Hungary
| | - András Kocsor
- Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Aradi vértanúk tere 1., H-6720 Szeged, Hungary
| | - Zoltán Gáspári
- Institute of Chemistry, Eötvös Loránd University, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary
- Bioinformatics Group, Biological Research Centre, Hungarian Academy of Sciences, Temesvári krt. 62, H-6701 Szeged, Hungary
| | - Jack A.M. Leunissen
- Laboratory of Bioinformatics, Wageningen University and Research Centre, PO Box 8128, 6700 ET Wageningen, The Netherlands
| | - Sándor Pongor
- To whom correspondence should be addressed. Tel: +39 0403757300; Fax: +39 040226555;
|
133
|
Tracing the origin of functional and conserved domains in the human proteome: implications for protein evolution at the modular level. BMC Evol Biol 2006; 6:91. [PMID: 17090320 PMCID: PMC1654190 DOI: 10.1186/1471-2148-6-91] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/13/2006] [Accepted: 11/07/2006] [Indexed: 11/29/2022] Open
Abstract
Background The functional repertoire of the human proteome is an incremental collection of functions accomplished by protein domains evolved along the Homo sapiens lineage. Therefore, knowledge on the origin of these functionalities provides a better understanding of the domain and protein evolution in human. The lack of proper comprehension about such origin has impelled us to study the evolutionary origin of human proteome in a unique way as detailed in this study. Results This study reports a unique approach for understanding the evolution of human proteome by tracing the origin of its constituting domains hierarchically, along the Homo sapiens lineage. The uniqueness of this method lies in subtractive searching of functional and conserved domains in the human proteome resulting in higher efficiency of detecting their origins. From these analyses the nature of protein evolution and trends in domain evolution can be observed in the context of the entire human proteome data. The method adopted here also helps delineate the degree of divergence of functional families occurred during the course of evolution. Conclusion This approach to trace the evolutionary origin of functional domains in the human proteome facilitates better understanding of their functional versatility as well as provides insights into the functionality of hypothetical proteins present in the human proteome. This work elucidates the origin of functional and conserved domains in human proteins, their distribution along the Homo sapiens lineage, occurrence frequency of different domain combinations and proteome-wide patterns of their distribution, providing insights into the evolutionary solution to the increased complexity of the human proteome.
|
134
|
Berman HM, Burley SK, Chiu W, Sali A, Adzhubei A, Bourne PE, Bryant SH, Dunbrack RL, Fidelis K, Frank J, Godzik A, Henrick K, Joachimiak A, Heymann B, Jones D, Markley JL, Moult J, Montelione GT, Orengo C, Rossmann MG, Rost B, Saibil H, Schwede T, Standley DM, Westbrook JD. Outcome of a workshop on archiving structural models of biological macromolecules. Structure 2006; 14:1211-7. [PMID: 16955948 DOI: 10.1016/j.str.2006.06.005] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]
Affiliation(s)
- Helen M Berman
- The Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA.
|
135
|
Lin Z, Sriskanthadevan S, Huang H, Siu CH, Yang D. Solution structures of the adhesion molecule DdCAD-1 reveal new insights into Ca2+-dependent cell-cell adhesion. Nat Struct Mol Biol 2006; 13:1016-22. [PMID: 17057715 DOI: 10.1038/nsmb1162] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 02/23/2006] [Accepted: 10/03/2006] [Indexed: 02/06/2023]
Abstract
DdCAD-1 is a novel Ca(2+)-dependent cell adhesion molecule that lacks a hydrophobic signal peptide and a transmembrane domain. DdCAD-1 is expressed by the social amoeba Dictyostelium discoideum at the onset of development. It is synthesized as a soluble protein and then transported to the plasma membrane by contractile vacuoles. Here we describe the novel features of the solution structures of Ca(2+)-free and Ca(2+)-bound monomeric DdCAD-1. DdCAD-1 contains two beta-sandwich domains, belonging to the betagamma-crystallin and immunoglobulin fold classes, respectively. Whereas the N-terminal domain has a major role in homophilic binding, the C-terminal domain tethers the protein to the cell membrane. From structural and mutational analyses, we propose a model for the Ca(2+)-bound DdCAD-1 dimer as a basis for understanding DdCAD-1-mediated cell-cell adhesion at the molecular level. Our results provide new insights into Ca(2+)-dependent mechanisms for cell-cell adhesion.
Affiliation(s)
- Zhi Lin
- Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543
|
136
|
Zhi D, Krishna SS, Cao H, Pevzner P, Godzik A. Representing and comparing protein structures as paths in three-dimensional space. BMC Bioinformatics 2006; 7:460. [PMID: 17052359 PMCID: PMC1626488 DOI: 10.1186/1471-2105-7-460] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 05/18/2006] [Accepted: 10/20/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Most existing formulations of protein structure comparison are based on detailed atomic level descriptions of protein structures and bypass potential insights that arise from a higher-level abstraction. RESULTS We propose a structure comparison approach based on a simplified representation of proteins that describes its three-dimensional path by local curvature along the generalized backbone of the polypeptide. We have implemented a dynamic programming procedure that aligns curvatures of proteins by optimizing a defined sum turning angle deviation measure. CONCLUSION Although our procedure does not directly optimize global structural similarity as measured by RMSD, our benchmarking results indicate that it can surprisingly well recover the structural similarity defined by structure classification databases and traditional structure alignment programs. In addition, our program can recognize similarities between structures with extensive conformation changes that are beyond the ability of traditional structure alignment programs. We demonstrate the applications of procedure to several contexts of structure comparison. An implementation of our procedure, CURVE, is available as a public webserver.
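The idea of comparing structures through local curvature can be sketched as follows: compute a turning angle at each interior point of a backbone trace, then align two angle sequences by dynamic programming so that the summed angle deviation is minimal. This is a minimal sketch under stated assumptions, not the CURVE implementation; the linear gap penalty, its default value and the function names are hypothetical.

```python
import math


def turning_angles(coords):
    """Turning angle (radians) at each interior point of a 3D path.

    coords is a list of (x, y, z) tuples; consecutive points are
    assumed to be distinct so that every segment has non-zero length.
    """
    angles = []
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - b[i] for i in range(3)]
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        angles.append(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angles


def align_cost(p, q, gap=0.5):
    """Minimum summed |angle difference| over global alignments of p and q.

    Unaligned positions pay a fixed gap penalty (an assumed scoring choice).
    """
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(p[i - 1] - q[j - 1]),  # align two angles
                dp[i - 1][j] + gap,                           # gap in q
                dp[i][j - 1] + gap,                           # gap in p
            )
    return dp[n][m]
```

Because the angles depend only on the shape of the path, the cost is unchanged by rotating or translating either structure, which is one reason a curvature representation needs no prior superposition.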
Affiliation(s)
- Degui Zhi
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720-3102, USA
- S Sri Krishna
- Joint Center for Structural Genomics, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Haibo Cao
- Bioinformatics Program, Infectious and Inflammation Disease Center, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Pavel Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0114, USA
- Adam Godzik
- Joint Center for Structural Genomics, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Bioinformatics Program, Infectious and Inflammation Disease Center, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
|
137
|
Chivian D, Baker D. Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res 2006; 34:e112. [PMID: 16971460 PMCID: PMC1635247 DOI: 10.1093/nar/gkl480] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/12/2022] Open
Abstract
The accuracy of a homology model based on the structure of a distant relative or other topologically equivalent protein is primarily limited by the quality of the alignment. Here we describe a systematic approach for sequence-to-structure alignment, called ‘K*Sync’, in which alignments are generated by dynamic programming using a scoring function that combines information on many protein features, including a novel measure of how obligate a sequence region is to the protein fold. By systematically varying the weights on the different features that contribute to the alignment score, we generate very large ensembles of diverse alignments, each optimal under a particular constellation of weights. We investigate a variety of approaches to select the best models from the ensemble, including consensus of the alignments, a hydrophobic burial measure, low- and high-resolution energy functions, and combinations of these evaluation methods. The effect on model quality and selection resulting from loop modeling and backbone optimization is also studied. The performance of the method on a benchmark set is reported and shows the approach to be effective at both generating and selecting accurate alignments. The method serves as the foundation of the homology modeling module in the Robetta server.
Affiliation(s)
- Dylan Chivian
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- To whom correspondence should be addressed at Department of Biochemistry and HHMI, University of Washington, Box 357350, Seattle, WA 98195, USA. Tel: +1 206 543 1295; Fax: +1 206 685 1792
|
138
|
Godoi PHC, Galhardo RS, Luche DD, Van Sluys MA, Menck CFM, Oliva G. Structure of the thiazole biosynthetic enzyme THI1 from Arabidopsis thaliana. J Biol Chem 2006; 281:30957-66. [PMID: 16912043 DOI: 10.1074/jbc.m604469200] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/06/2022] Open
Abstract
Thiamin pyrophosphate is an essential coenzyme in all organisms that depend on fermentation, respiration or photosynthesis to produce ATP. It is synthesized through two independent biosynthetic routes: one for the synthesis of 2-methyl-4-amino-5-hydroxymethylpyrimidine pyrophosphate (pyrimidine moiety) and another for the synthesis of 4-methyl-5-(beta-hydroxyethyl) thiazole phosphate (thiazole moiety). Herein, we will describe the three-dimensional structure of THI1 protein from Arabidopsis thaliana determined by single wavelength anomalous diffraction to 1.6 Å resolution. The protein was produced using heterologous expression in bacteria, unexpectedly bound to 2-carboxylate-4-methyl-5-beta-(ethyl adenosine 5-diphosphate) thiazole, a potential intermediate of the thiazole biosynthesis in Eukaryotes. THI1 has a topology similar to dinucleotide binding domains and although details concerning its function are unknown, this work provides new clues about the thiazole biosynthesis in Eukaryotes.
Affiliation(s)
- Paulo H C Godoi
- Departamento de Física e Informática, Instituto de Física de São Carlos, Universidade de São Paulo, São Carlos, SP, CP 369, 13560-970, Brazil
|
139
|
Nichols CE, Johnson C, Lockyer M, Charles IG, Lamb HK, Hawkins AR, Stammers DK. Structural characterization of Salmonella typhimurium YeaZ, an M22 O-sialoglycoprotein endopeptidase homolog. Proteins 2006; 64:111-23. [PMID: 16617437 DOI: 10.1002/prot.20982] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
The Salmonella typhimurium "yeaZ" gene (StyeaZ) encodes an essential protein of unknown function (StYeaZ), which has previously been annotated as a putative homolog of the Pasteurella haemolytica M22 O-sialoglycoprotein endopeptidase Gcp. YeaZ has also recently been reported as the first example of an RPF from a gram-negative bacterial species. To further characterize the properties of StYeaZ and the widely occurring MK-M22 family, we describe the purification, biochemical analysis, crystallization, and structure determination of StYeaZ. The crystal structure of StYeaZ reveals a classic two-lobed actin-like fold with structural features consistent with nucleotide binding. However, microcalorimetry experiments indicated that StYeaZ neither binds polyphosphates nor a wide range of nucleotides. Additionally, biochemical assays show that YeaZ is not an active O-sialoglycoprotein endopeptidase, consistent with the lack of the critical zinc binding motif. We present a detailed comparison of YeaZ with available structural homologs, the first reported structural analysis of an MK-M22 family member. The analysis indicates that StYeaZ has an unusual orientation of the A and B lobes which may require substantial relative movement or interaction with a partner protein in order to bind ligands. Comparison of the fold of YeaZ with that of a known RPF domain from a gram-positive species shows significant structural differences and therefore potentially distinctive RPF mechanisms for these two bacterial classes.
Affiliation(s)
- C E Nichols
- Division of Structural Biology, The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
|
140
|
Yang JM, Tung CH. Protein structure database search and evolutionary classification. Nucleic Acids Res 2006; 34:3646-59. [PMID: 16885238 PMCID: PMC1540718 DOI: 10.1093/nar/gkl395] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Received: 03/08/2006] [Revised: 05/06/2006] [Accepted: 05/09/2006] [Indexed: 11/14/2022] Open
Abstract
As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].
Affiliation(s)
- Jinn-Moon Yang
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, 30050, Taiwan.
|
141
|
Abstract
Universal ontology of catalytic sites is required to systematize enzyme catalytic sites, their evolution as well as relations between catalytic sites and protein families, organisms and chemical reactions. Here we present a classification of hydrolases catalytic sites based on hierarchical organization. The web-accessible database provides information on the catalytic sites, protein folds, EC numbers and source organisms of the enzymes and includes software allowing for analysis and visualization of the relations between them. AVAILABILITY http://www.enzyme.chem.msu.ru/hcs/
Affiliation(s)
- Igor A Gariev
- School of Enzymology, Department of Chemistry, M.V. Lomonosov Moscow State University, Moscow, 119992, Russia.
|
142
|
Abstract
Owing to the ongoing success of the genome sequencing and structural genomics projects, the increase in both sequence and structural data is rapid. The development of tools for the annotation of sequence and structural data has become more important in the hope of keeping up with this data explosion. Scientists in this field have addressed these issues over the last 10 years and there now exists a wealth of methods and approaches to help interpret these data. However, there is no current way in which these methods can be incorporated easily so that the resulting annotations can be viewed together. This review discusses the development of these annotation methods and introduces the BioSapiens Network of Excellence, which has been formed in order to integrate the methods which have been developed in Europe.
Affiliation(s)
- Gabrielle A Reeves
- EMBL--European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
143
|
Marsden BD, Sundstrom M, Knapp S. High-throughput structural characterisation of therapeutic protein targets. Expert Opin Drug Discov 2006; 1:123-36. [DOI: 10.1517/17460441.1.2.123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
|
144
|
Williams PD, Pollock DD, Blackburne BP, Goldstein RA. Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput Biol 2006; 2:e69. [PMID: 16789817 PMCID: PMC1480538 DOI: 10.1371/journal.pcbi.0020069] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.4] [Received: 11/28/2005] [Accepted: 05/04/2006] [Indexed: 11/18/2022] Open
Abstract
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
It is now possible to apply computational methods to known current protein sequences to recreate the sequences of ancestral proteins. By synthesising these proteins and measuring their properties in the laboratory, we can gain much information about the nature of evolution, better understand how proteins change and adapt over time, and develop insights into the environments of ancient organisms. Unfortunately, the accuracy of these reconstructions is difficult to evaluate. We simulate protein evolution using a simplified computational model and apply the various reconstruction methods to the sequences that arise from our simulations. Because we have the complete record of the evolutionary history, we can evaluate the reconstruction accuracy directly. We demonstrate that the reconstruction procedures in common use may have a bias toward overestimating the properties of these ancestral proteins, opposite to what has been assumed previously. An alternative method of creating these sequences is presented, Bayesian sampling, that can eliminate this bias and provide more robust conclusions.
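The contrast between a "best guess" reconstruction and Bayesian sampling can be made concrete with a short sketch. Given a per-site posterior distribution over residues, the first approach always takes the most probable residue, while the second draws a residue in proportion to its posterior probability and so occasionally keeps a less-probable variant. The posterior values and function names below are hypothetical; the abstract does not describe the authors' software.

```python
import random


def best_guess(posterior):
    """Most probable residue at one site (the 'best guess' choice)."""
    return max(posterior, key=posterior.get)


def sample_residue(posterior, rng):
    """Draw one residue in proportion to its posterior probability."""
    residues = list(posterior)
    weights = [posterior[r] for r in residues]
    return rng.choices(residues, weights=weights, k=1)[0]


def reconstruct(site_posteriors, rng=None):
    """Build an ancestral sequence from per-site posteriors.

    With rng=None every site takes its best-guess residue; with a
    random.Random instance each site is sampled from its posterior.
    """
    if rng is None:
        return "".join(best_guess(p) for p in site_posteriors)
    return "".join(sample_residue(p, rng) for p in site_posteriors)


# Hypothetical posteriors for a two-site ancestor.
toy_posteriors = [{"A": 0.7, "G": 0.3}, {"L": 0.6, "V": 0.4}]
point_estimate = reconstruct(toy_posteriors)
sampled = [reconstruct(toy_posteriors, random.Random(seed)) for seed in range(200)]
```

The point estimate is a single sequence, whereas the sampled set retains the minority residues that the best-guess approach systematically discards, which is the mechanism the abstract links to overestimated stability.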
Affiliation(s)
- Paul D Williams
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- David D Pollock
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Benjamin P Blackburne
- Division of Mathematical Biology, National Institute of Medical Research, Mill Hill, London, United Kingdom
- Richard A Goldstein
- Division of Mathematical Biology, National Institute of Medical Research, Mill Hill, London, United Kingdom
- * To whom correspondence should be addressed. E-mail:
|
145
|
Brylinski M, Konieczny L, Roterman I. Hydrophobic collapse in (in silico) protein folding. Comput Biol Chem 2006; 30:255-67. [PMID: 16798094 DOI: 10.1016/j.compbiolchem.2006.04.007] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 12/19/2005] [Revised: 04/06/2006] [Accepted: 04/06/2006] [Indexed: 11/28/2022]
Abstract
A model of hydrophobic collapse, which is treated as the driving force for protein folding, is presented. This model is the superposition of three models commonly used in protein structure prediction: (1) 'oil-drop' model introduced by Kauzmann, (2) a lattice model introduced to decrease the number of degrees of freedom for structural changes and (3) a model of the formation of hydrophobic core as a key feature in driving the folding of proteins. These three models together helped to develop the idea of a fuzzy-oil-drop as a model for an external force field of hydrophobic character mimicking the hydrophobicity-differentiated environment for hydrophobic collapse. All amino acids in the polypeptide interact pair-wise during the folding process (energy minimization procedure) and interact with the external hydrophobic force field defined by a three-dimensional Gaussian function. The value of the Gaussian function usually interpreted as a probability distribution is treated as a normalized hydrophobicity distribution, with its maximum in the center of the ellipsoid and decreasing proportionally with the distance versus the center. The fuzzy-oil-drop is elastic and changes its shape and size during the simulated folding procedure.
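To illustrate what a three-dimensional Gaussian treated as a normalized hydrophobicity distribution might look like in practice, the sketch below evaluates an axis-aligned Gaussian at a set of residue positions and rescales the values so that they sum to one. Centring the drop at the origin, the axis-aligned form, the sigma values and the function name are assumptions for illustration; the abstract does not give the exact formula or parameters.

```python
import math


def fuzzy_oil_drop(points, sigmas):
    """Normalized Gaussian 'hydrophobicity' at each point.

    points -- list of (x, y, z) positions relative to the drop centre
    sigmas -- (sigma_x, sigma_y, sigma_z) controlling the ellipsoid size
    Returns values that sum to 1, highest at the centre.
    """
    sx, sy, sz = sigmas
    raw = [
        math.exp(-(x * x) / (2 * sx * sx)
                 - (y * y) / (2 * sy * sy)
                 - (z * z) / (2 * sz * sz))
        for x, y, z in points
    ]
    total = sum(raw)
    return [value / total for value in raw]


# Hypothetical residue positions: one at the centre, two further out.
toy_profile = fuzzy_oil_drop([(0, 0, 0), (1, 0, 0), (0, 2, 0)], (1.0, 1.0, 1.0))
```

Changing the sigma values stretches or shrinks the ellipsoid, which is one simple way to picture the "elastic" drop that changes shape and size during the simulated folding.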
Affiliation(s)
- Michal Brylinski
- Department of Bioinformatics and Telemedicine, Collegium Medicum, Jagiellonian University, Kopernika 17, 31-501 Krakow, Poland
|
146
|
Lise S, Walker-Taylor A, Jones DT. Docking protein domains in contact space. BMC Bioinformatics 2006; 7:310. [PMID: 16790041 PMCID: PMC1559650 DOI: 10.1186/1471-2105-7-310] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/20/2006] [Accepted: 06/21/2006] [Indexed: 11/10/2022] Open
Abstract
Background Many biological processes involve the physical interaction between protein domains. Understanding these functional associations requires knowledge of the molecular structure. Experimental investigations though present considerable difficulties and there is therefore a need for accurate and reliable computational methods. In this paper we present a novel method that seeks to dock protein domains using a contact map representation. Rather than providing a full three dimensional model of the complex, the method predicts contacting residues across the interface. We use a scoring function that combines structural, physicochemical and evolutionary information, where each potential residue contact is assigned a value according to the scoring function and the hypothesis is that the real configuration of contacts is the one that maximizes the score. The search is performed with a simulated annealing algorithm directly in contact space. Results We have tested the method on interacting domain pairs that are part of the same protein (intra-molecular domains). We show that it correctly predicts some contacts and that predicted residues tend to be significantly closer to each other than other pairs of residues in the same domains. Moreover we find that predicted contacts can often discriminate the best model (or the native structure, if present) among a set of optimal solutions generated by a standard docking procedure. Conclusion Contact docking appears feasible and able to complement other computational methods for the prediction of protein-protein interactions. With respect to more standard docking algorithms it might be more suitable to handle protein conformational changes and to predict complexes starting from protein models.
Affiliation(s)
- Stefano Lise
- Department of Biochemistry and Molecular Biology, University College London, UK
- David T Jones
- Department of Biochemistry and Molecular Biology, University College London, UK
- Department of Computer Science, University College London, UK
|
147
|
Lees JG, Miles AJ, Wien F, Wallace BA. A reference database for circular dichroism spectroscopy covering fold and secondary structure space. Bioinformatics 2006; 22:1955-62. [PMID: 16787970 DOI: 10.1093/bioinformatics/btl327] [Citation(s) in RCA: 324] [Impact Index Per Article: 18.0] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Circular Dichroism (CD) spectroscopy is a long-established technique for studying protein secondary structures in solution. Empirical analyses of CD data rely on the availability of reference datasets comprised of far-UV CD spectra of proteins whose crystal structures have been determined. This article reports on the creation of a new reference dataset which effectively covers both secondary structure and fold space, and uses the higher information content available in synchrotron radiation circular dichroism (SRCD) spectra to more accurately predict secondary structure than has been possible with existing reference datasets. It also examines the effects of wavelength range, structural redundancy and different means of categorizing secondary structures on the accuracy of the analyses. In addition, it describes a novel use of hierarchical cluster analyses to identify protein relatedness based on spectral properties alone. The databases are shown to be applicable in both conventional CD and SRCD spectroscopic analyses of proteins. Hence, by combining new bioinformatics and biophysical methods, a database has been produced that should have wide applicability as a tool for structural molecular biology.
Affiliation(s)
- Jonathan G Lees
- Department of Crystallography, Birkbeck College, University of London, London WC1E 7HX, UK
|
148
|
Chou WI, Pai TW, Liu SH, Hsiung BK, Chang MT. The family 21 carbohydrate-binding module of glucoamylase from Rhizopus oryzae consists of two sites playing distinct roles in ligand binding. Biochem J 2006; 396:469-77. [PMID: 16509822 PMCID: PMC1482813 DOI: 10.1042/bj20051982] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]
Abstract
The starch-hydrolysing enzyme GA (glucoamylase) from Rhizopus oryzae is a commonly used glycoside hydrolase in industry. It consists of a C-terminal catalytic domain and an N-terminal starch-binding domain, which belong to the CBM21 (carbohydrate-binding module, family 21). In the present study, a molecular model of CBM21 from R. oryzae GA (RoGACBM21) was constructed according to PSSC (progressive secondary structure correlation), modified structure-based sequence alignment, and site-directed mutagenesis was used to identify and characterize potential ligand-binding sites. Our model suggests that RoGACBM21 contains two ligand-binding sites, with Tyr32 and Tyr67 grouped into site I, and Trp47, Tyr83 and Tyr93 grouped into site II. The involvement of these aromatic residues has been validated using chemical modification, UV difference spectroscopy studies, and both qualitative and quantitative binding assays on a series of RoGACBM21 mutants. Our results further reveal that binding sites I and II play distinct roles in ligand binding, the former not only is involved in binding insoluble starch, but also facilitates the binding of RoGACBM21 to long-chain soluble polysaccharides, whereas the latter serves as the major binding site mediating the binding of both soluble polysaccharide and insoluble ligands. In the present study we have for the first time demonstrated that the key ligand-binding residues of RoGACBM21 can be identified and characterized by a combination of novel bioinformatics methodologies in the absence of resolved three-dimensional structural information.
Affiliation(s)
- Wei-I Chou
- *Institute of Molecular and Cellular Biology, Department of Life Science, National Tsing Hua University, No. 101, Sec. 2, Kuang Fu Rd, Hsinchu, Taiwan 30013, Republic of China
- Tun-Wen Pai
- †Department of Computer Science, National Taiwan Ocean University, No. 2, Pei Ning Rd, Keelung, Taiwan 20224, Republic of China
- Shi-Hwei Liu
- *Institute of Molecular and Cellular Biology, Department of Life Science, National Tsing Hua University, No. 101, Sec. 2, Kuang Fu Rd, Hsinchu, Taiwan 30013, Republic of China
- Bor-Kai Hsiung
- *Institute of Molecular and Cellular Biology, Department of Life Science, National Tsing Hua University, No. 101, Sec. 2, Kuang Fu Rd, Hsinchu, Taiwan 30013, Republic of China
- Margaret D.-T. Chang
- *Institute of Molecular and Cellular Biology, Department of Life Science, National Tsing Hua University, No. 101, Sec. 2, Kuang Fu Rd, Hsinchu, Taiwan 30013, Republic of China
- To whom correspondence should be addressed.
|
149
|
Whitfield EJ, Pruess M, Apweiler R. Bioinformatics database infrastructure for biotechnology research. J Biotechnol 2006; 124:629-39. [PMID: 16757051 DOI: 10.1016/j.jbiotec.2006.04.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/14/2005] [Revised: 03/06/2006] [Accepted: 04/03/2006] [Indexed: 10/24/2022]
Abstract
Many databases are available that provide valuable data resources for the biotechnological researcher. According to their core data, they can be divided into different types. Some databases provide primary data, like all published nucleotide sequences, others deal with protein sequences. In addition to these two basic types of databases, a huge number of more specialized resources are available, like databases about protein structures, protein identification, special features of genes and/or proteins, or certain organisms. Furthermore, some resources offer integrated views on different types of data, allowing the user to do easy customized queries over large datasets and to compare different types of data.
Affiliation(s)
- Eleanor J Whitfield
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambs CB10 1SD, UK.
|
150
|
Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006; 15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 10/24/2022]
Abstract
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
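The two evaluation measures named in the abstract can be written down directly. The sketch below computes the RMSD difference between the model a score ranks best and the model with the lowest RMSD, plus the Pearson correlation between score and RMSD. Function names, the toy numbers and the "lower score is better" default are assumptions for illustration; this is not the SVMod code.

```python
import math


def delta_rmsd(scores, rmsds, lower_is_better=True):
    """RMSD of the model picked by the score minus the lowest RMSD in the set."""
    pick = min if lower_is_better else max
    chosen = pick(range(len(scores)), key=scores.__getitem__)
    return rmsds[chosen] - min(rmsds)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores and RMSDs (in Å) for three models of one target.
toy_scores = [0.2, 0.1, 0.9]
toy_rmsds = [2.0, 3.5, 6.0]
toy_delta = delta_rmsd(toy_scores, toy_rmsds)
```

A ΔRMSD of zero means the score selected the most native-like model available; averaging ΔRMSD over targets gives a figure comparable in kind to the 0.63 Å and 0.45 Å values quoted above.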
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA
|