Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 2005;33:D247-51. [PMID: 15608188 PMCID: PMC539978 DOI: 10.1093/nar/gki024] [Citation(s) in RCA: 211] [Impact Index Per Article: 11.1] [Indexed: 11/13/2022]

Number

Cited by Other Article(s)

101

Zhao XM, Li X, Chen L, Aihara K. Protein classification with imbalanced data. Proteins 2007;70:1125-32. [DOI: 10.1002/prot.21870] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Indexed: 11/08/2022]

102

Pang CNI, Lin K, Wouters MA, Heringa J, George RA. Identifying foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids Res 2007;36:578-88. [PMID: 18056079 PMCID: PMC2241846 DOI: 10.1093/nar/gkm1070] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]

103

Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007;8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 352] [Impact Index Per Article: 20.7] [Indexed: 11/09/2022]

104

Birzele F, Csaba G, Zimmer R. Alternative splicing and protein structure evolution. Nucleic Acids Res 2007;36:550-8. [PMID: 18055499 PMCID: PMC2241867 DOI: 10.1093/nar/gkm1054] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 12/16/2022]

105

Copper binding to the Alzheimer's disease amyloid precursor protein. European Biophysics Journal: EBJ 2007;37:269-79. [PMID: 18030462 PMCID: PMC2921068 DOI: 10.1007/s00249-007-0234-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 08/30/2007] [Revised: 10/24/2007] [Accepted: 10/26/2007] [Indexed: 12/25/2022]

106

Abyzov A, Ilyin VA. A comprehensive analysis of non-sequential alignments between all protein structures. BMC Structural Biology 2007;7:78. [PMID: 18005453 PMCID: PMC2213659 DOI: 10.1186/1472-6807-7-78] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 05/14/2007] [Accepted: 11/16/2007] [Indexed: 05/02/2023]

Abstract

Background

The majority of relations between proteins can be represented as a conventional sequential alignment. Nevertheless, unusual non-sequential alignments, with different connectivity of the aligned fragments in the compared proteins, have been reported by many researchers. It is of interest to understand these non-sequential alignments: are they unique, sporadic cases, or do they occur frequently? Do they belong to a few specific folds, or are they spread among many different folds as a common feature of protein structure? We present here a comprehensive large-scale study of non-sequential alignments between available protein structures in the Protein Data Bank.

Results

The study has been conducted on a non-redundant set of 8,865 protein structures aligned with the aid of the TOPOFIT method. It has been estimated that between 17.4% and 35.2% of all alignments are non-sequential, depending on variations in the parameters. Analysis of the data revealed that non-sequential relations between proteins do occur systematically and in large quantities. Various sizes and numbers of non-sequential fragments have been observed, with all possible complexities of fragment rearrangements found for alignments consisting of up to 12 fragments. It has been found that non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred of them. Moreover, many of them are found between proteins with different fold assignments. It has been shown that protein structure symmetry does not explain non-sequential alignments. Therefore, compelling evidence has been provided that non-sequential alignments between proteins are systematic and widespread across the protein universe.

Conclusion

The phenomenon of the widespread occurrence of non-sequential alignments between proteins might represent a missing rule of protein structure organization. More detailed study of this phenomenon will enhance our understanding of protein stability, folding, and evolution.
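As a rough illustration of the distinction drawn in this abstract, an alignment can be treated as a list of fragment pairs and called sequential when the fragments occur in the same order in both proteins. The sketch below is a simplified reading of that idea, not the TOPOFIT method; the `Fragment` type and `is_sequential` helper are hypothetical names introduced here.

```python
from typing import List, Tuple

# Hypothetical representation: each aligned fragment is a pair
# (start position in protein A, start position in protein B).
Fragment = Tuple[int, int]


def is_sequential(fragments: List[Fragment]) -> bool:
    """Return True if the fragments occur in the same order in both proteins."""
    ordered = sorted(fragments)            # order fragments by position in protein A
    b_starts = [b for _, b in ordered]     # read off their positions in protein B
    # Sequential means the B positions are strictly increasing as well.
    return all(x < y for x, y in zip(b_starts, b_starts[1:]))


same_order = [(1, 5), (20, 30), (40, 55)]
swapped = [(1, 30), (20, 5), (40, 55)]     # first two fragments change order in B

print(is_sequential(same_order))
print(is_sequential(swapped))
```

Under this simplification, any alignment for which `is_sequential` returns `False` would count toward the non-sequential fraction reported above.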

107

ProCKSI: a decision support system for Protein (structure) Comparison, Knowledge, Similarity and Information. BMC Bioinformatics 2007;8:416. [PMID: 17963510 PMCID: PMC2222653 DOI: 10.1186/1471-2105-8-416] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 06/08/2007] [Accepted: 10/26/2007] [Indexed: 11/19/2022]

Abstract

Background

We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Similarity Metric (USM), the Maximum Contact Map Overlap (MaxCMO) of protein structures and other external methods such as the DaliLite and the TM-align methods, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix supplementing the methods mentioned, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures.

Results

We present ProCKSI's architecture and workflow describing its intuitive user interface, and show its potential on three distinct test-cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures could have a large deviation from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study deals with the verification of a classification scheme for protein kinases, originally derived by sequence comparison by Hanks and Hunter, but here we use a consensus similarity measure based on structures. In the third experiment using the Rost and Sander dataset (RS126), we investigate how a combination of different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. Furthermore, combining different similarity measures is usually more robust than relying on one single, unique measure.

Conclusion

Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface.

ProCKSI is publicly available for academic and non-commercial use.
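The abstract does not say how the consensus is computed, so the following is only a generic sketch of one way several pairwise similarity matrices could be combined: rescale each matrix to the range 0 to 1, then average them entry by entry. The function names and the min-max/mean choice are assumptions made for illustration and should not be read as ProCKSI's actual consensus method.

```python
from typing import List

Matrix = List[List[float]]


def min_max_normalize(m: Matrix) -> Matrix:
    """Rescale all entries of one matrix to the range [0, 1]."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant matrix carries no ranking information; map it to zeros.
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / span for v in row] for row in m]


def consensus(matrices: List[Matrix]) -> Matrix:
    """Average several same-shaped matrices after normalizing each one."""
    normalized = [min_max_normalize(m) for m in matrices]
    rows, cols = len(normalized[0]), len(normalized[0][0])
    return [
        [sum(m[i][j] for m in normalized) / len(normalized) for j in range(cols)]
        for i in range(rows)
    ]


# Two toy measures on different scales for the same pair of proteins.
measure_a = [[0.0, 2.0], [2.0, 0.0]]
measure_b = [[0.0, 10.0], [10.0, 0.0]]
print(consensus([measure_a, measure_b]))
```

Normalizing first keeps a measure with a large numeric range from dominating the average, which is one reason a combined score can be more robust than any single measure.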

108

Gherardini PF, Wass MN, Helmer-Citterich M, Sternberg MJE. Convergent Evolution of Enzyme Active Sites Is not a Rare Phenomenon. J Mol Biol 2007;372:817-45. [PMID: 17681532 DOI: 10.1016/j.jmb.2007.06.017] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Received: 11/16/2006] [Revised: 05/14/2007] [Accepted: 06/08/2007] [Indexed: 02/03/2023]

Abstract

Since convergent evolution of enzyme active sites was first identified in serine proteases, other individual instances of this phenomenon have been documented. However, a systematic analysis assessing the frequency of this phenomenon across enzyme space is still lacking. This work uses the Query3d structural comparison algorithm to integrate, for the first time, detailed knowledge about catalytic residues, available through the Catalytic Site Atlas (CSA), with the evolutionary information provided by the Structural Classification of Proteins (SCOP) database. This study considers two modes of convergent evolution: (i) mechanistic analogues, which are enzymes that use the same mechanism to perform related, but possibly different, reactions (considered here as sharing the first three digits of the EC number); and (ii) transformational analogues, which catalyse exactly the same reaction (identical EC numbers) but may use different mechanisms. Mechanistic analogues were identified in 15% (26 out of 169) of the three-digit EC groups considered, showing that this phenomenon is not rare. Furthermore, 11 of these groups also contain transformational analogues. The catalytic triad is the most widespread active site; the results of the structural comparison show that this mechanism, or variations thereof, is present in 23 superfamilies. Transformational analogues were identified for 45 of the 951 four-digit EC numbers present within the CSA, and about half of these were also mechanistic analogues exhibiting convergence of their active sites. This analysis has also been extended to the whole Protein Data Bank to provide a complete and manually curated list of all the transformational analogues whose structure is classified in SCOP. The results of this work show that the phenomenon of convergent evolution is not rare, especially when considering large enzymatic families.
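The EC-number criteria in this abstract can be sketched in a few lines. The code below covers only the grouping by EC digits; it does not test whether two enzymes are evolutionarily unrelated or share an active-site geometry, which the study establishes with SCOP and Query3d. The helper names are hypothetical, and the two EC numbers are used purely as an illustrative pair.

```python
from typing import List


def ec_digits(ec: str) -> List[str]:
    """Split an EC number such as '3.4.21.4' into its four fields."""
    return ec.split(".")


def same_three_digit_group(ec_a: str, ec_b: str) -> bool:
    """EC criterion used for mechanistic analogues: first three digits shared."""
    return ec_digits(ec_a)[:3] == ec_digits(ec_b)[:3]


def same_reaction(ec_a: str, ec_b: str) -> bool:
    """EC criterion used for transformational analogues: identical EC numbers."""
    return ec_digits(ec_a) == ec_digits(ec_b)


# Illustrative pair of serine protease EC numbers.
print(same_three_digit_group("3.4.21.4", "3.4.21.62"))
print(same_reaction("3.4.21.4", "3.4.21.62"))

# The proportions quoted in the abstract, recomputed from the raw counts.
mechanistic_share = 100 * 26 / 169        # three-digit EC groups with mechanistic analogues
transformational_share = 100 * 45 / 951   # four-digit EC numbers with transformational analogues
print(round(mechanistic_share, 1), round(transformational_share, 1))
```

The recomputed shares come to roughly 15% and just under 5%, consistent with the "15%" quoted for the three-digit groups.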

109

Shaw N, Tempel W, Chang J, Yang H, Cheng C, Ng J, Rose J, Rao Z, Wang BC, Liu ZJ. Crystal structure solution of a ParB-like nuclease at atomic resolution. Proteins 2007;70:263-7. [PMID: 17729285 DOI: 10.1002/prot.21641] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]

110

Rasteiro R, Pereira-Leal JB. Multiple domain insertions and losses in the evolution of the Rab prenylation complex. BMC Evol Biol 2007;7:140. [PMID: 17705859 PMCID: PMC1994686 DOI: 10.1186/1471-2148-7-140] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 06/16/2007] [Accepted: 08/17/2007] [Indexed: 11/10/2022]

111

Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment. BMC Bioinformatics 2007;8:252. [PMID: 17629909 PMCID: PMC1939857 DOI: 10.1186/1471-2105-8-252] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 05/23/2007] [Accepted: 07/13/2007] [Indexed: 11/10/2022]

Abstract

Background

Similarity of sequences is a key mathematical notion for classification and phylogenetic studies in biology. It is currently handled primarily using alignments. However, alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity, and universality is its most novel and striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets, yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM, and mostly at a qualitative level; no comparison among UCD, NCD and CD is available; and no comparison of USM with existing methods, both alignment-based and not, seems to be available.

Results

We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, which naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and with alignment-free methods. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well-known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next.

Conclusion

UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results than NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data, and Gencompress, for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA on sequence data is very close in performance to the alignment methods, although worse (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page.
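To make the compression-based idea concrete, here is a minimal sketch of the form usually given for NCD, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length in bytes. It uses Python's standard `zlib` as a stand-in compressor, which is an assumption made for convenience; the study's best performers were PPMd and Gencompress, and its exact definitions of UCD and CD are not reproduced here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity of two byte strings (commonly cited NCD form)."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)           # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy inputs: a repetitive DNA-like string and a repetitive protein-like string.
dna = b"ACGTACGGTCA" * 100
protein = b"MKVLAAGIVGLLLAQ" * 100

print(ncd(dna, dna))       # small: the second copy adds almost nothing new
print(ncd(dna, protein))   # larger: the two strings share little structure
```

A matrix of such pairwise values is what a clustering method like UPGMA or NJ would then take as input.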

112

Andrade J, Karmali A, Carrondo MA, Frazão C. Structure of Amidase from Pseudomonas aeruginosa Showing a Trapped Acyl Transfer Reaction Intermediate State. J Biol Chem 2007;282:19598-605. [PMID: 17442671 DOI: 10.1074/jbc.m701039200] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]

113

Tung CH, Huang JW, Yang JM. Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database. Genome Biol 2007;8:R31. [PMID: 17335583 PMCID: PMC1868941 DOI: 10.1186/gb-2007-8-3-r31] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 11/21/2006] [Revised: 01/05/2007] [Accepted: 03/03/2007] [Indexed: 11/23/2022]

114

Chiang YS, Gelfand TI, Kister AE, Gelfand IM. New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage. Proteins 2007;68:915-21. [PMID: 17557333 DOI: 10.1002/prot.21473] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]

115

Torrance JW, Holliday GL, Mitchell JB, Thornton JM. The Geometry of Interactions between Catalytic Residues and their Substrates. J Mol Biol 2007;369:1140-52. [DOI: 10.1016/j.jmb.2007.03.055] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/29/2007] [Revised: 03/14/2007] [Accepted: 03/20/2007] [Indexed: 10/23/2022]

116

Marti-Renom MA, Rossi A, Al-Shahrour F, Davis FP, Pieper U, Dopazo J, Sali A. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics 2007;8 Suppl 4:S4. [PMID: 17570147 PMCID: PMC1892083 DOI: 10.1186/1471-2105-8-s4-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]

117

Zheng X, Dai X, Zhao Y, Chen Q, Lu F, Yao D, Yu Q, Liu X, Zhang C, Gu X, Luo M. Restructuring of the dinucleotide-binding fold in an NADP(H) sensor protein. Proc Natl Acad Sci U S A 2007;104:8809-14. [PMID: 17496144 PMCID: PMC1885584 DOI: 10.1073/pnas.0700480104] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 01/07/2023]

118

Rodrigues APC, Grant BJ, Godzik A, Friedberg I. The 2006 automated function prediction meeting. BMC Bioinformatics 2007;8 Suppl 4:S1-4. [PMID: 17570143 PMCID: PMC1892079 DOI: 10.1186/1471-2105-8-s4-s1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]

119

Tung CH, Yang JM. fastSCOP: a fast web server for recognizing protein structural domains and SCOP superfamilies. Nucleic Acids Res 2007;35:W438-43. [PMID: 17485476 PMCID: PMC1933144 DOI: 10.1093/nar/gkm288] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 01/01/2023]

120

Marti-Renom MA, Pieper U, Madhusudhan MS, Rossi A, Eswar N, Davis FP, Al-Shahrour F, Dopazo J, Sali A. DBAli tools: mining the protein structure space. Nucleic Acids Res 2007;35:W393-7. [PMID: 17478513 PMCID: PMC1933139 DOI: 10.1093/nar/gkm236] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]

121

Jefferson ER, Walsh TP, Roberts TJ, Barton GJ. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein-Protein Interactions. Nucleic Acids Res 2007;35:D580-9. [PMID: 17202171 PMCID: PMC1899103 DOI: 10.1093/nar/gkl836] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]

122

Kong GKW, Adams JJ, Harris HH, Boas JF, Curtain CC, Galatis D, Masters CL, Barnham KJ, McKinstry WJ, Cappai R, Parker MW. Structural Studies of the Alzheimer’s Amyloid Precursor Protein Copper-binding Domain Reveal How it Binds Copper Ions. J Mol Biol 2007;367:148-61. [PMID: 17239395 DOI: 10.1016/j.jmb.2006.12.041] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Received: 10/10/2006] [Revised: 12/11/2006] [Accepted: 12/15/2006] [Indexed: 11/30/2022]

123

Macías JR, Jiménez-Lozano N, Carazo JM. Integrating electron microscopy information into existing Distributed Annotation Systems. J Struct Biol 2007;158:205-13. [PMID: 17400476 DOI: 10.1016/j.jsb.2007.02.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/19/2006] [Revised: 12/19/2006] [Accepted: 02/13/2007] [Indexed: 10/23/2022]

124

Bateman A, Finn RD. SCOOP: a simple method for identification of novel protein superfamily relationships. Bioinformatics 2007;23:809-14. [PMID: 17277330 PMCID: PMC2603044 DOI: 10.1093/bioinformatics/btm034] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]

125

Rueda M, Ferrer-Costa C, Meyer T, Pérez A, Camps J, Hospital A, Gelpí JL, Orozco M. A consensus view of protein dynamics. Proc Natl Acad Sci U S A 2007;104:796-801. [PMID: 17215349 PMCID: PMC1783393 DOI: 10.1073/pnas.0605534104] [Citation(s) in RCA: 189] [Impact Index Per Article: 11.1] [Received: 07/06/2006] [Indexed: 11/18/2022]

Affiliation(s)

Manuel Rueda: Molecular Modelling and Bioinformatics Unit and Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain
Carles Ferrer-Costa: Molecular Modelling and Bioinformatics Unit and Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain
Tim Meyer: Molecular Modelling and Bioinformatics Unit and Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; Departament de Bioquímica i Biologia Molecular, Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 645, 08028 Barcelona, Spain
Alberto Pérez: Molecular Modelling and Bioinformatics Unit and Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain
Jordi Camps: Structural Biology Node, Institut de Recerca Biomèdica, Parc Científic de Barcelona, Josep Samitier 1-5, 08028 Barcelona, Spain; Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain
Adam Hospital: Molecular Modelling and Bioinformatics Unit and Structural Biology Node, Institut de Recerca Biomèdica, Parc Científic de Barcelona, Josep Samitier 1-5, 08028 Barcelona, Spain; Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain
Josep Lluis Gelpí: Molecular Modelling and Bioinformatics Unit and Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; Departament de Bioquímica i Biologia Molecular, Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 645, 08028 Barcelona, Spain
Modesto Orozco: Molecular Modelling and Bioinformatics Unit and Structural Biology Node, Institut de Recerca Biomèdica, Parc Científic de Barcelona, Josep Samitier 1-5, 08028 Barcelona, Spain; Computational Biology Program, Barcelona Supercomputing Center, Jordi Girona 31, Edifici Nexus II, 08028 Barcelona, Spain; Departament de Bioquímica i Biologia Molecular, Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 645, 08028 Barcelona, Spain

126

Cho KI, Lee K, Lee KH, Kim D, Lee D. Specificity of molecular interactions in transient protein-protein interaction interfaces. Proteins 2007;65:593-606. [PMID: 16948160 DOI: 10.1002/prot.21056] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]

Abstract

In this study, we investigate what types of interactions are specific to their biological function, and what types of interactions are persistent regardless of their functional category, in transient protein-protein heterocomplexes. This is the first approach to analyze protein-protein interfaces systematically at the molecular interaction level in the context of protein functions. We perform a systematic analysis at the molecular interaction level using classification and feature subset selection techniques prevalent in the field of pattern recognition. To represent the physicochemical properties of protein-protein interfaces, we design 18 molecular interaction types using canonical and noncanonical interactions. Then, we construct an input vector using the frequency of each interaction type in a protein-protein interface. We analyze the 131 interfaces of transient protein-protein heterocomplexes in PDB: 33 protease-inhibitors, 52 antibody-antigens, and 46 signaling proteins, including 4 cyclin-dependent kinases and 26 G-proteins. Using kNN classification and a feature subset selection technique, we show that there are specific interaction types based on their functional category, and such interaction types are conserved through the common binding mechanism, rather than through sequence or structure conservation. The extracted interaction types are the C(alpha)--H...O==C interaction, cation...anion interaction, amine...amine interaction, and amine...cation interaction. With these four interaction types, we achieve a classification success rate of up to 83.2% with leave-one-out cross-validation at k = 15. Of these four interaction types, C(alpha)--H...O==C shows binding specificity for protease-inhibitor complexes, while the cation...anion interaction is predominant in signaling complexes. The amine...amine and amine...cation interactions give a minor contribution to the classification accuracy. When combined with these two interactions, they increase the accuracy by 3.8%. In the case of antibody-antigen complexes, the sign is somewhat ambiguous. From the evolutionary perspective, while protease-inhibitors and signaling proteins have optimized their interfaces to suit their biological functions, antibody-antigen interactions are happenstance, implying that antibody-antigen complexes do not show distinctive interaction types. Persistent interaction types, such as pi...pi, amide-carbonyl, and hydroxyl-carbonyl interactions, are also investigated. Analyzing the structural orientations of the pi...pi stacking interactions, we find that the herringbone shape is a major configuration in transient protein-protein interfaces. This result is different from that of the protein core, where parallel-displaced configurations are the major configuration. We also analyze the overall trend of amide-carbonyl and hydroxyl-carbonyl interactions. It is noticeable that nearly 82% of the interfaces have at least one hydroxyl-carbonyl interaction.
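The evaluation protocol named in this abstract, kNN classification scored by leave-one-out cross-validation, can be sketched as follows. The feature vectors, class labels and the value of k below are invented toy data (the study used frequencies of interaction types over 131 interfaces with k = 15), and the Euclidean distance and majority vote are assumptions, since the abstract does not state those details.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]   # (interaction-type frequency vector, class label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int) -> str:
    """Label the query by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples: List[Sample], k: int) -> float:
    """Hold out each sample in turn, classify it from the rest, report the hit rate."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, vector, k) == label:
            correct += 1
    return correct / len(samples)


# Toy data: four hypothetical interaction-type counts per interface.
toy_samples: List[Sample] = [
    ((5, 1, 0, 0), "protease-inhibitor"),
    ((6, 0, 1, 0), "protease-inhibitor"),
    ((5, 2, 0, 1), "protease-inhibitor"),
    ((0, 6, 1, 0), "signaling"),
    ((1, 5, 0, 1), "signaling"),
    ((0, 7, 0, 0), "signaling"),
]

print(leave_one_out_accuracy(toy_samples, k=3))
```

Because every sample is classified by a model that never saw it, the resulting rate is a less optimistic estimate than accuracy measured on the training data itself.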

127

Eswar N, Sali A. Comparative Modeling of Drug Target Proteins. Comprehensive Medicinal Chemistry II 2007. [PMCID: PMC7151936 DOI: 10.1016/b0-08-045044-x/00251-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/25/2022]

128

InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol Biol 2007;396:59-70. [PMID: 18025686 DOI: 10.1007/978-1-59745-515-2_5] [Citation(s) in RCA: 288] [Impact Index Per Article: 16.9] [Indexed: 12/03/2022]

129

Pratelli R, Pilot G. The plant-specific VIMAG domain of Glutamine Dumper1 is necessary for the function of the protein in Arabidopsis. FEBS Lett 2006;580:6961-6. [PMID: 17157837 DOI: 10.1016/j.febslet.2006.11.064] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/08/2006] [Revised: 11/21/2006] [Accepted: 11/21/2006] [Indexed: 11/23/2022]

130

Smialowski P, Martin-Galiano AJ, Mikolajka A, Girschick T, Holak TA, Frishman D. Protein solubility: sequence based prediction and experimental verification. Bioinformatics 2006;23:2536-42. [PMID: 17150993 DOI: 10.1093/bioinformatics/btl623] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.8] [Indexed: 11/14/2022]

131

Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res 2006;35:D291-7. [PMID: 17135200 PMCID: PMC1751535 DOI: 10.1093/nar/gkl959] [Citation(s) in RCA: 239] [Impact Index Per Article: 13.3] [Indexed: 11/12/2022]

132

Sonego P, Pacurar M, Dhir S, Kertész-Farkas A, Kocsor A, Gáspári Z, Leunissen JA, Pongor S. A Protein Classification Benchmark collection for machine learning. Nucleic Acids Res 2006;35:D232-6. [PMID: 17142240 PMCID: PMC1669728 DOI: 10.1093/nar/gkl812] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]

133

Tracing the origin of functional and conserved domains in the human proteome: implications for protein evolution at the modular level. BMC Evol Biol 2006;6:91. [PMID: 17090320 PMCID: PMC1654190 DOI: 10.1186/1471-2148-6-91] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/13/2006] [Accepted: 11/07/2006] [Indexed: 11/29/2022] Open

134

Berman HM, Burley SK, Chiu W, Sali A, Adzhubei A, Bourne PE, Bryant SH, Dunbrack RL, Fidelis K, Frank J, Godzik A, Henrick K, Joachimiak A, Heymann B, Jones D, Markley JL, Moult J, Montelione GT, Orengo C, Rossmann MG, Rost B, Saibil H, Schwede T, Standley DM, Westbrook JD. Outcome of a workshop on archiving structural models of biological macromolecules. Structure 2006;14:1211-7. [PMID: 16955948 DOI: 10.1016/j.str.2006.06.005] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]

135

Lin Z, Sriskanthadevan S, Huang H, Siu CH, Yang D. Solution structures of the adhesion molecule DdCAD-1 reveal new insights into Ca2+-dependent cell-cell adhesion. Nat Struct Mol Biol 2006;13:1016-22. [PMID: 17057715 DOI: 10.1038/nsmb1162] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 02/23/2006] [Accepted: 10/03/2006] [Indexed: 02/06/2023]

136

Zhi D, Krishna SS, Cao H, Pevzner P, Godzik A. Representing and comparing protein structures as paths in three-dimensional space. BMC Bioinformatics 2006;7:460. [PMID: 17052359 PMCID: PMC1626488 DOI: 10.1186/1471-2105-7-460] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 05/18/2006] [Accepted: 10/20/2006] [Indexed: 11/10/2022] Open

137

Chivian D, Baker D. Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res 2006;34:e112. [PMID: 16971460 PMCID: PMC1635247 DOI: 10.1093/nar/gkl480] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/12/2022] Open

138

Godoi PHC, Galhardo RS, Luche DD, Van Sluys MA, Menck CFM, Oliva G. Structure of the thiazole biosynthetic enzyme THI1 from Arabidopsis thaliana. J Biol Chem 2006;281:30957-66. [PMID: 16912043 DOI: 10.1074/jbc.m604469200] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/06/2022] Open

139

Nichols CE, Johnson C, Lockyer M, Charles IG, Lamb HK, Hawkins AR, Stammers DK. Structural characterization of Salmonella typhimurium YeaZ, an M22 O-sialoglycoprotein endopeptidase homolog. Proteins 2006;64:111-23. [PMID: 16617437 DOI: 10.1002/prot.20982] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]

140

Yang JM, Tung CH. Protein structure database search and evolutionary classification. Nucleic Acids Res 2006;34:3646-59. [PMID: 16885238 PMCID: PMC1540718 DOI: 10.1093/nar/gkl395] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Received: 03/08/2006] [Revised: 05/06/2006] [Accepted: 05/09/2006] [Indexed: 11/14/2022] Open

141

Gariev IA, Varfolomeev SD. Hierarchical classification of hydrolases catalytic sites. Bioinformatics 2006;22:2574-6. [PMID: 16877756 DOI: 10.1093/bioinformatics/btl413] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open

142

Reeves GA, Thornton JM. Integrating biological data through the genome. Hum Mol Genet 2006;15 Spec No 1:R81-7. [PMID: 16651373 DOI: 10.1093/hmg/ddl086] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open

143

Marsden BD, Sundstrom M, Knapp S. High-throughput structural characterisation of therapeutic protein targets. Expert Opin Drug Discov 2006;1:123-36. [DOI: 10.1517/17460441.1.2.123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]

144

Williams PD, Pollock DD, Blackburne BP, Goldstein RA. Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput Biol 2006;2:e69. [PMID: 16789817 PMCID: PMC1480538 DOI: 10.1371/journal.pcbi.0020069] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.4] [Received: 11/28/2005] [Accepted: 05/04/2006] [Indexed: 11/18/2022] Open

Abstract

The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.

It is now possible to apply computational methods to known current protein sequences to recreate the sequences of ancestral proteins. By synthesising these proteins and measuring their properties in the laboratory, we can gain much information about the nature of evolution, better understand how proteins change and adapt over time, and develop insights into the environments of ancient organisms. Unfortunately, the accuracy of these reconstructions is difficult to evaluate. We simulate protein evolution using a simplified computational model and apply the various reconstruction methods to the sequences that arise from our simulations. Because we have the complete record of the evolutionary history, we can evaluate the reconstruction accuracy directly. We demonstrate that the reconstruction procedures in common use may have a bias toward overestimating the properties of these ancestral proteins, opposite to what has been assumed previously. An alternative method of creating these sequences is presented, Bayesian sampling, that can eliminate this bias and provide more robust conclusions.

145

Brylinski M, Konieczny L, Roterman I. Hydrophobic collapse in (in silico) protein folding. Comput Biol Chem 2006;30:255-67. [PMID: 16798094 DOI: 10.1016/j.compbiolchem.2006.04.007] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 12/19/2005] [Revised: 04/06/2006] [Accepted: 04/06/2006] [Indexed: 11/28/2022]

146

Lise S, Walker-Taylor A, Jones DT. Docking protein domains in contact space. BMC Bioinformatics 2006;7:310. [PMID: 16790041 PMCID: PMC1559650 DOI: 10.1186/1471-2105-7-310] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/20/2006] [Accepted: 06/21/2006] [Indexed: 11/10/2022] Open

147

Lees JG, Miles AJ, Wien F, Wallace BA. A reference database for circular dichroism spectroscopy covering fold and secondary structure space. Bioinformatics 2006;22:1955-62. [PMID: 16787970 DOI: 10.1093/bioinformatics/btl327] [Citation(s) in RCA: 324] [Impact Index Per Article: 18.0] [Indexed: 11/14/2022] Open

148

Chou WI, Pai TW, Liu SH, Hsiung BK, Chang MT. The family 21 carbohydrate-binding module of glucoamylase from Rhizopus oryzae consists of two sites playing distinct roles in ligand binding. Biochem J 2006;396:469-77. [PMID: 16509822 PMCID: PMC1482813 DOI: 10.1042/bj20051982] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]

149

Whitfield EJ, Pruess M, Apweiler R. Bioinformatics database infrastructure for biotechnology research. J Biotechnol 2006;124:629-39. [PMID: 16757051 DOI: 10.1016/j.jbiotec.2006.04.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/14/2005] [Revised: 03/06/2006] [Accepted: 04/03/2006] [Indexed: 10/24/2022]

150

Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006;15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 10/24/2022]