1. Estrada E. Universality in protein residue networks. Biophys J 2010; 98:890-900. [PMID: 20197043 DOI: 10.1016/j.bpj.2009.11.017] [Received: 09/07/2009] [Revised: 10/23/2009] [Accepted: 11/11/2009]
Abstract
Residue networks representing 595 nonhomologous proteins are studied. These networks exhibit universal topological characteristics, as they belong to the topological class of modular networks formed by several highly interconnected clusters separated by topological cavities. Some networks tend to deviate from this universality; they represent small proteins with <200 residues. This article explains these differences in terms of the domain structure of these proteins. In addition, the topological cavities characterizing protein residue networks match protein binding sites very well. This study also investigates the effect of the cutoff value used to build the residue network. For small cutoff values (<5 Å), the cavities found are very large, corresponding almost to the whole protein surface. Conversely, for large cutoff values (>10.0 Å), only very large cavities are detected and the networks look very homogeneous. These findings are useful for practical purposes as well as for identifying protein-like complex networks. Finally, this article shows that the main topological class of residue networks is not reproduced by random networks grown according to the Erdős-Rényi model or the preferential attachment method of Barabási-Albert. However, the Watts-Strogatz model reproduces the topological class, as well as other topological properties of residue networks, very well. A more biologically appealing modification of the Watts-Strogatz model to describe residue networks is proposed.
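The cutoff dependence described above can be illustrated with a minimal sketch of how a residue network of this kind is commonly built: one node per residue and an edge wherever two residues lie within the cutoff distance. Representing each residue by a single point (for example its Cα atom), the straight-line toy coordinates, and the names `residue_network` and `mean_degree` are assumptions for illustration; the sketch does not reproduce the paper's cavity analysis.

```python
import math
from itertools import combinations

def residue_network(coords, cutoff):
    """Build an undirected residue contact network.

    coords: list of (x, y, z) tuples, one per residue (assumed here to be
    one representative atom per residue, in angstroms).
    cutoff: distance threshold in angstroms.
    Returns a dict mapping residue index -> set of neighbouring indices.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def mean_degree(adj):
    """Average number of neighbours per residue."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

# Hypothetical toy chain: 10 residues spaced 3.8 A apart on a straight line.
chain = [(3.8 * k, 0.0, 0.0) for k in range(10)]

for cutoff in (4.0, 8.0, 12.0):
    # Mean degree grows with the cutoff: 1.8, then 3.4, then 4.8.
    print(cutoff, mean_degree(residue_network(chain, cutoff)))
```

On this toy chain, raising the cutoff from 4 Å to 12 Å roughly triples the mean degree, in line with the abstract's remark that large-cutoff networks look more homogeneous.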
Affiliation(s)
- Ernesto Estrada
- Department of Physics and Institute of Complex Systems, University of Strathclyde, Glasgow, United Kingdom.
2. Cuff A, Redfern OC, Greene L, Sillitoe I, Lewis T, Dibley M, Reid A, Pearl F, Dallman T, Todd A, Garratt R, Thornton J, Orengo C. The CATH hierarchy revisited-structural divergence in domain superfamilies and the continuity of fold space. Structure 2009; 17:1051-62. [PMID: 19679085 PMCID: PMC2741583 DOI: 10.1016/j.str.2009.06.015] [Received: 07/21/2008] [Revised: 06/24/2009] [Accepted: 06/25/2009]
Abstract
This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, some of the most highly populated superfamilies (4% of all superfamilies) show considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. When similar protocols are applied to examine the extent to which structural overlaps occur between different fold groups, this effect appears to be confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.
Affiliation(s)
- Alison Cuff
- Institute of Structural and Molecular Biology, University College London, London, UK.
3. Faure G, Bornot A, de Brevern AG. Analysis of protein contacts into Protein Units. Biochimie 2009; 91:876-87. [PMID: 19383526 DOI: 10.1016/j.biochi.2009.04.008] [Received: 07/24/2008] [Accepted: 04/13/2009]
Abstract
Three-dimensional structures of proteins underpin their biological functions. Their folds are maintained by inter-residue interactions, which are one of the main focuses in understanding the mechanisms of protein folding and stability. Furthermore, protein structures can be composed of single or multiple functional domains that can fold and function independently. Hence, dividing a protein into domains is useful for accurate structure and function determination. In previous studies, we examined protein contact properties according to different definitions and developed a novel methodology named Protein Peeling. Within protein structures, Protein Peeling characterizes small successive compact units along the sequence called protein units (PUs). The cutting done by Protein Peeling maximizes the number of contacts within the PUs and minimizes the number of contacts between them. This method is therefore a relevant tool in protein folding research, particularly with regard to the hierarchical model proposed by George Rose. Here, we accurately analyze the PUs at different levels of cutting, using a non-redundant protein databank. The distribution of PU sizes, the number of PUs, and their accessibility are screened to determine their common and differing features. Moreover, we highlight the preferential amino acid interactions inside and between PUs. Our results show that PUs are clearly an intermediate level between secondary structures and protein structural domains.
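The idea of cutting a chain so that contacts stay inside units and few cross between them can be sketched as a single-cut search over a contact map. This is a deliberately simplified stand-in, not Protein Peeling itself: the published method applies its own partition criteria and cuts iteratively, which are not reproduced here. The crossing-contact score, the `min_len` limit, the toy contact map, and the function name `best_single_cut` are assumptions.

```python
from itertools import combinations

def best_single_cut(contacts, n, min_len=3):
    """Choose one cut along a chain of n residues (simplified illustration).

    contacts: set of (i, j) residue pairs in contact, with i < j.
    A cut at k splits the chain into [0, k) and [k, n). The score is the
    fraction of contacts crossing the cut; a lower score means the two
    segments are more self-contained. Segments shorter than min_len are
    not considered. Returns (k, score).
    """
    if not contacts or n < 2 * min_len:
        raise ValueError("need contacts and a chain of at least 2 * min_len residues")
    best = None
    for k in range(min_len, n - min_len + 1):
        crossing = sum(1 for i, j in contacts if i < k <= j)
        score = crossing / len(contacts)
        if best is None or score < best[1]:
            best = (k, score)
    return best

# Hypothetical contact map: two compact 5-residue blocks joined by one contact.
contacts = set(combinations(range(5), 2)) | set(combinations(range(5, 10), 2)) | {(4, 5)}
print(best_single_cut(contacts, 10))  # cut at k = 5, with 1 of 21 contacts crossing
```

A fuller treatment would apply the cut recursively to each segment and stop when a unit is no longer worth splitting; that stopping rule is where the published criteria matter.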
Affiliation(s)
- Guilhem Faure
- INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire (EBGM), DSIMB, Université Paris Diderot - Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France
4. Faure G, Bornot A, de Brevern AG. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie 2008; 90:626-39. [DOI: 10.1016/j.biochi.2007.11.007] [Received: 06/14/2007] [Accepted: 11/22/2007]
5. Kundu S, Sorensen DC, Phillips GN. Automatic domain decomposition of proteins by a Gaussian Network Model. Proteins 2004; 57:725-33. [PMID: 15478120 DOI: 10.1002/prot.20268]
Abstract
Proteins are often composed of domains, which appear to be independent folding units. These domains can be defined in various ways, but one useful definition divides the protein into substructures that seem to move more or less independently. The same methods that allow fairly accurate calculation of motion can be used to help classify these substructures. We show how the Gaussian Network Model (GNM), commonly used for determining motion, can also be adapted to automatically classify domains in proteins. Parallels between this physical network model and graph-theoretical methods are apparent. The method is applied to a nonredundant set of 55 proteins, and the results are compared to the visual assignments by crystallographers. Apart from decomposing proteins into structural domains, the algorithm can generally be applied to any large macromolecular system to decompose it into motionally decoupled subsystems.
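One common way to turn a Gaussian Network Model into a domain split is to build the Kirchhoff (connectivity) matrix and divide residues by the sign of their components in the slowest nonzero mode. The sketch below does this in pure Python with power iteration. It is a simplified illustration under that assumption, not necessarily the exact procedure of Kundu et al.; the contact list, the function names, and the fixed iteration count are hypothetical, and the contact graph is assumed to be connected.

```python
import math

def kirchhoff(pairs, n):
    """GNM Kirchhoff matrix: -1 for each contact, degree on the diagonal."""
    g = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        if i != j:
            g[i][j] = g[j][i] = -1.0
    for i in range(n):
        g[i][i] = -sum(g[i][k] for k in range(n) if k != i)
    return g

def slowest_mode(g, iters=500):
    """Approximate the eigenvector of the smallest nonzero eigenvalue.

    Power iteration on (c*I - g), with the uniform (zero-eigenvalue) mode
    projected out at each step. c = 2 * max degree bounds the largest
    eigenvalue of g, so (c*I - g) has no negative eigenvalues.
    """
    n = len(g)
    c = 2.0 * max(g[i][i] for i in range(n))
    v = [i - (n - 1) / 2 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iters):
        w = [c * v[i] - sum(g[i][k] * v[k] for k in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # remove the trivial uniform mode
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def split_by_sign(v):
    """Assign residues to two domains by the sign of their mode component."""
    neg = [i for i, x in enumerate(v) if x < 0]
    pos = [i for i, x in enumerate(v) if x >= 0]
    return neg, pos

# Hypothetical contacts: two 5-residue clusters linked by a single contact.
pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
pairs += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
pairs.append((4, 5))
print(split_by_sign(slowest_mode(kirchhoff(pairs, 10))))
```

Power iteration keeps the example dependency-free; with a linear-algebra library one would instead take the eigenvector of the second-smallest eigenvalue directly.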
Affiliation(s)
- Sibsankar Kundu
- Department of Biochemistry, University of Wisconsin, Madison, Wisconsin 53706, USA
6. Simon K, Xu J, Kim C, Skrynnikov NR. Estimating the accuracy of protein structures using residual dipolar couplings. J Biomol NMR 2005; 33:83-93. [PMID: 16258827 DOI: 10.1007/s10858-005-2601-7] [Received: 02/07/2005] [Accepted: 08/05/2005]
Abstract
It has been commonly recognized that residual dipolar coupling data provide a measure of quality for protein structures. To quantify this observation, a database of 100 single-domain proteins has been compiled in which each protein is represented by two independently solved structures. Backbone 1H-15N dipolar couplings were simulated for the target structures and then fitted to the model structures. The fits were characterized by an R-factor, which was corrected for the effects of non-uniform distribution of dipolar vectors on a unit sphere. The analyses show that favorable R values virtually guarantee high accuracy of the model structure (where accuracy is defined as the backbone coordinate rms deviation). On the other hand, unfavorable R values do not necessarily indicate low accuracy. Based on the simulated data, a simple empirical formula is proposed to estimate the accuracy of protein structures. The method is illustrated with a number of examples, including the PDZ2 domain of human phosphatase hPTP1E.
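For orientation, the uncorrected dipolar-coupling R-factor is often written in the Clore and Garrett form R = sqrt(<(D_obs - D_calc)^2> / (2<D_obs^2>)). The sketch below computes that quantity from observed and back-calculated couplings under that assumption. The paper's correction for non-uniform distribution of dipolar vectors, the alignment-tensor fit, and its empirical accuracy formula are not reproduced, and the function name `rdc_r_factor` and the coupling values are illustrative.

```python
import math

def rdc_r_factor(d_obs, d_calc):
    """Uncorrected dipolar-coupling R-factor (assumed Clore-Garrett form).

    R = sqrt(<(D_obs - D_calc)^2> / (2 * <D_obs^2>)), averaging over couplings.
    R is 0 for a perfect fit; it is close to 1 when D_calc is uncorrelated with
    D_obs and both follow the same zero-mean distribution.
    """
    if not d_obs or len(d_obs) != len(d_calc):
        raise ValueError("need two non-empty lists of equal length")
    n = len(d_obs)
    msd = sum((o - c) ** 2 for o, c in zip(d_obs, d_calc)) / n
    mean_sq_obs = sum(o * o for o in d_obs) / n
    return math.sqrt(msd / (2.0 * mean_sq_obs))

# Illustrative backbone 1H-15N couplings in Hz (made-up numbers).
observed = [10.0, -5.0, 3.0, -8.0]
model = [9.0, -4.0, 4.0, -9.0]
print(rdc_r_factor(observed, observed))  # 0.0 for a perfect fit
print(rdc_r_factor(observed, model))     # sqrt(1/99), about 0.10
```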
Affiliation(s)
- Katya Simon
- Department of Chemistry, Purdue University, West Lafayette, IN 47907, USA
7. Maurer MH. The path to enlightenment: making sense of genomic and proteomic information. Genomics Proteomics Bioinformatics 2004; 2:123-31. [PMID: 15629052 PMCID: PMC5172447 DOI: 10.1016/s1672-0229(04)02018-2]
Abstract
Whereas genomics describes the study of the genome, mainly represented by its gene expression at the DNA or RNA level, the term proteomics denotes the study of the proteome, which is the protein complement encoded by the genome. In recent years, the number of proteomic experiments has increased tremendously. While all fields of proteomics have made major technological advances, the biggest step was seen in bioinformatics. Biological information management relies on sequence and structure databases and powerful software tools to translate experimental results into meaningful biological hypotheses and answers. In this resource article, I provide a collection of databases and software available on the Internet that are useful for interpreting genomic and proteomic data. The article is a toolbox for researchers who have genomic or proteomic datasets and need to put their findings into a biological context.
Affiliation(s)
- Martin H Maurer
- Department of Physiology and Pathophysiology, University of Heidelberg, 69120 Heidelberg, Germany.
8. Day R, Beck DAC, Armen RS, Daggett V. A consensus view of fold space: combining SCOP, CATH, and the Dali Domain Dictionary. Protein Sci 2003; 12:2150-60. [PMID: 14500873 PMCID: PMC2366924 DOI: 10.1110/ps.0306803]
Abstract
We have determined consensus protein-fold classifications on the basis of three classification methods, SCOP, CATH, and Dali. These classifications make use of different methods of defining and categorizing protein folds that lead to different views of protein-fold space. Pairwise comparisons of domains on the basis of their fold classifications show that much of the disagreement between the classification systems is due to differing domain definitions rather than assigning the same domain to different folds. However, there are significant differences in the fold assignments between the three systems. These remaining differences can be explained primarily in terms of the breadth of the fold classifications. Many structures may be defined as having one fold in one system, whereas far fewer are defined as having the analogous fold in another system. By comparing these folds for a nonredundant set of proteins, the consensus method breaks up broad fold classifications and combines restrictive fold classifications into metafolds, creating, in effect, an averaged view of fold space. This averaged view requires that the structural similarities between proteins having the same metafold be recognized by multiple classification systems. Thus, the consensus map is useful for researchers looking for fold similarities that are relatively independent of the method used to compare proteins. The 30 most populated metafolds, representing the folds of about half of a nonredundant subset of the PDB, are presented here. The full list of metafolds is presented on the Web.
Affiliation(s)
- Ryan Day
- Biomolecular Structure and Design Program and Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, USA
9. Nekrasov AN. Analysis of the Information Structure of Protein Sequences: A New Method for Analyzing the Domain Organization of Proteins. J Biomol Struct Dyn 2004; 21:615-24. [PMID: 14769054 DOI: 10.1080/07391102.2004.10506952]
Abstract
The amino acid sequences of gamma-crystallin, Haloalkane Dehalogenase, Phthalate Dioxygenase, Porphobilinogen Deaminase and Myosin Regulatory Domain c-chain were analyzed for their information content. Sites of increased degree of information coordination between residues (IDIC-sites) were identified, and their organization was studied by means of analyzing the information structure of the protein sequences. Relationships between the structural units forming the spatial and informational structure of proteins were demonstrated. Associations of information-coordinated structural elements (IDIC-associations) were mapped onto compact structural domains found in the spatial structures of globular proteins. The proposed method of analyzing the information structure of protein sequences may find applications in the biotechnology and structural chemistry of proteins.
Affiliation(s)
- Alexei N Nekrasov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, ul Miklukho-Maklaya, 16/10, Moscow, 117997 Russia.
10.
Abstract
We developed a method, CHOP, for dissecting proteins into domain-like fragments. The basic idea was to cut proteins beginning with very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and finishing with cuts based on the termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, most domains spanned on average approximately 100 residues. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid (although previously not described) for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains were shorter than 210 residues. We believe that our CHOP fragments constitute an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi-member and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to cover at least the multi-member clusters in 62 proteomes.
Affiliation(s)
- Jinfeng Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA
11. Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM. Three-dimensional threading approach to protein structure recognition. Polymer 2004. [DOI: 10.1016/j.polymer.2003.10.091]
12. Marchler-Bauer A, Panchenko AR, Ariel N, Bryant SH. Comparison of sequence and structure alignments for protein domains. Proteins 2002; 48:439-46. [PMID: 12112669 DOI: 10.1002/prot.10163]
Abstract
Profile search methods based on protein domain alignments have proven to be useful tools in comparative sequence analysis. Domain alignments used by currently available search methods have been computed by sequence comparison. With the growth of the protein structure database, however, alignments of many domain pairs have also been computed by structure comparison. Here, we examine the extent to which information from these two sources agrees. We measure agreement with respect to identification of homologous regions in each protein, that is, with respect to the location of domain boundaries. We also measure agreement with respect to identification of homologous residue sites by comparing alignments and assessing the accuracy of the molecular models they predict. We find that domain alignments in publicly available collections based on sequence and structure comparison are largely consistent. However, the homologous regions identified by sequence comparison are often shorter than those identified by 3D structure comparison. In addition, when overall sequence similarity is low, alignments from sequence comparison produce less accurate molecular models, suggesting that they identify homologous sites less accurately. These observations suggest that structure comparison results might be used to improve the overall accuracy of domain alignment collections and the performance of profile search methods based on them.
Affiliation(s)
- Aron Marchler-Bauer
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA
13. Orengo CA, Bray JE, Buchan DWA, Harrison A, Lee D, Pearl FMG, Sillitoe I, Todd AE, Thornton JM. The CATH protein family database: A resource for structural and functional annotation of genomes. Proteomics 2002. [DOI: 10.1002/1615-9861(200201)2:1<11::aid-prot11>3.0.co;2-t]