1. Kaiser F, Labudde D. Unsupervised Discovery of Geometrically Common Structural Motifs and Long-Range Contacts in Protein 3D Structures. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:671-680. [PMID: 29990265 DOI: 10.1109/tcbb.2017.2786250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
The essential role of small evolutionarily conserved structural units in proteins has been extensively researched and validated. A well-known example is the serine proteases, where the peptide cleavage reaction is realized by a configuration of only three residues. Brought into spatial proximity during the protein folding process, such structural motifs are often long-range contacts and are usually hard to detect at the sequence level. Given the constantly growing body of protein 3D structure data, the computational identification of structural motifs can contribute significantly to the understanding of protein fold and function. Thus, we propose a method to discover structural motifs of high geometrical similarity and desired sequence separation in protein 3D structure data. Because the method draws on techniques originating from data mining, no a priori knowledge is required. The applicability of the method is demonstrated by the identification of the catalytic unit of serine proteases and the ion-coordination center of cupredoxins. Furthermore, large-scale analysis of the entire Protein Data Bank points towards the presence of ubiquitous structural motifs, independent of any specific fold or function. We envision that our method is suitable to uncover functional mechanisms and to derive fingerprint libraries of structural motifs, which could be used to assess protein family association.
2. Meysman P, Zhou C, Cule B, Goethals B, Laukens K. Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns. BioData Min 2015; 8:4. [PMID: 25657820 PMCID: PMC4318390 DOI: 10.1186/s13040-015-0038-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/09/2014] [Accepted: 01/18/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology, as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were related to the thermostability of the protein structure and to the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent of protein function or specific conserved domains.
Affiliation(s)
- Pieter Meysman
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Cheng Zhou
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Boris Cule
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Bart Goethals
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
3. Matsuoka M, Kikuchi T. Sequence analysis on the information of folding initiation segments in ferredoxin-like fold proteins. BMC Structural Biology 2014; 14:15. [PMID: 24884463 PMCID: PMC4055915 DOI: 10.1186/1472-6807-14-15] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 12/23/2013] [Accepted: 05/15/2014] [Indexed: 02/06/2023]
Abstract
BACKGROUND: While some studies have shown that 3D protein structures are more conserved than their amino acid sequences, other experimental studies have shown that even if two proteins share the same topology, they may have different folding pathways. Many studies have investigated this issue with molecular dynamics or Go-like model simulations; however, one should be able to obtain the same information by analyzing the proteins' amino acid sequences, if the sequences contain all the information about the 3D structures. In this study, we use information about protein sequences to predict the location of their folding segments. We focus on proteins with a ferredoxin-like fold, which has a characteristic topology. Some of these proteins have different folding segments. RESULTS: Despite the simplicity of our methods, we are able to correctly determine the experimentally identified folding segments by predicting the location of the compact regions considered to play an important role in structural formation. We also apply our sequence analyses to some homologues of each protein and confirm that there are highly conserved folding segments despite the homologues' sequence diversity. These homologues have similar folding segments even when the sequence homology between two proteins is not high. CONCLUSION: Our analyses have proven useful for investigating the common or different folding features of the proteins studied.
Affiliation(s)
- Takeshi Kikuchi
  - Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan.
4. Ben Ishay E, Rahamim G, Orevi T, Hazan G, Amir D, Haas E. Fast Subdomain Folding Prior to the Global Refolding Transition of E. coli Adenylate Kinase: A Double Kinetics Study. J Mol Biol 2012; 423:613-23. [DOI: 10.1016/j.jmb.2012.08.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 04/25/2012] [Revised: 07/31/2012] [Accepted: 08/07/2012] [Indexed: 11/16/2022]
5. Sengupta D, Kundu S. Role of long- and short-range hydrophobic, hydrophilic and charged residues contact network in protein's structural organization. BMC Bioinformatics 2012; 13:142. [PMID: 22720789 PMCID: PMC3464617 DOI: 10.1186/1471-2105-13-142] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 12/29/2011] [Accepted: 06/21/2012] [Indexed: 11/10/2022] Open
Abstract
Background: The three-dimensional structure of a protein can be described as a graph in which nodes represent residues and edges represent the strength of the non-covalent interactions between them. These protein contact networks can be separated into long- and short-range interaction networks depending on the positions of the amino acids in the primary structure. Long-range interactions play a distinct role in determining the tertiary structure of a protein, while short-range interactions could largely contribute to secondary structure formation. In addition, the physicochemical properties and the linear arrangement of amino acids in the primary structure of a protein determine its three-dimensional structure. Here, we present an extensive analysis of protein contact subnetworks based on the London van der Waals interactions of amino acids at different length scales. We further subdivided those networks into hydrophobic, hydrophilic and charged residue networks and have tried to correlate their influence with the overall topology and organization of a protein.
Results: The largest connected component (LCC) of long-range (LRN), short-range (SRN) and all-range (ARN) networks within proteins exhibits a transition behaviour when plotted against different interaction strengths of edges among amino acid nodes. While short-range networks, having chain-like structures, exhibit a highly cooperative transition, long- and all-range networks, which are more similar to each other, have non-chain-like structures and show less cooperativity. Further, the hydrophobic residue subnetworks in long- and all-range networks have transition behaviours similar to those of the all-residue all-range networks, but the hydrophilic and charged residue networks do not. While the nature of the transitions of LCC sizes is the same in SRNs for thermophiles and mesophiles, there exists a clear difference in LRNs. The presence of larger interconnected long-range interactions in thermophiles than in mesophiles, even at higher interaction strengths between amino acids, gives extra stability to the tertiary structure of the thermophiles. All the subnetworks at different length scales (ARNs, LRNs and SRNs) show assortative mixing of their participating amino acids. While there exists a significantly higher percentage of hydrophobic subclusters over others in ARNs and LRNs, we do not find assortative mixing behaviour in any of the subclusters in SRNs. The clustering coefficient of hydrophobic subclusters in the long-range network is the highest among the types of subnetworks. There exist highly cliquish hydrophobic nodes followed by charged nodes in LRNs and ARNs; on the other hand, we observe the highest dominance of charged residue cliques in short-range networks. Studies on the perimeter of the cliques also show higher occurrences of hydrophobic and charged residue cliques.
Conclusions: The simple framework of protein contact networks and their subnetworks based on the London van der Waals force is able to capture several known properties of protein structure and can unravel several new features. Thermophiles not only have a higher number of long-range interactions; they also have larger clusters of connected residues at higher interaction strengths among amino acids than their mesophilic counterparts. The framework can reestablish the significant role of long-range hydrophobic clusters in protein folding and stabilization; at the same time, it sheds light on the higher communication ability of hydrophobic subnetworks over the others. The results give an indication of the controlling role of hydrophobic subclusters in determining a protein's folding rate. The occurrences of higher perimeters of hydrophobic and charged cliques imply a role of charged residues as well as hydrophobic residues in stabilizing distant parts of the primary structure of a protein through London van der Waals interactions.
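The idea of splitting a residue contact network by sequence separation and tracking its largest connected component can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a plain Euclidean distance cutoff in place of the London van der Waals interaction strength used in the paper, and the function names, the `cutoff` value and the `min_separation` threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import dist


def contact_edges(coords, cutoff=7.0):
    """Residue pairs (i, j), i < j, whose coordinates lie within `cutoff`.

    Assumption: a distance cutoff stands in for interaction strength.
    """
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if dist(coords[i], coords[j]) <= cutoff]


def split_by_range(edges, min_separation=10):
    """Split edges into short- and long-range by sequence separation.

    Assumption: `min_separation` is an illustrative threshold.
    """
    short = [(i, j) for i, j in edges if j - i < min_separation]
    long_ = [(i, j) for i, j in edges if j - i >= min_separation]
    return short, long_


def largest_component_size(n_nodes, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)
```

Calling `largest_component_size` repeatedly while tightening `cutoff` gives a rough analogue of the LCC-versus-interaction-strength curves described in the abstract, under the stated assumptions.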
Affiliation(s)
- Dhriti Sengupta
  - Department of Biophysics, Molecular Biology & Bioinformatics, University of Calcutta, 92 APC Road, Kolkata-700009, India
6. Gromiha MM. Influence of long-range contacts and surrounding residues on the transition state structures of proteins. Anal Biochem 2011; 408:32-6. [DOI: 10.1016/j.ab.2010.08.029] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/16/2010] [Revised: 08/16/2010] [Accepted: 08/22/2010] [Indexed: 10/19/2022]
7. Hamacher K. Efficient quantification of the importance of contacts for the dynamical stability of proteins. J Comput Chem 2010; 32:810-5. [PMID: 20957707 DOI: 10.1002/jcc.21659] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/06/2010] [Revised: 07/12/2010] [Accepted: 08/05/2010] [Indexed: 11/07/2022]
Abstract
Understanding the stability of the native state and the dynamics of a protein is of great importance for all areas of biomolecular design. The efficient estimation of the influence of individual contacts between amino acids in a protein structure is a first step in the reengineering of a particular protein for technological or pharmacological purposes. At the same time, the functional annotation of molecular evolution can be facilitated by such insight. Here, we use an information-theoretical measure recently suggested for biomolecular design, the Kullback-Leibler divergence, to quantify and thereby rank residue-residue contacts within proteins according to their overall contribution to the molecular mechanics. We implement this protocol on the basis of a reduced molecular model, which allows us to use a well-known lemma of linear algebra to speed up the computation. The increase in computational performance is around 10^1- to 10^4-fold. We applied the method to two proteins to illustrate the protocol and its results. We found that our method can reliably identify key residues in the molecular mechanics and the protein fold in comparison to well-known properties in the serine protease inhibitor. We found significant correlations with experimental results, e.g., dissociation constants and Φ values.
8. Motono C, Gromiha MM. Dynamic and Structural Analysis of Hyperthermophilic Cold Shock Protein Stability. Kobunshi Ronbunshu 2010. [DOI: 10.1295/koron.67.151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Chie Motono
  - Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST)
- M. Michael Gromiha
  - Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST)
9. Motono C, Gromiha MM, Kumar S. Thermodynamic and kinetic determinants of Thermotoga maritima cold shock protein stability: A structural and dynamic analysis. Proteins 2008; 71:655-69. [DOI: 10.1002/prot.21729] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
10. Li J, Wang J, Wang W. Identifying folding nucleus based on residue contact networks of proteins. Proteins 2008; 71:1899-907. [DOI: 10.1002/prot.21891] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
11. Anil B, Sato S, Cho JH, Raleigh DP. Fine structure analysis of a protein folding transition state; distinguishing between hydrophobic stabilization and specific packing. J Mol Biol 2005; 354:693-705. [PMID: 16246369 DOI: 10.1016/j.jmb.2005.08.054] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.4] [Received: 05/17/2005] [Revised: 08/15/2005] [Accepted: 08/23/2005] [Indexed: 10/25/2022]
Abstract
Developing a detailed understanding of the structure and energetics of protein folding transition states is a key step in describing the folding process. The phi-value analysis approach allows the energetic contribution of side-chains to be mapped out by comparing wild-type with individual mutants where conservative changes are introduced. Studies where multiple substitutions are made at individual sites are much rarer but are potentially very useful for understanding the contribution of each element of a side-chain to transition state formation, and for distinguishing the relative importance of specific packing versus hydrophobic interactions. We have made a series of conservative mutations at multiple buried sites in the N-terminal domain of L9 in order to assess the relative importance of specific side-chain packing versus less specific hydrophobic stabilization of the transition state. A total of 28 variants were prepared using both naturally occurring and non-naturally occurring amino acids at six sites. Analysis of the mutants by NMR and CD showed no perturbation of the structure. There is no correlation between changes in hydrophobicity and changes in stability. In contrast, there is an excellent linear correlation between the hydrophobicity of a side-chain and the log of the folding rate, ln(k_f). The correlation between ln(k_f) and the change in hydrophobicity holds even for substitutions that change the shape and/or size of a side-chain significantly. For most sites, the correlation with the logarithm of the unfolding rate, ln(k_u), is much worse. Mutants with more hydrophobic amino acid substitutions fold faster, and those with less hydrophobic amino acid substitutions fold slower. The results show that hydrophobic interactions amongst core residues are an important driving force for forming the transition state, and are more important than specific tight packing interactions. Finally, a number of substitutions lead to negative phi-values, and the origin of these effects is described.
Affiliation(s)
- Burcu Anil
  - Department of Chemistry, State University of New York at Stony Brook, Stony Brook, NY 11794-3400, USA
12. Ágoston V, Cemazar M, Kaján L, Pongor S. Graph-representation of oxidative folding pathways. BMC Bioinformatics 2005; 6:19. [PMID: 15676070 PMCID: PMC549202 DOI: 10.1186/1471-2105-6-19] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 10/20/2004] [Accepted: 01/27/2005] [Indexed: 11/17/2022] Open
Abstract
Background: The process of oxidative folding combines the formation of native disulfide bonds with conformational folding, resulting in the native three-dimensional fold. Oxidative folding pathways can be described in terms of disulfide intermediate species (DIS), which can also be isolated and characterized. Each DIS corresponds to a family of folding states (conformations) that the given DIS can adopt in three dimensions. Results: The oxidative folding space can be represented as a network of DIS states interconnected by disulfide interchange reactions that can either create/abolish or rearrange disulfide bridges. We propose a simple 3D representation wherein the states having the same number of disulfide bridges are placed on separate planes. In this representation, the shuffling transitions are within the planes, and the redox edges connect adjacent planes. In a number of experimentally studied cases (bovine pancreatic trypsin inhibitor, insulin-like growth factor and epidermal growth factor), the observed intermediates appear as part of contiguous oxidative folding pathways. Conclusions: Such networks can be used to visualize folding pathways in terms of the experimentally observed intermediates. A simple visualization template written for the Tulip package can be obtained from V.A.
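The network of disulfide intermediate species described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' Tulip template: states are modelled as sets of disjoint cysteine pairs, a redox edge is assumed to add or remove exactly one bond, and a shuffle edge is assumed to replace one bond with another that shares exactly one cysteine with it. The function names are hypothetical.

```python
from itertools import combinations


def disulfide_states(cysteines):
    """All sets of disjoint cysteine pairs, including the fully reduced state."""
    pairs = list(combinations(cysteines, 2))
    states = {frozenset()}
    frontier = {frozenset()}
    while frontier:
        grown = set()
        for state in frontier:
            used = {c for bond in state for c in bond}
            for bond in pairs:
                if used.isdisjoint(bond):
                    grown.add(state | {bond})
        grown -= states
        states |= grown
        frontier = grown
    return states


def classify_edge(a, b):
    """Return 'redox', 'shuffle', or None for two states (assumed edge rules)."""
    # Redox: one state is the other plus exactly one extra bond.
    if abs(len(a) - len(b)) == 1 and (a < b or b < a):
        return "redox"
    # Shuffle: same bond count, one bond swapped for one sharing a cysteine.
    if len(a) == len(b) and len(a - b) == 1:
        (old,), (new,) = a - b, b - a
        if len(set(old) & set(new)) == 1:
            return "shuffle"
    return None


def folding_network(cysteines):
    """States sorted by plane (number of bonds) and the labelled edges."""
    states = sorted(disulfide_states(cysteines),
                    key=lambda s: (len(s), sorted(s)))
    edges = [(a, b, kind) for a, b in combinations(states, 2)
             if (kind := classify_edge(a, b))]
    return states, edges
```

Here `len(state)` plays the role of the plane index: under these edge rules, shuffle edges stay within a plane and redox edges connect adjacent planes, matching the layout the abstract describes.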
Affiliation(s)
- Vilmos Ágoston
  - Bioinformatics Group, Biological Research Center, Hungarian Academy of Sciences, Temesvári krt. 62, 6726 Szeged, Hungary
- Masa Cemazar
  - Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy
  - Institute for Molecular Bioscience, University of Queensland, St. Lucia 4072, QLD, Australia
- László Kaján
  - Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy
- Sándor Pongor
  - Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy
13. Gromiha MM. Distinct roles of conventional non-covalent and cation–π interactions in protein stability. Polymer 2005. [DOI: 10.1016/j.polymer.2004.10.028] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]