1
|
Tvaroška I, Kozmon S, Kóňa J. Molecular Modeling Insights into the Structure and Behavior of Integrins: A Review. Cells 2023; 12:cells12020324. [PMID: 36672259 PMCID: PMC9856412 DOI: 10.3390/cells12020324] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/07/2022] [Revised: 01/09/2023] [Accepted: 01/11/2023] [Indexed: 01/18/2023] Open
Abstract
Integrins are heterodimeric glycoproteins crucial to the physiology and pathology of many biological functions. As adhesion molecules, they mediate immune cell trafficking, migration, and immunological synapse formation during inflammation and cancer. The recognition of the vital roles of integrins in various diseases revealed their therapeutic potential. Despite great effort over the last thirty years, only seven integrin-based drugs have entered the market so far. Recent progress in deciphering integrin functions, signaling, and interactions with ligands, along with advancement in rational drug design strategies, provides an opportunity to exploit their therapeutic potential and discover novel agents. This review will discuss the molecular modeling methods used in determining integrins' dynamic properties and in providing information toward understanding their properties and function at the atomic level. Then, we will survey the relevant contributions and the current understanding of integrin structure, activation, the binding of essential ligands, and the role of molecular modeling methods in the rational design of antagonists. We will emphasize the role played by molecular modeling methods in progress in these areas and in the design of integrin antagonists.
Affiliation(s)
- Igor Tvaroška
- Institute of Chemistry, Slovak Academy of Sciences, Dúbravska cesta 9, 845 38 Bratislava, Slovakia
- Correspondence:
- Stanislav Kozmon
- Institute of Chemistry, Slovak Academy of Sciences, Dúbravska cesta 9, 845 38 Bratislava, Slovakia
- Medical Vision o. z., Záhradnícka 4837/55, 821 08 Bratislava, Slovakia
- Juraj Kóňa
- Institute of Chemistry, Slovak Academy of Sciences, Dúbravska cesta 9, 845 38 Bratislava, Slovakia
- Medical Vision o. z., Záhradnícka 4837/55, 821 08 Bratislava, Slovakia
|
2
|
Abstract
De novo protein design enables the exploration of novel sequences and structures absent from the natural protein universe. De novo design also stands as a stringent test for our understanding of the underlying physical principles of protein folding and may lead to the development of proteins with unmatched functional characteristics. The first fundamental challenge of de novo design is to devise "designable" structural templates leading to sequences that will adopt the predicted fold. Here, we built on the TopoBuilder (TB) de novo design method to automatically assemble structural templates with native-like features starting from string descriptors that capture the overall topology of proteins. Our framework eliminates the dependency on hand-crafted and fold-specific rules through an iterative, data-driven approach that extracts geometrical parameters from structural tertiary motifs. We evaluated the TopoBuilder framework by designing sequences for a set of five protein folds, and experimental characterization revealed that several sequences were folded and stable in solution. The TopoBuilder de novo design framework will be broadly useful to guide the generation of artificial proteins with customized geometries, enabling the exploration of the protein universe.
|
3
|
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
|
4
|
Energy propagation along polypeptide α-helix: Experimental data and ab initio zone structure. Biosystems 2019; 185:104016. [DOI: 10.1016/j.biosystems.2019.104016] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/25/2019] [Revised: 08/09/2019] [Accepted: 08/09/2019] [Indexed: 12/11/2022]
|
5
|
Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction. Comput Biol Chem 2018; 73:105-119. [PMID: 29475175 DOI: 10.1016/j.compbiolchem.2018.02.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/13/2017] [Revised: 01/25/2018] [Accepted: 02/03/2018] [Indexed: 01/04/2023]
Abstract
Computing conformations, which is essential for associating structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploration algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab-initio protein structure prediction is proposed. The conformational space is first converted into the ultrafast shape recognition (USR) feature space. Based on the USR feature space, the conformational space can be further converted into an underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of using the underestimation model, the tight lower-bound estimate can be used to guide exploration, invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique for exploring the protein conformational space. LUE is applied to the differential evolution (DE) algorithm and to the Metropolis Monte Carlo (MMC) algorithm available in Rosetta; in both cases, candidates are screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE.
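The two ingredients named in this abstract can be sketched in a few lines. The block below is a minimal illustration, not the authors' implementation: `usr_descriptor` follows one common USR convention (four reference points, three distance moments each), and `lipschitz_lower_bound` is the textbook cone-shaped underestimate. The function names, the moment convention, and the Lipschitz constant `k` are all assumptions made for the example.

```python
import math

def usr_descriptor(coords):
    """12-number USR-style shape descriptor for a list of (x, y, z) points.

    Assumed convention: reference points are the centroid, the point closest
    to it, the point farthest from it, and the point farthest from that one;
    for each, store mean, standard deviation, and the signed cube root of the
    third central moment of the distances.
    """
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    closest = min(coords, key=lambda p: math.dist(p, centroid))
    farthest = max(coords, key=lambda p: math.dist(p, centroid))
    far_from_far = max(coords, key=lambda p: math.dist(p, farthest))
    features = []
    for ref in (centroid, closest, farthest, far_from_far):
        d = [math.dist(p, ref) for p in coords]
        mean = sum(d) / n
        var = sum((x - mean) ** 2 for x in d) / n
        third = sum((x - mean) ** 3 for x in d) / n
        features += [mean, math.sqrt(var),
                     math.copysign(abs(third) ** (1 / 3), third)]
    return features

def lipschitz_lower_bound(x, samples, k):
    """Lower bound on the energy at feature vector x.

    samples: list of (feature_vector, energy) pairs already evaluated.
    k: assumed Lipschitz constant of the energy in feature space.
    """
    return max(e - k * math.dist(x, f) for f, e in samples)

def worth_evaluating(x, samples, k, current_best):
    """Skip the expensive energy call when even the lower bound cannot win."""
    return lipschitz_lower_bound(x, samples, k) < current_best
```

In a search loop, a candidate whose lower bound already exceeds the best energy found so far would be discarded without calling the energy function, which is the saving the abstract describes.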
|
6
|
Chu H, Liu H. TetraBASE: A Side Chain-Independent Statistical Energy for Designing Realistically Packed Protein Backbones. J Chem Inf Model 2018; 58:430-442. [PMID: 29314837 DOI: 10.1021/acs.jcim.7b00677] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
To construct backbone structures of high designability is a primary aspect of computational protein design. We report here a side chain-independent statistical energy that aims at realistic modeling of through-space packing of polypeptide backbones. To mitigate the lack of explicit amino acid side chains, the model treats the interbackbone site packing as being dependent on peptide local conformation. In addition, new variables suitable for statistical analysis, one for relative orientation and another for distance, have been introduced to represent the intersite geometry based on the asymmetrical tetrahedron organization of distinct chemical groups surrounding the Cα-carbon atoms. The resulting tetrahedron-based backbone statistical energy (tetraBASE) model has been used to optimize the tertiary organizations of secondary structure elements (SSEs) of designated types with Monte Carlo simulated annealing, starting from artificial initial configurations. The tetraBASE minimum energy structures can reproduce SSE packing frequently observed in native proteins with atomic root-mean-square deviations of 1-2 Å. The model has also been tested by examining the stability of native SSE arrangements under tetraBASE. The results suggest that the tetraBASE model can be used to effectively represent interbackbone packing when designing backbone structures without explicitly knowing side chain types.
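For readers unfamiliar with Monte Carlo simulated annealing, the optimization loop mentioned above can be sketched generically. This is a minimal illustration under stated assumptions: the energy function, the move proposal, and the geometric cooling schedule are placeholders chosen for the example and are not taken from the tetraBASE paper.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def simulated_annealing(energy, propose, state,
                        t_start=1.0, t_end=0.01, steps=2000, seed=0):
    """Generic annealing loop; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, t, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

# Toy usage: a one-dimensional "energy" with its minimum at x = 3.
best_x, best_e = simulated_annealing(
    energy=lambda x: (x - 3.0) ** 2,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    state=0.0,
)
```

In a backbone-design setting the state would hold the positions and orientations of the secondary structure elements, and the energy would be the statistical potential; the acceptance rule and cooling loop stay the same.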
Affiliation(s)
- Huanyu Chu
- School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui, China; Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui, China
- Haiyan Liu
- School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui, China; Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui, China; Collaborative Innovation Center of Chemistry for Life Sciences, 230027 Hefei, Anhui, China
|
7
|
Hao XH, Zhang GJ, Zhou XG. Conformational Space Sampling Method Using Multi-Subpopulation Differential Evolution for De novo Protein Structure Prediction. IEEE Trans Nanobioscience 2017; 16:618-633. [DOI: 10.1109/tnb.2017.2749243] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
|
8
|
Wozniak PP, Vriend G, Kotulska M. Correlated mutations select misfolded from properly folded proteins. Bioinformatics 2017; 33:1497-1504. [PMID: 28203707 DOI: 10.1093/bioinformatics/btx013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/12/2016] [Accepted: 01/11/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- P P Wozniak
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
- G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- M Kotulska
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
|
9
|
Martin JW, Zhou P, Donald BR. Systematic solution to homo-oligomeric structures determined by NMR. Proteins 2015; 83:651-61. [PMID: 25620116 DOI: 10.1002/prot.24768] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/28/2014] [Revised: 12/12/2014] [Accepted: 01/12/2015] [Indexed: 11/07/2022]
Abstract
Protein structure determination by NMR has predominantly relied on simulated annealing-based conformational search for a converged fold using primarily distance constraints, including constraints derived from nuclear Overhauser effects, paramagnetic relaxation enhancement, and cysteine crosslinkings. Although there is no guarantee that the converged fold represents the global minimum of the conformational space, it is generally accepted that good convergence is synonymous with the global minimum. Here, we show that such a criterion breaks down in the presence of large numbers of ambiguous constraints from NMR experiments on homo-oligomeric protein complexes. A systematic evaluation of the conformational solutions that satisfy the NMR constraints of a trimeric membrane protein, DAGK, reveals 9 distinct folds, including the reported NMR and crystal structures. This result highlights the fundamental limitation of global fold determination for homo-oligomeric proteins using ambiguous distance constraints and provides a systematic solution for exhaustive enumeration of all satisfying solutions.
Affiliation(s)
- Jeffrey W Martin
- Department of Computer Science, Duke University, Durham, North Carolina, 27708
|
10
|
Sadowski MI. Prediction of protein domain boundaries from inverse covariances. Proteins 2013; 81:253-60. [PMID: 22987736 PMCID: PMC3563215 DOI: 10.1002/prot.24181] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 05/23/2012] [Revised: 08/10/2012] [Accepted: 09/04/2012] [Indexed: 01/04/2023]
Abstract
It has been known even since relatively few structures had been solved that longer protein chains often contain multiple domains, which may fold separately and play the role of reusable functional modules found in many contexts. In many structural biology tasks, in particular structure prediction, it is of great use to be able to identify domains within the structure and analyze these regions separately. However, when using sequence data alone this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries based on size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information including a kernel smoothing-based approach and methods based on building alpha-carbon models and compare performance with a length-based predictor, a homology search method and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel-smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered and is not significantly different to the homology-based method. Considering only multidomain targets the kernel-smoothing method outperforms all of the published methods except DLP-SVM. The kernel smoothing method therefore represents a potentially useful improvement to ab initio domain prediction.
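The abstract does not spell out how predicted contacts are turned into a boundary score, so the following is only an illustrative sketch of the general idea: count the predicted contacts that span each candidate cut point, smooth that profile with a Gaussian kernel, and take the minimum. The scoring rule, the bandwidth `sigma`, and the end `margin` are assumptions made for the example, not the published method.

```python
import math

def crossing_profile(contacts, length):
    """profile[k] = number of contacts (i, j) with i < k <= j,
    i.e. contacts that would be broken by cutting just before residue k."""
    profile = [0.0] * length
    for i, j in contacts:
        lo, hi = min(i, j), max(i, j)
        for k in range(lo + 1, hi + 1):
            profile[k] += 1.0
    return profile

def gaussian_smooth(values, sigma):
    """Normalized Gaussian kernel smoothing of a 1D profile."""
    n = len(values)
    smoothed = []
    for k in range(n):
        weights = [math.exp(-0.5 * ((m - k) / sigma) ** 2) for m in range(n)]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed

def predict_boundary(contacts, length, sigma=1.0, margin=3):
    """Return the interior position whose smoothed crossing count is lowest."""
    smooth = gaussian_smooth(crossing_profile(contacts, length), sigma)
    return min(range(margin, length - margin), key=lambda k: smooth[k])
```

For a 20-residue toy chain in which residues 0-9 contact only each other and residues 10-19 contact only each other, no contact spans the cut before residue 10, so that position is returned as the boundary.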
Affiliation(s)
- Michael I Sadowski
- MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom.
|
11
|
Xu D, Zhang Y. Toward optimal fragment generations for ab initio protein structure assembly. Proteins 2012; 81:229-39. [PMID: 22972754 DOI: 10.1002/prot.24179] [Citation(s) in RCA: 170] [Impact Index Per Article: 14.2] [Received: 04/16/2012] [Revised: 08/06/2012] [Accepted: 09/03/2012] [Indexed: 01/03/2023]
Abstract
Fragment assembly using structural motifs excised from other solved proteins has been shown to be an efficient method for ab initio protein-structure prediction. However, how to construct accurate fragments, how to derive optimal restraints from fragments, and what the best fragment length is are basic issues yet to be systematically examined. In this work, we developed a gapless-threading method to generate position-specific structure fragments. Distance profiles and torsion angle pairs are then derived from the fragments by statistical consistency analysis, which achieved accuracy comparable to that of machine-learning-based methods although the fragments were taken from unrelated proteins. When measured by the accuracies of both the derived distance profiles and the torsion angle pairs, we come to a consistent conclusion that the optimal fragment length for structural assembly is around 10, and at least 100 fragments at each location are needed to achieve optimal structure assembly. The distance profiles and torsion angle pairs derived from the fragments have been successfully used in QUARK for ab initio protein structure assembly and are provided by the QUARK online server at http://zhanglab.ccmb.med.umich.edu/QUARK/.
Affiliation(s)
- Dong Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
|
12
|
Analytic markovian rates for generalized protein structure evolution. PLoS One 2012; 7:e34228. [PMID: 22693543 PMCID: PMC3367531 DOI: 10.1371/journal.pone.0034228] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/16/2011] [Accepted: 02/26/2012] [Indexed: 12/24/2022] Open
Abstract
A general understanding of the complex phenomenon of protein evolution requires the accurate description of the constraints that define the sub-space of proteins with mutations that do not appreciably reduce the fitness of the organism. Such constraints can have multiple origins; in this work we present a model for constrained evolutionary trajectories represented by a Markovian process throughout a set of protein-like structures artificially constructed to be topological intermediates between the structures of two naturally occurring proteins. The number and type of intermediate steps define how constrained the total evolutionary process is. By using a coarse-grained representation for the protein structures, we derive an analytic formulation of the transition rates between each of the intermediate structures. The results indicate that compact structures with a high number of hydrogen bonds are more probable and have a higher likelihood to arise during evolution. Knowledge of the transition rates allows for the study of complex evolutionary pathways represented by trajectories through a set of intermediate structures.
|
13
|
Taylor WR, Jones DT, Sadowski MI. Protein topology from predicted residue contacts. Protein Sci 2011; 21:299-305. [PMID: 22102360 DOI: 10.1002/pro.2002] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 10/09/2011] [Revised: 11/08/2011] [Accepted: 11/10/2011] [Indexed: 11/12/2022]
Abstract
Residue contacts predicted from correlated positions in a multiple sequence alignment are often sparse and uncertain. To some extent, these limitations in the data can be overcome by grouping the contacts by secondary structure elements and enumerating the possible packing arrangements of these elements in a combinatorial manner. Strong interactions appear frequently but inconsistent interactions are down-weighted and missing interactions up-weighted. The resulting improved consistency in the predicted interactions has allowed the method to be successfully applied to proteins up to 200 residues in length which is larger than any structure previously predicted using sequence data alone.
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom.
|
14
|
Taylor WR, Sadowski MI. Structural constraints on the covariance matrix derived from multiple aligned protein sequences. PLoS One 2011; 6:e28265. [PMID: 22194819 PMCID: PMC3237328 DOI: 10.1371/journal.pone.0028265] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 09/13/2011] [Accepted: 11/04/2011] [Indexed: 11/18/2022] Open
Abstract
Residue contact predictions were calculated based on the mutual information observed between pairs of positions in large multiple protein sequence alignments. Where previously only the statistical properties of these data have been considered important, we introduce new measures to impose constraints that make the contact map more consistent with a three dimensional structure. These included global (bulk) properties and local secondary structure properties. The latter allowed the contact constraints to be employed at the level of filtering pairs of secondary structure contacts which led to a more efficient (lower-level) implementation in the PLATO structure prediction server. Where previously the measure of success with this method had been whether the correct fold was predicted in the top 10 ranked models, with the current implementation, our summary statistic is the number of correct folds included in the top 10 models--which is on average over 50 percent.
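The mutual information between two alignment columns, the starting quantity in this abstract, can be computed directly from symbol frequencies. The sketch below is a plain textbook estimate in bits; it omits the sequence weighting, gap handling, and background corrections that real contact predictors usually apply, and the helper names are chosen for the example.

```python
import math
from collections import Counter

def column(alignment, idx):
    """Extract one column from a list of equal-length aligned sequences."""
    return [seq[idx] for seq in alignment]

def mutual_information(col_a, col_b):
    """MI (in bits) between two alignment columns of equal length."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

# Toy alignment: columns 0 and 1 co-vary perfectly, column 2 is invariant.
alignment = ["AKL", "AKL", "GEL", "GEL"]
mi_covarying = mutual_information(column(alignment, 0), column(alignment, 1))
mi_invariant = mutual_information(column(alignment, 0), column(alignment, 2))
```

High-MI column pairs are the candidate contacts; the structural constraints described in the abstract are then applied on top of such raw scores.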
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, MRC National Institute for Medical Research, London, United Kingdom.
|
15
|
Hu C, Koehl P, Max N. PackHelix: a tool for helix-sheet packing during protein structure prediction. Proteins 2011; 79:2828-43. [PMID: 21905109 PMCID: PMC3172692 DOI: 10.1002/prot.23108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2011] [Revised: 04/18/2011] [Accepted: 05/13/2011] [Indexed: 11/09/2022]
Abstract
The three-dimensional structure of a protein is organized around the packing of its secondary structure elements. Predicting the topology and constructing the geometry of structural motifs involving α-helices and/or β-strands are therefore key steps for accurate prediction of protein structure. While many efforts have focused on how to pack helices and on how to sample exhaustively the topologies and geometries of multiple strands forming a β-sheet in a protein, there has been little progress on generating native-like packings of helices on sheets. We describe a method that can generate the packing of multiple helices on a given β-sheet for αβα sandwich type protein folds. This method mines the results of a statistical analysis of the conformations of αβ(2) motifs in protein structures to provide input values for the geometric attributes of the packing of a helix on a sheet. It then proceeds with a geometric builder that generates multiple arrangements of the helices on the sheet of interest by sampling through these values and performing consistency checks that guarantee proper loop geometry between the helices and the strands, minimal number of collisions between the helices, and proper formation of a hydrophobic core. The method is implemented as a module of ProteinShop. Our results show that it produces structures that are within 4-6 Å RMSD of the native one, regardless of the number of helices that need to be packed, though this number may increase if the protein has several helices between two consecutive strands in the sequence that pack on the sheet formed by these two strands.
Affiliation(s)
- Chengcheng Hu
- Department of Computer Science, University of California, Davis, CA 95616
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616
- Nelson Max
- Department of Computer Science, University of California, Davis, CA 95616
|
16
|
Sadowski MI, Maksimiak K, Taylor WR. Direct correlation analysis improves fold recognition. Comput Biol Chem 2011; 35:323-32. [PMID: 22000804 PMCID: PMC3267019 DOI: 10.1016/j.compbiolchem.2011.08.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 07/15/2011] [Revised: 08/11/2011] [Accepted: 08/11/2011] [Indexed: 11/23/2022]
Abstract
The extraction of correlated mutations through the method of direct information (DI) provides predicted contact residue pairs that can be used to constrain the three dimensional structures of proteins. We apply this method to a large set of decoy protein folds consisting of many thousand well-constructed models, only tens of which have the correct fold. We find that DI is able to greatly improve the ranking of the true (native) fold but others still remain high scoring that would be difficult to discard due to small shifts in the core beta sheets.
Affiliation(s)
- William R. Taylor
- Corresponding author. Tel.: +44 208 816 2298; fax: +44 208 816 2460.
|
17
|
Mapping the distribution of packing topologies within protein interiors shows predominant preference for specific packing motifs. BMC Bioinformatics 2011; 12:195. [PMID: 21605466 PMCID: PMC3123238 DOI: 10.1186/1471-2105-12-195] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 10/20/2010] [Accepted: 05/24/2011] [Indexed: 11/24/2022] Open
Abstract
Background: Mapping protein primary sequences to their three dimensional folds referred to as the 'second genetic code' remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design.
Results: In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys.
Conclusions: Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry.
|
18
|
Hollup SM, Sadowski MI, Jonassen I, Taylor WR. Exploring the limits of fold discrimination by structural alignment: a large scale benchmark using decoys of known fold. Comput Biol Chem 2011; 35:174-88. [PMID: 21704264 PMCID: PMC3145973 DOI: 10.1016/j.compbiolchem.2011.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/21/2011] [Accepted: 04/23/2011] [Indexed: 11/10/2022]
Abstract
Protein structure comparison by pairwise alignment is commonly used to identify highly similar substructures in pairs of proteins and provide a measure of structural similarity based on the size and geometric similarity of the match. These scores are routinely applied in analyses of protein fold space under the assumption that high statistical significance is equivalent to a meaningful relationship; however, the truth of this assumption has previously been difficult to test since there is a lack of automated methods which do not rely on the same underlying principles. As a resolution to this we present a method based on the use of topological descriptions of global protein structure, providing an independent means to assess the ability of structural alignment to maintain meaningful structural correspondences on a large scale. Using a large set of decoys of specified global fold we benchmark three widely used methods for structure comparison, SAP, TM-align and DALI, and test the degree to which this assumption is justified for these methods. Application of a topological edit distance measure to provide a scale of the degree of fold change shows that, while there is a broad correlation between high structural alignment scores and low edit distances, there remain many pairs with highly significant scores which differ by core strand swaps and therefore are structurally different on a global level. Possible causes of this problem and its meaning for present assessments of protein fold space are discussed.
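The abstract does not define its topological edit distance, so the block below uses the ordinary Levenshtein distance over a sequence of secondary-structure labels as a stand-in. It is meant only to show how a core strand swap registers as a small but non-zero edit distance even when two structures would superpose well; the label scheme (`E1`, `E2`, ...) is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of topology symbols."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]

# Hypothetical spatial order of strands in a sheet for two decoys:
fold_a = ["E1", "E2", "E3", "E4"]
fold_b = ["E2", "E1", "E3", "E4"]  # the two edge strands are swapped
swap_cost = edit_distance(fold_a, fold_b)
```

Here the swap costs two edits, whereas identical topologies score zero, which gives a scale of fold change that is independent of any coordinate superposition.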
|
19
|
Hollup SM, Fuglebakk E, Taylor WR, Reuter N. Exploring the factors determining the dynamics of different protein folds. Protein Sci 2011; 20:197-209. [PMID: 21086444 DOI: 10.1002/pro.558] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
Abstract
Normal mode analyses of homologous proteins at the family and superfamily level show that slow dynamics are similar and are preserved through evolution. This study investigates how the slow dynamics of proteins is affected by variation in the protein architecture and fold. For this purpose, we have used computer-generated protein models based on idealized protein structures with varying folds. These are shown to be protein-like in their behavior, and they are used to investigate the influence of architecture and fold on the slow dynamics. We compared the dynamics of models having different folds but similar architecture and found the architecture to be the dominant factor for the slow dynamics.
Affiliation(s)
- S M Hollup
- Department of Informatics, University of Bergen, N-5020 Bergen, Norway
|
20
|
Dai L, Zhou Y. Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations. J Mol Biol 2011; 408:585-95. [PMID: 21376059 DOI: 10.1016/j.jmb.2011.02.056] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 11/16/2010] [Revised: 02/22/2011] [Accepted: 02/24/2011] [Indexed: 10/18/2022]
Abstract
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space are limited to simple model proteins and led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular, >100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed.
Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.
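For readers unfamiliar with the TM-score mentioned in the abstract above, the following is a minimal sketch of how the score is evaluated for one fixed superposition. It assumes the per-residue distances between aligned C-alpha atoms (in Å) are already available; the full TM-score additionally maximizes this quantity over superpositions, which is not done here. The function names and the 0.5 Å floor on d0 for very short chains are assumptions made for illustration and are not taken from the cited paper.

```python
import math


def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms).

    Uses the standard form d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    The 0.5 floor for short chains is an assumption of this sketch.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances, target_length):
    """TM-score for one given superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target (normalizing) structure.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical example: 100-residue target, 80 aligned pairs at 2 angstroms.
    print(round(tm_score([2.0] * 80, 100), 3))
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, and unaligned residues lower it implicitly.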
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University Indianapolis, and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Avenue, Walker Plaza Building Suite 319, Indianapolis, IN 46202, USA
21
Abstract
It is well known that the set of observed topological arrangements of secondary structures in globular proteins is highly limited. These limitations have been explained as the consequence of several rules of thumb, including a strong preference for right-handed connections and a bias against crossing loops and certain beta-strand patterns. We present a critical evaluation of the power of these rules to distinguish known from possible topologies in a large set of two- and three-layer protein structures and determine that, although these rules are still largely valid, an increasing number of exceptions can be found to many of them. The rules are then used to construct a generalised linear model for assessing the probability of occurrence of an arbitrary topology in the PDB. Application of the model to a large set of topologies generated during structure prediction showed that many had a similar probability of occurrence to known PDB folds.
Affiliation(s)
- Ben Grainger
- Division of Mathematical Biology, National Institute for Medical Research, London, United Kingdom.
22
Abstract
The three-dimensional structure of a protein is organized around the packing of its secondary structure elements. Although much is known about the packing geometry observed between alpha-helices and between beta-sheets, there has been little progress on characterizing helix-sheet interactions. We present an analysis of the conformation of alphabeta(2) motifs in proteins, corresponding to all occurrences of helices in contact with two strands that are hydrogen bonded. The geometry of the alphabeta(2) motif is characterized by the azimuthal angle theta between the helix axis and an average vector representing the two strands, the elevation angle psi between the helix axis and the plane containing the two strands, and the distance D between the helix and the strands. We observe that the helix tends to align to the two strands, with a preference for an antiparallel orientation if the two strands are parallel; this preference is diminished for other topologies of the beta-sheet. Side-chain packing at the interface between the helix and the strands is mostly hydrophobic, with a preference for aliphatic amino acids in the strand and aromatic amino acids in the helix. From the knowledge of the geometry and amino acid propensities of alphabeta(2) motifs in proteins, we have derived different statistical potentials that are shown to be efficient in picking native-like conformations among a set of non-native conformations in well-known decoy datasets. The information on the geometry of alphabeta(2) motifs as well as the related statistical potentials have applications in the field of protein structure prediction.
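The three geometric descriptors named in the abstract above (theta, psi, and D) can be illustrated with a small sketch. It assumes each helix and strand is summarized by the two end points of its axis, that the "average vector" is the normalized sum of the two strand directions, that the strand plane is spanned by that vector and the line joining the strand midpoints, and that D is the distance from the helix midpoint to that plane. These definitions and all function names are assumptions for illustration; the cited paper may define the quantities differently.

```python
import math


def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _add(a, b): return tuple(x + y for x, y in zip(a, b))
def _scale(a, k): return tuple(x * k for x in a)
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _mid(a, b): return _scale(_add(a, b), 0.5)
def _clamp(x): return max(-1.0, min(1.0, x))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def helix_two_strand_geometry(helix, strand1, strand2):
    """Return (theta_deg, psi_deg, distance) for one helix and two strands.

    Each argument is a pair of 3D points (start, end) along the element axis.
    Assumes the two strands are not exactly antiparallel or collinear.
    """
    h = _unit(_sub(helix[1], helix[0]))
    u1 = _unit(_sub(strand1[1], strand1[0]))
    u2 = _unit(_sub(strand2[1], strand2[0]))
    avg = _unit(_add(u1, u2))                 # average strand direction
    c1, c2 = _mid(*strand1), _mid(*strand2)   # strand midpoints
    normal = _unit(_cross(avg, _sub(c2, c1)))  # normal of the strand plane

    # Elevation: unsigned angle between the helix axis and the strand plane.
    out_of_plane = _dot(h, normal)
    psi = math.degrees(math.asin(_clamp(abs(out_of_plane))))

    # Azimuth: angle between the in-plane part of the helix axis and avg.
    in_plane = _sub(h, _scale(normal, out_of_plane))
    if math.sqrt(_dot(in_plane, in_plane)) < 1e-9:
        theta = 0.0  # helix perpendicular to the plane: azimuth undefined
    else:
        theta = math.degrees(math.acos(_clamp(_dot(_unit(in_plane), avg))))

    # Distance from the helix midpoint to the strand plane.
    distance = abs(_dot(_sub(_mid(*helix), _mid(c1, c2)), normal))
    return theta, psi, distance
```

With these conventions, a helix running parallel to both strands gives theta near 0 degrees, and one running antiparallel gives theta near 180 degrees.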
Affiliation(s)
- Chengcheng Hu
- Department of Computer Science University of California, Davis, CA 95616
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616
23
Abstract
In recent years, there have been significant advances in the field of computational protein design including the successful computational design of enzymes based on backbone scaffolds from experimentally solved structures. It is likely that large-scale sampling of protein backbone conformations will become necessary as further progress is made on more complicated systems. Removing the constraint of having to use scaffolds based on known protein backbones is a potential method of solving the problem. With this application in mind, we describe a method to systematically construct a large number of de novo backbone structures from idealized topological forms in a top–down hierarchical approach. The structural properties of these novel backbone scaffolds were analyzed and compared with a set of high-resolution experimental structures from the protein data bank (PDB). It was found that the Ramachandran plot distribution and relative γ- and β-turn frequencies were similar to those found in the PDB. The de novo scaffolds were sequence designed with RosettaDesign, and the energy distributions and amino acid compositions were comparable with the results for redesigned experimentally solved backbones. Proteins 2010. © 2009 Wiley-Liss, Inc.
Affiliation(s)
- James T MacDonald
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA
24
Taylor WR, Chelliah V, Hollup SM, MacDonald JT, Jonassen I. Probing the "dark matter" of protein fold space. Structure 2009; 17:1244-52. [PMID: 19748345 DOI: 10.1016/j.str.2009.07.012] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 03/02/2009] [Revised: 07/13/2009] [Accepted: 07/14/2009] [Indexed: 10/20/2022]
Abstract
We used a protein structure prediction method to generate a variety of folds as alpha-carbon models with realistic secondary structures and good hydrophobic packing. The prediction method used only idealized constructs that are not based on known protein structures or fragments of them, producing an unbiased distribution. Model and native fold comparison used a topology-based method as superposition can only be relied on in similar structures. When all the models were compared to a nonredundant set of all known structures, only one-in-ten were found to have a match. This large excess of novel folds was associated with each protein probe and if true in general, implies that the space of possible folds is larger than the space of realized folds, in much the same way that sequence-space is larger than fold-space. The large excess of novel folds exhibited no unusual properties and has been likened to cosmological dark matter.
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, London, UK.
25
Abstract
BACKGROUND The prediction of protein structure can be facilitated by the use of constraints based on a knowledge of functional sites. Without this information it is still possible to predict which residues are likely to be part of a functional site and this information can be used to select model structures from a variety of alternatives that would correspond to a functional protein. RESULTS Using a large collection of protein-like decoy models, a score was devised that selected those with predicted functional site residues that formed a cluster. When tested on a variety of small alpha/beta/alpha type proteins, including enzymes and non-enzymes, those that corresponded to the native fold were ranked highly. This performance held also for a selection of larger alpha/beta/alpha proteins that played no part in the development of the method. CONCLUSION The use of predicted site positions provides a useful filter to discriminate native-like protein models from non-native models. The method can be applied to any collection of models and should provide a useful aid to all modelling methods from ab initio to homology based approaches.
Affiliation(s)
- Vijayalakshmi Chelliah
- Division of Mathematical Biology, The National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
- William R Taylor
- Division of Mathematical Biology, The National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK