1
Wang L, Wen Z, Liu SW, Zhang L, Finley C, Lee HJ, Fan HJS. Overview of AlphaFold2 and breakthroughs in overcoming its limitations. Comput Biol Med 2024; 176:108620. [PMID: 38761500 DOI: 10.1016/j.compbiomed.2024.108620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 05/01/2024] [Accepted: 05/14/2024] [Indexed: 05/20/2024]
Abstract
Predicting three-dimensional (3D) protein structures has been challenging for decades. The emergence of AlphaFold2 (AF2), a deep learning method developed by DeepMind, was a game changer for the protein folding community. AF2 can predict a protein's three-dimensional structure with high confidence based on its amino acid sequence. Accurate prediction of protein structures can dramatically accelerate our understanding of biological mechanisms and provide a solid foundation for reliable drug design. Although AF2 breaks through the barriers in predicting protein structures, much room remains for further study. This review provides a brief historical overview of the development of protein structure prediction, covering template-based, template-free, and machine learning-based methods. In addition to reviewing the potential benefits (pros) and considerations (cons) of using AF2, this review summarizes its diverse applications, including protein structure prediction, dynamic changes, point mutations, integration of language models and experimental data, protein complexes, and protein-peptide interactions. It underscores recent advancements in the efficiency, reliability, and broad application of AF2. This comprehensive review offers valuable insights into the applications of AF2 and AF2-inspired AI methods in structural biology and their potential for clinically significant drug target discovery.
Affiliation(s)
- Lei Wang
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Zehua Wen
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Shi-Wei Liu
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Lihong Zhang
- Digestive Department, Binhai New Area Hospital of TCM Tianjin, Tianjin, 300451, China
- Cierra Finley
- Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
- Ho-Jin Lee
- Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA; Division of Natural & Mathematical Sciences, LeMoyne-Owen College, Memphis, TN, 38126, USA
- Hua-Jun Shawn Fan
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
2
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
3
Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2021.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
4
Zhang GJ, Ma LF, Wang XQ, Zhou XG. Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1068-1081. [PMID: 30295627 DOI: 10.1109/tcbb.2018.2873691] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 06/08/2023]
Abstract
Ab initio protein tertiary structure prediction is one of the long-standing problems in structural bioinformatics. With the help of residue-residue contact and secondary structure prediction information, the accuracy of ab initio structure prediction can be enhanced. In this study, an improved differential evolution with secondary structure and residue-residue contact information referred to as SCDE is proposed for protein structure prediction. In SCDE, two score models based on secondary structure and contact information are proposed, and two selection strategies, namely, secondary structure-based selection strategy and contact-based selection strategy, are designed to guide conformation space search. A probability distribution function is designed to balance these two selection strategies. Experimental results on a benchmark dataset with 28 proteins and four free model targets in CASP12 demonstrate that the proposed SCDE is effective and efficient.
5
Abbass J, Nebel JC. Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure. BMC Bioinformatics 2020; 21:170. [PMID: 32357827 PMCID: PMC7195757 DOI: 10.1186/s12859-020-3491-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2019] [Accepted: 04/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Whenever suitable template structures are not available, fragment-based protein structure prediction becomes the only practical alternative, as pure ab initio techniques require massive computational resources even for very small proteins. However, the inaccuracy of their energy functions and their stochastic nature require the generation of a large number of decoys to explore the solution space adequately, limiting their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Whereas the number of fragments is kept at its default value for coil regions, important and dramatic reductions are proposed for beta sheet and alpha helical regions, respectively.
RESULTS The evaluation of our fragment selection approach was conducted using an enhanced version of the popular Rosetta fragment-based protein structure prediction tool. It was modified so that the number of fragment candidates used in Rosetta could be adjusted based on the local secondary structure. Compared to Rosetta's standard predictions, our strategy delivered improved first models, +24% and +6% in terms of GDT, when using 2000 and 20,000 decoys, respectively, while significantly reducing the number of fragment candidates. Furthermore, our enhanced version of Rosetta is able to deliver with 2000 decoys a performance equivalent to that produced by standard Rosetta using 20,000 decoys. We hypothesise that, as the fragment insertion process focuses on the most challenging regions, such as coils, fewer decoys are needed to explore conformation spaces satisfactorily.
CONCLUSIONS Taking advantage of the high accuracy of sequence-based secondary structure predictions, we showed the value of that information to customise the number of candidates used during the fragment insertion process of fragment-based protein structure prediction. Experiments conducted using standard Rosetta showed that, when using the recommended number of decoys, i.e. 20,000, our strategy produces better results. Alternatively, similar results can be achieved using only 2000 decoys. Consequently, we recommend the adoption of this strategy to either significantly improve model quality or reduce processing times by a factor of 10.
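The fragment-count customisation described in this abstract can be pictured with a minimal sketch. It assumes a three-state secondary structure string (H, E, C) and a majority vote over each fragment window. The function name `fragment_counts` and every candidate number are illustrative assumptions, since the abstract reports no specific values, and this is not the authors' Rosetta modification.

```python
# Hypothetical candidate counts per secondary-structure class.
# The abstract only says coil keeps the default, beta sheet gets an
# "important" reduction and alpha helix a "dramatic" one; the numbers
# below are placeholders, not values from the paper.
CANDIDATES_BY_SS = {"C": 200, "E": 100, "H": 20}


def fragment_counts(ss_pred, window=9):
    """Return the number of fragment candidates to use at each window start.

    ss_pred is a predicted secondary-structure string over H/E/C.
    Each window is assigned its majority class; ties fall back to the
    class listed first in "CEH", i.e. the one keeping more candidates.
    """
    if any(s not in CANDIDATES_BY_SS for s in ss_pred):
        raise ValueError("ss_pred may only contain H, E or C")
    counts = []
    for start in range(len(ss_pred) - window + 1):
        segment = ss_pred[start:start + window]
        majority = max("CEH", key=segment.count)
        counts.append(CANDIDATES_BY_SS[majority])
    return counts


if __name__ == "__main__":
    # Nine helical residues followed by nine coil residues.
    print(fragment_counts("HHHHHHHHHCCCCCCCCC"))
```

Windows dominated by helix draw from a much smaller candidate pool, so sampling effort shifts toward coil-rich windows, which is the effect the authors hypothesise.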
Affiliation(s)
- Jad Abbass
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
6
Li ZW, Sun K, Hao XH, Hu J, Ma LF, Zhou XG, Zhang GJ. Loop Enhanced Conformational Resampling Method for Protein Structure Prediction. IEEE Trans Nanobioscience 2019; 18:567-577. [PMID: 31180866 DOI: 10.1109/tnb.2019.2922101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Protein structure prediction has been a long-standing problem for the past decades. In particular, the loop region structure remains an obstacle in forming an accurate protein tertiary structure because of its flexibility. In this study, a differential evolution method guided by Rama torsion angle and secondary structure features, named RSDE, is proposed to predict three-dimensional structure by exploiting the loop region structure. In RSDE, the structure of the loop region is improved by two operators: a loop-based crossover operator, which interchanges the configuration of a randomly selected loop region between individuals, and a loop-based mutation operator, which incorporates the torsion angle feature into conformational sampling. A stochastic ranking selection strategy is designed to select conformations with low energy and near-native structure. Moreover, a conformational resampling method, which uses previously learned knowledge to guide subsequent sampling, is proposed to improve the sampling efficiency. Experiments on a total of 28 test proteins reveal that the proposed RSDE is effective and can obtain native-like models.
7
Peyravi F, Latif A, Moshtaghioun SM. Protein tertiary structure prediction using hidden Markov model based on lattice. J Bioinform Comput Biol 2019; 17:1950007. [PMID: 31057069 DOI: 10.1142/s0219720019500070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
The prediction of protein structure from its amino acid sequence is one of the most prominent problems in computational biology. The biological function of a protein depends on its tertiary structure, which is determined by its amino acid sequence via the process of protein folding. We propose a novel fold recognition method for protein tertiary structure prediction based on a hidden Markov model and 3D coordinates of amino acid residues. The method introduces states based on the basis vectors in Bravais cubic lattices to learn the path of amino acids of the proteins of each fold. Three hidden Markov models are considered, based on simple cubic, body-centered cubic (BCC) and face-centered cubic (FCC) lattices. A 10-fold cross validation was performed on a SCOP dataset of 42 folds. The proposed composite methodology is compared to HMM-based fold recognition methods that use only the amino acid sequence or secondary structure. In the overall experiment, the accuracy of the proposed model based on face-centered cubic lattices is better than that of SAM, optimized 3-HMM and optimized Markov chain. The large amount of 3D spatial data helps the model outperform methods that use only primary structures or only secondary structures.
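As a rough illustration of lattice-based states, the sketch below enumerates the nearest-neighbour direction vectors of the three cubic Bravais lattices named in the abstract (6 for simple cubic, 8 for BCC, 12 for FCC) and maps a displacement between consecutive residues to the closest direction. Treating each direction as a candidate HMM state is an assumption about how such states might be defined; the function names are hypothetical and the abstract does not give the authors' exact construction.

```python
from itertools import product


def lattice_directions(kind):
    """Nearest-neighbour direction vectors of a cubic Bravais lattice.

    "sc"  -> 6 vectors of the form (+-1, 0, 0)
    "fcc" -> 12 vectors of the form (+-1, +-1, 0)
    "bcc" -> 8 vectors of the form (+-1, +-1, +-1)
    """
    nonzero_needed = {"sc": 1, "fcc": 2, "bcc": 3}[kind]
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(c != 0 for c in v) == nonzero_needed]


def nearest_direction(displacement, kind):
    """Pick the lattice direction best aligned with a 3D displacement.

    All directions of one lattice kind have equal length, so the largest
    dot product also gives the largest cosine similarity.
    """
    return max(lattice_directions(kind),
               key=lambda v: sum(a * b for a, b in zip(v, displacement)))


if __name__ == "__main__":
    for kind in ("sc", "bcc", "fcc"):
        print(kind, len(lattice_directions(kind)))
    # Hypothetical displacement between two consecutive C-alpha atoms.
    print(nearest_direction((2.0, 2.1, 0.1), "fcc"))
```

A chain of residues would then become a sequence of direction labels, which is the kind of discrete observation or state sequence an HMM can be trained on.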
Affiliation(s)
- Farzad Peyravi
- Department of Computer Engineering, Yazd University, Yazd, Iran
8
A Composite Approach to Protein Tertiary Structure Prediction: Hidden Markov Model Based on Lattice. Bull Math Biol 2018; 81:899-918. [DOI: 10.1007/s11538-018-00542-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/16/2018] [Accepted: 11/28/2018] [Indexed: 11/25/2022]
9
Deng H, Jia Y, Zhang Y. Protein structure prediction. Int J Mod Phys B 2018; 32:1840009. [PMID: 30853739 PMCID: PMC6407873 DOI: 10.1142/s021797921840009x] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 05/30/2023]
Abstract
Predicting the 3D structure of a protein from its amino acid sequence is one of the most important unsolved problems in biophysics and computational biology. This paper attempts to give a comprehensive introduction to recent efforts and progress in protein structure prediction. Following the general flowchart of structure prediction, related concepts and methods are presented and discussed. Moreover, brief introductions are given to several widely used prediction methods and to the community-wide critical assessment of protein structure prediction (CASP) experiments.
Affiliation(s)
- Haiyou Deng
- College of Science, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Ya Jia
- College of Physical Science and Technology, Central China Normal University, Wuhan 430079, P. R. China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA
10
Simoncini D, Schiex T, Zhang KYJ. Balancing exploration and exploitation in population-based sampling improves fragment-based de novo protein structure prediction. Proteins 2017; 85:852-858. [PMID: 28066917 DOI: 10.1002/prot.25244] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/13/2016] [Revised: 11/29/2016] [Accepted: 12/18/2016] [Indexed: 01/17/2023]
Abstract
Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population-based meta-heuristics typically make it possible to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment-based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster-based variation (EdaRose_c) and an energy-based one (EdaRose_en). We analyze the search dynamics of our new Rosetta protocols and show that EdaRose_c is able to provide predictions with lower Cα RMSD to the native structure than EdaRose_en and the Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/. Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics.
Affiliation(s)
- David Simoncini
- INRA MIAT, UR 875, Castanet-Tolosan Cedex, 31326, France; Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
- Thomas Schiex
- INRA MIAT, UR 875, Castanet-Tolosan Cedex, 31326, France
- Kam Y J Zhang
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
11
Garza-Fabre M, Kandathil SM, Handl J, Knowles J, Lovell SC. Generating, Maintaining, and Exploiting Diversity in a Memetic Algorithm for Protein Structure Prediction. Evol Comput 2016; 24:577-607. [PMID: 26908350 DOI: 10.1162/evco_a_00176] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 06/05/2023]
Abstract
Computational approaches to de novo protein tertiary structure prediction, including those based on the preeminent "fragment-assembly" technique, have failed to scale up fully to larger proteins (on the order of 100 residues and above). A number of limiting factors are thought to contribute to the scaling problem over and above the simple combinatorial explosion, but the key ones relate to the lack of exploration of properly diverse protein folds, and to an acute form of "deception" in the energy function, whereby low-energy conformations do not reliably equate with native structures. In this article, solutions to both of these problems are investigated through a multistage memetic algorithm incorporating the successful Rosetta method as a local search routine. We found that specialised genetic operators significantly add to structural diversity and that this translates well to reaching low energies. The use of a generalised stochastic ranking procedure for selection enables the memetic algorithm to handle and traverse deep energy wells that can be considered deceptive, which further adds to the ability of the algorithm to obtain a much-improved diversity of folds. The results should translate to a tangible improvement in the performance of protein structure prediction algorithms in blind experiments such as CASP, and potentially to a further step towards the more challenging problem of predicting the three-dimensional shape of large proteins.
Affiliation(s)
- Mario Garza-Fabre
- Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M15 6PB, UK
- Shaun M Kandathil
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Julia Handl
- Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M15 6PB, UK
- Joshua Knowles
- School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
- Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
12
Kandathil SM, Handl J, Lovell SC. Toward a detailed understanding of search trajectories in fragment assembly approaches to protein structure prediction. Proteins 2016; 84:411-26. [PMID: 26799916 PMCID: PMC4982100 DOI: 10.1002/prot.24987] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/07/2015] [Revised: 12/03/2015] [Accepted: 12/31/2015] [Indexed: 11/30/2022]
Abstract
Energy functions, fragment libraries, and search methods constitute three key components of fragment-assembly methods for protein structure prediction, which are all crucial for their ability to generate high-accuracy predictions. All of these components are tightly coupled; efficient searching becomes more important as the quality of fragment libraries decreases. Given these relationships, there is currently a poor understanding of the strengths and weaknesses of the sampling approaches currently used in fragment-assembly techniques. Here, we determine how the performance of search techniques can be assessed in a meaningful manner, given the above problems. We describe a set of techniques that aim to reduce the impact of the energy function, and assess exploration in view of the search space defined by a given fragment library. We illustrate our approach using Rosetta and EdaFold, and show how certain features of these methods encourage or limit conformational exploration. We demonstrate that individual trajectories of Rosetta are susceptible to local minima in the energy landscape, and that this can be linked to non-uniform sampling across the protein chain. We show that EdaFold's novel approach can help balance broad exploration with locating good low-energy conformations. This occurs through two mechanisms which cannot be readily differentiated using standard performance measures: exclusion of false minima, followed by an increasingly focused search in low-energy regions of conformational space. Measures such as ours can be helpful in characterizing new fragment-based methods in terms of the quality of conformational exploration realized.
Affiliation(s)
- Shaun M Kandathil
- Faculty of Life Sciences, the University of Manchester, Manchester, M13 9PL, United Kingdom
- Julia Handl
- Alliance Manchester Business School, Faculty of Humanities, the University of Manchester, Manchester, M13 9PL, United Kingdom
- Simon C Lovell
- Faculty of Life Sciences, the University of Manchester, Manchester, M13 9PL, United Kingdom
13
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
14
Shrestha R, Zhang KYJ. Improving fragment quality for de novo structure prediction. Proteins 2014; 82:2240-52. [PMID: 24753351 DOI: 10.1002/prot.24587] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/28/2014] [Revised: 04/03/2014] [Accepted: 04/15/2014] [Indexed: 11/08/2022]
Abstract
De novo structure prediction can be defined as a search in conformational space under the guidance of an energy function. The most successful de novo structure prediction methods, such as Rosetta, assemble the fragments from known structures to reduce the search space. Therefore, the fragment quality is an important factor in structure prediction. In our study, a method is proposed to generate a new set of fragments from the lowest energy de novo models. These fragments were subsequently used to predict the next-round of models. In a benchmark of 30 proteins, the new set of fragments showed better performance when used to predict de novo structures. The lowest energy model predicted using our method was closer to native structure than Rosetta for 22 proteins. Following a similar trend, the best model among top five lowest energy models predicted using our method was closer to native structure than Rosetta for 20 proteins. In addition, our experiment showed that the C-alpha root mean square deviation was improved from 5.99 to 5.03 Å on average compared to Rosetta when the lowest energy models were picked as the best predicted models.
Affiliation(s)
- Rojan Shrestha
- Zhang Initiative Research Unit, Institute Laboratories, RIKEN, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan; Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-0882, Japan
15
Simoncini D, Zhang KYJ. Efficient sampling in fragment-based protein structure prediction using an estimation of distribution algorithm. PLoS One 2013; 8:e68954. [PMID: 23935913 PMCID: PMC3723781 DOI: 10.1371/journal.pone.0068954] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/18/2013] [Accepted: 06/07/2013] [Indexed: 11/19/2022] Open
Abstract
Fragment assembly is a powerful method of protein structure prediction that builds protein models from a pool of candidate fragments taken from known structures. Stochastic sampling is subsequently used to refine the models. The structures are first represented as coarse-grained models and then as all-atom models for computational efficiency. Many models have to be generated independently due to the stochastic nature of the sampling methods used to search for the global minimum in a complex energy landscape. In this paper we present EdaFold(AA), a fragment-based approach which shares information between the generated models and steers the search towards native-like regions. A distribution over fragments is estimated from a pool of low energy all-atom models. This iteratively-refined distribution is used to guide the selection of fragments during the building of models for subsequent rounds of structure prediction. The use of an estimation of distribution algorithm enabled EdaFold(AA) to reach lower energy levels and to generate a higher percentage of near-native models. [Formula: see text] uses an all-atom energy function and produces models with atomic resolution. We observed an improvement in energy-driven blind selection of models on a benchmark of EdaFold(AA) in comparison with the [Formula: see text] AbInitioRelax protocol.
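The distribution update at the heart of an estimation of distribution algorithm can be sketched as follows. This is a generic illustration under stated assumptions, not the EdaFold(AA) implementation: each model is reduced to the index of the fragment it used at every position, the "elite" set stands in for the pool of low-energy all-atom models, and the blending parameter `rate` and the function name are hypothetical.

```python
from collections import Counter


def update_fragment_distribution(probs, elite_models, rate=0.5):
    """Blend per-position fragment probabilities with elite frequencies.

    probs[pos][i] is the current probability of picking fragment i at
    position pos. elite_models is a list of low-energy models, each given
    as a list holding the chosen fragment index at every position.
    rate is a hypothetical learning rate in [0, 1].
    """
    if not elite_models:
        raise ValueError("need at least one elite model")
    n = len(elite_models)
    new_probs = []
    for pos, current in enumerate(probs):
        # How often each fragment appears at this position among the elite.
        counts = Counter(model[pos] for model in elite_models)
        new_probs.append([
            (1 - rate) * p + rate * counts.get(i, 0) / n
            for i, p in enumerate(current)
        ])
    return new_probs


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.25, 0.25, 0.5]]
    elite = [[0, 2], [0, 2], [1, 2], [0, 1]]
    print(update_fragment_distribution(probs, elite))
```

Repeating this update over successive rounds concentrates probability on fragments that recur in low-energy models, which is one way to share information between otherwise independent sampling runs.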
Affiliation(s)
- David Simoncini
- Zhang Initiative Research Unit, Institute Laboratories, RIKEN, Wako, Saitama, Japan
- Kam Y. J. Zhang
- Zhang Initiative Research Unit, Institute Laboratories, RIKEN, Wako, Saitama, Japan
16
Yuzlenko O, Lazaridis T. Membrane protein native state discrimination by implicit membrane models. J Comput Chem 2013; 34:731-8. [PMID: 23224861 PMCID: PMC3584241 DOI: 10.1002/jcc.23189] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 05/29/2012] [Revised: 10/16/2012] [Accepted: 10/28/2012] [Indexed: 02/01/2023]
Abstract
Four implicit membrane models [IMM1, generalized Born (GB)-surface area-implicit membrane (GBSAIM), GB with a simple switching (GBSW), and heterogeneous dielectric GB (HDGB)] were tested for their ability to discriminate the native conformation of five membrane proteins from 450 decoys generated by the Rosetta-Membrane program. The energy ranking of the native state and Z-scores were used to assess the performance of the models. The effect of membrane thickness was examined and was found to be substantial. Quite satisfactory discrimination was achieved with the all-atom IMM1 and GBSW models at 25.4 Å thickness and with the HDGB model at 28.5 Å thickness. The energy components by themselves were not discriminative. Both van der Waals and electrostatic interactions contributed to native state discrimination, to a different extent in each model. Computational efficiency of the models decreased in the order: extended-atom IMM1 > all-atom IMM1 > GBSAIM > GBSW > HDGB. These results encourage the further development and use of implicit membrane models for membrane protein structure prediction.
Affiliation(s)
- Olga Yuzlenko
- Department of Chemistry, City College of the City University of New York, 160 Convent Avenue, New York, New York 10031, USA
17
Dal Palú A, Spyrakis F, Cozzini P. A new approach for investigating protein flexibility based on Constraint Logic Programming. The first application in the case of the estrogen receptor. Eur J Med Chem 2012; 49:127-40. [PMID: 22277571 DOI: 10.1016/j.ejmech.2012.01.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/02/2011] [Revised: 01/05/2012] [Accepted: 01/05/2012] [Indexed: 12/01/2022]
Abstract
We describe the potential of a novel method, based on Constraint Logic Programming (CLP), developed for an exhaustive sampling of protein conformational space. The CLP framework proposed here has been tested and applied to the estrogen receptor, whose activity and function are strictly related to its intrinsic, and well known, dynamics. We have investigated in particular the flexibility of H12, focusing on the pathways followed by the helix when moving from one stable crystallographic conformation to the others. Millions of geometrically feasible conformations were generated and selected, and the traces connecting the different forms were determined by using a shortest path algorithm. The preliminary analyses showed a marked agreement between the crystallographic agonist-like, antagonist-like and hypothetical apo forms, and the corresponding conformations identified by the CLP framework. These promising results, together with the short computational time required to perform the analyses, make this constraint-based approach a valuable tool for the study of protein folding prediction. The CLP framework enables one to consider various structural and energetic scenarios without changing the core algorithm. To show the feasibility of the method, we intentionally chose a pure geometric setting, neglecting the energetic evaluation of the poses, in order to be independent of a specific force field and to provide the possibility of comparing different behaviours associated with various energy models.
18
Handl J, Knowles J, Vernon R, Baker D, Lovell SC. The dual role of fragments in fragment-assembly methods for de novo protein structure prediction. Proteins 2011; 80:490-504. [PMID: 22095594 DOI: 10.1002/prot.23215] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 07/05/2011] [Revised: 08/17/2011] [Accepted: 09/14/2011] [Indexed: 11/07/2022]
Abstract
In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta.
Affiliation(s)
- Julia Handl
- Manchester Business School, The University of Manchester, United Kingdom.
19
Wang Z, Xu J. A conditional random fields method for RNA sequence-structure relationship modeling and conformation sampling. Bioinformatics 2011; 27:i102-10. [PMID: 21685058 PMCID: PMC3117333 DOI: 10.1093/bioinformatics/btr232] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/25/2023]
Abstract
Accurate tertiary structures are very important for the functional study of non-coding RNA molecules. However, predicting RNA tertiary structures is extremely challenging, because of the large conformation space to be explored and the lack of an accurate scoring function differentiating the native structure from decoys. The fragment-based conformation sampling method (e.g. FARNA) has the shortcoming that the limited size of a fragment library makes it infeasible to represent all possible conformations well. A recent dynamic Bayesian network method, BARNACLE, overcomes the issue of fragment assembly. In addition, neither of these methods makes use of sequence information in sampling conformations. Here, we present a new probabilistic graphical model, conditional random fields (CRFs), to model the RNA sequence-structure relationship, which enables us to accurately estimate the probability of an RNA conformation from sequence. Coupled with a novel tree-guided sampling scheme, our CRF model is then applied to RNA conformation sampling. Experimental results show that our CRF method can model the RNA sequence-structure relationship well and that sequence information is important for conformation sampling. Our method, named TreeFolder, generates a much higher percentage of native-like decoys than FARNA and BARNACLE, although we use the same simple energy function as BARNACLE. Contact: zywang@ttic.edu; j3xu@ttic.edu. Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhiyong Wang
- Toyota Technological Institute at Chicago, IL, USA.
|
20
|
Lee J, Lee J, Sasaki TN, Sasai M, Seok C, Lee J. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing. Proteins 2011; 79:2403-17. [DOI: 10.1002/prot.23059] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 12/17/2010] [Revised: 03/24/2011] [Accepted: 04/12/2011] [Indexed: 12/25/2022]
|
21
|
Lee J, Lee D, Park H, Coutsias EA, Seok C. Protein loop modeling by using fragment assembly and analytical loop closure. Proteins 2010; 78:3428-36. [PMID: 20872556 PMCID: PMC2976774 DOI: 10.1002/prot.22849] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 05/19/2010] [Revised: 07/16/2010] [Accepted: 07/31/2010] [Indexed: 12/27/2022]
Abstract
Protein loops are often involved in important biological functions such as molecular recognition, signal transduction, or enzymatic action. The three dimensional structures of loops can provide essential information for understanding molecular mechanisms behind protein functions. In this article, we develop a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure. The fragment assembly method reduces the conformational space drastically, and the analytical loop closure method finds the geometrically consistent loop conformations efficiently. We also derive an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops. The gradient can be used to optimize various restraints derived from experiments or databases, for example restraints for preferential interactions between specific residues or for preferred backbone angles. We demonstrate that the current loop modeling method outperforms previous methods that employ residue-based torsion angle maps or different loop closure strategies when tested on two sets of loop targets of lengths ranging from 4 to 12.
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Korea
- Dongseon Lee
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Hahnbeom Park
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Evangelos A. Coutsias
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
|
22
|
Kim SY. An off-lattice frustrated model protein with a six-stranded β-barrel structure. J Chem Phys 2010; 133:135102. [DOI: 10.1063/1.3494038] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
|
23
|
Zhang J, Wang Q, Barz B, He Z, Kosztin I, Shang Y, Xu D. MUFOLD: A new solution for protein 3D structure prediction. Proteins 2010; 78:1137-52. [PMID: 19927325 DOI: 10.1002/prot.22634] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 11/10/2022]
Abstract
There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community-wide experiment for protein structure prediction CASP8.
Affiliation(s)
- Jingfen Zhang
- Department of Computer Science, University of Missouri, Columbia, Missouri 65211, USA
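The root-mean-square deviation quoted in the abstract above can be illustrated with a minimal sketch. It assumes the two coordinate sets have the same atoms in the same order and are already superimposed; the optimal-superposition step normally applied before reporting RMSD is deliberately left out, and the function name `rmsd` is a hypothetical helper, not part of MUFOLD.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumption: the structures are already superimposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, native))
```

With coordinates in ångströms the result is also in ångströms, which is the unit used for the 4.28 Å figure.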
|
24
|
Helles G. A comparative study of the reported performance of ab initio protein structure prediction algorithms. J R Soc Interface 2008; 5:387-96. [PMID: 18077243 DOI: 10.1098/rsif.2007.1278] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022] Open
Abstract
Protein structure prediction is one of the major challenges in bioinformatics today. Throughout the past five decades, many different algorithmic approaches have been attempted, and although progress has been made the problem remains unsolvable even for many small proteins. While the general objective is to predict the three-dimensional structure from primary sequence, our current knowledge and computational power are simply insufficient to solve a problem of such high complexity. Some prediction algorithms do, however, appear to perform better than others, although it is not always obvious which ones they are and it is perhaps even less obvious why that is. In this review, the reported performance results from 18 different recently published prediction algorithms are compared. Furthermore, the general algorithmic settings most likely responsible for the difference in the reported performance are identified, and the specific settings of each of the 18 prediction algorithms are also compared. The average normalized r.m.s.d. scores reported range from 11.17 to 3.48. With a performance measure including both r.m.s.d. scores and CPU time, the currently best-performing prediction algorithm is identified to be the I-TASSER algorithm. Two of the algorithmic settings--protein representation and fragment assembly--were found to have definite positive influence on the running time and the predicted structures, respectively. There thus appears to be a clear benefit from incorporating this knowledge in the design of new prediction algorithms.
Affiliation(s)
- Glennie Helles
- University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark.
|
25
|
Dong Q, Wang X, Lin L, Wang Y. Analysis and prediction of protein local structure based on structure alphabets. Proteins 2008; 72:163-72. [DOI: 10.1002/prot.21904] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
|
26
|
Computational study for protein-protein docking using global optimization and empirical potentials. Int J Mol Sci 2008; 9:65-77. [PMID: 19325720 PMCID: PMC2635596 DOI: 10.3390/ijms9010065] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/02/2007] [Accepted: 01/15/2008] [Indexed: 11/17/2022] Open
Abstract
Protein-protein interactions are important for biochemical processes in biological systems. The 3D structure of the macromolecular complex resulting from the protein-protein association is a very useful source to understand its specific functions. This work focuses on computational study for protein-protein docking, where the individually crystallized structures of interacting proteins are treated as rigid, and the conformational space generated by the two interacting proteins is explored extensively. The energy function consists of intermolecular electrostatic potential, desolvation free energy represented by empirical contact potential, and simple repulsive energy terms. The conformational space is six dimensional, represented by translational vectors and rotational angles formed between two interacting proteins. The conformational sampling is carried out by the search algorithms such as simulated annealing (SA), conformational space annealing (CSA), and CSA combined with SA simulations (combined CSA/SA). Benchmark tests are performed on a set of 18 protein-protein complexes selected from various protein families to examine feasibility of these search methods coupled with the energy function above for protein docking study.
|
27
|
A historical perspective of template-based protein structure prediction. Methods in Molecular Biology (Clifton, N.J.) 2008; 413:3-42. [PMID: 18075160 DOI: 10.1007/978-1-59745-574-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
This chapter presents a broad and a historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques involving multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.
|
28
|
Deronne KW, Karypis G. Effective optimization algorithms for fragment-assembly based protein structure prediction. J Bioinform Comput Biol 2007; 5:335-52. [PMID: 17589965 DOI: 10.1142/s0219720007002618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/02/2006] [Revised: 11/17/2006] [Accepted: 12/11/2006] [Indexed: 11/18/2022]
Abstract
Despite recent developments in protein structure prediction, an accurate new fold prediction algorithm remains elusive. One of the challenges facing current techniques is the size and complexity of the space containing possible structures for a query sequence. Traditionally, to explore this space fragment assembly approaches to new fold prediction have used stochastic optimization techniques. Here, we examine deterministic algorithms for optimizing scoring functions in protein structure prediction. Two previously unused techniques are applied to the problem, called the Greedy algorithm and the Hill-climbing (HC) algorithm. The main difference between the two is that the latter implements a technique to overcome local minima. Experiments on a diverse set of 276 proteins show that the HC algorithms consistently outperform existing approaches based on Simulated Annealing optimization (a traditional stochastic technique) in optimizing the root mean squared deviation between native and working structures.
Affiliation(s)
- Kevin W Deronne
- Department of Computer Science & Engineering, Digital Technology Center, Army HPC Research Center, University of Minnesota, Minneapolis, MN 55455, USA.
|
29
|
Yang Y, Liu H. Genetic algorithms for protein conformation sampling and optimization in a discrete backbone dihedral angle space. J Comput Chem 2007; 27:1593-602. [PMID: 16868993 DOI: 10.1002/jcc.20463] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
We have investigated protein conformation sampling and optimization based on the genetic algorithm and discrete main chain dihedral state model. An efficient approach combining the genetic algorithm with local minimization and with a niche technique based on the sharing function is proposed. Using two different types of potential energy functions, a Go-type potential function and a knowledge-based pairwise potential energy function, and a test set containing small proteins of varying sizes and secondary structure compositions, we demonstrated the importance of local minimization and population diversity in protein conformation optimization with genetic algorithms. Some general properties of the sampled conformations such as their native-likeness and the influences of including side-chains are discussed.
Affiliation(s)
- Yuedong Yang
- Hefei National Laboratory for Physical Sciences, Key Laboratory of Structural Biology, School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
|
30
|
Dong QW, Wang XL, Lin L. Methods for optimizing the structure alphabet sequences of proteins. Comput Biol Med 2007; 37:1610-6. [PMID: 17493604 DOI: 10.1016/j.compbiomed.2007.03.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 07/17/2006] [Accepted: 03/16/2007] [Indexed: 11/24/2022]
Abstract
Protein structure prediction based on fragment assembly has made great progress in recent years. Local protein structure prediction is receiving increased attention. One essential step of local protein structure prediction methods is that the three-dimensional conformations must be compressed into a one-dimensional series of letters of a structural alphabet. The traditional method assigns each structure fragment the structure alphabet letter that has the best local structure similarity. However, such a locally optimal structure alphabet sequence is not guaranteed to produce the globally optimal structure. This study presents two efficient methods that try to find the optimal structure alphabet sequence, which can model the native structures as accurately as possible. First, a 28-letter structure alphabet is derived by clustering fragments in Cartesian space with a fragment length of seven residues. The average quantization error of the 28 letters is 0.82 Å in terms of root mean square deviation. Then, two efficient methods are presented to encode the protein structures into series of structure alphabet letters, that is, the greedy and dynamic programming algorithms. They are tested on the PDB database using the structure alphabet developed in Cartesian coordinate space (our structure alphabet) and in torsion angle space (the PB structure alphabet), respectively. The experimental results show that these two methods can find the approximately optimal structure alphabet sequences by searching a small fraction of the modeling space. The traditional local-optimization method achieves a root mean square deviation of 26.27 Å between the reconstructed structures and the native ones, while the modeling accuracy is improved to 3.28 Å by the greedy algorithm. The results are helpful for local protein structure prediction.
Affiliation(s)
- Qi-wen Dong
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
|
31
|
McAllister SR, Mickus BE, Klepeis JL, Floudas CA. Novel approach for alpha-helical topology prediction in globular proteins: generation of interhelical restraints. Proteins 2007; 65:930-52. [PMID: 17029234 DOI: 10.1002/prot.21095] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
The protein folding problem represents one of the most challenging problems in computational biology. Distance constraints and topology predictions can be highly useful for the folding problem in reducing the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy. We present a novel optimization framework for predicting topological contacts and generating interhelical distance restraints between hydrophobic residues in alpha-helical globular proteins. It should be emphasized that since the model does not make assumptions about the form of the helices, it is applicable to all alpha-helical proteins, including helices with kinks and irregular helices. This model aims at enhancing the ASTRO-FOLD protein folding approach of Klepeis and Floudas (Journal of Computational Chemistry 2003;24:191-208), which finds the structure of global minimum conformational energy via a constrained nonlinear optimization problem. The proposed topology prediction model was evaluated on 26 alpha-helical proteins ranging from 2 to 8 helices and 35 to 159 residues, and the best identified average interhelical distances corresponding to the predicted contacts fell below 11 Å in all 26 of these systems. Given the positive results of applying the model to several protein systems, the importance of interhelical hydrophobic-to-hydrophobic contacts in determining the folding of alpha-helical globular proteins is highlighted.
Affiliation(s)
- S R McAllister
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
32
|
Kim SY, Lee W, Lee J. Protein folding using fragment assembly and physical energy function. J Chem Phys 2007; 125:194908. [PMID: 17129168 DOI: 10.1063/1.2364500] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
We perform a systematic study of the effects of sequence-independent backbone interactions and sequence-dependent side-chain interactions on protein folding using fragment assembly and physical energy function. Structures for ten proteins belonging to various structural classes are predicted only with Lennard-Jones interaction between backbone atoms. We find nativelike structures for beta proteins, suggesting that for proteins in this class, the global tertiary structures can be determined mainly by sequence-independent backbone interactions. On the other hand, for alpha proteins, nonlocal hydrophobic side-chain interaction is also required to obtain nativelike structures.
Affiliation(s)
- Seung-Yeon Kim
- School of General Education, ChungJu National University, Chungju 380-702, Korea
|
33
|
Zhang N, Ruan J, Wu J, Zhang T. SHEETSPAIR: A Database of Amino Acid Pairs in Protein Sheet Structures. Data Science Journal 2007. [DOI: 10.2481/dsj.6.s589] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022] Open
|
34
|
Abstract
Many of the recent secondary structure prediction methods incorporate the idea of fuzzy set theory, where instead of assigning a definite secondary structure to a query residue, probability for the residue being in each of the conformational states is estimated. Moreover, continuous assignment of conformational states to the experimentally observed protein structures can be performed in order to reflect inherent flexibility. Although various measures have been developed for evaluating performances of secondary structure prediction methods, they depend only on the most probable secondary structures. They do not assess the accuracy of the probabilities produced by fuzzy prediction methods, and they cannot incorporate information contained in continuous assignments of conformational states to observed structures. Three important measures for evaluating performance of a secondary structure prediction algorithm, Q score, Segment OVerlap (SOV) measure, and the k-state correlation coefficient (Corr), are deformed into fuzzy measures F score, Fuzzy OVerlap (FOV) measure, and the fuzzy correlation coefficient (Forr), so that the new measures not only assess probabilistic outputs of fuzzy prediction methods, but also incorporate information from continuous assignments of secondary structure. As an example of application, prediction results of four fuzzy secondary structure prediction methods, PSIPRED, PROFking, SABLE, and PREDICT, are assessed using the new fuzzy measures.
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Bioinformatics and Molecular Design Technology Innovation Center and Computer Aided Molecular Design Research Center, Soongsil University, Seoul 156-743, Korea.
|
35
|
Huang W, Chen M, Lü Z. Energy optimization for off-lattice protein folding. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2006; 74:041907. [PMID: 17155096 DOI: 10.1103/physreve.74.041907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2006] [Indexed: 05/12/2023]
Abstract
Two three-dimensional AB off-lattice protein models consisting of hydrophobic and hydrophilic monomers are studied in this paper. By incorporating an extra energy contribution into the original energy function, the protein folding problem is converted from a constrained optimization problem into an unconstrained one, which can be solved by the well-known gradient method. From the initial configurations randomly generated by the heuristic strategy proposed in this paper, our algorithm can find better results than those obtained by nPERM for the four Fibonacci sequences. Based on the initial configurations obtained by the energy landscape paving (ELP) routine, some of our results for the lowest energies are better than the best values reported in the literature.
Affiliation(s)
- Wenqi Huang
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
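The general idea in the abstract above, turning a constrained problem into an unconstrained one by adding an extra energy term and then applying a gradient method, can be sketched on a toy problem. This is only an illustration under stated assumptions: the objective `x + y`, the unit-circle constraint, the quadratic penalty form, and the values of `mu`, `step` and `iterations` are all hypothetical choices and are not the AB-model energy or the parameters used by the authors.

```python
def penalized_gradient(x, y, mu):
    """Gradient of F(x, y) = (x + y) + mu * g(x, y)**2, with g = x^2 + y^2 - 1.

    The term mu * g**2 is the extra "energy" that replaces the hard
    constraint g(x, y) = 0 (a toy stand-in for a geometric constraint).
    """
    g = x * x + y * y - 1.0
    return 1.0 + 4.0 * mu * g * x, 1.0 + 4.0 * mu * g * y


def minimize_with_penalty(x0, y0, mu=10.0, step=0.01, iterations=5000):
    """Plain fixed-step gradient descent on the penalized (unconstrained) function."""
    x, y = x0, y0
    for _ in range(iterations):
        grad_x, grad_y = penalized_gradient(x, y, mu)
        x -= step * grad_x
        y -= step * grad_y
    return x, y


x, y = minimize_with_penalty(0.5, 0.0)
print(x, y, x * x + y * y - 1.0)  # final point and remaining constraint violation
```

The constrained minimum of this toy problem lies on the unit circle at x = y = -1/√2. With a finite penalty weight the unconstrained minimum sits slightly off the circle, and a larger `mu` reduces that violation at the cost of requiring a smaller step size.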
|
36
|
Tuffery P, Derreumaux P. Dependency between consecutive local conformations helps assemble protein structures from secondary structures using Go potential and greedy algorithm. Proteins 2006; 61:732-40. [PMID: 16231300 DOI: 10.1002/prot.20698] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
Discretization of protein conformational space and fragment assembly methods simplify the search of native structures. These methods, mostly of Monte Carlo and genetic-type, do not exploit, however, the fact that short fragments describing consecutive parts of proteins are conformation-dependent. Yet, this information should be useful in improving ab initio and comparative protein structure modeling. In a preliminary study, we have assessed the possibility of using greedy algorithms for protein structure reconstruction based on the assembly of fragments of four-residue length. Greedy algorithms differ from Monte Carlo and genetic approaches in that they grow a polypeptide chain one fragment after another. Here, we move one step further in complexity, and provide strong evidence that the dependence between consecutive local conformations during assembly makes possible the reconstruction of protein structures from their secondary structures using a Go potential. Overall our procedure can reproduce 20 protein structures of 50-164 amino acids within 2.7 to 6.5 Å RMSd and is able to identify native topologies for all proteins, although some targets are stabilized by very long-range interactions.
Affiliation(s)
- Pierre Tuffery
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Paris, France.
|
37
|
Wainreb G, Haspel N, Wolfson HJ, Nussinov R. A permissive secondary structure-guided superposition tool for clustering of protein fragments toward protein structure prediction via fragment assembly. Bioinformatics 2006; 22:1343-52. [PMID: 16543273 DOI: 10.1093/bioinformatics/btl098] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Abstract
MOTIVATION: Secondary-Structure Guided Superposition tool (SSGS) is a permissive secondary structure-based algorithm for matching of protein structures and in particular their fragments. The algorithm was developed towards protein structure prediction via fragment assembly. RESULTS: In a fragment-based structural prediction scheme, a protein sequence is cut into building blocks (BBs). The BBs are assembled to predict their relative 3D arrangement. Finally, the assemblies are refined. To implement this prediction scheme, a clustered structural library representing sequence patterns for protein fragments is essential. To create a library, BBs generated by cutting proteins from the PDB are compared and structurally similar BBs are clustered. To allow structural comparison and clustering of the BBs, which are often relatively short with flexible loops, we have devised SSGS. SSGS maintains high similarity between cluster members and is highly efficient. When it comes to comparing BBs for clustering purposes, the algorithm obtains better results than other, non-secondary structure guided protein superimposition algorithms.
Affiliation(s)
- Gilad Wainreb
- Sackler Institute of Molecular Medicine, Department of Human Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
|
38
|
Arunachalam J, Kanagasabai V, Gautham N. Protein structure prediction using mutually orthogonal Latin squares and a genetic algorithm. Biochem Biophys Res Commun 2006; 342:424-33. [PMID: 16487483 DOI: 10.1016/j.bbrc.2006.01.162] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/23/2006] [Accepted: 01/31/2006] [Indexed: 11/29/2022]
Abstract
We combine a new, extremely fast technique to generate a library of low energy structures of an oligopeptide (by using mutually orthogonal Latin squares to sample its conformational space) with a genetic algorithm to predict protein structures. The protein sequence is divided into oligopeptides, and a structure library is generated for each. These libraries are used in a newly defined mutation operator that, together with variation, crossover, and diversity operators, is used in a modified genetic algorithm to make the prediction. Application to five small proteins has yielded near native structures.
Affiliation(s)
- J Arunachalam
- Department of Crystallography and Biophysics, University of Madras, Chennai 600025, India
|
39
|
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
|
40
|
Dayalan S, Gooneratne ND, Bevinakoppa S, Schroder H. Dihedral angle and secondary structure database of short amino acid fragments. Bioinformation 2006; 1:78-80. [PMID: 17597859 PMCID: PMC1891663 DOI: 10.6026/97320630001078] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/25/2005] [Revised: 12/18/2005] [Accepted: 12/23/2005] [Indexed: 11/23/2022] Open
Abstract
Dihedral angles of amino acids are of considerable importance in protein tertiary structure prediction as they define the backbone of a protein and hence almost define the protein's entire conformation. Most ab initio protein structure prediction methods predict the secondary structure of a protein before predicting the tertiary structure because three-dimensional fold consists of repeating units of secondary structures. Hence, both dihedral angles and secondary structures are important in tertiary structure prediction of proteins. Here we describe a database called DASSD (Dihedral Angle and Secondary Structure Database of Short Amino acid Fragments) that contains dihedral angle values and secondary structure details of short amino acid fragments of lengths 1, 3 and 5. Information stored in this database was extracted from a set of 5,227 non-redundant high resolution (less than 2 angstroms) protein structures. In total, DASSD stores details for about 733,000 fragments. This database finds application in the development of ab initio protein structure prediction methods using fragment libraries and fragment assembly techniques. It is also useful in protein secondary structure prediction. AVAILABILITY: DASSD can be accessed and downloaded from http://www.cs.rmit.edu.au/dassd/
Affiliation(s)
- Saravanan Dayalan
- School of Computer Science and Information Technology, RMIT University, GPO Box 2474V, Melbourne 3001, Australia.
|
41
|
Fuzzy k-Nearest Neighbor Method for Protein Secondary Structure Prediction and Its Parallel Implementation. Computational Intelligence and Bioinformatics 2006. [DOI: 10.1007/11816102_48] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/02/2022]
|
42
|
Abstract
The field of protein-structure prediction has been revolutionized by the application of "mix-and-match" methods both in template-based homology modeling and in template-free de novo folding. Consensus analysis and recombination of fragments copied from known protein structures is currently the only approach that allows the building of models that are closer to the native structure of the target protein than the structure of its closest homologue. It is also the most successful approach in cases in which the target protein exhibits a novel three-dimensional fold. This review summarizes the recent developments in both template-based and template-free protein structure modeling and compares the available methods for protein-structure prediction by recombination of fragments. A convergence between the "protein folding" and "protein evolution" schools of thought is postulated.
Affiliation(s)
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
|
43
|
Gong H, Fleming PJ, Rose GD. Building native protein conformation from highly approximate backbone torsion angles. Proc Natl Acad Sci U S A 2005; 102:16227-32. [PMID: 16251268 PMCID: PMC1283474 DOI: 10.1073/pnas.0508415102] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Received: 08/02/2005] [Indexed: 11/18/2022] Open
Abstract
Reconstructing a protein in three dimensions from its backbone torsion angles is an ongoing challenge because minor inaccuracies in these angles produce major errors in the structure. As a familiar example, a small change in an elbow angle causes a large displacement at the end of your arm: the longer the arm, the larger the displacement. Even accurate knowledge of the backbone torsions φ and ψ is insufficient, owing to the small, but cumulative, deviations from ideality in backbone planarity, which, if ignored, also lead to major errors in the structure. Against this background, we conducted a computational experiment to assess whether protein conformation can be determined from highly approximate backbone torsion angles, the kind of information that is now obtained readily from NMR. Specifically, backbone torsion angles were taken from proteins of known structure and mapped into 60° × 60° grid squares, called mesostates. Side-chain atoms beyond the β-carbon were discarded. A mesostate representation of the protein backbone was then used to extract likely candidates from a fragment library of mesostate pentamers, followed by Monte Carlo-based fragment-assembly simulations to identify stable conformations compatible with the given mesostate sequence. Only three simple energy terms were used to gauge stability: molecular compaction, soft-sphere repulsion, and hydrogen bonding. For the six representative proteins described here, stable conformers can be partitioned into a remarkably small number of topologically distinct clusters. Among these, the native topology is found with high frequency and can be identified as the cluster with the most favorable energy.
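The mapping of torsion angles into 60° × 60° grid squares described in this abstract can be illustrated with a short sketch. This is not the authors' code: the function names and the integer (row, column) labels are hypothetical, and the grid origin at -180° is an assumption, since the abstract does not say how mesostates are labelled or where the grid starts.

```python
GRID_DEG = 60.0  # grid width taken from the abstract


def mesostate(phi, psi, grid=GRID_DEG):
    """Return a hypothetical (i, j) grid index for torsions given in degrees.

    Angles are wrapped into [-180, 180) before binning, so 180 and -180
    fall into the same square. The origin at -180 is an assumption.
    """
    def bin_index(angle):
        wrapped = (angle + 180.0) % 360.0  # now in [0, 360)
        return int(wrapped // grid)

    return bin_index(phi), bin_index(psi)


def mesostate_center(cell, grid=GRID_DEG):
    """Return the (phi, psi) centre of a grid square, in degrees."""
    i, j = cell
    return (-180.0 + (i + 0.5) * grid, -180.0 + (j + 0.5) * grid)


# A helix-like pair of torsions and the coarse value that replaces it.
cell = mesostate(-60.0, -45.0)
approx = mesostate_center(cell)
```

Replacing each residue's torsions by the centre of its square discards up to 30° per angle, which is the sense in which the input angles are "highly approximate".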
Affiliation(s)
- Haipeng Gong
- T. C. Jenkins Department of Biophysics, The Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
|
44
|
Kim SY, Lee SB, Lee J. Structure optimization by conformational space annealing in an off-lattice protein model. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 72:011916. [PMID: 16090010 DOI: 10.1103/physreve.72.011916] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 02/17/2005] [Indexed: 05/03/2023]
Abstract
The optimization results by conformational space annealing are presented for an off-lattice protein model consisting of hydrophobic and hydrophilic residues in Fibonacci sequences. The ground-state energies found are lower than those reported in the literature. In addition, the ground-state conformations in three dimensions exhibit the important aspect of forming a single hydrophobic core in real proteins. The energy landscape for the population of local minima is also investigated.
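For readers unfamiliar with Fibonacci sequences of two residue types, the sketch below generates them. It assumes the recursion commonly used for two-letter off-lattice models, S_0 = A, S_1 = B, S_(i+1) = S_(i-1) followed by S_i, with A taken as the hydrophobic and B as the hydrophilic residue. The abstract does not state this convention, so treat both the recursion and the letter assignment as assumptions; the function name is hypothetical.

```python
def fibonacci_sequence(n):
    """Return S_n for the assumed recursion S_0='A', S_1='B', S_(i+1)=S_(i-1)+S_i."""
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, curr = "A", "B"  # S_0 and S_1
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Concatenate the two previous sequences, older one first.
        prev, curr = curr, prev + curr
    return curr


# Chain lengths follow the Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, ...
lengths = [len(fibonacci_sequence(i)) for i in range(8)]
```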
Affiliation(s)
- Seung-Yeon Kim
- School of Computational Sciences, Korea Institute for Advanced Study, Dongdaemun-gu, Seoul
|
45
|
Lee K, Sim J, Lee J. Study of protein-protein interaction using conformational space annealing. Proteins 2005; 60:257-62. [PMID: 15981254 DOI: 10.1002/prot.20567] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
We apply conformational space annealing (CSA), an efficient global optimization method, to the study of protein-protein interaction. The CSA is incorporated into the Tinker molecular modeling package along with a B-spline method for CAPRI Round 5 experiments. We have used an energy function for the protein-protein interaction that consists of electrostatic interaction, van der Waals interaction, and solvation energy terms represented by the occupancy desolvation method. The parameters of the AMBER94 all-atom empirical force field are used. Each energy term is calculated by precalculated grid potentials and B-spline method approximation. The ligand protein is placed inside a sphere of 50 Å radius centered at an appropriate location, and the CSA rigid docking studies are carried out to find stable complexes. Up to 10 complexes are selected using the K-means clustering method and biological information when available. These complexes are energy-minimized for further refinement by considering the flexibility of interacting proteins. The results show that the CSA method has a potential for the study of protein-protein interaction.
Affiliation(s)
- Kyoungrim Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Dongdaemun-gu, Seoul, Korea
|
46
|
Lee J, Kim SY, Lee J. Protein structure prediction based on fragment assembly and parameter optimization. Biophys Chem 2005; 115:209-14. [PMID: 15752606 DOI: 10.1016/j.bpc.2004.12.046] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 06/11/2004] [Revised: 11/09/2004] [Accepted: 12/10/2004] [Indexed: 11/28/2022]
Abstract
We propose a novel method for ab initio prediction of protein tertiary structures based on fragment assembly and global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method, which enables one to sample diverse low-lying local minima of the energy. Then, to enhance the performance of the prediction method, we optimize the linear parameters of the energy function so that the native-like conformations become energetically more favorable than the non-native ones for proteins with known structures. We test the feasibility of the parameter optimization procedure by applying it to a training set consisting of three proteins: the 10-55 residue fragment of staphylococcal protein A (PDB ID 1bdd), a designed protein betanova, and 1fsd.
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Computer Aided Molecular Design Research Center, Bioinformatics and Molecular Design Technology Innovation Center, Soongsil University, Seoul 156-743, South Korea.
|
47
|
Floudas CA. Research challenges, opportunities and synergism in systems engineering and computational biology. AIChE J 2005. [DOI: 10.1002/aic.10620] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
|
48
|
Lee K, Czaplewski C, Kim SY, Lee J. An efficient molecular docking using conformational space annealing. J Comput Chem 2004; 26:78-87. [PMID: 15538770 DOI: 10.1002/jcc.20147] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Abstract
Molecular docking falls into the general category of global optimization problems because its main purpose is to find the most stable complex consisting of a receptor and its ligand. Conformational space annealing (CSA), a powerful global optimization method, is incorporated into the Tinker molecular modeling package to perform molecular docking simulations of six receptor-ligand complexes (3PTB, 1ULB, 2CPP, 1STP, 3CPA, and 1PPH) from the Protein Data Bank. In parallel, the Monte Carlo with minimization (MCM) method is also incorporated into the Tinker package for comparison. The energy function, consisting of electrostatic interactions, van der Waals interactions, and torsional energy terms, is calculated using the AMBER94 all-atom empirical force field. Rigid docking simulations for all six complexes and flexible docking simulations for three complexes (1STP, 3CPA, and 1PPH) are carried out using the CSA and the MCM methods. The simulation results show that the docking procedures using the CSA method generally find the most stable complexes as well as the native-like complexes more efficiently and accurately than those using the MCM, demonstrating that CSA is a promising search method for molecular docking problems.
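Several abstracts in this list rely on conformational space annealing without describing it. The sketch below is a toy illustration of the general idea as it is usually summarized: keep a bank of locally minimized solutions, create trials by mixing bank members, and replace either the nearest member (when the trial lies within a distance cutoff) or the worst member (otherwise), while shrinking the cutoff. It is not the Tinker-based implementation from these papers. The pattern-search minimizer, the coordinate-mixing step, the cooling factor of 0.9, and all function names are assumptions chosen for brevity.

```python
import random


def local_minimize(f, x, step=1.0, tol=1e-6):
    """Simple pattern search (assumed stand-in for a real energy minimizer)."""
    x, fx = list(x), f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5  # refine the search once no move helps
    return x, fx


def distance(a, b):
    """Euclidean distance between two coordinate lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def csa(f, dim, bank_size=8, rounds=40, lo=-5.0, hi=5.0, seed=0):
    """Toy CSA loop; returns (coordinates, energy) of the best bank member."""
    rng = random.Random(seed)
    bank = [local_minimize(f, [rng.uniform(lo, hi) for _ in range(dim)])
            for _ in range(bank_size)]
    pairs = [(i, j) for i in range(bank_size) for j in range(i)]
    # Start the cutoff at half the average pairwise distance in the bank.
    d_cut = 0.5 * sum(distance(bank[i][0], bank[j][0]) for i, j in pairs) / len(pairs)
    for _ in range(rounds):
        a, b = rng.sample(bank, 2)
        child = [rng.choice(pair) for pair in zip(a[0], b[0])]  # mix two members
        trial = local_minimize(f, child)
        nearest = min(range(bank_size), key=lambda i: distance(bank[i][0], trial[0]))
        if distance(bank[nearest][0], trial[0]) < d_cut:
            target = nearest  # trial competes with its similar member
        else:
            target = max(range(bank_size), key=lambda i: bank[i][1])  # worst member
        if trial[1] < bank[target][1]:
            bank[target] = trial
        d_cut *= 0.9  # anneal the cutoff (factor is an assumption)
    return min(bank, key=lambda member: member[1])


best_x, best_energy = csa(lambda v: sum(t * t for t in v), dim=2)
```

A large cutoff early on keeps the bank diverse, and a small cutoff later lets the search concentrate on low-energy regions. In a docking setting the coordinates would be ligand pose variables and `f` a force-field energy, neither of which is modelled here.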
Affiliation(s)
- Kyoungrim Lee
- School of Computational Sciences, Korea Institute for Advanced Study, 207-43 Cheongnyangni 2-dong, Dongdaemun-gu, Seoul, South Korea
|