1
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
2
|
Peyravi F, Latif A, Moshtaghioun SM. Protein tertiary structure prediction using hidden Markov model based on lattice. J Bioinform Comput Biol 2019; 17:1950007. [PMID: 31057069 DOI: 10.1142/s0219720019500070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The prediction of protein structure from its amino acid sequence is one of the most prominent problems in computational biology. The biological function of a protein depends on its tertiary structure which is determined by its amino acid sequence via the process of protein folding. We propose a novel fold recognition method for protein tertiary structure prediction based on a hidden Markov model and 3D coordinates of amino acid residues. The method introduces states based on the basis vectors in Bravais cubic lattices to learn the path of amino acids of the proteins of each fold. Three hidden Markov models are considered based on simple cubic, body-centered cubic (BCC) and face-centered cubic (FCC) lattices. A 10-fold cross validation was performed on a set of 42 fold SCOP dataset. The proposed composite methodology is compared to fold recognition methods which have HMM as base of their algorithms having approaches on only amino acid sequence or secondary structure. The accuracy of proposed model based on face-centered cubic lattices is quite better in comparison with SAM, 3-HMM optimized and Markov chain optimized in overall experiment. The huge data of 3D space help the model to have greater performance in comparison to methods which use only primary structures or only secondary structures.
Collapse
Affiliation(s)
- Farzad Peyravi
- * Department of Computer Engineering, Yazd University, Yazd, Iran
| | | | | |
Collapse
|
3
|
A Composite Approach to Protein Tertiary Structure Prediction: Hidden Markov Model Based on Lattice. Bull Math Biol 2018; 81:899-918. [DOI: 10.1007/s11538-018-00542-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2018] [Accepted: 11/28/2018] [Indexed: 11/25/2022]
|
4
|
Rodriguez PM, Stratmann D, Duprat E, Papandreou N, Acuna R, Lacroix Z, Chomilier J. Correlating topology and thermodynamics to predict protein structure sensitivity to point mutations. BIO-ALGORITHMS AND MED-SYSTEMS 2018. [DOI: 10.1515/bams-2018-0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
AbstractThe relation between distribution of hydrophobic amino acids along with protein chains and their structure is far from being completely understood. No reliable method allowsab initioprediction of the folded structure from this distribution of physicochemical properties, even when they are highly degenerated by considering only two classes: hydrophobic and polar. Establishment of long-range hydrophobic three dimension (3D) contacts is essential for the formation of the nucleus, a key process in the early steps of protein folding. Thus, a large number of 3D simulation studies were developed to challenge this issue. They are nowadays evaluated in a specific chapter of the molecular modeling competition, Critical Assessment of Protein Structure Prediction. We present here a simulation of the early steps of the folding process for 850 proteins, performed in a discrete 3D space, which results in peaks in the predicted distribution of intra-chain noncovalent contacts. The residues located at these peak positions tend to be buried in the core of the protein and are expected to correspond to critical positions in the sequence, important both for folding and structural (or similarly, energetic in the thermodynamic hypothesis) stability. The degree of stabilization or destabilization due to a point mutation at the critical positions involved in numerous contacts is estimated from the calculated folding free energy difference between mutated and native structures. The results show that these critical positions are not tolerant towards mutation. This simulation of the noncovalent contacts only needs a sequence as input, and this paper proposes a validation of the method by comparison with the prediction of stability by well-established programs.
Collapse
|
5
|
Dubey SP, Balaji S, Kini NG, Sathish Kumar M. A Novel Framework for Ab Initio Coarse Protein Structure Prediction. Adv Bioinformatics 2018; 2018:7607384. [PMID: 30026759 PMCID: PMC6031167 DOI: 10.1155/2018/7607384] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2018] [Revised: 04/26/2018] [Accepted: 05/27/2018] [Indexed: 02/07/2023] Open
Abstract
Hydrophobic-Polar model is a simplified representation of Protein Structure Prediction (PSP) problem. However, even with the HP model, the PSP problem remains NP-complete. This work proposes a systematic and problem specific design for operators of the evolutionary program which hybrids with local search hill climbing, to efficiently explore the search space of PSP and thereby obtain an optimum conformation. The proposed algorithm achieves this by incorporating the following novel features: (i) new initialization method which generates only valid individuals with (rather than random) better fitness values; (ii) use of probability-based selection operators that limit the local convergence; (iii) use of secondary structure based mutation operator that makes the structure more closely to the laboratory determined structure; and (iv) incorporating all the above-mentioned features developed a complete two-tier framework. The developed framework builds the protein conformation on the square and triangular lattice. The test has been performed using benchmark sequences, and a comparative evaluation is done with various state-of-the-art algorithms. Moreover, in addition to hypothetical test sequences, we have tested protein sequences deposited in protein database repository. It has been observed that the proposed framework has shown superior performance regarding accuracy (fitness value) and speed (number of generations needed to attain the final conformation). The concepts used to enhance the performance are generic and can be used with any other population-based search algorithm such as genetic algorithm, ant colony optimization, and immune algorithm.
Collapse
Affiliation(s)
- Sandhya Parasnath Dubey
- Department of Computer Science & Eng., Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
| | - S. Balaji
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
| | - N. Gopalakrishna Kini
- Department of Computer Science & Eng., Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
| | - M. Sathish Kumar
- Department of ECE, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
| |
Collapse
|
6
|
Induction of broadly neutralizing antibodies in Germinal Centre simulations. Curr Opin Biotechnol 2018; 51:137-145. [DOI: 10.1016/j.copbio.2018.01.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2017] [Accepted: 01/04/2018] [Indexed: 11/16/2022]
|
7
|
Ramyachitra D, Veeralakshmi V. Bacterial Foraging Optimization for protein structure prediction using FCC & HP energy model. GENE REPORTS 2017. [DOI: 10.1016/j.genrep.2017.01.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
|
8
|
Hao XH, Zhang GJ, Zhou XG, Yu XF. A Novel Method Using Abstract Convex Underestimation in Ab-Initio Protein Structure Prediction for Guiding Search in Conformational Feature Space. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:887-900. [PMID: 26552093 DOI: 10.1109/tcbb.2015.2497226] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the high-dimensionality original conformational space was converted into feature space whose dimension is considerably reduced by feature extraction technique. And, the underestimate space could be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensionality conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with It Fix, HEA, Rosetta and the developed method LEDE without underestimate information. Test results show that the ACUE method can more rapidly and more efficiently obtain the near-native protein structure.
Collapse
|
9
|
Maximova T, Moffatt R, Ma B, Nussinov R, Shehu A. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comput Biol 2016; 12:e1004619. [PMID: 27124275 PMCID: PMC4849799 DOI: 10.1371/journal.pcbi.1004619] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Collapse
Affiliation(s)
- Tatiana Maximova
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Ryan Moffatt
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Buyong Ma
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
| | - Ruth Nussinov
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Department of Biongineering, George Mason University, Fairfax, Virginia, United States of America
- School of Systems Biology, George Mason University, Manassas, Virginia, United States of America
| |
Collapse
|
10
|
Rashid MA, Iqbal S, Khatib F, Hoque MT, Sattar A. Guided macro-mutation in a graded energy based genetic algorithm for protein structure prediction. Comput Biol Chem 2016; 61:162-77. [PMID: 26878130 DOI: 10.1016/j.compbiolchem.2016.01.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2015] [Revised: 11/29/2015] [Accepted: 01/21/2016] [Indexed: 10/22/2022]
Abstract
Protein structure prediction is considered as one of the most challenging and computationally intractable combinatorial problem. Thus, the efficient modeling of convoluted search space, the clever use of energy functions, and more importantly, the use of effective sampling algorithms become crucial to address this problem. For protein structure modeling, an off-lattice model provides limited scopes to exercise and evaluate the algorithmic developments due to its astronomically large set of data-points. In contrast, an on-lattice model widens the scopes and permits studying the relatively larger proteins because of its finite set of data-points. In this work, we took the full advantage of an on-lattice model by using a face-centered-cube lattice that has the highest packing density with the maximum degree of freedom. We proposed a graded energy-strategically mixes the Miyazawa-Jernigan (MJ) energy with the hydrophobic-polar (HP) energy-based genetic algorithm (GA) for conformational search. In our application, we introduced a 2 × 2 HP energy guided macro-mutation operator within the GA to explore the best possible local changes exhaustively. Conversely, the 20 × 20 MJ energy model-the ultimate objective function of our GA that needs to be minimized-considers the impacts amongst the 20 different amino acids and allow searching the globally acceptable conformations. On a set of benchmark proteins, our proposed approach outperformed state-of-the-art approaches in terms of the free energy levels and the root-mean-square deviations.
Collapse
Affiliation(s)
- Mahmood A Rashid
- SCIMS, University of the South Pacific, Laucala Bay, Suva, Fiji; IIIS, Griffith University, Brisbane, QLD, Australia.
| | | | - Firas Khatib
- CIS, University of Massachusetts Dartmouth, MA, USA.
| | | | - Abdul Sattar
- IIIS, Griffith University, Brisbane, QLD, Australia.
| |
Collapse
|
11
|
A Multi-Objective Approach for Protein Structure Prediction Based on an Energy Model and Backbone Angle Preferences. Int J Mol Sci 2015; 16:15136-49. [PMID: 26151847 PMCID: PMC4519891 DOI: 10.3390/ijms160715136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2015] [Revised: 06/25/2015] [Accepted: 06/25/2015] [Indexed: 11/17/2022] Open
Abstract
Protein structure prediction (PSP) is concerned with the prediction of protein tertiary structure from primary structure and is a challenging calculation problem. After decades of research effort, numerous solutions have been proposed for optimisation methods based on energy models. However, further investigation and improvement is still needed to increase the accuracy and similarity of structures. This study presents a novel backbone angle preference factor, which is one of the factors inducing protein folding. The proposed multiobjective optimisation approach simultaneously considers energy models and backbone angle preferences to solve the ab initio PSP. To prove the effectiveness of the multiobjective optimisation approach based on the energy models and backbone angle preferences, 75 amino acid sequences with lengths ranging from 22 to 88 amino acids were selected from the CB513 data set to be the benchmarks. The data sets were highly dissimilar, therefore indicating that they are meaningful. The experimental results showed that the root-mean-square deviation (RMSD) of the multiobjective optimization approach based on energy model and backbone angle preferences was superior to those of typical energy models, indicating that the proposed approach can facilitate the ab initio PSP.
Collapse
|
12
|
Shehu A. A Review of Evolutionary Algorithms for Computing Functional Conformations of Protein Molecules. METHODS IN PHARMACOLOGY AND TOXICOLOGY 2015. [DOI: 10.1007/7653_2015_47] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
13
|
Santos J, Villot P, Diéguez M. Emergent protein folding modeled with evolved neural cellular automata using the 3D HP model. J Comput Biol 2014; 21:823-45. [PMID: 25343217 DOI: 10.1089/cmb.2014.0077] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
We used cellular automata (CA) for the modeling of the temporal folding of proteins. Unlike the focus of the vast research already done on the direct prediction of the final folded conformations, we will model the temporal and dynamic folding process. To reduce the complexity of the interactions and the nature of the amino acid elements, lattice models like HP were used, a model that categorizes the amino acids regarding their hydrophobicity. Taking into account the restrictions of the lattice model, the CA model defines how the amino acids interact through time to obtain a folded conformation. We extended the classical CA models using artificial neural networks for their implementation (neural CA), and we used evolutionary computing to automatically obtain the models by means of Differential Evolution. As the iterative folding also provides the final folded conformation, we can compare the results with those from direct prediction methods of the final protein conformation. Finally, as the neural CA that provides the iterative folding process can be evolved using several protein sequences and used as operators in the folding of another protein with different length, this represents an advantage over the NP-hard complexity of the original problem of the direct prediction.
Collapse
Affiliation(s)
- José Santos
- Department of Computer Science, University of A Coruña , A Coruña, Spain
| | | | | |
Collapse
|
14
|
Liu J, Song B, Yao Y, Xue Y, Liu W, Liu Z. Wang-Landau sampling in face-centered-cubic hydrophobic-hydrophilic lattice model proteins. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2014; 90:042715. [PMID: 25375531 DOI: 10.1103/physreve.90.042715] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/08/2014] [Indexed: 06/04/2023]
Abstract
Finding the global minimum-energy structure is one of the main problems of protein structure prediction. The face-centered-cubic (fcc) hydrophobic-hydrophilic (HP) lattice model can reach high approximation ratios of real protein structures, so the fcc lattice model is a good choice to predict the protein structures. The lacking of an effective global optimization method is the key obstacle in solving this problem. The Wang-Landau sampling method is especially useful for complex systems with a rough energy landscape and has been successfully applied to solving many optimization problems. We apply the improved Wang-Landau (IWL) sampling method, which incorporates the generation of an initial conformation based on the greedy strategy and the neighborhood strategy based on pull moves into the Wang-Landau sampling method to predict the protein structures on the fcc HP lattice model. Unlike conventional Monte Carlo simulations that generate a probability distribution at a given temperature, the Wang-Landau sampling method can estimate the density of states accurately via a random walk, which produces a flat histogram in energy space. We test 12 general benchmark instances on both two-dimensional and three-dimensional (3D) fcc HP lattice models. The lowest energies by the IWL sampling method are as good as or better than those of other methods in the literature for all instances. We then test five sets of larger-scale instances, denoted by the S, R, F90, F180, and CASP target instances on the 3D fcc HP lattice model. The numerical results show that our algorithm performs better than the other five methods in the literature on both the lowest energies and the average lowest energies in all runs. The IWL sampling method turns out to be a powerful tool to study the structure prediction of the fcc HP lattice model proteins.
Collapse
Affiliation(s)
- Jingfa Liu
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, 210044, China and School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China and Network Information Center, Nanjing University of Information Science & Technology, Nanjing 210044, China
| | - Beibei Song
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, 210044, China and School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China
| | - Yonglei Yao
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, 210044, China and School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China
| | - Yu Xue
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, 210044, China and School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China
| | - Wenjie Liu
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, 210044, China and School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China
| | - Zhaoxia Liu
- Network Information Center, Nanjing University of Information Science & Technology, Nanjing 210044, China
| |
Collapse
|
15
|
How good are simplified models for protein structure prediction? Adv Bioinformatics 2014; 2014:867179. [PMID: 24876837 PMCID: PMC4022063 DOI: 10.1155/2014/867179] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2013] [Revised: 01/22/2014] [Accepted: 01/23/2014] [Indexed: 11/18/2022] Open
Abstract
Protein structure prediction (PSP) has been one of the most challenging problems in computational biology for several decades. The challenge is largely due to the complexity of the all-atomic details and the unknown nature of the energy function. Researchers have therefore used simplified energy models that consider interaction potentials only between the amino acid monomers in contact on discrete lattices. The restricted nature of the lattices and the energy models poses a twofold concern regarding the assessment of the models. Can a native or a very close structure be obtained when structures are mapped to lattices? Can the contact based energy models on discrete lattices guide the search towards the native structures? In this paper, we use the protein chain lattice fitting (PCLF) problem to address the first concern; we developed a constraint-based local search algorithm for the PCLF problem for cubic and face-centered cubic lattices and found very close lattice fits for the native structures. For the second concern, we use a number of techniques to sample the conformation space and find correlations between energy functions and root mean square deviation (RMSD) distance of the lattice-based structures with the native structures. Our analysis reveals weakness of several contact based energy models used that are popular in PSP.
Collapse
|
16
|
A Parallel Framework for Multipoint Spiral Search in ab Initio Protein Structure Prediction. Adv Bioinformatics 2014; 2014:985968. [PMID: 24744779 PMCID: PMC3976798 DOI: 10.1155/2014/985968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2013] [Revised: 02/04/2014] [Accepted: 02/06/2014] [Indexed: 11/17/2022] Open
Abstract
Protein structure prediction is computationally a very challenging problem. A large number of existing search algorithms attempt to solve the problem by exploring possible structures and finding the one with the minimum free energy. However, these algorithms perform poorly on large sized proteins due to an astronomically wide search space. In this paper, we present a multipoint spiral search framework that uses parallel processing techniques to expedite exploration by starting from different points. In our approach, a set of random initial solutions are generated and distributed to different threads. We allow each thread to run for a predefined period of time. The improved solutions are stored threadwise. When the threads finish, the solutions are merged together and the duplicates are removed. A selected distinct set of solutions are then split to different threads again. In our ab initio protein structure prediction method, we use the three-dimensional face-centred-cubic lattice for structure-backbone mapping. We use both the low resolution hydrophobic-polar energy model and the high-resolution 20 × 20 energy model for search guiding. The experimental results show that our new parallel framework significantly improves the results obtained by the state-of-the-art single-point search approaches for both energy models on three-dimensional face-centred-cubic lattice. We also experimentally show the effectiveness of mixing energy models within parallel threads.
Collapse
|
17
|
GUYEUX CHRISTOPHE, CÔTÉ NATHALIEML, BAHI JACQUESM, BIENIA WOJCIECH. IS PROTEIN FOLDING PROBLEM REALLY A NP-COMPLETE ONE? FIRST INVESTIGATIONS. J Bioinform Comput Biol 2014; 12:1350017. [DOI: 10.1142/s0219720013500170] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
To determine the 3D conformation of proteins is a necessity to understand their functions or interactions with other molecules. It is commonly admitted that, when proteins fold from their primary linear structures to their final 3D conformations, they tend to choose the ones that minimize their free energy. To find the 3D conformation of a protein knowing its amino acid sequence, bioinformaticians use various models of different resolutions and artificial intelligence tools, as the protein folding prediction problem is a NP complete one. More precisely, to determine the backbone structure of the protein using the low resolution models (2D HP square and 3D HP cubic), by finding the conformation that minimizes free energy, is intractable exactly. Both proofs of NP-completeness and the 2D prediction consider that acceptable conformations have to satisfy a self-avoiding walk (SAW) requirement, as two different amino acids cannot occupy a same position in the lattice. It is shown in this document that the SAW requirement considered when proving NP-completeness is different from the SAW requirement used in various prediction programs, and that they are different from the real biological requirement. Indeed, the proof of NP completeness and the predictions in silico consider conformations that are not possible in practice. Consequences of this fact are investigated in this research work.
Collapse
Affiliation(s)
- CHRISTOPHE GUYEUX
- FEMTO-ST Institute, UMR 6174 CNRS, University of Franche-Comté, Besançon, France
| | - NATHALIE M.-L. CÔTÉ
- Laboratoire de Biologie du Développement, UMR 7622, Université Pierre et Marie Curie, Paris, France
| | - JACQUES M. BAHI
- FEMTO-ST Institute, UMR 6174 CNRS, University of Franche-Comté, Besançon, France
| | - WOJCIECH BIENIA
- G-SCOP Laboratory, ENSIMAG, 46 Avenue Félix Viallet, F-38031 Grenoble Cedex 1, France
| |
Collapse
|
18
|
Maher B, Albrecht AA, Loomes M, Yang XS, Steinhöfel K. A firefly-inspired method for protein structure prediction in lattice models. Biomolecules 2014; 4:56-75. [PMID: 24970205 PMCID: PMC4030990 DOI: 10.3390/biom4010056] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2013] [Revised: 12/17/2013] [Accepted: 12/27/2013] [Indexed: 02/05/2023] Open
Abstract
We introduce a Firefly-inspired algorithmic approach for protein structure prediction over two different lattice models in three-dimensional space. In particular, we consider three-dimensional cubic and three-dimensional face-centred-cubic (FCC) lattices. The underlying energy models are the Hydrophobic-Polar (H-P) model, the Miyazawa–Jernigan (M-J) model and a related matrix model. The implementation of our approach is tested on ten H-P benchmark problems of a length of 48 and ten M-J benchmark problems of a length ranging from 48 until 61. The key complexity parameter we investigate is the total number of objective function evaluations required to achieve the optimum energy values for the H-P model or competitive results in comparison to published values for the M-J model. For H-P instances and cubic lattices, where data for comparison are available, we obtain an average speed-up over eight instances of 2.1, leaving out two extreme values (otherwise, 8.8). For six M-J instances, data for comparison are available for cubic lattices and runs with a population size of 100, where, a priori, the minimum free energy is a termination criterion. The average speed-up over four instances is 1.2 (leaving out two extreme values, otherwise 1.1), which is achieved for a population size of only eight instances. The present study is a test case with initial results for ad hoc parameter settings, with the aim of justifying future research on larger instances within lattice model settings, eventually leading to the ultimate goal of implementations for off-lattice models.
Collapse
Affiliation(s)
- Brian Maher
- Department of Informatics, King's College London, Strand, London WC2R 2LS, UK.
| | - Andreas A Albrecht
- School of Science and Technology, Middlesex University, The Burroughs, London, NW4 4BT, UK.
| | - Martin Loomes
- School of Science and Technology, Middlesex University, The Burroughs, London, NW4 4BT, UK.
| | - Xin-She Yang
- School of Science and Technology, Middlesex University, The Burroughs, London, NW4 4BT, UK.
| | - Kathleen Steinhöfel
- Department of Informatics, King's College London, Strand, London WC2R 2LS, UK.
| |
Collapse
|
19
|
Liu J, Song B, Liu Z, Huang W, Sun Y, Liu W. Energy-landscape paving for prediction of face-centered-cubic hydrophobic-hydrophilic lattice model proteins. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2013; 88:052704. [PMID: 24329293 DOI: 10.1103/physreve.88.052704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/07/2013] [Indexed: 06/03/2023]
Abstract
Protein structure prediction (PSP) is a classical NP-hard problem in computational biology. The energy-landscape paving (ELP) method is a class of heuristic global optimization algorithm, and has been successfully applied to solving many optimization problems with complex energy landscapes in the continuous space. By putting forward a new update mechanism of the histogram function in ELP and incorporating the generation of initial conformation based on the greedy strategy and the neighborhood search strategy based on pull moves into ELP, an improved energy-landscape paving (ELP+) method is put forward. Twelve general benchmark instances are first tested on both two-dimensional and three-dimensional (3D) face-centered-cubic (fcc) hydrophobic-hydrophilic (HP) lattice models. The lowest energies by ELP+ are as good as or better than those of other methods in the literature for all instances. Then, five sets of larger-scale instances, denoted by S, R, F90, F180, and CASP target instances on the 3D FCC HP lattice model are tested. The proposed algorithm finds lower energies than those by the five other methods in literature. Not unexpectedly, this is particularly pronounced for the longer sequences considered. Computational results show that ELP+ is an effective method for PSP on the fcc HP lattice model.
Collapse
Affiliation(s)
- Jingfa Liu
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China and Network Information Center, Nanjing University of Information Science and Technology, Nanjing 210044, China
| | - Beibei Song
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China and School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
| | - Zhaoxia Liu
- Network Information Center, Nanjing University of Information Science and Technology, Nanjing 210044, China
| | - Weibo Huang
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China and School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
| | - Yuanyuan Sun
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China and School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
| | - Wenjie Liu
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China and School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
| |
Collapse
|
20
|
Rashid MA, Newton MAH, Hoque MT, Sattar A. Mixing energy models in genetic algorithms for on-lattice protein structure prediction. BIOMED RESEARCH INTERNATIONAL 2013; 2013:924137. [PMID: 24224180 PMCID: PMC3800614 DOI: 10.1155/2013/924137] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/30/2013] [Revised: 08/16/2013] [Accepted: 08/19/2013] [Indexed: 02/06/2023]
Abstract
Protein structure prediction (PSP) is computationally a very challenging problem. The challenge largely comes from the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high resolution 20 × 20 energy model could better capture the behaviour of the actual energy function than a low resolution energy model such as hydrophobic polar. However, the fine grained details of the high resolution interaction energy matrix are often not very informative for guiding the search. In contrast, a low resolution energy model could effectively bias the search towards certain promising directions. In this paper, we develop a genetic algorithm that mainly uses a high resolution energy model for protein structure evaluation but uses a low resolution HP energy model in focussing the search towards exploring structures that have hydrophobic cores. We experimentally show that this mixing of energy models leads to significant lower energy structures compared to the state-of-the-art results.
Collapse
Affiliation(s)
- Mahmood A. Rashid
- Institute for Integrated & Intelligent Systems, Science 2 (N34) 1.45, 170 Kessels Road, Nathan, QLD 4111, Australia
- Queensland Research Lab, National ICT Australia, Level 8, Y Block, 2 George Street, Brisbane, QLD 4000, Australia
| | - M. A. Hakim Newton
- Institute for Integrated & Intelligent Systems, Science 2 (N34) 1.45, 170 Kessels Road, Nathan, QLD 4111, Australia
| | - Md. Tamjidul Hoque
- Computer Science, 2000 Lakeshore drive, Math 308, New Orleans, LA 70148, USA
| | - Abdul Sattar
- Institute for Integrated & Intelligent Systems, Science 2 (N34) 1.45, 170 Kessels Road, Nathan, QLD 4111, Australia
- Queensland Research Lab, National ICT Australia, Level 8, Y Block, 2 George Street, Brisbane, QLD 4000, Australia
| |
Collapse
|
21
|
Bechini A. On the characterization and software implementation of general protein lattice models. PLoS One 2013; 8:e59504. [PMID: 23555684 PMCID: PMC3612044 DOI: 10.1371/journal.pone.0059504] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2012] [Accepted: 02/13/2013] [Indexed: 11/19/2022] Open
Abstract
models of proteins have been widely used as a practical means to computationally investigate general properties of the system. In lattice models any sterically feasible conformation is represented as a self-avoiding walk on a lattice, and residue types are limited in number. So far, only two- or three-dimensional lattices have been used. The inspection of the neighborhood of alpha carbons in the core of real proteins reveals that also lattices with higher coordination numbers, possibly in higher dimensional spaces, can be adopted. In this paper, a new general parametric lattice model for simplified protein conformations is proposed and investigated. It is shown how the supporting software can be consistently designed to let algorithms that operate on protein structures be implemented in a lattice-agnostic way. The necessary theoretical foundations are developed and organically presented, pinpointing the role of the concept of main directions in lattice-agnostic model handling. Subsequently, the model features across dimensions and lattice types are explored in tests performed on benchmark protein sequences, using a Python implementation. Simulations give insights on the use of square and triangular lattices in a range of dimensions. The trend of potential minimum for sequences of different lengths, varying the lattice dimension, is uncovered. Moreover, an extensive quantitative characterization of the usage of the so-called "move types" is reported for the first time. The proposed general framework for the development of lattice models is simple yet complete, and an object-oriented architecture can be proficiently employed for the supporting software, by designing ad-hoc classes. The proposed framework represents a new general viewpoint that potentially subsumes a number of solutions previously studied. The adoption of the described model pushes to look at protein structure issues from a more general and essential perspective, making computational investigations over simplified models more straightforward as well.
Collapse
Affiliation(s)
- Alessio Bechini
- Department of Information Engineering, University of Pisa, Pisa, Italy.
| |
Collapse
|
22
|
Rashid MA, Newton MAH, Hoque MT, Shatabda S, Pham DN, Sattar A. Spiral search: a hydrophobic-core directed local search for simplified PSP on 3D FCC lattice. BMC Bioinformatics 2013; 14 Suppl 2:S16. [PMID: 23368706 PMCID: PMC3549848 DOI: 10.1186/1471-2105-14-s2-s16] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Background Protein structure prediction is an important but unsolved problem in biological science. Predicted structures vary much with energy functions and structure-mapping spaces. In our simplified ab initio protein structure prediction methods, we use hydrophobic-polar (HP) energy model for structure evaluation, and 3-dimensional face-centred-cubic lattice for structure mapping. For HP energy model, developing a compact hydrophobic-core (H-core) is essential for the progress of the search. The H-core helps find a stable structure with the lowest possible free energy. Results In order to build H-cores, we present a new Spiral Search algorithm based on tabu-guided local search. Our algorithm uses a novel H-core directed guidance heuristic that squeezes the structure around a dynamic hydrophobic-core centre. We applied random walks to break premature H-cores and thus to avoid early convergence. We also used a novel relay-restart technique to handle stagnation. Conclusions We have tested our algorithms on a set of benchmark protein sequences. The experimental results show that our spiral search algorithm outperforms the state-of-the-art local search algorithms for simplified protein structure prediction. We also experimentally show the effectiveness of the relay-restart.
Collapse
Affiliation(s)
- Mahmood A Rashid
- Institute for Integrated & Intelligent Systems, Griffith University, QLD, Australia.
| | | | | | | | | | | |
Collapse
|
23
|
Shatabda S, Hakim Newton MA, Rashid MA, Pham DN, Sattar A. The road not taken: retreat and diverge in local search for simplified protein structure prediction. BMC Bioinformatics 2013; 14 Suppl 2:S19. [PMID: 23368768 PMCID: PMC3549842 DOI: 10.1186/1471-2105-14-s2-s19] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Given a protein's amino acid sequence, the protein structure prediction problem is to find a three dimensional structure that has the native energy level. For many decades, it has been one of the most challenging problems in computational biology. A simplified version of the problem is to find an on-lattice self-avoiding walk that minimizes the interaction energy among the amino acids. Local search methods have been preferably used in solving the protein structure prediction problem for their efficiency in finding very good solutions quickly. However, they suffer mainly from two problems: re-visitation and stagnancy. RESULTS In this paper, we present an efficient local search algorithm that deals with these two problems. During search, we select the best candidate at each iteration, but store the unexplored second best candidates in a set of elite conformations, and explore them whenever the search faces stagnation. Moreover, we propose a new non-isomorphic encoding for the protein conformations to store the conformations and to check similarity when applied with a memory based search. This new encoding helps eliminate conformations that are equivalent under rotation and translation, and thus results in better prevention of re-visitation. CONCLUSION On standard benchmark proteins, our algorithm significantly outperforms the state-of-the art approaches for Hydrophobic-Polar energy models and Face Centered Cubic Lattice.
Collapse
Affiliation(s)
- Swakkhar Shatabda
- Institute of Intelligent and Integrated Systems, Griffith University, Queensland, Australia
- Queensland Research Laboratory, National ICT of Australia
| | - MA Hakim Newton
- Institute of Intelligent and Integrated Systems, Griffith University, Queensland, Australia
- Queensland Research Laboratory, National ICT of Australia
| | - Mahmood A Rashid
- Institute of Intelligent and Integrated Systems, Griffith University, Queensland, Australia
- Queensland Research Laboratory, National ICT of Australia
| | - Duc Nghia Pham
- Institute of Intelligent and Integrated Systems, Griffith University, Queensland, Australia
- Queensland Research Laboratory, National ICT of Australia
| | - Abdul Sattar
- Institute of Intelligent and Integrated Systems, Griffith University, Queensland, Australia
- Queensland Research Laboratory, National ICT of Australia
| |
Collapse
|