1. da Silva LSA, Seman LO, Camponogara E, Mariani VC, Dos Santos Coelho L. Bilinear optimization of protein structure prediction: An exact approach via AB off-lattice model. Comput Biol Med 2024; 176:108558. [PMID: 38754216 DOI: 10.1016/j.compbiomed.2024.108558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 04/25/2024] [Accepted: 05/05/2024] [Indexed: 05/18/2024]
Abstract
Protein structure prediction (PSP) remains a central challenge in computational biology due to its inherent complexity and high dimensionality. While numerous heuristic approaches have appeared in the literature, their success varies. The AB off-lattice model, which characterizes proteins as sequences of A (hydrophobic) and B (hydrophilic) beads, presents a simplified perspective on PSP. This work presents a mathematical optimization-based methodology capitalizing on the off-lattice AB model. Dissecting the inherent non-linearities of the energy landscape of protein folding allowed for formulating the PSP as a bilinear optimization problem. This formulation was achieved by introducing auxiliary variables and constraints that encapsulate the nuanced relationship between the protein's conformational space and its energy landscape. The proposed bilinear model exhibited notable accuracy in pinpointing the global minimum energy conformations on a benchmark dataset presented by the Protein Data Bank (PDB). Compared to traditional heuristic-based methods, this bilinear approach yielded exact solutions, reducing the likelihood of local minima entrapment. This research highlights the potential of reframing the traditionally non-linear protein structure prediction problem into a bilinear optimization problem through the off-lattice AB model. Such a transformation offers a route toward methodologies that can determine the global solution, challenging current PSP paradigms. Exploration into hybrid models, merging bilinear optimization and heuristic components, might present an avenue for balancing accuracy with computational efficiency.
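The abstract does not state the energy function, so the following is only a minimal sketch of the form commonly used in the AB off-lattice literature: a bending term plus a species-dependent Lennard-Jones term. The coefficient values (+1 for A-A, +0.5 for B-B, -0.5 for A-B) and the function names `interaction_coefficient` and `ab_energy` are assumptions for illustration. The paper's bilinear reformulation is not reproduced here.

```python
import math


def interaction_coefficient(a, b):
    """Species-dependent coefficient (assumed values from the common AB model)."""
    if a == "A" and b == "A":
        return 1.0   # hydrophobic pairs attract strongly
    if a == "B" and b == "B":
        return 0.5   # hydrophilic pairs attract weakly
    return -0.5      # mixed pairs repel weakly


def ab_energy(coords, sequence):
    """Energy of a bead chain given 2D or 3D coordinates and an A/B string."""
    n = len(sequence)
    # Bending term: 0.25 * (1 - cos(theta)) at every interior bead,
    # where theta is the angle between successive bond vectors.
    bend = 0.0
    for i in range(1, n - 1):
        u = [c1 - c0 for c0, c1 in zip(coords[i - 1], coords[i])]
        v = [c1 - c0 for c0, c1 in zip(coords[i], coords[i + 1])]
        cos_theta = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        bend += 0.25 * (1.0 - cos_theta)
    # Lennard-Jones term over all non-adjacent bead pairs.
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            c = interaction_coefficient(sequence[i], sequence[j])
            lj += 4.0 * (r ** -12 - c * r ** -6)
    return bend + lj
```

The cosine and inverse-power terms are the non-linearities that any exact reformulation has to handle. A heuristic would simply call `ab_energy` on candidate conformations.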
Affiliation(s)
- Luiza Scapinello Aquino da Silva
  - Electrical Engineering Graduate Program (PPGEE), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil
- Laio Oriel Seman
  - Department of Automation and Systems Engineering, Federal University of Santa Catarina (UFSC), Engenheiro Agronômico Andrei Cristian Ferreira, Florianópolis, 88040-900, Santa Catarina, Brazil
- Eduardo Camponogara
  - Department of Automation and Systems Engineering, Federal University of Santa Catarina (UFSC), Engenheiro Agronômico Andrei Cristian Ferreira, Florianópolis, 88040-900, Santa Catarina, Brazil
- Viviana Cocco Mariani
  - Electrical Engineering Graduate Program (PPGEE), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil; Mechanical Engineering Graduate Program (PGMec), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil
- Leandro Dos Santos Coelho
  - Electrical Engineering Graduate Program (PPGEE), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil
2. Filgueiras JL, Varela D, Santos J. Protein structure prediction with energy minimization and deep learning approaches. Nat Comput 2023:1-12. [PMID: 37363286 PMCID: PMC10165305 DOI: 10.1007/s11047-023-09943-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/12/2023] [Indexed: 06/28/2023]
Abstract
In this paper we discuss the advantages and problems of two alternatives for ab initio protein structure prediction. On the one hand, we discuss recent deep learning approaches, which have significantly improved prediction results for a wide variety of proteins. On the other hand, we analyze methods based on protein conformational energy minimization with different search strategies. For the latter, we include our methods based on a memetic combination of differential evolution and the fragment replacement technique, which can also incorporate niching in the evolutionary search. Different proteins are used to analyze the pros and cons of both approaches, and possibilities for integrating the two alternatives are proposed.
Affiliation(s)
- Juan Luis Filgueiras
  - Department of Computer Science and Information Technologies, CITIC (Centre for Information and Communications Technology Research), University of A Coruña, A Coruña, Spain
- Daniel Varela
  - Department of Computer Science and Information Technologies, CITIC (Centre for Information and Communications Technology Research), University of A Coruña, A Coruña, Spain
- José Santos
  - Department of Computer Science and Information Technologies, CITIC (Centre for Information and Communications Technology Research), University of A Coruña, A Coruña, Spain
3. Issa M. Expeditious COVID-19 similarity measure tool based on consolidated SCA algorithm with mutation and opposition operators. Appl Soft Comput 2021; 104:107197. [PMID: 33642960 PMCID: PMC7895693 DOI: 10.1016/j.asoc.2021.107197] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/12/2020] [Revised: 02/09/2021] [Accepted: 02/15/2021] [Indexed: 11/21/2022]
Abstract
COVID-19 is a global pandemic that has drawn scientists' interest in preventing it and designing drugs against it. Low-cost, intelligent tools for biological data analysis are therefore important for studying the biological structure of COVID-19. The global alignment algorithm is an important bioinformatics tool that gives the most accurate similarity measure between a pair of biological sequences. Its main limitation is its long running time, especially for very long sequences. This work proposes a fast global alignment tool (G-Aligner) based on meta-heuristic algorithms that estimates similarity measurements close to the exact ones in reasonable time and at low cost. With very long sequences, G-Aligner based on the standard Sine–Cosine optimization algorithm (SCA) becomes trapped in local minima. Therefore, an improved version of SCA that integrates particle swarm optimization (PSO) is presented. In addition, mutation and opposition operators are applied to enhance exploration and avoid trapping in local minima. The performance of the improved SCA algorithm (SP-MO) was evaluated on a set of IEEE CEC functions. G-Aligner based on SP-MO was also tested on real biological sequences, and it was used to measure the similarity of the COVID-19 virus to 13 other viruses to validate its performance. The tests showed that SP-MO outperforms the relevant studies in the literature and produces the highest average similarity measurements, at 75% of the exact ones.
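For context, the exact baseline that a tool like G-Aligner approximates can be sketched with the classic dynamic-programming recurrence for global alignment (Needleman-Wunsch). This is not the G-Aligner or SP-MO method. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, gap in a.
            cur[j] = max(diagonal, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[len(b)]
```

Every cell of the len(a) by len(b) table is filled once, so the running time grows with the product of the two sequence lengths. That cost is the limitation the abstract describes for very long sequences.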
Affiliation(s)
- Mohamed Issa
  - Computer and Systems Department, Faculty of Engineering, Zagazig University, Zagazig, Egypt; Faculty of Computers and Informatics, Nahda University, Beni Suef, Egypt
4. Esfandi B, Atabati M. Sequential Dihedral Angles (SDAs): A Method for Evaluating the 3D Structure of Proteins. Protein J 2021; 40:1-7. [PMID: 33442828 DOI: 10.1007/s10930-020-09961-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/31/2020] [Indexed: 11/29/2022]
Abstract
One of the most important steps in modeling three-dimensional (3D) structures of proteins is the evaluation of the constructed models. The present study suggests that the correctness of a structure may be tested by using the characteristics of sequential dihedral angles (SDAs) between adjacent alpha-carbons (Cα) in the main chains of proteins. Our studies of protein structures in the Protein Data Bank (PDB) show that the SDAs between the Cα atoms in the main chains are limited in their values. In addition, the sum of the absolute values of three sequential dihedral angles can never be 0 degrees; the lowest value found for this sum is 48 degrees. Thus, the SDAs between the alpha-carbons along the main chains of proteins may be a useful parameter for evaluating anomalies in protein structures.
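The quantity described above can be sketched as follows, assuming Cα coordinates are already available as (x, y, z) tuples. The helper names and the reading of "three sequential dihedral angles" as overlapping windows of three consecutive Cα dihedrals are assumptions for illustration, not the authors' code.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four consecutive points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def sda_abs_sums(ca_coords):
    """Sum of |dihedral| over each window of three consecutive Cα dihedrals."""
    angles = [dihedral_deg(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]
    return [sum(abs(a) for a in angles[i:i + 3]) for i in range(len(angles) - 2)]
```

A model checker built on this idea could flag any window whose sum falls below the 48-degree minimum reported in the abstract.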
Affiliation(s)
- Babak Esfandi
  - School of Chemistry, Damghan University, Damghan, Iran
5. Zhang L, Ma H, Qian W, Li H. Sequence-based protein structure optimization using enhanced simulated annealing algorithm on a coarse-grained model. J Mol Model 2020; 26:250. [PMID: 32833195 DOI: 10.1007/s00894-020-04490-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/09/2019] [Accepted: 07/30/2020] [Indexed: 12/28/2022]
Abstract
Understanding protein structure is vital for determining biological function. We present an enhanced simulated annealing (ESA) algorithm to investigate protein three-dimensional (3D) structure on a coarse-grained model. Within the algorithm, we adjusted the exploration equations to achieve good search intensity. To that end, the algorithm uses (i) a multivariable disturbance operator to diversify solutions, (ii) a sign function to improve the randomness of solutions, and (iii) a remainder operation on floating-point numbers to handle out-of-range solutions. Monitoring the energy value throughout the simulation allows the energy-optimal state to be found. The ESA algorithm was tested on artificial and real protein sequences of different lengths. The results show that our algorithm outperforms the conventional simulated annealing algorithm and is competitive with previously reported algorithms. In particular, it can obtain folding conformations with specific structural features. Further analysis shows that the simulated trajectory toward the lowest energy can exhibit the thermodynamic behavior of protein folding.
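The general pattern, plain simulated annealing over angle variables with a floating-point remainder that pulls out-of-range candidates back into range, can be sketched as below. This is not the authors' ESA: the disturbance operator, cooling schedule, parameter values, and the names `wrap_angle` and `anneal` are all assumptions for illustration.

```python
import math
import random


def wrap_angle(x):
    """Map any float back toward [-pi, pi) using a floating-point remainder."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def anneal(energy, x0, t0=1.0, cooling=0.95, steps_per_t=50,
           t_min=1e-3, step=0.5, seed=0):
    """Minimise energy(angles) with a basic Metropolis acceptance rule."""
    rng = random.Random(seed)
    x = [wrap_angle(v) for v in x0]
    e = energy(x)
    best_x, best_e = x[:], e
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            # Perturb every angle, then wrap instead of rejecting out-of-range values.
            cand = [wrap_angle(v + rng.uniform(-step, step)) for v in x]
            e_cand = energy(cand)
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
                x, e = cand, e_cand
                if e < best_e:
                    best_x, best_e = x[:], e  # store the best state seen so far
        t *= cooling  # geometric cooling
    return best_x, best_e
```

Wrapping keeps every candidate usable, so no proposals are wasted on values outside the angle range. Tracking `best_e` throughout mirrors the idea of monitoring the energy during the run.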
Affiliation(s)
- Lizhong Zhang
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China; College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang, 110142, China
- He Ma
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China; Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang, 110169, China
- Wei Qian
  - Department of Electrical and Computer Engineering, College of Engineering, University of Texas, El Paso, TX, 79968, USA
- Haiyan Li
  - College of Pharmaceutical and Bioengineering, Shenyang University of Chemical Technology, Shenyang, 110142, China
6. Zhang L, Ma H, Qian W, Li H. Protein structure optimization using improved simulated annealing algorithm on a three-dimensional AB off-lattice model. Comput Biol Chem 2020; 85:107237. [PMID: 32109854 DOI: 10.1016/j.compbiolchem.2020.107237] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/03/2018] [Revised: 02/11/2020] [Accepted: 02/15/2020] [Indexed: 01/01/2023]
Abstract
This paper proposes an improved simulated annealing (ISA) algorithm for protein structure optimization based on a three-dimensional AB off-lattice model. In the algorithm, we provide a general formula for producing the initial solution and design a multivariable disturbance term, related to the simulated annealing parameters and a tuned constant, to generate neighborhood solutions. To avoid missing the optimal solution, a storage operation is performed during the search. We applied the algorithm to artificial protein sequences from the literature and constructed a benchmark dataset of 10 real protein sequences from the Protein Data Bank (PDB). In addition, we generated a Cα space-filling model to represent protein folding conformations. The results indicate that our algorithm outperforms five previous methods in finding lower energies for artificial protein sequences. In tests on real proteins, our method achieves energy conformations with Cα-RMSD less than 3.0 Å from the PDB structures. Moreover, the Cα space-filling model may simulate dynamic changes of protein folding conformations at the atomic level.
Affiliation(s)
- Lizhong Zhang
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China; College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110142, China
- He Ma
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China; Key Laboratory of Medical Image Computing (Northeastern University), Ministry of Education, Shenyang 110169, China
- Wei Qian
  - Department of Electrical and Computer Engineering, College of Engineering, University of Texas, El Paso TX 79968, USA
- Haiyan Li
  - College of Pharmaceutical and Bioengineering, Shenyang University of Chemical Technology, Shenyang 110142, China
7. Wu H, Yang R, Fu Q, Chen J, Lu W, Li H. Research on predicting 2D-HP protein folding using reinforcement learning with full state space. BMC Bioinformatics 2019; 20:685. [PMID: 31874607 PMCID: PMC6929271 DOI: 10.1186/s12859-019-3259-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Protein structure prediction has always been an important issue in bioinformatics. Prediction of the two-dimensional structure of proteins based on the hydrophobic polarity model is a typical non-deterministic polynomial hard problem. Currently reported optimization methods for the hydrophobic polarity model, such as the greedy method, the brute-force method, and genetic algorithms, usually cannot converge robustly to the lowest energy conformations. Reinforcement learning, with its advantages of continuous Markov optimal decision-making and maximization of the global cumulative return, is especially suitable for solving global optimization problems of biological sequences.

RESULTS: In this study, we proposed a novel hydrophobic polarity model optimization method derived from reinforcement learning, which structured the full state space and designed an energy-based reward function and a rigid overlap detection rule. To validate the performance, sixteen sequences were selected from the classical data set. The results indicated that reinforcement learning with full states successfully converged to the lowest energy conformations for all sequences, while reinforcement learning with partial states folded 50% of the sequences to the lowest energy conformations. Reinforcement learning with full states hits the lowest energy an average of 5 times, which is 40 and 100% higher than the three and zero hit by the greedy algorithm and reinforcement learning with partial states respectively in the last 100 episodes.

CONCLUSIONS: Our results indicate that reinforcement learning with full states is a powerful method for predicting two-dimensional hydrophobic-polarity protein structure. It has clear competitive advantages over the greedy algorithm and reinforcement learning with partial states.
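The two ingredients named in the abstract, an overlap check and an energy from which a reward could be derived, can be sketched for the standard 2D hydrophobic-polar lattice model. The move encoding, the function names, and the choice to return `None` for invalid folds are assumptions for illustration. The paper's actual state space and reward shaping are not reproduced.

```python
# Unit steps on the square lattice (hypothetical action encoding).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def place(moves):
    """Return lattice positions for a move string, or None on self-overlap."""
    positions = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (positions[-1][0] + dx, positions[-1][1] + dy)
        if nxt in positions:  # rigid overlap rule: a site holds one residue
            return None
        positions.append(nxt)
    return positions


def hp_energy(sequence, moves):
    """HP energy: -1 per H-H lattice contact between non-consecutive residues."""
    positions = place(moves)
    if positions is None or len(positions) != len(sequence):
        return None
    index = {p: i for i, p in enumerate(positions)}
    contacts = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each neighbouring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts
```

For example, `hp_energy("HPPH", "RUL")` folds four residues into a unit square and brings the two H ends into contact. A reward could then be taken as the negative of the final energy, though that particular choice is also an assumption.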
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Ru Yang
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Qiming Fu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Jianping Chen
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Weizhong Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Haiou Li
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
8. Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
9. Investigation of machine learning techniques on proteomics: A comprehensive survey. Prog Biophys Mol Biol 2019; 149:54-69. [PMID: 31568792 DOI: 10.1016/j.pbiomolbio.2019.09.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/05/2019] [Revised: 09/16/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022]
Abstract
Proteomics is the large-scale study of proteins, which has enabled the identification of ever-increasing numbers of proteins. Proteins are essential parts of living organisms and have many functions. The proteome is the complete set of proteins produced or modified by an organism or system. The proteome varies with time and with the distinct requirements, or stresses, that a cell or organism experiences. Proteomics is an interdisciplinary field that has drawn on the genetic information of various genome projects. Much proteomics data is collected with high-throughput techniques such as mass spectrometry and microarrays. Analyzing the data and performing comparisons by hand would often take weeks or months. Therefore, scientists are collaborating with computer science researchers and mathematicians to create programs and pipelines that analyze protein data computationally. Using bioinformatics techniques, researchers can analyze and store protein data faster. The goal of this paper is to review machine learning techniques and their application in the field of proteomics.
10. Protein folding optimization using differential evolution extended with local search and component reinitialization. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.04.072] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
11. Gao S, Song S, Cheng J, Todo Y, Zhou M. Incorporation of Solvent Effect into Multi-Objective Evolutionary Algorithm for Improved Protein Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1365-1378. [PMID: 28534784 DOI: 10.1109/tcbb.2017.2705094] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/07/2023]
Abstract
The problem of predicting the three-dimensional (3-D) structure of a protein from its one-dimensional sequence has been called the "holy grail of molecular biology", and it has become an important part of structural genomics projects. Despite rapid developments in computer technology and computational intelligence, it remains challenging and fascinating. To solve it, we propose a multi-objective evolutionary algorithm. We decompose the Chemistry at HARvard Macromolecular Mechanics (CHARMM) force-field energy function into bond and non-bond energies, which serve as the first and second objectives. To account for the effect of solvent, we adopt the solvent-accessible surface area as the third objective. We use 66 benchmark proteins to verify the proposed method and obtain better or competitive results in comparison with existing methods. The results suggest that incorporating the effect of solvent into a multi-objective evolutionary algorithm is necessary to improve protein structure prediction in terms of accuracy and efficiency.
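How three objectives are compared in a multi-objective search can be sketched with a plain Pareto-dominance filter. The sketch assumes that all three objectives (bond energy, non-bond energy, solvent-accessible surface area) are minimised and that each candidate is a tuple of those values. The function names are hypothetical, and this is not the authors' algorithm.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

An evolutionary algorithm would typically apply such a filter each generation to decide which conformations survive, rather than collapsing the three objectives into a single weighted sum.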
12. Lin J, Zhong Y, Li E, Lin X, Zhang H. Multi-agent simulated annealing algorithm with parallel adaptive multiple sampling for protein structure prediction in AB off-lattice model. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.09.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/17/2022]
13. Bošković B, Brest J. Genetic algorithm with advanced mechanisms applied to the protein structure prediction in a hydrophobic-polar model and cubic lattice. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.04.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022]