1
|
Li J, Wang L, Zhu Z, Song C. Exploring the Alternative Conformation of a Known Protein Structure Based on Contact Map Prediction. J Chem Inf Model 2024; 64:301-315. [PMID: 38117138 PMCID: PMC10777399 DOI: 10.1021/acs.jcim.3c01381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
The rapid development of deep learning-based methods has considerably advanced the field of protein structure prediction. The accuracy of predicting the 3D structures of simple proteins is comparable to that of experimentally determined structures, providing broad possibilities for structure-based biological studies. Another critical question is whether and how multistate structures can be predicted from a given protein sequence. In this study, analysis of tens of two-state proteins demonstrated that deep learning-based contact map predictions contain structural information on both states, which suggests that it is probably appropriate to change the target of deep learning-based protein structure prediction from one specific structure to multiple likely structures. Furthermore, by combining deep learning- and physics-based computational methods, we developed a protocol for exploring alternative conformations from a known structure of a given protein, by which we successfully approached the holo-state conformations of multiple representative proteins from their apo-state structures.
Collapse
Affiliation(s)
- Jiaxuan Li
- Center
for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Lei Wang
- Center
for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua
Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Zefeng Zhu
- Center
for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua
Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Chen Song
- Center
for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua
Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| |
Collapse
|
2
|
Qi H, Wang T, Li H, Li C, Guan L, Liu W, Wang J, Lu F, Mao S, Qin HM. Sequence- and Structure-Based Mining of Thermostable D-Allulose 3-Epimerase and Computer-Guided Protein Engineering To Improve Enzyme Activity. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2023; 71:18431-18442. [PMID: 37970673 DOI: 10.1021/acs.jafc.3c07204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2023]
Abstract
D-Allulose, a functional sweetener, can be synthesized from fructose using D-allulose 3-epimerase (DAEase). Nevertheless, a majority of the reported DAEases have inadequate stability under harsh industrial reaction conditions, which greatly limits their practical applications. In this study, big data mining combined with a computer-guided free energy calculation strategy was employed to discover a novel DAEase with excellent thermostability. Consensus sequence analysis of flexible regions and comparison of binding energies after substrate docking were performed using phylogeny-guided big data analyses. TtDAE from Thermogutta terrifontis was the most thermostable among 358 candidate enzymes, with a half-life of 32 h at 70 °C. Subsequently, structure-guided virtual screening and a customized strategy based on a combinatorial active-site saturation test/iterative saturation mutagenesis were utilized to engineer TtDAE. Finally, the catalytic activity of the M4 variant (P105A/L14C/T63G/I65A) was increased by 5.12-fold. Steered molecular dynamics simulations indicated that M4 had an enlarged substrate-binding pocket, which enhanced the fit between the enzyme and the substrate. The approach presented here, combining DAEases mining with further rational modification, provides guidance for obtaining promising catalysts for industrial-scale production.
Collapse
Affiliation(s)
- Hongbin Qi
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
| | - Tong Wang
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
| | - Huimin Li
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
| | - Chao Li
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
| | - Lijun Guan
- Institute of Food Processing, Heilongjiang Academy of Agricultural Sciences, Harbin 150086, China
| | - Weidong Liu
- Industrial Enzymes National Engineering Laboratory, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Jianwen Wang
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
| | - Fuping Lu
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
| | - Shuhong Mao
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
| | - Hui-Min Qin
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
| |
Collapse
|
3
|
Buehler MJ. Multiscale Modeling at the Interface of Molecular Mechanics and Natural Language through Attention Neural Networks. Acc Chem Res 2022; 55:3387-3403. [PMID: 36378952 DOI: 10.1021/acs.accounts.2c00330] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Humans are continually bombarded with massive amounts of data. To deal with this influx of information, we use the concept of attention in order to perceive the most relevant input from vision, hearing, touch, and others. Thereby, the complex ensemble of signals is used to generate output by querying the processed data in appropriate ways. Attention is also the hallmark of the development of scientific theories, where we elucidate which parts of a problem are critical, often expressed through differential equations. In this Account we review the emergence of attention-based neural networks as a class of approaches that offer many opportunities to describe materials across scales and modalities, including how universal building blocks interact to yield a set of material properties. In fact, the self-assembly of hierarchical, structurally complex, and multifunctional biomaterials remains a grand challenge in modeling, theory, and experiment. Expanding from the process by which material building blocks physically interact to form a type of material, in this Account we view self-assembly as both the functional emergence of properties from interacting building blocks as well as the physical process by which elementary building blocks interact and yield structure and, thereby, functions. This perspective, integrated through the theory of materiomics, allows us to solve multiscale problems with a first-principles-based computational approach based on attention-based neural networks that transform information to feature to property while providing a flexible modeling approach that can integrate theory, simulation, and experiment. Since these models are based on a natural language framework, they offer various benefits including incorporation of general domain knowledge via general-purpose pretraining, which can be accomplished without labeled data or large amounts of lower-quality data. Pretrained models then offer a general-purpose platform that can be fine-tuned to adapt these models to make specific predictions, often with relatively little labeled data. The transferrable power of the language-based modeling approach realizes a neural olog description, where mathematical categorization is learned by multiheaded attention, without domain knowledge in its formulation. It can hence be applied to a range of complex modeling tasks─such as physical field predictions, molecular properties, or structure predictions, all using an identical formulation. This offers a complementary modeling approach that is already finding numerous applications, with great potential to solve complex assembly problems, enabling us to learn, build, and utilize functional categorization of how building blocks yield a range of material functions. In this Account, we demonstrate the approach in various application areas, including protein secondary structure prediction and prediction of normal-mode frequencies as well as predicting mechanical fields near cracks. Unifying these diverse problem areas is the building block approach, where the models are based on a universally applicable platform that offers benefits ranging from transferability, interpretability, and cross-domain pollination of knowledge as exemplified through a transformer model applied to predict how musical compositions infer de novo protein structures. We discuss future potentialities of this approach for a variety of material phenomena across scales, including the use in multiparadigm modeling schemes.
Collapse
Affiliation(s)
- Markus J Buehler
- Laboratory for Atomistic and Molecular Mechanics, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States.,Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
| |
Collapse
|
4
|
Cui L, Cui A, Li Q, Yang L, Liu H, Shao W, Feng Y. Molecular Evolution of an Aminotransferase Based on Substrate–Enzyme Binding Energy Analysis for Efficient Valienamine Synthesis. ACS Catal 2022. [DOI: 10.1021/acscatal.2c03784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Li Cui
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Anqi Cui
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Qitong Li
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Lezhou Yang
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Hao Liu
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Wenguang Shao
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yan Feng
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| |
Collapse
|
5
|
Rahman J, Newton MAH, Hasan MAM, Sattar A. A stacked meta-ensemble for protein inter-residue distance prediction. Comput Biol Med 2022; 148:105824. [PMID: 35863250 DOI: 10.1016/j.compbiomed.2022.105824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Revised: 06/21/2022] [Accepted: 07/03/2022] [Indexed: 11/25/2022]
Abstract
Predicted inter-residue distances are a key behind recent success in high quality protein structure prediction (PSP). However, prediction of both short and long distance values together is challenging. Consequently, predicted short distances are mostly used by existing PSP methods. In this paper, we use a stacked meta-ensemble method to combine deep learning models trained for different ranges of real-valued distances. On five benchmark sets of proteins, our proposed inter-residue distance prediction method improves mean Local Distance Different Test (LDDT) scores at least by 5% over existing such methods. Moreover, using a real-valued distance based conformational search algorithm, we also show that predicted long distances help obtain significantly better protein conformations than when only predicted short distances are used. Our method is named meta-ensemble for distance prediction (MDP) and its program is available from https://gitlab.com/mahnewton/mdp.
Collapse
Affiliation(s)
- Julia Rahman
- School of Information and Communication Technology, Griffith University, Queensland, Australia.
| | - M A Hakim Newton
- Institute of Integrated and Intelligent Systems, Griffith University, Queensland, Australia; School of Information and Physical Sciences, The University of Newcastle, New South Wales, Australia.
| | | | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Queensland, Australia; Institute of Integrated and Intelligent Systems, Griffith University, Queensland, Australia
| |
Collapse
|
6
|
Behjati A, Zare-Mirakabad F, Arab SS, Nowzari-Dalini A. Protein sequence profile prediction using ProtAlbert transformer. Comput Biol Chem 2022; 99:107717. [DOI: 10.1016/j.compbiolchem.2022.107717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2021] [Revised: 06/03/2022] [Accepted: 06/21/2022] [Indexed: 11/03/2022]
|
7
|
Angadi UB, Chaturvedi KK, Srivastava S, Rai A. A Novel Way of Comparing Protein 3D Structure Using Graph Partitioning Approach. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1361-1368. [PMID: 31494554 DOI: 10.1109/tcbb.2019.2938948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Alignment and comparison of protein 3D structures is an important and fundamental task in structural biology to study evolutionary, functional and structural relatedness among proteins. Since two decades, the research on protein structure alignment has been taken up on priority and numbers of research articles are being published. There are incremental advances over previous efforts, and still these methods continue to improve over the time and still this is an open problem in structural biology. A novel methodology has been developed for comparing protein 3D structure by employing conversion of pair of protein 3D structures into 2D graphs (undirected weighted graph), partitioning of 2D graphs into sub-graphs, matching sub-graphs with main graphs and finally these sub-graphs matches calculates similarity between the pair of proteins. The proposed method has been implemented in MATLAB and R Package. The performance of the developed methodology is tested with four existing best methods such as CE, jFATCAT, TM_Align and Dali on 100 proteins benchmark dataset with SCOP database. The proposed method is efficient in terms of time complexity, accuracy, grouping of proteins in relevant structural groups and provides additional information towards non-bonded interactions and sub-graphs indicates the dominance of secondary structure.
Collapse
|
8
|
Protein Structure Prediction: Conventional and Deep Learning Perspectives. Protein J 2021; 40:522-544. [PMID: 34050498 DOI: 10.1007/s10930-021-10003-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/21/2021] [Indexed: 10/21/2022]
Abstract
Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Predicting any protein's accurate structure is of paramount importance for the scientific community, as these structures govern their function. Moreover, this is one of the complicated optimization problems that computational biologists have ever faced. Experimental protein structure determination methods include X-ray crystallography, Nuclear Magnetic Resonance Spectroscopy and Electron Microscopy. All of these are tedious and time-consuming procedures that require expertise. To make the process less cumbersome, scientists use predictive tools as part of computational methods, using data consolidated in the protein repositories. In recent years, machine learning approaches have raised the interest of the structure prediction community. Most of the machine learning approaches for protein structure prediction are centred on co-evolution based methods. The accuracy of these approaches depends on the number of homologous protein sequences available in the databases. The prediction problem becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods allow for the extraction of intricate features from protein sequence data without making any intuitions. Accurately predicted protein structures are employed for drug discovery, antibody designs, understanding protein-protein interactions, and interactions with other molecules. This article provides a review of conventional and deep learning approaches in protein structure prediction. We conclude this review by outlining a few publicly available datasets and deep learning architectures currently employed for protein structure prediction tasks.
Collapse
|
9
|
Roche R, Bhattacharya S, Bhattacharya D. Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins. PLoS Comput Biol 2021; 17:e1008753. [PMID: 33621244 PMCID: PMC7935296 DOI: 10.1371/journal.pcbi.1008753] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2020] [Revised: 03/05/2021] [Accepted: 01/31/2021] [Indexed: 11/18/2022] Open
Abstract
Crystallography and NMR system (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous that may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even using much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS. Predicting the folded and functional 3-dimensional structure of a protein molecule from its amino acid sequence is of central importance to structural biology. Recently, promising advances have been made in ab initio protein folding due to the reasonably accurate estimation of inter-residue interaction maps at increasingly higher resolutions that range from binary contacts to finer-grained distances. Despite the progress in predicting the interaction maps, approaches for turning the residue-residue interactions projected in these maps into their precise spatial positioning heavily rely on a decade-old experimental structure determination protocol that is not suitable for predictive modeling. This paper presents a new hierarchical structure modeling method, DConStruct, which can better exploit the information encoded in the interaction maps at multiple granularities, from binary contact maps to distance-based hybrid maps at tri-level thresholding, for improved ab initio folding. Multiple large-scale benchmarking experiments show that our proposed method can substantially improve the folding accuracy for both soluble and membrane proteins compared to state-of-the-art approaches. DConStruct is licensed under the GNU General Public License v3 and freely available at https://github.com/Bhattacharya-Lab/DConStruct.
Collapse
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
| | - Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Department of Biological Sciences, Auburn University, Auburn, Alabama, United States of America
- * E-mail:
| |
Collapse
|
10
|
McGehee AJ, Bhattacharya S, Roche R, Bhattacharya D. PolyFold: An interactive visual simulator for distance-based protein folding. PLoS One 2020; 15:e0243331. [PMID: 33270805 PMCID: PMC7714222 DOI: 10.1371/journal.pone.0243331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Accepted: 11/18/2020] [Indexed: 11/18/2022] Open
Abstract
Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.
Collapse
Affiliation(s)
- Andrew J. McGehee
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Department of Biological Sciences, Auburn University, Auburn, AL, United States of America
- * E-mail:
| |
Collapse
|
11
|
Badaczewska-Dawid AE, Kolinski A, Kmiecik S. Computational reconstruction of atomistic protein structures from coarse-grained models. Comput Struct Biotechnol J 2019; 18:162-176. [PMID: 31969975 PMCID: PMC6961067 DOI: 10.1016/j.csbj.2019.12.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 01/02/2023] Open
Abstract
Three-dimensional protein structures, whether determined experimentally or theoretically, are often too low resolution. In this mini-review, we outline the computational methods for protein structure reconstruction from incomplete coarse-grained to all atomistic models. Typical reconstruction schemes can be divided into four major steps. Usually, the first step is reconstruction of the protein backbone chain starting from the C-alpha trace. This is followed by side-chains rebuilding based on protein backbone geometry. Subsequently, hydrogen atoms can be reconstructed. Finally, the resulting all-atom models may require structure optimization. Many methods are available to perform each of these tasks. We discuss the available tools and their potential applications in integrative modeling pipelines that can transfer coarse-grained information from computational predictions, or experiment, to all atomistic structures.
Collapse
Affiliation(s)
| | | | - Sebastian Kmiecik
- Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| |
Collapse
|
12
|
Xu J, Wang S. Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins 2019; 87:1069-1081. [PMID: 31471916 DOI: 10.1002/prot.25810] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2019] [Revised: 07/24/2019] [Accepted: 08/27/2019] [Indexed: 12/30/2022]
Abstract
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strength and weakness of our methods, and why deep learning performed much better in CASP13.
Collapse
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
| |
Collapse
|
13
|
Abstract
Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix we may construct 3D models without involving extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets with a median family size of 58 effective sequence homologs within 4 h on a Linux computer of 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict correct fold for many more proteins lack of similar structures in the Protein Data Bank even on a personal computer.
Collapse
|
14
|
Adhikari B, Hou J, Cheng J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 2019; 34:1466-1472. [PMID: 29228185 PMCID: PMC5925776 DOI: 10.1093/bioinformatics/btx781] [Citation(s) in RCA: 105] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 12/07/2017] [Indexed: 12/14/2022] Open
Abstract
Motivation Significant improvements in the prediction of protein residue–residue contacts are observed in the recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction. Results In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks—the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on CASP10 dataset, 34% by MetaPSICOV on CASP11 dataset and 46.3% by Raptor-X on CASP12 dataset, when top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts into training, two-level approach to prediction, use of the state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length. Availability and implementation The web server of DNCON2 is at http://sysbio.rnet.missouri.edu/dncon2/ where training and testing datasets as well as the predictions for CASP10, 11 and 12 free-modeling datasets can also be downloaded. Its source code is available at https://github.com/multicom-toolbox/DNCON2/. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
| | - Jie Hou
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
| | - Jianlin Cheng
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA.,Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
15
|
Kurczynska M, Kotulska M. Automated method to differentiate between native and mirror protein models obtained from contact maps. PLoS One 2018; 13:e0196993. [PMID: 29787567 PMCID: PMC5963800 DOI: 10.1371/journal.pone.0196993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2017] [Accepted: 04/24/2018] [Indexed: 11/23/2022] Open
Abstract
Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow to obtain appropriate clusters–the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying two most differentiating ETs for each class allowed to obtain satisfying results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.
Collapse
Affiliation(s)
- Monika Kurczynska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - Malgorzata Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- * E-mail:
| |
Collapse
|
16
|
Wang Y, Huang J, Li W, Wang S, Ding C. Specific and intrinsic sequence patterns extracted by deep learning from intra-protein binding and non-binding peptide fragments. Sci Rep 2017; 7:14916. [PMID: 29097708 PMCID: PMC5668431 DOI: 10.1038/s41598-017-14877-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2017] [Accepted: 10/19/2017] [Indexed: 11/30/2022] Open
Abstract
The key finding in the DNA double helix model is the specific pairing or binding between nucleotides A-T and C-G, and the pairing rules are the molecule basis of genetic code. Unfortunately, no such rules have been discovered for proteins. Here we show that intrinsic sequence patterns between intra-protein binding peptide fragments exist, they can be extracted using a deep learning algorithm, and they bear an interesting semblance to the DNA double helix model. The intra-protein binding peptide fragments have specific and intrinsic sequence patterns, distinct from non-binding peptide fragments, and multi-millions of binding and non-binding peptide fragments from currently available protein X-ray structures are classified with an accuracy of up to 93%. The specific binding between short peptide fragments may provide an important driving force for protein folding and protein-protein interaction, two open and fundamental problems in molecular biology, and it may have significant potential in design, discovery, and development of peptide, protein, and antibody drugs.
Collapse
Affiliation(s)
- Yuhong Wang
- Department of chemistry and Laser Chemistry Institute, Fudan University, Shanghai, 200433, P.R. China.
| | - Junzhou Huang
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, 76019, USA
| | - Wei Li
- School of life science, Jilin University, Changchun, 130012, P.R. China
| | - Sheng Wang
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, 76019, USA
| | - Chuanfan Ding
- Department of chemistry and Laser Chemistry Institute, Fudan University, Shanghai, 200433, P.R. China
| |
Collapse
|
17
|
Xiong D, Mao W, Gong H. Predicting the helix-helix interactions from correlated residue mutations. Proteins 2017; 85:2162-2169. [PMID: 28833538 DOI: 10.1002/prot.25370] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2017] [Revised: 08/03/2017] [Accepted: 08/13/2017] [Indexed: 12/30/2022]
Abstract
Helix-helix interactions are crucial in the structure assembly, stability and function of helix-rich proteins including many membrane proteins. In spite of remarkable progresses over the past decades, the accuracy of predicting protein structures from their amino acid sequences is still far from satisfaction. In this work, we focused on a simpler problem, the prediction of helix-helix interactions, the results of which could facilitate practical protein structure prediction by constraining the sampling space. Specifically, we started from the noisy 2D residue contact maps derived from correlated residue mutations, and utilized ridge detection to identify the characteristic residue contact patterns for helix-helix interactions. The ridge information as well as a few additional features were then fed into a machine learning model HHConPred to predict interactions between helix pairs. In an independent test, our method achieved an F-measure of ∼60% for predicting helix-helix interactions. Moreover, although the model was trained mainly using soluble proteins, it could be extended to membrane proteins with at least comparable performance relatively to previous approaches that were generated purely using membrane proteins. All data and source codes are available at http://166.111.152.91/Downloads.html or https://github.com/dpxiong/HHConPred.
Collapse
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| | - Wenzhi Mao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| |
Collapse
|
18
|
Abstract
Post-translational modifications (PTMs) are an important source of protein regulation; they fine-tune the function, localization, and interaction with other molecules of the majority of proteins and are partially responsible for their multifunctionality. Usually, proteins have several potential modification sites, and their patterns of occupancy are associated with certain functional states. These patterns imply cross talk among PTMs within and between proteins, the majority of which are still to be discovered. Several methods detect associations between PTMs; these have recently combined into a global resource, the PTMcode database, which contains already known and predicted functional associations between pairs of PTMs from more than 45,000 proteins in 19 eukaryotic species.
Collapse
Affiliation(s)
- Pablo Minguez
- Department of Genetics and Genomics, Instituto de Investigacion Sanitaria-University Hospital Fundacion Jimenez Diaz (IIS-FJD), Avda. Reyes Católicos 2, 28040, Madrid, Spain.
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, 13125, Berlin, Germany
| |
Collapse
|
19
|
Li Y, Song T, Yang J, Zhang Y, Yang J. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids. PLoS One 2016; 11:e0167430. [PMID: 27918587 PMCID: PMC5137889 DOI: 10.1371/journal.pone.0167430] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2016] [Accepted: 11/14/2016] [Indexed: 11/30/2022] Open
Abstract
In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method into the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of other 5 popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector.
Collapse
Affiliation(s)
- Yushuang Li
- School of Science, Yanshan University, Qinhuangdao, China
| | - Tian Song
- School of Science, Yanshan University, Qinhuangdao, China
| | - Jiasheng Yang
- Department of Civil and Environmental Engineering, National Universality of Singapore, Singapore
| | - Yi Zhang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei, China
| | - Jialiang Yang
- School of Mathematics and Information Science, Henan Polytechnic University, Henan, China
| |
Collapse
|
20
|
Szałaj P, Tang Z, Michalski P, Pietal MJ, Luo OJ, Sadowski M, Li X, Radew K, Ruan Y, Plewczynski D. An integrated 3-Dimensional Genome Modeling Engine for data-driven simulation of spatial genome organization. Genome Res 2016; 26:1697-1709. [PMID: 27789526 PMCID: PMC5131821 DOI: 10.1101/gr.205062.116] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2016] [Accepted: 10/20/2016] [Indexed: 02/03/2023]
Abstract
ChIA-PET is a high-throughput mapping technology that reveals long-range chromatin interactions and provides insights into the basic principles of spatial genome organization and gene regulation mediated by specific protein factors. Recently, we showed that a single ChIA-PET experiment provides information at all genomic scales of interest, from the high-resolution locations of binding sites and enriched chromatin interactions mediated by specific protein factors, to the low resolution of nonenriched interactions that reflect topological neighborhoods of higher-order chromosome folding. This multilevel nature of ChIA-PET data offers an opportunity to use multiscale 3D models to study structural-functional relationships at multiple length scales, but doing so requires a structural modeling platform. Here, we report the development of 3D-GNOME (3-Dimensional Genome Modeling Engine), a complete computational pipeline for 3D simulation using ChIA-PET data. 3D-GNOME consists of three integrated components: a graph-distance-based heat map normalization tool, a 3D modeling platform, and an interactive 3D visualization tool. Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions of single average and ensemble structures. Further incorporation of CTCF-motif orientation and high-resolution looping patterns in 3D simulation provided additional reliability of potential biologically plausible topological structures.
Collapse
Affiliation(s)
- Przemysław Szałaj
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland.,Centre for Innovative Research, Medical University of Bialystok, 15-089 Białystok, Poland.,I-BioStat, Hasselt University, BE3590 Hasselt, Belgium
| | - Zhonghui Tang
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Paul Michalski
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Michal J Pietal
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland
| | - Oscar J Luo
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Michał Sadowski
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland
| | - Xingwang Li
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Kamen Radew
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland
| | - Yijun Ruan
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA.,Department of Genetics and Genome Sciences, UConn Health, Farmington, Connecticut 06032, USA
| | - Dariusz Plewczynski
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland.,Centre for Innovative Research, Medical University of Bialystok, 15-089 Białystok, Poland.,Faculty of Pharmacy, Medical University of Warsaw, 02-097 Warsaw, Poland
| |
Collapse
|
21
|
Bhattacharya D, Cao R, Cheng J. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling. Bioinformatics 2016; 32:2791-9. [PMID: 27259540 PMCID: PMC5018369 DOI: 10.1093/bioinformatics/btw316] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2016] [Accepted: 05/15/2016] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. Alongside, latest developments on computational side based on probabilistic modeling have shown promising direction to perform de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often randomly sample protein conformational space as opposed to experimentally suggested stepwise sampling. RESULTS Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. AVAILABILITY AND IMPLEMENTATION Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/ CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | | | - Jianlin Cheng
- Department of Computer Science Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|