51
|
Dong S, Wang S. Assembled graph neural network using graph transformer with edges for protein model quality assessment. J Mol Graph Model 2021; 110:108053. [PMID: 34773871 DOI: 10.1016/j.jmgm.2021.108053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2021] [Revised: 10/13/2021] [Accepted: 10/13/2021] [Indexed: 10/19/2022]
Abstract
Acquainting protein's structure is of vital importance to accurately understanding its function. Computational method of deep learning has made great progress in protein structure prediction from sequence, and has the potential to help structural biology research. The computational methods usually require independent protein structure model quality assessment to select the best from the model pool or guide protein structure refinement. We construct a graph neural network finely assembled with Graph Transformer Feature Extractor and message-passing layers for protein model quality assessment. The graph based method can more naturally embody the protein structure than a sequence or voxelized representation method. Although the widely used graph convolutional network has a strong ability to learn spatial patterns, it does not weigh the dependencies of different nodes on other nodes. So we introduce Graph Transformer to excavate the different degrees of neighboring residue nodes contributing to their local environments and extract local features. This is subsequently followed by message-passing layers to transmit-receive local information. Our network makes better use of edge information and is lightweight since relatively few input features and number of network layers, and experimental results demonstrate that our model outperforms various existing methods. Core code is made freely available at: https://github.com/Crystal-Dsq/proteinqa.
Collapse
Affiliation(s)
- Shiqi Dong
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China
| | - Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China.
| |
Collapse
|
52
|
Tong X, Liu S, Gu J, Wu C, Liang Y, Shi X. Amino acid environment affinity model based on graph attention network. J Bioinform Comput Biol 2021; 20:2150032. [PMID: 34775920 DOI: 10.1142/s0219720021500323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Proteins are engines involved in almost all functions of life. They have specific spatial structures formed by twisting and folding of one or more polypeptide chains composed of amino acids. Protein sites are protein structure microenvironments that can be identified by three-dimensional locations and local neighborhoods in which the structure or function exists. Understanding the amino acid environment affinity is essential for additional protein structural or functional studies, such as mutation analysis and functional site detection. In this study, an amino acid environment affinity model based on the graph attention network was developed. Initially, we constructed a protein graph according to the distance between amino acid pairs. Then, we extracted a set of structural features for each node. Finally, the protein graph and the associated node feature set were set to input the graph attention network model and to obtain the amino acid affinities. Numerical results show that our proposed method significantly outperforms a recent 3DCNN-based method by almost 30%.
Collapse
Affiliation(s)
- Xueheng Tong
- College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
| | - Shuqi Liu
- College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
| | - Jiawei Gu
- College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
| | - Chunguo Wu
- College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
| | - Yanchun Liang
- School of Computer Science, Zhuhai College of Science and Technology Zhuhai, Guangdong 519041, China
| | - Xiaohu Shi
- College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China.,School of Computer Science, Zhuhai College of Science and Technology Zhuhai, Guangdong 519041, China
| |
Collapse
|
53
|
Abstract
Proteolysis-targeting chimeras (PROTACs), which selectively induce targeted protein degradation, represent an emerging drug discovery technology. Although numerous PROTACs have been reported, designing potent PROTACs still remains a great challenge, to some extent, due to insufficient structural data of Target-PROTAC-E3 ternary complexes. In this work, PROTAC-Model, an integrative computational method by combining the FRODOCK-based protocol and RosettaDock-based refinement, was developed to predict PROTAC-mediated ternary complex structures and tested on 14 cases. The quality of the models was evaluated using the criteria of the critical assessment of predicted interactions (CAPRI). Using the unbound structures, the FRODOCK-based protocol can generate the ternary complex structures with medium or high quality for 8 cases out of 14. With the refinement by RosettaDock, the cases with medium or high quality increase to 12. Compared with PRosettaC and the method developed by Drummond et al., PROTAC-Model shows better performance. In summary, PROTAC-Model should be useful for the rational design of PROTACs.
Collapse
Affiliation(s)
- Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
Collapse
|
54
|
Proteomic Tools for the Analysis of Cytoskeleton Proteins. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2364:363-425. [PMID: 34542864 DOI: 10.1007/978-1-0716-1661-1_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence, as well as for predicting interactions with other proteins or macromolecules. Although a wealth of structural and functional information is available for many cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice while retaining advanced functionality for more experienced computational biologists.
Collapse
|
55
|
Jandova Z, Vargiu AV, Bonvin AMJJ. Native or Non-Native Protein-Protein Docking Models? Molecular Dynamics to the Rescue. J Chem Theory Comput 2021; 17:5944-5954. [PMID: 34342983 PMCID: PMC8444332 DOI: 10.1021/acs.jctc.1c00336] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Indexed: 11/29/2022]
Abstract
Molecular docking excels at creating a plethora of potential models of protein-protein complexes. To correctly distinguish the favorable, native-like models from the remaining ones remains, however, a challenge. We assessed here if a protocol based on molecular dynamics (MD) simulations would allow distinguishing native from non-native models to complement scoring functions used in docking. To this end, the first models for 25 protein-protein complexes were generated using HADDOCK. Next, MD simulations complemented with machine learning were used to discriminate between native and non-native complexes based on a combination of metrics reporting on the stability of the initial models. Native models showed higher stability in almost all measured properties, including the key ones used for scoring in the Critical Assessment of PRedicted Interaction (CAPRI) competition, namely the positional root mean square deviations and fraction of native contacts from the initial docked model. A random forest classifier was trained, reaching a 0.85 accuracy in correctly distinguishing native from non-native complexes. Reasonably modest simulation lengths of the order of 50-100 ns are sufficient to reach this accuracy, which makes this approach applicable in practice.
Collapse
Affiliation(s)
- Zuzana Jandova
- Computational
Structural Biology Group, Bijvoet Centre for Biomolecular Research,
Faculty of Science—Chemistry, Utrecht
University, Padualaan 8, 3584 CH Utrecht, the Netherlands
| | - Attilio Vittorio Vargiu
- Physics
Department, University of Cagliari, Cittadella
Universitaria, S.P. 8 km 0.700, 09042 Monserrato, Italy
| | - Alexandre M. J. J. Bonvin
- Computational
Structural Biology Group, Bijvoet Centre for Biomolecular Research,
Faculty of Science—Chemistry, Utrecht
University, Padualaan 8, 3584 CH Utrecht, the Netherlands
| |
Collapse
|
56
|
Laine E, Eismann S, Elofsson A, Grudinin S. Protein sequence-to-structure learning: Is this the end(-to-end revolution)? Proteins 2021; 89:1770-1786. [PMID: 34519095 DOI: 10.1002/prot.26235] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 01/08/2023]
Abstract
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, that is, learning on representations such as graphs, three-dimensional (3D) Voronoi tessellations, and point clouds; (ii) pretrained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; and (vi) finally truly end-to-end architectures, that is, differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last 2 years and widely used in CASP14.
Collapse
Affiliation(s)
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
| | - Stephan Eismann
- Department of Computer Science and Applied Physics, Stanford University, Stanford, California, USA
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
| |
Collapse
|
57
|
Ye L, Wu P, Peng Z, Gao J, Liu J, Yang J. Improved estimation of model quality using predicted inter-residue distance. Bioinformatics 2021; 37:3752-3759. [PMID: 34473228 DOI: 10.1093/bioinformatics/btab632] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2021] [Revised: 08/27/2021] [Accepted: 08/31/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Protein model quality assessment (QA) is an essential component in protein structure prediction, which aims to estimate the quality of a structure model and/or select the most accurate model out from a pool of structure models, without knowing the native structure. QA remains a challenging task in protein structure prediction. RESULTS Based on the inter-residue distance predicted by the recent deep learning-based structure prediction algorithm trRosetta, we developed QDistance, a new approach to the estimation of both global and local qualities. QDistance works for both single-model and multi-models inputs. We designed several distance-based features to assess the agreement between the predicted and model-derived inter-residue distances. Together with a few widely used features, they are fed into a simple yet powerful linear regression model to infer the global QA scores. The local QA scores for each structure model are predicted based on a comparative analysis with a set of selected reference models. For multi-models input, the reference models are selected from the input based on the predicted global QA scores. For single-model input, the reference models are predicted by trRosetta. With the informative distance-based features, QDistance can predict the global quality with satisfactory accuracy. Benchmark tests on the CASP13 and the CAMEO structure models suggested that QDistance was competitive other methods. Blind tests in the CASP14 experiments showed that QDistance was robust and ranked among the top predictors. Especially, QDistance was the top 3 local QA method and made the most accurate local QA prediction for unreliable local region. Analysis showed that this superior performance can be attributed to the inclusion of the predicted inter-residue distance. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/QDistance. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Lisha Ye
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| | - Peikun Wu
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| | - Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
| | - Jianzhao Gao
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| | - Jian Liu
- College of Computer Science, Nankai University, Tianjin, 300071, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| |
Collapse
|
58
|
Igashov I, Pavlichenko N, Grudinin S. Spherical convolutions on molecular graphs for protein model quality assessment. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abf856] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
Abstract
Processing information on three-dimensional (3D) objects requires methods stable to rigid-body transformations, in particular rotations, of the input data. In image processing tasks, convolutional neural networks achieve this property using rotation-equivariant operations. However, contrary to images, graphs generally have irregular topology. This makes it challenging to define a rotation-equivariant convolution operation on these structures. In this work, we propose spherical graph convolutional network that processes 3D models of proteins represented as molecular graphs. In a protein molecule, individual amino acids have common topological elements. This allows us to unambiguously associate each amino acid with a local coordinate system and construct rotation-equivariant spherical filters that operate on angular information between graph nodes. Within the framework of the protein model quality assessment problem, we demonstrate that the proposed spherical convolution method significantly improves the quality of model assessment compared to the standard message-passing approach. It is also comparable to state-of-the-art methods, as we demonstrate on critical assessment of structure prediction benchmarks. The proposed technique operates only on geometric features of protein 3D models. This makes it universal and applicable to any other geometric-learning task where the graph structure allows constructing local coordinate systems. The method is available at https://team.inria.fr/nano-d/software/s-gcn/.
Collapse
|
59
|
McGuffin LJ, Aldowsari FMF, Alharbi SMA, Adiyaman R. ModFOLD8: accurate global and local quality estimates for 3D protein models. Nucleic Acids Res 2021; 49:W425-W430. [PMID: 33963867 PMCID: PMC8218196 DOI: 10.1093/nar/gkab321] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Revised: 04/01/2021] [Accepted: 04/21/2021] [Indexed: 11/26/2022] Open
Abstract
Methods for estimating the quality of 3D models of proteins are vital tools for driving the acceptance and utility of predicted tertiary structures by the wider bioscience community. Here we describe the significant major updates to ModFOLD, which has maintained its position as a leading server for the prediction of global and local quality of 3D protein models, over the past decade (>20 000 unique external users). ModFOLD8 is the latest version of the server, which combines the strengths of multiple pure-single and quasi-single model methods. Improvements have been made to the web server interface and there has been successive increases in prediction accuracy, which were achieved through integration of newly developed scoring methods and advanced deep learning-based residue contact predictions. Each version of the ModFOLD server has been independently blind tested in the biennial CASP experiments, as well as being continuously evaluated via the CAMEO project. In CASP13 and CASP14, the ModFOLD7 and ModFOLD8 variants ranked among the top 10 quality estimation methods according to almost every official analysis. Prior to CASP14, ModFOLD8 was also applied for the evaluation of SARS-CoV-2 protein models as part of CASP Commons 2020 initiative. The ModFOLD8 server is freely available at: https://www.reading.ac.uk/bioinf/ModFOLD/.
Collapse
Affiliation(s)
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| | - Fahd M F Aldowsari
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| | - Shuaa M A Alharbi
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| | - Recep Adiyaman
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| |
Collapse
|
60
|
Lechuga A, Kazlauskas D, Salas M, Redrejo-Rodríguez M. Unlimited Cooperativity of Betatectivirus SSB, a Novel DNA Binding Protein Related to an Atypical Group of SSBs From Protein-Primed Replicating Bacterial Viruses. Front Microbiol 2021; 12:699140. [PMID: 34267740 PMCID: PMC8276246 DOI: 10.3389/fmicb.2021.699140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Accepted: 06/08/2021] [Indexed: 11/20/2022] Open
Abstract
Bam35 and related betatectiviruses are tail-less bacteriophages that prey on members of the Bacillus cereus group. These temperate viruses replicate their linear genome by a protein-primed mechanism. In this work, we have identified and characterized the product of the viral ORF2 as a single-stranded DNA binding protein (hereafter B35SSB). B35SSB binds ssDNA with great preference over dsDNA or RNA in a sequence-independent, highly cooperative manner that results in a non-specific stimulation of DNA replication. We have also identified several aromatic and basic residues, involved in base-stacking and electrostatic interactions, respectively, that are required for effective protein-ssDNA interaction. Although SSBs are essential for DNA replication in all domains of life as well as many viruses, they are very diverse proteins. However, most SSBs share a common structural domain, named OB-fold. Protein-primed viruses could constitute an exception, as no OB-fold DNA binding protein has been reported. Based on databases searches as well as phylogenetic and structural analyses, we showed that B35SSB belongs to a novel and independent group of SSBs. This group contains proteins encoded by protein-primed viral genomes from unrelated viruses, spanning betatectiviruses and Φ29 and close podoviruses, and they share a conserved pattern of secondary structure. Sensitive searches and structural predictions indicate that B35SSB contains a conserved domain resembling a divergent OB-fold, which would constitute the first occurrence of an OB-fold-like domain in a protein-primed genome.
Collapse
Affiliation(s)
- Ana Lechuga
- Centro de Biologiìa Molecular Severo Ochoa (CSIC-UAM), Madrid, Spain
| | - Darius Kazlauskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio Av. 7, Vilnius, Lithuania
| | - Margarita Salas
- Centro de Biologiìa Molecular Severo Ochoa (CSIC-UAM), Madrid, Spain
| | - Modesto Redrejo-Rodríguez
- Centro de Biologiìa Molecular Severo Ochoa (CSIC-UAM), Madrid, Spain
- Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM), Madrid, Spain
- Instituto de Investigaciones Biomédicas Alberto Sols (CSIC-UAM), Madrid, Spain
| |
Collapse
|
61
|
Dapkūnas J, Olechnovič K, Venclovas Č. Modeling of protein complexes in CASP14 with emphasis on the interaction interface prediction. Proteins 2021; 89:1834-1843. [PMID: 34176161 PMCID: PMC9292421 DOI: 10.1002/prot.26167] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2021] [Revised: 06/21/2021] [Accepted: 06/23/2021] [Indexed: 01/08/2023]
Abstract
The goal of CASP experiments is to monitor the progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template‐based modeling and free docking. Structural templates were identified using sensitive sequence‐based searches. If sequence‐based searches failed, we performed structure‐based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of our protein structure quality assessment method, VoroMQA, taking into account both interaction interface and global structure scores. If reliable templates could be identified, generally accurate models of protein assemblies were generated with the exception of an antibody‐antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of the top quality, these models had, overall, the best‐predicted intersubunit interfaces according to several accuracy measures. We attribute our relative success primarily to the emphasis on the interaction interface when modeling and scoring.
Collapse
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| |
Collapse
|
62
|
Olechnovič K, Venclovas Č. VoroContacts: a tool for the analysis of interatomic contacts in macromolecular structures. Bioinformatics 2021; 37:4873-4875. [PMID: 34132767 DOI: 10.1093/bioinformatics/btab448] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Revised: 05/03/2021] [Accepted: 06/14/2021] [Indexed: 11/12/2022] Open
Abstract
SUMMARY VoroContacts is a versatile tool for computing and analyzing contact surface areas (CSAs) and solvent accessible surface areas (SASAs) for 3 D structures of proteins, nucleic acids and their complexes at the atomic resolution. CSAs and SASAs are derived using Voronoi tessellation of 3 D structure, represented as a collection of atomic balls. VoroContacts web server features a highly configurable query interface, which enables on-the-fly analysis of contacts for selected set of atoms and allows filtering interatomic contacts by their type, surface areas, distance between contacting atoms and sequence separation between contacting residues. The VoroContacts functionality is also implemented as part of the standalone Voronota package, enabling batch processing. AVAILABILITY AND IMPLEMENTATION https://bioinformatics.lt/wtsam/vorocontacts. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
| |
Collapse
|
63
|
MAP4K4 expression in cardiomyocytes: multiple isoforms, multiple phosphorylations and interactions with striatins. Biochem J 2021; 478:2121-2143. [PMID: 34032269 PMCID: PMC8203206 DOI: 10.1042/bcj20210003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Revised: 05/21/2021] [Accepted: 05/25/2021] [Indexed: 12/02/2022]
Abstract
The Ser/Thr kinase MAP4K4, like other GCKIV kinases, has N-terminal kinase and C-terminal citron homology (CNH) domains. MAP4K4 can activate c-Jun N-terminal kinases (JNKs), and studies in the heart suggest it links oxidative stress to JNKs and heart failure. In other systems, MAP4K4 is regulated in striatin-interacting phosphatase and kinase (STRIPAK) complexes, in which one of three striatins tethers PP2A adjacent to a kinase to keep it dephosphorylated and inactive. Our aim was to understand how MAP4K4 is regulated in cardiomyocytes. The rat MAP4K4 gene was not properly defined. We identified the first coding exon of the rat gene using 5′-RACE, we cloned the full-length sequence and confirmed alternative-splicing of MAP4K4 in rat cardiomyocytes. We identified an additional α-helix C-terminal to the kinase domain important for kinase activity. In further studies, FLAG-MAP4K4 was expressed in HEK293 cells or cardiomyocytes. The Ser/Thr protein phosphatase inhibitor calyculin A (CalA) induced MAP4K4 hyperphosphorylation, with phosphorylation of the activation loop and extensive phosphorylation of the linker between the kinase and CNH domains. This required kinase activity. MAP4K4 associated with myosin in untreated cardiomyocytes, and this was lost with CalA-treatment. FLAG-MAP4K4 associated with all three striatins in cardiomyocytes, indicative of regulation within STRIPAK complexes and consistent with activation by CalA. Computational analysis suggested the interaction was direct and mediated via coiled-coil domains. Surprisingly, FLAG-MAP4K4 inhibited JNK activation by H2O2 in cardiomyocytes and increased myofibrillar organisation. Our data identify MAP4K4 as a STRIPAK-regulated kinase in cardiomyocytes, and suggest it regulates the cytoskeleton rather than activates JNKs.
Collapse
|
64
|
Wang X, Flannery ST, Kihara D. Protein Docking Model Evaluation by Graph Neural Networks. Front Mol Biosci 2021; 8:647915. [PMID: 34113650 PMCID: PMC8185212 DOI: 10.3389/fmolb.2021.647915] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2020] [Accepted: 04/26/2021] [Indexed: 12/03/2022] Open
Abstract
Physical interactions of proteins play key functional roles in many important cellular processes. To understand molecular mechanisms of such functions, it is crucial to determine the structure of protein complexes. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed for predicting the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning-based approach named Graph Neural Network-based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph, respectively. GNN-DOVE was trained, validated, and tested on docking models in the Dockground database and further tested on a combined dataset of Dockground and ZDOCK benchmark as well as a CAPRI scoring dataset. GNN-DOVE performed better than existing methods, including DOVE, which is our previous development that uses a convolutional neural network on voxelized structure models.
Collapse
Affiliation(s)
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
| | - Sean T. Flannery
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
| |
Collapse
|
65
|
Takei Y, Ishida T. P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features. Bioengineering (Basel) 2021; 8:bioengineering8030040. [PMID: 33808604 PMCID: PMC8003382 DOI: 10.3390/bioengineering8030040] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 11/16/2022] Open
Abstract
Model quality assessment (MQA), which selects near-native structures from structure models, is an important process in protein tertiary structure prediction. The three-dimensional convolution neural network (3DCNN) was applied to the task, but the performance was comparable to existing methods because it used only atom-type features as the input. Thus, we added sequence profile-based features, which are also used in other methods, to improve the performance. We developed a single-model MQA method for protein structures based on 3DCNN using sequence profile-based features, namely, P3CMQA. Performance evaluation using a CASP13 dataset showed that profile-based features improved the assessment performance, and the proposed method was better than currently available single-model MQA methods, including the previous 3DCNN-based method. We also implemented a web-interface of the method to make it more user-friendly.
Collapse
Affiliation(s)
- Yuma Takei
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan;
- Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan;
- Correspondence:
| |
Collapse
|
66
|
Shuvo MH, Bhattacharya S, Bhattacharya D. QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks. Bioinformatics 2021; 36:i285-i291. [PMID: 32657397 PMCID: PMC7355297 DOI: 10.1093/bioinformatics/btaa455] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction. RESULTS We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide-range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep. AVAILABILITY AND IMPLEMENTATION https://github.com/Bhattacharya-Lab/QDeep. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA.,Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| |
Collapse
|
67
|
Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat Commun 2021; 12:1340. [PMID: 33637700 PMCID: PMC7910447 DOI: 10.1038/s41467-021-21511-x] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2020] [Accepted: 01/18/2021] [Indexed: 11/22/2022] Open
Abstract
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.
Collapse
Affiliation(s)
- Naozumi Hiranuma
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Washington, WA, USA
| | - Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Minkyung Baek
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Justas Dauparas
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Washington, WA, USA.
| |
Collapse
|
68
|
Igashov I, Olechnovič L, Kadukova M, Venclovas Č, Grudinin S. VoroCNN: Deep convolutional neural network built on 3D Voronoi tessellation of protein structures. Bioinformatics 2021; 37:2332-2339. [PMID: 33620450 DOI: 10.1093/bioinformatics/btab118] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2020] [Revised: 01/08/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Effective use of evolutionary information has recently led to tremendous progress in computational prediction of three-dimensional (3D) structures of proteins and their complexes. Despite the progress, the accuracy of predicted structures tends to vary considerably from case to case. Since the utility of computational models depends on their accuracy, reliable estimates of deviation between predicted and native structures are of utmost importance. RESULTS For the first time, we present a deep convolutional neural network (CNN) constructed on a Voronoi tessellation of 3D molecular structures. Despite the irregular data domain, our data representation allows us to efficiently introduce both convolution and pooling operations and train the network in an end-to-end fashion without precomputed descriptors. The resultant model, VoroCNN, predicts local qualities of 3D protein folds. The prediction results are competitive to state of the art and superior to the previous 3D CNN architectures built for the same task. We also discuss practical applications of VoroCNN, for example, in recognition of protein binding interfaces. AVAILABILITY The model, data, and evaluation tests are available at https://team.inria.fr/nano-d/software/vorocnn/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ilia Igashov
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.,Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
| | - Liment Olechnovič
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
| | - Maria Kadukova
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.,Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
| | - Česlovas Venclovas
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
| | - Sergei Grudinin
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| |
Collapse
|
69
|
Sedova M, Jaroszewski L, Iyer M, Li Z, Godzik A. ModFlex: Towards Function Focused Protein Modeling. J Mol Biol 2021; 433:166828. [PMID: 33972023 DOI: 10.1016/j.jmb.2021.166828] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 01/07/2021] [Accepted: 01/09/2021] [Indexed: 11/19/2022]
Abstract
There is a wide, and continuously widening, gap between the number of proteins known only by their amino acid sequence versus those structurally characterized by direct experiment. To close this gap, we mostly rely on homology-based inference and modeling to reason about the structures of the uncharacterized proteins by using structures of homologous proteins as templates. With the rapidly growing size of the Protein Data Bank, there are often multiple choices of templates, including multiple sets of coordinates from the same protein. The substantial conformational differences observed between different experimental structures of the same protein often reflect function related structural flexibility. Thus, depending on the questions being asked, using distant homologs, or coordinate sets with lower resolution but solved in the appropriate functional form, as templates may be more informative. The ModFlex server (https://modflex.org/) addresses this seldom mentioned gap in the standard homology modeling approach by providing the user with an interface with multiple options and tools to select the most relevant template and explore the range of structural diversity in the available templates. ModFlex is closely integrated with a range of other programs and servers developed in our group for the analysis and visualization of protein structural flexibility and divergence.
Collapse
Affiliation(s)
- Mayya Sedova
- University of California Riverside School of Medicine, Biosciences Division, Riverside, CA, United States
| | - Lukasz Jaroszewski
- University of California Riverside School of Medicine, Biosciences Division, Riverside, CA, United States
| | - Mallika Iyer
- Graduate School of Biomedical Sciences, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, United States
| | - Zhanwen Li
- University of California Riverside School of Medicine, Biosciences Division, Riverside, CA, United States
| | - Adam Godzik
- University of California Riverside School of Medicine, Biosciences Division, Riverside, CA, United States.
| |
Collapse
|
70
|
Jing X, Xu J. Improved Protein Model Quality Assessment By Integrating Sequential And Pairwise Features Using Deep Learning. Bioinformatics 2020; 36:5361-5367. [PMID: 33325480 PMCID: PMC8016469 DOI: 10.1093/bioinformatics/btaa1037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2020] [Revised: 11/27/2020] [Accepted: 12/06/2020] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION Accurately estimating protein model quality in the absence of experimental structure is not only important for model evaluation and selection, but also useful for model refinement. Progress has been steadily made by introducing new features and algorithms (especially deep neural networks), but the accuracy of quality assessment (QA) is still not very satisfactory, especially local QA on hard protein targets. RESULTS We propose a new single-model-based QA method ResNetQA for both local and global quality assessment. Our method predicts model quality by integrating sequential and pairwise features using a deep neural network composed of both 1 D and 2 D convolutional residual neural networks (ResNet). The 2 D ResNet module extracts useful information from pairwise features such as model-derived distance maps, co-evolution information, and predicted distance potential from sequences. The 1 D ResNet is used to predict local (global) model quality from sequential features and pooled pairwise information generated by 2 D ResNet. Tested on the CASP12 and CASP13 datasets, our experimental results show that our method greatly outperforms existing state-of-the-art methods. Our ablation studies indicate that the 2 D ResNet module and pairwise features play an important role in improving model quality assessment. AVAILABILITY https://github.com/AndersJing/ResNetQA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Xiaoyang Jing
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
| |
Collapse
|
71
|
Intermolecular Interactions in Crystal Structures of Imatinib-Containing Compounds. Int J Mol Sci 2020; 21:ijms21238970. [PMID: 33255944 PMCID: PMC7731260 DOI: 10.3390/ijms21238970] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Revised: 11/23/2020] [Accepted: 11/24/2020] [Indexed: 02/07/2023] Open
Abstract
Imatinib, one of the most used therapeutic agents to treat leukemia, is an inhibitor that specifically blocks the activity of tyrosine kinases. The molecule of imatinib is flexible and contains several functional groups able to take part in H-bonding and hydrophobic interactions. Analysis of molecular conformations for this drug was carried out using density functional theory calculations of rotation potentials along single bonds and by analyzing crystal structures of imatinib-containing compounds taken from the Cambridge Structural Database and the Protein Data Bank. Rotation along the N-C bond in the region of the amide group was found to be the reason for two relatively stable molecular conformations, an extended and a folded one. The role of various types of intermolecular interactions in stabilization of the particular molecular conformation was studied in terms of (i) the likelihood of H-bond formation, and (ii) their contribution to the Voronoi molecular surface. It is shown that experimentally observed hydrogen bonds are in accord with the likelihood of their formation. The number of H-bonds in ligand-receptor complexes surpasses that in imatinib salts due to the large number of donors and acceptors of H-bonding within the binding pocket of tyrosine kinases. Contribution of hydrophilic intermolecular interactions to the Voronoi molecular surface is similar for both conformations, while π...π stacking is more typical for the folded conformation of imatinib.
Collapse
|
72
|
Carriles AA, Mills A, Muñoz-Alonso MJ, Gutiérrez D, Domínguez JM, Hermoso JA, Gago F. Structural Cues for Understanding eEF1A2 Moonlighting. Chembiochem 2020; 22:374-391. [PMID: 32875694 DOI: 10.1002/cbic.202000516] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2020] [Revised: 09/01/2020] [Indexed: 12/16/2022]
Abstract
Spontaneous mutations in the EEF1A2 gene cause epilepsy and severe neurological disabilities in children. The crystal structure of eEF1A2 protein purified from rabbit skeletal muscle reveals a post-translationally modified dimer that provides information about the sites of interaction with numerous binding partners, including itself, and maps these mutations onto the dimer and tetramer interfaces. The spatial locations of the side chain carboxylates of Glu301 and Glu374, to which phosphatidylethanolamine is uniquely attached via an amide bond, define the anchoring points of eEF1A2 to cellular membranes and interorganellar membrane contact sites. Additional bioinformatic and molecular modeling results provide novel structural insight into the demonstrated binding of eEF1A2 to SH3 domains, the common MAPK docking groove, filamentous actin, and phosphatidylinositol-4 kinase IIIβ. In this new light, the role of eEF1A2 as an ancient, multifaceted, and articulated G protein at the crossroads of autophagy, oncogenesis and viral replication appears very distant from the "canonical" one of delivering aminoacyl-tRNAs to the ribosome that has dominated the scene and much of the thinking for many decades.
Collapse
Affiliation(s)
- Alejandra A Carriles
- Department of Crystallography and Structural Biology, Institute of Physical-Chemistry "Rocasolano" CSIC, 28006, Madrid, Spain.,Biocrystallography Unit, Division of Immunology, Transplantation, and Infectious Diseases, IRCCS Scientific Institute San Raffaele, 20132, Milan, Italy
| | - Alberto Mills
- Department of Biomedical Sciences and "Unidad Asociada IQM-CSIC", School of Medicine and Health Sciences, University of Alcalá, 28805, Alcalá de Henares, Madrid, Spain
| | - María-José Muñoz-Alonso
- Department of Cell Biology and Pharmacogenomics, PharmaMar S.A.U., 28770, Colmenar Viejo, Madrid, Spain
| | - Dolores Gutiérrez
- Proteomics Unit, Faculty of Pharmacy, Complutense University, 28040, Madrid, Spain
| | - Juan M Domínguez
- Department of Cell Biology and Pharmacogenomics, PharmaMar S.A.U., 28770, Colmenar Viejo, Madrid, Spain
| | - Juan A Hermoso
- Department of Crystallography and Structural Biology, Institute of Physical-Chemistry "Rocasolano" CSIC, 28006, Madrid, Spain
| | - Federico Gago
- Department of Biomedical Sciences and "Unidad Asociada IQM-CSIC", School of Medicine and Health Sciences, University of Alcalá, 28805, Alcalá de Henares, Madrid, Spain
| |
Collapse
|
73
|
Wang X, Terashi G, Christoffer CW, Zhu M, Kihara D. Protein docking model evaluation by 3D deep convolutional neural networks. Bioinformatics 2020; 36:2113-2118. [PMID: 31746961 DOI: 10.1093/bioinformatics/btz870] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Revised: 08/25/2019] [Accepted: 11/19/2019] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION Many important cellular processes involve physical interactions of proteins. Therefore, determining protein quaternary structures provide critical insights for understanding molecular mechanisms of functions of the complexes. To complement experimental methods, many computational methods have been developed to predict structures of protein complexes. One of the challenges in computational protein complex structure prediction is to identify near-native models from a large pool of generated models. RESULTS We developed a convolutional deep neural network-based approach named DOcking decoy selection with Voxel-based deep neural nEtwork (DOVE) for evaluating protein docking models. To evaluate a protein docking model, DOVE scans the protein-protein interface of the model with a 3D voxel and considers atomic interaction types and their energetic contributions as input features applied to the neural network. The deep learning models were trained and validated on docking models available in the ZDock and DockGround databases. Among the different combinations of features tested, almost all outperformed existing scoring functions. AVAILABILITY AND IMPLEMENTATION Codes available at http://github.com/kiharalab/DOVE, http://kiharalab.org/dove/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
| | | | - Mengmeng Zhu
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA.,Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
| |
Collapse
|
74
|
Studer G, Rempfer C, Waterhouse AM, Gumienny R, Haas J, Schwede T. QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics 2020; 36:1765-1771. [PMID: 31697312 PMCID: PMC7075525 DOI: 10.1093/bioinformatics/btz828] [Citation(s) in RCA: 418] [Impact Index Per Article: 104.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Revised: 10/24/2019] [Accepted: 11/06/2019] [Indexed: 01/13/2023] Open
Abstract
Motivation Methods that estimate the quality of a 3D protein structure model in absence of an experimental reference structure are crucial to determine a model’s utility and potential applications. Single model methods assess individual models whereas consensus methods require an ensemble of models as input. In this work, we extend the single model composite score QMEAN that employs statistical potentials of mean force and agreement terms by introducing a consensus-based distance constraint (DisCo) score. Results DisCo exploits distance distributions from experimentally determined protein structures that are homologous to the model being assessed. Feed-forward neural networks are trained to adaptively weigh contributions by the multi-template DisCo score and classical single model QMEAN parameters. The result is the composite score QMEANDisCo, which combines the accuracy of consensus methods with the broad applicability of single model approaches. We also demonstrate that, despite being the de-facto standard for structure prediction benchmarking, CASP models are not the ideal data source to train predictive methods for model quality estimation. For performance assessment, QMEANDisCo is continuously benchmarked within the CAMEO project and participated in CASP13. For both, it ranks among the top performers and excels with low response times. Availability and implementation QMEANDisCo is available as web-server at https://swissmodel.expasy.org/qmean. The source code can be downloaded from https://git.scicore.unibas.ch/schwede/QMEAN. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel, Basel 4056, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Christine Rempfer
- Biozentrum, University of Basel, Basel 4056, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Andrew M Waterhouse
- Biozentrum, University of Basel, Basel 4056, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Rafal Gumienny
- Biozentrum, University of Basel, Basel 4056, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Juergen Haas
- Biozentrum, University of Basel, Basel 4056, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel 4056, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| |
Collapse
|
75
|
Grigas AT, Mei Z, Treado JD, Levine ZA, Regan L, O'Hern CS. Using physical features of protein core packing to distinguish real proteins from decoys. Protein Sci 2020; 29:1931-1944. [PMID: 32710566 PMCID: PMC7454528 DOI: 10.1002/pro.3914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Revised: 07/10/2020] [Accepted: 07/20/2020] [Indexed: 01/06/2023]
Abstract
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.
Collapse
Affiliation(s)
- Alex T. Grigas
- Graduate Program in Computational Biology and BioinformaticsYale UniversityNew HavenConnecticutUSA
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
| | - Zhe Mei
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
- Department of ChemistryYale UniversityNew HavenConnecticutUSA
| | - John D. Treado
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
- Department of Mechanical Engineering and Materials ScienceYale UniversityNew HavenConnecticutUSA
| | - Zachary A. Levine
- Department of PathologyYale UniversityNew HavenConnecticutUSA
- Department of Molecular Biophysics and BiochemistryYale UniversityNew HavenConnecticutUSA
| | - Lynne Regan
- Institute of Quantitative Biology, Biochemistry and Biotechnology, Centre for Synthetic and Systems Biology, School of Biological SciencesUniversity of EdinburghEdinburghUK
| | - Corey S. O'Hern
- Graduate Program in Computational Biology and BioinformaticsYale UniversityNew HavenConnecticutUSA
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
- Department of Mechanical Engineering and Materials ScienceYale UniversityNew HavenConnecticutUSA
- Department of PhysicsYale UniversityNew HavenConnecticutUSA
- Department of Applied PhysicsYale UniversityNew HavenConnecticutUSA
| |
Collapse
|
76
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. From the comparison results between energy terms obtained from our encoding and Amber, we find energy difference within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same work flow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy. Through inclusion of best decoy to decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein both on accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and force field parameters were also tested and the comparison results suggest that both the RF algorithm and force field potentials are important with the ML scoring function achieving its best performance only by combining them together. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
Collapse
Affiliation(s)
- Jun Pei
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Lin Frank Song
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Kenneth M Merz
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| |
Collapse
|
77
|
Liu T, Wang Z. MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials. BMC Bioinformatics 2020; 21:246. [PMID: 32631256 PMCID: PMC7336608 DOI: 10.1186/s12859-020-3383-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Accepted: 01/22/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models are not available, methods that only need a single model as input are indispensable. RESULTS We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved comparable performances with the leading QA methods in CASP12 and CASP13. CONCLUSIONS MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/ .
Collapse
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA.
| |
Collapse
|
78
|
Wang W, Wang J, Xu D, Shang Y. Two New Heuristic Methods for Protein Model Quality Assessment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1430-1439. [PMID: 30418914 PMCID: PMC8988942 DOI: 10.1109/tcbb.2018.2880202] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Protein tertiary structure prediction is an important open challenge in bioinformatics and requires effective methods to accurately evaluate the quality of protein 3-D models generated computationally. Many quality assessment (QA) methods have been proposed over the past three decades. However, the accuracy or robustness is unsatisfactory for practical applications. In this paper, two new heuristic QA methods are proposed: MUfoldQA_S and MUfoldQA_C. The MUfoldQA_S is a quasi-single-model QA method that assesses the model quality based on the known protein structures with similar sequences. This algorithm can be directly applied to protein fragments without the necessity of building a full structural model. A BLOSUM-based heuristic is also introduced to help differentiate accurate templates from poor ones. In MUfoldQA_C, the ideas from MUfoldQA_S were combined with the consensus approach to create a multi-model QA method that could also utilize information from existing reference models and have demonstrated improved performance. Extensive experimental results of these two methods have shown significant improvement over existing methods. In addition, both methods have been blindly tested in the CASP12 world-wide competition in the protein structure prediction field and ranked as top performers in their respective categories.
Collapse
|
79
|
Rosell M, Fernández-Recio J. Docking approaches for modeling multi-molecular assemblies. Curr Opin Struct Biol 2020; 64:59-65. [PMID: 32615514 PMCID: PMC7324114 DOI: 10.1016/j.sbi.2020.05.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Revised: 05/13/2020] [Accepted: 05/21/2020] [Indexed: 12/12/2022]
Abstract
Computational docking approaches aim to overcome the limited availability of experimental structural data on protein-protein interactions, which are key in biology. The field is rapidly moving from the traditional docking methodologies for modeling of binary complexes to more integrative approaches using template-based, data-driven modeling of multi-molecular assemblies. We will review here the predictive capabilities of current docking methods in blind conditions, based on the results from the most recent community-wide blind experiments. Integration of template-based and ab initio docking approaches is emerging as the optimal strategy for modeling protein complexes and multimolecular assemblies. We will also review the new methodological advances on ab initio docking and integrative modeling.
Collapse
Affiliation(s)
- Mireia Rosell
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de La Rioja - Gobierno de La Rioja, 26007 Logroño, Spain
| | - Juan Fernández-Recio
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de La Rioja - Gobierno de La Rioja, 26007 Logroño, Spain.
| |
Collapse
|
80
|
Miranda MRA, Uchôa AF, Ferreira SR, Ventury KE, Costa EP, Carmo PRL, Machado OLT, Fernandes KVS, Amancio Oliveira AE. Chemical Modifications of Vicilins Interfere with Chitin-Binding Affinity and Toxicity to Callosobruchus maculatus (Coleoptera: Chrysomelidae) Insect: A Combined In Vitro and In Silico Analysis. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2020; 68:5596-5605. [PMID: 32343573 DOI: 10.1021/acs.jafc.9b08034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Vicilins are related to cowpea seed resistance toward Callosobruchus maculatus due to their ability to bind to chitinous structures lining larval midgut. However, this binding mechanism is not fully understood. Here, we identified chitin binding sites and investigated how in vitro and in silico chemical modifications interfere with vicilin chitin binding and insect toxicity. In vitro assays showed that unmodified vicilin strongly binds to chitin matrices, mainly with acetylated chitin. Chemical modifications of specific amino acids (tryptophan, lysine, tyrosine), as well as glutaraldehyde cross-linking, decreased the evaluated parameters. In silico analyses identified at least one chitin binding site in vicilin monomer, the region between Arg208 and Lys216, which bears the sequence REGIRELMK and forms an α helix, exposed in the 3D structure. In silico modifications of Lys223 (acetylated at its terminal nitrogen) and Trp316 (iodinated to 7-iodine-L-tryptophan or oxidized to β-oxy-indolylalanine) decreased vicilin chitin binding affinity. Glucose, sucrose, and N-acetylglucosamine also interfered with vicilin chitin binding affinity.
Collapse
Affiliation(s)
- Maria Raquel A Miranda
- Departamento de Bioquímica, Centro de Ciências, Universidade Federal do Ceará (UFC), Fortaleza Ceará 60440554, Brazil
| | - Adriana F Uchôa
- Departamento de Biologia Celular e Genética, Centro de Biociências, Universidade Federal do Rio Grande do Norte, Natal, Rio Grande do Norte 59072970, Brazil
| | - Sarah R Ferreira
- Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
| | - Kayan E Ventury
- Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
| | - Evenilton P Costa
- Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
| | - Paulo R Leitão Carmo
- NUPEN, Universidade Federal do Rio de Janeiro (UFRJ) Macaé, Rio de Janeiro 27965-045, Brazil
| | - Olga L T Machado
- Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
| | - Katia V S Fernandes
- Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
| | - Antonia Elenir Amancio Oliveira
- Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
| |
Collapse
|
81
|
Olechnovič K, Venclovas Č. VoroMQA web server for assessing three-dimensional structures of proteins and protein complexes. Nucleic Acids Res 2020; 47:W437-W442. [PMID: 31073605 PMCID: PMC6602437 DOI: 10.1093/nar/gkz367] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2019] [Revised: 04/19/2019] [Accepted: 05/05/2019] [Indexed: 01/12/2023] Open
Abstract
The VoroMQA (Voronoi tessellation-based Model Quality Assessment) web server is dedicated to the estimation of protein structure quality, a common step in selecting realistic and most accurate computational models and in validating experimental structures. As an input, the VoroMQA web server accepts one or more protein structures in PDB format. Input structures may be either monomeric proteins or multimeric protein complexes. For every input structure, the server provides both global and local (per-residue) scores. Visualization of the local scores along the protein chain is enhanced by providing secondary structure assignment and information on solvent accessibility. A unique feature of the VoroMQA server is the ability to directly assess protein-protein interaction interfaces. If this type of assessment is requested, the web server provides interface quality scores, interface energy estimates, and local scores for residues involved in inter-chain interfaces. VoroMQA, the underlying method of the web server, was extensively tested in recent community-wide CASP and CAPRI experiments. During these experiments VoroMQA showed outstanding performance both in model selection and in estimation of accuracy of local structural regions. The VoroMQA web server is available at http://bioinformatics.ibt.lt/wtsam/voromqa.
Collapse
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
| |
Collapse
|
82
|
McGuffin LJ, Adiyaman R, Maghrabi AHA, Shuid AN, Brackenridge DA, Nealon JO, Philomina LS. IntFOLD: an integrated web resource for high performance protein structure and function prediction. Nucleic Acids Res 2020; 47:W408-W413. [PMID: 31045208 PMCID: PMC6602432 DOI: 10.1093/nar/gkz322] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2019] [Revised: 04/05/2019] [Accepted: 04/23/2019] [Indexed: 12/14/2022] Open
Abstract
The IntFOLD server provides a unified resource for the automated prediction of: protein tertiary structures with built-in estimates of model accuracy (EMA), protein structural domain boundaries, natively unstructured or disordered regions in proteins, and protein–ligand interactions. The component methods have been independently evaluated via the successive blind CASP experiments and the continual CAMEO benchmarking project. The IntFOLD server has established its ranking as one of the best performing publicly available servers, based on independent official evaluation metrics. Here, we describe significant updates to the server back end, where we have focused on performance improvements in tertiary structure predictions, in terms of global 3D model quality and accuracy self-estimates (ASE), which we achieve using our newly improved ModFOLD7_rank algorithm. We also report on various upgrades to the front end including: a streamlined submission process, enhanced visualization of models, new confidence scores for ranking, and links for accessing all annotated model data. Furthermore, we now include an option for users to submit selected models for further refinement via convenient push buttons. The IntFOLD server is freely available at: http://www.reading.ac.uk/bioinf/IntFOLD/.
Collapse
Affiliation(s)
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| | - Recep Adiyaman
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| | - Ali H A Maghrabi
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| | - Ahmad N Shuid
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK.,Infectomics cluster, Advanced Medical and Dental Institute, University of Science, Malaysia, Bertam, 13200, Kepala Batas, Pulau Pinang, Malaysia
| | | | - John O Nealon
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| | - Limcy S Philomina
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
| |
Collapse
|
83
|
Chen J, Siu SWI. Machine Learning Approaches for Quality Assessment of Protein Structures. Biomolecules 2020; 10:biom10040626. [PMID: 32316682 PMCID: PMC7226485 DOI: 10.3390/biom10040626] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Revised: 04/07/2020] [Accepted: 04/09/2020] [Indexed: 11/16/2022] Open
Abstract
Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determinations of protein structure are prohibitively costly and time-consuming, and computational predictions of protein structures have not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA by machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by their employed ML approach-support vector machine, artificial neural networks, ensemble learning, or Bayesian learning-and their significances are discussed from a methodology viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML/DL techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.
Collapse
|
84
|
da Silva FCV, Pessoa Costa E, Moreira Gomes V, de Oliveira Carvalho A. Inhibition mechanism of human salivary α-amylase by lipid transfer protein from Vigna unguiculata. Comput Biol Chem 2020; 85:107193. [DOI: 10.1016/j.compbiolchem.2019.107193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2019] [Revised: 12/06/2019] [Accepted: 12/11/2019] [Indexed: 01/09/2023]
|
85
|
Olechnovič K, Venclovas Č. Contact Area-Based Structural Analysis of Proteins and Their Complexes Using CAD-Score. Methods Mol Biol 2020; 2112:75-90. [PMID: 32006279 DOI: 10.1007/978-1-0716-0270-6_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Quantifying discrepancies between computationally derived and native (reference) structures is an essential step in the development and comparison of protein modeling and protein-protein docking methods. Measuring conformational differences of proteins or protein complexes is also important in other areas of structural biology such as molecular dynamics and crystallography. There are multiple scores to do that. However, nearly all of them, whether superposition-based (e.g., RMSD) or superposition-free, use distances to measure similarity. CAD-score is conceptually different as it uses physical contacts represented as contact areas. Such representation makes it possible to quantify differences of both structures and surfaces (e.g., protein-protein interfaces and binding sites) using the same framework. A number of studies have found CAD-score to be among the most robust scores. The method is implemented both as a web server and as standalone software available at http://bioinformatics.lt/software/cad-score . Here, we describe how to use the standalone CAD-score software for comparison and analysis of protein structures, interfaces, and binding sites.
Collapse
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania.
| |
Collapse
|
86
|
Abstract
There is a large gap between the numbers of known protein-protein interactions and the corresponding experimentally solved structures of protein complexes. Fortunately, this gap can be in part bridged by computational structure modeling methods. Currently, template-based modeling is the most accurate means to predict both individual protein structures and protein complexes. One of the major issues in template-based modeling is to identify homologous structures that could be utilized as templates. To simplify this task, we have developed the PPI3D web server. The server is not only able to search for homologous protein complexes, but also provides means to analyze identified interactions and to model protein complexes. In recent CASP and CAPRI experiments, PPI3D proved to be a useful tool for homology modeling of multimeric proteins. In this chapter, we provide a brief description of the PPI3D web server capabilities and how to use the server for modeling of protein complexes.
Collapse
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania.
| |
Collapse
|
87
|
Abstract
Assessing the accuracy of 3D models has become a keystone in the protein structure prediction field. ModFOLD7 is our leading resource for Estimates of Model Accuracy (EMA), which has been upgraded by integrating a number of the pioneering pure-single- and quasi-single-model approaches. Such an integration has given our latest version the strengths to accurately score and rank predicted models, with higher consistency compared to older EMA methods. Additionally, the server provides three options for producing global score estimates, depending on the requirements of the user: (1) ModFOLD7_rank, which is optimized for ranking/selection, (2) ModFOLD7_cor, which is optimized for correlations of predicted and observed scores, and (3) ModFOLD7 global for balanced performance. ModFOLD7 has been ranked among the top few EMA methods according to independent blind testing by the CASP13 assessors. Another evaluation resource for ModFOLD7 is the CAMEO project, where the method is continuously automatically evaluated, showing a significant improvement compared to our previous versions. The ModFOLD7 server is freely available at http://www.reading.ac.uk/bioinf/ModFOLD/ .
Collapse
Affiliation(s)
- Ali H A Maghrabi
- School of Biological Sciences, University of Reading, Reading, Berkshire, UK
| | - Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, Berkshire, UK.
| |
Collapse
|
88
|
Lensink MF, Brysbaert G, Nadzirin N, Velankar S, Chaleil RAG, Gerguri T, Bates PA, Laine E, Carbone A, Grudinin S, Kong R, Liu RR, Xu XM, Shi H, Chang S, Eisenstein M, Karczynska A, Czaplewski C, Lubecka E, Lipska A, Krupa P, Mozolewska M, Golon Ł, Samsonov S, Liwo A, Crivelli S, Pagès G, Karasikov M, Kadukova M, Yan Y, Huang SY, Rosell M, Rodríguez-Lumbreras LA, Romero-Durana M, Díaz-Bueno L, Fernandez-Recio J, Christoffer C, Terashi G, Shin WH, Aderinwale T, Subraman SRMV, Kihara D, Kozakov D, Vajda S, Porter K, Padhorny D, Desta I, Beglov D, Ignatov M, Kotelnikov S, Moal IH, Ritchie DW, de Beauchêne IC, Maigret B, Devignes MD, Echartea MER, Barradas-Bautista D, Cao Z, Cavallo L, Oliva R, Cao Y, Shen Y, Baek M, Park T, Woo H, Seok C, Braitbard M, Bitton L, Scheidman-Duhovny D, Dapkūnas J, Olechnovič K, Venclovas Č, Kundrotas PJ, Belkin S, Chakravarty D, Badal VD, Vakser IA, Vreven T, Vangaveti S, Borrman T, Weng Z, Guest JD, Gowthaman R, Pierce BG, Xu X, Duan R, Qiu L, Hou J, Merideth BR, Ma Z, Cheng J, Zou X, Koukos PI, Roel-Touris J, Ambrosetti F, Geng C, Schaarschmidt J, Trellet ME, Melquiond ASJ, Xue L, Jiménez-García B, van Noort CW, Honorato RV, Bonvin AMJJ, Wodak SJ. Blind prediction of homo- and hetero-protein complexes: The CASP13-CAPRI experiment. Proteins 2019; 87:1200-1221. [PMID: 31612567 PMCID: PMC7274794 DOI: 10.1002/prot.25838] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/28/2022]
Abstract
We present the results for CAPRI Round 46, the third joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of 20 targets including 14 homo-oligomers and 6 heterocomplexes. Eight of the homo-oligomer targets and one heterodimer comprised proteins that could be readily modeled using templates from the Protein Data Bank, often available for the full assembly. The remaining 11 targets comprised 5 homodimers, 3 heterodimers, and two higher-order assemblies. These were more difficult to model, as their prediction mainly involved "ab-initio" docking of subunit models derived from distantly related templates. A total of ~30 CAPRI groups, including 9 automatic servers, submitted on average ~2000 models per target. About 17 groups participated in the CAPRI scoring rounds, offered for most targets, submitting ~170 models per target. The prediction performance, measured by the fraction of models of acceptable quality or higher submitted across all predictors groups, was very good to excellent for the nine easy targets. Poorer performance was achieved by predictors for the 11 difficult targets, with medium and high quality models submitted for only 3 of these targets. A similar performance "gap" was displayed by scorer groups, highlighting yet again the unmet challenge of modeling the conformational changes of the protein components that occur upon binding or that must be accounted for in template-based modeling. Our analysis also indicates that residues in binding interfaces were less well predicted in this set of targets than in previous Rounds, providing useful insights for directions of future improvements.
Collapse
Affiliation(s)
- Marc F. Lensink
- University of Lille, CNRS UMR8576 UGSF, Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Guillaume Brysbaert
- University of Lille, CNRS UMR8576 UGSF, Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Nurul Nadzirin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | | | - Tereza Gerguri
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
| | - Paul A. Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
| | - Elodie Laine
- CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Sorbonne Université, Paris, France
| | - Alessandra Carbone
- CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Sorbonne Université, Paris, France
- Institut Universitaire de France (IUF), Paris, France
| | - Sergei Grudinin
- Université Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, Grenoble, France
| | - Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Ran-Ran Liu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xi-Ming Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Hang Shi
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Miriam Eisenstein
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | | | | | - Emilia Lubecka
- Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Gdańsk, Poland
| | | | - Paweł Krupa
- Polish Academy of Sciences, Institute of Physics, Warsaw, Poland
| | | | - Łukasz Golon
- Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
| | | | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | | | - Guillaume Pagès
- Université Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, Grenoble, France
| | | | - Maria Kadukova
- Université Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, Grenoble, France
- Moscow Institute of Physics and Technology, Dolgoprudniy, Russia
| | - Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Mireia Rosell
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (ICVV-CSIC), Logroño, Spain
| | - Luis A. Rodríguez-Lumbreras
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (ICVV-CSIC), Logroño, Spain
| | | | | | - Juan Fernandez-Recio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (ICVV-CSIC), Logroño, Spain
- Instituto de Biología Molecular de Barcelona (IBMB-CSIC), Barcelona, Spain
| | | | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Woong-Hee Shin
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, Indiana
| | | | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana
| | - Dima Kozakov
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Department of Chemistry, Boston University, Boston, Massachusetts
| | - Kathryn Porter
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
| | - Dzmitry Padhorny
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
| | - Israel Desta
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
| | - Dmitri Beglov
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
| | - Mikhail Ignatov
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
| | - Sergey Kotelnikov
- Moscow Institute of Physics and Technology, Dolgoprudniy, Russia
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
| | - Iain H. Moal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | | | | | | | | | | | - Didier Barradas-Bautista
- Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Zhen Cao
- Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Luigi Cavallo
- Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Romina Oliva
- Department of Sciences and Technologies, University of Naples “Parthenope”, Napoli, Italy
| | - Yue Cao
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas
| | - Minkyung Baek
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Taeyong Park
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Hyeonuk Woo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Merav Braitbard
- Department of Biological Chemistry, Institute of Live Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Lirane Bitton
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Dina Scheidman-Duhovny
- Department of Biological Chemistry, Institute of Live Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Petras J. Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas
| | - Saveliy Belkin
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas
| | - Devlina Chakravarty
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas
| | - Varsha D. Badal
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas
| | - Ilya A. Vakser
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas
| | - Thom Vreven
- Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts
| | - Sweta Vangaveti
- Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts
| | - Tyler Borrman
- Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts
| | - Zhiping Weng
- Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts
| | - Johnathan D. Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Ragul Gowthaman
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Brian G. Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Xianjin Xu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri
| | - Rui Duan
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri
| | - Liming Qiu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri
| | - Jie Hou
- Department of Computer Science, University of Missouri, Columbia, Missouri
| | - Benjamin Ryan Merideth
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri
- Informatics Institute, University of Missouri, Columbia, Missouri
| | - Zhiwei Ma
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri
| | - Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, Missouri
- Informatics Institute, University of Missouri, Columbia, Missouri
| | - Xiaoqin Zou
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri
- Informatics Institute, University of Missouri, Columbia, Missouri
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri
- Department of Biochemistry, University of Missouri, Columbia, Missouri
| | - Panagiotis I. Koukos
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Jorge Roel-Touris
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Francesco Ambrosetti
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Cunliang Geng
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Jörg Schaarschmidt
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Mikael E. Trellet
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Adrien S. J. Melquiond
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Li Xue
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Brian Jiménez-García
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Charlotte W. van Noort
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Rodrigo V. Honorato
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Alexandre M. J. J. Bonvin
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | | |
Collapse
|
89
|
Simpkin AJ, Thomas JMH, Simkovic F, Keegan RM, Rigden DJ. Molecular replacement using structure predictions from databases. Acta Crystallogr D Struct Biol 2019; 75:1051-1062. [PMID: 31793899 PMCID: PMC6889911 DOI: 10.1107/s2059798319013962] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2019] [Accepted: 10/12/2019] [Indexed: 01/19/2023] Open
Abstract
Molecular replacement (MR) is the predominant route to solution of the phase problem in macromolecular crystallography. Where the lack of a suitable homologue precludes conventional MR, one option is to predict the target structure using bioinformatics. Such modelling, in the absence of homologous templates, is called ab initio or de novo modelling. Recently, the accuracy of such models has improved significantly as a result of the availability, in many cases, of residue-contact predictions derived from evolutionary covariance analysis. Covariance-assisted ab initio models representing structurally uncharacterized Pfam families are now available on a large scale in databases, potentially representing a valuable and easily accessible supplement to the PDB as a source of search models. Here, the unconventional MR pipeline AMPLE is employed to explore the value of structure predictions in the GREMLIN and PconsFam databases. It was tested whether these deposited predictions, processed in various ways, could solve the structures of PDB entries that were subsequently deposited. The results were encouraging: nine of 27 GREMLIN cases were solved, covering target lengths of 109-355 residues and a resolution range of 1.4-2.9 Å, and with target-model shared sequence identity as low as 20%. The cluster-and-truncate approach in AMPLE proved to be essential for most successes. For the overall lower quality structure predictions in the PconsFam database, remodelling with Rosetta within the AMPLE pipeline proved to be the best approach, generating ensemble search models from single-structure deposits. Finally, it is shown that the AMPLE-obtained search models deriving from GREMLIN deposits are of sufficiently high quality to be selected by the sequence-independent MR pipeline SIMBAD. Overall, the results help to point the way towards the optimal use of the expanding databases of ab initio structure predictions.
Collapse
Affiliation(s)
- Adam J. Simpkin
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| | - Jens M. H. Thomas
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| | - Felix Simkovic
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| | - Ronan M. Keegan
- STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, England
| | - Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
| |
Collapse
|
90
|
Christoffer C, Terashi G, Shin WH, Aderinwale T, Maddhuri Venkata Subramaniya SR, Peterson L, Verburgt J, Kihara D. Performance and enhancement of the LZerD protein assembly pipeline in CAPRI 38-46. Proteins 2019; 88:948-961. [PMID: 31697428 DOI: 10.1002/prot.25850] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2019] [Revised: 10/07/2019] [Accepted: 11/03/2019] [Indexed: 01/17/2023]
Abstract
We report the performance of the protein docking prediction pipeline of our group and the results for Critical Assessment of Prediction of Interactions (CAPRI) rounds 38-46. The pipeline integrates programs developed in our group as well as other existing scoring functions. The core of the pipeline is the LZerD protein-protein docking algorithm. If templates of the target complex are not found in PDB, the first step of our docking prediction pipeline is to run LZerD for a query protein pair. Meanwhile, in the case of human group prediction, we survey the literature to find information that can guide the modeling, such as protein-protein interface information. In addition to any literature information and binding residue prediction, generated docking decoys were selected by a rank aggregation of statistical scoring functions. The top 10 decoys were relaxed by a short molecular dynamics simulation before submission to remove atom clashes and improve side-chain conformations. In these CAPRI rounds, our group, particularly the LZerD server, showed robust performance. On the other hand, there are failed cases where some other groups were successful. To understand weaknesses of our pipeline, we analyzed sources of errors for failed targets. Since we noted that structure refinement is a step that needs improvement, we newly performed a comparative study of several refinement approaches. Finally, we show several examples that illustrate successful and unsuccessful cases by our group.
Collapse
Affiliation(s)
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Woong-Hee Shin
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana.,Department of Chemistry Education, Sunchon National University, Suncheon, Jeollanam-do, Republic of Korea
| | - Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, Indiana
| | | | - Lenna Peterson
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana.,Department of Biological Sciences, Purdue University, West Lafayette, Indiana.,Purdue University Center for Cancer Research, Purdue University, West Lafayette, Indiana.,Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio
| |
Collapse
|
91
|
Dapkūnas J, Kairys V, Olechnovič K, Venclovas Č. Template-based modeling of diverse protein interactions in CAPRI rounds 38-45. Proteins 2019; 88:939-947. [PMID: 31697420 DOI: 10.1002/prot.25845] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2019] [Accepted: 11/03/2019] [Indexed: 11/09/2022]
Abstract
Structures of proteins complexed with other proteins, peptides, or ligands are essential for investigation of molecular mechanisms. However, the experimental structures of protein complexes of interest are often not available. Therefore, computational methods are widely used to predict these structures, and, of those methods, template-based modeling is the most successful. In the rounds 38-45 of the Critical Assessment of PRediction of Interactions (CAPRI), we applied template-based modeling for 9 of 11 protein-protein and protein-peptide interaction targets, resulting in medium and high-quality models for six targets. For the protein-oligosaccharide docking targets, we used constraints derived from template structures, and generated models of at least acceptable quality for most of the targets. Apparently, high flexibility of oligosaccharide molecules was the main cause preventing us from obtaining models of higher quality. We also participated in the CAPRI scoring challenge, the goal of which was to identify the highest quality models from a large pool of decoys. In this experiment, we tested VoroMQA, a scoring method based on interatomic contact areas. The results showed VoroMQA to be quite effective in scoring strongly binding and obligatory protein complexes, but less successful in the case of transient interactions. We extensively used manual intervention in both CAPRI modeling and scoring experiments. This oftentimes allowed us to select the correct templates from available alternatives and to limit the search space during the model scoring.
Collapse
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Visvaldas Kairys
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| |
Collapse
|
92
|
Long S, Tian P. A simple neural network implementation of generalized solvation free energy for assessment of protein structural models. RSC Adv 2019; 9:36227-36233. [PMID: 35540566 PMCID: PMC9074945 DOI: 10.1039/c9ra05168f] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2019] [Accepted: 10/14/2019] [Indexed: 11/21/2022] Open
Abstract
Rapid and accurate assessment of protein structural models is essential for protein structure prediction and design. Great progress has been made in this regard, especially by recent application of "knowledge-based" potentials. Various machine learning based protein structural model quality assessment methods are also quite successful. However, performance of traditional "physics-based" models has not been as effective. Based on our analysis of the fundamental computational limitation behind unsatisfactory performance of "physics-based" models, we propose a generalized solvation free energy (GSFE) framework, which is intrinsically flexible for multi-scale treatments and is amenable for machine learning implementation. Finally, we implemented a simple example of backbone-based residue level GSFE with neural network, which was found to have competitive performance when compared with highly complex latest "knowledge-based" atomic potentials in distinguishing native structures from decoys.
Collapse
Affiliation(s)
- Shiyang Long
- School of Chemistry, Jilin University Changchun China
| | - Pu Tian
- School of Life Science and School of Artificial Intelligence, Jilin University 2699 Qianjin Street Changchun China 130012
| |
Collapse
|
93
|
Haas J, Gumienny R, Barbato A, Ackermann F, Tauriello G, Bertoni M, Studer G, Smolinski A, Schwede T. Introducing "best single template" models as reference baseline for the Continuous Automated Model Evaluation (CAMEO). Proteins 2019; 87:1378-1387. [PMID: 31571280 DOI: 10.1002/prot.25815] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2019] [Revised: 09/10/2019] [Accepted: 09/13/2019] [Indexed: 12/17/2022]
Abstract
Critical blind assessment of structure prediction techniques is crucial for the scientific community to establish the state of the art, identify bottlenecks, and guide future developments. In Critical Assessment of Techniques in Structure Prediction (CASP), human experts assess the performance of participating methods in relation to the difficulty of the prediction task in a biennial experiment on approximately 100 targets. Yet, the development of automated computational modeling methods requires more frequent evaluation cycles and larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements CASP by conducting fully automated blind prediction evaluations based on the weekly pre-release of sequences of those structures, which are going to be published in the next release of the Protein Data Bank (PDB). Each week, CAMEO publishes benchmarking results for predictions corresponding to a set of about 20 targets collected during a 4-day prediction window. CAMEO benchmarking data are generated consistently for all methods at the same point in time, enabling developers to cross-validate their method's performance, and referring to their results in publications. Many successful participants of CASP have used CAMEO-either by directly benchmarking their methods within the system or by comparing their own performance to CAMEO reference data. CAMEO offers a variety of scores reflecting different aspects of structure modeling, for example, binding site accuracy, homo-oligomer interface quality, or accuracy of local model confidence estimates. By introducing the "bestSingleTemplate" method based on structure superpositions as a reference for the accuracy of 3D modeling predictions, CAMEO facilitates objective comparison of techniques and fosters the development of advanced methods.
Collapse
Affiliation(s)
- Juergen Haas
- Computational Structural Biology, University of Basel, Switzerland
| | - Rafal Gumienny
- Computational Structural Biology, Swiss Institute of Bioinformatics, Switzerland
| | - Alessandro Barbato
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
| | - Flavio Ackermann
- Computational Structural Biology, University of Basel, Switzerland
| | | | - Martino Bertoni
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
| | - Gabriel Studer
- Computational Structural Biology, University of Basel, Switzerland
| | - Anna Smolinski
- Computational Structural Biology, University of Basel, Switzerland
| | - Torsten Schwede
- Computational Structural Biology, University of Basel, Switzerland
| |
Collapse
|
94
|
Tobias-Santos V, Guerra-Almeida D, Mury F, Ribeiro L, Berni M, Araujo H, Logullo C, Feitosa NM, de Souza-Menezes J, Pessoa Costa E, Nunes-da-Fonseca R. Multiple Roles of the Polycistronic Gene Tarsal-less/Mille-Pattes/Polished-Rice During Embryogenesis of the Kissing Bug Rhodnius prolixus. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
95
|
Sato R, Ishida T. Protein model accuracy estimation based on local structure quality assessment using 3D convolutional neural network. PLoS One 2019; 14:e0221347. [PMID: 31487288 PMCID: PMC6728020 DOI: 10.1371/journal.pone.0221347] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2019] [Accepted: 08/05/2019] [Indexed: 11/23/2022] Open
Abstract
In protein tertiary structure prediction, model quality assessment programs (MQAPs) are often used to select the final structural models from a pool of candidate models generated by multiple templates and prediction methods. The 3-dimensional convolutional neural network (3DCNN) is an expansion of the 2DCNN and has been applied in several fields, including object recognition. The 3DCNN is also used for MQA tasks, but the performance is low due to several technical limitations related to protein tertiary structures, such as orientation alignment. We proposed a novel single-model MQA method based on local structure quality evaluation using a deep neural network containing 3DCNN layers. The proposed method first assesses the quality of local structures for each residue and then evaluates the quality of whole structures by integrating estimated local qualities. We analyzed the model using the CASP11, CASP12, and 3D-Robot datasets and compared the performance of the model with that of the previous 3DCNN method based on whole protein structures. The proposed method showed a significant improvement compared to the previous 3DCNN method for multiple evaluation measures. We also compared the proposed method to other state-of-the-art methods. Our method showed better performance than the previous 3DCNN-based method and comparable accuracy as the current best single-model methods; particularly, in CASP11 stage2, our method showed a Pearson coefficient of 0.486, which was better than those of the best single-model methods (0.366–0.405). A standalone version of the proposed method and data files are available at https://github.com/ishidalab-titech/3DCNN_MQA.
Collapse
Affiliation(s)
- Rin Sato
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
| | - Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
- * E-mail:
| |
Collapse
|
96
|
Panetto OS, Gomes HF, Fraga Gomes DS, Campos E, Romeiro NC, Costa EP, do Carmo PRL, Feitosa NM, Moraes J. The effects of Roundup® in embryo development and energy metabolism of the zebrafish (Danio rerio). Comp Biochem Physiol C Toxicol Pharmacol 2019; 222:74-81. [PMID: 30981909 DOI: 10.1016/j.cbpc.2019.04.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/22/2018] [Revised: 04/05/2019] [Accepted: 04/08/2019] [Indexed: 12/28/2022]
Abstract
Roundup® is currently the most widely used and sold agricultural pesticide in the world. The objective of this work was to investigate the effects of Roundup® on energy metabolism during zebrafish (Danio rerio) embryogenesis. The embryo toxicity test was performed for 96 h post-fertilisation and the sublethal concentration of Roundup® was defined as 58.3 mg/L, which resulted in failure to inflate the swim bladder. Biochemical assays were performed with viable embryos following glyphosate exposure, and no significant effects on protein, glucose, glycogen, triglyceride levels or the enzymatic activities of alanine aminotransferase and aspartate aminotransferase were observed. However, the activity of hexokinase was significantly altered following exposure to 11.7 mg/L Roundup®. Through molecular docking we have shown for the first time that the interactions of glucokinase and hexokinases 1 and 2 with glyphosate showed significant interactions in the active sites, corroborating the biochemical results of hexokinase activity in zebrafish exposed to the chemical. From the results of molecular docking interactions carried out on the Zfishglucok, ZfishHK1 and ZfishHK2 models with the glyphosate linker, it can be concluded that there are significant interactions between glyphosate and active sites of glucokinase and hexokinase 1 and 2 proteins. The present work suggests that Roundup® can induce problems in fish embryogenesis relating to the incapacity of swim bladder to inflate. This represents the first study demonstrating the interaction of glyphosate with hexokinase and its isoforms.
Collapse
Affiliation(s)
- Ottassano S Panetto
- Laboratório Integrado de Bioquímica Hatisaburo Masuda, NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Helga F Gomes
- Laboratório Integrado de Bioquímica Hatisaburo Masuda, NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Danielle S Fraga Gomes
- Laboratório Integrado de Bioquímica Hatisaburo Masuda, NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Eldo Campos
- Laboratório Integrado de Bioquímica Hatisaburo Masuda, NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Nelilma C Romeiro
- Laboratório Integrado de Computação Científica-LICC-NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Evenilton P Costa
- Laboratório Integrado de Computação Científica-LICC-NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Paulo R L do Carmo
- Laboratório Integrado de Computação Científica-LICC-NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Natália M Feitosa
- Laboratório Integrado de Ciências Morfofuncionais, NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil
| | - Jorge Moraes
- Laboratório Integrado de Bioquímica Hatisaburo Masuda, NUPEM, Núcleo em Ecologia e Desenvolvimento Ambiental de Macaé, Universidade Federal do Rio de Janeiro, Avenida São José Barreto, N° 764, Bairro: São José do Barreto, Macaé, RJ CEP: 27.965-045, Brazil.
| |
Collapse
|
97
|
Dapkūnas J, Olechnovič K, Venclovas Č. Structural modeling of protein complexes: Current capabilities and challenges. Proteins 2019; 87:1222-1232. [PMID: 31294859 DOI: 10.1002/prot.25774] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Revised: 06/21/2019] [Accepted: 07/06/2019] [Indexed: 12/27/2022]
Abstract
Proteins frequently interact with each other, and the knowledge of structures of the corresponding protein complexes is necessary to understand how they function. Computational methods are increasingly used to provide structural models of protein complexes. Not surprisingly, community-wide Critical Assessment of protein Structure Prediction (CASP) experiments have recently started monitoring the progress in this research area. We participated in CASP13 with the aim to evaluate our current capabilities in modeling of protein complexes and to gain a better understanding of factors that exert the largest impact on these capabilities. To model protein complexes in CASP13, we applied template-based modeling, free docking and hybrid techniques that enabled us to generate models of the topmost quality for 27 of 42 multimers. If templates for protein complexes could be identified, we modeled the structures with reasonable accuracy by straightforward homology modeling. If only partial templates were available, it was nevertheless possible to predict the interaction interfaces correctly or to generate acceptable models for protein complexes by combining template-based modeling with docking. If no templates were available, we used rigid-body docking with limited success. However, in some free docking models, despite the incorrect subunit orientation and missed interface contacts, the approximate location of protein binding sites was identified correctly. Apparently, our overall performance in docking was limited by the quality of monomer models and by the imperfection of scoring methods. The impact of human intervention on our results in modeling of protein complexes was significant indicating the need for improvements of automatic methods.
Collapse
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| |
Collapse
|
98
|
Cheng J, Choe MH, Elofsson A, Han KS, Hou J, Maghrabi AHA, McGuffin LJ, Menéndez-Hurtado D, Olechnovič K, Schwede T, Studer G, Uziela K, Venclovas Č, Wallner B. Estimation of model accuracy in CASP13. Proteins 2019; 87:1361-1377. [PMID: 31265154 DOI: 10.1002/prot.25767] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2019] [Revised: 06/04/2019] [Accepted: 06/15/2019] [Indexed: 12/28/2022]
Abstract
Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.
Collapse
Affiliation(s)
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Myong-Ho Choe
- Department of Life Science, University of Science, Pyongyang, DPR Korea
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Kun-Sop Han
- Department of Life Science, University of Science, Pyongyang, DPR Korea
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Ali H A Maghrabi
- School of Biological Sciences, University of Reading, Reading, UK
| | - Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
| | - David Menéndez-Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
| | - Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
| | - Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Björn Wallner
- Department of Physics, Chemistry, and Biology, Bioinformatics Division, Linköping University, Linköping, Sweden
| |
Collapse
|
99
|
Toliusis P, Tamulaitiene G, Grigaitis R, Tuminauskaite D, Silanskas A, Manakova E, Venclovas C, Szczelkun MD, Siksnys V, Zaremba M. The H-subunit of the restriction endonuclease CglI contains a prototype DEAD-Z1 helicase-like motor. Nucleic Acids Res 2019; 46:2560-2572. [PMID: 29471489 PMCID: PMC5861437 DOI: 10.1093/nar/gky107] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2017] [Accepted: 02/08/2018] [Indexed: 11/13/2022] Open
Abstract
CglI is a restriction endonuclease from Corynebacterium glutamicum that forms a complex between: two R-subunits that have site specific-recognition and nuclease domains; and two H-subunits, with Superfamily 2 helicase-like DEAD domains, and uncharacterized Z1 and C-terminal domains. ATP hydrolysis by the H-subunits catalyses dsDNA translocation that is necessary for long-range movement along DNA that activates nuclease activity. Here, we provide biochemical and molecular modelling evidence that shows that Z1 has a fold distantly-related to RecA, and that the DEAD-Z1 domains together form an ATP binding interface and are the prototype of a previously undescribed monomeric helicase-like motor. The DEAD-Z1 motor has unusual Walker A and Motif VI sequences those nonetheless have their expected functions. Additionally, it contains DEAD-Z1-specific features: an H/H motif and a loop (aa 163–aa 172), that both play a role in the coupling of ATP hydrolysis to DNA cleavage. We also solved the crystal structure of the C-terminal domain which has a unique fold, and demonstrate that the Z1-C domains are the principal DNA binding interface of the H-subunit. Finally, we use small angle X-ray scattering to provide a model for how the H-subunit domains are arranged in a dimeric complex.
Collapse
Affiliation(s)
- Paulius Toliusis
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Giedre Tamulaitiene
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Rokas Grigaitis
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Donata Tuminauskaite
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Arunas Silanskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Elena Manakova
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Ceslovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Mark D Szczelkun
- DNA-Protein Interactions Unit, School of Biochemistry, Biomedical Sciences Building, University of Bristol, Bristol BS8 1TD, UK
| | - Virginijus Siksnys
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| | - Mindaugas Zaremba
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Sauletekio al. 7, LT-10257 Vilnius, Lithuania
| |
Collapse
|
100
|
Yu Z, Yao Y, Deng H, Yi M. ANDIS: an atomic angle- and distance-dependent statistical potential for protein structure quality assessment. BMC Bioinformatics 2019; 20:299. [PMID: 31159742 PMCID: PMC6547486 DOI: 10.1186/s12859-019-2898-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2018] [Accepted: 05/13/2019] [Indexed: 01/05/2023] Open
Abstract
Background The knowledge-based statistical potential has been widely used in protein structure modeling and model quality assessment. They are commonly evaluated based on their abilities of native recognition as well as decoy discrimination. However, these two aspects are found to be mutually exclusive in many statistical potentials. Results We developed an atomic ANgle- and DIStance-dependent (ANDIS) statistical potential for protein structure quality assessment with distance cutoff being a tunable parameter. When distance cutoff is ≤9.0 Å, “effective atomic interaction” is employed to enhance the ability of native recognition. For a distance cutoff of ≥10 Å, the distance-dependent atom-pair potential with random-walk reference state is combined to strengthen the ability of decoy discrimination. Benchmark tests on 632 structural decoy sets from diverse sources demonstrate that ANDIS outperforms other state-of-the-art potentials in both native recognition and decoy discrimination. Conclusions Distance cutoff is a crucial parameter for distance-dependent statistical potentials. A lower distance cutoff is better for native recognition, while a higher one is favorable for decoy discrimination. The ANDIS potential is freely available as a standalone application at http://qbp.hzau.edu.cn/ANDIS/. Electronic supplementary material The online version of this article (10.1186/s12859-019-2898-y) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Zhongwang Yu
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yuangen Yao
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
| | - Haiyou Deng
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China. .,Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China.
| | - Ming Yi
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China. .,Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China.
| |
Collapse
|