51
Shen T, Wu J, Lan H, Zheng L, Pei J, Wang S, Liu W, Huang J. When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction). Proteins 2021; 89:1901-1910. [PMID: 34473376 DOI: 10.1002/prot.26232] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/30/2021] [Revised: 08/16/2021] [Accepted: 08/20/2021] [Indexed: 12/29/2022]
Abstract
In this paper, we report our tFold framework's performance on the inter-residue contact prediction task in the 14th Critical Assessment of protein Structure Prediction (CASP14). Our tFold framework combines both homologous sequences and structural decoys under an ultra-deep network architecture. Squeeze-excitation and axial attention mechanisms are employed to capture inter-residue interactions. In CASP14, our best predictor achieved an average top-L precision of 41.78% for long-range contacts across all 22 free-modeling (FM) targets and ranked 1st among all 60 participating teams. The tFold web server is now freely available at: https://drug.ai.tencent.com/console/en/tfold.
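The top-L long-range precision quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual CASP convention, which the abstract does not spell out, that "long-range" means a sequence separation of at least 24 residues and that the L highest-scoring pairs are evaluated, where L is the sequence length. The function and argument names are hypothetical; this is not the tFold code.

```python
def top_l_long_range_precision(probs, contacts, min_separation=24):
    """Fraction of the top-L long-range predicted pairs that are true contacts.

    probs:    L x L nested list of predicted contact probabilities.
    contacts: L x L nested list of booleans marking true contacts.
    min_separation: minimum |i - j| for a pair to count as long-range
                    (24 is the assumed CASP convention).
    If fewer than L long-range pairs exist, all of them are evaluated
    (an assumption made for this sketch).
    """
    length = len(probs)
    # Collect every long-range pair (i < j) with its predicted probability.
    pairs = [
        (probs[i][j], i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    # Rank pairs from most to least confident and keep the top L.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[:length]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if contacts[i][j])
    return hits / len(top)
```

Averaging this value over a set of targets gives a per-method figure of the kind reported in the abstract.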
Affiliation(s)
- Wei Liu
- Tencent AI Lab, Shenzhen, China
52
Ye L, Wu P, Peng Z, Gao J, Liu J, Yang J. Improved estimation of model quality using predicted inter-residue distance. Bioinformatics 2021; 37:3752-3759. [PMID: 34473228 DOI: 10.1093/bioinformatics/btab632] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/27/2021] [Revised: 08/27/2021] [Accepted: 08/31/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein model quality assessment (QA) is an essential component in protein structure prediction, which aims to estimate the quality of a structure model and/or select the most accurate model from a pool of structure models, without knowing the native structure. QA remains a challenging task in protein structure prediction. RESULTS: Based on the inter-residue distance predicted by the recent deep learning-based structure prediction algorithm trRosetta, we developed QDistance, a new approach to the estimation of both global and local qualities. QDistance works for both single-model and multi-model inputs. We designed several distance-based features to assess the agreement between the predicted and model-derived inter-residue distances. Together with a few widely used features, they are fed into a simple yet powerful linear regression model to infer the global QA scores. The local QA scores for each structure model are predicted based on a comparative analysis with a set of selected reference models. For multi-model input, the reference models are selected from the input based on the predicted global QA scores. For single-model input, the reference models are predicted by trRosetta. With the informative distance-based features, QDistance can predict the global quality with satisfactory accuracy. Benchmark tests on the CASP13 and the CAMEO structure models suggested that QDistance was competitive with other methods. Blind tests in the CASP14 experiments showed that QDistance was robust and ranked among the top predictors. In particular, QDistance was among the top 3 local QA methods and made the most accurate local QA predictions for unreliable local regions. Analysis showed that this superior performance can be attributed to the inclusion of the predicted inter-residue distance. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/QDistance. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
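The idea of scoring agreement between predicted and model-derived inter-residue distances can be sketched as follows. This is only an illustration of one plausible feature, the mean absolute deviation over a set of residue pairs. The abstract does not define QDistance's actual features, so the function name, the input layout, and the choice of deviation measure are all assumptions.

```python
import math


def mean_distance_deviation(coords, predicted):
    """Mean absolute gap between predicted and model-derived distances.

    coords:    list of (x, y, z) tuples, one representative atom per residue
               of the structure model (hypothetical input layout).
    predicted: dict mapping residue index pairs (i, j) to a predicted
               distance, e.g. taken from a distance predictor's output.
    A lower value means the model agrees better with the prediction.
    """
    if not predicted:
        raise ValueError("at least one predicted distance is required")
    deviations = [
        # math.dist gives the Euclidean distance between two points.
        abs(math.dist(coords[i], coords[j]) - distance)
        for (i, j), distance in predicted.items()
    ]
    return sum(deviations) / len(deviations)
```

A feature like this could be one input, alongside others, to a linear regression that outputs a global quality score.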
Affiliation(s)
- Lisha Ye
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Peikun Wu
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
- Jianzhao Gao
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Jian Liu
- College of Computer Science, Nankai University, Tianjin, 300071, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
53
Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Millán C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ, Sagmeister T, Buhlheller C, Pavkov-Keller T, Rathinaswamy MK, Dalwadi U, Yip CK, Burke JE, Garcia KC, Grishin NV, Adams PD, Read RJ, Baker D. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021; 373:871-876. [PMID: 34282049 PMCID: PMC7612213 DOI: 10.1126/science.abj8754] [Citation(s) in RCA: 2490] [Impact Index Per Article: 830.0] [Received: 06/07/2021] [Accepted: 07/07/2021] [Indexed: 01/17/2023]
Abstract
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
Affiliation(s)
- Minkyung Baek
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Justas Dauparas
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Sergey Ovchinnikov
- Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA 02138, USA
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
- Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Jue Wang
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Claudia Millán
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Hahnbeom Park
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Carson Adams
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Caleb R Glassman
- Program in Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Andy DeGiovanni
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jose H Pereira
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Andria V Rodrigues
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Alberdina A van Dijk
- Department of Biochemistry, Focus Area Human Metabolomics, North-West University, 2531 Potchefstroom, South Africa
- Ana C Ebrecht
- Department of Biochemistry, Focus Area Human Metabolomics, North-West University, 2531 Potchefstroom, South Africa
- Diederik J Opperman
- Department of Biotechnology, University of the Free State, 205 Nelson Mandela Drive, Bloemfontein 9300, South Africa
- Theo Sagmeister
- Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
- Christoph Buhlheller
- Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
- Medical University of Graz, Graz, Austria
- Tea Pavkov-Keller
- Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
- BioTechMed-Graz, Graz, Austria
- Manoj K Rathinaswamy
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- Udit Dalwadi
- Life Sciences Institute, Department of Biochemistry and Molecular Biology, The University of British Columbia, Vancouver, BC, Canada
- Calvin K Yip
- Life Sciences Institute, Department of Biochemistry and Molecular Biology, The University of British Columbia, Vancouver, BC, Canada
- John E Burke
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- K Christopher Garcia
- Program in Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Nick V Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Paul D Adams
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA 94720, USA
- Randy J Read
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
54
Robin X, Haas J, Gumienny R, Smolinski A, Tauriello G, Schwede T. Continuous Automated Model EvaluatiOn (CAMEO)-Perspectives on the future of fully automated evaluation of structure prediction methods. Proteins 2021; 89:1977-1986. [PMID: 34387007 PMCID: PMC8673552 DOI: 10.1002/prot.26213] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/28/2021] [Revised: 08/05/2021] [Accepted: 08/07/2021] [Indexed: 11/18/2022]
Abstract
The Continuous Automated Model EvaluatiOn (CAMEO) platform complements the biennial CASP experiment by conducting fully automated blind evaluations of three‐dimensional protein prediction servers based on the weekly prerelease of sequences of those structures, which are going to be published in the upcoming release of the Protein Data Bank. While in CASP14, significant success was observed in predicting the structures of individual protein chains with high accuracy, significant challenges remain in correctly predicting the structures of complexes. By implementing fully automated evaluation of predictions for protein–protein complexes, as well as for proteins in complex with ligands, peptides, nucleic acids, or proteins containing noncanonical amino acid residues, CAMEO will assist new developments in those challenging areas of active research.
Affiliation(s)
- Xavier Robin
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Juergen Haas
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Rafal Gumienny
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Anna Smolinski
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
55
Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Žídek A, Bridgland A, Cowie A, Meyer C, Laydon A, Velankar S, Kleywegt GJ, Bateman A, Evans R, Pritzel A, Figurnov M, Ronneberger O, Bates R, Kohl SAA, Potapenko A, Ballard AJ, Romera-Paredes B, Nikolov S, Jain R, Clancy E, Reiman D, Petersen S, Senior AW, Kavukcuoglu K, Birney E, Kohli P, Jumper J, Hassabis D. Highly accurate protein structure prediction for the human proteome. Nature 2021; 596:590-596. [PMID: 34293799 PMCID: PMC8387240 DOI: 10.1038/s41586-021-03828-1] [Citation(s) in RCA: 1502] [Impact Index Per Article: 500.7] [Received: 05/11/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure (ref. 1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
Affiliation(s)
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Gerard J Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
56
Adiyaman R, McGuffin LJ. ReFOLD3: refinement of 3D protein models with gradual restraints based on predicted local quality and residue contacts. Nucleic Acids Res 2021; 49:W589-W596. [PMID: 34009387 PMCID: PMC8218204 DOI: 10.1093/nar/gkab300] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2021] [Revised: 03/23/2021] [Accepted: 04/16/2021] [Indexed: 12/16/2022] Open
Abstract
ReFOLD3 is unique in its application of gradual restraints, calculated from local model quality estimates and contact predictions, which are used to guide the refinement of theoretical 3D protein models towards the native structures. ReFOLD3 achieves improved performance by using an iterative refinement protocol to fix incorrect residue contacts and local errors, including unusual bonds and angles, which are identified in the submitted models by our leading ModFOLD8 model quality assessment method. Following refinement, the likely resulting improvements to the submitted models are recognized by ModFOLD8, which produces both global and local quality estimates. During the CASP14 prediction season (May-Aug 2020), we used the ReFOLD3 protocol to refine hundreds of 3D models, for both the refinement and the main tertiary structure prediction categories. Our group improved the global and local quality scores for numerous starting models in the refinement category, where we ranked in the top 10 according to the official assessment. The ReFOLD3 protocol was also used for the refinement of the SARS-CoV-2 targets as a part of the CASP Commons COVID-19 initiative, and we provided a significant number of the top 10 models. The ReFOLD3 web server is freely available at https://www.reading.ac.uk/bioinf/ReFOLD/.
Affiliation(s)
- Recep Adiyaman
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
57
McGuffin LJ, Aldowsari FMF, Alharbi SMA, Adiyaman R. ModFOLD8: accurate global and local quality estimates for 3D protein models. Nucleic Acids Res 2021; 49:W425-W430. [PMID: 33963867 PMCID: PMC8218196 DOI: 10.1093/nar/gkab321] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 02/01/2021] [Revised: 04/01/2021] [Accepted: 04/21/2021] [Indexed: 11/26/2022] Open
Abstract
Methods for estimating the quality of 3D models of proteins are vital tools for driving the acceptance and utility of predicted tertiary structures by the wider bioscience community. Here we describe the major updates to ModFOLD, which has maintained its position as a leading server for the prediction of global and local quality of 3D protein models over the past decade (>20 000 unique external users). ModFOLD8 is the latest version of the server, which combines the strengths of multiple pure-single and quasi-single model methods. Improvements have been made to the web server interface and there have been successive increases in prediction accuracy, which were achieved through integration of newly developed scoring methods and advanced deep learning-based residue contact predictions. Each version of the ModFOLD server has been independently blind tested in the biennial CASP experiments, as well as being continuously evaluated via the CAMEO project. In CASP13 and CASP14, the ModFOLD7 and ModFOLD8 variants ranked among the top 10 quality estimation methods according to almost every official analysis. Prior to CASP14, ModFOLD8 was also applied for the evaluation of SARS-CoV-2 protein models as part of the CASP Commons 2020 initiative. The ModFOLD8 server is freely available at: https://www.reading.ac.uk/bioinf/ModFOLD/.
Affiliation(s)
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Fahd M F Aldowsari
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Shuaa M A Alharbi
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Recep Adiyaman
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
58
Harrow J, Drysdale R, Smith A, Repo S, Lanfear J, Blomberg N. ELIXIR: Providing a Sustainable Infrastructure for Life Science Data at European Scale. Bioinformatics 2021; 37:2506-2511. [PMID: 34175941 PMCID: PMC8388016 DOI: 10.1093/bioinformatics/btab481] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 12/17/2020] [Revised: 02/19/2021] [Accepted: 06/25/2021] [Indexed: 11/12/2022] Open
Affiliation(s)
- Jennifer Harrow
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Rachel Drysdale
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Andrew Smith
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Susanna Repo
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jerry Lanfear
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Niklas Blomberg
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
59
Burley SK, Berman HM. Open-access data: A cornerstone for artificial intelligence approaches to protein structure prediction. Structure 2021; 29:515-520. [PMID: 33984281 PMCID: PMC8178243 DOI: 10.1016/j.str.2021.04.010] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/16/2021] [Revised: 04/08/2021] [Accepted: 04/23/2021] [Indexed: 12/28/2022]
Abstract
The Protein Data Bank (PDB) was established in 1971 to archive three-dimensional (3D) structures of biological macromolecules as a public good. Fifty years later, the PDB is providing millions of data consumers around the world with open access to more than 175,000 experimentally determined structures of proteins and nucleic acids (DNA, RNA) and their complexes with one another and small-molecule ligands. PDB data users are working, teaching, and learning in fundamental biology, biomedicine, bioengineering, biotechnology, and energy sciences. They also represent the fields of agriculture, chemistry, physics and materials science, mathematics, statistics, computer science, and zoology, and even the social sciences. The enormous wealth of 3D structure data stored in the PDB has underpinned significant advances in our understanding of protein architecture, culminating in recent breakthroughs in protein structure prediction accelerated by artificial intelligence approaches and deep or machine learning methods.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA 92093, USA
- Helen M Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- The Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
60
Toth JM, DePietro PJ, Haas J, McLaughlin WA. ResiRole: residue-level functional site predictions to gauge the accuracies of protein structure prediction techniques. Bioinformatics 2021; 37:351-359. [PMID: 32780798 PMCID: PMC8058773 DOI: 10.1093/bioinformatics/btaa712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2019] [Revised: 07/31/2020] [Accepted: 08/05/2020] [Indexed: 11/25/2022] Open
Abstract
Motivation: Methods to assess the quality of protein structure models are needed for user applications. To aid with the selection of structure models and further inform the development of structure prediction techniques, we describe the ResiRole method for the assessment of the quality of structure models. Results: Structure prediction techniques are ranked according to the results of round-robin, head-to-head comparisons using difference scores. Each difference score was defined as the absolute value of the cumulative probability for a functional site prediction made with the FEATURE program for the reference structure minus that for the structure model. Overall, the difference scores correlate well with other model quality metrics; and based on benchmarking studies with NaïveBLAST, they are found to detect additional local structural similarities between the structure models and reference structures. Availability and implementation: Automated analyses of models addressed in CAMEO are available via the ResiRole server, URL http://protein.som.geisinger.edu/ResiRole/. Interactive analyses with user-provided models and reference structures are also enabled. Code is available at github.com/wamclaughlin/ResiRole. Supplementary information: Supplementary data are available at Bioinformatics online.
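The difference score defined in this abstract, and the round-robin ranking built on it, can be sketched in a few lines. The difference score follows the abstract's definition directly. The head-to-head rule used here (the technique with the lower mean difference score over the same sites wins) is an assumption for illustration, since the abstract does not state how each comparison is decided; the function names are hypothetical and this is not the ResiRole code.

```python
from itertools import combinations


def difference_score(p_reference, p_model):
    """Absolute gap between the functional-site probability computed for the
    reference structure and the one computed for the structure model."""
    return abs(p_reference - p_model)


def round_robin_wins(scores_by_technique):
    """Count head-to-head wins for each technique.

    scores_by_technique: dict mapping a technique name to its list of
    difference scores over the same sites, in the same order.
    Assumed rule: the lower mean difference score wins; ties score nothing.
    """
    wins = {name: 0 for name in scores_by_technique}
    for first, second in combinations(scores_by_technique, 2):
        mean_first = sum(scores_by_technique[first]) / len(scores_by_technique[first])
        mean_second = sum(scores_by_technique[second]) / len(scores_by_technique[second])
        if mean_first < mean_second:
            wins[first] += 1
        elif mean_second < mean_first:
            wins[second] += 1
    return wins
```

Sorting the techniques by their win counts then gives a ranking in the spirit of the one the abstract describes.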
Affiliation(s)
- Joshua M Toth
- Department of Medical Education, Geisinger Commonwealth School of Medicine, Scranton, PA 18510, USA
- Paul J DePietro
- Department of Medical Education, Geisinger Commonwealth School of Medicine, Scranton, PA 18510, USA
- Juergen Haas
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, CH-4056 Basel, Switzerland
- William A McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, Scranton, PA 18510, USA
61
Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat Commun 2021; 12:1340. [PMID: 33637700 PMCID: PMC7910447 DOI: 10.1038/s41467-021-21511-x] [Citation(s) in RCA: 117] [Impact Index Per Article: 39.0] [Received: 08/11/2020] [Accepted: 01/18/2021] [Indexed: 11/22/2022] Open
Abstract
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.
Affiliation(s)
- Naozumi Hiranuma
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Washington, WA, USA
- Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Minkyung Baek
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Justas Dauparas
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Howard Hughes Medical Institute, University of Washington, Washington, WA, USA
62
Igashov I, Olechnovič L, Kadukova M, Venclovas Č, Grudinin S. VoroCNN: Deep convolutional neural network built on 3D Voronoi tessellation of protein structures. Bioinformatics 2021; 37:2332-2339. [PMID: 33620450 DOI: 10.1093/bioinformatics/btab118] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/04/2020] [Revised: 01/08/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Effective use of evolutionary information has recently led to tremendous progress in computational prediction of three-dimensional (3D) structures of proteins and their complexes. Despite the progress, the accuracy of predicted structures tends to vary considerably from case to case. Since the utility of computational models depends on their accuracy, reliable estimates of deviation between predicted and native structures are of utmost importance. RESULTS: For the first time, we present a deep convolutional neural network (CNN) constructed on a Voronoi tessellation of 3D molecular structures. Despite the irregular data domain, our data representation allows us to efficiently introduce both convolution and pooling operations and train the network in an end-to-end fashion without precomputed descriptors. The resultant model, VoroCNN, predicts local qualities of 3D protein folds. The prediction results are competitive with the state of the art and superior to the previous 3D CNN architectures built for the same task. We also discuss practical applications of VoroCNN, for example, in recognition of protein binding interfaces. AVAILABILITY: The model, data, and evaluation tests are available at https://team.inria.fr/nano-d/software/vorocnn/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ilia Igashov
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
- Maria Kadukova
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
- Sergei Grudinin
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
63
Protein Analysis: From Sequence to Structure. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
64
Santana CA, Silveira SDA, Moraes JPA, Izidoro SC, de Melo-Minardi RC, Ribeiro AJM, Tyzack JD, Borkakoti N, Thornton JM. GRaSP: a graph-based residue neighborhood strategy to predict binding sites. Bioinformatics 2020; 36:i726-i734. [DOI: 10.1093/bioinformatics/btaa805] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 09/08/2020] [Indexed: 01/22/2023] Open
Abstract
Motivation
The discovery of protein–ligand-binding sites is a major step in elucidating protein function and in investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low-cost.
Results
We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10–20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2–5 h on average.
Availability and implementation
The source code and datasets are available at https://github.com/charles-abreu/GRaSP.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles A Santana
  - Department of Biochemistry and Immunology
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Sabrina de A Silveira
  - Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
  - Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- João P A Moraes
  - Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Sandro C Izidoro
  - Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Raquel C de Melo-Minardi
  - Department of Biochemistry and Immunology
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- António J M Ribeiro
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan D Tyzack
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neera Borkakoti
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
65
Xu G, Wang Q, Ma J. OPUS-Rota3: Improving Protein Side-Chain Modeling by Deep Neural Networks and Ensemble Methods. J Chem Inf Model 2020; 60:6691-6697. [PMID: 33211480 DOI: 10.1021/acs.jcim.0c00951] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Abstract
Side-chain modeling is critical for protein structure prediction since the uniqueness of the protein structure is largely determined by its side-chain packing conformation. In this paper, differing from most approaches that rely on rotamer library sampling, we first propose a novel side-chain rotamer prediction method based on deep neural networks, named OPUS-RotaNN. Then, on the basis of our previous work OPUS-Rota2, we propose an open-source side-chain modeling framework, OPUS-Rota3, which integrates the results of different methods into its rotamer library as the sampling candidates. By including OPUS-RotaNN into OPUS-Rota3, we conduct our experiments on three native backbone test sets and one non-native backbone test set. On the native backbone test set, CAMEO-Hard61 for example, OPUS-Rota3 successfully predicts 51.14% of all side-chain dihedral angles with a tolerance criterion of 20° and outperforms OSCAR-star (50.87%), SCWRL4 (50.40%), and FASPR (49.85%). On the non-native backbone test set DB379-ITASSER, the accuracy of OPUS-Rota3 is 52.49%, better than OSCAR-star (48.95%), FASPR (48.69%), and SCWRL4 (48.29%). All the source codes including the training codes and the data we used are available at https://github.com/thuxugang/opus_rota3.
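The accuracy figures quoted in this abstract are fractions of side-chain dihedral angles predicted within a 20° tolerance. A minimal sketch of that kind of metric follows, assuming each angle is scored independently and that angle differences wrap around at ±180°. The function names are hypothetical, and the sketch ignores details the abstract does not state, such as symmetric side chains, so it is an illustration rather than the authors' evaluation code.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def dihedral_accuracy(predicted, native, tolerance_deg=20.0):
    """Fraction of predicted dihedral angles within tolerance of the native ones.

    Assumption: every angle counts on its own; no per-residue grouping.
    """
    if len(predicted) != len(native):
        raise ValueError("predicted and native must have the same length")
    if not predicted:
        raise ValueError("at least one angle is required")
    correct = sum(
        1
        for p, n in zip(predicted, native)
        if angular_difference(p, n) <= tolerance_deg
    )
    return correct / len(predicted)


if __name__ == "__main__":
    # Made-up angles for illustration only: two of four fall within 20 degrees.
    predicted = [60.0, 175.0, -60.0, 90.0]
    native = [65.0, -175.0, -100.0, 120.0]
    print(dihedral_accuracy(predicted, native))
```

The wrap-around matters because 175° and -175° are only 10° apart, even though their plain numeric difference is 350°.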
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
66
Walker SP, Yallapragada VVB, Tangney M. Arming Yourself for The In Silico Protein Design Revolution. Trends Biotechnol 2020; 39:651-664. [PMID: 33139074 DOI: 10.1016/j.tibtech.2020.10.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/07/2020] [Revised: 10/05/2020] [Accepted: 10/05/2020] [Indexed: 12/23/2022]
Abstract
Proteins mediate many essential processes of life to a degree of functional precision unmatched by any synthetic device. While engineered proteins are currently used in biotech, food, biomedicine, and material technology-based industries, the true potential of proteins is practically untapped. The emerging field of in silico protein design is predicted to provide the next quantum leap in the biotech industry. Having predictive control over protein function and the ability to redefine these functions have driven the field of protein engineering into an era of unprecedented development. This article provides a holistic analysis of protein design R&D (current state-of-the-art tools and knowhow) and commercial landscape, as well as a one-stop-shop profile of in silico protein design technology for biotechnology stakeholders.
Affiliation(s)
- Sidney P Walker
  - CancerResearch@UCC, University College Cork, Cork, Ireland; SynBioCentre, University College Cork, Cork, Ireland
- Venkata V B Yallapragada
  - CancerResearch@UCC, University College Cork, Cork, Ireland; SynBioCentre, University College Cork, Cork, Ireland
- Mark Tangney
  - CancerResearch@UCC, University College Cork, Cork, Ireland; SynBioCentre, University College Cork, Cork, Ireland; APC Microbiome Ireland, University College Cork, Cork, Ireland
67
Toward Increased Reliability, Transparency, and Accessibility in Cross-linking Mass Spectrometry. Structure 2020; 28:1259-1268. [PMID: 33065067 DOI: 10.1016/j.str.2020.09.011] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/26/2020] [Revised: 09/02/2020] [Accepted: 09/24/2020] [Indexed: 01/09/2023]
Abstract
Cross-linking mass spectrometry (MS) has substantially matured as a method over the past 2 decades through parallel development in multiple labs, demonstrating its applicability to protein structure determination, conformation analysis, and mapping protein interactions in complex mixtures. Cross-linking MS has become a much-appreciated and routinely applied tool, especially in structural biology. Therefore, it is timely that the community commits to the development of methodological and reporting standards. This white paper builds on an open process comprising a number of events at community conferences since 2015 and identifies aspects of Cross-linking MS for which guidelines should be developed as part of a Cross-linking MS standards initiative.
68
Norman RA, Ambrosetti F, Bonvin AMJJ, Colwell LJ, Kelm S, Kumar S, Krawczyk K. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief Bioinform 2020; 21:1549-1567. [PMID: 31626279 PMCID: PMC7947987 DOI: 10.1093/bib/bbz095] [Citation(s) in RCA: 113] [Impact Index Per Article: 28.3] [Received: 04/25/2019] [Revised: 06/07/2019] [Accepted: 07/05/2019] [Indexed: 12/31/2022] Open
Abstract
Antibodies are proteins that recognize the molecular surfaces of potentially noxious molecules to mount an adaptive immune response or, in the case of autoimmune diseases, molecules that are part of healthy cells and tissues. Due to their binding versatility, antibodies are currently the largest class of biotherapeutics, with five monoclonal antibodies ranked in the top 10 blockbuster drugs. Computational advances in protein modelling and design can have a tangible impact on antibody-based therapeutic development. Antibody-specific computational protocols currently benefit from an increasing volume of data provided by next generation sequencing and application to related drug modalities based on traditional antibodies, such as nanobodies. Here we present a structured overview of available databases, methods and emerging trends in computational antibody analysis and contextualize them towards the engineering of candidate antibody therapeutics.
69
Studer G, Rempfer C, Waterhouse AM, Gumienny R, Haas J, Schwede T. QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics 2020; 36:1765-1771. [PMID: 31697312 PMCID: PMC7075525 DOI: 10.1093/bioinformatics/btz828] [Citation(s) in RCA: 462] [Impact Index Per Article: 115.5] [Received: 06/19/2019] [Revised: 10/24/2019] [Accepted: 11/06/2019] [Indexed: 01/13/2023] Open
Abstract
Motivation Methods that estimate the quality of a 3D protein structure model in absence of an experimental reference structure are crucial to determine a model’s utility and potential applications. Single model methods assess individual models whereas consensus methods require an ensemble of models as input. In this work, we extend the single model composite score QMEAN that employs statistical potentials of mean force and agreement terms by introducing a consensus-based distance constraint (DisCo) score. Results DisCo exploits distance distributions from experimentally determined protein structures that are homologous to the model being assessed. Feed-forward neural networks are trained to adaptively weigh contributions by the multi-template DisCo score and classical single model QMEAN parameters. The result is the composite score QMEANDisCo, which combines the accuracy of consensus methods with the broad applicability of single model approaches. We also demonstrate that, despite being the de-facto standard for structure prediction benchmarking, CASP models are not the ideal data source to train predictive methods for model quality estimation. For performance assessment, QMEANDisCo is continuously benchmarked within the CAMEO project and participated in CASP13. For both, it ranks among the top performers and excels with low response times. Availability and implementation QMEANDisCo is available as web-server at https://swissmodel.expasy.org/qmean. The source code can be downloaded from https://git.scicore.unibas.ch/schwede/QMEAN. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel Studer
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Christine Rempfer
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Andrew M Waterhouse
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Rafal Gumienny
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Juergen Haas
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
70
Dos Santos-Silva CA, Zupin L, Oliveira-Lima M, Vilela LMB, Bezerra-Neto JP, Ferreira-Neto JR, Ferreira JDC, de Oliveira-Silva RL, Pires CDJ, Aburjaile FF, de Oliveira MF, Kido EA, Crovella S, Benko-Iseppon AM. Plant Antimicrobial Peptides: State of the Art, In Silico Prediction and Perspectives in the Omics Era. Bioinform Biol Insights 2020; 14:1177932220952739. [PMID: 32952397 PMCID: PMC7476358 DOI: 10.1177/1177932220952739] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 07/14/2020] [Accepted: 07/30/2020] [Indexed: 12/14/2022] Open
Abstract
Even before perceiving or interacting with pathogens, plants rely on constitutively expressed guardian molecules, often specific to tissue or stage, with further expression after contact with the pathogen. These guardians include small molecules such as antimicrobial peptides (AMPs), generally cysteine-rich, functioning to prevent pathogen establishment. Some of these AMPs are shared among eukaryotes (eg, defensins and cyclotides), others are plant specific (eg, snakins), while some are specific to certain plant families (such as heveins). When compared with other organisms, plants tend to present a higher number of AMP isoforms due to gene duplications or polyploidy, an occurrence possibly also associated with the sessile habit of plants, which prevents them from evading biotic and environmental stresses. Therefore, plants arise as a rich resource for new AMPs. As these molecules are difficult to retrieve from databases using simple sequence alignments, a description of their characteristics and in silico (bioinformatics) approaches used to retrieve them is provided, considering resources and databases available. The possibilities and applications based on tools versus database approaches are considerable and have been so far underestimated.
Affiliation(s)
- Luisa Zupin
  - Genetic Immunology laboratory, Institute for Maternal and Child Health-IRCCS, Burlo Garofolo, Trieste, Italy
- Marx Oliveira-Lima
  - Departamento de Genética, Universidade Federal de Pernambuco, Recife, Brazil
- José Diogo Cavalcanti Ferreira
  - Departamento de Genética, Universidade Federal de Pernambuco, Recife, Brazil
  - Departamento de Genética, Instituto Federal de Pernambuco, Pesqueira, Brazil
- Ederson Akio Kido
  - Departamento de Genética, Universidade Federal de Pernambuco, Recife, Brazil
- Sergio Crovella
  - Genetic Immunology laboratory, Institute for Maternal and Child Health-IRCCS, Burlo Garofolo, Trieste, Italy
  - Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy
71
Nagarajan S, Babu S, Sohn H, Madhavan T. Molecular-Level Understanding of the Somatostatin Receptor 1 (SSTR1)-Ligand Binding: A Structural Biology Study Based on Computational Methods. ACS Omega 2020; 5:21145-21161. [PMID: 32875251 PMCID: PMC7450625 DOI: 10.1021/acsomega.0c02847] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/15/2020] [Accepted: 07/31/2020] [Indexed: 06/11/2023]
Abstract
Somatostatin receptor 1 (SSTR1), a subtype of somatostatin receptors, is involved in various signaling mechanisms in different parts of the human body. Like most of the G-protein-coupled receptors (GPCRs), the available information on the structural features of SSTR1 responsible for the biological activity is scarce. In this study, we report a molecular-level understanding of SSTR1-ligand binding, which could be helpful in solving the structural complexities involved in SSTR1 functioning. Based on a three-dimensional quantitative structure-activity relationship (3D-QSAR) study using comparative molecular field analysis (CoMFA) and comparative molecular similarity index analysis (CoMSIA), we have identified that an electronegative, less bulky, and hydrophobic atom substitution can substantially increase the biological activity of SSTR1 ligands. A density functional theory (DFT) study was then performed to study the electron-related properties of the SSTR1 ligands and to validate the results obtained via the 3D-QSAR study. 3D models of SSTR1-ligand systems have been embedded in lipid-lipid bilayer membranes to perform molecular dynamics (MD) simulations. Analysis of the MD trajectories reveals important information about the crucial residues involved in SSTR1-ligand binding and various conformational changes in the protein that occur after ligand binding. Additionally, we have identified the probable ligand-binding site of SSTR1 and validated it using MD. We have also studied the favorable conditions that are essential for forming the most stable and lowest-energy bioactive conformation of the ligands inside the binding site. The results of the study could be useful in constructing more potent and novel SSTR1 antagonists and agonists.
Affiliation(s)
- Santhosh Kumar Nagarajan
  - Computational Biology Lab, Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur, Chennai 603203, India
- Sathya Babu
  - Computational Biology Lab, Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur, Chennai 603203, India
- Honglae Sohn
  - Department of Chemistry and Department of Carbon Materials, Chosun University, Gwangju, South Korea
- Thirumurthy Madhavan
  - Computational Biology Lab, Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur, Chennai 603203, India
72
Polychronidou E, Avramouli A, Vlamos P. Alzheimer's Disease: The Role of Mutations in Protein Folding. Adv Exp Med Biol 2020; 1195:227-236. [PMID: 32468481 DOI: 10.1007/978-3-030-32633-3_31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Abstract
Misfolded proteins result when a protein follows the wrong folding pathway. Accumulation of misfolded proteins can cause disorders, known as amyloid diseases. Unfortunately, some of them are very common. The most prevalent one is Alzheimer's disease. Alzheimer's disease is a neurodegenerative disorder and the commonest form of dementia. The current study aims to assess the impact of somatic mutations in the PSEN1 gene. These mutations are the most common cause of familial Alzheimer's disease. As protein functionality can be affected by mutations, the study of possible alterations in the tertiary structure of proteins may reveal new insights related to the relationship between mutations and protein functions. To examine the effect of mutations, the primary structures and their related mutations were retrieved from public databases. Each structure (mutated and unmutated) was predicted based on effective structure prediction methodologies. A benchmarking of the structural predictive tools was accomplished. Comparative analyses of mutated and unmutated proteins were performed based on classic bioinformatics methods (TM-Score, RMSD, etc.) as well as on established shape-based descriptors retrieved from object recognition methodologies. Unsupervised methodologies were applied to the structures, in order to identify groups of mutations with similar mutational impact. Our results provide essential knowledge about protein functionality for structure-based drug design.
73
Kim DN, Gront D, Sanbonmatsu KY. Practical Considerations for Atomistic Structure Modeling with Cryo-EM Maps. J Chem Inf Model 2020; 60:2436-2442. [PMID: 32422044 PMCID: PMC7891309 DOI: 10.1021/acs.jcim.0c00090] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Abstract
We describe common approaches to atomistic structure modeling with single particle analysis derived cryo-EM maps. Several strategies for atomistic model building and atomistic model fitting methods are discussed, including selection criteria and implementation procedures. In covering basic concepts and caveats, this short perspective aims to help facilitate active discussion between scientists at different levels with diverse backgrounds.
Affiliation(s)
- Doo Nam Kim
  - Computational Biology Team, Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington, 99354, United States
- Dominik Gront
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Karissa Y. Sanbonmatsu
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, 87545, United States
  - New Mexico Consortium, Los Alamos, New Mexico, 87544, United States
74
Olechnovič K, Venclovas Č. VoroMQA web server for assessing three-dimensional structures of proteins and protein complexes. Nucleic Acids Res 2019; 47:W437-W442. [PMID: 31073605 PMCID: PMC6602437 DOI: 10.1093/nar/gkz367] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 03/04/2019] [Revised: 04/19/2019] [Accepted: 05/05/2019] [Indexed: 01/12/2023] Open
Abstract
The VoroMQA (Voronoi tessellation-based Model Quality Assessment) web server is dedicated to the estimation of protein structure quality, a common step in selecting realistic and most accurate computational models and in validating experimental structures. As an input, the VoroMQA web server accepts one or more protein structures in PDB format. Input structures may be either monomeric proteins or multimeric protein complexes. For every input structure, the server provides both global and local (per-residue) scores. Visualization of the local scores along the protein chain is enhanced by providing secondary structure assignment and information on solvent accessibility. A unique feature of the VoroMQA server is the ability to directly assess protein-protein interaction interfaces. If this type of assessment is requested, the web server provides interface quality scores, interface energy estimates, and local scores for residues involved in inter-chain interfaces. VoroMQA, the underlying method of the web server, was extensively tested in recent community-wide CASP and CAPRI experiments. During these experiments VoroMQA showed outstanding performance both in model selection and in estimation of accuracy of local structural regions. The VoroMQA web server is available at http://bioinformatics.ibt.lt/wtsam/voromqa.
Affiliation(s)
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
75
McGuffin LJ, Adiyaman R, Maghrabi AHA, Shuid AN, Brackenridge DA, Nealon JO, Philomina LS. IntFOLD: an integrated web resource for high performance protein structure and function prediction. Nucleic Acids Res 2019; 47:W408-W413. [PMID: 31045208 PMCID: PMC6602432 DOI: 10.1093/nar/gkz322] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 02/14/2019] [Revised: 04/05/2019] [Accepted: 04/23/2019] [Indexed: 12/14/2022] Open
Abstract
The IntFOLD server provides a unified resource for the automated prediction of: protein tertiary structures with built-in estimates of model accuracy (EMA), protein structural domain boundaries, natively unstructured or disordered regions in proteins, and protein–ligand interactions. The component methods have been independently evaluated via the successive blind CASP experiments and the continual CAMEO benchmarking project. The IntFOLD server has established its ranking as one of the best performing publicly available servers, based on independent official evaluation metrics. Here, we describe significant updates to the server back end, where we have focused on performance improvements in tertiary structure predictions, in terms of global 3D model quality and accuracy self-estimates (ASE), which we achieve using our newly improved ModFOLD7_rank algorithm. We also report on various upgrades to the front end including: a streamlined submission process, enhanced visualization of models, new confidence scores for ranking, and links for accessing all annotated model data. Furthermore, we now include an option for users to submit selected models for further refinement via convenient push buttons. The IntFOLD server is freely available at: http://www.reading.ac.uk/bioinf/IntFOLD/.
Affiliation(s)
- Liam J McGuffin
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Recep Adiyaman
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Ali H A Maghrabi
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Ahmad N Shuid
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
  - Infectomics cluster, Advanced Medical and Dental Institute, University of Science, Malaysia, Bertam, 13200, Kepala Batas, Pulau Pinang, Malaysia
- John O Nealon
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Limcy S Philomina
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
76
Graves J, Byerly J, Priego E, Makkapati N, Parish SV, Medellin B, Berrondo M. A Review of Deep Learning Methods for Antibodies. Antibodies (Basel) 2020; 9:E12. [PMID: 32354020 PMCID: PMC7344881 DOI: 10.3390/antib9020012] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/01/2020] [Revised: 04/15/2020] [Accepted: 04/16/2020] [Indexed: 01/09/2023] Open
Abstract
Driven by its successes across domains such as computer vision and natural language processing, deep learning has recently entered the field of biology by aiding in cellular image classification, finding genomic connections, and advancing drug discovery. In drug discovery and protein engineering, a major goal is to design a molecule that will perform a useful function as a therapeutic drug. Typically, the focus has been on small molecules, but new approaches have been developed to apply these same principles of deep learning to biologics, such as antibodies. Here we give a brief background of deep learning as it applies to antibody drug development, and an in-depth explanation of several deep learning algorithms that have been proposed to solve aspects of both protein design in general, and antibody design in particular.
Affiliation(s)
- Monica Berrondo
  - Macromoltek, Inc, 2500 W William Cannon Dr, Suite 204, Austin, TX 78745, USA
77
Orengo C, Velankar S, Wodak S, Zoete V, Bonvin AMJJ, Elofsson A, Feenstra KA, Gerloff DL, Hamelryck T, Hancock JM, Helmer-Citterich M, Hospital A, Orozco M, Perrakis A, Rarey M, Soares C, Sussman JL, Thornton JM, Tuffery P, Tusnady G, Wierenga R, Salminen T, Schneider B. A community proposal to integrate structural bioinformatics activities in ELIXIR (3D-Bioinfo Community). F1000Res 2020; 9. [PMID: 32566135 PMCID: PMC7284151 DOI: 10.12688/f1000research.20559.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 03/05/2020] [Indexed: 12/11/2022] Open
Abstract
Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much needed innovative applications e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. 
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.
Affiliation(s)
- Christine Orengo
- Structural and Molecular Biology Department, University College, London, UK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
| | - Shoshana Wodak
- VIB-VUB Center for Structural Biology, Brussels, Belgium
| | - Vincent Zoete
- Department of Oncology, Lausanne University, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Alexandre M J J Bonvin
- Bijvoet Center, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands
| | - Arne Elofsson
- Science for Life Laboratory, Stockholm University, Solna, S-17121, Sweden
| | - K Anton Feenstra
- Dept. Computer Science, Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit, Amsterdam, 1081 HV, The Netherlands
| | - Dietland L Gerloff
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
| | - Thomas Hamelryck
- Bioinformatics center, Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
| | | | | | - Adam Hospital
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, 08028, Spain
| | - Modesto Orozco
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, 08028, Spain
| | | | - Matthias Rarey
- ZBH - Center for Bioinformatics, Universität Hamburg, Hamburg, D-20146, Germany
| | - Claudio Soares
- Instituto de Tecnologia Química e Biológica Antonio Xavier, Universidade Nova de Lisboa, Lisbon, Portugal
| | - Joel L Sussman
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, 76100, Israel
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
| | - Pierre Tuffery
- Ressource Parisienne en Bioinformatique Structurale, Université de Paris, Paris, F-75205, France
| | - Gabor Tusnady
- Membrane Bioinformatics Research Group, Institute of Enzymology, Budapest, H-1117, Hungary
| | | | - Tiina Salminen
- Structural Bioinformatics Laboratory, Åbo Akademi University, Turku, FI-20500, Finland
| | - Bohdan Schneider
- Institute of Biotechnology of the Czech Academy of Sciences, Vestec, CZ-25250, Czech Republic
|
78
|
Mulnaes D, Porta N, Clemens R, Apanasenko I, Reiners J, Gremer L, Neudecker P, Smits SHJ, Gohlke H. TopModel: Template-Based Protein Structure Prediction at Low Sequence Identity Using Top-Down Consensus and Deep Neural Networks. J Chem Theory Comput 2020; 16:1953-1967. [DOI: 10.1021/acs.jctc.9b00825] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/12/2022]
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Nicola Porta
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Rebecca Clemens
- Institute für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Irina Apanasenko
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Jens Reiners
- Institute für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Center for Structural Studies Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Lothar Gremer
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Philipp Neudecker
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Sander H. J. Smits
- Institute für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Center for Structural Studies Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- John von Neumann Institute for Computing (NIC) & Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
|
79
|
Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci U S A 2020; 117:1496-1503. [PMID: 31896580 DOI: 10.1073/pnas.1914677117] [Citation(s) in RCA: 830] [Impact Index Per Article: 207.5] [Indexed: 02/08/2023]
Abstract
The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.
|
80
|
Abstract
Assessing the accuracy of 3D models has become a keystone in the protein structure prediction field. ModFOLD7 is our leading resource for Estimates of Model Accuracy (EMA), which has been upgraded by integrating a number of the pioneering pure-single- and quasi-single-model approaches. Such an integration has given our latest version the strengths to accurately score and rank predicted models, with higher consistency compared to older EMA methods. Additionally, the server provides three options for producing global score estimates, depending on the requirements of the user: (1) ModFOLD7_rank, which is optimized for ranking/selection, (2) ModFOLD7_cor, which is optimized for correlations of predicted and observed scores, and (3) ModFOLD7 global for balanced performance. ModFOLD7 has been ranked among the top few EMA methods according to independent blind testing by the CASP13 assessors. Another evaluation resource for ModFOLD7 is the CAMEO project, where the method is continuously automatically evaluated, showing a significant improvement compared to our previous versions. The ModFOLD7 server is freely available at http://www.reading.ac.uk/bioinf/ModFOLD/.
Affiliation(s)
- Ali H A Maghrabi
- School of Biological Sciences, University of Reading, Reading, Berkshire, UK
| | - Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, Berkshire, UK.
|
81
|
Olechnovič K, Monastyrskyy B, Kryshtafovych A, Venclovas Č. Comparative analysis of methods for evaluation of protein models against native structures. Bioinformatics 2019; 35:937-944. [PMID: 30169622 DOI: 10.1093/bioinformatics/bty760] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/30/2018] [Revised: 08/04/2018] [Accepted: 08/28/2018] [Indexed: 12/17/2022]
Abstract
MOTIVATION Measuring discrepancies between protein models and native structures is at the heart of development of protein structure prediction methods and comparison of their performance. A number of different evaluation methods have been developed; however, their comprehensive and unbiased comparison has not been performed. RESULTS We carried out a comparative analysis of several popular model assessment methods (RMSD, TM-score, GDT, QCS, CAD-score, LDDT, SphereGrinder and RPF) to reveal their relative strengths and weaknesses. The analysis, performed on a large and diverse model set derived in the course of three latest community-wide CASP experiments (CASP10-12), had two major directions. First, we looked at general differences between the scores by analyzing distribution, correspondence and correlation of their values as well as differences in selecting best models. Second, we examined the score differences taking into account various structural properties of models (stereochemistry, hydrogen bonds, packing of domains and chain fragments, missing residues, protein length and secondary structure). Our results provide a solid basis for an informed selection of the most appropriate score or combination of scores depending on the task at hand. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, Lithuania
| | | | | | - Česlovas Venclovas
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, Lithuania
|
82
|
Berman HM, Adams PD, Bonvin AA, Burley SK, Carragher B, Chiu W, DiMaio F, Ferrin TE, Gabanyi MJ, Goddard TD, Griffin PR, Haas J, Hanke CA, Hoch JC, Hummer G, Kurisu G, Lawson CL, Leitner A, Markley JL, Meiler J, Montelione GT, Phillips GN, Prisner T, Rappsilber J, Schriemer DC, Schwede T, Seidel CAM, Strutzenberg TS, Svergun DI, Tajkhorshid E, Trewhella J, Vallat B, Velankar S, Vuister GW, Webb B, Westbrook JD, White KL, Sali A. Federating Structural Models and Data: Outcomes from A Workshop on Archiving Integrative Structures. Structure 2019; 27:1745-1759. [PMID: 31780431 DOI: 10.1016/j.str.2019.11.002] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 09/20/2019] [Revised: 10/31/2019] [Accepted: 11/06/2019] [Indexed: 12/23/2022]
Abstract
Structures of biomolecular systems are increasingly computed by integrative modeling. In this approach, a structural model is constructed by combining information from multiple sources, including varied experimental methods and prior models. In 2019, a Workshop was held as a Biophysical Society Satellite Meeting to assess progress and discuss further requirements for archiving integrative structures. The primary goal of the Workshop was to build consensus for addressing the challenges involved in creating common data standards, building methods for federated data exchange, and developing mechanisms for validating integrative structures. The summary of the Workshop and the recommendations that emerged are presented here.
Affiliation(s)
- Helen M Berman
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Bridge Institute, Michelson Center, University of Southern California, Los Angeles, CA 90089, USA.
| | - Paul D Adams
- Physical Biosciences Division, Lawrence Berkeley Laboratory, Berkeley, CA 94720-8235, USA; Department of Bioengineering, University of California-Berkeley, Berkeley, CA 94720, USA
| | - Alexandre A Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
| | - Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences and San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
| | - Bridget Carragher
- Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY 10027, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
| | - Wah Chiu
- Department of Bioengineering, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305-5447, USA; SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
| | - Frank DiMaio
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - Thomas E Ferrin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - Margaret J Gabanyi
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Thomas D Goddard
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | | | - Juergen Haas
- Swiss Institute of Bioinformatics and Biozentrum, University of Basel, 4056 Basel, Switzerland
| | - Christian A Hanke
- Molecular Physical Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
| | - Jeffrey C Hoch
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030, USA
| | - Gerhard Hummer
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Institute for Biophysics, Goethe University Frankfurt, 60438 Frankfurt am Main, Germany
| | - Genji Kurisu
- Protein Data Bank Japan (PDBj), Institute for Protein Research, Osaka University, Osaka 565-0871, Japan
| | - Catherine L Lawson
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
| | - John L Markley
- BioMagResBank (BMRB), Biochemistry Department, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Jens Meiler
- Center for Structural Biology, Vanderbilt University, 465 21st Avenue South, Nashville, TN 37221, USA
| | - Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Biochemistry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytech Institute, Troy, NY 12180, USA
| | - George N Phillips
- BioSciences at Rice and Department of Chemistry, Rice University, Houston, TX 77251, USA
| | - Thomas Prisner
- Institute of Physical and Theoretical Chemistry and Center of Biomolecular Magnetic Resonance, Goethe University Frankfurt, 60438 Frankfurt am Main, Germany
| | - Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, Edinburgh EH9 3JR, Scotland
| | - David C Schriemer
- Department of Biochemistry & Molecular Biology, Robson DNA Science Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Torsten Schwede
- Swiss Institute of Bioinformatics and Biozentrum, University of Basel, 4056 Basel, Switzerland
| | - Claus A M Seidel
- Molecular Physical Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
| | | | - Dmitri I Svergun
- European Molecular Biology Laboratory (EMBL), Hamburg Outstation, Notkestrasse 85, 22607 Hamburg, Germany
| | - Emad Tajkhorshid
- Department of Biochemistry, NIH Center for Macromolecular Modeling and Bioinformatics, Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Jill Trewhella
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia; Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA
| | - Brinda Vallat
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sameer Velankar
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Geerten W Vuister
- Department of Molecular and Cell Biology, Leicester Institute of Structural and Chemical Biology, University of Leicester, Leicester LE1 9HN, UK
| | - Benjamin Webb
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Kate L White
- Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Bridge Institute, Michelson Center, University of Southern California, Los Angeles, CA 90089, USA
| | - Andrej Sali
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA.
|
83
|
Chen Q, Xiao Y, Zhang W, Mu W. Current methods and applications in computational protein design for food industry. Crit Rev Food Sci Nutr 2019; 60:3259-3270. [DOI: 10.1080/10408398.2019.1682513] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
Affiliation(s)
- Qiuming Chen
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, China
| | - Yaqin Xiao
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, China
| | - Wenli Zhang
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, China
| | - Wanmeng Mu
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, China
- International Joint Laboratory on Food Safety, Jiangnan University, Wuxi, China
|
84
|
Haas J, Gumienny R, Barbato A, Ackermann F, Tauriello G, Bertoni M, Studer G, Smolinski A, Schwede T. Introducing "best single template" models as reference baseline for the Continuous Automated Model Evaluation (CAMEO). Proteins 2019; 87:1378-1387. [PMID: 31571280 DOI: 10.1002/prot.25815] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/07/2019] [Revised: 09/10/2019] [Accepted: 09/13/2019] [Indexed: 12/17/2022]
Abstract
Critical blind assessment of structure prediction techniques is crucial for the scientific community to establish the state of the art, identify bottlenecks, and guide future developments. In Critical Assessment of Techniques in Structure Prediction (CASP), human experts assess the performance of participating methods in relation to the difficulty of the prediction task in a biennial experiment on approximately 100 targets. Yet, the development of automated computational modeling methods requires more frequent evaluation cycles and larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements CASP by conducting fully automated blind prediction evaluations based on the weekly pre-release of sequences of those structures that are going to be published in the next release of the Protein Data Bank (PDB). Each week, CAMEO publishes benchmarking results for predictions corresponding to a set of about 20 targets collected during a 4-day prediction window. CAMEO benchmarking data are generated consistently for all methods at the same point in time, enabling developers to cross-validate their methods' performance and to refer to the results in publications. Many successful participants of CASP have used CAMEO, either by directly benchmarking their methods within the system or by comparing their own performance to CAMEO reference data. CAMEO offers a variety of scores reflecting different aspects of structure modeling, for example, binding site accuracy, homo-oligomer interface quality, or accuracy of local model confidence estimates. By introducing the "bestSingleTemplate" method based on structure superpositions as a reference for the accuracy of 3D modeling predictions, CAMEO facilitates objective comparison of techniques and fosters the development of advanced methods.
Affiliation(s)
- Juergen Haas
- Computational Structural Biology, University of Basel, Switzerland
| | - Rafal Gumienny
- Computational Structural Biology, Swiss Institute of Bioinformatics, Switzerland
| | - Alessandro Barbato
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
| | - Flavio Ackermann
- Computational Structural Biology, University of Basel, Switzerland
| | | | - Martino Bertoni
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
| | - Gabriel Studer
- Computational Structural Biology, University of Basel, Switzerland
| | - Anna Smolinski
- Computational Structural Biology, University of Basel, Switzerland
| | - Torsten Schwede
- Computational Structural Biology, University of Basel, Switzerland
|
85
|
Zheng W, Zhang C, Bell EW, Zhang Y. I-TASSER gateway: A protein structure and function prediction server powered by XSEDE. FUTURE GENERATIONS COMPUTER SYSTEMS : FGCS 2019; 99:73-85. [PMID: 31427836 PMCID: PMC6699767 DOI: 10.1016/j.future.2019.04.011] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Indexed: 05/12/2023]
Abstract
There is an increasing gap between the number of known protein sequences and the number of proteins with experimentally characterized structure and function. To alleviate this issue, we have developed the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. For a given sequence, I-TASSER starts with template recognition from a known structure library, followed by full-length atomic model construction by iterative assembly simulations of the continuous structural fragments excised from the template alignments. Functional insights are then derived from comparative matching of the predicted model with a library of proteins with known function. The I-TASSER pipeline has been recently integrated with the XSEDE Gateway system to accommodate pressing demand from the user community and increasing computing costs. This report summarizes the configuration of the I-TASSER Gateway with the XSEDE-Comet supercomputer cluster, together with an overview of the I-TASSER method and milestones of its development.
|
86
|
Xu J, Wang S. Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins 2019; 87:1069-1081. [PMID: 31471916 DOI: 10.1002/prot.25810] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 04/29/2019] [Revised: 07/24/2019] [Accepted: 08/27/2019] [Indexed: 12/30/2022]
Abstract
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strength and weakness of our methods, and why deep learning performed much better in CASP13.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
|
87
|
Guzenko D, Lafita A, Monastyrskyy B, Kryshtafovych A, Duarte JM. Assessment of protein assembly prediction in CASP13. Proteins 2019; 87:1190-1199. [PMID: 31374138 DOI: 10.1002/prot.25795] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 05/02/2019] [Revised: 07/11/2019] [Accepted: 07/27/2019] [Indexed: 01/08/2023]
Abstract
We present the assembly category assessment in the 13th edition of the CASP community-wide experiment. For the second time, protein assemblies constitute an independent assessment category. Compared to the last edition we see a clear uptake in participation, more oligomeric targets released, and consistent, albeit modest, improvement in prediction quality. Looking at the tertiary structure predictions, we observe that ignoring the oligomeric state of the targets hinders modeling success. We also note that some contact prediction groups successfully predicted homomeric interfacial contacts, though it appears that these predictions were not used for assembly modeling. Homology modeling with sizeable human intervention appears to form the basis of the assembly prediction techniques in this round of CASP. Future developments should see more integrated approaches where subunits are modeled in the context of the assemblies they form.
Affiliation(s)
- Dmytro Guzenko
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California
| | - Aleix Lafita
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Bohdan Monastyrskyy
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
| | - Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California
|
88
|
Torrisi M, Kaleel M, Pollastri G. Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction. Sci Rep 2019; 9:12374. [PMID: 31451723 PMCID: PMC6710256 DOI: 10.1038/s41598-019-48786-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 03/26/2019] [Accepted: 08/12/2019] [Indexed: 01/10/2023]
Abstract
Protein Secondary Structure prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88–90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained both on single sequence and evolutionary profile-based inputs and develop a new state-of-the-art system with Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe based sets that eliminate homology between training/testing samples we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Manaz Kaleel
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
|
89
|
Croll TI, Sammito MD, Kryshtafovych A, Read RJ. Evaluation of template-based modeling in CASP13. Proteins 2019; 87:1113-1127. [PMID: 31407380 PMCID: PMC6851432 DOI: 10.1002/prot.25800] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 05/10/2019] [Revised: 07/29/2019] [Accepted: 08/08/2019] [Indexed: 12/12/2022]
Abstract
Performance in the template‐based modeling (TBM) category of CASP13 is assessed here, using a variety of metrics. Performance of the predictor groups that participated is ranked using the primary ranking score that was developed by the assessors for CASP12. This reveals that the best results are obtained by groups that include contact predictions or inter‐residue distance predictions derived from deep multiple sequence alignments. In cases where there is a good homolog in the wwPDB (TBM‐easy category), the best results are obtained by modifying a template. However, for cases with poorer homologs (TBM‐hard), very good results can be obtained without using an explicit template, by deep learning algorithms trained on the wwPDB. Alternative metrics are introduced, to allow testing of aspects of structural models that are not addressed by traditional CASP metrics. These include comparisons to the main‐chain and side‐chain torsion angles of the target, and the utility of models for solving crystal structures by the molecular replacement method. The alternative metrics are poorly correlated with the traditional metrics, and it is proposed that modeling has reached a sufficient level of maturity that the best models should be expected to satisfy this wider range of criteria.
Affiliation(s)
- Tristan I Croll
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Massimo D Sammito
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Randy J Read
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
|
90
|
Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, de Beer TAP, Rempfer C, Bordoli L, Lepore R, Schwede T. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 2018; 46:W296-W303. [PMID: 29788355 PMCID: PMC6030848 DOI: 10.1093/nar/gky427] [Citation(s) in RCA: 7414] [Impact Index Per Article: 1482.8] [Received: 02/09/2018] [Accepted: 05/07/2018] [Indexed: 11/13/2022] Open
Abstract
Homology modelling has matured into an important technique in structural biology, significantly contributing to narrowing the gap between known protein sequences and experimentally determined structures. Fully automated workflows and servers simplify and streamline the homology modelling process, also allowing users without specific computational expertise to generate reliable protein models and have easy access to modelling results, their visualization and interpretation. Here, we present an update to the SWISS-MODEL server, which pioneered the field of automated modelling 25 years ago and has been continuously developed since. Recently, its functionality has been extended to the modelling of homo- and heteromeric complexes. Starting from the amino acid sequences of the interacting proteins, both the stoichiometry and the overall structure of the complex are inferred by homology modelling. Other major improvements include the implementation of a new modelling engine, ProMod3, and the introduction of a new local model quality estimation method, QMEANDisCo. SWISS-MODEL is freely available at https://swissmodel.expasy.org.
Affiliation(s)
- Andrew Waterhouse
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Martino Bertoni
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Stefan Bienert
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Gabriel Studer
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Gerardo Tauriello
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Rafal Gumienny
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Florian T Heer
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Tjaart A P de Beer
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Christine Rempfer
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Lorenza Bordoli
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Rosalba Lepore
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
|
91
|
Cheng J, Choe MH, Elofsson A, Han KS, Hou J, Maghrabi AHA, McGuffin LJ, Menéndez-Hurtado D, Olechnovič K, Schwede T, Studer G, Uziela K, Venclovas Č, Wallner B. Estimation of model accuracy in CASP13. Proteins 2019; 87:1361-1377. [PMID: 31265154 DOI: 10.1002/prot.25767] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 03/30/2019] [Revised: 06/04/2019] [Accepted: 06/15/2019] [Indexed: 12/28/2022]
Abstract
Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.
Affiliation(s)
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Myong-Ho Choe
- Department of Life Science, University of Science, Pyongyang, DPR Korea
- Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Kun-Sop Han
- Department of Life Science, University of Science, Pyongyang, DPR Korea
- Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Ali H A Maghrabi
- School of Biological Sciences, University of Reading, Reading, UK
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
- David Menéndez-Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
- Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
- Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Björn Wallner
- Department of Physics, Chemistry, and Biology, Bioinformatics Division, Linköping University, Linköping, Sweden
|
92
|
Wagner JR, Churas CP, Liu S, Swift RV, Chiu M, Shao C, Feher VA, Burley SK, Gilson MK, Amaro RE. Continuous Evaluation of Ligand Protein Predictions: A Weekly Community Challenge for Drug Docking. Structure 2019; 27:1326-1335.e4. [PMID: 31257108 DOI: 10.1016/j.str.2019.05.012] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 11/13/2018] [Revised: 03/14/2019] [Accepted: 05/30/2019] [Indexed: 12/19/2022]
Abstract
Docking calculations can accelerate drug discovery by predicting the bound poses of ligands for a targeted protein. However, it is not clear which docking methods work best. Furthermore, predicting poses requires steps outside the docking algorithm itself, such as preparation of the protein and ligand, and it is not known which components are most in need of improvement. The Continuous Evaluation of Ligand Protein Predictions (CELPP) is a blinded prediction challenge designed to address these issues. Participants create a workflow to predict protein-ligand binding poses, which is then tasked with predicting 10-100 new protein-ligand crystal structures each week. CELPP evaluates the accuracy of each workflow's predictions and posts the scores online. The results can be used to identify the strengths and weaknesses of current approaches, help map docking problems to the algorithms most likely to overcome them, and illuminate areas of unmet need in structure-guided drug design.
Affiliation(s)
- Jeffrey R Wagner
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA
- Christopher P Churas
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA
- Shuai Liu
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA
- Robert V Swift
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA
- Michael Chiu
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA
- Chenghua Shao
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Victoria A Feher
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA
- Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Michael K Gilson
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Rommie E Amaro
- Drug Design Data Resource, University of California San Diego, La Jolla, CA 92093, USA; Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093, USA
|
93
|
AlQuraishi M. ProteinNet: a standardized data set for machine learning of protein structure. BMC Bioinformatics 2019; 20:311. [PMID: 31185886 PMCID: PMC6560865 DOI: 10.1186/s12859-019-2932-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 03/18/2019] [Accepted: 06/05/2019] [Indexed: 02/01/2023] Open
Abstract
Background: Rapid progress in deep learning has spurred its application to bioinformatics problems including protein structure prediction and design. In classic machine learning problems like computer vision, progress has been driven by standardized data sets that facilitate fair assessment of new methods and lower the barrier to entry for non-domain experts. While data sets of protein sequence and structure exist, they lack certain components critical for machine learning, including high-quality multiple sequence alignments and insulated training/validation splits that account for deep but only weakly detectable homology across protein space. Results: We created the ProteinNet series of data sets to provide a standardized mechanism for training and assessing data-driven models of protein sequence-structure relationships. ProteinNet integrates sequence, structure, and evolutionary information in programmatically accessible file formats tailored for machine learning frameworks. Multiple sequence alignments of all structurally characterized proteins were created using substantial high-performance computing resources. Standardized data splits were also generated to emulate the difficulty of past CASP (Critical Assessment of protein Structure Prediction) experiments by resetting protein sequence and structure space to the historical states that preceded six prior CASPs. Utilizing sensitive evolution-based distance metrics to segregate distantly related proteins, we have additionally created validation sets distinct from the official CASP sets that faithfully mimic their difficulty. Conclusion: ProteinNet represents a comprehensive and accessible resource for training and assessing machine-learned models of protein structure.
Affiliation(s)
- Mohammed AlQuraishi
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA, 02115, USA
|
94
|
Plasmodium pseudo-Tyrosine Kinase-like binds PP1 and SERA5 and is exported to host erythrocytes. Sci Rep 2019; 9:8120. [PMID: 31148576 PMCID: PMC6544628 DOI: 10.1038/s41598-019-44542-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/03/2019] [Accepted: 05/15/2019] [Indexed: 01/13/2023] Open
Abstract
Pseudokinases play key roles in many biological processes but they are poorly understood compared to active kinases. Eight putative pseudokinases have been predicted in Plasmodium species. We selected the unique pseudokinase belonging to the tyrosine kinase-like (TKL) family for detailed structural and functional analysis in P. falciparum and P. berghei. The primary structure of PfpTKL lacks residues critical for kinase activity, supporting its annotation as a pseudokinase. The recombinant pTKL pseudokinase domain was able to bind ATP, but lacked catalytic activity as predicted. The sterile alpha motif (SAM) and RVxF motifs of PfpTKL were found to interact with the P. falciparum proteins serine repeat antigen 5 (SERA5) and protein phosphatase type 1 (PP1), respectively, suggesting that pTKL has a scaffolding role. Furthermore, we found that PP1c activity in a heterologous model was modulated in an RVxF-dependent manner. During the trophozoite stages, PbpTKL was exported to infected erythrocytes where it formed complexes with proteins involved in cytoskeletal organization or host cell maturation and homeostasis. Finally, genetic analysis demonstrated that genomic deletion or knockdown of PbpTKL yielded viable strains and did not affect the course of parasite intra-erythrocytic development or gametocyte emergence, indicating functional redundancy during these parasite stages.
|
95
|
Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Costanzo LD, Christie C, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Liang Y, Lowe R, Peisach E, Periskova I, Randle C, Rose A, Sekharan M, Shao C, Tao YP, Valasatava Y, Voigt M, Westbrook J, Young J, Zardecki C, Zhuravleva M, Kurisu G, Nakamura H, Kengaku Y, Cho H, Sato J, Kim JY, Ikegawa Y, Nakagawa A, Yamashita R, Kudou T, Bekker GJ, Suzuki H, Iwata T, Yokochi M, Kobayashi N, Fujiwara T, Velankar S, Kleywegt GJ, Anyango S, Armstrong DR, Berrisford JM, Conroy MJ, Dana JM, Deshpande M, Gane P, Gáborová R, Gupta D, Gutmanas A, Koča J, Mak L, Mir S, Mukhopadhyay A, Nadzirin N, Nair S, Patwardhan A, Paysan-Lafosse T, Pravda L, Salih O, Sehnal D, Varadi M, Vařeková R, Markley JL, Hoch JC, Romero PR, Baskaran K, Maziuk D, Ulrich EL, Wedell JR, Yao H, Livny M, Ioannidis YE. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res 2019; 47:D520-D528. [PMID: 30357364 PMCID: PMC6324056 DOI: 10.1093/nar/gky949] [Citation(s) in RCA: 567] [Impact Index Per Article: 113.4] [Received: 09/14/2018] [Revised: 09/28/2018] [Accepted: 10/05/2018] [Indexed: 01/10/2023] Open
Abstract
The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) structure data of biological macromolecules. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), an international consortium that collaboratively oversees deposition, validation, biocuration, and open access dissemination of 3D macromolecular structure data. The PDB Core Archive houses 3D atomic coordinates of more than 144 000 structural models of proteins, DNA/RNA, and their complexes with metals and small molecules and related experimental data and metadata. Structure and experimental data/metadata are also stored in the PDB Core Archive using the readily extensible wwPDB PDBx/mmCIF master data format, which will continue to evolve as data/metadata from new experimental techniques and structure determination methods are incorporated by the wwPDB. Impacts of the recently developed universal wwPDB OneDep deposition/validation/biocuration system and various methods-specific wwPDB Validation Task Forces on improving the quality of structures and data housed in the PDB Core Archive are described together with current challenges and future plans.
|
96
|
Role of solvent accessibility for aggregation-prone patches in protein folding. Sci Rep 2018; 8:12896. [PMID: 30150761 PMCID: PMC6110721 DOI: 10.1038/s41598-018-31289-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/05/2018] [Accepted: 08/15/2018] [Indexed: 11/21/2022] Open
Abstract
The arrangement of amino acids in a protein sequence encodes its native fold. However, the same arrangement in aggregation-prone regions may cause misfolding as a result of local environmental stress. Under normal physiological conditions, such regions congregate in the protein’s interior to avoid aggregation and attain the native fold. We have used solvent accessibility of aggregation patches (SAAPp) to determine the packing of aggregation-prone residues. Our results showed that SAAPp has low values for native crystal structures, consistent with protein folding as a mechanism to minimize the solvent accessibility of aggregation-prone residues. SAAPp also shows an average correlation of 0.76 with the global distance test (GDT) score on CASP12 template-based protein models. Using SAAPp scores and five structural features, a random forest machine learning quality assessment tool, SAAP-QA, showed an average GDT loss of 2.32 between the predicted best model and the actual best model (ranked by GDT score) on independent CASP test data, and discriminated native-like folds with an AUC of 0.94. Overall, the Pearson correlation coefficient (PCC) between true and predicted GDT scores on independent CASP data was 0.86, while on the external CAMEO dataset, comprising high-quality protein structures, the PCC and average GDT loss were 0.71 and 4.46, respectively. SAAP-QA can be used to assess the quality of models and iteratively improve them to native or near-native structures.
|
98
|
Uziela K, Menéndez Hurtado D, Shu N, Wallner B, Elofsson A. Improved protein model quality assessments by changing the target function. Proteins 2018. [PMID: 29524250 DOI: 10.1002/prot.25492] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]
Abstract
Protein modeling quality is an important part of protein structure prediction. For more than a decade, we have developed a set of methods for this problem. We have used various types of description of the protein and different machine learning methodologies. However, common to all these methods has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition- and contact-based methods. The superposition-based methods, such as S-score, are based on a rigid body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact-based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take advantage of both of these features, we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates.
Affiliation(s)
- Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- David Menéndez Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- Nanjiang Shu
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden; Bioinformatics Short-term Support and Infrastructure (BILS), Science for Life Laboratory, Solna, Sweden
- Björn Wallner
- Department of Physics, Chemistry and Biology (IFM)/Bioinformatics, Linköping University, Linköping, Sweden
- Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
|
99
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)-Round XII. Proteins 2018; 86 Suppl 1:7-15. [PMID: 29082672 PMCID: PMC5897042 DOI: 10.1002/prot.25415] [Citation(s) in RCA: 245] [Impact Index Per Article: 40.8] [Received: 08/21/2017] [Revised: 10/25/2017] [Accepted: 10/27/2017] [Indexed: 12/24/2022]
Abstract
This article reports the outcome of the 12th round of Critical Assessment of Structure Prediction (CASP12), held in 2016. CASP is a community experiment to determine the state of the art in modeling protein structure from amino acid sequence. Participants are provided sequence information and in turn provide protein structure models and related information. Analysis of the submitted structures by independent assessors provides a comprehensive picture of the capabilities of current methods, and allows progress to be identified. This was again an exciting round of CASP, with significant advances in 4 areas: (i) The use of new methods for predicting three-dimensional contacts led to a two-fold improvement in contact accuracy. (ii) As a consequence, model accuracy for proteins where no template was available improved dramatically. (iii) Models based on a structural template showed overall improvement in accuracy. (iv) Methods for estimating the accuracy of a model continued to improve. CASP continued to develop new areas: (i) Assessing methods for building quaternary structure models, including an expansion of the collaboration between CASP and CAPRI. (ii) Modeling with the aid of experimental data was extended to include SAXS data, as well as again using chemical cross-linking information. (iii) A team of assessors evaluated the suitability of models for a range of applications, including mutation interpretation, analysis of ligand binding properties, and identification of interfaces. This article describes the experiment and summarizes the results. The rest of this special issue of PROTEINS contains papers describing CASP12 results and assessments in more detail.
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Krzysztof Fidelis
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Torsten Schwede
- University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Anna Tramontano
- Department of Physics and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University of Rome, P.le Aldo Moro, 5, 00185 Rome, Italy
|