1
Simpkin AJ, Mesdaghi S, Sánchez Rodríguez F, Elliott L, Murphy DL, Kryshtafovych A, Keegan RM, Rigden DJ. Tertiary structure assessment at CASP15. Proteins 2023; 91:1616-1635. [PMID: 37746927 PMCID: PMC10792517 DOI: 10.1002/prot.26593] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 05/24/2023] [Revised: 08/25/2023] [Accepted: 09/07/2023] [Indexed: 09/26/2023]
Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups (led by PEZYFoldings, UM-TBM, and Yang Server) employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas: the confidence estimates of the former were also notably accurate.
Affiliation(s)
- Adam J. Simpkin
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Filomeno Sánchez Rodríguez
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Oxfordshire, UK
- Department of Chemistry, York Structural Biology Laboratory, University of York, York, UK
- Luc Elliott
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- David L. Murphy
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ronan M. Keegan
- UKRI-STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot, UK
- Daniel J. Rigden
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
2
Lee JW, Won JH, Jeon S, Choo Y, Yeon Y, Oh JS, Kim M, Kim S, Joung I, Jang C, Lee SJ, Kim TH, Jin KH, Song G, Kim ES, Yoo J, Paek E, Noh YK, Joo K. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics 2023; 39:btad712. [PMID: 37995286 PMCID: PMC10699847 DOI: 10.1093/bioinformatics/btad712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023] Open
Abstract
MOTIVATION Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures. RESULTS Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function which integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups with improvements in the details of the structure in terms of backbone, side-chain, and Molprobity. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64 compared with 85.88 of AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups. AVAILABILITY AND IMPLEMENTATION DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.
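The GDT-TS values quoted in this abstract can be made concrete with a minimal sketch. GDT-TS is conventionally the mean, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within the cutoff of its position in the experimental structure. The function below assumes the per-residue distances have already been computed after a single fixed superposition, which is a simplification: assessment software searches many superpositions and takes the best fraction at each cutoff. The function name `gdt_ts` and the sample distances are illustrative, not taken from DeepFold or CASP tooling.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS (0-100) from per-residue C-alpha distances in angstroms.

    Assumption: distances come from one fixed superposition of model onto
    target. Real assessment tools optimise the superposition per cutoff.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("at least one residue distance is required")
    # Fraction of residues within each cutoff
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # Average the fractions and express as a percentage
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: one residue is badly placed (9.0 A)
score = gdt_ts([0.5, 1.5, 3.0, 9.0])
print(f"GDT-TS = {score:.2f}")
```

Because the score averages four cutoffs, a model can lose points either through a few badly misplaced residues or through many small deviations just above 1 Å, which is why medians in the high 80s still leave room for local improvement.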
Affiliation(s)
- Jae-Won Lee
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jong-Hyun Won
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Seonggwang Jeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Yujin Choo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Yubin Yeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jin-Seon Oh
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Minsoo Kim
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- SeonHwa Kim
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Cheongjae Jang
- Artificial Intelligence Institute, Hanyang University, Seoul 04763, Korea
- Sung Jong Lee
- Basic Science Research Institute, Changwon National University, Changwon 51140, Korea
- Tae Hyun Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Kyong Hwan Jin
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea
- Eun-Sol Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Jejoong Yoo
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Yung-Kyun Noh
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
3
Oda T. Improving protein structure prediction with extended sequence similarity searches and deep-learning-based refinement in CASP15. Proteins 2023; 91:1712-1723. [PMID: 37485822 DOI: 10.1002/prot.26551] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/14/2023] [Revised: 06/23/2023] [Accepted: 06/28/2023] [Indexed: 07/25/2023]
Abstract
The human predictor team PEZYFoldings got first place with the assessor's formulae (3rd place with Global Distance Test Total Score [GDT-TS]) in the single-domain category and 10th place in the multimer category in Critical Assessment of Structure Prediction 15. In this paper, I describe the exact method used by PEZYFoldings in the competition. As AlphaFold2 and AlphaFold-Multimer, developed by DeepMind, were state-of-the-art structure prediction tools, it was assumed that enhancing the input and output of the tools was an effective strategy to obtain the highest accuracy for structure prediction. Therefore, I used additional tools and databases to collect evolutionarily related sequences and introduced a deep-learning-based model in the refinement step. In addition to these modifications, manual interventions were performed to address various tasks. Detailed analyses were performed after the competition to identify the main contributors to performance. Comparing the number of evolutionarily related sequences I used with those of the other teams that provided AlphaFold2's baseline predictions revealed that an extensive sequence similarity search was one of the main contributors. Nonetheless, there were specific targets for which I could not identify any evolutionarily related sequences, resulting in my inability to construct accurate structures for these targets. Notably, I noticed that I had gained large Z-scores with the subunits of H1137, for which I performed manual domain parsing considering the interfaces between the subunits. This finding implies that the manual intervention contributed to my performance. The influence of the refinement model on the accuracy of structure prediction was minimal. I could have predicted structures with a similar level of accuracy without employing the refinement model. However, from the perspective of accuracy self-estimate, many structures demonstrated improvement after refinement. 
This improvement likely had a substantial influence on improving my position in the assessor's formulae rankings. These results highlight the opportunities for improvement in (1) multimer prediction, (2) building of larger and more diverse databases, and (3) developing tools to predict structures from primary sequences alone. In addition, transferring the manual intervention process to automation is a future concern.
4
Huang GJ, Parry TK, McLaughlin WA. Assessment of the Performances of the Protein Modeling Techniques Participating in CASP15 Using a Structure-Based Functional Site Prediction Approach: ResiRole. Bioengineering (Basel) 2023; 10:1377. [PMID: 38135968 PMCID: PMC10740689 DOI: 10.3390/bioengineering10121377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/27/2023] [Accepted: 11/28/2023] [Indexed: 12/24/2023] Open
Abstract
BACKGROUND Model quality assessments via computational methods which entail comparisons of the modeled structures to the experimentally determined structures are essential in the field of protein structure prediction. The assessments provide means to benchmark the accuracies of the modeling techniques and to aid with their development. We previously described the ResiRole method to gauge model quality principally based on the preservation of the structural characteristics described in SeqFEATURE functional site prediction models. METHODS We apply ResiRole to benchmark modeling group performances in the Critical Assessment of Structure Prediction experiment, round 15. To gauge model quality, a normalized Predicted Functional site Similarity Score (PFSS) was calculated as the average of one minus the absolute values of the differences of the functional site prediction probabilities, as found for the experimental structures versus those found at the corresponding sites in the structure models. RESULTS The average PFSS per modeling group (gPFSS) correlates with standard quality metrics, and can effectively be used to rank the accuracies of the groups. For the free modeling (FM) category, correlation coefficients of the Local Distance Difference Test (LDDT) and Global Distance Test-Total Score (GDT-TS) metrics with gPFSS were 0.98239 and 0.87691, respectively. An example finding for a specific group is that the gPFSS for EMBER3D was higher than expected based on the predictive relationship between gPFSS and LDDT. We infer the result is due to the use of constraints imprinted by function that are a part of the EMBER3D methodology. Also, we find functional site predictions that may guide further functional characterizations of the respective proteins. 
CONCLUSION The gPFSS metric provides an effective means to assess and rank the performances of the structure prediction techniques according to their abilities to accurately recount the structural features at predicted functional sites.
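The PFSS definition given in the METHODS portion of this abstract (the average of one minus the absolute difference between functional site prediction probabilities for the experimental structure and for the model) can be written out as a short sketch. The function name `pfss`, the input layout (two equal-length lists of probabilities, one pair per site) and the sample values are assumptions for illustration; this is not the ResiRole code, and how sites are selected or matched is not specified here.

```python
from statistics import mean


def pfss(exp_probs, model_probs):
    """Mean of 1 - |p_exp - p_model| over matched functional sites.

    Assumption: both inputs list probabilities in [0, 1] for the same
    sites in the same order (experimental structure vs. structure model).
    """
    if len(exp_probs) != len(model_probs):
        raise ValueError("inputs must describe the same sites")
    if not exp_probs:
        raise ValueError("at least one site is required")
    return mean(1.0 - abs(e - m) for e, m in zip(exp_probs, model_probs))


# Hypothetical probabilities for three sites
score = pfss([0.9, 0.2, 0.5], [0.8, 0.2, 0.1])
print(f"PFSS = {score:.3f}")
```

A score of 1 means the model reproduces the experimental structure's functional site probabilities exactly; averaging such scores over a group's models would give a per-group value in the spirit of gPFSS.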
Affiliation(s)
- William A. McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, 525 Pine Street, Scranton, PA 18509, USA (T.K.P.)
5
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, the sequence identity obtained came dangerously close to values that can arise by chance when aligning random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide assessments of modeling methods have brought some surprising outcomes, proving that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
- Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
6
Oeffner RD, Croll TI, Millán C, Poon BK, Schlicksup CJ, Read RJ, Terwilliger TC. Putting AlphaFold models to work with phenix.process_predicted_model and ISOLDE. Acta Crystallogr D Struct Biol 2022; 78:1303-1314. [PMID: 36322415 PMCID: PMC9629492 DOI: 10.1107/s2059798322010026] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 07/22/2022] [Accepted: 10/13/2022] [Indexed: 11/23/2022] Open
Abstract
AlphaFold has recently become an important tool in providing models for experimental structure determination by X-ray crystallography and cryo-EM. Large parts of the predicted models typically approach the accuracy of experimentally determined structures, although there are frequently local errors and errors in the relative orientations of domains. Importantly, residues in the model of a protein predicted by AlphaFold are tagged with a predicted local distance difference test score, informing users about which regions of the structure are predicted with less confidence. AlphaFold also produces a predicted aligned error matrix indicating its confidence in the relative positions of each pair of residues in the predicted model. The phenix.process_predicted_model tool downweights or removes low-confidence residues and can break a model into confidently predicted domains in preparation for molecular replacement or cryo-EM docking. These confidence metrics are further used in ISOLDE to weight torsion and atom-atom distance restraints, allowing the complete AlphaFold model to be interactively rearranged to match the docked fragments and reducing the need for the rebuilding of connecting regions.
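The idea of removing low-confidence residues before molecular replacement or docking, described in this abstract, can be illustrated with a minimal sketch. It relies on the fact that AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB output (columns 61-66 of each ATOM record). Everything else is an assumption for illustration: the function names, the threshold of 70, and the simple drop-or-keep rule. This is not the phenix.process_predicted_model implementation, which also downweights residues and splits models into domains.

```python
def make_atom_line(serial, resseq, plddt, name=" CA ", resname="ALA", chain="A"):
    """Build a fixed-column PDB ATOM record (hypothetical helper for the demo)."""
    return (
        f"ATOM  {serial:5d} {name} {resname} {chain}{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


def filter_by_plddt(pdb_lines, min_plddt=70.0):
    """Drop ATOM/HETATM records whose B-factor field (pLDDT) is below min_plddt.

    Assumption: the input is an AlphaFold-style PDB in which every atom of a
    residue carries that residue's pLDDT, so filtering atoms filters residues.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            plddt = float(line[60:66])  # PDB columns 61-66, zero-based slice
            if plddt < min_plddt:
                continue
        kept.append(line)  # non-coordinate records pass through unchanged
    return kept


# Two-residue demo: residue 2 is predicted with low confidence
demo = [make_atom_line(1, 1, 91.5), make_atom_line(2, 2, 42.0), "END"]
for line in filter_by_plddt(demo):
    print(line)
```

A hard cutoff is the simplest possible choice; weighting by confidence instead of discarding keeps more of the model available for later rebuilding, at the cost of a more involved procedure.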
Affiliation(s)
- Robert D. Oeffner
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Tristan I. Croll
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Claudia Millán
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Billy K. Poon
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory (LBNL), Building 33R0349, Berkeley, CA 94720-8235, USA
- Christopher J. Schlicksup
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory (LBNL), Building 33R0349, Berkeley, CA 94720-8235, USA
- Randy J. Read
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Tom C. Terwilliger
- New Mexico Consortium, Los Alamos National Laboratory, 100 Entrada Drive, Los Alamos, NM 87544, USA
7
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
8
Oliveira AL, Viegas MF, da Silva SL, Soares AM, Ramos MJ, Fernandes PA. The chemistry of snake venom and its medicinal potential. Nat Rev Chem 2022; 6:451-469. [PMID: 37117308 PMCID: PMC9185726 DOI: 10.1038/s41570-022-00393-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 04/26/2022] [Indexed: 12/15/2022]
Abstract
The fascination and fear of snakes dates back to time immemorial, with the first scientific treatise on snakebite envenoming, the Brooklyn Medical Papyrus, dating from ancient Egypt. Owing to their lethality, snakes have often been associated with images of perfidy, treachery and death. However, snakes did not always have such negative connotations. The curative capacity of venom has been known since antiquity, also making the snake a symbol of pharmacy and medicine. Today, there is renewed interest in pursuing snake-venom-based therapies. This Review focuses on the chemistry of snake venom and the potential for venom to be exploited for medicinal purposes in the development of drugs. The mixture of toxins that constitute snake venom is examined, focusing on the molecular structure, chemical reactivity and target recognition of the most bioactive toxins, from which bioactive drugs might be developed. The design and working mechanisms of snake-venom-derived drugs are illustrated, and the strategies by which toxins are transformed into therapeutics are analysed. Finally, the challenges in realizing the immense curative potential of snake venom are discussed, and chemical strategies by which a plethora of new drugs could be derived from snake venom are proposed.
Affiliation(s)
- Ana L Oliveira
- Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Porto, Portugal; LAQV/Requimte, University of Porto, Porto, Portugal
- Matilde F Viegas
- Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Porto, Portugal; LAQV/Requimte, University of Porto, Porto, Portugal
- Saulo L da Silva
- Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Porto, Portugal; LAQV/Requimte, University of Porto, Porto, Portugal
- Andreimar M Soares
- Biotechnology Laboratory for Proteins and Bioactive Compounds from the Western Amazon, Oswaldo Cruz Foundation, National Institute of Epidemiology in the Western Amazon (INCT-EpiAmO), Porto Velho, Brazil; Sao Lucas Universitary Center (UniSL), Porto Velho, Brazil
- Maria J Ramos
- Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Porto, Portugal; LAQV/Requimte, University of Porto, Porto, Portugal
- Pedro A Fernandes
- Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Porto, Portugal; LAQV/Requimte, University of Porto, Porto, Portugal
9
Simpkin AJ, Thomas JMH, Keegan RM, Rigden DJ. MrParse: finding homologues in the PDB and the EBI AlphaFold database for molecular replacement and more. Acta Crystallogr D Struct Biol 2022; 78:553-559. [PMID: 35503204 PMCID: PMC9063843 DOI: 10.1107/s2059798322003576] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/31/2021] [Accepted: 03/29/2022] [Indexed: 11/10/2022]
Abstract
Crystallographers have an array of search-model options for structure solution by molecular replacement (MR). The well established options of homologous experimental structures and regular secondary-structure elements or motifs are increasingly supplemented by computational modelling. Such modelling may be carried out locally or may use pre-calculated predictions retrieved from databases such as the EBI AlphaFold database. MrParse is a new pipeline to help to streamline the decision process in MR by consolidating bioinformatic predictions in one place. When reflection data are provided, MrParse can rank any experimental homologues found using eLLG, which indicates the likelihood that a given search model will work in MR. Inbuilt displays of predicted secondary structure, coiled-coil and transmembrane regions further inform the choice of MR protocol. MrParse can also identify and rank homologues in the EBI AlphaFold database, a function that will also interest other structural biologists and bioinformaticians.
10
Vingiani GM, Leone S, De Luca D, Borra M, Dobson ADW, Ianora A, De Luca P, Lauritano C. First identification and characterization of detoxifying plastic-degrading DBP hydrolases in the marine diatom Cylindrotheca closterium. Sci Total Environ 2022; 812:152535. [PMID: 34942245 DOI: 10.1016/j.scitotenv.2021.152535] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/14/2021] [Revised: 12/14/2021] [Accepted: 12/15/2021] [Indexed: 06/14/2023]
Abstract
Diatoms are photosynthetic organisms with potential biotechnological applications in the bioremediation sector, having shown the capacity to reduce environmental concentrations of different pollutants. The diatom Cylindrotheca closterium is known to degrade di-n-butyl phthalate (DBP), one of the most abundant phthalate esters in aquatic environments and a known endocrine-disrupting chemical. In this study, we present for the first time the in silico identification of two putative DBP hydrolases (provisionally called DBPH1 and DBPH2) in the transcriptome of C. closterium. We modeled the structure of both DBPH1-2 and their proposed interactions with the substrate to gain insights into their mechanism of action. Finally, we analyzed the expression levels of the two putative hydrolases upon exposure of C. closterium to different concentrations of DBP (5 and 10 mg/l) for 24 and 48 h. The data showed a DBP concentration-dependent increase in expression levels of both dbph1 and 2 genes, further highlighting their potential involvement in phthalates degradation. This is the first identification of phthalate-degrading enzymes in microalgae, providing new insights into the possible use of diatoms in bioremediation strategies targeting phthalates.
Affiliation(s)
- Giorgio Maria Vingiani
- Ecosustainable Marine Biotechnology Department, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy
- Serena Leone
- Department of Biology and Evolution of Marine Organisms (BEOM), Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy
- Daniele De Luca
- Department of Biology, University of Naples Federico II, Botanic Garden of Naples, Via Foria 223, 80139 Naples, Italy
- Marco Borra
- Research Infrastructure for Marine Biological Resources Department, Stazione Zoologica Anton Dohrn, Villa Comunale, CAP80121, NA, Italy
- Alan D W Dobson
- School of Microbiology, University College Cork, College Road, T12 YN60 Cork, Ireland; Environmental Research Institute, University College Cork, Lee Road, T23XE10 Cork, Ireland
- Adrianna Ianora
- Ecosustainable Marine Biotechnology Department, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy
- Pasquale De Luca
- Research Infrastructure for Marine Biological Resources Department, Stazione Zoologica Anton Dohrn, Villa Comunale, CAP80121, NA, Italy
- Chiara Lauritano
- Ecosustainable Marine Biotechnology Department, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy
11
Gao WND, Gao C, Deane JE, Carpentier DCJ, Smith GL, Graham SC. The crystal structure of vaccinia virus protein E2 and perspectives on the prediction of novel viral protein folds. J Gen Virol 2022; 103. [PMID: 35020582 PMCID: PMC8895614 DOI: 10.1099/jgv.0.001716] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022] Open
Abstract
The morphogenesis of vaccinia virus (VACV, family Poxviridae), the smallpox vaccine, is a complex process involving multiple distinct cellular membranes and resulting in multiple different forms of infectious virion. Efficient release of enveloped virions, which promote systemic spread of infection within hosts, requires the VACV protein E2 but the molecular basis of E2 function remains unclear and E2 lacks sequence homology to any well-characterised family of proteins. We solved the crystal structure of VACV E2 to 2.3 Å resolution, revealing that it comprises two domains with novel folds: an N-terminal annular (ring) domain and a C-terminal globular (head) domain. The C-terminal head domain displays weak structural homology with cellular (pseudo)kinases but lacks conserved surface residues or kinase features, suggesting that it is not enzymatically active, and possesses a large surface basic patch that might interact with phosphoinositide lipid headgroups. Recent deep learning methods have revolutionised our ability to predict the three-dimensional structures of proteins from primary sequence alone. VACV E2 is an exemplar ‘difficult’ viral protein target for structure prediction, being comprised of multiple novel domains and lacking sequence homologues outside Poxviridae. AlphaFold2 nonetheless succeeds in predicting the structures of the head and ring domains with high and moderate accuracy, respectively, allowing accurate inference of multiple structural properties. The advent of highly accurate virus structure prediction marks a step-change in structural virology and beckons a new era of structurally-informed molecular virology.
Affiliation(s)
- William N D Gao
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Chen Gao
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Janet E Deane
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
- David C J Carpentier
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Geoffrey L Smith
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Stephen C Graham
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
12
McCoy AJ, Sammito MD, Read RJ. Implications of AlphaFold2 for crystallographic phasing by molecular replacement. Acta Crystallogr D Struct Biol 2022; 78:1-13. [PMID: 34981757 PMCID: PMC8725160 DOI: 10.1107/s2059798321012122] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 05/18/2021] [Accepted: 11/13/2021] [Indexed: 12/11/2022] Open
Abstract
The AlphaFold2 results in the 14th edition of Critical Assessment of Structure Prediction (CASP14) showed that accurate (low root-mean-square deviation) in silico models of protein structure domains are on the horizon, whether or not the protein is related to known structures through high-coverage sequence similarity. As highly accurate models become available, generated by harnessing the power of correlated mutations and deep learning, one of the aspects of structural biology to be impacted will be methods of phasing in crystallography. Here, the data from CASP14 are used to explore the prospects for changes in phasing methods, and in particular to explore the prospects for molecular-replacement phasing using in silico models.
Affiliation(s)
- Airlie J. McCoy
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Massimo D. Sammito
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Randy J. Read
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
|
13
|
Rudnev VR, Kulikova LI, Nikolsky KS, Malsagova KA, Kopylov AT, Kaysheva AL. Current Approaches in Supersecondary Structures Investigation. Int J Mol Sci 2021; 22:11879. [PMID: 34769310 PMCID: PMC8584461 DOI: 10.3390/ijms222111879] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2021] [Revised: 10/27/2021] [Accepted: 10/29/2021] [Indexed: 11/16/2022] Open
Abstract
Proteins expressed during the cell cycle determine cell function, topology, and responses to environmental influences. The development and improvement of experimental methods in the field of structural biology provide valuable information about the structure and functions of individual proteins. This work is devoted to the study of supersecondary structures of proteins and determination of their structural motifs, description of experimental methods for their detection, databases, and repositories for storage, as well as methods of molecular dynamics research. The interest in the study of supersecondary structures in proteins is due to their autonomous stability outside the protein globule, which makes it possible to study folding processes, conformational changes in protein isoforms, and aberrant proteins with high productivity.
Affiliation(s)
- Vladimir R. Rudnev
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
  - Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, 142290 Pushchino, Russia
- Liudmila I. Kulikova
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
  - Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, 142290 Pushchino, Russia
  - Institute of Mathematical Problems of Biology RAS, the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, 142290 Pushchino, Russia
- Kirill S. Nikolsky
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
- Kristina A. Malsagova
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
- Arthur T. Kopylov
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
- Anna L. Kaysheva
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
|
14
|
Kryshtafovych A, Moult J, Billings WM, Della Corte D, Fidelis K, Kwon S, Olechnovič K, Seok C, Venclovas Č, Won J. Modeling SARS-CoV-2 proteins in the CASP-commons experiment. Proteins 2021; 89:1987-1996. [PMID: 34462960 PMCID: PMC8616790 DOI: 10.1002/prot.26231] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/05/2021] [Revised: 08/23/2021] [Accepted: 08/26/2021] [Indexed: 01/21/2023]
Abstract
Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
Affiliation(s)
- John Moult
  - Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
- Wendy M Billings
  - Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
- Dennis Della Corte
  - Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, Davis, California, USA
- Sohee Kwon
  - Department of Chemistry, Seoul National University, Seoul, South Korea
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Chaok Seok
  - Department of Chemistry, Seoul National University, Seoul, South Korea
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Jonghun Won
  - Department of Chemistry, Seoul National University, Seoul, South Korea
|
15
|
Millán C, Keegan RM, Pereira J, Sammito MD, Simpkin AJ, McCoy AJ, Lupas AN, Hartmann MD, Rigden DJ, Read RJ. Assessing the utility of CASP14 models for molecular replacement. Proteins 2021; 89:1752-1769. [PMID: 34387010 PMCID: PMC8881082 DOI: 10.1002/prot.26214] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 06/18/2021] [Revised: 07/20/2021] [Accepted: 07/27/2021] [Indexed: 11/21/2022]
Abstract
The assessment of CASP models for utility in molecular replacement is a measure of their use in a valuable real‐world application. In CASP7, the metric for molecular replacement assessment involved full likelihood‐based molecular replacement searches; however, this restricted the assessable targets to crystal structures with only one copy of the target in the asymmetric unit, and to those where the search found the correct pose. In CASP10, full molecular replacement searches were replaced by likelihood‐based rigid‐body refinement of models superimposed on the target using the LGA algorithm, with the metric being the refined log‐likelihood‐gain (LLG) score. This enabled multi‐copy targets and very poor models to be evaluated, but a significant further issue remained: the requirement of diffraction data for assessment. We introduce here the relative‐expected‐LLG (reLLG), which is independent of diffraction data. This reLLG is also independent of any crystal form, and can be calculated regardless of the source of the target, be it X‐ray, NMR or cryo‐EM. We calibrate the reLLG against the LLG for targets in CASP14, showing that it is a robust measure of both model and group ranking. Like the LLG, the reLLG shows that accurate coordinate error estimates add substantial value to predicted models. We find that refinement by CASP groups can often convert an inadequate initial model into a successful MR search model. Consistent with findings from others, we show that the AlphaFold2 models are sufficiently good, and reliably so, to surpass other current model generation strategies for attempting molecular replacement phasing.
Affiliation(s)
- Claudia Millán
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Ronan M Keegan
  - Scientific Computing Dept., Science and Technologies Facilities Council, UK Research and Innovation, Didcot, Oxfordshire, United Kingdom
- Joana Pereira
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, Tübingen, Germany
- Massimo D Sammito
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Adam J Simpkin
  - Institute of Systems, Molecular and Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7BE, United Kingdom
- Airlie J McCoy
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Andrei N Lupas
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, Tübingen, Germany
- Marcus D Hartmann
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, Tübingen, Germany
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7BE, United Kingdom
- Randy J Read
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
|
16
|
Kinch LN, Schaeffer RD, Kryshtafovych A, Grishin NV. Target classification in the 14th round of the critical assessment of protein structure prediction (CASP14). Proteins 2021; 89:1618-1632. [PMID: 34350630 DOI: 10.1002/prot.26202] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/28/2021] [Revised: 06/21/2021] [Accepted: 07/11/2021] [Indexed: 12/14/2022]
Abstract
An evolutionary-based definition and classification of target evaluation units (EUs) is presented for the 14th round of the critical assessment of structure prediction (CASP14). CASP14 targets included 84 experimental models submitted by various structural groups (designated T1024-T1101). Targets were split into EUs based on the domain organization of available templates and performance of server groups. Several targets required splitting (19 out of 25 multidomain targets) due in part to observed conformation changes. All in all, 96 CASP14 EUs were defined and assigned to tertiary structure assessment categories (Topology-based FM or High Accuracy-based TBM-easy and TBM-hard) considering their evolutionary relationship to existing ECOD fold space: 24 family level, 50 distant homologs (H-group), 12 analogs (X-group), and 10 new folds. Principal component analysis and heatmap visualization of sequence and structure similarity to known templates as well as performance of servers highlighted trends in CASP14 target difficulty. The assigned evolutionary levels (i.e., H-groups) and assessment classes (i.e., FM) displayed overlapping clusters of EUs. Many viral targets diverged considerably from their template homologs and thus were more difficult for prediction than other homology-related targets. On the other hand, some targets did not have sequence-identifiable templates, but were predicted better than expected due to relatively simple arrangements of secondary structural elements. An apparent improvement in overall server performance in CASP14 further complicated traditional classification, which ultimately assigned EUs into high-accuracy modeling (27 TBM-easy and 31 TBM-hard), topology (23 FM), or both (15 FM/TBM).
Affiliation(s)
- Lisa N Kinch
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- R Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Nick V Grishin
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
|
17
|
Pereira J, Simpkin AJ, Hartmann MD, Rigden DJ, Keegan RM, Lupas AN. High-accuracy protein structure prediction in CASP14. Proteins 2021; 89:1687-1699. [PMID: 34218458 DOI: 10.1002/prot.26171] [Citation(s) in RCA: 174] [Impact Index Per Article: 58.0] [Received: 05/05/2021] [Revised: 06/16/2021] [Accepted: 06/23/2021] [Indexed: 12/25/2022]
Abstract
The application of state-of-the-art deep-learning approaches to the protein modeling problem has expanded the "high-accuracy" category in CASP14 to encompass all targets. Building on the metrics used for high-accuracy assessment in previous CASPs, we evaluated the performance of all groups that submitted models for at least 10 targets across all difficulty classes, and judged the usefulness of those produced by AlphaFold2 (AF2) as molecular replacement search models with AMPLE. Driven by the qualitative diversity of the targets submitted to CASP, we also introduce DipDiff as a new measure for the improvement in backbone geometry provided by a model versus available templates. Although a large leap in high-accuracy is seen due to AF2, the second-best method in CASP14 out-performed the best in CASP13, illustrating the role of community-based benchmarking in the development and evolution of the protein structure prediction field.
Affiliation(s)
- Joana Pereira
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Adam J Simpkin
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Marcus D Hartmann
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Daniel J Rigden
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ronan M Keegan
  - Department of Scientific Computing, Science and Technologies Facilities Council, UK Research and Innovation, Didcot, Oxfordshire, UK
- Andrei N Lupas
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
|
18
|
Pakhrin SC, Shrestha B, Adhikari B, KC DB. Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci 2021; 22:5553. [PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 04/02/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 12/29/2022] Open
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed many advances due to Deep Learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.
Affiliation(s)
- Subash C. Pakhrin
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
- Bikash Shrestha
  - Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Badri Adhikari
  - Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Dukka B. KC
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
|
19
|
Wu F, Xu J. Deep template-based protein structure prediction. PLoS Comput Biol 2021; 17:e1008954. [PMID: 33939695 PMCID: PMC8118551 DOI: 10.1371/journal.pcbi.1008954] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/04/2021] [Revised: 05/13/2021] [Accepted: 04/11/2021] [Indexed: 11/19/2022] Open
Abstract
Motivation: Protein structure prediction has been greatly improved by deep learning, but most efforts are devoted to template-free modeling, and very few deep learning methods have been developed for template-based modeling (TBM), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence coevolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of the RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.
Affiliation(s)
- Fandi Wu
  - Toyota Technological Institute at Chicago, Chicago, IL, United States of America
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, IL, United States of America
|
20
|
Pasquadibisceglie A, Polticelli F. Computational studies of the mitochondrial carrier family SLC25. Present status and future perspectives. Bio-Algorithms and Med-Systems 2021. [DOI: 10.1515/bams-2021-0018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
Abstract
The members of the mitochondrial carrier family, also known as solute carrier family 25 (SLC25), are transmembrane proteins involved in the translocation of a plethora of small molecules between the mitochondrial intermembrane space and the matrix. These transporters are characterized by a structure of three homologous domains and a transport mechanism that involves the transition between different conformations. Given the crucial role of these proteins in mitochondrial homeostasis, mutations in regions critical for their function often cause disease. Experimental studies can be problematic in the case of membrane proteins, in particular concerning the characterization of structure–function relationships. For this reason, computational methods are often applied in order to develop new hypotheses or to support or explain experimental evidence. Here the computational analyses carried out on the SLC25 members are reviewed, describing the main techniques used and the outcome in terms of improved knowledge of the transport mechanism. Potential future applications of more recent and advanced in silico methods to this protein family are also suggested.
Affiliation(s)
- Fabio Polticelli
  - Department of Sciences, Roma Tre University, Rome, Italy
  - National Institute of Nuclear Physics, Roma Tre Section, Rome, Italy
|
21
|
Croll TI, Read RJ. Adaptive Cartesian and torsional restraints for interactive model rebuilding. Acta Crystallogr D Struct Biol 2021; 77:438-446. [PMID: 33825704 PMCID: PMC8025879 DOI: 10.1107/s2059798321001145] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/29/2020] [Accepted: 02/01/2021] [Indexed: 12/18/2022] Open
Abstract
When building atomic models into weak and/or low-resolution density, a common strategy is to restrain their conformation to that of a higher resolution model of the same or similar sequence. When doing so, it is important to avoid over-restraining to the reference model in the face of disagreement with the experimental data. The most common strategy for this is the use of 'top-out' potentials. These act like simple harmonic restraints within a defined range, but gradually weaken when the deviation between the model and reference grows beyond that range. In each current implementation the rate at which the potential flattens at large deviations follows a fixed form, although the form chosen varies among implementations. A restraint potential with a tuneable rate of flattening would provide greater flexibility to encode the confidence in any given restraint. Here, two new such potentials are described: a Cartesian distance restraint derived from a recent generalization of common loss functions and a periodic torsion restraint based on a renormalization of the von Mises distribution. Further, their implementation as user-adjustable/switchable restraints in ISOLDE is described and their use in some real-world examples is demonstrated.
Affiliation(s)
- Tristan Ian Croll
  - Cambridge Institute for Medical Research, Keith Peters Building, Cambridge CB2 0XY, United Kingdom
- Randy J. Read
  - Cambridge Institute for Medical Research, Keith Peters Building, Cambridge CB2 0XY, United Kingdom
|
22
|
Silva MK, Gomes HSS, Silva OLT, Campanelli SE, Campos DMO, Araújo JMG, Fernandes JV, Fulco UL, Oliveira JIN. Identification of promiscuous T cell epitopes on Mayaro virus structural proteins using immunoinformatics, molecular modeling, and QM:MM approaches. Infection, Genetics and Evolution 2021; 91:104826. [PMID: 33781966 DOI: 10.1016/j.meegid.2021.104826] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/10/2021] [Revised: 03/11/2021] [Accepted: 03/23/2021] [Indexed: 10/21/2022]
Abstract
The Mayaro virus (MAYV) belongs to the genus Alphavirus (family Togaviridae) and has been reported in several countries, especially in tropical regions of America. Owing to its outbreaks and the potential lack of medication, an effective vaccine formulation is strongly needed. This study aimed to predict promiscuous T cell epitopes from structural polyproteins of MAYV using an immunoinformatics approach. For this purpose, consensus sequences were used to identify short protein sequences capable of binding to MHC class I and class II alleles. Our analysis identified 4 MHC-I/TCD8+ and 21 MHC-II/TCD4+ epitopes on the capsid (1;3), E1 (2;5), E2 (1;10), E3 (0;2), and 6K (0;1) proteins. These predicted epitopes were characterized by high antigenicity, immunogenicity, and conservancy, were non-allergenic and non-toxic, and showed good population coverage for North and South American geographical areas. Afterwards, we used the crystal structure of the human toll-like receptor 3 (TLR3) ectodomain as a template to predict, through docking assays, the placement of a vaccine prototype at the TLR3 receptor binding site. Finally, classical and quantum mechanics/molecular mechanics (QM:MM) computations were employed to improve the quality of docking calculations, with the QM part of the simulations being carried out using the density functional theory (DFT) formalism. These results provide important insights into the advancement of diagnostic platforms, the development of vaccines, and immunotherapeutic interventions.
Affiliation(s)
- Maria K Silva
  - Departamento de Biofísica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- Heloísa S S Gomes
  - Departamento de Biofísica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- Ohana L T Silva
  - Departamento de Biofísica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- Stephany E Campanelli
  - Departamento de Biofísica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- Daniel M O Campos
  - Departamento de Biofísica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- Josélio M G Araújo
  - Departamento de Microbiologia e Parasitologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- José V Fernandes
  - Departamento de Microbiologia e Parasitologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- Umberto L Fulco
  - Departamento de Biofísica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
- Jonas I N Oliveira
  - Departamento de Biofísica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970 Natal, RN, Brazil
|
23
|
Residue-based pharmacophore approaches to study protein-protein interactions. Curr Opin Struct Biol 2021; 67:205-211. [PMID: 33486430 DOI: 10.1016/j.sbi.2020.12.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/29/2020] [Revised: 12/04/2020] [Accepted: 12/28/2020] [Indexed: 01/22/2023]
Abstract
This review focuses on pharmacophore approaches in researching protein interfaces that bind protein ligands. Pharmacophore descriptions of binding interfaces that employ molecular dynamics simulation can account for effects of solvation and conformational flexibility. In addition, these calculations provide an approximation to entropic considerations and, as such, a better approximation of the free energy of binding. Residue-based pharmacophore approaches can facilitate a variety of drug discovery tasks such as identifying receptor-ligand partners, identifying their binding poses, designing protein interfaces for selectivity, or reducing by orders of magnitude the combinatorial mutational space to be explored by subsequent experimental engineering techniques.
|
24
|
Abstract
Every protein has a story: how it folds, what it binds, its biological actions, and how it misbehaves in aging or disease. Stories are often inferred from a protein's shape (i.e., its structure). But increasingly, stories are told using computational molecular physics (CMP). CMP is rooted in the principled physics of driving forces and reveals granular detail of conformational populations in space and time. Recent advances are accessing longer time scales, larger actions, and blind testing, enabling more of biology's stories to be told in the language of atomistic physics.
Affiliation(s)
- Emiliano Brini
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA
- Carlos Simmerling
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA
  - Department of Chemistry, Stony Brook University, Stony Brook, NY 11794, USA
- Ken Dill
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA
  - Department of Chemistry, Stony Brook University, Stony Brook, NY 11794, USA
  - Department of Physics and Astronomy, Stony Brook University, Stony Brook, NY 11794, USA
|
25
|
Studer G, Tauriello G, Bienert S, Biasini M, Johner N, Schwede T. ProMod3-A versatile homology modelling toolbox. PLoS Comput Biol 2021; 17:e1008667. [PMID: 33507980 PMCID: PMC7872268 DOI: 10.1371/journal.pcbi.1008667] [Citation(s) in RCA: 130] [Impact Index Per Article: 43.3] [Received: 07/16/2020] [Revised: 02/09/2021] [Accepted: 01/03/2021] [Indexed: 11/18/2022] Open
Abstract
Computational methods for protein structure modelling are routinely used to complement experimental structure determination and thus help to address a broad spectrum of scientific questions in biomedical research. The most accurate methods today are based on homology modelling, i.e. detecting a homologue to the desired target sequence that can be used as a template for modelling. Here we present a versatile open source homology modelling toolbox as a foundation for flexible and computationally efficient modelling workflows. ProMod3 is a fully scriptable software platform that can perform all steps required to generate a protein model by homology. Its modular design aims at fast prototyping of novel algorithms and implementing flexible modelling pipelines. Common modelling tasks, such as loop modelling, sidechain modelling or generating a full protein model by homology, are provided as production-ready pipelines, forming the starting point for users' own developments and enhancements. ProMod3 is the central software component of the widely used SWISS-MODEL web-server.
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Stefan Bienert
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Marco Biasini
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Niklaus Johner
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| |
|
26
|
McCoy AJ, Stockwell DH, Sammito MD, Oeffner RD, Hatti KS, Croll TI, Read RJ. Phasertng: directed acyclic graphs for crystallographic phasing. Acta Crystallogr D Struct Biol 2021; 77:1-10. [PMID: 33404520 PMCID: PMC7787104 DOI: 10.1107/s2059798320014746] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/16/2020] [Accepted: 11/06/2020] [Indexed: 12/01/2022] Open
Abstract
Crystallographic phasing strategies increasingly require the exploration and ranking of many hypotheses about the number, types and positions of atoms, molecules and/or molecular fragments in the unit cell, each with only a small chance of being correct. Accelerating this move has been improvements in phasing methods, which are now able to extract phase information from the placement of very small fragments of structure, from weak experimental phasing signal or from combinations of molecular replacement and experimental phasing information. Describing phasing in terms of a directed acyclic graph allows graph-management software to track and manage the path to structure solution. The crystallographic software supporting the graph data structure must be strictly modular so that nodes in the graph are efficiently generated by the encapsulated functionality. To this end, the development of new software, Phasertng, which uses directed acyclic graphs natively for input/output, has been initiated. In Phasertng, the codebase of Phaser has been rebuilt, with an emphasis on modularity, on scripting, on speed and on continuing algorithm development. As a first application of phasertng, its advantages are demonstrated in the context of phasertng.xtricorder, a tool to analyse and triage merged data in preparation for molecular replacement or experimental phasing. The description of the phasing strategy with directed acyclic graphs is a generalization that extends beyond the functionality of Phasertng, as it can incorporate results from bioinformatics and other crystallographic tools, and will facilitate multifaceted search strategies, dynamic ranking of alternative search pathways and the exploitation of machine learning to further improve phasing strategies.
Affiliation(s)
- Airlie J. McCoy
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - Duncan H. Stockwell
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - Massimo D. Sammito
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - Robert D. Oeffner
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - Kaushik S. Hatti
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Drug Discovery Unit, Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, United Kingdom
| | - Tristan I. Croll
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - Randy J. Read
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
| |
|
27
|
Zhang H, Shen Y. Template-based prediction of protein structure with deep learning. BMC Genomics 2020; 21:878. [PMID: 33372607 PMCID: PMC7771081 DOI: 10.1186/s12864-020-07249-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/27/2020] [Accepted: 11/18/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate prediction of protein structure is fundamentally important to understand biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for the proteins with only distant homologs available. RESULTS We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning query sequence with template as the classical pixel classification problem in computer vision and naturally applies deep residual neural network in prediction. ThreaderAI first employs deep learning to predict residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds template-query alignment by applying a dynamic programming algorithm on the probability matrix. We evaluated our methods both in generating accurate template-query alignment and protein threading. Experimental results show that ThreaderAI outperforms currently popular template-based modelling methods HHpred, CNFpred, and the latest contact-assisted method CEthreader, especially on the proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the similarity of fold level from SCOPe data. And on CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9 and 8% in terms of TM-score, respectively. 
CONCLUSIONS These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
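The alignment step described in the abstract (dynamic programming over a residue-residue aligning probability matrix) can be illustrated with a minimal sketch. This is not ThreaderAI's code: the function name `align_from_probabilities`, the linear `gap_penalty`, and the `offset` that turns probabilities into signed scores are all assumptions chosen for illustration, and the toy matrix is invented.

```python
def align_from_probabilities(prob, gap_penalty=0.1, offset=0.5):
    """Global alignment over a query x template probability matrix.

    prob[i][j] is the (hypothetical) probability that query residue i
    aligns with template residue j. The scoring scheme is an assumption:
    aligned pairs score prob - offset, gaps cost gap_penalty each.
    Returns a list of aligned (query_index, template_index) pairs.
    """
    n, m = len(prob), len(prob[0])
    # best[i][j]: best score for the first i query and first j template residues
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap_penalty * i
        move[i][0] = "up"
    for j in range(1, m + 1):
        best[0][j] = -gap_penalty * j
        move[0][j] = "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (best[i - 1][j - 1] + prob[i - 1][j - 1] - offset, "diag"),
                (best[i - 1][j] - gap_penalty, "up"),    # query residue unaligned
                (best[i][j - 1] - gap_penalty, "left"),  # template residue unaligned
            ]
            best[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy matrix: 3 query residues against 4 template residues (invented values).
example_prob = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.1, 0.9],
]
print(align_from_probabilities(example_prob))
```

In this toy case the second template residue is skipped, because aligning it to any query residue scores lower than opening a single gap.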
Affiliation(s)
- Haicang Zhang
- Department of Systems Biology, Columbia University, New York, NY, USA.
| | - Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Department of Biomedical Informatics, Columbia University, New York, NY, USA.
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA.
- Program in Mathematical Genomics, Columbia University, New York, NY, USA.
| |
|
28
|
Ziegler SJ, Mallinson SJ, St. John PC, Bomble YJ. Advances in integrative structural biology: Towards understanding protein complexes in their cellular context. Comput Struct Biotechnol J 2020; 19:214-225. [PMID: 33425253 PMCID: PMC7772369 DOI: 10.1016/j.csbj.2020.11.052] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/13/2020] [Revised: 11/25/2020] [Accepted: 11/28/2020] [Indexed: 01/26/2023] Open
Abstract
Microorganisms rely on protein interactions to transmit signals, react to stimuli, and grow. One of the best ways to understand these protein interactions is through structural characterization. However, in the past, structural knowledge was limited to stable, high-affinity complexes that could be crystallized. Recent developments in structural biology have revolutionized how protein interactions are characterized. The combination of multiple techniques, known as integrative structural biology, has provided insight into how large protein complexes interact in their native environment. In this mini-review, we describe the past, present, and potential future of integrative structural biology as a tool for characterizing protein interactions in their cellular context.
Key Words
- CLEM, correlated light and electron microscopy
- Crosslinking mass spectrometry
- Cryo-electron microscopy
- Cryo-electron tomography
- EPR, electron paramagnetic resonance
- FRET, Forster resonance energy transfer
- ISB, Integrative structural biology
- Integrative structural biology
- ML, machine learning
- MR, molecular replacement
- MSAs, multiple sequence alignments
- MX, macromolecular crystallography
- NMR, nuclear magnetic resonance
- PDB, Protein Data Bank
- Protein docking
- Protein structure prediction
- Quinary interactions
- SAD, single-wavelength anomalous dispersion
- SANS, small angle neutron scattering
- SAXS, small angle X-ray scattering
- X-ray crystallography
- XL-MS, cross-linking mass spectrometry
- cryo-EM SPA, cryo-EM single particle analysis
- cryo-EM, cryo-electron microscopy
- cryo-ET, cryo-electron tomography
Affiliation(s)
- Samantha J. Ziegler
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
| | - Sam J.B. Mallinson
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
| | - Peter C. St. John
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
| | - Yannick J. Bomble
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
| |
|
29
|
Structure-activity relationship studies and bioactivity evaluation of 1,2,3-triazole containing analogues as a selective sphingosine kinase-2 inhibitors. Eur J Med Chem 2020; 206:112713. [DOI: 10.1016/j.ejmech.2020.112713] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/21/2020] [Revised: 07/10/2020] [Accepted: 07/29/2020] [Indexed: 12/14/2022]
|
30
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
| | - Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
31
|
Jin S, Miller MD, Chen M, Schafer NP, Lin X, Chen X, Phillips GN, Wolynes PG. Molecular-replacement phasing using predicted protein structures from AWSEM-Suite. IUCRJ 2020; 7:1168-1178. [PMID: 33209327 PMCID: PMC7642774 DOI: 10.1107/s2052252520013494] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/27/2020] [Accepted: 10/07/2020] [Indexed: 06/11/2023]
Abstract
The phase problem in X-ray crystallography arises from the fact that only the intensities, and not the phases, of the diffracting electromagnetic waves are measured directly. Molecular replacement can often estimate the relative phases of reflections starting with those derived from a template structure, which is usually a previously solved structure of a similar protein. The key factor in the success of molecular replacement is finding a good template structure. When no good solved template exists, predicted structures based partially on templates can sometimes be used to generate models for molecular replacement, thereby extending the lower bound of structural and sequence similarity required for successful structure determination. Here, the effectiveness is examined of structures predicted by a state-of-the-art prediction algorithm, the Associative memory, Water-mediated, Structure and Energy Model Suite (AWSEM-Suite), which has been shown to perform well in predicting protein structures in CASP13 when there is no significant sequence similarity to a solved protein or only very low sequence similarity to known templates. The performance of AWSEM-Suite structures in molecular replacement is discussed and the results show that AWSEM-Suite performs well in providing useful phase information, often performing better than I-TASSER-MR and the previous algorithm AWSEM-Template.
Affiliation(s)
- Shikai Jin
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Department of Biosciences, Rice University, Houston, Texas, USA
| | - Mingchen Chen
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
| | - Nicholas P. Schafer
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Department of Chemistry, Rice University, Houston, Texas, USA
| | - Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Xun Chen
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Department of Chemistry, Rice University, Houston, Texas, USA
| | - George N. Phillips
- Department of Biosciences, Rice University, Houston, Texas, USA
- Department of Chemistry, Rice University, Houston, Texas, USA
| | - Peter G. Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Department of Biosciences, Rice University, Houston, Texas, USA
- Department of Chemistry, Rice University, Houston, Texas, USA
- Department of Physics, Rice University, Houston, Texas, USA
| |
|
32
|
Karunarathna KHT, Senathilake NHKS, Mewan KM, Weerasena OVDSJ, Perera SACN. In silico structural homology modelling of EST073 motif coding protein of tea Camellia sinensis (L). J Genet Eng Biotechnol 2020; 18:32. [PMID: 32685981 PMCID: PMC7370249 DOI: 10.1186/s43141-020-00038-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/16/2020] [Accepted: 06/04/2020] [Indexed: 11/17/2022]
Abstract
Background Tea (Camellia sinensis (L.) O. Kuntze) is known as the oldest, mildly stimulating, caffeine-containing non-alcoholic beverage. One of the major threats to the South Asian tea industry is blister blight leaf disease (BB), caused by the fungus Exobasidium vexans Massee. The SSR DNA marker EST SSR 073 is used as a molecular marker to tag the blister blight disease resistance trait of tea. The amino acid sequences were derived from cDNA sequences related to EST SSR 073 of BB-susceptible (TRI 2023) and BB-resistant (TRI 2043) cultivars. An attempt has been made to understand the structural characteristics and variations of the EST SSR 073 locus that may reveal the factors influencing the BB resistance of tea, using multiple bioinformatics tools such as ORF finder, ExPASy ProtParam tools, MODELLER v9.17, the RAMPAGE server, UCSF Chimera, and the HADDOCK docking server. Results The primary, secondary, and tertiary structures of the EST SSR 073 coding protein were analyzed using the amino acid sequences of both the BB-resistant TRI 2043 and BB-susceptible TRI 2023 tea cultivars. The coding amino acid sequences of both cultivars were homologous to the photosystem I subunit protein (PsaD I) of Pisum sativum. The predicted 3D structures of the proteins were validated and judged to have acceptable overall stereochemical quality. The BB-resistant protein showed a CT repeat extension and was not involved in the topology of the PsaD I subunit. The C-terminal truncation of the BB-resistant protein caused the formation of hydrogen bonds interacting with PsaD I and other subunits of photosystem I in the modeled three-dimensional protein structure. Conclusions The Camellia sinensis EST 073 SSR motif coding protein was identified as the PsaD I subunit of photosystem I. The exact mechanism by which PsaD I confers resistance to blister blight in tea needs to be investigated further.
Affiliation(s)
- K H T Karunarathna
- Institute of Biochemistry, Molecular Biology and Biotechnology, University of Colombo, Colombo, Sri Lanka; Current address: Department of Biosystems Technology, Faculty of Technology, University of Ruhuna, Matara, Sri Lanka
| | - N H K S Senathilake
- Institute of Biochemistry, Molecular Biology and Biotechnology, University of Colombo, Colombo, Sri Lanka
| | - K M Mewan
- Department of Biotechnology, Faculty of Agriculture and Plantation Management, Wayamba University of Sri Lanka, Makandura, Gonawila, Sri Lanka
| | - O V D S J Weerasena
- Institute of Biochemistry, Molecular Biology and Biotechnology, University of Colombo, Colombo, Sri Lanka
| | - S A C N Perera
- Department of Agricultural Biology, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
| |
|
33
|
Hall R, Dixon T, Dickson A. On Calculating Free Energy Differences Using Ensembles of Transition Paths. Front Mol Biosci 2020; 7:106. [PMID: 32582764 PMCID: PMC7291376 DOI: 10.3389/fmolb.2020.00106] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/26/2020] [Accepted: 05/06/2020] [Indexed: 12/30/2022] Open
Abstract
The free energy of a process is the fundamental quantity that determines its spontaneity or propensity at a given temperature. In particular, the binding free energy of a drug candidate to its biomolecular target is used as an objective quantity in drug design. Recently, binding kinetics—rates of association (kon) and dissociation (koff)—have also demonstrated utility for their ability to predict efficacy and in some cases have been shown to be more predictive than the binding free energy alone. Some methods exist to calculate binding kinetics from molecular simulations, although these are typically more difficult to calculate than the binding affinity as they depend on details of the transition path ensemble. Assessing these rate constants can be difficult, due to uncertainty in the definition of the bound and unbound states, large error bars and the lack of experimental data. As an additional consistency check, rate constants from simulation can be used to calculate free energies (using the log of their ratio) which can then be compared to free energies obtained experimentally or using alchemical free energy perturbation. However, in this calculation it is not straightforward to account for common, practical details such as the finite simulation volume or the particular definition of the “bound” and “unbound” states. Here we derive a set of correction terms that can be applied to calculations of binding free energies using full reactive trajectories. We apply these correction terms to revisit the calculation of binding free energies from rate constants for a host-guest system that was part of a blind prediction challenge, where significant deviations were observed between free energies calculated with rate ratios and those calculated from alchemical perturbation. The correction terms combine to significantly decrease the error with respect to computational benchmarks, from 3.4 to 0.76 kcal/mol. 
Although these terms were derived with weighted ensemble simulations in mind, some of the correction terms are generally applicable to free energies calculated using physical pathways via methods such as Markov state modeling, metadynamics, milestoning, or umbrella sampling.
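The consistency check mentioned in the abstract, computing a free energy from the log of the ratio of rate constants, can be sketched as follows. This is the uncorrected textbook relation only: the correction terms derived in the paper (finite simulation volume, state definitions) are not reproduced here, and the function name, the 1 M standard concentration, and the example rate constants are assumptions for illustration.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(k_on, k_off, temperature=298.15, standard_conc=1.0):
    """Standard binding free energy (kcal/mol) from rate constants.

    k_on          association rate constant in 1/(M*s)
    k_off         dissociation rate constant in 1/s
    standard_conc standard-state concentration in M (1 M assumed)

    Uses K_d = k_off / k_on and dG = RT * ln(K_d / C0); no finite-volume
    or state-definition corrections are applied.
    """
    k_d = k_off / k_on  # dissociation constant in M
    return R_KCAL_PER_MOL_K * temperature * math.log(k_d / standard_conc)


# Invented example values: k_on = 1e6 1/(M*s), k_off = 1e-3 1/s, so K_d = 1 nM.
example_dg = binding_free_energy(1.0e6, 1.0e-3)
print(f"{example_dg:.2f} kcal/mol")
```

A value obtained this way can then be compared with an experimental or alchemical estimate, which is the comparison the abstract describes.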
Affiliation(s)
- Robert Hall
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, United States
| | - Tom Dixon
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, United States; Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, United States
| | - Alex Dickson
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, United States; Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, United States
| |
|
34
|
Abstract
The purpose of this quick guide is to help new modelers who have little or no background in comparative modeling yet are keen to produce high-resolution protein 3D structures for their study by following systematic good modeling practices, using affordable personal computers or online computational resources. Through the available experimental 3D-structure repositories, the modeler should be able to access and use the atomic coordinates for building homology models. We also aim to provide the modeler with a rationale for making a simple list of atomic coordinates suitable for computational analysis that abides by the principles of physics (e.g., molecular mechanics). Keeping that objective in mind, these quick tips cover the process of homology modeling and some postmodeling computations such as molecular docking and molecular dynamics (MD). A brief section is devoted to modeling nonprotein molecules, and a short case study of homology modeling is discussed.
Affiliation(s)
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
| | - Vojtech Adam
- Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
| | - Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
| |
|
35
|
Karczyńska AS, Ziȩba K, Uciechowska U, Mozolewska MA, Krupa P, Lubecka EA, Lipska AG, Sikorska C, Samsonov SA, Sieradzan AK, Giełdoń A, Liwo A, Ślusarz R, Ślusarz M, Lee J, Joo K, Czaplewski C. Improved Consensus-Fragment Selection in Template-Assisted Prediction of Protein Structures with the UNRES Force Field in CASP13. J Chem Inf Model 2020; 60:1844-1864. [PMID: 31999919 PMCID: PMC7588044 DOI: 10.1021/acs.jcim.9b00864] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Abstract
The method for protein-structure prediction, which combines the physics-based coarse-grained UNRES force field with knowledge-based modeling, has been developed further and tested in the 13th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13). The method implements restraints from the consensus fragments common to server models. In this work, the server models to derive fragments have been chosen on the basis of quality assessment; a fully automatic fragment-selection procedure has been introduced, and Dynamic Fragment Assembly pseudopotentials have been fully implemented. The Global Distance Test Score (GDT_TS), averaged over our “Model 1” predictions, increased by over 10 units with respect to CASP12 for the free-modeling category to reach 40.82. Our “Model 1” predictions ranked 20 and 14 for all and free-modeling targets, respectively (upper 20.2% and 14.3% of all models submitted to CASP13 in these categories, respectively), compared to 27 (upper 21.1%) and 24 (upper 18.9%) in CASP12, respectively. For oligomeric targets, the Interface Patch Similarity (IPS) and Interface Contact Similarity (ICS) averaged over our best oligomer models increased from 0.28 to 0.36 and from 12.4 to 17.8, respectively, from CASP12 to CASP13, and top-ranking models of 2 targets (H0968 and T0997o) were obtained (none in CASP12). The improvement of our method in CASP13 over CASP12 was ascribed to the combined effect of the overall enhancement of server-model quality, our success in selecting server models and fragments to derive restraints, and improvements of the restraint and potential-energy functions.
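For readers unfamiliar with the GDT_TS values quoted above, the score is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of C-alpha atoms that lie within the cutoff of their position in the target. The sketch below is a simplification under a stated assumption: the model is taken to be already superimposed on the target, whereas the official score searches over many superpositions and keeps the best fraction at each cutoff. The function name and the coordinates are invented for illustration.

```python
import math


def gdt_ts(model_ca, target_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for pre-superimposed C-alpha coordinates.

    model_ca and target_ca are equal-length lists of (x, y, z) tuples in
    angstroms, residue i of the model matching residue i of the target.
    Assumption: a single fixed superposition, not the per-cutoff optimal
    superposition used by the official evaluation.
    """
    if not target_ca or len(model_ca) != len(target_ca):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    distances = [math.dist(a, b) for a, b in zip(model_ca, target_ca)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances) for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Invented example: four residues displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
example_target = [(3.8 * i, 0.0, 0.0) for i in range(4)]
example_model = [
    (x, dev, z) for (x, _, z), dev in zip(example_target, (0.5, 1.5, 3.0, 10.0))
]
print(gdt_ts(example_model, example_target))
```

Here the fractions within 1, 2, 4 and 8 Å are 1/4, 2/4, 3/4 and 3/4, so the sketch returns 56.25.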
Affiliation(s)
| | - Karolina Ziȩba
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Urszula Uciechowska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Magdalena A Mozolewska
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw PL-02668, Poland
| | - Paweł Krupa
- Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, Warsaw PL-02668, Poland
| | - Emilia A Lubecka
- Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Wita Stwosza 57, Gdańsk 80-308, Poland
| | - Agnieszka G Lipska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Celina Sikorska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Sergey A Samsonov
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Adam K Sieradzan
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland; School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Artur Giełdoń
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland; School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Rafał Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Magdalena Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| |
|
36
|
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature 2020; 577:706-710. [PMID: 31942072 DOI: 10.1038/s41586-019-1923-7] [Citation(s) in RCA: 1364] [Impact Index Per Article: 341.0] [Received: 04/02/2019] [Accepted: 12/10/2019] [Indexed: 12/16/2022]
Abstract
Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence [1]. This problem is of fundamental importance as the structure of a protein largely determines its function [2]; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures [3]. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force [4] that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction [5] (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores [6] of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined [7].
Affiliation(s)
- David T Jones
- The Francis Crick Institute, London, UK; University College London, London, UK
|
37
|
Revisiting the "satisfaction of spatial restraints" approach of MODELLER for protein homology modeling. PLoS Comput Biol 2019; 15:e1007219. [PMID: 31846452 PMCID: PMC6938380 DOI: 10.1371/journal.pcbi.1007219] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/25/2019] [Revised: 12/31/2019] [Accepted: 11/13/2019] [Indexed: 01/02/2023]
Abstract
The most frequently used approach for protein structure prediction is currently homology modeling. The 3D model building phase of this methodology is critical for obtaining an accurate and biologically useful prediction. The most widely employed tool to perform this task is MODELLER. This program implements the “modeling by satisfaction of spatial restraints” strategy and its core algorithm has not been altered significantly since the early 1990s. In this work, we have explored the idea of modifying MODELLER with two effective, yet computationally light strategies to improve its 3D modeling performance. Firstly, we have investigated how the level of accuracy in the estimation of structural variability between a target protein and its templates, in the form of σ values, profoundly influences 3D modeling. We show that the σ values produced by MODELLER are on average weakly correlated with the true level of structural divergence between target-template pairs and that increasing this correlation greatly improves the program’s predictions, especially in multiple-template modeling. Secondly, we have inquired into how the incorporation of statistical potential terms (such as the DOPE potential) in MODELLER’s objective function positively impacts 3D modeling quality by providing a small but consistent improvement in metrics such as GDT-HA and lDDT and a large increase in stereochemical quality. Python modules to harness this second strategy are freely available at https://github.com/pymodproject/altmod. In summary, we show that there is considerable room for improving MODELLER in terms of 3D modeling quality and we propose strategies that could be pursued in order to further increase its performance.
Proteins are fundamental biological molecules that carry out countless activities in living beings. Since the function of proteins is dictated by their three-dimensional atomic structures, acquiring structural details of proteins provides deep insights into their function. Currently, the most frequently used computational approach for protein structure prediction is template-based modeling. In this approach, a target protein is modeled using the experimentally derived structural information of a template protein assumed to have a similar structure to the target. MODELLER is the most frequently used program for template-based 3D model building. Despite its success, its predictions are not always accurate enough to be useful in biomedical research. Here, we show that it is possible to greatly increase the performance of MODELLER by modifying two aspects of its algorithm. First, we demonstrate that providing the program with accurate estimations of local target-template structural divergence greatly increases the quality of its predictions. Additionally, we show that modifying MODELLER’s scoring function with statistical potential energetic terms also helps to improve modeling quality. This work will be useful in future research, since it reports practical strategies to improve the performance of this core tool in structural bioinformatics.
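The role of the σ values can be made concrete with a simplified restraint. The sketch below assumes a single Gaussian distance restraint centred on the template distance, with the penalty taken as its negative logarithm; the restraint forms actually used by MODELLER are more elaborate, and the function names `gaussian_restraint_penalty` and `restraint_objective` are hypothetical.

```python
import math


def gaussian_restraint_penalty(model_dist, template_dist, sigma):
    """Negative log of a Gaussian restraint centred on the template distance.

    Assumption: one Gaussian per restraint; sigma expresses how much the
    target is expected to diverge from the template at this position.
    """
    z = (model_dist - template_dist) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))


def restraint_objective(model_dists, template_dists, sigmas):
    """Sum of restraint penalties over all restrained distances."""
    return sum(
        gaussian_restraint_penalty(d, t, s)
        for d, t, s in zip(model_dists, template_dists, sigmas)
    )
```

With a 1 Å deviation from the template, a restraint with σ = 0.5 Å is penalized more heavily than one with σ = 2 Å, so an overconfident σ pulls the model toward the template even where the true structures diverge. This is one way to read the abstract's claim that better σ estimates improve the models.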
|
38
|
Kryshtafovych A, Malhotra S, Monastyrskyy B, Cragnolini T, Joseph AP, Chiu W, Topf M. Cryo-electron microscopy targets in CASP13: Overview and evaluation of results. Proteins 2019; 87:1128-1140. [PMID: 31576602 PMCID: PMC7197460 DOI: 10.1002/prot.25817] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/17/2019] [Revised: 08/30/2019] [Accepted: 09/13/2019] [Indexed: 11/07/2022]
Abstract
Structures of seven CASP13 targets were determined using cryo-electron microscopy (cryo-EM) at resolutions between 3.0 and 4.0 Å. We provide an overview of the experimentally derived structures and describe the results of the numerical evaluation of the submitted models. The evaluation is carried out by comparing coordinates of models to those of reference structures (CASP-style evaluation), as well as by checking the goodness-of-fit of modeled structures to the cryo-EM density maps. The performance of contributing research groups in the CASP-style evaluation is measured in terms of backbone accuracy, all-atom local geometry and similarity of inter-subunit interfaces. The results on the cryo-EM targets are compared with those on the whole set of eighty CASP13 targets. A posteriori refinement of the best models in their corresponding cryo-EM density maps resulted in structures that are very close to the reference structure, including some regions with better fit to the density.
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Sony Malhotra
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
- Bohdan Monastyrskyy
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Tristan Cragnolini
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
- Agnel-Praveen Joseph
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
- Wah Chiu
- Department of Bioengineering, Microbiology and Immunology and Photon Science, Stanford University, James H. Clark Center, MC5447, 318 Campus Drive, Stanford, CA 94305, USA
- Maya Topf
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
|
39
|
Heo L, Feig M. High-accuracy protein structures by combining machine-learning with physics-based refinement. Proteins 2019; 88:637-642. [PMID: 31693199 DOI: 10.1002/prot.25847] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 09/02/2019] [Revised: 10/05/2019] [Accepted: 11/03/2019] [Indexed: 12/16/2022]
Abstract
Protein structure prediction has long been available as an alternative to experimental structure determination, especially via homology modeling based on templates from related sequences. Recently, models based on distance restraints derived from coevolutionary analysis via machine learning have significantly expanded the ability to predict structures for sequences without templates. One such method, AlphaFold, also performs well on sequences where templates are available but without using such information directly. Here we show that combining machine-learning based models from AlphaFold with state-of-the-art physics-based refinement via molecular dynamics simulations further improves predictions to outperform any other prediction method tested during the latest round of CASP. The resulting models have highly accurate global and local structures, including high accuracy at functionally important interface residues, and they are highly suitable as initial models for crystal structure determination via molecular replacement.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
|
40
|
Haas J, Gumienny R, Barbato A, Ackermann F, Tauriello G, Bertoni M, Studer G, Smolinski A, Schwede T. Introducing "best single template" models as reference baseline for the Continuous Automated Model Evaluation (CAMEO). Proteins 2019; 87:1378-1387. [PMID: 31571280 DOI: 10.1002/prot.25815] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/07/2019] [Revised: 09/10/2019] [Accepted: 09/13/2019] [Indexed: 12/17/2022]
Abstract
Critical blind assessment of structure prediction techniques is crucial for the scientific community to establish the state of the art, identify bottlenecks, and guide future developments. In Critical Assessment of Techniques in Structure Prediction (CASP), human experts assess the performance of participating methods in relation to the difficulty of the prediction task in a biennial experiment on approximately 100 targets. Yet, the development of automated computational modeling methods requires more frequent evaluation cycles and larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements CASP by conducting fully automated blind prediction evaluations based on the weekly pre-release of sequences of those structures that are going to be published in the next release of the Protein Data Bank (PDB). Each week, CAMEO publishes benchmarking results for predictions corresponding to a set of about 20 targets collected during a 4-day prediction window. CAMEO benchmarking data are generated consistently for all methods at the same point in time, enabling developers to cross-validate their method's performance and to refer to their results in publications. Many successful participants of CASP have used CAMEO, either by directly benchmarking their methods within the system or by comparing their own performance to CAMEO reference data. CAMEO offers a variety of scores reflecting different aspects of structure modeling, for example, binding site accuracy, homo-oligomer interface quality, or accuracy of local model confidence estimates. By introducing the "bestSingleTemplate" method based on structure superpositions as a reference for the accuracy of 3D modeling predictions, CAMEO facilitates objective comparison of techniques and fosters the development of advanced methods.
Affiliation(s)
- Juergen Haas
- Computational Structural Biology, University of Basel, Switzerland
- Rafal Gumienny
- Computational Structural Biology, Swiss Institute of Bioinformatics, Switzerland
- Alessandro Barbato
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
- Flavio Ackermann
- Computational Structural Biology, University of Basel, Switzerland
- Martino Bertoni
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
- Gabriel Studer
- Computational Structural Biology, University of Basel, Switzerland
- Anna Smolinski
- Computational Structural Biology, University of Basel, Switzerland
- Torsten Schwede
- Computational Structural Biology, University of Basel, Switzerland
|
41
|
Croll TI, Sammito MD, Kryshtafovych A, Read RJ. Evaluation of template-based modeling in CASP13. Proteins 2019; 87:1113-1127. [PMID: 31407380 PMCID: PMC6851432 DOI: 10.1002/prot.25800] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 05/10/2019] [Revised: 07/29/2019] [Accepted: 08/08/2019] [Indexed: 12/12/2022]
Abstract
Performance in the template‐based modeling (TBM) category of CASP13 is assessed here, using a variety of metrics. Performance of the predictor groups that participated is ranked using the primary ranking score that was developed by the assessors for CASP12. This reveals that the best results are obtained by groups that include contact predictions or inter‐residue distance predictions derived from deep multiple sequence alignments. In cases where there is a good homolog in the wwPDB (TBM‐easy category), the best results are obtained by modifying a template. However, for cases with poorer homologs (TBM‐hard), very good results can be obtained without using an explicit template, by deep learning algorithms trained on the wwPDB. Alternative metrics are introduced, to allow testing of aspects of structural models that are not addressed by traditional CASP metrics. These include comparisons to the main‐chain and side‐chain torsion angles of the target, and the utility of models for solving crystal structures by the molecular replacement method. The alternative metrics are poorly correlated with the traditional metrics, and it is proposed that modeling has reached a sufficient level of maturity that the best models should be expected to satisfy this wider range of criteria.
Affiliation(s)
- Tristan I Croll
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Massimo D Sammito
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Randy J Read
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
|