1
Moreira D, Blaz J, Kim E, Eme L. A gene-rich mitochondrion with a unique ancestral protein transport system. Curr Biol 2024; 34:3812-3819.e3. [PMID: 39084221 DOI: 10.1016/j.cub.2024.07.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Revised: 05/03/2024] [Accepted: 07/02/2024] [Indexed: 08/02/2024]
Abstract
Mitochondria originated from an ancient endosymbiosis involving an alphaproteobacterium [1-3]. Over time, these organelles reduced their gene content massively, with most genes being transferred to the host nucleus before the last eukaryotic common ancestor (LECA) [4]. This process has yielded varying gene compositions in modern mitogenomes, including the complete loss of this organellar genome in some extreme cases [5-14]. At the other end of the spectrum, jakobids harbor the most gene-rich mitogenomes, encoding 60-66 proteins [8]. Here, we introduce the mitogenome of Mantamonas sphyraenae, a protist from the deep-branching CRuMs supergroup [15,16]. Remarkably, it boasts the most gene-rich mitogenome outside of jakobids, by housing 91 genes, including 62 protein-coding ones. These include rare homologs of the four subunits of the bacterial-type cytochrome c maturation system I (CcmA, CcmB, CcmC, and CcmF) alongside a unique ribosomal protein S6. During the early evolution of mitochondria, gene transfer from the proto-mitochondrial endosymbiont to the nucleus became possible thanks to systems facilitating the transport of proteins synthesized in the host cytoplasm back to the mitochondrion. In addition to the universally found eukaryotic protein import systems, jakobid mitogenomes were reported to uniquely encode the SecY transmembrane protein of the Sec general secretory pathway, whose evolutionary origin was however unclear. The Mantamonas mitogenome not only encodes SecY but also SecA, SecE, and SecG, making it the sole eukaryote known to house a complete mitochondrial Sec translocation system. Furthermore, our phylogenetic and comparative genomic analyses provide compelling evidence for the alphaproteobacterial origin of this system, establishing its presence in LECA.
Affiliation(s)
- David Moreira
- Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, 91190 Gif-sur-Yvette, France
- Jazmin Blaz
- Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, 91190 Gif-sur-Yvette, France
- Eunsoo Kim
- Division of EcoScience, Ewha Womans University, Seoul, South Korea; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Laura Eme
- Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, 91190 Gif-sur-Yvette, France
2
Pan X, Li Y, Huang P, Staecker H, He M. Extracellular vesicles for developing targeted hearing loss therapy. J Control Release 2024; 366:460-478. [PMID: 38182057 DOI: 10.1016/j.jconrel.2023.12.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 12/19/2023] [Accepted: 12/28/2023] [Indexed: 01/07/2024]
Abstract
Substantial efforts have been made toward the local administration of small molecules or biologics to treat hearing loss caused by trauma, genetic mutations, or drug ototoxicity. Recently, extracellular vesicles (EVs) naturally secreted from cells have drawn increasing attention for attenuating hearing impairment in both preclinical and clinical studies. This rapidly emerging field, which uses diverse bioengineering technologies to develop EVs as bioderived therapeutic materials along with artificial intelligence (AI)-based targeting toolkits, sheds light on the unique properties of EVs specific to inner ear delivery. This review covers the field from the fundamentals of the hearing-protective functions of EVs to biotechnology advances and the potential clinical translation of functionalized EVs. Specifically, advances in assessing targeting ligands using AI algorithms are systematically discussed. The overall translational potential of EVs is reviewed in the context of the auditory sensing system for developing next-generation gene therapy.
Affiliation(s)
- Xiaoshu Pan
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida 32610, United States
- Yanjun Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, Florida 32610, United States
- Peixin Huang
- Department of Otolaryngology, Head and Neck Surgery, University of Kansas School of Medicine, Kansas City, Kansas 66160, United States
- Hinrich Staecker
- Department of Otolaryngology, Head and Neck Surgery, University of Kansas School of Medicine, Kansas City, Kansas 66160, United States
- Mei He
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida 32610, United States
3
Huang GJ, Parry TK, McLaughlin WA. Assessment of the Performances of the Protein Modeling Techniques Participating in CASP15 Using a Structure-Based Functional Site Prediction Approach: ResiRole. Bioengineering (Basel) 2023; 10:1377. [PMID: 38135968 PMCID: PMC10740689 DOI: 10.3390/bioengineering10121377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/27/2023] [Accepted: 11/28/2023] [Indexed: 12/24/2023] Open
Abstract
BACKGROUND Model quality assessments via computational methods which entail comparisons of the modeled structures to the experimentally determined structures are essential in the field of protein structure prediction. The assessments provide means to benchmark the accuracies of the modeling techniques and to aid with their development. We previously described the ResiRole method to gauge model quality principally based on the preservation of the structural characteristics described in SeqFEATURE functional site prediction models. METHODS We apply ResiRole to benchmark modeling group performances in the Critical Assessment of Structure Prediction experiment, round 15. To gauge model quality, a normalized Predicted Functional site Similarity Score (PFSS) was calculated as the average of one minus the absolute values of the differences of the functional site prediction probabilities, as found for the experimental structures versus those found at the corresponding sites in the structure models. RESULTS The average PFSS per modeling group (gPFSS) correlates with standard quality metrics, and can effectively be used to rank the accuracies of the groups. For the free modeling (FM) category, correlation coefficients of the Local Distance Difference Test (LDDT) and Global Distance Test-Total Score (GDT-TS) metrics with gPFSS were 0.98239 and 0.87691, respectively. An example finding for a specific group is that the gPFSS for EMBER3D was higher than expected based on the predictive relationship between gPFSS and LDDT. We infer the result is due to the use of constraints imprinted by function that are a part of the EMBER3D methodology. Also, we find functional site predictions that may guide further functional characterizations of the respective proteins. 
CONCLUSION The gPFSS metric provides an effective means to assess and rank the performances of the structure prediction techniques according to their abilities to accurately recount the structural features at predicted functional sites.
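The PFSS definition given in the METHODS portion of this abstract can be illustrated with a small sketch. This is not the ResiRole implementation; the function names `pfss` and `gpfss`, the list-based inputs, and the example probabilities are all hypothetical, and the sketch assumes that functional sites in the model have already been matched one-to-one with sites in the experimental structure.

```python
from statistics import mean


def pfss(experimental_probs, model_probs):
    """Average of (1 - |p_exp - p_model|) over matched functional sites.

    Both arguments are sequences of functional site prediction
    probabilities in [0, 1], in the same site order (an assumption
    of this sketch).
    """
    if len(experimental_probs) != len(model_probs):
        raise ValueError("probability lists must have the same length")
    if not experimental_probs:
        raise ValueError("at least one functional site is required")
    return mean(
        1.0 - abs(p_exp - p_model)
        for p_exp, p_model in zip(experimental_probs, model_probs)
    )


def gpfss(per_model_pfss):
    """Group-level score: plain average of a group's per-model PFSS values."""
    return mean(per_model_pfss)


# Hypothetical probabilities for three matched sites.
score = pfss([0.9, 0.2, 0.5], [0.8, 0.4, 0.5])
print(round(score, 3))
```

A score of 1 means the model reproduces every functional site probability found for the experimental structure, and lower scores mean larger average deviations.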
Affiliation(s)
- William A. McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, 525 Pine Street, Scranton, PA 18509, USA (T.K.P.)
4
Hasan MM, Nabi AN, Yasmin T. Comprehensive analysis predicting effects of deleterious SNPs of human progesterone receptor gene on its structure and functions: a computational approach. J Biomol Struct Dyn 2023; 41:8002-8017. [PMID: 36166622 DOI: 10.1080/07391102.2022.2127908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 09/17/2022] [Indexed: 10/14/2022]
Abstract
Progesterone receptor plays a crucial role in the development of the mammary gland and breast cancer. Single nucleotide polymorphisms (SNPs) within its gene, PGR, are associated with the risk of miscarriages and preterm birth as well as many cancers across different populations. The main aim of this work is to investigate the most deleterious SNPs in the PGR gene to identify potential biomarkers for various disease susceptibility and treatments. Both sequence and structure-based computational approaches were adopted and in total 11 nsSNPs have been filtered out of 674 nsSNPs along with seven non-coding SNPs. R740Q, I744T and D746E belonged to a mutation cluster. R740Q, D746E along with S865L altered H-bond interactions within the receptor. The same mutations have been found to be associated with several cancers including uterine and breast cancer among others. It is, therefore, possible that the high-risk SNPs associated with cancers may exert their effect by causing changes in the protein structure, particularly in its bonding patterns, and thus affecting its function. In addition, seven non-coding SNPs that were located in the UTR region created a new miRNA site while three SNPs disrupted a conserved miRNA site. These high-risk SNPs can play an instrumental role in generating a dataset of the PGR gene's SNPs. Thus, the present study may pave the way to design and develop novel therapeutics for overcoming the challenges associated with certain cancers and pregnancy that result from a change in the protein structure and function due to the SNP mutations in the PGR gene. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- M Mahbub Hasan
- Population Genetics Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Ahm Nurun Nabi
- Population Genetics Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Tahirah Yasmin
- Population Genetics Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
5
Jiao Y, Xing Y, Sun Y. Impact of E484Q and L452R Mutations on Structure and Binding Behavior of SARS-CoV-2 B.1.617.1 Using Deep Learning AlphaFold2, Molecular Docking and Dynamics Simulation. Int J Mol Sci 2023; 24:11564. [PMID: 37511322 PMCID: PMC10380202 DOI: 10.3390/ijms241411564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 07/04/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
During the outbreak of COVID-19, many SARS-CoV-2 variants presented key amino acid mutations that influenced their binding abilities with angiotensin-converting enzyme 2 (hACE2) and neutralizing antibodies. For the B.1.617 lineage, there had been fears that two key mutations, i.e., L452R and E484Q, would have additive effects on the evasion of neutralizing antibodies. In this paper, we systematically investigated the impact of the L452R and E484Q mutations on the structure and binding behavior of B.1.617.1 using deep learning AlphaFold2, molecular docking and dynamics simulation. We firstly predicted and verified the structure of the S protein containing L452R and E484Q mutations via the AlphaFold2-calculated pLDDT value and compared it with the experimental structure. Next, a molecular simulation was performed to reveal the structural and interaction stabilities of the S protein of the double mutant variant with hACE2. We found that the double mutations, L452R and E484Q, could lead to a decrease in hydrogen bonds and higher interaction energy between the S protein and hACE2, demonstrating the lower structural stability and the worse binding affinity in the long dynamic evolutional process, even though the molecular docking showed the lower binding energy score of the S1 RBD of the double mutant variant with hACE2 than that of the wild type (WT) with hACE2. In addition, docking to three approved neutralizing monoclonal antibodies (mAbs) showed a reduced binding affinity of the double mutant variant, suggesting a lower neutralization ability of the mAbs against the double mutant variant. Our study helps lay the foundation for further SARS-CoV-2 studies and provides bioinformatics and computational insights into how the double mutations lead to immune evasion, which could offer guidance for subsequent biomedical studies.
Affiliation(s)
- Yanqi Jiao
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Yichen Xing
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Yao Sun
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
6
O'Reilly FJ, Graziadei A, Forbrig C, Bremenkamp R, Charles K, Lenz S, Elfmann C, Fischer L, Stülke J, Rappsilber J. Protein complexes in cells by AI-assisted structural proteomics. Mol Syst Biol 2023; 19:e11544. [PMID: 36815589 PMCID: PMC10090944 DOI: 10.15252/msb.202311544] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Received: 01/16/2023] [Revised: 01/24/2023] [Accepted: 02/07/2023] [Indexed: 02/24/2023] Open
Abstract
Accurately modeling the structures of proteins and their complexes using artificial intelligence is revolutionizing molecular biology. Experimental data enable a candidate-based approach to systematically model novel protein assemblies. Here, we use a combination of in-cell crosslinking mass spectrometry and co-fractionation mass spectrometry (CoFrac-MS) to identify protein-protein interactions in the model Gram-positive bacterium Bacillus subtilis. We show that crosslinking interactions prior to cell lysis reveals protein interactions that are often lost upon cell lysis. We predict the structures of these protein interactions and others in the SubtiWiki database with AlphaFold-Multimer and, after controlling for the false-positive rate of the predictions, we propose novel structural models of 153 dimeric and 14 trimeric protein assemblies. Crosslinking MS data independently validates the AlphaFold predictions and scoring. We report and validate novel interactors of central cellular machineries that include the ribosome, RNA polymerase, and pyruvate dehydrogenase, assigning function to several uncharacterized proteins. Our approach uncovers protein-protein interactions inside intact cells, provides structural insight into their interaction interfaces, and is applicable to genetically intractable organisms, including pathogenic bacteria.
Affiliation(s)
- Francis J O'Reilly
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Present address: Center for Structural Biology, Center for Cancer Research, National Cancer Institute (NCI), Frederick, MD, USA
- Rica Bremenkamp
- Department of General Microbiology, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen, Germany
- Swantje Lenz
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Christoph Elfmann
- Department of General Microbiology, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen, Germany
- Lutz Fischer
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Jörg Stülke
- Department of General Microbiology, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen, Germany
- Juri Rappsilber
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
7
Moafinejad SN, Pandaranadar Jeyeram IPN, Jaryani F, Shirvanizadeh N, Baulin EF, Bujnicki JM. 1D2DSimScore: A novel method for comparing contacts in biomacromolecules and their complexes. Protein Sci 2023; 32:e4503. [PMID: 36369832 PMCID: PMC9795538 DOI: 10.1002/pro.4503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 10/28/2022] [Accepted: 11/01/2022] [Indexed: 11/13/2022]
Abstract
The biologically relevant structures of proteins and nucleic acids and their complexes are dynamic. They include a combination of regions ranging from rigid structural segments to structural switches to regions that are almost always disordered, which interact with each other in various ways. Comparing conformational changes and variation in contacts between different conformational states is essential to understand the biological functions of proteins, nucleic acids, and their complexes. Here, we describe a new computational tool, 1D2DSimScore, for comparing contacts and contact interfaces in all kinds of macromolecules and macromolecular complexes, including proteins, nucleic acids, and other molecules. 1D2DSimScore can be used to compare structural features of macromolecular models between alternative structures obtained in a particular experiment or to score various predictions against a defined "ideal" reference structure. Comparisons at the level of contacts are particularly useful for flexible molecules, for which comparisons in 3D that require rigid-body superpositions are difficult, and in biological systems where the formation of specific inter-residue contacts is more relevant for the biological function than the maintenance of a specific global 3D structure. Similarity/dissimilarity scores calculated by 1D2DSimScore can be used to complement scores describing 3D structural similarity measures calculated by the existing tools.
Affiliation(s)
- S. Naeim Moafinejad
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Farhang Jaryani
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Niloofar Shirvanizadeh
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Eugene F. Baulin
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Janusz M. Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
8
Zea DJ, Teppa E, Marino-Buslje C. Easy Not Easy: Comparative Modeling with High-Sequence Identity Templates. Methods Mol Biol 2023; 2627:83-100. [PMID: 36959443 DOI: 10.1007/978-1-0716-2974-1_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Homology modeling is the most common technique to build structural models of a target protein based on the structure of proteins with high-sequence identity and available high-resolution structures. This technique is based on the idea that protein structure shows fewer changes than sequence through evolution. While in this scenario single mutations would minimally perturb the structure, experimental evidence shows otherwise: proteins with high conformational diversity impose a limit on the paradigm of comparative modeling, as the same protein sequence can adopt dissimilar three-dimensional structures. These cases present challenges for modeling; at first glance, they may seem to be easy cases, but they have a complexity that is not evident at the sequence level. In this chapter, we address the following questions: Why should we care about conformational diversity? How can conformational diversity be considered in a practical way when doing template-based modeling?
Affiliation(s)
- Diego Javier Zea
- Laboratory of Computational and Quantitative Biology, LCQB, UMR 7238 CNRS, IBPS, Sorbonne Université, Paris, France
- Elin Teppa
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRA, INSA, Toulouse, France
9
Varadi M, Nair S, Sillitoe I, Tauriello G, Anyango S, Bienert S, Borges C, Deshpande M, Green T, Hassabis D, Hatos A, Hegedus T, Hekkelman ML, Joosten R, Jumper J, Laydon A, Molodenskiy D, Piovesan D, Salladini E, Salzberg SL, Sommer MJ, Steinegger M, Suhajda E, Svergun D, Tenorio-Ku L, Tosatto S, Tunyasuvunakool K, Waterhouse AM, Žídek A, Schwede T, Orengo C, Velankar S. 3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources. Gigascience 2022; 11:giac118. [PMID: 36448847 PMCID: PMC9709962 DOI: 10.1093/gigascience/giac118] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/11/2022] [Revised: 09/20/2022] [Accepted: 11/11/2022] [Indexed: 12/02/2022] Open
Abstract
While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
- Sreenath Nair
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
- Ian Sillitoe
- Department of Structural and Molecular Biology, UCL, London WC1E 6BT, UK
- Gerardo Tauriello
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Stephen Anyango
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
- Stefan Bienert
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Clemente Borges
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
- Mandar Deshpande
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
- Andras Hatos
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Department of Oncology, Lausanne University Hospital, Lausanne 1015, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Swiss Cancer Center Leman, Lausanne 1005, Switzerland
- Tamas Hegedus
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
- Robbie Joosten
- Netherlands Cancer Institute, Amsterdam 1066 CX, The Netherlands
- Dmitry Molodenskiy
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Steven L Salzberg
- Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Markus J Sommer
- Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Martin Steinegger
- School of Biology, Seoul National University, Seoul 82-2-880-6971, 6977, South Korea
- Erzsebet Suhajda
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
- Dmitri Svergun
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
- Luiggi Tenorio-Ku
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Silvio Tosatto
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Andrew Mark Waterhouse
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Christine Orengo
- Department of Structural and Molecular Biology, UCL, London WC1E 6BT, UK
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
10
Lee C, Su BH, Tseng YJ. Comparative studies of AlphaFold, RoseTTAFold and Modeller: a case study involving the use of G-protein-coupled receptors. Brief Bioinform 2022; 23:6658852. [PMID: 35945035 PMCID: PMC9487610 DOI: 10.1093/bib/bbac308] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 03/24/2022] [Revised: 06/22/2022] [Accepted: 07/07/2022] [Indexed: 11/13/2022] Open
Abstract
Neural network (NN)-based protein modeling methods have improved significantly in recent years. Although the overall accuracy of the two non-homology-based modeling methods, AlphaFold and RoseTTAFold, is outstanding, their performance for specific protein families has remained unexamined. G-protein-coupled receptor (GPCR) proteins are particularly interesting since they are involved in numerous pathways. This work directly compares the performance of these novel deep learning-based protein modeling methods for GPCRs with the most widely used template-based software—Modeller. We collected the experimentally determined structures of 73 GPCRs from the Protein Data Bank. The official AlphaFold repository and RoseTTAFold web service were used with default settings to predict five structures of each protein sequence. The predicted models were then aligned with the experimentally solved structures and evaluated by the root-mean-square deviation (RMSD) metric. If only looking at each program’s top-scored structure, Modeller had the smallest average modeling RMSD of 2.17 Å, which is better than AlphaFold’s 5.53 Å and RoseTTAFold’s 6.28 Å, probably since Modeller already included many known structures as templates. However, the NN-based methods (AlphaFold and RoseTTAFold) outperformed Modeller in 21 and 15 out of the 73 cases with the top-scored model, respectively, where no good templates were available for Modeller. The larger RMSD values generated by the NN-based methods were primarily due to the differences in loop prediction compared to the crystal structures.
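The RMSD metric used in this abstract to compare predicted and experimental structures can be sketched as follows. This is an illustration only: the function name `rmsd` and the toy coordinates are hypothetical, and the sketch assumes the two coordinate sets list the same atoms in the same order and have already been superposed. The abstract does not state which alignment procedure or atom selection the authors used.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superposed structures.

    Each argument is a sequence of (x, y, z) tuples in angstroms, with
    atom i of coords_a corresponding to atom i of coords_b (an
    assumption of this sketch).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom is required")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom of the model is displaced by 1 angstrom along z.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(experimental, predicted))
```

Because the deviations are squared before averaging, a few badly placed atoms, such as those in a mispredicted loop, can dominate the value.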
Affiliation(s)
- Chien Lee
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Bo-Han Su
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Yufeng Jane Tseng
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
11
Binette V, Mousseau N, Tuffery P. A Generalized Attraction-Repulsion Potential and Revisited Fragment Library Improves PEP-FOLD Peptide Structure Prediction. J Chem Theory Comput 2022; 18:2720-2736. [PMID: 35298162 DOI: 10.1021/acs.jctc.1c01293] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023]
Abstract
Fast and accurate structure prediction is essential to the study of peptide function, molecular targets, and interactions and has been the subject of considerable efforts in the past decade. In this work, we present improvements to the popular simplified PEP-FOLD technique for small peptide structure prediction. PEP-FOLD's originality is threefold: (i) it uses a predetermined structural alphabet, (ii) it uses a sequential algorithm to reconstruct the tridimensional structures of these peptides in a discrete space using a fragment library, and (iii) it assesses the energy of these structures using a coarse-grained representation in which all of the backbone atoms but the α-hydrogen are present, and the side chain corresponds to a unique bead. In former versions of PEP-FOLD, a van der Waals formulation was used for non-bonded interactions, with each side chain being associated with a fixed radius. Here, we explore the relevance of using instead a generalized formulation in which not only the optimal distance of interaction and the energy at this distance are parameters but also the distance at which the potential is zero. This allows each side chain to be associated with a different radius and potential energy shape, depending on its interaction partner, and in principle makes the coarse-grained representation more effective. In addition, the new PEP-FOLD version is associated with an updated library of fragments. We show that these modifications lead to important improvements for many of the problematic targets identified with the former PEP-FOLD version while maintaining already correct predictions. The improvement is in terms of both model ranking and model accuracy. We also compare the PEP-FOLD enhanced version to state-of-the-art techniques for both peptide and structure predictions: APPTest, RaptorX, and AlphaFold2. We find that the new predictions are superior, in particular with respect to the prediction of small β-targets, to those of APPTest and RaptorX and bring, with its original approach, additional understanding on folded structures, even when less precise than AlphaFold2. With their strong physical influence, the revised structural library and coarse-grained potential offer, however, the means for a deeper understanding of the nature of folding and open a solid basis for studying flexibility and other dynamical properties not accessible to AI structure prediction approaches.
Affiliation(s)
- Vincent Binette
- Département de Physique, Université de Montréal, Case postale 6128, succursale Centre-ville, Montréal, QC H3C 3J7, Canada
- Normand Mousseau
- Département de Physique, Université de Montréal, Case postale 6128, succursale Centre-ville, Montréal, QC H3C 3J7, Canada
- Pierre Tuffery
- Université de Paris, INSERM U1133, CNRS UMR 8251, F-75205 Paris, France
|
12
|
Cragnolini T, Kryshtafovych A, Topf M. Cryo-EM targets in CASP14. Proteins 2021; 89:1949-1958. [PMID: 34398978 PMCID: PMC8630773 DOI: 10.1002/prot.26216] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/07/2021] [Revised: 07/27/2021] [Accepted: 08/06/2021] [Indexed: 11/22/2022]
Abstract
Structures of seven CASP14 targets were determined using the cryo-electron microscopy (cryo-EM) technique, with resolutions between 2.1 and 3.8 Å. We provide an evaluation of the submitted models versus the experimental data (cryo-EM density maps) and experimental reference structures built into the maps. The accuracy of models is measured in terms of coordinate-to-density and coordinate-to-coordinate fit. A posteriori refinement of the most accurate models in their corresponding cryo-EM density resulted in structures that are close to the reference structure, including some regions with better fit to the density. Regions that were found to be less "refineable" correlate well with regions of high diversity between the CASP models and low goodness-of-fit to density in the reference structure.
Affiliation(s)
- Tristan Cragnolini
- Institute of Structural and Molecular Biology, Birkbeck, University College London, London, UK
- Maya Topf
- Center for Structural Systems Biology, Leibniz-Institut für Experimentelle Virologie and Universitätsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
|
13
|
Amoozadeh S, Johnston J, Meisrimler CN. Exploiting Structural Modelling Tools to Explore Host-Translocated Effector Proteins. Int J Mol Sci 2021; 22:12962. [PMID: 34884778 PMCID: PMC8657640 DOI: 10.3390/ijms222312962] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/29/2021] [Revised: 11/24/2021] [Accepted: 11/26/2021] [Indexed: 12/12/2022]
Abstract
Oomycete and fungal interactions with plants can be neutral, symbiotic or pathogenic, with different impacts on plant health and fitness. Both fungi and oomycetes can generate so-called effector proteins in order to successfully colonize the host plant. These proteins modify stress pathways, developmental processes and the innate immune system to the microbes' benefit, with a very different outcome for the plant. The biological and functional roles of effectors during plant-microbe interactions can be investigated through bioinformatics and experimental approaches. The next-generation protein modeling software RoseTTAFold and AlphaFold2 have made significant progress in defining the 3D structure of proteins by utilizing novel machine-learning algorithms that use amino acid sequences as their only input. As these two methods rely on supercomputers, Google Colab-based alternatives have received significant attention, making the approaches more accessible to users. Here, we focus on current structural biology, sequence motif and domain knowledge of effector proteins from filamentous microbes and discuss the broader use of novel modelling strategies, namely AlphaFold2 and RoseTTAFold, in the field of effector biology. Finally, we compare the original programs and their Colab versions to assess current strengths, ease of access, limitations and future applications.
Affiliation(s)
- Sahel Amoozadeh
- School of Biological Science, University of Canterbury, Christchurch 8041, New Zealand
- Jodie Johnston
- School of Physical and Chemical Sciences, University of Canterbury, Christchurch 8041, New Zealand
|
14
|
Igashov I, Pavlichenko N, Grudinin S. Spherical convolutions on molecular graphs for protein model quality assessment. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abf856] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022]
Abstract
Processing information on three-dimensional (3D) objects requires methods that are stable to rigid-body transformations, in particular rotations, of the input data. In image processing tasks, convolutional neural networks achieve this property using rotation-equivariant operations. However, unlike images, graphs generally have irregular topology. This makes it challenging to define a rotation-equivariant convolution operation on these structures. In this work, we propose a spherical graph convolutional network that processes 3D models of proteins represented as molecular graphs. In a protein molecule, individual amino acids have common topological elements. This allows us to unambiguously associate each amino acid with a local coordinate system and construct rotation-equivariant spherical filters that operate on angular information between graph nodes. Within the framework of the protein model quality assessment problem, we demonstrate that the proposed spherical convolution method significantly improves the quality of model assessment compared to the standard message-passing approach. It is also comparable to state-of-the-art methods, as we demonstrate on Critical Assessment of Structure Prediction benchmarks. The proposed technique operates only on geometric features of protein 3D models. This makes it universal and applicable to any other geometric-learning task where the graph structure allows constructing local coordinate systems. The method is available at https://team.inria.fr/nano-d/software/s-gcn/.
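The idea of attaching a local coordinate system to each amino acid can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of the backbone N, CA and C atoms, the Gram-Schmidt construction, and all function names and coordinates below are assumptions made for illustration. The point it shows is that a neighbour's position expressed in such a residue-fixed frame does not change when the whole structure is rotated.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def local_frame(n_atom, ca_atom, c_atom):
    """Orthonormal frame centred on CA (hypothetical construction, Gram-Schmidt)."""
    e1 = normalize(sub(c_atom, ca_atom))           # first axis: CA -> C
    u = sub(n_atom, ca_atom)
    proj = tuple(dot(u, e1) * x for x in e1)
    e2 = normalize(sub(u, proj))                   # second axis: part of CA -> N orthogonal to e1
    e3 = cross(e1, e2)                             # third axis completes a right-handed frame
    return e1, e2, e3

def to_local(point, ca_atom, frame):
    """Coordinates of `point` in the residue's local frame."""
    d = sub(point, ca_atom)
    return tuple(dot(d, e) for e in frame)

def rotate_z(p, angle):
    """Rotate a point about the global z axis (used only to check invariance)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

# Made-up coordinates for one residue and one neighbouring node.
N, CA, C = (-0.53, 1.36, 0.0), (0.0, 0.0, 0.0), (1.52, 0.0, 0.0)
neighbour = (2.0, 1.0, 3.0)

before = to_local(neighbour, CA, local_frame(N, CA, C))

# Rotate everything rigidly and recompute: the local coordinates are unchanged.
rN, rCA, rC, rNb = (rotate_z(p, 0.7) for p in (N, CA, C, neighbour))
after = to_local(rNb, rCA, local_frame(rN, rCA, rC))
```

Angular features computed from such local coordinates are what a rotation-aware filter could operate on; how the published method builds its frames and filters is described in the paper itself, not here.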
|
15
|
Igashov I, Olechnovič K, Kadukova M, Venclovas Č, Grudinin S. VoroCNN: Deep convolutional neural network built on 3D Voronoi tessellation of protein structures. Bioinformatics 2021; 37:2332-2339. [PMID: 33620450 DOI: 10.1093/bioinformatics/btab118] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/04/2020] [Revised: 01/08/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022]
Abstract
Motivation: Effective use of evolutionary information has recently led to tremendous progress in computational prediction of three-dimensional (3D) structures of proteins and their complexes. Despite the progress, the accuracy of predicted structures tends to vary considerably from case to case. Since the utility of computational models depends on their accuracy, reliable estimates of deviation between predicted and native structures are of utmost importance. Results: For the first time, we present a deep convolutional neural network (CNN) constructed on a Voronoi tessellation of 3D molecular structures. Despite the irregular data domain, our data representation allows us to efficiently introduce both convolution and pooling operations and train the network in an end-to-end fashion without precomputed descriptors. The resultant model, VoroCNN, predicts local qualities of 3D protein folds. The prediction results are competitive with the state of the art and superior to the previous 3D CNN architectures built for the same task. We also discuss practical applications of VoroCNN, for example, in recognition of protein binding interfaces. Availability: The model, data, and evaluation tests are available at https://team.inria.fr/nano-d/software/vorocnn/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ilia Igashov
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France; Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Kliment Olechnovič
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
- Maria Kadukova
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France; Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Česlovas Venclovas
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
- Sergei Grudinin
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
|
16
|
Studer G, Rempfer C, Waterhouse AM, Gumienny R, Haas J, Schwede T. QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics 2020; 36:1765-1771. [PMID: 31697312 PMCID: PMC7075525 DOI: 10.1093/bioinformatics/btz828] [Citation(s) in RCA: 462] [Impact Index Per Article: 115.5] [Received: 06/19/2019] [Revised: 10/24/2019] [Accepted: 11/06/2019] [Indexed: 01/13/2023]
Abstract
Motivation: Methods that estimate the quality of a 3D protein structure model in absence of an experimental reference structure are crucial to determine a model’s utility and potential applications. Single model methods assess individual models whereas consensus methods require an ensemble of models as input. In this work, we extend the single model composite score QMEAN that employs statistical potentials of mean force and agreement terms by introducing a consensus-based distance constraint (DisCo) score. Results: DisCo exploits distance distributions from experimentally determined protein structures that are homologous to the model being assessed. Feed-forward neural networks are trained to adaptively weigh contributions by the multi-template DisCo score and classical single model QMEAN parameters. The result is the composite score QMEANDisCo, which combines the accuracy of consensus methods with the broad applicability of single model approaches. We also demonstrate that, despite being the de-facto standard for structure prediction benchmarking, CASP models are not the ideal data source to train predictive methods for model quality estimation. For performance assessment, QMEANDisCo is continuously benchmarked within the CAMEO project and participated in CASP13. For both, it ranks among the top performers and excels with low response times. Availability and implementation: QMEANDisCo is available as web-server at https://swissmodel.expasy.org/qmean. The source code can be downloaded from https://git.scicore.unibas.ch/schwede/QMEAN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel, Basel 4056, Switzerland; SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Christine Rempfer
- Biozentrum, University of Basel, Basel 4056, Switzerland; SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Andrew M Waterhouse
- Biozentrum, University of Basel, Basel 4056, Switzerland; SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Rafal Gumienny
- Biozentrum, University of Basel, Basel 4056, Switzerland; SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Juergen Haas
- Biozentrum, University of Basel, Basel 4056, Switzerland; SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Basel 4056, Switzerland; SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
|
17
|
Olechnovič K, Venclovas Č. Contact Area-Based Structural Analysis of Proteins and Their Complexes Using CAD-Score. Methods Mol Biol 2020; 2112:75-90. [PMID: 32006279 DOI: 10.1007/978-1-0716-0270-6_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Quantifying discrepancies between computationally derived and native (reference) structures is an essential step in the development and comparison of protein modeling and protein-protein docking methods. Measuring conformational differences of proteins or protein complexes is also important in other areas of structural biology such as molecular dynamics and crystallography. There are multiple scores to do that. However, nearly all of them, whether superposition-based (e.g., RMSD) or superposition-free, use distances to measure similarity. CAD-score is conceptually different as it uses physical contacts represented as contact areas. Such representation makes it possible to quantify differences of both structures and surfaces (e.g., protein-protein interfaces and binding sites) using the same framework. A number of studies have found CAD-score to be among the most robust scores. The method is implemented both as a web server and as standalone software available at http://bioinformatics.lt/software/cad-score. Here, we describe how to use the standalone CAD-score software for comparison and analysis of protein structures, interfaces, and binding sites.
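To make the contact-area idea concrete, the following is a rough sketch of a contact-area difference score. It assumes the commonly cited form, one minus the bounded sum of per-contact area differences divided by the total target contact area, and it takes the contact areas as given. The function name, the residue-pair keys and the area values are hypothetical; the standalone software derives real contact areas from the structures, which this sketch does not attempt.

```python
def cad_score(target_areas, model_areas):
    """Contact-area difference score in [0, 1]; 1 means identical contacts.

    Both arguments map a residue pair to a contact area. Contacts are taken
    from the target (reference) structure; a contact missing from the model
    counts as area 0. Each per-contact difference is capped at the target area.
    """
    total = sum(target_areas.values())
    if total <= 0:
        raise ValueError("target has no contact area")
    penalty = 0.0
    for pair, t_area in target_areas.items():
        m_area = model_areas.get(pair, 0.0)
        penalty += min(abs(t_area - m_area), t_area)
    return 1.0 - penalty / total

# Made-up contact areas (arbitrary units) for a reference and a model.
target = {("A1", "A2"): 10.0, ("A1", "A3"): 5.0}
model = {("A1", "A2"): 8.0, ("A1", "A3"): 20.0}

score = cad_score(target, model)  # penalties 2 and 5 over a total of 15
```

Capping each difference at the target area keeps a single badly overestimated contact from pushing the score below zero, which is one reason a score of this shape stays bounded without any superposition step.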
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
|
18
|
Fajardo JE, Shrestha R, Gil N, Belsom A, Crivelli SN, Czaplewski C, Fidelis K, Grudinin S, Karasikov M, Karczyńska AS, Kryshtafovych A, Leitner A, Liwo A, Lubecka EA, Monastyrskyy B, Pagès G, Rappsilber J, Sieradzan AK, Sikorska C, Trabjerg E, Fiser A. Assessment of chemical-crosslink-assisted protein structure modeling in CASP13. Proteins 2019; 87:1283-1297. [PMID: 31569265 PMCID: PMC6851497 DOI: 10.1002/prot.25816] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/25/2019] [Revised: 08/08/2019] [Accepted: 09/13/2019] [Indexed: 12/22/2022]
Abstract
With the advance of experimental procedures, obtaining chemical crosslinking information is becoming a fast and routine practice. Information on crosslinks can greatly enhance the accuracy of protein structure modeling. Here, we review the current state of the art in modeling protein structures with the assistance of experimentally determined chemical crosslinks within the framework of the 13th meeting of Critical Assessment of Structure Prediction approaches. This largest-to-date blind assessment reveals the benefits of using data assistance in difficult-to-model protein structure prediction cases. However, in a broader context, it also suggests that, with the unprecedented advance in contact prediction accuracy in recent years, experimental crosslinks will be useful only if their specificity and accuracy are further improved and they are better integrated into computational workflows.
Affiliation(s)
- J. Eduardo Fajardo
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Rojan Shrestha
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Nelson Gil
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Adam Belsom
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Silvia N. Crivelli
- Department of Computer Science, UC Davis, One Shields Ave., Davis, CA 95616
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Krzysztof Fidelis
- Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP LJK, 38000 Grenoble, France
- Mikhail Karasikov
- Center for Energy Systems, Skolkovo Institute of Science and Technology, Moscow, 143026, Russia
- Moscow Institute of Physics and Technology, Moscow, 141701, Russia
- Department of Computer Science, ETH Zurich, Zurich, 8092, Switzerland
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Emilia A. Lubecka
- Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Wita Stwosza 57, 80-308 Gdańsk, Poland
- Bohdan Monastyrskyy
- Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Guillaume Pagès
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP LJK, 38000 Grenoble, France
- Juri Rappsilber
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Adam K. Sieradzan
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Celina Sikorska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Esben Trabjerg
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- Andras Fiser
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
|
19
|
Kryshtafovych A, Malhotra S, Monastyrskyy B, Cragnolini T, Joseph AP, Chiu W, Topf M. Cryo-electron microscopy targets in CASP13: Overview and evaluation of results. Proteins 2019; 87:1128-1140. [PMID: 31576602 PMCID: PMC7197460 DOI: 10.1002/prot.25817] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/17/2019] [Revised: 08/30/2019] [Accepted: 09/13/2019] [Indexed: 11/07/2022]
Abstract
Structures of seven CASP13 targets were determined using the cryo-electron microscopy (cryo-EM) technique, with resolutions between 3.0 and 4.0 Å. We provide an overview of the experimentally derived structures and describe results of the numerical evaluation of the submitted models. The evaluation is carried out by comparing coordinates of models to those of reference structures (CASP-style evaluation), as well as checking goodness-of-fit of modeled structures to the cryo-EM density maps. The performance of contributing research groups in the CASP-style evaluation is measured in terms of backbone accuracy, all-atom local geometry and similarity of inter-subunit interfaces. The results on the cryo-EM targets are compared with those on the whole set of eighty CASP13 targets. A posteriori refinement of the best models in their corresponding cryo-EM density maps resulted in structures that are very close to the reference structure, including some regions with better fit to the density.
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Sony Malhotra
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
- Bohdan Monastyrskyy
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Tristan Cragnolini
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
- Agnel-Praveen Joseph
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
- Wah Chiu
- Department of Bioengineering, Microbiology and Immunology and Photon Science, Stanford University, James H. Clark Center, MC5447, 318 Campus Drive, Stanford, CA 94305, USA
- Maya Topf
- Institute of Structural and Molecular Biology, Birkbeck, University College London, Malet Street, London WC1E 7HX, UK
|
20
|
Hura GL, Hodge CD, Rosenberg D, Guzenko D, Duarte JM, Monastyrskyy B, Grudinin S, Kryshtafovych A, Tainer JA, Fidelis K, Tsutakawa SE. Small angle X-ray scattering-assisted protein structure prediction in CASP13 and emergence of solution structure differences. Proteins 2019; 87:1298-1314. [PMID: 31589784 DOI: 10.1002/prot.25827] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/06/2019] [Revised: 09/27/2019] [Accepted: 09/27/2019] [Indexed: 12/14/2022]
Abstract
Small angle X-ray scattering (SAXS) measures comprehensive distance information on a protein's structure, which can constrain and guide computational structure prediction algorithms. Here, we evaluate structure predictions of 11 monomeric and oligomeric proteins for which SAXS data were collected and provided to predictors in the 13th round of the Critical Assessment of protein Structure Prediction (CASP13). The category for SAXS-assisted predictions made gains in certain areas for CASP13 compared to CASP12. Improvements included higher quality data with size exclusion chromatography-SAXS (SEC-SAXS) and better selection of targets and communication of results by CASP organizers. In several cases, we can track improvements in model accuracy with use of SAXS data. For hard multimeric targets where regular folding algorithms were unsuccessful, SAXS data helped predictors to build models better resembling the global shape of the target. For most models, however, no significant improvement in model accuracy at the domain level was registered from use of SAXS data, when rigorously comparing SAXS-assisted models to the best regular server predictions. To promote future progress in this category, we identify successes, challenges, and opportunities for improved strategies in prediction, assessment, and communication of SAXS data to predictors. An important observation is that, for many targets, SAXS data were inconsistent with crystal structures, suggesting that these proteins adopt different conformation(s) in solution. This CASP13 result, if representative of PDB structures and future CASP targets, may have substantive implications for the structure training databases used for machine learning, CASP, and use of prediction models for biology.
Affiliation(s)
- Greg L Hura
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California; Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, California
- Curtis D Hodge
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California
- Daniel Rosenberg
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California
- Dmytro Guzenko
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California
- Bohdan Monastyrskyy
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California
- Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000, Grenoble, France
- Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California
- John A Tainer
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California; Department of Molecular and Cellular Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Krzysztof Fidelis
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California
- Susan E Tsutakawa
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California
|
21
|
Sala D, Huang YJ, Cole CA, Snyder DA, Liu G, Ishida Y, Swapna GVT, Brock KP, Sander C, Fidelis K, Kryshtafovych A, Inouye M, Tejero R, Valafar H, Rosato A, Montelione GT. Protein structure prediction assisted with sparse NMR data in CASP13. Proteins 2019; 87:1315-1332. [PMID: 31603581 DOI: 10.1002/prot.25837] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/09/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 01/05/2023]
Abstract
CASP13 has investigated the impact of sparse NMR data on the accuracy of protein structure prediction. NOESY and ¹⁵N-¹H residual dipolar coupling data, typical of that obtained for ¹⁵N,¹³C-enriched, perdeuterated proteins up to about 40 kDa, were simulated for 11 CASP13 targets ranging in size from 80 to 326 residues. For several targets, two prediction groups generated models that are more accurate than those produced using baseline methods. Real NMR data collected for a de novo designed protein were also provided to predictors, including one data set in which only backbone resonance assignments were available. Some NMR-assisted prediction groups also did very well with these data. CASP13 also assessed whether incorporation of sparse NMR data improves the accuracy of protein structure prediction relative to nonassisted regular methods. In most cases, incorporation of sparse, noisy NMR data results in models with higher accuracy. The best NMR-assisted models were also compared with the best regular predictions of any CASP13 group for the same target. For six of 13 targets, the most accurate model provided by any NMR-assisted prediction group was more accurate than the most accurate model provided by any regular prediction group; however, for the remaining seven targets, one or more regular prediction methods provided a more accurate model than even the best NMR-assisted model. These results suggest a novel approach for protein structure determination, in which advanced prediction methods are first used to generate structural models, and sparse NMR data are then used to validate and/or refine these models.
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center, University of Florence, Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, Sesto Fiorentino, Italy
- Yuanpeng Janet Huang
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York
- Casey A Cole
- Department of Computer Science & Engineering, University of South Carolina, Columbia, South Carolina
- David A Snyder
- Department of Chemistry, College of Science and Health, William Paterson University, Wayne, New Jersey
- Gaohua Liu
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Nexomics Biosciences, Bordentown, New Jersey
- Yojiro Ishida
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Department of Biochemistry and Molecular Biology, The Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- G V T Swapna
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Kelly P Brock
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts
- Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts; cBio Center, Dana-Farber Cancer Institute, Boston, Massachusetts
- Masayori Inouye
- Department of Biochemistry and Molecular Biology, The Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Roberto Tejero
- Departamento de Quimica Fisica, Universidad de Valencia, Valencia, Spain
- Homayoun Valafar
- Department of Computer Science & Engineering, University of South Carolina, Columbia, South Carolina
- Antonio Rosato
- Magnetic Resonance Center, University of Florence, Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, Sesto Fiorentino, Italy
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York; Department of Biochemistry and Molecular Biology, The Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey
|
22
|
Won J, Baek M, Monastyrskyy B, Kryshtafovych A, Seok C. Assessment of protein model structure accuracy estimation in CASP13: Challenges in the era of deep learning. Proteins 2019; 87:1351-1360. [PMID: 31436360 DOI: 10.1002/prot.25804] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 05/13/2019] [Revised: 08/08/2019] [Accepted: 08/19/2019] [Indexed: 12/20/2022]
Abstract
Scoring model structure is an essential component of protein structure prediction that can affect the prediction accuracy tremendously. Users of protein structure prediction results also need to score models to select the best models for their application studies. In Critical Assessment of techniques for protein Structure Prediction (CASP), model accuracy estimation methods have been tested in a blind fashion by providing models submitted by the tertiary structure prediction servers for scoring. In CASP13, model accuracy estimation results were evaluated in terms of both global and local structure accuracy. Global structure accuracy estimation was evaluated by the quality of the models selected by the global structure scores and by the absolute estimates of the global scores. Residue-wise, local structure accuracy estimations were evaluated by three different measures. A new measure introduced in CASP13 evaluates the ability to predict inaccurately modeled regions that may be improved by refinement. An intensive comparative analysis on CASP13 and the previous CASPs revealed that the tertiary structure models generated by the CASP13 servers show very distinct features. Higher consensus toward models of higher global accuracy appeared even for free modeling targets, and many models of high global accuracy were not well optimized at the atomic level. This is related to the new technology in CASP13, deep learning for tertiary contact prediction. The tertiary model structures generated by deep learning pose a new challenge for EMA (estimation of model accuracy) method developers. Model accuracy estimation itself is also an area where deep learning can potentially have an impact, although current EMA methods have not fully explored that direction.
Affiliation(s)
- Jonghun Won: Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Minkyung Baek: Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Chaok Seok: Department of Chemistry, Seoul National University, Seoul, Republic of Korea
|
23
|
Croll TI, Sammito MD, Kryshtafovych A, Read RJ. Evaluation of template-based modeling in CASP13. Proteins 2019; 87:1113-1127. [PMID: 31407380 PMCID: PMC6851432 DOI: 10.1002/prot.25800] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 05/10/2019] [Revised: 07/29/2019] [Accepted: 08/08/2019] [Indexed: 12/12/2022]
Abstract
Performance in the template‐based modeling (TBM) category of CASP13 is assessed here, using a variety of metrics. Performance of the predictor groups that participated is ranked using the primary ranking score that was developed by the assessors for CASP12. This reveals that the best results are obtained by groups that include contact predictions or inter‐residue distance predictions derived from deep multiple sequence alignments. In cases where there is a good homolog in the wwPDB (TBM‐easy category), the best results are obtained by modifying a template. However, for cases with poorer homologs (TBM‐hard), very good results can be obtained without using an explicit template, by deep learning algorithms trained on the wwPDB. Alternative metrics are introduced, to allow testing of aspects of structural models that are not addressed by traditional CASP metrics. These include comparisons to the main‐chain and side‐chain torsion angles of the target, and the utility of models for solving crystal structures by the molecular replacement method. The alternative metrics are poorly correlated with the traditional metrics, and it is proposed that modeling has reached a sufficient level of maturity that the best models should be expected to satisfy this wider range of criteria.
Affiliation(s)
- Tristan I Croll: Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Massimo D Sammito: Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Randy J Read: Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
|
24
|
Abriata LA, Tamò GE, Dal Peraro M. A further leap of improvement in tertiary structure prediction in CASP13 prompts new routes for future assessments. Proteins 2019; 87:1100-1112. [PMID: 31344267 DOI: 10.1002/prot.25787] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 04/30/2019] [Revised: 06/26/2019] [Accepted: 07/19/2019] [Indexed: 12/22/2022]
Abstract
We present our assessment of tertiary structure predictions for hard targets in Critical Assessment of Structure Prediction round 13 (CASP13). The analysis includes (a) assignment and discussion of best models through scores-aided visual inspection of models for each evaluation unit (EU); (b) ranking of predictors resulting from this evaluation and from global scores; and (c) evaluation of progress, state of the art, and current limitations of protein structure prediction. We witness a sizable improvement in tertiary structure prediction building on the progress observed from CASP11 to CASP12, with (a) top models reaching backbone RMSD <3 Å for several EUs of size <150 residues, contributed by many groups; (b) at least one model that roughly captures global topology for all EUs, probably unprecedented in this track of CASP; and (c) even quite good models for full, unsplit targets. Better structure predictions are brought about mainly by improved residue-residue contact predictions, and since this CASP also by distance predictions, achieved through state-of-the-art machine learning methods which also progressed to work with slightly shallower alignments compared to CASP12. As we reach a new realm of tertiary structure prediction quality, new directions are proposed and explored for future CASPs: (a) dropping splitting into EUs, (b) rethinking difficulty metrics probably in terms of contact and distance predictions, (c) assessing also side chains for models of high backbone accuracy, and (d) assessing residue-wise and possibly residue-residue quality estimates.
Affiliation(s)
- Luciano A Abriata: School of Life Sciences, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Giorgio E Tamò: School of Life Sciences, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Matteo Dal Peraro: School of Life Sciences, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
|
25
|
Cheng J, Choe MH, Elofsson A, Han KS, Hou J, Maghrabi AHA, McGuffin LJ, Menéndez-Hurtado D, Olechnovič K, Schwede T, Studer G, Uziela K, Venclovas Č, Wallner B. Estimation of model accuracy in CASP13. Proteins 2019; 87:1361-1377. [PMID: 31265154 DOI: 10.1002/prot.25767] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 03/30/2019] [Revised: 06/04/2019] [Accepted: 06/15/2019] [Indexed: 12/28/2022]
Abstract
Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.
Affiliation(s)
- Jianlin Cheng: Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Myong-Ho Choe: Department of Life Science, University of Science, Pyongyang, DPR Korea
- Arne Elofsson: Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Kun-Sop Han: Department of Life Science, University of Science, Pyongyang, DPR Korea
- Jie Hou: Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Ali H A Maghrabi: School of Biological Sciences, University of Reading, Reading, UK
- Liam J McGuffin: School of Biological Sciences, University of Reading, Reading, UK
- David Menéndez-Hurtado: Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Kliment Olechnovič: Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Torsten Schwede: Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
- Gabriel Studer: Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
- Karolis Uziela: Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Česlovas Venclovas: Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Björn Wallner: Department of Physics, Chemistry, and Biology, Bioinformatics Division, Linköping University, Linköping, Sweden
|