1
Chen L, Mondal A, Perez A, Miranda-Quintana RA. Protein Retrieval via Integrative Molecular Ensembles (PRIME) through Extended Similarity Indices. J Chem Theory Comput 2024; 20:6303-6315. [PMID: 38978294 PMCID: PMC11807272 DOI: 10.1021/acs.jctc.4c00362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Molecular dynamics (MD) simulations are ideally suited to describe conformational ensembles of biomolecules such as proteins and nucleic acids. Microsecond-long simulations are now routine, facilitated by the emergence of graphics processing units. Clustering, which groups objects based on structural similarity, is typically used to process these ensembles, yielding distinct states, their populations, and representative structures. A popular pipeline combines hierarchical clustering with selection of the cluster centroid as the representative of each cluster. Here, we propose to improve on this approach by developing a module, Protein Retrieval via Integrative Molecular Ensembles (PRIME), consisting of tools that use extended continuous similarity to improve the prediction of the representative structure of the most populated cluster. PRIME is integrated with our Molecular Dynamics Analysis with N-ary Clustering Ensembles (MDANCE) package and can be used as a postprocessing tool for arbitrary clustering algorithms, compatible with several MD suites. PRIME predictions produced structures that, when aligned to the experimental structure, were better superposed (lower RMSD). A further benefit of PRIME is its linear scaling, rather than the O(N²) scaling traditionally associated with comparisons of elements in a set.
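The linear-scaling claim can be illustrated with a simplified stand-in. PRIME itself works with MDANCE's extended continuous similarity indices, which this sketch does not reproduce; here we assume frames are already aligned and flattened into an (N, d) NumPy array, and we use summed squared Euclidean distance as the dissimilarity. Because that sum can be rewritten in terms of column sums, every frame's total distance to all other frames is obtained in O(N·d) time instead of by building the O(N²) pairwise matrix. The function names are hypothetical.

```python
import numpy as np

def medoid_linear(frames: np.ndarray) -> int:
    """Frame with the smallest summed squared distance to all others, in O(N*d).

    frames: (N, d) array, e.g. flattened, pre-aligned coordinates (assumption).
    Uses sum_j ||x_k - x_j||^2 = N*||x_k||^2 - 2 x_k . c + Q,
    where c = sum_i x_i and Q = sum_i ||x_i||^2.
    """
    n = frames.shape[0]
    col_sum = frames.sum(axis=0)                       # c, shape (d,)
    sq_norms = np.einsum("ij,ij->i", frames, frames)   # ||x_i||^2, shape (N,)
    summed = n * sq_norms - 2.0 * frames @ col_sum + sq_norms.sum()
    return int(np.argmin(summed))

def medoid_pairwise(frames: np.ndarray) -> int:
    """Reference O(N^2) version that builds the full pairwise distance matrix."""
    diff = frames[:, None, :] - frames[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    return int(np.argmin(d2.sum(axis=1)))

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 30))
print(medoid_linear(frames), medoid_pairwise(frames))
```

Rearranging the identity shows that the selected frame is simply the one closest to the ensemble mean. This is why a single pass over column sums is sufficient.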
Affiliation(s)
- Lexin Chen
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Arup Mondal
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Alberto Perez
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Ramón Alain Miranda-Quintana
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
2
Faraji N, Daly NL, Arab SS, Khosroushahi AY. In silico design of potential Mcl-1 peptide-based inhibitors. J Mol Model 2024; 30:108. [PMID: 38499818 DOI: 10.1007/s00894-024-05901-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 03/10/2024] [Indexed: 03/20/2024]
Abstract
CONTEXT BIM (Bcl-2 interacting mediator of apoptosis)-derived peptides that specifically target overexpressed Mcl-1 (myeloid cell leukemia-1) protein and induce apoptosis are potential anticancer agents. Since the helicity of BIM-derived peptides plays a crucial role in their function, a range of strategies have been used to increase helicity, including the introduction of unnatural residues and stapling methods, which have drawbacks such as accumulation in the liver. To avoid these drawbacks, this study aimed to design a more helical peptide using bioinformatics algorithms and molecular dynamics simulations, without unnatural residues or stapling. MM-PBSA results showed that the A4fE and A2eE mutations in analogue 5 favor binding to Mcl-1. Circular dichroism showed that helicity increased from 18 to 34%; these findings could enhance the potential of analogue 5 as an anticancer agent targeting Mcl-1. The strategies applied in this research could inform in silico peptide design. Moreover, analogue 5 could be evaluated as a drug candidate in in vitro and in vivo studies. METHODS The sequence of the lead peptide was determined using the ApInAPDB database and the PRALINE program. The Contact finder and PDBsum web servers were used to identify the amino acids involved in contacts with Mcl-1. All identified residues contributing to salt bridges were left unaltered to preserve binding affinity. After novel analogues were proposed, their secondary structures were predicted with the Cham finder web server and the GOR, Neural Network, and Chou-Fasman algorithms. Finally, 100 ns molecular dynamics simulations were run using GROMACS version 5.0.7 with the CHARMM36 force field. MM-PBSA was used to assess the specificity of binding to Mcl-1 versus Bcl-xL (B-cell lymphoma extra-large).
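MM-PBSA binding free energies are commonly estimated with the single-trajectory approach: for each MD snapshot, the free energies of the receptor and the ligand are subtracted from that of the complex, and the differences are averaged. The sketch below assumes those per-snapshot terms (in kcal/mol) have already been computed by an MM-PBSA tool and omits the entropy term. The function name and the numbers in the example are hypothetical, not values from this study.

```python
from statistics import mean

def mmpbsa_binding_energy(g_complex, g_receptor, g_ligand):
    """Single-trajectory MM-PBSA estimate of the binding free energy.

    Each argument holds one free energy per snapshot (kcal/mol).
    Returns the mean over snapshots of G_complex - G_receptor - G_ligand.
    The conformational entropy term is not included.
    """
    if not (len(g_complex) == len(g_receptor) == len(g_ligand)) or not g_complex:
        raise ValueError("need the same, non-zero number of snapshots per species")
    return mean(c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand))

# Hypothetical per-snapshot values for two snapshots
dg = mmpbsa_binding_energy([-100.0, -102.0], [-60.0, -61.0], [-20.0, -20.5])
print(dg)  # -20.25
```

Specificity can then be expressed as the difference between the estimates obtained for Mcl-1 and for Bcl-xL, with a more negative difference indicating a preference for Mcl-1.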
Affiliation(s)
- Naser Faraji
- Department of Medical Nanotechnology, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Daneshgah Street, Tabriz, Iran
- Drug Applied Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Norelle L Daly
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, 4870, Australia
- Seyed Shahriar Arab
- Department of Biophysics, Faculty of Biological Sciences, School of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Ahmad Yari Khosroushahi
- Department of Medical Nanotechnology, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Daneshgah Street, Tabriz, Iran
- Drug Applied Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
3
Baron L, Hadjerci J, Thoidingjam L, Plays M, Bucci R, Morris N, Müller S, Sindikubwabo F, Solier S, Cañeque T, Colombeau L, Blouin CM, Lamaze C, Puisieux A, Bono Y, Gaillet C, Laraia L, Vauzeilles B, Taran F, Papot S, Karoyan P, Duval R, Mahuteau-Betzer F, Arimondo P, Cariou K, Guichard G, Micouin L, Ethève-Quelquejeu M, Verga D, Versini A, Gasser G, Tang C, Belmont P, Linkermann A, Bonfio C, Gillingham D, Poulsen T, Di Antonio M, Lopez M, Guianvarc'h D, Thomas C, Masson G, Gautier A, Johannes L, Rodriguez R. PSL Chemical Biology Symposia Third Edition: A Branch of Science in its Explosive Phase. Chembiochem 2023; 24:e202300093. [PMID: 36942862 DOI: 10.1002/cbic.202300093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Indexed: 03/23/2023]
Abstract
This symposium is the third PSL (Paris Sciences & Lettres) Chemical Biology meeting (2016, 2019, 2023) held at Institut Curie. The initiative originally started at the Institut de Chimie des Substances Naturelles (ICSN) in Gif-sur-Yvette (2013, 2014), under the directorship of Professor Max Malacria, with a strong focus on chemistry. It then continued at Institut Curie (2015) with a broader scope, before becoming the official PSL Chemical Biology meeting. This latest edition was postponed twice because of the pandemic, which gave us the opportunity to invite additional speakers of great standing. This year, Institut Curie hosted around 300 participants, including 220 on site and over 80 online. The pandemic has at least had the virtue of promoting online meetings, which we have come to realize are not perfect but have their own merits. In particular, they enable those with limited time and resources to take part in events, which can now accommodate an unlimited number of participants. We apologize to all those who could not attend in person this time because of space limitations at Institut Curie.
Affiliation(s)
- Leeroy Baron
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Justine Hadjerci
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Leishemba Thoidingjam
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Marina Plays
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Romain Bucci
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Nolwenn Morris
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Sebastian Müller
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Fabien Sindikubwabo
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Stéphanie Solier
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Tatiana Cañeque
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Ludovic Colombeau
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Cedric M Blouin
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Christophe Lamaze
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Alain Puisieux
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Yannick Bono
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Christine Gaillet
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Luca Laraia
- Technical University of Denmark, Department of Chemistry, 2800, Kgs. Lyngby, Denmark
- Boris Vauzeilles
- Université Paris-Saclay, CNRS UPR 2301, 91198, Gif-sur-Yvette, France
- Frédéric Taran
- Université Paris-Saclay, CEA, 91191, Gif-sur-Yvette, France
- Sébastien Papot
- Université de Poitiers, CNRS UMR 7285, 86073, Poitiers, France
- Philippe Karoyan
- PSL Université Paris, Sorbonne Université Ecole Normale Supérieure, CNRS UMR 7203, 75005, Paris, France
- Romain Duval
- Faculté de Pharmacie de Paris, Université Paris Cité CNRS UMR 261, 75006, Paris, France
- Kevin Cariou
- PSL Université Paris, Chimie ParisTech, CNRS, Institute of Chemistry and Health Sciences CNRS UMR 8060, 75005, Paris, France
- Gilles Guichard
- Université de Bordeaux, CNRS, Bordeaux INP CBMN, UMR 5248, 33600, Pessac, France
- Daniela Verga
- PSL Université Paris, Institut Curie CNRS UMR 9187, INSERM U1196, 91405, Orsay, France
- Antoine Versini
- University of Zurich, Department of Chemistry, 8057, Zurich, Switzerland
- Gilles Gasser
- PSL Université Paris, Chimie ParisTech, CNRS, Institute of Chemistry and Health Sciences CNRS UMR 8060, 75005, Paris, France
- Cong Tang
- Universidade de Lisboa, Instituto de Medicina Molecular João Lobo Antunes, 1649-028, Lisboa, Portugal
- Andreas Linkermann
- Technische Universität Dresden Department of Internal Medicine 3, 01062, Dresden, Germany
- Claudia Bonfio
- Université de Strasbourg, CNRS UMR 7006, 67000, Strasbourg, France
- Thomas Poulsen
- Aarhus University, Department of Chemistry, 8000, Aarhus C Aarhus, Denmark
- Marco Di Antonio
- Imperial College London, Molecular Sciences Research Hub, London, W12 0BZ, UK
- Marie Lopez
- Université de Montpellier, CNRS UMR 5247, 34000, Montpellier, France
- Christophe Thomas
- PSL Université Paris, Chimie ParisTech CNRS UMR 6226, 75005, Paris, France
- Géraldine Masson
- Université Paris-Saclay, CNRS UPR 2301, 91198, Gif-sur-Yvette, France
- Arnaud Gautier
- Sorbonne Université, École Normale Supérieure, Université PSL, CNRS, Laboratoire des Biomolécules, LBM, 75005, Paris, France
- Ludger Johannes
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
- Raphaël Rodriguez
- Institut Curie, Department of Cellular and Chemical Biology, UMR 3666 CNRS, U1143 INSERM, PSL Université Paris, 75005, Paris, France
4
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
The ability to predict the three-dimensional structure of a protein from its amino acid sequence has advanced considerably in recent years. This progress has been driven by the improved accuracy of deep learning-based inter-residue contact map predictors, coupled with the rapid growth of protein sequence databases. A contact map encodes inter-residue interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has attracted considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods, highlight recent advances, and discuss some of the current limitations and future prospects of contact-assisted threading for improving the accuracy of low-homology protein modeling.
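Contact-assisted threading scores a query-template alignment by how many predicted query contacts land on template contacts. The sketch below shows that core quantity, a contact-overlap count, under common conventions that are assumptions here: residues i and j are in contact when their Cβ atoms lie within 8 Å and they are at least 6 positions apart in sequence. Searching for the alignment that maximizes the score is the hard part and is not shown; all names are hypothetical.

```python
import numpy as np

def contact_map(coords, cutoff=8.0, min_seq_sep=6):
    """Boolean (L, L) contact map from (L, 3) C-beta coordinates in Angstrom."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    return (d < cutoff) & (sep >= min_seq_sep)

def contact_overlap(query_map, template_map, alignment):
    """Count query contacts (i, j) that map onto template contacts.

    alignment: dict from aligned query index to template index;
    unaligned query residues are simply absent from the dict.
    """
    aligned = sorted(alignment)
    score = 0
    for a, i in enumerate(aligned):
        for j in aligned[a + 1:]:
            if query_map[i, j] and template_map[alignment[i], alignment[j]]:
                score += 1
    return score

query = np.zeros((4, 4), dtype=bool)
query[0, 3] = query[3, 0] = True
query[1, 2] = query[2, 1] = True
template = query.copy()
print(contact_overlap(query, template, {0: 0, 1: 1, 2: 2, 3: 3}))  # 2
print(contact_overlap(query, template, {0: 1, 1: 2, 2: 3}))        # 0
```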
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
5
Sala D, Del Alamo D, Mchaourab HS, Meiler J. Modeling of protein conformational changes with Rosetta guided by limited experimental data. Structure 2022; 30:1157-1168.e3. [PMID: 35597243 PMCID: PMC9357069 DOI: 10.1016/j.str.2022.04.013] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/17/2022] [Revised: 04/08/2022] [Accepted: 04/25/2022] [Indexed: 11/24/2022]
Abstract
Conformational changes are an essential component of functional cycles of many proteins, but their characterization often requires an integrative structural biology approach. Here, we introduce and benchmark ConfChangeMover (CCM), a new method built into the widely used macromolecular modeling suite Rosetta that is tailored to model conformational changes in proteins using sparse experimental data. CCM can rotate and translate secondary structural elements and modify their backbone dihedral angles in regions of interest. We benchmarked CCM on soluble and membrane proteins with simulated Cα-Cα distance restraints and sparse experimental double electron-electron resonance (DEER) restraints, respectively. In both benchmarks, CCM outperformed state-of-the-art Rosetta methods, showing that it can model a diverse array of conformational changes. In addition, the Rosetta framework allows a wide variety of experimental data to be integrated with CCM, thus extending its capability beyond DEER restraints. This method will contribute to the biophysical characterization of protein dynamics.
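CCM samples conformations guided by sparse distance restraints. One generic way to turn a restraint given as an allowed distance range into a score is a flat-bottom harmonic penalty: zero inside the range and quadratic outside it. The sketch below is an illustrative assumption; it is not CCM's or Rosetta's scoring code, and the function name and force constant are hypothetical.

```python
import numpy as np

def restraint_penalty(coords, restraints, k=1.0):
    """Flat-bottom harmonic penalty for C-alpha distance restraints.

    coords: (L, 3) array of C-alpha coordinates in Angstrom.
    restraints: iterable of (i, j, lower, upper); no penalty while
    lower <= d_ij <= upper, k * violation**2 outside that range.
    """
    total = 0.0
    for i, j, lower, upper in restraints:
        d = float(np.linalg.norm(coords[i] - coords[j]))
        if d < lower:
            total += k * (lower - d) ** 2
        elif d > upper:
            total += k * (d - upper) ** 2
    return total

ca = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
print(restraint_penalty(ca, [(0, 1, 8.0, 12.0)]))  # 0.0
print(restraint_penalty(ca, [(0, 1, 4.0, 6.0)]))   # 16.0
```

The flat bottom tolerates the uncertainty inherent in sparse or noisy distance data, so that only clear violations steer the sampling.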
Affiliation(s)
- Davide Sala
- Institute for Drug Discovery, Leipzig University, Leipzig, Saxony 04103, Germany
- Diego Del Alamo
- Department of Chemistry, Vanderbilt University, Nashville, TN 37232, USA; Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA
- Hassane S Mchaourab
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA
- Jens Meiler
- Institute for Drug Discovery, Leipzig University, Leipzig, Saxony 04103, Germany; Department of Chemistry, Vanderbilt University, Nashville, TN 37232, USA
6
Lang EJM, Baker EG, Woolfson DN, Mulholland AJ. Generalized Born Implicit Solvent Models Do Not Reproduce Secondary Structures of De Novo Designed Glu/Lys Peptides. J Chem Theory Comput 2022; 18:4070-4076. [PMID: 35687842 PMCID: PMC9281390 DOI: 10.1021/acs.jctc.1c01172] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Abstract
We test a range of standard generalized Born (GB) models and protein force fields for a set of five experimentally characterized, designed peptides comprising alternating blocks of glutamate and lysine, which have been shown to differ significantly in α-helical content. Sixty-five combinations of force fields and GB models are evaluated in >800 μs of molecular dynamics simulations. GB models generally do not reproduce the experimentally observed α-helical content, and none perform well for all five peptides. These results illustrate that these models are not usefully predictive in this context. These peptides provide a useful test set for simulation methods.
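Simulated α-helical content is typically compared with experiment by assigning secondary structure to each frame (for example with DSSP) and averaging the helical fraction over the trajectory. A minimal sketch, assuming per-frame DSSP strings are already available, is shown below. Whether to count only 'H' (α-helix) or also 'G' and 'I' (3₁₀ and π helices) is a choice the analysis has to make, and the paper's exact protocol is not reproduced here; the function name is hypothetical.

```python
def helical_fraction(dssp_frames, helix_codes="H"):
    """Mean fraction of residues assigned as helix across frames.

    dssp_frames: list of per-frame DSSP strings (one character per residue).
    helix_codes: DSSP codes counted as helix, e.g. "H" or "HGI".
    """
    if not dssp_frames:
        raise ValueError("no frames given")
    per_frame = [sum(code in helix_codes for code in s) / len(s) for s in dssp_frames]
    return sum(per_frame) / len(per_frame)

print(helical_fraction(["HHHH", "HHCC"]))   # 0.75
print(helical_fraction(["HGGC"], "HGI"))    # 0.75
```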
Affiliation(s)
- Eric J M Lang
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.; School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.; BrisSynBio, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, U.K.
- Emily G Baker
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.; BrisSynBio, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, U.K.
- Derek N Woolfson
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.; BrisSynBio, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, U.K.; School of Biochemistry, University of Bristol, Medical Sciences Building, University Walk, Bristol BS8 1TD, U.K.
- Adrian J Mulholland
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.; School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.
7
Shrestha B, Adhikari B. Scoring protein sequence alignments using deep learning. Bioinformatics 2022; 38:2988-2995. [PMID: 35385080 DOI: 10.1093/bioinformatics/btac210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 04/01/2022] [Accepted: 04/05/2022] [Indexed: 11/12/2022]
Abstract
BACKGROUND A high-quality sequence alignment (SA) is the most important input feature for accurate protein structure prediction. For a protein sequence, there are many methods to generate an SA. However, when given a choice of more than one SA for a protein sequence, there are no methods to predict which SA may lead to more accurate models without actually building the models. In this work, we describe a method to predict the quality of a protein's SA. METHODS We created our own dataset by generating a variety of SAs for a set of 1,351 representative proteins and investigated various deep learning architectures to predict the local distance difference test (lDDT) scores of distance maps predicted with the SAs as input. These lDDT scores serve as indicators of SA quality. RESULTS Using two independent test datasets consisting of CASP13 and CASP14 targets, we show that our method is effective for scoring and ranking SAs when a pool of SAs is available for a protein sequence. Using an example, we further show that SA selection with our method can lead to improved structure prediction. AVAILABILITY Code and datasets are available at https://github.com/ba-lab/Alignment-Score/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
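The lDDT score used as the prediction target can be computed directly on distance maps. A minimal global variant following the usual lDDT conventions (pairs closer than 15 Å in the reference are considered, and a pair counts as preserved at thresholds of 0.5, 1, 2 and 4 Å) is sketched below. The original lDDT is defined per residue on atomic models, and this sketch pools all residue pairs; the paper's exact distance-map variant may differ, so treat it as an approximation. The function name is hypothetical.

```python
import numpy as np

def lddt_distance_map(pred, ref, inclusion_radius=15.0,
                      thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Global lDDT-style score between two (L, L) distance maps (Angstrom)."""
    L = ref.shape[0]
    off_diag = ~np.eye(L, dtype=bool)
    mask = (ref < inclusion_radius) & off_diag   # pairs that are scored
    if not mask.any():
        return 0.0
    err = np.abs(pred - ref)[mask]
    # Fraction of preserved distances at each threshold, then averaged.
    return float(np.mean([(err < t).mean() for t in thresholds]))

pts = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
ref = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(lddt_distance_map(ref, ref))        # 1.0
print(lddt_distance_map(ref + 3.0, ref))  # 0.25
```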
Affiliation(s)
- Bikash Shrestha
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63132, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63132, USA
8
Hou Q, Pucci F, Pan F, Xue F, Rooman M, Feng Q. Using metagenomic data to boost protein structure prediction and discovery. Comput Struct Biotechnol J 2022; 20:434-442. [PMID: 35070166 PMCID: PMC8760478 DOI: 10.1016/j.csbj.2021.12.030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/12/2021] [Revised: 12/17/2021] [Accepted: 12/21/2021] [Indexed: 11/19/2022]
Abstract
Over the past decade, metagenomic sequencing approaches have been generating protein sequence data at an astonishing and ever-increasing rate. These data constitute an invaluable source of information that has been exploited in various research fields, such as the study of the role of the gut microbiota in human diseases and aging. However, only a small fraction of the metagenomic sequences collected have been functionally or structurally characterized, leaving most of them unexplored. Here, we review how this information has been used in protein structure prediction and protein discovery. We begin by presenting some widely used metagenomic databases and analyze in detail how metagenomic data have contributed to the impressive improvement in the accuracy of structure prediction methods in recent years. We then examine how metagenomic information can be exploited to annotate protein sequences. More specifically, we focus on the role of metagenomes in the discovery of enzymes and new CRISPR-Cas systems, and in the identification of antibiotic resistance genes. With this review, we provide an overview of how metagenomic data are currently revolutionizing our understanding of protein science.
Affiliation(s)
- Qingzhen Hou
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Fengming Pan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
- Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Qiang Feng
- Shandong Provincial Key Laboratory of Oral Tissue Regeneration & Shandong Engineering Laboratory for Dental Materials and Oral Tissue Regeneration, Department of Human Microbiome, School of Stomatology, Shandong University, Jinan, Shandong Province 250012, China
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, Shandong Province 266237, China
9
Wang D, Wang Y, Chang J, Zhang L, Wang H, E W. Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics. Nat Comput Sci 2022; 2:20-29. [PMID: 38177702 DOI: 10.1038/s43588-021-00173-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/04/2021] [Accepted: 11/15/2021] [Indexed: 01/06/2024]
Abstract
Enhanced sampling methods such as metadynamics and umbrella sampling have become essential tools for exploring the configuration space of molecules and materials. At the same time, they have long faced a number of issues, such as inefficiency when dealing with a large number of collective variables (CVs) or with systems that have high free energy barriers. Here we show that, with clustering and adaptive tuning techniques, the reinforced dynamics (RiD) scheme can be used to efficiently explore the configuration space and free energy landscapes of systems with a large number of CVs or high free energy barriers. We illustrate this by studying various representative and challenging examples. First, we demonstrate the efficiency of adaptive RiD compared with other methods and construct the nine-dimensional (9D) free energy landscape of a peptoid trimer, which has energy barriers of more than 8 kcal mol⁻¹. We then study the folding of the protein chignolin using 18 CVs. In this case, both the folding and unfolding rates are observed to be 4.30 μs⁻¹. Finally, we propose a protein structure refinement protocol based on RiD. This protocol allows us to efficiently employ more than 100 CVs for exploring the landscape of protein structures, and it gives rise to an overall improvement of 14.6 units over the initial global distance test-high accuracy (GDT-HA) score.
Affiliation(s)
- Dongdong Wang
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- DP Technology, Beijing, People's Republic of China
- Yanze Wang
- DP Technology, Beijing, People's Republic of China
- College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
- Junhan Chang
- DP Technology, Beijing, People's Republic of China
- College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
- Linfeng Zhang
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- DP Technology, Beijing, People's Republic of China
- Han Wang
- Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, People's Republic of China
- Weinan E
- School of Mathematical Sciences, Peking University, Beijing, People's Republic of China
- Department of Mathematics and Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- Beijing Institute of Big Data Research, Beijing, People's Republic of China
10
Liu J, Zhao KL, He GX, Wang LJ, Zhou XG, Zhang GJ. A de novo protein structure prediction by iterative partition sampling, topology adjustment and residue-level distance deviation optimization. Bioinformatics 2021; 38:99-107. [PMID: 34459867 DOI: 10.1093/bioinformatics/btab620] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/25/2021] [Revised: 07/23/2021] [Accepted: 08/25/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION With the great progress of deep learning-based inter-residue contact/distance prediction, the discrete space formed by fragment assembly can no longer satisfy the distance constraints well, so the optimal solution in the continuous space may not be reached. Designing an effective closed-loop continuous dihedral angle optimization strategy that complements discrete fragment assembly is therefore crucial to improving the performance of distance-assisted fragment assembly methods. RESULTS In this article, we propose a de novo protein structure prediction method called IPTDFold based on closed-loop iterative partition sampling, topology adjustment and residue-level distance deviation optimization. First, local dihedral angle crossover and mutation operators are designed to explore the conformational space extensively and to exchange information between conformations in the population. Then, a dihedral angle rotation model of the loop regions with partial inter-residue distance constraints is constructed, and a rotation angle satisfying the constraints is obtained by a differential evolution algorithm, thereby adjusting the spatial relationships between secondary structures. Finally, the residue distance deviation is evaluated from the difference between the conformation and the predicted distances, and the dihedral angles of residues are optimized with a biased probability. The final model is generated by iterating these three steps. IPTDFold is tested on 462 benchmark proteins, 24 FM targets of CASP13 and 20 FM targets of CASP14. Results show that IPTDFold is significantly superior to the distance-assisted fragment assembly method Rosetta_D (Rosetta with distance). In particular, the prediction accuracy of IPTDFold does not decrease as protein length increases. When the same FastRelax protocol is used, the prediction accuracy of IPTDFold is significantly superior to that of trRosetta without orientation constraints and equivalent to that of the full version of trRosetta. AVAILABILITY AND IMPLEMENTATION The source code and executable are freely available at https://github.com/iobio-zjut/IPTDFold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
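IPTDFold obtains a loop rotation angle that satisfies distance constraints with a differential evolution (DE) algorithm. The generic DE/rand/1/bin scheme is sketched below on a deliberately tiny, hypothetical problem: rotate a point lying 5 Å from a pivot so that it ends up exactly 10 Å from a fixed anchor. This is not IPTDFold's code; the objective, bounds, and control parameters (F = 0.5, CR = 0.9) are illustrative assumptions. The analytic solutions are θ = ±arccos(0.25) ≈ ±75.5°, so the result can be checked directly.

```python
import numpy as np

def differential_evolution(objective, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over a box with DE/rand/1/bin."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([objective(x) for x in pop])
    for _ in range(generations):
        for k in range(pop_size):
            # Mutation: combine three distinct members other than k.
            others = [i for i in range(pop_size) if i != k]
            a, b, c = rng.choice(others, size=3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lo, hi)
            # Binomial crossover; force at least one coordinate from the mutant.
            take = rng.random(dim) < CR
            take[rng.integers(dim)] = True
            trial = np.where(take, mutant, pop[k])
            # Greedy selection: keep the trial if it is no worse.
            f = objective(trial)
            if f <= fit[k]:
                pop[k], fit[k] = trial, f
    best = int(np.argmin(fit))
    return pop[best], fit[best]

def loop_objective(x):
    """Squared deviation from a 10 A target after rotating by x[0] degrees."""
    theta = np.radians(x[0])
    moving = 5.0 * np.array([np.cos(theta), np.sin(theta)])  # 5 A from pivot
    anchor = np.array([10.0, 0.0])
    return (np.linalg.norm(moving - anchor) - 10.0) ** 2

angle, err = differential_evolution(loop_objective, [(-180.0, 180.0)])
print(angle, err)
```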
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guang-Xing He
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Liu-Jing Wang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
11
Anishchenko I, Baek M, Park H, Hiranuma N, Kim DE, Dauparas J, Mansoor S, Humphreys IR, Baker D. Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14. Proteins 2021; 89:1722-1733. [PMID: 34331359 PMCID: PMC8616808 DOI: 10.1002/prot.26194] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/13/2021] [Revised: 07/23/2021] [Accepted: 07/25/2021] [Indexed: 12/29/2022]
Abstract
The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template-utilizing versions of trRosetta, guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta; it is also faster and requires fewer computing resources, completing the entire modeling process in a median of < 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest, in part due to missing inter-domain or inter-chain contacts.
Affiliation(s)
- Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Minkyung Baek
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Naozumi Hiranuma
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington, USA
- David E. Kim
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
- Justas Dauparas
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Sanaa Mansoor
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Ian R. Humphreys
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
12
Kryshtafovych A, Moult J, Billings WM, Della Corte D, Fidelis K, Kwon S, Olechnovič K, Seok C, Venclovas Č, Won J. Modeling SARS-CoV-2 proteins in the CASP-commons experiment. Proteins 2021; 89:1987-1996. [PMID: 34462960 PMCID: PMC8616790 DOI: 10.1002/prot.26231] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/05/2021] [Revised: 08/23/2021] [Accepted: 08/26/2021] [Indexed: 01/21/2023]
Abstract
Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
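GDT_TS, the score quoted above, averages the percentage of model residues whose C-alpha atoms lie within 1, 2, 4 and 8 Å of their experimental positions. The minimal sketch below assumes the model and experimental coordinates are already superposed and residue-matched; the official GDT procedure instead searches many superpositions to maximize each percentage, so this fixed-superposition version can underestimate the reported score. The function name and inputs are illustrative, not part of any CASP tool.

```python
import numpy as np

def gdt_ts(model_ca: np.ndarray, ref_ca: np.ndarray) -> float:
    """Approximate GDT_TS for two already-superposed C-alpha coordinate sets.

    model_ca, ref_ca: (N, 3) arrays with residues in matching order.
    Uses a single fixed superposition (an assumption of this sketch), so the
    result can be lower than the value from the official GDT search.
    """
    # Per-residue C-alpha displacement in angstroms.
    distances = np.linalg.norm(model_ca - ref_ca, axis=1)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Percentage of residues within each distance cutoff.
    percents = [100.0 * np.mean(distances <= c) for c in cutoffs]
    return float(np.mean(percents))
```

For example, four residues displaced by 0.5, 1.5, 3.0 and 10.0 Å give 25, 50, 75 and 75 percent at the four cutoffs, so GDT_TS = 56.25.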
Affiliation(s)
- John Moult
- Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
- Wendy M Billings
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
- Dennis Della Corte
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
- Krzysztof Fidelis
- Genome Center, University of California, Davis, Davis, California, USA
- Sohee Kwon
- Department of Chemistry, Seoul National University, Seoul, South Korea
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, South Korea
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Jonghun Won
- Department of Chemistry, Seoul National University, Seoul, South Korea
13
Wang L, Liu J, Xia Y, Xu J, Zhou X, Zhang G. Distance-guided protein folding based on generalized descent direction. Brief Bioinform 2021; 22:6341661. [PMID: 34355233 DOI: 10.1093/bib/bbab296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2021] [Revised: 06/30/2021] [Accepted: 07/12/2021] [Indexed: 12/25/2022]
Abstract
Advances in the prediction of inter-residue distances from protein sequence have increased the accuracy with which correct protein folds can be predicted using distance information. Here, we propose a distance-guided protein folding algorithm based on generalized descent direction, named GDDfold, which achieves effective structural perturbation and potential minimization in two stages. In the global stage, a random-based direction is designed using evolutionary knowledge; it guides the conformation population across potential barriers and explores a wide region of conformational space rapidly. In the local stage, the locally rugged potential landscape is explored with the aid of a conjugate-based direction integrated into a specific search strategy, which can improve exploitation ability. GDDfold is tested on 347 proteins of a benchmark set, 24 template-free modeling (FM) targets of CASP13 and 20 FM targets of CASP14. Results show that GDDfold correctly folds [template modeling (TM) score ≥ 0.5] 316 out of 347 proteins, 65 of which have TM-scores greater than 0.8, and significantly outperforms Rosetta-dist (a distance-assisted fragment assembly method) and L-BFGSfold (a distance geometry optimization method). On CASP FM targets, GDDfold is comparable with five state-of-the-art full-version methods, namely Quark, RaptorX, Rosetta, MULTICOM and trRosetta, in the CASP13 and CASP14 server groups.
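Several abstracts in this list (GDDfold, SNfold, MMpred) count a model as correctly folded when its TM-score is at least 0.5. As a rough illustration, the sketch below evaluates the standard TM-score sum, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, for one fixed superposition and pre-computed per-residue distances. The reference TM-score program additionally optimizes the superposition and treats very short chains differently; this hypothetical helper simply rejects targets of 15 residues or fewer.

```python
import numpy as np

def tm_score(distances: np.ndarray, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs after
    superposition. Target residues missing from the alignment contribute
    nothing to the sum, but still count in the normalizing length.
    """
    if target_length <= 15:
        raise ValueError("this sketch only handles targets longer than 15 residues")
    # Length-dependent distance scale from the TM-score definition.
    d0 = 1.24 * np.cbrt(target_length - 15) - 1.8
    return float(np.sum(1.0 / (1.0 + (distances / d0) ** 2)) / target_length)
```

Under this sketch, a model would be counted as correctly folded when `tm_score(...) >= 0.5`; a perfect full-length model scores 1.0, and aligning only half of the target perfectly scores 0.5.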
Affiliation(s)
- Liujing Wang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jiakang Xu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, USA
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
14
Xia YH, Peng CX, Zhou XG, Zhang GJ. A Sequential Niche Multimodal Conformational Sampling Algorithm for Protein Structure Prediction. Bioinformatics 2021; 37:4357-4365. [PMID: 34245242 DOI: 10.1093/bioinformatics/btab500] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/22/2020] [Revised: 06/23/2021] [Accepted: 07/05/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION The many local minima on the protein energy landscape often trap traditional conformational sampling algorithms in local basins, because these algorithms have difficulty overcoming high-energy barriers. In addition, the lowest-energy conformation may not correspond to the native structure because energy models are inaccurate. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy. RESULTS A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on knowledge learned from previous sampling and is used to construct a series of sampling-guided energy functions. These functions help the sampling algorithm overcome high-energy barriers and avoid re-sampling already explored regions. Because the energy model is inaccurate, the native structure may correspond to a relatively high-energy conformation; such conformations can still be sampled using the successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves a higher average TM-score and improves sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also performs well compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods. AVAILABILITY The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
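The sequential niche idea described above can be illustrated on a toy one-dimensional energy. The sketch below is an assumption-laden stand-in, not SNfold's derating function: after each search it adds a hypothetical power-law penalty around every minimum found so far, so the next search is pushed toward unexplored basins. The double-well energy, the grid search and the parameters (`radius`, `height`, `alpha`) are all illustrative.

```python
from typing import Callable, List, Sequence

Energy = Callable[[float], float]

def make_guided_energy(energy: Energy, found_minima: Sequence[float],
                       radius: float = 1.0, height: float = 10.0,
                       alpha: float = 2.0) -> Energy:
    """Wrap `energy` with a hypothetical power-law penalty around earlier minima."""
    def guided(x: float) -> float:
        penalty = 0.0
        for m in found_minima:
            d = abs(x - m)
            if d < radius:
                # Largest penalty at the old minimum, fading to zero at `radius`.
                penalty += height * (1.0 - (d / radius) ** alpha)
        return energy(x) + penalty
    return guided

def sequential_niche_search(energy: Energy, grid: Sequence[float], n_runs: int,
                            **derating_params: float) -> List[float]:
    """Run `n_runs` grid searches, each on an energy derated by all earlier results."""
    found: List[float] = []
    for _ in range(n_runs):
        guided = make_guided_energy(energy, tuple(found), **derating_params)
        found.append(min(grid, key=guided))
    return found

def double_well(x: float) -> float:
    """Toy energy with two equally deep basins at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2

grid = [i / 100 for i in range(-200, 201)]
print(sequential_niche_search(double_well, grid, n_runs=2))
```

With this double well, the first search lands in the basin at x = -1, and the second, guided search finds the basin at x = +1 instead of revisiting the first one.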
Affiliation(s)
- Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
15
Shuvo MH, Gulfam M, Bhattacharya D. DeepRefiner: high-accuracy protein structure refinement by deep network calibration. Nucleic Acids Res 2021; 49:W147-W152. [PMID: 33999209 PMCID: PMC8262753 DOI: 10.1093/nar/gkab361] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/08/2021] [Revised: 04/18/2021] [Accepted: 04/23/2021] [Indexed: 12/20/2022]
Abstract
The DeepRefiner webserver, freely available at http://watson.cse.eng.auburn.edu/DeepRefiner/, is an interactive and fully configurable online system for high-accuracy protein structure refinement. Fuelled by deep learning, DeepRefiner offers the ability to leverage cutting-edge deep neural network architectures, which can be calibrated for on-demand selection of adventurous or conservative refinement modes targeting either the degree or the consistency of refinement. The method has been extensively tested in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments under the group name 'Bhattacharya-Server' and was officially ranked as the No. 2 refinement server in CASP13 (second only to 'Seok-server' and outperforming all other refinement servers) and No. 2 refinement server in CASP14 (second only to 'FEIG-S' and outperforming all other refinement servers including 'Seok-server'). The DeepRefiner web interface offers a number of convenient features, including (i) fully customizable refinement job submission and validation; (ii) automated job status update, tracking, and notifications; (iii) interactive and interpretable web-based results retrieval with quantitative and visual analysis and (iv) extensive help information on job submission and results interpretation via web-based tutorial and help tooltips.
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Muhammad Gulfam
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
16
Jing X, Xu J. Fast and effective protein model refinement using deep graph neural networks. Nat Comput Sci 2021; 1:462-469. [PMID: 35321360 DOI: 10.1038/s43588-021-00098-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 11/09/2022]
Abstract
Protein model refinement is the last step applied to improve the quality of a predicted protein model. Currently, the most successful refinement methods rely on extensive conformational sampling and thus take hours or days to refine even a single protein model. Here we propose a fast and effective model refinement method that applies graph neural networks (GNNs) to predict refined inter-atom distance probability distributions from an initial model and then rebuilds 3D models from the predicted distance distributions. Tested on the CASP (Critical Assessment of Structure Prediction) refinement targets, our method has accuracy comparable to that of two leading human groups, Feig and Baker, but runs substantially faster. Our method can refine one protein model within ~11 minutes on 1 CPU, whereas Baker needs ~30 hours on 60 CPUs and Feig needs ~16 hours on 1 GPU. Finally, our study shows that GNNs outperform ResNet (convolutional residual neural networks) for model refinement when very limited conformational sampling is allowed.
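The run-time figures above compare different hardware, so a back-of-the-envelope conversion to CPU-hours helps put two of them on one scale. This sketch assumes every listed CPU is busy for the whole run, and it leaves out Feig's GPU-based run because GPU-hours are not directly comparable with CPU-hours.

```python
# Figures quoted in the abstract (all approximate, "~").
gnn_minutes, gnn_cpus = 11, 1
baker_hours, baker_cpus = 30, 60

gnn_cpu_hours = gnn_minutes / 60 * gnn_cpus    # ~0.18 CPU-hours
baker_cpu_hours = baker_hours * baker_cpus     # 1,800 CPU-hours

# Elapsed-time ratio ignores how many CPUs each run used.
wall_clock_ratio = (baker_hours * 60) / gnn_minutes
# CPU-hour ratio assumes all CPUs are fully busy for the whole run.
cpu_hour_ratio = baker_cpu_hours / gnn_cpu_hours
print(f"wall clock: {wall_clock_ratio:.0f}x, CPU-hours: {cpu_hour_ratio:.0f}x")
```

Under these assumptions the wall-clock ratio is about 160, while the CPU-hour ratio is roughly 9,800, which is why the comparison depends heavily on whether elapsed time or total compute is counted.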
Affiliation(s)
- Xiaoyang Jing
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
17
Abdin O, Kim PM. Rapid protein model refinement by deep learning. Nat Comput Sci 2021; 1:456-457. [PMID: 38217116 DOI: 10.1038/s43588-021-00104-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2024]
Affiliation(s)
- Osama Abdin
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Philip M Kim
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
18
Zhao KL, Liu J, Zhou XG, Su JZ, Zhang Y, Zhang GJ. MMpred: a distance-assisted multimodal conformation sampling for de novo protein structure prediction. Bioinformatics 2021; 37:4350-4356. [PMID: 34185079 DOI: 10.1093/bioinformatics/btab484] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/01/2021] [Revised: 06/22/2021] [Accepted: 06/28/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Because energy force fields are imperfect, the mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure. There is therefore a need to search for more diverse suboptimal solutions in order to identify states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. RESULTS A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages. In the first stage, mode exploration, a structural similarity evaluation model, DMscore, is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. In the second stage, mode maintenance, an adaptive clustering algorithm, MNDcluster, is proposed to divide the population and merge modes by adjusting the annealing temperature, in order to locate the promising basins. In the last stage, mode exploitation, a greedy search strategy is used to accelerate convergence within each mode. Distance constraint information is used to construct the conformation scoring model that guides sampling. MMpred is tested on 320 non-redundant proteins and obtains models with TM-score ≥ 0.5 on 268 cases, 20.3% more than Rosetta guided by the same distance constraints. In addition, on the 320 benchmark proteins, the average TM-score of the best model from the enhanced version of MMpred (E-MMpred) is 0.732, comparable to that of trRosetta (0.730). AVAILABILITY The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
- Jian-Zhong Su
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325011, Zhejiang, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
19
Heo L, Park S, Seok C. GalaxyWater-wKGB: Prediction of Water Positions on Protein Structure Using wKGB Statistical Potential. J Chem Inf Model 2021; 61:2283-2293. [PMID: 33938216 DOI: 10.1021/acs.jcim.0c01434] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
Proteins fold and function in water, and protein-water interactions play important roles in protein structure and function. In computational studies on protein structure and interaction, the effect of water is considered either implicitly or explicitly. Implicit water models are frequently used in protein structure prediction and docking because they are computationally much more efficient than explicit water models, which are often employed in molecular dynamics (MD) simulations. However, implicit water models that treat water as a continuous solvent medium cannot account for specific atomistic protein-water interactions that are critical for structure formation and interactions with other molecules. Various methods for predicting water molecules that form specific atomistic interactions with proteins have been developed. Methods involving MD simulations or integral equation theory tend to produce more accurate results, at a higher computational cost, than simple geometry- or energy-based methods. Here, we present a novel method for predicting water positions on a protein surface called GalaxyWater-wKGB, which is based on a statistical potential, a water knowledge-based potential based on the generalized Born model (wKGB). This method is accurate and rapid: owing to the effective statistical treatment employed to derive the potential, it requires neither conformational sampling nor iterative computation. The statistical potential describes specific protein atom-water interactions more accurately than conventional potentials by considering the dependence on the degree of solvent accessibility of protein atoms as well as on protein atom-water distances and orientations. The introduction of solvent accessibility allows effective consideration of competing nonspecific protein-water and intraprotein interactions. 
When tested on high-resolution protein crystal structures, this method could recover similar or larger fractions of crystallographic water 180 times faster than the sophisticated integral equation theory, 3D-RISM. A web service of this water prediction method is freely available at http://galaxy.seoklab.org/wkgb.
Affiliation(s)
- Lim Heo
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Sangwoo Park
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
20
Kapla J, Rodríguez-Espigares I, Ballante F, Selent J, Carlsson J. Can molecular dynamics simulations improve the structural accuracy and virtual screening performance of GPCR models? PLoS Comput Biol 2021; 17:e1008936. [PMID: 33983933 PMCID: PMC8186765 DOI: 10.1371/journal.pcbi.1008936] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/03/2020] [Revised: 06/08/2021] [Accepted: 04/02/2021] [Indexed: 01/14/2023]
Abstract
The determination of G protein-coupled receptor (GPCR) structures at atomic resolution has improved understanding of cellular signaling and will accelerate the development of new drug candidates. However, experimental structures still remain unavailable for a majority of the GPCR family. GPCR structures and their interactions with ligands can also be modelled computationally, but such predictions have limited accuracy. In this work, we explored whether molecular dynamics (MD) simulations could be used to improve the accuracy of in silico models of receptor-ligand complexes that were submitted to a community-wide assessment of GPCR structure prediction (GPCR Dock). Two simulation protocols were used to refine 30 models of the D3 dopamine receptor (D3R) in complex with an antagonist. Close to 60 μs of simulation time was generated, and the resulting MD-refined models were compared to a D3R crystal structure. In the MD simulations, the receptor models generally drifted further away from the crystal structure conformation. However, MD refinement was able to improve the accuracy of the ligand binding mode. The best refinement protocol improved agreement with the experimentally observed ligand binding mode for a majority of the models. Receptor structures with improved virtual screening performance, which was assessed by molecular docking of ligands and decoys, could also be identified among the MD-refined models. Application of weak restraints to the transmembrane helices in the MD simulations further improved predictions of the ligand binding mode and second extracellular loop. These results provide guidelines for application of MD refinement in prediction of GPCR-ligand complexes and directions for further method development.
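Weak restraints of the kind mentioned above are commonly implemented as harmonic positional restraints that pull selected atoms (here, hypothetically, transmembrane-helix atoms) toward reference positions. The sketch below assumes the form E = k Σ‖x_i − x_i^ref‖² with a small force constant `k`; the actual restraint form, atom selection and force constant used in the study are not given in the abstract, and MD packages differ on whether a factor of 1/2 is included.

```python
import numpy as np

def harmonic_restraint(coords: np.ndarray, ref_coords: np.ndarray,
                       mask: np.ndarray, k: float = 0.5):
    """Energy and forces for harmonic positional restraints on atoms selected by `mask`.

    coords, ref_coords: (N, 3) arrays of current and reference positions.
    mask: boolean array of length N marking restrained atoms (hypothetical selection).
    k: illustrative weak force constant, in whatever energy/length units the caller uses.
    """
    # Displacement from the reference, zeroed for unrestrained atoms.
    diff = np.where(mask[:, None], coords - ref_coords, 0.0)
    energy = k * np.sum(diff ** 2)
    # Forces are the negative gradient of the energy: -2k (x - x_ref).
    forces = -2.0 * k * diff
    return float(energy), forces
```

A small `k` lets restrained atoms fluctuate around their reference positions while discouraging large drifts, which matches the intent of "weak" restraints.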
Affiliation(s)
- Jon Kapla
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Ismael Rodríguez-Espigares
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences of Pompeu Fabra University (UPF), Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Flavio Ballante
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Jana Selent
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences of Pompeu Fabra University (UPF), Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Jens Carlsson
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
21
Protein Structure Refinement Using Multi-Objective Particle Swarm Optimization with Decomposition Strategy. Int J Mol Sci 2021; 22:ijms22094408. [PMID: 33922489 PMCID: PMC8122964 DOI: 10.3390/ijms22094408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/02/2022]
Abstract
Protein structure refinement is a crucial step toward more accurate protein structure predictions. Most existing approaches treat it as an energy minimization problem, on the intuition that searching for lower-energy structures improves the quality of initial models. Considering that a single energy function cannot reflect the accurate energy landscape of all proteins, our previous AIR 1.0 pipeline used multiple energy functions to perform multi-objective particle swarm optimization-based model refinement, with the aim of providing a general, balanced conformation search protocol guided by different energy evaluations. However, AIR 1.0 solves the multi-objective optimization problem as a whole, which did not yield good solution diversity and convergence on some targets. In this study, we report a decomposition-based method, AIR 2.0, an updated version of AIR for protein structure refinement. AIR 2.0 decomposes a multi-objective optimization problem into a number of subproblems and optimizes them simultaneously using a particle swarm optimization algorithm. The solutions yielded by AIR 2.0 show better convergence and diversity than those of its previous version, which increases the chances of finding better structural conformations. The experimental results on CASP13 refinement benchmark targets and blind tests in CASP14 demonstrate the efficacy of AIR 2.0.
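Decomposing a multi-objective problem into scalar subproblems, as AIR 2.0 does, is often done with weight vectors and a Tchebycheff scalarization (as in MOEA/D). The abstract does not say which scalarization AIR 2.0 uses, so the sketch below is an illustrative assumption: it scores candidate models under two hypothetical energy functions and shows that different weight vectors select different trade-off solutions, which is where the added diversity comes from. The particle swarm update itself is omitted.

```python
from typing import List, Sequence, Tuple

def uniform_weights(n_subproblems: int) -> List[Tuple[float, float]]:
    """Evenly spaced weight vectors for two objectives (requires n_subproblems >= 2)."""
    step = 1.0 / (n_subproblems - 1)
    return [(i * step, 1.0 - i * step) for i in range(n_subproblems)]

def tchebycheff(objectives: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff cost of one candidate for one subproblem (smaller is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))

def best_per_subproblem(candidates: Sequence[Sequence[float]],
                        n_subproblems: int) -> List[int]:
    """Index of the candidate that minimizes each subproblem's scalarized cost.

    candidates: (energy_1, energy_2) pairs, e.g. scores of refined models under
    two hypothetical energy functions.
    """
    # Ideal point: best value seen so far for each objective.
    ideal = [min(c[k] for c in candidates) for k in range(2)]
    return [
        min(range(len(candidates)), key=lambda j: tchebycheff(candidates[j], w, ideal))
        for w in uniform_weights(n_subproblems)
    ]
```

With candidate scores (1, 5), (3, 3) and (5, 1), the weights (0, 1), (0.5, 0.5) and (1, 0) select candidates 2, 1 and 0 respectively, so each subproblem keeps a different trade-off.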
22
Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat Commun 2021; 12:1340. [PMID: 33637700 PMCID: PMC7910447 DOI: 10.1038/s41467-021-21511-x] [Citation(s) in RCA: 128] [Impact Index Per Article: 32.0] [Received: 08/11/2020] [Accepted: 01/18/2021] [Indexed: 11/22/2022]
Abstract
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.
Affiliation(s)
- Naozumi Hiranuma
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Washington, WA, USA
- Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Minkyung Baek
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Justas Dauparas
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Howard Hughes Medical Institute, University of Washington, Washington, WA, USA
23
Phan IQ, Subramanian S, Kim D, Murphy M, Pettie D, Carter L, Anishchenko I, Barrett LK, Craig J, Tillery L, Shek R, Harrington WE, Koelle DM, Wald A, Veesler D, King N, Boonyaratanakornkit J, Isoherranen N, Greninger AL, Jerome KR, Chu H, Staker B, Stewart L, Myler PJ, Van Voorhis WC. In silico detection of SARS-CoV-2 specific B-cell epitopes and validation in ELISA for serological diagnosis of COVID-19. Sci Rep 2021; 11:4290. [PMID: 33619344 PMCID: PMC7900118 DOI: 10.1038/s41598-021-83730-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/24/2020] [Accepted: 02/03/2021] [Indexed: 02/07/2023]
Abstract
Rapid generation of diagnostics is paramount to understanding the epidemiology and controlling the spread of emerging infectious diseases such as COVID-19. Computational methods to predict serodiagnostic epitopes that are specific for the pathogen could help accelerate the development of new diagnostics. A systematic survey of 27 SARS-CoV-2 proteins was conducted to assess whether existing B-cell epitope prediction methods, combined with comprehensive mining of sequence databases and structural data, could predict whether a particular protein would be suitable for serodiagnosis. Nine of the predictions were validated with recombinant SARS-CoV-2 proteins in the ELISA format using plasma and sera from patients with SARS-CoV-2 infection, and a further 11 predictions were compared to the recent literature. Results appeared to be in agreement with 12 of the predictions and in disagreement with 3, while a further 5 were deemed inconclusive. We showed that two of our top five candidates, the N-terminal fragment of the nucleoprotein and the receptor-binding domain of the spike protein, have the highest sensitivity, specificity and signal-to-noise ratio for detecting COVID-19 sera/plasma by ELISA. Mixing the two antigens together for coating ELISA plates led to a sensitivity of 94% (N = 80 samples from persons with RT-PCR confirmed SARS-CoV-2 infection) and a specificity of 97.2% (N = 106 control samples).
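The sensitivity and specificity reported above follow the usual definitions, TP/(TP + FN) and TN/(TN + FP). The counts in this sketch (75 of 80 positives and 103 of 106 controls) are not given in the abstract; they are hypothetical values chosen only because they round to the reported 94% and 97.2%.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly infected samples that test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of control samples that test negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts consistent with the rounded percentages (not reported in the abstract).
sens = sensitivity(true_pos=75, false_neg=5)    # 75 of 80 RT-PCR-confirmed samples
spec = specificity(true_neg=103, false_pos=3)   # 103 of 106 control samples
print(f"sensitivity {sens:.2%}, specificity {spec:.2%}")
```

Because N is small, each misclassified sample shifts sensitivity by 1.25 percentage points and specificity by about 0.94 points, which is worth keeping in mind when comparing such figures.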
Affiliation(s)
- Isabelle Q Phan
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
| | - Sandhya Subramanian
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
| | - David Kim
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Michael Murphy
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Deleah Pettie
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Lauren Carter
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Ivan Anishchenko
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Lynn K Barrett
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
- Justin Craig
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
- Logan Tillery
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
- Roger Shek
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
- Whitney E Harrington
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington, Seattle, WA, USA
- David M Koelle
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
- Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Benaroya Research Institute, Seattle, WA, USA
- Department of Global Health, University of Washington, Seattle, WA, USA
- Anna Wald
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- David Veesler
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Neil King
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
- Jim Boonyaratanakornkit
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Nina Isoherranen
- Department of Pharmaceutics, University of Washington, Seattle, WA, USA
- Alexander L Greninger
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Keith R Jerome
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Helen Chu
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
- Bart Staker
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
- Lance Stewart
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
- Peter J Myler
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
- Department of Medical Education and Biomedical Informatics & Department of Global Health, University of Washington, Seattle, WA, USA
- Wesley C Van Voorhis
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Department of Microbiology, University of Washington, Seattle, WA, USA
- Department of Global Health, University of Washington, Seattle, WA, USA
24
Mahmoudi Gomari M, Rostami N, Omidi-Ardali H, Arab SS. Insight into molecular characteristics of SARS-CoV-2 spike protein following D614G point mutation, a molecular dynamics study. J Biomol Struct Dyn 2021; 40:5634-5642. [PMID: 33475020 PMCID: PMC7832383 DOI: 10.1080/07391102.2021.1872418] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/17/2023]
Abstract
SARS-CoV-2 has become a major concern for all societies because of its catastrophic effects on public health. In addition, mutations and structural changes in the virus make it difficult to design effective treatments. The amino acid sequence of a protein is a major determinant of its secondary and tertiary structure. Amino acid replacement can have noticeable effects on protein folding, especially if an asymmetric change occurs (substitution of a polar residue with a non-polar one, a charged residue with an uncharged one, a positively charged residue with a negatively charged one, or a large residue with a small one). D614G, a spike mutation of SARS-CoV-2, was previously identified as a risk factor associated with a high mortality rate. Using structural bioinformatics, our group determined that the D614G mutation could cause extensive changes in the behavior of the SARS-CoV-2 spike protein, including its secondary structure, receptor-binding pattern, 3D conformation, and stability. Communicated by Ramaswamy H. Sarma
Affiliation(s)
- Neda Rostami
- Department of Chemical Engineering, Faculty of Engineering, Arak University, Iran
- Hossein Omidi-Ardali
- Clinical Biochemistry Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Seyed Shahriar Arab
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
25
Jing X, Xu J. Improved Protein Model Quality Assessment By Integrating Sequential And Pairwise Features Using Deep Learning. Bioinformatics 2020; 36:5361-5367. [PMID: 33325480 PMCID: PMC8016469 DOI: 10.1093/bioinformatics/btaa1037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/21/2020] [Revised: 11/27/2020] [Accepted: 12/06/2020] [Indexed: 12/23/2022]
Abstract
MOTIVATION Accurately estimating protein model quality in the absence of an experimental structure is important not only for model evaluation and selection but also for model refinement. Progress has been made steadily by introducing new features and algorithms (especially deep neural networks), but the accuracy of quality assessment (QA) is still not very satisfactory, especially for local QA on hard protein targets. RESULTS We propose ResNetQA, a new single-model QA method for both local and global quality assessment. Our method predicts model quality by integrating sequential and pairwise features using a deep neural network composed of both 1D and 2D convolutional residual neural networks (ResNets). The 2D ResNet module extracts useful information from pairwise features such as model-derived distance maps, co-evolution information, and distance potentials predicted from sequence. The 1D ResNet predicts local and global model quality from sequential features and the pooled pairwise information generated by the 2D ResNet. Tested on the CASP12 and CASP13 datasets, our method greatly outperforms existing state-of-the-art methods. Our ablation studies indicate that the 2D ResNet module and pairwise features play an important role in improving model quality assessment. AVAILABILITY https://github.com/AndersJing/ResNetQA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaoyang Jing
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
26
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022]
Abstract
Homology modeling is a method for building protein 3D structures from the protein's primary sequence, using prior knowledge gained from structural similarities with other proteins. The process proceeds in sequential steps: the sequence/structure alignment is optimized, a backbone is built, and side chains are then added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. Over the past three decades, a few collective and collaborative initiatives have enabled continuous progress in both homology and ab initio modeling. The Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded progress in this field. Folding@Home and Rosetta@Home are examples of crowdsourcing initiatives in which the community shares computational resources, whereas RosettaCommons is an example of an initiative in which a community shares a codebase for developing computational algorithms. Foldit is another initiative, in which participants compete with each other in a protein-folding video game to predict 3D structure. In the past few years, deep machine learning of contact maps has been introduced into the 3D structure prediction process, adding more information and significantly increasing model accuracy. In this review, we take the reader on a journey of exploration from the beginnings of the field to the most recent turning points, which have revolutionized homology modeling. We also discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Vojtech Adam
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
27
Mashtalir N, Suzuki H, Farrell DP, Sankar A, Luo J, Filipovski M, D'Avino AR, St Pierre R, Valencia AM, Onikubo T, Roeder RG, Han Y, He Y, Ranish JA, DiMaio F, Walz T, Kadoch C. A Structural Model of the Endogenous Human BAF Complex Informs Disease Mechanisms. Cell 2020; 183:802-817.e24. [PMID: 33053319 PMCID: PMC7717177 DOI: 10.1016/j.cell.2020.09.051] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Received: 11/25/2019] [Revised: 07/15/2020] [Accepted: 09/08/2020] [Indexed: 02/06/2023]
Abstract
Mammalian SWI/SNF complexes are ATP-dependent chromatin remodeling complexes that regulate genomic architecture. Here, we present a structural model of the endogenously purified human canonical BAF complex bound to the nucleosome, generated using cryoelectron microscopy (cryo-EM), cross-linking mass spectrometry, and homology modeling. BAF complexes bilaterally engage the nucleosome H2A/H2B acidic patch regions through the SMARCB1 C-terminal α-helix and the SMARCA4/2 C-terminal SnAc/post-SnAc regions, with disease-associated mutations in either causing attenuated chromatin remodeling activities. Further, we define changes in BAF complex architecture upon nucleosome engagement and compare the structural model of endogenous BAF to those of related SWI/SNF-family complexes. Finally, we assign and experimentally interrogate cancer-associated hot-spot mutations localizing within the endogenous human BAF complex, identifying those that disrupt BAF subunit-subunit and subunit-nucleosome interfaces in the nucleosome-bound conformation. Taken together, this integrative structural approach provides important biophysical foundations for understanding the mechanisms of BAF complex function in normal and disease states.
Affiliation(s)
- Nazar Mashtalir
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hiroshi Suzuki
- Laboratory of Molecular Electron Microscopy, The Rockefeller University, New York, NY, USA
- Daniel P Farrell
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Akshay Sankar
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jie Luo
- Institute for Systems Biology, Seattle, WA, USA
- Martin Filipovski
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Andrew R D'Avino
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Roodolph St Pierre
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Chemical Biology Program, Harvard Medical School, Boston, MA, USA
- Alfredo M Valencia
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Chemical Biology Program, Harvard Medical School, Boston, MA, USA
- Takashi Onikubo
- Laboratory of Biochemistry and Molecular Biology, The Rockefeller University, New York, NY, USA
- Robert G Roeder
- Laboratory of Biochemistry and Molecular Biology, The Rockefeller University, New York, NY, USA
- Yan Han
- Department of Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Yuan He
- Department of Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Thomas Walz
- Laboratory of Molecular Electron Microscopy, The Rockefeller University, New York, NY, USA
- Cigall Kadoch
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
28
Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci U S A 2020; 117:1496-1503. [PMID: 31896580 DOI: 10.1073/pnas.1914677117] [Citation(s) in RCA: 867] [Impact Index Per Article: 173.4] [Indexed: 02/08/2023]
Abstract
The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on sets derived from the 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13) and from Continuous Automated Model Evaluation (CAMEO), the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.
29
Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XIII. Proteins 2019; 87:1011-1020. [PMID: 31589781 DOI: 10.1002/prot.25823] [Citation(s) in RCA: 301] [Impact Index Per Article: 50.2] [Received: 08/23/2019] [Revised: 09/25/2019] [Accepted: 09/27/2019] [Indexed: 12/24/2022]
Abstract
CASP (Critical Assessment of Structure Prediction) assesses the state of the art in modeling protein structure from amino acid sequence. The most recent experiment (CASP13, held in 2018) saw dramatic progress in structure modeling without the use of structural templates (historically "ab initio" modeling). Progress was driven by the successful application of deep learning techniques to predict inter-residue distances. In turn, these results drove dramatic improvements in three-dimensional structure accuracy: provided that an adequate number of sequences is known for the protein family, the new methods essentially solve the long-standing problem of predicting the fold topology of monomeric proteins. Further, the number of sequences required in the alignment has fallen substantially. There is also substantial improvement in the accuracy of template-based models. Other areas (model refinement, accuracy estimation, and the structure of protein assemblies) have again yielded interesting results. CASP13 placed increased emphasis on the use of sparse data together with modeling; chemical crosslinking, SAXS, and NMR all yielded more mature results. This paper summarizes the key outcomes of CASP13. The special issue of PROTEINS contains papers describing the CASP13 assessments in each modeling category and contributions from the participants.
Affiliation(s)
- Torsten Schwede
- Biozentrum & SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Maya Topf
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- John Moult
- Institute for Bioscience and Biotechnology Research, Rockville, Maryland; Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland