1. de Bruyn E, Dorn AE, Rossetti G, Fernandez C, Outeiro TF, Schulz JB, Carloni P. Impact of Phosphorylation on the Physiological Form of Human alpha-Synuclein in Aqueous Solution. J Chem Inf Model 2024. [PMID: 39462994 DOI: 10.1021/acs.jcim.4c01172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Serine 129 can be phosphorylated in pathological inclusions formed by the intrinsically disordered protein human α-synuclein (AS), a key player in Parkinson's disease and other synucleinopathies. Here, molecular simulations provide insight into the structural ensemble of phosphorylated AS. The simulations allow us to suggest that phosphorylation significantly impacts the structural content of the physiological AS conformational ensemble in aqueous solution, as the phosphate group is mostly solvated. The hydrophobic region of AS contains β-hairpin structures, which may increase the propensity of the protein to undergo amyloid formation, as seen in the nonphysiological (nonacetylated) form of the protein in a recent molecular simulation study. Our findings are consistent with existing experimental data with the caveat of the observed limitations of the force field for the phosphorylated moiety.
Affiliation(s)
- Emile de Bruyn
  - Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
- Anton Emil Dorn
  - Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Faculty of Biology, University of Duisburg-Essen, 45141 Essen, Germany
- Giulia Rossetti
  - Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Computational Biomedicine (IAS-5/INM-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Department of Neurology, RWTH Aachen University, 52074 Aachen, Germany
- Claudio Fernandez
  - Max Planck Laboratory for Structural Biology, Chemistry and Molecular Biophysics of Rosario (MPLbioR, UNR-MPINAT), Partner of the Max Planck Institute for Multidisciplinary Sciences (MPINAT, MPG), Centro de Estudios Interdisciplinarios, Universidad Nacional de Rosario, S2002LRK Rosario, Argentina
  - Department of NMR-based Structural Biology, Max Planck Institute for Multidisciplinary Sciences, 37077 Göttingen, Germany
- Tiago F Outeiro
  - Department of Experimental Neurodegeneration, Center for Biostructural Imaging of Neurodegeneration, University Medical Center Göttingen, 37075 Göttingen, Germany
  - Max Planck Institute for Multidisciplinary Sciences, 37075 Göttingen, Germany
  - Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Jörg B Schulz
  - Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
  - Department of Neurology, RWTH Aachen University, 52074 Aachen, Germany
  - JARA Brain Institute Molecular Neuroscience and Neuroimaging (INM-11), Research Centre Jülich and RWTH Aachen University, 52074 Aachen, Germany
- Paolo Carloni
  - Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
  - Computational Biomedicine (IAS-5/INM-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
2. Li H, Tuttle MD, Zilm KW, Batista VS. Rapid Quantification of Protein Secondary Structure Composition from a Single Unassigned 1D 13C Nuclear Magnetic Resonance Spectrum. J Am Chem Soc 2024; 146:27542-27554. [PMID: 39322561 DOI: 10.1021/jacs.4c08300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
The function of a protein is predicated upon its three-dimensional fold. Representing its complex structure as a series of repeating secondary structural elements is one of the most useful ways by which we study, characterize, and visualize a protein. Consequently, experimental methods that quantify the secondary structure content allow us to connect a protein's structure to its function. Here, we introduce an automated gradient descent-based method we refer to as secondary-structure distribution by NMR that allows for rapid quantification of the protein secondary structure composition of a protein from a single, 1D 13C NMR spectrum without chemical shift assignments. The analysis of nearly 900 proteins with known structure and chemical shifts demonstrates the capabilities of our approach. We show that these results rival alternative techniques such as FT-IR and circular dichroism that are commonly used to estimate secondary structure compositions. The resulting method requires only the primary sequence of the protein and its referenced 13C NMR spectrum. Each residue is modeled in an ensemble of secondary structures with percentage contributions from random coil, α-helix, and β-sheet secondary structures obtained by minimizing the difference between a simulated and experimental 1D 13C NMR spectrum. The capabilities of the method are demonstrated as applied to samples at natural abundance or enriched in 13C, acquired by either solution or solid-state NMR, and even on low magnetic field benchtop NMR spectrometers. This approach allows for rapid characterization of protein secondary structure across traditionally challenging to characterize states including liquid-liquid phase-separated, membrane-bound, or aggregated states.
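The idea of recovering secondary-structure fractions by minimizing the difference between a simulated and an experimental spectrum can be illustrated with a toy sketch. This is not the authors' implementation: the three Gaussian basis spectra, their peak centers (54, 56 and 59 ppm), the softmax parameterization, and the names `gaussian_peak`, `mix` and `fit_fractions` are all assumptions chosen only to show gradient descent on a spectral mismatch. The published method models each residue individually from the primary sequence.

```python
import math

def gaussian_peak(grid, center, width=1.0):
    """Hypothetical basis spectrum: one Gaussian line on a ppm grid."""
    return [math.exp(-0.5 * ((x - center) / width) ** 2) for x in grid]

def softmax(logits):
    """Map unconstrained parameters to non-negative fractions summing to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix(fractions, basis):
    """Simulated spectrum as a fraction-weighted sum of basis spectra."""
    return [sum(f * b[i] for f, b in zip(fractions, basis))
            for i in range(len(basis[0]))]

def fit_fractions(target, basis, steps=2000, lr=0.1):
    """Plain gradient descent on the summed squared spectral difference."""
    logits = [0.0] * len(basis)
    for _ in range(steps):
        p = softmax(logits)
        resid = [s - t for s, t in zip(mix(p, basis), target)]
        # Gradient of the loss with respect to each fraction.
        g = [2.0 * sum(r * bi for r, bi in zip(resid, b)) for b in basis]
        # Chain rule through the softmax.
        avg = sum(pk * gk for pk, gk in zip(p, g))
        logits = [z - lr * pk * (gk - avg) for z, pk, gk in zip(logits, p, g)]
    return softmax(logits)

# Assumed toy peak positions for coil, helix and sheet (illustrative only).
grid = [45.0 + 0.25 * i for i in range(101)]
basis = [gaussian_peak(grid, c) for c in (56.0, 59.0, 54.0)]
true_fractions = [0.2, 0.5, 0.3]
target = mix(true_fractions, basis)   # stands in for a measured spectrum
fitted = fit_fractions(target, basis)
```

The softmax keeps the fitted fractions non-negative and normalized without an explicit constraint, which is one common design choice for this kind of composition fit.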
Affiliation(s)
- Haote Li
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Marcus D Tuttle
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Kurt W Zilm
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Victor S Batista
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
3. Benavides TL, Montelione GT. Integrative Modeling of Protein-Polypeptide Complexes by Bayesian Model Selection using AlphaFold and NMR Chemical Shift Perturbation Data. bioRxiv 2024:2024.09.19.613999. [PMID: 39345459 PMCID: PMC11430059 DOI: 10.1101/2024.09.19.613999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Protein-polypeptide interactions, including those involving intrinsically-disordered peptides and intrinsically-disordered regions of protein binding partners, are crucial for many biological functions. However, experimental structure determination of protein-peptide complexes can be challenging. Computational methods, while promising, generally require experimental data for validation and refinement. Here we present CSP_Rank, an integrated modeling approach to determine the structures of protein-peptide complexes. This method combines AlphaFold2 (AF2) enhanced sampling methods with a Bayesian conformational selection process based on experimental Nuclear Magnetic Resonance (NMR) Chemical Shift Perturbation (CSP) data and AF2 confidence metrics. Using a curated dataset of 108 protein-peptide complexes from the Biological Magnetic Resonance Data Bank (BMRB), we observe that while AF2 typically yields models with excellent consistency with experimental CSP data, applying enhanced sampling followed by data-guided conformational selection routinely results in ensembles of structures with improved agreement with NMR observables. For two systems, we cross-validate the CSP-selected models using independently acquired nuclear Overhauser effect (NOE) NMR data and demonstrate how CSP and NOE data can be combined using our Bayesian framework for model selection. CSP_Rank is a novel method for integrative modeling of protein-peptide complexes and has broad implications for studies of protein-peptide interactions and aiding in understanding their biological functions.
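As a rough illustration of Bayesian conformational selection, the sketch below weights candidate models by a posterior that multiplies a data likelihood by a confidence-derived prior. It is not the CSP_Rank scoring function: the Gaussian error model, the value of `sigma`, the use of a single scalar confidence as the prior, and the names `log_likelihood` and `posterior_weights` are assumptions made for the example, and the numbers are invented.

```python
import math

def log_likelihood(predicted, observed, sigma):
    """Gaussian log-likelihood of observed CSPs given a model's predicted CSPs."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((p - o) / sigma) ** 2 - norm
               for p, o in zip(predicted, observed))

def posterior_weights(models, observed, sigma=0.05):
    """Normalized posterior weight per model: likelihood times prior."""
    log_post = [log_likelihood(m["csp"], observed, sigma) + math.log(m["prior"])
                for m in models]
    top = max(log_post)                      # subtract the max for stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Invented per-residue CSP values (ppm) and confidence-like priors.
observed = [0.02, 0.15, 0.30, 0.05]
models = [
    {"name": "model_A", "csp": [0.03, 0.14, 0.28, 0.06], "prior": 0.5},
    {"name": "model_B", "csp": [0.20, 0.02, 0.05, 0.25], "prior": 0.9},
]
weights = posterior_weights(models, observed)
best = models[weights.index(max(weights))]["name"]
```

In this toy case the model with the higher prior loses because its predicted perturbations disagree with the data, which is the behavior a data-guided selection step is meant to provide.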
Affiliation(s)
- Tiburon L. Benavides
  - Department of Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Gaetano T. Montelione
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
4. Stöckelmaier J, Oostenbrink C. Conformational dependence of chemical shifts in the proline rich region of TAU protein. Phys Chem Chem Phys 2024; 26:23856-23870. [PMID: 39230359 PMCID: PMC11373535 DOI: 10.1039/d4cp02484b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Nuclear magnetic resonance (NMR) is an important method for structure elucidation of proteins, as it is an easily accessible and well understood method. To characterize intrinsically disordered proteins (IDPs) using computational models it is often necessary to analyze and integrate calculated observables with measurements derived from solution NMR experiments. In this case study, we investigate whether and which chemical shifts of the proline-rich region of Tau protein (residues 210-240) offer information about the conformational state to distinguish two different microscopic conformers. Using multiple computational methods, the chemical shifts of these two conformationally distinct structures are calculated. The different methods are compared regarding their ability to compute chemical shifts that are sensitive to conformational change. The analysis of the data shows significant differences between the available methods and gives suggestions for an improved pathway for ensemble reweighting. Nevertheless, the variation in the chemical shifts which are predicted for configurations that are commonly considered to belong to the same conformation is such that this obscures a comparison between distinct conformations. Conformational sensitivity is found for up to ∼26% of calculated chemical shifts. It is found to be unrelated to the atom element and has a minor relationship with the change in the corresponding ϕ dihedral angle.
Affiliation(s)
- Johannes Stöckelmaier
  - Institute of Molecular Modeling and Simulation (MMS), University of Natural Resources and Life Sciences, Vienna, Austria.
- Chris Oostenbrink
  - Institute of Molecular Modeling and Simulation (MMS), University of Natural Resources and Life Sciences, Vienna, Austria.
5. Liu ZH, Tsanai M, Zhang O, Forman-Kay J, Head-Gordon T. Computational Methods to Investigate Intrinsically Disordered Proteins and their Complexes. arXiv 2024:arXiv:2409.02240v1. [PMID: 39279844 PMCID: PMC11398552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/18/2024]
Abstract
In 1999 Wright and Dyson highlighted the fact that large sections of the proteome of all organisms are comprised of protein sequences that lack globular folded structures under physiological conditions. Since then the biophysics community has made significant strides in unraveling the intricate structural and dynamic characteristics of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). Unlike crystallographic beamlines and their role in streamlining acquisition of structures for folded proteins, an integrated experimental and computational approach aimed at IDPs/IDRs has emerged. In this Perspective we aim to provide a robust overview of current computational tools for IDPs and IDRs, and most recently their complexes and phase separated states, including statistical models, physics-based approaches, and machine learning methods that permit structural ensemble generation and validation against many solution experimental data types.
Affiliation(s)
- Zi Hao Liu
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
  - Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Maria Tsanai
  - Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
- Oufan Zhang
  - Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
- Julie Forman-Kay
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
  - Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Teresa Head-Gordon
  - Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, USA
6. Bakker MJ, Gaffour A, Juhás M, Zapletal V, Stošek J, Bratholm LA, Pavlíková Přecechtělová J. Streamlining NMR Chemical Shift Predictions for Intrinsically Disordered Proteins: Design of Ensembles with Dimensionality Reduction and Clustering. J Chem Inf Model 2024; 64:6542-6556. [PMID: 39099394 PMCID: PMC11412307 DOI: 10.1021/acs.jcim.4c00809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/06/2024]
Abstract
By merging advanced dimensionality reduction (DR) and clustering algorithm (CA) techniques, our study advances the sampling procedure for predicting NMR chemical shifts (CS) in intrinsically disordered proteins (IDPs), making a significant leap forward in the field of protein analysis/modeling. We enhance NMR CS sampling by generating clustered ensembles that accurately reflect the different properties and phenomena encapsulated by the IDP trajectories. This investigation critically assessed different rapid CS predictors, both neural network (e.g., Sparta+ and ShiftX2) and database-driven (ProCS-15), and highlighted the need for more advanced quantum calculations and the subsequent need for more tractable-sized conformational ensembles. Although neural network CS predictors outperformed ProCS-15 for all atoms, all tools showed poor agreement with HN CSs, and the neural network CS predictors were unable to capture the influence of phosphorylated residues, highly relevant for IDPs. This study also addressed the limitations of using direct clustering with collective variables, such as the widespread implementation of the GROMOS algorithm. Clustered ensembles (CEs) produced by this algorithm showed poor performance with chemical shifts compared to sequential ensembles (SEs) of similar size. Instead, we implement a multiscale DR and CA approach and explore the challenges and limitations of applying these algorithms to obtain more robust and tractable CEs. The novel feature of this investigation is the use of solvent-accessible surface area (SASA) as one of the fingerprints for DR alongside previously investigated α carbon distance/angles or ϕ/ψ dihedral angles. The ensembles produced with SASA tSNE DR produced CEs better aligned with the experimental CS of between 0.17 and 0.36 r2 (0.18-0.26 ppm) depending on the system and replicate. Furthermore, this technique produced CEs with better agreement than traditional SEs in 85.7% of all ensemble sizes. 
This study investigates the quality of ensembles produced based on different input features, comparing latent spaces produced by linear vs nonlinear DR techniques and a novel integrated silhouette score scanning protocol for tSNE DR.
Affiliation(s)
- Michael J Bakker
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Amina Gaffour
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Martin Juhás
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
  - Department of Chemistry, Faculty of Science, University of Hradec Králové, Rokitanského 62, 500 03 Hradec Králové, Czech Republic
- Vojtěch Zapletal
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Jakub Stošek
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
  - Department of Chemistry, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Lars A Bratholm
  - School of Chemistry, University of Bristol, Cantock's Close, BS8 1TS Bristol, U.K.
- Jana Pavlíková Přecechtělová
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
7. Gu X, Myung Y, Rodrigues CHM, Ascher DB. EFG-CS: Predicting chemical shifts from amino acid sequences with protein structure prediction using machine learning and deep learning models. Protein Sci 2024; 33:e5096. [PMID: 38979954 PMCID: PMC11232051 DOI: 10.1002/pro.5096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 05/06/2024] [Accepted: 06/15/2024] [Indexed: 07/10/2024]
Abstract
Nuclear magnetic resonance (NMR) crystallography is one of the main methods in structural biology for analyzing protein stereochemistry and structure. The chemical shift of the resonance frequency reflects the effect of the protons in a molecule producing distinct NMR signals in different chemical environments. Apprehending chemical shifts from NMR signals can be challenging since having an NMR structure does not necessarily provide all the required chemical shift information, making predictive models essential for accurately deducing chemical shifts, either from protein structures or, more ideally, directly from amino acid sequences. Here, we present EFG-CS, a web server that specializes in chemical shift prediction. EFG-CS employs a machine learning-based transfer prediction model for backbone atom chemical shift prediction, using ESMFold-predicted protein structures. Additionally, EFG-CS incorporates a graph neural network-based model to provide comprehensive side-chain atom chemical shift predictions. Our method demonstrated reliable performance in backbone atom prediction, achieving comparable accuracy levels with root mean square errors (RMSE) of 0.30 ppm for H, 0.22 ppm for Hα, 0.89 ppm for C, 0.89 ppm for Cα, 0.84 ppm for Cβ, and 1.69 ppm for N. Moreover, our approach also showed predictive capabilities in side-chain atom chemical shift prediction achieving RMSE values of 0.71 ppm for Hβ, 0.74-1.15 ppm for Hδ, and 0.58-0.94 ppm for Hγ, solely utilizing amino acid sequences without homology or feature curation. This work shows for the first time that generative AI protein models can predict NMR shifts nearly comparable to experimental models. This web server is freely available at https://biosig.lab.uq.edu.au/efg_cs, and the chemical shift prediction results can be downloaded in tabular format and visualized in 3D format.
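The per-atom RMSE figures quoted above follow the standard definition, the square root of the mean squared difference between predicted and experimental shifts, computed separately for each atom type. A minimal sketch of that bookkeeping is given below; the function name `rmse_by_atom`, the record layout, and the shift values are hypothetical and are not taken from the EFG-CS server or its data.

```python
import math
from collections import defaultdict

def rmse_by_atom(records):
    """RMSE (ppm) per atom type from (atom, predicted, experimental) tuples."""
    squared = defaultdict(list)
    for atom, predicted, experimental in records:
        squared[atom].append((predicted - experimental) ** 2)
    return {atom: math.sqrt(sum(vals) / len(vals))
            for atom, vals in squared.items()}

# Invented shifts in ppm, for illustration only.
records = [
    ("CA", 56.0, 55.0),
    ("CA", 58.0, 59.0),
    ("N", 120.0, 118.0),
]
errors = rmse_by_atom(records)
```

Reporting the error per nucleus matters because shift ranges differ widely between atom types, so a single pooled RMSE would be dominated by the nuclei with the largest spread, such as N.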
Affiliation(s)
- Xiaotong Gu
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Yoochan Myung
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Carlos H. M. Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- David B. Ascher
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
8. Han C, Zhang D, Xia S, Zhang Y. Accurate Prediction of NMR Chemical Shifts: Integrating DFT Calculations with Three-Dimensional Graph Neural Networks. J Chem Theory Comput 2024; 20:5250-5258. [PMID: 38842505 PMCID: PMC11209944 DOI: 10.1021/acs.jctc.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/25/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024]
Abstract
Computer prediction of NMR chemical shifts plays an increasingly important role in molecular structure assignment and elucidation for organic molecule studies. Density functional theory (DFT) and gauge-including atomic orbital (GIAO) have established a framework to predict NMR chemical shifts but often at a significant computational expense with a limited prediction accuracy. Recent advancements in deep learning methods, especially graph neural networks (GNNs), have shown promise in improving the accuracy of predicting experimental chemical shifts, either by using 2D molecular topological features or 3D conformational representation. This study presents a new 3D GNN model to predict 1H and 13C chemical shifts, CSTShift, that combines atomic features with DFT-calculated shielding tensor descriptors, capturing both isotropic and anisotropic shielding effects. Utilizing the NMRShiftDB2 data set and conducting DFT optimization and GIAO calculations at the B3LYP/6-31G(d) level, we prepared the NMRShiftDB2-DFT data set of high-quality 3D structures and shielding tensors with corresponding experimentally measured 1H and 13C chemical shifts. The developed CSTShift models achieve the state-of-the-art prediction performance on both the NMRShiftDB2-DFT test data set and external CHESHIRE data set. Further case studies on identifying correct structures from two groups of constitutional isomers show its capability for structure assignment and elucidation. The source code and data are accessible at https://yzhang.hpc.nyu.edu/IMA.
Affiliation(s)
- Chao Han
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Dongdong Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
9. Wayment-Steele HK, Otten R, Pitsawong W, Ojoawo AM, Glaser A, Calderone LA, Kern D. The conformational landscape of fold-switcher KaiB is tuned to the circadian rhythm timescale. bioRxiv 2024:2024.06.03.597139. [PMID: 38895306 PMCID: PMC11185700 DOI: 10.1101/2024.06.03.597139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
How can a single protein domain encode a conformational landscape with multiple stably-folded states, and how do those states interconvert? Here, we use real-time and relaxation-dispersion NMR to characterize the conformational landscape of the circadian rhythm protein KaiB from Rhodobacter sphaeroides. Unique among known natural metamorphic proteins, this KaiB variant spontaneously interconverts between two monomeric states: the "Ground" and "Fold-switched" (FS) state. KaiB in its FS state interacts with multiple binding partners, including the central KaiC protein, to regulate circadian rhythms. We find that KaiB itself takes hours to interconvert between the Ground and FS state, underscoring the ability of a single sequence to encode the slow process needed for function. We reveal the rate-limiting step between the Ground and FS state is the cis-trans isomerization of three prolines in the fold-switching region by demonstrating interconversion acceleration by the prolyl isomerase CypA. The interconversion proceeds through a "partially disordered" (PD) state, where the C-terminal half becomes disordered while the N-terminal half remains stably folded. We discovered two additional properties of KaiB's landscape. Firstly, the Ground state experiences cold denaturation: at 4°C, the PD state becomes the majorly populated state. Secondly, the Ground state exchanges with a fourth state, the "Enigma" state, on the millisecond timescale. We combine AlphaFold2-based predictions and NMR chemical shift predictions to predict this "Enigma" state is a beta-strand register shift that eases buried charged residues, and support this structure experimentally. These results provide mechanistic insight in how evolution can design a single sequence that achieves specific timing needed for its function.
Affiliation(s)
- Hannah K Wayment-Steele
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Renee Otten
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
  - Present address: Treeline Biosciences, Watertown, MA, USA
- Warintra Pitsawong
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
  - Present address: Biomolecular Discovery, Relay Therapeutics, Cambridge, MA, USA
- Adedolapo M Ojoawo
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Andrew Glaser
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Logan A Calderone
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Dorothee Kern
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
10. Smardz P, Anila MM, Rogowski P, Li MS, Różycki B, Krupa P. A Practical Guide to All-Atom and Coarse-Grained Molecular Dynamics Simulations Using Amber and Gromacs: A Case Study of Disulfide-Bond Impact on the Intrinsically Disordered Amyloid Beta. Int J Mol Sci 2024; 25:6698. [PMID: 38928405 PMCID: PMC11204378 DOI: 10.3390/ijms25126698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 06/12/2024] [Accepted: 06/13/2024] [Indexed: 06/28/2024]
Abstract
Intrinsically disordered proteins (IDPs) pose challenges to conventional experimental techniques due to their large-scale conformational fluctuations and transient structural elements. This work presents computational methods for studying IDPs at various resolutions using the Amber and Gromacs packages with both all-atom (Amber ff19SB with the OPC water model) and coarse-grained (Martini 3 and SIRAH) approaches. The effectiveness of these methodologies is demonstrated by examining the monomeric form of amyloid-β (Aβ42), an IDP, with and without disulfide bonds at different resolutions. Our results clearly show that the addition of a disulfide bond decreases the β-content of Aβ42; however, it increases the tendency of the monomeric Aβ42 to form fibril-like conformations, explaining the various aggregation rates observed in experiments. Moreover, analysis of the monomeric Aβ42 compactness, secondary structure content, and comparison between calculated and experimental chemical shifts demonstrates that all three methods provide a reasonable choice to study IDPs; however, coarse-grained approaches may lack some atomistic details, such as secondary structure recognition, due to the simplifications used. In general, this study not only explains the role of disulfide bonds in Aβ42 but also provides a step-by-step protocol for setting up, conducting, and analyzing molecular dynamics (MD) simulations, which is adaptable for studying other biomacromolecules, including folded and disordered proteins and peptides.
Affiliation(s)
- Pawel Krupa
  - Institute of Physics Polish Academy of Sciences, Al. Lotników 32/46, 02-668 Warsaw, Poland; (P.S.); (M.M.A.); (P.R.); (M.S.L.); (B.R.)
11. Li J, Liang J, Wang Z, Ptaszek AL, Liu X, Ganoe B, Head-Gordon M, Head-Gordon T. Highly Accurate Prediction of NMR Chemical Shifts from Low-Level Quantum Mechanics Calculations Using Machine Learning. J Chem Theory Comput 2024; 20:2152-2166. [PMID: 38331423 DOI: 10.1021/acs.jctc.3c01256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
Theoretical predictions of NMR chemical shifts from first-principles can greatly facilitate experimental interpretation and structure identification of molecules in gas, solution, and solid-state phases. However, accurate prediction of chemical shifts using the gold-standard coupled cluster with singles, doubles, and perturbative triple excitations [CCSD(T)] method with a complete basis set (CBS) can be prohibitively expensive. By contrast, machine learning (ML) methods offer inexpensive alternatives for chemical shift predictions but are hampered by generalization to molecules outside the original training set. Here, we propose several new ideas in ML of the chemical shift prediction for H, C, N, and O that first introduce a novel feature representation, based on the atomic chemical shielding tensors within a molecular environment using an inexpensive quantum mechanics (QM) method, and train it to predict NMR chemical shieldings of a high-level composite theory that approaches the accuracy of CCSD(T)/CBS. In addition, we train the ML model through a new progressive active learning workflow that reduces the total number of expensive high-level composite calculations required while allowing the model to continuously improve on unseen data. Furthermore, the algorithm provides an error estimation, signaling potential unreliability in predictions if the error is large. Finally, we introduce a novel approach to keep the rotational invariance of the features using tensor environment vectors (TEVs) that yields a ML model with the highest accuracy compared to a similar model using data augmentation. 
We illustrate the predictive capacity of the resulting inexpensive shift machine learning (iShiftML) models across several benchmarks, including unseen molecules in the NS372 data set, gas-phase experimental chemical shifts for small organic molecules, and much larger and more complex natural products in which we can accurately differentiate between subtle diastereomers based on chemical shift assignments.
Affiliation(s)
- Jie Li
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Jiashu Liang
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Zhe Wang
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Aleksandra L Ptaszek
  - Christian Doppler Laboratory for High-Content Structural Biology and Biotechnology, Department of Structural and Computational Biology, Max Perutz Laboratories, University of Vienna, Campus Vienna Biocenter 5, Vienna 1030, Austria
  - Laboratory for Computer-Aided Molecular Design, Division of Medicinal Chemistry, Otto Loewi Research Center, Medical University Graz, Neue Stiftingtalstrasse 6/III, Graz 8010, Austria
- Xiao Liu
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Brad Ganoe
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Martin Head-Gordon
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
  - Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Teresa Head-Gordon
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
  - Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, United States
|
12
|
Karamanos TK, Matthews S. Biomolecular NMR in the AI-assisted structural biology era: Old tricks and new opportunities. BIOCHIMICA ET BIOPHYSICA ACTA. PROTEINS AND PROTEOMICS 2024; 1872:140949. [PMID: 37572958 DOI: 10.1016/j.bbapap.2023.140949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 08/07/2023] [Accepted: 08/09/2023] [Indexed: 08/14/2023]
Abstract
Over the last 40 years nuclear magnetic resonance (NMR) spectroscopy has established itself as one of the most versatile techniques for the characterization of biomolecules, especially proteins. Given the molecular size limitations of NMR together with recent advances in cryo-electron microscopy and artificial intelligence-assisted protein structure prediction, the bright future of NMR in structural biology has been put into question. In this mini review we argue the contrary. We discuss the unique opportunities solution NMR offers to the protein chemist that distinguish it from all other experimental or computational methods, and how it can benefit from machine learning.
Affiliation(s)
| | - Stephen Matthews
- Department of Life Sciences, Faculty of Natural Sciences, Imperial College London.
|
13
|
Klukowski P, Damberger FF, Allain FHT, Iwai H, Kadavath H, Ramelot TA, Montelione GT, Riek R, Güntert P. The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis. Sci Data 2024; 11:30. [PMID: 38177162 PMCID: PMC10767026 DOI: 10.1038/s41597-023-02879-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/22/2023] [Indexed: 01/06/2024] Open
Abstract
Multidimensional NMR spectra are the basis for studying proteins by NMR spectroscopy and crucial for the development and evaluation of methods for biomolecular NMR data analysis. Nevertheless, in contrast to derived data such as chemical shift assignments in the BMRB and protein structures in the PDB databases, this primary data is in general not publicly archived. To change this unsatisfactory situation, we present a standardized set of solution NMR data comprising 1329 2-4-dimensional NMR spectra and associated reference (chemical shift assignments, structures) and derived (peak lists, restraints for structure calculation, etc.) annotations. With the 100-protein NMR spectra dataset that was originally compiled for the development of the ARTINA deep learning-based spectra analysis method, 100 protein structures can be reproduced from their original experimental data. The 100-protein NMR spectra dataset is expected to help the development of computational methods for NMR spectroscopy, in particular machine learning approaches, and enable consistent and objective comparisons of these methods.
Affiliation(s)
- Piotr Klukowski
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
| | - Fred F Damberger
- Institute of Biochemistry, ETH Zurich, 8093, Zurich, Switzerland
| | | | - Hideo Iwai
- Institute of Biotechnology, University of Helsinki, 00100, Helsinki, Finland
| | | | - Theresa A Ramelot
- Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
| | - Gaetano T Montelione
- Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
| | - Roland Riek
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
| | - Peter Güntert
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Institute of Biophysical Chemistry, Goethe University, 60438, Frankfurt am Main, Germany.
- Department of Chemistry, Tokyo Metropolitan University, Hachioji, 192-0397, Tokyo, Japan.
|
14
|
Klukowski P, Riek R, Güntert P. Time-optimized protein NMR assignment with an integrative deep learning approach using AlphaFold and chemical shift prediction. SCIENCE ADVANCES 2023; 9:eadi9323. [PMID: 37992167 PMCID: PMC10664993 DOI: 10.1126/sciadv.adi9323] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Accepted: 10/20/2023] [Indexed: 11/24/2023]
Abstract
Chemical shift assignment is vital for nuclear magnetic resonance (NMR)-based studies of protein structures, dynamics, and interactions, providing crucial atomic-level insight. However, obtaining chemical shift assignments is labor intensive and requires extensive measurement time. To address this limitation, we previously proposed ARTINA, a deep learning method for automatic assignment of two-dimensional (2D)-4D NMR spectra. Here, we present an integrative approach that combines ARTINA with AlphaFold and UCBShift, enabling chemical shift assignment with reduced experimental data, increased accuracy, and enhanced robustness for larger systems, as presented in a comprehensive study with more than 5000 automated assignment calculations on 89 proteins. We demonstrate that five 3D spectra yield more accurate assignments (92.59%) than pure ARTINA runs using all experimentally available NMR data (on average 10 3D spectra per protein, 91.37%), considerably reducing the required measurement time. We also showcase automated assignments of only 15N-labeled samples, and report improved assignment accuracy in larger synthetic systems of up to 500 residues.
Affiliation(s)
- Piotr Klukowski
- Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Roland Riek
- Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Peter Güntert
- Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Institute of Biophysical Chemistry, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
- Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, 192-0397 Tokyo, Japan
|
15
|
Shukla VK, Heller GT, Hansen DF. Biomolecular NMR spectroscopy in the era of artificial intelligence. Structure 2023; 31:1360-1374. [PMID: 37848030 DOI: 10.1016/j.str.2023.09.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/21/2023] [Revised: 09/15/2023] [Accepted: 09/21/2023] [Indexed: 10/19/2023]
Abstract
Biomolecular nuclear magnetic resonance (NMR) spectroscopy and artificial intelligence (AI) have a burgeoning synergy. Deep learning-based structural predictors have forever changed structural biology, yet these tools currently face limitations in accurately characterizing protein dynamics, allostery, and conformational heterogeneity. We begin by highlighting the unique abilities of biomolecular NMR spectroscopy to complement AI-based structural predictions toward addressing these knowledge gaps. We then highlight the direct integration of deep learning approaches into biomolecular NMR methods. AI-based tools can dramatically improve the acquisition and analysis of NMR spectra, enhancing the accuracy and reliability of NMR measurements, thus streamlining experimental processes. Additionally, deep learning enables the development of novel types of NMR experiments that were previously unattainable, expanding the scope and potential of biomolecular NMR spectroscopy. Ultimately, a combination of AI and NMR promises to further revolutionize structural biology on several levels, advance our understanding of complex biomolecular systems, and accelerate drug discovery efforts.
Affiliation(s)
- Vaibhav Kumar Shukla
- Department of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK
| | - Gabriella T Heller
- Department of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK.
| | - D Flemming Hansen
- Department of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK.
|
16
|
Chandy SK, Raghavachari K. MIM-ML: A Novel Quantum Chemical Fragment-Based Random Forest Model for Accurate Prediction of NMR Chemical Shifts of Nucleic Acids. J Chem Theory Comput 2023; 19:6632-6642. [PMID: 37703522 DOI: 10.1021/acs.jctc.3c00563] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/15/2023]
Abstract
We developed a random forest machine learning (ML) model for the prediction of 1H and 13C NMR chemical shifts of nucleic acids. Our ML model is trained entirely on reproducing computed chemical shifts obtained previously on 10 nucleic acids using a Molecules-in-Molecules (MIM) fragment-based density functional theory (DFT) protocol including microsolvation effects. Our ML model includes structural descriptors as well as electronic descriptors from an inexpensive low-level semiempirical calculation (GFN2-xTB) and trained on a relatively small number of DFT chemical shifts (2080 1H chemical shifts and 1780 13C chemical shifts on the 10 nucleic acids). The ML model is then used to make chemical shift predictions on 8 new nucleic acids ranging in size from 600 to 900 atoms and compared directly to experimental data. Though no experimental data was used in the training, the performance of our model is excellent (mean absolute deviation of 0.34 ppm for 1H chemical shifts and 2.52 ppm for 13C chemical shifts for the test set), despite having some nonstandard structures. A simple analysis suggests that both structural and electronic descriptors are critical for achieving reliable predictions. This is the first attempt to combine ML from fragment-based DFT calculations to predict experimental chemical shifts accurately, making the MIM-ML model a valuable tool for NMR predictions of nucleic acids.
Affiliation(s)
- Sruthy K Chandy
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| | - Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
|
17
|
Cordova M, Moutzouri P, Nilsson Lill SO, Cousen A, Kearns M, Norberg ST, Svensk Ankarberg A, McCabe J, Pinon AC, Schantz S, Emsley L. Atomic-level structure determination of amorphous molecular solids by NMR. Nat Commun 2023; 14:5138. [PMID: 37612269 PMCID: PMC10447443 DOI: 10.1038/s41467-023-40853-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/08/2023] [Accepted: 08/10/2023] [Indexed: 08/25/2023] Open
Abstract
Structure determination of amorphous materials remains challenging, owing to the disorder inherent to these materials. Nuclear magnetic resonance (NMR) powder crystallography is a powerful method to determine the structure of molecular solids, but disorder leads to a high degree of overlap between measured signals, and prevents the unambiguous identification of a single modeled periodic structure as representative of the whole material. Here, we determine the atomic-level ensemble structure of the amorphous form of the drug AZD4625 by combining solid-state NMR experiments with molecular dynamics (MD) simulations and machine-learned chemical shifts. By considering the combined shifts of all 1H and 13C atomic sites in the molecule, we determine the structure of the amorphous form by identifying an ensemble of local molecular environments that are in agreement with experiment. We then extract and analyze preferred conformations and intermolecular interactions in the amorphous sample in terms of the stabilization of the amorphous form of the drug.
Affiliation(s)
- Manuel Cordova
- Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials MARVEL, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Pinelopi Moutzouri
- Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland
| | - Sten O Nilsson Lill
- Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alexander Cousen
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
| | - Martin Kearns
- Early Product Development and Manufacturing, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
| | - Stefan T Norberg
- Oral Product Development, Pharmaceutical Technology & Development, Operations, AstraZeneca, Gothenburg, Sweden
| | - Anna Svensk Ankarberg
- Oral Product Development, Pharmaceutical Technology & Development, Operations, AstraZeneca, Gothenburg, Sweden
| | - James McCabe
- Early Product Development and Manufacturing, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
| | - Arthur C Pinon
- Swedish NMR Center, Department of Chemistry and Molecular Biology, University of Gothenburg, 41390, Gothenburg, Sweden
| | - Staffan Schantz
- Oral Product Development, Pharmaceutical Technology & Development, Operations, AstraZeneca, Gothenburg, Sweden.
| | - Lyndon Emsley
- Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.
- National Centre for Computational Design and Discovery of Novel Materials MARVEL, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
|
18
|
Maya N, Kyoko N, Misaki I, Yuichi U, Nitta Y. Expression and purification of 15N-labeled Fra a 1, a strawberry allergen, to prepare samples for NMR measurements. Protein Expr Purif 2023; 210:106296. [PMID: 37192728 DOI: 10.1016/j.pep.2023.106296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 05/12/2023] [Accepted: 05/13/2023] [Indexed: 05/18/2023]
Abstract
Raw strawberries contain allergens that cause oral allergy syndrome. Fra a 1 is one of the major allergens in strawberries, and its allergenicity might decrease on heating, likely due to structural changes in the allergen leading to decreased recognition of the allergen in the oral cavity. In the present study, to understand the relationship between allergen structure and allergenicity, the expression and purification of 15N-labeled Fra a 1 were examined and the sample was used for NMR analysis. Two isoforms, Fra a 1.01 and Fra a 1.02, were used and expressed in E. coli BL21(DE3) in M9 minimal medium. Fra a 1.02 was purified as a single protein by using the GST tag approach, whereas histidine×6-tag (his6-tag) Fra a 1.02 was obtained both as the full-length (∼20 kDa) and a truncated (∼18 kDa) form. On the other hand, his6-tag Fra a 1.01 was purified as a homogeneous protein. 15N-labeled HSQC NMR spectra suggested that Fra a 1.02 was thermally denatured at lower temperatures than Fra a 1.01, despite the high amino acid sequence homology (79.4%) of these isoforms. Furthermore, the samples in the present study allowed us to analyze ligand binding that probably affects structural stability. In conclusion, the GST tag was effective for obtaining a homogeneous protein when the his6-tag failed to give a single form, and the present study provided a sample that could be used for NMR studies of the details of the allergenicity and structure of Fra a 1.
Affiliation(s)
- Nishino Maya
- Department of Nutrition and Food Science, Ochanomizu University, Japan
| | - Noda Kyoko
- Department of Nutrition and Food Science, Ochanomizu University, Japan
| | - Ishibashi Misaki
- Graduate School of Agricultural Science, Kobe University, Japan; Graduate School of Agriculture, Kyoto University, Japan
| | - Uno Yuichi
- Graduate School of Agricultural Science, Kobe University, Japan
| | - Yoko Nitta
- Department of Nutrition and Food Science, Ochanomizu University, Japan.
|
19
|
Liu ZH, Zhang O, Teixeira JMC, Li J, Head-Gordon T, Forman-Kay JD. SPyCi-PDB: A modular command-line interface for back-calculating experimental datatypes of protein structures. JOURNAL OF OPEN SOURCE SOFTWARE 2023; 8:4861. [PMID: 38726305 PMCID: PMC11081106 DOI: 10.21105/joss.04861] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/12/2024]
Affiliation(s)
- Zi Hao Liu
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
| | - Oufan Zhang
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, California 94720-1460, USA
- Department of Chemistry, University of California, Berkeley, California 94720-1460, USA
| | - João M C Teixeira
- Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
| | - Jie Li
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, California 94720-1460, USA
- Department of Chemistry, University of California, Berkeley, California 94720-1460, USA
| | - Teresa Head-Gordon
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, California 94720-1460, USA
- Department of Chemistry, University of California, Berkeley, California 94720-1460, USA
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720-1462, USA
- Department of Bioengineering, University of California, Berkeley, California 94720-1762, USA
| | - Julie D Forman-Kay
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
|
20
|
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023] Open
Abstract
The structural characterization of proteins with disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic resonance enhancements. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between experimental data and probabilistic selection of torsions from learned distributions provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | - Mojtaba Haghighatlari
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | - Jie Li
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | | | - Ashley Namini
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada
|
21
|
Ksenofontov AA, Isaev YI, Lukanov MM, Makarov DM, Eventova VA, Khodov IA, Berezin MB. Accurate prediction of 11B NMR chemical shift of BODIPYs via machine learning. Phys Chem Chem Phys 2023; 25:9472-9481. [PMID: 36935644 DOI: 10.1039/d3cp00253e] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/08/2023]
Abstract
In this article, we present the results of developing a model based on an RFR machine learning method using the ISIDA fragment descriptors for predicting the 11B NMR chemical shift of BODIPYs. The model is freely available at https://ochem.eu/article/146458. The model demonstrates the high quality of predicting the 11B NMR chemical shift (RMSE, 5CV (FINALE training set) = 0.40 ppm, RMSE (TEST set) = 0.14 ppm). In addition, we compared the "cost" and the user-friendliness for calculations using the quantum-chemical model with the DFT/GIAO approach. The 11B NMR chemical shift prediction accuracy (RMSE) of the model considered is more than three times higher and tremendously faster than the DFT/GIAO calculations. As a result, we provide a convenient tool and database that we collected for all researchers, that allows them to predict the 11B NMR chemical shift of boron-containing dyes. We believe that the new model will make it easier for researchers to correctly interpret the 11B NMR chemical shifts experimentally determined and to select more optimal conditions to perform an NMR experiment.
Affiliation(s)
- Alexander A Ksenofontov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Yaroslav I Isaev
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
| | - Michail M Lukanov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Dmitry M Makarov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Varvara A Eventova
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
| | - Ilya A Khodov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Mechail B Berezin
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
|
22
|
Hoch JC, Baskaran K, Burr H, Chin J, Eghbalnia H, Fujiwara T, Gryk M, Iwata T, Kojima C, Kurisu G, Maziuk D, Miyanoiri Y, Wedell J, Wilburn C, Yao H, Yokochi M. Biological Magnetic Resonance Data Bank. Nucleic Acids Res 2023; 51:D368-D376. [PMID: 36478084 PMCID: PMC9825541 DOI: 10.1093/nar/gkac1050] [Citation(s) in RCA: 78] [Impact Index Per Article: 78.0] [Received: 10/04/2022] [Revised: 10/20/2022] [Accepted: 10/23/2022] [Indexed: 12/12/2022] Open
Abstract
The Biological Magnetic Resonance Data Bank (BMRB, https://bmrb.io) is the international open data repository for biomolecular nuclear magnetic resonance (NMR) data. Comprised of both empirical and derived data, BMRB has applications in the study of biomacromolecular structure and dynamics, biomolecular interactions, drug discovery, intrinsically disordered proteins, natural products, biomarkers, and metabolomics. Advances including GHz-class NMR instruments, national and trans-national NMR cyberinfrastructure, hybrid structural biology methods and machine learning are driving increases in the amount, type, and applications of NMR data in the biosciences. BMRB is a Core Archive and member of the World-wide Protein Data Bank (wwPDB).
Affiliation(s)
- Jeffrey C Hoch
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Kumaran Baskaran
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Harrison Burr
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - John Chin
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Hamid R Eghbalnia
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Toshimichi Fujiwara
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
| | - Michael R Gryk
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Takeshi Iwata
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
| | - Chojiro Kojima
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
- Graduate School of Engineering Science, Yokohama National University, Yokohama 240-8501, Japan
| | - Genji Kurisu
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
| | - Dmitri Maziuk
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Yohei Miyanoiri
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
| | - Jonathan R Wedell
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Colin Wilburn
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Hongyang Yao
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030-3305, USA
| | - Masashi Yokochi
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
|
23
|
Qi G, Vrettas MD, Biancaniello C, Sanz-Hernandez M, Cafolla CT, Morgan JWR, Wang Y, De Simone A, Wales DJ. Enhancing Biomolecular Simulations with Hybrid Potentials Incorporating NMR Data. J Chem Theory Comput 2022; 18:7733-7750. [PMID: 36395419 DOI: 10.1021/acs.jctc.2c00657] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Some recent advances in biomolecular simulation and global optimization have used hybrid restraint potentials, where harmonic restraints that penalize conformations inconsistent with experimental data are combined with molecular mechanics force fields. These hybrid potentials can be used to improve the performance of molecular dynamics, structure prediction, energy landscape sampling, and other computational methods that rely on the accuracy of the underlying force field. Here, we develop a hybrid restraint potential based on NapShift, an artificial neural network trained to predict protein nuclear magnetic resonance (NMR) chemical shifts from sequence and structure. In addition to providing accurate predictions of experimental chemical shifts, NapShift is fully differentiable with respect to atomic coordinates, which allows us to use it for structural refinement. By employing NapShift to predict chemical shifts from the protein conformation at each simulation step, we can compute an energy penalty and the corresponding hybrid restraint forces based on the difference between the predicted values and the experimental chemical shifts. The performance of the hybrid restraint potential was benchmarked using both basin-hopping global optimization and molecular dynamics simulations. In each case, the NapShift hybrid potential improved the accuracy, leading to better structure prediction via basin-hopping and increased local stability in molecular dynamics simulations. Our results suggest that neural network hybrid potentials based on NMR observables can enhance a broad range of molecular simulation methods, and the prediction accuracy will improve as more experimental training data become available.
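The energy penalty described in this abstract can be illustrated with a minimal sketch. It assumes a plain harmonic form, 0.5 * k * sum((predicted - experimental)^2), and takes the predicted shifts as given numbers rather than calling NapShift. The function names, the force constant `k`, and the example shift values are hypothetical and are not taken from the paper or from any NapShift API.

```python
from typing import List, Sequence


def restraint_energy(predicted: Sequence[float],
                     experimental: Sequence[float],
                     k: float = 1.0) -> float:
    """Harmonic penalty on the mismatch between predicted and experimental shifts.

    Assumed form (not from the paper): E = 0.5 * k * sum((pred - exp)^2).
    """
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental shifts must have equal length")
    return 0.5 * k * sum((p - e) ** 2 for p, e in zip(predicted, experimental))


def restraint_gradient(predicted: Sequence[float],
                       experimental: Sequence[float],
                       k: float = 1.0) -> List[float]:
    """Derivative of the penalty with respect to each predicted shift.

    Turning this into forces on atoms would additionally need the derivative
    of the shift predictor with respect to coordinates (chain rule), which a
    differentiable predictor can supply; that step is not shown here.
    """
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental shifts must have equal length")
    return [k * (p - e) for p, e in zip(predicted, experimental)]


def hybrid_energy(force_field_energy: float,
                  predicted: Sequence[float],
                  experimental: Sequence[float],
                  k: float = 1.0) -> float:
    """Force-field energy plus the chemical shift restraint penalty."""
    return force_field_energy + restraint_energy(predicted, experimental, k)


if __name__ == "__main__":
    # Hypothetical shifts in ppm, for illustration only.
    pred = [8.2, 4.5, 120.1]
    expt = [8.0, 4.6, 119.5]
    print(restraint_energy(pred, expt, k=2.0))
    print(restraint_gradient(pred, expt, k=2.0))
```

In this sketch the penalty is zero when predicted and experimental shifts agree and grows quadratically with the deviation, which is what lets it be added directly to a molecular mechanics energy.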
Affiliation(s)
- Guowei Qi
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Michail D Vrettas
- Department of Pharmacy, University of Naples Federico II, 80131 Naples, Italy
| | - Carmen Biancaniello
- Department of Pharmacy, University of Naples Federico II, 80131 Naples, Italy
| | - Maximo Sanz-Hernandez
- Department of Life Sciences, Imperial College London, South Kensington, London SW7 2AZ, U.K
| | - Conor T Cafolla
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - John W R Morgan
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Yifei Wang
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Alfonso De Simone
- Department of Pharmacy, University of Naples Federico II, 80131 Naples, Italy
| | - David J Wales
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| |
|
24
|
Fraga KJ, Huang YJ, Ramelot TA, Swapna GVT, Lashawn Anak Kendary A, Li E, Korf I, Montelione GT. SpecDB: A relational database for archiving biomolecular NMR spectral data. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2022; 342:107268. [PMID: 35930941 PMCID: PMC9922030 DOI: 10.1016/j.jmr.2022.107268] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/25/2022] [Revised: 06/16/2022] [Accepted: 07/06/2022] [Indexed: 05/11/2023]
Abstract
NMR is a valuable experimental tool in the structural biologist's toolkit to elucidate the structures, functions, and motions of biomolecules. The progress of machine learning, particularly in structural biology, reveals the critical importance of large, diverse, and reliable datasets in developing new methods and understanding in structural biology and science more broadly. Biomolecular NMR research groups produce large amounts of data, and there is renewed interest in organizing these data to train new, sophisticated machine learning architectures and to improve biomolecular NMR analysis pipelines. The foundational data type in NMR is the free-induction decay (FID). There are opportunities to build sophisticated machine learning methods to tackle long-standing problems in NMR data processing, resonance assignment, dynamics analysis, and structure determination using NMR FIDs. Our goal in this study is to provide a lightweight, broadly available tool for archiving FID data as it is generated at the spectrometer, and grow a new resource of FID data and associated metadata. This study presents a relational schema for storing and organizing the metadata items that describe an NMR sample and FID data, which we call Spectral Database (SpecDB). SpecDB is implemented in SQLite and includes a Python software library providing a command-line application to create, organize, query, backup, share, and maintain the database. This set of software tools and database schema allow users to store, organize, share, and learn from NMR time domain data. SpecDB is freely available under an open source license at https://github.rpi.edu/RPIBioinformatics/SpecDB.
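To make the idea of a relational schema for samples and FIDs concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and example rows are hypothetical and are not the actual SpecDB schema; they only illustrate how FID records can be linked to sample metadata and queried.

```python
import sqlite3

# In-memory database for illustration; a real archive would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Hypothetical two-table layout: one row per sample, one row per recorded FID.
conn.executescript("""
CREATE TABLE sample (
    id     INTEGER PRIMARY KEY,
    label  TEXT NOT NULL,
    buffer TEXT
);
CREATE TABLE fid (
    id         INTEGER PRIMARY KEY,
    sample_id  INTEGER NOT NULL REFERENCES sample(id),
    experiment TEXT NOT NULL,
    path       TEXT NOT NULL
);
""")

conn.execute(
    "INSERT INTO sample (id, label, buffer) VALUES (?, ?, ?)",
    (1, "demo-protein", "phosphate pH 6.5"),
)
conn.executemany(
    "INSERT INTO fid (sample_id, experiment, path) VALUES (?, ?, ?)",
    [(1, "1H-15N HSQC", "fids/hsqc/fid"), (1, "HNCA", "fids/hnca/fid")],
)

# Query: list every FID together with the label of the sample it was recorded on.
rows = conn.execute(
    "SELECT s.label, f.experiment FROM fid AS f "
    "JOIN sample AS s ON s.id = f.sample_id ORDER BY f.id"
).fetchall()
```

Keeping sample metadata in its own table means it is stored once and shared by every FID recorded on that sample, which is the usual motivation for a relational layout.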
Affiliation(s)
- Keith J Fraga
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
| | - Yuanpeng J Huang
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
| | - Theresa A Ramelot
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
| | - G V T Swapna
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA; Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers The State University of New Jersey, Piscataway, NJ 08854, USA.
| | | | - Ethan Li
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
| | - Ian Korf
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
| | - Gaetano T Montelione
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
| |
|
25
|
Regression Machine Learning Models Used to Predict DFT-Computed NMR Parameters of Zeolites. COMPUTATION 2022. [DOI: 10.3390/computation10050074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Machine learning approaches can drastically decrease the computational time needed to predict spectroscopic properties of materials while preserving the quality of the underlying computational approaches. We studied the performance of kernel ridge regression (KRR) and gradient boosting regressor (GBR) models trained on isotropic shielding values, computed with density functional theory (DFT), for a series of known zeolites containing out-of-frame metal cations or fluorine anions and organic structure-directing cations. Smooth overlap of atomic positions descriptors were computed from the DFT-optimised Cartesian coordinates of each atom in the zeolite crystal cells. Using these descriptors as inputs to both machine learning regression methods led to predictions of the DFT isotropic shielding values with mean errors within 0.6 ppm. The results showed that the GBR model scales better than the KRR model.
|
26
|
Yang W, Kim BS, Muniyappan S, Lee YH, Kim JH, Yu W. Aggregation-Prone Structural Ensembles of Transthyretin Collected With Regression Analysis for NMR Chemical Shift. Front Mol Biosci 2021; 8:766830. [PMID: 34746240 PMCID: PMC8568061 DOI: 10.3389/fmolb.2021.766830] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/30/2021] [Accepted: 10/05/2021] [Indexed: 11/26/2022] Open
Abstract
Monomer dissociation and subsequent misfolding of transthyretin (TTR) are among the most critical causative factors of TTR amyloidosis. TTR amyloidosis causes several human diseases, such as senile systemic amyloidosis and familial amyloid cardiomyopathy/polyneuropathy; therefore, it is important to understand the molecular details of the structural deformation and aggregation mechanisms of TTR. However, such molecular characteristics are still elusive because of the complicated structural heterogeneity of TTR and its high sensitivity to various environmental factors. Several nuclear magnetic resonance (NMR) spectroscopy and molecular dynamics (MD) studies of TTR variants have recently reported evidence of transient aggregation-prone structural states of TTR. According to these studies, the stability of the DAGH β-sheet, one of the two main β-sheets in TTR, is a crucial determinant of the TTR amyloidosis mechanism. In addition, its conformational perturbation and the possible involvement of nearby structural motifs facilitate TTR aggregation. This study proposes aggregation-prone structural ensembles of TTR obtained by MD simulation with enhanced sampling and a multiple linear regression approach. This method provides plausible structural models that are composed of ensemble structures consistent with NMR chemical shift data. This study validated the ensemble models with experimental data obtained from circular dichroism (CD) spectroscopy and NMR order parameter analysis. In addition, our results suggest that the structural deformation of the DAGH β-sheet and the AB loop regions may correlate with the manifestation of the aggregation-prone conformational states of TTR. In summary, our method employing MD techniques to extend the structural ensembles from NMR experimental data analysis may provide new opportunities to investigate various transient yet important structural states of amyloidogenic proteins.
Affiliation(s)
- Wonjin Yang
- Department of Brain and Cognitive Sciences, DGIST, Daegu, South Korea
| | - Beom Soo Kim
- Department of Brain and Cognitive Sciences, DGIST, Daegu, South Korea
| | | | - Young-Ho Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, South Korea; Department of Bio-analytical Science, University of Science and Technology, Daejeon, South Korea; Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, South Korea; Research Headquarters, Korea Brain Research Institute, Daegu, South Korea
| | - Jin Hae Kim
- Department of New Biology, DGIST, Daegu, South Korea
| | - Wookyung Yu
- Department of Brain and Cognitive Sciences, DGIST, Daegu, South Korea; Core Protein Resources Center, DGIST, Daegu, South Korea
| |
|
27
|
Guan X, Leven I, Heidar-Zadeh F, Head-Gordon T. Protein C-GeM: A Coarse-Grained Electron Model for Fast and Accurate Protein Electrostatics Prediction. J Chem Inf Model 2021; 61:4357-4369. [PMID: 34490776 DOI: 10.1021/acs.jcim.1c00388] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 02/08/2023]
Abstract
The electrostatic potential (ESP) is a powerful property for understanding and predicting electrostatic charge distributions that drive interactions between molecules. In this study, we compare various charge partitioning schemes including fitted charges, density-based quantum mechanical (QM) partitioning schemes, charge equilibration methods, and our recently introduced coarse-grained electron model, C-GeM, to describe the ESP for protein systems. When benchmarked against high quality density functional theory calculations of the ESP for tripeptides and the crambin protein, we find that the C-GeM model is of comparable accuracy to ab initio charge partitioning methods, but with orders of magnitude improvement in computational efficiency since it does not require either the electron density or the electrostatic potential as input.
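For readers unfamiliar with how an ESP is evaluated once charges have been assigned, the sketch below computes the potential of a set of point charges via Coulomb's law, V(r) = sum_i q_i / |r - r_i| in atomic units. This is a generic illustration and not the C-GeM model itself; the charges and positions are made-up example values.

```python
import math


def esp_from_point_charges(point, charges, positions):
    """Electrostatic potential at `point` from point charges (atomic units)."""
    total = 0.0
    for q, pos in zip(charges, positions):
        r = math.dist(point, pos)  # Euclidean distance (Python 3.8+)
        if r == 0.0:
            raise ValueError("potential is singular at a charge position")
        total += q / r
    return total


# Hypothetical two-charge system: +1 at the origin and -1 two bohr away along z.
charges = [1.0, -1.0]
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]

v_mid = esp_from_point_charges((0.0, 0.0, 1.0), charges, positions)   # midpoint
v_out = esp_from_point_charges((0.0, 0.0, -1.0), charges, positions)  # outside, near +1
```

A charge-partitioning scheme can then be judged by how closely a potential computed this way reproduces a reference quantum mechanical ESP on a grid of points around the molecule.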
Affiliation(s)
- Xingyi Guan
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Itai Leven
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Farnaz Heidar-Zadeh
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Department of Chemistry, Queen's University, Kingston, Ontario K7L 3N6, Canada
| | - Teresa Head-Gordon
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States; Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
| |
|
28
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| |
|
29
|
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interactions, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods that aim to predict transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs and in the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and how new methods are needed to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
| | - Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
| |
|
30
|
Feng JJ, Chen JN, Kang W, Wu YD. Accurate Structure Prediction for Protein Loops Based on Molecular Dynamics Simulations with RSFF2C. J Chem Theory Comput 2021; 17:4614-4628. [PMID: 34170125 DOI: 10.1021/acs.jctc.1c00341] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
Protein loops, connecting the α-helices and β-strands, are involved in many important biological processes. However, due to their conformational flexibility, it is still challenging to accurately determine three-dimensional (3D) structures of long loops experimentally and computationally. Herein, we present a systematic study of the protein loop structure prediction via a total of ∼850 μs molecular dynamics (MD) simulations. For a set of 15 long (10-16 residues) and solvent-exposed loops, we first evaluated the performance of four state-of-the-art loop modeling algorithms, DaReUS-Loop, Sphinx, Rosetta-NGK, and MODELLER, on each loop, and none of them could accurately predict the structures for most loops. Then, temperature replica exchange molecular dynamics (REMD) simulations were conducted with three recent force fields, RSFF2C with TIP3P water model, CHARMM36m with CHARMM-modified TIP3P, and AMBER ff19SB with OPC. We found that our recently developed residue-specific force field RSFF2C performed the best and successfully predicted 12 out of 15 loops with a root-mean-square deviation (RMSD) < 1.5 Å. As an alternative with lower computational cost, normal MD simulations at high temperatures (380, 500, and 620 K) were investigated. Temperature-dependent performance was observed for each force field, and, for RSFF2C+TIP3P, we found that three independent 100-ns MD simulations at 500 K gave comparable results with REMD simulations. These results suggest that MD simulations, especially with enhanced sampling techniques such as replica exchange, with the RSFF2C force field could be useful for accurate loop structure prediction.
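The exchange step at the heart of temperature REMD can be illustrated with the standard Metropolis criterion, in which a swap between replicas i and j is accepted with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), where beta = 1/(k_B T). The sketch below is a generic illustration and is not taken from the paper; the energies and temperatures are made-up values, and the units (kcal/mol, K) are an assumption.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0.0:
        return 1.0  # favourable swap; also avoids overflow in exp for large delta
    return math.exp(delta)


# Colder replica holds the higher energy: the swap is always accepted.
p_favourable = exchange_probability(-100.0, 300.0, -110.0, 310.0)

# Colder replica holds the lower energy: the swap is accepted only sometimes.
p_unfavourable = exchange_probability(-110.0, 300.0, -100.0, 310.0)
```

Because the acceptance probability decays with the energy gap and the temperature spacing, ladders are usually chosen so that neighbouring replicas have overlapping energy distributions.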
Affiliation(s)
- Jia-Jie Feng
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
| | - Jia-Nan Chen
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
| | - Wei Kang
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen 518132, China
| | - Yun-Dong Wu
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China; College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China; Shenzhen Bay Laboratory, Shenzhen 518132, China
| |
|
31
|
Ito K, Xu X, Kikuchi J. Improved Prediction of Carbonless NMR Spectra by the Machine Learning of Theoretical and Fragment Descriptors for Environmental Mixture Analysis. Anal Chem 2021; 93:6901-6906. [PMID: 33929838 DOI: 10.1021/acs.analchem.1c00756] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
As the first multidimensional NMR approach, 2D J-resolved (2DJ) spectroscopy is distinguished by signal resolution and detection sensitivity with remarkable advantages for the exhaustive evaluation of complex mixtures and environmental samples due to its carbonless feature without the requirement of 13C connectivity. Generally, the 2DJ signal assignment of metabolic mixtures is problematic in spite of references to experimental NMR databases, owing to the existence of metabolic "dark matter." In this study, a new method to predict 2DJ spectra was developed with a combination of quantum mechanical (QM) computation and machine learning (ML). The predictive accuracy of J-coupling constants was evaluated using validated data. The root-mean-square deviation (RMSD) for QM computation was 3.52 Hz, while the RMSD for QM + ML was 1.21 Hz, indicating a substantial increase in predictive accuracy. The proposed model was applied to predict the 2DJ spectra of 60 standard substances and 55 components of seawater. Furthermore, two practical environmental samples were used to evaluate the robustness of the constructed predictive model. A J-coupling tree and J-split spectra produced from QM + ML of aliphatic moieties had good consistency with the experimental data, as compared with the theoretical data produced by QM computation. The predicted J-coupling tree for the J-coupling multiplet analysis of freely rotating bonds in the complex mixture, which is traditionally difficult, was interpretable. In addition, in silico identification of the J-split 1H NMR signals, which was independent of experimental databases, aided in the discovery of new components in a mixture.
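As a small worked illustration of the accuracy metric quoted in this abstract, the sketch below defines the RMSD between predicted and reference values and computes the relative reduction implied by the two reported figures (3.52 Hz for QM, 1.21 Hz for QM + ML). The function names are hypothetical, and only the two RMSD values come from the abstract.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equally long sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


def relative_reduction(before, after):
    """Fractional decrease of an error metric, e.g. 0.25 means 25% lower."""
    return (before - after) / before


# RMSD values reported in the abstract, in Hz.
reduction = relative_reduction(3.52, 1.21)  # roughly two thirds lower
```

With the reported values, the QM + ML model lowers the J-coupling RMSD by about 66% relative to QM alone.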
Affiliation(s)
- Kengo Ito
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Xiangru Xu
- Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Jun Kikuchi
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
| |
|
32
|
Chen MS, Zuehlsdorff TJ, Morawietz T, Isborn CM, Markland TE. Exploiting Machine Learning to Efficiently Predict Multidimensional Optical Spectra in Complex Environments. J Phys Chem Lett 2020; 11:7559-7568. [PMID: 32808797 DOI: 10.1021/acs.jpclett.0c02168] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 05/26/2023]
Abstract
The excited-state dynamics of chromophores in complex environments determine a range of vital biological and energy capture processes. Time-resolved, multidimensional optical spectroscopies provide a key tool to investigate these processes. Although theory has the potential to decode these spectra in terms of the electronic and atomistic dynamics, the need for large numbers of excited-state electronic structure calculations severely limits first-principles predictions of multidimensional optical spectra for chromophores in the condensed phase. Here, we leverage the locality of chromophore excitations to develop machine learning models to predict the excited-state energy gap of chromophores in complex environments for efficiently constructing linear and multidimensional optical spectra. By analyzing the performance of these models, which span a hierarchy of physical approximations, across a range of chromophore-environment interaction strengths, we provide strategies for the construction of machine learning models that greatly accelerate the calculation of multidimensional optical spectra from first principles.
Affiliation(s)
- Michael S Chen
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | - Tim J Zuehlsdorff
- Chemistry and Chemical Biology, University of California Merced, Merced, California 95343, United States
| | - Tobias Morawietz
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | - Christine M Isborn
- Chemistry and Chemical Biology, University of California Merced, Merced, California 95343, United States
| | - Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| |
|
33
|
Lincoff J, Haghighatlari M, Krzeminski M, Teixeira JMC, Gomes GNW, Gradinaru CC, Forman-Kay JD, Head-Gordon T. Extended Experimental Inferential Structure Determination Method in Determining the Structural Ensembles of Disordered Protein States. Commun Chem 2020; 3:74. [PMID: 32775701 PMCID: PMC7409953 DOI: 10.1038/s42004-020-0323-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 01/07/2020] [Accepted: 04/22/2020] [Indexed: 01/12/2023] Open
Abstract
Proteins with intrinsic or unfolded state disorder comprise a new frontier in structural biology, requiring the characterization of diverse and dynamic structural ensembles. We introduce a comprehensive Bayesian framework, the Extended Experimental Inferential Structure Determination (X-EISD) method, that calculates the maximum log-likelihood of a disordered protein ensemble. X-EISD accounts for the uncertainties of a range of experimental data and back-calculation models from structures, including NMR chemical shifts, J-couplings, nuclear Overhauser effects (NOEs), paramagnetic relaxation enhancements (PREs), residual dipolar couplings (RDCs), hydrodynamic radii (Rh), single-molecule fluorescence Förster resonance energy transfer (smFRET) and small-angle X-ray scattering (SAXS). We apply X-EISD to the joint optimization against experimental data for the unfolded drkN SH3 domain and find that pairing a local data type, such as chemical shifts or J-couplings, with long-range restraints such as NOEs, PREs or smFRET yields structural ensembles in good agreement with all other data types if combined with representative IDP conformers.
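The general idea of scoring an ensemble against data while accounting for both experimental and back-calculation uncertainty can be sketched with a plain Gaussian log-likelihood. This is a deliberately simplified illustration and not the X-EISD formulation itself: it assumes independent Gaussian errors whose variances add, and all function names and numbers below are hypothetical.

```python
import math


def ensemble_average(per_conformer_values):
    """Average each back-calculated observable over all conformers."""
    n = len(per_conformer_values)
    return [sum(column) / n for column in zip(*per_conformer_values)]


def gaussian_log_likelihood(back_calc, observed, sigma_exp, sigma_model):
    """Log-likelihood with experimental and back-calculation errors combined."""
    var = sigma_exp ** 2 + sigma_model ** 2  # independent errors: variances add
    total = 0.0
    for b, o in zip(back_calc, observed):
        total += -0.5 * math.log(2.0 * math.pi * var) - (b - o) ** 2 / (2.0 * var)
    return total


# Hypothetical ensemble of two conformers with two back-calculated observables each.
averaged = ensemble_average([[1.0, 2.0], [3.0, 4.0]])
ll_match = gaussian_log_likelihood(averaged, [2.0, 3.0], sigma_exp=0.3, sigma_model=0.4)
ll_off = gaussian_log_likelihood(averaged, [2.5, 3.5], sigma_exp=0.3, sigma_model=0.4)
```

An ensemble whose averaged observables sit closer to the measurements receives a higher log-likelihood, and a larger combined uncertainty makes the score less sensitive to a given deviation.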
Affiliation(s)
- James Lincoff
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720 USA
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720 USA
- Present Address: Cardiovascular Research Institute, University of California, San Francisco, CA 94158 USA
| | - Mojtaba Haghighatlari
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720 USA
- Department of Chemistry, University of California, Berkeley, CA 94720 USA
| | - Mickael Krzeminski
- Molecular Structure and Function Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4 Canada
| | - João M. C. Teixeira
- Molecular Structure and Function Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4 Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8 Canada
| | - Gregory-Neal W. Gomes
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6 Canada
| | - Claudiu C. Gradinaru
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6 Canada
| | - Julie D. Forman-Kay
- Molecular Structure and Function Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4 Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8 Canada
| | - Teresa Head-Gordon
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720 USA
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720 USA
- Department of Chemistry, University of California, Berkeley, CA 94720 USA
- Department of Bioengineering, University of California, Berkeley, CA 94720 USA
| |
|