1
Rasulov U, Wang HK, Viennet T, Droemer MA, Matosin S, Schindler S, Sun ZYJ, Mureddu L, Vuister GW, Robson SA, Arthanari H, Kuprov I. Protein NMR assignment by isotope pattern recognition. SCIENCE ADVANCES 2024; 10:eado0403. [PMID: 39231223 PMCID: PMC11373586 DOI: 10.1126/sciadv.ado0403] [Received: 01/13/2024] [Accepted: 07/29/2024] [Indexed: 09/06/2024]
Abstract
The current standard method for amino acid signal identification in protein NMR spectra is sequential assignment using triple-resonance experiments. Good software and elaborate heuristics exist, but the process remains laboriously manual. Machine learning does help, but its training databases need millions of samples that cover all relevant physics and every kind of instrumental artifact. In this communication, we offer a solution to this problem. We propose polyadic decompositions to store millions of simulated three-dimensional NMR spectra, on-the-fly generation of artifacts during training, a probabilistic way to incorporate prior and posterior information, and integration with the industry standard CcpNmr software framework. The resulting neural nets take [1H,13C] slices of mixed pyruvate-labeled HNCA spectra (different CA signal shapes for different residue types) and return an amino acid probability table. In combination with primary sequence information, backbones of common proteins (GB1, MBP, and INMT) are rapidly assigned from just the HNCA spectrum.
Affiliation(s)
- Uluk Rasulov
  - School of Chemistry, University of Southampton, University Road, Southampton SO17 1BJ, UK
- Harrison K Wang
  - Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
  - Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
- Thibault Viennet
  - Department of Chemistry and iNANO, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Maxim A Droemer
  - Faculty for Chemistry and Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany
- Srđan Matosin
  - Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
  - Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
- Sebastian Schindler
  - Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
  - Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
- Zhen-Yu J Sun
  - Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
  - Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
- Luca Mureddu
  - Department of Molecular and Cell Biology, Institute for Structural and Chemical Biology, University of Leicester, Lancaster Road, Leicester LE1 7HB, UK
- Geerten W Vuister
  - Department of Molecular and Cell Biology, Institute for Structural and Chemical Biology, University of Leicester, Lancaster Road, Leicester LE1 7HB, UK
- Scott A Robson
  - Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
  - Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
- Haribabu Arthanari
  - Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
  - Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
- Ilya Kuprov
  - School of Chemistry, University of Southampton, University Road, Southampton SO17 1BJ, UK
2
Agarwal V, McShan AC. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat Chem Biol 2024; 20:950-959. [PMID: 38907110 DOI: 10.1038/s41589-024-01638-w] [Received: 05/22/2023] [Accepted: 04/29/2024] [Indexed: 06/23/2024]
Abstract
Artificial intelligence-driven advances in protein structure prediction in recent years have raised the question: has the protein structure-prediction problem been solved? Here, with a focus on nonglobular proteins, we highlight the many strengths and potential weaknesses of DeepMind's AlphaFold2 in the context of its biological and therapeutic applications. We summarize the subtleties associated with evaluation of AlphaFold2 model quality and reliability using the predicted local distance difference test (pLDDT) and predicted aligned error (PAE) values. We highlight various classes of proteins that AlphaFold2 can be applied to and the caveats involved. Concrete examples of how AlphaFold2 models can be integrated with experimental data in the form of small-angle X-ray scattering (SAXS), solution NMR, cryo-electron microscopy (cryo-EM) and X-ray diffraction are discussed. Finally, we highlight the need to move beyond structure prediction of rigid, static structural snapshots toward conformational ensembles and alternate biologically relevant states. The overarching theme is that careful consideration is due when using AlphaFold2-generated models to generate testable hypotheses and structural models, rather than treating predicted models as de facto ground truth structures.
Affiliation(s)
- Vinayak Agarwal
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
3
Wu F, Huang Y, Yang G, Ye S, Mukamel S, Jiang J. Unraveling dynamic protein structures by two-dimensional infrared spectra with a pretrained machine learning model. Proc Natl Acad Sci U S A 2024; 121:e2409257121. [PMID: 38917009 PMCID: PMC11228460 DOI: 10.1073/pnas.2409257121] [Received: 05/09/2024] [Accepted: 05/28/2024] [Indexed: 06/27/2024]
Abstract
Dynamic protein structures are crucial for deciphering their diverse biological functions. Two-dimensional infrared (2DIR) spectroscopy stands as an ideal tool for tracing rapid conformational evolutions in proteins. However, linking spectral characteristics to dynamic structures poses a formidable challenge. Here, we present a pretrained machine learning model based on 2DIR spectra analysis. This model has learned signal features from approximately 204,300 spectra to establish a "spectrum-structure" correlation, thereby tracing the dynamic conformations of proteins. It excels in accurately predicting the dynamic content changes of various secondary structures and demonstrates universal transferability on real folding trajectories spanning timescales from microseconds to milliseconds. Beyond exceptional predictive performance, the model offers attention-based spectral explanations of dynamic conformational changes. Our 2DIR-based pretrained model is anticipated to provide unique insights into the dynamic structural information of proteins in their native environments.
Affiliation(s)
- Fan Wu
  - Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Yan Huang
  - Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Guokun Yang
  - Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Sheng Ye
  - Anhui Provincial Engineering Research Center for Unmanned System and Intelligent Technology, School of Artificial Intelligence, Anhui University, Hefei 230601, Anhui, China
- Shaul Mukamel
  - Department of Chemistry and of Physics & Astronomy, University of California, Irvine, CA 92697
- Jun Jiang
  - Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
4
Klukowski P, Damberger FF, Allain FHT, Iwai H, Kadavath H, Ramelot TA, Montelione GT, Riek R, Güntert P. The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis. Sci Data 2024; 11:30. [PMID: 38177162 PMCID: PMC10767026 DOI: 10.1038/s41597-023-02879-5] [Received: 10/31/2023] [Accepted: 12/22/2023] [Indexed: 01/06/2024]
Abstract
Multidimensional NMR spectra are the basis for studying proteins by NMR spectroscopy and crucial for the development and evaluation of methods for biomolecular NMR data analysis. Nevertheless, in contrast to derived data such as chemical shift assignments in the BMRB and protein structures in the PDB databases, this primary data is in general not publicly archived. To change this unsatisfactory situation, we present a standardized set of solution NMR data comprising 1329 2-4-dimensional NMR spectra and associated reference (chemical shift assignments, structures) and derived (peak lists, restraints for structure calculation, etc.) annotations. With the 100-protein NMR spectra dataset that was originally compiled for the development of the ARTINA deep learning-based spectra analysis method, 100 protein structures can be reproduced from their original experimental data. The 100-protein NMR spectra dataset is expected to help the development of computational methods for NMR spectroscopy, in particular machine learning approaches, and enable consistent and objective comparisons of these methods.
Affiliation(s)
- Piotr Klukowski
  - Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Fred F Damberger
  - Institute of Biochemistry, ETH Zurich, 8093, Zurich, Switzerland
- Hideo Iwai
  - Institute of Biotechnology, University of Helsinki, 00100, Helsinki, Finland
- Theresa A Ramelot
  - Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Gaetano T Montelione
  - Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Roland Riek
  - Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Peter Güntert
  - Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
  - Institute of Biophysical Chemistry, Goethe University, 60438, Frankfurt am Main, Germany.
  - Department of Chemistry, Tokyo Metropolitan University, Hachioji, 192-0397, Tokyo, Japan.