1
Agarwal V, McShan AC. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat Chem Biol 2024; 20:950-959. [PMID: 38907110 DOI: 10.1038/s41589-024-01638-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 04/29/2024] [Indexed: 06/23/2024]
Abstract
Artificial intelligence-driven advances in protein structure prediction in recent years have raised the question: has the protein structure-prediction problem been solved? Here, with a focus on nonglobular proteins, we highlight the many strengths and potential weaknesses of DeepMind's AlphaFold2 in the context of its biological and therapeutic applications. We summarize the subtleties associated with evaluation of AlphaFold2 model quality and reliability using the predicted local distance difference test (pLDDT) and predicted aligned error (PAE) values. We highlight various classes of proteins that AlphaFold2 can be applied to and the caveats involved. Concrete examples of how AlphaFold2 models can be integrated with experimental data in the form of small-angle X-ray scattering (SAXS), solution NMR, cryo-electron microscopy (cryo-EM) and X-ray diffraction are discussed. Finally, we highlight the need to move beyond structure prediction of rigid, static structural snapshots toward conformational ensembles and alternate biologically relevant states. The overarching theme is that careful consideration is due when using AlphaFold2-generated models to generate testable hypotheses and structural models, rather than treating predicted models as de facto ground truth structures.
Affiliation(s)
- Vinayak Agarwal
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
2
Bonin JP, Aramini JM, Dong Y, Wu H, Kay LE. AlphaFold2 as a replacement for solution NMR structure determination of small proteins: Not so fast! J Magn Reson 2024; 364:107725. [PMID: 38917639 DOI: 10.1016/j.jmr.2024.107725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 06/18/2024] [Accepted: 06/19/2024] [Indexed: 06/27/2024]
Abstract
The determination of a protein's structure is often a first step towards the development of a mechanistic understanding of its function. Considerable advances in computational protein structure prediction have been made in recent years, with AlphaFold2 (AF2) emerging as the primary tool used by researchers for this purpose. While AF2 generally predicts accurate structures of folded proteins, we present here a case where AF2 incorrectly predicts the structure of a small, folded and compact protein with high confidence. This protein, pro-interleukin-18 (pro-IL-18), is the precursor of the cytokine IL-18. Interestingly, the structure of pro-IL-18 predicted by AF2 matches that of the mature cytokine, and not the corresponding experimentally determined structure of the pro-form of the protein. Thus, while computational structure prediction holds immense promise for addressing problems in protein biophysics, there is still a need for experimental structure determination, even in the context of small well-folded, globular proteins.
Affiliation(s)
- Jeffrey P Bonin
  - Departments of Molecular Genetics and Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada; Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, Ontario M5G 0A4, Canada
- James M Aramini
  - Departments of Molecular Genetics and Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada; Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, Ontario M5G 0A4, Canada
- Ying Dong
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA; Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA, USA
- Hao Wu
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA; Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA, USA
- Lewis E Kay
  - Departments of Molecular Genetics and Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada; Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, Ontario M5G 0A4, Canada.
3
Jiao Z, He Y, Fu X, Zhang X, Geng Z, Ding W. A predicted model-aided reconstruction algorithm for X-ray free-electron laser single-particle imaging. IUCrJ 2024; 11:602-619. [PMID: 38904548 PMCID: PMC11220885 DOI: 10.1107/s2052252524004858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Ultra-intense, ultra-fast X-ray free-electron lasers (XFELs) enable the imaging of single protein molecules under ambient temperature and pressure. A crucial aspect of structure reconstruction involves determining the relative orientations of each diffraction pattern and recovering the missing phase information. In this paper, we introduce a predicted model-aided algorithm for orientation determination and phase retrieval, which has been tested on various simulated datasets and has shown significant improvements in the success rate, accuracy and efficiency of XFEL data reconstruction.
Affiliation(s)
- Zhichao Jiao
  - Laboratory of Soft Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, People’s Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
- Yao He
  - Research Instrument Scientist, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Xingke Fu
  - Laboratory of Soft Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, People’s Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
- Xin Zhang
  - The University of Hong Kong, Hong Kong SAR, People’s Republic of China
- Zhi Geng
  - Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
- Wei Ding
  - Laboratory of Soft Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, People’s Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
4
Huang YJ, Montelione GT. Hidden Structural States of Proteins Revealed by Conformer Selection with AlphaFold-NMR. bioRxiv 2024:2024.06.26.600902. [PMID: 38979209 PMCID: PMC11230435 DOI: 10.1101/2024.06.26.600902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Recent advances in molecular modeling using deep learning can revolutionize our understanding of dynamic protein structures. NMR is particularly well-suited for determining dynamic features of biomolecular structures. The conventional process for determining biomolecular structures from experimental NMR data involves its representation as conformation-dependent restraints, followed by generation of structural models guided by these spatial restraints. Here we describe an alternative approach: generating a distribution of realistic protein conformational models using artificial intelligence (AI)-based methods and then selecting the sets of conformers that best explain the experimental data. We applied this conformational selection approach to redetermine the solution NMR structure of the enzyme Gaussia luciferase (Gluc). First, we generated a diverse set of conformer models using AlphaFold2 (AF2) with an enhanced sampling protocol. The models that best fit NOESY and chemical shift data were then selected with a Bayesian scoring metric. The resulting models include features of both the published NMR structure and the standard AF2 model generated without enhanced sampling. This "AlphaFold-NMR" protocol also generated an alternative "open" conformational state that fits nearly as well to the overall NMR data but accounts for some NOESY data that is not consistent with the first "closed" conformational state; while other NOESY data consistent with this second state are not consistent with the first conformational state. The structure of this "open" structural state differs from that of the "closed" state primarily by the position of a thumb-shaped loop between α-helices H5 and H6, revealing a cryptic surface pocket. These alternative conformational states of Gluc are supported by "double recall" analysis of NOESY data and AF2 models. Additional structural states are also indicated by backbone chemical shift data indicating partially disordered conformations for the C-terminal segment. Considered as a multistate ensemble, these multiple states of Gluc together fit the NOESY and chemical shift data better than the "restraint-based" NMR structure and provide novel insights into its structure-dynamic-function relationships. This study demonstrates the potential of AI-based modeling with enhanced sampling to generate conformational ensembles followed by conformer selection with experimental data as an alternative to conventional restraint satisfaction protocols for protein NMR structure determination.
Affiliation(s)
- Yuanpeng J. Huang
  - Dept of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, 12180 USA
- Gaetano T. Montelione
  - Dept of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, 12180 USA
5
Mubeen H, Masood A, Zafar A, Khan ZQ, Khan MQ, Nisa AU. Insights into AlphaFold's breakthrough in neurodegenerative diseases. Ir J Med Sci 2024:10.1007/s11845-024-03721-6. [PMID: 38833116 DOI: 10.1007/s11845-024-03721-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 05/19/2024] [Indexed: 06/06/2024]
Abstract
Neurodegenerative diseases (NDs) are disorders of the central nervous system (CNS) characterized by impaired neuronal function and eventual neuronal loss, leading to memory loss and difficulty with learning, language, and movement. The most common NDs are Alzheimer's disease (AD) and Parkinson's disease (PD); others include frontotemporal dementia (FTD), amyotrophic lateral sclerosis (ALS), and Huntington's disease (HD). The major pathological hallmark of NDs is proteinopathy, whether of amyloid-β (Aβ), tauopathies, or synucleinopathies. Aggregation of proteins that fail to adopt their normal conformation, whether because of mutations or a disturbance in a cellular pathway, contributes to these diseases. Artificial intelligence (AI) and deep learning (DL) have proven successful in the diagnosis and treatment of various congenital diseases. DL approaches such as AlphaFold (AF) are a major leap forward for CNS disorders. This 3D protein geometry modeling algorithm, developed by DeepMind, has the potential to revolutionize biology. AF has the potential to predict 3D protein conformations at an accuracy comparable to that of experimentally determined structures, with the additional advantage of precisely estimating protein interactions. This breakthrough should help identify disease progression and the disturbances of signaling pathways that impair protein function. Although AlphaFold has solved a major problem in structural biology, it cannot predict membrane proteins, a capability that would benefit drug design.
Affiliation(s)
- Hira Mubeen
  - Department of Biotechnology, Faculty of Science & Technology, University of Central Punjab, Lahore, Pakistan.
- Ammara Masood
  - Department of Biotechnology, Faculty of Science & Technology, University of Central Punjab, Lahore, Pakistan
- Asma Zafar
  - Department of Biotechnology, Faculty of Science & Technology, University of Central Punjab, Lahore, Pakistan
- Zohaira Qayyum Khan
  - Department of Biotechnology, Faculty of Science & Technology, University of Central Punjab, Lahore, Pakistan
- Muneeza Qayyum Khan
  - Department of Biotechnology, Faculty of Science & Technology, University of Central Punjab, Lahore, Pakistan
- Alim Un Nisa
  - Pakistan Council of Scientific and Industrial Research, Lahore, Pakistan
6
Caparotta M, Perez A. Advancing Molecular Dynamics: Toward Standardization, Integration, and Data Accessibility in Structural Biology. J Phys Chem B 2024; 128:2219-2227. [PMID: 38418288 DOI: 10.1021/acs.jpcb.3c04823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2024]
Abstract
Molecular dynamics (MD) simulations have become a valuable tool in structural biology, offering insights into complex biological systems that are difficult to obtain through experimental techniques alone. The lack of available data sets and structures in most published computational work has limited other researchers' use of these models. In recent years, the emergence of online sharing platforms and MD database initiatives favor the deposition of ensembles and structures to accompany publications, favoring reuse of the data sets. However, the lack of uniform metadata collection, formats, and what data are deposited limits the impact and its use by different communities that are not necessarily experts in MD. This Perspective highlights the need for standardization and better resource sharing for processing and interpreting MD simulation results, akin to efforts in other areas of structural biology. As the field moves forward, we will see an increase in popularity and benefits of MD-based integrative approaches combining experimental data and simulations through probabilistic reasoning, but these too are limited by uniformity in experimental data availability and choices on how the data are modeled that are not trivial to decipher from papers. Other fields have addressed similar challenges comprehensively by establishing task forces with different degrees of success. The large scope and number of communities to represent the breadth of types of MD simulations complicates a parallel approach that would fit all. Thus, each group typically decides what data and which format to upload on servers like Zenodo. Uploading data with FAIR (findable, accessible, interoperable, reusable) principles in mind including optimal metadata collection will make the data more accessible and actionable by the community. Such a wealth of simulation data will foster method development and infrastructure advancements, thus propelling the field forward.
Affiliation(s)
- Marcelo Caparotta
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
7
Klukowski P, Damberger FF, Allain FHT, Iwai H, Kadavath H, Ramelot TA, Montelione GT, Riek R, Güntert P. The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis. Sci Data 2024; 11:30. [PMID: 38177162 PMCID: PMC10767026 DOI: 10.1038/s41597-023-02879-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/22/2023] [Indexed: 01/06/2024]
Abstract
Multidimensional NMR spectra are the basis for studying proteins by NMR spectroscopy and crucial for the development and evaluation of methods for biomolecular NMR data analysis. Nevertheless, in contrast to derived data such as chemical shift assignments in the BMRB and protein structures in the PDB databases, this primary data is in general not publicly archived. To change this unsatisfactory situation, we present a standardized set of solution NMR data comprising 1329 2-4-dimensional NMR spectra and associated reference (chemical shift assignments, structures) and derived (peak lists, restraints for structure calculation, etc.) annotations. With the 100-protein NMR spectra dataset that was originally compiled for the development of the ARTINA deep learning-based spectra analysis method, 100 protein structures can be reproduced from their original experimental data. The 100-protein NMR spectra dataset is expected to help the development of computational methods for NMR spectroscopy, in particular machine learning approaches, and enable consistent and objective comparisons of these methods.
Affiliation(s)
- Piotr Klukowski
  - Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Fred F Damberger
  - Institute of Biochemistry, ETH Zurich, 8093, Zurich, Switzerland
- Hideo Iwai
  - Institute of Biotechnology, University of Helsinki, 00100, Helsinki, Finland
- Theresa A Ramelot
  - Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Gaetano T Montelione
  - Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Roland Riek
  - Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Peter Güntert
  - Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
  - Institute of Biophysical Chemistry, Goethe University, 60438, Frankfurt am Main, Germany.
  - Department of Chemistry, Tokyo Metropolitan University, Hachioji, 192-0397, Tokyo, Japan.
8
Abstract
Computational prediction of protein structure has been pursued intensely for decades, motivated largely by the goal of using structural models for drug discovery. Recently developed machine-learning methods such as AlphaFold 2 (AF2) have dramatically improved protein structure prediction, with reported accuracy approaching that of experimentally determined structures. To what extent do these advances translate to an ability to predict more accurately how drugs and drug candidates bind to their target proteins? Here, we carefully examine the utility of AF2 protein structure models for predicting binding poses of drug-like molecules at the largest class of drug targets, the G-protein-coupled receptors. We find that AF2 models capture binding pocket structures much more accurately than traditional homology models, with errors nearly as small as differences between structures of the same protein determined experimentally with different ligands bound. Strikingly, however, the accuracy of ligand-binding poses predicted by computational docking to AF2 models is not significantly higher than when docking to traditional homology models and is much lower than when docking to structures determined experimentally without these ligands bound. These results have important implications for all those who might use predicted protein structures for drug discovery.
Affiliation(s)
- Masha Karelina
  - Biophysics Program, Stanford University, Stanford, United States
  - Department of Computer Science, Stanford University, Stanford, United States
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, United States
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, United States
- Joseph J Noh
  - Department of Computer Science, Stanford University, Stanford, United States
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, United States
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, United States
- Ron O Dror
  - Biophysics Program, Stanford University, Stanford, United States
  - Department of Computer Science, Stanford University, Stanford, United States
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, United States
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, United States
9
Simoens L, Fijalkowski I, Van Damme P. Exposing the small protein load of bacterial life. FEMS Microbiol Rev 2023; 47:fuad063. [PMID: 38012116 PMCID: PMC10723866 DOI: 10.1093/femsre/fuad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 11/10/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023]
Abstract
The ever-growing repertoire of genomic techniques continues to expand our understanding of the true diversity and richness of prokaryotic genomes. Riboproteogenomics laid the foundation for dynamic studies of previously overlooked genomic elements. Most strikingly, bacterial genomes were revealed to harbor robust repertoires of small open reading frames (sORFs) encoding a diverse and broadly expressed range of small proteins, or sORF-encoded polypeptides (SEPs). In recent years, continuous efforts led to great improvements in the annotation and characterization of such proteins, yet many challenges remain to fully comprehend the pervasive nature of small proteins and their impact on bacterial biology. In this work, we review the recent developments in the dynamic field of bacterial genome reannotation, catalog the important biological roles carried out by small proteins and identify challenges obstructing the way to full understanding of these elusive proteins.
Affiliation(s)
- Laure Simoens
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Igor Fijalkowski
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Petra Van Damme
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
10
Li EH, Spaman LE, Tejero R, Janet Huang Y, Ramelot TA, Fraga KJ, Prestegard JH, Kennedy MA, Montelione GT. Blind assessment of monomeric AlphaFold2 protein structure models with experimental NMR data. J Magn Reson 2023; 352:107481. [PMID: 37257257 PMCID: PMC10659763 DOI: 10.1016/j.jmr.2023.107481] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/22/2023] [Revised: 05/08/2023] [Accepted: 05/15/2023] [Indexed: 06/02/2023]
Abstract
Recent advances in molecular modeling of protein structures are changing the field of structural biology. AlphaFold-2 (AF2), an AI system developed by DeepMind, Inc., utilizes attention-based deep learning to predict models of protein structures with high accuracy relative to structures determined by X-ray crystallography and cryo-electron microscopy (cryoEM). Comparing AF2 models to structures determined using solution NMR data, both high similarities and distinct differences have been observed. Since AF2 was trained on X-ray crystal and cryoEM structures, we assessed how accurately AF2 can model small, monomeric, solution protein NMR structures which (i) were not used in the AF2 training data set, and (ii) did not have homologous structures in the Protein Data Bank at the time of AF2 training. We identified nine open-source protein NMR data sets for such "blind" targets, including chemical shift, raw NMR FID data, NOESY peak lists, and (for 1 case) 15N-1H residual dipolar coupling data. For these nine small (70-108 residues) monomeric proteins, we generated AF2 prediction models and assessed how well these models fit to these experimental NMR data, using several well-established NMR structure validation tools. In most of these cases, the AF2 models fit the NMR data nearly as well, or sometimes better than, the corresponding NMR structure models previously deposited in the Protein Data Bank. These results provide benchmark NMR data for assessing new NMR data analysis and protein structure prediction methods. They also document the potential for using AF2 as a guiding tool in protein NMR data analysis, and more generally for hypothesis generation in structural biology research.
Affiliation(s)
- Ethan H Li
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Laura E Spaman
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- Roberto Tejero
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- Yuanpeng Janet Huang
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- Theresa A Ramelot
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- Keith J Fraga
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- James H Prestegard
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA.
- Michael A Kennedy
  - Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA.
- Gaetano T Montelione
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
11
Sedinkin SL, Burns D, Shukla D, Potoyan DA, Venditti V. Solution Structure Ensembles of the Open and Closed Forms of the ∼130 kDa Enzyme I via AlphaFold Modeling, Coarse Grained Simulations, and NMR. J Am Chem Soc 2023; 145:13347-13356. [PMID: 37278728 PMCID: PMC10772991 DOI: 10.1021/jacs.3c03425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Large-scale interdomain rearrangements are essential to protein function, governing the activity of large enzymes and molecular machineries. Yet, obtaining an atomic-resolution understanding of how the relative domain positioning is affected by external stimuli is a hard task in modern structural biology. Here, we show that combining structural modeling by AlphaFold2 with coarse-grained molecular dynamics simulations and NMR residual dipolar coupling data is sufficient to characterize the spatial domain organization of bacterial enzyme I (EI), a ∼130 kDa multidomain oligomeric protein that undergoes large-scale conformational changes during its catalytic cycle. In particular, we solve conformational ensembles for EI at two different experimental temperatures and demonstrate that a lower temperature favors sampling of the catalytically competent closed state of the enzyme. These results suggest a role for conformational entropy in the activation of EI and demonstrate the ability of our protocol to detect and characterize the effect of external stimuli (such as mutations, ligand binding, and post-translational modifications) on the interdomain organization of multidomain proteins. We expect the ensemble refinement protocol described here to be easily transferrable to the investigation of the structure and dynamics of other uncharted multidomain systems and have assembled a Google Colab page (https://potoyangroup.github.io/Seq2Ensemble/) to facilitate implementation of the presented methodology elsewhere.
Affiliation(s)
- Daniel Burns
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, USA
- Divyanshu Shukla
  - Department of Chemistry, Iowa State University, Ames, Iowa 50011, USA
- Davit A. Potoyan
  - Department of Chemistry, Iowa State University, Ames, Iowa 50011, USA
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, USA
- Vincenzo Venditti
  - Department of Chemistry, Iowa State University, Ames, Iowa 50011, USA
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, USA
12
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, allosteric, and orthosteric drug discovery: Ways forward. Drug Discov Today 2023; 28:103551. [PMID: 36907321 PMCID: PMC10238671 DOI: 10.1016/j.drudis.2023.103551] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 12/28/2022] [Revised: 02/27/2023] [Accepted: 03/07/2023] [Indexed: 03/13/2023]
Abstract
Drug discovery is arguably a highly challenging and significant interdisciplinary aim. The stunning success of the artificial intelligence-powered AlphaFold, whose latest version is buttressed by an innovative machine-learning approach that integrates physical and biological knowledge about protein structures, raised drug discovery hopes that unsurprisingly, have not come to bear. Even though accurate, the models are rigid, including the drug pockets. AlphaFold's mixed performance poses the question of how its power can be harnessed in drug discovery. Here we discuss possible ways of going forward wielding its strengths, while bearing in mind what AlphaFold can and cannot do. For kinases and receptors, an input enriched in active (ON) state models can better AlphaFold's chance of rational drug design success.
Collapse
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.
| | - Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| |
|
13
|
Koehler Leman J, Künze G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int J Mol Sci 2023; 24:ijms24097835. [PMID: 37175539 PMCID: PMC10178863 DOI: 10.3390/ijms24097835] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/16/2023] [Revised: 04/15/2023] [Accepted: 04/21/2023] [Indexed: 05/15/2023] Open
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is a powerful method for studying the structure and dynamics of proteins in their native state. For high-resolution NMR structure determination, the collection of a rich restraint dataset is necessary. This can be difficult to achieve for proteins with high molecular weight or a complex architecture. Computational modeling techniques can complement sparse NMR datasets (<1 restraint per residue) with additional structural information to elucidate protein structures in these difficult cases. The Rosetta software for protein structure modeling and design is used by structural biologists for structure determination tasks in which limited experimental data is available. This review gives an overview of the computational protocols available in the Rosetta framework for modeling protein structures from NMR data. We explain the computational algorithms used for the integration of different NMR data types in Rosetta. We also highlight new developments, including modeling tools for data from paramagnetic NMR and hydrogen-deuterium exchange, as well as chemical shifts in CS-Rosetta. Furthermore, strategies are discussed to complement and improve structure predictions made by the current state-of-the-art AlphaFold2 program using NMR-guided Rosetta modeling.
Affiliation(s)
- Julia Koehler Leman
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY 10010, USA
| | - Georg Künze
- Institute for Drug Discovery, Medical Faculty, University of Leipzig, Brüderstr. 34, D-04103 Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
| |
|
14
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 84] [Impact Index Per Article: 84.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 presents an unprecedented progress in protein structure prediction and has attracted much attention. Subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, prediction of protein function, etc. Though the time is not long since AF2 was developed, there are already quite a few application studies of AF2 in the fields of biology and medicine, with many of them having preliminarily proved the potential of AF2. To better understand AF2 and promote its applications, we will in this article summarize the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction will also be discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
| | - Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
| | - Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
| | - Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
| |
|
15
|
Chang L, Perez A. Ranking Peptide Binders by Affinity with AlphaFold. Angew Chem Int Ed Engl 2023; 62:e202213362. [PMID: 36542066 DOI: 10.1002/anie.202213362] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 09/09/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 12/24/2022]
Abstract
AlphaFold has revolutionized structural biology by predicting highly accurate structures of proteins and their complexes with peptides and other proteins. However, for protein-peptide systems, we are also interested in identifying the highest affinity binder among a set of candidate peptides. We present a novel competitive binding assay using AlphaFold to predict structures of the receptor in the presence of two peptides. For systems in which the individual structures of the peptides are well predicted, the assay captures the higher affinity binder in the bound state, and the other peptide in the unbound form with statistical significance. We test the application on six protein receptors for which we have experimental binding affinities to several peptides. We find that the assay is best suited for identifying medium to strong peptide binders that adopt stable secondary structures upon binding.
Affiliation(s)
- Liwei Chang
- Department of Chemistry, University of Florida, Gainesville, FL, USA
- Quantum Theory Project, University of Florida, Gainesville, FL, USA
| | - Alberto Perez
- Department of Chemistry, University of Florida, Gainesville, FL, USA
- Quantum Theory Project, University of Florida, Gainesville, FL, USA
| |
|
16
|
Zhao H, Zhang H, She Z, Gao Z, Wang Q, Geng Z, Dong Y. Exploring AlphaFold2's Performance on Predicting Amino Acid Side-Chain Conformations and Its Utility in Crystal Structure Determination of B318L Protein. Int J Mol Sci 2023; 24:2740. [PMID: 36769074 PMCID: PMC9916901 DOI: 10.3390/ijms24032740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 01/10/2023] [Accepted: 01/12/2023] [Indexed: 02/04/2023] Open
Abstract
Recent technological breakthroughs in machine-learning-based AlphaFold2 (AF2) are pushing the prediction accuracy of protein structures to an unprecedented level that is on par with experimental structural quality. Despite its outstanding structural modeling capability, further experimental validations and performance assessments of AF2 predictions are still required, thus necessitating the development of integrative structural biology in synergy with both computational and experimental methods. Focusing on the B318L protein that plays an essential role in the African swine fever virus (ASFV) for viral replication, we experimentally demonstrate the high quality of the AF2 predicted model and its practical utility in crystal structural determination. Structural alignment implies that the AF2 model shares nearly the same atomic arrangement as the B318L crystal structure except for some flexible and disordered regions. More importantly, side-chain-based analysis at the individual residue level reveals that AF2's performance is likely dependent on the specific amino acid type and that hydrophobic residues tend to be more accurately predicted by AF2 than hydrophilic residues. Quantitative per-residue RMSD comparisons and further molecular replacement trials suggest that AF2 has a large potential to outperform other computational modeling methods in terms of structural determination. Additionally, it is numerically confirmed that the AF2 model is accurate enough so that it may well potentially withstand experimental data quality to a large extent for structural determination. Finally, an overall structural analysis and molecular docking simulation of the B318L protein are performed. Taken together, our study not only provides new insights into AF2's performance in predicting side-chain conformations but also sheds light upon the significance of AF2 in promoting crystal structural determination, especially when the experimental data quality of the protein crystal is poor.
Affiliation(s)
- Haifan Zhao
- School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Heng Zhang
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Zhun She
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Zengqiang Gao
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Qi Wang
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhi Geng
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Yuhui Dong
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
17
|
Li EH, Spaman L, Tejero R, Huang YJ, Ramelot TA, Fraga KJ, Prestegard JH, Kennedy MA, Montelione GT. Blind Assessment of Monomeric AlphaFold2 Protein Structure Models with Experimental NMR Data. bioRxiv 2023:2023.01.22.525096. [PMID: 36712039 PMCID: PMC9882346 DOI: 10.1101/2023.01.22.525096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2023]
Abstract
Recent advances in molecular modeling of protein structures are changing the field of structural biology. AlphaFold-2 (AF2), an AI system developed by DeepMind, Inc., utilizes attention-based deep learning to predict models of protein structures with high accuracy relative to structures determined by X-ray crystallography and cryo-electron microscopy (cryoEM). Comparing AF2 models to structures determined using solution NMR data, both high similarities and distinct differences have been observed. Since AF2 was trained on X-ray crystal and cryoEM structures, we assessed how accurately AF2 can model small, monomeric, solution protein NMR structures which (i) were not used in the AF2 training data set, and (ii) did not have homologous structures in the Protein Data Bank at the time of AF2 training. We identified nine open source protein NMR data sets for such "blind" targets, including chemical shift, raw NMR FID data, NOESY peak lists, and (for 1 case) ¹⁵N-¹H residual dipolar coupling data. For these nine small (70-108 residues) monomeric proteins, we generated AF2 prediction models and assessed how well these models fit to these experimental NMR data, using several well-established NMR structure validation tools. In most of these cases, the AF2 models fit the NMR data nearly as well, or sometimes better than, the corresponding NMR structure models previously deposited in the Protein Data Bank. These results provide benchmark NMR data for assessing new NMR data analysis and protein structure prediction methods. They also document the potential for using AF2 as a guiding tool in protein NMR data analysis, and more generally for hypothesis generation in structural biology research.
Highlights:
- AF2 models assessed against NMR data for 9 monomeric proteins not used in training.
- AF2 models fit NMR data almost as well as the experimentally-determined structures.
- RPF-DP, PSVS, and PDBStat software provide structure quality and RDC assessment.
- RPF-DP analysis using AF2 models suggests multiple conformational states.
Affiliation(s)
- Ethan H. Li
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
| | - Laura Spaman
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
| | - Roberto Tejero
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
| | - Yuanpeng Janet Huang
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
| | - Theresa A. Ramelot
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
| | - Keith J. Fraga
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
| | - James H. Prestegard
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602 USA
| | - Michael A. Kennedy
- Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056 USA
| | - Gaetano T. Montelione
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
| |
|