1
Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. bioRxiv 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) within a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.
Affiliation(s)
- Ada Shaw
- Applied Mathematics, Harvard University
- Mafalda Dias
- Centre for Genomic Regulation, Universitat Pompeu Fabra
- Yarin Gal
- Computer Science, University of Oxford
2
Fu Y, Bedő J, Papenfuss AT, Rubin AF. Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants. Gigascience 2023; 12:giad073. [PMID: 37721410 PMCID: PMC10506130 DOI: 10.1093/gigascience/giad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 07/02/2023] [Accepted: 08/23/2023] [Indexed: 09/19/2023] [Open Access]
Abstract
BACKGROUND: Evaluating the impact of amino acid variants has been a critical challenge for studying protein function and interpreting genomic data. High-throughput experimental methods like deep mutational scanning (DMS) can measure the effect of large numbers of variants in a target protein, but because DMS studies have not been performed on all proteins, researchers also model DMS data computationally to build predictors that estimate variant impacts. RESULTS: In this study, we extended a linear regression-based predictor to explore whether incorporating data from alanine scanning (AS), a widely used low-throughput mutagenesis method, would improve prediction results. To evaluate our model, we collected 146 AS datasets, mapping to 54 DMS datasets across 22 distinct proteins. CONCLUSIONS: We show that improved model performance depends on the compatibility of the DMS and AS assays, and the scale of improvement is closely related to the correlation between DMS and AS results.
Affiliation(s)
- Yunfan Fu
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
- Justin Bedő
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
- Anthony T Papenfuss
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
- Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia
- Alan F Rubin
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
3
Navarro-Paya C, Sanz-Hernandez M, De Simone A. Plasticity of Membrane Binding by the Central Region of α-Synuclein. Front Mol Biosci 2022; 9:857217. [PMID: 35782868 PMCID: PMC9240306 DOI: 10.3389/fmolb.2022.857217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2022] [Accepted: 05/02/2022] [Indexed: 12/20/2022] [Open Access]
Abstract
Membrane binding by α-synuclein (αS), an intrinsically disordered protein whose aggregation is associated with Parkinson’s disease, is a key step in determining its biological properties under both physiological and pathological conditions. Upon membrane interaction, αS retains a partial level of structural disorder despite acquiring α-helical content. In the membrane-bound state, the equilibrium between the helical-bound and disordered-detached states of the central region of αS (residues 65–97) has been implicated in a double-anchor mechanism that promotes the clustering of synaptic vesicles. Herein, we investigated the underlying molecular bases of this equilibrium using enhanced coarse-grained molecular dynamics simulations. The results clarify the conformational dependencies of the membrane affinity of this protein region, which, in addition to playing a role in physiological membrane binding, has key relevance for the aggregation of αS and the mechanisms of toxicity of the resulting assemblies.
Affiliation(s)
- Carlos Navarro-Paya
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Alfonso De Simone
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Department of Pharmacy, University of Naples Federico II, Naples, Italy
- *Correspondence: Alfonso De Simone,
4
Kratochvil HT, Newberry RW, Mensa B, Mravic M, DeGrado WF. Spiers Memorial Lecture: Analysis and de novo design of membrane-interactive peptides. Faraday Discuss 2021; 232:9-48. [PMID: 34693965 PMCID: PMC8979563 DOI: 10.1039/d1fd00061f] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Membrane-peptide interactions play critical roles in many cellular and organismic functions, including protection from infection, remodeling of membranes, signaling, and ion transport. Peptides interact with membranes in a variety of ways: some associate with membrane surfaces in either intrinsically disordered conformations or well-defined secondary structures. Peptides with sufficient hydrophobicity can also insert vertically as transmembrane monomers, and many associate further into membrane-spanning helical bundles. Indeed, some peptides progress through each of these stages in the process of forming oligomeric bundles. In each case, the structure of the peptide and the membrane represent a delicate balance between peptide-membrane and peptide-peptide interactions. We will review this literature from the perspective of several biologically important systems, including antimicrobial peptides and their mimics, α-synuclein, receptor tyrosine kinases, and ion channels. We also discuss the use of de novo design to construct models to test our understanding of the underlying principles and to provide useful leads for pharmaceutical intervention of diseases.
Affiliation(s)
- Huong T Kratochvil
- Department of Pharmaceutical Chemistry, University of California - San Francisco, San Francisco, CA 94158, USA.
- Robert W Newberry
- Department of Pharmaceutical Chemistry, University of California - San Francisco, San Francisco, CA 94158, USA.
- Bruk Mensa
- Department of Pharmaceutical Chemistry, University of California - San Francisco, San Francisco, CA 94158, USA.
- Marco Mravic
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA 92037, USA
- William F DeGrado
- Department of Pharmaceutical Chemistry, University of California - San Francisco, San Francisco, CA 94158, USA.
5
Bassereau P. Concluding remarks: peptide-membrane interactions. Faraday Discuss 2021; 232:482-493. [PMID: 34825682 DOI: 10.1039/d1fd00077b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
This article is based on the concluding remarks lecture given at the Faraday Discussion meeting on peptide-membrane interactions, held online, 8-10th September 2021.
Affiliation(s)
- Patricia Bassereau
- Institut Curie, Université PSL, Sorbonne Université, CNRS UMR168, Laboratoire Physico-Chimie Curie, 75005 Paris, France.
6
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interaction, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods aimed at predicting transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs and in the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and how new methods are needed to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
- Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
7
Invertebrate Models Untangle the Mechanism of Neurodegeneration in Parkinson's Disease. Cells 2021; 10:407. [PMID: 33669308 PMCID: PMC7920059 DOI: 10.3390/cells10020407] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/12/2021] [Revised: 02/09/2021] [Accepted: 02/13/2021] [Indexed: 12/15/2022] [Open Access]
Abstract
Parkinson's disease (PD) is the second most common neurodegenerative disease, afflicting ~10 million people worldwide. Although several genes linked to PD have been identified, PD remains primarily an idiopathic disorder. The neuronal protein α-synuclein is a major player in disease progression of both genetic and idiopathic forms of PD. However, it cannot alone explain the underlying pathological processes. Recent studies demonstrate that many other risk factors can accelerate or further worsen brain dysfunction in PD patients. Several PD models, including non-mammalian eukaryotic organisms, have been developed to identify and characterize these factors. This review discusses recent findings in three PD model organisms, i.e., yeast, Drosophila, and Caenorhabditis elegans, that revealed new mechanisms and identified novel contributors to this disorder. These non-mammalian models share many conserved molecular pathways and cellular processes with humans. New players affecting PD pathogenesis include previously unknown genes/proteins, novel signaling pathways, and low molecular weight substances. These findings may help address the urgent need to discover novel drug targets for PD treatment and new biomarkers for early diagnosis of this disease. Since the study of neurodegeneration using simple eukaryotic organisms has generated a huge amount of information, we include only the most recent or the most important relevant data.