1
Krapp LF, Meireles FA, Abriata LA, Devillard J, Vacle S, Marcaida MJ, Dal Peraro M. Context-aware geometric deep learning for protein sequence design. Nat Commun 2024; 15:6273. [PMID: 39054322 PMCID: PMC11272779 DOI: 10.1038/s41467-024-50571-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 07/15/2024] [Indexed: 07/27/2024] Open
Abstract
Protein design and engineering are evolving at an unprecedented pace leveraging the advances in deep learning. Current models nonetheless cannot natively consider non-protein entities within the design process. Here, we introduce a deep learning approach based solely on a geometric transformer of atomic coordinates and element names that predicts protein sequences from backbone scaffolds aware of the restraints imposed by diverse molecular environments. To validate the method, we show that it can produce highly thermostable, catalytically active enzymes with high success rates. This concept is anticipated to improve the versatility of protein design pipelines for crafting desired functions.
Affiliation(s)
- Lucien F Krapp
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Fernando A Meireles
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Jean Devillard
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sarah Vacle
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Maria J Marcaida
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Matteo Dal Peraro
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
2
The Natterin Proteins Diversity: A Review on Phylogeny, Structure, and Immune Function. Toxins (Basel) 2021; 13:toxins13080538. [PMID: 34437409 PMCID: PMC8402412 DOI: 10.3390/toxins13080538] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/24/2021] [Revised: 07/12/2021] [Accepted: 07/21/2021] [Indexed: 12/14/2022] Open
Abstract
Since the first record of the five founder members of the group of Natterin proteins in the venom of the medically significant fish Thalassophryne nattereri, new sequences have been identified in other species. In this work, we performed a detailed screening using available genome databases across a wide range of species to identify sequence members of the Natterin group, sequence similarities, conserved domains, and evolutionary relationships. The high-throughput tools have enabled us to dramatically expand the number of members within this group of proteins, which has a remote origin (around 400 million years ago) and is spread across Eukarya organisms, even in plants and primitive jawless fish (Agnathans). Overall, the survey resulted in 331 species presenting Natterin-like proteins, mainly fish, and 859 putative genes. Besides fish, the groups with more species included in our analysis were insects and birds. The number and variety of annotations increased our detailed knowledge of the obtained sequences, such as the conserved motif AGIP in the pore-forming loop involved in the transmembrane barrel insertion, allowing us to classify them as important constituents of the innate immune defense system as effector molecules activating immune cells by interacting with conserved intracellular signaling mechanisms in the hosts.
3
Louis BBV, Abriata LA. Reviewing Challenges of Predicting Protein Melting Temperature Change Upon Mutation Through the Full Analysis of a Highly Detailed Dataset with High-Resolution Structures. Mol Biotechnol 2021; 63:863-884. [PMID: 34101125 PMCID: PMC8443528 DOI: 10.1007/s12033-021-00349-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/07/2021] [Accepted: 06/01/2021] [Indexed: 11/26/2022]
Abstract
Predicting the effects of mutations on protein stability is a key problem in fundamental and applied biology, still unsolved even for the relatively simple case of small, soluble, globular, monomeric, two-state-folder proteins. Many articles discuss the limitations of prediction methods and of the datasets used to train them, which result in low reliability for actual applications despite globally capturing trends. Here, we review these and other issues by analyzing one of the most detailed, carefully curated datasets of melting temperature change (ΔTm) upon mutation for proteins with high-resolution structures. After examining the composition of this dataset to discuss imbalances and biases, we inspect several of its entries assisted by an online app for data navigation and structure display and aided by a neural network that predicts ΔTm with accuracy close to that of programs available to this end. We pose that the ΔTm predictions of our network, and also likely those of other programs, account only for a baseline-like general effect of each type of amino acid substitution which then requires substantial corrections to reproduce the actual stability changes. The corrections are very different for each specific case and arise from fine structural details which are not well represented in the dataset and which, despite appearing reasonable upon visual inspection of the structures, are hard to encode and parametrize. Based on these observations, additional analyses, and a review of recent literature, we propose recommendations for developers of stability prediction methods and for efforts aimed at improving the datasets used for training. 
We leave our interactive interface for analysis available online at http://lucianoabriata.altervista.org/papersdata/proteinstability2021/s1626navigation.html so that users can further explore the dataset and baseline predictions, possibly serving as a tool useful in the context of structural biology and protein biotechnology research and as material for education in protein biophysics.
Affiliation(s)
- Benjamin B V Louis
  - Master of Life Sciences Engineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, CH-1015, Lausanne, Switzerland
  - Protein Production and Structure Core Facility, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland
4
Cagiada M, Johansson KE, Valanciute A, Nielsen SV, Hartmann-Petersen R, Yang JJ, Fowler DM, Stein A, Lindorff-Larsen K. Understanding the Origins of Loss of Protein Function by Analyzing the Effects of Thousands of Variants on Activity and Abundance. Mol Biol Evol 2021; 38:3235-3246. [PMID: 33779753 PMCID: PMC8321532 DOI: 10.1093/molbev/msab095] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Indexed: 12/20/2022] Open
Abstract
Understanding and predicting how amino acid substitutions affect proteins are keys to our basic understanding of protein function and evolution. Amino acid changes may affect protein function in a number of ways including direct perturbations of activity or indirect effects on protein folding and stability. We have analyzed 6,749 experimentally determined variant effects from multiplexed assays on abundance and activity in two proteins (NUDT15 and PTEN) to quantify these effects and find that a third of the variants cause loss of function, and about half of loss-of-function variants also have low cellular abundance. We analyze the structural and mechanistic origins of loss of function and use the experimental data to find residues important for enzymatic activity. We performed computational analyses of protein stability and evolutionary conservation and show how we may predict positions where variants cause loss of activity or abundance. In this way, our results link thermodynamic stability and evolutionary conservation to experimental studies of different properties of protein fitness landscapes.
Affiliation(s)
- Matteo Cagiada
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Audrone Valanciute
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sofie V Nielsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Hartmann-Petersen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jun J Yang
  - Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
5
Faber MS, Wrenbeck EE, Azouz LR, Steiner PJ, Whitehead TA. Impact of In Vivo Protein Folding Probability on Local Fitness Landscapes. Mol Biol Evol 2020; 36:2764-2777. [PMID: 31400199 DOI: 10.1093/molbev/msz184] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/14/2022] Open
Abstract
It is incompletely understood how biophysical properties like protein stability impact molecular evolution and epistasis. Epistasis is defined as specific when a mutation exclusively influences the phenotypic effect of another mutation, often at physically interacting residues. In contrast, nonspecific epistasis results when a mutation is influenced by a large number of nonlocal mutations. As most mutations are pleiotropic, the in vivo folding probability, governed by basal protein stability, is thought to determine activity-enhancing mutational tolerance, implying that nonspecific epistasis is dominant. However, evidence exists for both specific and nonspecific epistasis as the prevalent factor, with limited comprehensive data sets to support either claim. Here, we use deep mutational scanning to probe how in vivo enzyme folding probability impacts local fitness landscapes. We computationally designed two different variants of the amidase AmiE with statistically indistinguishable catalytic efficiencies but lower probabilities of folding in vivo compared with wild-type. Local fitness landscapes show slight alterations among variants, with essentially the same global distribution of fitness effects. However, specific epistasis was predominant for the subset of mutations exhibiting positive sign epistasis. These mutations mapped to spatially distinct locations on AmiE near the initial mutation or proximal to the active site. Intriguingly, the majority of specific epistatic mutations were codon dependent, with different synonymous codons resulting in fitness sign reversals. Together, these results offer a nuanced view of how protein folding probability impacts local fitness landscapes and suggest that transcriptional-translational effects are as important as stability in determining evolutionary outcomes.
Affiliation(s)
- Matthew S Faber
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI
- Emily E Wrenbeck
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI
- Laura R Azouz
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI
- Paul J Steiner
  - Department of Chemical and Biological Engineering, University of Colorado, Boulder, CO
- Timothy A Whitehead
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI
  - Department of Chemical and Biological Engineering, University of Colorado, Boulder, CO
  - E.E.W. Ginkgo Bioworks, L.R.A. McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, TX