1
Küng C, Protsenko O, Vanella R, Nash MA. Deep mutational scanning reveals a de novo disulfide bond and combinatorial mutations for engineering thermostable myoglobin. Protein Sci 2025; 34:e70112. [PMID: 40247745 PMCID: PMC12006728 DOI: 10.1002/pro.70112] [Received: 11/02/2024] [Revised: 02/18/2025] [Accepted: 03/13/2025] [Indexed: 04/19/2025]
Abstract
Engineering protein stability is a critical challenge in biotechnology. Here, we used massively parallel deep mutational scanning (DMS) to comprehensively explore the mutational stability landscape of human myoglobin (hMb) and identify key mutations that enhance stability. Our DMS approach involved screening over 10,000 hMb variants by yeast surface display, single-cell sorting, and high-throughput DNA sequencing. We show how surface display levels serve as a proxy for thermostability of soluble hMb variants and report strong correlations between DMS-derived display levels and top-performing machine learning stability prediction algorithms. This approach led to the discovery of a variant with a de novo disulfide bond between residues R32C and C111, which increased thermostability by >12°C compared with wild-type hMb. By combining single stabilizing mutations with R32C, we engineered combinatorial variants that exhibited predominantly additive effects on stability with minimal epistasis. The most stable combinatorial variant exhibited a denaturation temperature exceeding 89°C, representing a >17°C improvement over wild-type hMb. Our findings demonstrate the capabilities in DMS-assisted combinatorial protein engineering to guide the discovery of thermostable variants and highlight the potential of massively parallel mutational analysis for the development of proteins for industrial and biomedical applications.
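The additivity claim in this abstract can be illustrated with a minimal sketch. It assumes epistasis is measured as the difference between the observed shift in denaturation temperature of a combinatorial variant and the sum of the shifts of its constituent single mutations. That definition, the function names, the mutation labels, and every number below are hypothetical and are not taken from the paper.

```python
def additive_expectation(single_shifts):
    """Expected shift in denaturation temperature (°C) if single-mutation effects simply add."""
    return sum(single_shifts)


def epistasis(observed_shift, single_shifts):
    """Deviation of the observed combined shift from the additive expectation (°C).

    A value near zero means the mutations combine additively; a positive or
    negative value indicates synergistic or antagonistic coupling.
    """
    return observed_shift - additive_expectation(single_shifts)


# Hypothetical single-mutation shifts relative to wild type (illustrative only).
singles = {"mutA": 12.0, "mutB": 3.0, "mutC": 2.5}
# Hypothetical measured shift for the variant carrying all three mutations.
observed_combo_shift = 17.0

expected = additive_expectation(singles.values())
deviation = epistasis(observed_combo_shift, singles.values())
print(f"additive expectation: {expected:.1f} °C, epistasis: {deviation:+.1f} °C")
```

Under this reading, "minimal epistasis" corresponds to deviations that are small compared with the measurement error of the thermostability assay.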
Affiliation(s)
- Christoph Küng
  - Department of Chemistry, Institute of Physical Chemistry, University of Basel, Basel, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Olena Protsenko
  - Department of Chemistry, Institute of Physical Chemistry, University of Basel, Basel, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Rosario Vanella
  - Department of Chemistry, Institute of Physical Chemistry, University of Basel, Basel, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Michael A. Nash
  - Department of Chemistry, Institute of Physical Chemistry, University of Basel, Basel, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
2
Xu M, Dantu SC, Garnett JA, Bonomo RA, Pandini A, Haider S. Functionally important residues from graph analysis of coevolved dynamic couplings. eLife 2025; 14:RP105005. [PMID: 40153310 PMCID: PMC11952748 DOI: 10.7554/elife.105005] [Indexed: 03/30/2025] Open
Abstract
The relationship between protein dynamics and function is essential for understanding biological processes and developing effective therapeutics. Functional sites within proteins are critical for activities such as substrate binding, catalysis, and structural changes. Existing computational methods for the prediction of functional residues are trained on sequence, structural, and experimental data, but they do not explicitly model the influence of evolution on protein dynamics. This overlooked contribution is essential, as it is known that evolution can fine-tune protein dynamics through compensatory mutations either to improve a protein's performance or to diversify its function while maintaining the same structural scaffold. To model this critical contribution, we introduce DyNoPy, a computational method that combines residue coevolution analysis with molecular dynamics simulations, revealing hidden correlations between functional sites. DyNoPy constructs a graph model of residue-residue interactions, identifies communities of key residue groups, and annotates critical sites based on their roles. By leveraging the concept of coevolved dynamical couplings (residue pairs with critical dynamical interactions that have been preserved during evolution), DyNoPy offers a powerful method for predicting and analysing protein evolution and dynamics. We demonstrate the effectiveness of DyNoPy on SHV-1 and PDC-3, chromosomally encoded β-lactamases linked to antibiotic resistance, highlighting its potential to inform drug design and address pressing healthcare challenges.
Affiliation(s)
- Manming Xu
  - UCL School of Pharmacy, London, United Kingdom
- James A Garnett
  - Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, United Kingdom
- Robert A Bonomo
  - Research Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
  - Department of Molecular Biology and Microbiology, Case Western Reserve University School of Medicine, Cleveland, United States
  - Department of Medicine, Case Western Reserve University School of Medicine, Cleveland, United States
  - Departments of Pharmacology, Biochemistry, and Proteomics and Bioinformatics, Case Western Reserve University School of Medicine, Cleveland, United States
  - CWRU-Cleveland VAMC Center for Antimicrobial Resistance and Epidemiology (Case VA CARES), Cleveland, United States
- Alessandro Pandini
  - Department of Computer Science, Brunel University London, Uxbridge, United Kingdom
- Shozeb Haider
  - UCL School of Pharmacy, London, United Kingdom
  - University of Tabuk (PFSCBR), Tabuk, Saudi Arabia
  - UCL Center for Advanced Research Computing, University College London, London, United Kingdom
3
Topolska M, Beltran A, Lehner B. Deep indel mutagenesis reveals the impact of amino acid insertions and deletions on protein stability and function. Nat Commun 2025; 16:2617. [PMID: 40097423 PMCID: PMC11914627 DOI: 10.1038/s41467-025-57510-5] [Received: 05/22/2024] [Accepted: 02/21/2025] [Indexed: 03/19/2025] Open
Abstract
Amino acid insertions and deletions (indels) are an abundant class of genetic variants. However, compared to substitutions, the effects of indels on protein stability are not well understood. To better understand indels, here we analyse new and existing large-scale deep indel mutagenesis (DIM) of structurally diverse proteins. The effects of indels on protein stability vary extensively among and within proteins and are not well predicted by existing computational methods. To address this shortcoming, we present INDELi, a series of models that combine experimental or predicted substitution effects and secondary structure information to provide good prediction of the effects of indels on both protein stability and pathogenicity. Moreover, quantifying the effects of indels on protein-protein interactions suggests that insertions can be an important class of gain-of-function variants. Our results provide an overview of the impact of indels on proteins and a method to predict their effects genome-wide.
Affiliation(s)
- Magdalena Topolska
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - University Pompeu Fabra (UPF), Barcelona, Spain
- Antoni Beltran
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - University Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
4
Ozkan S, Padilla N, de la Cruz X. QAFI: a novel method for quantitative estimation of missense variant impact using protein-specific predictors and ensemble learning. Hum Genet 2025; 144:191-208. [PMID: 39048855 PMCID: PMC11976337 DOI: 10.1007/s00439-024-02692-z] [Received: 04/30/2024] [Accepted: 07/14/2024] [Indexed: 07/27/2024]
Abstract
Next-generation sequencing (NGS) has revolutionized genetic diagnostics, yet its application in precision medicine remains incomplete, despite significant advances in computational tools for variant annotation. Many variants remain unannotated, and existing tools often fail to accurately predict the range of impacts that variants have on protein function. This limitation restricts their utility in relevant applications such as predicting disease severity and onset age. In response to these challenges, a new generation of computational models is emerging, aimed at producing quantitative predictions of genetic variant impacts. However, the field is still in its early stages, and several issues need to be addressed, including improved performance and better interpretability. This study introduces QAFI, a novel methodology that integrates protein-specific regression models within an ensemble learning framework, utilizing conservation-based and structure-related features derived from AlphaFold models. Our findings indicate that QAFI significantly enhances the accuracy of quantitative predictions across various proteins. The approach has been rigorously validated through its application in the CAGI6 contest, focusing on ARSA protein variants, and further tested on a comprehensive set of clinically labeled variants, demonstrating its generalizability and robust predictive power. The straightforward nature of our models may also contribute to better interpretability of the results.
Affiliation(s)
- Selen Ozkan
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
- Natàlia Padilla
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
- Xavier de la Cruz
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
5
Axakova A, Ding M, Cote AG, Subramaniam R, Senguttuvan V, Zhang H, Weile J, Douville SV, Gebbia M, Al-Chalabi A, Wahl A, Reuter J, Hurt J, Mitchell A, Fradette S, Andersen PM, van Loggerenberg W, Roth FP. Landscapes of missense variant impact for human superoxide dismutase 1. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.25.640191. [PMID: 40060668 PMCID: PMC11888409 DOI: 10.1101/2025.02.25.640191] [Indexed: 03/15/2025]
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressive motor neuron disease for which important subtypes are caused by variation in the Superoxide Dismutase 1 gene SOD1. Diagnosis based on SOD1 sequencing can not only be definitive but also indicate specific therapies available for SOD1-associated ALS (SOD1-ALS). Unfortunately, SOD1-ALS diagnosis is limited by the fact that a substantial fraction (currently 26%) of ClinVar SOD1 missense variants are classified as "variants of uncertain significance" (VUS). Although functional assays can provide strong evidence for clinical variant interpretation, SOD1 assay validation is challenging, given the current incomplete and controversial understanding of SOD1-ALS disease mechanism. Using saturation mutagenesis and multiplexed cell-based assays, we measured the functional impact of over two thousand SOD1 amino acid substitutions on both enzymatic function and protein abundance. The resulting 'missense variant effect maps' not only reflect prior biochemical knowledge of SOD1 but also provide sequence-structure-function insights. Importantly, our variant abundance assay can discriminate pathogenic missense variation and provides new evidence for 41% of missense variants that had been previously reported as VUS, offering the potential to identify additional patients who would benefit from therapy approved for SOD1-ALS.
Affiliation(s)
- Anna Axakova
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Megan Ding
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Atina G Cote
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Radha Subramaniam
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Vignesh Senguttuvan
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA
- Haotian Zhang
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA
- Jochen Weile
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Samuel V Douville
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Faculty of Health Science, McMaster University, Hamilton, ON L8S 4L8, Canada
- Marinella Gebbia
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Ammar Al-Chalabi
  - Maurice Wohl Clinical Neuroscience Institute, King's College London, London, SE5 9RX, UK
- Alexander Wahl
  - Labcorp Genetics (Formerly Invitae Corp.), CA 94103, USA
- Jason Reuter
  - Labcorp Genetics (Formerly Invitae Corp.), CA 94103, USA
- Warren van Loggerenberg
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA
- Frederick P Roth
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3K3, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA
6
Lyu Y, Xiong T, Shi S, Wang D, Yang X, Liu Q, Li Z, Li Z, Wang C, Chen R. Prediction of the Trimer Protein Interface Residue Pair by CNN-GRU Model Based on Multi-Feature Map. NANOMATERIALS (BASEL, SWITZERLAND) 2025; 15:188. [PMID: 39940164 PMCID: PMC11821012 DOI: 10.3390/nano15030188] [Received: 12/26/2024] [Revised: 01/21/2025] [Accepted: 01/22/2025] [Indexed: 02/14/2025]
Abstract
Most life activities of organisms are realized through protein-protein interactions, and these interactions are mainly achieved through residue-residue contact between monomer proteins. Consequently, studying residue-residue contact at the protein interaction interface can contribute to a deeper understanding of the protein-protein interaction mechanism. In this paper, we focus on trimer protein interface residue pairs. First, we use the amino acid k-interval product factor descriptor (AAIPF(k)) to integrate the positional information and physicochemical properties of amino acids, combined with the electric properties and geometric shape features of residues, to construct an 8 × 16 multi-feature map. This multi-feature map represents a sample composed of two residues on a trimer protein. Second, we construct a CNN-GRU deep learning framework to predict the trimer protein interface residue pair. The results show that when each dimer protein provides 10 prediction results and two protein-protein interaction interfaces of a trimer protein need to be accurately predicted, the accuracy of our proposed method is 60%. When each dimer protein provides 10 prediction results and one protein-protein interaction interface of a trimer protein needs to be accurately predicted, the accuracy of our proposed method is 93%. Our results can provide experimental researchers with a limited yet precise dataset containing correct trimer protein interface residue pairs, which is of great significance in guiding the experimental resolution of the trimer protein three-dimensional structure. Furthermore, compared to other computational methods, our proposed approach exhibits superior performance in predicting residue-residue contact at the trimer protein interface.
Affiliation(s)
- Yanfen Lyu
  - College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
  - Key Laboratory of Manufacture Technology of Veterinary Bioproducts, Ministry of Agriculture and Rural Affairs, Zhaoqing Dahuanong Biology Medicine Co., Ltd., Zhaoqing 526238, China
- Ting Xiong
  - College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
  - Zhaoqing Branch of Guangdong Laboratory of Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China
- Shuaibo Shi
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Dong Wang
  - School of Mechanical and Equipment Engineering, Hebei University of Engineering, Handan 056038, China
- Xueqing Yang
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Qihuan Liu
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Zhengtan Li
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Zhixin Li
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Chunxia Wang
  - College of Landscape and Ecological Engineering, Hebei University of Engineering, Handan 056038, China
- Ruiai Chen
  - College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
  - Key Laboratory of Manufacture Technology of Veterinary Bioproducts, Ministry of Agriculture and Rural Affairs, Zhaoqing Dahuanong Biology Medicine Co., Ltd., Zhaoqing 526238, China
  - Zhaoqing Branch of Guangdong Laboratory of Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China
7
Chen M, Ma L, Li M, Fang X, Yang Y, Wang C. Position-Regulated Electrostatic Interactions for Single Amino Acid Revealed by Aspartic Acid-Scanning Mutagenesis. Chembiochem 2025; 26:e202400891. [PMID: 39668651 DOI: 10.1002/cbic.202400891] [Received: 10/28/2024] [Revised: 12/10/2024] [Accepted: 12/11/2024] [Indexed: 12/14/2024]
Abstract
In this contribution, we have examined the electrostatic interactions between single arginine and aspartic acid residues by analyzing the peptide-peptide binding characteristics involving arginine-aspartic acid, arginine-glycine, arginine-tryptophan and tryptophan-glycine interactions. The results of aspartic acid mutagenesis revealed that the interactions between arginine and aspartic acid depend significantly on the position and composition of amino acids. While the primary interaction can be attributed to arginine-tryptophan contacts originating from the indole moieties with the main chains of 14-mers containing N-H and C=O moieties, pronounced enhancement could be identified in association with the electrostatic side-chain-side-chain interactions between arginine and aspartic acid. An optimal separation of 2 to 4 amino acids between two adjacent aspartic acid and tryptophan binding sites can be identified to achieve maximal enhancement of binding interactions. Such observed separation dependence may be utilized to unravel cooperative effects in heterogeneous interactions between single pairs of amino acids.
Affiliation(s)
- Mengting Chen
  - Key Laboratory for Biological Effects of Nanomaterials and Nanosafety, Key Laboratory of Standardization and Measurement for Nanotechnology, National Center for Nanoscience and Technology, Beijing, 100190, P.R. China
  - University of Chinese Academy of Sciences, Beijing, 100049, P.R. China
- Lilusi Ma
  - Key Laboratory for Biological Effects of Nanomaterials and Nanosafety, Key Laboratory of Standardization and Measurement for Nanotechnology, National Center for Nanoscience and Technology, Beijing, 100190, P.R. China
  - University of Chinese Academy of Sciences, Beijing, 100049, P.R. China
- Minxian Li
  - Key Laboratory for Biological Effects of Nanomaterials and Nanosafety, Key Laboratory of Standardization and Measurement for Nanotechnology, National Center for Nanoscience and Technology, Beijing, 100190, P.R. China
  - University of Chinese Academy of Sciences, Beijing, 100049, P.R. China
- Xiaocui Fang
  - Key Laboratory for Biological Effects of Nanomaterials and Nanosafety, Key Laboratory of Standardization and Measurement for Nanotechnology, National Center for Nanoscience and Technology, Beijing, 100190, P.R. China
  - University of Chinese Academy of Sciences, Beijing, 100049, P.R. China
- Yanlian Yang
  - Key Laboratory for Biological Effects of Nanomaterials and Nanosafety, Key Laboratory of Standardization and Measurement for Nanotechnology, National Center for Nanoscience and Technology, Beijing, 100190, P.R. China
  - University of Chinese Academy of Sciences, Beijing, 100049, P.R. China
- Chen Wang
  - Key Laboratory for Biological Effects of Nanomaterials and Nanosafety, Key Laboratory of Standardization and Measurement for Nanotechnology, National Center for Nanoscience and Technology, Beijing, 100190, P.R. China
  - University of Chinese Academy of Sciences, Beijing, 100049, P.R. China
8
Keen MM, Keith AD, Ortlund EA. Epitope mapping via in vitro deep mutational scanning methods and its applications. J Biol Chem 2025; 301:108072. [PMID: 39674321 PMCID: PMC11783119 DOI: 10.1016/j.jbc.2024.108072] [Received: 09/30/2024] [Revised: 12/04/2024] [Accepted: 12/09/2024] [Indexed: 12/16/2024] Open
Abstract
Epitope mapping is a technique employed to define the region of an antigen that elicits an immune response, providing crucial insight into the structural architecture of the antigen as well as epitope-paratope interactions. With this breadth of knowledge, immunotherapies, diagnostics, and vaccines are being developed with a rational and data-supported design. Traditional epitope mapping methods are laborious, time-intensive, and often lack the ability to screen proteins in a high-throughput manner or provide high resolution. Deep mutational scanning (DMS), however, is revolutionizing the field as it can screen all possible single amino acid mutations and provide an efficient and high-throughput way to infer the structures of both linear and three-dimensional epitopes with high resolution. Currently, more than 50 publications take this approach to efficiently identify enhancing or escaping mutations, with many then employing this information to rapidly develop broadly neutralizing antibodies, T-cell immunotherapies, vaccine platforms, or diagnostics. We provide a comprehensive review of the approaches to accomplish epitope mapping while also providing a summation of the development of DMS technology and its impactful applications.
Affiliation(s)
- Meredith M Keen
  - Department of Biochemistry, Emory School of Medicine, Emory University, Atlanta, Georgia, USA
- Alasdair D Keith
  - Department of Biochemistry, Emory School of Medicine, Emory University, Atlanta, Georgia, USA
- Eric A Ortlund
  - Department of Biochemistry, Emory School of Medicine, Emory University, Atlanta, Georgia, USA
9
Tzavella K, Diaz A, Olsen C, Vranken W. Combining evolution and protein language models for an interpretable cancer driver mutation prediction with D2Deep. Brief Bioinform 2024; 26:bbae664. [PMID: 39708841 DOI: 10.1093/bib/bbae664] [Received: 06/21/2024] [Revised: 09/15/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024] Open
Abstract
The mutations driving cancer are being increasingly exposed through tumor-specific genomic data. However, differentiating between cancer-causing driver mutations and random passenger mutations remains challenging. State-of-the-art homology-based predictors contain built-in biases and are often ill-suited to the intricacies of cancer biology. Protein language models have successfully addressed various biological problems but have not yet been tested on the challenging task of cancer driver mutation prediction at a large scale. Additionally, they often fail to offer result interpretation, hindering their effective use in clinical settings. The AI-based D2Deep method we introduce here addresses these challenges by combining two powerful elements: (i) a nonspecialized protein language model that captures the makeup of all protein sequences and (ii) protein-specific evolutionary information that encompasses functional requirements for a particular protein. D2Deep relies exclusively on sequence information, outperforms state-of-the-art predictors, and captures intricate epistatic changes throughout the protein caused by mutations. These epistatic changes correlate with known mutations in the clinical setting and can be used for the interpretation of results. The model is trained on a balanced, somatic training set and so effectively mitigates biases related to hotspot mutations compared to state-of-the-art techniques. The versatility of D2Deep is illustrated by its performance on non-cancer mutation prediction, where most variants still lack known consequences. D2Deep predictions and confidence scores are available via https://tumorscope.be/d2deep to help with clinical interpretation and mutation prioritization.
Affiliation(s)
- Konstantina Tzavella
  - Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Adrian Diaz
  - Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Catharina Olsen
  - Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
  - Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), Vrije Universiteit Brussel (VUB), Université Libre de Bruxelles (ULB), Laarbeeklaan 101, Brussels 1090, Belgium
  - Clinical Sciences, Research Group Genetics, Reproduction and Development (GRAD), Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Laarbeeklaan 101, Brussels 1090, Belgium
- Wim Vranken
  - Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Pleinlaan 2, Brussels 1050, Belgium
  - Chemistry Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
  - AI Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
  - Biomedical sciences, Vrije Universiteit Brussel, Laarbeeklaan 101, Brussels 1090, Belgium
10
Yang S, Ni J, Xu P. AI4ACEIP: A Computing Tool to Identify Food Peptides with High Inhibitory Activity for ACE by Merged Molecular Representation and Rich Intrinsic Sequence Information Based on an Ensemble Learning Strategy. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2024; 72:25340-25356. [PMID: 39495772 DOI: 10.1021/acs.jafc.4c05650] [Indexed: 11/06/2024]
Abstract
Hypertension is a common chronic disorder and a major risk factor for cardiovascular diseases. Angiotensin-converting enzyme (ACE) converts angiotensin I to angiotensin II, causing vasoconstriction and raising blood pressure. Pharmacotherapy is the mainstay of traditional hypertension treatment but leads to various negative side effects. Some food-derived peptides, named ACEIP, can suppress ACE with fewer undesirable effects. Therefore, it is crucial to seek strong dietary ACEIP to aid in hypertension treatment. In this article, we propose a new model called AI4ACEIP to identify ACEIP. AI4ACEIP uses a novel two-layer stacked ensemble architecture to predict ACEIP, relying on integrated view features derived from sequence, large language models, and molecular-based information. The analysis of feature combinations reveals that four selected integrated feature pairs exhibit enhanced performance for identifying ACEIP. To find meta models with strong abilities to learn information from integrated feature pairs, PowerShap, a feature selection method, is used to select 40 optimal feature and meta model combinations. Compared with seven state-of-the-art methods on the source and clear benchmark data sets, AI4ACEIP significantly outperformed them by 8.47 to 20.65% and 5.49 to 14.42%, respectively, in Matthews correlation coefficient. In brief, AI4ACEIP is a reliable model for ACEIP prediction and is freely available at https://github.com/abcair/AI4ACEIP.
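For readers unfamiliar with the metric quoted in this abstract, the following minimal sketch computes the Matthews correlation coefficient from a binary confusion matrix using its standard definition. The function name and the confusion-matrix counts are hypothetical and are not taken from the AI4ACEIP benchmarks; the abstract also does not say whether the reported gains are absolute or relative differences, so none is assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal total is zero,
    which is a common convention for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a peptide classifier (illustrative only).
score = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
print(f"MCC = {score:.3f}")
```

Unlike plain accuracy, this score uses all four cells of the confusion matrix, which is why it is often preferred for data sets with unequal class sizes.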
Affiliation(s)
- Sen Yang
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
  - The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213164, China
- Jiaqi Ni
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Piao Xu
  - College of Economics and Management, Nanjing Forestry University, Nanjing 210037, China
|
11
|
McBride JM, Tlusty T. AI-Predicted Protein Deformation Encodes Energy Landscape Perturbation. Physical Review Letters 2024; 133:098401. [PMID: 39270162 DOI: 10.1103/physrevlett.133.098401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 02/27/2024] [Accepted: 07/24/2024] [Indexed: 09/15/2024]
Abstract
AI algorithms have proven to be excellent predictors of protein structure, but whether and how much these algorithms can capture the underlying physics remains an open question. Here, we aim to test this question using the AlphaFold2 (AF) algorithm: We use AF to predict the subtle structural deformation induced by single mutations, quantified by strain, and compare with experimental datasets of corresponding perturbations in folding free energy ΔΔG. Unexpectedly, we find that physical strain alone, without any additional data or computation, correlates almost as well with ΔΔG as state-of-the-art energy-based and machine-learning predictors. This indicates that the AF-predicted structures alone encode fine details about the energy landscape. In particular, the structures encode significant information on stability, enough to estimate (de-)stabilizing effects of mutations, thus paving the way for the development of novel, structure-based stability predictors for protein design and evolution.
Affiliation(s)
- John M McBride
  - Center for Algorithmic and Robotized Synthesis, Institute for Basic Science, Ulsan 44919, South Korea
|
12
|
Guclu TF, Atilgan AR, Atilgan C. Deciphering GB1's Single Mutational Landscape: Insights from MuMi Analysis. J Phys Chem B 2024; 128:7987-7996. [PMID: 39115184 PMCID: PMC11671028 DOI: 10.1021/acs.jpcb.4c04916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 08/02/2024] [Accepted: 08/02/2024] [Indexed: 08/23/2024]
Abstract
Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in silico mutational scanning for all possible single mutations, along with 2 μs-long molecular dynamics (WT-MD) of the wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD to investigate the dynamics of binding, focusing on the relative solvent accessibility of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. In sum, our findings highlight the potential of MuMi as a reliable and computationally efficient tool for predicting protein fitness landscapes, offering significant advantages over traditional methods. The methodologies and results presented in this study pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
Affiliation(s)
- Tandac F. Guclu
  - Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Ali Rana Atilgan
  - Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Canan Atilgan
  - Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
|
13
|
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
|
14
|
Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. Frontiers in Bioinformatics 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023] Open
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
|
15
|
Chen L, Zhang Z, Li Z, Li R, Huo R, Chen L, Wang D, Luo X, Chen K, Liao C, Zheng M. Learning protein fitness landscapes with deep mutational scanning data from multiple sources. Cell Syst 2023; 14:706-721.e5. [PMID: 37591206 DOI: 10.1016/j.cels.2023.07.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/15/2023] [Revised: 05/30/2023] [Accepted: 07/18/2023] [Indexed: 08/19/2023]
Abstract
One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Lin Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zehong Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhenghao Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Rui Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Ruifeng Huo
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Lifan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Cangsong Liao
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Chemical Biology Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Science, Shanghai 201203, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
|
16
|
Sivelle C, Sierocki R, Lesparre Y, Lomet A, Quintilio W, Dubois S, Correia E, Moro AM, Maillère B, Nozach H. Combining deep mutational scanning to heatmap of HLA class II binding of immunogenic sequences to preserve functionality and mitigate predicted immunogenicity. Front Immunol 2023; 14:1197919. [PMID: 37575221 PMCID: PMC10416631 DOI: 10.3389/fimmu.2023.1197919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 07/10/2023] [Indexed: 08/15/2023] Open
Abstract
Removal of CD4 T cell epitopes from therapeutic antibody sequences is expected to mitigate their potential immunogenicity, but its application is complicated by the location of their T cell epitopes, which mainly overlap with complementarity-determining regions. We therefore evaluated the flexibility of antibody sequences to reduce the predicted affinity of corresponding peptides for HLA II molecules and to maintain antibody binding to its target in order to guide antibody engineering for mitigation of predicted immunogenicity. Permissive substitutions to reduce affinity of peptides for HLA II molecules were identified by establishing a heatmap of HLA class II binding using T-cell epitope prediction tools, while permissive substitutions preserving binding to the target were identified by means of deep mutational scanning and yeast surface display. Combinatorial libraries were then designed to identify active clones. Applied to adalimumab, an anti-TNFα human antibody, this approach identified 200 mutants with a lower HLA binding score than adalimumab. Three mutants were produced as full-length antibodies and showed a higher affinity for TNFα and neutralization ability than adalimumab. This study also sheds light on the permissiveness of antibody sequences with regard to functionality and predicted T cell epitope content.
Affiliation(s)
- Coline Sivelle
  - Université de Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SIMoS, Gif-sur-Yvette, France
- Raphael Sierocki
  - Université de Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SIMoS, Gif-sur-Yvette, France
  - Deeptope SAS, Orsay, France
- Aurore Lomet
  - CEA List, Université Paris-Saclay, Palaiseau, France
- Wagner Quintilio
  - Biopharmaceuticals Laboratory, Butantan Institute, Sao Paulo, Brazil
- Steven Dubois
  - Université de Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SIMoS, Gif-sur-Yvette, France
- Evelyne Correia
  - Université de Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SIMoS, Gif-sur-Yvette, France
- Ana Maria Moro
  - Biopharmaceuticals Laboratory, Butantan Institute, Sao Paulo, Brazil
- Bernard Maillère
  - Université de Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SIMoS, Gif-sur-Yvette, France
- Hervé Nozach
  - Université de Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SIMoS, Gif-sur-Yvette, France
|
17
|
Nagar N, Tubiana J, Loewenthal G, Wolfson HJ, Ben Tal N, Pupko T. EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning. J Mol Biol 2023; 435:168155. [PMID: 37356902 DOI: 10.1016/j.jmb.2023.168155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 06/27/2023]
Abstract
Multiple sequence alignments (MSAs) are the workhorse of molecular evolution and structural biology research. From MSAs, the amino acids that are tolerated at each site during protein evolution can be inferred. However, little is known regarding the repertoire of tolerated amino acids in proteins when only a few or no sequence homologs are available, such as orphan and de novo designed proteins. Here we present EvoRator2, a deep-learning algorithm trained on over 15,000 protein structures that can predict which amino acids are tolerated at any given site, based exclusively on protein structural information mined from atomic coordinate files. We show that EvoRator2 obtained satisfying results for the prediction of position-weighted scoring matrices (PSSM). We further show that EvoRator2 obtained near state-of-the-art performance on proteins with high quality structures in predicting the effect of mutations in deep mutation scanning (DMS) experiments and that for certain DMS targets, EvoRator2 outperformed state-of-the-art methods. We also show that by combining EvoRator2's predictions with those obtained by a state-of-the-art deep-learning method that accounts for the information in the MSA, the prediction of the effect of mutation in DMS experiments was improved in terms of both accuracy and stability. EvoRator2 is designed to predict which amino-acid substitutions are tolerated in such proteins without many homologous sequences, including orphan or de novo designed proteins. We implemented our approach in the EvoRator web server (https://evorator.tau.ac.il).
Affiliation(s)
- Natan Nagar
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim J Wolfson
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Nir Ben Tal
  - School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
|
18
|
Cagiada M, Bottaro S, Lindemose S, Schenstrøm SM, Stein A, Hartmann-Petersen R, Lindorff-Larsen K. Discovering functionally important sites in proteins. Nat Commun 2023; 14:4175. [PMID: 37443362 PMCID: PMC10345196 DOI: 10.1038/s41467-023-39909-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 07/15/2023] Open
Abstract
Proteins play important roles in biology, biotechnology and pharmacology, and missense variants are a common cause of disease. Discovering functionally important sites in proteins is a central but difficult problem because of the lack of large, systematic data sets. Sequence conservation can highlight residues that are functionally important but is often convoluted with a signal for preserving structural stability. We here present a machine learning method to predict functional sites by combining statistical models for protein sequences with biophysical models of stability. We train the model using multiplexed experimental data on variant effects and validate it broadly. We show how the model can be used to discover active sites, as well as regulatory and binding sites. We illustrate the utility of the model by prospective prediction and subsequent experimental validation on the functional consequences of missense variants in HPRT1 which may cause Lesch-Nyhan syndrome, and pinpoint the molecular mechanisms by which they cause disease.
Affiliation(s)
- Matteo Cagiada
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sandro Bottaro
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Søren Lindemose
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Signe M Schenstrøm
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Hartmann-Petersen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
19
|
Dunham AS, Beltrao P, AlQuraishi M. High-throughput deep learning variant effect prediction with Sequence UNET. Genome Biol 2023; 24:110. [PMID: 37161576 PMCID: PMC10169183 DOI: 10.1186/s13059-023-02948-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/10/2022] [Accepted: 04/20/2023] [Indexed: 05/11/2023] Open
Abstract
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture. It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
Affiliation(s)
- Alistair S Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1RQ, UK
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093, Zurich, Switzerland
|
20
|
Mathy CJP, Mishra P, Flynn JM, Perica T, Mavor D, Bolon DNA, Kortemme T. A complete allosteric map of a GTPase switch in its native cellular network. Cell Syst 2023; 14:237-246.e7. [PMID: 36801015 PMCID: PMC10173951 DOI: 10.1016/j.cels.2023.01.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/11/2022] [Revised: 11/08/2022] [Accepted: 01/06/2023] [Indexed: 02/19/2023]
Abstract
Allosteric regulation is central to protein function in cellular networks. A fundamental open question is whether cellular regulation of allosteric proteins occurs only at a few defined positions or at many sites distributed throughout the structure. Here, we probe the regulation of GTPases, protein switches that control signaling through regulated conformational cycling, at residue-level resolution by deep mutagenesis in the native biological network. For the GTPase Gsp1/Ran, we find that 28% of the 4,315 assayed mutations show pronounced gain-of-function responses. Twenty of the sixty positions enriched for gain-of-function mutations are outside the canonical GTPase active site switch regions. Kinetic analysis shows that these distal sites are allosterically coupled to the active site. We conclude that the GTPase switch mechanism is broadly sensitive to cellular allosteric regulation. Our systematic discovery of new regulatory sites provides a functional map to interrogate and target GTPases controlling many essential biological processes.
Affiliation(s)
- Christopher J P Mathy
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
  - The UC Berkeley-UCSF Graduate Program in Bioengineering, University of California, San Francisco, San Francisco, CA 94158, USA
- Parul Mishra
  - Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Medical School, Worcester, MA 01605, USA
  - School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Julia M Flynn
  - Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Tina Perica
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- David Mavor
  - Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Daniel N A Bolon
  - Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Tanja Kortemme
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
  - The UC Berkeley-UCSF Graduate Program in Bioengineering, University of California, San Francisco, San Francisco, CA 94158, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
|
21
|
Dewachter L, Brooks AN, Noon K, Cialek C, Clark-ElSayed A, Schalck T, Krishnamurthy N, Versées W, Vranken W, Michiels J. Deep mutational scanning of essential bacterial proteins can guide antibiotic development. Nat Commun 2023; 14:241. [PMID: 36646716 PMCID: PMC9842644 DOI: 10.1038/s41467-023-35940-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/13/2022] [Accepted: 01/09/2023] [Indexed: 01/18/2023] Open
Abstract
Deep mutational scanning is a powerful approach to investigate a wide variety of research questions including protein function and stability. Here, we perform deep mutational scanning on three essential E. coli proteins (FabZ, LpxC and MurA) involved in cell envelope synthesis using high-throughput CRISPR genome editing, and study the effect of the mutations in their original genomic context. We use more than 17,000 variants of the proteins to interrogate protein function and the importance of individual amino acids in supporting viability. Additionally, we exploit these libraries to study resistance development against antimicrobial compounds that target the selected proteins. Among the three proteins studied, MurA seems to be the superior antimicrobial target due to its low mutational flexibility, which decreases the chance of acquiring resistance-conferring mutations that simultaneously preserve MurA function. Additionally, we rank anti-LpxC lead compounds for further development, guided by the number of resistance-conferring mutations against each compound. Our results show that deep mutational scanning studies can be used to guide drug development, which we hope will contribute towards the development of novel antimicrobial therapies.
Affiliation(s)
- Liselot Dewachter
  - Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium
  - VIB-KU Leuven Center for Microbiology, Leuven, Belgium
- Thomas Schalck
  - Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium
  - VIB-KU Leuven Center for Microbiology, Leuven, Belgium
- Wim Versées
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - VIB-VUB Center for Structural Biology, Brussels, Belgium
- Wim Vranken
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - VIB-VUB Center for Structural Biology, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Jan Michiels
  - Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium
  - VIB-KU Leuven Center for Microbiology, Leuven, Belgium
|
22
|
Pruvost T, Mathieu M, Dubois S, Maillère B, Vigne E, Nozach H. Deciphering cross-species reactivity of LAMP-1 antibodies using deep mutational epitope mapping and AlphaFold. MAbs 2023; 15:2175311. [PMID: 36797224 PMCID: PMC9980635 DOI: 10.1080/19420862.2023.2175311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 01/20/2023] [Indexed: 02/18/2023] Open
Abstract
Delineating the precise regions on an antigen that are targeted by antibodies has become a key step for the development of antibody therapeutics. X-ray crystallography and cryogenic electron microscopy are considered the gold standard for providing precise information about these binding sites at atomic resolution. However, they are labor-intensive and a successful outcome is not guaranteed. We used deep mutational scanning (DMS) of the human LAMP-1 antigen displayed on yeast surface and leveraged next-generation sequencing to observe the effect of individual mutants on the binding of two LAMP-1 antibodies and to determine their functional epitopes on LAMP-1. Fine-tuned epitope mapping by DMS approaches is augmented by knowledge of experimental antigen structure. As human LAMP-1 structure has not yet been solved, we used the AlphaFold predicted structure of the full-length protein to combine with DMS data and ultimately finely map antibody epitopes. The accuracy of this method was confirmed by comparing the results to the co-crystal structure of one of the two antibodies with a LAMP-1 luminal domain. Finally, we used AlphaFold models of non-human LAMP-1 to understand the lack of mAb cross-reactivity. While both epitopes in the murine form exhibit multiple mutations in comparison to human LAMP-1, only one and two mutations in the Macaca form suffice to hinder the recognition by mAb B and A, respectively. Altogether, this study promotes a new application of AlphaFold to speed up precision mapping of antibody-antigen interactions and consequently accelerate antibody engineering for optimization.
Affiliation(s)
- Tiphanie Pruvost
  - CEA, INRAE, Medicines and Healthcare Technologies Department, Université Paris-Saclay, SIMoS, France
  - Sanofi, Large Molecule Research, Vitry-sur-Seine, France
- Magali Mathieu
  - Sanofi, Integrated Drug Discovery, Vitry-sur-Seine, France
- Steven Dubois
  - CEA, INRAE, Medicines and Healthcare Technologies Department, Université Paris-Saclay, SIMoS, France
- Bernard Maillère
  - CEA, INRAE, Medicines and Healthcare Technologies Department, Université Paris-Saclay, SIMoS, France
- Hervé Nozach
  - CEA, INRAE, Medicines and Healthcare Technologies Department, Université Paris-Saclay, SIMoS, France
|
23
|
Akdel M, Pires DEV, Pardo EP, Jänes J, Zalevsky AO, Mészáros B, Bryant P, Good LL, Laskowski RA, Pozzati G, Shenoy A, Zhu W, Kundrotas P, Serra VR, Rodrigues CHM, Dunham AS, Burke D, Borkakoti N, Velankar S, Frost A, Basquin J, Lindorff-Larsen K, Bateman A, Kajava AV, Valencia A, Ovchinnikov S, Durairaj J, Ascher DB, Thornton JM, Davey NE, Stein A, Elofsson A, Croll TI, Beltrao P. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 2022; 29:1056-1067. [PMID: 36344848 PMCID: PMC9663297 DOI: 10.1038/s41594-022-00849-w] [Citation(s) in RCA: 276] [Impact Index Per Article: 92.0] [Received: 10/25/2021] [Accepted: 09/20/2022] [Indexed: 11/09/2022]
Abstract
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
Affiliation(s)
- Mehmet Akdel
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Douglas E V Pires
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Eduard Porta Pardo
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Jürgen Jänes
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Arthur O Zalevsky
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Patrick Bryant
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Lydia L Good
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Roman A Laskowski
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Gabriele Pozzati
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Aditi Shenoy
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Wensi Zhu
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Petras Kundrotas
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Carlos H M Rodrigues
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Alistair S Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- David Burke
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Neera Borkakoti
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Sameer Velankar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Adam Frost
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Jérôme Basquin
  - Department of Structural Cell Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Andrey V Kajava
  - Université de Montpellier, Centre de Recherche en Biologie Cellulaire de Montpellier (CRBM) CNRS, Montpellier, France
- Sergey Ovchinnikov
  - Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA, USA
- David B Ascher
  - School of Chemistry and Molecular Biology, University of Queensland, Brisbane, Queensland, Australia
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Arne Elofsson
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Tristan I Croll
  - Cambridge Institute for Medical Research, Department of Haematology, The University of Cambridge, Cambridge, UK
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
|
24
|
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285 PMCID: PMC9636336 DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022]
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis in which a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning is a relatively new technique that enables experimental studies of functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss existing limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them. Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-022-01005-w.
Affiliation(s)
- Nadezhda Azbukina
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- Anastasia Zharikova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
- Vasily Ramensky
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
|
25
|
Ding Y, Perez-Ortiz G, Peate J, Barry SM. Redesigning Enzymes for Biocatalysis: Exploiting Structural Understanding for Improved Selectivity. Front Mol Biosci 2022; 9:908285. [PMID: 35936784 PMCID: PMC9355150 DOI: 10.3389/fmolb.2022.908285] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/30/2022] [Accepted: 06/08/2022] [Indexed: 11/13/2022]
Abstract
The discovery of new enzymes, alongside the push to make chemical processes more sustainable, has resulted in increased industrial interest in the use of biocatalytic processes to produce high-value and chiral precursor chemicals. Huge strides in protein engineering methodology and in silico tools have facilitated significant progress in the discovery and production of enzymes for biocatalytic processes. However, there are significant gaps in our knowledge of the relationship between enzyme structure and function. This has demonstrated the need for improved computational methods to model mechanisms and understand structure dynamics. Here, we explore efforts to rationally modify enzymes toward changing aspects of their catalyzed chemistry. We highlight examples of enzymes where links between enzyme function and structure have been made, thus enabling rational changes to the enzyme structure to give predictable chemical outcomes. We look at future directions the field could take and the technologies that will enable it.
|
26
|
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [PMID: 36051311 PMCID: PMC9432854 DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/28/2023]
Abstract
Proteins are Nature's molecular machinery and comprise diverse roles while consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the aim is to ultimately create a protein given desired structural and functional properties. It is often critical to model the relationship between a protein's sequence, folded structure, and biological function to assist in such protein engineering pursuits. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and the epistatic interactions between mutations result in an inherently complex mapping between genetic modifications and protein function. Therefore, estimating the quantitative effects of mutations on protein function(s) remains a grand challenge of biology, bioinformatics, and many related fields and would rapidly accelerate protein engineering tasks when successful. Such estimation is often known as variant effect prediction (VEP). However, progress has been demonstrated in recent years with the development of machine learning (ML) methods in modeling the relationship between mutations and protein function. In this Review, recent advances in variant effect prediction (VEP) are discussed as tools for protein engineering, focusing on techniques incorporating gains from the broader ML community and challenges in estimating biomolecular functional differences. Primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
Affiliation(s)
- Jesse Horne
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States; Department of Plant Biology, Cancer Center at Illinois, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
|
27
|
Barbon L, Offord V, Radford EJ, Butler AP, Gerety SS, Adams DJ, Tan HK, Waters AJ. Variant Library Annotation Tool (VaLiAnT): an oligonucleotide library design and annotation tool for saturation genome editing and other deep mutational scanning experiments. Bioinformatics 2022; 38:892-899. [PMID: 34791067 PMCID: PMC8796380 DOI: 10.1093/bioinformatics/btab776] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/05/2021] [Revised: 07/13/2021] [Accepted: 11/10/2021] [Indexed: 02/04/2023]
Abstract
MOTIVATION: CRISPR/Cas9-based technology allows for the functional analysis of genetic variants at single nucleotide resolution whilst maintaining genomic context. This approach, known as saturation genome editing (SGE), a form of deep mutational scanning, systematically alters each position in a target region to explore its function. SGE experiments require the design and synthesis of oligonucleotide variant libraries which are introduced into the genome. This technology is applicable to diverse fields such as disease variant identification, drug development, structure-function studies, synthetic biology, evolutionary genetics and host-pathogen interactions. Here, we present the Variant Library Annotation Tool (VaLiAnT) which can be used to generate variant libraries from user-defined genomic coordinates and standard input files. The software can accommodate user-specified species, reference sequences and transcript annotations.
RESULTS: Coordinates for a genomic range are provided by the user to retrieve a corresponding oligonucleotide reference sequence. A user-specified range within this sequence is then subject to systematic, nucleotide and/or amino acid saturating mutator functions. VaLiAnT provides a novel way to retrieve, mutate and annotate genomic sequences for oligonucleotide library generation. Specific features for SGE library generation can be employed. In addition, VaLiAnT is configurable, allowing for cDNA and prime editing saturation library generation, with other diverse applications possible.
AVAILABILITY AND IMPLEMENTATION: VaLiAnT is a command line tool written in Python. Source code, testing data, example input and output files and executables are available (https://github.com/cancerit/VaLiAnT) in addition to a detailed user manual (https://github.com/cancerit/VaLiAnT/wiki). VaLiAnT is licensed under AGPLv3.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luca Barbon
  - Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
- Victoria Offord
  - Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
- Elizabeth J Radford
  - Human Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
  - Department of Paediatrics, University of Cambridge, Cambridge CB2 0QQ, UK
- Adam P Butler
  - Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
- Sebastian S Gerety
  - Human Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- David J Adams
  - Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
- Hong Kee Tan
  - Human Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Andrew J Waters
  - Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
|
28
|
Høie MH, Cagiada M, Beck Frederiksen AH, Stein A, Lindorff-Larsen K. Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation. Cell Rep 2022; 38:110207. [PMID: 35021073 DOI: 10.1016/j.celrep.2021.110207] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 07/19/2021] [Revised: 10/01/2021] [Accepted: 12/13/2021] [Indexed: 01/23/2023]
Abstract
Understanding and predicting the functional consequences of single amino acid changes is central in many areas of protein science. Here, we collect and analyze experimental measurements of effects of >150,000 variants in 29 proteins. We use biophysical calculations to predict changes in stability for each variant and assess them in light of sequence conservation. We find that the sequence analyses give more accurate prediction of variant effects than predictions of stability and that about half of the variants that show loss of function do so due to stability effects. We construct a machine learning model to predict variant effects from protein structure and sequence alignments and show how the two sources of information support one another and enable mechanistic interpretations. Together, our results show how one can leverage large-scale experimental assessments of variant effects to gain deeper and general insights into the mechanisms that cause loss of function.
Affiliation(s)
- Magnus Haraldson Høie
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Matteo Cagiada
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Anders Haagen Beck Frederiksen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
|
29
|
Chu HY, Wong ASL. Facilitating Machine Learning-Guided Protein Engineering with Smart Library Design and Massively Parallel Assays. ADVANCED GENETICS (HOBOKEN, N.J.) 2021; 2:2100038. [PMID: 36619853 PMCID: PMC9744531 DOI: 10.1002/ggn2.202100038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/07/2021] [Revised: 11/08/2021] [Indexed: 01/11/2023]
Abstract
Protein design plays an important role in recent medical advances from antibody therapy to vaccine design. Typically, exhaustive mutational screens or directed evolution experiments are used for the identification of the best design or for improvements to the wild-type variant. Even with a high-throughput screening on pooled libraries and Next-Generation Sequencing to boost the scale of read-outs, surveying all the variants with combinatorial mutations for their empirical fitness scores is still of magnitudes beyond the capacity of existing experimental settings. To tackle this challenge, in-silico approaches using machine learning to predict the fitness of novel variants based on a subset of empirical measurements are now employed. These machine learning models turn out to be useful in many cases, with the premise that the experimentally determined fitness scores and the amino-acid descriptors of the models are informative. The machine learning models can guide the search for the highest fitness variants, resolve complex epistatic relationships, and highlight bio-physical rules for protein folding. Using machine learning-guided approaches, researchers can build more focused libraries, thus relieving themselves from labor-intensive screens and fast-tracking the optimization process. Here, we describe the current advances in massive-scale variant screens, and how machine learning and mutagenesis strategies can be integrated to accelerate protein engineering. More specifically, we examine strategies to make screens more economical, informative, and effective in discovery of useful variants.
Affiliation(s)
- Hoi Yee Chu
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong 852, China
- Alan S. L. Wong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong 852, China
  - Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong 852, China
|
30
|
Cagiada M, Johansson KE, Valanciute A, Nielsen SV, Hartmann-Petersen R, Yang JJ, Fowler DM, Stein A, Lindorff-Larsen K. Understanding the Origins of Loss of Protein Function by Analyzing the Effects of Thousands of Variants on Activity and Abundance. Mol Biol Evol 2021; 38:3235-3246. [PMID: 33779753 PMCID: PMC8321532 DOI: 10.1093/molbev/msab095] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Indexed: 12/20/2022]
Abstract
Understanding and predicting how amino acid substitutions affect proteins are keys to our basic understanding of protein function and evolution. Amino acid changes may affect protein function in a number of ways including direct perturbations of activity or indirect effects on protein folding and stability. We have analyzed 6,749 experimentally determined variant effects from multiplexed assays on abundance and activity in two proteins (NUDT15 and PTEN) to quantify these effects and find that a third of the variants cause loss of function, and about half of loss-of-function variants also have low cellular abundance. We analyze the structural and mechanistic origins of loss of function and use the experimental data to find residues important for enzymatic activity. We performed computational analyses of protein stability and evolutionary conservation and show how we may predict positions where variants cause loss of activity or abundance. In this way, our results link thermodynamic stability and evolutionary conservation to experimental studies of different properties of protein fitness landscapes.
Affiliation(s)
- Matteo Cagiada
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Audrone Valanciute
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sofie V Nielsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Hartmann-Petersen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jun J Yang
  - Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|