1
Shukla K, Idanwekhai K, Naradikian M, Ting S, Schoenberger SP, Brunk E. Machine Learning of Three-Dimensional Protein Structures to Predict the Functional Impacts of Genome Variation. J Chem Inf Model 2024; 64:5328-5343. [PMID: 38635316 DOI: 10.1021/acs.jcim.3c01967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Research in the human genome sciences generates a substantial amount of genetic data for hundreds of thousands of individuals, which concomitantly increases the number of variants of unknown significance (VUS). Bioinformatic analyses can successfully reveal rare variants and variants with clear associations with disease-related phenotypes. These studies have had a significant impact on how clinical genetic screens are interpreted and how patients are stratified for treatment. There are few, if any, computational methods for variants comparable to biological activity predictions. To address this gap, we developed a machine learning method that uses protein three-dimensional structures from AlphaFold to predict how a variant will influence changes to a gene's downstream biological pathways. We trained state-of-the-art machine learning classifiers to predict which protein regions will most likely impact transcriptional activities of two proto-oncogenes, nuclear factor erythroid 2 (NFE2L2)-related factor 2 (NRF2) and c-Myc. We have identified classifiers that attain accuracies higher than 80%, which have allowed us to identify a set of key protein regions that lead to significant perturbations in c-Myc or NRF2 transcriptional pathway activities.
Affiliation(s)
- Kriti Shukla
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Kelvin Idanwekhai
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Martin Naradikian
  - La Jolla Institute for Immunology, San Diego, California 92093, United States
- Stephanie Ting
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Elizabeth Brunk
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Integrative Program for Biological and Genome Sciences (IBGS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
2
Zhou Z, Zhang L, Yu Y, Wu B, Li M, Hong L, Tan P. Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning. Nat Commun 2024; 15:5566. [PMID: 38956442 PMCID: PMC11219809 DOI: 10.1038/s41467-024-49798-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 06/11/2024] [Indexed: 07/04/2024] [Open Access]
Abstract
Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity for fitness prediction. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP's superiority over both unsupervised and supervised baselines. Furthermore, we successfully apply FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.
Affiliation(s)
- Ziyi Zhou
  - School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- Liang Zhang
  - School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yuanxi Yu
  - School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
- Banghao Wu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Mingchen Li
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Liang Hong
  - School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
  - Zhang Jiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, 201203, China
- Pan Tan
  - School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
3
Wirnsberger G, Pritišanac I, Oberdorfer G, Gruber K. Flattening the curve-How to get better results with small deep-mutational-scanning datasets. Proteins 2024; 92:886-902. [PMID: 38501649 DOI: 10.1002/prot.26686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 02/24/2024] [Accepted: 03/07/2024] [Indexed: 03/20/2024]
Abstract
Proteins are used in various biotechnological applications, often requiring the optimization of protein properties by introducing specific amino-acid exchanges. Deep mutational scanning (DMS) is an effective high-throughput method for evaluating the effects of these exchanges on protein function. DMS data can then inform the training of a neural network to predict the impact of mutations. Most approaches use some representation of the protein sequence for training and prediction. As proteins are characterized by complex structures and intricate residue interaction networks, directly providing structural information as input reduces the need to learn these features from the data. We introduce a method for encoding protein structures as stacked 2D contact maps, which capture residue interactions, their evolutionary conservation, and mutation-induced interaction changes. Furthermore, we explored techniques to augment neural network training performance on smaller DMS datasets. To validate our approach, we trained three neural network architectures originally used for image analysis on three DMS datasets, and we compared their performances with networks trained solely on protein sequences. The results confirm the effectiveness of the protein structure encoding in machine learning efforts on DMS data. Using structural representations as direct input to the networks, along with data augmentation and pretraining, significantly reduced demands on training data size and improved prediction performance, especially on smaller datasets, while performance on large datasets was on par with state-of-the-art sequence convolutional neural networks. The methods presented here have the potential to provide the same workflow as DMS without the experimental and financial burden of testing thousands of mutants. 
Additionally, we present an open-source, user-friendly software tool to make these data analysis techniques accessible, particularly to biotechnology and protein engineering researchers who wish to apply them to their mutagenesis data.
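Illustration (not taken from the paper): the abstract describes encoding a structure as stacked 2D contact maps. The sketch below shows one generic way such a stack could be built, assuming one 3D coordinate per residue, an 8 Å distance cutoff, and a made-up per-residue conservation score. The function name, cutoff, coordinates, and second channel are all assumptions for illustration and may differ from the authors' encoding.

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Binary residue-residue contact map from one 3D point per residue."""
    diffs = coords[:, None, :] - coords[None, :, :]   # pairwise difference vectors
    distances = np.linalg.norm(diffs, axis=-1)        # pairwise Euclidean distances
    return (distances <= cutoff).astype(np.float32)

# Toy coordinates in angstroms (hypothetical: four residues on a line).
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [20.0, 0.0, 0.0]])
# Hypothetical per-residue conservation scores in [0, 1].
conservation = np.array([0.9, 0.2, 0.5, 1.0])

contacts = contact_map(coords)
# Second channel (an assumption): mean conservation of each contacting pair, zero elsewhere.
pair_conservation = contacts * (conservation[:, None] + conservation[None, :]) / 2.0
# Stack the channels into an image-like array: (channels, residues, residues).
stacked = np.stack([contacts, pair_conservation])
```

An array of this shape can be passed to image-style networks in the same way as a multi-channel picture, which is the general idea the abstract refers to.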
Affiliation(s)
- Iva Pritišanac
  - Institute of Molecular Biology and Biochemistry, Medical University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
- Gustav Oberdorfer
  - BioTechMed-Graz, Graz, Austria
  - Institute of Biochemistry, Graz University of Technology, Graz, Austria
- Karl Gruber
  - Institute of Molecular Biosciences, University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
4
Cannon AE, Horn PJ. The Molecular Frequency, Conservation and Role of Reactive Cysteines in Plant Lipid Metabolism. Plant & Cell Physiology 2024; 65:826-844. [PMID: 38113384 DOI: 10.1093/pcp/pcad163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 11/21/2023] [Accepted: 12/18/2023] [Indexed: 12/21/2023]
Abstract
Cysteines (Cys) are chemically reactive amino acids containing sulfur that play diverse roles in plant biology. Recent proteomics investigations in Arabidopsis thaliana have revealed the presence of thiol post-translational modifications (PTMs) in several Cys residues. These PTMs are presumed to impact protein structure and function, yet mechanistic data regarding the specific Cys susceptible to modification and their biochemical relevance remain limited. To help address these limitations, we have conducted a wide-ranging analysis by integrating published datasets encompassing PTM proteomics (comparing S-sulfenylation, persulfidation, S-nitrosylation and S-acylation), genomics and protein structures, with a specific focus on proteins involved in plant lipid metabolism. The prevalence and distribution of modified Cys residues across all analyzed proteins is diverse and multifaceted. Nevertheless, by combining an evaluation of sequence conservation across 100+ plant genomes with AlphaFold-generated protein structures and physicochemical predictions, we have unveiled structural propensities associated with Cys modifications. Furthermore, we have identified discernible patterns in lipid biochemical pathways enriched with Cys PTMs, notably involving beta-oxidation, jasmonic acid biosynthesis, fatty acid biosynthesis and wax biosynthesis. These collective findings provide valuable insights for future investigations targeting the mechanistic foundations of Cys modifications and the regulation of modified proteins in lipid metabolism and other metabolic pathways.
Affiliation(s)
- Ashley E Cannon
  - BioDiscovery Institute and Department of Biological Sciences, University of North Texas, 1155 Union Circle, Denton, TX 76203, USA
- Patrick J Horn
  - BioDiscovery Institute and Department of Biological Sciences, University of North Texas, 1155 Union Circle, Denton, TX 76203, USA
5
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
Significance Statement: Computational biology relies heavily on mathematical models that predict biological activities from DNA, RNA, or protein sequences. Interpreting the parameters of these models, however, remains difficult. Here we address a core challenge for model interpretation: the presence of 'gauge freedoms', i.e., ways of changing model parameters without affecting model predictions. The results unify commonly used methods for eliminating gauge freedoms and show how these methods can be used to simplify complex models in localized regions of sequence space. This work thus overcomes a major obstacle in the interpretation of quantitative sequence-function relationships.
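Illustration (not taken from the paper): a gauge freedom is easiest to see in an additive model, where a constant can be moved between a position's effects and the intercept without changing any prediction. The sketch below assumes a toy additive model with random parameters and applies a mean-centering constraint, often called a zero-sum gauge. The function names and model sizes are hypothetical, and this is only one simple gauge, not the family derived by the authors.

```python
import itertools
import numpy as np

def predict(theta0, theta, seq):
    """Additive model: intercept plus one effect per (position, character)."""
    return theta0 + sum(theta[pos, char] for pos, char in enumerate(seq))

def fix_zero_sum_gauge(theta0, theta):
    """Center each position's effects at zero and absorb the shifts into the intercept."""
    position_means = theta.mean(axis=1, keepdims=True)
    return theta0 + position_means.sum(), theta - position_means

rng = np.random.default_rng(0)
n_positions, n_chars = 3, 4                      # toy sizes, chosen for illustration
theta0 = 1.5
theta = rng.normal(size=(n_positions, n_chars))  # arbitrary (unfixed) parameters

fixed_theta0, fixed_theta = fix_zero_sum_gauge(theta0, theta)

# Predictions over every possible sequence are identical before and after gauge fixing.
sequences = list(itertools.product(range(n_chars), repeat=n_positions))
original_preds = np.array([predict(theta0, theta, s) for s in sequences])
fixed_preds = np.array([predict(fixed_theta0, fixed_theta, s) for s in sequences])
```

After fixing, the intercept equals the mean prediction over all sequences and each effect is a deviation from that mean, which is what makes the parameter values comparable across models.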
6
Chen SK, Liu J, Van Nynatten A, Tudor-Price BM, Chang BSW. Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods. J Mol Evol 2024:10.1007/s00239-024-10179-8. [PMID: 38886207 DOI: 10.1007/s00239-024-10179-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/20/2024]
Abstract
Empirical studies of genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, in elucidating the space of possible genotypes accessible through mutations in a landscape of phenotypes and fitness effects. Yet, comprehensively mapping molecular fitness landscapes remains challenging since all possible combinations of amino acid substitutions for even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate such multi-mutational studies, the size of MAVEs has grown to the point where a priori determination of sampling requirements is needed. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. For this, we show quantitatively that marginal gains in sampling efficiency demand increasingly greater sampling effort when sampling for nucleotide sequences over their encoded amino acid equivalents. We present a new library construction protocol that efficiently maximizes sequence variation, and demonstrate using ultradeep sequencing that the library encodes virtually all possible combinations of mutations within the experimental design. Insights learned from our analyses together with the methodological advances reported herein are immediately applicable toward pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
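Illustration (not taken from the paper): a rough feel for minimum sampling requirements can be had from a textbook occupancy calculation. The sketch below assumes every variant in the library is equally likely to be drawn, which real libraries generally are not, and compares a closed-form expectation with a small simulation. The function names and the three-site example are hypothetical and do not reproduce the authors' calculations.

```python
import math
import random

def expected_coverage(n_variants, n_draws):
    """Expected fraction of variants seen at least once under uniform sampling."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_draws

def draws_for_coverage(n_variants, target):
    """Smallest number of draws whose expected coverage reaches target (0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / n_variants))

def simulate_coverage(n_variants, n_draws, seed=0):
    """Empirical coverage from one simulated sampling run."""
    rng = random.Random(seed)
    seen = {rng.randrange(n_variants) for _ in range(n_draws)}
    return len(seen) / n_variants

n_variants = 20 ** 3                     # all amino acid combinations at three sites
needed = draws_for_coverage(n_variants, 0.95)
simulated = simulate_coverage(n_variants, needed)
```

Under the same uniform assumption, replacing `20 ** 3` with `64 ** 3` (all codon combinations at three sites) enlarges the space to be covered and raises the required number of draws accordingly.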
Affiliation(s)
- Steven K Chen
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Jing Liu
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alexander Van Nynatten
  - Department of Biological Science, University of Toronto Scarborough, Toronto, ON, Canada
- Belinda S W Chang
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
7
Liu Z, Gillis TG, Raman S, Cui Q. A parameterized two-domain thermodynamic model explains diverse mutational effects on protein allostery. eLife 2024; 12:RP92262. [PMID: 38836839 PMCID: PMC11152574 DOI: 10.7554/elife.92262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024] [Open Access]
Abstract
New experimental findings continue to challenge our understanding of protein allostery. A recent deep mutational scanning study showed that allosteric hotspots in the tetracycline repressor (TetR) and its homologous transcriptional factors are broadly distributed rather than spanning well-defined structural pathways as often assumed. Moreover, hotspot mutation-induced allostery loss was rescued by distributed additional mutations in a degenerate fashion. Here, we develop a two-domain thermodynamic model for TetR, which readily rationalizes these intriguing observations. The model accurately captures the in vivo activities of various mutants with changes in physically transparent parameters, allowing the data-based quantification of mutational effects using statistical inference. Our analysis reveals the intrinsic connection of intra- and inter-domain properties for allosteric regulation and illustrates epistatic interactions that are consistent with structural features of the protein. The insights gained from this study into the nature of two-domain allostery are expected to have broader implications for other multi-domain allosteric proteins.
Affiliation(s)
- Zhuang Liu
  - Department of Physics, Boston University, Boston, United States
- Thomas G Gillis
  - Department of Biochemistry, University of Wisconsin, Madison, United States
- Srivatsan Raman
  - Department of Biochemistry, University of Wisconsin, Madison, United States
  - Department of Chemistry, University of Wisconsin, Madison, United States
  - Department of Bacteriology, University of Wisconsin, Madison, United States
- Qiang Cui
  - Department of Physics, Boston University, Boston, United States
  - Department of Chemistry, Boston University, Boston, United States
8
Rao J, Xin R, Macdonald C, Howard MK, Estevam GO, Yee SW, Wang M, Fraser JS, Coyote-Maestas W, Pimentel H. Rosace: a robust deep mutational scanning analysis framework employing position and mean-variance shrinkage. Genome Biol 2024; 25:138. [PMID: 38789982 PMCID: PMC11127319 DOI: 10.1186/s13059-024-03279-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 05/14/2024] [Indexed: 05/26/2024] [Open Access]
Abstract
Deep mutational scanning (DMS) measures the effects of thousands of genetic variants in a protein simultaneously. The small sample size renders classical statistical methods ineffective. For example, p-values cannot be correctly calibrated when treating variants independently. We propose Rosace, a Bayesian framework for analyzing growth-based DMS data. Rosace leverages amino acid position information to increase power and control the false discovery rate by sharing information across parameters via shrinkage. We also developed Rosette for simulating the distributional properties of DMS. We show that Rosace is robust to the violation of model assumptions and is more powerful than existing tools.
Affiliation(s)
- Jingyou Rao
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Ruiqi Xin
  - Computational and Systems Biology Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Christian Macdonald
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
- Matthew K Howard
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
  - Tetrad Graduate Program, UCSF, San Francisco, CA, USA
  - Department of Pharmaceutical Chemistry, UCSF, San Francisco, CA, USA
- Gabriella O Estevam
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
  - Tetrad Graduate Program, UCSF, San Francisco, CA, USA
- Sook Wah Yee
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
- Mingsen Wang
  - Department of Mathematics, Baruch College, CUNY, New York, NY, USA
- James S Fraser
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
  - Quantitative Biosciences Institute, UCSF, San Francisco, CA, USA
- Willow Coyote-Maestas
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
  - Quantitative Biosciences Institute, UCSF, San Francisco, CA, USA
- Harold Pimentel
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
9
Metzger BPH, Park Y, Starr TN, Thornton JW. Epistasis facilitates functional evolution in an ancient transcription factor. eLife 2024; 12:RP88737. [PMID: 38767330 PMCID: PMC11105156 DOI: 10.7554/elife.88737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024] [Open Access]
Abstract
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
Affiliation(s)
- Brian PH Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
- Yeonwoo Park
  - Program in Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, United States
- Tyler N Starr
  - Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, United States
- Joseph W Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
  - Department of Human Genetics, University of Chicago, Chicago, United States
10
Schnettler JD, Wang MS, Gantz M, Bunzel HA, Karas C, Hollfelder F, Hecht MH. Selection of a promiscuous minimalist cAMP phosphodiesterase from a library of de novo designed proteins. Nat Chem 2024:10.1038/s41557-024-01490-4. [PMID: 38702405 DOI: 10.1038/s41557-024-01490-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 02/27/2024] [Indexed: 05/06/2024]
Abstract
The ability of unevolved amino acid sequences to become biological catalysts was key to the emergence of life on Earth. However, billions of years of evolution separate complex modern enzymes from their simpler early ancestors. To probe how unevolved sequences can develop new functions, we use ultrahigh-throughput droplet microfluidics to screen for phosphoesterase activity amidst a library of more than one million sequences based on a de novo designed 4-helix bundle. Characterization of hits revealed that acquisition of function involved a large jump in sequence space enriching for truncations that removed >40% of the protein chain. Biophysical characterization of a catalytically active truncated protein revealed that it dimerizes into an α-helical structure, with the gain of function accompanied by increased structural dynamics. The identified phosphodiesterase is a manganese-dependent metalloenzyme that hydrolyses a range of phosphodiesters. It is most active towards cyclic AMP, with a rate acceleration of ~10^9 and a catalytic proficiency of >10^14 M^-1, comparable to larger enzymes shaped by billions of years of evolution.
Affiliation(s)
- Michael S Wang
  - Department of Chemistry, Princeton University, Princeton, USA
- Maximilian Gantz
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- H Adrian Bunzel
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Christina Karas
  - Department of Molecular Biology, Princeton University, Princeton, USA
- Michael H Hecht
  - Department of Chemistry, Princeton University, Princeton, USA
11
Hoskins I, Rao S, Tante C, Cenik C. Integrated multiplexed assays of variant effect reveal determinants of catechol-O-methyltransferase gene expression. Mol Syst Biol 2024; 20:481-505. [PMID: 38355921 PMCID: PMC11066095 DOI: 10.1038/s44320-024-00018-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 02/16/2024] [Open Access]
Abstract
Multiplexed assays of variant effect are powerful methods to profile the consequences of rare variants on gene expression and organismal fitness. Yet, few studies have integrated several multiplexed assays to map variant effects on gene expression in coding sequences. Here, we pioneered a multiplexed assay based on polysome profiling to measure variant effects on translation at scale, uncovering single-nucleotide variants that increase or decrease ribosome load. By combining high-throughput ribosome load data with multiplexed mRNA and protein abundance readouts, we mapped the cis-regulatory landscape of thousands of catechol-O-methyltransferase (COMT) variants from RNA to protein and found numerous coding variants that alter COMT expression. Finally, we trained machine learning models to map signatures of variant effects on COMT gene expression and uncovered both directional and divergent impacts across expression layers. Our analyses reveal expression phenotypes for thousands of variants in COMT and highlight variant effects on both single and multiple layers of expression. Our findings prompt future studies that integrate several multiplexed assays for the readout of gene expression.
Affiliation(s)
- Ian Hoskins
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Shilpa Rao
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Charisma Tante
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Can Cenik
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
12
Gersing S, Schulze TK, Cagiada M, Stein A, Roth FP, Lindorff-Larsen K, Hartmann-Petersen R. Characterizing glucokinase variant mechanisms using a multiplexed abundance assay. Genome Biol 2024; 25:98. [PMID: 38627865 PMCID: PMC11021015 DOI: 10.1186/s13059-024-03238-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 04/04/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
BACKGROUND: Amino acid substitutions can perturb protein activity in multiple ways. Understanding their mechanistic basis may pinpoint how residues contribute to protein function. Here, we characterize the mechanisms underlying variant effects in human glucokinase (GCK) variants, building on our previous comprehensive study on GCK variant activity.
RESULTS: Using a yeast growth-based assay, we score the abundance of 95% of GCK missense and nonsense variants. When combining the abundance scores with our previously determined activity scores, we find that 43% of hypoactive variants also decrease cellular protein abundance. The low-abundance variants are enriched in the large domain, while residues in the small domain are tolerant to mutations with respect to abundance. Instead, many variants in the small domain perturb GCK conformational dynamics which are essential for appropriate activity.
CONCLUSIONS: In this study, we identify residues important for GCK metabolic stability and conformational dynamics. These residues could be targeted to modulate GCK activity, and thereby affect glucose homeostasis.
Collapse
Affiliation(s)
- Sarah Gersing
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark.
| | - Thea K Schulze
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark
| | - Matteo Cagiada
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark
| | - Amelie Stein
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, M5S 3E1, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, M5G 1X5, Toronto, ON, Canada
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, 15213, Pittsburgh, USA
| | - Kresten Lindorff-Larsen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark.
| | - Rasmus Hartmann-Petersen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark.
| |
|
13
|
Dawood M, Fayer S, Pendyala S, Post M, Kalra D, Patterson K, Venner E, Muffley LA, Fowler DM, Rubin AF, Posey JE, Plon SE, Lupski JR, Gibbs RA, Starita LM, Robles-Espinoza CD, Coyote-Maestas W, Gallego Romero I. Defining and Reducing Variant Classification Disparities. medRxiv 2024:2024.04.11.24305690. [PMID: 38645101 PMCID: PMC11030469 DOI: 10.1101/2024.04.11.24305690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Background Multiplexed Assays of Variant Effects (MAVEs) can test all possible single variants in a gene of interest. The resulting saturation-style data may help resolve variant classification disparities between populations, especially for variants of uncertain significance (VUS). Methods We analyzed clinical significance classifications in 213,663 individuals of European-like genetic ancestry versus 206,975 individuals of non-European-like genetic ancestry from All of Us and the Genome Aggregation Database. Then, we incorporated clinically calibrated MAVE data into the Clinical Genome Resource's Variant Curation Expert Panel rules to automate VUS reclassification for BRCA1, TP53, and PTEN. Results Using two orthogonal statistical approaches, we show a higher prevalence (p ≤ 5.95e-06) of VUS in individuals of non-European-like genetic ancestry across all medical specialties assessed in all three databases. Further, in the non-European-like genetic ancestry group, higher rates of Benign or Likely Benign and variants with no clinical designation (p ≤ 2.5e-05) were found across many medical specialties, whereas Pathogenic or Likely Pathogenic assignments were higher in individuals of European-like genetic ancestry (p ≤ 2.5e-05). Using MAVE data, we reclassified VUS in individuals of non-European-like genetic ancestry at a significantly higher rate in comparison to reclassified VUS from European-like genetic ancestry (p = 9.1e-03), effectively compensating for the VUS disparity. Further, essential code analysis showed equitable impact of MAVE evidence codes but inequitable impact of allele frequency (p = 7.47e-06) and computational predictor (p = 6.92e-05) evidence codes for individuals of non-European-like genetic ancestry. Conclusions Generation of saturation-style MAVE data should be a priority to reduce VUS disparities and produce equitable training data for future computational predictors.
|
14
|
Ye D, Shao YZ, Li WR, Cui ZJ, Gong T, Yang JL, Wang HQ, Dai JG, Feng KP, Ma M, Ma SG, Liu YB, Zhu P, Yu SS. Characterization and Engineering of Two Highly Paralogous Sesquiterpene Synthases Reveal a Regioselective Reprotonation Switch. Angew Chem Int Ed Engl 2024; 63:e202315674. [PMID: 38327006 DOI: 10.1002/anie.202315674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 01/26/2024] [Accepted: 02/06/2024] [Indexed: 02/09/2024]
Abstract
Sesquiterpene synthases (STPSs) catalyze carbocation-driven cyclization reactions that can generate structurally diverse hydrocarbons. The deprotonation-reprotonation process is widely used in STPSs to promote structural diversity, largely attributable to the distinct regio/stereoselective reprotonations. However, the molecular basis for reprotonation regioselectivity remains largely understudied. Herein, we analyzed two highly paralogous STPSs, Artabotrys hexapetalus (-)-cyperene synthase (AhCS) and ishwarane synthase (AhIS), which catalyze reactions that are distinct from the regioselective protonation of germacrene A (GA), resulting in distinct skeletons of 5/5/6 tricyclic (-)-cyperene and 6/6/5/3 tetracyclic ishwarane, respectively. Isotopic labeling experiments demonstrated that these protonations occur at C3 and C6 of GA in AhCS and AhIS, respectively. The cryo-electron microscopy-derived AhCS complex structure provided the structural basis for identifying different key active site residues that may govern their functional disparity. The structure-guided mutagenesis of these residues resulted in successful functional interconversion between AhCS and AhIS, thus targeting the three active site residues [L311-S419-C458]/[M311-V419-A458] that may act as a C3/C6 reprotonation switch for GA. These findings facilitate the rational design or directed evolution of STPSs with structurally diverse skeletons.
Affiliation(s)
- Dan Ye
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Yi-Zhen Shao
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Wen-Rui Li
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Zhen-Jia Cui
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Ting Gong
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Jin-Ling Yang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Hai-Qiang Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Jun-Gui Dai
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Ke-Ping Feng
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Ming Ma
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, People's Republic of China
| | - Shuang-Gang Ma
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Yun-Bao Liu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Ping Zhu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
- NHC Key Laboratory of Biosynthesis of Natural Products, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| | - Shi-Shan Yu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100050, People's Republic of China
| |
|
15
|
Liu GY, Jouandin P, Bahng RE, Perrimon N, Sabatini DM. An evolutionary mechanism to assimilate new nutrient sensors into the mTORC1 pathway. Nat Commun 2024; 15:2517. [PMID: 38514639 PMCID: PMC10957897 DOI: 10.1038/s41467-024-46680-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 03/06/2024] [Indexed: 03/23/2024] Open
Abstract
Animals sense and respond to nutrient availability in their environments, a task coordinated in part by the mTOR complex 1 (mTORC1) pathway. mTORC1 regulates growth in response to nutrients and, in mammals, senses specific amino acids through specialized sensors that bind the GATOR1/2 signaling hub. Given that animals can occupy diverse niches, we hypothesized that the pathway might evolve distinct sensors in different metazoan phyla. Whether such customization occurs, and how the mTORC1 pathway might capture new inputs, is unknown. Here, we identify the Drosophila melanogaster protein Unmet expectations (CG11596) as a species-restricted methionine sensor that directly binds the fly GATOR2 complex in a fashion antagonized by S-adenosylmethionine (SAM). We find that in Dipterans GATOR2 rapidly evolved the capacity to bind Unmet and to thereby repurpose a previously independent methyltransferase as a SAM sensor. Thus, the modular architecture of the mTORC1 pathway allows it to co-opt preexisting enzymes to expand its nutrient sensing capabilities, revealing a mechanism for conferring evolvability on an otherwise conserved system.
Affiliation(s)
- Grace Y Liu
- Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology, Department of Biology, 455 Main Street, Cambridge, MA, USA.
- Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, USA.
- Koch Institute for Integrative Cancer Research and Massachusetts Institute of Technology, Department of Biology, 77 Massachusetts Avenue, Cambridge, MA, USA.
| | - Patrick Jouandin
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Institut de Recherche en Cancérologie de Montpellier, Inserm U1194-UM-ICM, Campus Val d'Aurelle, Montpellier, Cedex 5, France
| | - Raymond E Bahng
- Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology, Department of Biology, 455 Main Street, Cambridge, MA, USA
- Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, USA
- Koch Institute for Integrative Cancer Research and Massachusetts Institute of Technology, Department of Biology, 77 Massachusetts Avenue, Cambridge, MA, USA
| | - Norbert Perrimon
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA.
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA.
| | - David M Sabatini
- Institute of Organic Chemistry and Biochemistry, Flemingovo n. 2, 166 10 Praha 6, Prague, Czech Republic.
| |
|
16
|
Chakraborty S, Ahler E, Simon JJ, Fang L, Potter ZE, Sitko KA, Stephany JJ, Guttman M, Fowler DM, Maly DJ. Profiling of drug resistance in Src kinase at scale uncovers a regulatory network coupling autoinhibition and catalytic domain dynamics. Cell Chem Biol 2024; 31:207-220.e11. [PMID: 37683649 PMCID: PMC10902203 DOI: 10.1016/j.chembiol.2023.08.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/19/2023] [Revised: 07/03/2023] [Accepted: 08/16/2023] [Indexed: 09/10/2023]
Abstract
Kinase inhibitors are effective cancer therapies, but resistance often limits clinical efficacy. Despite the cataloging of numerous resistance mutations, our understanding of kinase inhibitor resistance is still incomplete. Here, we comprehensively profiled the resistance of ∼3,500 Src tyrosine kinase mutants to four different ATP-competitive inhibitors. We found that ATP-competitive inhibitor resistance mutations are distributed throughout Src's catalytic domain. In addition to inhibitor contact residues, residues that participate in regulating Src's phosphotransferase activity were prone to the development of resistance. Unexpectedly, we found that a resistance-prone cluster of residues located on the top face of the N-terminal lobe of Src's catalytic domain contributes to autoinhibition by reducing catalytic domain dynamics, and mutations in this cluster led to resistance by lowering inhibitor affinity and promoting kinase hyperactivation. Together, our studies demonstrate how drug resistance profiling can be used to define potential resistance pathways and uncover new mechanisms of kinase regulation.
Affiliation(s)
- Sujata Chakraborty
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA
| | - Ethan Ahler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Molecular and Cellular Biology, University of Washington, Seattle, WA 98195, USA
| | - Jessica J Simon
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA
| | - Linglan Fang
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA
| | - Zachary E Potter
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA
| | - Katherine A Sitko
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jason J Stephany
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Miklos Guttman
- Department of Medicinal Chemistry, University of Washington, Seattle, WA 98195, USA
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Bioengineering, University of Washington, Seattle, WA 98195, USA.
| | - Dustin J Maly
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA; Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.
| |
|
17
|
Liu Z, Gillis T, Raman S, Cui Q. A parametrized two-domain thermodynamic model explains diverse mutational effects on protein allostery. bioRxiv 2024:2023.08.06.552196. [PMID: 37662419 PMCID: PMC10473640 DOI: 10.1101/2023.08.06.552196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
New experimental findings continue to challenge our understanding of protein allostery. A recent deep mutational scanning study showed that allosteric hotspots in the tetracycline repressor (TetR) and its homologous transcriptional factors are broadly distributed rather than spanning well-defined structural pathways as often assumed. Moreover, hotspot mutation-induced allostery loss was rescued by distributed additional mutations in a degenerate fashion. Here, we develop a two-domain thermodynamic model for TetR, which readily rationalizes these intriguing observations. The model accurately captures the in vivo activities of various mutants with changes in physically transparent parameters, allowing the data-based quantification of mutational effects using statistical inference. Our analysis reveals the intrinsic connection of intra- and inter-domain properties for allosteric regulation and illustrates epistatic interactions that are consistent with structural features of the protein. The insights gained from this study into the nature of two-domain allostery are expected to have broader implications for other multidomain allosteric proteins.
Affiliation(s)
- Zhuang Liu
- Department of Physics, Boston University, Boston, United States
| | - Thomas Gillis
- Department of Biochemistry, University of Wisconsin, Madison, United States
| | - Srivatsan Raman
- Department of Biochemistry, University of Wisconsin, Madison, United States
- Department of Chemistry, University of Wisconsin, Madison, United States
- Department of Bacteriology, University of Wisconsin, Madison, United States
| | - Qiang Cui
- Department of Physics, Boston University, Boston, United States
- Department of Chemistry, Boston University, Boston, United States
| |
|
18
|
Scott BM, Chen SK, Van Nynatten A, Liu J, Schott RK, Heon E, Peisajovich SG, Chang BSW. Scaling up Functional Analyses of the G Protein-Coupled Receptor Rhodopsin. J Mol Evol 2024; 92:61-71. [PMID: 38324225 DOI: 10.1007/s00239-024-10154-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/22/2023] [Indexed: 02/08/2024]
Abstract
Eukaryotic cells use G protein-coupled receptors (GPCRs) to convert external stimuli into internal signals to elicit cellular responses. However, how mutations in GPCR-coding genes affect GPCR activation and downstream signaling pathways remains poorly understood. Approaches such as deep mutational scanning show promise in investigations of GPCRs, but a high-throughput method to measure rhodopsin activation has yet to be achieved. Here, we scale up a fluorescent reporter assay in budding yeast that we engineered to study rhodopsin's light-activated signal transduction. Using this approach, we measured the mutational effects of over 1200 individual human rhodopsin mutants, generated by low-frequency random mutagenesis of the GPCR rhodopsin (RHO) gene. Analysis of the data in the context of rhodopsin's three-dimensional structure reveals that transmembrane helices are generally less tolerant to mutations compared to flanking helices that face the lipid bilayer, which suggests that mutational tolerance is contingent on both the local environment surrounding specific residues and the specific position of these residues in the protein structure. Comparison of functional scores from our screen to clinically identified rhodopsin disease variants found many pathogenic mutants to be loss of function. Lastly, functional scores from our assay were consistent with a complex counterion mechanism involved in ligand-binding and rhodopsin activation. Our results demonstrate that deep mutational scanning is possible for rhodopsin activation and can be an effective method for revealing properties of mutational tolerance that may be generalizable to other transmembrane proteins.
Affiliation(s)
- Benjamin M Scott
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
| | - Steven K Chen
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
| | | | - Jing Liu
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
| | - Ryan K Schott
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Department of Biology and Centre for Vision Research, York University, Toronto, ON, Canada
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - Elise Heon
- Department of Ophthalmology, Hospital for Sick Children, Toronto, ON, Canada
| | - Sergio G Peisajovich
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
| | - Belinda S W Chang
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada.
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada.
- Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada.
| |
|
19
|
Sesta L, Pagnani A, Fernandez-de-Cossio-Diaz J, Uguzzoni G. Inference of annealed protein fitness landscapes with AnnealDCA. PLoS Comput Biol 2024; 20:e1011812. [PMID: 38377054 PMCID: PMC10878520 DOI: 10.1371/journal.pcbi.1011812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The design of proteins with specific tasks is a major challenge in molecular biology with important diagnostic and therapeutic applications. High-throughput screening methods have been developed to systematically evaluate protein activity, but only a small fraction of possible protein variants can be tested using these techniques. Computational models that explore the sequence space in-silico to identify the fittest molecules for a given function are needed to overcome this limitation. In this article, we propose AnnealDCA, a machine-learning framework to learn the protein fitness landscape from sequencing data derived from a broad range of experiments that use selection and sequencing to quantify protein activity. We demonstrate the effectiveness of our method by applying it to antibody Rep-Seq data of immunized mice and screening experiments, assessing the quality of the fitness landscape reconstructions. Our method can be applied to several experimental cases where a population of protein variants undergoes various rounds of selection and sequencing, without relying on the computation of variants enrichment ratios, and thus can be used even in cases of disjoint sequence samples.
Affiliation(s)
- Luca Sesta
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
| | - Andrea Pagnani
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
- Italian Institute for Genomic Medicine, Torino, Italy
- INFN, Sezione di Torino, Torino, Italy
| | | | | |
|
20
|
Hong Z, Barton JP. popDMS infers mutation effects from deep mutational scanning data. bioRxiv 2024:2024.01.29.577759. [PMID: 38352383 PMCID: PMC10862717 DOI: 10.1101/2024.01.29.577759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
Deep mutational scanning (DMS) experiments provide a powerful method to measure the functional effects of genetic mutations at massive scales. However, the data generated from these experiments can be difficult to analyze, with significant variation between experimental replicates. To overcome this challenge, we developed popDMS, a computational method based on population genetics theory, to infer the functional effects of mutations from DMS data. Through extensive tests, we found that the functional effects of single mutations and epistasis inferred by popDMS are highly consistent across replicates, comparing favorably with existing methods. Our approach is flexible and can be widely applied to DMS data that includes multiple time points, multiple replicates, and different experimental conditions.
Affiliation(s)
- Zhenchen Hong
- Department of Physics and Astronomy, University of California, Riverside, USA
| | - John P. Barton
- Department of Physics and Astronomy, University of California, Riverside, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, USA
- Department of Physics and Astronomy, University of Pittsburgh, USA
| |
|
21
|
Bakhache W, Orr W, McCormick L, Dolan PT. Uncovering Structural Plasticity of Enterovirus A through Deep Insertional and Deletional Scanning. Research Square 2024:rs.3.rs-3835307. [PMID: 38410474 PMCID: PMC10896406 DOI: 10.21203/rs.3.rs-3835307/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Insertions and deletions (InDels) are essential sources of novelty in protein evolution. In RNA viruses, InDels cause dramatic phenotypic changes contributing to the emergence of viruses with altered immune profiles and host engagement. This work aimed to expand our current understanding of viral evolution and explore the mutational tolerance of RNA viruses to InDels, focusing on Enterovirus A71 (EV-A71) as a prototype for Enterovirus A species (EV-A). Using newly described deep InDel scanning approaches, we engineered approximately 45,000 insertions and 6,000 deletions at every site across the viral proteome, quantifying their effects on viral fitness. As a general trend, most InDels were lethal to the virus. However, our screen reproducibly identified a set of InDel-tolerant regions, demonstrating our ability to comprehensively map tolerance to these mutations. Tolerant sites highlighted structurally flexible and mutationally plastic regions of viral proteins that avoid core structural and functional elements. Phylogenetic analysis on EV-A species infecting diverse mammalian hosts revealed that the experimentally-identified hotspots overlapped with sites of InDels across the EV-A species, suggesting structural plasticity at these sites is an important function for InDels in EV speciation. Our work reveals the fitness effects of InDels across EV-A71, identifying regions of evolutionary capacity that require further monitoring, which could guide the development of Enterovirus vaccines.
Affiliation(s)
- William Bakhache
- Quantitative Virology and Evolution Unit, Laboratory of Viral Diseases, NIH-NIAID Division of Intramural Research, Bethesda, MD, USA
| | - Walker Orr
- Quantitative Virology and Evolution Unit, Laboratory of Viral Diseases, NIH-NIAID Division of Intramural Research, Bethesda, MD, USA
| | - Lauren McCormick
- Quantitative Virology and Evolution Unit, Laboratory of Viral Diseases, NIH-NIAID Division of Intramural Research, Bethesda, MD, USA
- Department of Biology, University of Oxford, Oxford, UK
| | - Patrick T. Dolan
- Quantitative Virology and Evolution Unit, Laboratory of Viral Diseases, NIH-NIAID Division of Intramural Research, Bethesda, MD, USA
| |
|
22
|
Irvine EB, Reddy ST. Advancing Antibody Engineering through Synthetic Evolution and Machine Learning. J Immunol 2024; 212:235-243. [PMID: 38166249 DOI: 10.4049/jimmunol.2300492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 10/20/2023] [Indexed: 01/04/2024]
Abstract
Abs are versatile molecules with the potential to achieve exceptional binding to target Ags, while also possessing biophysical properties suitable for therapeutic drug development. Protein display and directed evolution systems have transformed synthetic Ab discovery, engineering, and optimization, vastly expanding the number of Ab clones able to be experimentally screened for binding. Moreover, the burgeoning integration of high-throughput screening, deep sequencing, and machine learning has further augmented in vitro Ab optimization, promising to accelerate the design process and massively expand the Ab sequence space interrogated. In this Brief Review, we discuss the experimental and computational tools employed in synthetic Ab engineering and optimization. We also explore the therapeutic challenges posed by developing Abs for infectious diseases, and the prospects for leveraging machine learning-guided protein engineering to prospectively design Abs resistant to viral escape.
Affiliation(s)
- Edward B Irvine
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
| | - Sai T Reddy
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
| |
|
23
|
Fowler DM, Rehm HL. Will variants of uncertain significance still exist in 2030? Am J Hum Genet 2024; 111:5-10. [PMID: 38086381 PMCID: PMC10806733 DOI: 10.1016/j.ajhg.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 11/12/2023] [Accepted: 11/13/2023] [Indexed: 12/28/2023] Open
Abstract
In 2020, the National Human Genome Research Institute (NHGRI) made ten "bold predictions," including that "the clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation 'variant of uncertain significance (VUS)' obsolete." We discuss the prospects for this prediction, arguing that many, if not most, VUS in coding regions will be resolved by 2030. We outline a confluence of recent changes making this possible, especially advances in the standards for variant classification that better leverage diverse types of evidence, improvements in computational variant effect predictor performance, scalable multiplexed assays of variant effect capable of saturating the genome, and data-sharing efforts that will maximize the information gained from each new individual sequenced and variant interpreted. We suggest that clinicians and researchers can realize a future where VUSs have largely been eliminated, in line with the NHGRI's bold prediction. The length of time taken to reach this future, and thus whether we are able to achieve the goal of largely eliminating VUSs by 2030, is largely a consequence of the choices made now and in the next few years. We believe that investing in eliminating VUSs is worthwhile, since their predominance remains one of the biggest challenges to precision genomic medicine.
Affiliation(s)
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA; Department of Bioengineering, University of Washington, Seattle, WA, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
24
|
Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin EA, Dalkara D, Ferrari U. ACIDES: on-line monitoring of forward genetic screens for protein engineering. Nat Commun 2023; 14:8504. [PMID: 38148337 PMCID: PMC10751290 DOI: 10.1038/s41467-023-43967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023] Open
Abstract
Forward genetic screens of mutated variants are a versatile strategy for protein engineering and investigation, which has been successfully applied to various studies like directed evolution (DE) and deep mutational scanning (DMS). While next-generation sequencing can track millions of variants during the screening rounds, the vast and noisy nature of the sequencing data impedes the estimation of the performance of individual variants. Here, we propose ACIDES that combines statistical inference and in-silico simulations to improve performance estimation in the library selection process by attributing accurate statistical scores to individual variants. We tested ACIDES first on a random-peptide-insertion experiment and then on multiple public datasets from DE and DMS studies. ACIDES allows experimentalists to reliably estimate variant performance on the fly and can aid protein engineering and research pipelines in a range of applications, including gene therapy.
Affiliation(s)
- Takahiro Nemoto
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
  - Graduate School of Informatics, Kyoto University, Yoshida Hon-machi, Sakyo-ku, Kyoto, 606-8501, Japan.
  - Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, 565-0871, Japan.
- Tommaso Ocari
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Arthur Planul
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Muge Tekinsoy
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Emilia A Zin
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Deniz Dalkara
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
- Ulisse Ferrari
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
|
25
|
Xi C, Diao J, Moon TS. Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, an increasing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
Affiliation(s)
- Chenggang Xi
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Jinjin Diao
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Tae Seok Moon
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA.
|
26
|
Hoskins I, Rao S, Tante C, Cenik C. Integrated multiplexed assays of variant effect reveal cis-regulatory determinants of catechol-O-methyltransferase gene expression. bioRxiv 2023:2023.08.02.551517. [PMID: 38014045 PMCID: PMC10680568 DOI: 10.1101/2023.08.02.551517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Multiplexed assays of variant effect are powerful methods to profile the consequences of rare variants on gene expression and organismal fitness. Yet, few studies have integrated several multiplexed assays to map variant effects on gene expression in coding sequences. Here, we pioneered a multiplexed assay based on polysome profiling to measure variant effects on translation at scale, uncovering single-nucleotide variants that increase and decrease ribosome load. By combining high-throughput ribosome load data with multiplexed mRNA and protein abundance readouts, we mapped the cis-regulatory landscape of thousands of catechol-O-methyltransferase (COMT) variants from RNA to protein and found numerous coding variants that alter COMT expression. Finally, we trained machine learning models to map signatures of variant effects on COMT gene expression and uncovered both directional and divergent impacts across expression layers. Our analyses reveal expression phenotypes for thousands of variants in COMT and highlight variant effects on both single and multiple layers of expression. Our findings prompt future studies that integrate several multiplexed assays for the readout of gene expression.
Affiliation(s)
- Ian Hoskins
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Shilpa Rao
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Charisma Tante
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Can Cenik
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
|
27
|
Padhy AA, Mavor D, Sahoo S, Bolon DNA, Mishra P. Systematic profiling of dominant ubiquitin variants reveals key functional nodes contributing to evolutionary selection. Cell Rep 2023; 42:113064. [PMID: 37656625 DOI: 10.1016/j.celrep.2023.113064] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Revised: 06/30/2023] [Accepted: 08/21/2023] [Indexed: 09/03/2023] Open
Abstract
Dominant-negative mutations can help to investigate the biological mechanisms and to understand the selective pressures for multifunctional proteins. However, most studies have focused on recessive mutant effects that occur in the absence of a second functional gene copy, which overlooks the fact that most eukaryotic genomes contain more than one copy of many genes. We have identified dominant effects on yeast growth rate among all possible point mutations in ubiquitin expressed alongside a wild-type allele. Our results reveal more than 400 dominant-negative mutations, indicating that dominant-negative effects make a sizable contribution to selection acting on ubiquitin. Cellular and biochemical analyses of individual ubiquitin variants show that dominant-negative effects are explained by varied accumulation of polyubiquitinated cellular proteins and/or defects in conjugation of ubiquitin variants to ubiquitin ligases. Our approach to identify dominant-negative mutations is general and can be applied to other proteins of interest.
Affiliation(s)
- Amrita Arpita Padhy
  - Department of Animal Biology, School of Life Sciences, University of Hyderabad, Telangana 500046, India
- David Mavor
  - Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Subhashree Sahoo
  - Department of Animal Biology, School of Life Sciences, University of Hyderabad, Telangana 500046, India
- Daniel N A Bolon
  - Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Medical School, Worcester, MA 01655, USA.
- Parul Mishra
  - Department of Animal Biology, School of Life Sciences, University of Hyderabad, Telangana 500046, India.
|
28
|
Smith MD, Case MA, Makowski EK, Tessier PM. Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data. Bioinformatics 2023; 39:btad446. [PMID: 37478351 PMCID: PMC10477941 DOI: 10.1093/bioinformatics/btad446] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 06/21/2023] [Accepted: 07/20/2023] [Indexed: 07/23/2023] Open
Abstract
MOTIVATION: Deep sequencing of antibody and related protein libraries after phage or yeast-surface display sorting is widely used to identify variants with increased affinity, specificity, and/or improvements in key biophysical properties. Conventional approaches for identifying optimal variants typically use the frequencies of observation in enriched libraries or the corresponding enrichment ratios. However, these approaches disregard the vast majority of deep sequencing data and often fail to identify the best variants in the libraries. RESULTS: Here, we present a method, Position-Specific Enrichment Ratio Matrix (PSERM) scoring, that uses entire deep sequencing datasets from pre- and post-selections to score each observed protein variant. The PSERM scores are the sum of the site-specific enrichment ratios observed at each mutated position. We find that PSERM scores are much more reproducible and correlate more strongly with experimentally measured properties than frequencies or enrichment ratios, including for multiple antibody properties (affinity and non-specific binding) for a clinical-stage antibody (emibetuzumab). We expect that this method will be broadly applicable to diverse protein engineering campaigns. AVAILABILITY AND IMPLEMENTATION: All deep sequencing datasets and code to perform the analyses presented within are available via https://github.com/Tessier-Lab-UMich/PSERM_paper.
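The scoring rule stated in the abstract above (a variant's score is the sum of site-specific enrichment ratios at its mutated positions) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is in the linked repository. It assumes aligned, equal-length sequences, a log2 enrichment ratio, and a pseudocount of 1; the function names `position_frequencies`, `enrichment_matrix`, and `pserm_score` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_frequencies(seqs, pseudocount=1.0):
    """Per-position amino-acid frequencies for aligned, equal-length sequences."""
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    freqs = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        freqs.append({aa: (counts.get(aa, 0) + pseudocount) / total
                      for aa in AMINO_ACIDS})
    return freqs

def enrichment_matrix(pre, post, pseudocount=1.0):
    """Assumed form: log2(post-selection frequency / pre-selection frequency)."""
    f_pre = position_frequencies(pre, pseudocount)
    f_post = position_frequencies(post, pseudocount)
    return [{aa: math.log2(f_post[i][aa] / f_pre[i][aa]) for aa in AMINO_ACIDS}
            for i in range(len(f_pre))]

def pserm_score(variant, wild_type, matrix):
    """Sum the enrichment ratios at positions where the variant differs from wild type."""
    return sum(matrix[i][aa]
               for i, (aa, wt) in enumerate(zip(variant, wild_type))
               if aa != wt)

# Toy libraries (made-up data, for illustration only).
pre_selection = ["ACD", "ACD", "AGD", "ACE"]
post_selection = ["AGD", "AGD", "AGD", "ACD"]
matrix = enrichment_matrix(pre_selection, post_selection)
for variant in ("AGD", "ACE", "AGE"):
    print(variant, pserm_score(variant, "ACD", matrix))
```

Because every sequence in both libraries contributes to the per-position frequencies, a variant seen only a few times can still be scored from the enrichment of its individual mutations, which is the property the abstract contrasts with per-variant frequencies.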
Affiliation(s)
- Matthew D Smith
  - Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109-2200, United States
- Marshall A Case
  - Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
- Emily K Makowski
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI 48109-2200, United States
- Peter M Tessier
  - Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Protein Folding Disease Initiative, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Michigan Alzheimer’s Disease Center, University of Michigan, Ann Arbor, MI 48109-2200, United States
|
29
|
McConnell A, Hackel BJ. Protein engineering via sequence-performance mapping. Cell Syst 2023; 14:656-666. [PMID: 37494931 PMCID: PMC10527434 DOI: 10.1016/j.cels.2023.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 05/10/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023]
Abstract
Discovery and evolution of new and improved proteins has empowered molecular therapeutics, diagnostics, and industrial biotechnology. Discovery and evolution both require efficient screens and effective libraries, although they differ in their challenges because of the absence or presence, respectively, of an initial protein variant with the desired function. A host of high-throughput technologies, experimental and computational, enable efficient screens to identify performant protein variants. In partnership, an informed search of sequence space is needed to overcome the immensity, sparsity, and complexity of the sequence-performance landscape. Early in the historical trajectory of protein engineering, these elements aligned with distinct approaches to identify the most performant sequence: selection from large, randomized combinatorial libraries versus rational computational design. Substantial advances have now emerged from the synergy of these perspectives. Rational design of combinatorial libraries aids the experimental search of sequence space, and high-throughput, high-integrity experimental data inform computational design. At the core of the collaborative interface, efficient protein characterization (rather than mere selection of optimal variants) maps sequence-performance landscapes. Such quantitative maps elucidate the complex relationships between protein sequence and performance (e.g., binding, catalytic efficiency, biological activity, and developability), thereby advancing fundamental protein science and facilitating protein discovery and evolution.
Affiliation(s)
- Adam McConnell
  - Department of Biomedical Engineering, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA
- Benjamin J Hackel
  - Department of Biomedical Engineering, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA; Department of Chemical Engineering and Materials Science, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA.
|
30
|
Haddox HK, Galloway JG, Dadonaite B, Bloom JD, Matsen IV FA, DeWitt WS. Jointly modeling deep mutational scans identifies shifted mutational effects among SARS-CoV-2 spike homologs. bioRxiv 2023:2023.07.31.551037. [PMID: 37577604 PMCID: PMC10418112 DOI: 10.1101/2023.07.31.551037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Deep mutational scanning (DMS) is a high-throughput experimental technique that measures the effects of thousands of mutations to a protein. These experiments can be performed on multiple homologs of a protein or on the same protein selected under multiple conditions. It is often of biological interest to identify mutations with shifted effects across homologs or conditions. However, it is challenging to determine if observed shifts arise from biological signal or experimental noise. Here, we describe a method for jointly inferring mutational effects across multiple DMS experiments while also identifying mutations that have shifted in their effects among experiments. A key aspect of our method is to regularize the inferred shifts, so that they are nonzero only when strongly supported by the data. We apply this method to DMS experiments that measure how mutations to spike proteins from SARS-CoV-2 variants (Delta, Omicron BA.1, and Omicron BA.2) affect cell entry. Most mutational effects are conserved between these spike homologs, but a fraction have markedly shifted. We experimentally validate a subset of the mutations inferred to have shifted effects, and confirm differences of > 1,000-fold in the impact of the same mutation on spike-mediated viral infection across spikes from different SARS-CoV-2 variants. Overall, our work establishes a general approach for comparing sets of DMS experiments to identify biologically important shifts in mutational effects.
Affiliation(s)
- Hugh K. Haddox
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Jared G. Galloway
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Bernadeta Dadonaite
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Jesse D. Bloom
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Frederick A. Matsen IV
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
- William S. DeWitt
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
|
31
|
Diabate M, Islam MM, Nagy G, Banerjee T, Dhar S, Smith N, Adamovich AI, Starita LM, Parvin JD. DNA repair function scores for 2172 variants in the BRCA1 amino-terminus. PLoS Genet 2023; 19:e1010739. [PMID: 37578980 PMCID: PMC10449183 DOI: 10.1371/journal.pgen.1010739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 08/24/2023] [Accepted: 07/16/2023] [Indexed: 08/16/2023] Open
Abstract
Single nucleotide variants are the most frequent type of sequence changes detected in the genome and these are frequently variants of uncertain significance (VUS). VUS are changes in DNA for which disease risk association is unknown. Thus, methods that classify the functional impact of a VUS can be used as evidence for variant interpretation. In the case of the breast and ovarian cancer specific tumor suppressor protein, BRCA1, pathogenic missense variants frequently score as loss of function in an assay for homology-directed repair (HDR) of DNA double-strand breaks. We previously published functional results using a multiplexed assay for 1056 amino acid substitutions at residues 2-192 in the amino terminus of BRCA1. In this study, we have re-assessed the data from this multiplexed assay using an improved analysis pipeline. These new analysis methods yield functional scores for more variants in the first 192 amino acids of BRCA1, plus we report new results for BRCA1 amino acid residues 193-302. We now present the functional classification of 2172 BRCA1 variants in BRCA1 residues 2-302 using the multiplexed HDR assay. Comparison of the functional determinations of the missense variants with clinically known benign or pathogenic variants indicated 93% sensitivity and 100% specificity for this assay. The results from BRCA1 variants tested in this assay are a resource for clinical geneticists for evidence to evaluate VUS in BRCA1.
Affiliation(s)
- Mariame Diabate
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Muhtadi M. Islam
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Gregory Nagy
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Tapahsama Banerjee
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Shruti Dhar
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Nahum Smith
  - The University of Washington, Department of Genome Sciences, Seattle, Washington, United States of America
  - Brotman Baty Institute for Precision Medicine, Seattle, Washington, United States of America
- Aleksandra I. Adamovich
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Lea M. Starita
  - The University of Washington, Department of Genome Sciences, Seattle, Washington, United States of America
  - Brotman Baty Institute for Precision Medicine, Seattle, Washington, United States of America
- Jeffrey D. Parvin
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
|
32
|
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023] Open
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Parimalan Rangan
  - ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Trushar Shah
  - International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
- Vivek Thakur
  - Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
- Sanjay Kalia
  - Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
- Sean Mayes
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Abhishek Rathore
  - Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
|
33
|
Smith MD, Case MA, Makowski EK, Tessier PM. Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data. bioRxiv 2023:2023.07.10.548448. [PMID: 37503142 PMCID: PMC10369870 DOI: 10.1101/2023.07.10.548448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Motivation: Deep sequencing of antibody and related protein libraries after phage or yeast-surface display sorting is widely used to identify variants with increased affinity, specificity and/or improvements in key biophysical properties. Conventional approaches for identifying optimal variants typically use the frequencies of observation in enriched libraries or the corresponding enrichment ratios. However, these approaches disregard the vast majority of deep sequencing data and often fail to identify the best variants in the libraries. Results: Here, we present a method, Position-Specific Enrichment Ratio Matrix (PSERM) scoring, that uses entire deep sequencing datasets from pre- and post-selections to score each observed protein variant. The PSERM scores are the sum of the site-specific enrichment ratios observed at each mutated position. We find that PSERM scores are much more reproducible and correlate more strongly with experimentally measured properties than frequencies or enrichment ratios, including for multiple antibody properties (affinity and non-specific binding) for a clinical-stage antibody (emibetuzumab). We expect that this method will be broadly applicable to diverse protein engineering campaigns. Availability: All deep sequencing datasets and code to do the analyses presented within are available via GitHub. Contact: Peter Tessier, ptessier@umich.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
|
34
|
Akkapeddi P, Hattori T, Khan I, Glasser E, Koide A, Ketavarapu G, Whaby M, Zuberi M, Teng KW, Lefler J, Maso L, Bang I, Ostrowski MC, O’Bryan JP, Koide S. Exploring switch II pocket conformation of KRAS(G12D) with mutant-selective monobody inhibitors. Proc Natl Acad Sci U S A 2023; 120:e2302485120. [PMID: 37399416 PMCID: PMC10334749 DOI: 10.1073/pnas.2302485120] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/13/2023] [Accepted: 05/26/2023] [Indexed: 07/05/2023] Open
Abstract
The G12D mutation is among the most common KRAS mutations associated with cancer, in particular, pancreatic cancer. Here, we have developed monobodies, small synthetic binding proteins, that are selective to KRAS(G12D) over KRAS(wild type) and other oncogenic KRAS mutations, as well as over the G12D mutation in HRAS and NRAS. Crystallographic studies revealed that, similar to other KRAS mutant-selective inhibitors, the initial monobody bound to the S-II pocket, the groove between switch II and α3 helix, and captured this pocket in the most widely open form reported to date. Unlike other G12D-selective polypeptides reported to date, the monobody used its backbone NH group to directly recognize the side chain of KRAS Asp12, a feature that closely resembles that of a small-molecule inhibitor, MRTX1133. The monobody also directly interacted with H95, a residue not conserved in RAS isoforms. These features rationalize the high selectivity toward the G12D mutant and the KRAS isoform. Structure-guided affinity maturation resulted in monobodies with low nM KD values. Deep mutational scanning of a monobody generated hundreds of functional and nonfunctional single-point mutants, which identified crucial residues for binding and those that contributed to the selectivity toward the GTP- and GDP-bound states. When expressed in cells as genetically encoded reagents, these monobodies engaged selectively with KRAS(G12D) and inhibited KRAS(G12D)-mediated signaling and tumorigenesis. These results further illustrate the plasticity of the S-II pocket, which may be exploited for the design of next-generation KRAS(G12D)-selective inhibitors.
Affiliation(s)
- Padma Akkapeddi
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
- Takamitsu Hattori
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
  - Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY 10016
- Imran Khan
  - Department of Cell and Molecular Pharmacology and Experimental Therapeutics, Hollings Cancer Center, Medical University of South Carolina, Charleston, SC 29425
  - Ralph H. Johnson VA Medical Center, Charleston, SC 29425
- Eliezra Glasser
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
- Akiko Koide
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
  - Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016
- Gayatri Ketavarapu
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
- Michael Whaby
  - Department of Cell and Molecular Pharmacology and Experimental Therapeutics, Hollings Cancer Center, Medical University of South Carolina, Charleston, SC 29425
  - Ralph H. Johnson VA Medical Center, Charleston, SC 29425
- Mariyam Zuberi
  - Department of Cell and Molecular Pharmacology and Experimental Therapeutics, Hollings Cancer Center, Medical University of South Carolina, Charleston, SC 29425
  - Ralph H. Johnson VA Medical Center, Charleston, SC 29425
- Kai Wen Teng
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
- Julia Lefler
  - Department of Biochemistry and Molecular Biology, Hollings Cancer Center, Medical University of South Carolina, Charleston, SC 29425
- Lorenzo Maso
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
- Injin Bang
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
- Michael C. Ostrowski
  - Department of Biochemistry and Molecular Biology, Hollings Cancer Center, Medical University of South Carolina, Charleston, SC 29425
- John P. O’Bryan
  - Department of Cell and Molecular Pharmacology and Experimental Therapeutics, Hollings Cancer Center, Medical University of South Carolina, Charleston, SC 29425
  - Ralph H. Johnson VA Medical Center, Charleston, SC 29425
- Shohei Koide
  - Laura and Isaac Perlmutter Cancer Center, New York University Langone Health, New York, NY 10016
  - Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY 10016
|
35
|
Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023] Open
Abstract
As the volume of protein sequence and structure data grows rapidly, the functions of the overwhelming majority of proteins cannot be experimentally determined. Automated annotation of protein function at a large scale is becoming increasingly important. Existing computational prediction methods are typically based on expanding the relatively small number of experimentally determined functions to large collections of proteins with various clues, including sequence homology, protein-protein interaction, gene co-expression, etc. Although there has been some progress in protein function prediction in recent years, the development of accurate and reliable solutions still has a long way to go. Here we exploit AlphaFold predicted three-dimensional structural information, together with other non-structural clues, to develop a large-scale approach termed PredGO to annotate Gene Ontology (GO) functions for proteins. We use a pre-trained language model, geometric vector perceptrons and attention mechanisms to extract heterogeneous features of proteins and fuse these features for function prediction. The computational results demonstrate that the proposed method outperforms other state-of-the-art approaches for predicting GO functions of proteins in terms of both coverage and accuracy. The improvement of coverage is because the number of structures predicted by AlphaFold is greatly increased, and on the other hand, PredGO can extensively use non-structural information for functional prediction. Moreover, we show that over 205 000 (~100%) entries in UniProt for human are annotated by PredGO, over 186 000 (~90%) of which are based on predicted structure. The webserver and database are available at http://predgo.denglab.org/.
Affiliation(s)
- Rongtao Zheng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
| | - Zhijian Huang
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
| | - Lei Deng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
| |
|
36
|
Fowler DM, Adams DJ, Gloyn AL, Hahn WC, Marks DS, Muffley LA, Neal JT, Roth FP, Rubin AF, Starita LM, Hurles ME. An Atlas of Variant Effects to understand the genome at nucleotide resolution. Genome Biol 2023; 24:147. [PMID: 37394429 PMCID: PMC10316620 DOI: 10.1186/s13059-023-02986-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 06/02/2023] [Accepted: 06/13/2023] [Indexed: 07/04/2023]
Abstract
Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.
Affiliation(s)
- Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- Department of Bioengineering, University of Washington, Seattle, WA USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA USA
| | | | - Anna L. Gloyn
- Department of Pediatrics & Department of Genetics, Division of Endocrinology, Stanford School of Medicine, Stanford University, Stanford, CA USA
| | - William C. Hahn
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA USA
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Debora S. Marks
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Department of Systems Biology, Harvard Medical School, Cambridge, USA
| | - Lara A. Muffley
- Department of Genome Sciences, University of Washington, Seattle, WA USA
| | - James T. Neal
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA USA
| | - Frederick P. Roth
- Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON Canada
| | - Alan F. Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC Australia
| | - Lea M. Starita
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- Department of Bioengineering, University of Washington, Seattle, WA USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA USA
| | | |
|
37
|
Konecki DM, Hamrick S, Wang C, Agosto MA, Wensel TG, Lichtarge O. CovET: A covariation-evolutionary trace method that identifies protein structure-function modules. J Biol Chem 2023; 299:104896. [PMID: 37290531 PMCID: PMC10338321 DOI: 10.1016/j.jbc.2023.104896] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/25/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/10/2023]
Abstract
Measuring the relative effect that any two sequence positions have on each other may improve protein design or help better interpret coding variants. Current approaches use statistics and machine learning but rarely consider phylogenetic divergences which, as shown by Evolutionary Trace studies, provide insight into the functional impact of sequence perturbations. Here, we reframe covariation analyses in the Evolutionary Trace framework to measure the relative tolerance to perturbation of each residue pair during evolution. This approach (CovET) systematically accounts for phylogenetic divergences: at each divergence event, we penalize covariation patterns that belie evolutionary coupling. We find that while CovET approximates the performance of existing methods to predict individual structural contacts, it performs significantly better at finding structural clusters of coupled residues and ligand binding sites. For example, CovET found more functionally critical residues when we examined the RNA recognition motif and WW domains. It correlates better with large-scale epistasis screen data. In the dopamine D2 receptor, top CovET residue pairs recovered accurately the allosteric activation pathway characterized for Class A G protein-coupled receptors. These data suggest that CovET ranks highest the sequence position pairs that play critical functional roles through epistatic and allosteric interactions in evolutionarily relevant structure-function motifs. CovET complements current methods and may shed light on fundamental molecular mechanisms of protein structure and function.
Affiliation(s)
- Daniel M Konecki
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Spencer Hamrick
- Chemical, Physical, and Structural Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Chen Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
| | - Theodore G Wensel
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Olivier Lichtarge
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas, USA.
| |
|
38
|
Limdi A, Baym M. Resolving Deleterious and Near-Neutral Effects Requires Different Pooled Fitness Assay Designs. J Mol Evol 2023; 91:325-333. [PMID: 37160452 DOI: 10.1007/s00239-023-10110-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 04/06/2023] [Indexed: 05/11/2023]
Abstract
Pooled sequencing-based fitness assays are a powerful and widely used approach to quantifying fitness of thousands of genetic variants in parallel. Despite the throughput of such assays, they are prone to biases in fitness estimates, and errors in measurements are typically larger for deleterious fitness effects, relative to neutral effects. In practice, designing pooled fitness assays involves tradeoffs between the number of timepoints, the sequencing depth, and other parameters to gain as much information as possible within a feasible experiment. Here, we combined simulations and reanalysis of an existing experimental dataset to explore how assay parameters impact measurements of near-neutral and deleterious fitness effects using a standard fitness estimator. We found that sequencing multiple timepoints at relatively modest depth improved estimates of near-neutral fitness effects, but systematically biased measurements of deleterious effects. We showed that, for a fixed total number of reads, deeper sequencing at fewer timepoints improved resolution of deleterious fitness effects. Our results highlight a tradeoff between measurement of deleterious and near-neutral effect sizes for a fixed amount of data and suggest that fitness assay design should be tuned for fitness effects that are relevant to the specific biological question.
Affiliation(s)
- Anurag Limdi
- Department of Biomedical Informatics and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
| | - Michael Baym
- Department of Biomedical Informatics and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA.
| |
|
39
|
Soneson C, Bendel AM, Diss G, Stadler MB. mutscan - a flexible R package for efficient end-to-end analysis of multiplexed assays of variant effect data. Genome Biol 2023; 24:132. [PMID: 37264470 DOI: 10.1186/s13059-023-02967-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 05/10/2023] [Indexed: 06/03/2023]
Abstract
Multiplexed assays of variant effect (MAVE) experimentally measure the effect of large numbers of sequence variants by selective enrichment of sequences with desirable properties followed by quantification by sequencing. mutscan is an R package for flexible analysis of such experiments, covering the entire workflow from raw reads up to statistical analysis and visualization. The core components are implemented in C++ for efficiency. Various experimental designs are supported, including single or paired reads with optional unique molecular identifiers. To find variants with changed relative abundance, mutscan employs established statistical models provided in the edgeR and limma packages. mutscan is available from https://github.com/fmicompbio/mutscan.
Affiliation(s)
- Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland.
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland.
| | - Alexandra M Bendel
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
| | - Guillaume Diss
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
| | - Michael B Stadler
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland.
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland.
- University of Basel, Basel, Switzerland.
| |
|
40
|
Liu GY, Jouandin P, Bahng RE, Perrimon N, Sabatini DM. An evolutionary mechanism to assimilate new nutrient sensors into the mTORC1 pathway. bioRxiv 2023:2023.05.25.541239. [PMID: 37292894 PMCID: PMC10245982 DOI: 10.1101/2023.05.25.541239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Animals must sense and respond to nutrient availability in their local niche. This task is coordinated in part by the mTOR complex 1 (mTORC1) pathway, which regulates growth and metabolism in response to nutrients [1-5]. In mammals, mTORC1 senses specific amino acids through specialized sensors that act through the upstream GATOR1/2 signaling hub [6-8]. To reconcile the conserved architecture of the mTORC1 pathway with the diversity of environments that animals can occupy, we hypothesized that the pathway might maintain plasticity by evolving distinct nutrient sensors in different metazoan phyla [1,9,10]. Whether such customization occurs, and how the mTORC1 pathway might capture new nutrient inputs, is not known. Here, we identify the Drosophila melanogaster protein Unmet expectations (Unmet, formerly CG11596) as a species-restricted nutrient sensor and trace its incorporation into the mTORC1 pathway. Upon methionine starvation, Unmet binds to the fly GATOR2 complex to inhibit dTORC1. S-adenosylmethionine (SAM), a proxy for methionine availability, directly relieves this inhibition. Unmet expression is elevated in the ovary, a methionine-sensitive niche [11], and flies lacking Unmet fail to maintain the integrity of the female germline under methionine restriction. By monitoring the evolutionary history of the Unmet-GATOR2 interaction, we show that the GATOR2 complex evolved rapidly in Dipterans to recruit and repurpose an independent methyltransferase as a SAM sensor. Thus, the modular architecture of the mTORC1 pathway allows it to co-opt preexisting enzymes and expand its nutrient sensing capabilities, revealing a mechanism for conferring evolvability on an otherwise highly conserved system.
Affiliation(s)
- Grace Y. Liu
- Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology, Department of Biology; 455 Main Street, Cambridge, Massachusetts 02142, USA
- Department of Biology, Massachusetts Institute of Technology; 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Koch Institute for Integrative Cancer Research and Massachusetts Institute of Technology, Department of Biology; 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
| | - Patrick Jouandin
- Department of Genetics, Blavatnik Institute, Harvard Medical School; Boston, MA 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA 02115, USA
- Present address: Institut de Recherche en Cancérologie de Montpellier, Inserm U1194-UM-ICM; Campus Val d’Aurelle, F-34298 Montpellier Cedex 5, France
| | - Raymond E. Bahng
- Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology, Department of Biology; 455 Main Street, Cambridge, Massachusetts 02142, USA
- Department of Biology, Massachusetts Institute of Technology; 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Koch Institute for Integrative Cancer Research and Massachusetts Institute of Technology, Department of Biology; 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
| | - Norbert Perrimon
- Department of Genetics, Blavatnik Institute, Harvard Medical School; Boston, MA 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA 02115, USA
| | | |
|
41
|
Gersing S, Schulze TK, Cagiada M, Stein A, Roth FP, Lindorff-Larsen K, Hartmann-Petersen R. Characterizing glucokinase variant mechanisms using a multiplexed abundance assay. bioRxiv 2023:2023.05.24.542036. [PMID: 37292969 PMCID: PMC10245906 DOI: 10.1101/2023.05.24.542036] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Amino acid substitutions can perturb protein activity in multiple ways. Understanding their mechanistic basis may pinpoint how residues contribute to protein function. Here, we characterize the mechanisms of human glucokinase (GCK) variants, building on our previous comprehensive study on GCK variant activity. We assayed the abundance of 95% of GCK missense and nonsense variants, and found that 43% of hypoactive variants have a decreased cellular abundance. By combining our abundance scores with predictions of protein thermodynamic stability, we identify residues important for GCK metabolic stability and conformational dynamics. These residues could be targeted to modulate GCK activity, and thereby affect glucose homeostasis.
Affiliation(s)
- Sarah Gersing
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen, Denmark
| | - Thea K. Schulze
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen, Denmark
| | - Matteo Cagiada
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen, Denmark
| | - Amelie Stein
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen, Denmark
| | - Frederick P. Roth
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada
| | - Kresten Lindorff-Larsen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen, Denmark
| | - Rasmus Hartmann-Petersen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen, Denmark
| |
|
42
|
Weinstein JY, Martí-Gómez C, Lipsh-Sokolik R, Hoch SY, Liebermann D, Nevo R, Weissman H, Petrovich-Kopitman E, Margulies D, Ivankov D, McCandlish DM, Fleishman SJ. Designed active-site library reveals thousands of functional GFP variants. Nat Commun 2023; 14:2890. [PMID: 37210560 DOI: 10.1038/s41467-023-38099-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 04/13/2023] [Indexed: 05/22/2023]
Abstract
Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins.
Affiliation(s)
| | - Carlos Martí-Gómez
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Rosalie Lipsh-Sokolik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Shlomo Yakir Hoch
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Demian Liebermann
- Department of Chemical and Biological Physics, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Reinat Nevo
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Haim Weissman
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | | | - David Margulies
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Dmitry Ivankov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel.
| |
|
43
|
Gersing S, Cagiada M, Gebbia M, Gjesing AP, Coté AG, Seesankar G, Li R, Tabet D, Weile J, Stein A, Gloyn AL, Hansen T, Roth FP, Lindorff-Larsen K, Hartmann-Petersen R. A comprehensive map of human glucokinase variant activity. Genome Biol 2023; 24:97. [PMID: 37101203 PMCID: PMC10131484 DOI: 10.1186/s13059-023-02935-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/14/2022] [Accepted: 04/10/2023] [Indexed: 04/28/2023]
Abstract
BACKGROUND: Glucokinase (GCK) regulates insulin secretion to maintain appropriate blood glucose levels. Sequence variants can alter GCK activity to cause hyperinsulinemic hypoglycemia or hyperglycemia associated with GCK-maturity-onset diabetes of the young (GCK-MODY), collectively affecting up to 10 million people worldwide. Patients with GCK-MODY are frequently misdiagnosed and treated unnecessarily. Genetic testing can prevent this but is hampered by the challenge of interpreting novel missense variants. RESULTS: Here, we exploit a multiplexed yeast complementation assay to measure both hyper- and hypoactive GCK variation, capturing 97% of all possible missense and nonsense variants. Activity scores correlate with in vitro catalytic efficiency, fasting glucose levels in carriers of GCK variants and with evolutionary conservation. Hypoactive variants are concentrated at buried positions, near the active site, and at a region of known importance for GCK conformational dynamics. Some hyperactive variants shift the conformational equilibrium towards the active state through a relative destabilization of the inactive conformation. CONCLUSION: Our comprehensive assessment of GCK variant activity promises to facilitate variant interpretation and diagnosis, expand our mechanistic understanding of hyperactive variants, and inform development of therapeutics targeting GCK.
Affiliation(s)
- Sarah Gersing
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
| | - Matteo Cagiada
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
| | - Marinella Gebbia
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
| | - Anette P Gjesing
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Atina G Coté
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
| | - Gireesh Seesankar
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
| | - Roujia Li
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada
| | - Daniel Tabet
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada
| | - Jochen Weile
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada
| | - Amelie Stein
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
| | - Anna L Gloyn
- Division of Endocrinology, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Diabetes Research Center, Stanford University, Stanford, CA, USA
| | - Torben Hansen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada.
| | - Kresten Lindorff-Larsen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark.
| | - Rasmus Hartmann-Petersen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark.
| |
|
44
|
Hoskins I, Sun S, Cote A, Roth FP, Cenik C. satmut_utils: a simulation and variant calling package for multiplexed assays of variant effect. Genome Biol 2023; 24:82. [PMID: 37081510 PMCID: PMC10116734 DOI: 10.1186/s13059-023-02922-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 04/04/2023] [Indexed: 04/22/2023]
Abstract
The impact of millions of individual genetic variants on molecular phenotypes in coding sequences remains unknown. Multiplexed assays of variant effect (MAVEs) are scalable methods to annotate relevant variants, but existing software lacks standardization, requires cumbersome configuration, and does not scale to large targets. We present satmut_utils as a flexible solution for simulation and variant quantification. We then benchmark MAVE software using simulated and real MAVE data. We finally determine mRNA abundance for thousands of cystathionine beta-synthase variants using two experimental methods. The satmut_utils package enables high-performance analysis of MAVEs and reveals the capability of variants to alter mRNA abundance.
Affiliation(s)
- Ian Hoskins
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
| | - Song Sun
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Atina Cote
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Frederick P Roth
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Can Cenik
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA.
| |
|
45
|
Diabate M, Islam MM, Nagy G, Banerjee T, Dhar S, Smith N, Adamovich AI, Starita LM, Parvin JD. DNA Repair Function Scores for 2172 Variants in the BRCA1 Amino-Terminus. bioRxiv 2023:2023.04.10.536331. [PMID: 37090572 PMCID: PMC10120616 DOI: 10.1101/2023.04.10.536331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Single nucleotide variants are the most frequent type of sequence changes detected in the genome and these are frequently variants of uncertain significance (VUS). VUS are changes in DNA for which disease risk association is unknown. Thus, methods that classify the functional impact of a VUS can be used as evidence for variant interpretation. In the case of the breast and ovarian cancer specific tumor suppressor protein, BRCA1, pathogenic missense variants frequently score as loss of function in an assay for homology-directed repair (HDR) of DNA double-strand breaks. We previously published functional results using a multiplexed assay for 1056 amino acid substitutions in residues 2-192 in the amino terminus of BRCA1. In this study, we have re-assessed the data from this multiplexed assay using an improved analysis pipeline. These new analysis methods yield functional scores for more variants in the first 192 amino acids of BRCA1, plus we report new results for BRCA1 amino acid residues 193-302. We now present the functional classification of 2172 BRCA1 variants in BRCA1 residues 2-302 using the multiplexed HDR assay. Comparison of the functional determinations of the missense variants with clinically known benign or pathogenic variants indicated 93% sensitivity and 100% specificity for this assay. The results from BRCA1 variants tested in this assay are a resource for clinical geneticists for evidence to evaluate VUS in BRCA1.
AUTHOR SUMMARY: Most missense substitutions in BRCA1 are variants of unknown significance (VUS), and individuals with a VUS in BRCA1 cannot know from genetic information alone whether this variant predisposes to breast or ovarian cancer. We apply a multiplexed functional assay for homology directed repair of DNA double strand breaks to assess variant impact on this important BRCA1 protein function. We analyzed 2172 variants in the amino-terminus of BRCA1 and demonstrate that variants that are known as pathogenic have a loss of function in the DNA repair assay. Conversely, variants that are known to be benign are functionally normal in the multiplexed assay. We suggest that these functional determinations of BRCA1 variants can be used to augment the information that clinical cancer geneticists provide to patients who have a VUS in BRCA1.
Affiliation(s)
- Mariame Diabate
- The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
| | - Muhtadi M Islam
- The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
| | - Gregory Nagy
- The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
| | - Tapahsama Banerjee
- The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
| | - Shruti Dhar
- The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
| | - Nahum Smith
- The University of Washington, Department of Genome Sciences, Seattle, WA 98195
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195
| | - Aleksandra I Adamovich
- The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
| | - Lea M Starita
- The University of Washington, Department of Genome Sciences, Seattle, WA 98195
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195
| | - Jeffrey D Parvin
- The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
|
46
|
Duan B, Qiu C, Sze SH, Kaplan C. Widespread epistasis shapes RNA Polymerase II active site function and evolution. bioRxiv 2023:2023.02.27.530048. [PMID: 36909581 PMCID: PMC10002619 DOI: 10.1101/2023.02.27.530048] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/04/2023]
Abstract
Multi-subunit RNA Polymerases (msRNAPs) are responsible for transcription in all kingdoms of life. At the heart of these msRNAPs is an ultra-conserved active site domain, the trigger loop (TL), coordinating transcription speed and fidelity by critical conformational changes impacting multiple steps in substrate selection, catalysis, and translocation. Previous studies have observed several different types of genetic interactions between eukaryotic RNA polymerase II (Pol II) TL residues, suggesting that the TL's function is shaped by functional interactions of residues within and around the TL. The extent of these interaction networks and how they control msRNAP function and evolution remain to be determined. Here we have dissected the Pol II TL interaction landscape by deep mutational scanning in Saccharomyces cerevisiae Pol II. Through analysis of over 15000 alleles, representing all single mutants, a rationally designed subset of double mutants, and evolutionarily observed TL haplotypes, we identify interaction networks controlling TL function. Substituting residues creates allele-specific networks and propagates epistatic effects across the Pol II active site. Furthermore, the interaction landscape distinguishes alleles with similar growth phenotypes, suggesting increased resolution over the previously reported single mutant phenotypic landscape. Finally, co-evolutionary analyses reveal that groups of co-evolving residues across Pol II converge onto the active site, where evolutionary constraints interface with pervasive epistasis. Our studies provide a powerful system to understand the plasticity of RNA polymerase mechanism and evolution, and provide the first example of a pervasive epistatic landscape in a highly conserved and constrained domain within an essential enzyme.
Affiliation(s)
- Bingbing Duan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
| | - Chenxi Qiu
- Department of Genetics, Harvard Medical School, Boston, MA 02215
| | - Sing-Hoi Sze
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843
| | - Craig Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
|
47
|
Meier G, Thavarasah S, Ehrenbolger K, Hutter CAJ, Hürlimann LM, Barandun J, Seeger MA. Deep mutational scan of a drug efflux pump reveals its structure-function landscape. Nat Chem Biol 2023; 19:440-450. [PMID: 36443574 PMCID: PMC7615509 DOI: 10.1038/s41589-022-01205-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/01/2021] [Accepted: 10/10/2022] [Indexed: 11/30/2022]
Abstract
Drug efflux is a common resistance mechanism found in bacteria and cancer cells, but studies providing comprehensive functional insights are scarce. In this study, we performed deep mutational scanning (DMS) on the bacterial ABC transporter EfrCD to determine the drug efflux activity profile of more than 1,430 single variants. These systematic measurements revealed that the introduction of negative charges at different locations within the large substrate binding pocket results in strongly increased efflux activity toward positively charged ethidium, whereas additional aromatic residues did not display the same effect. Data analysis in the context of an inward-facing cryogenic electron microscopy structure of EfrCD uncovered a high-affinity binding site, which releases bound drugs through a peristaltic transport mechanism as the transporter transits to its outward-facing conformation. Finally, we identified substitutions resulting in rapid Hoechst influx without affecting the efflux activity for ethidium and daunorubicin. Hence, single mutations can convert EfrCD into a drug-specific ABC importer.
Affiliation(s)
- Gianmarco Meier
- Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
| | - Sujani Thavarasah
- Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
| | - Kai Ehrenbolger
- Laboratory for Molecular Infection Medicine Sweden (MIMS), Department of Molecular Biology, Umeå Centre for Microbial Research, Umeå University, Umeå, Sweden
- Science for Life Laboratory, Umeå University, Umeå, Sweden
| | - Cedric A J Hutter
- Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
- Linkster Therapeutics AG, Zurich, Switzerland
| | - Lea M Hürlimann
- Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
- Linkster Therapeutics AG, Zurich, Switzerland
| | - Jonas Barandun
- Laboratory for Molecular Infection Medicine Sweden (MIMS), Department of Molecular Biology, Umeå Centre for Microbial Research, Umeå University, Umeå, Sweden
- Science for Life Laboratory, Umeå University, Umeå, Sweden
| | - Markus A Seeger
- Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland.
|
48
|
Cheng Y, Bi X, Xu Y, Liu Y, Li J, Du G, Lv X, Liu L. Machine learning for metabolic pathway optimization: A review. Comput Struct Biotechnol J 2023; 21:2381-2393. [PMID: 38213889 PMCID: PMC10781721 DOI: 10.1016/j.csbj.2023.03.045] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/15/2022] [Revised: 03/24/2023] [Accepted: 03/25/2023] [Indexed: 03/29/2023]
Abstract
Optimizing the metabolic pathways of microbial cell factories is essential for establishing viable biotechnological production processes. However, due to the limited understanding of the complex setup of cellular machinery, building efficient microbial cell factories remains tedious and time-consuming. Machine learning (ML), a powerful tool capable of identifying patterns within large datasets, has been used to analyze biological datasets generated using various high-throughput technologies to build data-driven models for complex bioprocesses. In addition, ML can also be integrated with Design-Build-Test-Learn to accelerate development. This review focuses on recent ML applications in genome-scale metabolic model construction, multistep pathway optimization, rate-limiting enzyme engineering, and gene regulatory element designing. In addition, we have discussed some limitations of these methods as well as potential solutions.
Affiliation(s)
- Yang Cheng
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
| | - Xinyu Bi
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
| | - Yameng Xu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
| | - Yanfeng Liu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
| | - Jianghua Li
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
| | - Guocheng Du
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
| | - Xueqin Lv
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
| | - Long Liu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
|
49
|
Chandra S, Manjunath K, Asok A, Varadarajan R. Mutational scan inferred binding energetics and structure in intrinsically disordered protein CcdA. Protein Sci 2023; 32:e4580. [PMID: 36714997 PMCID: PMC9951195 DOI: 10.1002/pro.4580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 01/02/2023] [Accepted: 01/25/2023] [Indexed: 01/31/2023]
Abstract
Unlike globular proteins, mutational effects on the function of Intrinsically Disordered Proteins (IDPs) are not well-studied. Deep Mutational Scanning of a yeast surface displayed mutant library yields insights into sequence-function relationships in the CcdA IDP. The approach enables facile prediction of interface residues and local structural signatures of the bound conformation. In contrast to previous titration-based approaches which use a number of ligand concentrations, we show that use of a single rationally chosen ligand concentration can provide quantitative estimates of relative binding constants for large numbers of protein variants. This is because the extended interface of IDP ensures that energetic effects of point mutations are spread over a much smaller range than for globular proteins. Our data also provides insights into the much-debated role of helicity and disorder in partner binding of IDPs. Based on this exhaustive mutational sensitivity dataset, a rudimentary model was developed in an attempt to predict mutational effects on binding affinity of IDPs that form alpha-helical structures upon binding.
Affiliation(s)
| | | | - Aparna Asok
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
|
50
|
Flynn J, Samant N, Schneider-Nachum G, Tenzin T, Bolon DNA. Mutational fitness landscape and drug resistance. Curr Opin Struct Biol 2023; 78:102525. [PMID: 36621152 PMCID: PMC10243218 DOI: 10.1016/j.sbi.2022.102525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 11/29/2022] [Accepted: 12/06/2022] [Indexed: 01/08/2023]
Abstract
Robust technology has been developed to systematically quantify fitness landscapes that provide valuable opportunities to improve our understanding of drug resistance and define new avenues to develop drugs with reduced resistance susceptibility. We outline the critical importance of drug resistance studies and the potential for fitness landscape approaches to contribute to this effort. We describe the major technical advancements in mutational scanning, which is the primary approach used to quantify protein fitness landscapes. There are many complex steps to consider in planning and executing mutational scanning projects including developing a selection scheme, generating mutant libraries, tracking the frequency of variants using next-generation sequencing, and processing and interpreting the data. Key experimental parameters impacting each of these steps are discussed to aid in planning fitness landscape studies. There is a strong need for improved understanding of drug resistance, and fitness landscapes provide a promising new approach.
Affiliation(s)
- Julia Flynn
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Neha Samant
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Gily Schneider-Nachum
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Tsepal Tenzin
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Daniel N A Bolon
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA.
|