1
McDonnell AF, Plech M, Livesey BJ, Gerasimavicius L, Owen LJ, Hall HN, FitzPatrick DR, Marsh JA, Kudla G. Deep mutational scanning quantifies DNA binding and predicts clinical outcomes of PAX6 variants. Mol Syst Biol 2024; 20:825-844. [PMID: 38849565 PMCID: PMC11219921 DOI: 10.1038/s44320-024-00043-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/04/2023] [Revised: 04/05/2024] [Accepted: 05/14/2024] [Indexed: 06/09/2024] Open
Abstract
Nonsense and missense mutations in the transcription factor PAX6 cause a wide range of eye development defects, including aniridia, microphthalmia and coloboma. To understand how changes of PAX6:DNA binding cause these phenotypes, we combined saturation mutagenesis of the paired domain of PAX6 with a yeast one-hybrid (Y1H) assay in which expression of a PAX6-GAL4 fusion gene drives antibiotic resistance. We quantified binding of more than 2700 single amino-acid variants to two DNA sequence elements. Mutations in DNA-facing residues of the N-terminal subdomain and linker region were most detrimental, as were mutations to prolines and to negatively charged residues. Many variants caused sequence-specific molecular gain-of-function effects, including variants in position 71 that increased binding to the LE9 enhancer but decreased binding to a SELEX-derived binding site. In the absence of antibiotic selection, variants that retained DNA binding slowed yeast growth, likely because such variants perturbed the yeast transcriptome. Benchmarking against known patient variants and applying ACMG/AMP guidelines to variant classification, we obtained supporting-to-moderate evidence that 977 variants are likely pathogenic and 1306 are likely benign. Our analysis shows that most pathogenic mutations in the paired domain of PAX6 can be explained simply by the effects of these mutations on PAX6:DNA association, and establishes Y1H as a generalisable assay for the interpretation of variant effects in transcription factors.
Affiliation(s)
- Alexander F McDonnell
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Marcin Plech
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Benjamin J Livesey
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Lukas Gerasimavicius
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Liusaidh J Owen
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Hildegard Nikki Hall
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- David R FitzPatrick
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Joseph A Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Grzegorz Kudla
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
2
Frenkel M, Raman S. Discovering mechanisms of human genetic variation and controlling cell states at scale. Trends Genet 2024; 40:587-600. [PMID: 38658256 DOI: 10.1016/j.tig.2024.03.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Revised: 03/29/2024] [Accepted: 03/29/2024] [Indexed: 04/26/2024]
Abstract
Population-scale sequencing efforts have catalogued substantial genetic variation in humans such that variant discovery dramatically outpaces interpretation. We discuss how single-cell sequencing is poised to reveal genetic mechanisms at a rate that may soon approach that of variant discovery. The functional genomics toolkit is sufficiently modular to systematically profile almost any type of variation within increasingly diverse contexts and with molecularly comprehensive and unbiased readouts. As a result, we can construct deep phenotypic atlases of variant effects that span the entire regulatory cascade. The same conceptual approach to interpreting genetic variation should be applied to engineering therapeutic cell states. In this way, variant mechanism discovery and cell state engineering will become reciprocating and iterative processes towards genomic medicine.
Affiliation(s)
- Max Frenkel
  - Cellular and Molecular Biology Graduate Program, University of Wisconsin, Madison, WI, USA; Medical Scientist Training Program, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA; Department of Biochemistry, University of Wisconsin, Madison, WI, USA
- Srivatsan Raman
  - Department of Biochemistry, University of Wisconsin, Madison, WI, USA; Department of Bacteriology, University of Wisconsin, Madison, WI, USA; Department of Chemical and Biological Engineering, University of Wisconsin, Madison, WI, USA
3
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
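The gauge freedom described in this abstract can be illustrated with a minimal sketch. It assumes a purely additive model over a four-letter alphabet and uses the zero-sum gauge, one commonly used choice; the function names and the toy parameters are hypothetical and are not taken from the paper.

```python
import itertools
import random

ALPHABET = "ACGT"

def predict(theta0, theta, seq):
    """Additive model: a constant plus one term per position and character."""
    return theta0 + sum(theta[pos][ch] for pos, ch in enumerate(seq))

def to_zero_sum_gauge(theta0, theta):
    """Shift each position's parameters to mean zero and absorb the shift into theta0."""
    new_theta0 = theta0
    new_theta = []
    for pos_params in theta:
        pos_mean = sum(pos_params.values()) / len(pos_params)
        new_theta0 += pos_mean
        new_theta.append({ch: v - pos_mean for ch, v in pos_params.items()})
    return new_theta0, new_theta

# Toy parameters (assumed for illustration only).
random.seed(0)
LENGTH = 3
theta0 = 1.5
theta = [{ch: random.uniform(-1, 1) for ch in ALPHABET} for _ in range(LENGTH)]

fixed0, fixed = to_zero_sum_gauge(theta0, theta)

# The parameter values change, but no prediction does.
max_diff = max(
    abs(predict(theta0, theta, seq) - predict(fixed0, fixed, seq))
    for seq in itertools.product(ALPHABET, repeat=LENGTH)
)
print(max_diff)
```

After fixing the gauge, each position's parameters sum to zero, so a single value can be read as an effect relative to the average character at that position.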
Affiliation(s)
- Anna Posfai
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
  - Department of Biology, University of Florida, Gainesville, FL, 32611
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
4
Bendel AM, Skendo K, Klein D, Shimada K, Kauneckaite-Griguole K, Diss G. Optimization of a deep mutational scanning workflow to improve quantification of mutation effects on protein-protein interactions. BMC Genomics 2024; 25:630. [PMID: 38914936 PMCID: PMC11194945 DOI: 10.1186/s12864-024-10524-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 06/14/2024] [Indexed: 06/26/2024] Open
Abstract
Deep Mutational Scanning (DMS) assays are powerful tools to study sequence-function relationships by measuring the effects of thousands of sequence variants on protein function. During a DMS experiment, several technical artefacts might non-linearly distort the functional score obtained, potentially biasing the interpretation of the results. We therefore tested several technical parameters in the deepPCA workflow, a DMS assay for protein-protein interactions, in order to identify technical sources of non-linearities. We found that parameters common to many DMS assays, such as amount of transformed DNA, timepoint of harvest and library composition, can cause non-linearities in the data. Designing experiments to minimize these non-linear effects will improve the quantification and interpretation of mutation effects.
Affiliation(s)
- Alexandra M Bendel
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Dominique Klein
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Kenji Shimada
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Kotryna Kauneckaite-Griguole
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Guillaume Diss
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
5
Kinnersley B, Sud A, Everall A, Cornish AJ, Chubb D, Culliford R, Gruber AJ, Lärkeryd A, Mitsopoulos C, Wedge D, Houlston R. Analysis of 10,478 cancer genomes identifies candidate driver genes and opportunities for precision oncology. Nat Genet 2024:10.1038/s41588-024-01785-9. [PMID: 38890488 DOI: 10.1038/s41588-024-01785-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 05/01/2024] [Indexed: 06/20/2024]
Abstract
Tumor genomic profiling is increasingly seen as a prerequisite to guide the treatment of patients with cancer. To explore the value of whole-genome sequencing (WGS) in broadening the scope of cancers potentially amenable to a precision therapy, we analysed whole-genome sequencing data on 10,478 patients spanning 35 cancer types recruited to the UK 100,000 Genomes Project. We identified 330 candidate driver genes, including 74 that are new to any cancer. We estimate that approximately 55% of patients studied harbor at least one clinically relevant mutation, predicting either sensitivity or resistance to certain treatments or clinical trial eligibility. By performing computational chemogenomic analysis of cancer mutations we identify additional targets for compounds that represent attractive candidates for future clinical trials. This study represents one of the most comprehensive efforts thus far to identify cancer driver genes in the real world setting and assess their impact on informing precision oncology.
Affiliation(s)
- Ben Kinnersley
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
  - University College London Cancer Institute, University College London, London, UK
- Amit Sud
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Andrew Everall
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Alex J Cornish
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Daniel Chubb
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Richard Culliford
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Andreas J Gruber
  - Systems Biology & Biomedical Data Science Laboratory, University of Konstanz, Konstanz, Germany
- Adrian Lärkeryd
  - Division of Molecular Pathology, The Institute of Cancer Research, London, UK
- Costas Mitsopoulos
  - Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
- David Wedge
  - Manchester Cancer Research Centre, University of Manchester, Manchester, UK
- Richard Houlston
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
6
Gao CY, Yang GY, Ding XW, Xu JH, Cheng X, Zheng GW, Chen Q. Engineering of Halide Methyltransferase BxHMT through Dynamic Cross-Correlation Network Analysis. Angew Chem Int Ed Engl 2024; 63:e202401235. [PMID: 38623716 DOI: 10.1002/anie.202401235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/18/2024] [Accepted: 04/15/2024] [Indexed: 04/17/2024]
Abstract
Halide methyltransferases (HMTs) provide an effective way to regenerate S-adenosyl methionine (SAM) from S-adenosyl homocysteine and reactive electrophiles, such as methyl iodide (MeI) and methyl toluene sulfonate (MeOTs). As compared with MeI, the cost-effective unnatural substrate MeOTs can be accessed directly from cheap and abundant alcohols, but shows only limited reactivity in SAM production. In this study, we developed a dynamic cross-correlation network analysis (DCCNA) strategy for quickly identifying hot spots influencing the catalytic efficiency of the enzyme, and applied it to the evolution of HMT from Paraburkholderia xenovorans. Finally, the optimal mutant, M4 (V55T/C125S/L127T/L129P), exhibited remarkable improvement, with a specific activity of 4.08 U/mg towards MeOTs, representing an 82-fold increase as compared to the wild-type (WT) enzyme. Notably, M4 also demonstrated a positive impact on the catalytic ability with other methyl donors. The structural mechanism behind the enhanced enzyme activity was uncovered by molecular dynamics simulations. Our work not only contributes a promising biocatalyst for the regeneration of SAM, but also offers a strategy for efficient enzyme engineering.
Affiliation(s)
- Chun-Yu Gao
  - State Key Laboratory of Bioreactor Engineering and Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Gui-Ying Yang
  - State Key Laboratory of Bioreactor Engineering and Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Xu-Wei Ding
  - State Key Laboratory of Bioreactor Engineering and Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Jian-He Xu
  - State Key Laboratory of Bioreactor Engineering and Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Xiaolin Cheng
  - Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, OH 43210, United States
- Gao-Wei Zheng
  - State Key Laboratory of Bioreactor Engineering and Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Qi Chen
  - State Key Laboratory of Bioreactor Engineering and Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
7
Chen SK, Liu J, Van Nynatten A, Tudor-Price BM, Chang BSW. Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods. J Mol Evol 2024:10.1007/s00239-024-10179-8. [PMID: 38886207 DOI: 10.1007/s00239-024-10179-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/20/2024]
Abstract
Empirical studies of genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, in elucidating the space of possible genotypes accessible through mutations in a landscape of phenotypes and fitness effects. Yet, comprehensively mapping molecular fitness landscapes remains challenging since all possible combinations of amino acid substitutions for even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate such multi-mutational studies, the size of MAVEs has grown to the point where a priori determination of sampling requirements is needed. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. For this, we show quantitatively that marginal gains in sampling efficiency demand increasingly greater sampling effort when sampling for nucleotide sequences over their encoded amino acid equivalents. We present a new library construction protocol that efficiently maximizes sequence variation, and demonstrate using ultradeep sequencing that the library encodes virtually all possible combinations of mutations within the experimental design. Insights learned from our analyses together with the methodological advances reported herein are immediately applicable toward pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
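The sampling question raised in this abstract can be made concrete with a rough back-of-the-envelope sketch. It assumes every variant in the library is equally likely to be drawn, which real libraries and codon-to-amino-acid mappings violate; the three-site design and the function names are hypothetical and do not reproduce the paper's calculations or simulations.

```python
import math

def expected_fraction_seen(n_variants, n_reads):
    """Expected fraction of equally likely variants observed at least once."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_reads

def reads_for_fraction(n_variants, target):
    """Smallest read count whose expected coverage reaches `target`.

    Requires n_variants >= 2 and 0 < target < 1.
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / n_variants))

def expected_reads_to_see_all(n_variants):
    """Coupon-collector expectation N * H_N for observing every variant."""
    return n_variants * sum(1.0 / k for k in range(1, n_variants + 1))

SITES = 3  # hypothetical design: three fully randomized positions
amino_acid_space = 20 ** SITES  # protein-level variants
codon_space = 64 ** SITES       # nucleotide-level variants

for target in (0.90, 0.99):
    print(
        target,
        reads_for_fraction(amino_acid_space, target),
        reads_for_fraction(codon_space, target),
    )
```

Under these assumptions, moving from 90% to 99% expected coverage roughly doubles the number of reads needed, and the nucleotide-level space needs far more reads than the amino acid space at the same coverage.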
Affiliation(s)
- Steven K Chen
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Jing Liu
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alexander Van Nynatten
  - Department of Biological Science, University of Toronto Scarborough, Toronto, ON, Canada
- Belinda S W Chang
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
8
Sun H, Zhao D, He Y, Meng HM, Li Z. Aptamer-Based DNA Allosteric Switch for Regulation of Protein Activity. Adv Sci (Weinh) 2024:e2402531. [PMID: 38864341 DOI: 10.1002/advs.202402531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 05/13/2024] [Indexed: 06/13/2024]
Abstract
Allostery is a fundamental way to regulate the function of biomolecules playing crucial roles in cell metabolism and proliferation and is deemed the second secret of life. Given the limited understanding of the structure of natural allosteric molecules, the development of artificial allosteric molecules brings a huge opportunity to transform the allosteric mechanism into practical applications. In this study, the concept of bionics is introduced into the design of artificial allosteric molecules and an allosteric DNA switch with an activity site and an allosteric site based on two aptamers for selective inhibition of thrombin activity. Compared with the single aptamer, the allosteric switch possesses a significantly enhanced inhibition ability, which can be precisely regulated by converting the switch states. Moreover, the dynamic allosteric switch is further subjected to the control of the DNA threshold circuit for realizing automatic concentration determination and activity inhibition of thrombin. These compelling results confirm that this allosteric switch equipped with self-sensing and information-processing modules puts a new slant on the research of allosteric mechanisms and further application of allosteric tactics in chemical and biomedical fields.
Affiliation(s)
- Hongzhi Sun
  - College of Chemistry, Institute of Analytical Chemistry for Life Science, Zhengzhou University, Zhengzhou, 450001, China
- Di Zhao
  - College of Chemistry, Institute of Analytical Chemistry for Life Science, Zhengzhou University, Zhengzhou, 450001, China
- Yating He
  - College of Chemistry, Institute of Analytical Chemistry for Life Science, Zhengzhou University, Zhengzhou, 450001, China
- Hong-Min Meng
  - College of Chemistry, Institute of Analytical Chemistry for Life Science, Zhengzhou University, Zhengzhou, 450001, China
- Zhaohui Li
  - College of Chemistry, Institute of Analytical Chemistry for Life Science, Zhengzhou University, Zhengzhou, 450001, China
  - The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China
9
Wang Y, Shen Z, Chen R, Chi X, Li W, Xu D, Lu Y, Ding J, Dong X, Zheng X. Discovery and characterization of novel FGFR1 inhibitors in triple-negative breast cancer via hybrid virtual screening and molecular dynamics simulations. Bioorg Chem 2024; 150:107553. [PMID: 38901279 DOI: 10.1016/j.bioorg.2024.107553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 06/08/2024] [Accepted: 06/08/2024] [Indexed: 06/22/2024]
Abstract
The overexpression of FGFR1 is thought to significantly contribute to the progression of triple-negative breast cancer (TNBC), impacting aspects such as tumorigenesis, growth, metastasis, and drug resistance. Consequently, the pursuit of effective inhibitors for FGFR1 is a key area of research interest. In response to this need, our study developed a hybrid virtual screening method that uses KarmaDock, an innovative algorithm that blends deep learning with molecular docking, alongside Schrödinger's Residue Scanning. This strategy led us to identify compound 6, which demonstrated promising FGFR1 inhibitory activity, evidenced by an IC50 value of approximately 0.24 nM in the HTRF bioassay. Further evaluation revealed that this compound also inhibits the FGFR1 V561M variant with an IC50 value around 1.24 nM. Our subsequent investigations demonstrate that compound 6 robustly suppresses the migration and invasion capacities of TNBC cell lines through the downregulation of p-FGFR1 and modulation of EMT markers, highlighting its promise as a potent anti-metastatic therapeutic agent. Additionally, our use of molecular dynamics simulations provided a deeper understanding of the compound's specific binding interactions with FGFR1.
Affiliation(s)
- Yuchen Wang
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou 310015, China; Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Zheyuan Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Roufen Chen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xinglong Chi
  - Affiliated Yongkang First People's Hospital and School of Pharmacy, Hangzhou Medical College, Hangzhou 310014, Zhejiang, China
- Wenjie Li
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou 310015, China
- Donghang Xu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Yan Lu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Jianjun Ding
  - School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Xiaowu Dong
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaoli Zheng
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou 310015, China
10
Liu Z, Gillis TG, Raman S, Cui Q. A parameterized two-domain thermodynamic model explains diverse mutational effects on protein allostery. eLife 2024; 12:RP92262. [PMID: 38836839 PMCID: PMC11152574 DOI: 10.7554/elife.92262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024] Open
Abstract
New experimental findings continue to challenge our understanding of protein allostery. A recent deep mutational scanning study showed that allosteric hotspots in the tetracycline repressor (TetR) and its homologous transcriptional factors are broadly distributed rather than spanning well-defined structural pathways as often assumed. Moreover, hotspot mutation-induced allostery loss was rescued by distributed additional mutations in a degenerate fashion. Here, we develop a two-domain thermodynamic model for TetR, which readily rationalizes these intriguing observations. The model accurately captures the in vivo activities of various mutants with changes in physically transparent parameters, allowing the data-based quantification of mutational effects using statistical inference. Our analysis reveals the intrinsic connection of intra- and inter-domain properties for allosteric regulation and illustrates epistatic interactions that are consistent with structural features of the protein. The insights gained from this study into the nature of two-domain allostery are expected to have broader implications for other multi-domain allosteric proteins.
Affiliation(s)
- Zhuang Liu
  - Department of Physics, Boston University, Boston, United States
- Thomas G Gillis
  - Department of Biochemistry, University of Wisconsin, Madison, United States
- Srivatsan Raman
  - Department of Biochemistry, University of Wisconsin, Madison, United States
  - Department of Chemistry, University of Wisconsin, Madison, United States
  - Department of Bacteriology, University of Wisconsin, Madison, United States
- Qiang Cui
  - Department of Physics, Boston University, Boston, United States
  - Department of Chemistry, Boston University, Boston, United States
11
Jänes J, Müller M, Selvaraj S, Manoel D, Stephenson J, Gonçalves C, Lafita A, Polacco B, Obernier K, Alasoo K, Lemos MC, Krogan N, Martin M, Saraiva LR, Burke D, Beltrao P. Predicted mechanistic impacts of human protein missense variants. bioRxiv 2024:2024.05.29.596373. [PMID: 38854010 PMCID: PMC11160786 DOI: 10.1101/2024.05.29.596373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Genome sequencing efforts have led to the discovery of tens of millions of protein missense variants found in the human population with the majority of these having no annotated role and some likely contributing to trait variation and disease. Sequence-based artificial intelligence approaches have become highly accurate at predicting variants that are detrimental to the function of proteins but they do not inform on mechanisms of disruption. Here we combined sequence and structure-based methods to perform proteome-wide prediction of deleterious variants with information on their impact on protein stability, protein-protein interactions and small-molecule binding pockets. AlphaFold2 structures were used to predict approximately 100,000 small-molecule binding pockets and stability changes for over 200 million variants. To inform on protein-protein interfaces we used AlphaFold2 to predict structures for nearly 500,000 protein complexes. We illustrate the value of mechanism-aware variant effect predictions to study the relation between protein stability and abundance and the structural properties of interfaces underlying trans protein quantitative trait loci (pQTLs). We characterised the distribution of mechanistic impacts of protein variants found in patients and experimentally studied example disease linked variants in FGFR1.
Collapse
Affiliation(s)
- Jürgen Jänes
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Müller
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Senthil Selvaraj
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - Diogo Manoel
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - James Stephenson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Catarina Gonçalves
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | | | - Benjamin Polacco
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
| | - Kirsten Obernier
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Manuel C. Lemos
- CICS-UBI, Health Sciences Research Centre, University of Beira Interior, 6200-506, Covilhã, Portugal
| | - Nevan Krogan
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- J. David Gladstone Institutes, San Francisco, CA, USA
| | - Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Luis R. Saraiva
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - David Burke
- Faculty of Life Sciences and Medicine, King’s College, London, UK
| | - Pedro Beltrao
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
|
12
|
Rao J, Xin R, Macdonald C, Howard MK, Estevam GO, Yee SW, Wang M, Fraser JS, Coyote-Maestas W, Pimentel H. Rosace: a robust deep mutational scanning analysis framework employing position and mean-variance shrinkage. Genome Biol 2024; 25:138. [PMID: 38789982 PMCID: PMC11127319 DOI: 10.1186/s13059-024-03279-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 05/14/2024] [Indexed: 05/26/2024] Open
Abstract
Deep mutational scanning (DMS) measures the effects of thousands of genetic variants in a protein simultaneously. The small sample size renders classical statistical methods ineffective. For example, p-values cannot be correctly calibrated when treating variants independently. We propose Rosace, a Bayesian framework for analyzing growth-based DMS data. Rosace leverages amino acid position information to increase power and control the false discovery rate by sharing information across parameters via shrinkage. We also developed Rosette for simulating the distributional properties of DMS. We show that Rosace is robust to the violation of model assumptions and is more powerful than existing tools.
Affiliation(s)
- Jingyou Rao
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Ruiqi Xin
- Computational and Systems Biology Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Christian Macdonald
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
| | - Matthew K Howard
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
- Tetrad Graduate Program, UCSF, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, UCSF, San Francisco, CA, USA
| | - Gabriella O Estevam
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
- Tetrad Graduate Program, UCSF, San Francisco, CA, USA
| | - Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
| | - Mingsen Wang
- Department of Mathematics, Baruch College, CUNY, New York, NY, USA
| | - James S Fraser
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA
- Quantitative Biosciences Institute, UCSF, San Francisco, CA, USA
| | - Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA.
- Quantitative Biosciences Institute, UCSF, San Francisco, CA, USA.
| | - Harold Pimentel
- Department of Computer Science, UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
|
13
|
Yee SW, Macdonald CB, Mitrovic D, Zhou X, Koleske ML, Yang J, Buitrago Silva D, Rockefeller Grimes P, Trinidad DD, More SS, Kachuri L, Witte JS, Delemotte L, Giacomini KM, Coyote-Maestas W. The full spectrum of SLC22 OCT1 mutations illuminates the bridge between drug transporter biophysics and pharmacogenomics. Mol Cell 2024; 84:1932-1947.e10. [PMID: 38703769 DOI: 10.1016/j.molcel.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 01/04/2024] [Accepted: 04/15/2024] [Indexed: 05/06/2024]
Abstract
Mutations in transporters can impact an individual's response to drugs and cause many diseases. Few variants in transporters have been evaluated for their functional impact. Here, we combine saturation mutagenesis and multi-phenotypic screening to dissect the impact of 11,213 missense, single-amino-acid deletion, and synonymous variants across the 554 residues of OCT1, a key liver xenobiotic transporter. By quantifying in parallel expression and substrate uptake, we find that most variants exert their primary effect on protein abundance, a phenotype not commonly measured alongside function. Using our mutagenesis results combined with structure prediction and molecular dynamics simulations, we develop accurate structure-function models of the entire transport cycle, providing biophysical characterization of all known and possible human OCT1 polymorphisms. This work provides a complete functional map of OCT1 variants along with a framework for integrating functional genomics, biophysical modeling, and human genetics to predict variant effects on disease and drug efficacy.
Affiliation(s)
- Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Christian B Macdonald
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Darko Mitrovic
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Stockholm, Stockholm County 114 28, Sweden
| | - Xujia Zhou
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Megan L Koleske
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Jia Yang
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Dina Buitrago Silva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Patrick Rockefeller Grimes
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Donovan D Trinidad
- Department of Medicine, Division of Infectious Disease, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Swati S More
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA 94305, USA; Stanford Cancer Institute, Stanford University, Stanford, CA 94305, USA
| | - John S Witte
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA 94305, USA; Stanford Cancer Institute, Stanford University, Stanford, CA 94305, USA
| | - Lucie Delemotte
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Stockholm, Stockholm County 114 28, Sweden.
| | - Kathleen M Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA.
| | - Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA; Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94143, USA; Chan Zuckerberg Biohub, San Francisco, CA 94148, USA.
|
14
|
Faure AJ, Lehner B, Miró Pina V, Serrano Colome C, Weghorn D. An extension of the Walsh-Hadamard transform to calculate and model epistasis in genetic landscapes of arbitrary shape and complexity. PLoS Comput Biol 2024; 20:e1012132. [PMID: 38805561 PMCID: PMC11161127 DOI: 10.1371/journal.pcbi.1012132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 06/07/2024] [Accepted: 05/04/2024] [Indexed: 05/30/2024] Open
Abstract
Accurate models describing the relationship between genotype and phenotype are necessary in order to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task, and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) is an open question. The Walsh-Hadamard transform represents a rigorous computational framework for calculating and modeling epistatic interactions at the level of individual genotypic values (known as genetical, biological or physiological epistasis), and can therefore be used to address fundamental questions related to sequence-to-function encodings. However, one of its main limitations is that it can only accommodate two alleles (amino acid or nucleotide states) per sequence position. In this paper we provide an extension of the Walsh-Hadamard transform that allows the calculation and modeling of background-averaged epistasis (also known as ensemble epistasis) in genetic landscapes with an arbitrary number of states per position (20 for amino acids, 4 for nucleotides, etc.). We also provide a recursive formula for the inverse matrix and then derive formulae to directly extract any element of either matrix without having to rely on the computationally intensive task of constructing or inverting large matrices. Finally, we demonstrate the utility of our theory by using it to model epistasis within both simulated and empirical multiallelic fitness landscapes, revealing that both pairwise and higher-order genetic interactions are enriched between physically interacting positions.
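As a rough illustration of the classical two-allele case that this abstract describes as the starting point, the sketch below applies an unnormalised fast Walsh-Hadamard transform to a toy two-locus landscape. It does not implement the multiallelic extension proposed in the paper. The function name `walsh_hadamard`, the genotype ordering (index bits mark which loci are mutated) and the phenotype values are assumptions made up for this example, and sign and scaling conventions for epistatic coefficients differ between authors.

```python
def walsh_hadamard(values):
    """Unnormalised fast Walsh-Hadamard transform of a power-of-two-length list."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h into blocks of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


# Hypothetical phenotypes for genotypes 00, 01, 10, 11 (illustrative values only).
phenotypes = [0.0, 1.0, 2.0, 5.0]
coefficients = walsh_hadamard(phenotypes)

# With this ordering, the last coefficient is y00 - y01 - y10 + y11,
# i.e. the deviation of the double mutant from additivity.
pairwise_term = coefficients[3]
```

In this toy landscape the two single mutants add up to 3.0 while the double mutant scores 5.0, so the pairwise term is 2.0. Applying the same function to the coefficients and dividing by the number of genotypes recovers the original phenotypes, which is a convenient sanity check.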
Affiliation(s)
- Andre J. Faure
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Verónica Miró Pina
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
| | - Claudia Serrano Colome
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
| | - Donate Weghorn
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
|
15
|
Hoskins I, Rao S, Tante C, Cenik C. Integrated multiplexed assays of variant effect reveal determinants of catechol-O-methyltransferase gene expression. Mol Syst Biol 2024; 20:481-505. [PMID: 38355921 PMCID: PMC11066095 DOI: 10.1038/s44320-024-00018-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 02/16/2024] Open
Abstract
Multiplexed assays of variant effect are powerful methods to profile the consequences of rare variants on gene expression and organismal fitness. Yet, few studies have integrated several multiplexed assays to map variant effects on gene expression in coding sequences. Here, we pioneered a multiplexed assay based on polysome profiling to measure variant effects on translation at scale, uncovering single-nucleotide variants that increase or decrease ribosome load. By combining high-throughput ribosome load data with multiplexed mRNA and protein abundance readouts, we mapped the cis-regulatory landscape of thousands of catechol-O-methyltransferase (COMT) variants from RNA to protein and found numerous coding variants that alter COMT expression. Finally, we trained machine learning models to map signatures of variant effects on COMT gene expression and uncovered both directional and divergent impacts across expression layers. Our analyses reveal expression phenotypes for thousands of variants in COMT and highlight variant effects on both single and multiple layers of expression. Our findings prompt future studies that integrate several multiplexed assays for the readout of gene expression.
Affiliation(s)
- Ian Hoskins
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
| | - Shilpa Rao
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
| | - Charisma Tante
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
| | - Can Cenik
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA.
|
16
|
Simon JJ, Fowler DM, Maly DJ. Multiplexed, multimodal profiling of the intracellular activity, interactions, and druggability of protein variants using LABEL-seq. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.19.590094. [PMID: 38659825 PMCID: PMC11042325 DOI: 10.1101/2024.04.19.590094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Multiplexed assays of variant effect are powerful tools for assessing the impact of protein sequence variation, but are limited to measuring a single protein property and often rely on indirect readouts of intracellular protein function. Here, we developed LAbeling with Barcodes and Enrichment for biochemicaL analysis by sequencing (LABEL-seq), a platform for the multimodal profiling of thousands of protein variants in cultured human cells. Multimodal measurement of ~20,000 variant effects for ~1,600 BRaf variants using LABEL-seq revealed that variation at positions that are frequently mutated in cancer had minimal effects on folding and intracellular abundance but could dramatically alter activity, protein-protein interactions, and druggability. Integrative analysis of our multimodal measurements identified networks of positions with similar roles in regulating BRaf's signaling properties and enabled predictive modeling of variant effects on complex processes such as cell proliferation and small molecule-promoted degradation. LABEL-seq provides a scalable approach for the direct measurement of multiple biochemical effects of protein variants in their native cellular context, yielding insight into protein function, disease mechanisms, and druggability.
Affiliation(s)
- Jessica J Simon
- Department of Chemistry, University of Washington, Seattle, WA, United States
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Department of Bioengineering, University of Washington, Seattle, WA, United States
- Co-corresponding author
| | - Dustin J Maly
- Department of Chemistry, University of Washington, Seattle, WA, United States
- Department of Biochemistry, University of Washington, Seattle, WA, United States
- Co-corresponding author
|
17
|
Zarin T, Lehner B. A complete map of specificity encoding for a partially fuzzy protein interaction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.25.591103. [PMID: 38712134 PMCID: PMC11071492 DOI: 10.1101/2024.04.25.591103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Thousands of human proteins function by binding short linear motifs embedded in intrinsically disordered regions. How affinity and specificity are encoded in these binding domains and the motifs themselves is not well understood. The evolvability of binding specificity - how rapidly and extensively it can change upon mutation - is also largely unexplored, as is the contribution of 'fuzzy' dynamic residues to affinity and specificity in protein-protein interactions. Here we report the first complete map of specificity encoding for a globular protein domain. Quantifying >200,000 energetic interactions between a PDZ domain and its ligand identifies 20 major energetically coupled pairs of sites that control specificity. These are organized into six modules, with most mutations in each module reprogramming specificity for a single position in the ligand. Nine of the major energetic couplings controlling specificity are between structural contacts and 11 have an allosteric mechanism of action. The dynamic tail of the ligand is more robust to mutation than the structured residues but contributes additively to binding affinity and communicates with structured residues to enable changes in specificity. Our results quantify the binding specificities of >1,800 globular proteins to reveal how specificity is encoded and provide a direct comparison of the encoding of affinity and specificity in structured and dynamic molecular recognition.
Affiliation(s)
- Taraneh Zarin
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Wellcome Sanger Institute, Cambridge, UK
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
18
|
Howard MK, Hoppe N, Huang XP, Macdonald CB, Mehrota E, Grimes PR, Zahm A, Trinidad DD, English J, Coyote-Maestas W, Manglik A. Molecular basis of proton-sensing by G protein-coupled receptors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.17.590000. [PMID: 38659943 PMCID: PMC11042331 DOI: 10.1101/2024.04.17.590000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Three proton-sensing G protein-coupled receptors (GPCRs), GPR4, GPR65, and GPR68, respond to changes in extracellular pH to regulate diverse physiology and are implicated in a wide range of diseases. A central challenge in determining how protons activate these receptors is identifying the set of residues that bind protons. Here, we determine structures of each receptor to understand the spatial arrangement of putative proton-sensing residues in the active state. With a newly developed deep mutational scanning approach, we determined the functional importance of every residue in proton activation for GPR68 by generating ~9,500 mutants and measuring effects on signaling and surface expression. This unbiased screen revealed that, unlike other proton-sensitive cell surface channels and receptors, no single site is critical for proton recognition in GPR68. Instead, a network of titratable residues extends from the extracellular surface to the transmembrane region and converges on canonical class A GPCR activation motifs to activate proton-sensing GPCRs. More broadly, our approach integrating structure and unbiased functional interrogation defines a new framework for understanding the rich complexity of GPCR signaling.
Affiliation(s)
- Matthew K. Howard
- Tetrad graduate program, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Science, University of California, San Francisco, CA, USA
| | - Nicholas Hoppe
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Biophysics graduate program, University of California, San Francisco, CA, USA
| | - Xi-Ping Huang
- Department of Pharmacology and the National Institute of Mental Health Psychoactive Drug Screening Program (NIMH PDSP), The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Christian B. Macdonald
- Department of Bioengineering and Therapeutic Science, University of California, San Francisco, CA, USA
| | - Eshan Mehrota
- Tetrad graduate program, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Medical Scientist Training Program, University of California, San Francisco, CA, USA
| | | | - Adam Zahm
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Donovan D. Trinidad
- Department of Medicine, Division of Infectious Disease, University of California, San Francisco, United States
| | - Justin English
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Science, University of California, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, USA
| | - Aashish Manglik
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, USA
- Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA, USA
|
19
|
Nguyen TN, Ingle C, Thompson S, Reynolds KA. The genetic landscape of a metabolic interaction. Nat Commun 2024; 15:3351. [PMID: 38637543 PMCID: PMC11026382 DOI: 10.1038/s41467-024-47671-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 04/09/2024] [Indexed: 04/20/2024] Open
Abstract
While much prior work has explored the constraints on protein sequence and evolution induced by physical protein-protein interactions, the sequence-level constraints emerging from non-binding functional interactions in metabolism remain unclear. To quantify how variation in the activity of one enzyme constrains the biochemical parameters and sequence of another, we focus on dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS), a pair of enzymes catalyzing consecutive reactions in folate metabolism. We use deep mutational scanning to quantify the growth rate effect of 2696 DHFR single mutations in 3 TYMS backgrounds under conditions selected to emphasize biochemical epistasis. Our data are well-described by a relatively simple enzyme velocity to growth rate model that quantifies how metabolic context tunes enzyme mutational tolerance. Together our results reveal the structural distribution of epistasis in a metabolic enzyme and establish a foundation for the design of multi-enzyme systems.
Affiliation(s)
- Thuy N Nguyen
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Form Bio, Dallas, TX, 75226, USA
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
| | - Samuel Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94158, USA
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
| | - Kimberly A Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
|
20
|
Gersing S, Schulze TK, Cagiada M, Stein A, Roth FP, Lindorff-Larsen K, Hartmann-Petersen R. Characterizing glucokinase variant mechanisms using a multiplexed abundance assay. Genome Biol 2024; 25:98. [PMID: 38627865 PMCID: PMC11021015 DOI: 10.1186/s13059-024-03238-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 04/04/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND: Amino acid substitutions can perturb protein activity in multiple ways. Understanding their mechanistic basis may pinpoint how residues contribute to protein function. Here, we characterize the mechanisms underlying variant effects in human glucokinase (GCK) variants, building on our previous comprehensive study on GCK variant activity. RESULTS: Using a yeast growth-based assay, we score the abundance of 95% of GCK missense and nonsense variants. When combining the abundance scores with our previously determined activity scores, we find that 43% of hypoactive variants also decrease cellular protein abundance. The low-abundance variants are enriched in the large domain, while residues in the small domain are tolerant to mutations with respect to abundance. Instead, many variants in the small domain perturb GCK conformational dynamics, which are essential for appropriate activity. CONCLUSIONS: In this study, we identify residues important for GCK metabolic stability and conformational dynamics. These residues could be targeted to modulate GCK activity, and thereby affect glucose homeostasis.
Affiliation(s)
- Sarah Gersing
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark.
| | - Thea K Schulze
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark
| | - Matteo Cagiada
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark
| | - Amelie Stein
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, M5S 3E1, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, M5G 1X5, Toronto, ON, Canada
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, 15213, Pittsburgh, USA
| | - Kresten Lindorff-Larsen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark.
| | - Rasmus Hartmann-Petersen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark.
|
21
|
Prywes N, Philips NR, Oltrogge LM, Lindner S, Candace Tsai YC, de Pins B, Cowan AE, Taylor-Kearney LJ, Chang HA, Hall LN, Bellieny-Rabelo D, Nisonoff HM, Weissman RF, Flamholz AI, Ding D, Bhatt AY, Shih PM, Mueller-Cajar O, Milo R, Savage DF. A map of the rubisco biochemical landscape. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.27.559826. [PMID: 38645011 PMCID: PMC11030240 DOI: 10.1101/2023.09.27.559826] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 04/23/2024]
Abstract
Rubisco is the primary CO2-fixing enzyme of the biosphere, yet it has slow kinetics. The roles of evolution and chemical mechanism in constraining the sequence landscape of rubisco remain debated. In order to map sequence to function, we developed a massively parallel assay for rubisco using an engineered E. coli where enzyme function is coupled to growth. By assaying >99% of single amino acid mutants across CO2 concentrations, we inferred enzyme velocity and CO2 affinity for thousands of substitutions. We identified many highly conserved positions that tolerate mutation and rare mutations that improve CO2 affinity. These data suggest that non-trivial kinetic improvements are readily accessible and provide a comprehensive sequence-to-function mapping for enzyme engineering efforts.
Affiliation(s)
- Noam Prywes
- Innovative Genomics Institute, University of California; Berkeley, California 94720, USA
- Howard Hughes Medical Institute, University of California; Berkeley, California 94720, USA
| | - Naiya R. Philips
- Department of Molecular and Cell Biology, University of California; Berkeley, California 94720, USA
| | - Luke M. Oltrogge
- Howard Hughes Medical Institute, University of California; Berkeley, California 94720, USA
- Department of Molecular and Cell Biology, University of California; Berkeley, California 94720, USA
| | | | - Yi-Chin Candace Tsai
- School of Biological Sciences, Nanyang Technological University; Singapore 637551, Singapore
| | - Benoit de Pins
- Department of Plant and Environmental Sciences, Weizmann Institute of Science; Rehovot 76100, Israel
| | - Aidan E. Cowan
- Department of Molecular and Cell Biology, University of California; Berkeley, California 94720, USA
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory; Emeryville, CA 94608, USA
| | - Leah J. Taylor-Kearney
- Department of Plant and Microbial Biology, University of California, Berkeley; Berkeley, CA 94720, USA
| | - Hana A. Chang
- Department of Plant and Microbial Biology, University of California, Berkeley; Berkeley, CA 94720, USA
| | - Laina N. Hall
- Biophysics, University of California, Berkeley; Berkeley, CA 94720, USA
| | - Daniel Bellieny-Rabelo
- Innovative Genomics Institute, University of California; Berkeley, California 94720, USA
- California Institute for Quantitative Biosciences (QB3), University of California; Berkeley, CA 94720, USA
| | - Hunter M. Nisonoff
- Center for Computational Biology, University of California, Berkeley; Berkeley, CA, USA
| | - Rachel F. Weissman
- Department of Molecular and Cell Biology, University of California; Berkeley, California 94720, USA
| | - Avi I. Flamholz
- Division of Biology and Biological Engineering, California Institute of Technology; Pasadena, CA 91125
| | - David Ding
- Innovative Genomics Institute, University of California; Berkeley, California 94720, USA
- Howard Hughes Medical Institute, University of California; Berkeley, California 94720, USA
| | - Abhishek Y. Bhatt
- Department of Molecular and Cell Biology, University of California; Berkeley, California 94720, USA
- School of Medicine, University of California, San Diego; La Jolla, CA 92092, USA
| | - Patrick M. Shih
- Innovative Genomics Institute, University of California; Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley; Berkeley, CA 94720, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory; Berkeley, CA 94720, USA
- Feedstocks Division, Joint BioEnergy Institute; Emeryville, CA 94608, USA
| | - Oliver Mueller-Cajar
- School of Biological Sciences, Nanyang Technological University; Singapore 637551, Singapore
| | - Ron Milo
- Department of Plant and Environmental Sciences, Weizmann Institute of Science; Rehovot 76100, Israel
- David F. Savage
- Innovative Genomics Institute, University of California; Berkeley, California 94720, USA
- Howard Hughes Medical Institute, University of California; Berkeley, California 94720, USA
- Department of Molecular and Cell Biology, University of California; Berkeley, California 94720, USA
22
Nerín-Fonz F, Cournia Z. Machine learning approaches in predicting allosteric sites. Curr Opin Struct Biol 2024; 85:102774. [PMID: 38354652 DOI: 10.1016/j.sbi.2024.102774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 12/29/2023] [Accepted: 01/02/2024] [Indexed: 02/16/2024]
Abstract
Allosteric regulation is a fundamental biological mechanism that can control critical cellular processes via allosteric modulator binding to protein distal functional sites. The advantages of allosteric modulators over orthosteric ones have sparked the development of numerous computational approaches, such as the identification of allosteric binding sites, to facilitate allosteric drug discovery. Building on the success of machine learning (ML) models for solving complex problems in biology and chemistry, several ML models for predicting allosteric sites have been developed. In this review, we provide an overview of these models and discuss future perspectives powered by the field of artificial intelligence such as protein language models.
Affiliation(s)
- Francho Nerín-Fonz
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephesiou, Athens 11527, Greece; Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria.
- Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephesiou, Athens 11527, Greece; Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria.
23
Dibyachintan S, Dube AK, Bradley D, Lemieux P, Dionne U, Landry CR. Cryptic genetic variation shapes the fate of gene duplicates in a protein interaction network. bioRxiv 2024:2024.02.23.581840. [PMID: 38464075 PMCID: PMC10925128 DOI: 10.1101/2024.02.23.581840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Paralogous genes are often redundant for long periods of time before they diverge in function. While their functions are preserved, paralogous proteins can accumulate mutations that, through epistasis, could impact their fate in the future. By quantifying the impact of all single-amino acid substitutions on the binding of two myosin proteins to their interaction partners, we find that the future evolution of these proteins is highly contingent on their regulatory divergence and the mutations that have silently accumulated in their protein binding domains. Differences in the promoter strength of the two paralogs amplify the impact of mutations on binding in the lowly expressed one. While some mutations would be sufficient to non-functionalize one paralog, they would have minimal impact on the other. Our results reveal how functionally equivalent protein domains could be destined to specific fates by regulatory and cryptic coding sequence changes that currently have little to no functional impact.
Affiliation(s)
- Soham Dibyachintan
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Alexandre K Dube
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Département de Biologie, Université Laval, Québec, QC, Canada
- David Bradley
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Département de Biologie, Université Laval, Québec, QC, Canada
- Pascale Lemieux
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Ugo Dionne
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Current affiliation: Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Christian R Landry
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Département de Biologie, Université Laval, Québec, QC, Canada
24
Kohlmayr JM, Grabner GF, Nusser A, Höll A, Manojlović V, Halwachs B, Masser S, Jany-Luig E, Engelke H, Zimmermann R, Stelzl U. Mutational scanning pinpoints distinct binding sites of key ATGL regulators in lipolysis. Nat Commun 2024; 15:2516. [PMID: 38514628 PMCID: PMC10958042 DOI: 10.1038/s41467-024-46937-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 03/14/2024] [Indexed: 03/23/2024] Open
Abstract
ATGL is a key enzyme in intracellular lipolysis and plays an important role in metabolic and cardiovascular diseases. ATGL is tightly regulated by a known set of protein-protein interaction partners with activating or inhibiting functions in the control of lipolysis. Here, we use deep mutational protein interaction perturbation scanning and generate comprehensive profiles of single amino acid variants that affect the interactions of ATGL with its regulatory partners: CGI-58, G0S2, PLIN1, PLIN5 and CIDEC. Twenty-three ATGL amino acid variants yield a specific interaction perturbation pattern when validated in co-immunoprecipitation experiments in mammalian cells. We identify and characterize eleven highly selective ATGL switch mutations which affect the interaction of one of the five partners without affecting the others. Switch mutations thus provide distinct interaction determinants for ATGL's key regulatory proteins at an amino acid resolution. When we test triglyceride hydrolase activity in vitro and lipolysis in cells, the activity patterns of the ATGL switch variants trace to their protein interaction profile. In the context of structural data, the integration of variant binding and activity profiles provides insights into the regulation of lipolysis and the impact of mutations in human disease.
Affiliation(s)
- Johanna M Kohlmayr
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Gernot F Grabner
- Institute of Molecular Biosciences, Biochemistry, University of Graz, Graz, Austria
- Gottfried Schatz Research Center, Molecular Biology and Biochemistry, Medical University of Graz, Graz, Austria
- Anna Nusser
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Anna Höll
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Verina Manojlović
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Bettina Halwachs
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Field of Excellence BioHealth - University of Graz, Graz, Austria
- Sarah Masser
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- BioTechMed-Graz, Graz, Austria
- Evelyne Jany-Luig
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Hanna Engelke
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Field of Excellence BioHealth - University of Graz, Graz, Austria
- Robert Zimmermann
- Institute of Molecular Biosciences, Biochemistry, University of Graz, Graz, Austria
- Field of Excellence BioHealth - University of Graz, Graz, Austria
- BioTechMed-Graz, Graz, Austria
- Ulrich Stelzl
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria.
- Field of Excellence BioHealth - University of Graz, Graz, Austria.
- BioTechMed-Graz, Graz, Austria.
25
Gelman S, Johnson B, Freschlin C, D'Costa S, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. bioRxiv 2024:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Bryce Johnson
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Sameer D'Costa
- Department of Biochemistry, University of Wisconsin-Madison
- Anthony Gitter
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
26
Xiao Z, Zha J, Yang X, Huang T, Huang S, Liu Q, Wang X, Zhong J, Zheng J, Liang R, Deng Z, Zhang J, Lin S, Dai S. A three-level regulatory mechanism of the aldo-keto reductase subfamily AKR12D. Nat Commun 2024; 15:2128. [PMID: 38459030 PMCID: PMC10923870 DOI: 10.1038/s41467-024-46363-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 02/23/2024] [Indexed: 03/10/2024] Open
Abstract
Modulation of protein function through allosteric regulation is central in biology, but biomacromolecular systems involving multiple subunits and ligands may exhibit complex regulatory mechanisms at different levels, which remain poorly understood. Here, we discover an aldo-keto reductase termed AKRtyl and present its three-level regulatory mechanism. Specifically, by combining steady-state and transient kinetics, X-ray crystallography and molecular dynamics simulation, we demonstrate that AKRtyl exhibits a positive synergy mediated by an unusual Monod-Wyman-Changeux (MWC) paradigm of allosteric regulation at low concentrations of the cofactor NADPH, but an inhibitory effect at high concentrations, while the substrate tylosin binds at a remote allosteric site with positive cooperativity. We further reveal that these regulatory mechanisms are conserved in the AKR12D subfamily, and that substrate cooperativity is common in AKRs across three kingdoms of life. This work provides an intriguing example for understanding complex allosteric regulatory networks.
Affiliation(s)
- Zhihong Xiao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Jinyin Zha
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xu Yang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Tingting Huang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Shuxin Huang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Qi Liu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Xiaozheng Wang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Jie Zhong
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Jianting Zheng
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Rubing Liang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Zixin Deng
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Jian Zhang
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Shuangjun Lin
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
- Haihe Laboratory of Synthetic Biology, Tianjin, 300308, China.
- Frontiers Science Center for Transformative Molecules, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shaobo Dai
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
27
Seitz EE, McCandlish DM, Kinney JB, Koo PK. Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models. bioRxiv 2024:2023.11.14.567120. [PMID: 38013993 PMCID: PMC10680760 DOI: 10.1101/2023.11.14.567120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models, i.e., simpler models that are mechanistically interpretable. Importantly, SQUID removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements. SQUID thus advances the ability to mechanistically interpret genomic DNNs.
Affiliation(s)
- Evan E Seitz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
28
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Cent Sci 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
29
Ding D, Shaw AY, Sinai S, Rollins N, Prywes N, Savage DF, Laub MT, Marks DS. Protein design using structure-based residue preferences. Nat Commun 2024; 15:1639. [PMID: 38388493 PMCID: PMC10884402 DOI: 10.1038/s41467-024-45621-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 01/29/2024] [Indexed: 02/24/2024] Open
Abstract
Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
Affiliation(s)
- David Ding
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA.
- Ada Y Shaw
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Sam Sinai
- Dyno Therapeutics, Watertown, MA, 02472, USA
- Nathan Rollins
- Seismic Therapeutics, Lab Central, Cambridge, MA, 02142, USA
- Noam Prywes
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- David F Savage
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, 94720, USA
- Michael T Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA.
30
Wei Y, Chen AX, Lin Y, Wei T, Qiao B. Allosteric regulation in SARS-CoV-2 spike protein. Phys Chem Chem Phys 2024; 26:6582-6589. [PMID: 38329233 DOI: 10.1039/d4cp00106k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Allosteric regulation is common in protein-protein interactions and is thus promising in drug design. Previous experimental and simulation work supported the presence of allosteric regulation in the SARS-CoV-2 spike protein. Here the route of allosteric regulation in SARS-CoV-2 spike protein is examined by all-atom explicit solvent molecular dynamics simulations, contrastive machine learning, and the Ohm approach. It was found that peptide binding to the polybasic cleavage sites, especially the one at the first subunit of the trimeric spike protein, activates the fluctuation of the spike protein's backbone, which eventually propagates to the receptor-binding domain on the third subunit that binds to ACE2. Remarkably, the allosteric regulation routes starting from the polybasic cleavage sites share a high fraction (39-67%) of the critical amino acids with the routes starting from the nitrogen-terminal domains, suggesting the presence of an allosteric regulation network in the spike protein. Our study paves the way for the rational design of allosteric antibody inhibitors.
Affiliation(s)
- Yong Wei
- Department of Computer Science, High Point University, High Point, NC 27268, USA
- Amy X Chen
- Thomas Jefferson High School for Science and Technology, Alexandria, VA 22312, USA
- Yuewei Lin
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- Tao Wei
- Department of Chemical Engineering and Department of Biomedical Engineering, University of South Carolina, Columbia, SC 29208, USA.
- Baofu Qiao
- Department of Natural Sciences, Baruch College, City University of New York, New York, NY 10010, USA.
31
Nam K, Shao Y, Major DT, Wolf-Watz M. Perspectives on Computational Enzyme Modeling: From Mechanisms to Design and Drug Development. ACS Omega 2024; 9:7393-7412. [PMID: 38405524 PMCID: PMC10883025 DOI: 10.1021/acsomega.3c09084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/15/2024] [Accepted: 01/19/2024] [Indexed: 02/27/2024]
Abstract
Understanding enzyme mechanisms is essential for unraveling the complex molecular machinery of life. In this review, we survey the field of computational enzymology, highlighting key principles governing enzyme mechanisms and discussing ongoing challenges and promising advances. Over the years, computer simulations have become indispensable in the study of enzyme mechanisms, with the integration of experimental and computational exploration now established as a holistic approach to gain deep insights into enzymatic catalysis. Numerous studies have demonstrated the power of computer simulations in characterizing reaction pathways, transition states, substrate selectivity, product distribution, and dynamic conformational changes for various enzymes. Nevertheless, significant challenges remain in investigating the mechanisms of complex multistep reactions, large-scale conformational changes, and allosteric regulation. Beyond mechanistic studies, computational enzyme modeling has emerged as an essential tool for computer-aided enzyme design and the rational discovery of covalent drugs for targeted therapies. Overall, enzyme design/engineering and covalent drug development can greatly benefit from our understanding of the detailed mechanisms of enzymes, such as protein dynamics, entropy contributions, and allostery, as revealed by computational studies. Such a convergence of different research approaches is expected to continue, creating synergies in enzyme research. This review, by outlining the ever-expanding field of enzyme research, aims to provide guidance for future research directions and facilitate new developments in this important and evolving field.
Affiliation(s)
- Kwangho Nam
- Department of Chemistry and Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
- Yihan Shao
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019-5251, United States
- Dan T. Major
- Department of Chemistry and Institute for Nanotechnology & Advanced Materials, Bar-Ilan University, Ramat-Gan 52900, Israel
32
Liu Z, Gillis T, Raman S, Cui Q. A parametrized two-domain thermodynamic model explains diverse mutational effects on protein allostery. bioRxiv 2024:2023.08.06.552196. [PMID: 37662419 PMCID: PMC10473640 DOI: 10.1101/2023.08.06.552196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
New experimental findings continue to challenge our understanding of protein allostery. A recent deep mutational scanning study showed that allosteric hotspots in the tetracycline repressor (TetR) and its homologous transcriptional factors are broadly distributed rather than spanning well-defined structural pathways as often assumed. Moreover, hotspot mutation-induced allostery loss was rescued by distributed additional mutations in a degenerate fashion. Here, we develop a two-domain thermodynamic model for TetR, which readily rationalizes these intriguing observations. The model accurately captures the in vivo activities of various mutants with changes in physically transparent parameters, allowing the data-based quantification of mutational effects using statistical inference. Our analysis reveals the intrinsic connection of intra- and inter-domain properties for allosteric regulation and illustrates epistatic interactions that are consistent with structural features of the protein. The insights gained from this study into the nature of two-domain allostery are expected to have broader implications for other multidomain allosteric proteins.
Affiliation(s)
- Zhuang Liu
- Department of Physics, Boston University, Boston, United States
- Thomas Gillis
- Department of Biochemistry, University of Wisconsin, Madison, United States
- Srivatsan Raman
- Department of Biochemistry, University of Wisconsin, Madison, United States
- Department of Chemistry, University of Wisconsin, Madison, United States
- Department of Bacteriology, University of Wisconsin, Madison, United States
- Qiang Cui
- Department of Physics, Boston University, Boston, United States
- Department of Chemistry, Boston University, Boston, United States
33
Sesta L, Pagnani A, Fernandez-de-Cossio-Diaz J, Uguzzoni G. Inference of annealed protein fitness landscapes with AnnealDCA. PLoS Comput Biol 2024; 20:e1011812. [PMID: 38377054 PMCID: PMC10878520 DOI: 10.1371/journal.pcbi.1011812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The design of proteins with specific tasks is a major challenge in molecular biology with important diagnostic and therapeutic applications. High-throughput screening methods have been developed to systematically evaluate protein activity, but only a small fraction of possible protein variants can be tested using these techniques. Computational models that explore the sequence space in silico to identify the fittest molecules for a given function are needed to overcome this limitation. In this article, we propose AnnealDCA, a machine-learning framework to learn the protein fitness landscape from sequencing data derived from a broad range of experiments that use selection and sequencing to quantify protein activity. We demonstrate the effectiveness of our method by applying it to antibody Rep-Seq data of immunized mice and screening experiments, assessing the quality of the fitness landscape reconstructions. Our method can be applied to several experimental cases where a population of protein variants undergoes various rounds of selection and sequencing, without relying on the computation of variant enrichment ratios, and thus can be used even in cases of disjoint sequence samples.
Affiliation(s)
- Luca Sesta
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
| | - Andrea Pagnani
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
- Italian Institute for Genomic Medicine, Torino, Italy
- INFN, Sezione di Torino, Torino, Italy
| | | | | |
|
34
|
Tee WV, Berezovsky IN. Allosteric drugs: New principles and design approaches. Curr Opin Struct Biol 2024; 84:102758. [PMID: 38171188 DOI: 10.1016/j.sbi.2023.102758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
Focusing on an important biomedical implication of allostery - design of allosteric drugs, we describe characteristics of allosteric sites, effectors, and their modes of actions distinguishing them from the orthosteric counterparts and calling for new principles and protocols in the quests for allosteric drugs. We show the importance of considering both binding affinity and allosteric signaling in establishing the structure-activity relationships (SARs) toward design of allosteric effectors, arguing that pairs of allosteric sites and their effector ligands - the site-effector pairs - should be generated and adjusted simultaneously in the framework of what we call directed design protocol. Key ideas and approaches for designing allosteric effectors including reverse perturbation, targeted and agnostic analysis are also discussed here. Several promising computational approaches are highlighted, along with the need for and potential advantages of utilizing generative models to facilitate discovery/design of new allosteric drugs.
Affiliation(s)
- Wei-Ven Tee
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671.
| | - Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore.
| |
|
35
|
Weng C, Faure AJ, Escobedo A, Lehner B. The energetic and allosteric landscape for KRAS inhibition. Nature 2024; 626:643-652. [PMID: 38109937 PMCID: PMC10866706 DOI: 10.1038/s41586-023-06954-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 12/07/2023] [Indexed: 12/20/2023]
Abstract
Thousands of proteins have been validated genetically as therapeutic targets for human diseases. However, very few have been successfully targeted, and many are considered 'undruggable'. This is particularly true for proteins that function via protein-protein interactions: direct inhibition of binding interfaces is difficult and requires the identification of allosteric sites. However, most proteins have no known allosteric sites, and a comprehensive allosteric map does not exist for any protein. Here we address this shortcoming by charting multiple global atlases of inhibitory allosteric communication in KRAS. We quantified the effects of more than 26,000 mutations on the folding of KRAS and its binding to six interaction partners. Genetic interactions in double mutants enabled us to perform biophysical measurements at scale, inferring more than 22,000 causal free energy changes. These energy landscapes quantify how mutations tune the binding specificity of a signalling protein and map the inhibitory allosteric sites for an important therapeutic target. Allosteric propagation is particularly effective across the central β-sheet of KRAS, and multiple surface pockets are genetically validated as allosterically active, including a distal pocket in the C-terminal lobe of the protein. Allosteric mutations typically inhibit binding to all tested effectors, but they can also change the binding specificity, revealing the regulatory, evolutionary and therapeutic potential to tune pathway activation. Using the approach described here, it should be possible to rapidly and comprehensively identify allosteric target sites in many proteins.
Affiliation(s)
- Chenchun Weng
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Andre J Faure
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Albert Escobedo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- University Pompeu Fabra (UPF), Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
| |
|
36
|
Deng J, Yuan Y, Cui Q. Modulation of Allostery with Multiple Mechanisms by Hotspot Mutations in TetR. J Am Chem Soc 2024; 146:2757-2768. [PMID: 38231868 PMCID: PMC10843641 DOI: 10.1021/jacs.3c12494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Modulating allosteric coupling offers unique opportunities for biomedical applications. Such efforts can benefit from efficient prediction and evaluation of allostery hotspot residues that dictate the degree of cooperativity between distant sites. We demonstrate that effects of allostery hotspot mutations can be evaluated qualitatively and semiquantitatively by molecular dynamics simulations in a bacterial tetracycline repressor (TetR). The simulations recapitulate the effects of these mutations on abolishing the induction function of TetR and provide a rationale for the different rescuabilities observed to restore allosteric coupling of the hotspot mutations. We demonstrate that the same noninducible phenotype could be the result of perturbations in distinct structural and energetic properties of TetR. Our work underscores the value of explicitly computing the functional free energy landscapes to effectively evaluate and rank hotspot mutations despite the prevalence of compensatory interactions and therefore provides quantitative guidance to allostery modulation for therapeutic and engineering applications.
Affiliation(s)
- Jiahua Deng
- Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
| | - Yuchen Yuan
- Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
| | - Qiang Cui
- Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
- Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, Massachusetts 02215, United States
| |
|
37
|
He J, Liu X, Zhu C, Zha J, Li Q, Zhao M, Wei J, Li M, Wu C, Wang J, Jiao Y, Ning S, Zhou J, Hong Y, Liu Y, He H, Zhang M, Chen F, Li Y, He X, Wu J, Lu S, Song K, Lu X, Zhang J. ASD2023: towards the integrating landscapes of allosteric knowledgebase. Nucleic Acids Res 2024; 52:D376-D383. [PMID: 37870448 PMCID: PMC10767950 DOI: 10.1093/nar/gkad915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 09/22/2023] [Accepted: 10/06/2023] [Indexed: 10/24/2023] Open
Abstract
Allosteric regulation, induced by perturbations at an allosteric site topographically distinct from the orthosteric site, is one of the most direct and efficient ways to fine-tune macromolecular function. The Allosteric Database (ASD; accessible online at http://mdl.shsmu.edu.cn/ASD) has been systematically developed since 2009 to provide comprehensive information on allosteric regulation. In recent years, allostery has seen sustained growth and wide-ranging applications in life sciences, from basic research to new therapeutics development, while also elucidating emerging obstacles across allosteric research stages. To overcome these challenges and maintain high-quality data center services, novel features were curated in the ASD2023 update: (i) 66 589 potential allosteric sites, covering > 80% of the human proteome and constituting the human allosteric pocketome; (ii) 748 allosteric protein-protein interaction (PPI) modulators with clear mechanisms, aiding protein machine studies and PPI-targeted drug discovery; (iii) 'Allosteric Hit-to-Lead,' a pioneering dataset providing panoramic views from 87 well-defined allosteric hits to 6565 leads and (iv) 456 dualsteric modulators for exploring the simultaneous regulation of allosteric and orthosteric sites. Meanwhile, ASD2023 maintains a significant growth of foundational allosteric data. Based on these efforts, the allosteric knowledgebase is progressively evolving towards an integrated landscape, facilitating advancements in allosteric target identification, mechanistic exploration and drug discovery.
Affiliation(s)
- Jixiao He
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Xinyi Liu
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Chunhao Zhu
- College of Pharmacy, Ningxia Medical University, 1160 Shengli Street, Yinchuan, Ningxia 750004, China
| | - Jinyin Zha
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Qian Li
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Mingzhu Zhao
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Jiacheng Wei
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Mingyu Li
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Chengwei Wu
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200011, China
| | - Junyuan Wang
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200011, China
| | - Yonglai Jiao
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Shaobo Ning
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Jiamin Zhou
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200011, China
| | - Yue Hong
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Yonghui Liu
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Hongxi He
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200011, China
| | - Mingyang Zhang
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Feiying Chen
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Yanxiu Li
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Xinheng He
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Jing Wu
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Shaoyong Lu
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Kun Song
- Nutshell Therapeutics, Shanghai 201210, China
| | - Xuefeng Lu
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200011, China
| | - Jian Zhang
- State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- College of Pharmacy, Ningxia Medical University, 1160 Shengli Street, Yinchuan, Ningxia 750004, China
- School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou 450001, China
| |
|
38
|
Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin EA, Dalkara D, Ferrari U. ACIDES: on-line monitoring of forward genetic screens for protein engineering. Nat Commun 2023; 14:8504. [PMID: 38148337 PMCID: PMC10751290 DOI: 10.1038/s41467-023-43967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023] Open
Abstract
Forward genetic screens of mutated variants are a versatile strategy for protein engineering and investigation, which has been successfully applied to various studies like directed evolution (DE) and deep mutational scanning (DMS). While next-generation sequencing can track millions of variants during the screening rounds, the vast and noisy nature of the sequencing data impedes the estimation of the performance of individual variants. Here, we propose ACIDES that combines statistical inference and in-silico simulations to improve performance estimation in the library selection process by attributing accurate statistical scores to individual variants. We tested ACIDES first on a random-peptide-insertion experiment and then on multiple public datasets from DE and DMS studies. ACIDES allows experimentalists to reliably estimate variant performance on the fly and can aid protein engineering and research pipelines in a range of applications, including gene therapy.
Affiliation(s)
- Takahiro Nemoto
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
- Graduate School of Informatics, Kyoto University, Yoshida Hon-machi, Sakyo-ku, Kyoto, 606-8501, Japan.
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, 565-0871, Japan.
| | - Tommaso Ocari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Arthur Planul
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Muge Tekinsoy
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Emilia A Zin
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Deniz Dalkara
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
| | - Ulisse Ferrari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
| |
|
39
|
Freiberger MI, Ruiz-Serra V, Pontes C, Romero-Durana M, Galaz-Davison P, Ramírez-Sarmiento CA, Schuster CD, Marti MA, Wolynes PG, Ferreiro DU, Parra RG, Valencia A. Local energetic frustration conservation in protein families and superfamilies. Nat Commun 2023; 14:8379. [PMID: 38104123 PMCID: PMC10725452 DOI: 10.1038/s41467-023-43801-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/23/2023] [Accepted: 11/20/2023] [Indexed: 12/19/2023] Open
Abstract
Energetic local frustration offers a biophysical perspective to interpret the effects of sequence variability on protein families. Here we present a methodology to analyze local frustration patterns within protein families and superfamilies that allows us to uncover constraints related to stability and function, and identify differential frustration patterns in families with a common ancestry. We analyze these signals in very well studied protein families such as PDZ, SH3, α and β globins and RAS families. Recent advances in protein structure prediction make it possible to analyze a vast majority of the protein space. An automatic and unsupervised proteome-wide analysis on the SARS-CoV-2 virus demonstrates the potential of our approach to enhance our understanding of the natural phenotypic diversity of protein families beyond single protein instances. We apply our method to modify biophysical properties of natural proteins based on their family properties, as well as perform unsupervised analysis of large datasets to shed light on the physicochemical signatures of poorly characterized proteins such as the ones belonging to emergent pathogens.
Affiliation(s)
- Maria I Freiberger
- Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, C1428EGA, Argentina
| | - Victoria Ruiz-Serra
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
| | - Camila Pontes
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
| | - Miguel Romero-Durana
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
| | - Pablo Galaz-Davison
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine, and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, 7820436, Chile
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Santiago, 8331150, Chile
| | - Cesar A Ramírez-Sarmiento
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine, and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, 7820436, Chile
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Santiago, 8331150, Chile
| | - Claudio D Schuster
- Laboratorio de Bioinformática, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA, Buenos Aires, Argentina
| | - Marcelo A Marti
- Laboratorio de Bioinformática, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA, Buenos Aires, Argentina
| | - Peter G Wolynes
- Center for Theoretical Biological Physics and Department of Chemistry, Rice University, Houston, TX, 77005, USA
| | - Diego U Ferreiro
- Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, C1428EGA, Argentina
| | - R Gonzalo Parra
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain.
| | - Alfonso Valencia
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
| |
|
40
|
Deng J, Yuan Y, Cui Q. Modulation of Allostery with Multiple Mechanisms by Hotspot Mutations in TetR. bioRxiv 2023:2023.08.29.555381. [PMID: 37905112 PMCID: PMC10614727 DOI: 10.1101/2023.08.29.555381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Modulating allosteric coupling offers unique opportunities for biomedical applications. Such efforts can benefit from efficient prediction and evaluation of allostery hotspot residues that dictate the degree of co-operativity between distant sites. We demonstrate that effects of allostery hotspot mutations can be evaluated qualitatively and semi-quantitatively by molecular dynamics simulations in a bacterial tetracycline repressor (TetR). The simulations recapitulate the effects of these mutations on abolishing the induction function of TetR and provide a rationale for the different degrees of rescuability observed to restore allosteric coupling of the hotspot mutations. We demonstrate that the same non-inducible phenotype could be the result of perturbations in distinct structural and energetic properties of TetR. Our work underscores the value of explicitly computing the functional free energy landscapes to effectively evaluate and rank hotspot mutations despite the prevalence of compensatory interactions, and therefore provides quantitative guidance to allostery modulation for therapeutic and engineering applications.
Affiliation(s)
- Jiahua Deng
- Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
| | - Yuchen Yuan
- Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
| | - Qiang Cui
- Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
- Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, Massachusetts 02215, United States
| |
|
41
|
Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. bioRxiv 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) in a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures and model predictions, and develop a user-friendly website that facilitates data access and analysis.
Affiliation(s)
| | | | | | | | | | | | | | - Ada Shaw
- Applied Mathematics, Harvard University
| | | | | | - Mafalda Dias
- Centre for Genomic Regulation, Universitat Pompeu Fabra
| | | | | | - Yarin Gal
- Computer Science, University of Oxford
| | | |
|
42
|
Zhang Z, Lamson AR, Shelley M, Troyanskaya O. Interpretable neural architecture search and transfer learning for understanding CRISPR-Cas9 off-target enzymatic reactions. Nat Comput Sci 2023; 3:1056-1066. [PMID: 38177723 DOI: 10.1038/s43588-023-00569-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2023] [Accepted: 11/08/2023] [Indexed: 01/06/2024]
Abstract
Finely tuned enzymatic pathways control cellular processes, and their dysregulation can lead to disease. Developing predictive and interpretable models for these pathways is challenging because of the complexity of the pathways and of the cellular and genomic contexts. Here we introduce Elektrum, a deep learning framework that addresses these challenges with data-driven and biophysically interpretable models for determining the kinetics of biochemical systems. First, it uses in vitro kinetic assays to rapidly hypothesize an ensemble of high-quality kinetically interpretable neural networks (KINNs) that predict reaction rates. It then employs a transfer learning step, where the KINNs are inserted as intermediary layers into deeper convolutional neural networks, fine-tuning the predictions for reaction-dependent in vivo outcomes. We apply Elektrum to predict CRISPR-Cas9 off-target editing probabilities and demonstrate that Elektrum achieves improved performance, regularizes neural network architectures and maintains physical interpretability.
Affiliation(s)
- Zijun Zhang
- Division of Artificial Intelligence in Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Adam R Lamson
- Center for Computational Biology, Flatiron Institute, New York City, NY, USA
- Michael Shelley
- Center for Computational Biology, Flatiron Institute, New York City, NY, USA.
- Courant Institute of Mathematical Sciences, New York University, New York City, NY, USA.
- Olga Troyanskaya
- Center for Computational Biology, Flatiron Institute, New York City, NY, USA.
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
43
Papkou A, Garcia-Pastor L, Escudero JA, Wagner A. A rugged yet easily navigable fitness landscape. Science 2023; 382:eadh3860. [PMID: 37995212 DOI: 10.1126/science.adh3860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 09/29/2023] [Indexed: 11/25/2023]
Abstract
Fitness landscape theory predicts that rugged landscapes with multiple peaks impair Darwinian evolution, but experimental evidence is limited. In this study, we used genome editing to map the fitness of >260,000 genotypes of the key metabolic enzyme dihydrofolate reductase in the presence of the antibiotic trimethoprim, which targets this enzyme. The resulting landscape is highly rugged and harbors 514 fitness peaks. However, its highest peaks are accessible to evolving populations via abundant fitness-increasing paths. Different peaks share large basins of attraction that render the outcome of adaptive evolution highly contingent on chance events. Our work shows that ruggedness need not be an obstacle to Darwinian evolution but can reduce its predictability. If true in general, the complexity of optimization problems on realistic landscapes may require reappraisal.
Affiliation(s)
- Andrei Papkou
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Lucia Garcia-Pastor
- Departamento de Sanidad Animal and VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
- José Antonio Escudero
- Departamento de Sanidad Animal and VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, NM, USA
44
Hoskins I, Rao S, Tante C, Cenik C. Integrated multiplexed assays of variant effect reveal cis-regulatory determinants of catechol-O-methyltransferase gene expression. bioRxiv 2023:2023.08.02.551517. [PMID: 38014045 PMCID: PMC10680568 DOI: 10.1101/2023.08.02.551517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Multiplexed assays of variant effect are powerful methods to profile the consequences of rare variants on gene expression and organismal fitness. Yet, few studies have integrated several multiplexed assays to map variant effects on gene expression in coding sequences. Here, we pioneered a multiplexed assay based on polysome profiling to measure variant effects on translation at scale, uncovering single-nucleotide variants that increase and decrease ribosome load. By combining high-throughput ribosome load data with multiplexed mRNA and protein abundance readouts, we mapped the cis-regulatory landscape of thousands of catechol-O-methyltransferase (COMT) variants from RNA to protein and found numerous coding variants that alter COMT expression. Finally, we trained machine learning models to map signatures of variant effects on COMT gene expression and uncovered both directional and divergent impacts across expression layers. Our analyses reveal expression phenotypes for thousands of variants in COMT and highlight variant effects on both single and multiple layers of expression. Our findings prompt future studies that integrate several multiplexed assays for the readout of gene expression.
Affiliation(s)
- Ian Hoskins
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Shilpa Rao
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Charisma Tante
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Can Cenik
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
45
Xie X, Sun X, Wang Y, Lehner B, Li X. Dominance vs epistasis: the biophysical origins and plasticity of genetic interactions within and between alleles. Nat Commun 2023; 14:5551. [PMID: 37689712 PMCID: PMC10492795 DOI: 10.1038/s41467-023-41188-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/08/2022] [Accepted: 08/25/2023] [Indexed: 09/11/2023] Open
Abstract
An important challenge in genetics, evolution and biotechnology is to understand and predict how mutations combine to alter phenotypes, including molecular activities, fitness and disease. In diploids, mutations in a gene can combine on the same chromosome or on different chromosomes as a "heteroallelic combination". However, a direct comparison of the extent, sign, and stability of the genetic interactions between variants within and between alleles is lacking. Here we use thermodynamic models of protein folding and ligand-binding to show that interactions between mutations within and between alleles are expected in even very simple biophysical systems. Protein folding alone generates within-allele interactions and a single molecular interaction is sufficient to cause between-allele interactions and dominance. These interactions change differently, quantitatively and qualitatively as a system becomes more complex. Altering the concentration of a ligand can, for example, switch alleles from dominant to recessive. Our results show that intra-molecular epistasis and dominance should be widely expected in even the simplest biological systems but also reinforce the view that they are plastic system properties and so a formidable challenge to predict. Accurate prediction of both intra-molecular epistasis and dominance will require either detailed mechanistic understanding and experimental parameterization or brute-force measurement and learning.
Affiliation(s)
- Xuan Xie
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Xia Sun
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Yuheng Wang
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Ben Lehner
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain.
- ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
- Xianghua Li
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK.
- Biomedical and Health Translational Centre of Zhejiang Province, Haizhou East Road 718, Haining, 314400, P. R. China.
46
Nguyen TN, Ingle C, Thompson S, Reynolds KA. The Genetic Landscape of a Metabolic Interaction. bioRxiv 2023:2023.05.28.542639. [PMID: 37645784 PMCID: PMC10461916 DOI: 10.1101/2023.05.28.542639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/31/2023]
Abstract
Enzyme abundance, catalytic activity, and ultimately sequence are all shaped by the need of growing cells to maintain metabolic flux while minimizing accumulation of deleterious intermediates. While much prior work has explored the constraints on protein sequence and evolution induced by physical protein-protein interactions, the sequence-level constraints emerging from non-binding functional interactions in metabolism remain unclear. To quantify how variation in the activity of one enzyme constrains the biochemical parameters and sequence of another, we focused on dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS), a pair of enzymes catalyzing consecutive reactions in folate metabolism. We used deep mutational scanning to quantify the growth rate effect of 2,696 DHFR single mutations in 3 TYMS backgrounds under conditions selected to emphasize biochemical epistasis. Our data are well-described by a relatively simple enzyme velocity to growth rate model that quantifies how metabolic context tunes enzyme mutational tolerance. Together our results reveal the structural distribution of epistasis in a metabolic enzyme and establish a foundation for the design of multi-enzyme systems.
Affiliation(s)
- Thuy N. Nguyen
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Samuel Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
- Kimberly A. Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
47
Haddox HK, Galloway JG, Dadonaite B, Bloom JD, Matsen IV FA, DeWitt WS. Jointly modeling deep mutational scans identifies shifted mutational effects among SARS-CoV-2 spike homologs. bioRxiv 2023:2023.07.31.551037. [PMID: 37577604 PMCID: PMC10418112 DOI: 10.1101/2023.07.31.551037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Deep mutational scanning (DMS) is a high-throughput experimental technique that measures the effects of thousands of mutations to a protein. These experiments can be performed on multiple homologs of a protein or on the same protein selected under multiple conditions. It is often of biological interest to identify mutations with shifted effects across homologs or conditions. However, it is challenging to determine if observed shifts arise from biological signal or experimental noise. Here, we describe a method for jointly inferring mutational effects across multiple DMS experiments while also identifying mutations that have shifted in their effects among experiments. A key aspect of our method is to regularize the inferred shifts, so that they are nonzero only when strongly supported by the data. We apply this method to DMS experiments that measure how mutations to spike proteins from SARS-CoV-2 variants (Delta, Omicron BA.1, and Omicron BA.2) affect cell entry. Most mutational effects are conserved between these spike homologs, but a fraction have markedly shifted. We experimentally validate a subset of the mutations inferred to have shifted effects, and confirm differences of > 1,000-fold in the impact of the same mutation on spike-mediated viral infection across spikes from different SARS-CoV-2 variants. Overall, our work establishes a general approach for comparing sets of DMS experiments to identify biologically important shifts in mutational effects.
Affiliation(s)
- Hugh K. Haddox
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Jared G. Galloway
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Bernadeta Dadonaite
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Jesse D. Bloom
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Frederick A. Matsen IV
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
- William S. DeWitt
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
48
Moesslacher CS, Auernig E, Woodsmith J, Feichtner A, Jany-Luig E, Jehle S, Worseck JM, Heine CL, Stefan E, Stelzl U. Missense variant interaction scanning reveals a critical role of the FERM domain for tumor suppressor protein NF2 conformation and function. Life Sci Alliance 2023; 6:e202302043. [PMID: 37280085 PMCID: PMC10244618 DOI: 10.26508/lsa.202302043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 05/24/2023] [Accepted: 05/24/2023] [Indexed: 06/08/2023] Open
Abstract
NF2 (moesin-ezrin-radixin-like [MERLIN] tumor suppressor) is frequently inactivated in cancer, where its tumor suppressor functionality is tightly coupled to protein conformation. How NF2 conformation is regulated and how it influences tumor suppressor activity are largely open questions. Here, we systematically characterized three NF2 conformation-dependent protein interactions utilizing deep mutational scanning interaction perturbation analyses. We identified two regions in NF2 with clustered mutations which affected conformation-dependent protein interactions. NF2 variants in the F2-F3 subdomain and the α3H helix region substantially modulated NF2 conformation and homomerization. Mutations in the F2-F3 subdomain altered proliferation in three cell lines and matched patterns of disease mutations in NF2-related schwannomatosis. This study highlights the power of systematic mutational interaction perturbation analysis to identify missense variants impacting NF2 conformation and provides insight into NF2 tumor suppressor function.
Affiliation(s)
- Christina S Moesslacher
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Elisabeth Auernig
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Jonathan Woodsmith
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Andreas Feichtner
- Institute of Biochemistry and Center for Molecular Biosciences, University of Innsbruck, Innsbruck, Austria
- Evelyne Jany-Luig
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Stefanie Jehle
- Max-Planck Institute for Molecular Genetics (MPIMG), Otto-Warburg-Laboratory, Berlin, Germany
- Josephine M Worseck
- Max-Planck Institute for Molecular Genetics (MPIMG), Otto-Warburg-Laboratory, Berlin, Germany
- Christian L Heine
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Eduard Stefan
- Institute of Biochemistry and Center for Molecular Biosciences, University of Innsbruck, Innsbruck, Austria
- Tyrolean Cancer Research Institute (TKFI), Innsbruck, Austria
- Institute of Molecular Biology, Innsbruck, Austria
- Ulrich Stelzl
- Institute of Pharmaceutical Sciences, Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Max-Planck Institute for Molecular Genetics (MPIMG), Otto-Warburg-Laboratory, Berlin, Germany
- BioTechMed-Graz, Graz, Austria
- Field of Excellence BioHealth - University of Graz, Graz, Austria
49
Cagiada M, Bottaro S, Lindemose S, Schenstrøm SM, Stein A, Hartmann-Petersen R, Lindorff-Larsen K. Discovering functionally important sites in proteins. Nat Commun 2023; 14:4175. [PMID: 37443362 PMCID: PMC10345196 DOI: 10.1038/s41467-023-39909-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 07/15/2023] Open
Abstract
Proteins play important roles in biology, biotechnology and pharmacology, and missense variants are a common cause of disease. Discovering functionally important sites in proteins is a central but difficult problem because of the lack of large, systematic data sets. Sequence conservation can highlight residues that are functionally important but is often convoluted with a signal for preserving structural stability. We here present a machine learning method to predict functional sites by combining statistical models for protein sequences with biophysical models of stability. We train the model using multiplexed experimental data on variant effects and validate it broadly. We show how the model can be used to discover active sites, as well as regulatory and binding sites. We illustrate the utility of the model by prospective prediction and subsequent experimental validation on the functional consequences of missense variants in HPRT1 which may cause Lesch-Nyhan syndrome, and pinpoint the molecular mechanisms by which they cause disease.
Affiliation(s)
- Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sandro Bottaro
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Søren Lindemose
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Signe M Schenstrøm
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
50
Mighell TL, Toledano I, Lehner B. SUNi mutagenesis: Scalable and uniform nicking for efficient generation of variant libraries. PLoS One 2023; 18:e0288158. [PMID: 37418460 DOI: 10.1371/journal.pone.0288158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Accepted: 06/20/2023] [Indexed: 07/09/2023] Open
Abstract
Multiplexed assays of variant effects (MAVEs) have made possible the functional assessment of all possible mutations to genes and regulatory sequences. A core pillar of the approach is generation of variant libraries, but current methods are either difficult to scale or not uniform enough to enable MAVEs at the scale of gene families or beyond. We present an improved method called Scalable and Uniform Nicking (SUNi) mutagenesis that combines massive scalability with high uniformity to enable cost-effective MAVEs of gene families and eventually genomes.
Affiliation(s)
- Taylor L Mighell
- The Barcelona Institute of Science and Technology, Center for Genomic Regulation (CRG), Barcelona, Spain
- Ignasi Toledano
- The Barcelona Institute of Science and Technology, Center for Genomic Regulation (CRG), Barcelona, Spain
- Ben Lehner
- The Barcelona Institute of Science and Technology, Center for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom