1. Rubin AF, Stone J, Bianchi AH, Capodanno BJ, Da EY, Dias M, Esposito D, Frazer J, Fu Y, Grindstaff SB, Harrington MR, Li I, McEwen AE, Min JK, Moore N, Moscatelli OG, Ong J, Polunina PV, Rollins JE, Rollins NJ, Snyder AE, Tam A, Wakefield MJ, Ye SS, Starita LM, Bryant VL, Marks DS, Fowler DM. MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays. Genome Biol 2025; 26:13. [PMID: 39838450 DOI: 10.1186/s13059-025-03476-y] [Received: 07/13/2024] [Accepted: 01/10/2025] [Indexed: 01/23/2025] Open
Abstract
Multiplexed assays of variant effect (MAVEs) are a critical tool for researchers and clinicians to understand genetic variants. Here we describe the 2024 update to MaveDB (https://www.mavedb.org/) with four key improvements to the MAVE community's database of record: more available data including over 7 million variant effect measurements, an improved data model supporting assays such as saturation genome editing, new built-in exploration and visualization tools, and powerful APIs for data federation and streamlined submission and access. Together these changes support MaveDB's role as a hub for the analysis and dissemination of MAVEs now and into the future.
Affiliation(s)
- Alan F Rubin
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Australia
- Jeremy Stone
  - Brotman Baty Institute for Precision Medicine, Seattle, USA
- Estelle Y Da
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Mafalda Dias
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - University Pompeu Fabra, Barcelona, Spain
- Daniel Esposito
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Jonathan Frazer
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - University Pompeu Fabra, Barcelona, Spain
- Yunfan Fu
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Australia
- Iris Li
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Abbye E McEwen
  - Brotman Baty Institute for Precision Medicine, Seattle, USA
  - Department of Genome Sciences, University of Washington, Seattle, USA
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, USA
- Joseph K Min
  - Department of Genome Sciences, University of Washington, Seattle, USA
- Nick Moore
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Olivia G Moscatelli
  - Department of Medical Biology, University of Melbourne, Parkville, Australia
  - Immunology Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Jesslyn Ong
  - Immunology Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  - Department of Microbiology and Immunology, University of Melbourne, Parkville, Australia
- Polina V Polunina
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Joshua E Rollins
  - Department of Computer Science, The Graduate Center, The City University of New York, New York, USA
- Amy Tam
  - Department of Systems Biology, Harvard Medical School, Boston, USA
- Matthew J Wakefield
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Australia
  - Department of Obstetrics, Gynaecology and Newborn Health, University of Melbourne, Parkville, Australia
- Shenyi Sunny Ye
  - Department of Genome Sciences, University of Washington, Seattle, USA
- Lea M Starita
  - Brotman Baty Institute for Precision Medicine, Seattle, USA
  - Department of Genome Sciences, University of Washington, Seattle, USA
- Vanessa L Bryant
  - Department of Medical Biology, University of Melbourne, Parkville, Australia
  - Immunology Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  - Department of Clinical Immunology & Allergy, The Royal Melbourne Hospital, Parkville, Australia
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Boston, USA
- Douglas M Fowler
  - Brotman Baty Institute for Precision Medicine, Seattle, USA
  - Department of Genome Sciences, University of Washington, Seattle, USA
  - Department of Bioengineering, University of Washington, Seattle, USA
2. Westmann CA, Goldbach L, Wagner A. The highly rugged yet navigable regulatory landscape of the bacterial transcription factor TetR. Nat Commun 2024; 15:10745. [PMID: 39737967 DOI: 10.1038/s41467-024-54723-y] [Received: 09/07/2023] [Accepted: 11/19/2024] [Indexed: 01/01/2025] Open
Abstract
Transcription factor binding sites (TFBSs) are important sources of evolutionary innovations. Understanding how evolution navigates the sequence space of such sites can be achieved by mapping TFBS adaptive landscapes. In such a landscape, an individual location corresponds to a TFBS bound by a transcription factor. The elevation at that location corresponds to the strength of transcriptional regulation conveyed by the sequence. Here, we develop an in vivo massively parallel reporter assay to map the landscape of bacterial TFBSs. We apply this assay to the TetR repressor, for which few TFBSs are known. We quantify the strength of transcriptional repression for 17,765 TFBSs and show that the resulting landscape is highly rugged, with 2092 peaks. Only a few peaks convey stronger repression than the wild type. Non-additive (epistatic) interactions between mutations are frequent. Despite these hallmarks of ruggedness, most high peaks are evolutionarily accessible. They have large basins of attraction and are reached by around 20% of populations evolving on the landscape. Which high peak is reached during evolution is unpredictable and contingent on the mutational path taken. This in-depth analysis of a prokaryotic gene regulator reveals a landscape that is navigable but much more rugged than the landscapes of eukaryotic regulators.
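The notion of a peak in such a landscape can be illustrated with a minimal Python sketch. It assumes a peak is a measured sequence whose value strictly exceeds that of every measured single-mutant neighbor; the paper's own definition (for example, how measurement noise is handled) may differ. The four-sequence `toy_landscape` and the function names are hypothetical and are not data or code from the study.

```python
ALPHABET = "ACGT"


def single_mutants(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def find_peaks(landscape):
    """Return sequences whose value strictly exceeds all measured single-mutant neighbors."""
    peaks = []
    for seq, value in landscape.items():
        # Only neighbors that were actually measured take part in the comparison.
        neighbor_values = [landscape[m] for m in single_mutants(seq) if m in landscape]
        if all(value > n for n in neighbor_values):
            peaks.append(seq)
    return sorted(peaks)


# Hypothetical toy landscape: sequence -> regulation strength (arbitrary units).
toy_landscape = {"AA": 1.0, "AC": 0.5, "CA": 0.4, "CC": 0.9}
peaks = find_peaks(toy_landscape)
print(peaks)  # ['AA', 'CC']
```

Two peaks separated by lower-valued intermediates, as in this toy case, are the simplest form of the ruggedness the abstract describes.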
Affiliation(s)
- Cauã Antunes Westmann
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
- Leander Goldbach
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, 87501, USA
3. Sinnott-Armstrong N, Fields S, Roth F, Starita LM, Trapnell C, Villen J, Fowler DM, Queitsch C. Understanding genetic variants in context. eLife 2024; 13:e88231. [PMID: 39625477 PMCID: PMC11614383 DOI: 10.7554/elife.88231] [Received: 03/30/2023] [Accepted: 11/15/2024] [Indexed: 12/06/2024] Open
Abstract
Over the last three decades, human genetics has gone from dissecting high-penetrance Mendelian diseases to discovering the vast and complex genetic etiology of common human diseases. In tackling this complexity, scientists have discovered the importance of numerous genetic processes - most notably functional regulatory elements - in the development and progression of these diseases. Simultaneously, scientists have increasingly used multiplex assays of variant effect to systematically phenotype the cellular consequences of millions of genetic variants. In this article, we argue that the context of genetic variants - at all scales, from other genetic variants and gene regulation to cell biology to organismal environment - is a critical component of how we can employ genomics to interpret these variants, and ultimately treat these diseases. We describe approaches to extend existing experimental assays and computational approaches to examine and quantify the importance of this context, including through causal analytic approaches. Having a unified understanding of the molecular, physiological, and environmental processes governing the interpretation of genetic variants is sorely needed for the field, and this perspective argues for feasible approaches by which the combined interpretation of cellular, animal, and epidemiological data can yield that knowledge.
Affiliation(s)
- Nasa Sinnott-Armstrong
  - Herbold Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, United States
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Brotman Baty Institute for Precision Medicine, Seattle, United States
- Stanley Fields
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Department of Medicine, University of Washington, Seattle, United States
- Frederick Roth
  - Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, Canada
  - Lunenfeld-Tanenbaum Research Institute, Mt. Sinai Hospital, Toronto, Canada
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, United States
- Lea M Starita
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Brotman Baty Institute for Precision Medicine, Seattle, United States
- Cole Trapnell
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Brotman Baty Institute for Precision Medicine, Seattle, United States
- Judit Villen
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Brotman Baty Institute for Precision Medicine, Seattle, United States
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Brotman Baty Institute for Precision Medicine, Seattle, United States
  - Department of Bioengineering, University of Washington, Seattle, United States
- Christine Queitsch
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Brotman Baty Institute for Precision Medicine, Seattle, United States
4. Pan RW, Röschinger T, Faizi K, Garcia HG, Phillips R. Deciphering regulatory architectures of bacterial promoters from synthetic expression patterns. PLoS Comput Biol 2024; 20:e1012697. [PMID: 39724021 PMCID: PMC11709304 DOI: 10.1371/journal.pcbi.1012697] [Received: 01/28/2024] [Revised: 01/08/2025] [Accepted: 12/04/2024] [Indexed: 12/28/2024] Open
Abstract
For the vast majority of genes in sequenced genomes, there is limited understanding of how they are regulated. Without such knowledge, it is not possible to perform a quantitative theory-experiment dialogue on how such genes give rise to physiological and evolutionary adaptation. One category of high-throughput experiments used to understand the sequence-phenotype relationship of the transcriptome is massively parallel reporter assays (MPRAs). However, to improve the versatility and scalability of MPRAs, we need a "theory of the experiment" to help us better understand the impact of various biological and experimental parameters on the interpretation of experimental data. These parameters include binding site copy number, where a large number of specific binding sites may titrate away transcription factors, as well as the presence of overlapping binding sites, which may affect analysis of the degree of mutual dependence between mutations in the regulatory region and expression levels. To that end, in this paper we create tens of thousands of synthetic gene expression outputs for bacterial promoters using both equilibrium and out-of-equilibrium models. These models make it possible to imitate the summary statistics (information footprints and expression shift matrices) used to characterize the output of MPRAs and thus to infer the underlying regulatory architecture. Specifically, we use a more refined implementation of the so-called thermodynamic models in which the binding energies of each sequence variant are derived from energy matrices. Our simulations reveal important effects of the parameters on MPRA data and we demonstrate our ability to optimize MPRA experimental designs with the goal of generating thermodynamic models of the transcriptome with base-pair specificity. 
Further, this approach makes it possible to carefully examine the mapping between mutations in binding sites and their corresponding expression profiles, a tool useful not only for developing a theory of transcription, but also for exploring regulatory evolution.
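To make the idea of deriving binding energies from energy matrices concrete, here is a minimal Python sketch. It assumes the textbook equilibrium simple-repression model, in which RNA polymerase and a repressor compete for the promoter and energies are expressed in units of k_BT; this is not necessarily the exact model used in the paper. The toy matrix, the energy offset, the copy numbers and the function names are all hypothetical.

```python
import math


def binding_energy(seq, energy_matrix):
    """Sum the per-position energy contributions (in k_BT) of each base in seq."""
    return sum(energy_matrix[i][base] for i, base in enumerate(seq))


def p_rnap_bound(e_rnap, e_rep, n_rnap, n_rep, n_nonspecific):
    """Equilibrium probability that RNAP occupies the promoter (simple repression)."""
    w_rnap = n_rnap / n_nonspecific * math.exp(-e_rnap)  # statistical weight, RNAP bound
    w_rep = n_rep / n_nonspecific * math.exp(-e_rep)     # statistical weight, repressor bound
    return w_rnap / (1.0 + w_rnap + w_rep)


# Hypothetical 3-bp repressor energy matrix: lower energy means tighter binding.
toy_matrix = [
    {"A": -1.5, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": -2.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": -1.0, "T": 0.0},
]
E_REP_OFFSET = -10.0  # hypothetical sequence-independent contribution (k_BT)

e_strong = binding_energy("ACG", toy_matrix)  # consensus site for the toy matrix
e_weak = binding_energy("TCG", toy_matrix)    # one mutation away from consensus

# Hypothetical copy numbers: 1000 RNAP, 10 repressors, 4.6e6 nonspecific sites.
p_strong = p_rnap_bound(-5.0, E_REP_OFFSET + e_strong, 1000, 10, 4.6e6)
p_weak = p_rnap_bound(-5.0, E_REP_OFFSET + e_weak, 1000, 10, 4.6e6)
print(p_strong, p_weak)
```

Under these assumptions, weakening the repressor site by a single mutation raises the probability of RNAP binding, which is the kind of mutation-to-expression mapping the abstract refers to.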
Affiliation(s)
- Rosalind Wenshan Pan
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Tom Röschinger
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Kian Faizi
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Hernan G. Garcia
  - Biophysics Graduate Group, University of California, Berkeley, California, United States of America
  - Department of Physics, University of California, Berkeley, California, United States of America
  - Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
  - Institute for Quantitative Biosciences-QB3, University of California, Berkeley, California, United States of America
  - Chan Zuckerberg Biohub-San Francisco, San Francisco, California, United States of America
- Rob Phillips
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
  - Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, California, United States of America
5. Tonelli A, Cousin P, Jankowski A, Wang B, Dorier J, Barraud J, Zunjarrao S, Gambetta MC. Systematic screening of enhancer-blocking insulators in Drosophila identifies their DNA sequence determinants. Dev Cell 2024:S1534-5807(24)00636-1. [PMID: 39532105 DOI: 10.1016/j.devcel.2024.10.017] [Received: 01/26/2024] [Revised: 06/21/2024] [Accepted: 10/18/2024] [Indexed: 11/16/2024]
Abstract
Long-range transcriptional activation of gene promoters by abundant enhancers in animal genomes calls for mechanisms to limit inappropriate regulation. DNA elements called insulators serve this purpose by shielding promoters from an enhancer when interposed. Unlike promoters and enhancers, insulators have not been systematically characterized owing to a lack of high-throughput screening assays, and questions regarding how insulators are distributed and encoded in the genome remain. Here, we establish "insulator-seq" as a plasmid-based massively parallel reporter assay in Drosophila cultured cells to perform a systematic insulator screen of selected genomic loci. Screening developmental gene loci showed that not all insulator protein binding sites effectively block enhancer-promoter communication. Deep insulator mutagenesis identified sequences flexibly positioned around the CTCF insulator protein binding motif that are critical for functionality. The ability to screen millions of DNA sequences without positional effect has enabled functional mapping of insulators and provided further insights into the determinants of insulators.
Affiliation(s)
- Anastasiia Tonelli
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Pascal Cousin
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Aleksander Jankowski
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
- Bihan Wang
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Julien Dorier
  - Bioinformatics Competence Center, University of Lausanne, 1015 Lausanne, Switzerland; Bioinformatics Competence Center, Swiss Federal Institute of Technology Lausanne, 1015 Lausanne, Switzerland
- Jonas Barraud
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Sanyami Zunjarrao
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
6. Blaabjerg LM, Jonsson N, Boomsma W, Stein A, Lindorff-Larsen K. SSEmb: A joint embedding of protein sequence and structure enables robust variant effect predictions. Nat Commun 2024; 15:9646. [PMID: 39511177 PMCID: PMC11544099 DOI: 10.1038/s41467-024-53982-z] [Received: 01/21/2024] [Accepted: 10/28/2024] [Indexed: 11/15/2024] Open
Abstract
The ability to predict how amino acid changes affect proteins has a wide range of applications including in disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from sequence and structure in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments. We show that by integrating both types of information we obtain a variant effect prediction model that is robust when sequence information is scarce. We also show that SSEmb learns embeddings of the sequence and structure that are useful for other downstream tasks such as to predict protein-protein binding sites. We envisage that SSEmb may be useful both for variant effect predictions and as a representation for learning to predict protein properties that depend on sequence and structure.
Affiliation(s)
- Lasse M Blaabjerg
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Nicolas Jonsson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Wouter Boomsma
  - Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen N, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
7. La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
  - Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
8. Sun M, Stoltzfus A, McCandlish DM. A fitness distribution law for amino-acid replacements. bioRxiv 2024:2024.10.11.617952. [PMID: 39464166 PMCID: PMC11507765 DOI: 10.1101/2024.10.11.617952] [Indexed: 10/29/2024]
Abstract
The effect of replacing the amino acid at a given site in a protein is difficult to predict. Yet, evolutionary comparisons have revealed highly regular patterns of interchangeability between pairs of amino acids, and such patterns have proved enormously useful in a range of applications in bioinformatics, evolutionary inference, and protein design. Here we reconcile these apparently contradictory observations using fitness data from over 350,000 experimental amino acid replacements. Almost one-quarter of the 20 × 19 = 380 types of replacements have broad distributions of fitness effects (DFEs) that closely resemble the background DFE for random changes, indicating an overwhelming influence of protein context in determining mutational effects. However, we also observe that the 380 pair-specific DFEs closely follow a maximum entropy distribution, specifically a truncated exponential distribution. The shape of this distribution is determined entirely by its mean, which is equivalent to the chance that a replacement of the given type is fitter than a random replacement. In this type of distribution, modest deviations in the mean correspond to much larger changes in the probability of falling in the far right tail, so that modest differences in mean exchangeability may result in much larger differences in the chance of a highly fit mutation. Indeed, we show that under the assumption that purifying selection filters out the vast majority of mutations, the maximum entropy distributions of fitness effects inferred from deep mutational scanning experiments predict the characteristic patterns of amino acid change observed in molecular evolution. These maximum entropy distributions of mutational effects not only provide a tuneable model for molecular evolution, but also have implications for mutational effect prediction and protein engineering.
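The claim that modest shifts in the mean produce much larger shifts in the right tail can be checked numerically. The Python sketch below assumes fitness is expressed on a quantile scale, so the support is [0, 1] (consistent with the mean being the chance of beating a random replacement), and uses the maximum entropy density proportional to exp(λx) on that interval. The parameter name `lam`, the function names, the tail threshold of 0.9 and the example means of 0.4 and 0.6 are illustrative choices, not values from the paper.

```python
import math


def trunc_exp_mean(lam):
    """Mean of the density proportional to exp(lam * x) on [0, 1]."""
    if abs(lam) < 1e-6:
        return 0.5 + lam / 12.0  # Taylor expansion around the uniform case
    return 1.0 / (-math.expm1(-lam)) - 1.0 / lam


def solve_rate(mean, lo=-50.0, hi=50.0):
    """Find lam giving the requested mean by bisection (the mean increases with lam)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if trunc_exp_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail_prob(lam, threshold):
    """P(X > threshold) for the truncated exponential on [0, 1]."""
    if abs(lam) < 1e-6:
        return 1.0 - threshold  # uniform limit
    return math.expm1(-lam * (1.0 - threshold)) / math.expm1(-lam)


low = tail_prob(solve_rate(0.4), 0.9)   # replacement type with mean 0.4
high = tail_prob(solve_rate(0.6), 0.9)  # replacement type with mean 0.6
tail_ratio = high / low
print(low, high, tail_ratio)
```

In this sketch the two means differ by a factor of 1.5, while the probabilities of landing in the top decile differ by a larger factor, in line with the abstract's argument about highly fit mutations.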
Affiliation(s)
- Mengyi Sun
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Arlin Stoltzfus
  - Office of Data and Informatics, Material Measurement Laboratory, NIST, Gaithersburg, MD
  - Institute for Bioscience and Biotechnology Research, Rockville, USA
- David M. McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
9. Visani GM, Pun MN, Galvin W, Daniel E, Borisiak K, Wagura U, Nourmohammad A. HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction. bioRxiv 2024:2024.07.09.602403. [PMID: 39026838 PMCID: PMC11257601 DOI: 10.1101/2024.07.09.602403] [Indexed: 07/20/2024]
Abstract
Predicting the stability and fitness effects of amino-acid mutations in proteins is a cornerstone of biological discovery and engineering. Various experimental techniques have been developed to measure mutational effects, providing us with extensive datasets across a diverse range of proteins. By training on these data, machine learning approaches have advanced significantly in predicting mutational effects. Here, we introduce HERMES, a 3D rotationally equivariant structure-based neural network model for mutation effect prediction. Pre-trained to predict amino-acid propensities from their surrounding 3D structure atomic environments, HERMES can be efficiently fine-tuned to predict mutational effects, thanks to its symmetry-aware parameterization of the output space. Benchmarking against other models demonstrates that HERMES often outperforms or matches their performance in predicting mutation effects on stability, binding, and fitness, using either computationally or experimentally resolved protein structures. HERMES offers a versatile suite of tools for evaluating mutation effects and can be easily fine-tuned for specific predictive objectives using our open-source code.
Affiliation(s)
- Gian Marco Visani
  - Paul G. Allen School of Computer Science and Engineering, University of Washington
- William Galvin
  - Paul G. Allen School of Computer Science and Engineering, University of Washington
- Eric Daniel
  - Paul G. Allen School of Computer Science and Engineering, University of Washington
- Armita Nourmohammad
  - Department of Physics, Applied Math, and CSE, University of Washington, Fred Hutch Cancer Research Center, Seattle, WA
10. Faure AJ, Martí-Aranda A, Hidalgo-Carcedo C, Beltran A, Schmiedel JM, Lehner B. The genetic architecture of protein stability. Nature 2024; 634:995-1003. [PMID: 39322666 PMCID: PMC11499273 DOI: 10.1038/s41586-024-07966-0] [Received: 10/27/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
There are more ways to synthesize a 100-amino acid (aa) protein (20^100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces [1]. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10^10, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
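Two points in this abstract lend themselves to a quick numerical check: the size of the sequence space, and the combination of additive free energy changes with a nonlinear phenotype. The Python sketch below assumes the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe and a two-state folding model in which the fraction folded is 1 / (1 + exp(ΔG / RT)). The temperature, the wild-type ΔG, the per-mutation ΔΔG values and the function name are hypothetical, and the model in the paper is richer (it includes pairwise couplings, for instance).

```python
import math

R_KCAL = 0.001987  # gas constant in kcal/(mol*K)
TEMP_K = 298.0     # assumed temperature in kelvin

# Sequence space of a 100-residue protein versus the commonly quoted ~10^80 atoms.
n_sequences = 20 ** 100
n_digits = len(str(n_sequences))
print(n_digits, n_sequences > 10 ** 80)  # 131 True


def fraction_folded(dg_wildtype, ddgs):
    """Two-state fraction folded; mutation effects add in free energy (kcal/mol)."""
    dg_total = dg_wildtype + sum(ddgs)  # additive free energy changes
    return 1.0 / (1.0 + math.exp(dg_total / (R_KCAL * TEMP_K)))


# Hypothetical protein with folding dG = -2 kcal/mol and two destabilising
# mutations of +1 kcal/mol each.
f_wt = fraction_folded(-2.0, [])
f_single = fraction_folded(-2.0, [1.0])
f_double = fraction_folded(-2.0, [1.0, 1.0])
print(f_wt, f_single, f_double)
```

The two mutations contribute the same free energy, yet the second lowers the folded fraction more than the first, so apparent epistasis at the phenotype level can arise from purely additive energies.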
Affiliation(s)
- Andre J Faure
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - ALLOX, Barcelona, Spain
- Aina Martí-Aranda
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Cristina Hidalgo-Carcedo
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Antoni Beltran
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jörn M Schmiedel
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - factorize.bio, Berlin, Germany
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
11. Fernandez-de-Cossio-Diaz J, Uguzzoni G, Ricard K, Anselmi F, Nizak C, Pagnani A, Rivoire O. Inference and design of antibody specificity: From experiments to models and back. PLoS Comput Biol 2024; 20:e1012522. [PMID: 39401247 PMCID: PMC11501025 DOI: 10.1371/journal.pcbi.1012522] [Received: 04/08/2024] [Revised: 10/24/2024] [Accepted: 09/29/2024] [Indexed: 10/26/2024] Open
Abstract
Exquisite binding specificity is essential for many protein functions but is difficult to engineer. Many biotechnological or biomedical applications require the discrimination of very similar ligands, which poses the challenge of designing protein sequences with highly specific binding profiles. Experimental methods for generating specific binders rely on in vitro selection, which is limited in terms of library size and control over specificity profiles. Additional control was recently demonstrated through high-throughput sequencing and downstream computational analysis. Here we follow such an approach to demonstrate the design of specific antibodies beyond those probed experimentally. We do so in a context where very similar epitopes need to be discriminated, and where these epitopes cannot be experimentally dissociated from other epitopes present in the selection. Our approach involves the identification of different binding modes, each associated with a particular ligand against which the antibodies are either selected or not. Using data from phage display experiments, we show that the model successfully disentangles these modes, even when they are associated with chemically very similar ligands. Additionally, we demonstrate and validate experimentally the computational design of antibodies with customized specificity profiles, either with specific high affinity for a particular target ligand, or with cross-specificity for multiple target ligands. Overall, our results showcase the potential of leveraging a biophysical model learned from selections against multiple ligands to design proteins with tailored specificity, with applications to protein engineering extending beyond the design of antibodies.
Affiliation(s)
- Jorge Fernandez-de-Cossio-Diaz
- Laboratoire de physique de l’Ecole normale supérieure, CNRS, PSL University, Sorbonne Université, Université Paris-Cité, Paris, France
| | - Guido Uguzzoni
- Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, Italy
| | - Kévin Ricard
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, Paris, France
| | - Francesca Anselmi
- Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, Italy
- Department of Life Sciences and Systems Biology & Molecular Biotechnology Center - MBC, Universita di Torino, Via Nizza, Torino, Italy
| | - Clément Nizak
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin, Paris, France
| | - Andrea Pagnani
- Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, Italy
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy
- INFN, Sezione di Torino, Via Pietro Giuria, Torino, Italy
| | - Olivier Rivoire
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, Paris, France
- Gulliver, CNRS, ESPCI Paris, Université PSL, Paris, France
|
12
|
Kim J, Muller RY, Bondra ER, Ingolia NT. CRISPRi with barcoded expression reporters dissects regulatory networks in human cells. bioRxiv 2024:2024.09.06.611573. [PMID: 39282439 PMCID: PMC11398470 DOI: 10.1101/2024.09.06.611573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/20/2024]
Abstract
Genome-wide CRISPR screens have emerged as powerful tools for uncovering the genetic underpinnings of diverse biological processes. Incisive screens often depend on directly measuring molecular phenotypes, such as regulated gene expression changes, provoked by CRISPR-mediated genetic perturbations. Here, we provide quantitative measurements of transcriptional responses in human cells across genome-scale perturbation libraries by coupling CRISPR interference (CRISPRi) with barcoded expression reporter sequencing (CiBER-seq). To enable CiBER-seq in mammalian cells, we optimize the integration of highly complex, barcoded sgRNA libraries into a defined genomic context. CiBER-seq profiling of a nuclear factor kappa B (NF-κB) reporter delineates the canonical signaling cascade linking the transmembrane TNF-alpha receptor to inflammatory gene activation and highlights cell-type-specific factors in this response. Importantly, CiBER-seq relies solely on bulk RNA sequencing to capture the regulatory circuit driving this rapid transcriptional response. Our work demonstrates the accuracy of CiBER-seq and its potential for dissecting genetic networks in mammalian cells with superior time resolution.
Affiliation(s)
- Jinyoung Kim
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Ryan Y. Muller
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Eliana R. Bondra
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Nicholas T. Ingolia
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA
|
13
|
Aguirre Rivera J, Mao G, Sabantsev A, Panfilov M, Hou Q, Lindell M, Chanez C, Ritort F, Jinek M, Deindl S. Massively parallel analysis of single-molecule dynamics on next-generation sequencing chips. Science 2024; 385:892-898. [PMID: 39172826 DOI: 10.1126/science.adn5371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 06/12/2024] [Indexed: 08/24/2024]
Abstract
Single-molecule techniques are ideally poised to characterize complex dynamics but are typically limited to investigating a small number of different samples. However, a large sequence or chemical space often needs to be explored to derive a comprehensive understanding of complex biological processes. Here we describe multiplexed single-molecule characterization at the library scale (MUSCLE), a method that combines single-molecule fluorescence microscopy with next-generation sequencing to enable highly multiplexed observations of complex dynamics. We comprehensively profiled the sequence dependence of DNA hairpin properties and Cas9-induced target DNA unwinding-rewinding dynamics. The ability to explore a large sequence space for Cas9 allowed us to identify a number of target sequences with unexpected behaviors. We envision that MUSCLE will enable the mechanistic exploration of many fundamental biological processes.
Affiliation(s)
- J Aguirre Rivera
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 75105 Uppsala, Sweden
| | - G Mao
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 75105 Uppsala, Sweden
| | - A Sabantsev
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 75105 Uppsala, Sweden
| | - M Panfilov
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 75105 Uppsala, Sweden
| | - Q Hou
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 75105 Uppsala, Sweden
| | - M Lindell
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, 75144 Uppsala, Sweden
| | - C Chanez
- Department of Biochemistry, University of Zürich, 8057 Zürich, Switzerland
| | - F Ritort
- Small Biosystems Lab, Condensed Matter Physics Department, Universitat de Barcelona, 08028 Barcelona, Spain
- Institut de Nanociència i Nanotecnologia (IN2UB), Universitat de Barcelona, 08028 Barcelona, Spain
| | - M Jinek
- Department of Biochemistry, University of Zürich, 8057 Zürich, Switzerland
| | - S Deindl
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 75105 Uppsala, Sweden
|
14
|
Kim SO, Yun SR, Lee H, Jo J, Ahn DS, Kim D, Kosheleva I, Henning R, Kim J, Kim C, You S, Kim H, Lee SJ, Ihee H. Serial X-ray liquidography: multi-dimensional assay framework for exploring biomolecular structural dynamics with microgram quantities. Nat Commun 2024; 15:6287. [PMID: 39060271 PMCID: PMC11282289 DOI: 10.1038/s41467-024-50696-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024] Open
Abstract
Understanding protein structure and kinetics under physiological conditions is crucial for elucidating complex biological processes. While time-resolved (TR) techniques have advanced to track molecular actions, their practical application in biological reactions is often confined to reversible photoreactions within limited experimental parameters due to inefficient sample utilization and inflexibility of experimental setups. Here, we introduce serial X-ray liquidography (SXL), a technique that combines time-resolved X-ray liquidography with a fixed target of serially arranged microchambers. SXL breaks through the previously mentioned barriers, enabling microgram-scale TR studies of both irreversible and reversible reactions of even a non-photoactive protein. We demonstrate its versatility in studying a wide range of biological reactions, highlighting its potential as a flexible and multi-dimensional assay framework for kinetic and structural characterization. Leveraging X-ray free-electron lasers and micro-focused X-ray pulses promises further enhancements in both temporal resolution and minimizing sample quantity. SXL offers unprecedented insights into the structural and kinetic landscapes of molecular actions, paving the way for a deeper understanding of complex biological processes.
Affiliation(s)
- Seong Ok Kim
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - So Ri Yun
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Hyosub Lee
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Junbeom Jo
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Doo-Sik Ahn
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Doyeong Kim
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Irina Kosheleva
- Center for Advanced Radiation Sources, The University of Chicago, 9700 South Cass Avenue, Argonne, IL, 60439, USA
| | - Robert Henning
- Center for Advanced Radiation Sources, The University of Chicago, 9700 South Cass Avenue, Argonne, IL, 60439, USA
| | - Jungmin Kim
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Changin Kim
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Seyoung You
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Hanui Kim
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Sang Jin Lee
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Hyotcherl Ihee
- Center for Advanced Reactions Dynamics (CARD), Institute for Basic Science (IBS), Daejeon, 34141, Republic of Korea.
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
|
15
|
De Leonardis M, Fernandez-de-Cossio-Diaz J, Uguzzoni G, Pagnani A. Unsupervised modeling of mutational landscapes of adeno-associated viruses viability. BMC Bioinformatics 2024; 25:229. [PMID: 38956474 PMCID: PMC11221173 DOI: 10.1186/s12859-024-05823-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 06/03/2024] [Indexed: 07/04/2024] Open
Abstract
Adeno-associated viruses 2 (AAV2) are minute viruses renowned for their capacity to infect human cells and akin organisms. They have recently emerged as prominent candidates in the field of gene therapy, primarily attributed to their inherent non-pathogenic nature in humans and the safety associated with their manipulation. The efficacy of AAV2 as gene therapy vectors hinges on their ability to infiltrate host cells, a phenomenon reliant on their competence to construct a capsid capable of breaching the nucleus of the target cell. To enhance their infection potential, researchers have extensively scrutinized various combinatorial libraries by introducing mutations into the capsid, aiming to boost their effectiveness. The emergence of high-throughput experimental techniques, like deep mutational scanning (DMS), has made it feasible to experimentally assess the fitness of these libraries for their intended purpose. Notably, machine learning is starting to demonstrate its potential in addressing predictions within the mutational landscape from sequence data. In this context, we introduce a biophysically-inspired model designed to predict the viability of genetic variants in DMS experiments. This model is tailored to a specific segment of the CAP region within AAV2's capsid protein. To evaluate its effectiveness, we conduct model training with diverse datasets, each tailored to explore different aspects of the mutational landscape influenced by the selection process. Our assessment of the biophysical model centers on two primary objectives: (i) providing quantitative forecasts for the log-selectivity of variants and (ii) deploying it as a binary classifier to categorize sequences into viable and non-viable classes.
Affiliation(s)
- Matteo De Leonardis
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 10129, Torino, Italy.
| | - Jorge Fernandez-de-Cossio-Diaz
- Laboratoire de Physique de l'Ecole Normale Supérieure, CNRS, PSL University, Sorbonne Université, Université Paris-Cité, 75005, Paris, France
| | - Guido Uguzzoni
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, 10060, Candiolo, Italy
| | - Andrea Pagnani
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 10129, Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, 10060, Candiolo, Italy
|
16
|
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
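As a rough illustration of the gauge freedoms this abstract describes, the sketch below uses a simple additive model, f(s) = θ0 + Σ_l θ[l, s_l]. Adding a constant to every parameter at one position and subtracting it from θ0 leaves all predictions unchanged; the "zero-sum" gauge removes that freedom by forcing the parameters at each position to sum to zero. The sequence length, alphabet size, parameter values, and function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, A = 4, 3  # hypothetical sequence length and alphabet size

# Randomly chosen parameters of an additive model (illustrative only).
theta0 = rng.normal()
theta = rng.normal(size=(L, A))

def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per position."""
    return theta0 + sum(theta[l, c] for l, c in enumerate(seq))

def zero_sum_gauge(theta0, theta):
    """Move each position's mean effect into the constant term."""
    means = theta.mean(axis=1, keepdims=True)
    return theta0 + means.sum(), theta - means

# Compare predictions before and after fixing the gauge.
seqs = rng.integers(0, A, size=(10, L))
before = np.array([predict(theta0, theta, s) for s in seqs])
fixed0, fixed = zero_sum_gauge(theta0, theta)
after = np.array([predict(fixed0, fixed, s) for s in seqs])

print(np.allclose(before, after))          # predictions are unchanged
print(np.allclose(fixed.sum(axis=1), 0))   # parameters now sum to zero per position
```

The zero-sum gauge is only one common choice; the abstract refers to a broader family of gauges that this sketch does not attempt to reproduce.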
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Department of Biology, University of Florida, Gainesville, FL, 32611
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
|
17
|
Posfai A, McCandlish DM, Kinney JB. Symmetry, gauge freedoms, and the interpretability of sequence-function relationships. bioRxiv 2024:2024.05.12.593774. [PMID: 38798625 PMCID: PMC11118426 DOI: 10.1101/2024.05.12.593774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Quantitative models that describe how biological sequences encode functional activities are ubiquitous in modern biology. One important aspect of these models is that they commonly exhibit gauge freedoms, i.e., directions in parameter space that do not affect model predictions. In physics, gauge freedoms arise when physical theories are formulated in ways that respect fundamental symmetries. However, the connections that gauge freedoms in models of sequence-function relationships have to the symmetries of sequence space have yet to be systematically studied. Here we study the gauge freedoms of models that respect a specific symmetry of sequence space: the group of position-specific character permutations. We find that gauge freedoms arise when model parameters transform under redundant irreducible matrix representations of this group. Based on this finding, we describe an "embedding distillation" procedure that enables analytic calculation of the number of independent gauge freedoms, as well as efficient computation of a sparse basis for the space of gauge freedoms. We also study how parameter transformation behavior affects parameter interpretability. We find that in many (and possibly all) nontrivial models, the ability to interpret individual model parameters as quantifying intrinsic allelic effects requires that gauge freedoms be present. This finding establishes an incompatibility between two distinct notions of parameter interpretability. Our work thus advances the understanding of symmetries, gauge freedoms, and parameter interpretability in sequence-function relationships.
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
|
18
|
Pan RW, Röschinger T, Faizi K, Garcia H, Phillips R. Deciphering regulatory architectures from synthetic single-cell expression patterns. bioRxiv 2024:2024.01.28.577658. [PMID: 38352569 PMCID: PMC10862715 DOI: 10.1101/2024.01.28.577658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
For the vast majority of genes in sequenced genomes, there is limited understanding of how they are regulated. Without such knowledge, it is not possible to perform a quantitative theory-experiment dialogue on how such genes give rise to physiological and evolutionary adaptation. One category of high-throughput experiments used to understand the sequence-phenotype relationship of the transcriptome is massively parallel reporter assays (MPRAs). However, to improve the versatility and scalability of MPRA pipelines, we need a "theory of the experiment" to help us better understand the impact of various biological and experimental parameters on the interpretation of experimental data. These parameters include binding site copy number, where a large number of specific binding sites may titrate away transcription factors, as well as the presence of overlapping binding sites, which may affect analysis of the degree of mutual dependence between mutations in the regulatory region and expression levels. To that end, in this paper we create tens of thousands of synthetic single-cell gene expression outputs using both equilibrium and out-of-equilibrium models. These models make it possible to imitate the summary statistics (information footprints and expression shift matrices) used to characterize the output of MPRAs and from this summary statistic to infer the underlying regulatory architecture. Specifically, we use a more refined implementation of the so-called thermodynamic models in which the binding energies of each sequence variant are derived from energy matrices. Our simulations reveal important effects of the parameters on MPRA data and we demonstrate our ability to optimize MPRA experimental designs with the goal of generating thermodynamic models of the transcriptome with base-pair specificity. 
Further, this approach makes it possible to carefully examine the mapping between mutations in binding sites and their corresponding expression profiles, a tool useful not only for better designing MPRAs, but also for exploring regulatory evolution.
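To make concrete what it means for binding energies to be "derived from energy matrices", the sketch below assumes the usual additive form: each position of a binding site contributes an energy that depends on the base found there, and the contributions sum to the total binding energy. The site sequence, the matrix values, the reference energy, and the function name are all hypothetical and are not drawn from the paper.

```python
import numpy as np

BASES = "ACGT"
wild_type = "TTGACA"  # hypothetical binding site

# Hypothetical energy matrix (rows: positions, columns: bases), in units of k_B T.
# Entries are penalties relative to the wild-type base, which is set to zero.
rng = np.random.default_rng(1)
energy_matrix = rng.uniform(0.5, 2.0, size=(len(wild_type), len(BASES)))
for pos, base in enumerate(wild_type):
    energy_matrix[pos, BASES.index(base)] = 0.0

def binding_energy(seq, matrix, reference_energy=-5.0):
    """Sum per-position energies on top of an assumed wild-type energy."""
    return reference_energy + sum(
        matrix[pos, BASES.index(base)] for pos, base in enumerate(seq)
    )

mutant = "TTGACG"  # single substitution at the last position
shift = binding_energy(mutant, energy_matrix) - binding_energy(wild_type, energy_matrix)
print(shift)  # energy penalty of the A-to-G substitution at position 5
```

Under this additive assumption, a single substitution changes the energy by exactly one matrix entry, which is what allows a mutation in a binding site to be tied to a predicted shift in expression.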
Affiliation(s)
- Rosalind Wenshan Pan
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
| | - Tom Röschinger
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
| | - Kian Faizi
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
| | - Hernan Garcia
- Biophysics Graduate Group, University of California, Berkeley, CA
- Department of Physics, University of California, Berkeley, CA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA
- Institute for Quantitative Biosciences-QB3, University of California, Berkeley, CA
| | - Rob Phillips
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
- Department of Physics, California Institute of Technology, Pasadena, CA
|
19
|
Miliotis C, Ma Y, Katopodi XL, Karagkouni D, Kanata E, Mattioli K, Kalavros N, Pita-Juárez YH, Batalini F, Ramnarine VR, Nanda S, Slack FJ, Vlachos IS. Determinants of gastric cancer immune escape identified from non-coding immune-landscape quantitative trait loci. Nat Commun 2024; 15:4319. [PMID: 38773080 PMCID: PMC11109163 DOI: 10.1038/s41467-024-48436-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 05/01/2024] [Indexed: 05/23/2024] Open
Abstract
The landscape of non-coding mutations in cancer progression and immune evasion is largely unexplored. Here, we identify transcriptome-wide somatic and germline 3' untranslated region (3'-UTR) variants from 375 gastric cancer patients from The Cancer Genome Atlas. By performing gene expression quantitative trait loci (eQTL) and immune landscape QTL (ilQTL) analysis, we discover 3'-UTR variants with cis effects on expression and immune landscape phenotypes, such as immune cell infiltration and T cell receptor diversity. Using a massively parallel reporter assay, we distinguish between causal and correlative effects of 3'-UTR eQTLs in immune-related genes. Our approach identifies numerous 3'-UTR eQTLs and ilQTLs, providing a unique resource for the identification of immunotherapeutic targets and biomarkers. A prioritized ilQTL variant signature predicts response to immunotherapy better than standard-of-care PD-L1 expression in independent patient cohorts, showcasing the untapped potential of non-coding mutations in cancer.
Affiliation(s)
- Christos Miliotis
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Harvard Program in Virology, Harvard University Graduate School of Arts and Sciences, Boston, MA, USA
| | - Yuling Ma
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Xanthi-Lida Katopodi
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
| | - Dimitra Karagkouni
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cancer Center & Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
| | - Eleni Kanata
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
| | - Kaia Mattioli
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Nikolas Kalavros
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Spatial Technologies Unit, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Yered H Pita-Juárez
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Felipe Batalini
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Division of Oncology, Department of Medicine, Mayo Clinic, Phoenix, AZ, USA
| | - Varune R Ramnarine
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
| | - Shivani Nanda
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cancer Center & Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
| | - Frank J Slack
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
- Cancer Center & Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
| | - Ioannis S Vlachos
- Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Cancer Center & Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
- Spatial Technologies Unit, Beth Israel Deaconess Medical Center, Boston, MA, USA.
|
20
|
Rozhoňová H, Martí-Gómez C, McCandlish DM, Payne JL. Robust genetic codes enhance protein evolvability. PLoS Biol 2024; 22:e3002594. [PMID: 38754362 PMCID: PMC11098591 DOI: 10.1371/journal.pbio.3002594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 03/19/2024] [Indexed: 05/18/2024] Open
Abstract
The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability, the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.
Affiliation(s)
- Hana Rozhoňová
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Carlos Martí-Gómez
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Joshua L. Payne
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
21
|
Claussnitzer M, Parikh VN, Wagner AH, Arbesfeld JA, Bult CJ, Firth HV, Muffley LA, Nguyen Ba AN, Riehle K, Roth FP, Tabet D, Bolognesi B, Glazer AM, Rubin AF. Minimum information and guidelines for reporting a multiplexed assay of variant effect. Genome Biol 2024; 25:100. [PMID: 38641812 PMCID: PMC11027375 DOI: 10.1186/s13059-024-03223-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 03/25/2024] [Indexed: 04/21/2024] Open
Abstract
Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
Affiliation(s)
- Melina Claussnitzer
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Cambridge, MA, 02142, USA
| | - Victoria N Parikh
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Alex H Wagner
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, 43210, USA
| | - Jeremy A Arbesfeld
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Carol J Bult
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Dept of Medical Genetics, Cambridge University Hospitals NHS Trust, Cambridge, UK
| | - Lara A Muffley
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
| | - Alex N Nguyen Ba
- Department of Biology, University of Toronto at Mississauga, Mississauga, ON, Canada
| | - Kevin Riehle
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Daniel Tabet
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Benedetta Bolognesi
- Institute for Bioengineering of Catalunya (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain.
| | - Andrew M Glazer
- Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia.
| |
|
22
|
Meger AT, Spence MA, Sandhu M, Matthews D, Chen J, Jackson CJ, Raman S. Rugged fitness landscapes minimize promiscuity in the evolution of transcriptional repressors. Cell Syst 2024; 15:374-387.e6. [PMID: 38537640 PMCID: PMC11299162 DOI: 10.1016/j.cels.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 09/08/2023] [Accepted: 03/05/2024] [Indexed: 04/20/2024]
Abstract
How a protein's function influences the shape of its fitness landscape, smooth or rugged, is a fundamental question in evolutionary biochemistry. Smooth landscapes arise when incremental mutational steps lead to a progressive change in function, as commonly seen in enzymes and binding proteins. On the other hand, rugged landscapes are poorly understood because of the inherent unpredictability of how sequence changes affect function. Here, we experimentally characterize the entire sequence phylogeny, comprising 1,158 extant and ancestral sequences, of the DNA-binding domain (DBD) of the LacI/GalR transcriptional repressor family. Our analysis revealed an extremely rugged landscape with rapid switching of specificity, even between adjacent nodes. Further, the ruggedness arises due to the necessity of the repressor to simultaneously evolve specificity for asymmetric operators and disfavors potentially adverse regulatory crosstalk. Our study provides fundamental insight into evolutionary, molecular, and biophysical rules of genetic regulation through the lens of fitness landscapes.
Affiliation(s)
- Anthony T Meger
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Matthew A Spence
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Mahakaran Sandhu
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
| | - Dana Matthews
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
| | - Jackie Chen
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Colin J Jackson
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia; ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia; ARC Centre of Excellence for Innovations in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia.
| | - Srivatsan Raman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA.
| |
|
23
|
Shepherdson JL, Friedman RZ, Zheng Y, Sun C, Oh IY, Granas DM, Cohen BA, Chen S, White MA. Pathogenic variants in CRX have distinct cis-regulatory effects on enhancers and silencers in photoreceptors. Genome Res 2024; 34:243-255. [PMID: 38355306 PMCID: PMC10984388 DOI: 10.1101/gr.278133.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
Dozens of variants in the gene for the homeodomain transcription factor (TF) cone-rod homeobox (CRX) are linked with human blinding diseases that vary in their severity and age of onset. How different variants in this single TF alter its function in ways that lead to a range of phenotypes is unclear. We characterized the effects of human disease-causing variants on CRX cis-regulatory function by deploying massively parallel reporter assays (MPRAs) in mouse retina explants carrying knock-ins of two variants, one in the DNA-binding domain (p.R90W) and the other in the transcriptional effector domain (p.E168d2). The degree of reporter gene dysregulation in these mutant Crx retinas corresponds with their phenotypic severity. The two variants affect similar sets of enhancers, and p.E168d2 has distinct effects on silencers. Cis-regulatory elements (CREs) near cone photoreceptor genes are enriched for silencers that are derepressed in the presence of p.E168d2. Chromatin environments of CRX-bound loci are partially predictive of episomal MPRA activity, and distal elements whose accessibility increases later in retinal development are enriched for CREs with silencer activity. We identified a set of potentially pleiotropic regulatory elements that convert from silencers to enhancers in retinas that lack a functional CRX effector domain. Our findings show that phenotypically distinct variants in different domains of CRX have partially overlapping effects on its cis-regulatory function, leading to misregulation of similar sets of enhancers while having a qualitatively different impact on silencers.
Affiliation(s)
- James L Shepherdson
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Ryan Z Friedman
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Yiqiao Zheng
- Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Chi Sun
- Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Inez Y Oh
- Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - David M Granas
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Barak A Cohen
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Shiming Chen
- Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA;
- Department of Developmental Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Michael A White
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA;
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| |
|
24
|
Kim YA, Mousavi K, Yazdi A, Zwierzyna M, Cardinali M, Fox D, Peel T, Coller J, Aggarwal K, Maruggi G. Computational design of mRNA vaccines. Vaccine 2024; 42:1831-1840. [PMID: 37479613 DOI: 10.1016/j.vaccine.2023.07.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 06/23/2023] [Accepted: 07/10/2023] [Indexed: 07/23/2023]
Abstract
mRNA technology has emerged as a successful vaccine platform that offered a swift response to the COVID-19 pandemic. Accumulating evidence shows that vaccine efficacy, thermostability, and other important properties, are largely impacted by intrinsic properties of the mRNA molecule, such as RNA sequence and structure, both of which can be optimized. Designing mRNA sequence for vaccines presents a combinatorial problem due to an extremely large selection space. For instance, due to the degeneracy of the genetic code, there are over 10^632 possible mRNA sequences that could encode the spike protein, the COVID-19 vaccines' target. Moreover, designing different elements of the mRNA sequence simultaneously against multiple objectives such as translational efficiency, reduced reactogenicity, and improved stability requires an efficient and sophisticated optimization strategy. Recently, there has been a growing interest in utilizing computational tools to redesign mRNA sequences to improve vaccine characteristics and expedite discovery timelines. In this review, we explore important biophysical features of mRNA to be considered for vaccine design and discuss how computational approaches can be applied to rapidly design mRNA sequences with desirable characteristics.
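As a rough illustration of the combinatorial claim in this abstract: the number of mRNA coding sequences for a protein is the product of the per-residue codon degeneracies in the standard genetic code. The sketch below is not taken from the review; the function names are hypothetical, and it counts only the coding region (no stop codon, no untranslated regions).

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter codes; the 20 values sum to the 61 sense codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_count(protein: str) -> int:
    """Exact number of coding sequences that translate to `protein`."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)


def log10_synonymous_count(protein: str) -> float:
    """log10 of the same count, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    peptide = "MLS"  # toy peptide: 1 * 6 * 6 coding sequences
    print(synonymous_count(peptide))
    print(round(log10_synonymous_count(peptide), 3))
```

Applied to a full-length antigen sequence (not included here), the log10 value is the quantity to compare against the exponent quoted in the abstract.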
Affiliation(s)
| | - Jeff Coller
- Johns Hopkins University, Baltimore, MD, USA
| |
|
25
|
Ishigami Y, Wong MS, Martí-Gómez C, Ayaz A, Kooshkbaghi M, Hanson SM, McCandlish DM, Krainer AR, Kinney JB. Specificity, synergy, and mechanisms of splice-modifying drugs. Nat Commun 2024; 15:1880. [PMID: 38424098 PMCID: PMC10904865 DOI: 10.1038/s41467-024-46090-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 02/10/2024] [Indexed: 03/02/2024] Open
Abstract
Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy. The results quantitatively characterize the specificities of risdiplam and branaplam for 5' splice site sequences, suggest that branaplam recognizes 5' splice sites via two distinct interaction modes, and contradict the prevailing two-site hypothesis for risdiplam activity at SMN2 exon 7. The results also show that anomalous single-drug cooperativity, as well as multi-drug synergy, are widespread among small-molecule drugs and antisense-oligonucleotide drugs that promote exon inclusion. Our quantitative models thus clarify the mechanisms of existing treatments and provide a basis for the rational development of new therapies.
Affiliation(s)
- Yuma Ishigami
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Mandy S Wong
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Beam Therapeutics, Cambridge, MA, 02142, USA
| | - Andalus Ayaz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Mahdi Kooshkbaghi
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- The Estée Lauder Companies, New York, NY, 10153, USA
| | - Adrian R Krainer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
| | - Justin B Kinney
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
| |
|
26
|
Zeng M, Sarker B, Rondthaler SN, Vu V, Andrews LB. Identifying LasR Quorum Sensors with Improved Signal Specificity by Mapping the Sequence-Function Landscape. ACS Synth Biol 2024; 13:568-589. [PMID: 38206199 DOI: 10.1021/acssynbio.3c00543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2024]
Abstract
Programmable intercellular signaling using components of naturally occurring quorum sensing can allow for coordinated functions to be engineered in microbial consortia. LuxR-type transcriptional regulators are widely used for this purpose and are activated by homoserine lactone (HSL) signals. However, they often suffer from imperfect molecular discrimination of structurally similar HSLs, causing misregulation within engineered consortia containing multiple HSL signals. Here, we studied one such example, the regulator LasR from Pseudomonas aeruginosa. We elucidated its sequence-function relationship for ligand specificity using targeted protein engineering and multiplexed high-throughput biosensor screening. A pooled combinatorial saturation mutagenesis library (9,486 LasR DNA sequences) was created by mutating six residues in LasR's β5 sheet with single, double, or triple amino acid substitutions. Sort-seq assays were performed in parallel using cognate and noncognate HSLs to quantify each corresponding sensor's response to each HSL signal, which identified hundreds of highly specific variants. Sensor variants identified were individually assayed and exhibited up to 60.6-fold (p = 0.0013) improved relative activation by the cognate signal compared to the wildtype. Interestingly, we uncovered prevalent mutational epistasis and previously unidentified residues contributing to signal specificity. The resulting sensors with negligible signal crosstalk could be broadly applied to engineer bacteria consortia.
Affiliation(s)
- Min Zeng
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Biprodev Sarker
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Stephen N Rondthaler
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Vanessa Vu
- Department of Biochemistry and Molecular Biology, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Lauren B Andrews
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
- Molecular and Cellular Biology Graduate Program, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
- Biotechnology Training Program, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| |
|
27
|
Nakamura T, Ueda J, Mizuno S, Honda K, Kazuno AA, Yamamoto H, Hara T, Takata A. Topologically associating domains define the impact of de novo promoter variants on autism spectrum disorder risk. Cell Genom 2024; 4:100488. [PMID: 38280381 PMCID: PMC10879036 DOI: 10.1016/j.xgen.2024.100488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/24/2023] [Accepted: 01/02/2024] [Indexed: 01/29/2024]
Abstract
Whole-genome sequencing (WGS) studies of autism spectrum disorder (ASD) have demonstrated the roles of rare promoter de novo variants (DNVs). However, most promoter DNVs in ASD are not located immediately upstream of known ASD genes. In this study analyzing WGS data of 5,044 ASD probands, 4,095 unaffected siblings, and their parents, we show that promoter DNVs within topologically associating domains (TADs) containing ASD genes are significantly and specifically associated with ASD. An analysis considering TADs as functional units identified specific TADs enriched for promoter DNVs in ASD and indicated that common variants in these regions also confer ASD heritability. Experimental validation using human induced pluripotent stem cells (iPSCs) showed that likely deleterious promoter DNVs in ASD can influence multiple genes within the same TAD, resulting in overall dysregulation of ASD-associated genes. These results highlight the importance of TADs and gene-regulatory mechanisms in better understanding the genetic architecture of ASD.
Affiliation(s)
- Takumi Nakamura
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Junko Ueda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
| | - Shota Mizuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Kurara Honda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - An-A Kazuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Hirona Yamamoto
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
| | - Tomonori Hara
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Organ Anatomy, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
| | - Atsushi Takata
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan.
| |
|
28
|
Fowler DM, Rehm HL. Will variants of uncertain significance still exist in 2030? Am J Hum Genet 2024; 111:5-10. [PMID: 38086381 PMCID: PMC10806733 DOI: 10.1016/j.ajhg.2023.11.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/22/2023] [Revised: 11/12/2023] [Accepted: 11/13/2023] [Indexed: 12/28/2023] Open
Abstract
In 2020, the National Human Genome Research Institute (NHGRI) made ten "bold predictions," including that "the clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation 'variant of uncertain significance (VUS)' obsolete." We discuss the prospects for this prediction, arguing that many, if not most, VUS in coding regions will be resolved by 2030. We outline a confluence of recent changes making this possible, especially advances in the standards for variant classification that better leverage diverse types of evidence, improvements in computational variant effect predictor performance, scalable multiplexed assays of variant effect capable of saturating the genome, and data-sharing efforts that will maximize the information gained from each new individual sequenced and variant interpreted. We suggest that clinicians and researchers can realize a future where VUSs have largely been eliminated, in line with the NHGRI's bold prediction. The length of time taken to reach this future, and thus whether we are able to achieve the goal of largely eliminating VUSs by 2030, is largely a consequence of the choices made now and in the next few years. We believe that investing in eliminating VUSs is worthwhile, since their predominance remains one of the biggest challenges to precision genomic medicine.
Affiliation(s)
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA; Department of Bioengineering, University of Washington, Seattle, WA, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
29
|
de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024; 625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Affiliation(s)
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Jussi Taipale
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden.
- Department of Biochemistry, University of Cambridge, Cambridge, UK.
| |
|
30
|
Xi C, Diao J, Moon TS. Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, an increasing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
Affiliation(s)
- Chenggang Xi
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Jinjin Diao
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Tae Seok Moon
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA.
| |
|
31
|
Lässig M, Mustonen V, Nourmohammad A. Steering and controlling evolution - from bioengineering to fighting pathogens. Nat Rev Genet 2023; 24:851-867. [PMID: 37400577 PMCID: PMC11137064 DOI: 10.1038/s41576-023-00623-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 05/30/2023] [Indexed: 07/05/2023]
Abstract
Control interventions steer the evolution of molecules, viruses, microorganisms or other cells towards a desired outcome. Applications range from engineering biomolecules and synthetic organisms to drug, therapy and vaccine design against pathogens and cancer. In all these instances, a control system alters the eco-evolutionary trajectory of a target system, inducing new functions or suppressing escape evolution. Here, we synthesize the objectives, mechanisms and dynamics of eco-evolutionary control in different biological systems. We discuss how the control system learns and processes information about the target system by sensing or measuring, through adaptive evolution or computational prediction of future trajectories. This information flow distinguishes pre-emptive control strategies by humans from feedback control in biotic systems. We establish a cost-benefit calculus to gauge and optimize control protocols, highlighting the fundamental link between predictability of evolution and efficacy of pre-emptive control.
Affiliation(s)
- Michael Lässig
- Institute for Biological Physics, University of Cologne, Cologne, Germany.
| | - Ville Mustonen
- Organismal and Evolutionary Biology Research Programme, Department of Computer Science, Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
| | - Armita Nourmohammad
- Department of Physics, University of Washington, Seattle, WA, USA.
- Department of Applied Mathematics, University of Washington, Seattle, WA, USA.
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
- Herbold Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA.
| |
|
32
|
Charest N, Shen Y, Lai YC, Chen IA, Shea JE. Discovering pathways through ribozyme fitness landscapes using information theoretic quantification of epistasis. RNA 2023; 29:1644-1657. [PMID: 37580126 PMCID: PMC10578471 DOI: 10.1261/rna.079541.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 07/29/2023] [Indexed: 08/16/2023]
Abstract
The identification of catalytic RNAs is typically achieved through primarily experimental means. However, only a small fraction of sequence space can be analyzed even with high-throughput techniques. Methods to extrapolate from a limited data set to predict additional ribozyme sequences, particularly in a human-interpretable fashion, could be useful both for designing new functional RNAs and for generating greater understanding about a ribozyme fitness landscape. Using information theory, we express the effects of epistasis (i.e., deviations from additivity) on a ribozyme. This representation was incorporated into a simple model of the epistatic fitness landscape, which identified potentially exploitable combinations of mutations. We used this model to theoretically predict mutants of high activity for a self-aminoacylating ribozyme, identifying potentially active triple and quadruple mutants beyond the experimental data set of single and double mutants. The predictions were validated experimentally, with nine out of nine sequences being accurately predicted to have high activity. This set of sequences included mutants that form a previously unknown evolutionary "bridge" between two ribozyme families that share a common motif. Individual steps in the method could be examined, understood, and guided by a human, combining interpretability and performance in a simple model to predict ribozyme sequences by extrapolation.
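The abstract describes epistasis as a deviation from additivity. A minimal sketch of that idea for a pair of mutations follows; it uses the textbook additive definition on a log-fitness scale, not the information-theoretic quantification the authors develop, and the function name and example values are hypothetical.

```python
def pairwise_epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Deviation of the double mutant from the additive expectation.

    All arguments are fitness (or activity) values on an additive scale,
    e.g. log activity. Zero means the two mutations combine additively;
    positive or negative values indicate positive or negative epistasis.
    """
    effect_a = f_a - f_wt    # single-mutant effect of mutation A
    effect_b = f_b - f_wt    # single-mutant effect of mutation B
    effect_ab = f_ab - f_wt  # observed double-mutant effect
    return effect_ab - (effect_a + effect_b)


if __name__ == "__main__":
    # Purely illustrative numbers, not data from the paper.
    print(pairwise_epistasis(0.0, 1.0, 2.0, 3.0))  # additive case
    print(pairwise_epistasis(0.0, 1.0, 1.0, 3.0))  # synergistic case
```

A model that extrapolates from single and double mutants to triple and quadruple mutants, as the abstract describes, needs some estimate of such interaction terms; how the authors derive theirs is not specified here.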
Affiliation(s)
- Nathaniel Charest
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Yuning Shen
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Yei-Chen Lai
  - Department of Chemistry, National Chung Hsing University, Taichung City 40227, Taiwan
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
- Irene A Chen
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
- Joan-Emma Shea
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
|
33
|
Quan N, Eguchi Y, Geiler-Samerotte K. Intra-FCY1: a novel system to identify mutations that cause protein misfolding. Front Genet 2023; 14:1198203. [PMID: 37745845 PMCID: PMC10512024 DOI: 10.3389/fgene.2023.1198203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023] Open
Abstract
Protein misfolding is a common intracellular occurrence. Most mutations to coding sequences increase the propensity of the encoded protein to misfold. These misfolded molecules can have devastating effects on cells. Despite the importance of protein misfolding in human disease and protein evolution, there are fundamental questions that remain unanswered, such as, which mutations cause the most misfolding? These questions are difficult to answer partially because we lack high-throughput methods to compare the destabilizing effects of different mutations. Commonly used systems to assess the stability of mutant proteins in vivo often rely upon essential proteins as sensors, but misfolded proteins can disrupt the function of the essential protein enough to kill the cell. This makes it difficult to identify and compare mutations that cause protein misfolding using these systems. Here, we present a novel in vivo system named Intra-FCY1 that we use to identify mutations that cause misfolding of a model protein [yellow fluorescent protein (YFP)] in Saccharomyces cerevisiae. The Intra-FCY1 system utilizes two complementary fragments of the yeast cytosine deaminase Fcy1, a toxic protein, into which YFP is inserted. When YFP folds, the Fcy1 fragments associate together to reconstitute their function, conferring toxicity in media containing 5-fluorocytosine and hindering growth. But mutations that make YFP misfold abrogate Fcy1 toxicity, thus strains possessing misfolded YFP variants rise to high frequency in growth competition experiments. This makes such strains easier to study. The Intra-FCY1 system cancels localization of the protein of interest, thus can be applied to study the relative stability of mutant versions of diverse cellular proteins. Here, we confirm this method can identify novel mutations that cause misfolding, highlighting the potential for Intra-FCY1 to illuminate the relationship between protein sequence and stability.
Affiliation(s)
- N. Quan
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ, United States
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Y. Eguchi
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ, United States
- K. Geiler-Samerotte
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ, United States
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
|
34
|
Ginell GM, Flynn AJ, Holehouse AS. SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets. Bioinformatics 2023; 39:btad488. [PMID: 37540173 PMCID: PMC10423030 DOI: 10.1093/bioinformatics/btad488] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/23/2022] [Revised: 07/02/2023] [Accepted: 08/03/2023] [Indexed: 08/05/2023] Open
Abstract
MOTIVATION: The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics. RESULTS: To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables a Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology. AVAILABILITY AND IMPLEMENTATION: We provided SHEPHARD as both a stand-alone software package (https://github.com/holehouse-lab/shephard), and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab).
Affiliation(s)
- Garrett M Ginell
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States
  - Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States
- Aidan J Flynn
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States
  - Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States
- Alex S Holehouse
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States
  - Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States
|
35
|
Wang Y, Zhang K, Zhao Y, Li Y, Su W, Li S. Construction and Applications of Mammalian Cell-Based DNA-Encoded Peptide/Protein Libraries. ACS Synth Biol 2023; 12:1874-1888. [PMID: 37315219 DOI: 10.1021/acssynbio.3c00043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/16/2023]
Abstract
DNA-encoded peptide/protein libraries are the starting point for protein evolutionary modification and functional peptide/antibody selection. Different display technologies, protein directed evolution, and deep mutational scanning (DMS) experiments employ DNA-encoded libraries to provide sequence variations for downstream affinity- or function-based selections. Mammalian cells promise the inherent post-translational modification and near-to-natural conformation of exogenously expressed mammalian proteins and thus are the best platform for studying transmembrane proteins or human disease-related proteins. However, due to the current technical bottlenecks of constructing mammalian cell-based large size DNA-encoded libraries, the advantages of mammalian cells as screening platforms have not been fully exploited. In this review, we summarize the current efforts in constructing DNA-encoded libraries in mammalian cells and the existing applications of these libraries in different fields.
Affiliation(s)
- Yi Wang
  - Department of Breast Cancer Pathology and Research Laboratory, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Kaili Zhang
  - Department of Breast Cancer Pathology and Research Laboratory, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yanjie Zhao
  - Department of Breast Cancer Pathology and Research Laboratory, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yifan Li
  - Department of Breast Cancer Pathology and Research Laboratory, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Weijun Su
  - School of Medicine, Nankai University, Tianjin 300071, China
- Shuai Li
  - Department of Breast Cancer Pathology and Research Laboratory, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
|
36
|
Fowler DM, Adams DJ, Gloyn AL, Hahn WC, Marks DS, Muffley LA, Neal JT, Roth FP, Rubin AF, Starita LM, Hurles ME. An Atlas of Variant Effects to understand the genome at nucleotide resolution. Genome Biol 2023; 24:147. [PMID: 37394429 PMCID: PMC10316620 DOI: 10.1186/s13059-023-02986-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 06/02/2023] [Accepted: 06/13/2023] [Indexed: 07/04/2023] Open
Abstract
Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.
Affiliation(s)
- Douglas M. Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA USA
  - Department of Bioengineering, University of Washington, Seattle, WA USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA USA
- Anna L. Gloyn
  - Department of Pediatrics & Department of Genetics, Division of Endocrinology, Stanford School of Medicine, Stanford University, Stanford, CA USA
- William C. Hahn
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA USA
  - Broad Institute of MIT and Harvard, Cambridge, MA USA
- Debora S. Marks
  - Broad Institute of MIT and Harvard, Cambridge, MA USA
  - Department of Systems Biology, Harvard Medical School, Cambridge, USA
- Lara A. Muffley
  - Department of Genome Sciences, University of Washington, Seattle, WA USA
- James T. Neal
  - Broad Institute of MIT and Harvard, Cambridge, MA USA
  - Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA USA
- Frederick P. Roth
  - Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON Canada
- Alan F. Rubin
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC Australia
- Lea M. Starita
  - Department of Genome Sciences, University of Washington, Seattle, WA USA
  - Department of Bioengineering, University of Washington, Seattle, WA USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA USA
|
37
|
FitzPatrick VD, Leemans C, van Arensbergen J, van Steensel B, Bussemaker H. Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR. Nucleic Acids Res 2023; 51:5499-5511. [PMID: 37013986 PMCID: PMC10287907 DOI: 10.1093/nar/gkad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 03/08/2023] [Accepted: 03/22/2023] [Indexed: 04/05/2023] Open
Abstract
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
Affiliation(s)
- Vincent D FitzPatrick
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Christ Leemans
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Joris van Arensbergen
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Bas van Steensel
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
  - Department of Cell Biology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
|
38
|
Diaz-Colunga J, Skwara A, Gowda K, Diaz-Uriarte R, Tikhonov M, Bajic D, Sanchez A. Global epistasis on fitness landscapes. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220053. [PMID: 37004717 PMCID: PMC10067270 DOI: 10.1098/rstb.2022.0053] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/30/2022] [Accepted: 11/23/2022] [Indexed: 04/04/2023] Open
Abstract
Epistatic interactions between mutations add substantial complexity to adaptive landscapes and are often thought of as detrimental to our ability to predict evolution. Yet, patterns of global epistasis, in which the fitness effect of a mutation is well-predicted by the fitness of its genetic background, may actually be of help in our efforts to reconstruct fitness landscapes and infer adaptive trajectories. Microscopic interactions between mutations, or inherent nonlinearities in the fitness landscape, may cause global epistasis patterns to emerge. In this brief review, we provide a succinct overview of recent work about global epistasis, with an emphasis on building intuition about why it is often observed. To this end, we reconcile simple geometric reasoning with recent mathematical analyses, using these to explain why different mutations in an empirical landscape may exhibit different global epistasis patterns, ranging from diminishing to increasing returns. Finally, we highlight open questions and research directions. This article is part of the theme issue 'Interdisciplinary approaches to predicting evolutionary biology'.
Affiliation(s)
- Juan Diaz-Colunga
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Abigail Skwara
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Karna Gowda
  - Department of Ecology & Evolution & Center for the Physics of Evolving Systems, The University of Chicago, Chicago, IL 60637, USA
- Ramon Diaz-Uriarte
  - Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid 28029, Spain
  - Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (UAM-CSIC), Madrid 28029, Spain
- Mikhail Tikhonov
  - Department of Physics, Washington University of St Louis, St Louis, MO 63130, USA
- Djordje Bajic
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Alvaro Sanchez
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
  - Department of Microbial Biotechnology, Campus de Cantoblanco, CNB-CSIC, Madrid 28049, Spain
|
39
|
Blaabjerg LM, Kassem MM, Good LL, Jonsson N, Cagiada M, Johansson KE, Boomsma W, Stein A, Lindorff-Larsen K. Rapid protein stability prediction using deep learning representations. eLife 2023; 12:e82593. [PMID: 37184062 PMCID: PMC10266766 DOI: 10.7554/elife.82593] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 08/10/2022] [Accepted: 05/12/2023] [Indexed: 05/16/2023] Open
Abstract
Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering, and when elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on-par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ~230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a Web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.
Affiliation(s)
- Lasse M Blaabjerg
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Maher M Kassem
  - Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Lydia L Good
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Nicolas Jonsson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matteo Cagiada
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
  - Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
40
|
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Wan Hern Ching
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- Paola Cornejo-Páramo
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Emily S Wong
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia.
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia.
|
41
|
Johansson KE, Lindorff-Larsen K, Winther JR. Global Analysis of Multi-Mutants to Improve Protein Function. J Mol Biol 2023; 435:168034. [PMID: 36863661 DOI: 10.1016/j.jmb.2023.168034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/04/2023]
Abstract
The identification of amino acid substitutions that both enhance the stability and function of a protein is a key challenge in protein engineering. Technological advances have enabled assaying thousands of protein variants in a single high-throughput experiment, and more recent studies use such data in protein engineering. We present a Global Multi-Mutant Analysis (GMMA) that exploits the presence of multiply-substituted variants to identify individual amino acid substitutions that are beneficial for the stability and function across a large library of protein variants. We have applied GMMA to a previously published experiment reporting on >54,000 variants of green fluorescent protein (GFP), each with known fluorescence output, and each carrying 1-15 amino acid substitutions (Sarkisyan et al., 2016). The GMMA method achieves a good fit to this dataset while being analytically transparent. We show experimentally that the six top-ranking substitutions progressively enhance GFP. More broadly, using only a single experiment as input our analysis recovers nearly all the substitutions previously reported to be beneficial for GFP folding and function. In conclusion, we suggest that large libraries of multiply-substituted variants may provide a unique source of information for protein engineering.
Affiliation(s)
- Kristoffer E Johansson
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology of (University of Copenhagen), Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology of (University of Copenhagen), Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
- Jakob R Winther
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology of (University of Copenhagen), Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
|
42
|
Wei H, Li X. Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Front Genet 2023; 14:1087267. [PMID: 36713072 PMCID: PMC9878224 DOI: 10.3389/fgene.2023.1087267] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/02/2022] [Accepted: 01/02/2023] [Indexed: 01/13/2023] Open
Abstract
Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to the oligo-synthesizing technology, high-throughput phenotyping methods and deep sequencing technology. However, each specific possible step of deep mutational scanning has its pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through the deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers' projects.
Affiliation(s)
- Huijin Wei
  - Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
- Xianghua Li
  - Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, United Kingdom
  - The Second Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang, China
  - Biomedical and Health Translational Centre of Zhejiang Province, Haining, Zhejiang, China
|
43
|
Zhao FL, Zhang Q, Wang SH, Hong Y, Zhou S, Zhou Q, Geng PW, Luo QF, Yang JF, Chen H, Cai JP, Dai DP. Identification and drug metabolic characterization of four new CYP2C9 variants CYP2C9*72-*75 in the Chinese Han population. Front Pharmacol 2022; 13:1007268. [PMID: 36582532 PMCID: PMC9792615 DOI: 10.3389/fphar.2022.1007268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Accepted: 12/01/2022] [Indexed: 12/15/2022] Open
Abstract
Cytochrome 2C9 (CYP2C9), one of the most important drug metabolic enzymes in the human hepatic P450 superfamily, is required for the metabolism of 15% of clinical drugs. Similar to other CYP2C family members, CYP2C9 gene has a high genetic polymorphism which can cause significant racial and inter-individual differences in drug metabolic activity. To better understand the genetic distribution pattern of CYP2C9 in the Chinese Han population, 931 individuals were recruited and used for the genotyping in this study. As a result, seven synonymous and 14 non-synonymous variations were identified, of which 4 missense variants were designated as new alleles CYP2C9*72, *73, *74 and *75, resulting in the amino acid substitutions of A149V, R150C, Q214H and N418T, respectively. When expressed in insect cell microsomes, all four variants exhibited comparable protein expression levels to that of the wild-type CYP2C9 enzyme. However, drug metabolic activity analysis revealed that these variants exhibited significantly decreased catalytic activities toward three CYP2C9 specific probe drugs, as compared with that of the wild-type enzyme. These data indicate that the amino acid substitution in newly designated variants can cause reduced function of the enzyme and its clinical significance still needs further investigation in the future.
Affiliation(s)
- Fang-Ling Zhao
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology of National Health Commission, Beijing, China,Peking University Fifth School of Clinical Medicine, Beijing, China
| | - Qing Zhang
- Department of Cardiovascular, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
| | - Shuang-Hu Wang
- Laboratory of Clinical Pharmacy, The Sixth Affiliated Hospital of Wenzhou Medical University, The People’s Hospital of Lishui, Lishui, China
| | - Yun Hong
- Department of Gastroenterology, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
| | - Shan Zhou
- Peking University Fifth School of Clinical Medicine, Beijing, China
| | - Quan Zhou
- Laboratory of Clinical Pharmacy, The Sixth Affiliated Hospital of Wenzhou Medical University, The People’s Hospital of Lishui, Lishui, China
| | - Pei-Wu Geng
- Laboratory of Clinical Pharmacy, The Sixth Affiliated Hospital of Wenzhou Medical University, The People’s Hospital of Lishui, Lishui, China
| | - Qing-Feng Luo
- Department of Gastroenterology, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
| | - Jie-Fu Yang
- Department of Cardiovascular, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
| | - Hao Chen
- Department of Cardiovascular, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China; *Correspondence: Da-Peng Dai; Jian-Ping Cai; Hao Chen
| | - Jian-Ping Cai
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology of National Health Commission, Beijing, China
| | - Da-Peng Dai
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology of National Health Commission, Beijing, China; Peking University Fifth School of Clinical Medicine, Beijing, China
| |
|
44
|
Tabet D, Parikh V, Mali P, Roth FP, Claussnitzer M. Scalable Functional Assays for the Interpretation of Human Genetic Variation. Annu Rev Genet 2022; 56:441-465. [PMID: 36055970 DOI: 10.1146/annurev-genet-072920-032107] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 12/24/2022]
Abstract
Scalable sequence-function studies have enabled the systematic analysis and cataloging of hundreds of thousands of coding and noncoding genetic variants in the human genome. This has improved clinical variant interpretation and provided insights into the molecular, biophysical, and cellular effects of genetic variants at an astonishing scale and resolution across the spectrum of allele frequencies. In this review, we explore current applications and prospects for the field and outline the principles underlying scalable functional assay design, with a focus on the study of single-nucleotide coding and noncoding variants.
Affiliation(s)
- Daniel Tabet
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
| | - Victoria Parikh
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, USA
| | - Prashant Mali
- Department of Bioengineering, University of California, San Diego, California, USA
| | - Frederick P Roth
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
| | - Melina Claussnitzer
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Center for Genomic Medicine and Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Harvard University, Boston, Massachusetts, USA
| |
|
45
|
Fernandez-Lopez R, Ruiz R, del Campo I, Gonzalez-Montes L, Boer D, de la Cruz F, Moncalian G. Structural basis of direct and inverted DNA sequence repeat recognition by helix-turn-helix transcription factors. Nucleic Acids Res 2022; 50:11938-11947. [PMID: 36370103 PMCID: PMC9723621 DOI: 10.1093/nar/gkac1024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2022] [Revised: 10/13/2022] [Accepted: 10/25/2022] [Indexed: 11/13/2022] Open
Abstract
Some transcription factors bind DNA motifs containing direct or inverted sequence repeats. Preference for each of these DNA topologies is dictated by structural constraints. Most prokaryotic regulators form symmetric oligomers, which require operators with a dyad structure. Binding to direct repeats requires breaking the internal symmetry, a property restricted to a few regulators, most of them from the AraC family. The KorA family of transcriptional repressors, involved in plasmid propagation and stability, includes members that form symmetric dimers and recognize inverted repeats. Our structural analyses show that ArdK, a member of this family, can form a symmetric dimer similar to that observed for KorA, yet it binds direct sequence repeats as a non-symmetric dimer. This is made possible by the 180° rotation of one of the helix-turn-helix domains. We then probed and confirmed that ArdK shows affinity for an inverted repeat, which, surprisingly, is also recognized by a non-symmetric dimer. Our results indicate that structural flexibility at different positions in the dimerization interface constrains transcription factors to bind DNA sequences with one of these two alternative DNA topologies.
Affiliation(s)
- Raul Fernandez-Lopez
- Departamento de Biología Molecular, Universidad de Cantabria and Instituto de Biomedicina y Biotecnología de Cantabria (IBBTEC), CSIC-Universidad de Cantabria, 39011, Santander, Spain
| | - Raul Ruiz
- Departamento de Biología Molecular, Universidad de Cantabria and Instituto de Biomedicina y Biotecnología de Cantabria (IBBTEC), CSIC-Universidad de Cantabria, 39011, Santander, Spain
| | - Irene del Campo
- Departamento de Biología Molecular, Universidad de Cantabria and Instituto de Biomedicina y Biotecnología de Cantabria (IBBTEC), CSIC-Universidad de Cantabria, 39011, Santander, Spain
| | - Lorena Gonzalez-Montes
- Departamento de Biología Molecular, Universidad de Cantabria and Instituto de Biomedicina y Biotecnología de Cantabria (IBBTEC), CSIC-Universidad de Cantabria, 39011, Santander, Spain
| | - D Roeland Boer
- Alba Synchrotron, Cerdanyola del Vallès, 08290, Barcelona, Spain
| | | | | |
|
46
|
Schubert OT, Bloom JS, Sadhu MJ, Kruglyak L. Genome-wide base editor screen identifies regulators of protein abundance in yeast. eLife 2022; 11:e79525. [PMID: 36326816 PMCID: PMC9633064 DOI: 10.7554/elife.79525] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/15/2022] [Accepted: 09/23/2022] [Indexed: 11/07/2022] Open
Abstract
Proteins are key molecular players in a cell, and their abundance is extensively regulated not just at the level of gene expression but also post-transcriptionally. Here, we describe a genetic screen in yeast that enables systematic characterization of how protein abundance regulation is encoded in the genome. The screen combines a CRISPR/Cas9 base editor to introduce point mutations with fluorescent tagging of endogenous proteins to facilitate a flow-cytometric readout. We first benchmarked base editor performance in yeast with individual gRNAs as well as in positive and negative selection screens. We then examined the effects of 16,452 genetic perturbations on the abundance of eleven proteins representing a variety of cellular functions. We uncovered hundreds of regulatory relationships, including a novel link between the GAPDH isoenzymes Tdh1/2/3 and the Ras/PKA pathway. Many of the identified regulators are specific to one of the eleven proteins, but we also found genes that, upon perturbation, affected the abundance of most of the tested proteins. While the more specific regulators usually act transcriptionally, broad regulators often have roles in protein translation. Overall, our novel screening approach provides unprecedented insights into the components, scale and connectedness of the protein regulatory network.
Affiliation(s)
- Olga T Schubert
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
- Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
- Department of Environmental Systems Science, Swiss Federal Institute of Technology (ETH), Zürich, Switzerland
- Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
| | - Joshua S Bloom
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
- Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| | - Meru J Sadhu
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
- Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| | - Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
- Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| |
|
47
|
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285 PMCID: PMC9636336 DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022] Open
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis in which a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning (DMS) is a relatively new technique that enables experimental studies of the functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss the limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them.
Affiliation(s)
- Nadezhda Azbukina
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
| | - Anastasia Zharikova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
| | - Vasily Ramensky
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
| |
|
48
|
Abstract
One core goal of genetics is to systematically understand the mapping between the DNA sequence of an organism (genotype) and its measurable characteristics (phenotype). Understanding this mapping is often challenging because of interactions between mutations, where the result of combining several different mutations can be very different than the sum of their individual effects. Here we provide a statistical framework for modeling complex genetic interactions of this type. The key idea is to ask how fast the effects of mutations change when introducing the same mutation in increasingly distant genetic backgrounds. We then propose a model for phenotypic prediction that takes into account this tendency for the effects of mutations to be more similar in nearby genetic backgrounds.
Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype–phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype–phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites, for which we also validate our model predictions via additional low-throughput experiments.
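The idea of tracking how phenotypic correlations change with mutational distance can be illustrated with a small sketch. This is not the authors' implementation: the function name `distance_correlation`, the four-site binary sequence space, and the additive effect sizes are all assumptions chosen for illustration, and the sketch covers only the correlation estimate on a fully observed landscape, not the empirical Bayes prior or the Gaussian process regression.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of sites at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_correlation(phenotypes):
    """Map each Hamming distance d to the correlation between the
    phenotypes of all genotype pairs that are exactly d mutations apart."""
    values = list(phenotypes.values())
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sums, counts = {}, {}
    for (g1, f1), (g2, f2) in combinations(phenotypes.items(), 2):
        d = hamming(g1, g2)
        sums[d] = sums.get(d, 0.0) + (f1 - mean) * (f2 - mean)
        counts[d] = counts.get(d, 0) + 1
    return {d: sums[d] / counts[d] / var for d in sorted(sums)}


# Hypothetical toy landscape: a purely additive phenotype on 4 binary sites.
effects = [1.0, 0.5, -0.8, 0.3]
landscape = {
    g: sum(e * s for e, s in zip(effects, g))
    for g in product((0, 1), repeat=len(effects))
}
corr = distance_correlation(landscape)
```

For this additive toy landscape over the complete binary sequence space, the correlation falls linearly with distance, as 1 - 2d/L for L sites. Epistatic interactions of higher order change the shape of this curve, which is the signal the abstract describes using to estimate variance components.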
|
49
|
Srivastava M, Payne JL. On the incongruence of genotype-phenotype and fitness landscapes. PLoS Comput Biol 2022; 18:e1010524. [PMID: 36121840 PMCID: PMC9521842 DOI: 10.1371/journal.pcbi.1010524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2022] [Revised: 09/29/2022] [Accepted: 08/30/2022] [Indexed: 11/22/2022] Open
Abstract
The mapping from genotype to phenotype to fitness typically involves multiple nonlinearities that can transform the effects of mutations. For example, mutations may contribute additively to a phenotype, but their effects on fitness may combine non-additively because selection favors a low or intermediate value of that phenotype. This can cause incongruence between the topographical properties of a fitness landscape and its underlying genotype-phenotype landscape. Yet, genotype-phenotype landscapes are often used as a proxy for fitness landscapes to study the dynamics and predictability of evolution. Here, we use theoretical models and empirical data on transcription factor-DNA interactions to systematically study the incongruence of genotype-phenotype and fitness landscapes when selection favors a low or intermediate phenotypic value. Using the theoretical models, we prove a number of fundamental results. For example, selection for low or intermediate phenotypic values does not change simple sign epistasis into reciprocal sign epistasis, implying that genotype-phenotype landscapes with only simple sign epistasis motifs will always give rise to single-peaked fitness landscapes under such selection. More broadly, we show that such selection tends to create fitness landscapes that are more rugged than the underlying genotype-phenotype landscape, but this increased ruggedness typically does not frustrate adaptive evolution because the local adaptive peaks in the fitness landscape tend to be nearly as tall as the global peak. Many of these results carry forward to the empirical genotype-phenotype landscapes, which may help to explain why low- and intermediate-affinity transcription factor-DNA interactions are so prevalent in eukaryotic gene regulation.
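The abstract's opening example, additive phenotypic effects that combine non-additively in fitness when selection favors an intermediate phenotype, can be sketched in a few lines. The two-site genotype, the unit effect sizes, the quadratic fitness function, and the optimum of 1.0 are all hypothetical choices for illustration and are not taken from the paper.

```python
def phenotype(genotype, effects=(1.0, 1.0)):
    """Additive phenotype: sum of the effects of the mutations present."""
    return sum(e for e, present in zip(effects, genotype) if present)


def fitness(genotype, optimum=1.0):
    """Assumed fitness function: highest when the phenotype equals the optimum."""
    return -(phenotype(genotype) - optimum) ** 2


def pairwise_epistasis(f):
    """Deviation of the double mutant from the sum of single-mutant effects."""
    return f((1, 1)) - f((1, 0)) - f((0, 1)) + f((0, 0))


phenotype_epistasis = pairwise_epistasis(phenotype)  # 0.0: effects add up
fitness_epistasis = pairwise_epistasis(fitness)      # -2.0: effects do not add up
```

In this toy case each mutation alone moves the phenotype onto the optimum and raises fitness, while adding either one to the other overshoots the optimum and lowers fitness, so the sign of each mutation's fitness effect depends on the background even though the phenotype is strictly additive.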
Affiliation(s)
- Malvika Srivastava
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Joshua L. Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
50
|
Zhou Y, Tremmel R, Schaeffeler E, Schwab M, Lauschke VM. Challenges and opportunities associated with rare-variant pharmacogenomics. Trends Pharmacol Sci 2022; 43:852-865. [PMID: 36008164 DOI: 10.1016/j.tips.2022.07.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/13/2022] [Revised: 06/15/2022] [Accepted: 07/29/2022] [Indexed: 12/26/2022]
Abstract
Recent advances in next-generation sequencing (NGS) have resulted in the identification of tens of thousands of rare pharmacogenetic variations with unknown functional effects. However, although such pharmacogenetic variations have been estimated to account for a considerable amount of the heritable variability in drug response and toxicity, accurate interpretation at the level of the individual patient remains challenging. We discuss emerging strategies and concepts to close this translational gap. We illustrate how massively parallel experimental assays, artificial intelligence (AI), and machine learning can synergize with population-scale biobank projects to facilitate the interpretation of NGS data to individualize clinical decision-making and personalized medicine.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, 171 77 Stockholm, Sweden
| | - Roman Tremmel
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany; University of Tübingen, Tübingen, Germany
| | - Elke Schaeffeler
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany; University of Tübingen, Tübingen, Germany; Cluster of Excellence iFIT (EXC2180) Image-Guided and Functionally Instructed Tumor Therapies, University of Tübingen, Tübingen, Germany
| | - Matthias Schwab
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany; Cluster of Excellence iFIT (EXC2180) Image-Guided and Functionally Instructed Tumor Therapies, University of Tübingen, Tübingen, Germany; Department of Clinical Pharmacology, and Department of Biochemistry and Pharmacy, University of Tübingen, Tübingen, Germany
| | - Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, 171 77 Stockholm, Sweden; Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany; University of Tübingen, Tübingen, Germany
| |
|