1
Diaz-Colunga J, Skwara A, Vila JCC, Bajic D, Sanchez A. Global epistasis and the emergence of function in microbial consortia. Cell 2024; 187:3108-3119.e30. [PMID: 38776921 DOI: 10.1016/j.cell.2024.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/06/2023] [Accepted: 04/16/2024] [Indexed: 05/25/2024]
Abstract
The many functions of microbial communities emerge from a complex web of interactions between organisms and their environment. This poses a significant obstacle to engineering microbial consortia, hindering our ability to harness the potential of microorganisms for biotechnological applications. In this study, we demonstrate that the collective effect of ecological interactions between microbes in a community can be captured by simple statistical models that predict how adding a new species to a community will affect its function. These predictive models mirror the patterns of global epistasis reported in genetics, and they can be quantitatively interpreted in terms of pairwise interactions between community members. Our results illuminate an unexplored path to quantitatively predicting the function of microbial consortia from their composition, paving the way to optimizing desirable community properties and bringing the tasks of predicting biological function at the genetic, organismal, and ecological scales under the same quantitative formalism.
Affiliation(s)
- Juan Diaz-Colunga
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain.
- Abigail Skwara
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA
- Jean C C Vila
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biology, Stanford University, Stanford, CA 94305, USA
- Djordje Bajic
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biotechnology, Delft University of Technology, Delft 2628 CD, the Netherlands.
- Alvaro Sanchez
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain.
2
Liu Z, Gillis TG, Raman S, Cui Q. A parameterized two-domain thermodynamic model explains diverse mutational effects on protein allostery. eLife 2024; 12:RP92262. [PMID: 38836839 PMCID: PMC11152574 DOI: 10.7554/elife.92262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024] Open
Abstract
New experimental findings continue to challenge our understanding of protein allostery. A recent deep mutational scanning study showed that allosteric hotspots in the tetracycline repressor (TetR) and its homologous transcriptional factors are broadly distributed rather than spanning well-defined structural pathways as often assumed. Moreover, hotspot mutation-induced allostery loss was rescued by distributed additional mutations in a degenerate fashion. Here, we develop a two-domain thermodynamic model for TetR, which readily rationalizes these intriguing observations. The model accurately captures the in vivo activities of various mutants with changes in physically transparent parameters, allowing the data-based quantification of mutational effects using statistical inference. Our analysis reveals the intrinsic connection of intra- and inter-domain properties for allosteric regulation and illustrates epistatic interactions that are consistent with structural features of the protein. The insights gained from this study into the nature of two-domain allostery are expected to have broader implications for other multi-domain allosteric proteins.
Affiliation(s)
- Zhuang Liu
  - Department of Physics, Boston University, Boston, United States
- Thomas G Gillis
  - Department of Biochemistry, University of Wisconsin, Madison, United States
- Srivatsan Raman
  - Department of Biochemistry, University of Wisconsin, Madison, United States
  - Department of Chemistry, University of Wisconsin, Madison, United States
  - Department of Bacteriology, University of Wisconsin, Madison, United States
- Qiang Cui
  - Department of Physics, Boston University, Boston, United States
  - Department of Chemistry, Boston University, Boston, United States
3
Seitz EE, McCandlish DM, Kinney JB, Koo PK. Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models. bioRxiv 2024:2023.11.14.567120. [PMID: 38013993 PMCID: PMC10680760 DOI: 10.1101/2023.11.14.567120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models, i.e., simpler models that are mechanistically interpretable. Importantly, SQUID removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements. SQUID thus advances the ability to mechanistically interpret genomic DNNs.
Affiliation(s)
- Evan E Seitz
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Peter K Koo
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
4
Xi C, Diao J, Moon TS. Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, a growing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
Affiliation(s)
- Chenggang Xi
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Jinjin Diao
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Tae Seok Moon
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA.
5
Diaz-Colunga J, Sanchez A, Ogbunugafor CB. Environmental modulation of global epistasis in a drug resistance fitness landscape. Nat Commun 2023; 14:8055. [PMID: 38052815 DOI: 10.1038/s41467-023-43806-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/22/2022] [Accepted: 11/21/2023] [Indexed: 12/07/2023] Open
Abstract
Interactions between mutations (epistasis) can add substantial complexity to genotype-phenotype maps, hampering our ability to predict evolution. Yet, recent studies have shown that the fitness effect of a mutation can often be predicted from the fitness of its genetic background using simple, linear relationships. This phenomenon, termed global epistasis, has been leveraged to reconstruct fitness landscapes and infer adaptive trajectories in a wide variety of contexts. However, little attention has been paid to how patterns of global epistasis may be affected by environmental variation, despite this variation frequently being a major driver of evolution. This is particularly relevant for the evolution of drug resistance, where antimicrobial drugs may change the environment faced by pathogens and shape their adaptive trajectories in ways that can be difficult to predict. By analyzing a fitness landscape of four mutations in a gene encoding an essential enzyme of P. falciparum (a parasite that causes malaria), here we show that patterns of global epistasis can be strongly modulated by the concentration of a drug in the environment. Expanding on previous theoretical results, we demonstrate that this modulation can be quantitatively explained by how specific gene-by-gene interactions are modified by drug dose. Importantly, our results highlight the need to incorporate potential environmental variation into the global epistasis framework in order to predict adaptation in dynamic environments.
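The linear relationship described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the binary-tuple genotype encoding, and the toy landscape are all assumptions made for illustration. For one focal mutation, the sketch collects (background fitness, fitness effect) pairs across all backgrounds lacking that mutation and fits a straight line by ordinary least squares.

```python
from itertools import product


def fitness_effects(landscape, locus):
    """Collect (background fitness, fitness effect) pairs for one mutation.

    `landscape` maps binary genotype tuples to fitness (hypothetical encoding).
    """
    pairs = []
    for genotype, fitness in landscape.items():
        if genotype[locus] == 0:
            # Same background with the focal mutation added.
            mutant = genotype[:locus] + (1,) + genotype[locus + 1:]
            if mutant in landscape:
                pairs.append((fitness, landscape[mutant] - fitness))
    return pairs


def fit_line(pairs):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def toy_landscape(n_loci=4):
    """Made-up landscape in which fitness saturates with mutation count."""
    return {g: 1.0 - 0.5 ** sum(g) for g in product((0, 1), repeat=n_loci)}


if __name__ == "__main__":
    slope, intercept = fit_line(fitness_effects(toy_landscape(), locus=0))
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

In this toy landscape the effect of each mutation shrinks as background fitness rises, so the fitted slope is negative (a diminishing-returns pattern). Real landscapes are noisier, and the fit would be repeated per mutation and, following the abstract, per drug concentration.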
Affiliation(s)
- Juan Diaz-Colunga
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT, 06511, USA.
  - Department of Microbial Biotechnology, Spanish National Center for Biotechnology CNB-CSIC, 28049, Madrid, Spain.
  - Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007, Salamanca, Spain.
- Alvaro Sanchez
  - Department of Microbial Biotechnology, Spanish National Center for Biotechnology CNB-CSIC, 28049, Madrid, Spain.
  - Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007, Salamanca, Spain.
- C Brandon Ogbunugafor
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT, 06511, USA.
  - Santa Fe Institute, Santa Fe, NM, 87501, USA.
6
Johnson MS, Reddy G, Desai MM. Epistasis and evolution: recent advances and an outlook for prediction. BMC Biol 2023; 21:120. [PMID: 37226182 PMCID: PMC10206586 DOI: 10.1186/s12915-023-01585-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/06/2023] [Accepted: 03/30/2023] [Indexed: 05/26/2023] Open
Abstract
As organisms evolve, the effects of mutations change as a result of epistatic interactions with other mutations accumulated along the line of descent. This can lead to shifts in adaptability or robustness that ultimately shape subsequent evolution. Here, we review recent advances in measuring, modeling, and predicting epistasis along evolutionary trajectories, both in microbial cells and single proteins. We focus on simple patterns of global epistasis that emerge in this data, in which the effects of mutations can be predicted by a small number of variables. The emergence of these patterns offers promise for efforts to model epistasis and predict evolution.
Affiliation(s)
- Milo S Johnson
  - Department of Integrative Biology, University of California, Berkeley, CA, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Gautam Reddy
  - Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
- Michael M Desai
  - Department of Organismic and Evolutionary Biology and Department of Physics, Harvard University, Cambridge, MA, USA.
7
Nikolados EM, Oyarzún DA. Deep learning for optimization of protein expression. Curr Opin Biotechnol 2023; 81:102941. [PMID: 37087839 DOI: 10.1016/j.copbio.2023.102941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 02/02/2023] [Accepted: 03/17/2023] [Indexed: 04/25/2023]
Abstract
Recent progress in high-throughput DNA synthesis and sequencing has enabled the development of massively parallel reporter assays for strain characterization. These datasets map a large number of DNA sequences to protein expression levels, sparking increased interest in data-driven methods for sequence-to-expression modeling. Here, we highlight advances in deep learning models of protein expression and their potential for optimizing strains engineered to produce recombinant proteins. We review recent works that built highly accurate models and discuss challenges that hinder adoption by end users. There is a need to better align this technology with the constraints encountered in strain engineering, particularly the cost of acquiring large amounts of data and the requirement for interpretable models that generalize beyond the training data. Overcoming these barriers will help to incentivize academic and industrial laboratories to tap into a new era of data-centric strain engineering.
Affiliation(s)
- Diego A Oyarzún
  - School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK; School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK; The Alan Turing Institute, London NW1 2DB, UK.
8
Johansson KE, Lindorff-Larsen K, Winther JR. Global Analysis of Multi-Mutants to Improve Protein Function. J Mol Biol 2023; 435:168034. [PMID: 36863661 DOI: 10.1016/j.jmb.2023.168034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/04/2023]
Abstract
The identification of amino acid substitutions that enhance both the stability and function of a protein is a key challenge in protein engineering. Technological advances have enabled assaying thousands of protein variants in a single high-throughput experiment, and more recent studies use such data in protein engineering. We present a Global Multi-Mutant Analysis (GMMA) that exploits the presence of multiply-substituted variants to identify individual amino acid substitutions that are beneficial for the stability and function across a large library of protein variants. We have applied GMMA to a previously published experiment reporting on >54,000 variants of green fluorescent protein (GFP), each with known fluorescence output, and each carrying 1-15 amino acid substitutions (Sarkisyan et al., 2016). The GMMA method achieves a good fit to this dataset while being analytically transparent. We show experimentally that the six top-ranking substitutions progressively enhance GFP. More broadly, using only a single experiment as input, our analysis recovers nearly all the substitutions previously reported to be beneficial for GFP folding and function. In conclusion, we suggest that large libraries of multiply-substituted variants may provide a unique source of information for protein engineering.
Affiliation(s)
- Kristoffer E Johansson
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
- Jakob R Winther
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
9
Sanchez A, Bajic D, Diaz-Colunga J, Skwara A, Vila JCC, Kuehn S. The community-function landscape of microbial consortia. Cell Syst 2023; 14:122-134. [PMID: 36796331 DOI: 10.1016/j.cels.2022.12.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 06/28/2022] [Revised: 10/17/2022] [Accepted: 12/21/2022] [Indexed: 02/17/2023]
Abstract
Quantitatively linking the composition and function of microbial communities is a major aspiration of microbial ecology. Microbial community functions emerge from a complex web of molecular interactions between cells, which give rise to population-level interactions among strains and species. Incorporating this complexity into predictive models is highly challenging. Inspired by a similar problem in genetics of predicting quantitative phenotypes from genotypes, an ecological community-function (or structure-function) landscape could be defined that maps community composition and function. In this piece, we present an overview of our current understanding of these community landscapes, their uses, limitations, and open questions. We argue that exploiting the parallels between both landscapes could bring powerful predictive methodologies from evolution and genetics into ecology, providing a boost to our ability to engineer and optimize microbial consortia.
Affiliation(s)
- Alvaro Sanchez
  - Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA; Department of Microbial Biotechnology, CNB-CSIC, Campus de Cantoblanco, Madrid, Spain.
- Djordje Bajic
  - Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Juan Diaz-Colunga
  - Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Abigail Skwara
  - Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Jean C C Vila
  - Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Seppe Kuehn
  - Center for the Physics of Evolving Systems, The University of Chicago, Chicago, IL, USA; Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
10
Tack DS, Tonner PD, Pressman A, Olson ND, Levy SF, Romantseva EF, Alperovich N, Vasilyeva O, Ross D. Precision engineering of biological function with large-scale measurements and machine learning. PLoS One 2023; 18:e0283548. [PMID: 36989327 PMCID: PMC10057847 DOI: 10.1371/journal.pone.0283548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 03/11/2023] [Indexed: 03/30/2023] Open
Abstract
As synthetic biology expands and accelerates into real-world applications, methods for quantitatively and precisely engineering biological function become increasingly relevant. This is particularly true for applications that require programmed sensing to dynamically regulate gene expression in response to stimuli. However, few methods have been described that can engineer biological sensing with any level of quantitative precision. Here, we present two complementary methods for precision engineering of genetic sensors: in silico selection and machine-learning-enabled forward engineering. Both methods use a large-scale genotype-phenotype dataset to identify DNA sequences that encode sensors with quantitatively specified dose response. First, we show that in silico selection can be used to engineer sensors with a wide range of dose-response curves. To demonstrate in silico selection for precise, multi-objective engineering, we simultaneously tune a genetic sensor's sensitivity (EC50) and saturating output to meet quantitative specifications. In addition, we engineer sensors with inverted dose-response and specified EC50. Second, we demonstrate a machine-learning-enabled approach to predictively engineer genetic sensors with mutation combinations that are not present in the large-scale dataset. We show that the interpretable machine learning results can be combined with a biophysical model to engineer sensors with improved inverted dose-response curves.
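To make the idea of selecting sensors by dose-response specification concrete, here is a small sketch. It assumes a Hill-type dose-response curve and a simple fractional tolerance filter. The abstract names EC50 and saturating output as the tuned quantities but does not give the functional form or the selection rule used in the paper, so every function name, parameter, and variant label below is hypothetical.

```python
def hill_response(ligand, ec50, saturating_output,
                  basal_output=0.0, hill_coefficient=1.0):
    """Sensor output at a ligand concentration, assuming a Hill-type curve.

    An inverted dose-response is represented by saturating_output < basal_output.
    """
    if ligand < 0 or ec50 <= 0:
        raise ValueError("ligand must be >= 0 and ec50 must be > 0")
    occupancy = ligand ** hill_coefficient / (
        ec50 ** hill_coefficient + ligand ** hill_coefficient
    )
    return basal_output + (saturating_output - basal_output) * occupancy


def meets_spec(ec50, saturating_output, target_ec50, target_output,
               tolerance=0.2):
    """True if both parameters lie within a fractional tolerance of the targets."""
    return (
        abs(ec50 - target_ec50) <= tolerance * target_ec50
        and abs(saturating_output - target_output) <= tolerance * target_output
    )


def in_silico_select(variants, target_ec50, target_output, tolerance=0.2):
    """Filter a {name: (ec50, saturating_output)} dataset against a specification."""
    return [
        name
        for name, (ec50, output) in variants.items()
        if meets_spec(ec50, output, target_ec50, target_output, tolerance)
    ]


if __name__ == "__main__":
    # Made-up measurements for three hypothetical sensor variants.
    dataset = {"v1": (10.0, 100.0), "v2": (50.0, 100.0), "v3": (11.0, 95.0)}
    print(in_silico_select(dataset, target_ec50=10.0, target_output=100.0))
```

Filtering on two parameters at once mirrors the multi-objective framing in the abstract. A practical version would presumably also account for measurement uncertainty in each variant's fitted parameters.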
Affiliation(s)
- Drew S Tack
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Peter D Tonner
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Abe Pressman
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Nathan D Olson
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Sasha F Levy
  - SLAC National Accelerator Laboratory, Menlo Park, CA, United States of America
  - Joint Initiative for Metrology in Biology, Stanford, CA, United States of America
- Eugenia F Romantseva
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Nina Alperovich
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Olga Vasilyeva
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- David Ross
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America