1. Faure AJ, Martí-Aranda A, Hidalgo-Carcedo C, Beltran A, Schmiedel JM, Lehner B. The genetic architecture of protein stability. Nature 2024:10.1038/s41586-024-07966-0. [PMID: 39322666 DOI: 10.1038/s41586-024-07966-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
There are more ways to synthesize a 100-amino acid (aa) protein (20¹⁰⁰) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces¹. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10¹⁰, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
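To make the model structure described in this abstract concrete, here is a minimal Python sketch, assuming a two-state folding model in which the fraction of folded protein is a logistic function of the folding free energy. The mutation labels, ΔΔG values, coupling and wild-type ΔG are hypothetical numbers chosen for illustration, and the code is not the authors' fitting software: it only shows how additive free energy changes plus a sparse pairwise coupling pass through a fixed nonlinearity to give a phenotype.

```python
import math

RT = 0.593  # kcal/mol at about 25 °C; an assumed constant for this sketch


def folding_energy(genotype, dg_wt, ddg, couplings):
    """Folding free energy (kcal/mol; negative = stable) of a set of mutations.

    genotype:  set of mutation labels (hypothetical names)
    ddg:       dict mapping each mutation to its additive free energy change
    couplings: dict mapping frozenset({m1, m2}) to a pairwise energetic coupling
    """
    dg = dg_wt + sum(ddg[m] for m in genotype)
    # Add each pairwise coupling only when both of its mutations are present.
    for pair, value in couplings.items():
        if pair <= genotype:
            dg += value
    return dg


def fraction_folded(dg):
    """Two-state nonlinearity mapping free energy to a bounded phenotype."""
    return 1.0 / (1.0 + math.exp(dg / RT))


ddg = {"m1": 1.0, "m2": 1.5}                 # hypothetical ΔΔG values
couplings = {frozenset({"m1", "m2"}): 0.5}   # one hypothetical sparse coupling

for genotype in [set(), {"m1"}, {"m2"}, {"m1", "m2"}]:
    dg = folding_energy(genotype, -3.0, ddg, couplings)
    print(sorted(genotype), round(dg, 2), round(fraction_folded(dg), 3))
```

Even with the coupling removed, the double mutant's fraction folded is not the sum of the single-mutant changes, because the saturating nonlinearity alone produces apparent epistasis on the phenotype scale; that is why such models fit energies rather than phenotypes additively.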
Affiliation(s)
- Andre J Faure
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - ALLOX, Barcelona, Spain
- Aina Martí-Aranda
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Cristina Hidalgo-Carcedo
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Antoni Beltran
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jörn M Schmiedel
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - factorize.bio, Berlin, Germany
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

2. Park Y, Metzger BPH, Thornton JW. The simplicity of protein sequence-function relationships. Nat Commun 2024; 15:7953. [PMID: 39261454 PMCID: PMC11390738 DOI: 10.1038/s41467-024-51895-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 08/20/2024] [Indexed: 09/13/2024] [Open Access]
Abstract
How complex are the rules by which a protein's sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, either because they analyzed sequence-function relationships relative to a single reference sequence (which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis) or because they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird's-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a minuscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins' genetic architecture.
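The core bookkeeping of a reference-free analysis can be sketched in a few lines for the simplest case of a complete combinatorial library with no missing genotypes. The sketch below, assuming that case, defines the zero-order term as the mean phenotype over all genotypes, a first-order effect as the mean phenotype of genotypes carrying a given state minus that global mean, and a pairwise effect as the residual pairwise mean after removing the lower-order terms. It omits the global nonlinearity and the regression machinery needed for noisy or incomplete data, and the two-site landscape is hypothetical.

```python
from itertools import product
from statistics import mean


def rfa_effects(phenotype, n_sites, states):
    """Reference-free zero-, first- and second-order effects.

    phenotype: dict mapping every genotype tuple to its measured value
    (a complete library is assumed; a missing genotype raises KeyError).
    """
    genotypes = list(product(states, repeat=n_sites))
    e0 = mean(phenotype[g] for g in genotypes)

    # First order: how much carrying state a at site i shifts the average phenotype.
    first = {}
    for i in range(n_sites):
        for a in states:
            first[(i, a)] = mean(phenotype[g] for g in genotypes if g[i] == a) - e0

    # Second order: what remains of the pairwise average after lower-order terms.
    second = {}
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            for a in states:
                for b in states:
                    pair_mean = mean(phenotype[g] for g in genotypes
                                     if g[i] == a and g[j] == b)
                    second[(i, a, j, b)] = pair_mean - e0 - first[(i, a)] - first[(j, b)]
    return e0, first, second


# Hypothetical two-site, two-state landscape.
landscape = {("A", "A"): 0.0, ("A", "B"): 1.0, ("B", "A"): 1.0, ("B", "B"): 3.0}
e0, first, second = rfa_effects(landscape, 2, "AB")
print(e0, first[(0, "B")], second[(0, "B", 1, "B")])
```

For this toy landscape the sketch gives a global mean of 1.25, a first-order effect of 0.75 for state B at either site and a pairwise term of 0.25; with only two sites, these three orders reconstruct every genotype exactly, and the first-order effects at each site sum to zero.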
Affiliation(s)
- Yeonwoo Park
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
  - Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea
- Brian P H Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Joseph W Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA

3. Welsh FC, Eguia RT, Lee JM, Haddox HK, Galloway J, Van Vinh Chau N, Loes AN, Huddleston J, Yu TC, Quynh Le M, Nhat NTD, Thi Le Thanh N, Greninger AL, Chu HY, Englund JA, Bedford T, Matsen FA, Boni MF, Bloom JD. Age-dependent heterogeneity in the antigenic effects of mutations to influenza hemagglutinin. Cell Host Microbe 2024; 32:1397-1411.e11. [PMID: 39032493 PMCID: PMC11329357 DOI: 10.1016/j.chom.2024.06.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 04/19/2024] [Accepted: 06/25/2024] [Indexed: 07/23/2024]
Abstract
Human influenza virus evolves to escape neutralization by polyclonal antibodies. However, we have a limited understanding of how the antigenic effects of viral mutations vary across the human population and how this heterogeneity affects virus evolution. Here, we use deep mutational scanning to map how mutations to the hemagglutinin (HA) proteins of two H3N2 strains, A/Hong Kong/45/2019 and A/Perth/16/2009, affect neutralization by serum from individuals of a variety of ages. The effects of HA mutations on serum neutralization differ across age groups in ways that can be partially rationalized in terms of exposure histories. Mutations that were fixed in influenza variants after 2020 cause greater escape from sera from younger individuals compared with adults. Overall, these results demonstrate that influenza faces distinct antigenic selection regimes from different age groups and suggest approaches to understand how this heterogeneous selection shapes viral evolution.
MESH Headings
- Humans
- Hemagglutinin Glycoproteins, Influenza Virus/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/immunology
- Influenza A Virus, H3N2 Subtype/genetics
- Influenza A Virus, H3N2 Subtype/immunology
- Mutation
- Adult
- Antibodies, Viral/immunology
- Antibodies, Viral/blood
- Influenza, Human/virology
- Influenza, Human/immunology
- Age Factors
- Middle Aged
- Young Adult
- Antibodies, Neutralizing/immunology
- Antibodies, Neutralizing/blood
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- Adolescent
- Evolution, Molecular
- Aged
- Child
Affiliation(s)
- Frances C Welsh
  - Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, WA 98109, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Rachel T Eguia
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Juhye M Lee
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Hugh K Haddox
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Jared Galloway
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Nguyen Van Vinh Chau
  - Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Andrea N Loes
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98109, USA
- John Huddleston
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Timothy C Yu
  - Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, WA 98109, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Mai Quynh Le
  - National Institutes for Hygiene and Epidemiology, Hanoi, Vietnam
- Nguyen T D Nhat
  - Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam; Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
- Nguyen Thi Le Thanh
  - Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Alexander L Greninger
  - Department of Laboratory Medicine and Pathology, University of Washington School of Medicine, Seattle, WA 98195, USA; Division of Allergy and Infectious Diseases, University of Washington School of Medicine, Seattle, WA 98195, USA
- Helen Y Chu
  - Division of Allergy and Infectious Diseases, University of Washington School of Medicine, Seattle, WA 98195, USA
- Janet A Englund
  - Seattle Children's Research Institute, Seattle, WA 98109, USA
- Trevor Bedford
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA; Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Frederick A Matsen
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Maciej F Boni
  - Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam; Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK; Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Jesse D Bloom
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98109, USA

4. Chamness LM, Kuntz CP, McKee AG, Penn WD, Hemmerich CM, Rusch DB, Woods H, Dyotima, Meiler J, Schlebach JP. Divergent folding-mediated epistasis among unstable membrane protein variants. eLife 2024; 12:RP92406. [PMID: 39078397 PMCID: PMC11288631 DOI: 10.7554/elife.92406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/31/2024] [Open Access]
Abstract
Many membrane proteins are prone to misfolding, which compromises their functional expression at the plasma membrane. This is particularly true for the mammalian gonadotropin-releasing hormone receptor (GnRHR), a G protein-coupled receptor (GPCR). We recently demonstrated that evolutionary GnRHR modifications appear to have coincided with adaptive changes in cotranslational folding efficiency. Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations. We therefore compared the pairwise interactions formed by mutations that disrupt the membrane topology (V276T) or tertiary structure (W107A) of GnRHR. Using deep mutational scanning, we evaluated how the plasma membrane expression of these variants is modified by hundreds of secondary mutations. An analysis of 251 mutants in three genetic backgrounds reveals that V276T and W107A form distinct epistatic interactions that depend on both the severity and the mechanism of destabilization. V276T forms predominantly negative epistatic interactions with destabilizing mutations in soluble loops. In contrast, W107A forms positive interactions with mutations in both loops and transmembrane domains that reflect the diminishing impacts of the destabilizing mutations in variants that are already unstable. These findings reveal how epistasis is remodeled by conformational defects in membrane proteins and in unstable proteins more generally.
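Pairwise epistasis between a destabilizing background mutation and a secondary mutation is commonly quantified as the deviation of the double mutant from the additive expectation. The sketch below assumes expression scores on a log scale relative to wild type (so wild type scores 0 and effects add when there is no epistasis); the scale, the tolerance used to call a value "additive", and all numbers are illustrative assumptions, not values or procedures from the study.

```python
def epistasis(double, background, secondary, wild_type=0.0):
    """Deviation of a double mutant from the additive expectation.

    All arguments are log-scale expression scores; with no epistasis,
    double - wild_type == (background - wild_type) + (secondary - wild_type).
    """
    expected = background + secondary - wild_type
    return double - expected


def classify(eps, tolerance=0.1):
    """Label an epistasis value; the tolerance is an arbitrary assumption."""
    if eps > tolerance:
        return "positive"
    if eps < -tolerance:
        return "negative"
    return "additive"


# Hypothetical scores: the secondary mutation costs 0.5 on its own.
print(classify(epistasis(double=-2.0, background=-1.0, secondary=-0.5)))  # negative
print(classify(epistasis(double=-2.1, background=-2.0, secondary=-0.5)))  # positive
```

In the second case the already unstable background blunts the secondary mutation's impact, which mimics the positive, diminishing-impact pattern the abstract describes for W107A; the first case mimics the negative interactions described for V276T.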
Affiliation(s)
- Laura M Chamness
  - Department of Chemistry, Indiana University, Bloomington, United States
- Charles P Kuntz
  - The James Tarpo Jr. and Margaret Tarpo Department of Chemistry, Purdue University, West Lafayette, United States
- Andrew G McKee
  - Department of Chemistry, Indiana University, Bloomington, United States
- Wesley D Penn
  - Department of Chemistry, Indiana University, Bloomington, United States
- Douglas B Rusch
  - Center for Genomics and Bioinformatics, Indiana University, Bloomington, United States
- Hope Woods
  - Department of Chemistry, Vanderbilt University, Nashville, United States
  - Chemical and Physical Biology Program, Vanderbilt University, Nashville, United States
- Dyotima
  - Department of Chemistry, Indiana University, Bloomington, United States
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, United States
  - Institute for Drug Discovery, Leipzig University, Leipzig, Germany
- Jonathan P Schlebach
  - The James Tarpo Jr. and Margaret Tarpo Department of Chemistry, Purdue University, West Lafayette, United States

5. Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
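Gauge freedom is easy to see in an additive model: adding a constant to every parameter at one site and subtracting it from the intercept leaves every prediction unchanged. The minimal sketch below, assuming a two-site additive model over a hypothetical two-letter alphabet with made-up parameter values, fixes the gauge with the commonly used zero-sum convention (each site's parameters are centered and the site means are absorbed into the intercept). It illustrates one simple gauge only, not the general family derived in the paper.

```python
from itertools import product


def predict(theta0, theta, seq):
    """Additive model: intercept plus one parameter per (site, character)."""
    return theta0 + sum(theta[i][c] for i, c in enumerate(seq))


def zero_sum_gauge(theta0, theta):
    """Center each site's parameters and absorb the site means into the intercept."""
    new_theta0 = theta0
    new_theta = []
    for site in theta:
        site_mean = sum(site.values()) / len(site)
        new_theta.append({c: v - site_mean for c, v in site.items()})
        new_theta0 += site_mean
    return new_theta0, new_theta


alphabet = "AC"
theta0 = 2.0
theta = [{"A": 1.0, "C": 3.0}, {"A": 0.5, "C": -0.5}]  # hypothetical values
fixed0, fixed = zero_sum_gauge(theta0, theta)

# Both parameterizations make identical predictions for every sequence.
for seq in product(alphabet, repeat=len(theta)):
    print("".join(seq), predict(theta0, theta, seq), predict(fixed0, fixed, seq))
```

After gauge fixing the intercept becomes the mean prediction over all sequences and each site's parameters sum to zero, so they can be read as deviations from that mean; a different gauge would yield different parameter values with the same predictions.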
Affiliation(s)
- Anna Posfai
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
  - Department of Biology, University of Florida, Gainesville, FL, 32611
- David M. McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Justin B. Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724

6. Diaz-Colunga J, Skwara A, Vila JCC, Bajic D, Sanchez A. Global epistasis and the emergence of function in microbial consortia. Cell 2024; 187:3108-3119.e30. [PMID: 38776921 DOI: 10.1016/j.cell.2024.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/06/2023] [Accepted: 04/16/2024] [Indexed: 05/25/2024]
Abstract
The many functions of microbial communities emerge from a complex web of interactions between organisms and their environment. This poses a significant obstacle to engineering microbial consortia, hindering our ability to harness the potential of microorganisms for biotechnological applications. In this study, we demonstrate that the collective effect of ecological interactions between microbes in a community can be captured by simple statistical models that predict how adding a new species to a community will affect its function. These predictive models mirror the patterns of global epistasis reported in genetics, and they can be quantitatively interpreted in terms of pairwise interactions between community members. Our results illuminate an unexplored path to quantitatively predicting the function of microbial consortia from their composition, paving the way to optimizing desirable community properties and bringing the tasks of predicting biological function at the genetic, organismal, and ecological scales under the same quantitative formalism.
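Global epistasis is often summarized as a linear relationship between the effect of adding an element (a mutation, or here a species) and the function of the background it is added to. The sketch below fits such a line by ordinary least squares, assuming that linear form holds; the background functions and effects are hypothetical numbers, and the code is a generic regression, not the authors' statistical model.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical data: function of four background communities, and the change
# in function observed when the same focal species is added to each of them.
background_f = [1.0, 2.0, 3.0, 4.0]
delta_f = [1.5, 1.0, 0.5, 0.0]

a, b = fit_line(background_f, delta_f)

# Predict the function of an untested background (function 2.5) after adding the species.
new_background = 2.5
predicted = new_background + a + b * new_background
print(a, b, predicted)
```

Here the fit gives an intercept of 2.0 and a slope of -0.5, so the species is predicted to raise the untested background from 2.5 to 3.25. A negative slope means the species helps less in communities that already perform well, mirroring the diminishing-returns form of global epistasis reported in genetics.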
Affiliation(s)
- Juan Diaz-Colunga
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain
- Abigail Skwara
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA
- Jean C C Vila
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biology, Stanford University, Stanford, CA 94305, USA
- Djordje Bajic
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biotechnology, Delft University of Technology, Delft 2628 CD, the Netherlands
- Alvaro Sanchez
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain

7. Hulse SV, Bruns EL. The Emergence of Non-Linear Evolutionary Trade-offs and the Maintenance of Genetic Polymorphisms. bioRxiv 2024:2024.05.29.595890. [PMID: 38853830 PMCID: PMC11160725 DOI: 10.1101/2024.05.29.595890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Evolutionary models of quantitative traits often assume trade-offs between beneficial and detrimental traits, requiring modelers to specify a function linking costs to benefits. The choice of trade-off function is often consequential: functions that assume diminishing returns (accelerating costs) typically lead to single equilibrium genotypes, while decelerating costs often lead to evolutionary branching. Despite their importance, we still lack a strong theoretical foundation on which to base the choice of trade-off function. To address this gap, we explore how trade-off functions can emerge from the genetic architecture of a quantitative trait. We developed a multi-locus model of disease resistance, assuming each locus had random antagonistic pleiotropic effects on resistance and fecundity. We used this model to generate genotype landscapes and explored how additive versus epistatic genetic architectures influenced the shape of the trade-off function. Regardless of epistasis, our model consistently led to accelerating costs. We then used our genotype landscapes to build an evolutionary model of disease resistance. Unlike other models with accelerating costs, our approach often led to genetic polymorphisms at equilibrium. Our results suggest that accelerating costs are a strong null model for evolutionary trade-offs and that the eco-evolutionary conditions required for polymorphism may be more nuanced than previously believed.
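Whether a trade-off has accelerating or decelerating costs is a statement about its curvature. The sketch below classifies a cost curve sampled at evenly spaced benefit values by the sign of its second differences; the three example curves (quadratic, square-root and linear) are hypothetical stand-ins, not the trade-off functions that emerge from the preprint's multi-locus model.

```python
import math


def curvature(costs, tol=1e-9):
    """Classify costs sampled at evenly spaced benefit values.

    Returns "accelerating" if every second difference is positive (convex),
    "decelerating" if every one is negative (concave), else "linear/mixed".
    """
    second = [costs[i + 1] - 2 * costs[i] + costs[i - 1]
              for i in range(1, len(costs) - 1)]
    if all(d > tol for d in second):
        return "accelerating"
    if all(d < -tol for d in second):
        return "decelerating"
    return "linear/mixed"


benefits = [0.0, 1.0, 2.0, 3.0, 4.0]
print(curvature([b ** 2 for b in benefits]))        # cost grows ever faster
print(curvature([math.sqrt(b) for b in benefits]))  # cost grows ever slower
print(curvature([0.5 * b for b in benefits]))       # constant marginal cost
```

In the abstract's terms, the first curve has accelerating costs (equivalently, diminishing returns on investment in resistance) and the second has decelerating costs.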

8. Metzger BPH, Park Y, Starr TN, Thornton JW. Epistasis facilitates functional evolution in an ancient transcription factor. eLife 2024; 12:RP88737. [PMID: 38767330 PMCID: PMC11105156 DOI: 10.7554/elife.88737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024] [Open Access]
Abstract
A protein's genetic architecture (the set of causal rules by which its sequence produces its functions) also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest (excluding the vast majority of possible genotypes and evolutionary trajectories) and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
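Ordinal logistic regression treats a phenotype measured in ordered categories (for example null, weak and strong activation) as a continuous latent score cut at increasing thresholds. The sketch below, assuming a cumulative-logit (proportional-odds) form and a latent score built from main and pairwise effects, computes the category probabilities for one genotype. The sites, states, effect sizes, thresholds and category names are hypothetical, and fitting these parameters to combinatorial DMS data (the actual task in the paper) is not shown.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def latent_score(genotype, main, pairwise):
    """Main effects plus pairwise effects over all pairs of (site, state) entries."""
    items = sorted(genotype.items())
    score = sum(main.get(item, 0.0) for item in items)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            score += pairwise.get((items[i], items[j]), 0.0)
    return score


def category_probs(score, thresholds):
    """P(category k) under a cumulative-logit model with increasing thresholds."""
    cumulative = [logistic(t - score) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]


main = {(1, "E"): 0.5, (2, "K"): -0.5}         # hypothetical main effects
pairwise = {((1, "E"), (2, "K")): 2.0}         # hypothetical pairwise effect
genotype = {1: "E", 2: "K"}
probs = category_probs(latent_score(genotype, main, pairwise), thresholds=[-1.0, 1.0])
print(dict(zip(["null", "weak", "strong"], probs)))
```

In this toy genotype the two main effects cancel, so it is the pairwise term that pushes most of the probability into the "strong" category, a small-scale picture of pairwise interactions shaping which sequences are functional.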
Affiliation(s)
- Brian PH Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
- Yeonwoo Park
  - Program in Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, United States
- Tyler N Starr
  - Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, United States
- Joseph W Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
  - Department of Human Genetics, University of Chicago, Chicago, United States

9. Behr M, Kumbier K, Cordova-Palomera A, Aguirre M, Ronen O, Ye C, Ashley E, Butte AJ, Arnaout R, Brown B, Priest J, Yu B. Learning epistatic polygenic phenotypes with Boolean interactions. PLoS One 2024; 19:e0298906. [PMID: 38625909 PMCID: PMC11020961 DOI: 10.1371/journal.pone.0298906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 01/31/2024] [Indexed: 04/18/2024] [Open Access]
Abstract
Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R as well as novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a locus previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.
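The contrast drawn here between multiplicative interaction terms and Boolean interactions can be seen on the 3-by-3 table of genotype dosages (0, 1 or 2 copies of the alternative allele at each of two variants). The sketch below compares the regression-style product of dosages with a tree-style rule that fires only when both variants are homozygous; that threshold rule is a hypothetical example of the kind of interaction a decision tree can encode, not output from epiTree or iRF.

```python
from itertools import product


def multiplicative_term(d1, d2):
    """Interaction feature used in standard regression: product of dosages."""
    return d1 * d2


def boolean_and(d1, d2, t1=2, t2=2):
    """Tree-style rule: 1 only if both dosages reach their (hypothetical) thresholds."""
    return int(d1 >= t1 and d2 >= t2)


print("d1 d2 product boolean")
for d1, d2 in product(range(3), repeat=2):
    print(d1, d2, multiplicative_term(d1, d2), boolean_and(d1, d2))
```

The product term is already nonzero for double heterozygotes and quadruples from (1, 1) to (2, 2), whereas the Boolean rule is a single step at (2, 2); no rescaling of the product term reproduces that step across all nine cells.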
Affiliation(s)
- Merle Behr
  - Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
- Karl Kumbier
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
- Matthew Aguirre
  - Department of Pediatrics, Stanford Medicine, Stanford, CA, United States of America
  - Department of Biomedical Data Science, Stanford Medicine, Stanford, CA, United States of America
- Omer Ronen
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
- Chengzhong Ye
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
- Euan Ashley
  - Division of Cardiovascular Medicine, Stanford Medicine, Stanford, CA, United States of America
- Atul J. Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States of America
- Rima Arnaout
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States of America
  - Division of Cardiology, Department of Medicine, University of California, San Francisco, San Francisco, CA, United States of America
- Ben Brown
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
  - Biosciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- James Priest
  - Department of Pediatrics, Stanford Medicine, Stanford, CA, United States of America
- Bin Yu
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
  - Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California at Berkeley, Berkeley, CA, United States of America

10. Park Y, Metzger BP, Thornton JW. The simplicity of protein sequence-function relationships. bioRxiv 2024:2023.09.02.556057. [PMID: 37732229 PMCID: PMC10508729 DOI: 10.1101/2023.09.02.556057] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 09/22/2023]
Abstract
How complicated is the genetic architecture of proteins, that is, the set of causal effects by which sequence determines function? High-order epistatic interactions among residues are thought to be pervasive, making a protein's function difficult to predict or understand from its sequence. Most studies, however, used methods that overestimate epistasis, either because they analyze genetic architecture relative to a designated reference sequence (causing measurement noise and small local idiosyncrasies to propagate into pervasive high-order interactions) or because they have not effectively accounted for global nonlinearity in the sequence-function relationship. Here we present a new reference-free method that jointly estimates global nonlinearity and specific epistatic interactions across a protein's entire genotype-phenotype map. This method yields a maximally efficient explanation of a protein's genetic architecture and is more robust than existing methods to measurement noise, partial sampling, and model misspecification. We reanalyze 20 combinatorial mutagenesis experiments from a diverse set of proteins and find that additive and pairwise effects, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of total variance in measured phenotypes (and >92% in every case). Only a tiny fraction of genotypes are strongly affected by third- or higher-order epistasis. Genetic architecture is also sparse: the number of terms required to explain the vast majority of variance is smaller than the number of genotypes by many orders of magnitude. The sequence-function relationship in most proteins is therefore far simpler than previously thought, opening the way for new and tractable approaches to characterize it.
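The variance-partitioning and sparsity claims can be illustrated with reference-free effects from a complete, balanced library, where the terms are orthogonal and each term contributes its squared effect, weighted by the frequency of the genotypes that carry it, to the phenotypic variance. The sketch below assumes such a library (every state equally frequent at every site), uses effects computed by hand for a hypothetical two-site, two-state landscape with phenotypes AA = 0, AB = 1, BA = 1 and BB = 3, and ignores the global nonlinearity that the method also fits.

```python
def variance_summary(effects, fraction=0.9):
    """Variance explained by each order and the number of terms needed.

    effects: list of (order, effect, genotype_frequency) for every term above
    order zero. In a complete, balanced library each term contributes
    frequency * effect**2 to the phenotypic variance.
    """
    contributions = [(order, freq * eff ** 2) for order, eff, freq in effects]
    total = sum(c for _, c in contributions)

    by_order = {}
    for order, c in contributions:
        by_order[order] = by_order.get(order, 0.0) + c / total

    # Sparsity: fewest terms (largest first) whose contributions reach `fraction`.
    running, n_terms = 0.0, 0
    for c in sorted((c for _, c in contributions), reverse=True):
        running += c
        n_terms += 1
        if running >= fraction * total:
            break
    return by_order, n_terms


# Hand-computed reference-free effects for the hypothetical landscape above
# (two sites, states A/B, each state at frequency 1/2, each pair at 1/4).
effects = [(1, -0.75, 0.5), (1, 0.75, 0.5), (1, -0.75, 0.5), (1, 0.75, 0.5),
           (2, 0.25, 0.25), (2, -0.25, 0.25), (2, -0.25, 0.25), (2, 0.25, 0.25)]
by_order, n_terms = variance_summary(effects)
print(by_order, n_terms)
```

The contributions add up to 1.1875, which equals the population variance of the four phenotypes. In this toy case the first-order terms carry about 95% of the variance and four of the eight terms suffice to pass 90%; real libraries apply the same bookkeeping to vastly more terms.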
Affiliation(s)
- Yeonwoo Park
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637
  - Current affiliation: Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea 08826
- Brian P.H. Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
  - Current affiliation: Department of Biological Sciences, Purdue University, West Lafayette, IN 47907
- Joseph W. Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637

11. Fannjiang C, Listgarten J. Is Novelty Predictable? Cold Spring Harb Perspect Biol 2024; 16:a041469. [PMID: 38052497 PMCID: PMC10835614 DOI: 10.1101/cshperspect.a041469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Machine learning-based design has gained traction in the sciences, most notably in the design of small molecules, materials, and proteins, with societal applications ranging from drug development and plastic degradation to carbon sequestration. When designing objects to achieve novel property values with machine learning, one faces a fundamental challenge: how to push past the frontier of current knowledge, distilled from the training data into the model, in a manner that rationally controls the risk of failure. If one trusts learned models too much in extrapolation, one is likely to design rubbish. In contrast, if one does not extrapolate, one cannot find novelty. Herein, we ponder how one might strike a useful balance between these two extremes. We focus in particular on designing proteins with novel property values, although much of our discussion is relevant to machine learning-based design more broadly.
Affiliation(s)
- Clara Fannjiang
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Jennifer Listgarten
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
12
Dupic T, Phillips AM, Desai MM. Protein sequence landscapes are not so simple: on reference-free versus reference-based inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.29.577800. [PMID: 38352387 PMCID: PMC10862727 DOI: 10.1101/2024.01.29.577800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
In a recent preprint, Park, Metzger, and Thornton reanalyze 20 empirical protein sequence-function landscapes using a "reference-free analysis" (RFA) method they recently developed. They argue that these empirical landscapes are simpler and less epistatic than earlier work suggested, and attribute the difference to limitations of the methods used in the original analyses of these landscapes, which they claim are more sensitive to measurement noise, missing data, and other artifacts. Here, we show that these claims are incorrect. Instead, we find that the RFA method introduced by Park et al. is exactly equivalent to the reference-based least-squares methods used in the original analysis of many of these empirical landscapes (and also equivalent to a Hadamard-based approach they implement). Because the reanalyzed and original landscapes are in fact identical, the different conclusions drawn by Park et al. instead reflect different interpretations of the parameters describing the inferred landscapes; we argue that these do not support the conclusion that epistasis plays only a small role in protein sequence-function landscapes.
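For binary sites, the equivalence claim can be checked numerically. On a complete landscape, a reference-based expansion and a Hadamard expansion are two invertible changes of basis:

- The reference-based expansion codes states 0/1 relative to a reference genotype.
- The Hadamard expansion codes states -1/+1.

Both therefore reproduce the same phenotypes, and their coefficients can be converted into one another. The sketch below is an illustration under simplifying assumptions: complete data, no noise, and no nonlinearity. `design_matrix` is a hypothetical helper. The example also shows why interpretations can diverge. A phenotype carried by a single genotype is one third-order term in the reference basis, but it spreads equally over all orders in the Hadamard basis.

```python
import itertools

import numpy as np


def design_matrix(n_sites, coding):
    """Rows: all 2**n_sites genotypes; columns: all interaction terms.

    coding="reference": states coded 0/1 (effects relative to genotype 00..0).
    coding="hadamard":  states coded -1/+1 (zero-mean coding).
    """
    subsets = [s for k in range(n_sites + 1)
               for s in itertools.combinations(range(n_sites), k)]
    rows = []
    for g in itertools.product([0, 1], repeat=n_sites):
        x = np.array(g) if coding == "reference" else 2 * np.array(g) - 1
        # The empty subset gives the intercept column (an empty product is 1).
        rows.append([np.prod(x[list(s)]) for s in subsets])
    return np.array(rows, dtype=float), subsets


n_sites = 3
X_ref, subsets = design_matrix(n_sites, "reference")
X_had, _ = design_matrix(n_sites, "hadamard")

# A toy landscape in which only genotype 111 has a nonzero phenotype.
y = np.zeros(2 ** n_sites)
y[-1] = 1.0

beta_ref = np.linalg.solve(X_ref, y)  # one nonzero, third-order coefficient
beta_had = np.linalg.solve(X_had, y)  # all eight coefficients equal 1/8
```

Both bases fit the complete toy data exactly. How "epistatic" the landscape looks therefore depends on the chosen parameterization rather than on the fitted landscape itself.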
Affiliation(s)
- Thomas Dupic
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
- Angela M Phillips
- Department of Microbiology and Immunology, University of California San Francisco, San Francisco CA
- Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
13
Buda K, Miton CM, Tokuriki N. Pervasive epistasis exposes intramolecular networks in adaptive enzyme evolution. Nat Commun 2023; 14:8508. [PMID: 38129396 PMCID: PMC10739712 DOI: 10.1038/s41467-023-44333-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023]
Abstract
Enzyme evolution is characterized by constant alterations of the intramolecular residue networks supporting their functions. The rewiring of these network interactions can give rise to epistasis. As mutations accumulate, the epistasis observed across diverse genotypes may appear idiosyncratic, that is, exhibit unique effects in different genetic backgrounds. Here, we unveil a quantitative picture of the prevalence and patterns of epistasis in enzyme evolution by analyzing 41 fitness landscapes generated from seven enzymes. We show that >94% of all mutational and epistatic effects appear highly idiosyncratic, which greatly distorted the functional prediction of the evolved enzymes. By examining seemingly idiosyncratic changes in epistasis along adaptive trajectories, we expose several instances of higher-order, intramolecular rewiring. Using complementary structural data, we outline putative molecular mechanisms explaining higher-order epistasis along two enzyme trajectories. Our work emphasizes the prevalence of epistasis and provides an approach to exploring this phenomenon through a molecular lens.
Affiliation(s)
- Karol Buda
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Charlotte M Miton
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada.
14
Eble H, Joswig M, Lamberti L, Ludington WB. Master regulators of biological systems in higher dimensions. Proc Natl Acad Sci U S A 2023; 120:e2300634120. [PMID: 38096409 PMCID: PMC10743376 DOI: 10.1073/pnas.2300634120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 10/23/2023] [Indexed: 12/18/2023]
Abstract
A longstanding goal of biology is to identify the key genes and species that critically impact evolution, ecology, and health. Network analysis has revealed keystone species that regulate ecosystems and master regulators that regulate cellular genetic networks. Yet these studies have focused on pairwise biological interactions, which can be affected by the context of genetic background and other species present, generating higher-order interactions. The important regulators of higher-order interactions are unstudied. To address this, we applied a high-dimensional geometry approach that quantifies epistasis in a fitness landscape to ask how individual genes and species influence the interactions in the rest of the biological network. We then generated and also reanalyzed 5-dimensional datasets (two genetic, two microbiome). We identified key genes (e.g., the rbs locus and pykF) and species (e.g., Lactobacilli) that control the interactions of many other genes and species. These higher-order master regulators can induce or suppress evolutionary and ecological diversification by controlling the topography of the fitness landscape. Thus, we provide a method and mathematical justification for exploration of biological networks in higher dimensions.
Affiliation(s)
- Holger Eble
- Chair of Discrete Mathematics/Geometry, Technical University Berlin, Berlin 10623, Germany
- Michael Joswig
- Chair of Discrete Mathematics/Geometry, Technical University Berlin, Berlin 10623, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig 04103, Germany
- Lisa Lamberti
- Department of Biosystems Science and Engineering, Federal Institute of Technology (ETH Zürich), Basel 4058, Switzerland
- Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- William B. Ludington
- Department of Biosphere Sciences and Engineering, Carnegie Institution for Science, Baltimore, MD 21218
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218
15
Welsh FC, Eguia RT, Lee JM, Haddox HK, Galloway J, Chau NVV, Loes AN, Huddleston J, Yu TC, Le MQ, Nhat NTD, Thanh NTL, Greninger AL, Chu HY, Englund JA, Bedford T, Matsen FA, Boni MF, Bloom JD. Age-dependent heterogeneity in the antigenic effects of mutations to influenza hemagglutinin. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.12.571235. [PMID: 38168237 PMCID: PMC10760046 DOI: 10.1101/2023.12.12.571235] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/05/2024]
Abstract
Human influenza virus evolves to escape neutralization by polyclonal antibodies. However, we have a limited understanding of how the antigenic effects of viral mutations vary across the human population, and how this heterogeneity affects virus evolution. Here we use deep mutational scanning to map how mutations to the hemagglutinin (HA) proteins of the A/Hong Kong/45/2019 (H3N2) and A/Perth/16/2009 (H3N2) strains affect neutralization by serum from individuals of a variety of ages. The effects of HA mutations on serum neutralization differ across age groups in ways that can be partially rationalized in terms of exposure histories. Mutations that fixed in influenza variants after 2020 cause the greatest escape from sera from younger individuals. Overall, these results demonstrate that influenza faces distinct antigenic selection regimes from different age groups, and suggest approaches to understand how this heterogeneous selection shapes viral evolution.
Affiliation(s)
- Frances C Welsh
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, WA, 98109, USA
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Rachel T Eguia
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
- Juhye M Lee
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
- Hugh K Haddox
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jared Galloway
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Nguyen Van Vinh Chau
- Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Andrea N Loes
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
- John Huddleston
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Timothy C Yu
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, WA, 98109, USA
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Mai Quynh Le
- National Institutes for Hygiene and Epidemiology, Hanoi, Vietnam
- Nguyen T D Nhat
- Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, United Kingdom
- Nguyen Thi Le Thanh
- Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Alexander L Greninger
- Department of Laboratory Medicine and Pathology, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Division of Allergy and Infectious Diseases, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Helen Y Chu
- Division of Allergy and Infectious Diseases, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Janet A Englund
- Seattle Children's Research Institute, Seattle, WA, 98109, USA
- Trevor Bedford
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Frederick A Matsen
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
- Maciej F Boni
- Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, United Kingdom
- Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
16
Arya S, George AB, O’Dwyer JP. Sparsity of higher-order landscape interactions enables learning and prediction for microbiomes. Proc Natl Acad Sci U S A 2023; 120:e2307313120. [PMID: 37991947 PMCID: PMC10691334 DOI: 10.1073/pnas.2307313120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/02/2023] [Accepted: 10/16/2023] [Indexed: 11/24/2023]
Abstract
Microbiome engineering offers the potential to leverage microbial communities to improve outcomes in human health, agriculture, and climate. To translate this potential into reality, it is crucial to reliably predict community composition and function. But a brute-force approach to cataloging community function is hindered by the combinatorial explosion in the number of ways we can combine microbial species. An alternative is to parameterize microbial community outcomes using simplified, mechanistic models, and then extrapolate these models beyond where we have sampled. But these approaches remain data-hungry, as well as requiring an a priori specification of what kinds of mechanisms are included and which are omitted. Here, we resolve both issues by introducing a mechanism-agnostic approach to predicting microbial community compositions and functions using limited data. The critical step is the identification of a sparse representation of the community landscape. We then leverage this sparsity to predict community compositions and functions, drawing from techniques in compressive sensing. We validate this approach on in silico community data, generated from a theoretical model. By sampling just ~1% of all possible communities, we accurately predict community compositions out of sample. We then demonstrate the real-world application of our approach by applying it to four experimental datasets and showing that we can recover interpretable, accurate predictions on composition and community function from highly limited data.
Affiliation(s)
- Shreya Arya
- Department of Physics, University of Illinois, Urbana-Champaign, Urbana, IL 61801
- Ashish B. George
- Center for Artificial Intelligence and Modeling, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 0214
- Department of Plant Biology, University of Illinois, Urbana-Champaign, Urbana, IL 61801
- James P. O’Dwyer
- Center for Artificial Intelligence and Modeling, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Department of Plant Biology, University of Illinois, Urbana-Champaign, Urbana, IL 61801
17
Santorsola M, Lescai F. The promise of explainable deep learning for omics data analysis: Adding new discovery tools to AI. N Biotechnol 2023; 77:1-11. [PMID: 37329982 DOI: 10.1016/j.nbt.2023.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/27/2023] [Revised: 06/01/2023] [Accepted: 06/14/2023] [Indexed: 06/19/2023]
Abstract
Deep learning has already revolutionised the way a wide range of data is processed in many areas of daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly big datasets. This has a significant impact on the growing wealth of omics datasets, offering an unprecedented opportunity to better understand the complexity of living organisms. While this revolution is transforming the way these data are analyzed, explainable deep learning is emerging as an additional tool with the potential to change the way biological data are interpreted. Explainability addresses critical issues such as transparency, which is especially important when computational tools are introduced in clinical environments. Moreover, it gives artificial intelligence the capability to provide new insights into the input data, adding an element of discovery to these already powerful resources. In this review, we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics to radiomics, drug design, and clinical trials. We offer a perspective to life scientists, to help them better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to take their first steps in this field.
Affiliation(s)
- Francesco Lescai
- Department of Biology and Biotechnology, University of Pavia, Pavia, Italy.
18
Charest N, Shen Y, Lai YC, Chen IA, Shea JE. Discovering pathways through ribozyme fitness landscapes using information theoretic quantification of epistasis. RNA (NEW YORK, N.Y.) 2023; 29:1644-1657. [PMID: 37580126 PMCID: PMC10578471 DOI: 10.1261/rna.079541.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 07/29/2023] [Indexed: 08/16/2023]
Abstract
The identification of catalytic RNAs is typically achieved through primarily experimental means. However, only a small fraction of sequence space can be analyzed even with high-throughput techniques. Methods to extrapolate from a limited data set to predict additional ribozyme sequences, particularly in a human-interpretable fashion, could be useful both for designing new functional RNAs and for generating greater understanding about a ribozyme fitness landscape. Using information theory, we express the effects of epistasis (i.e., deviations from additivity) on a ribozyme. This representation was incorporated into a simple model of the epistatic fitness landscape, which identified potentially exploitable combinations of mutations. We used this model to theoretically predict mutants of high activity for a self-aminoacylating ribozyme, identifying potentially active triple and quadruple mutants beyond the experimental data set of single and double mutants. The predictions were validated experimentally, with nine out of nine sequences being accurately predicted to have high activity. This set of sequences included mutants that form a previously unknown evolutionary "bridge" between two ribozyme families that share a common motif. Individual steps in the method could be examined, understood, and guided by a human, combining interpretability and performance in a simple model to predict ribozyme sequences by extrapolation.
Affiliation(s)
- Nathaniel Charest
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Yuning Shen
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Yei-Chen Lai
- Department of Chemistry, National Chung Hsing University, Taichung City 40227, Taiwan
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
- Irene A Chen
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
- Joan-Emma Shea
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
19
Yitbarek S, Guittar J, Knutie SA, Ogbunugafor CB. Deconstructing taxa × taxa × environment interactions in the microbiota: A theoretical examination. iScience 2023; 26:107875. [PMID: 37860776 PMCID: PMC10583047 DOI: 10.1016/j.isci.2023.107875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 03/21/2023] [Accepted: 09/07/2023] [Indexed: 10/21/2023]
Abstract
A major objective of microbial ecology is to identify how the composition of microbial taxa shapes host phenotypes. However, most studies focus on pairwise interactions and ignore the potentially significant effects of higher-order microbial interactions. Here, we quantify the effects of higher-order interactions among taxa on host infection risk. We apply our approach to an in silico dataset that is built to resemble a population of insect hosts with gut-associated microbial communities at risk of infection from an intestinal parasite across a breadth of nutrient environmental contexts. We find that the effect of higher-order interactions is considerable and can change appreciably across environmental contexts. Furthermore, we show that higher-order interactions can stabilize community structure, thereby reducing host susceptibility to parasite invasion. Our approach illustrates how incorporating the effects of higher-order interactions among gut microbiota across environments can be essential for understanding their effects on host phenotypes.
Affiliation(s)
- Senay Yitbarek
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- John Guittar
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA
- Kellogg Biological Station, Michigan State University, Hickory Corners, MI 49060, USA
- Sarah A. Knutie
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- C. Brandon Ogbunugafor
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Santa Fe Institute, Santa Fe, NM 87501, USA
- Vermont Complex Systems Center, University of Vermont, Burlington, VT 05405, USA
20
Haddox HK, Galloway JG, Dadonaite B, Bloom JD, Matsen IV FA, DeWitt WS. Jointly modeling deep mutational scans identifies shifted mutational effects among SARS-CoV-2 spike homologs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.31.551037. [PMID: 37577604 PMCID: PMC10418112 DOI: 10.1101/2023.07.31.551037] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/15/2023]
Abstract
Deep mutational scanning (DMS) is a high-throughput experimental technique that measures the effects of thousands of mutations to a protein. These experiments can be performed on multiple homologs of a protein or on the same protein selected under multiple conditions. It is often of biological interest to identify mutations with shifted effects across homologs or conditions. However, it is challenging to determine if observed shifts arise from biological signal or experimental noise. Here, we describe a method for jointly inferring mutational effects across multiple DMS experiments while also identifying mutations that have shifted in their effects among experiments. A key aspect of our method is to regularize the inferred shifts, so that they are nonzero only when strongly supported by the data. We apply this method to DMS experiments that measure how mutations to spike proteins from SARS-CoV-2 variants (Delta, Omicron BA.1, and Omicron BA.2) affect cell entry. Most mutational effects are conserved between these spike homologs, but a fraction have markedly shifted. We experimentally validate a subset of the mutations inferred to have shifted effects, and confirm differences of > 1,000-fold in the impact of the same mutation on spike-mediated viral infection across spikes from different SARS-CoV-2 variants. Overall, our work establishes a general approach for comparing sets of DMS experiments to identify biologically important shifts in mutational effects.
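The role of a sparsity penalty on shifts can be seen in a deliberately stripped-down case, constructed here for illustration only. It assumes one mutation measured in a reference experiment (y_ref) and in a second experiment (y_other), squared-error loss, no global-epistasis nonlinearity, and an L1 penalty lam * |shift|.

Minimizing (y_ref - beta)^2 + (y_other - beta - shift)^2 + lam * |shift| has a closed-form solution:

- shift = soft_threshold(y_other - y_ref, lam)
- beta = (y_ref + y_other - shift) / 2

Small differences are therefore set exactly to zero. This is not the authors' model or software, and the function names are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def effect_and_shift(y_ref, y_other, lam):
    """Closed-form minimizer, solved independently for each mutation, of
    (y_ref - beta)**2 + (y_other - beta - shift)**2 + lam * |shift|."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_other = np.asarray(y_other, dtype=float)
    shift = soft_threshold(y_other - y_ref, lam)
    beta = (y_ref + y_other - shift) / 2.0
    return beta, shift


# Three hypothetical mutations: a noise-sized difference, no difference,
# and a large difference between the two experiments.
beta, shift = effect_and_shift([1.0, 0.5, -2.0], [1.1, 0.5, 1.0], lam=0.5)
```

Here only the third mutation keeps a nonzero shift (2.5). Larger values of lam set more shifts to exactly zero, trading sensitivity for robustness to noise.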
Affiliation(s)
- Hugh K. Haddox
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Jared G. Galloway
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Bernadeta Dadonaite
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Jesse D. Bloom
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Frederick A. Matsen IV
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
- William S. DeWitt
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
21
Radford CE, Schommers P, Gieselmann L, Crawford KHD, Dadonaite B, Yu TC, Dingens AS, Overbaugh J, Klein F, Bloom JD. Mapping the neutralizing specificity of human anti-HIV serum by deep mutational scanning. Cell Host Microbe 2023; 31:1200-1215.e9. [PMID: 37327779 PMCID: PMC10351223 DOI: 10.1016/j.chom.2023.05.025] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 03/23/2023] [Revised: 05/15/2023] [Accepted: 05/23/2023] [Indexed: 06/18/2023]
Abstract
Understanding the specificities of human serum antibodies that broadly neutralize HIV can inform prevention and treatment strategies. Here, we describe a deep mutational scanning system that can measure the effects of combinations of mutations to HIV envelope (Env) on neutralization by antibodies and polyclonal serum. We first show that this system can accurately map how all functionally tolerated mutations to Env affect neutralization by monoclonal antibodies. We then comprehensively map Env mutations that affect neutralization by a set of human polyclonal sera that neutralize diverse strains of HIV and target the site engaging the host receptor CD4. The neutralizing activities of these sera target different epitopes, with most sera having specificities reminiscent of individual characterized monoclonal antibodies, but one serum targeting two epitopes within the CD4-binding site. Mapping the specificity of the neutralizing activity in polyclonal human serum will aid in assessing anti-HIV immune responses to inform prevention strategies.
Affiliation(s)
- Caelan E Radford
- Molecular and Cellular Biology Graduate Program, University of Washington and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, WA 98109, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Philipp Schommers
- Laboratory of Experimental Immunology, Institute of Virology, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany; German Center for Infection Research, partner site Bonn-Cologne, 50931 Cologne, Germany; Department I of Internal Medicine, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- Lutz Gieselmann
- Laboratory of Experimental Immunology, Institute of Virology, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany; German Center for Infection Research, partner site Bonn-Cologne, 50931 Cologne, Germany; Department I of Internal Medicine, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- Katharine H D Crawford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, WA 98109, USA
- Bernadeta Dadonaite
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Timothy C Yu
- Molecular and Cellular Biology Graduate Program, University of Washington and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, WA 98109, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Adam S Dingens
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Julie Overbaugh
- Division of Human Biology, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Florian Klein
- Laboratory of Experimental Immunology, Institute of Virology, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany; German Center for Infection Research, partner site Bonn-Cologne, 50931 Cologne, Germany; Department I of Internal Medicine, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98109, USA.
22
Diaz-Colunga J, Skwara A, Gowda K, Diaz-Uriarte R, Tikhonov M, Bajic D, Sanchez A. Global epistasis on fitness landscapes. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220053. [PMID: 37004717 PMCID: PMC10067270 DOI: 10.1098/rstb.2022.0053] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 07/30/2022] [Accepted: 11/23/2022] [Indexed: 04/04/2023]
Abstract
Epistatic interactions between mutations add substantial complexity to adaptive landscapes and are often thought of as detrimental to our ability to predict evolution. Yet, patterns of global epistasis, in which the fitness effect of a mutation is well-predicted by the fitness of its genetic background, may actually be of help in our efforts to reconstruct fitness landscapes and infer adaptive trajectories. Microscopic interactions between mutations, or inherent nonlinearities in the fitness landscape, may cause global epistasis patterns to emerge. In this brief review, we provide a succinct overview of recent work on global epistasis, with an emphasis on building intuition about why it is often observed. To this end, we reconcile simple geometric reasoning with recent mathematical analyses, using these to explain why different mutations in an empirical landscape may exhibit different global epistasis patterns, ranging from diminishing to increasing returns. Finally, we highlight open questions and research directions. This article is part of the theme issue 'Interdisciplinary approaches to predicting evolutionary biology'.
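A nonlinearity alone can produce a global epistasis pattern. Suppose fitness is a saturating function of an additive trait, f = 1 - exp(-phi). Adding a mutation with additive effect a then changes fitness by delta_f = (1 - exp(-a)) * (1 - f). This is exactly linear in background fitness with a negative slope, that is, diminishing returns. The saturating map and the effect sizes below are assumptions chosen for illustration and are not values from the review.

```python
import itertools

import numpy as np


def saturating_fitness(phi):
    """Hypothetical concave map from an additive trait phi >= 0 to fitness."""
    return 1.0 - np.exp(-phi)


def global_epistasis_regression(effects, focal):
    """Regress the fitness effect of mutation `focal` on background fitness.

    Backgrounds are all combinations of the other (binary) mutations.
    The underlying trait is purely additive, so any epistasis in fitness
    comes from the nonlinear map alone.
    """
    effects = np.asarray(effects, dtype=float)
    others = [i for i in range(len(effects)) if i != focal]
    backgrounds = np.array(list(itertools.product([0, 1], repeat=len(others))))
    phi_bg = backgrounds @ effects[others]
    f_bg = saturating_fitness(phi_bg)
    delta_f = saturating_fitness(phi_bg + effects[focal]) - f_bg
    slope, intercept = np.polyfit(f_bg, delta_f, 1)
    return slope, intercept


effects = [0.5, 0.2, 0.3, 0.4, 0.1]  # assumed additive effects
slope, intercept = global_epistasis_regression(effects, focal=0)
```

The fitted slope equals -(1 - exp(-0.5)). A convex map, such as an exponential, gives a positive slope instead, which corresponds to the increasing-returns end of the range described above.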
Affiliation(s)
- Juan Diaz-Colunga
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Abigail Skwara
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Karna Gowda
- Department of Ecology & Evolution & Center for the Physics of Evolving Systems, The University of Chicago, Chicago, IL 60637, USA
- Ramon Diaz-Uriarte
- Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid 28029, Spain
- Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (UAM-CSIC), Madrid 28029, Spain
- Mikhail Tikhonov
- Department of Physics, Washington University of St Louis, St Louis, MO 63130, USA
- Djordje Bajic
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Alvaro Sanchez
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Department of Microbial Biotechnology, Campus de Cantoblanco, CNB-CSIC, Madrid 28049, Spain
23
Chen Y, Hu R, Li K, Zhang Y, Fu L, Zhang J, Si T. Deep Mutational Scanning of an Oxygen-Independent Fluorescent Protein CreiLOV for Comprehensive Profiling of Mutational and Epistatic Effects. ACS Synth Biol 2023; 12:1461-1473. [PMID: 37066862 PMCID: PMC10204710 DOI: 10.1021/acssynbio.2c00662] [Received: 12/08/2022] [Indexed: 04/18/2023]
Abstract
Oxygen-independent, flavin mononucleotide-based fluorescent proteins (FbFPs) are promising alternatives to green fluorescent protein in anaerobic contexts. Deep mutational scanning performs systematic profiling of protein sequence-function relationships but has not been applied to FbFPs. Focusing on CreiLOV from Chlamydomonas reinhardtii, we created and analyzed two comprehensive mutant collections: (1) single-residue, site-saturation mutagenesis libraries covering all 118 residues; and (2) a full combinatorial mutagenesis library among 20 mutations at 15 residues, where mutations and residues were selected based on single-site mutagenesis results. Notably, the second type of library is indispensable for studying higher-order epistasis but is underrepresented in the literature. Using optimized FACS-seq assays, 2,185 (>92.5%) out of 2,360 possible single-site mutants and 165,428 (>89.7%) out of 184,320 possible combinatorial mutants were reliably assigned fitness values. We constructed statistical and machine-learning models to analyze the CreiLOV data set, enabling accurate fitness prediction of higher-order mutants using lower-order mutagenesis data. In addition, we successfully isolated CreiLOV variants with improved fluorescence quantum yield and thermostability. This work provides new empirical data and design rules to engineer combinatorial protein variants.
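One simple way to predict higher-order mutants from lower-order data is a model with additive single-mutant effects plus pairwise epistasis estimated from double mutants. The sketch below illustrates that idea with hypothetical mutation labels and values; it is not the statistical or machine-learning models used in the paper, and it is exact only when third- and higher-order interactions are absent.

```python
from itertools import combinations

def pairwise_predict(mutations, measured):
    """Predict a multi-mutant's fitness from wild-type, single and double mutants.

    measured: dict mapping frozenset of mutation labels -> fitness; it must hold
    the wild type (empty frozenset) plus every single and double mutant drawn
    from `mutations`. Third- and higher-order interactions are ignored.
    """
    f0 = measured[frozenset()]
    single = {m: measured[frozenset([m])] - f0 for m in mutations}
    prediction = f0 + sum(single.values())
    for a, b in combinations(mutations, 2):
        # Pairwise epistasis: how far the double mutant departs from additivity.
        eps = measured[frozenset([a, b])] - f0 - single[a] - single[b]
        prediction += eps
    return prediction

# Hypothetical landscape: additive effects plus one A-B interaction.
ADDITIVE = {"A": 0.4, "B": -0.2, "C": 0.1}
PAIRS = {frozenset("AB"): -0.3}

def true_fitness(genotype):
    g = frozenset(genotype)
    return 1.0 + sum(ADDITIVE[m] for m in g) + sum(v for k, v in PAIRS.items() if k <= g)

# "Lower-order data": wild type, singles and doubles only (7 genotypes).
measured = {frozenset(s): true_fitness(s) for k in range(3) for s in combinations("ABC", k)}
print(pairwise_predict(list("ABC"), measured), true_fitness("ABC"))
```

Adding a three-way term to `true_fitness` would make the prediction miss by exactly that term, which is why fully combinatorial libraries are needed to measure higher-order epistasis directly.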
Affiliation(s)
- Yongcan Chen
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Ruyun Hu
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Keyi Li
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yating Zhang
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Lihao Fu
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jianzhi Zhang
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Tong Si
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- BGI-Shenzhen, Shenzhen 518083, China
- University of Chinese Academy of Sciences, Beijing 100049, China
24
Radford CE, Schommers P, Gieselmann L, Crawford KHD, Dadonaite B, Yu TC, Dingens AS, Overbaugh J, Klein F, Bloom JD. Mapping the neutralizing specificity of human anti-HIV serum by deep mutational scanning. bioRxiv 2023:2023.03.23.533993. [PMID: 36993197 PMCID: PMC10055425 DOI: 10.1101/2023.03.23.533993] [Indexed: 06/19/2023]
Abstract
Understanding the specificities of human serum antibodies that broadly neutralize HIV can inform prevention and treatment strategies. Here we describe a deep mutational scanning system that can measure the effects of combinations of mutations to HIV envelope (Env) on neutralization by antibodies and polyclonal serum. We first show that this system can accurately map how all functionally tolerated mutations to Env affect neutralization by monoclonal antibodies. We then comprehensively map Env mutations that affect neutralization by a set of human polyclonal sera that neutralize diverse strains of HIV and are known to target the CD4-binding site. The neutralizing activities of these sera target different epitopes, with most sera having specificities reminiscent of individual characterized monoclonal antibodies, but one serum targeting two epitopes within the CD4-binding site. Mapping the specificity of the neutralizing activity in polyclonal human serum will aid in assessing anti-HIV immune responses to inform prevention strategies.
Affiliation(s)
- Caelan E. Radford
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Philipp Schommers
- Laboratory of Experimental Immunology, Institute of Virology, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- German Center for Infection Research, partner site Bonn–Cologne, 50931 Cologne, Germany
- Department I of Internal Medicine, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- Lutz Gieselmann
- Laboratory of Experimental Immunology, Institute of Virology, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- German Center for Infection Research, partner site Bonn–Cologne, 50931 Cologne, Germany
- Department I of Internal Medicine, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- Katharine H. D. Crawford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, Washington, 98109, USA
- Bernadeta Dadonaite
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Timothy C. Yu
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Adam S. Dingens
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Julie Overbaugh
- Division of Human Biology, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Florian Klein
- Laboratory of Experimental Immunology, Institute of Virology, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- German Center for Infection Research, partner site Bonn–Cologne, 50931 Cologne, Germany
- Department I of Internal Medicine, Faculty of Medicine and University Hospital of Cologne, University of Cologne, 50931 Cologne, Germany
- Jesse D. Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
25
Dadonaite B, Crawford KHD, Radford CE, Farrell AG, Yu TC, Hannon WW, Zhou P, Andrabi R, Burton DR, Liu L, Ho DD, Chu HY, Neher RA, Bloom JD. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. Cell 2023; 186:1263-1278.e20. [PMID: 36868218 PMCID: PMC9922669 DOI: 10.1016/j.cell.2023.02.001] [Received: 10/13/2022] [Revised: 01/11/2023] [Accepted: 01/31/2023] [Indexed: 02/15/2023]
Abstract
A major challenge in understanding SARS-CoV-2 evolution is interpreting the antigenic and functional effects of emerging mutations in the viral spike protein. Here, we describe a deep mutational scanning platform based on non-replicative pseudotyped lentiviruses that directly quantifies how large numbers of spike mutations impact antibody neutralization and pseudovirus infection. We apply this platform to produce libraries of the Omicron BA.1 and Delta spikes. These libraries each contain ∼7,000 distinct amino acid mutations in the context of up to ∼135,000 unique mutation combinations. We use these libraries to map escape mutations from neutralizing antibodies targeting the receptor-binding domain, N-terminal domain, and S2 subunit of spike. Overall, this work establishes a high-throughput and safe approach to measure how ∼10⁵ combinations of mutations affect antibody neutralization and spike-mediated infection. Notably, the platform described here can be extended to the entry proteins of many other viruses.
Affiliation(s)
- Bernadeta Dadonaite
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Katharine H D Crawford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, WA 98109, USA
- Caelan E Radford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA 98109, USA
- Ariana G Farrell
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Timothy C Yu
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA 98109, USA
- William W Hannon
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA 98109, USA
- Panpan Zhou
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA; Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Raiees Andrabi
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA; Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Dennis R Burton
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA; Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA; Ragon Institute of Massachusetts General Hospital, MIT & Harvard, Cambridge, MA 02139, USA
- Lihong Liu
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- David D Ho
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA; Department of Microbiology and Immunology, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA; Division of Infectious Diseases, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA
- Helen Y Chu
- University of Washington, Department of Medicine, Division of Allergy and Infectious Diseases, Seattle, WA, USA
- Richard A Neher
- Biozentrum, University of Basel, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA.
26
Morin MA, Morrison AJ, Harms MJ, Dutton RJ. Higher-order interactions shape microbial interactions as microbial community complexity increases. Sci Rep 2022; 12:22640. [PMID: 36587027 PMCID: PMC9805437 DOI: 10.1038/s41598-022-25303-1] [Received: 05/21/2022] [Accepted: 11/28/2022] [Indexed: 01/01/2023]
Abstract
Non-pairwise interactions, or higher-order interactions (HOIs), in microbial communities have been described as significant drivers of emergent features in microbiomes. Yet, the re-organization of microbial interactions between pairwise cultures and larger communities remains largely unexplored from a molecular perspective but is central to our understanding and further manipulation of microbial communities. Here, we used a bottom-up approach to investigate microbial interaction mechanisms from pairwise cultures up to 4-species communities from a simple microbiome (Hafnia alvei, Geotrichum candidum, Penicillium camemberti and Escherichia coli). Specifically, we characterized the interaction landscape for each species combination involving E. coli by identifying E. coli's interaction-associated mutants using an RB-TnSeq-based interaction assay. We observed a deep reorganization of the interaction-associated mutants, with very few 2-species interactions conserved all the way up to a 4-species community and the emergence of multiple HOIs. We further used a quantitative genetics strategy to decipher how 2-species interactions were quantitatively conserved in higher community compositions. Epistasis-based analysis revealed that, of the interactions that are conserved at all levels of complexity, 82% follow an additive pattern. Altogether, we demonstrate the complex architecture of microbial interactions even within a simple microbiome, and provide a mechanistic and molecular explanation of HOIs.
Affiliation(s)
- Manon A. Morin
- School of Biological Science, University of California San Diego, San Diego, 92093 USA
- Anneliese J. Morrison
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR USA
- Institute of Molecular Biology, University of Oregon, Eugene, OR USA
- Michael J. Harms
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR USA
- Institute of Molecular Biology, University of Oregon, Eugene, OR USA
- Rachel J. Dutton
- School of Biological Science, University of California San Diego, San Diego, 92093 USA
27
Moulana A, Dupic T, Phillips AM, Chang J, Nieves S, Roffler AA, Greaney AJ, Starr TN, Bloom JD, Desai MM. Compensatory epistasis maintains ACE2 affinity in SARS-CoV-2 Omicron BA.1. Nat Commun 2022; 13:7011. [PMID: 36384919 PMCID: PMC9668218 DOI: 10.1038/s41467-022-34506-z] [Received: 09/08/2022] [Accepted: 10/26/2022] [Indexed: 11/17/2022]
Abstract
The Omicron BA.1 variant emerged in late 2021 and quickly spread across the world. Compared to the earlier SARS-CoV-2 variants, BA.1 has many mutations, some of which are known to enable antibody escape. Many of these antibody-escape mutations individually decrease the spike receptor-binding domain (RBD) affinity for ACE2, but BA.1 still binds ACE2 with high affinity. The fitness and evolution of the BA.1 lineage are therefore driven by the combined effects of numerous mutations. Here, we systematically map the epistatic interactions between the 15 mutations in the RBD of BA.1 relative to the Wuhan Hu-1 strain. Specifically, we measure the ACE2 affinity of all possible combinations of these 15 mutations (2¹⁵ = 32,768 genotypes), spanning all possible evolutionary intermediates from the ancestral Wuhan Hu-1 strain to BA.1. We find that immune escape mutations in BA.1 individually reduce ACE2 affinity but are compensated by epistatic interactions with other affinity-enhancing mutations, including Q498R and N501Y. Thus, the ability of BA.1 to evade immunity while maintaining ACE2 affinity is contingent on acquiring multiple interacting mutations. Our results implicate compensatory epistasis as a key factor driving substantial evolutionary change for SARS-CoV-2 and are consistent with Omicron BA.1 arising from a chronic infection.
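The library size follows from treating each of the 15 sites as carrying either the ancestral (Wuhan Hu-1) or the BA.1 residue. A short sketch, assuming genotypes are encoded as 0/1 tuples (an illustrative choice, not the paper's data format), enumerates them, groups intermediates by number of BA.1 mutations (binomial counts), and counts the orders in which the 15 mutations could be acquired one at a time (15!).

```python
from collections import Counter
from itertools import product
from math import comb, factorial

N_MUTATIONS = 15  # RBD mutations separating Wuhan Hu-1 from BA.1, per the abstract

# Each genotype is a tuple of 0/1 flags: 1 means the BA.1 residue is present at that site.
genotypes = list(product((0, 1), repeat=N_MUTATIONS))
print(len(genotypes))  # 32768, i.e. 2**15

# Intermediates grouped by how many BA.1 mutations they carry.
by_distance = Counter(sum(g) for g in genotypes)
for k in range(N_MUTATIONS + 1):
    assert by_distance[k] == comb(N_MUTATIONS, k)

# Orders in which the 15 mutations could be acquired one at a time.
n_paths = factorial(N_MUTATIONS)
print(n_paths)
```

Measuring every genotype is what allows each of these single-step paths to be evaluated, rather than only the endpoints.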
Affiliation(s)
- Alief Moulana
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Thomas Dupic
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Angela M Phillips
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA.
- Jeffrey Chang
- Department of Physics, Harvard University, Cambridge, MA, 02138, USA
- Serafina Nieves
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, USA
- Anne A Roffler
- Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, 02115, USA
- Allison J Greaney
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, 98195, USA
- Tyler N Starr
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
- Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA.
- Department of Physics, Harvard University, Cambridge, MA, 02138, USA.
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, MA, 02138, USA.
- Quantitative Biology Initiative, Harvard University, Cambridge, MA, 02138, USA.
28
Dadonaite B, Crawford KHD, Radford CE, Farrell AG, Yu TC, Hannon WW, Zhou P, Andrabi R, Burton DR, Liu L, Ho DD, Neher RA, Bloom JD. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. bioRxiv 2022:2022.10.13.512056. [PMID: 36263061 PMCID: PMC9580381 DOI: 10.1101/2022.10.13.512056] [Indexed: 11/24/2022]
Abstract
A major challenge in understanding SARS-CoV-2 evolution is interpreting the antigenic and functional effects of emerging mutations in the viral spike protein. Here we describe a new deep mutational scanning platform based on non-replicative pseudotyped lentiviruses that directly quantifies how large numbers of spike mutations impact antibody neutralization and pseudovirus infection. We demonstrate this new platform by making libraries of the Omicron BA.1 and Delta spikes. These libraries each contain ~7000 distinct amino-acid mutations in the context of up to ~135,000 unique mutation combinations. We use these libraries to map escape mutations from neutralizing antibodies targeting the receptor binding domain, N-terminal domain, and S2 subunit of spike. Overall, this work establishes a high-throughput and safe approach to measure how ~10⁵ combinations of mutations affect antibody neutralization and spike-mediated infection. Notably, the platform described here can be extended to the entry proteins of many other viruses.
Affiliation(s)
- Bernadeta Dadonaite
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Katharine H D Crawford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, Washington, 98109, USA
- Caelan E Radford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- Ariana G Farrell
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Timothy C Yu
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- William W Hannon
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- Panpan Zhou
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Raiees Andrabi
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Dennis R Burton
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Ragon Institute of MGH, MIT & Harvard, Cambridge, MA 02139, USA
- Lihong Liu
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- David D. Ho
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Department of Microbiology and Immunology, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA
- Division of Infectious Diseases, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA
- Richard A. Neher
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98195, USA
29
Abstract
One core goal of genetics is to systematically understand the mapping between the DNA sequence of an organism (genotype) and its measurable characteristics (phenotype). Understanding this mapping is often challenging because of interactions between mutations, where the result of combining several different mutations can be very different from the sum of their individual effects. Here we provide a statistical framework for modeling complex genetic interactions of this type. The key idea is to ask how fast the effects of mutations change when introducing the same mutation in increasingly distant genetic backgrounds. We then propose a model for phenotypic prediction that takes into account this tendency for the effects of mutations to be more similar in nearby genetic backgrounds. Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype–phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data.
This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype–phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites, for which we also validate our model predictions via additional low-throughput experiments.
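For the special case of a fully measured landscape with two alleles per site, the fraction of variance due to each order of interaction can be computed exactly from the Walsh-Hadamard (Fourier) transform. The sketch below does only that; it assumes a complete biallelic landscape indexed by bitmask and does not implement the paper's empirical Bayes prior or Gaussian process regression, which handle multiple alleles and unobserved genotypes.

```python
import numpy as np

def walsh_coefficients(f):
    """Fast Walsh-Hadamard transform of a complete biallelic landscape.

    f[i] is the phenotype of the genotype whose bit j (of integer i) is 1 when
    site j carries the mutation; len(f) must be a power of two (2**L).
    Returns c with f[i] == sum_s c[s] * (-1)**popcount(i & s).
    """
    a = np.asarray(f, dtype=float).copy()
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("landscape length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a / n

def variance_components(f):
    """Fraction of phenotypic variance at each interaction order 1..L."""
    c = walsh_coefficients(f)
    n_sites = len(c).bit_length() - 1
    order = np.array([bin(s).count("1") for s in range(len(c))])
    power = c ** 2
    total = power[order > 0].sum()  # variance over all genotypes
    return np.array([power[order == k].sum() / total for k in range(1, n_sites + 1)])

# Toy usage on a hypothetical 4-site landscape: additive effects plus one pairwise term.
n_sites = 4
idx = np.arange(2 ** n_sites)
bits = (idx[:, None] >> np.arange(n_sites)) & 1  # one row of 0/1 flags per genotype
landscape = bits @ np.array([0.5, -1.0, 2.0, 0.3]) + 1.5 * bits[:, 0] * bits[:, 2]
print(variance_components(landscape))
```

With 0/1 coding, a product of two mutation indicators contributes to both the first- and second-order components, so "pairwise" terms written in this coding are not purely second-order in the variance decomposition.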
30
Tareen A, Kooshkbaghi M, Posfai A, Ireland WT, McCandlish DM, Kinney JB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol 2022; 23:98. [PMID: 35428271 PMCID: PMC9011994 DOI: 10.1186/s13059-022-02661-7] [Received: 07/16/2021] [Revised: 03/21/2022] [Accepted: 03/24/2022] [Indexed: 12/17/2022]
Abstract
Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps, including biophysically interpretable models, from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
Affiliation(s)
- Ammar Tareen
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- Present Address: Regeneron Pharmaceuticals, Inc., Tarrytown, 10591, NY, USA
- Mahdi Kooshkbaghi
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- William T Ireland
- Department of Physics, California Institute of Technology, Pasadena, 91125, CA, USA
- Present Address: Department of Applied Physics, Harvard University, Cambridge, 02134, MA, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA.
31
Ogbunugafor CB. The mutation effect reaction norm (mu-rn) highlights environmentally dependent mutation effects and epistatic interactions. Evolution 2022; 76:37-48. [PMID: 34989399 DOI: 10.1111/evo.14428] [Received: 09/23/2021] [Accepted: 12/23/2021] [Indexed: 11/27/2022]
Abstract
Since the modern synthesis, the fitness effects of mutations and epistasis have been central yet provocative concepts in evolutionary and population genetics. Studies of how the interactions between parcels of genetic information can change as a function of environmental context have added a layer of complexity to these discussions. Here I introduce the "mutation effect reaction norm" (Mu-RN), a new instrument through which one can analyze the phenotypic consequences of mutations and interactions across environmental contexts. It embodies the fusion of measurements of genetic interactions with the reaction norm, a classic depiction of the performance of genotypes across environments. I demonstrate the utility of the Mu-RN through the signature of a "compensatory ratchet" mutation that undermines reverse evolution of antimicrobial resistance. More broadly, I argue that the mutation effect reaction norm may help us resolve the dynamism and unpredictability of evolution, with implications for theoretical biology, genetic modification technology, and public health.
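To make the idea concrete, a toy sketch with hypothetical fitness values (not data from the paper), assuming mutation effects and epistasis are measured on an additive scale. The effect of mutation B, and its interaction with mutation A, is computed separately in each environment; tracing these per-environment values across environments is the kind of display the Mu-RN describes.

```python
# Hypothetical fitness values for a two-locus system in two environments.
# Genotypes are two-character strings: "00" = wild type, "10" = mutation A only,
# "01" = mutation B only, "11" = both mutations.
FITNESS = {
    "no_drug": {"00": 1.00, "10": 0.80, "01": 0.95, "11": 0.90},
    "drug":    {"00": 0.20, "10": 0.70, "01": 0.25, "11": 0.85},
}

def mutation_effect(env, background, mutant):
    """Additive-scale effect of a mutation: mutant minus background fitness."""
    return FITNESS[env][mutant] - FITNESS[env][background]

def epistasis(env):
    """Pairwise epistasis between A and B on the additive scale in one environment."""
    w = FITNESS[env]
    return w["11"] - w["10"] - w["01"] + w["00"]

# Effect of B in each background, and the A-B interaction, environment by environment.
for env in FITNESS:
    print(env,
          "effect of B on WT:", round(mutation_effect(env, "00", "01"), 2),
          "effect of B on A:", round(mutation_effect(env, "10", "11"), 2),
          "epistasis:", round(epistasis(env), 2))
```

In these made-up numbers, B is mildly deleterious on the wild type without drug but beneficial with drug, and the size of its interaction with A also differs between environments.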
Affiliation(s)
- C Brandon Ogbunugafor
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, 06520, USA

32
On the sparsity of fitness functions and implications for learning. Proc Natl Acad Sci U S A 2022; 119:2109649118. [PMID: 34937698 PMCID: PMC8740588 DOI: 10.1073/pnas.2109649118] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Accepted: 11/11/2021] [Indexed: 01/05/2023]
Abstract
The properties of proteins and other biological molecules are encoded in large part in the sequence of amino acids or nucleotides that defines them. Increasingly, researchers estimate functions that map sequences to a particular property using machine learning and related statistical approaches. However, an important question remains unanswered: How many experimental measurements are needed in order to accurately learn these “fitness” functions? We leverage perspectives from the fields of biophysics, evolutionary biology, and signal processing to develop a theoretical framework that enables us to make progress on answering this question. We demonstrate that this framework can be used to make useful calculations on real-world data and suggest how these calculations may be used to guide experiments. Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. 
In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model’s interpretable parameters—sequence length, alphabet size, and assumed interactions between sequence positions—on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
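The link between epistatic sparsity and sample requirements can be made concrete with a small sketch. It assumes binary sequences (alphabet size 2, a simplification of the GNK setting, which allows larger alphabets): a fitness function over all 2^L sequences is expanded in the Walsh-Hadamard basis, where each coefficient is the interaction among one subset of positions, and sparsity is the number of nonzero coefficients. The function names and the toy landscape are hypothetical, not taken from the paper.

```python
def walsh_hadamard_coefficients(fitness):
    """Fast Walsh-Hadamard transform of a fitness vector of length 2**L.

    fitness[x] is the fitness of the sequence whose bit i (0 or 1) gives the
    state of position i. In the +/-1 encoding, coefficient k is the
    interaction among the positions whose bits are set in k, divided by 2**L.
    """
    n = len(fitness)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [float(v) for v in fitness]
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly step
        h *= 2
    return [v / n for v in a]


def sparsity(coefficients, tol=1e-12):
    """Number of coefficients whose magnitude exceeds tol."""
    return sum(1 for c in coefficients if abs(c) > tol)


# Hypothetical 3-position landscape: additive effects at positions 0 and 1
# plus one pairwise interaction between positions 0 and 2.
def toy_fitness(bits):
    return 1.0 * bits[0] + 2.0 * bits[1] + 0.5 * bits[0] * bits[2]


L = 3
values = [toy_fitness([(x >> i) & 1 for i in range(L)]) for x in range(2 ** L)]
coefs = walsh_hadamard_coefficients(values)
print(sparsity(coefs), "of", len(coefs))  # 5 of 8
```

Only 5 of the 8 possible terms are nonzero here; compressed-sensing arguments turn such a count into an estimate of how many measurements suffice for exact recovery.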

33
Shaw D, Miravet‐Verde S, Piñero‐Lambea C, Serrano L, Lluch‐Senar M. LoxTnSeq: random transposon insertions combined with cre/lox recombination and counterselection to generate large random genome reductions. Microb Biotechnol 2021; 14:2403-2419. [PMID: 33325626 PMCID: PMC8601177 DOI: 10.1111/1751-7915.13714] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/10/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/13/2022]
Abstract
The removal of unwanted genetic material is a key aspect in many synthetic biology efforts and often requires preliminary knowledge of which genomic regions are dispensable. Typically, these efforts are guided by transposon mutagenesis studies coupled to deep sequencing (TnSeq) to identify insertion points and gene essentiality. However, epistatic interactions can cause unforeseen changes in essentiality after the deletion of a gene, which can render these essentiality maps outdated. Here, we present LoxTnSeq, a new methodology to generate and catalogue libraries of genome reduction mutants. LoxTnSeq combines random integration of lox sites by transposon mutagenesis with the generation of mutants via Cre recombinase, and the resulting mutants are catalogued by deep sequencing. When LoxTnSeq was applied to the naturally genome-reduced bacterium Mycoplasma pneumoniae, we obtained a mutant pool containing 285 unique deletions. These deletions ranged from >50 bp to 28 kb and together covered 21% of the total genome. LoxTnSeq also highlighted large regions of non-essential genes that could be removed simultaneously, and other non-essential regions that could not, providing a guide for future genome reductions.
Affiliation(s)
- Daniel Shaw
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Samuel Miravet‐Verde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Carlos Piñero‐Lambea
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Present address: Pulmobiotics Ltd, Dr. Aiguader 88, Barcelona 08003, Spain
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona 08002, Spain
- ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
- Maria Lluch‐Senar
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Basic Sciences Department, Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès 08195, Spain

34
Phillips AM, Lawrence KR, Moulana A, Dupic T, Chang J, Johnson MS, Cvijovic I, Mora T, Walczak AM, Desai MM. Binding affinity landscapes constrain the evolution of broadly neutralizing anti-influenza antibodies. eLife 2021; 10:71393. [PMID: 34491198 PMCID: PMC8476123 DOI: 10.7554/elife.71393] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 06/17/2021] [Accepted: 09/05/2021] [Indexed: 12/12/2022]
Abstract
Over the past two decades, several broadly neutralizing antibodies (bnAbs) that confer protection against diverse influenza strains have been isolated. Structural and biochemical characterization of these bnAbs has provided molecular insight into how they bind distinct antigens. However, our understanding of the evolutionary pathways leading to bnAbs, and thus how best to elicit them, remains limited. Here, we measure equilibrium dissociation constants of combinatorially complete mutational libraries for two naturally isolated influenza bnAbs (CR9114, 16 heavy-chain mutations; CR6261, 11 heavy-chain mutations), reconstructing all possible evolutionary intermediates back to the unmutated germline sequences. We find that these two libraries exhibit strikingly different patterns of breadth: while many variants of CR6261 display moderate affinity to diverse antigens, those of CR9114 display appreciable affinity only in specific, nested combinations. By examining the extensive pairwise and higher order epistasis between mutations, we find key sites with strong synergistic interactions that are highly similar across antigens for CR6261 and different for CR9114. Together, these features of the binding affinity landscapes strongly favor sequential acquisition of affinity to diverse antigens for CR9114, while the acquisition of breadth to more similar antigens for CR6261 is less constrained. These results, if generalizable to other bnAbs, may explain the molecular basis for the widespread observation that sequential exposure favors greater breadth, and such mechanistic insight will be essential for predicting and eliciting broadly protective immune responses.
Affiliation(s)
- Angela M Phillips
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Katherine R Lawrence
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
- Quantitative Biology Initiative, Harvard University, Cambridge, United States
- Department of Physics, Massachusetts Institute of Technology, Cambridge, United States
- Alief Moulana
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Thomas Dupic
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Jeffrey Chang
- Department of Physics, Harvard University, Cambridge, United States
- Milo S Johnson
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Ivana Cvijovic
- Department of Applied Physics, Stanford University, Stanford, United States
- Thierry Mora
- Laboratoire de physique de l'École Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
- Aleksandra M Walczak
- Laboratoire de physique de l'École Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
- Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
- Quantitative Biology Initiative, Harvard University, Cambridge, United States
- Department of Physics, Harvard University, Cambridge, United States
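In a combinatorially complete library every subset of the mutations is measured, so epistatic terms of every order can be computed exactly rather than estimated from a subsample. The sketch below assumes a reference-based (germline-anchored) expansion in which each coefficient is obtained by Möbius inversion over the sub-genotypes; this is one common convention and not necessarily the statistical model used in the study. The phenotype values, the -log KD interpretation, and all function names are hypothetical.

```python
from itertools import combinations


def epistatic_coefficients(phenotype, n_sites):
    """Reference-based epistatic coefficients for a complete 2**n map.

    phenotype maps a frozenset of mutated sites (empty set = reference,
    e.g. germline) to a measured value. The coefficient for a site set S is
    sum over subsets T of S of (-1)**(|S| - |T|) * phenotype[T], i.e. the
    Mobius inversion of phenotype(G) = sum of coefficients over subsets of G.
    """
    coefs = {}
    for order in range(n_sites + 1):
        for S in combinations(range(n_sites), order):
            total = 0.0
            for k in range(order + 1):
                for T in combinations(S, k):
                    total += (-1) ** (order - k) * phenotype[frozenset(T)]
            coefs[frozenset(S)] = total
    return coefs


def build_phenotype(coefs, n_sites):
    """Forward model: each genotype sums the coefficients of all its subsets."""
    table = {}
    for order in range(n_sites + 1):
        for G in combinations(range(n_sites), order):
            g = frozenset(G)
            table[g] = sum(c for s, c in coefs.items() if s <= g)
    return table


# Hypothetical "true" terms for three mutations (e.g. on a -log KD scale);
# omitted sets have coefficient zero.
true_coefs = {
    frozenset(): 6.0,
    frozenset({0}): 0.5,
    frozenset({1}): -1.0,
    frozenset({0, 1}): 1.5,
    frozenset({0, 1, 2}): 0.25,
}

phenotypes = build_phenotype(true_coefs, 3)
recovered = epistatic_coefficients(phenotypes, 3)
print(recovered[frozenset({0, 1})], recovered[frozenset({0, 1, 2})])  # 1.5 0.25
```

Because the library is complete, the pairwise and third-order terms are recovered exactly; with missing genotypes the same terms would have to be fitted by regression instead.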

35
Aghazadeh A, Nisonoff H, Ocal O, Brookes DH, Huang Y, Koyluoglu OO, Listgarten J, Ramchandran K. Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions. Nat Commun 2021; 12:5225. [PMID: 34471113 PMCID: PMC8410946 DOI: 10.1038/s41467-021-25371-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/28/2020] [Accepted: 07/27/2021] [Indexed: 11/18/2022]
Abstract
Despite recent advances in high-throughput combinatorial mutagenesis assays, the number of labeled sequences available to predict molecular functions has remained small relative to the vastness of the sequence space and the ruggedness of many fitness functions. While deep neural networks (DNNs) can capture high-order epistatic interactions among the mutational sites, they tend to overfit to the small number of labeled sequences available for training. Here, we developed Epistatic Net (EN), a method for spectral regularization of DNNs that exploits evidence that epistatic interactions in many fitness functions are sparse. We built a scalable extension of EN, usable for larger sequences, which enables spectral regularization using fast sparse recovery algorithms informed by coding theory. Results on several biological landscapes show that EN consistently improves the prediction accuracy of DNNs and enables them to outperform competing models which assume other priors. EN estimates the higher-order epistatic interactions of DNNs trained on massive sequence spaces, a computational problem that otherwise takes years to solve.
Affiliation(s)
- Amirali Aghazadeh
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
- Orhan Ocal
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
- David H Brookes
- Biophysics Graduate Group, University of California, Berkeley, CA, USA
- Yijie Huang
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
- O Ozan Koyluoglu
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
- Jennifer Listgarten
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
- Center for Computational Biology, Berkeley, CA, USA
- Kannan Ramchandran
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA

36
Morrison AJ, Wonderlick DR, Harms MJ. Ensemble epistasis: thermodynamic origins of nonadditivity between mutations. Genetics 2021; 219:iyab105. [PMID: 34849909 PMCID: PMC8633102 DOI: 10.1093/genetics/iyab105] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/06/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023]
Abstract
Epistasis (when mutations combine nonadditively) is a profoundly important aspect of biology. It is often difficult to understand its mechanistic origins. Here, we show that epistasis can arise from the thermodynamic ensemble, or the set of interchanging conformations a protein adopts. Ensemble epistasis occurs because mutations can have different effects on different conformations of the same protein, leading to nonadditive effects on its average, observable properties. Using a simple analytical model, we found that ensemble epistasis arises when two conditions are met: (1) a protein populates at least three conformations and (2) mutations have differential effects on at least two conformations. To explore the relative magnitude of ensemble epistasis, we performed a virtual deep-mutational scan of the allosteric Ca²⁺ signaling protein S100A4. We found that 47% of mutation pairs exhibited ensemble epistasis with a magnitude on the order of thermal fluctuations. We observed many forms of epistasis: magnitude, sign, and reciprocal sign epistasis. The same mutation pair could even exhibit different forms of epistasis under different environmental conditions. The ubiquity of thermodynamic ensembles in biology and the pervasiveness of ensemble epistasis in our dataset suggest that it may be a common mechanism of epistasis in proteins and other macromolecules.
Affiliation(s)
- Anneliese J Morrison
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403, USA
- Daria R Wonderlick
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403, USA
- Michael J Harms
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403, USA
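The two conditions stated in the abstract can be checked numerically with a toy Boltzmann ensemble. This is a minimal sketch under assumptions of our own, not the authors' model: the observable is taken to be the apparent free energy of one "active" conformation relative to all others, each mutation shifts each conformation's free energy additively, and the RT value, energies, and function names are illustrative.

```python
import math

RT = 0.593  # kcal/mol near 298 K (assumed units)


def apparent_dG(energies, active=0):
    """-RT ln(p_active / (1 - p_active)) for a Boltzmann ensemble.

    energies: free energy (kcal/mol) of each conformation.
    """
    weights = [math.exp(-g / RT) for g in energies]
    p_active = weights[active] / sum(weights)
    return -RT * math.log(p_active / (1.0 - p_active))


def ensemble_epistasis(wt, shift_a, shift_b):
    """Nonadditivity of mutations a and b on the apparent free energy.

    shift_a, shift_b: per-conformation free-energy changes caused by each
    mutation; they add exactly within every individual conformation.
    """
    a = [g + da for g, da in zip(wt, shift_a)]
    b = [g + db for g, db in zip(wt, shift_b)]
    ab = [g + da + db for g, da, db in zip(wt, shift_a, shift_b)]
    return apparent_dG(ab) - apparent_dG(a) - apparent_dG(b) + apparent_dG(wt)


# Two conformations: the observable reduces to G_active - G_other,
# so mutation effects stay additive (zero up to floating-point error).
print(ensemble_epistasis([0.0, 1.0], [0.5, 0.0], [0.0, 0.7]))

# Three conformations, with each mutation destabilizing a different
# inactive conformation: clearly nonzero epistasis.
print(ensemble_epistasis([0.0, 1.0, 1.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]))
```

With these made-up energies the three-state case gives roughly -1.2 kcal/mol, although no conformation on its own shows any nonadditivity.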

37
Correlational selection in the age of genomics. Nat Ecol Evol 2021; 5:562-573. [PMID: 33859374 DOI: 10.1038/s41559-021-01413-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 11/01/2019] [Accepted: 02/11/2021] [Indexed: 02/01/2023]
Abstract
Ecologists and evolutionary biologists are well aware that natural and sexual selection do not operate on traits in isolation, but instead act on combinations of traits. This long-recognized and pervasive phenomenon is known as multivariate selection or, in the particular case where it favours correlations between interacting traits, correlational selection. Despite broad acknowledgement of correlational selection, the relevant theory has often been overlooked in genomic research. Here, we discuss theory and empirical findings from ecological, quantitative genetic and genomic research, linking key insights from different fields. Correlational selection can operate on both discrete trait combinations and quantitative characters, with profound implications for genomic architecture, linkage, pleiotropy, evolvability, modularity, phenotypic integration and phenotypic plasticity. We synthesize current knowledge and discuss promising research approaches that will enable us to understand how correlational selection shapes genomic architecture, thereby linking quantitative genetic approaches with emerging genomic methods. We suggest that research on correlational selection has great potential to integrate multiple fields in evolutionary biology, including developmental and functional biology, ecology, quantitative genetics, phenotypic polymorphisms, hybrid zones and speciation processes.

38
Reddy G, Desai MM. Global epistasis emerges from a generic model of a complex trait. eLife 2021; 10:64740. [PMID: 33779543 PMCID: PMC8057814 DOI: 10.7554/elife.64740] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 11/09/2020] [Accepted: 03/26/2021] [Indexed: 11/20/2022]
Abstract
Epistasis between mutations can make adaptation contingent on evolutionary history. Yet despite widespread ‘microscopic’ epistasis between the mutations involved, microbial evolution experiments show consistent patterns of fitness increase between replicate lines. Recent work shows that this consistency is driven in part by global patterns of diminishing-returns and increasing-costs epistasis, which make mutations systematically less beneficial (or more deleterious) on fitter genetic backgrounds. However, the origin of this ‘global’ epistasis remains unknown. Here, we show that diminishing-returns and increasing-costs epistasis emerge generically as a consequence of pervasive microscopic epistasis. Our model predicts a specific quantitative relationship between the magnitude of global epistasis and the stochastic effects of microscopic epistasis, which we confirm by reanalyzing existing data. We further show that the distribution of fitness effects takes on a universal form when epistasis is widespread and introduce a novel fitness landscape model to show how phenotypic evolution can be repeatable despite sequence-level stochasticity.
Affiliation(s)
- Gautam Reddy
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
- Michael M Desai
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Quantitative Biology Initiative, Harvard University, Cambridge, United States
- Department of Physics, Harvard University, Cambridge, United States

39
Gualtieri CT. Genomic Variation, Evolvability, and the Paradox of Mental Illness. Front Psychiatry 2021; 11:593233. [PMID: 33551865 PMCID: PMC7859268 DOI: 10.3389/fpsyt.2020.593233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2020] [Accepted: 11/27/2020] [Indexed: 12/30/2022]
Abstract
Twentieth-century genetics was hard put to explain the irregular behavior of neuropsychiatric disorders. Autism and schizophrenia defy a principle of natural selection; they are highly heritable but associated with low reproductive success. Nevertheless, they persist. The genetic origins of such conditions are confounded by the problem of variable expression, that is, when a given genetic aberration can lead to any one of several distinct disorders. Also, autism and schizophrenia occur on a spectrum of severity, from mild and subclinical cases to the overt and disabling. Such irregularities reflect the problem of missing heritability; although hundreds of genes may be associated with autism or schizophrenia, together they account for only a small proportion of cases. Techniques for higher resolution, genomewide analysis have begun to illuminate the irregular and unpredictable behavior of the human genome. Thus, the origins of neuropsychiatric disorders in particular and complex disease in general have been illuminated. The human genome is characterized by a high degree of structural and behavioral variability: DNA content variation, epistasis, stochasticity in gene expression, and epigenetic changes. These elements have grown more complex as evolution scaled the phylogenetic tree. They are especially pertinent to brain development and function. Genomic variability is a window on the origins of complex disease, neuropsychiatric disorders, and neurodevelopmental disorders in particular. Genomic variability, as it happens, is also the fuel of evolvability. The genomic events that presided over the evolution of the primate and hominid lineages are over-represented in patients with autism and schizophrenia, as well as intellectual disability and epilepsy. That the special qualities of the human genome that drove evolution might, in some way, contribute to neuropsychiatric disorders is a matter of no little interest.

40
Stouffer DB, Novak M. Hidden layers of density dependence in consumer feeding rates. Ecol Lett 2021; 24:520-532. [PMID: 33404158 DOI: 10.1111/ele.13670] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/26/2020] [Revised: 11/26/2020] [Accepted: 12/07/2020] [Indexed: 01/16/2023]
Abstract
Functional responses relate a consumer's feeding rates to variation in its abiotic and biotic environment, providing insight into consumer behaviour and fitness, and underpinning population and food-web dynamics. Despite their broad relevance and long-standing history, we show here that the types of density dependence found in classic resource- and consumer-dependent functional-response models equate to strong and often untenable assumptions about the independence of processes underlying feeding rates. We first demonstrate mathematically how to quantify non-independence between feeding and consumer interference and between feeding on multiple resources. We then analyse two large collections of functional-response data sets to show that non-independence is pervasive and borne out in previously hidden forms of density dependence. Our results provide a new lens through which to view variation in consumer feeding rates and disentangle the biological underpinnings of species interactions in multi-species contexts.
Affiliation(s)
- Daniel B Stouffer
- Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Christchurch, 8041, New Zealand
- Mark Novak
- Department of Integrative Biology, Oregon State University, Corvallis, OR, 97331, USA

41
Chen J, Wong KC. Analyzing High-Order Epistasis from Genotype-Phenotype Maps Using 'Epistasis' Package. Methods Mol Biol 2021; 2212:265-275. [PMID: 33733361 DOI: 10.1007/978-1-0716-0947-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Epistasis refers to interactions between genes that lead to complex phenotypic effects. Interactions among three or more mutations, called "high-order epistasis", have attracted significant interest in recent studies. However, the analysis of high-order epistasis is still debated because of nonlinear model complexity and statistical artifacts. A recent "epistasis" Python package was therefore developed to characterize high-order epistasis by first estimating a nonlinear scale for mutation effects and then extracting high-order epistasis with linear models. This method discovered statistically significant high-order epistasis in several real genotype-phenotype maps. We provide a concise, step-by-step guide to applying the "epistasis" package by reproducing these high-order epistasis discoveries on real genotype-phenotype data using the latest API of the package.
Affiliation(s)
- Junyi Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.
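Why a nonlinear scale must be separated from additive effects can be shown with a two-mutation example. This is a minimal sketch, assuming the observed phenotype is a logistic function of an additive latent trait; the logistic form, effect sizes, and function names are hypothetical, and none of this is the "epistasis" package's API. Pairwise epistasis computed on the observed scale is nonzero, yet it vanishes once phenotypes are mapped back through the inverse of the nonlinearity, which is known here but would have to be inferred in practice.

```python
import math


def logistic(x, top=1.0):
    """Assumed nonlinear map from the additive latent trait to the phenotype."""
    return top / (1.0 + math.exp(-x))


def logit(y, top=1.0):
    """Inverse of logistic: recovers the latent scale from an observation."""
    return math.log(y / (top - y))


def pairwise_epistasis(p00, p10, p01, p11):
    """Deviation of the double mutant from additivity on a given scale."""
    return p11 - p10 - p01 + p00


# Hypothetical additive effects on the latent scale.
wild_type, effect_1, effect_2 = -1.0, 1.5, 2.0
latent = [wild_type, wild_type + effect_1, wild_type + effect_2,
          wild_type + effect_1 + effect_2]
observed = [logistic(x) for x in latent]

eps_observed = pairwise_epistasis(*observed)
eps_latent = pairwise_epistasis(*(logit(y) for y in observed))
print(f"observed scale: {eps_observed:.3f}")  # negative: diminishing returns
print(f"latent scale: {abs(eps_latent):.3f}")
```

Any epistasis that remains after such a transformation is the "genuine" pairwise or higher-order signal that a linear model on the latent scale is meant to capture.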

42
Rigato E, Fusco G. A heuristic model of the effects of phenotypic robustness in adaptive evolution. Theor Popul Biol 2020; 136:22-30. [PMID: 33221334 DOI: 10.1016/j.tpb.2020.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2020] [Revised: 10/15/2020] [Accepted: 11/09/2020] [Indexed: 10/23/2022]
Abstract
A recent theoretical, deterministic model of the effects of phenotypic robustness on adaptive evolutionary dynamics showed that a certain level of phenotypic robustness (critical robustness) is a required condition for adaptation to occur and to be maintained during evolution in most real organismal systems. We built an individual-based heuristic model to verify the soundness of these theoretical results through computer simulation, testing expectations under a range of scenarios for the relevant parameters of the evolutionary dynamics. These include the mutation probability, the presence of stochastic effects, the introduction of environmental influences and the possibility for some features of the population (such as selection coefficients and phenotypic robustness) to change during adaptation. Overall, we found a good match between observed and expected results, even for evolutionary parameter values that violate some of the assumptions of the deterministic model, and found that robustness can itself evolve. However, several simulations suggest that very high robustness values, above the critical value, can limit or slow down adaptation. This possible trade-off was not predicted by the deterministic model.
Affiliation(s)
- Emanuele Rigato
- Department of Biology, University of Padova, Via U. Bassi 58/B, 35131 Padova, Italy
- Giuseppe Fusco
- Department of Biology, University of Padova, Via U. Bassi 58/B, 35131 Padova, Italy

43
Sailer ZR, Shafik SH, Summers RL, Joule A, Patterson-Robert A, Martin RE, Harms MJ. Inferring a complete genotype-phenotype map from a small number of measured phenotypes. PLoS Comput Biol 2020; 16:e1008243. [PMID: 32991585 PMCID: PMC7546491 DOI: 10.1371/journal.pcbi.1008243] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/05/2019] [Revised: 10/09/2020] [Accepted: 08/13/2020] [Indexed: 01/02/2023]
Abstract
Understanding evolution requires detailed knowledge of genotype-phenotype maps; however, it can be a herculean task to measure every phenotype in a combinatorial map. We have developed a computational strategy to predict the missing phenotypes from an incomplete, combinatorial genotype-phenotype map. As a test case, we used an incomplete genotype-phenotype dataset previously generated for the malaria parasite’s ‘chloroquine resistance transporter’ (PfCRT). Wild-type PfCRT (PfCRT3D7) lacks significant chloroquine (CQ) transport activity, but the introduction of the eight mutations present in the ‘Dd2’ isoform of PfCRT (PfCRTDd2) enables the protein to transport CQ away from its site of antimalarial action. This gain of a transport function imparts CQ resistance to the parasite. A combinatorial map between PfCRT3D7 and PfCRTDd2 consists of 256 genotypes, of which only 52 have had their CQ transport activities measured through expression in the Xenopus laevis oocyte. We trained a statistical model with these 52 measurements to infer the CQ transport activity for the remaining 204 combinatorial genotypes between PfCRT3D7 and PfCRTDd2. Our best-performing model incorporated a binary classifier, a nonlinear scale, and additive effects for each mutation. The addition of specific pairwise- and high-order-epistatic coefficients decreased the predictive power of the model. We evaluated our predictions by experimentally measuring the CQ transport activities of 24 additional PfCRT genotypes. The R² value between our predicted and newly measured phenotypes was 0.90. We then used the model to probe the accessibility of evolutionary trajectories through the map. Approximately 1% of the possible trajectories between PfCRT3D7 and PfCRTDd2 are accessible; however, none of the trajectories entailed eight successive increases in CQ transport activity. These results demonstrate that phenotypes can be inferred with known uncertainty from a partial genotype-phenotype dataset.
We also validated our approach against a collection of previously published genotype-phenotype maps. The model therefore appears general and should be applicable to a large number of genotype-phenotype maps. Biological macromolecules are built from chains of building blocks. The function of a macromolecule depends on the specific chemical properties of the building blocks that make it up. Macromolecules evolve through mutations that swap one building block for another. Understanding how biomolecules work and evolve therefore requires knowledge of the effects of mutations. The effects of mutations can be measured experimentally; however, because there are a vast number of possible combinations of mutations, it is often difficult to make enough measurements to understand biomolecular function and evolution. In this paper, we describe a simple method to predict the effects of mutations on biomolecules from a small number of measurements. This method works by appropriately averaging the effects of mutations seen in different contexts. We test the method by predicting the effects of mutations on PfCRT, a macromolecule from the malarial parasite that confers drug resistance. We find that our method is fast and effective. Using a small number of measurements, we were able to gain insight into the evolutionary steps by which this macromolecule conferred drug resistance. To make this method accessible to other researchers, we have released it as an open-source software package: https://gpseer.readthedocs.io.
Affiliation(s)
- Zachary R. Sailer
- Institute for Molecular Biology, University of Oregon, Eugene, OR, United States of America
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, United States of America
- Sarah H. Shafik
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Robert L. Summers
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Alex Joule
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Rowena E. Martin
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Michael J. Harms
- Institute for Molecular Biology, University of Oregon, Eugene, OR, United States of America
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, United States of America
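The trajectory-accessibility analysis described in this abstract can be sketched by enumerating every order in which the mutations could be acquired and keeping only the orders in which the phenotype increases at each step. This is a minimal sketch, assuming a strict-increase criterion and a hypothetical two-site map; the paper's exact accessibility rule may differ, and a real analysis of the 8 PfCRT sites would enumerate 8! = 40,320 orders and would also need to account for measurement uncertainty.

```python
from itertools import permutations


def accessible_trajectories(phenotype, n_sites):
    """Count mutational orders from all-0 to all-1 that increase at every step.

    phenotype maps a tuple of 0/1 states (one per site) to a value.
    Returns (accessible, total).
    """
    accessible = 0
    total = 0
    for order in permutations(range(n_sites)):
        total += 1
        genotype = [0] * n_sites
        previous = phenotype[tuple(genotype)]
        for site in order:
            genotype[site] = 1
            current = phenotype[tuple(genotype)]
            if current <= previous:
                break  # a neutral or deleterious step blocks this path
            previous = current
        else:
            accessible += 1  # every step was an increase
    return accessible, total


# Hypothetical map with sign epistasis: mutation 0 is deleterious alone
# but beneficial once mutation 1 is present.
sign_epistatic = {(0, 0): 0.0, (1, 0): -1.0, (0, 1): 1.0, (1, 1): 2.0}
print(accessible_trajectories(sign_epistatic, 2))  # (1, 2)
```

On a purely additive map with all-beneficial mutations every order is accessible; sign epistasis is what closes off paths.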

44
Genotype networks of 80 quantitative Arabidopsis thaliana phenotypes reveal phenotypic evolvability despite pervasive epistasis. PLoS Comput Biol 2020; 16:e1008082. [PMID: 32790763 PMCID: PMC7447023 DOI: 10.1371/journal.pcbi.1008082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/19/2019] [Revised: 08/25/2020] [Accepted: 06/22/2020] [Indexed: 12/23/2022]
Abstract
We study the genotype-phenotype maps of 80 quantitative phenotypes in the model plant Arabidopsis thaliana, by representing the genotypes affecting each phenotype as a genotype network. In such a network, each vertex or node corresponds to an individual's genotype at all those genomic loci that affect a given phenotype. Two vertices are connected by an edge if the associated genotypes differ in exactly one nucleotide. The 80 genotype networks we analyze are based on data from genome-wide association studies of 199 A. thaliana accessions. They form connected graphs whose topography differs substantially among phenotypes. We focus our analysis on the incidence of epistasis (non-additive interactions among mutations) because a high incidence of epistasis can reduce the accessibility of evolutionary paths towards high or low phenotypic values. We find epistatic interactions in 67 phenotypes, and in 51 phenotypes every pairwise mutant interaction is epistatic. Moreover, we find phenotype-specific differences in the fraction of accessible mutational paths to maximum phenotypic values. However, even though epistasis affects the accessibility of maximum phenotypic values, the relationships between genotypic and phenotypic change of our analyzed phenotypes are sufficiently smooth that some evolutionary paths remain accessible for most phenotypes, even where epistasis is pervasive. The genotype network representation we use can complement existing approaches to understand the genetic architecture of polygenic traits in many different organisms.
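The network construction described above, and the kind of path-accessibility count it supports, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: loci are biallelic and coded 0/1, the phenotype is known for every genotype in the network, and a path is called accessible here if the phenotype rises strictly at each single-mutation step along a shortest path. The study's own definition of accessibility may differ, and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of loci at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def genotype_network(phenotype):
    """Adjacency list: genotypes are joined when they differ at exactly one locus."""
    nodes = list(phenotype)
    return {u: [v for v in nodes if hamming(u, v) == 1] for u in nodes}

def accessible_paths(phenotype, start, end):
    """Return (accessible, total) counts of shortest mutational paths from start to end.

    A path counts as accessible when the phenotype strictly increases at every step.
    """
    net = genotype_network(phenotype)

    def walk(node, monotone):
        if node == end:
            return (1 if monotone else 0), 1
        accessible = total = 0
        for nxt in net[node]:
            # Only steps that move one mutation closer to the target stay on a shortest path.
            if hamming(nxt, end) == hamming(node, end) - 1:
                a, t = walk(nxt, monotone and phenotype[nxt] > phenotype[node])
                accessible += a
                total += t
        return accessible, total

    return walk(start, True)

# Usage: two biallelic loci with sign epistasis (hypothetical values).
pheno = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): -1.0, (1, 1): 2.0}
print(accessible_paths(pheno, (0, 0), (1, 1)))
```

In the sign-epistatic example only one of the two paths to the maximum stays accessible, which is the kind of restriction the abstract describes; with an additive map both paths would be accessible.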
45
Ballal A, Laurendon C, Salmon M, Vardakou M, Cheema J, Defernez M, O'Maille PE, Morozov AV. Sparse Epistatic Patterns in the Evolution of Terpene Synthases. Mol Biol Evol 2020; 37:1907-1924. [PMID: 32119077 DOI: 10.1093/molbev/msaa052] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/22/2022]
Abstract
We explore sequence determinants of enzyme activity and specificity in a major enzyme family of terpene synthases. Most enzymes in this family catalyze reactions that produce cyclic terpenes, complex hydrocarbons widely used by plants and insects in diverse biological processes such as defense, communication, and symbiosis. To analyze the molecular mechanisms of emergence of terpene cyclization, we have carried out an in-depth examination of mutational space around (E)-β-farnesene synthase, an Artemisia annua enzyme which catalyzes production of a linear hydrocarbon chain. Each mutant enzyme in our synthetic libraries was characterized biochemically, and the resulting reaction rate data were used as input to the Michaelis-Menten model of enzyme kinetics, in which free energies were represented as sums of one-amino-acid contributions and two-amino-acid couplings. Our model predicts measured reaction rates with high accuracy and yields free energy landscapes characterized by relatively few coupling terms. As a result, the Michaelis-Menten free energy landscapes have simple, interpretable structure and exhibit little epistasis. We have also developed biophysical fitness models based on the assumption that highly fit enzymes have evolved to maximize the output of correct products, such as cyclic products or a specific product of interest, while minimizing the output of byproducts. This approach results in nonlinear fitness landscapes that are considerably more epistatic. Overall, our experimental and computational framework provides focused characterization of evolutionary emergence of novel enzymatic functions in the context of microevolutionary exploration of sequence space around naturally occurring enzymes.
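A minimal sketch of the model structure described above: a free energy written as a sum of one-amino-acid contributions and two-amino-acid couplings, fed into Michaelis-Menten kinetics. The specific mapping (one energy model setting kcat through a Boltzmann factor and a second setting KM), the RT value and its units, and all parameter names (`h_cat`, `J_cat`, `h_bind`, `J_bind`, `k0`, `km0`) are assumptions made for illustration, not the parameterization used by Ballal et al.

```python
import math
from itertools import combinations

RT = 0.593  # kcal/mol near 298 K; the energy units are an assumption

def energy(seq, h, J):
    """Free energy as single-residue terms plus pairwise couplings.

    h maps (position, residue) -> energy; J maps ((i, a_i), (j, a_j)) with i < j -> coupling.
    Missing keys contribute zero.
    """
    total = sum(h.get((i, aa), 0.0) for i, aa in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        total += J.get(((i, seq[i]), (j, seq[j])), 0.0)
    return total

def mm_rate(seq, params, substrate, enzyme=1.0):
    """Michaelis-Menten rate v = kcat [E][S] / (KM + [S]); kcat and KM come from two energy models."""
    dg_cat = energy(seq, params["h_cat"], params["J_cat"])
    dg_bind = energy(seq, params["h_bind"], params["J_bind"])
    kcat = params["k0"] * math.exp(-dg_cat / RT)  # higher barrier -> slower turnover
    km = params["km0"] * math.exp(dg_bind / RT)   # weaker binding -> larger KM
    return kcat * enzyme * substrate / (km + substrate)

# Usage: with all energy terms zero, kcat = k0 and KM = km0 (hypothetical values).
params = {"h_cat": {}, "J_cat": {}, "h_bind": {}, "J_bind": {}, "k0": 2.0, "km0": 1.0}
print(mm_rate("AC", params, substrate=1.0))
```

Keeping the couplings in a sparse dictionary mirrors the abstract's finding that relatively few coupling terms are needed: any residue pair without an entry is treated as additive.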
Affiliation(s)
- Aditya Ballal
- Department of Physics & Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, NJ
- Caroline Laurendon
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom
- Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom
- Melissa Salmon
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom
- Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- Maria Vardakou
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom
- Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Jitender Cheema
- John Innes Centre, Department of Computational and Systems Biology, Norwich Research Park, Norwich, United Kingdom
- Marianne Defernez
- Core Science Resources, Quadram Institute, Norwich Research Park, Norwich, United Kingdom
- Paul E O'Maille
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom
- Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom
- SRI International, Menlo Park, CA
- Alexandre V Morozov
- Department of Physics & Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, NJ
46
Levy JJ, O'Malley AJ. Don't dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning. BMC Med Res Methodol 2020; 20:171. [PMID: 32600277 PMCID: PMC7325087 DOI: 10.1186/s12874-020-01046-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/17/2019] [Accepted: 06/10/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each. METHODS We present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-world biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions. RESULTS Preliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach.
We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output. CONCLUSIONS When a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.
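Why an extracted interaction term can help a logistic regression is easiest to see on a toy XOR-style outcome, which no main-effects-only model can classify perfectly. In this sketch the interaction feature `a * b` is specified by hand and the model is fitted by plain batch gradient descent; it does not use InteractionTransformer, which (per the abstract) proposes candidate interactions from a random forest. The data and function names are hypothetical.

```python
import math

def predict_proba(w, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-w . x))."""
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))

def fit_logistic(X, y, lr=0.5, steps=10000):
    """Batch gradient descent on the mean log-loss; each row of X starts with 1 (intercept)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    preds = [1 if predict_proba(w, xi) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# Toy XOR-style outcome: y = 1 when exactly one of two binary predictors is 1.
pairs = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0, 1, 1, 0]
X_main = [[1, a, b] for a, b in pairs]          # intercept + main effects
X_inter = [[1, a, b, a * b] for a, b in pairs]  # same, plus the a*b interaction
w_main = fit_logistic(X_main, y)
w_inter = fit_logistic(X_inter, y)
print(accuracy(w_main, X_main, y), accuracy(w_inter, X_inter, y))
```

With main effects only, the best any linear logit can do on this pattern is three of four correct; adding the single interaction column makes the classes separable while every coefficient stays directly interpretable.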
Affiliation(s)
- Joshua J Levy
- Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, USA.
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, USA.
- Department of Pathology, Geisel School of Medicine at Dartmouth, Hanover, USA.
- A James O'Malley
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, USA
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, Hanover, USA
47
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022]
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
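For binary sequences, the minimum-epistasis idea can be illustrated on the smallest case: a single unmeasured genotype whose neighbors are all measured. Each two-mutation square (face of the hypercube) containing the missing genotype implies the value that would make that square exactly additive, and minimizing the summed squared epistasis over those squares gives their mean. This toy assumes biallelic sites and equal weights for every square, and the function names are hypothetical; the published method handles many missing genotypes and multiple alleles, and its weighting may differ.

```python
from itertools import combinations

def flip(g, *sites):
    """Return genotype g with the given binary sites mutated."""
    g = list(g)
    for i in sites:
        g[i] = 1 - g[i]
    return tuple(g)

def min_epistasis_impute(f, missing):
    """Impute one unmeasured binary genotype m by minimizing squared epistasis.

    Writing m^i for m with site i flipped, the square {m, m^i, m^j, m^ij} is additive when
    f(m) = f(m^i) + f(m^j) - f(m^ij). The sum of squared deviations over all such squares
    is minimized by the mean of these per-square values.
    """
    values = [f[flip(missing, i)] + f[flip(missing, j)] - f[flip(missing, i, j)]
              for i, j in combinations(range(len(missing)), 2)]
    return sum(values) / len(values)

# Usage: a non-additive 3-site map with (1, 1, 1) unmeasured (hypothetical values).
measured = {(0, 0, 0): 0.0, (1, 0, 0): 1.0, (0, 1, 0): 1.0, (0, 0, 1): 1.0,
            (1, 1, 0): 3.0, (1, 0, 1): 2.0, (0, 1, 1): 2.0}
print(min_epistasis_impute(measured, (1, 1, 1)))
```

On an exactly additive map every square agrees and the imputation is exact; where the squares disagree, the estimate is the compromise that keeps mutational effects as similar as possible across neighboring backgrounds.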
Affiliation(s)
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
48
Crona K, Luo M, Greene D. An uncertainty law for microbial evolution. J Theor Biol 2020; 489:110155. [PMID: 31926205 DOI: 10.1016/j.jtbi.2020.110155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/28/2019] [Revised: 01/05/2020] [Accepted: 01/07/2020] [Indexed: 11/28/2022]
Abstract
Medical practice would benefit from a thorough understanding of constraints and uncertainty in microbial evolution. Higher order epistasis refers to effects of combinations of mutations that remain unexpected even after single-mutation and pairwise effects have been accounted for. Recent studies show that higher order epistasis is abundant in nature, for bacteria as well as higher organisms. However, the importance of higher order effects has been debated. It has been suggested that such effects cannot be interpreted, and should not be considered. Here, we show conclusively that higher order epistasis changes the adaptive prospects for a population. The conclusion is based on an exhaustive search of 193,270,310 hypercube graphs and applications of graph theory. Our results are more precise, yet more universal, than related research since they depend on mathematical theory, rather than sampling or simulations. Moreover, the uncertainty we establish for microbial evolution due to higher order epistasis is not sensitive to detailed model assumptions, such as the baseline being additive or log-additive fitness.
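To see how a higher-order term alone can change adaptive prospects, the sketch below builds two three-locus landscapes with identical single-mutation and pairwise terms that differ only in the third-order coefficient, and lists their local fitness peaks. It assumes a reference-based expansion on biallelic loci with hypothetical coefficient values. It is a toy illustration, not the exhaustive graph-theoretic search reported in the paper.

```python
from itertools import combinations, product

def landscape(coef, n):
    """Fitness of every binary genotype from a reference-based epistasis expansion.

    coef maps a sorted tuple of mutated sites to its term: (), (0,), (0, 1), (0, 1, 2), ...
    f(g) sums coef[S] over every subset S of the sites mutated in g; missing terms are zero.
    """
    f = {}
    for g in product((0, 1), repeat=n):
        sites = [i for i, bit in enumerate(g) if bit]
        f[g] = sum(coef.get(S, 0.0)
                   for k in range(len(sites) + 1)
                   for S in combinations(sites, k))
    return f

def local_peaks(f):
    """Genotypes fitter than every neighbor that differs by a single mutation."""
    def neighbors(g):
        return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]
    return sorted(g for g in f if all(f[g] > f[h] for h in neighbors(g)))

# Same single and pairwise terms; only the third-order term differs (hypothetical values).
low_order = {(0,): 1.0, (1,): 1.0, (2,): 1.0, (0, 1): -0.6, (0, 2): -0.6, (1, 2): -0.6}
with_third = {**low_order, (0, 1, 2): 0.5}
print(local_peaks(landscape(low_order, 3)))
print(local_peaks(landscape(with_third, 3)))
```

Without the third-order term the three double mutants are separate local peaks; adding it turns the triple mutant into the single peak, so a population's reachable endpoints change even though every single and pairwise effect is the same.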
Affiliation(s)
- Kristina Crona
- Department of Mathematics and Statistics, 4400 Massachusetts Avenue NW, Washington, DC 20016-8050, United States.
- Mengming Luo
- University of California at San Diego, CA, United States.
- Devin Greene
- Department of Mathematics and Statistics, 4400 Massachusetts Avenue NW, Washington, DC 20016-8050, United States.
49
Miton CM, Chen JZ, Ost K, Anderson DW, Tokuriki N. Statistical analysis of mutational epistasis to reveal intramolecular interaction networks in proteins. Methods Enzymol 2020; 643:243-280. [DOI: 10.1016/bs.mie.2020.07.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
50
Sanchez-Gorostiaga A, Bajić D, Osborne ML, Poyatos JF, Sanchez A. High-order interactions distort the functional landscape of microbial consortia. PLoS Biol 2019; 17:e3000550. [PMID: 31830028 PMCID: PMC6932822 DOI: 10.1371/journal.pbio.3000550] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 01/30/2019] [Revised: 12/26/2019] [Accepted: 11/15/2019] [Indexed: 12/11/2022]
Abstract
Understanding the link between community composition and function is a major challenge in microbial population biology, with implications for the management of natural microbiomes and the design of synthetic consortia. Specifically, it is poorly understood whether community functions can be quantitatively predicted from traits of species in monoculture. Inspired by the study of complex genetic interactions, we have examined how the amylolytic rate of combinatorial assemblages of six starch-degrading soil bacteria depends on the separate functional contributions from each species and their interactions. Filtering our results through the theory of biochemical kinetics, we show that this simple function is additive in the absence of interactions among community members. For about half of the combinatorially assembled consortia, the amylolytic function is dominated by pairwise and higher-order interactions. For the other half, the function is additive despite the presence of strong competitive interactions. We explain the mechanistic basis of these findings and propose a quantitative framework that allows us to separate the effect of behavioral and population dynamics interactions. Our results suggest that the functional robustness of a consortium to pairwise and higher-order interactions critically affects our ability to predict and bottom-up engineer ecosystem function in complex communities.
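When every sub-consortium has been measured, as in a full combinatorial assembly of species, the measured function can be split into per-species (additive), pairwise, and higher-order terms by inclusion-exclusion, in the same way epistasis is partitioned for mutations. The sketch below assumes that design and uses hypothetical species labels and values. It is a generic decomposition, not the kinetic framework the authors use to separate behavioral from population-dynamics interactions.

```python
def interaction_terms(F):
    """Decompose community function into additive, pairwise and higher-order terms.

    F maps a frozenset of species to the measured function of that consortium,
    including the empty set (blank); every subset must be present. Uses
    inclusion-exclusion (Moebius inversion): eps[T] = F[T] minus the sum of eps[S]
    over proper subsets S of T, so F[T] equals the sum of eps[S] over all S within T.
    """
    eps = {}
    for T in sorted(F, key=len):  # smaller consortia first
        eps[T] = F[T] - sum(v for S, v in eps.items() if S < T)  # S < T: proper subset
    return eps

# Usage: three hypothetical species; only A and B interact (their pair falls 1.0 short).
fs = frozenset
F = {fs(): 0.0, fs("A"): 2.0, fs("B"): 3.0, fs("C"): 5.0,
     fs("AB"): 4.0, fs("AC"): 7.0, fs("BC"): 8.0, fs("ABC"): 9.0}
eps = interaction_terms(F)
print(eps[fs("AB")], eps[fs("ABC")])
```

A consortium whose pairwise and higher-order terms are all near zero behaves additively, so its function is predictable from monocultures; large terms flag the interaction-dominated consortia the abstract describes.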
Affiliation(s)
- Alicia Sanchez-Gorostiaga
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Microbial Sciences Institute, Yale University, West Haven, Connecticut, United States of America
- Djordje Bajić
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Microbial Sciences Institute, Yale University, West Haven, Connecticut, United States of America
- Melisa L. Osborne
- The Rowland Institute at Harvard, Harvard University, Cambridge, Massachusetts, United States of America
- Biological Design Center, Boston University, Boston, Massachusetts, United States of America
- Juan F. Poyatos
- The Rowland Institute at Harvard, Harvard University, Cambridge, Massachusetts, United States of America
- Logic of Genomic Systems Laboratory, Spanish National Biotechnology Centre (CNB-CSIC), Madrid, Spain
- Alvaro Sanchez
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Microbial Sciences Institute, Yale University, West Haven, Connecticut, United States of America
- The Rowland Institute at Harvard, Harvard University, Cambridge, Massachusetts, United States of America