51. Weinstein JY, Martí-Gómez C, Lipsh-Sokolik R, Hoch SY, Liebermann D, Nevo R, Weissman H, Petrovich-Kopitman E, Margulies D, Ivankov D, McCandlish DM, Fleishman SJ. Designed active-site library reveals thousands of functional GFP variants. Nat Commun 2023; 14:2890. [PMID: 37210560 DOI: 10.1038/s41467-023-38099-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 04/13/2023] [Indexed: 05/22/2023] Open
Abstract
Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins.
Affiliation(s)
- Carlos Martí-Gómez
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Rosalie Lipsh-Sokolik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Shlomo Yakir Hoch
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Demian Liebermann
  - Department of Chemical and Biological Physics, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Reinat Nevo
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Haim Weissman
  - Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, Rehovot, 7610001, Israel
- David Margulies
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Dmitry Ivankov
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Sarel J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
52. Chen Y, Hu R, Li K, Zhang Y, Fu L, Zhang J, Si T. Deep Mutational Scanning of an Oxygen-Independent Fluorescent Protein CreiLOV for Comprehensive Profiling of Mutational and Epistatic Effects. ACS Synth Biol 2023; 12:1461-1473. [PMID: 37066862 PMCID: PMC10204710 DOI: 10.1021/acssynbio.2c00662] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/08/2022] [Indexed: 04/18/2023]
Abstract
Oxygen-independent, flavin mononucleotide-based fluorescent proteins (FbFPs) are promising alternatives to green fluorescent protein in anaerobic contexts. Deep mutational scanning performs systematic profiling of protein sequence-function relationships but has not been applied to FbFPs. Focusing on CreiLOV from Chlamydomonas reinhardtii, we created and analyzed two comprehensive mutant collections: (1) single-residue, site-saturation mutagenesis libraries covering all 118 residues; and (2) a full combinatorial mutagenesis library among 20 mutations at 15 residues, where mutation and residue selection was based on single-site mutagenesis results. Notably, the second type of library is indispensable to study higher-order epistasis but underrepresented in the literature. Using optimized FACS-seq assays, 2,185 (>92.5%) out of 2,360 possible single-site mutants and 165,428 (>89.7%) out of 184,320 possible combinatorial mutants were reliably assigned fitness values. We constructed statistical and machine-learning models to analyze the CreiLOV data set, enabling accurate fitness prediction of higher-order mutants using lower-order mutagenesis data. In addition, we successfully isolated CreiLOV variants with improved fluorescence quantum yield and thermostability. This work provides new empirical data and design rules to engineer combinatorial protein variants.
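The coverage figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts come from the abstract, while the function name `coverage_percent` and the reading of 2,360 as 118 positions times 20 variants per position are assumptions made here, not statements from the paper.

```python
# Illustrative check of the library-coverage percentages quoted in the abstract.
# The counts are taken from the abstract; everything else is a hypothetical helper.

def coverage_percent(assigned: int, possible: int) -> float:
    """Return the percentage of possible variants that received a fitness value."""
    return 100.0 * assigned / possible

single_site = coverage_percent(2_185, 2_360)        # single-site mutants
combinatorial = coverage_percent(165_428, 184_320)  # combinatorial mutants

# Assumption: 2,360 is consistent with 118 positions x 20 variants per position.
positions_times_variants = 118 * 20

print(f"single-site coverage:   {single_site:.2f}%")
print(f"combinatorial coverage: {combinatorial:.2f}%")
print(f"118 x 20 = {positions_times_variants}")
```

Both percentages land just above the ">92.5%" and ">89.7%" thresholds stated in the abstract.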
Affiliation(s)
- Yongcan Chen
  - CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Ruyun Hu
  - CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Keyi Li
  - CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yating Zhang
  - CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Lihao Fu
  - CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jianzhi Zhang
  - CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Tong Si
  - CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - BGI-Shenzhen, Shenzhen 518083, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
53. Rabitz H, Russell B, Ho TS. The Surprising Ease of Finding Optimal Solutions for Controlling Nonlinear Phenomena in Quantum and Classical Complex Systems. J Phys Chem A 2023; 127:4224-4236. [PMID: 37142303 DOI: 10.1021/acs.jpca.3c01896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2023]
Abstract
This Perspective addresses the often observed surprising ease of achieving optimal control of nonlinear phenomena in quantum and classical complex systems. The circumstances involved are wide-ranging, with scenarios including manipulation of atomic scale processes, maximization of chemical and material properties or synthesis yields, Nature's optimization of species' populations by natural selection, and directed evolution. Natural evolution will mainly be discussed in terms of laboratory experiments with microorganisms, and the field is also distinct from the other domains where a scientist specifies the goal(s) and oversees the control process. We use the word "control" in reference to all of the available variables, regardless of the circumstance. The empirical observations on the ease of achieving at least good, if not excellent, control in diverse domains of science raise the question of why this occurs despite the generally inherent complexity of the systems in each scenario. The key to addressing the question lies in examining the associated control landscape, which is defined as the optimization objective as a function of the control variables that can be as diverse as the phenomena under consideration. Controls may range from laser pulses, chemical reagents, chemical processing conditions, out to nucleic acids in the genome and more. This Perspective presents a conjecture, based on present findings, that the systematics of readily finding good outcomes from controlled phenomena may be unified through consideration of control landscapes with the same common set of three underlying assumptions (the existence of an optimal solution, the ability for local movement on the landscape, and the availability of sufficient control resources), whose validity needs assessment in each scenario. In practice, many cases permit using myopic gradient-like algorithms while other circumstances utilize algorithms having some elements of stochasticity or introduced noise, depending on whether the landscape is locally smooth or rough. The overarching observation is that only relatively short searches are required despite the common high dimensionality of the available controls in typical scenarios.
Affiliation(s)
- Herschel Rabitz
  - Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Benjamin Russell
  - Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Tak-San Ho
  - Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
54. Gantz M, Neun S, Medcalf EJ, van Vliet LD, Hollfelder F. Ultrahigh-Throughput Enzyme Engineering and Discovery in In Vitro Compartments. Chem Rev 2023; 123:5571-5611. [PMID: 37126602 PMCID: PMC10176489 DOI: 10.1021/acs.chemrev.2c00910] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 05/03/2023]
Abstract
Novel and improved biocatalysts are increasingly sourced from libraries via experimental screening. The success of such campaigns is crucially dependent on the number of candidates tested. Water-in-oil emulsion droplets can replace the classical test tube, to provide in vitro compartments as an alternative screening format, containing genotype and phenotype and enabling a readout of function. The scale-down to micrometer droplet diameters and picoliter volumes brings about a >10⁷-fold volume reduction compared to 96-well-plate screening. Droplets made in automated microfluidic devices can be integrated into modular workflows to set up multistep screening protocols involving various detection modes to sort >10⁷ variants a day with kHz frequencies. The repertoire of assays available for droplet screening covers all seven enzyme commission (EC) number classes, setting the stage for widespread use of droplet microfluidics in everyday biochemical experiments. We review the practicalities of adapting droplet screening for enzyme discovery and for detailed kinetic characterization. These new ways of working will not just accelerate discovery experiments currently limited by screening capacity but profoundly change the paradigms we can probe. By interfacing the results of ultrahigh-throughput droplet screening with next-generation sequencing and deep learning, strategies for directed evolution can be implemented, examined, and evaluated.
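The orders of magnitude quoted in this abstract can be reproduced with a rough back-of-the-envelope calculation. The sketch below is not taken from the review: the 20 µm droplet diameter, the 100 µL well volume, the 1 kHz sorting rate, and the 3 h run time are all assumed example values, and the function names are hypothetical.

```python
import math

# Rough order-of-magnitude sketch; all numeric inputs below are assumptions,
# chosen only to illustrate the scale-down and throughput claims.

WELL_VOLUME_L = 100e-6  # assumed working volume of one 96-well-plate well (100 microlitres)

def droplet_volume_litres(diameter_um: float) -> float:
    """Volume of a spherical droplet, in litres, from its diameter in micrometres."""
    radius_m = diameter_um * 1e-6 / 2
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    return volume_m3 * 1000.0  # 1 cubic metre = 1000 litres

def fold_reduction(diameter_um: float) -> float:
    """Ratio of the assumed well volume to the droplet volume."""
    return WELL_VOLUME_L / droplet_volume_litres(diameter_um)

def droplets_sorted(rate_hz: float, hours: float) -> float:
    """Number of droplets processed at a constant sorting rate."""
    return rate_hz * hours * 3600.0

print(f"20 um droplet volume: {droplet_volume_litres(20) * 1e12:.2f} pL")
print(f"fold reduction vs. 100 uL well: {fold_reduction(20):.2e}")
print(f"droplets sorted at 1 kHz in 3 h: {droplets_sorted(1000, 3):.2e}")
```

Under these assumptions a droplet holds a few picolitres, which gives a volume reduction above 10⁷, and a few hours of sorting at 1 kHz already exceeds 10⁷ droplets.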
Affiliation(s)
- Maximilian Gantz
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Rd, Cambridge CB2 1GA, U.K.
- Stefanie Neun
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Rd, Cambridge CB2 1GA, U.K.
- Elliot J Medcalf
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Rd, Cambridge CB2 1GA, U.K.
- Liisa D van Vliet
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Rd, Cambridge CB2 1GA, U.K.
- Florian Hollfelder
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Rd, Cambridge CB2 1GA, U.K.
55. Radford F, Rinehart J, Isaacs FJ. Mapping the in vivo fitness landscape of a tethered ribosome. Sci Adv 2023; 9:eade8934. [PMID: 37115918 PMCID: PMC10146877 DOI: 10.1126/sciadv.ade8934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Fitness landscapes are models of the sequence space of a genetic element that map how each sequence corresponds to its activity and can be used to guide laboratory evolution. The ribosome is a macromolecular machine that is essential for protein synthesis in all organisms. Because of the prevalence of dominant lethal mutations, a comprehensive fitness landscape of the ribosomal peptidyl transfer center (PTC) has not yet been attained. Here, we develop a method to functionally map an orthogonal tethered ribosome (oRiboT), which permits complete mutagenesis of nucleotides located in the PTC and the resulting epistatic interactions. We found that most nucleotides studied showed flexibility to mutation, and identified epistatic interactions between them, which compensate for deleterious mutations. This work provides a basis for a deeper understanding of ribosome function and malleability and could be used to inform design of engineered ribosomes with applications to synthesize next-generation biomaterials and therapeutics.
Affiliation(s)
- Felix Radford
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Jesse Rinehart
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, CT 06520, USA
- Farren J. Isaacs
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT 06520, USA
  - Corresponding author.
56. Barroso GV, Lohmueller KE. Inferring the mode and strength of ongoing selection. Genome Res 2023; 33:632-643. [PMID: 37055196 PMCID: PMC10234300 DOI: 10.1101/gr.276386.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Accepted: 03/29/2023] [Indexed: 04/15/2023]
Abstract
Genome sequence data are no longer scarce. The UK Biobank alone comprises 200,000 individual genomes, with more on the way, leading the field of human genetics toward sequencing entire populations. Within the next decades, other model organisms will follow suit, especially domesticated species such as crops and livestock. Having sequences from most individuals in a population will present new challenges for using these data to improve health and agriculture in the pursuit of a sustainable future. Existing population genetic methods are designed to model hundreds of randomly sampled sequences but are not optimized for extracting the information contained in the larger and richer data sets that are beginning to emerge, with thousands of closely related individuals. Here we develop a new method called trio-based inference of dominance and selection (TIDES) that uses data from tens of thousands of family trios to make inferences about natural selection acting in a single generation. TIDES further improves on the state of the art by making no assumptions regarding demography, linkage, or dominance. We discuss how our method paves the way for studying natural selection from new angles.
Affiliation(s)
- Gustavo V Barroso
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
- Kirk E Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
57. Kikani B, Patel R, Thumar J, Bhatt H, Rathore DS, Koladiya GA, Singh SP. Solvent tolerant enzymes in extremophiles: Adaptations and applications. Int J Biol Macromol 2023; 238:124051. [PMID: 36933597 DOI: 10.1016/j.ijbiomac.2023.124051] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/21/2022] [Revised: 03/05/2023] [Accepted: 03/12/2023] [Indexed: 03/18/2023]
Abstract
Non-aqueous enzymology has always drawn attention due to the wide range of unique possibilities in biocatalysis. In general, enzymes catalyze their substrates poorly or not at all in the presence of solvents. This is due to the interfering interactions of the solvents between enzyme and water molecules at the interface. Therefore, information about solvent-stable enzymes is scarce. Yet, solvent-stable enzymes prove quite valuable in present-day biotechnology. The enzymatic hydrolysis of the substrates in solvents synthesizes commercially valuable products, such as peptides, esters, and other transesterification products. Extremophiles, the most valuable yet not extensively explored candidates, can be an excellent source to investigate this avenue. Due to inherent structural attributes, many extremozymes can catalyze and maintain stability in organic solvents. In the present review, we aim to consolidate information about the solvent-stable enzymes from various extremophilic microorganisms. Further, it would be interesting to learn about the mechanisms adopted by these microorganisms to sustain solvent stress. Various approaches to protein engineering are used to enhance catalytic flexibility and stability and broaden the prospects of biocatalysis under non-aqueous conditions. The review also describes strategies to achieve optimal immobilization with minimum inhibition of catalysis. This review would significantly aid our understanding of non-aqueous enzymology.
Affiliation(s)
- Bhavtosh Kikani
  - Department of Biosciences, Saurashtra University, Rajkot 360 005, Gujarat, India; Department of Biological Sciences, P.D. Patel Institute of Applied Sciences, Charotar University of Science and Technology, Changa 388 421, Gujarat, India
- Rajesh Patel
  - Department of Biosciences, Veer Narmad South Gujarat University, Surat 395 007, Gujarat, India
- Jignasha Thumar
  - Government Science College, Gandhinagar 382 016, Gujarat, India
- Hitarth Bhatt
  - Department of Biosciences, Saurashtra University, Rajkot 360 005, Gujarat, India; Department of Microbiology, Faculty of Science, Atmiya University, Rajkot 360005, Gujarat, India
- Dalip Singh Rathore
  - Department of Biosciences, Saurashtra University, Rajkot 360 005, Gujarat, India; Gujarat Biotechnology Research Centre, Gandhinagar 382 010, Gujarat, India
- Gopi A Koladiya
  - Department of Biosciences, Saurashtra University, Rajkot 360 005, Gujarat, India
- Satya P Singh
  - Department of Biosciences, Saurashtra University, Rajkot 360 005, Gujarat, India
58. Qiao J, Sheng Y, Wang M, Li A, Li X, Huang H. Evolving Robust and Interpretable Enzymes for the Bioethanol Industry. Angew Chem Int Ed Engl 2023; 62:e202300320. [PMID: 36701239 DOI: 10.1002/anie.202300320] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/07/2023] [Revised: 01/19/2023] [Accepted: 01/26/2023] [Indexed: 01/27/2023]
Abstract
Obtaining a robust and applicable enzyme for bioethanol production is a dream for biorefinery engineers. Herein, we describe a general method to evolve an all-round and interpretable enzyme that can be directly employed in the bioethanol industry. By integrating the transferable protein evolution strategy InSiReP 2.0 (In Silico guided Recombination Process), enzymatic characterization for actual production, and computational molecular understanding, the model cellulase PvCel5A (endoglucanase II Cel5A from Penicillium verruculosum) was successfully evolved to overcome the remaining challenges of low ethanol and temperature tolerance, which primarily limited biomass transformation and bioethanol yield. Remarkably, application of the PvCel5A variants in both first- and second-generation bioethanol production processes (i. conventional corn ethanol fermentation combined with the in situ pretreatment process; ii. cellulosic ethanol fermentation process) resulted in a 5.7-10.1% increase in the ethanol yield, which was unlikely to be achieved by other optimization techniques.
Affiliation(s)
- Jie Qiao
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Yijie Sheng
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Minghui Wang
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Anni Li
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Xiujuan Li
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- He Huang
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China; School of Pharmaceutical Science, Nanjing Tech University, Nanjing, 211816, China
59. Colizzi ES, van Dijk B, Merks RMH, Rozen DE, Vroomans RMA. Evolution of genome fragility enables microbial division of labor. Mol Syst Biol 2023; 19:e11353. [PMID: 36727665 PMCID: PMC9996244 DOI: 10.15252/msb.202211353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 01/17/2023] [Accepted: 01/19/2023] [Indexed: 02/03/2023] Open
Abstract
Division of labor can evolve when social groups benefit from the functional specialization of their members. Recently, a novel means of coordinating the division of labor was found in the antibiotic-producing bacterium Streptomyces coelicolor, where specialized cells are generated through large-scale genomic re-organization. We investigate how the evolution of a genome architecture enables such mutation-driven division of labor, using a multiscale computational model of bacterial evolution. In this model, bacterial behavior (antibiotic production or replication) is determined by the structure and composition of their genome, which encodes antibiotics, growth-promoting genes, and fragile genomic loci that can induce chromosomal deletions. We find that a genomic organization evolves, which partitions growth-promoting genes and antibiotic-coding genes into distinct parts of the genome, separated by fragile genomic loci. Mutations caused by these fragile sites mostly delete growth-promoting genes, generating sterile, antibiotic-producing mutants from weakly-producing progenitors, in agreement with experimental observations. This division of labor enhances the competition between colonies by promoting antibiotic diversity. These results show that genomic organization can co-evolve with genomic instabilities to enable reproductive division of labor.
Affiliation(s)
- Enrico Sandro Colizzi
  - Mathematical Institute, Leiden University, Leiden, The Netherlands; Origins Center, Leiden, The Netherlands; Sainsbury Laboratory, Cambridge University, Cambridge, UK
- Bram van Dijk
  - Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Roeland M H Merks
  - Mathematical Institute, Leiden University, Leiden, The Netherlands; Origins Center, Leiden, The Netherlands; Institute of Biology, Leiden University, Leiden, The Netherlands
- Daniel E Rozen
  - Institute of Biology, Leiden University, Leiden, The Netherlands
- Renske M A Vroomans
  - Origins Center, Leiden, The Netherlands; Sainsbury Laboratory, Cambridge University, Cambridge, UK; Informatic Institute, University of Amsterdam, Amsterdam, The Netherlands
60. Johansson KE, Lindorff-Larsen K, Winther JR. Global Analysis of Multi-Mutants to Improve Protein Function. J Mol Biol 2023; 435:168034. [PMID: 36863661 DOI: 10.1016/j.jmb.2023.168034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/04/2023]
Abstract
The identification of amino acid substitutions that both enhance the stability and function of a protein is a key challenge in protein engineering. Technological advances have enabled assaying thousands of protein variants in a single high-throughput experiment, and more recent studies use such data in protein engineering. We present a Global Multi-Mutant Analysis (GMMA) that exploits the presence of multiply-substituted variants to identify individual amino acid substitutions that are beneficial for the stability and function across a large library of protein variants. We have applied GMMA to a previously published experiment reporting on >54,000 variants of green fluorescent protein (GFP), each with known fluorescence output, and each carrying 1-15 amino acid substitutions (Sarkisyan et al., 2016). The GMMA method achieves a good fit to this dataset while being analytically transparent. We show experimentally that the six top-ranking substitutions progressively enhance GFP. More broadly, using only a single experiment as input our analysis recovers nearly all the substitutions previously reported to be beneficial for GFP folding and function. In conclusion, we suggest that large libraries of multiply-substituted variants may provide a unique source of information for protein engineering.
Affiliation(s)
- Kristoffer E Johansson
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- Jakob R Winther
  - Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
61. Reiter F, de Almeida BP, Stark A. Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Res 2023; 33:346-358. [PMID: 36941077 PMCID: PMC10078294 DOI: 10.1101/gr.277246.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/26/2022] [Accepted: 02/14/2023] [Indexed: 03/23/2023]
Abstract
The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, whereas the enhancer syntax, that is, the flexibility of important motif positions and how the sequence context modulates the activity of TF motifs, remains poorly understood. Here, we explore the rules of enhancer syntax by a two-pronged approach in Drosophila melanogaster S2 cells: we (1) replace important TF motifs by all possible 65,536 eight-nucleotide-long sequences and (2) paste eight important TF motif types into 763 positions within 496 enhancers. These complementary strategies reveal that enhancers display constrained sequence flexibility and the context-specific modulation of motif function. Important motifs can be functionally replaced by hundreds of sequences constituting several distinct motif types, but these are only a fraction of all possible sequences and motif types. Moreover, TF motifs contribute with different intrinsic strengths that are strongly modulated by the enhancer sequence context (the flanking sequence, the presence and diversity of other motif types, and the distance between motifs), such that not all motif types can work in all positions. The context-specific modulation of motif function is also a hallmark of human enhancers, as we demonstrate experimentally. Overall, these two general principles of enhancer sequences are important to understand and predict enhancer function during development, evolution, and in disease.
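The figure of 65,536 replacement sequences follows directly from the four-letter DNA alphabet, since 4^8 = 65,536. A minimal sketch of that enumeration is given below; the names `NUCLEOTIDES`, `all_kmers`, and `kmer_count` are hypothetical and are not drawn from the paper's methods.

```python
from itertools import product

# Illustrative enumeration of every possible eight-nucleotide sequence.
# All names here are hypothetical helpers introduced for this sketch.

NUCLEOTIDES = "ACGT"

def all_kmers(k: int):
    """Yield every DNA sequence of length k over the alphabet A, C, G, T."""
    for letters in product(NUCLEOTIDES, repeat=k):
        yield "".join(letters)

kmer_count = sum(1 for _ in all_kmers(8))
first_kmer = next(all_kmers(8))

print(kmer_count)  # number of distinct 8-mers, equal to 4 ** 8
print(first_kmer)  # first sequence in lexicographic order of the alphabet string
```

The generator avoids holding all sequences in memory at once, although at this size a plain list would also be practical.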
Affiliation(s)
- Franziska Reiter
  - Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
- Bernardo P de Almeida
  - Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
- Alexander Stark
  - Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
  - Medical University of Vienna, Vienna BioCenter, 1030 Vienna, Austria
62. Xu H, Woicik A, Poon H, Altman RB, Wang S. Multilingual translation for zero-shot biomedical classification using BioTranslator. Nat Commun 2023; 14:738. [PMID: 36759510 PMCID: PMC9911740 DOI: 10.1038/s41467-023-36476-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/05/2022] [Accepted: 02/01/2023] [Indexed: 02/11/2023] Open
Abstract
Existing annotation paradigms rely on controlled vocabularies, where each data instance is classified into one term from a predefined set of controlled vocabularies. This paradigm restricts the analysis to concepts that are known and well-characterized. Here, we present the novel multilingual translation method BioTranslator to address this problem. BioTranslator takes a user-written textual description of a new concept and then translates this description to a non-text biological data instance. The key idea of BioTranslator is to develop a multilingual translation framework, where multiple modalities of biological data are all translated to text. We demonstrate how BioTranslator enables the identification of novel cell types using only a textual description and how BioTranslator can be further generalized to protein function prediction and drug target identification. Our tool frees scientists from limiting their analyses within predefined controlled vocabularies, enabling them to interact with biological data using free text.
Affiliation(s)
- Hanwen Xu
  - School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Addie Woicik
  - School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Russ B Altman
  - Department of Bioengineering, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA
- Sheng Wang
  - School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
|
63
|
Li M, Kang L, Xiong Y, Wang YG, Fan G, Tan P, Hong L. SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering. J Cheminform 2023; 15:12. [PMID: 36737798 PMCID: PMC9898993 DOI: 10.1186/s13321-023-00688-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 01/23/2023] [Indexed: 02/05/2023] Open
Abstract
Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model to predict the fitness for protein mutants by leveraging both sequence and structure information, and exploiting attention mechanism. Our model integrates local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantic from the universal protein sequence space and the structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy by leveraging the data from unsupervised models to pre-train our model. After that, our model can achieve strikingly high accuracy in prediction of the fitness of protein mutants, especially for the higher order variants (> 4 mutation sites), when finetuned by using only a small number of experimental mutation data (< 50). The strategy proposed is of great practical value as the required experimental effort, i.e., producing a few tens of experimental mutation data on a given protein, is generally affordable by an ordinary biochemical group and can be applied on almost any protein.
Affiliation(s)
- Mingchen Li
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200240, China
| | - Liqi Kang
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- School of Physics and Astronomy & School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yu Guang Wang
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
| | - Guisheng Fan
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200240, China
| | - Pan Tan
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
| | - Liang Hong
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
- School of Physics and Astronomy & School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China.
|
64
|
Qiu Y, Wei GW. Persistent spectral theory-guided protein engineering. Nat Comput Sci 2023; 3:149-163. [PMID: 37637776 PMCID: PMC10456983 DOI: 10.1038/s43588-022-00394-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 08/09/2022] [Accepted: 12/22/2022] [Indexed: 08/29/2023]
Abstract
While protein engineering, which iteratively optimizes protein fitness by screening the gigantic mutational space, is constrained by experimental capacity, various machine learning models have substantially expedited protein engineering. Three-dimensional protein structures promise further advantages, but their intricate geometric complexity hinders their applications in deep mutational screening. Persistent homology, an established algebraic topology tool for protein structural complexity reduction, fails to capture the homotopic shape evolution during the filtration of a given data. This work introduces a Topology-offered protein Fitness (TopFit) framework to complement protein sequence and structure embeddings. Equipped with an ensemble regression strategy, TopFit integrates the persistent spectral theory, a new topological Laplacian, and two auxiliary sequence embeddings to capture mutation-induced topological invariant, shape evolution, and sequence disparity in the protein fitness landscape. The performance of TopFit is assessed by 34 benchmark datasets with 128,634 variants, involving a vast variety of protein structure acquisition modalities and training set size variations.
Affiliation(s)
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI, 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
|
65
|
Serebryany E, Zhao VY, Park K, Bitran A, Trauger SA, Budnik B, Shakhnovich EI. Systematic conformation-to-phenotype mapping via limited deep-sequencing of proteins. arXiv 2023:2204.06159. [PMID: 36776823 PMCID: PMC9915745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
Non-native conformations drive protein misfolding diseases, complicate bioengineering efforts, and fuel molecular evolution. No current experimental technique is well-suited for elucidating them and their phenotypic effects. Especially intractable are the transient conformations populated by intrinsically disordered proteins. We describe an approach to systematically discover, stabilize, and purify native and non-native conformations, generated in vitro or in vivo, and directly link conformations to molecular, organismal, or evolutionary phenotypes. This approach involves high-throughput disulfide scanning (HTDS) of the entire protein. To reveal which disulfides trap which chromatographically resolvable conformers, we devised a deep-sequencing method for double-Cys variant libraries of proteins that precisely and simultaneously locates both Cys residues within each polypeptide. HTDS of the abundant E. coli periplasmic chaperone HdeA revealed distinct classes of disordered hydrophobic conformers with variable cytotoxicity depending on where the backbone was cross-linked. HTDS can bridge conformational and phenotypic landscapes for many proteins that function in disulfide-permissive environments.
Affiliation(s)
- Eugene Serebryany
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
| | - Victor Y. Zhao
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
| | - Kibum Park
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
| | - Amir Bitran
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
| | | | - Bogdan Budnik
- Center for Mass Spectrometry, Harvard University, Cambridge, MA
|
66
|
Tresnak DT, Hackel BJ. Deep Antimicrobial Activity and Stability Analysis Inform Lysin Sequence-Function Mapping. ACS Synth Biol 2023; 12:249-264. [PMID: 36599162 PMCID: PMC10822705 DOI: 10.1021/acssynbio.2c00509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Antibiotic-resistant infectious disease is a critical challenge to human health. Antimicrobial proteins offer a compelling solution if engineered for potency, selectivity, and physiological stability. Lysins, which lyse cells via degradation of cell wall peptidoglycans, have significant potential to fill this role. Yet, the functional complexity of antimicrobial activity has hindered high-throughput characterization for discovery and design. To dramatically expand knowledge of the sequence-function landscape of lysins, we developed a depletion-based assay for library-scale measurement of lysin inhibitory activity. We coupled this platform with a high-throughput proteolytic stability assay to assess the activity and stability of ∼5 × 10⁴ lysin catalytic domain variants, resulting in the discovery of a variant with increased activity (70 ± 20%) and stability (7.2 ± 0.4 °C increased midpoint of thermal denaturation). Ridge regression of the resulting data set demonstrated that libraries with a higher average Hamming distance better informed pairwise models and that coupling activity and stability assays enabled better prediction of catalytically active lysins. The best models achieved Pearson's correlation coefficients of 0.87 ± 0.01 and 0.61 ± 0.04 for predicting catalytic domain stability and activity, respectively. Our work provides an efficient strategy for constructing protein sequence-function landscapes, drastically increases screening throughput for engineering lysins, and yields promising lysins for further development.
Affiliation(s)
- Daniel T. Tresnak
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
| | - Benjamin J. Hackel
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
|
67
|
Clifton BE, Kozome D, Laurino P. Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design. Biochemistry 2023; 62:210-220. [PMID: 35245020 DOI: 10.1021/acs.biochem.1c00757] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]
Abstract
The rapid growth of sequence databases over the past two decades means that protein engineers faced with optimizing a protein for any given task will often have immediate access to a vast number of related protein sequences. These sequences encode information about the evolutionary history of the protein and the underlying sequence requirements to produce folded, stable, and functional protein variants. Methods that can take advantage of this information are an increasingly important part of the protein engineering tool kit. In this Perspective, we discuss the utility of sequence data in protein engineering and design, focusing on recent advances in three main areas: the use of ancestral sequence reconstruction as an engineering tool to generate thermostable and multifunctional proteins, the use of sequence data to guide engineering of multipoint mutants by structure-based computational protein design, and the use of unlabeled sequence data for unsupervised and semisupervised machine learning, allowing the generation of diverse and functional protein sequences in unexplored regions of sequence space. Altogether, these methods enable the rapid exploration of sequence space within regions enriched with functional proteins and therefore have great potential for accelerating the engineering of stable, functional, and diverse proteins for industrial and biomedical applications.
Affiliation(s)
- Ben E Clifton
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
| | - Dan Kozome
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
| | - Paola Laurino
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
|
68
|
Dewachter L, Brooks AN, Noon K, Cialek C, Clark-ElSayed A, Schalck T, Krishnamurthy N, Versées W, Vranken W, Michiels J. Deep mutational scanning of essential bacterial proteins can guide antibiotic development. Nat Commun 2023; 14:241. [PMID: 36646716 PMCID: PMC9842644 DOI: 10.1038/s41467-023-35940-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/13/2022] [Accepted: 01/09/2023] [Indexed: 01/18/2023] Open
Abstract
Deep mutational scanning is a powerful approach to investigate a wide variety of research questions including protein function and stability. Here, we perform deep mutational scanning on three essential E. coli proteins (FabZ, LpxC and MurA) involved in cell envelope synthesis using high-throughput CRISPR genome editing, and study the effect of the mutations in their original genomic context. We use more than 17,000 variants of the proteins to interrogate protein function and the importance of individual amino acids in supporting viability. Additionally, we exploit these libraries to study resistance development against antimicrobial compounds that target the selected proteins. Among the three proteins studied, MurA seems to be the superior antimicrobial target due to its low mutational flexibility, which decreases the chance of acquiring resistance-conferring mutations that simultaneously preserve MurA function. Additionally, we rank anti-LpxC lead compounds for further development, guided by the number of resistance-conferring mutations against each compound. Our results show that deep mutational scanning studies can be used to guide drug development, which we hope will contribute towards the development of novel antimicrobial therapies.
Affiliation(s)
- Liselot Dewachter
- Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium.,VIB-KU Leuven Center for Microbiology, Leuven, Belgium
| | | | | | | | | | - Thomas Schalck
- Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium.,VIB-KU Leuven Center for Microbiology, Leuven, Belgium
| | | | - Wim Versées
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium.,VIB-VUB Center for Structural Biology, Brussels, Belgium
| | - Wim Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium.,VIB-VUB Center for Structural Biology, Brussels, Belgium.,Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
| | - Jan Michiels
- Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium.,VIB-KU Leuven Center for Microbiology, Leuven, Belgium
|
69
|
Wei H, Li X. Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Front Genet 2023; 14:1087267. [PMID: 36713072 PMCID: PMC9878224 DOI: 10.3389/fgene.2023.1087267] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/02/2022] [Accepted: 01/02/2023] [Indexed: 01/13/2023] Open
Abstract
Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to the oligo-synthesizing technology, high-throughput phenotyping methods and deep sequencing technology. However, each specific possible step of deep mutational scanning has its pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through the deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers' projects.
Affiliation(s)
- Huijin Wei
- Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
| | - Xianghua Li
- Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China.,Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, United Kingdom.,The Second Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang, China.,Biomedical and Health Translational Centre of Zhejiang Province, Haining, Zhejiang, China. Correspondence: Xianghua Li
|
70
|
Pak MA, Markhieva KA, Novikova MS, Petrov DS, Vorobyev IS, Maksimova ES, Kondrashov FA, Ivankov DN. Using AlphaFold to predict the impact of single mutations on protein stability and function. PLoS One 2023; 18:e0282689. [PMID: 36928239 PMCID: PMC10019719 DOI: 10.1371/journal.pone.0282689] [Citation(s) in RCA: 66] [Impact Index Per Article: 66.0] [Received: 12/20/2021] [Accepted: 02/21/2023] [Indexed: 03/17/2023] Open
Abstract
AlphaFold changed the field of structural biology by achieving three-dimensional (3D) structure prediction from protein sequence at experimental quality. The astounding success even led to claims that the protein folding problem is "solved". However, the protein folding problem is more than just structure prediction from sequence. Presently, it is unknown if the AlphaFold-triggered revolution could help to solve other problems related to protein folding. Here we assay the ability of AlphaFold to predict the impact of single mutations on protein stability (ΔΔG) and function. To study the question we extracted the pLDDT and <pLDDT> metrics from AlphaFold predictions before and after single mutation in a protein and correlated the predicted change with the experimentally known ΔΔG values. Additionally, we correlated the same AlphaFold pLDDT metrics with the impact of a single mutation on structure using a large scale dataset of single mutations in GFP with the experimentally assayed levels of fluorescence. We found a very weak or no correlation between AlphaFold output metrics and change of protein stability or fluorescence. Our results imply that AlphaFold may not be immediately applied to other problems or applications in protein folding.
Affiliation(s)
- Marina A. Pak
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
| | | | - Mariia S. Novikova
- Armand Hammer United World College of the American West, Montezuma, New Mexico, United States of America
| | - Dmitry S. Petrov
- Specialized Educational and Scientific Center of UrFU (SUNC UrFU), Ekaterinburg, Russia
| | - Ilya S. Vorobyev
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
| | | | - Fyodor A. Kondrashov
- Institute of Science and Technology Austria, Maria Gugging, Austria
- Evolutionary and Synthetic Biology Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan
| | - Dmitry N. Ivankov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
|
71
|
Gilliot PA, Gorochowski TE. Design and Analysis of Massively Parallel Reporter Assays Using FORECAST. Methods Mol Biol 2023; 2553:41-56. [PMID: 36227538 DOI: 10.1007/978-1-0716-2617-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Machine learning is revolutionizing molecular biology and bioengineering by providing powerful insights and predictions. Massively parallel reporter assays (MPRAs) have emerged as a particularly valuable class of high-throughput technique to support such algorithms. MPRAs enable the simultaneous characterization of thousands or even millions of genetic constructs and provide the large amounts of data needed to train models. However, while the scale of this approach is impressive, the design of effective MPRA experiments is challenging due to the many factors that can be varied and the difficulty in predicting how these will impact the quality and quantity of data obtained. Here, we present a computational tool called FORECAST, which can simulate MPRA experiments based on fluorescence-activated cell sorting and subsequent sequencing (commonly referred to as Flow-seq or Sort-seq experiments), as well as carry out rigorous statistical estimation of construct performance from this type of experimental data. FORECAST can be used to develop workflows to aid the design of MPRA experiments and reanalyze existing MPRA data sets.
|
72
|
Harmalkar A, Rao R, Richard Xie Y, Honer J, Deisting W, Anlahr J, Hoenig A, Czwikla J, Sienz-Widmann E, Rau D, Rice AJ, Riley TP, Li D, Catterall HB, Tinberg CE, Gray JJ, Wei KY. Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features. MAbs 2023; 15:2163584. [PMID: 36683173 PMCID: PMC9872953 DOI: 10.1080/19420862.2022.2163584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 12/14/2022] [Accepted: 12/26/2022] [Indexed: 01/24/2023] Open
Abstract
Over the last three decades, the appeal for monoclonal antibodies (mAbs) as therapeutics has been steadily increasing as evident with FDA's recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (msAbs) have garnered particular interest owing to the advantage of engaging distinct targets. One important modular component of msAbs is the single-chain variable fragment (scFv). Despite the exquisite specificity and affinity of these scFv modules, their relatively poor thermostability often hampers their development as a potential therapeutic drug. In recent years, engineering antibody sequences to enhance their stability by mutations has gained considerable momentum. As experimental methods for antibody engineering are time-intensive, laborious and expensive, computational methods serve as a fast and inexpensive alternative to conventional routes. In this work, we show two machine learning approaches - one with pre-trained language models (PTLM) capturing functional effects of sequence variation, and second, a supervised convolutional neural network (CNN) trained with Rosetta energetic features - to better classify thermostable scFv variants from sequence. Both of these models are trained over temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. On out-of-distribution (refers to the fact that the out-of-distribution sequences are blind to the algorithm) sequences, we show that a sufficiently simple CNN model performs better than general pre-trained language models trained on diverse protein sequences (average Spearman correlation coefficient, ρ, of 0.4 as opposed to 0.15). On the other hand, an antibody-specific language model performs comparatively better than the CNN model on the same task (ρ = 0.52).
Further, we demonstrate that for an independent mAb with available thermal melting temperatures for 20 experimentally characterized thermostable mutations, these models trained on TS50 data could identify 18 residue positions and 5 identical amino-acid mutations showing remarkable generalizability. Our results suggest that such models can be broadly applicable for improving the biological characteristics of antibodies. Further, transferring such models for alternative physicochemical properties of scFvs can have potential applications in optimizing large-scale production and delivery of mAbs or bsAbs.
Affiliation(s)
- Ameya Harmalkar
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, USA
| | - Roshan Rao
- Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA
| | - Yuxuan Richard Xie
- Department of Bioengineering and Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Jonas Honer
- Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
| | - Wibke Deisting
- Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
| | - Jonas Anlahr
- Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
| | - Anja Hoenig
- Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
| | - Julia Czwikla
- Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
| | - Eva Sienz-Widmann
- Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
| | - Doris Rau
- Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
| | - Austin J. Rice
- Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA
| | - Timothy P. Riley
- Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA
| | - Danqing Li
- Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA
| | | | | | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, USA
| | - Kathy Y. Wei
- Therapeutic Discovery, Amgen Research, Amgen Inc, South San Francisco, CA, USA
|
73
|
Evolutionary scaling of maximum growth rate with organism size. Sci Rep 2022; 12:22586. [PMID: 36585440 PMCID: PMC9803686 DOI: 10.1038/s41598-022-23626-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/01/2022] [Accepted: 11/02/2022] [Indexed: 12/31/2022] Open
Abstract
Data from nearly 1000 species reveal the upper bound to rates of biomass production achievable by natural selection across the Tree of Life. For heterotrophs, maximum growth rates scale positively with organism size in bacteria but negatively in eukaryotes, whereas for phototrophs, the scaling is negligible for cyanobacteria and weakly negative for eukaryotes. These results have significant implications for understanding the bioenergetic consequences of the transition from prokaryotes to eukaryotes, and of the expansion of some groups of the latter into multicellularity. The magnitudes of the scaling coefficients for eukaryotes are significantly lower than expected under any proposed physical-constraint model. Supported by genomic, bioenergetic, and population-genetic data and theory, an alternative hypothesis for the observed negative scaling in eukaryotes postulates that growth-diminishing mutations with small effects passively accumulate with increasing organism size as a consequence of associated increases in the power of random genetic drift. In contrast, conditional on the structural and functional features of ribosomes, natural selection has been able to promote bacteria with the fastest possible growth rates, implying minimal conflicts with both bioenergetic constraints and random genetic drift. If this extension of the drift-barrier hypothesis is correct, the interpretations of comparative studies of biological traits that have traditionally ignored differences in population-genetic environments will require revisiting.
|
74
|
Fu Y, Bedő J, Papenfuss AT, Rubin AF. Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants. Gigascience 2022; 12:giad073. [PMID: 37721410 PMCID: PMC10506130 DOI: 10.1093/gigascience/giad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 07/02/2023] [Accepted: 08/23/2023] [Indexed: 09/19/2023] Open
Abstract
BACKGROUND: Evaluating the impact of amino acid variants has been a critical challenge for studying protein function and interpreting genomic data. High-throughput experimental methods like deep mutational scanning (DMS) can measure the effect of large numbers of variants in a target protein, but because DMS studies have not been performed on all proteins, researchers also model DMS data computationally to estimate variant impacts by predictors. RESULTS: In this study, we extended a linear regression-based predictor to explore whether incorporating data from alanine scanning (AS), a widely used low-throughput mutagenesis method, would improve prediction results. To evaluate our model, we collected 146 AS datasets, mapping to 54 DMS datasets across 22 distinct proteins. CONCLUSIONS: We show that improved model performance depends on the compatibility of the DMS and AS assays, and the scale of improvement is closely related to the correlation between DMS and AS results.
Affiliation(s)
- Yunfan Fu
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
| | - Justin Bedő
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
| | - Anthony T Papenfuss
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
- Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia
| | - Alan F Rubin
- The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
- The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
|
75
|
Wang W, Peng Z, Yang J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat Comput Sci 2022; 2:804-814. [PMID: 38177395 DOI: 10.1038/s43588-022-00373-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 03/03/2022] [Accepted: 11/06/2022] [Indexed: 01/06/2024]
Abstract
Significant progress has been made in protein structure prediction in recent years. However, it remains challenging for AlphaFold2 and other deep learning-based methods to predict protein structure with single-sequence input. Here we introduce trRosettaX-Single, an automated algorithm for single-sequence protein structure prediction. It incorporates the sequence embedding from a supervised transformer protein language model into a multi-scale network enhanced by knowledge distillation to predict inter-residue two-dimensional geometry, which is then used to reconstruct three-dimensional structures via energy minimization. Benchmark tests show that trRosettaX-Single outperforms AlphaFold2 and RoseTTAFold on orphan proteins and works well on human-designed proteins (with an average template modeling score (TM-score) of 0.79). An experimental test shows that the full trRosettaX-Single pipeline is two times faster than AlphaFold2, using far fewer computing resources (<10%). On 2,000 designed proteins from network hallucination, trRosettaX-Single generates structure models with high confidence. As a demonstration, trRosettaX-Single is applied to missense mutation analysis. These data suggest that trRosettaX-Single may find potential applications in protein design and related studies.
Affiliation(s)
- Wenkai Wang
- School of Mathematical Sciences, Nankai University, Tianjin, China
| | - Zhenling Peng
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
| | - Jianyi Yang
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China.
|
76
|
Iyengar BR, Wagner A. Bacterial Hsp90 predominantly buffers but does not potentiate the phenotypic effects of deleterious mutations during fluorescent protein evolution. Genetics 2022; 222:iyac154. [PMID: 36227141 PMCID: PMC9713429 DOI: 10.1093/genetics/iyac154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 09/26/2022] [Indexed: 12/13/2022] Open
Abstract
Chaperones facilitate the folding of other ("client") proteins and can thus affect the adaptive evolution of these clients. Specifically, chaperones affect the phenotype of proteins via two opposing mechanisms. On the one hand, they can buffer the effects of mutations in proteins and thus help preserve an ancestral, premutation phenotype. On the other hand, they can potentiate the effects of mutations and thus enhance the phenotypic changes caused by a mutation. We study how the bacterial Hsp90 chaperone (HtpG) affects the evolution of green fluorescent protein. To this end, we performed directed evolution of green fluorescent protein under low and high cellular concentrations of Hsp90. Specifically, we evolved green fluorescent protein under both stabilizing selection for its ancestral (green) phenotype and directional selection toward a new (cyan) phenotype. While Hsp90 affected the rate of adaptive evolution only transiently, it did affect the phenotypic effects of mutations that occurred during adaptive evolution. Specifically, Hsp90 allowed strongly deleterious mutations to accumulate in evolving populations by buffering their effects. Our observations show that the role of a chaperone for adaptive evolution depends on the organism and the trait being studied.
Affiliation(s)
- Bharat Ravi Iyengar
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
- Institute for Evolution and Biodiversity, Westfalian Wilhelms-University of Münster, 48149 Münster, Germany
| | - Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, NM 87501, USA
- Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, 7600 Stellenbosch, South Africa
|
77
|
Abstract
Gene-by-environment interactions play a crucial role in horizontal gene transfer by affecting how the transferred genes alter host fitness. However, how the environment modulates the fitness effect of transferred genes has not been tested systematically in an experimental study. We adapted a high-throughput technique for obtaining very precise estimates of bacterial fitness, in order to measure the fitness effects of 44 orthologs transferred from Salmonella Typhimurium to Escherichia coli in six physiologically relevant environments. We found that the fitness effects of individual genes were highly dependent on the environment, while the distributions of fitness effects across genes were not, with all tested environments resulting in distributions of the same shape and spread. Furthermore, the extent to which the fitness effects of a gene varied between environments depended on the average fitness effect of that gene across all environments, with nearly neutral and nearly lethal genes having more consistent fitness effects across all environments compared to deleterious genes. Put together, our results reveal the unpredictable nature of how environmental conditions impact the fitness effects of each individual gene. At the same time, distributions of fitness effects across environments exhibit consistent features, pointing to the generalizability of factors that shape horizontal gene transfer of orthologous genes.
Affiliation(s)
- Hande Acar Kirit
- Veterinary and Ecological Sciences, Institute of Infection, University of Liverpool, Liverpool, Merseyside, United Kingdom
- Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma, Norman, OK
- Department of Anthropology, University of Oklahoma, Norman, OK
| | - Jonathan P Bollback
- Veterinary and Ecological Sciences, Institute of Infection, University of Liverpool, Liverpool, Merseyside, United Kingdom
| | - Mato Lagator
- School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
|
78
|
Pillai AS, Hochberg GK, Thornton JW. Simple mechanisms for the evolution of protein complexity. Protein Sci 2022; 31:e4449. [PMID: 36107026 PMCID: PMC9601886 DOI: 10.1002/pro.4449] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2022] [Revised: 09/01/2022] [Accepted: 09/10/2022] [Indexed: 01/26/2023]
Abstract
Proteins are tiny models of biological complexity: specific interactions among their many amino acids cause proteins to fold into elaborate structures, assemble with other proteins into higher-order complexes, and change their functions and structures upon binding other molecules. These complex features are classically thought to evolve via long and gradual trajectories driven by persistent natural selection. But a growing body of evidence from biochemistry, protein engineering, and molecular evolution shows that naturally occurring proteins often exist at or near the genetic edge of multimerization, allostery, and even new folds, so just one or a few mutations can trigger acquisition of these properties. These sudden transitions can occur because many of the physical properties that underlie these features are present in simpler proteins as fortuitous by-products of their architecture. Moreover, complex features of proteins can be encoded by huge arrays of sequences, so they are accessible from many different starting points via many possible paths. Because the bridges to these features are both short and numerous, random chance can join selection as a key factor in explaining the evolution of molecular complexity.
Affiliation(s)
- Arvind S. Pillai
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Georg K.A. Hochberg
- Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Department of Chemistry, Center for Synthetic Microbiology, Philipps University Marburg, Marburg, Germany
| | - Joseph W. Thornton
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Departments of Human Genetics and Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
|
79
|
Leander M, Liu Z, Cui Q, Raman S. Deep mutational scanning and machine learning reveal structural and molecular rules governing allosteric hotspots in homologous proteins. eLife 2022; 11:e79932. [PMID: 36226916 PMCID: PMC9662819 DOI: 10.7554/elife.79932] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/03/2022] [Accepted: 10/13/2022] [Indexed: 01/29/2023] Open
Abstract
A fundamental question in protein science is where allosteric hotspots - residues critical for allosteric signaling - are located, and what properties differentiate them. We carried out deep mutational scanning (DMS) of four homologous bacterial allosteric transcription factors (aTFs) to identify hotspots and built a machine learning model with these data to glean the structural and molecular properties of allosteric hotspots. We found hotspots to be distributed protein-wide rather than being restricted to 'pathways' linking allosteric and active sites as is commonly assumed. Despite structural homology, the location of hotspots was not superimposable across the aTFs. However, common signatures emerged when comparing hotspots coincident with long-range interactions, suggesting that the allosteric mechanism is conserved among the homologs despite differences in molecular details. Machine learning with our large DMS datasets revealed global structural and dynamic properties to be stronger predictors of whether a residue is a hotspot than local and physicochemical properties. Furthermore, a model trained on one protein can predict hotspots in a homolog. In summary, the overall allosteric mechanism is embedded in the structural fold of the aTF family, but the finer, molecular details are sequence-specific.
Affiliation(s)
- Megan Leander
- Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
| | - Zhuang Liu
- Department of Physics, Boston University, Boston, United States
| | - Qiang Cui
- Department of Physics, Boston University, Boston, United States
- Department of Chemistry, Boston University, Boston, United States
| | - Srivatsan Raman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
- Department of Bacteriology, University of Wisconsin-Madison, Madison, United States
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, United States
|
80
|
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285 PMCID: PMC9636336 DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022] Open
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis in which a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning is a relatively new technique that enables experimental studies of functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss existing limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them. Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-022-01005-w.
Affiliation(s)
- Nadezhda Azbukina
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
| | - Anastasia Zharikova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
| | - Vasily Ramensky
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
|
81
|
Abstract
One core goal of genetics is to systematically understand the mapping between the DNA sequence of an organism (genotype) and its measurable characteristics (phenotype). Understanding this mapping is often challenging because of interactions between mutations, where the result of combining several different mutations can be very different than the sum of their individual effects. Here we provide a statistical framework for modeling complex genetic interactions of this type. The key idea is to ask how fast the effects of mutations change when introducing the same mutation in increasingly distant genetic backgrounds. We then propose a model for phenotypic prediction that takes into account this tendency for the effects of mutations to be more similar in nearby genetic backgrounds.
Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype–phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype–phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites, for which we also validate our model predictions via additional low-throughput experiments.
|
82
|
Castro E, Godavarthi A, Rubinfien J, Givechian K, Bhaskar D, Krishnaswamy S. Transformer-based protein generation with regularized latent space optimization. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00532-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
83
|
Srivastava M, Payne JL. On the incongruence of genotype-phenotype and fitness landscapes. PLoS Comput Biol 2022; 18:e1010524. [PMID: 36121840 PMCID: PMC9521842 DOI: 10.1371/journal.pcbi.1010524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/05/2022] [Revised: 09/29/2022] [Accepted: 08/30/2022] [Indexed: 11/22/2022] Open
Abstract
The mapping from genotype to phenotype to fitness typically involves multiple nonlinearities that can transform the effects of mutations. For example, mutations may contribute additively to a phenotype, but their effects on fitness may combine non-additively because selection favors a low or intermediate value of that phenotype. This can cause incongruence between the topographical properties of a fitness landscape and its underlying genotype-phenotype landscape. Yet, genotype-phenotype landscapes are often used as a proxy for fitness landscapes to study the dynamics and predictability of evolution. Here, we use theoretical models and empirical data on transcription factor-DNA interactions to systematically study the incongruence of genotype-phenotype and fitness landscapes when selection favors a low or intermediate phenotypic value. Using the theoretical models, we prove a number of fundamental results. For example, selection for low or intermediate phenotypic values does not change simple sign epistasis into reciprocal sign epistasis, implying that genotype-phenotype landscapes with only simple sign epistasis motifs will always give rise to single-peaked fitness landscapes under such selection. More broadly, we show that such selection tends to create fitness landscapes that are more rugged than the underlying genotype-phenotype landscape, but this increased ruggedness typically does not frustrate adaptive evolution because the local adaptive peaks in the fitness landscape tend to be nearly as tall as the global peak. Many of these results carry forward to the empirical genotype-phenotype landscapes, which may help to explain why low- and intermediate-affinity transcription factor-DNA interactions are so prevalent in eukaryotic gene regulation.
How do mutations change phenotypic traits and organismal fitness? This question is often addressed in the context of a classic metaphor of evolutionary theory: the fitness landscape. A fitness landscape is akin to a physical landscape, in which genotypes define spatial coordinates, and fitness defines the elevation of each coordinate. Evolution then acts like a hill-climbing process, in which populations ascend fitness peaks as a consequence of mutation and selection. It is becoming increasingly common to construct such landscapes using experimental data from high-throughput sequencing technologies and phenotypic assays, in systems such as macromolecules and gene regulatory circuits. Although these landscapes are typically defined by molecular phenotypes, and are therefore more appropriately referred to as genotype-phenotype landscapes, they are often used to study evolutionary dynamics. This requires the assumption that the molecular phenotype is a reasonable proxy for fitness, which need not be the case. For example, selection may favor a low or intermediate phenotypic value, causing incongruence between a fitness landscape and its underlying genotype-phenotype landscape. Here, we study such incongruence using a diversity of theoretical models and experimental data from gene regulatory systems. We regularly find incongruence, in that fitness landscapes tend to comprise more peaks than their underlying genotype-phenotype landscapes. However, using evolutionary simulations, we show that this increased ruggedness need not impede adaptation.
Affiliation(s)
- Malvika Srivastava
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Joshua L. Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
84
|
Three-dimensional structure-guided evolution of a ribosome with tethered subunits. Nat Chem Biol 2022; 18:990-998. [PMID: 35836020 PMCID: PMC9815830 DOI: 10.1038/s41589-022-01064-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/22/2021] [Accepted: 05/17/2022] [Indexed: 01/11/2023]
Abstract
RNA-based macromolecular machines, such as the ribosome, have functional parts reliant on structural interactions spanning sequence-distant regions. These features limit evolutionary exploration of mutant libraries and confound three-dimensional structure-guided design. To address these challenges, we describe Evolink (evolution and linkage), a method that enables high-throughput evolution of sequence-distant regions in large macromolecular machines, and library design guided by computational RNA modeling to enable exploration of structurally stable designs. Using Evolink, we evolved a tethered ribosome with a 58% increased activity in orthogonal protein translation and a 97% improvement in doubling times in SQ171 cells compared to a previously developed tethered ribosome, and reveal new permissible sequences in a pair of ribosomal helices with previously explored biological function. The Evolink approach may enable enhanced engineering of macromolecular machines for new and improved functions for synthetic biology.
|
85
|
Gabzi T, Pilpel Y, Friedlander T. Fitness landscape analysis of a tRNA gene reveals that the wild type allele is sub-optimal, yet mutationally robust. Mol Biol Evol 2022; 39:6670756. [PMID: 35976926 PMCID: PMC9447856 DOI: 10.1093/molbev/msac178] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022] Open
Abstract
Fitness landscape mapping and the prediction of evolutionary trajectories on these landscapes are major tasks in evolutionary biology research. Evolutionary dynamics is tightly linked to the landscape topography, but this relation is not straightforward. Here, we analyze a fitness landscape of a yeast tRNA gene, previously measured under four different conditions. We find that the wild type allele is sub-optimal, and 8–10% of its variants are fitter. We rule out the possibilities that the wild type is fittest on average across these four conditions or located on a local fitness maximum. Notwithstanding, we cannot exclude the possibility that the wild type might be fittest in some of the many conditions in the complex ecology in which yeast lives. Instead, we find that the wild type is mutationally robust (“flat”), while more fit variants are typically mutationally fragile. Similar observations of mutational robustness or flatness have so far been made in very few cases, predominantly in viral genomes.
Affiliation(s)
- Tzahi Gabzi
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Tamar Friedlander
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, Faculty of Agriculture, Hebrew University of Jerusalem, 229 Herzl St., Rehovot 7610001, Israel
|
86
|
Low protein expression enhances phenotypic evolvability by intensifying selection on folding stability. Nat Ecol Evol 2022; 6:1155-1164. [PMID: 35798838 PMCID: PMC7613228 DOI: 10.1038/s41559-022-01797-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Accepted: 05/19/2022] [Indexed: 01/09/2023]
Abstract
Protein abundance affects the evolution of protein genotypes, but we do not know how it affects the evolution of protein phenotypes. Here we investigate the role of protein abundance in the evolvability of green fluorescent protein (GFP) towards the novel phenotype of cyan fluorescence. We evolve GFP in E. coli through multiple cycles of mutation and selection and show that low GFP expression facilitates the evolution of cyan fluorescence. A computational model whose predictions we test experimentally helps explain why: lowly expressed proteins are under stronger selection for proper folding, which facilitates their evolvability on short evolutionary time scales. The reason is that high fluorescence can be achieved by either few proteins that fold well or by many proteins that fold less well. In other words, we observe a synergy between a protein's scarcity and its stability. Because many proteins meet the essential requirements for this scarcity-stability synergy, it may be a widespread mechanism by which low expression helps proteins evolve new phenotypes and functions.
|
87
|
Wang B, Gamazon ER. Modeling mutational effects on biochemical phenotypes using convolutional neural networks: application to SARS-CoV-2. iScience 2022; 25:104500. [PMID: 35669036 PMCID: PMC9159778 DOI: 10.1016/j.isci.2022.104500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Revised: 11/15/2021] [Accepted: 05/26/2022] [Indexed: 11/29/2022] Open
Abstract
Deep mutational scanning (DMS) experiments have been performed on SARS-CoV-2’s spike receptor-binding domain (RBD) and human angiotensin-converting enzyme 2 (ACE2) zinc-binding peptidase domain, both central players in viral infection and evolution and antibody evasion, quantifying how mutations impact biochemical phenotypes. We modeled biochemical phenotypes from massively parallel assays, using neural networks trained on protein sequence mutations in the virus and human host. Neural networks were significantly predictive of binding affinity, protein expression, and antibody escape, learning complex interactions and higher-order features that are difficult to capture with conventional methods from structural biology. Integrating the physicochemical properties of amino acids, such as hydrophobicity and long-range non-bonded energy per atom, significantly improved prediction (empirical p < 0.01). We observed concordance of the neural network predictions with molecular dynamics (multiple 500 ns or 1 μs all-atom) simulations of the spike protein-ACE2 interface, with critical implications for the use of deep learning to dissect molecular mechanisms.
Highlights: Deep learning models of biochemical phenotypes from deep mutational scanning (DMS) data; prediction performance gain from using physicochemical properties of amino acids; concordance of neural network predictions with molecular dynamics simulations; improved causal inference properties for neural-network-defined phenotypes.
Affiliation(s)
- Bo Wang
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Eric R Gamazon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Data Science Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Clare Hall, University of Cambridge, Cambridge CB3 9AL, UK
|
88
|
Matsumura I, Patrick WM. Dan Tawfik's Lessons for Protein Engineers about Enzymes Adapting to New Substrates. Biochemistry 2022; 62:158-162. [PMID: 35820168 PMCID: PMC9851151 DOI: 10.1021/acs.biochem.2c00230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Natural evolution has been creating new complex systems for billions of years. The process is spontaneous and requires neither intelligence nor moral purpose but is nevertheless difficult to understand. The late Dan Tawfik spent years studying enzymes as they adapted to recognize new substrates. Much of his work focused on gaining fundamental insights, so the practical utility of his experiments may not be obvious even to accomplished protein engineers. Here we focus on two questions fundamental to any directed evolution experiment. Which proteins are the best starting points for such experiments? Which trait(s) of the chosen parental protein should be evolved to achieve the desired outcome? We summarize Tawfik's contributions to our understanding of these problems, to honor his memory and encourage those unfamiliar with his ideas to read his publications.
Affiliation(s)
- Ichiro Matsumura
- O. Wayne Rollins Research Center, 1510 Clifton Road NE, Room 4001, Atlanta, Georgia 30322, United States
| | - Wayne M. Patrick
- Centre for Biodiscovery, School of Biological Sciences, Victoria University of Wellington, Wellington 6012, New Zealand
|
89
|
Samant N, Nachum G, Tsepal T, Bolon DNA. Sequence dependencies and biophysical features both govern cleavage of diverse cut-sites by HIV protease. Protein Sci 2022; 31:e4366. [PMID: 35762719 DOI: 10.1002/pro.4366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 05/18/2022] [Accepted: 05/27/2022] [Indexed: 11/12/2022]
Abstract
The infectivity of HIV-1 requires that its protease (PR) cleave multiple cut-sites with low sequence similarity. The diversity of cleavage sites has made it challenging to investigate the underlying sequence properties that determine binding and turnover of substrates by PR. We engineered a mutational scanning approach utilizing yeast display, flow cytometry, and deep sequencing to systematically measure the impacts of all individual amino acid changes at 12 positions in three different cut-sites (MA/CA, NC/p1, and p1/p6). The resulting fitness landscapes revealed common physical features that underlie cutting of all three cut-sites at the amino acid positions closest to the scissile bond. In contrast, positions more than two amino acids away from the scissile bond exhibited a strong dependence on the sequence background of the rest of the cut-site. We observed multiple amino acid changes in cut-sites that led to faster cleavage rates, including a preference for negative charge five and six amino acids away from the scissile bond at locations where the surface of protease is positively charged. Analysis of individual cut sites using full-length matrix-capsid proteins indicates that long-distance sequence context can contribute to cutting efficiency, such that analyses of peptides or shorter engineered constructs, including those in this work, should be considered carefully. This work provides a framework for understanding how diverse substrates interact with HIV-1 PR and can be extended to investigate other viral PRs with similar properties.
Affiliation(s)
- Neha Samant
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
- Gily Nachum
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
- Tenzin Tsepal
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
- Daniel N A Bolon
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
|
90
|
Interpretable modeling of genotype-phenotype landscapes with state-of-the-art predictive power. Proc Natl Acad Sci U S A 2022; 119:e2114021119. [PMID: 35733251 PMCID: PMC9245639 DOI: 10.1073/pnas.2114021119] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022]
Abstract
Large-scale measurements linking genetic background to biological function have driven a need for models that can incorporate these data for reliable predictions and insight into the underlying biophysical system. Recent modeling efforts, however, prioritize predictive accuracy at the expense of model interpretability. Here, we present LANTERN (landscape interpretable nonparametric model, https://github.com/usnistgov/lantern), a hierarchical Bayesian model that distills genotype-phenotype landscape (GPL) measurements into a low-dimensional feature space that represents the fundamental biological mechanisms of the system while also enabling straightforward, explainable predictions. Across a benchmark of large-scale datasets, LANTERN equals or outperforms all alternative approaches, including deep neural networks. LANTERN furthermore extracts useful insights of the landscape, including its inherent dimensionality, a latent space of additive mutational effects, and metrics of landscape structure. LANTERN facilitates straightforward discovery of fundamental mechanisms in GPLs, while also reliably extrapolating to unexplored regions of genotypic space.
|
91
|
Park Y, Metzger BPH, Thornton JW. Epistatic drift causes gradual decay of predictability in protein evolution. Science 2022; 376:823-830. [PMID: 35587978 DOI: 10.1126/science.abn6895] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 12/11/2022]
Abstract
Epistatic interactions can make the outcomes of evolution unpredictable, but no comprehensive data are available on the extent and temporal dynamics of changes in the effects of mutations as protein sequences evolve. Here, we use phylogenetic deep mutational scanning to measure the functional effect of every possible amino acid mutation in a series of ancestral and extant steroid receptor DNA binding domains. Across 700 million years of evolution, epistatic interactions caused the effects of most mutations to become decorrelated from their initial effects and their windows of evolutionary accessibility to open and close transiently. Most effects changed gradually and without bias at rates that were largely constant across time, indicating a neutral process caused by many weak epistatic interactions. Our findings show that protein sequences drift inexorably into contingency and unpredictability, but that the process is statistically predictable, given sufficient phylogenetic and experimental data.
Affiliation(s)
- Yeonwoo Park
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
- Brian P H Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Joseph W Thornton
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
|
92
|
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
Affiliation(s)
- Jesse Horne
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
  - Department of Bioengineering, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
  - Department of Plant Biology, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
  - Cancer Center at Illinois, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
|
93
|
Koch P, Schmitt S, Heynisch A, Gumpinger A, Wüthrich I, Gysin M, Shcherbakov D, Hobbie SN, Panke S, Held M. Optimization of the antimicrobial peptide Bac7 by deep mutational scanning. BMC Biol 2022; 20:114. [PMID: 35578204 PMCID: PMC9112550 DOI: 10.1186/s12915-022-01304-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/10/2021] [Accepted: 03/30/2022] [Indexed: 11/24/2022]
Abstract
Background: Intracellularly active antimicrobial peptides are promising candidates for the development of antibiotics for human applications. However, drug development using peptides is challenging as, owing to their large size, an enormous sequence space is spanned. We built a high-throughput platform that incorporates rapid investigation of the sequence-activity relationship of peptides and enables rational optimization of their antimicrobial activity. The platform is based on deep mutational scanning of DNA-encoded peptides and employs highly parallelized bacterial self-screening coupled to next-generation sequencing as a readout for their antimicrobial activity. As a target, we used Bac7(1-23), a 23-amino-acid-residue variant of bactenecin-7, a potent translational inhibitor and one of the best researched proline-rich antimicrobial peptides.
Results: Using the platform, we simultaneously determined the antimicrobial activity of >600,000 Bac7(1-23) variants and explored their sequence-activity relationship. This dataset guided the design of a focused library of ~160,000 variants and the identification of a lead candidate, Bac7PS. Bac7PS showed high activity against multidrug-resistant clinical isolates of E. coli, and its activity was less dependent on SbmA, a transporter commonly used by proline-rich antimicrobial peptides to reach the cytosol and then inhibit translation. Furthermore, Bac7PS displayed strong ribosomal inhibition and low toxicity against eukaryotic cells and demonstrated good efficacy in a murine septicemia model induced by E. coli.
Conclusion: We demonstrated that the presented platform can be used to establish the sequence-activity relationship of antimicrobial peptides, and showed its usefulness for hit-to-lead identification and optimization of antimicrobial drug candidates.
Affiliation(s)
- Philipp Koch
  - Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Steven Schmitt
  - Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Alexander Heynisch
  - Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Anja Gumpinger
  - Machine Learning and Computational Biology, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Irene Wüthrich
  - Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Marina Gysin
  - Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
- Dimitri Shcherbakov
  - Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
- Sven N Hobbie
  - Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
- Sven Panke
  - Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Martin Held
  - Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
|
94
|
Bakerlee CW, Nguyen Ba AN, Shulgina Y, Rojas Echenique JI, Desai MM. Idiosyncratic epistasis leads to global fitness-correlated trends. Science 2022; 376:630-635. [PMID: 35511982 PMCID: PMC10124986 DOI: 10.1126/science.abm4774] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 12/15/2022]
Abstract
Epistasis can markedly affect evolutionary trajectories. In recent decades, protein-level fitness landscapes have revealed extensive idiosyncratic epistasis among specific mutations. By contrast, other work has found ubiquitous and apparently nonspecific patterns of global diminishing-returns and increasing-costs epistasis among mutations across the genome. Here, we used a hierarchical CRISPR gene drive system to construct all combinations of 10 missense mutations from across the genome in budding yeast and measured their fitness in six environments. We show that the resulting fitness landscapes exhibit global fitness-correlated trends but that these trends emerge from specific idiosyncratic interactions. We thus provide experimental validation of recent theoretical work arguing that fitness-correlated trends can emerge as the generic consequence of idiosyncratic epistasis.
Affiliation(s)
- Christopher W Bakerlee
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Quantitative Biology Initiative, Harvard University, Cambridge, MA, USA
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Alex N Nguyen Ba
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Quantitative Biology Initiative, Harvard University, Cambridge, MA, USA
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
  - Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Yekaterina Shulgina
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Jose I Rojas Echenique
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Michael M Desai
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Quantitative Biology Initiative, Harvard University, Cambridge, MA, USA
  - NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, MA, USA
  - Department of Physics, Harvard University, Cambridge, MA, USA
|
95
|
Ding D, Green AG, Wang B, Lite TLV, Weinstein EN, Marks DS, Laub MT. Co-evolution of interacting proteins through non-contacting and non-specific mutations. Nat Ecol Evol 2022; 6:590-603. [PMID: 35361892 PMCID: PMC9090974 DOI: 10.1038/s41559-022-01688-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/25/2021] [Accepted: 01/31/2022] [Indexed: 01/08/2023]
Abstract
Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutations. Here, through the comprehensive analysis of a bacterial toxin-antitoxin system, we identified all possible single substitutions in the toxin that enable it to tolerate otherwise interface-disrupting mutations in its antitoxin. Strikingly, the majority of enabling mutations in the toxin do not contact and promote tolerance non-specifically to many different antitoxin mutations, despite covariation in homologues occurring primarily between specific pairs of contacting residues across the interface. In addition, the enabling mutations we identified expand future mutational paths that both maintain old toxin-antitoxin interactions and form new ones. These non-specific mutations are missed by widely used covariation and machine learning methods. Identifying such enabling mutations will be critical for ensuring continued binding of therapeutically relevant proteins, such as antibodies, aimed at evolving targets.
Affiliation(s)
- David Ding
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Anna G Green
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Boyuan Wang
  - Department of Pharmacology, UT Southwestern Medical Center, Dallas, TX, USA
- Thuy-Lan Vo Lite
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
|
96
|
Yang CH, Scarpino SV. A Family of Fitness Landscapes Modeled through Gene Regulatory Networks. Entropy (Basel) 2022; 24:622. [PMID: 35626507 PMCID: PMC9141513 DOI: 10.3390/e24050622] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/02/2021] [Revised: 04/11/2022] [Accepted: 04/26/2022] [Indexed: 02/01/2023]
Abstract
Fitness landscapes are a powerful metaphor for understanding the evolution of biological systems. These landscapes describe how genotypes are connected to each other through mutation and related through fitness. Empirical studies of fitness landscapes have increasingly revealed conserved topographical features across diverse taxa, e.g., the accessibility of genotypes and "ruggedness". As a result, theoretical studies are needed to investigate how evolution proceeds on fitness landscapes with such conserved features. Here, we develop and study a model of evolution on fitness landscapes using the lens of Gene Regulatory Networks (GRNs), where the regulatory products are computed from multiple genes and collectively treated as phenotypes. With the assumption that regulation is a binary process, we prove the existence of empirically observed, topographical features such as accessibility and connectivity. We further show that these results hold across arbitrary fitness functions and that a trade-off between accessibility and ruggedness need not exist. Then, using graph theory and a coarse-graining approach, we deduce a mesoscopic structure underlying GRN fitness landscapes where the information necessary to predict a population's evolutionary trajectory is retained with minimal complexity. Using this coarse-graining, we develop a bottom-up algorithm to construct such mesoscopic backbones, which does not require computing the genotype network and is therefore far more efficient than brute-force approaches. Altogether, this work provides mathematical results of high-dimensional fitness landscapes and a path toward connecting theory to empirical studies.
Affiliation(s)
- Chia-Hung Yang
  - Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Samuel V. Scarpino
  - Network Science Institute, Northeastern University, Boston, MA 02115, USA
  - Physics Department, Northeastern University, Boston, MA 02115, USA
  - Roux Institute, Northeastern University, Boston, MA 02115, USA
  - Institute for Experiential AI, Northeastern University, Boston, MA 02115, USA
  - Santa Fe Institute, Santa Fe, NM 87501, USA
  - Vermont Complex Systems Center, University of Vermont, Burlington, VT 05405, USA
|
97
|
Vila JA. Proteins' Evolution upon Point Mutations. ACS Omega 2022; 7:14371-14376. [PMID: 35573218 PMCID: PMC9089682 DOI: 10.1021/acsomega.2c01407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 05/03/2023]
Abstract
As the reader must be already aware, state-of-the-art protein folding prediction methods have reached a smashing success in their goal of accurately determining the three-dimensional structures of proteins. Yet, a solution to simple problems such as the effects of protein point mutations on their (i) native conformation; (ii) marginal stability; (iii) ensemble of high-energy nativelike conformations; and (iv) metamorphism propensity and, hence, their evolvability, remains as an unsolved problem. As a plausible solution to the latter, some properties of the amide hydrogen-deuterium exchange, a highly sensitive probe of the structure, stability, and folding of proteins, are assessed from a new perspective. The preliminary results indicate that the protein marginal stability change upon point mutations provides the necessary and sufficient information to estimate, through a Boltzmann factor, the evolution of the amide hydrogen exchange protection factors and, consequently, that of the ensemble of folded conformations coexisting with the native state. This work contributes to our general understanding of the effects of point mutations on proteins and may spur significant progress in our efforts to develop methods to determine the appearance of new folds and functions accurately.
|
98
|
Tareen A, Kooshkbaghi M, Posfai A, Ireland WT, McCandlish DM, Kinney JB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol 2022; 23:98. [PMID: 35428271 PMCID: PMC9011994 DOI: 10.1186/s13059-022-02661-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 07/16/2021] [Accepted: 03/24/2022] [Indexed: 12/17/2022]
Abstract
Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps—including biophysically interpretable models—from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
|
99
|
Detlefsen NS, Hauberg S, Boomsma W. Learning meaningful representations of protein sequences. Nat Commun 2022; 13:1914. [PMID: 35395843 PMCID: PMC8993921 DOI: 10.1038/s41467-022-29443-w] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 11/25/2020] [Accepted: 03/15/2022] [Indexed: 01/27/2023]
Abstract
How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different data representations that result in different biological interpretations of data. This raises the question of what even constitutes the most meaningful representation. Here, we approach this question for representations of protein sequences, which have received considerable attention in the recent literature. We explore two key contexts in which representations naturally arise: transfer learning and interpretable learning. In the first context, we demonstrate that several contemporary practices yield suboptimal performance, and in the latter we demonstrate that taking representation geometry into account significantly improves interpretability and lets the models reveal biological information that is otherwise obscured. Representation learning plays an increasing role in protein sequence analysis. This paper seeks to clarify how to ensure that such representations are meaningful, proposing best practices both for the choice of methods and the subsequent analysis.
Affiliation(s)
- Søren Hauberg
  - Section for Cognitive Systems, Technical University of Denmark, Kgs. Lyngby, Denmark
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
|
100
|
Environmental selection and epistasis in an empirical phenotype-environment-fitness landscape. Nat Ecol Evol 2022; 6:427-438. [PMID: 35210579 DOI: 10.1038/s41559-022-01675-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/19/2021] [Accepted: 12/14/2021] [Indexed: 11/08/2022]
Abstract
Fitness landscapes, mappings of genotype/phenotype to their effects on fitness, are invaluable concepts in evolutionary biochemistry. Although widely discussed, measurements of phenotype-fitness landscapes in proteins remain scarce. Here, we quantify all single mutational effects on fitness and phenotype (EC50) of VIM-2 β-lactamase across a 64-fold range of ampicillin concentrations. We then construct a phenotype-fitness landscape that takes variations in environmental selection pressure into account. We found that a simple, empirical landscape accurately models the ~39,000 mutational data points, suggesting that the evolution of VIM-2 can be predicted on the basis of the selection environment. Our landscape provides new quantitative knowledge on the evolution of the β-lactamases and proteins in general, particularly their evolutionary dynamics under subinhibitory antibiotic concentrations, as well as the mechanisms and environmental dependence of non-specific epistasis.
|