1
Li Y, Barton JP. Correlated Allele Frequency Changes Reveal Clonal Structure and Selection in Temporal Genetic Data. Mol Biol Evol 2024; 41:msae060. [PMID: 38507665 PMCID: PMC10986812 DOI: 10.1093/molbev/msae060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 02/02/2024] [Accepted: 03/15/2024] [Indexed: 03/22/2024]
Abstract
In evolving populations where the rate of beneficial mutations is large, subpopulations of individuals with competing beneficial mutations can be maintained over long times. Evolution with this kind of clonal structure is commonly observed in a wide range of microbial and viral populations. However, it can be difficult to completely resolve clonal dynamics in data. This is due to limited read lengths in high-throughput sequencing methods, which are often insufficient to directly measure linkage disequilibrium or determine clonal structure. Here, we develop a method to infer clonal structure using correlated allele frequency changes in time-series sequence data. Simulations show that our method recovers the true underlying clonal structures when they are known and accurately estimates linkage disequilibrium. This information can then be combined with other inference methods to improve estimates of the fitness effects of individual mutations. Applications to data suggest novel clonal structures in an E. coli long-term evolution experiment, and yield improved predictions of the effects of mutations on bacterial fitness and antibiotic resistance. Moreover, our method is computationally efficient, requiring orders of magnitude less run time for large data sets than existing methods. Overall, our method provides a powerful tool to infer clonal structures from data sets where only allele frequencies are available, which can also improve downstream analyses.
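The abstract does not spell out the inference procedure. As a rough illustration of its core intuition, that mutations carried by the same clone tend to rise and fall together, the Python sketch below groups mutations whose frequency changes are strongly correlated. The function name `group_by_correlated_changes`, the correlation threshold of 0.9, and the union-find grouping are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def group_by_correlated_changes(freqs, threshold=0.9):
    """Hypothetical heuristic: cluster mutations by correlated frequency changes.

    freqs: array of shape (T, L), frequencies of L mutations at T time points.
    threshold: assumed cutoff on the Pearson correlation of frequency changes.
    Returns a sorted list of groups, each a sorted list of mutation indices.
    """
    changes = np.diff(np.asarray(freqs, dtype=float), axis=0)  # shape (T-1, L)
    # Correlation between columns (mutations); a mutation whose frequency never
    # changes yields NaN entries, which fail the comparison and stay ungrouped.
    corr = np.corrcoef(changes, rowvar=False)
    n = corr.shape[0]

    parent = list(range(n))  # union-find forest over mutation indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose changes are correlated above the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy trajectories: mutation 1 tracks mutation 0 (same hypothetical clone),
# while mutation 2 fluctuates independently.
freqs = np.array([
    [0.00, 0.00, 0.50],
    [0.10, 0.09, 0.40],
    [0.30, 0.27, 0.45],
    [0.60, 0.54, 0.30],
    [0.80, 0.72, 0.35],
    [0.90, 0.81, 0.20],
])
print(group_by_correlated_changes(freqs))  # [[0, 1], [2]]
```

Pairwise thresholding ignores nested clones and sampling noise, both of which a full method would need to handle; it is meant only to show why correlated changes carry information about clonal structure.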
Affiliation(s)
- Yunxiao Li
- Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- John P Barton
- Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA
2
Zhang S, Ma Z, Li W, Shen Y, Xu Y, Liu G, Chang J, Li Z, Qin H, Tian B, Gong H, Liu D, Thuronyi B, Voigt C. EvoAI enables extreme compression and reconstruction of the protein sequence space. Res Sq 2024:rs.3.rs-3930833. [PMID: 38464127 PMCID: PMC10925456 DOI: 10.21203/rs.3.rs-3930833/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here, we first establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10^48. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.
3
Schmitt LT, Schneider A, Posorski J, Lansing F, Jelicic M, Jain M, Sayed S, Buchholz F, Sürün D. Quantification of evolved DNA-editing enzymes at scale with DEQSeq. Genome Biol 2023; 24:254. [PMID: 37932818 PMCID: PMC10626641 DOI: 10.1186/s13059-023-03097-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 10/24/2023] [Indexed: 11/08/2023]
Abstract
We introduce DEQSeq, a nanopore sequencing approach that rationalizes the selection of favorable genome editing enzymes from directed molecular evolution experiments. With the ability to capture full-length sequences, editing efficiencies, and specificities from thousands of evolved enzymes simultaneously, DEQSeq streamlines the process of identifying the most valuable variants for further study and application. We apply DEQSeq to evolved libraries of Cas12f-ABEs and designer-recombinases, identifying variants with improved properties for future applications. Our results demonstrate that DEQSeq is a powerful tool for accelerating enzyme discovery and advancing genome editing research.
Affiliation(s)
- Lukas Theo Schmitt
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Present Address: Seamless Therapeutics GmbH, Tatzberg 47/49, 01307, Dresden, Germany
- Aksana Schneider
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Present Address: Seamless Therapeutics GmbH, Tatzberg 47/49, 01307, Dresden, Germany
- Jonas Posorski
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Felix Lansing
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Present Address: Seamless Therapeutics GmbH, Tatzberg 47/49, 01307, Dresden, Germany
- Milica Jelicic
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Manavi Jain
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Shady Sayed
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Frank Buchholz
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
- Duran Sürün
- Medical Faculty and University Hospital Carl Gustav Carus, UCC Section Medical Systems Biology, Dresden, TU Dresden, 01307, Germany
4
Li Y, Barton JP. Estimating linkage disequilibrium and selection from allele frequency trajectories. Genetics 2023; 223:iyac189. [PMID: 36610715 PMCID: PMC9991507 DOI: 10.1093/genetics/iyac189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 10/14/2022] [Accepted: 12/11/2022] [Indexed: 01/09/2023]
Abstract
Genetic sequences collected over time provide an exciting opportunity to study natural selection. In such studies, it is important to account for linkage disequilibrium to accurately measure selection and to distinguish between selection and other effects that can cause changes in allele frequencies, such as genetic hitchhiking or clonal interference. However, most high-throughput sequencing methods cannot directly measure linkage due to short read lengths. Here we develop a simple method to estimate linkage disequilibrium from time-series allele frequencies. This reconstructed linkage information can then be combined with other inference methods to infer the fitness effects of individual mutations. Simulations show that our approach reliably outperforms inference that ignores linkage disequilibrium and, with sufficient sampling, performs similarly to inference using the true linkage information. We also introduce two regularization methods derived from random matrix theory that help preserve the method's performance when sampling is limited. Overall, our method enables the use of linkage-aware inference methods even for data sets where only allele frequency time series are available.
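A minimal Python sketch of how linkage can, in principle, be read off allele frequency changes, assuming neutral Wright-Fisher sampling with a known population size N, one generation between samples, and linkage that stays roughly constant over the window. Under multinomial sampling of N individuals, the covariance of one-generation frequency changes is Cov(Δx_i, Δx_j) ≈ (x_ij - x_i x_j)/N, where x_ij (notation assumed here) is the frequency of individuals carrying both mutations i and j. Scaling the sample covariance of the changes by N therefore estimates the matrix C, whose off-diagonal entries are the linkage disequilibria. The function name `estimate_covariance` and these modeling assumptions are illustrative; this is not presented as the authors' estimator, and it omits the random-matrix-theory regularization the abstract mentions.

```python
import numpy as np


def estimate_covariance(freqs, pop_size):
    """Estimate the allele covariance matrix C from a frequency time series.

    Hypothetical sketch assuming Wright-Fisher sampling with known population
    size `pop_size`, one generation between rows of `freqs` (shape (T, L)),
    and C roughly constant over the window, so that
    Cov(dx_i, dx_j) ~= C_ij / N with C_ij = x_ij - x_i * x_j.
    """
    increments = np.diff(np.asarray(freqs, dtype=float), axis=0)  # (T-1, L)
    # Remove the mean change, treating any selection-driven trend as constant.
    centered = increments - increments.mean(axis=0)
    # Unbiased sample covariance of the increments, rescaled by N.
    return pop_size * (centered.T @ centered) / (len(increments) - 1)
```

With only a few sampled generations, the N-scaled covariance is noisy and need not be well conditioned, which is the limited-sampling situation the abstract's regularization methods are meant to address.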
Affiliation(s)
- Yunxiao Li
- Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- John P Barton
- Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA
5
Schmitt LT, Paszkowski-Rogacz M, Jug F, Buchholz F. Prediction of designer-recombinases for DNA editing with generative deep learning. Nat Commun 2022; 13:7966. [PMID: 36575171 PMCID: PMC9794738 DOI: 10.1038/s41467-022-35614-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/09/2022] [Accepted: 12/14/2022] [Indexed: 12/28/2022]
Abstract
Site-specific tyrosine-type recombinases are effective tools for genome engineering, with the first engineered variants having demonstrated therapeutic potential. So far, adaptation of designer-recombinases to new DNA target-site selectivity has been achieved mostly through iterative cycles of directed molecular evolution. While effective, directed molecular evolution methods are laborious and time consuming. Here we present RecGen (Recombinase Generator), an algorithm for the intelligent generation of designer-recombinases. We gather over one million Cre-like recombinase sequences evolved for 89 different target sites and use them to train conditional variational autoencoders for recombinase generation. Experimental validation demonstrates that the algorithm can predict recombinase sequences with activity on novel target sites, indicating that RecGen is useful to accelerate the development of future designer-recombinases.
Affiliation(s)
- Lukas Theo Schmitt
- Medical Systems Biology, Medical Faculty, TU Dresden, 01307 Dresden, Germany
- Maciej Paszkowski-Rogacz
- Medical Systems Biology, Medical Faculty, TU Dresden, 01307 Dresden, Germany
- Florian Jug
- Fondazione Human Technopole, Milano, Italy
- Center for Systems Biology Dresden, Dresden, Germany
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Frank Buchholz
- Medical Systems Biology, Medical Faculty, TU Dresden, 01307 Dresden, Germany
6
Molina RS, Rix G, Mengiste AA, Alvarez B, Seo D, Chen H, Hurtado J, Zhang Q, Donato García-García J, Heins ZJ, Almhjell PJ, Arnold FH, Khalil AS, Hanson AD, Dueber JE, Schaffer DV, Chen F, Kim S, Ángel Fernández L, Shoulders MD, Liu CC. In vivo hypermutation and continuous evolution. Nat Rev Methods Primers 2022; 2:37. [PMID: 37073402 PMCID: PMC10108624 DOI: 10.1038/s43586-022-00130-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Affiliation(s)
- Rosana S. Molina
- Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
- Gordon Rix
- Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
- Amanuella A. Mengiste
- Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Beatriz Alvarez
- Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Daeje Seo
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Haiqi Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Juan Hurtado
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Qiong Zhang
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Jorge Donato García-García
- Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo Mexico, C.P. 45138, Zapopan, Jalisco, Mexico
- Zachary J. Heins
- Biological Design Center, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Patrick J. Almhjell
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Frances H. Arnold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Ahmad S. Khalil
- Biological Design Center, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA
- Andrew D. Hanson
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- John E. Dueber
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
- Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- David V. Schaffer
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
- Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Fei Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Seokhee Kim
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Luis Ángel Fernández
- Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Matthew D. Shoulders
- Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Chang C. Liu
- Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
- Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
- Department of Chemistry, University of California, Irvine, CA 92617, USA