1. Karamycheva S, Wolf YI, Persi E, Koonin EV, Makarova KS. Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions. Biol Direct 2022; 17:22. [PMID: 36042479 PMCID: PMC9425974 DOI: 10.1186/s13062-022-00337-7] [Received: 07/18/2022] [Accepted: 08/13/2022] [Indexed: 12/24/2022]
Abstract
Background: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as to the specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy-to-estimate, protein family-level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs).
Results: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate primarily not from gene duplication but from acquisition of distant paralogs and xenologs, which introduces sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but, as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. Additionally, using linear discriminant analysis, we found that variability and the fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality, as compared to the results of transposon mutagenesis in Sulfolobus islandicus.
Conclusions: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and is indicative of sub- or neofunctionalization of paralogs.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13062-022-00337-7.
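The abstract does not give the formula behind "alignment column homogeneity", so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes homogeneity of a column is the sum of squared residue frequencies (gaps ignored) and takes variability as one minus the mean column homogeneity. The normalization against overall group diversity mentioned in the abstract is omitted. The function names `column_homogeneity` and `alignment_variability` are hypothetical.

```python
from collections import Counter


def column_homogeneity(column, gap="-"):
    """Assumed definition: sum of squared residue frequencies in one column.

    Gap characters are ignored. Returns None for an all-gap column.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return None
    n = len(residues)
    counts = Counter(residues)
    return sum((k / n) ** 2 for k in counts.values())


def alignment_variability(sequences, gap="-"):
    """Assumed definition: 1 - mean column homogeneity over non-empty columns.

    This is a toy stand-in; the paper's normalization by overall group
    diversity is not reproduced here.
    """
    if not sequences or len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    scores = [column_homogeneity(col, gap) for col in zip(*sequences)]
    scores = [s for s in scores if s is not None]
    if not scores:
        raise ValueError("alignment contains only gap columns")
    return 1.0 - sum(scores) / len(scores)


# Toy alignment (made-up sequences, for illustration only).
toy = ["MKLV", "MKIV", "MRLV", "MK-V"]
print(round(alignment_variability(toy), 4))
```

Under this assumed definition, a fully conserved alignment scores 0 and the score rises as columns become more mixed, which matches the qualitative sense of "variability" in the abstract.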
Affiliation(s)
- Svetlana Karamycheva
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
2. Molina RS, Rix G, Mengiste AA, Alvarez B, Seo D, Chen H, Hurtado J, Zhang Q, García-García JD, Heins ZJ, Almhjell PJ, Arnold FH, Khalil AS, Hanson AD, Dueber JE, Schaffer DV, Chen F, Kim S, Fernández LÁ, Shoulders MD, Liu CC. In vivo hypermutation and continuous evolution. Nature Reviews Methods Primers 2022; 2:37. [PMID: 37073402 PMCID: PMC10108624 DOI: 10.1038/s43586-022-00130-w] [Indexed: 11/09/2022]
Affiliation(s)
- Rosana S. Molina
  - Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
- Gordon Rix
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
- Amanuella A. Mengiste
  - Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Beatriz Alvarez
  - Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Daeje Seo
  - Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Haiqi Chen
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Juan Hurtado
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Qiong Zhang
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Jorge Donato García-García
  - Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo Mexico, C.P. 45138, Zapopan, Jalisco, Mexico
- Zachary J. Heins
  - Biological Design Center, Boston University, Boston, Massachusetts, USA
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Patrick J. Almhjell
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Frances H. Arnold
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Ahmad S. Khalil
  - Biological Design Center, Boston University, Boston, Massachusetts, USA
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA
- Andrew D. Hanson
  - Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- John E. Dueber
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
  - Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- David V. Schaffer
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
  - Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, CA, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Fei Chen
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Seokhee Kim
  - Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Luis Ángel Fernández
  - Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Matthew D. Shoulders
  - Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Chang C. Liu
  - Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
  - Department of Chemistry, University of California, Irvine, CA 92617, USA
3. Yi X, Khey J, Kazlauskas RJ, Travisano M. Plasmid hypermutation using a targeted artificial DNA replisome. Science Advances 2021; 7(29):eabg8712. [PMID: 34272238 PMCID: PMC8284885 DOI: 10.1126/sciadv.abg8712] [Received: 02/01/2021] [Accepted: 06/02/2021] [Indexed: 06/13/2023]
Abstract
Extensive exploration of a protein's sequence space for improved or new molecular functions requires in vivo evolution with large populations. But disentangling the evolution of a target protein from the rest of the proteome is challenging. Here, we designed a protein complex of a targeted artificial DNA replisome (TADR) that operates in live cells to processively replicate one strand of a plasmid with errors. It enhanced mutation rates of the target plasmid up to 2.3 × 10^5-fold with only a 78-fold increase in off-target mutagenesis. It was used to evolve itself to increase error rate and increase the efficiency of an efflux pump while simultaneously expanding the substrate repertoire. TADR enables multiple simultaneous substitutions to discover functions inaccessible by accumulating single substitutions, affording potential for solving hard problems in molecular evolution and developing biologic drugs and industrial catalysts.
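As a rough reading aid, the two fold-changes quoted in the abstract can be combined into a single on-target to off-target ratio. The snippet below is a minimal sketch of that arithmetic. The ratio is a derived, illustrative figure that the abstract does not report, and the variable names are hypothetical.

```python
# Fold increases in mutation rate quoted in the abstract.
on_target_fold = 2.3e5   # target plasmid, "up to" value
off_target_fold = 78.0   # off-target mutagenesis

# Derived ratio (not reported in the abstract): how much more the target
# plasmid's mutation rate rose than the off-target rate did.
selectivity = on_target_fold / off_target_fold
print(f"{selectivity:.0f}")  # prints 2949, i.e. roughly 2.9e3
```

Because the on-target figure is an upper bound ("up to"), this ratio is best read as an upper estimate as well.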
Affiliation(s)
- Xiao Yi
  - BioTechnology Institute, University of Minnesota, Minneapolis, MN, USA
- Joleen Khey
  - Department of Plant and Microbial Biology, University of Minnesota, Minneapolis, MN, USA
- Romas J Kazlauskas
  - BioTechnology Institute, University of Minnesota, Minneapolis, MN, USA
  - Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Michael Travisano
  - BioTechnology Institute, University of Minnesota, Minneapolis, MN, USA
  - Department of Ecology Evolution and Behavior, University of Minnesota, Minneapolis, MN, USA