1
Katriel G, Mahanaymi U, Brezner S, Kezel N, Koutschan C, Zeilberger D, Steel M, Snir S. Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory. Syst Biol 2023; 72:1403-1417. [PMID: 37862116 DOI: 10.1093/sysbio/syad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/01/2023] [Accepted: 10/05/2023] [Indexed: 10/22/2023] Open
Abstract
The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.
Affiliation(s)
- Guy Katriel
  - Department of Mathematics, Braude College of Engineering, Karmiel, Israel
- Udi Mahanaymi
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Shelly Brezner
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Noor Kezel
  - Department of Mathematics, University of Haifa, Haifa, Israel
- Doron Zeilberger
  - Department of Mathematics, Rutgers University, New Brunswick, NJ, USA
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
2
Shikov AE, Malovichko YV, Nizhnikov AA, Antonets KS. Current Methods for Recombination Detection in Bacteria. Int J Mol Sci 2022; 23:ijms23116257. [PMID: 35682936 PMCID: PMC9181119 DOI: 10.3390/ijms23116257] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/04/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 02/05/2023] Open
Abstract
The role of genetic exchanges, i.e., homologous recombination (HR) and horizontal gene transfer (HGT), in bacteria cannot be overestimated, for it is a pivotal mechanism leading to their evolution and adaptation; thus, tracking the signs of recombination and HGT events is important for both fundamental and applied science. To date, dozens of bioinformatics tools for revealing recombination signals are available; however, their pros and cons as well as the spectra of solvable tasks have not yet been systematically reviewed. Moreover, there are two major groups of software. One aims to infer evidence of HR, while the other only deals with HGT. However, despite seemingly different goals, all the methods use similar algorithmic approaches, and the processes are interconnected in terms of genomic evolution, influencing each other. In this review, we propose a classification of novel instruments for both HR and HGT detection based on the genomic consequences of recombination. In this context, we summarize available methodologies, paying particular attention to the type of traceable events for which a certain program has been designed.
Affiliation(s)
- Anton E. Shikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Yury V. Malovichko
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Anton A. Nizhnikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Kirill S. Antonets
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
3
Sevillya G. Relation between two evolutionary clocks reveal new insights in bacterial evolution. Access Microbiol 2022; 4:000265. [PMID: 35355876 PMCID: PMC8941958 DOI: 10.1099/acmi.0.000265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2021] [Accepted: 07/26/2021] [Indexed: 12/02/2022] Open
Abstract
New insights in evolution are available thanks to next-generation sequencing technologies in recent years. However, due to the network of complex relations between species, caused by the intensive horizontal gene transfer (HGT) between different bacterial species, it is difficult to uncover bacterial evolution. This difficulty leads to new research in the field of phylogeny, including gene-based phylogeny, in contrast to sequence-based phylogeny. In previous articles, we presented evolutionary insights from a Synteny Index (SI) study on a large biological dataset. We showed that the SI approach naturally clusters 1133 species into 39 cliques of closely related species. In addition, we presented a model that enables calculation of the number of translocation events between genomes based on their SI distance. Here, these two studies are combined and lead to new insights. A principal result is the relation between two evolutionary clocks: the well-known sequence-based clock influenced by point mutations, and the SI distance clock influenced by translocation events. A surprising linear relation between these two evolutionary clocks arises for closely related species across all genera. In other words, these two different clocks are ticking at the same rate inside the genus level. Conversely, a phase-transition pattern is found between these two clocks across species that are not closely related. This may suggest a new genus definition based on an analytic approach, since the phase transition occurs where each gene, on average, undergoes one translocation event. In addition, rare cases of HGT among highly conserved genes can be detected as outliers from the phase-transition pattern.
Affiliation(s)
- Gur Sevillya
  - Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel
4
Bansal MS. Deciphering Microbial Gene Family Evolution Using Duplication-Transfer-Loss Reconciliation and RANGER-DTL. Methods Mol Biol 2022; 2569:233-252. [PMID: 36083451 DOI: 10.1007/978-1-0716-2691-7_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Phylogenetic reconciliation has emerged as a principled, highly effective technique for investigating the origin, spread, and evolutionary history of microbial gene families. Proper application of phylogenetic reconciliation requires a clear understanding of potential pitfalls and sources of error, and knowledge of the most effective reconciliation-based tools and protocols to use to maximize accuracy. In this book chapter, we provide a brief overview of Duplication-Transfer-Loss (DTL) reconciliation, the standard reconciliation model used to study microbial gene families and provide a step-by-step computational protocol to maximize the accuracy of DTL reconciliation and minimize false-positive evolutionary inferences.
Affiliation(s)
- Mukul S Bansal
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA.
5
Puri A, Bajaj A, Lal S, Singh Y, Lal R. Phylogenomic Framework for Taxonomic Delineation of Paracoccus spp. and Exploration of Core-Pan Genome. Indian J Microbiol 2021; 61:180-194. [PMID: 33927459 DOI: 10.1007/s12088-021-00929-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/09/2021] [Accepted: 02/24/2021] [Indexed: 11/26/2022] Open
Abstract
The taxonomic classification of metabolically versatile Paracoccus spp. has been so far performed using polyphasic approach. The topology of single gene phylogenies, however, has highlighted ambiguous species assignments. In the present study, genome based multi-gene phylogenies and overall genome related index were used for species threshold assessment. Comprehensive phylogenomic analysis of Paracoccus genomes (n = 103) showed concordant clustering of strains across multi-gene marker set phylogenies (nMC = 0.08-0.14); as compared to 16S rDNA phylogeny (nMC = 0.37-0.42) suggesting robustness of multi gene phylogenies in drawing phylogenetic inferences. Functional gene content distribution across the genus showed that only 1.7% gene content constitutes the core genome highlighting the significance of extensive genomic variability in the evolution of Paracoccus spp. Further, genome metrics were used to validate characterized strains, identifying classification anomalies (n = 13), and based on this, genome derived taxonomic amendments were notified in present study. Conclusively, validated metric tools can be employed on whole genome sequences, including draft assemblies, for the assessment and assignment of uncharacterized strains and species level ascription of newly isolated Paracoccus strains in future.
Affiliation(s)
- Akshita Puri
  - Department of Zoology, University of Delhi, Delhi, India
  - Present Address: P.G.T.D, Zoology, R.T.M Nagpur University, Nagpur, 440033 India
- Abhay Bajaj
  - Department of Zoology, University of Delhi, Delhi, India
  - Present Address: EBGD, CSIR-National Environmental Engineering Research Institute (CSIR-NEERI), Nehru Marg, Nagpur, 440020 India
- Sukanya Lal
  - Present Address: Ramjas College, University of Delhi, Delhi, India
- Yogendra Singh
  - Department of Zoology, University of Delhi, Delhi, India
- Rup Lal
  - Department of Zoology, University of Delhi, Delhi, India
  - Present Address: The Energy and Resources Institute Darbari Seth Block, IHC Complex, Lodhi Road, New Delhi, 110003 India
6
Sevillya G, Doerr D, Lerner Y, Stoye J, Steel M, Snir S. Horizontal Gene Transfer Phylogenetics: A Random Walk Approach. Mol Biol Evol 2020; 37:1470-1479. [PMID: 31845962 DOI: 10.1093/molbev/msz302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results, assuming infinite size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned to 39 clusters by the rank of genus yields that the average number of genome dynamics events per gene in the phylogenetic depth of genus is around half with significant variability between genera. 
This result extends and confirms similar results obtained for individual genera in different manners.
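Illustrative sketch (not taken from the paper): the abstract above describes the SI as combining gene order and gene content over the genes two genomes share, but does not give a formula. The Python below assumes one common formulation: for each shared gene, take the 2k genes within k positions of it in each genome, measure the fraction of neighbors that coincide, and average over all shared genes. The function names, the circular-genome treatment, and the requirement that each gene label appears once per genome are assumptions made for this example only.

```python
from typing import Sequence, Set


def k_neighborhood(genome: Sequence[str], index: int, k: int) -> Set[str]:
    """Genes within k positions of genome[index], treating the genome as circular (assumption)."""
    n = len(genome)
    if n <= 2 * k:
        raise ValueError("genome must contain more than 2k genes")
    return {genome[(index + offset) % n] for offset in range(-k, k + 1) if offset != 0}


def synteny_index(genome_a: Sequence[str], genome_b: Sequence[str], k: int) -> float:
    """Average neighborhood overlap over the genes shared by both genomes (assumed SI definition)."""
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    if len(pos_a) != len(genome_a) or len(pos_b) != len(genome_b):
        raise ValueError("each gene label must appear once per genome in this sketch")
    common = pos_a.keys() & pos_b.keys()
    if not common:
        raise ValueError("genomes share no genes")
    total = 0.0
    for gene in common:
        neigh_a = k_neighborhood(genome_a, pos_a[gene], k)
        neigh_b = k_neighborhood(genome_b, pos_b[gene], k)
        # Fraction of the 2k neighbors that the two genomes have in common.
        total += len(neigh_a & neigh_b) / (2 * k)
    return total / len(common)


if __name__ == "__main__":
    # Unequal gene content: gene D in the first genome is replaced by X in the second.
    print(synteny_index(list("ABCDEF"), list("ABCXEF"), k=1))  # 0.8
```

Under this assumed definition, identical genomes score 1, and both rearrangements and gene gain or loss lower the score, which is the sense in which the measure reflects order and content together.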
Affiliation(s)
- Gur Sevillya
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Daniel Doerr
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Yael Lerner
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Jens Stoye
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
7
Sevillya G, Adato O, Snir S. Detecting horizontal gene transfer: a probabilistic approach. BMC Genomics 2020; 21:106. [PMID: 32138652 PMCID: PMC7057450 DOI: 10.1186/s12864-019-6395-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/07/2019] [Accepted: 12/12/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Horizontal gene transfer (HGT) is the event of a DNA sequence being transferred between species not by inheritance. HGT is a crucial factor in prokaryotic evolution and is a significant source for genomic novelty resulting in antibiotic resistance or the outbreak of virulent strains. Detection of HGT and of the mechanisms responsible for and enabling it is hence of prime importance. Existing algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from its recipient genome. Closely related species pose an even greater challenge, as most genes are very similar and therefore the phylogenetic signal is weak anyhow. Notwithstanding, the importance of detecting HGT between such organisms is extremely high because of the role of HGT in the emergence of new highly virulent strains. RESULTS In a recent work we devised a novel technique that relies on loss of synteny around a gene as a witness for HGT. We used a novel heuristic for synteny measurement, SI (Synteny Index), and the technique was tested on both simulated and real data and was found to provide a greater sensitivity than other HGT techniques. This synteny-based approach suffers from low specificity, in particular for more closely related species. Here we devise an adaptive approach to cope with this by varying the criteria according to species distance. The new approach is doubly adaptive as it also considers the lengths of the genes being transferred. In particular, we use the Chernoff bound to decree HGT both in simulations and in real bacterial genomes taken from the EggNOG database. CONCLUSIONS Here we show empirically that this approach is more conservative than the previous χ² based approach and provides a lower false positive rate, especially for closely related species and under a wide range of genome parameters.
Affiliation(s)
- Gur Sevillya
  - Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel
- Orit Adato
  - Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel
- Sagi Snir
  - Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel.
8
Sevillya G, Snir S. Synteny footprints provide clearer phylogenetic signal than sequence data for prokaryotic classification. Mol Phylogenet Evol 2019; 136:128-137. [DOI: 10.1016/j.ympev.2019.03.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/14/2018] [Revised: 03/07/2019] [Accepted: 03/17/2019] [Indexed: 01/22/2023]
9
Avni E, Montoya D, Lopez D, Modlin R, Pellegrini M, Snir S. A phylogenomic study quantifies competing mechanisms for pseudogenization in prokaryotes-The Mycobacterium leprae case. PLoS One 2018; 13:e0204322. [PMID: 30383852 PMCID: PMC6211624 DOI: 10.1371/journal.pone.0204322] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/26/2018] [Accepted: 09/06/2018] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Pseudogenes are non-functional sequences in the genome with homologous sequences that are functional (i.e. genes). They are abundant in eukaryotes where they have been extensively investigated, while in prokaryotes they are significantly scarcer and less well studied. Here we conduct a comprehensive analysis of the evolution of orthologs of Mycobacterium leprae pseudogenes in prokaryotes. The leprosy pathogen M. leprae is of particular interest since it contains an unusually large number of pseudogenes, comprising approximately 40% of its entire genome. The analysis is conducted in both broad and narrow phylogenetic ranges. RESULTS We have developed an informatics-based approach to characterize the evolution of pseudogenes. This approach combines tools from phylogenomics, genomics, and transcriptomics. The results we obtain are used to assess the contributions of two mechanisms for pseudogene formation: failed horizontal gene transfer events and disruption of native genes. CONCLUSIONS We conclude that, although it was reported that in most bacteria the former is most likely responsible for the majority of pseudogenization events, in mycobacteria, and in particular in M. leprae with its exceptionally high pseudogene numbers, the latter predominates. We believe that our study sheds new light on the evolution of pseudogenes in bacteria, by utilizing new methodologies that are applied to the unusually abundant M. leprae pseudogenes and their orthologs.
Affiliation(s)
- Eliran Avni
  - Dept. of Evolutionary Biology and the Institute of Evolution, University of Haifa, Haifa, Israel
- Dennis Montoya
  - Dept. of Molecular, Cell and Developmental Biology; University of California Los Angeles, Los Angeles, CA 90095, United States of America
- David Lopez
  - Dept. of Molecular, Cell and Developmental Biology; University of California Los Angeles, Los Angeles, CA 90095, United States of America
- Robert Modlin
  - Dept. of Microbiology, Immunology and Molecular Genetics, and Division of Dermatology, David Geffen School of Medicine University of California Los Angeles, Los Angeles, CA 90095, United States of America
- Matteo Pellegrini
  - Dept. of Molecular, Cell and Developmental Biology; University of California Los Angeles, Los Angeles, CA 90095, United States of America
- Sagi Snir
  - Dept. of Evolutionary Biology and the Institute of Evolution, University of Haifa, Haifa, Israel
10
Zhou X, Amir A, Guerra C, Landau G, Rossignac J. EDoP Distance Between Sets of Incomplete Permutations: Application to Bacteria Classification Based on Gene Order. J Comput Biol 2018; 25:1193-1202. [PMID: 30113868 DOI: 10.1089/cmb.2018.0063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
In this work, we extend measures of distance between permutations to support incomplete permutations. Modeling and comparing incomplete permutations are a challenging computational problem of practical importance in many applications in bioinformatics and social science. We show that the proposed distance measure admits a closed-form expression and can be efficiently computed on sets of permutations involving several missing elements. We demonstrate the proposed method on the classification of bacteria from different phyla based on gene order.
Affiliation(s)
- Xinrui Zhou
  - School of Interactive Computing, College of Computing, Georgia Tech TSRB, Atlanta, Georgia
- Amihood Amir
  - Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel
- Concettina Guerra
  - School of Interactive Computing, College of Computing, Georgia Tech TSRB, Atlanta, Georgia
- Gadi Landau
  - Department of Computer Science, University of Haifa, Haifa, Israel; NYU Tandon School of Engineering, New York University, New York, New York
- Jarek Rossignac
  - School of Interactive Computing, College of Computing, Georgia Tech TSRB, Atlanta, Georgia
11
Abstract
BACKGROUND Deciphering the history of life on Earth has long been regarded as one of the most central tasks in biology. In past years, widespread discordance between the evolutionary histories of different groups of orthologous genes of prokaryotes have been revealed, primarily due to horizontal gene transfers (HGTs). Nonetheless, evidence that support a strong tree-like signal of evolution have been uncovered, despite the presence of HGT events. Therefore, a challenging task is to distill this tree-like signal from the noise induced by all sources of non-tree-like events. RESULTS In this work we tackle this question, using real and simulated data. We first tighten a recent related theoretical result in this field. In a simulation study, we infer individual quartet topologies, and then use the inferred quartets to reconstruct simulated species trees. We demonstrate that accurate tree reconstruction is feasible despite surprisingly high rates of HGT. In a real data study, we construct phylogenies of two sets of prokaryotes, and show that our tree reconstruction scheme is comparable with (and complementary better than) other commonly used methods. CONCLUSIONS Using a blend of theoretical and empirical investigations, our study proves the feasibility of accurate quartet-based phylogenetic reconstruction, the vast impact of HGT events notwithstanding.
Affiliation(s)
- Eliran Avni
  - Department of Evolutionary Biology, University of Haifa, 199 Aba Khoushy Ave. Mount Carmel, Haifa, 3498838, Israel
- Sagi Snir
  - Department of Evolutionary Biology, University of Haifa, 199 Aba Khoushy Ave. Mount Carmel, Haifa, 3498838, Israel.
12
Predicting synonymous codon usage and optimizing the heterologous gene for expression in E. coli. Sci Rep 2017; 7:9926. [PMID: 28855614 PMCID: PMC5577221 DOI: 10.1038/s41598-017-10546-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 01/19/2017] [Accepted: 08/11/2017] [Indexed: 11/27/2022] Open
Abstract
Of the 20 common amino acids, 18 are encoded by multiple synonymous codons. These synonymous codons are not redundant; in fact, all codons contribute substantially to protein expression, structure and function. In this study, the codon usage pattern of genes in E. coli was learned from the sequenced genomes of E. coli. A machine learning based method, Presyncodon, was proposed to predict synonymous codon selection in E. coli based on the learned codon usage patterns of the residue in the context of the specific fragment. The prediction results indicate that Presyncodon could be used to predict synonymous codon selection of genes in E. coli with high accuracy. Two reporter genes (egfp and mApple) were designed with a combination of low- and high-frequency-usage codons by the method. The fluorescence intensity of eGFP and mApple expressed from the egfp and mApple genes designed by this method was about 2.3- or 1.7-fold greater than that from the genes with only high-frequency-usage codons in E. coli. Therefore, both low- and high-frequency-usage codons make positive contributions to the functional expression of heterologous proteins. This method could be used to design synthetic genes for heterologous gene expression in biotechnology.
13
Avni E, Snir S. Toxic genes present a unique phylogenetic signature. Mol Phylogenet Evol 2017; 116:141-148. [PMID: 28842276 DOI: 10.1016/j.ympev.2017.08.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/27/2017] [Revised: 08/17/2017] [Accepted: 08/17/2017] [Indexed: 10/19/2022]
Abstract
Horizontal gene transfer (HGT) is a major part of the evolution of Archaea and Bacteria, to the extent that the validity of the Tree of Life concept for prokaryotes has been seriously questioned. The patterns and routes of HGT remain a subject of intense study and debate. It was discovered that while several genes exhibit rampant HGT across the whole prokaryotic tree of life, others are lethal to certain organisms and therefore cannot be successfully transferred to them. We distinguish between these two classes of genes and show analytically that genes found to be toxic to a specific species (E. coli) also resist HGT in general. Several tools we employ show evidence to support that claim. One of those tools is the quartet plurality distribution (QPD), a mathematical tool that measures tendency to HGT over a large set of genes and species. When aggregated over a collection of genes, it can reveal important properties of this collection. We conclude that evidence of toxicity of certain genes to a wide variety of prokaryotes are revealed using the new tool of quartet plurality distribution.
Affiliation(s)
- Eliran Avni
  - Dept. of Evolutionary Biology, University of Haifa, Haifa 31905, Israel.
- Sagi Snir
  - Dept. of Evolutionary Biology, University of Haifa, Haifa 31905, Israel.
14
Snir S. Ordered orthology as a tool in prokaryotic evolutionary inference. Mob Genet Elements 2017; 6:e1120576. [PMID: 28090377 DOI: 10.1080/2159256x.2015.1120576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2015] [Revised: 10/27/2015] [Accepted: 11/10/2015] [Indexed: 10/22/2022] Open
Abstract
Molecular data is accumulated at exponentially increasing pace. This deluge of information should have brought us closer to resolving one of the most fundamental issues in biology - deciphering the history of life on Earth. So far, however, this abundance of data only seems to blur our understanding of the problem. This is largely due to horizontal gene transfer (HGT), the transfer of genetic material between evolutionarily unrelated organisms that transforms the prokaryotic tree into a network of relationships. Recently, we developed a method to infer evolutionary relationships among closely related species where the conventional evolutionary markers do not provide a strong enough signal. The method relies on the loss of synteny, gene order conservation among species that provides a stronger signal, sufficient to classify even strains of a given species. Here we elaborate on this method and suggest further uses of it in the context of detecting HGT events and genome architecture.
Affiliation(s)
- Sagi Snir
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
15
Yang WF, Yu ZG, Anh V. Whole genome/proteome based phylogeny reconstruction for prokaryotes using higher order Markov model and chaos game representation. Mol Phylogenet Evol 2015; 96:102-111. [PMID: 26724405 DOI: 10.1016/j.ympev.2015.12.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/02/2015] [Revised: 12/17/2015] [Accepted: 12/18/2015] [Indexed: 01/18/2023]
Abstract
Traditional methods for sequence comparison and phylogeny reconstruction rely on pairwise and multiple sequence alignments. But alignment could not be directly applied to whole genome/proteome comparison and phylogenomic studies due to its high computational complexity. Hence alignment-free methods became popular in recent years. Here we propose a fast alignment-free method for whole genome/proteome comparison and phylogeny reconstruction using a higher order Markov model and chaos game representation. In the present method, we use the transition matrices of higher order Markov models to characterize amino acid or DNA sequences for their comparison. The order of the Markov model is uniquely identified by maximizing the average Shannon entropy of conditional probability distributions. Using one-dimensional chaos game representation and a linked list, this method can reduce the large memory and time consumption caused by the large-scale conditional probability distributions. To illustrate the effectiveness of our method, we employ it for fast phylogeny reconstruction based on genome/proteome sequences of two species data sets used in previously published papers. Our results demonstrate that the present method is useful and efficient. AVAILABILITY AND IMPLEMENTATION The source codes for our algorithm to get the distance matrix and genome/proteome sequences can be downloaded from ftp://121.199.20.25/. The software Phylip and EvolView we used to construct phylogenetic trees can be obtained from their websites.
Affiliation(s)
- Wei-Feng Yang
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; Department of Mathematics and Physics, Hunan Institute of Engineering, Hunan 411104, PR China.
- Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
- Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
16
|
Adato O, Ninyo N, Gophna U, Snir S. Detecting Horizontal Gene Transfer between Closely Related Taxa. PLoS Comput Biol 2015; 11:e1004408. [PMID: 26439115 PMCID: PMC4595140 DOI: 10.1371/journal.pcbi.1004408] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/26/2014] [Accepted: 06/20/2015] [Indexed: 01/12/2023] Open
Abstract
Horizontal gene transfer (HGT), the transfer of genetic material between organisms, is crucial for genetic innovation and the evolution of genome architecture. Existing HGT detection algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from ancestral (vertically derived) genes in its recipient genome. Detecting HGT between closely related species or strains is challenging, as the phylogenetic signal is usually weak and the nucleotide composition is normally nearly identical. Nevertheless, detecting HGT between congeneric species or strains is of great importance, especially in clinical microbiology, where understanding the emergence of new virulent and drug-resistant strains is crucial and often time-sensitive. We developed a novel, self-contained technique named Near HGT, based on the synteny index, to measure the divergence of a gene from its native genomic environment and used it to identify candidate HGT events between closely related strains. The method confirms candidate transferred genes based on the constant relative mutability (CRM). Using CRM, the algorithm assigns a confidence score based on “unusual” sequence divergence. A gene exhibiting exceptional deviations according to both synteny and mutability criteria is considered a validated HGT product. We first applied the technique to a set of three E. coli strains and detected several highly probable horizontally acquired genes. We then compared the method to existing HGT detection tools using a larger strain data set. When combined with additional approaches, our new algorithm provides a richer picture and brings us closer to the goal of detecting all newly acquired genes in a particular strain.
The transfer of genetic material between organisms, usually denoted as horizontal (or lateral) gene transfer (HGT or LGT), is a prime mechanism in microbial evolution and responsible for genetic innovation and the evolution of genome architecture. Detecting HGT between closely related species or strains is imperative, as drug-resistant pathogenic strains most often acquire their virulence from closely related bacteria. The proposed method combines two evolutionary signals that were not employed in the past for this task. One is the synteny index (SI), measuring the loss of synteny in an organism, and the other is a novel concept, constant relative mutability (CRM), maintaining that genes preserve their relative evolution rate along lineages (although the latter ones may each change). We show both in simulation and real biological data that the method is sound and, in the cases examined, provides stronger sensitivity than existing methods. We therefore believe this novel approach represents a significant advance, for the first time enabling the detection of previously ignored HGT events that will bring us closer to the goal of detecting all newly acquired genes in a particular strain. Availability: The method is publicly available at http://research.haifa.ac.il/~ssagi/software/nearHGT.zip
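The abstract does not define the synteny index, so the following is only a sketch under an assumed definition: for each gene shared by two genomes, take the set of genes within k positions on either side in each genome (treated here as circular), and score the gene by the size of the intersection divided by 2k. The function names, the toy genomes, and the value of k are hypothetical, and the CRM confirmation step is not reproduced.

```python
def neighborhood(genome, idx, k):
    """Genes within k positions of genome[idx], treating the genome as circular."""
    n = len(genome)
    return {genome[(idx + d) % n] for d in range(-k, k + 1) if d != 0}


def synteny_index(g1, g2, k):
    """Return (mean score, per-gene scores) over genes shared by g1 and g2.

    Assumes each gene occurs once per genome and both genomes are longer than 2k.
    """
    pos1 = {g: i for i, g in enumerate(g1)}
    pos2 = {g: i for i, g in enumerate(g2)}
    shared = [g for g in g1 if g in pos2]
    scores = {
        g: len(neighborhood(g1, pos1[g], k) & neighborhood(g2, pos2[g], k)) / (2 * k)
        for g in shared
    }
    return sum(scores.values()) / len(scores), scores


# Hypothetical toy genomes: the second carries one inserted gene "x".
genome_a = list("abcdefgh")
genome_b = list("abcxdefgh")
mean_si, per_gene = synteny_index(genome_a, genome_b, k=2)
```

In this toy case the genes flanking the inserted gene receive lower per-gene scores than the rest, which is the kind of local signal a synteny-based screen would look for.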
Affiliation(s)
- Orit Adato
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Noga Ninyo
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Uri Gophna
- Department of Molecular Microbiology and Biotechnology Tel Aviv University, Tel-Aviv, Israel
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
17
|
House CH, Pellegrini M, Fitz-Gibbon ST. Genome-wide gene order distances support clustering the gram-positive bacteria. Front Microbiol 2015; 5:785. [PMID: 25653643 PMCID: PMC4299520 DOI: 10.3389/fmicb.2014.00785] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/01/2014] [Accepted: 12/21/2014] [Indexed: 11/29/2022] Open
Abstract
Initially using 143 genomes, we developed a method for calculating the pairwise distance between prokaryotic genomes using a Monte Carlo method to estimate the conservation of gene order. The method was based on repeatedly selecting five or six non-adjacent random orthologs from each of two genomes and determining if the chosen orthologs were in the same order. The raw distances were then corrected for gene order convergence using an adaptation of the Jukes-Cantor model, as well as using the common distance correction D' = -ln(1-D). First, we compared the distances found via the order of six orthologs to distances found based on ortholog gene content and small subunit rRNA sequences. The Jukes-Cantor gene order distances are reasonably well correlated with the divergence of rRNA (R² = 0.24), especially at rRNA Jukes-Cantor distances of less than 0.2 (R² = 0.52). Gene content is only weakly correlated with rRNA divergence (R² = 0.04) over all distances; however, it is especially strongly correlated at rRNA Jukes-Cantor distances of less than 0.1 (R² = 0.67). This initial work suggests that gene order may be useful in conjunction with other methods to help understand the relatedness of genomes. Using the gene order distances in 143 genomes, the relations of prokaryotes were studied using neighbor joining and agreement subtrees. We then repeated our study of the relations of prokaryotes using gene order in 172 complete genomes better representing a wider diversity of prokaryotes. Consistently, our trees show the Actinobacteria as a sister group to the bulk of the Firmicutes. In fact, the robustness of gene order support was found to be considerably greater for uniting these two phyla than for uniting any of the proteobacterial classes together. The results are supportive of the idea that Actinobacteria and Firmicutes are closely related, which in turn implies a single origin for the gram-positive cell.
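The sampling idea in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' procedure: genomes are treated as linear lists, "same order" means the sampled orthologs appear in the same left-to-right order in both genomes, the non-adjacency filter and the Jukes-Cantor adaptation are omitted, and the function names, defaults, and toy genomes are assumptions. Only the correction D' = -ln(1-D) is taken directly from the text.

```python
import math
import random


def gene_order_distance(g1, g2, sample_size=6, trials=2000, seed=0):
    """Fraction of random ortholog samples whose relative order differs."""
    rng = random.Random(seed)
    pos2 = {g: i for i, g in enumerate(g2)}
    shared = [g for g in g1 if g in pos2]  # shared genes, kept in g1 order
    differing = 0
    for _ in range(trials):
        # Sorted indices give the sample in g1 order.
        picks = sorted(rng.sample(range(len(shared)), sample_size))
        order_in_g2 = [pos2[shared[i]] for i in picks]
        if order_in_g2 != sorted(order_in_g2):
            differing += 1
    return differing / trials


def log_corrected(d):
    """Correction quoted in the abstract: D' = -ln(1 - D); requires d < 1."""
    return -math.log(1.0 - d)


# Hypothetical toy genomes: the second swaps the two halves of the first.
genome_a = list(range(50))
genome_b = genome_a[25:] + genome_a[:25]
raw = gene_order_distance(genome_a, genome_b)
corrected = log_corrected(raw)
```

The correction grows without bound as the raw distance approaches 1, so saturated pairs would need separate handling in any real use.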
Affiliation(s)
- Christopher H. House
- Penn State Astrobiology Research Center and Department of Geosciences, The Pennsylvania State University, University Park, PA, USA
- Matteo Pellegrini
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Molecular, Cell, and Developmental Biology, Institute of Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA, USA
- Sorel T. Fitz-Gibbon
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Molecular, Cell, and Developmental Biology, Institute of Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA, USA
18
|
Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks. mBio 2014; 5:e01867. [PMID: 25425232 PMCID: PMC4251990 DOI: 10.1128/mbio.01867-14] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/02/2022] Open
Abstract
The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis.
Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. It is important to know whether there is a set of genes that is not only conserved in position among isolates but also functionally essential for a given species, and to further evaluate the stability or flexibility of such genome structures across lineages. Based on a large number of multi-isolate pangenomic data, our analysis reveals that a subset of core genes is organized into a core-gene-defined genome organizational framework, or cGOF. Furthermore, the lineage-associated cGOFs among Gram-positive and Gram-negative bacteria behave differently: the former, composed of 2 to 4 segments, have their fragments symmetrically rearranged around the origin-terminus axis, whereas the latter show more complex segmentation and are partitioned asymmetrically into chromosomal structures. The definition of cGOFs provides new insights into prokaryotic genome organization and efficient guidance for genome assembly and analysis.