Reference Citation Analysis

For: Dufraigne C, Fertil B, Lespinats S, Giron A, Deschavanne P. Detection and characterization of horizontal transfers in prokaryotes using genomic signature. Nucleic Acids Res 2005;33:e6. [PMID: 15653627 PMCID: PMC546175 DOI: 10.1093/nar/gni004] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.8] [Indexed: 11/18/2022]

Cited by Other Article(s)

Hellmuth M, Stadler PF. The Theory of Gene Family Histories. Methods Mol Biol 2024;2802:1-32. [PMID: 38819554 DOI: 10.1007/978-1-0716-3838-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]

Bernaola-Galván P, Carpena P, Gómez-Martín C, Oliver JL. Compositional Structure of the Genome: A Review. Biology 2023;12:849. [PMID: 37372134 PMCID: PMC10295253 DOI: 10.3390/biology12060849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 06/06/2023] [Accepted: 06/07/2023] [Indexed: 06/29/2023]

Abstract

As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time by using powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales that range from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)) primarily attributed to the alternation of coding and noncoding regions, interspersed or tandem repeats densities, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The obtained isochore and superstructure coordinates in the first complete T2T human sequence are now shared in a public database. In this way, interested researchers can use T2T isochore data, as well as the annotations for different genome elements, to check a specific hypothesis about genome structure. Similarly to other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify the heterogeneity of such structure. The distribution of segment G+C content has recently been proposed as a new genome signature that proves to be useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons. 
Lastly, we review the recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for a driven progressive evolution of genome compositional structure.

de la Fuente R, Díaz-Villanueva W, Arnau V, Moya A. Genomic Signature in Evolutionary Biology: A Review. Biology 2023;12:322. [PMID: 36829597 PMCID: PMC9953303 DOI: 10.3390/biology12020322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]

Kern L, Abdeen SK, Kolodziejczyk AA, Elinav E. Commensal inter-bacterial interactions shaping the microbiota. Curr Opin Microbiol 2021;63:158-171. [PMID: 34365152 DOI: 10.1016/j.mib.2021.07.011] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/31/2021] [Revised: 07/15/2021] [Accepted: 07/16/2021] [Indexed: 12/14/2022]

Noroy C, Meyer DF. The super repertoire of type IV effectors in the pangenome of Ehrlichia spp. provides insights into host-specificity and pathogenesis. PLoS Comput Biol 2021;17:e1008788. [PMID: 34252087 PMCID: PMC8274917 DOI: 10.1371/journal.pcbi.1008788] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/15/2021] [Accepted: 05/26/2021] [Indexed: 11/28/2022]

Abstract

The identification of bacterial effectors is essential to understand how obligatory intracellular bacteria such as Ehrlichia spp. manipulate the host cell for survival and replication. Infection of mammals–including humans–by the intracellular pathogenic bacteria Ehrlichia spp. depends largely on the injection of virulence proteins that hijack host cell processes. Several hypothetical virulence proteins have been identified in Ehrlichia spp., but only one so far has been experimentally shown to translocate into host cells via the type IV secretion system. However, the current challenge is to identify most of the type IV effectors (T4Es) to fully understand their role in Ehrlichia spp. virulence and host adaptation. Here, we predict the T4E repertoires of four sequenced Ehrlichia spp. and four other Anaplasmataceae as comparative models (pathogenic Anaplasma spp. and Wolbachia endosymbiont) using previously developed S4TE 2.0 software. This analysis identified 579 predicted T4Es (228 pT4Es for Ehrlichia spp. only). The effector repertoires of Ehrlichia spp. overlapped, thereby defining a conserved core effectome of 92 predicted effectors shared by all strains. In addition, 69 species-specific T4Es were predicted with non-canonical GC% mostly in gene sparse regions of the genomes and we observed a bias in pT4Es according to host-specificity. We also identified new protein domain combinations, suggesting novel effector functions. This work presenting the predicted effector collection of Ehrlichia spp. can serve as a guide for future functional characterisation of effectors and design of alternative control strategies against these bacteria.

A fundamental step for the survival and replication of intravacuolar bacterial pathogens is the establishment of a replicative niche inside host cells by the secretion of bacterial effector proteins in the cytoplasm of the infected cells. These effectors manipulate host signaling pathways, thus allowing the bacteria to escape the host degradative pathway and take up the nutrients required for their intracellular replication. In this study, we used S4TE2.0 software for high-throughput computational prediction of bacterial type IV effectors in zoonotic bacteria of the Anaplasmataceae family. The analysis of protein architecture of effectors helped us to identify the cellular pathways targeted during the infection process. The demonstration that effectors are modular components with a broad variety of protein architectures nicely explains their pleiotropic mode of action and sheds light on their function. We showed that bacterial adaptation to a given host during evolution requires a minimal repertoire of candidate effectors, although further experimental determination is needed. T4Es are of increasing interest for basic research, including comprehension of hijacked cellular pathways, manipulated innate immunity, and application for therapeutics. Indeed, pathogenomics-driven studies, especially on genetically intractable intracellular bacteria such as Anaplasmataceae, now have a substantial impact on the development of host-targeted antimicrobials as an alternative to antibiotics.

Indirect identification of horizontal gene transfer. J Math Biol 2021;83:10. [PMID: 34218334 PMCID: PMC8254804 DOI: 10.1007/s00285-021-01631-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/16/2020] [Revised: 04/06/2021] [Accepted: 06/13/2021] [Indexed: 12/04/2022]

Tay AP, Hosking B, Hosking C, Bauer DC, Wilson LO. INSIDER: alignment-free detection of foreign DNA sequences. Comput Struct Biotechnol J 2021;19:3810-3816. [PMID: 34285780 PMCID: PMC8273350 DOI: 10.1016/j.csbj.2021.06.045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Revised: 06/28/2021] [Accepted: 06/28/2021] [Indexed: 11/21/2022]

Bize A, Midoux C, Mariadassou M, Schbath S, Forterre P, Da Cunha V. Exploring short k-mer profiles in cells and mobile elements from Archaea highlights the major influence of both the ecological niche and evolutionary history. BMC Genomics 2021;22:186. [PMID: 33726663 PMCID: PMC7962313 DOI: 10.1186/s12864-021-07471-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 02/24/2021] [Indexed: 12/16/2022]

Abstract

BACKGROUND

K-mer-based methods have greatly advanced in recent years, largely driven by the realization of their biological significance and by the advent of next-generation sequencing. Their speed and their independence from the annotation process are major advantages. Their utility in the study of the mobilome has recently emerged and they seem a priori adapted to the patchy gene distribution and the lack of universal marker genes of viruses and plasmids. To provide a framework for the interpretation of results from k-mer based methods applied to archaea or their mobilome, we analyzed the 5-mer DNA profiles of close to 600 archaeal cells, viruses and plasmids. Archaea is one of the three domains of life. Archaea seem enriched in extremophiles and are associated with a high diversity of viral and plasmid families, many of which are specific to this domain. We explored the dataset structure by multivariate and statistical analyses, seeking to identify the underlying factors.

RESULTS

For cells, the 5-mer profiles were inconsistent with the phylogeny of archaea. At a finer taxonomic level, the influence of the taxonomy and the environmental constraints on 5-mer profiles was very strong. These two factors were interdependent to a significant extent, and the respective weights of their contributions varied according to the clade. A convergent adaptation was observed for the class Halobacteria, for which a strong 5-mer signature was identified. For mobile elements, coevolution with the host had a clear influence on their 5-mer profile. This enabled us to identify one previously known and one new case of recent host transfer based on the atypical composition of the mobile elements involved. Beyond the effect of coevolution, extrachromosomal elements strikingly retain the specific imprint of their own viral or plasmid taxonomic family in their 5-mer profile.

CONCLUSION

This specific imprint confirms that the evolution of extrachromosomal elements is driven by multiple parameters and is not restricted to host adaptation. In addition, we detected only recent host transfer events, suggesting the fast evolution of short k-mer profiles. This calls for caution when using k-mers for host prediction, metagenomic binning or phylogenetic reconstruction.
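
As an illustration of what a short k-mer profile is, the following minimal Python sketch counts overlapping 5-mers in a DNA sequence and normalizes them to relative frequencies. It is not the pipeline used in the study above: the function names `kmer_profile` and `manhattan_distance` are hypothetical, only one strand is counted, and the Manhattan distance is an assumed, simple way of comparing two profiles.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=5):
    """Return the relative frequency of every possible DNA k-mer in seq.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored. Only the given strand is counted (an assumption made
    here for brevity).
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k possible k-mers so profiles are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


def manhattan_distance(p, q):
    """Sum of absolute frequency differences between two profiles."""
    return sum(abs(p[m] - q[m]) for m in p)


if __name__ == "__main__":
    a = kmer_profile("ACGTACGTACGGTTACGATCGATCGGCTA")
    b = kmer_profile("GGGCCCGGGCCCGGGCCCAAATTTGGGCC")
    print(len(a))                    # number of possible 5-mers
    print(manhattan_distance(a, b))  # larger values = more dissimilar
```

A profile computed this way has 4^5 = 1024 entries for k = 5, and the distance between a mobile element and a candidate host genome gives one rough indication of how atypical the element's composition is.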

Goussarov G, Cleenwerck I, Mysara M, Leys N, Monsieurs P, Tahon G, Carlier A, Vandamme P, Van Houdt R. PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics 2020;36:2337-2344. [PMID: 31899493 PMCID: PMC7178395 DOI: 10.1093/bioinformatics/btz964] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/19/2019] [Revised: 11/21/2019] [Accepted: 12/30/2019] [Indexed: 11/13/2022]

Zhou Y, Zhang W, Wu H, Huang K, Jin J. A high-resolution genomic composition-based method with the ability to distinguish similar bacterial organisms. BMC Genomics 2019;20:754. [PMID: 31638897 PMCID: PMC6805505 DOI: 10.1186/s12864-019-6119-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/26/2019] [Accepted: 09/20/2019] [Indexed: 12/03/2022]

Abstract

Background

Genomic composition has been found to be species specific and is used to differentiate bacterial species. To date, almost no published composition-based approaches are able to distinguish between most closely related organisms, including intra-genus species and intra-species strains. Thus, it is necessary to develop a novel approach to address this problem.

Results

Here, we initially determine that the “tetranucleotide-derived z-value Pearson correlation coefficient” (TETRA) approach is representative of other published statistical methods. Then, we devise a novel method called “Tetranucleotide-derived Z-value Manhattan Distance” (TZMD) and compare it with the TETRA approach. Our results show that TZMD reflects the maximal genome difference, while TETRA does not in most conditions, demonstrating in theory that TZMD provides improved resolution. Additionally, our analysis of real data shows that TZMD improves species differentiation and clearly differentiates similar organisms, including similar species belonging to the same genospecies, subspecies and intraspecific strains, most of which cannot be distinguished by TETRA. Furthermore, TZMD is able to determine clonal strains with the TZMD = 0 criterion, which intrinsically encompasses identical composition, high average nucleotide identity and high percentage of shared genomes.

Conclusions

Our extensive assessment demonstrates that TZMD has high resolution. This study is the first to propose a composition-based method for differentiating bacteria at the strain level and to demonstrate that composition is also strain specific. TZMD is a powerful tool and the first easy-to-use approach for differentiating clonal and non-clonal strains. Therefore, as the first composition-based algorithm for strain typing, TZMD will facilitate bacterial studies in the future.

Bernard G, Chan CX, Chan YB, Chua XY, Cong Y, Hogan JM, Maetschke SR, Ragan MA. Alignment-free inference of hierarchical and reticulate phylogenomic relationships. Brief Bioinform 2019;20:426-435. [PMID: 28673025 PMCID: PMC6433738 DOI: 10.1093/bib/bbx067] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 02/08/2017] [Revised: 05/04/2017] [Indexed: 11/22/2022]

Liu L, Anderson C, Pearl D, Edwards SV. Modern Phylogenomics: Building Phylogenetic Trees Using the Multispecies Coalescent Model. Methods Mol Biol 2019;1910:211-239. [PMID: 31278666 DOI: 10.1007/978-1-4939-9074-0_7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 06/09/2023]

Abstract

The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called "multispecies network coalescent" models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as "parsimony" or "democratic vote" approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single "supergene," were quickly shown to be inconsistent in some regions of tree space, in so far as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called "coalescent" methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model which can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test. 
Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.

Danchin A, Ouzounis C, Tokuyasu T, Zucker JD. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects. Microb Biotechnol 2018;11:588-605. [PMID: 29806194 PMCID: PMC6011933 DOI: 10.1111/1751-7915.13284] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022]

Ren J, Bai X, Lu YY, Tang K, Wang Y, Reinert G, Sun F. Alignment-Free Sequence Analysis and Applications. Annu Rev Biomed Data Sci 2018;1:93-114. [PMID: 31828235 PMCID: PMC6905628 DOI: 10.1146/annurev-biodatasci-080917-013431] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 01/06/2023]

Clasen FJ, Pierneef RE, Slippers B, Reva O. EuGI: a novel resource for studying genomic islands to facilitate horizontal gene transfer detection in eukaryotes. BMC Genomics 2018;19:323. [PMID: 29724163 PMCID: PMC5934851 DOI: 10.1186/s12864-018-4724-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/23/2017] [Accepted: 04/25/2018] [Indexed: 11/17/2022]

Tang K, Lu YY, Sun F. Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer. Front Microbiol 2018;9:711. [PMID: 29713314 PMCID: PMC5911508 DOI: 10.3389/fmicb.2018.00711] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/05/2018] [Accepted: 03/27/2018] [Indexed: 11/20/2022]

Horizontal acquisition of a hypoxia-responsive molybdenum cofactor biosynthesis pathway contributed to Mycobacterium tuberculosis pathoadaptation. PLoS Pathog 2017;13:e1006752. [PMID: 29176894 PMCID: PMC5720804 DOI: 10.1371/journal.ppat.1006752] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/05/2017] [Revised: 12/07/2017] [Accepted: 11/13/2017] [Indexed: 12/16/2022]

Abstract

The unique ability of the tuberculosis (TB) bacillus, Mycobacterium tuberculosis, to persist for long periods of time in lung hypoxic lesions chiefly contributes to the global burden of latent TB. We and others previously reported that the M. tuberculosis ancestor underwent massive episodes of horizontal gene transfer (HGT), mostly from environmental species. Here, we sought to explore whether such ancient HGT played a part in M. tuberculosis evolution towards pathogenicity. We were interested in an HGT-acquired M. tuberculosis-specific gene set, namely moaA1-D1, which is involved in the biosynthesis of the molybdenum cofactor. Horizontal acquisition of this gene set was striking because homologues of these moa genes are present all across the Mycobacterium genus, including in M. tuberculosis. Here, we discovered that, unlike their paralogues, the moaA1-D1 genes are strongly induced under hypoxia. In vitro, an M. tuberculosis moaA1-D1-null mutant has an impaired ability to respire nitrate, to enter dormancy and to survive in oxygen-limiting conditions. Conversely, heterologous expression of moaA1-D1 in the phylogenetically closest non-TB mycobacterium, Mycobacterium kansasii, which lacks these genes, improves its capacity to respire nitrate and grants it a marked ability to survive oxygen depletion. In vivo, the M. tuberculosis moaA1-D1-null mutant shows impaired survival in hypoxic granulomas in C3HeB/FeJ mice, but not in normoxic lesions in C57BL/6 animals. Collectively, our results identify a novel pathway required for M. tuberculosis resistance to host-imposed stress, namely hypoxia, and provide evidence that ancient HGT bolstered M. tuberculosis evolution from an environmental species towards a pervasive human-adapted pathogen.

Mycobacterium tuberculosis, the etiological agent of tuberculosis (TB), can persist for years and even decades in the lungs of its human host. Here we report that a unique M. tuberculosis gene cluster involved in the synthesis of the molybdenum cofactor, a cofactor for several oxidoreductases including the nitrate reductase, allows this major pathogen to respire nitrate and to persist in a dormant state under hypoxia, a stress condition encountered in lung TB lesions. Strikingly, the M. tuberculosis ancestor, which most likely was a harmless environmental bacterium, acquired this gene cluster, together with its hypoxia-responsive transcriptional regulator, horizontally from neighboring bacteria. Our results uncover a key step in M. tuberculosis evolution towards pathogenicity.

Gatherer D. Genome Signatures, Self-Organizing Maps and Higher Order Phylogenies: A Parametric Analysis. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]

Zielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol 2017;18:186. [PMID: 28974235 PMCID: PMC5627421 DOI: 10.1186/s13059-017-1319-7] [Citation(s) in RCA: 248] [Impact Index Per Article: 35.4] [Indexed: 01/30/2023]

Genomic Analysis of Calderihabitans maritimus KKC1, a Thermophilic, Hydrogenogenic, Carboxydotrophic Bacterium Isolated from Marine Sediment. Appl Environ Microbiol 2017;83:e00832-17. [PMID: 28526793 DOI: 10.1128/aem.00832-17] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 04/11/2017] [Accepted: 05/13/2017] [Indexed: 11/20/2022]

Abstract

Calderihabitans maritimus KKC1 is a thermophilic, hydrogenogenic carboxydotroph isolated from a submerged marine caldera. Here, we describe the de novo sequencing and feature analysis of the C. maritimus KKC1 genome. Genome-based phylogenetic analysis confirmed that C. maritimus KKC1 was most closely related to the genus Moorella, which includes well-studied acetogenic members. Comparative genomic analysis revealed that, like Moorella, C. maritimus KKC1 retained both the CO₂-reducing Wood-Ljungdahl pathway and energy-converting hydrogenase-based module activated by reduced ferredoxin, but it lacked the HydABC and NfnAB electron-bifurcating enzymes and pyruvate:ferredoxin oxidoreductase required for ferredoxin reduction for acetogenic growth. Furthermore, C. maritimus KKC1 harbored six genes encoding CooS, a catalytic subunit of the anaerobic CO dehydrogenase that can reduce ferredoxin via CO oxidation, whereas Moorella possessed only two CooS genes. Our analysis revealed that three cooS genes formed known gene clusters in other microorganisms, i.e., cooS-acetyl coenzyme A (acetyl-CoA) synthase (which contained a frameshift mutation), cooS-energy-converting hydrogenase, and cooF-cooS-FAD-NAD oxidoreductase, while the other three had novel genomic contexts. Sequence composition analysis indicated that these cooS genes likely evolved from a common ancestor. Collectively, these data suggest that C. maritimus KKC1 may be highly dependent on CO as a low-potential electron donor to directly reduce ferredoxin and may be more suited to carboxydotrophic growth compared to the acetogenic growth observed in Moorella, which show adaptation at a thermodynamic limit.

IMPORTANCE

Calderihabitans maritimus KKC1 and members of the genus Moorella are phylogenetically related but physiologically distinct. The former is a hydrogenogenic carboxydotroph that can grow on carbon monoxide (CO) with H₂ production, whereas the latter include acetogenic bacteria that grow on H₂ plus CO₂ with acetate production. Both species may require reduced ferredoxin as an actual "energy equivalent," but ferredoxin is a low-potential electron carrier and requires a high-energy substrate as an electron donor for reduction. Comparative genomic analysis revealed that C. maritimus KKC1 lacked specific electron-bifurcating enzymes and possessed six CO dehydrogenases, unlike Moorella species. This suggests that C. maritimus KKC1 may be more dependent on CO, a strong electron donor that can directly reduce ferredoxin via CO dehydrogenase, and may exhibit a survival strategy different from that of acetogenic Moorella, which solves the energetic barrier associated with endergonic reduction of ferredoxin with hydrogen.

Barros-Carvalho GA, Van Sluys MA, Lopes FM. An Efficient Approach to Explore and Discriminate Anomalous Regions in Bacterial Genomes Based on Maximum Entropy. J Comput Biol 2017;24:1125-1133. [PMID: 28570142 DOI: 10.1089/cmb.2017.0042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Cuecas A, Kanoksilapatham W, Gonzalez JM. Evidence of horizontal gene transfer by transposase gene analyses in Fervidobacterium species. PLoS One 2017;12:e0173961. [PMID: 28426805 PMCID: PMC5398504 DOI: 10.1371/journal.pone.0173961] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 11/17/2016] [Accepted: 03/01/2017] [Indexed: 11/25/2022]

Jain S, Panda A, Colson P, Raoult D, Pontarotti P. MimiLook: A Phylogenetic Workflow for Detection of Gene Acquisition in Major Orthologous Groups of Megavirales. Viruses 2017;9:72. [PMID: 28387730 PMCID: PMC5408678 DOI: 10.3390/v9040072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2017] [Revised: 04/03/2017] [Accepted: 04/03/2017] [Indexed: 12/20/2022]

Abstract

With the inclusion of new members, understanding about evolutionary mechanisms and processes by which members of the proposed order, Megavirales, have evolved has become a key area of interest. The central role of gene acquisition has been shown in previous studies. However, the major drawback in gene acquisition studies is the focus on few MV families or putative families with large variation in their genetic structure. Thus, here we have tried to develop a methodology by which we can detect horizontal gene transfers (HGTs), taking into consideration orthologous groups of distantly related Megavirale families. Here, we report an automated workflow MimiLook, prepared as a Perl command line program, that deduces orthologous groups (OGs) from ORFomes of Megavirales and constructs phylogenetic trees by performing alignment generation, alignment editing and protein-protein BLAST (BLASTP) searching across the National Center for Biotechnology Information (NCBI) non-redundant (nr) protein sequence database. Finally, this tool detects statistically validated events of gene acquisitions with the help of the T-REX algorithm by comparing individual gene tree with NCBI species tree. In between the steps, the workflow decides about handling paralogs, filtering outputs, identifying Megavirale-specific OGs, detection of HGTs, along with retrieval of information about those OGs that are monophyletic with organisms from cellular domains of life. By implementing MimiLook, we noticed that nine percent of Megavirale gene families (i.e., OGs) have been acquired by HGT, 80% of OGs were Megavirale-specific, eight percent were found to be sharing common ancestry with members of cellular domains (Eukaryote, Bacteria, Archaea, Phages or other viruses) and three percent were ambivalent. The results are briefly discussed to emphasize methodology. Also, MimiLook is relevant for detecting evolutionary scenarios in other targeted phyla with user defined modifications. It can be accessed at the following link: 10.6084/m9.figshare.4653622.

Jermiin LS, Jayaswal V, Ababneh FM, Robinson J. Identifying Optimal Models of Evolution. Methods Mol Biol 2017;1525:379-420. [PMID: 27896729 DOI: 10.1007/978-1-4939-6622-6_15] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 06/06/2023]

Maumus F, Blanc G. Study of Gene Trafficking between Acanthamoeba and Giant Viruses Suggests an Undiscovered Family of Amoeba-Infecting Viruses. Genome Biol Evol 2016;8:3351-3363. [PMID: 27811174 PMCID: PMC5203793 DOI: 10.1093/gbe/evw260] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/21/2016] [Indexed: 01/10/2023] Open

Lateral Gene Transfer in a Heavy Metal-Contaminated-Groundwater Microbial Community. mBio 2016;7:e02234-15. [PMID: 27048805 PMCID: PMC4817265 DOI: 10.1128/mbio.02234-15] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open

Abstract

Unraveling the drivers controlling the response and adaptation of biological communities to environmental change, especially anthropogenic activities, is a central but poorly understood issue in ecology and evolution. Comparative genomics studies suggest that lateral gene transfer (LGT) is a major force driving microbial genome evolution, but its role in the evolution of microbial communities remains elusive. To delineate the importance of LGT in mediating the response of a groundwater microbial community to heavy metal contamination, representative Rhodanobacter reference genomes were sequenced and compared to shotgun metagenome sequences. 16S rRNA gene-based amplicon sequence analysis indicated that Rhodanobacter populations were highly abundant in contaminated wells with low pHs and high levels of nitrate and heavy metals but remained rare in the uncontaminated wells. Sequence comparisons revealed that multiple geochemically important genes, including genes encoding Fe²⁺/Pb²⁺ permeases, most denitrification enzymes, and cytochrome c₅₅₃, were native to Rhodanobacter and not subjected to LGT. In contrast, the Rhodanobacter pangenome contained a recombinational hot spot in which numerous metal resistance genes were subjected to LGT and/or duplication. In particular, Co²⁺/Zn²⁺/Cd²⁺ efflux and mercuric resistance operon genes appeared to be highly mobile within Rhodanobacter populations. Evidence of multiple duplications of a mercuric resistance operon common to most Rhodanobacter strains was also observed. Collectively, our analyses indicated the importance of LGT during the evolution of groundwater microbial communities in response to heavy metal contamination, and a conceptual model was developed to display such adaptive evolutionary processes for explaining the extreme dominance of Rhodanobacter populations in the contaminated groundwater microbiome.

Lateral gene transfer (LGT), positive selection, and gene duplication are the three main mechanisms that drive adaptive evolution of microbial genomes and communities, but their relative importance is unclear. Some recent studies suggested that LGT is a major adaptive mechanism for microbial populations in response to changing environments, and hence, it could also be critical in shaping microbial community structure. However, direct evidence of LGT and its rates in extant natural microbial communities in response to changing environments is still lacking. The results presented in this study provide explicit evidence that LGT played a crucial role in driving the evolution of a groundwater microbial community in response to extreme heavy metal contamination. It appears that acquisition of genes critical for survival, growth, and reproduction via LGT is the most rapid and effective way to enable microorganisms and associated microbial communities to quickly adapt to abrupt harsh environmental stresses.

Karamichalis R, Kari L, Konstantinidis S, Kopecki S. An investigation into inter- and intragenomic variations of graphic genomic signatures. BMC Bioinformatics 2015;16:246. [PMID: 26249837 PMCID: PMC4527362 DOI: 10.1186/s12859-015-0655-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 12/19/2014] [Accepted: 06/30/2015] [Indexed: 11/30/2022]

Abstract

Background

Motivated by the general need to identify and classify species based on molecular evidence, genome comparisons have been proposed that are based on measuring mostly Euclidean distances between Chaos Game Representation (CGR) patterns of genomic DNA sequences.

Results

We provide, on an extensive dataset and using several different distances, confirmation of the hypothesis that CGR patterns are preserved along a genomic DNA sequence, and are different for DNA sequences originating from genomes of different species. This finding lends support to the theory that CGRs of genomic sequences can act as graphic genomic signatures. In particular, we compare the CGR patterns of over five hundred different 150,000 bp genomic sequences spanning one complete chromosome from each of six organisms, representing all kingdoms of life: H. sapiens (Animalia; chromosome 21), S. cerevisiae (Fungi; chromosome 4), A. thaliana (Plantae; chromosome 1), P. falciparum (Protista; chromosome 14), E. coli (Bacteria - full genome), and P. furiosus (Archaea - full genome). To maximize the diversity within each species, we also analyze the interrelationships within a set of over five hundred 150,000 bp genomic sequences sampled from the entire aforementioned genomes. Lastly, we provide some preliminary evidence of this method’s ability to classify genomic DNA sequences at lower taxonomic levels by comparing sequences sampled from the entire genome of H. sapiens (class Mammalia, order Primates) and of M. musculus (class Mammalia, order Rodentia), for a total length of approximately 174 million basepairs analyzed. We compute pairwise distances between CGRs of these genomic sequences using six different distances, and construct Molecular Distance Maps, which visualize all sequences as points in a two-dimensional or three-dimensional space, to simultaneously display their interrelationships.

Conclusion

Our analysis confirms, for this dataset, that CGR patterns of DNA sequences from the same genome are in general quantitatively similar, while being different for DNA sequences from genomes of different species. Our assessment of the performance of the six distances analyzed uses three different quality measures and suggests that several distances outperform the Euclidean distance, which has so far been almost exclusively used for such studies.
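
The comparison described above can be illustrated with a minimal sketch. It relies on the fact that a CGR image binned on a 2^k × 2^k grid is determined by the k-mer counts of the sequence, so the sketch compares normalized k-mer frequency vectors directly. The function names `kmer_frequencies` and `euclidean`, the default `k=3`, and the toy sequences are assumptions made for illustration; the study's own resolution, its six distances, and its quality measures are not reproduced here.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies in a fixed (lexicographic) order.

    Windows containing symbols other than A, C, G, T are skipped.
    This stands in for a frequency CGR at resolution 2^k x 2^k
    (illustrative choice, not the paper's exact representation).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy fragments (hypothetical data, far shorter than 150,000 bp).
    frag_1 = "ATGCGATACGCTTGAGGCTAAGCTTACG"
    frag_2 = "ATATATATTTAAATATTATATAAATTTA"
    d = euclidean(kmer_frequencies(frag_1), kmer_frequencies(frag_2))
    print(f"distance between fragments: {d:.4f}")
```

A pairwise distance matrix built this way is the kind of input that a multidimensional scaling step could turn into a two- or three-dimensional map; any other distance can be swapped in for `euclidean` without changing the rest of the sketch.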

Pierneef R, Cronje L, Bezuidt O, Reva ON. Pre_GI: a global map of ontological links between horizontally transferred genomic islands in bacterial and archaeal genomes. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015. [PMID: 26200753 PMCID: PMC5630688 DOI: 10.1093/database/bav058] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]

Thomas D, Finan C, Newport MJ, Jones S. DNA entropy reveals a significant difference in complexity between housekeeping and tissue specific gene promoters. Comput Biol Chem 2015;58:19-24. [PMID: 25988219 DOI: 10.1016/j.compbiolchem.2015.05.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/01/2014] [Revised: 05/01/2015] [Accepted: 05/01/2015] [Indexed: 10/23/2022]

Ravenhall M, Škunca N, Lassalle F, Dessimoz C. Inferring horizontal gene transfer. PLoS Comput Biol 2015;11:e1004095. [PMID: 26020646 PMCID: PMC4462595 DOI: 10.1371/journal.pcbi.1004095] [Citation(s) in RCA: 147] [Impact Index Per Article: 16.3] [Indexed: 12/27/2022]

Almagro G, Viale AM, Montero M, Rahimpour M, Muñoz FJ, Baroja-Fernández E, Bahaji A, Zúñiga M, González-Candelas F, Pozueta-Romero J. Comparative genomic and phylogenetic analyses of Gammaproteobacterial glg genes traced the origin of the Escherichia coli glycogen glgBXCAP operon to the last common ancestor of the sister orders Enterobacteriales and Pasteurellales. PLoS One 2015;10:e0115516. [PMID: 25607991 PMCID: PMC4301808 DOI: 10.1371/journal.pone.0115516] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/17/2014] [Accepted: 11/25/2014] [Indexed: 12/22/2022]

Abstract

Production of branched α-glucan, glycogen-like polymers is widely spread in the Bacteria domain. The glycogen pathway of synthesis and degradation has been fairly well characterized in the model enterobacterial species Escherichia coli (order Enterobacteriales, class Gammaproteobacteria), in which the cognate genes (branching enzyme glgB, debranching enzyme glgX, ADP-glucose pyrophosphorylase glgC, glycogen synthase glgA, and glycogen phosphorylase glgP) are clustered in a glgBXCAP operon arrangement. However, the evolutionary origin of this particular arrangement and of its constituent genes is unknown. Here, by using 265 complete gammaproteobacterial genomes we have carried out a comparative analysis of the presence, copy number and arrangement of glg genes in all lineages of the Gammaproteobacteria. These analyses revealed large variations in glg gene presence, copy number and arrangements among different gammaproteobacterial lineages. However, the glgBXCAP arrangement was remarkably conserved in all glg-possessing species of the orders Enterobacteriales and Pasteurellales (the E/P group). Subsequent phylogenetic analyses of glg genes present in the Gammaproteobacteria and in other main bacterial groups indicated that glg genes have undergone a complex evolutionary history in which horizontal gene transfer may have played an important role. These analyses also revealed that the E/P glgBXCAP genes (a) share a common evolutionary origin, (b) were vertically transmitted within the E/P group, and (c) are closely related to glg genes of some phylogenetically distant betaproteobacterial species. The overall data allowed tracing the origin of the E. coli glgBXCAP operon to the last common ancestor of the E/P group, and also to uncover a likely glgBXCAP transfer event from the E/P group to particular lineages of the Betaproteobacteria.

Metzler S, Kalinina OV. Detection of atypical genes in virus families using a one-class SVM. BMC Genomics 2014;15:913. [PMID: 25336138 PMCID: PMC4210486 DOI: 10.1186/1471-2164-15-913] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 08/13/2014] [Accepted: 10/10/2014] [Indexed: 12/22/2022]

Kupczok A, Bollback JP. Motif depletion in bacteriophages infecting hosts with CRISPR systems. BMC Genomics 2014;15:663. [PMID: 25103210 PMCID: PMC4246573 DOI: 10.1186/1471-2164-15-663] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 02/15/2014] [Accepted: 02/15/2014] [Indexed: 12/26/2022]

Vinga S. Information theory applications for biological sequence analysis. Brief Bioinform 2014;15:376-89. [PMID: 24058049 PMCID: PMC7109941 DOI: 10.1093/bib/bbt068] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 07/11/2013] [Accepted: 08/17/2013] [Indexed: 01/13/2023]

Peeters N, Carrère S, Anisimova M, Plener L, Cazalé AC, Genin S. Repertoire, unified nomenclature and evolution of the Type III effector gene set in the Ralstonia solanacearum species complex. BMC Genomics 2013;14:859. [PMID: 24314259 PMCID: PMC3878972 DOI: 10.1186/1471-2164-14-859] [Citation(s) in RCA: 139] [Impact Index Per Article: 12.6] [Received: 07/24/2013] [Accepted: 11/29/2013] [Indexed: 12/21/2022]

Taniguchi Y, Yamada Y, Maruyama O, Kuhara S, Ikeda D. The purity measure for genomic regions leads to horizontally transferred genes. J Bioinform Comput Biol 2013;11:1343002. [PMID: 24372031 DOI: 10.1142/s0219720013430026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Jeanniard A, Dunigan DD, Gurnon JR, Agarkova IV, Kang M, Vitek J, Duncan G, McClung OW, Larsen M, Claverie JM, Van Etten JL, Blanc G. Towards defining the chloroviruses: a genomic journey through a genus of large DNA viruses. BMC Genomics 2013;14:158. [PMID: 23497343 PMCID: PMC3602175 DOI: 10.1186/1471-2164-14-158] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 10/25/2012] [Accepted: 02/22/2013] [Indexed: 11/29/2022]

Abstract

Background

Giant viruses in the genus Chlorovirus (family Phycodnaviridae) infect eukaryotic green microalgae. The prototype member of the genus, Paramecium bursaria chlorella virus 1, was sequenced more than 15 years ago, and to date there are only 6 fully sequenced chloroviruses in public databases. Presented here are the draft genome sequences of 35 additional chloroviruses (287 – 348 Kb/319 – 381 predicted protein encoding genes) collected across the globe; they infect one of three different green algal species. These new data allowed us to analyze the genomic landscape of 41 chloroviruses, which revealed some remarkable features about these viruses.

Results

Genome colinearity, nucleotide conservation and phylogenetic affinity were limited to chloroviruses infecting the same host, confirming the validity of the three previously known subgenera. Clues for the existence of a fourth new subgenus indicate that the boundaries of chlorovirus diversity are not completely determined. Comparison of the chlorovirus phylogeny with that of the algal hosts indicates that chloroviruses have changed hosts in their evolutionary history. Reconstruction of the ancestral genome suggests that the last common chlorovirus ancestor had a slightly more diverse protein repertoire than modern chloroviruses. However, more than half of the defined chlorovirus gene families have a potential recent origin (after Chlorovirus divergence), among which a portion shows compositional evidence for horizontal gene transfer. Only a few of the putative acquired proteins had close homologs in databases raising the question of the true donor organism(s). Phylogenomic analysis identified only seven proteins whose genes were potentially exchanged between the algal host and the chloroviruses.

Conclusion

The present evaluation suggests that the pattern of genomic evolution in chloroviruses differs from that described in the related Poxviridae and Mimiviridae. Our study shows that the fixation of algal host genes has been anecdotal in the evolutionary history of chloroviruses. We finally discuss the incongruence between compositional evidence of horizontal gene transfer and the lack of closely related sequences in the databases, which suggests that the recently acquired genes originate from a still largely unsequenced reservoir of genomes, possibly other unknown viruses that infect the same hosts.

Le PT, Ramulu HG, Guijarro L, Paganini J, Gouret P, Chabrol O, Raoult D, Pontarotti P. An automated approach for the identification of horizontal gene transfers from complete genomes reveals the rhizome of Rickettsiales. BMC Evol Biol 2012;12:243. [PMID: 23234643 PMCID: PMC3575314 DOI: 10.1186/1471-2148-12-243] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/23/2012] [Accepted: 11/22/2012] [Indexed: 11/10/2022]

Liu L, Chen X, Skogerbø G, Zhang P, Chen R, He S, Huang DW. The human microbiome: A hot spot of microbial horizontal gene transfer. Genomics 2012;100:265-70. [DOI: 10.1016/j.ygeno.2012.07.012] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Received: 01/30/2012] [Revised: 07/06/2012] [Accepted: 07/16/2012] [Indexed: 12/19/2022]

Saini V, Raghuvanshi S, Khurana JP, Ahmed N, Hasnain SE, Tyagi AK, Tyagi AK. Massive gene acquisitions in Mycobacterium indicus pranii provide a perspective on mycobacterial evolution. Nucleic Acids Res 2012;40:10832-50. [PMID: 22965120 PMCID: PMC3505973 DOI: 10.1093/nar/gks793] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 11/12/2022]

Elhai J, Liu H, Taton A. Detection of horizontal transfer of individual genes by anomalous oligomer frequencies. BMC Genomics 2012;13:245. [PMID: 22702893 PMCID: PMC3497702 DOI: 10.1186/1471-2164-13-245] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/29/2011] [Accepted: 05/18/2012] [Indexed: 11/10/2022]

Zhai Z, Reinert G, Song K, Waterman MS, Luan Y, Sun F. Normal and compound poisson approximations for pattern occurrences in NGS reads. J Comput Biol 2012;19:839-54. [PMID: 22697250 PMCID: PMC3375642 DOI: 10.1089/cmb.2012.0029] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]

Abstract

Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped to the genomes even if the genome sequences are known, alternative analytical methods are needed for the study of NGS data. Here we suggest using word patterns to analyze NGS data. Word pattern counting (the study of the probabilistic distribution of the number of occurrences of word patterns in one or multiple long sequences) has played an important role in molecular sequence analysis. However, no studies are available on the distribution of the number of occurrences of word patterns in NGS reads. In this article, we build probabilistic models for the background sequence and the sampling process of the sequence reads from the genome. Based on the models, we provide normal and compound Poisson approximations for the number of occurrences of word patterns from the sequence reads, with bounds on the approximation error. The main challenge is to consider the randomness in generating the long background sequence, as well as in the sampling of the reads using NGS. We show the accuracy of these approximations under a variety of conditions for different patterns with various characteristics. Under realistic assumptions, the compound Poisson approximation seems to outperform the normal approximation in most situations. These approximate distributions can be used to evaluate the statistical significance of the occurrence of patterns from NGS data. The theory and the computational algorithm for calculating the approximate distributions are then used to analyze ChIP-Seq data using transcription factor GABP. 
Software is available online (www-rcf.usc.edu/~fsun/Programs/NGS_motif_power/NGS_motif_power.html). In addition, Supplementary Material can be found online (www.liebertonline.com/cmb).

Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis. Algorithms Mol Biol 2012;7:10. [PMID: 22551152 PMCID: PMC3402988 DOI: 10.1186/1748-7188-7-10] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 01/25/2012] [Accepted: 05/02/2012] [Indexed: 01/06/2023]

Abstract

Background

Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved for numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within a distance of 2^-L of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations.

Results

The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm.

Conclusions

The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
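
The suffix property mentioned in the Background can be checked with a short sketch of the classic CGR iteration: each nucleotide moves the current point halfway towards its corner of the unit square. The corner assignment, the starting point (0.5, 0.5), the function names, and the use of the maximum coordinate difference as the distance are assumptions chosen for illustration; the article's own conventions may differ.

```python
# Corner assignment for the unit square (an assumed, commonly used layout).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_point(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after reading the whole sequence.

    Each step moves the point halfway towards the corner of the
    current nucleotide, so later symbols dominate the position.
    """
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
    return x, y


def chebyshev(p, q):
    """Maximum coordinate difference between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


if __name__ == "__main__":
    # Two hypothetical sequences sharing the 4-long suffix "TGCA".
    p = cgr_point("ACGTTGCA")
    q = cgr_point("GGGGTGCA")
    shared = 4
    print(chebyshev(p, q), "<=", 2 ** -shared)
```

Every shared trailing symbol halves the gap between the two points, which is why a common suffix of length L bounds the per-coordinate gap by 2^-L in this sketch.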

Ménigaud S, Mallet L, Picord G, Churlaud C, Borrel A, Deschavanne P. GOHTAM: a website for 'Genomic Origin of Horizontal Transfers, Alignment and Metagenomics'. Bioinformatics 2012;28:1270-1. [PMID: 22426345 PMCID: PMC3338014 DOI: 10.1093/bioinformatics/bts118] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 01/20/2023]

Frenkel S, Kirzhner V, Korol A. Organizational heterogeneity of vertebrate genomes. PLoS One 2012;7:e32076. [PMID: 22384143 PMCID: PMC3288070 DOI: 10.1371/journal.pone.0032076] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/15/2011] [Accepted: 01/23/2012] [Indexed: 01/06/2023]

Soares SC, Abreu VAC, Ramos RTJ, Cerdeira L, Silva A, Baumbach J, Trost E, Tauch A, Hirata R, Mattos-Guaraldi AL, Miyoshi A, Azevedo V. PIPS: pathogenicity island prediction software. PLoS One 2012;7:e30848. [PMID: 22355329 PMCID: PMC3280268 DOI: 10.1371/journal.pone.0030848] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 07/12/2011] [Accepted: 12/22/2011] [Indexed: 01/08/2023]

Azad RK, Lawrence JG. Detecting laterally transferred genes. Methods Mol Biol 2012;855:281-308. [PMID: 22407713 DOI: 10.1007/978-1-61779-582-4_10] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]

Anderson CNK, Liu L, Pearl D, Edwards SV. Tangled trees: the challenge of inferring species trees from coalescent and noncoalescent genes. Methods Mol Biol 2012;856:3-28. [PMID: 22399453 DOI: 10.1007/978-1-61779-585-5_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 05/31/2023]

Bezuidt O, Pierneef R, Mncube K, Lima-Mendez G, Reva ON. Mainstreams of horizontal gene exchange in enterobacteria: consideration of the outbreak of enterohemorrhagic E. coli O104:H4 in Germany in 2011. PLoS One 2011;6:e25702. [PMID: 22022434 PMCID: PMC3195076 DOI: 10.1371/journal.pone.0025702] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 07/14/2011] [Accepted: 09/08/2011] [Indexed: 11/18/2022]

Abstract

BACKGROUND

Escherichia coli O104:H4 caused a severe outbreak in Europe in 2011. The strain TY-2482 sequenced from this outbreak allowed the discovery of its closest relatives but failed to resolve the ways in which it originated and evolved. Given this, may we expect similar outbreaks to occur recurrently or spontaneously in the future? The inability to answer these questions shows the limitations of current comparative and evolutionary genomics methods.

PRINCIPAL FINDINGS

The study revealed oscillations of gene exchange in enterobacteria, which originated from marine γ-Proteobacteria. These mobile genetic elements have become recombination hotspots and effective 'vehicles' ensuring a wide distribution of successful combinations of fitness and virulence genes among enterobacteria. Two remarkable peculiarities of the strain TY-2482 and its relatives were observed: i) these strains retained their genetic primitiveness, as they somehow avoided the main fluxes of horizontal gene transfer which effectively penetrated other enterobacteria; ii) acquisition of antibiotic resistance genes in a plasmid genomic island of β-Proteobacteria origin which ontologically is unrelated to the predominant genomic islands of enterobacteria.

CONCLUSIONS

Oscillations of horizontal gene exchange activity were reported which result from a counterbalance between the acquired resistance of bacteria towards existing mobile vectors and the generation of new vectors in the environmental microflora. We hypothesized that TY-2482 may originate from a genetically primitive lineage of E. coli that has evolved in confined geographical areas and brought by human migration or cattle trade onto an intersection of several independent streams of horizontal gene exchange. Development of a system for monitoring the new and most active gene exchange events was proposed.

Bioinformatic analysis reveals high diversity of bacterial genes for laccase-like enzymes. PLoS One 2011;6:e25724. [PMID: 22022440 PMCID: PMC3192119 DOI: 10.1371/journal.pone.0025724] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 06/07/2011] [Accepted: 09/09/2011] [Indexed: 11/19/2022]