Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bohlin J, Skjerve E, Ussery DW. Investigations of oligonucleotide usage variance within and between prokaryotes. PLoS Comput Biol 2008;4:e1000057. [PMID: 18421372 PMCID: PMC2289840 DOI: 10.1371/journal.pcbi.1000057] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2007] [Accepted: 03/12/2008] [Indexed: 11/18/2022] Open

For:	Bohlin J, Skjerve E, Ussery DW. Investigations of oligonucleotide usage variance within and between prokaryotes. PLoS Comput Biol 2008;4:e1000057. [PMID: 18421372 PMCID: PMC2289840 DOI: 10.1371/journal.pcbi.1000057] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2007] [Accepted: 03/12/2008] [Indexed: 11/18/2022] Open

Number

Cited by Other Article(s)

Bruneaux M, Kronholm I, Ashrafi R, Ketola T. Roles of adenine methylation and genetic mutations in adaptation to different temperatures in Serratia marcescens. Epigenetics 2021;17:861-881. [PMID: 34519613 DOI: 10.1080/15592294.2021.1966215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022] Open

Duan B, Ding P, Navarre WW, Liu J, Xia B. Xenogeneic Silencing and Bacterial Genome Evolution: Mechanisms for DNA Recognition Imply Multifaceted Roles of Xenogeneic Silencers. Mol Biol Evol 2021;38:4135-4148. [PMID: 34003286 PMCID: PMC8476142 DOI: 10.1093/molbev/msab136] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Revised: 04/08/2021] [Indexed: 12/14/2022] Open

Bize A, Midoux C, Mariadassou M, Schbath S, Forterre P, Da Cunha V. Exploring short k-mer profiles in cells and mobile elements from Archaea highlights the major influence of both the ecological niche and evolutionary history. BMC Genomics 2021;22:186. [PMID: 33726663 PMCID: PMC7962313 DOI: 10.1186/s12864-021-07471-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Accepted: 02/24/2021] [Indexed: 12/16/2022] Open

Abstract

BACKGROUND

K-mer-based methods have greatly advanced in recent years, largely driven by the realization of their biological significance and by the advent of next-generation sequencing. Their speed and their independence from the annotation process are major advantages. Their utility in the study of the mobilome has recently emerged and they seem a priori adapted to the patchy gene distribution and the lack of universal marker genes of viruses and plasmids. To provide a framework for the interpretation of results from k-mer based methods applied to archaea or their mobilome, we analyzed the 5-mer DNA profiles of close to 600 archaeal cells, viruses and plasmids. Archaea is one of the three domains of life. Archaea seem enriched in extremophiles and are associated with a high diversity of viral and plasmid families, many of which are specific to this domain. We explored the dataset structure by multivariate and statistical analyses, seeking to identify the underlying factors.

RESULTS

For cells, the 5-mer profiles were inconsistent with the phylogeny of archaea. At a finer taxonomic level, the influence of the taxonomy and the environmental constraints on 5-mer profiles was very strong. These two factors were interdependent to a significant extent, and the respective weights of their contributions varied according to the clade. A convergent adaptation was observed for the class Halobacteria, for which a strong 5-mer signature was identified. For mobile elements, coevolution with the host had a clear influence on their 5-mer profile. This enabled us to identify one previously known and one new case of recent host transfer based on the atypical composition of the mobile elements involved. Beyond the effect of coevolution, extrachromosomal elements strikingly retain the specific imprint of their own viral or plasmid taxonomic family in their 5-mer profile.

CONCLUSION

This specific imprint confirms that the evolution of extrachromosomal elements is driven by multiple parameters and is not restricted to host adaptation. In addition, we detected only recent host transfer events, suggesting the fast evolution of short k-mer profiles. This calls for caution when using k-mers for host prediction, metagenomic binning or phylogenetic reconstruction.

Collapse

Azlan A, Obeidat SM, Theva Das K, Yunus MA, Azzam G. Genome-wide identification of Aedes albopictus long noncoding RNAs and their association with dengue and Zika virus infection. PLoS Negl Trop Dis 2021;15:e0008351. [PMID: 33481791 PMCID: PMC7872224 DOI: 10.1371/journal.pntd.0008351] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2020] [Revised: 02/09/2021] [Accepted: 11/20/2020] [Indexed: 12/14/2022] Open

Abstract

The Asian tiger mosquito, Aedes albopictus (Ae. albopictus), is an important vector that transmits arboviruses such as dengue (DENV), Zika (ZIKV) and Chikungunya virus (CHIKV). Long noncoding RNAs (lncRNAs) are known to regulate various biological processes. Knowledge on Ae. albopictus lncRNAs and their functional role in virus-host interactions are still limited. Here, we identified and characterized the lncRNAs in the genome of an arbovirus vector, Ae. albopictus, and evaluated their potential involvement in DENV and ZIKV infection. We used 148 public datasets, and identified a total of 10, 867 novel lncRNA transcripts, of which 5,809, 4,139, and 919 were intergenic, intronic and antisense respectively. The Ae. albopictus lncRNAs shared many characteristics with other species such as short length, low GC content, and low sequence conservation. RNA-sequencing of Ae. albopictus cells infected with DENV and ZIKV showed that the expression of lncRNAs was altered upon virus infection. Target prediction analysis revealed that Ae. albopictus lncRNAs may regulate the expression of genes involved in immunity and other metabolic and cellular processes. To verify the role of lncRNAs in virus infection, we generated mutations in lncRNA loci using CRISPR-Cas9, and discovered that two lncRNA loci mutations, namely XLOC_029733 (novel lncRNA transcript id: lncRNA_27639.2) and LOC115270134 (known lncRNA transcript id: XR_003899061.1) resulted in enhancement of DENV and ZIKV replication. The results presented here provide an important foundation for future studies of lncRNAs and their relationship with virus infection in Ae. albopictus.

Ae. albopictus is an important vector of arboviruses such as dengue and Zika viruses. Studies on virus-host interaction at gene expression and molecular level are crucial especially in devising methods to inhibit virus replication in Aedes mosquitoes. Previous reports have shown that, besides protein-coding genes, noncoding RNAs such as lncRNAs are also involved in virus-host interaction. In this study, we report a comprehensive catalog of novel lncRNA transcripts in the genome of Ae. albopictus. We also show that the expression of lncRNAs was altered upon infection with dengue and Zika. Additionally, depletion of certain lncRNAs resulted in increased replication of dengue and Zika; hence, suggesting potential association of lncRNAs in virus infection. Results of this study provide a new avenue to the investigation of mosquito-virus interactions, especially in the aspect of noncoding genes.

Collapse

Revisiting the Relationships Between Genomic G + C Content, RNA Secondary Structures, and Optimal Growth Temperature. J Mol Evol 2020;89:165-171. [PMID: 33216148 DOI: 10.1007/s00239-020-09974-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2020] [Accepted: 11/09/2020] [Indexed: 10/23/2022]

Zhou Y, Zhang W, Wu H, Huang K, Jin J. A high-resolution genomic composition-based method with the ability to distinguish similar bacterial organisms. BMC Genomics 2019;20:754. [PMID: 31638897 PMCID: PMC6805505 DOI: 10.1186/s12864-019-6119-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2019] [Accepted: 09/20/2019] [Indexed: 12/03/2022] Open

Abstract

Background

Genomic composition has been found to be species specific and is used to differentiate bacterial species. To date, almost no published composition-based approaches are able to distinguish between most closely related organisms, including intra-genus species and intra-species strains. Thus, it is necessary to develop a novel approach to address this problem.

Results

Here, we initially determine that the “tetranucleotide-derived z-value Pearson correlation coefficient” (TETRA) approach is representative of other published statistical methods. Then, we devise a novel method called “Tetranucleotide-derived Z-value Manhattan Distance” (TZMD) and compare it with the TETRA approach. Our results show that TZMD reflects the maximal genome difference, while TETRA does not in most conditions, demonstrating in theory that TZMD provides improved resolution. Additionally, our analysis of real data shows that TZMD improves species differentiation and clearly differentiates similar organisms, including similar species belonging to the same genospecies, subspecies and intraspecific strains, most of which cannot be distinguished by TETRA. Furthermore, TZMD is able to determine clonal strains with the TZMD = 0 criterion, which intrinsically encompasses identical composition, high average nucleotide identity and high percentage of shared genomes.

Conclusions

Our extensive assessment demonstrates that TZMD has high resolution. This study is the first to propose a composition-based method for differentiating bacteria at the strain level and to demonstrate that composition is also strain specific. TZMD is a powerful tool and the first easy-to-use approach for differentiating clonal and non-clonal strains. Therefore, as the first composition-based algorithm for strain typing, TZMD will facilitate bacterial studies in the future.

Collapse

Gophna U. The unbearable ease of expression-how avoidance of spurious transcription can shape G+C content in bacterial genomes. FEMS Microbiol Lett 2019;365:5181332. [PMID: 30423131 DOI: 10.1093/femsle/fny267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2018] [Accepted: 11/11/2018] [Indexed: 12/28/2022] Open

Azlan A, Obeidat SM, Yunus MA, Azzam G. Systematic identification and characterization of Aedes aegypti long noncoding RNAs (lncRNAs). Sci Rep 2019;9:12147. [PMID: 31434910 PMCID: PMC6704130 DOI: 10.1038/s41598-019-47506-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2019] [Accepted: 07/18/2019] [Indexed: 12/14/2022] Open

Krawczyk PS, Lipinski L, Dziembowski A. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Res 2019;46:e35. [PMID: 29346586 PMCID: PMC5887522 DOI: 10.1093/nar/gkx1321] [Citation(s) in RCA: 276] [Impact Index Per Article: 55.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2017] [Accepted: 12/28/2017] [Indexed: 12/14/2022] Open

Bohlin J, Pettersson JHO. Evolution of Genomic Base Composition: From Single Cell Microbes to Multicellular Animals. Comput Struct Biotechnol J 2019;17:362-370. [PMID: 30949307 PMCID: PMC6429543 DOI: 10.1016/j.csbj.2019.03.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Revised: 02/28/2019] [Accepted: 03/01/2019] [Indexed: 01/07/2023] Open

Almpanis A, Swain M, Gatherer D, McEwan N. Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Microb Genom 2018;4:e000168. [PMID: 29633935 PMCID: PMC5989581 DOI: 10.1099/mgen.0.000168] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2017] [Accepted: 03/06/2018] [Indexed: 02/06/2023] Open

Akhter S, Aziz RK, Kashef MT, Ibrahim ES, Bailey B, Edwards RA. Kullback Leibler divergence in complete bacterial and phage genomes. PeerJ 2017;5:e4026. [PMID: 29204318 PMCID: PMC5712468 DOI: 10.7717/peerj.4026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 10/22/2017] [Indexed: 12/11/2022] Open

Systematic Identification and Molecular Characteristics of Long Noncoding RNAs in Pig Tissues. BIOMED RESEARCH INTERNATIONAL 2017;2017:6152582. [PMID: 29062838 PMCID: PMC5618743 DOI: 10.1155/2017/6152582] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/19/2017] [Revised: 07/26/2017] [Accepted: 08/08/2017] [Indexed: 12/15/2022]

Bohlin J, Eldholm V, Pettersson JHO, Brynildsrud O, Snipen L. The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes. BMC Genomics 2017;18:151. [PMID: 28187704 PMCID: PMC5303225 DOI: 10.1186/s12864-017-3543-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2016] [Accepted: 02/02/2017] [Indexed: 12/02/2022] Open

Abstract

Background

The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions.

Results

We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes.

Conclusion

The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes is most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selective neutral processes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-3543-7) contains supplementary material, which is available to authorized users.

Collapse

Díez-Vives C, Moitinho-Silva L, Nielsen S, Reynolds D, Thomas T. Expression of eukaryotic-like protein in the microbiome of sponges. Mol Ecol 2017;26:1432-1451. [PMID: 28036141 DOI: 10.1111/mec.14003] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2016] [Revised: 12/08/2016] [Accepted: 12/09/2016] [Indexed: 01/04/2023]

Samchenko AA, Kiselev SS, Kabanov AV, Kondratjev MS, Komarov VM. On the nature of the domination of oligomeric (dA:dT) n tracts in the structure of eukaryotic genomes. Biophysics (Nagoya-shi) 2016. [DOI: 10.1134/s0006350916060233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Mehmood T, Bohlin J, Snipen L. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015;12:560-567. [PMID: 26357267 DOI: 10.1109/tcbb.2014.2366146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Broin PÓ, Smith TJ, Golden AA. Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach. BMC Bioinformatics 2015;16:22. [PMID: 25627106 PMCID: PMC4384390 DOI: 10.1186/s12859-015-0450-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2014] [Accepted: 01/02/2015] [Indexed: 11/10/2022] Open

Abstract

Background

Familial binding profiles (FBPs) represent the average binding specificity for a group of structurally related DNA-binding proteins. The construction of such profiles allows the classification of novel motifs based on similarity to known families, can help to reduce redundancy in motif databases and de novo prediction algorithms, and can provide valuable insights into the evolution of binding sites. Many current approaches to automated motif clustering rely on progressive tree-based techniques, and can suffer from so-called frozen sub-alignments, where motifs which are clustered early on in the process remain ‘locked’ in place despite the potential for better placement at a later stage. In order to avoid this scenario, we have developed a genetic-k-medoids approach which allows motifs to move freely between clusters at any point in the clustering process.

Results

We demonstrate the performance of our algorithm, GMACS, on multiple benchmark motif datasets, comparing results obtained with current leading approaches. The first dataset includes 355 position weight matrices from the TRANSFAC database and indicates that the k-mer frequency vector approach used in GMACS outperforms other motif comparison techniques. We then cluster a set of 79 motifs from the JASPAR database previously used in several motif clustering studies and demonstrate that GMACS can produce a higher number of structurally homogeneous clusters than other methods without the need for a large number of singletons. Finally, we show the robustness of our algorithm to noise on multiple synthetic datasets consisting of known motifs convolved with varying degrees of noise.

Conclusions

Our proposed algorithm is generally applicable to any DNA or protein motifs, can produce highly stable and biologically meaningful clusters, and, by avoiding the problem of frozen sub-alignments, can provide improved results when compared with existing techniques on benchmark datasets.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0450-2) contains supplementary material, which is available to authorized users.

Collapse

Bohlin J, Brynildsrud OB, Sekse C, Snipen L. An evolutionary analysis of genome expansion and pathogenicity in Escherichia coli. BMC Genomics 2014;15:882. [PMID: 25297974 PMCID: PMC4200225 DOI: 10.1186/1471-2164-15-882] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2014] [Accepted: 09/29/2014] [Indexed: 12/27/2022] Open

Abstract

BACKGROUND

There are several studies describing loss of genes through reductive evolution in microbes, but how selective forces are associated with genome expansion due to horizontal gene transfer (HGT) has not received similar attention. The aim of this study was therefore to examine how selective pressures influence genome expansion in 53 fully sequenced and assembled Escherichia coli strains. We also explored potential connections between genome expansion and the attainment of virulence factors. This was performed using estimations of several genomic parameters such as AT content, genomic drift (measured using relative entropy), genome size and estimated HGT size, which were subsequently compared to analogous parameters computed from the core genome consisting of 1729 genes common to the 53 E. coli strains. Moreover, we analyzed how selective pressures (quantified using relative entropy and dN/dS), acting on the E. coli core genome, influenced lineage and phylogroup formation.

RESULTS

Hierarchical clustering of dS and dN estimations from the E. coli core genome resulted in phylogenetic trees with topologies in agreement with known E. coli taxonomy and phylogroups. High values of dS, compared to dN, indicate that the E. coli core genome has been subjected to substantial purifying selection over time; significantly more than the non-core part of the genome (p<0.001). This is further supported by a linear association between strain-wise dS and dN values (β = 26.94 ± 0.44, R2~0.98, p<0.001). The non-core part of the genome was also significantly more AT-rich (p<0.001) than the core genome and E. coli genome size correlated with estimated HGT size (p<0.001). In addition, genome size (p<0.001), AT content (p<0.001) as well as estimated HGT size (p<0.005) were all associated with the presence of virulence factors, suggesting that pathogenicity traits in E. coli are largely attained through HGT. No associations were found between selective pressures operating on the E. coli core genome, as estimated using relative entropy, and genome size (p~0.98).

CONCLUSIONS

On a larger time frame, genome expansion in E. coli, which is significantly associated with the acquisition of virulence factors, appears to be independent of selective forces operating on the core genome.

Collapse

Fullmer MS, Soucy SM, Swithers KS, Makkay AM, Wheeler R, Ventosa A, Gogarten JP, Papke RT. Population and genomic analysis of the genus Halorubrum. Front Microbiol 2014;5:140. [PMID: 24782836 PMCID: PMC3990103 DOI: 10.3389/fmicb.2014.00140] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2014] [Accepted: 03/18/2014] [Indexed: 11/13/2022] Open

Bohlin J, Brynildsrud O, Vesth T, Skjerve E, Ussery DW. Amino acid usage is asymmetrically biased in AT- and GC-rich microbial genomes. PLoS One 2013;8:e69878. [PMID: 23922837 PMCID: PMC3724673 DOI: 10.1371/journal.pone.0069878] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2013] [Accepted: 06/14/2013] [Indexed: 11/18/2022] Open

Abstract

INTRODUCTION

Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates.

RESULTS

We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB.

CONCLUSION

Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.

Collapse

Dutta C, Paul S. Microbial lifestyle and genome signatures. Curr Genomics 2012;13:153-62. [PMID: 23024607 PMCID: PMC3308326 DOI: 10.2174/138920212799860698] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2011] [Revised: 09/13/2011] [Accepted: 09/28/2011] [Indexed: 12/29/2022] Open

Bohlin J, van Passel MWJ, Snipen L, Kristoffersen AB, Ussery D, Hardy SP. Relative entropy differences in bacterial chromosomes, plasmids, phages and genomic islands. BMC Genomics 2012;13:66. [PMID: 22325062 PMCID: PMC3305612 DOI: 10.1186/1471-2164-13-66] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2011] [Accepted: 02/10/2012] [Indexed: 11/10/2022] Open

Lamelas A, Gosalbes MJ, Manzano-Marín A, Peretó J, Moya A, Latorre A. Serratia symbiotica from the aphid Cinara cedri: a missing link from facultative to obligate insect endosymbiont. PLoS Genet 2011;7:e1002357. [PMID: 22102823 PMCID: PMC3213167 DOI: 10.1371/journal.pgen.1002357] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2011] [Accepted: 09/10/2011] [Indexed: 02/07/2023] Open

Zheng H, Wu H. Short prokaryotic DNA fragment binning using a hierarchical classifier based on linear discriminant analysis and principal component analysis. J Bioinform Comput Biol 2011;8:995-1011. [PMID: 21121023 DOI: 10.1142/s0219720010005051] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2010] [Accepted: 07/31/2010] [Indexed: 11/18/2022]

Zheng H, Wu H. Gene-centric association analysis for the correlation between the guanine-cytosine content levels and temperature range conditions of prokaryotic species. BMC Bioinformatics 2010;11 Suppl 11:S7. [PMID: 21172057 PMCID: PMC3024870 DOI: 10.1186/1471-2105-11-s11-s7] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

Background

The environment has been playing an instrumental role in shaping and maintaining the morphological, physiological and biochemical diversities of prokaryotes. It has been debatable whether the whole-genome Guanine-Cytosine (GC) content levels of prokaryotic organisms are correlated with their optimal growth temperatures. Since the GC content is variable within a genome, we here focus on the correlation between the genic GC content levels and the temperature range conditions of prokaryotic organisms.

Results

The GC content levels in the coding regions of four genes were consistently identified as correlated with the temperature range condition when the association analysis was applied to (i) the 722 mesophilic and 93 thermophilic/hyperthermophilic organisms regardless of their phylogeny, oxygen requirement, salinity, or habitat conditions, and (ii) partial lists of organisms when organisms with certain phylogeny, oxygen requirement, salinity or habitat conditions were excluded. These four genes are K01251 (adenosylhomocysteinase), K03724 (DNA repair and recombination proteins), K07588 (LAO/AO transport system kinase), and K09122 (hypothetical protein).

To further validate the identified correlation relationships, we examined to what extent the temperature range condition of an organism can be predicted based on the GC content levels in the coding regions of the selected genes. The 84.52% accuracy for the complete genomes, the 84.09% accuracy for the in-progress genomes, and 82.70% accuracy for the metagenomes, especially when being compared to the 50% accuracy rendered by random guessing, suggested that the temperature range condition of a prokaryotic organism can generally be predicted based on the GC content levels of the selected genomic regions.

Conclusions

The results rendered by various statistical tests and prediction tests indicated that the GC content levels of the coding/non-coding regions of certain genes are highly likely to be correlated with the temperature range conditions of prokaryotic organisms. Therefore, it is promising to carry out “reverse ecology” and to complete the ecological characterizations of prokaryotic organisms, i.e., to infer their temperature range conditions based on the GC content levels of certain genomic regions.

Collapse

Kelley DR, Salzberg SL. Clustering metagenomic sequences with interpolated Markov models. BMC Bioinformatics 2010;11:544. [PMID: 21044341 PMCID: PMC3098094 DOI: 10.1186/1471-2105-11-544] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2010] [Accepted: 11/02/2010] [Indexed: 01/16/2023] Open

Dutta A, Paul S, Dutta C. GC-rich intra-operonic spacers in prokaryotes: Possible relation to gene order conservation. FEBS Lett 2010;584:4633-8. [DOI: 10.1016/j.febslet.2010.10.037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2010] [Revised: 10/12/2010] [Accepted: 10/15/2010] [Indexed: 11/28/2022]

Rangannan V, Bansal M. High-quality annotation of promoter regions for 913 bacterial genomes. ACTA ACUST UNITED AC 2010;26:3043-50. [PMID: 20956245 DOI: 10.1093/bioinformatics/btq577] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]

Bohlin J, Snipen L, Cloeckaert A, Lagesen K, Ussery D, Kristoffersen AB, Godfroid J. Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods. BMC Evol Biol 2010;10:249. [PMID: 20707916 PMCID: PMC2928237 DOI: 10.1186/1471-2148-10-249] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2010] [Accepted: 08/13/2010] [Indexed: 11/30/2022] Open

Abstract

Background

Classification of bacteria within the genus Brucella has been difficult due in part to considerable genomic homogeneity between the different species and biovars, in spite of clear differences in phenotypes. Therefore, many different methods have been used to assess Brucella taxonomy. In the current work, we examine 32 sequenced genomes from genus Brucella representing the six classical species, as well as more recently described species, using bioinformatical methods. Comparisons were made at the level of genomic DNA using oligonucleotide based methods (Markov chain based genomic signatures, genomic codon and amino acid frequencies based comparisons) and proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses).

Results

We found that the oligonucleotide based methods gave different results compared to that of the proteome based methods. Differences were also found between the oligonucleotide based methods used. Whilst the Markov chain based genomic signatures grouped the different species in genus Brucella according to host preference, the codon and amino acid frequencies based methods reflected small differences between the Brucella species. Only minor differences could be detected between all genera included in this study using the codon and amino acid frequencies based methods.

Proteome comparisons were found to be in strong accordance with current Brucella taxonomy indicating a remarkable association between gene gain or loss on one hand and mutations in marker genes on the other. The proteome based methods found greater similarity between Brucella species and Ochrobactrum species than between species within genus Agrobacterium compared to each other. In other words, proteome comparisons of species within genus Agrobacterium were found to be more diverse than proteome comparisons between species in genus Brucella and genus Ochrobactrum. Pan-genomic analyses indicated that uptake of DNA from outside genus Brucella appears to be limited.

Conclusions

While both the proteome based methods and the Markov chain based genomic signatures were able to reflect environmental diversity between the different species and strains of genus Brucella, the genomic codon and amino acid frequencies based comparisons were not found adequate for such comparisons. The proteome comparison based phylogenies of the species in genus Brucella showed a surprising consistency with current Brucella taxonomy.

Collapse

Bohlin J, Snipen L, Hardy SP, Kristoffersen AB, Lagesen K, Dønsvik T, Skjerve E, Ussery DW. Analysis of intra-genomic GC content homogeneity within prokaryotes. BMC Genomics 2010;11:464. [PMID: 20691090 PMCID: PMC3091660 DOI: 10.1186/1471-2164-11-464] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2010] [Accepted: 08/06/2010] [Indexed: 11/10/2022] Open

Davenport C, Ussery DW, Tümmler B. Comparative genomics of green sulfur bacteria. PHOTOSYNTHESIS RESEARCH 2010;104:137-152. [PMID: 20099081 DOI: 10.1007/s11120-009-9515-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2009] [Accepted: 12/07/2009] [Indexed: 05/28/2023]

Davenport CF, Tümmler B. Abundant oligonucleotides common to most bacteria. PLoS One 2010;5:e9841. [PMID: 20352124 PMCID: PMC2843746 DOI: 10.1371/journal.pone.0009841] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2009] [Accepted: 03/03/2010] [Indexed: 11/25/2022] Open

Association analysis of the general environmental conditions and prokaryotes' gene distributions in various functional groups. Genomics 2010;96:27-38. [PMID: 20338234 DOI: 10.1016/j.ygeno.2010.03.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2009] [Revised: 03/14/2010] [Accepted: 03/16/2010] [Indexed: 12/15/2022]

Abstract

The activities of prokaryotes are pivotal in shaping the environment, and at the same time are greatly influenced by the environment. By using the genomic data and environmental descriptions of the complete prokaryotic genomes in NCBI's Microbial Genome Project Database and applying statistical methods, we have identified in a systematic manner those gene groups whose presence/frequency patterns are different for organisms of different environmental conditions. Here environmental conditions are characterized in four dimensions--salinity, oxygen requirement, habitat and temperature, and are based on the controlled vocabularies that NCBI's Microbial Genome Project database uses to specify the organism information; and, gene groups are determined as Clusters of Orthologous Groups (COG) and KEGG Orthology (KO) groups. These identified COG and KO groups are considered as potentially correlated with certain environmental conditions, and are then mapped to the COG general categories and KEGG pathways to determine which part of the functional machinery of prokaryotic cells are correlated with the environments. The observations derived from the analysis of the COG and KO groups that are potentially correlated with the oxygen requirement and habitat conditions are in general consistent with existing studies on properties of organisms living in different conditions of these two environmental factors. To further assess the identified correlation relationships, we have also examined whether the environmental conditions are predictable based on the gene distributions in the selected COG and KO groups. The misclassification rates of the prediction experiments are much smaller than that rendered by random guessing, indicating the existence of the correlation relationships between organisms' environmental conditions and gene distributions in certain functional groups. However, the rather moderate misclassification rates (the 25- and 75-percentiles of the misclassification rates of all prediction experiments are 16.79% and 24.06%, respectively) also indicate that the correlation relationships between environmental conditions and gene distributions in certain functional groups are not strong enough for one to decisively define the other.

Collapse

Perry SC, Beiko RG. Distinguishing microbial genome fragments based on their composition: evolutionary and comparative genomic perspectives. Genome Biol Evol 2010;2:117-31. [PMID: 20333228 PMCID: PMC2839357 DOI: 10.1093/gbe/evq004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/19/2010] [Indexed: 01/23/2023] Open

Abstract

It is well known that patterns of nucleotide composition vary within and among genomes, although the reasons why these variations exist are not completely understood. Between-genome compositional variation has been exploited to assign environmental shotgun sequences to their most likely originating genomes, whereas within-genome variation has been used to identify recently acquired genetic material such as pathogenicity islands. Recent sequence assignment techniques have achieved high levels of accuracy on artificial data sets, but the relative difficulty of distinguishing lineages with varying degrees of relatedness, and different types of genomic sequence, has not been examined in depth. We investigated the compositional differences in a set of 774 sequenced microbial genomes, finding rapid divergence among closely related genomes, but also convergence of compositional patterns among genomes with similar habitats. Support vector machines were then used to distinguish all pairs of genomes based on genome fragments 500 nucleotides in length. The nearly 300,000 accuracy scores obtained from these trials were used to construct general models of distinguishability versus taxonomic and compositional indices of genomic divergence. Unusual genome pairs were evident from their large residuals relative to the fitted model, and we identified several factors including genome reduction, putative lateral genetic transfer, and habitat convergence that influence the distinguishability of genomes. The positional, compositional, and functional context of a fragment within a genome has a strong influence on its likelihood of correct classification, but in a way that depends on the taxonomic and ecological similarity of the comparator genome.

Collapse

Examination of genome homogeneity in prokaryotes using genomic signatures. PLoS One 2009;4:e8113. [PMID: 19956556 PMCID: PMC2781299 DOI: 10.1371/journal.pone.0008113] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2009] [Accepted: 11/05/2009] [Indexed: 01/17/2023] Open

Abstract

BACKGROUND

DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a "genomic signature." The genomic signatures can be used to phylogenetically classify organisms from arbitrary sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0(th) order Markov model) as well as genomic signatures normalized by smaller DNA words (1(st) and 2(nd) order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors.

PRINCIPAL FINDINGS

Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

CONCLUSIONS

Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.

Collapse

Bohlin J, Skjerve E, Ussery DW. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering. BMC Genomics 2009;10:487. [PMID: 19845945 PMCID: PMC2770534 DOI: 10.1186/1471-2164-10-487] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2009] [Accepted: 10/21/2009] [Indexed: 11/26/2022] Open

Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP, Banfield JF. Community-wide analysis of microbial genome sequence signatures. Genome Biol 2009;10:R85. [PMID: 19698104 PMCID: PMC2745766 DOI: 10.1186/gb-2009-10-8-r85] [Citation(s) in RCA: 373] [Impact Index Per Article: 24.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2009] [Revised: 07/10/2009] [Accepted: 08/21/2009] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome evolution and potential application in classification of metagenomic sequence fragments. However, little is known regarding the distribution of genome signatures in natural microbial communities or the extent to which environmental factors shape them.

RESULTS

We analyzed metagenomic sequence data from two acidophilic biofilm communities, including composite genomes reconstructed for nine archaea, three bacteria, and numerous associated viruses, as well as thousands of unassigned fragments from strain variants and low-abundance organisms. Genome signatures, in the form of tetranucleotide frequencies analyzed by emergent self-organizing maps, segregated sequences from all known populations sharing < 50 to 60% average amino acid identity and revealed previously unknown genomic clusters corresponding to low-abundance organisms and a putative plasmid. Signatures were pervasive genome-wide. Clusters were resolved because intra-genome differences resulting from translational selection or protein adaptation to the intracellular (pH approximately 5) versus extracellular (pH approximately 1) environment were small relative to inter-genome differences. We found that these genome signatures stem from multiple influences but are primarily manifested through codon composition, which we propose is the result of genome-specific mutational biases.

CONCLUSIONS

An important conclusion is that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities. Thus, genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.

Collapse

Sekse C, Muniesa M, Wasteson Y. Conserved Stx2 phages from Escherichia coli O103:H25 isolated from patients suffering from hemolytic uremic syndrome. Foodborne Pathog Dis 2009;5:801-10. [PMID: 19014273 DOI: 10.1089/fpd.2008.0130] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open

TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics 2009;10:56. [PMID: 19210774 PMCID: PMC2653487 DOI: 10.1186/1471-2105-10-56] [Citation(s) in RCA: 142] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2008] [Accepted: 02/11/2009] [Indexed: 02/03/2023] Open

Abstract

Background

Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay solid basis on our understanding of the entire living world. However, the taxonomic classification is an essential task in the analysis of metagenomics data sets that it is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning.

Results

Our novel strategy was extensively evaluated using the leave-one-out cross validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy until rank class. For longer fragments ≥ 3 Kbp accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, thus classifying such fragments to its known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparable high specificity results up to rank class and low false negative rates are also obtained.

Conclusion

An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, accurate and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method demonstrated to be competitive when compared to the most current classifier PhyloPythia and has the advantage that it can be locally installed and the reference set can be kept up-to-date.

Collapse

Davenport CF, Wiehlmann L, Reva ON, Tümmler B. Visualization of Pseudomonas genomic structure by abundant 8-14mer oligonucleotides. Environ Microbiol 2009;11:1092-104. [PMID: 19161433 DOI: 10.1111/j.1462-2920.2008.01839.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]