Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Johnson ZI, Chisholm SW. Properties of overlapping genes are conserved across microbial genomes. Genome Res 2004;14:2268-72. [PMID: 15520290 PMCID: PMC525685 DOI: 10.1101/gr.2433104] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2004] [Revised: 08/12/2004] [Indexed: 11/25/2022]

Cited by Other Article(s)

Wasik BR, Turner PE. On the Biological Success of Viruses. Annu Rev Microbiol 2013;67:519-41. [DOI: 10.1146/annurev-micro-090110-102833] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Verma M, Lal D, Kaur J, Saxena A, Kaur J, Anand S, Lal R. Phylogenetic analyses of phylum Actinobacteria based on whole genome sequences. Res Microbiol 2013;164:718-28. [DOI: 10.1016/j.resmic.2013.04.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2012] [Accepted: 03/26/2013] [Indexed: 11/25/2022]

Behura SK, Severson DW. Overlapping genes of Aedes aegypti: evolutionary implications from comparison with orthologs of Anopheles gambiae and other insects. BMC Evol Biol 2013;13:124. [PMID: 23777277 PMCID: PMC3689595 DOI: 10.1186/1471-2148-13-124] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2012] [Accepted: 06/12/2013] [Indexed: 11/11/2022] Open

Giessen T, von Tesmar A, Marahiel M. Insights into the Generation of Structural Diversity in a tRNA-Dependent Pathway for Highly Modified Bioactive Cyclic Dipeptides. Chem Biol 2013;20:828-38. [DOI: 10.1016/j.chembiol.2013.04.017] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2013] [Revised: 04/25/2013] [Accepted: 04/30/2013] [Indexed: 01/08/2023]

Comparative analysis of the genomes of Clostera anastomosis (L.) granulovirus and Clostera anachoreta granulovirus. Arch Virol 2013;158:2109-14. [PMID: 23649176 DOI: 10.1007/s00705-013-1710-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2013] [Accepted: 03/26/2013] [Indexed: 10/26/2022]

Predicting statistical properties of open reading frames in bacterial genomes. PLoS One 2012;7:e45103. [PMID: 23028785 PMCID: PMC3454372 DOI: 10.1371/journal.pone.0045103] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2012] [Accepted: 08/14/2012] [Indexed: 11/26/2022] Open

Ho MR, Tsai KW, Lin WC. A unified framework of overlapping genes: towards the origination and endogenic regulation. Genomics 2012;100:231-9. [PMID: 22766524 DOI: 10.1016/j.ygeno.2012.06.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2012] [Revised: 06/21/2012] [Accepted: 06/25/2012] [Indexed: 11/27/2022]

Ogilvie LA, Caplin J, Dedi C, Diston D, Cheek E, Bowler L, Taylor H, Ebdon J, Jones BV. Comparative (meta)genomic analysis and ecological profiling of human gut-specific bacteriophage φB124-14. PLoS One 2012;7:e35053. [PMID: 22558115 PMCID: PMC3338817 DOI: 10.1371/journal.pone.0035053] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2011] [Accepted: 03/08/2012] [Indexed: 12/30/2022] Open

Abstract

Bacteriophage associated with the human gut microbiome are likely to have an important impact on community structure and function, and provide a wealth of biotechnological opportunities. Despite this, knowledge of the ecology and composition of bacteriophage in the gut bacterial community remains poor, with few well characterized gut-associated phage genomes currently available. Here we describe the identification and in-depth (meta)genomic, proteomic, and ecological analysis of a human gut-specific bacteriophage (designated φB124-14). In doing so we illuminate a fraction of the biological dark matter extant in this ecosystem and its surrounding eco-genomic landscape, identifying a novel and uncharted bacteriophage gene-space in this community. φB124-14 infects only a subset of closely related gut-associated Bacteroides fragilis strains, and the circular genome encodes functions previously found to be rare in viral genomes and human gut viral metagenome sequences, including those which potentially confer advantages upon phage and/or host bacteria. Comparative genomic analyses revealed φB124-14 is most closely related to φB40-8, the only other publicly available Bacteroides sp. phage genome, whilst comparative metagenomic analysis of both phage failed to identify any homologous sequences in 136 non-human gut metagenomic datasets searched, supporting the human gut-specific nature of this phage. Moreover, a potential geographic variation in the carriage of these and related phage was revealed by analysis of their distribution and prevalence within 151 human gut microbiomes and viromes from Europe, America and Japan. Finally, ecological profiling of φB124-14 and φB40-8, using both gene-centric alignment-driven phylogenetic analyses, as well as alignment-free gene-independent approaches was undertaken. This not only verified the human gut-specific nature of both phage, but also indicated that these phage populate a distinct and unexplored ecological landscape within the human gut microbiome.

Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc Natl Acad Sci U S A 2012;109:7085-90. [PMID: 22509035 DOI: 10.1073/pnas.1120788109] [Citation(s) in RCA: 252] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Venter E, Smith RD, Payne SH. Proteogenomic analysis of bacteria and archaea: a 46 organism case study. PLoS One 2011;6:e27587. [PMID: 22114679 PMCID: PMC3219674 DOI: 10.1371/journal.pone.0027587] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2011] [Accepted: 10/20/2011] [Indexed: 11/19/2022] Open

Snoeck J, Fellay J, Bartha I, Douek DC, Telenti A. Mapping of positive selection sites in the HIV-1 genome in the context of RNA and protein structural constraints. Retrovirology 2011;8:87. [PMID: 22044801 PMCID: PMC3229471 DOI: 10.1186/1742-4690-8-87] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2011] [Accepted: 11/01/2011] [Indexed: 02/06/2023] Open

Zhao L, Liu L, Leng W, Wei C, Jin Q. A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF. BMC Genomics 2011;12:528. [PMID: 22032405 PMCID: PMC3219829 DOI: 10.1186/1471-2164-12-528] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2011] [Accepted: 10/28/2011] [Indexed: 11/10/2022] Open

Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet 2011;12:671-82. [DOI: 10.1038/nrg3068] [Citation(s) in RCA: 895] [Impact Index Per Article: 68.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Wei W, Pelechano V, Järvelin AI, Steinmetz LM. Functional consequences of bidirectional promoters. Trends Genet 2011;27:267-76. [PMID: 21601935 PMCID: PMC3123404 DOI: 10.1016/j.tig.2011.04.002] [Citation(s) in RCA: 156] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2011] [Revised: 04/20/2011] [Accepted: 04/20/2011] [Indexed: 02/07/2023]

Zutz A, Hoffmann J, Hellmich UA, Glaubitz C, Ludwig B, Brutschy B, Tampé R. Asymmetric ATP hydrolysis cycle of the heterodimeric multidrug ABC transport complex TmrAB from Thermus thermophilus. J Biol Chem 2010;286:7104-15. [PMID: 21190941 DOI: 10.1074/jbc.m110.201178] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open

The chromosomal mazEF locus of Streptococcus mutans encodes a functional type II toxin-antitoxin addiction system. J Bacteriol 2010;193:1122-30. [PMID: 21183668 DOI: 10.1128/jb.01114-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

Chirico N, Vianelli A, Belshaw R. Why genes overlap in viruses. Proc Biol Sci 2010;277:3809-17. [PMID: 20610432 PMCID: PMC2992710 DOI: 10.1098/rspb.2010.1052] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2010] [Accepted: 06/14/2010] [Indexed: 12/25/2022] Open

Zhu Z, Cardin CJ, Gan Y, Colquhoun HM. Sequence-selective assembly of tweezer molecules on linear templates enables frameshift-reading of sequence information. Nat Chem 2010;2:653-60. [DOI: 10.1038/nchem.699] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2010] [Accepted: 05/10/2010] [Indexed: 11/09/2022]

Xu YP, Ye ZP, Niu CY, Bao YY, Wang WB, Shen WD, Zhang CX. Comparative analysis of the genomes of Bombyx mandarina and Bombyx mori nucleopolyhedroviruses. J Microbiol 2010;48:102-10. [PMID: 20221737 DOI: 10.1007/s12275-009-0197-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2009] [Accepted: 08/03/2009] [Indexed: 11/24/2022]

Cheng CH, Yang CH, Chiu HT, Lu CL. Reconstructing genome trees of prokaryotes using overlapping genes. BMC Bioinformatics 2010;11:102. [PMID: 20181237 PMCID: PMC2845580 DOI: 10.1186/1471-2105-11-102] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2009] [Accepted: 02/24/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance between two genomes we defined was based on a measure of combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited so that it is hard to find enough OGs to properly evaluate their pairwise OG distances.

RESULTS

In this study, we therefore define a new OG order distance that is based on more biologically accurate rearrangements (e.g., reversals, transpositions and translocations) rather than breakpoints and that is applicable to both uni-chromosomal and multi-chromosomal genomes. In addition, we expand the term "gene" to include both its coding sequence and regulatory regions so that two adjacent genes whose coding sequences or regulatory regions overlap with each other are considered as a pair of overlapping genes. This is because overlapping of regulatory regions of distinct genes suggests that the regulation of expression for these genes should be more or less interrelated. Based on these modifications, we have reimplemented our OGtree as a new web server, named OGtree2, and have also evaluated its accuracy of genome tree reconstruction on a testing dataset consisting of 21 Proteobacteria genomes. Our experimental results have finally shown that our current OGtree2 indeed outperforms its previous version OGtree, as well as another similar server, called BPhyOG, significantly in the quality of genome tree reconstruction, because the phylogenetic tree obtained by OGtree2 is greatly congruent with the reference tree that coincides with the taxonomy accepted by biologists for these Proteobacteria.

CONCLUSIONS

In this study, we have introduced a new web server OGtree2 at http://bioalgorithm.life.nctu.edu.tw/OGtree2.0/ that can serve as a useful tool for reconstructing more precise and robust genome trees of prokaryotes according to their overlapping genes.

Shen ZY, Li ZF, Hang XY, Zhang CG. Dual Coding Genes in Eukaryote*. Prog Biochem Biophys 2009. [DOI: 10.3724/sp.j.1206.2008.00620] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Genes involved in yellow pigmentation of Cronobacter sakazakii ES5 and influence of pigmentation on persistence and growth under environmental stress. Appl Environ Microbiol 2009;76:1053-61. [PMID: 20038705 DOI: 10.1128/aem.01420-09] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Kim W, Silby MW, Purvine SO, Nicoll JS, Hixson KK, Monroe M, Nicora CD, Lipton MS, Levy SB. Proteomic detection of non-annotated protein-coding genes in Pseudomonas fluorescens Pf0-1. PLoS One 2009;4:e8455. [PMID: 20041161 PMCID: PMC2794547 DOI: 10.1371/journal.pone.0008455] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2009] [Accepted: 12/02/2009] [Indexed: 11/18/2022] Open

Cock PJA, Whitworth DE. Evolution of relative reading frame bias in unidirectional prokaryotic gene overlaps. Mol Biol Evol 2009;27:753-6. [PMID: 20008458 DOI: 10.1093/molbev/msp302] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Pallejà A, García-Vallvé S, Romeu A. Adaptation of the short intergenic spacers between co-directional genes to the Shine-Dalgarno motif among prokaryote genomes. BMC Genomics 2009;10:537. [PMID: 19922619 PMCID: PMC2784483 DOI: 10.1186/1471-2164-10-537] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2009] [Accepted: 11/18/2009] [Indexed: 11/30/2022] Open

Grinchuk OV, Jenjaroenpun P, Orlov YL, Zhou J, Kuznetsov VA. Integrative analysis of the human cis-antisense gene pairs, miRNAs and their transcription regulation patterns. Nucleic Acids Res 2009;38:534-47. [PMID: 19906709 PMCID: PMC2811022 DOI: 10.1093/nar/gkp954] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

Pallejà A, Reverter T, Garcia-Vallvé S, Romeu A. PairWise Neighbours database: overlaps and spacers among prokaryote genomes. BMC Genomics 2009;10:281. [PMID: 19555467 PMCID: PMC2716372 DOI: 10.1186/1471-2164-10-281] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2009] [Accepted: 06/25/2009] [Indexed: 05/25/2023] Open

Singh TR, Pardasani KR. Ambush hypothesis revisited: Evidences for phylogenetic trends. Comput Biol Chem 2009;33:239-44. [PMID: 19473880 DOI: 10.1016/j.compbiolchem.2009.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2008] [Revised: 04/15/2009] [Accepted: 04/23/2009] [Indexed: 10/20/2022]

Torres C, Galián C, Freiberg C, Fantino JR, Jault JM. The YheI/YheH heterodimer from Bacillus subtilis is a multidrug ABC transporter. Biochim Biophys Acta Biomembr 2009;1788:615-22. [DOI: 10.1016/j.bbamem.2008.12.012] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/07/2008] [Revised: 12/12/2008] [Accepted: 12/22/2008] [Indexed: 12/12/2022]

Parker MS, Mock T, Armbrust EV. Genomic insights into marine microalgae. Annu Rev Genet 2009;42:619-45. [PMID: 18983264 DOI: 10.1146/annurev.genet.42.110807.091417] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Lin GN, Cai Z, Lin G, Chakraborty S, Xu D. ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets. BMC Bioinformatics 2009;10 Suppl 1:S5. [PMID: 19208152 PMCID: PMC2648732 DOI: 10.1186/1471-2105-10-s1-s5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Warren AS, Setubal JC. The Genome Reverse Compiler: an explorative annotation tool. BMC Bioinformatics 2009;10:35. [PMID: 19173744 PMCID: PMC2640359 DOI: 10.1186/1471-2105-10-35] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2008] [Accepted: 01/27/2009] [Indexed: 11/19/2022] Open

Han D, Krauss G. Characterization of the endonuclease SSO2001 from Sulfolobus solfataricus P2. FEBS Lett 2009;583:771-6. [DOI: 10.1016/j.febslet.2009.01.024] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2008] [Revised: 01/16/2009] [Accepted: 01/16/2009] [Indexed: 10/21/2022]

Sabath N, Landan G, Graur D. A method for the simultaneous estimation of selection intensities in overlapping genes. PLoS One 2008;3:e3996. [PMID: 19098983 PMCID: PMC2601044 DOI: 10.1371/journal.pone.0003996] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2008] [Accepted: 11/21/2008] [Indexed: 11/18/2022] Open

Sabath N, Graur D, Landan G. Same-strand overlapping genes in bacteria: compositional determinants of phase bias. Biol Direct 2008;3:36. [PMID: 18717987 PMCID: PMC2542354 DOI: 10.1186/1745-6150-3-36] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2008] [Accepted: 08/21/2008] [Indexed: 11/24/2022] Open

Abstract

Background

Same-strand overlapping genes may occur in frameshifts of one (phase 1) or two nucleotides (phase 2). In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. This bias was explained by either genomic location or an unspecified selection advantage. Models that focused on the ability of the two genes to evolve independently did not predict this phase bias. Here, we propose that a purely compositional model explains the phase bias in a more parsimonious manner. Same-strand overlapping genes may arise through either a mutation at the termination codon of the upstream gene or a mutation at the initiation codon of the downstream gene. We hypothesized that given these two scenarios, the frequencies of initiation and termination codons in the two phases may determine the number for overlapping genes.

Results

We examined the frequencies of initiation- and termination-codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases. We show that the frequencies of start codons in each of the two phases, and, hence, the potential for the creation of overlapping genes, are determined by a universal amino-acid frequency and species-specific codon usage, leading to a correlation between long phase-1 overlaps and genomic GC content.

Conclusion

Our model explains the phase bias in same-strand overlapping genes by compositional factors without invoking selection. Therefore, it can be used as a null model of neutral evolution to test selection hypotheses concerning the evolution of overlapping genes.

Reviewers

This article was reviewed by Bill Martin, Itai Yanai, and Mikhail Gelfand.
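
As an illustration of the phase terminology used in the abstract above, the following minimal Python sketch computes the overlap length and the reading-frame shift of two same-strand genes. It is not the authors' method: the function name `same_strand_overlap`, the 1-based inclusive forward-strand coordinates, and the choice to report the shift as the downstream start offset modulo 3 are all assumptions made for this example. Whether a shift of 1 corresponds exactly to the paper's "phase 1" depends on the convention the authors adopt, which the abstract does not spell out.

```python
def same_strand_overlap(up_start, up_end, down_start):
    """Return (overlap_length, frame_shift) for two forward-strand genes.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    frame_shift is the offset of the downstream reading frame relative to
    the upstream one: 0 means in-frame, 1 or 2 means a frameshifted overlap.
    Returns None when the genes do not overlap.
    """
    if down_start < up_start:
        raise ValueError("downstream gene must not start before the upstream gene")
    if down_start > up_end:
        return None  # the downstream gene starts after the upstream gene ends
    overlap_length = up_end - down_start + 1
    frame_shift = (down_start - up_start) % 3
    return overlap_length, frame_shift


# Hypothetical coordinates: an upstream gene at 1..300 and a downstream
# gene whose start codon begins 4 nt before the upstream gene ends.
print(same_strand_overlap(1, 300, 297))
```

With these hypothetical coordinates the two genes share 4 nucleotides and the downstream frame is shifted by 2 relative to the upstream frame.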

Pallejà A, Harrington ED, Bork P. Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? BMC Genomics 2008;9:335. [PMID: 18627618 PMCID: PMC2478687 DOI: 10.1186/1471-2164-9-335] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2008] [Accepted: 07/15/2008] [Indexed: 11/20/2022] Open

Abstract

Background

Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation.

Results

We analysed the genes that overlap by 60 bps or more among 338 fully-sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene showed a significantly different length from its orthologs it was considered unlikely to be functional and therefore the result of an error either in sequencing or gene prediction. Focusing on 715 co-directional overlaps longer than 60 bps, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at 5'-end of the gene (409 overlaps), ii) fragmentation of a gene caused by a frameshift (163), iii) 3'-end extension of the upstream gene due to either a frameshift at 3'-end of a gene or point mutation at the stop codon (68), iv) Redundant gene predictions (4), v) 5' & 3'-end extension which is a combination of i) and iii) (71). We also studied 75 divergent overlaps that could be classified as misannotations of group i). Nevertheless we found some convergent long overlaps (54) that might be true overlaps, although an important part of convergent overlaps could be classified as group iii) (124).

Conclusion

Among the 968 overlaps larger than 60 bps which we analysed, we did not find a single real one among the co-directional and divergent orientations and concluded that there had been an excessive number of misannotations. Only convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
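
The orientation classes and the 60 bp threshold mentioned in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Gene` record, the function names, and the 1-based inclusive coordinates are hypothetical, and the sketch covers only the orientation and length screen. It does not implement the ortholog length comparison the authors used to decide whether an overlap is a misannotation, nor the flagging rule they propose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # leftmost coordinate, 1-based inclusive (assumed convention)
    end: int     # rightmost coordinate, inclusive
    strand: str  # '+' or '-'


def orientation(left, right):
    """Classify a neighbouring gene pair, where `left` has the smaller start."""
    if left.strand == right.strand:
        return "co-directional"
    # '+' then '-' means the 3' ends face each other; '-' then '+' the 5' ends.
    return "convergent" if left.strand == "+" else "divergent"


def overlap_length(left, right):
    """Number of shared positions between two genes sorted by start."""
    return max(0, min(left.end, right.end) - right.start + 1)


def flag_long_overlaps(genes, min_len=60):
    """Return (left, right, orientation, length) for adjacent pairs overlapping by >= min_len."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    flagged = []
    for left, right in zip(ordered, ordered[1:]):
        n = overlap_length(left, right)
        if n >= min_len:
            flagged.append((left, right, orientation(left, right), n))
    return flagged


# Hypothetical annotation with one short and two long overlaps.
genes = [Gene(1, 300, "+"), Gene(230, 500, "-"), Gene(497, 800, "-"), Gene(700, 900, "+")]
for left, right, kind, n in flag_long_overlaps(genes):
    print(kind, n)
```

Pairs flagged this way are candidates for closer inspection only; deciding whether a long overlap is real requires additional evidence such as the comparison with orthologs described in the abstract.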

Jiang LW, Lin KL, Lu CL. OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes. Nucleic Acids Res 2008;36:W475-80. [PMID: 18456706 PMCID: PMC2447762 DOI: 10.1093/nar/gkn240] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Sanna CR, Li WH, Zhang L. Overlapping genes in the human and mouse genomes. BMC Genomics 2008;9:169. [PMID: 18410680 PMCID: PMC2335118 DOI: 10.1186/1471-2164-9-169] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2007] [Accepted: 04/14/2008] [Indexed: 11/10/2022] Open

Analysis of a novel spore antigen in Bacillus anthracis that contributes to spore opsonization. Microbiology (Reading) 2008;154:619-632. [DOI: 10.1099/mic.0.2007/008292-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Delaye L, Deluna A, Lazcano A, Becerra A. The origin of a novel gene through overprinting in Escherichia coli. BMC Evol Biol 2008;8:31. [PMID: 18226237 PMCID: PMC2268670 DOI: 10.1186/1471-2148-8-31] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2007] [Accepted: 01/28/2008] [Indexed: 11/10/2022] Open

McCauley S, de Groot S, Mailund T, Hein J. Annotation of selection strengths in viral genomes. Bioinformatics 2007;23:2978-86. [PMID: 17921171 DOI: 10.1093/bioinformatics/btm472] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Abstract

MOTIVATION

Viral genomes tend to code in overlapping reading frames to maximize informational content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of K(a)/K(s) ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley and Hein, we develop a method for annotating a viral genome coding in overlapping reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single sequence HMM methodology, extending it now to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses.

RESULTS

We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as of three Hepatitis B sequences. We obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. Whilst we are encouraged to see in HIV2, that the known to be conserved genes gag and pol are indeed annotated as such, we also discover several sites of less stringent negative selection within the env gene. To the best of our knowledge, we are the first to subsequently provide a full selection annotation of the Hepatitis B genome by explicitly modelling the evolution within overlapping reading frames, and not relying on simple K(a)/K(s) ratios.

Lillo F, Krakauer DC. A statistical analysis of the three-fold evolution of genomic compression through frame overlaps in prokaryotes. Biol Direct 2007;2:22. [PMID: 17877818 PMCID: PMC2174442 DOI: 10.1186/1745-6150-2-22] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2007] [Accepted: 09/18/2007] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Among microbial genomes, genetic information is frequently compressed, exploiting redundancies in the genetic code in order to store information in overlapping genes. We investigate the length, phase and orientation properties of overlap in 58 prokaryotic species evaluating neutral and selective mechanisms of evolution.

RESULTS

Using a variety of statistical null models we find patterns of compressive coding that can not be explained purely in terms of the selective processes favoring genome minimization or translational coupling. The distribution of overlap lengths follows a fat-tailed distribution, in which a significant proportion of overlaps are in excess of 100 base pairs in length. The phase of overlap--pairing of codon positions in complementary reading frames--is strongly predicted by the translation orientation of each gene. We find that as overlapping genes become longer, they have a tendency to alternate among alternative overlap phases. Some phases seem to reflect codon pairings reducing the probability of non-synonymous substitution. We analyze the lineage-dependent features of overlapping genes by tracing a number of different continuous characters through the prokaryotic phylogeny using squared-change parsimony and observe both clade-specific and species-specific patterns.

CONCLUSION

Overlapping reading frames preserve in their structure features relating to the mutational origination of new genes, but have undergone modification both for immediate benefits and for variational buffering and amplification. Genomes come under a variety of different mutational and selective pressures, and the structure of redundancies in overlapping genes can be used to detect these pressures. No single mechanism is able to account for all the variability observed among the set of prokaryotic overlapping genes, but a three-fold analysis of evolutionary events provides a more integrative framework.

Belshaw R, Pybus OG, Rambaut A. The evolution of genome compression and genomic novelty in RNA viruses. Genome Res 2007;17:1496-504. [PMID: 17785537 PMCID: PMC1987338 DOI: 10.1101/gr.6305707] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.6] [Indexed: 11/25/2022]

BPhyOG: an interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes. BMC Bioinformatics 2007;8:266. [PMID: 17650344 PMCID: PMC1940028 DOI: 10.1186/1471-2105-8-266] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 10/23/2006] [Accepted: 07/25/2007] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Overlapping genes (OGs) in bacterial genomes are pairs of adjacent genes of which the coding sequences overlap partly or entirely. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified. Indeed, these might prove a consistent feature across all microbial genomes. Our previous work suggests that OGs can be considered as robust markers at the whole genome level for the construction of phylogenies. An online, interactive web server for inferring phylogenies is needed for biologists to analyze phylogenetic relationships among a set of bacterial genomes of interest.

DESCRIPTION

BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ) and Unweighted Pair-Group Method using Arithmetic averages (UPGMA). Users can apply the desired method to generate phylogenetic trees, which are based on an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs that were used to infer the phylogenetic relationships. It provides detailed annotation for each OG pair and the features of the component genes through hyperlinks. Users can also retrieve each of the homologous OG pairs that have been determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint.
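The distance-then-clustering idea described above can be sketched as follows. This is an illustrative sketch only, not the BPhyOG implementation: the abstract does not state how the number of shared OG pairs is normalized, so a Jaccard distance is assumed here, and the genome names, OG identifiers, and the functions `og_distance` and `upgma` are hypothetical.

```python
from itertools import combinations


def og_distance(a, b):
    """Jaccard distance between two sets of OG-pair identifiers.

    The Jaccard normalization is an assumption; the source only says
    the number of shared OG pairs is 'normalized'.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(names, dist):
    """UPGMA (average linkage) on a distance table.

    dist maps frozenset({x, y}) -> distance for every pair of names.
    Returns the tree topology as a Newick-like string without branch lengths.
    """
    size = {n: 1 for n in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        # Pick the closest pair; ties are broken by label for determinism.
        a, b = sorted(min(d, key=lambda p: (d[p], sorted(p))))
        merged = f"({a},{b})"
        for c in size:
            if c in (a, b):
                continue
            # Size-weighted average of the distances to the two merged clusters.
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        # Drop every distance that still refers to the old clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical genomes, each described by the OG pairs it contains.
genomes = {
    "A": {"og1", "og2", "og3", "og4"},
    "B": {"og1", "og2", "og3", "og5"},
    "C": {"og1", "og6", "og7", "og8"},
}
names = sorted(genomes)
dist = {
    frozenset((x, y)): og_distance(genomes[x], genomes[y])
    for x, y in combinations(names, 2)
}
tree = upgma(names, dist)
print(tree)
```

Genomes A and B share three of their OG pairs and are therefore joined first, with C attached afterwards. Neighbor Joining would start from the same distance table but uses a different joining criterion.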

CONCLUSION

BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, the annotation and homologous OG pairs of which are integrated comprehensively. The reliability of phylogenies complemented by annotations makes BPhyOG a powerful web server for genomic and genetic studies. It is freely available at http://cmb.bnu.edu.cn/BPhyOG.

Kingsford C, Delcher AL, Salzberg SL. A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes. Mol Biol Evol 2007;24:2091-8. [PMID: 17642473 PMCID: PMC2429982 DOI: 10.1093/molbev/msm145] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]

Li SC, Tang P, Lin WC. Intronic microRNA: discovery and biological implications. DNA Cell Biol 2007;26:195-207. [PMID: 17465886 DOI: 10.1089/dna.2006.0558] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.4] [Indexed: 12/26/2022]

Jothi R, Przytycka TM, Aravind L. Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics 2007;8:173. [PMID: 17521444 PMCID: PMC1904249 DOI: 10.1186/1471-2105-8-173] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 02/28/2007] [Accepted: 05/23/2007] [Indexed: 11/10/2022]

Abstract

BACKGROUND

A widely-used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution.
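A phylogenetic profile comparison of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions: the abstract does not say which similarity measure is used, so Jaccard similarity on binary presence/absence profiles is assumed, and the protein names, genome names, threshold, and the functions `profile` and `jaccard` are all hypothetical.

```python
from itertools import combinations


def profile(present_in, reference):
    """Binary presence/absence profile of one protein across a reference set of genomes."""
    return tuple(int(g in present_in) for g in reference)


def jaccard(p, q):
    """Jaccard similarity of two binary profiles (the choice of measure is an assumption)."""
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0


# Hypothetical reference set and hypothetical protein occurrences.
reference = ["g1", "g2", "g3", "g4", "g5", "g6"]
occurrence = {
    "protA": {"g1", "g2", "g4"},
    "protB": {"g1", "g2", "g4", "g5"},
    "protC": {"g3", "g6"},
}
profiles = {name: profile(genomes, reference) for name, genomes in occurrence.items()}

# Proteins with similar profiles are inferred to be functionally linked.
threshold = 0.7  # hypothetical cut-off
linked = [
    (a, b)
    for a, b in combinations(sorted(profiles), 2)
    if jaccard(profiles[a], profiles[b]) >= threshold
]
print(linked)
```

Changing the `reference` list changes every profile, which is why the composition of the reference set can affect which linkages are predicted.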

RESULTS

Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding a few eukaryotes to the reference set, but a noticeable drop-off in performance is observed with an increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and of multiple strains of the same species in the reference set does not necessarily contribute to improved sensitivity or accuracy. Interestingly, we also found that the evolutionary histories of individual pathways have a significant effect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set solely composed of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation, for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false positives.

CONCLUSION

Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally makes eukaryotes less useful in the reference sets. Reference sets composed of a highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets solely composed of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. Differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities, suggest that caution should be exercised when interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set.

Kapranov P, Willingham AT, Gingeras TR. Genome-wide transcription and the implications for genomic organization. Nat Rev Genet 2007;8:413-23. [PMID: 17486121 DOI: 10.1038/nrg2083] [Citation(s) in RCA: 529] [Impact Index Per Article: 31.1] [Indexed: 12/16/2022]

Cock PJA, Whitworth DE. Evolution of gene overlaps: relative reading frame bias in prokaryotic two-component system genes. J Mol Evol 2007;64:457-62. [PMID: 17479344 DOI: 10.1007/s00239-006-0180-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 08/03/2006] [Accepted: 01/22/2007] [Indexed: 10/23/2022]

de Groot S, Mailund T, Hein J. Comparative annotation of viral genomes with non-conserved gene structure. Bioinformatics 2007;23:1080-9. [PMID: 17341494 DOI: 10.1093/bioinformatics/btm078] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]

Abstract

MOTIVATION

Detecting genes in viral genomes is a complex task. Because their genomes are necessarily constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleotides, up to three genes may be coded for simultaneously in one direction. Conventional hidden Markov model (HMM)-based gene-finding algorithms may typically find it difficult to identify multiple coding regions, since in general their topologies do not allow for the presence of overlapping or nested genes. Comparative methods have therefore been restricted to likelihood ratio tests of whether potential regions are double or single coding, using the fact that the constraints imposed on multiple-coding nucleotides will result in atypical sequence evolution. Exploiting these same constraints, we present an HMM-based gene-finding program, which allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences.
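The statement that up to three genes can be encoded in one direction follows from the three possible codon phases of a single strand. The sketch below only illustrates that reading-frame arithmetic; it is not the authors' HMM, and the example sequence and the function `codons` are hypothetical.

```python
def codons(seq, frame):
    """Split a nucleotide sequence into complete codons starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Hypothetical sequence; any string over A, C, G, T works.
seq = "ATGGCCTAA"

# The three forward reading frames read the same nucleotides in different triplets.
frames = {frame: codons(seq, frame) for frame in range(3)}
for frame, triplets in frames.items():
    print(frame, triplets)

# Codon position (0, 1 or 2) occupied by nucleotide index 4 in each frame.
# A nucleotide in a multiply coding region has a different codon position
# in each overlapping frame.
positions = {frame: (4 - frame) % 3 for frame in range(3)}
print(positions)
```

Because a single nucleotide plays a different codon role in each frame, a substitution at that site can be synonymous in one frame and non-synonymous in another, which is the kind of constraint on multiple-coding nucleotides that the abstract refers to.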

RESULTS

We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of approximately 84-89% and specificity of approximately 97-99.9%. We additionally annotate three pairwise alignments of the more distantly related HIV1 and HIV2, as well as of two different hepatitis viruses, attaining results of approximately 87% sensitivity and approximately 98.5% specificity. We subsequently incorporate prior knowledge by 'knowing' the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect, we demonstrate that conservation of gene structure on top of nucleotide sequence is a valuable source of information, especially in distantly related genomes.

AVAILABILITY

The Java code is available from the authors.
