51
Affiliation(s)
- Brian R. Wasik
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520-8106
- Paul E. Turner
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520-8106
52
Verma M, Lal D, Kaur J, Saxena A, Kaur J, Anand S, Lal R. Phylogenetic analyses of phylum Actinobacteria based on whole genome sequences. Res Microbiol 2013; 164:718-28. [DOI: 10.1016/j.resmic.2013.04.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 07/05/2012] [Accepted: 03/26/2013] [Indexed: 11/25/2022]
53
Behura SK, Severson DW. Overlapping genes of Aedes aegypti: evolutionary implications from comparison with orthologs of Anopheles gambiae and other insects. BMC Evol Biol 2013; 13:124. [PMID: 23777277 PMCID: PMC3689595 DOI: 10.1186/1471-2148-13-124] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/18/2012] [Accepted: 06/12/2013] [Indexed: 11/11/2022]
Abstract
Background: Although gene overlapping is a common feature of prokaryotic and mitochondrial genomes, such genes have also been identified in many eukaryotes. The overlapping genes in eukaryotes are extensively rearranged even between closely related species. In this study, we investigated retention and rearrangement of positionally overlapping genes between the mosquitoes Aedes aegypti (dengue virus vector) and Anopheles gambiae (malaria vector). The overlapping gene pairs of A. aegypti were further compared with orthologs of other selected insects to conduct several hypothesis-driven investigations relating to the evolution and rearrangement of overlapping genes. Results: The results show that as much as ~10% of the predicted genes of A. aegypti and A. gambiae are localized in a positionally overlapping manner. Furthermore, the study shows that differential abundance of introns and simple sequence repeats has a significant association with positional rearrangement of overlapping genes between the two species. Gene expression analysis further suggests that antisense transcripts generated from the oppositely oriented overlapping genes are differentially regulated and may have important regulatory functions in these mosquitoes. Our data further show that synonymous and non-synonymous mutations have differential but non-significant effects on overlapping localization of orthologous genes in other insect genomes. Conclusion: Gene overlapping in insects may be a species-specific evolutionary process, as is evident from the non-dependency of gene overlapping on species phylogeny. Based on the results, our study suggests that overlapping genes may have played an important role in genome evolution of insects.
Affiliation(s)
- Susanta K Behura
- Eck Institute for Global Health, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
54
Giessen T, von Tesmar A, Marahiel M. Insights into the Generation of Structural Diversity in a tRNA-Dependent Pathway for Highly Modified Bioactive Cyclic Dipeptides. Chem Biol 2013; 20:828-38. [DOI: 10.1016/j.chembiol.2013.04.017] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 03/28/2013] [Revised: 04/25/2013] [Accepted: 04/30/2013] [Indexed: 01/08/2023]
55
Comparative analysis of the genomes of Clostera anastomosis (L.) granulovirus and Clostera anachoreta granulovirus. Arch Virol 2013; 158:2109-14. [PMID: 23649176 DOI: 10.1007/s00705-013-1710-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/15/2013] [Accepted: 03/26/2013] [Indexed: 10/26/2022]
Abstract
Clostera anastomosis (L.) granulovirus (CaLGV) and Clostera anachoreta granulovirus (ClanGV) are both capable of infecting each other's native host insects. Despite this, we have little information on their genetic relationship. The complete nucleotide sequence of CaLGV was determined and compared with that of the genome of ClanGV. The circular, double-stranded DNA CaLGV genome (GenBank accession no. KC179784) had a G+C content of 46.7 % and was 101,818 bp in size (331 bp larger than the ClanGV genome). Overall, the CaLGV nucleotide sequence was found to be 90 % identical to that of ClanGV. It contained a total of 123 ORFs, 119 of which had ClanGV homologues, with an identical transcription direction and ORF organization. Seventy-five of the 119 ORFs showed 90 % or greater identity to their ClanGV homologues. CaLGV contained only a single identifiable homologous region (hrs)/repeat region (similar to ClanGV hr4). The mean frequency of nucleotide substitutions in the CaLGV/ClanGV coding regions was 8.33 %. CaLGV contained four unique ORFs (CaL23, CaL39, CaL48 and CaL92). Eight ORFs found in both CaLGV and ClanGV have no homologues in other baculoviruses. Intergenic regions of CaLGV and ClanGV occupied 6.6 % and 7 % of their respective genomes. CaLGV appears closer phylogenetically to ClanGV than to any other baculoviruses.
56
Predicting statistical properties of open reading frames in bacterial genomes. PLoS One 2012; 7:e45103. [PMID: 23028785 PMCID: PMC3454372 DOI: 10.1371/journal.pone.0045103] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/02/2012] [Accepted: 08/14/2012] [Indexed: 11/26/2022]
Abstract
An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
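To make the link between GC content and ORF length concrete, here is a minimal Python sketch. It is not the analytical model from the paper: it assumes independent, identically distributed bases with P(G) = P(C) and P(A) = P(T), and treats the distance to the first stop codon as geometrically distributed. The function names are illustrative only.

```python
import math


def stop_codon_probability(gc: float) -> float:
    """Chance that a random codon is TAA, TAG or TGA, assuming independent bases."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be a fraction between 0 and 1")
    at = (1.0 - gc) / 2.0  # assumed P(A) = P(T)
    g = gc / 2.0           # assumed P(G) = P(C)
    # P(TAA) = at^3, P(TAG) = at^2 * g, P(TGA) = at^2 * g
    return at ** 3 + 2.0 * at ** 2 * g


def expected_orf_codons(gc: float) -> float:
    """Mean number of codons up to and including the first stop (geometric mean 1/p)."""
    p = stop_codon_probability(gc)
    return math.inf if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    # 0.21 and 0.74 are the extremes of the GC range quoted in the abstract.
    for gc in (0.21, 0.50, 0.74):
        print(f"GC={gc:.2f}  p_stop={stop_codon_probability(gc):.4f}  "
              f"mean codons={expected_orf_codons(gc):.1f}")
```

Under these assumptions, stop codons become rarer as GC content rises, so random ORFs are expected to be longer in GC-rich genomes. Deviations from such a baseline are the kind of signal the abstract describes.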
57
Ho MR, Tsai KW, Lin WC. A unified framework of overlapping genes: towards the origination and endogenic regulation. Genomics 2012; 100:231-9. [PMID: 22766524 DOI: 10.1016/j.ygeno.2012.06.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 01/27/2012] [Revised: 06/21/2012] [Accepted: 06/25/2012] [Indexed: 11/27/2022]
Abstract
Overlapping genes are pairs of adjacent genes whose genomic regions partially overlap. They are notable by their potential intricate regulation, such as cis-regulation of nested gene-promoter configurations, and post-transcriptional regulation of natural antisense transcripts. The originations and consequent detailed regulation remain obscure. Herein, we propose a unified framework comprising biological classification rules followed by extensive analyses, namely, exon-sharing analysis, a human-mouse conservation study, and transcriptome analysis of hundreds of microarrays and transcriptome sequencing data (mRNA-Seq). We demonstrate that the tail-to-tail architecture would result from sharing functional elements in 3'-untranslated regions (3'-UTRs) of pre-existing genes. Dissimilarly, we illustrate that the other gene overlaps would originate from a new gene arising in a pre-existing gene locus. Interestingly, these types of coupled overlapping genes may influence each other synergistically or competitively during transcription, depending on the promoter configurations. This framework discloses distinctive characteristics of overlapping genes to be a foundation for a further comprehensive understanding of them.
Affiliation(s)
- Meng-Ru Ho
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
58
Ogilvie LA, Caplin J, Dedi C, Diston D, Cheek E, Bowler L, Taylor H, Ebdon J, Jones BV. Comparative (meta)genomic analysis and ecological profiling of human gut-specific bacteriophage φB124-14. PLoS One 2012; 7:e35053. [PMID: 22558115 PMCID: PMC3338817 DOI: 10.1371/journal.pone.0035053] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 11/23/2011] [Accepted: 03/08/2012] [Indexed: 12/30/2022]
Abstract
Bacteriophage associated with the human gut microbiome are likely to have an important impact on community structure and function, and provide a wealth of biotechnological opportunities. Despite this, knowledge of the ecology and composition of bacteriophage in the gut bacterial community remains poor, with few well characterized gut-associated phage genomes currently available. Here we describe the identification and in-depth (meta)genomic, proteomic, and ecological analysis of a human gut-specific bacteriophage (designated φB124-14). In doing so we illuminate a fraction of the biological dark matter extant in this ecosystem and its surrounding eco-genomic landscape, identifying a novel and uncharted bacteriophage gene-space in this community. φB124-14 infects only a subset of closely related gut-associated Bacteroides fragilis strains, and the circular genome encodes functions previously found to be rare in viral genomes and human gut viral metagenome sequences, including those which potentially confer advantages upon phage and/or host bacteria. Comparative genomic analyses revealed φB124-14 is most closely related to φB40-8, the only other publicly available Bacteroides sp. phage genome, whilst comparative metagenomic analysis of both phage failed to identify any homologous sequences in 136 non-human gut metagenomic datasets searched, supporting the human gut-specific nature of this phage. Moreover, a potential geographic variation in the carriage of these and related phage was revealed by analysis of their distribution and prevalence within 151 human gut microbiomes and viromes from Europe, America and Japan. Finally, ecological profiling of φB124-14 and φB40-8, using both gene-centric alignment-driven phylogenetic analyses, as well as alignment-free gene-independent approaches, was undertaken. This not only verified the human gut-specific nature of both phage, but also indicated that these phage populate a distinct and unexplored ecological landscape within the human gut microbiome.
Affiliation(s)
- Lesley A. Ogilvie
- Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton, Brighton, United Kingdom
- Jonathan Caplin
- School of Environment and Technology, University of Brighton, Brighton, United Kingdom
- Cinzia Dedi
- Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton, Brighton, United Kingdom
- David Diston
- School of Environment and Technology, University of Brighton, Brighton, United Kingdom
- Elizabeth Cheek
- School of Computing, Engineering and Mathematics, University of Brighton, Brighton, United Kingdom
- Lucas Bowler
- Sussex Proteomics Centre, University of Sussex, Brighton, United Kingdom
- Huw Taylor
- School of Environment and Technology, University of Brighton, Brighton, United Kingdom
- James Ebdon
- School of Environment and Technology, University of Brighton, Brighton, United Kingdom
- Brian V. Jones
- Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton, Brighton, United Kingdom
59
Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc Natl Acad Sci U S A 2012; 109:7085-90. [PMID: 22509035 DOI: 10.1073/pnas.1120788109] [Citation(s) in RCA: 252] [Impact Index Per Article: 21.0] [Indexed: 11/18/2022]
Abstract
Bacterial genes associated with a single trait are often grouped in a contiguous unit of the genome known as a gene cluster. It is difficult to genetically manipulate many gene clusters because of complex, redundant, and integrated host regulation. We have developed a systematic approach to completely specify the genetics of a gene cluster by rebuilding it from the bottom up using only synthetic, well-characterized parts. This process removes all native regulation, including that which is undiscovered. First, all noncoding DNA, regulatory proteins, and nonessential genes are removed. The codons of essential genes are changed to create a DNA sequence as divergent as possible from the wild-type (WT) gene. Recoded genes are computationally scanned to eliminate internal regulation. They are organized into operons and placed under the control of synthetic parts (promoters, ribosome binding sites, and terminators) that are functionally separated by spacer parts. Finally, a controller consisting of genetic sensors and circuits regulates the conditions and dynamics of gene expression. We applied this approach to an agriculturally relevant gene cluster from Klebsiella oxytoca encoding the nitrogen fixation pathway for converting atmospheric N₂ to ammonia. The native gene cluster consists of 20 genes in seven operons and is encoded in 23.5 kb of DNA. We constructed a "refactored" gene cluster that shares little DNA sequence identity with WT and for which the function of every genetic part is defined. This work demonstrates the potential for synthetic biology tools to rewrite the genetics encoding complex biological functions to facilitate access, engineering, and transferability.
60
Venter E, Smith RD, Payne SH. Proteogenomic analysis of bacteria and archaea: a 46 organism case study. PLoS One 2011; 6:e27587. [PMID: 22114679 PMCID: PMC3219674 DOI: 10.1371/journal.pone.0027587] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 06/08/2011] [Accepted: 10/20/2011] [Indexed: 11/19/2022]
Abstract
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.
Affiliation(s)
- Eli Venter
- Department of Informatics, J. Craig Venter Institute, Rockville, Maryland, United States of America
- Richard D. Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Samuel H. Payne
- Department of Informatics, J. Craig Venter Institute, Rockville, Maryland, United States of America
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
61
Snoeck J, Fellay J, Bartha I, Douek DC, Telenti A. Mapping of positive selection sites in the HIV-1 genome in the context of RNA and protein structural constraints. Retrovirology 2011; 8:87. [PMID: 22044801 PMCID: PMC3229471 DOI: 10.1186/1742-4690-8-87] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 07/28/2011] [Accepted: 11/01/2011] [Indexed: 02/06/2023]
Abstract
Background: The HIV-1 genome is subject to pressures that target the virus, resulting in escape and adaptation. On the other hand, there is a requirement for sequence conservation because of functional and structural constraints. Mapping the sites of selective pressure and conservation on the viral genome generates a reference for understanding the limits to viral escape, and can serve as a template for the discovery of sites of genetic conflict with known or unknown host proteins. Results: To build a thorough evolutionary, functional and structural map of the HIV-1 genome, complete subtype B sequences were obtained from the Los Alamos database. We mapped sites under positive selective pressure, amino acid conservation, protein and RNA structure, overlapping coding frames, CD8 T cell, CD4 T cell and antibody epitopes, and sites enriched in AG and AA dinucleotide motifs. Globally, 33% of amino acid positions were found to be variable and 12% of the genome was under positive selection. Because interrelated constraining and diversifying forces shape the viral genome, we included the variables from both classes of pressure in a multivariate model to predict conservation or positive selection: structured RNA and α-helix domains independently predicted conservation, while CD4 T cell and antibody epitopes were associated with positive selection. Conclusions: The global map of the viral genome contains positively selected sites that are not in canonical CD8 T cell, CD4 T cell or antibody epitopes; thus, it identifies a class of residues that may be targeted by other host selective pressures. Overall, RNA structure represents the strongest determinant of HIV-1 conservation. These data can inform the combined analysis of host and viral genetic information.
Affiliation(s)
- Joke Snoeck
- Institute of Microbiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland
62
Zhao L, Liu L, Leng W, Wei C, Jin Q. A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF. BMC Genomics 2011; 12:528. [PMID: 22032405 PMCID: PMC3219829 DOI: 10.1186/1471-2164-12-528] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/06/2011] [Accepted: 10/28/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck of knowledge acquisition. RESULTS: We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered, which were not predicted in S. flexneri 2a str. 301 by any other annotation approaches. The transcripts of four novel ORFs were confirmed by RT-PCR assay. Additionally, most of these novel ORFs were overlapping genes, some even nested within the coding region of other known genes. CONCLUSIONS: Our findings demonstrate that current Shigella genome annotation methods are not perfect and need to be improved. Apart from the validation of predicted genes at the protein level, the additional features of proteogenomic tools include revision of annotation errors and discovery of novel ORFs. The complementary dataset could provide more targets for those interested in Shigella to perform functional studies.
Affiliation(s)
- Lina Zhao
- State Key Laboratory for Molecular Virology and Genetic Engineering, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China
63
64
Wei W, Pelechano V, Järvelin AI, Steinmetz LM. Functional consequences of bidirectional promoters. Trends Genet 2011; 27:267-76. [PMID: 21601935 PMCID: PMC3123404 DOI: 10.1016/j.tig.2011.04.002] [Citation(s) in RCA: 156] [Impact Index Per Article: 12.0] [Received: 03/13/2011] [Revised: 04/20/2011] [Accepted: 04/20/2011] [Indexed: 02/07/2023]
Abstract
Several studies have shown that promoters of protein-coding genes are origins of pervasive non-coding RNA transcription and can initiate transcription in both directions. However, only recently have researchers begun to elucidate the functional implications of this bidirectionality and non-coding RNA production. Increasing evidence indicates that non-coding transcription at promoters influences the expression of protein-coding genes, revealing a new layer of transcriptional regulation. This regulation acts at multiple levels, from modifying local chromatin to enabling regional signal spreading and more distal regulation. Moreover, the bidirectional activity of a promoter is regulated at multiple points during transcription, giving rise to diverse types of transcripts.
Affiliation(s)
- Lars M. Steinmetz
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
65
Zutz A, Hoffmann J, Hellmich UA, Glaubitz C, Ludwig B, Brutschy B, Tampé R. Asymmetric ATP hydrolysis cycle of the heterodimeric multidrug ABC transport complex TmrAB from Thermus thermophilus. J Biol Chem 2010; 286:7104-15. [PMID: 21190941 DOI: 10.1074/jbc.m110.201178] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Abstract
ATP-binding cassette (ABC) systems translocate a wide range of solutes across cellular membranes. The thermophilic gram-negative eubacterium Thermus thermophilus, a model organism for structural genomics and systems biology, discloses ∼46 ABC proteins, which are largely uncharacterized. Here, we functionally analyzed the first two and only ABC half-transporters of the hyperthermophilic bacterium, TmrA and TmrB. The ABC system mediates uptake of the drug Hoechst 33342 in inside-out oriented vesicles that is inhibited by verapamil. TmrA and TmrB form a stable heterodimeric complex hydrolyzing ATP with a Km of 0.9 mM and a kcat of 9 s⁻¹ at 68 °C. Two nucleotides can be trapped in the heterodimeric ABC complex either by vanadate or by mutation inhibiting ATP hydrolysis. Nucleotide trapping requires permissive temperatures, at which a conformational ATP switch is possible. We further demonstrate that the canonic glutamate 523 of TmrA is essential for rapid conversion of the ATP/ATP-bound complex into its ADP/ATP state, whereas the corresponding aspartate in TmrB (Asp-500) has only a regulatory role. Notably, exchange of this single noncanonic residue into a catalytic glutamate cannot rescue the function of the E523Q/D500E complex, implicating a built-in asymmetry of the complex. However, slow ATP hydrolysis in the newly generated canonic site (D500E) strictly depends on the formation of a posthydrolysis state in the consensus site, indicating an allosteric coupling of both active sites.
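As a worked illustration of the kinetic constants quoted above, the following Python sketch evaluates the per-complex ATP turnover rate. It assumes simple Michaelis-Menten kinetics, which the abstract does not state explicitly; the function name and defaults are illustrative, with the defaults taken from the reported values at 68 °C.

```python
def turnover_rate(atp_mM: float, km_mM: float = 0.9, kcat_per_s: float = 9.0) -> float:
    """ATP molecules hydrolyzed per complex per second, assuming Michaelis-Menten kinetics."""
    if atp_mM < 0:
        raise ValueError("ATP concentration must be non-negative")
    # v / [E] = kcat * [S] / (Km + [S])
    return kcat_per_s * atp_mM / (km_mM + atp_mM)


if __name__ == "__main__":
    for atp in (0.1, 0.9, 5.0):
        print(f"[ATP]={atp:.1f} mM -> {turnover_rate(atp):.2f} per second")
```

At an ATP concentration equal to Km, the rate is half of kcat by construction. At saturating ATP it approaches kcat.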
Affiliation(s)
- Ariane Zutz
- Institute of Biochemistry, Biocenter, Goethe-University Frankfurt, D-60438 Frankfurt, Germany
66
The chromosomal mazEF locus of Streptococcus mutans encodes a functional type II toxin-antitoxin addiction system. J Bacteriol 2010; 193:1122-30. [PMID: 21183668 DOI: 10.1128/jb.01114-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022]
Abstract
Type II chromosomal toxin-antitoxin (TA) modules consist of a pair of genes that encode two components: a stable toxin and a labile antitoxin interfering with the lethal action of the toxin through protein complex formation. Bioinformatic analysis of Streptococcus mutans UA159 genome identified a pair of linked genes encoding a MazEF-like TA. Our results show that S. mutans mazEF genes form a bicistronic operon that is cotranscribed from a σ70-like promoter. Overproduction of S. mutans MazF toxin had a toxic effect on S. mutans which can be neutralized by coexpression of its cognate antitoxin, S. mutans MazE. Although mazF expression inhibited cell growth, no cell lysis of S. mutans cultures was observed under the conditions tested. The MazEF TA is also functional in E. coli, where S. mutans MazF did not kill the cells but rather caused reversible cell growth arrest. Recombinant S. mutans MazE and MazF proteins were purified and were shown to interact with each other in vivo, confirming the nature of this TA as a type II addiction system. Our data indicate that MazF is a toxic nuclease arresting cell growth through the mechanism of RNA cleavage and that MazE inhibits the RNase activity of MazF by forming a complex. Our results suggest that the MazEF TA module might represent a cell growth modulator facilitating the persistence of S. mutans under the harsh conditions of the oral cavity.
67
Abstract
The genomes of most virus species have overlapping genes: two or more proteins coded for by the same nucleotide sequence. Several explanations have been proposed for the evolution of this phenomenon, and we test these by comparing the amount of gene overlap in all known virus species. We conclude that gene overlap is unlikely to have evolved as a way of compressing the genome in response to the harmful effect of mutation because RNA viruses, despite having generally higher mutation rates, have less gene overlap on average than DNA viruses of comparable genome length. However, we do find a negative relationship between overlap proportion and genome length among viruses with icosahedral capsids, but not among those with other capsid types that we consider easier to enlarge in size. Our interpretation is that a physical constraint on genome length by the capsid has led to gene overlap evolving as a mechanism for producing more proteins from the same genome length. We consider that these patterns cannot be explained by other factors, namely the possible roles of overlap in transcription regulation, generating more divergent proteins and the relationship between gene length and genome length.
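One plausible way to compute an overlap proportion from gene coordinates is sketched below in Python. The definition used here (the fraction of genome positions covered by two or more genes) is an assumption and may differ from the measure used in the study. Coordinates are taken to be 1-based and inclusive on a linear genome, and the function name is illustrative.

```python
from typing import List, Tuple


def overlap_proportion(genes: List[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of genome positions covered by at least two genes."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Difference array: +1 where a gene starts, -1 just past where it ends.
    diff = [0] * (genome_length + 2)
    for start, end in genes:
        if not 1 <= start <= end <= genome_length:
            raise ValueError(f"gene ({start}, {end}) lies outside the genome")
        diff[start] += 1
        diff[end + 1] -= 1
    depth = 0
    overlapped = 0
    for pos in range(1, genome_length + 1):
        depth += diff[pos]  # number of genes covering this position
        if depth >= 2:
            overlapped += 1
    return overlapped / genome_length


if __name__ == "__main__":
    # Hypothetical toy genome: the first two genes share positions 91-100.
    toy_genes = [(1, 100), (91, 200), (300, 400)]
    print(overlap_proportion(toy_genes, 1000))
```

Plotting this value against genome length for each virus would reproduce the kind of comparison the abstract describes, under the stated definition.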
Affiliation(s)
- Nicola Chirico
- Department of Structural and Functional Biology, University of Insubria, Via JH Dunant 3, 21100 Varese, Italy
- Alberto Vianelli
- Department of Structural and Functional Biology, University of Insubria, Via JH Dunant 3, 21100 Varese, Italy
- Robert Belshaw
- Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
68
Zhu Z, Cardin CJ, Gan Y, Colquhoun HM. Sequence-selective assembly of tweezer molecules on linear templates enables frameshift-reading of sequence information. Nat Chem 2010; 2:653-60. [DOI: 10.1038/nchem.699] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.5] [Received: 01/13/2010] [Accepted: 05/10/2010] [Indexed: 11/09/2022]
69
Xu YP, Ye ZP, Niu CY, Bao YY, Wang WB, Shen WD, Zhang CX. Comparative analysis of the genomes of Bombyx mandarina and Bombyx mori nucleopolyhedroviruses. J Microbiol 2010; 48:102-10. [PMID: 20221737 DOI: 10.1007/s12275-009-0197-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 06/26/2009] [Accepted: 08/03/2009] [Indexed: 11/24/2022]
Abstract
The Bombyx mandarina nucleopolyhedrovirus (BomaNPV) S1 strain can infect the silkworm, Bombyx mori, but is significantly less virulent than the B. mori nucleopolyhedrovirus (BmNPV) T3 strain. The complete nucleotide sequence of the S1 strain of BomaNPV was determined and compared with that of the BmNPV T3 strain. The circular, double-stranded DNA genome of the S1 strain was 126,770 nucleotides long (GenBank accession no. FJ882854), with a G+C content of 40.23%. The genome contained 133 potential ORFs. Most of the putative proteins were more than 96% identical to homologs in the BmNPV T3 strain, except for bro-a, lef-12, bro-c, and bro-d. Compared with the BmNPV T3 strain, however, this genome did not encode the bro-b and bro-e genes. In addition, hr1 lacked two repeat units, while hr2L, hr2R, hr3, hr4L, hr4R, and hr5 were similar to the corresponding hrs in the T3 strain. The sequence strongly suggested that BomaNPV and BmNPV are variants of each other, and supported the idea that baculovirus strain heterogeneity may often be caused by variation in the hrs and bro genes.
Affiliation(s)
- Yi-Peng Xu
- Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, P R China
70
Cheng CH, Yang CH, Chiu HT, Lu CL. Reconstructing genome trees of prokaryotes using overlapping genes. BMC Bioinformatics 2010; 11:102. [PMID: 20181237 PMCID: PMC2845580 DOI: 10.1186/1471-2105-11-102] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/23/2009] [Accepted: 02/24/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance between two genomes we defined was based on a measure of combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited so that it is hard to find enough OGs to properly evaluate their pairwise OG distances. RESULTS In this study, we therefore define a new OG order distance that is based on more biologically accurate rearrangements (e.g., reversals, transpositions and translocations) rather than breakpoints and that is applicable to both uni-chromosomal and multi-chromosomal genomes. In addition, we expand the term "gene" to include both its coding sequence and regulatory regions so that two adjacent genes whose coding sequences or regulatory regions overlap with each other are considered as a pair of overlapping genes. This is because overlapping of regulatory regions of distinct genes suggests that the regulation of expression for these genes should be more or less interrelated. Based on these modifications, we have reimplemented our OGtree as a new web server, named OGtree2, and have also evaluated its accuracy of genome tree reconstruction on a testing dataset consisting of 21 Proteobacteria genomes. 
Our experimental results have finally shown that our current OGtree2 indeed outperforms its previous version OGtree, as well as another similar server, called BPhyOG, significantly in the quality of genome tree reconstruction, because the phylogenetic tree obtained by OGtree2 is greatly congruent with the reference tree that coincides with the taxonomy accepted by biologists for these Proteobacteria. CONCLUSIONS In this study, we have introduced a new web server OGtree2 at http://bioalgorithm.life.nctu.edu.tw/OGtree2.0/ that can serve as a useful tool for reconstructing more precise and robust genome trees of prokaryotes according to their overlapping genes.
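The breakpoint-based OG order distance mentioned in the background can be illustrated with a minimal sketch. It assumes each genome is a single linear chromosome given as a list of unique orthologous OG identifiers, ignores orientation, and normalizes by the number of adjacencies. The function name and the normalization are illustrative assumptions, not the OGtree or OGtree2 implementation.

```python
def breakpoint_distance(order_a, order_b):
    """Fraction of adjacencies in order_a that are not adjacent in order_b.

    Hypothetical helper: inputs are lists of unique OG identifiers,
    restricted here to the identifiers shared by both genomes.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    if len(a) < 2:
        return 0.0
    # Unordered neighbour pairs observed in genome B.
    adjacencies_b = {frozenset(pair) for pair in zip(b, b[1:])}
    breakpoints = sum(
        1 for pair in zip(a, a[1:]) if frozenset(pair) not in adjacencies_b
    )
    return breakpoints / (len(a) - 1)
```

As the abstract notes, a measure of this kind does not extend naturally to multi-chromosomal genomes, which is the motivation given for the rearrangement-based distance.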
Affiliation(s)
- Chih-Hsien Cheng
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Chung-Han Yang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Hsien-Tai Chiu
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- Chin Lung Lu
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
|
71
|
|
72
|
Genes involved in yellow pigmentation of Cronobacter sakazakii ES5 and influence of pigmentation on persistence and growth under environmental stress. Appl Environ Microbiol 2009; 76:1053-61. [PMID: 20038705 DOI: 10.1128/aem.01420-09] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022] Open
Abstract
Cronobacter spp. are opportunistic food-borne pathogens that are responsible for rare but highly fatal cases of meningitis and necrotizing enterocolitis in neonates. While the operon responsible for yellow pigmentation in Cronobacter sakazakii strain ES5 was described recently, the involvement of additional genes in pigment expression and the influence of pigmentation on the fitness of Cronobacter spp. have not been investigated. Thus, the aim of this study was to identify further genes involved in pigment expression in Cronobacter sakazakii ES5 and to assess the influence of pigmentation on growth and persistence under conditions of environmental stress. A knockout library was created using random transposon mutagenesis. The screening of 9,500 mutants for decreased pigment production identified 30 colorless mutants. The mapping of transposon insertion sites revealed insertions in not only the carotenoid operon but also in various other genes involved in signal transduction, inorganic ions, and energy metabolism. To determine the effect of pigmentation on fitness, colorless mutants (ΔcrtE, ΔcrtX, and ΔcrtY) were compared to the yellow wild type using growth and inactivation experiments, a macrophage assay, and a phenotype array. Among other findings, the colorless mutants grew at significantly increased rates under osmotic stress compared to that of the yellow wild type while showing increased susceptibility to desiccation. Moreover, ΔcrtE and ΔcrtY exhibited increased sensitivity to UVB irradiation.
|
73
|
Kim W, Silby MW, Purvine SO, Nicoll JS, Hixson KK, Monroe M, Nicora CD, Lipton MS, Levy SB. Proteomic detection of non-annotated protein-coding genes in Pseudomonas fluorescens Pf0-1. PLoS One 2009; 4:e8455. [PMID: 20041161 PMCID: PMC2794547 DOI: 10.1371/journal.pone.0008455] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 09/22/2009] [Accepted: 12/02/2009] [Indexed: 11/18/2022] Open
Abstract
Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of possible functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations that predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes that were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologs in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.
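Mapping peptides back to a genome, as described above, amounts to searching all six reading frames rather than only the annotated ones. The following is a minimal sketch under stated assumptions: exact string matching, the standard genetic code, and hypothetical function names. It is not the pipeline used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(genome, peptide):
    """Return (strand, frame, nucleotide offset on that strand) per match."""
    hits = []
    reverse = genome.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", genome), ("-", reverse)):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, frame + 3 * pos))
                pos = protein.find(peptide, pos + 1)
    return hits
```

A hit on the minus strand inside an annotated plus-strand gene would correspond to the antisense case reported in the abstract.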
Affiliation(s)
- Wook Kim
- Center for Adaptation Genetics and Drug Resistance and Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, Massachusetts, United States of America
- Mark W. Silby
- Center for Adaptation Genetics and Drug Resistance and Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, Massachusetts, United States of America
- Sam O. Purvine
- Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Julie S. Nicoll
- Center for Adaptation Genetics and Drug Resistance and Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, Massachusetts, United States of America
- Kim K. Hixson
- Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Matt Monroe
- Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Carrie D. Nicora
- Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Mary S. Lipton
- Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Stuart B. Levy
- Center for Adaptation Genetics and Drug Resistance and Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, Massachusetts, United States of America
|
74
|
Cock PJA, Whitworth DE. Evolution of relative reading frame bias in unidirectional prokaryotic gene overlaps. Mol Biol Evol 2009; 27:753-6. [PMID: 20008458 DOI: 10.1093/molbev/msp302] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
Pairs of unidirectional (same strand) genes can overlap in one of two phases (relative reading frames). There is a striking bias in the relative abundance of prokaryotic gene overlaps in the two possible phases. A simple model is presented based on unidirectional gene overlaps evolving from nonoverlapping gene pairs, through the adoption of alternative start codons by the downstream genes. Potential alternative start codons within upstream gene sequences were found to occur at greater frequencies in one phase, corresponding to the most prevalent phase of gene overlaps. We therefore suggest that the phase bias of overlapping genes is primarily a consequence of the N-terminal extension of downstream genes through adoption of new start codons.
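The two possible phases of a same-strand overlap can be made concrete with a small sketch. It assumes forward-strand genes given as 1-based inclusive (start, end) coordinates and reports the frame offset of the downstream gene relative to the upstream gene. The function name is hypothetical, and how offsets 1 and 2 map onto named phases differs between papers, so only the raw offset is returned.

```python
def describe_overlap(upstream, downstream):
    """Return (overlap length, frame offset) or None if the genes do not overlap.

    upstream, downstream: (start, end) tuples, 1-based inclusive, same strand,
    with upstream starting first. The offset is the downstream start relative
    to the upstream reading frame, modulo 3.
    """
    up_start, up_end = upstream
    down_start, down_end = downstream
    overlap = min(up_end, down_end) - down_start + 1
    if overlap <= 0:
        return None
    offset = (down_start - up_start) % 3
    return overlap, offset
```

For the common 4-bp overlap of the form ATGA, in which the downstream start codon shares bases with the upstream stop codon, `describe_overlap((1, 99), (96, 200))` gives an overlap of 4 and an offset of 2.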
|
75
|
Pallejà A, García-Vallvé S, Romeu A. Adaptation of the short intergenic spacers between co-directional genes to the Shine-Dalgarno motif among prokaryote genomes. BMC Genomics 2009; 10:537. [PMID: 19922619 PMCID: PMC2784483 DOI: 10.1186/1471-2164-10-537] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/23/2009] [Accepted: 11/18/2009] [Indexed: 11/30/2022] Open
Abstract
Background In prokaryote genomes most of the co-directional genes are in close proximity. Even the coding sequence or the stop codon of a gene can overlap with the Shine-Dalgarno (SD) sequence of the downstream co-directional gene. In this paper we analyze how the presence of SD may influence the stop codon usage or the spacing lengths between co-directional genes. Results The SD sequences for 530 prokaryote genomes have been predicted using computer calculations of the base-pairing free energy between translation initiation regions and the 16S rRNA 3' tail. Genomes with a large number of genes with the SD sequence concentrate this regulatory motif from 4 to 11 bps before the start codon. However, not all genes seem to have the SD sequence. Genes separated from 1 to 4 bps from a co-directional upstream gene show a high SD presence, though this regulatory signal is located towards the 3' end of the coding sequence of the upstream gene. Genes separated from 9 to 15 bps show the highest SD presence as they accommodate the SD sequence within an intergenic region. However, genes separated from around 5 to 8 bps have a lower percentage of SD presence and when the SD is present, the stop codon usage of the upstream gene changes to accommodate the overlap between the SD sequence and the stop codon. Conclusion The SD presence makes the intergenic lengths from 5 to 8 bps less frequent and causes an adaptation of the stop codon usage. Our results introduce new elements to the discussion of which factors affect the intergenic lengths, which cannot be totally explained by the pressure to compact the prokaryote genomes.
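The spacer-length classes discussed in the results (1 to 4, 5 to 8, and 9 to 15 bp) can be tallied from gene coordinates as in the sketch below. It assumes co-directional neighbours described by the upstream gene's end and the downstream gene's start in 1-based inclusive coordinates. The function names and the catch-all bins are illustrative assumptions.

```python
from collections import Counter


def spacer_length(upstream_end, downstream_start):
    """Bases strictly between two genes; 0 means adjacent, negative means overlap."""
    return downstream_start - upstream_end - 1


def spacer_bin(length):
    """Assign a spacer length to the ranges named in the abstract."""
    if length < 1:
        return "<1"
    if length <= 4:
        return "1-4"
    if length <= 8:
        return "5-8"
    if length <= 15:
        return "9-15"
    return ">15"


def tally_spacers(pairs):
    """Count spacer bins for an iterable of (upstream_end, downstream_start)."""
    return Counter(spacer_bin(spacer_length(end, start)) for end, start in pairs)
```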
Affiliation(s)
- Albert Pallejà
- Department of Biochemistry and Biotechnology, Rovira i Virgili University, Tarragona, Catalonia, Spain.
|
76
|
Grinchuk OV, Jenjaroenpun P, Orlov YL, Zhou J, Kuznetsov VA. Integrative analysis of the human cis-antisense gene pairs, miRNAs and their transcription regulation patterns. Nucleic Acids Res 2009; 38:534-47. [PMID: 19906709 PMCID: PMC2811022 DOI: 10.1093/nar/gkp954] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 12/20/2022] Open
Abstract
Cis-antisense gene pairs (CASGPs) can transcribe mRNAs from an opposite strand of a given locus. To classify and understand diverse CASGP phenomena in the human we compiled a genome-wide catalog of CASGPs and integrated these sequences with microarray, SAGE and miRNA data. Using the concept of overlapping regions and clustering of SA transcripts by chromosome coordinates, we identified up to 9000 overlapping antisense loci. Four thousand three hundred and seventy-four of these CASGPs form 1759 complex gene architectures. We found that ∼35% (6347/18160) of RefSeq genes are overlapped with the antisense transcripts. About 30% of Affymetrix U133 microarray initial sequences map transcripts of ∼35% CASGPs and reveal mostly concordant expression in CASGPs. We found strong significant overrepresentation of human miRNA genes in loci of CASGPs. We developed a data-driven model of cross-talk between co-expressed CASGPs and DICER1-mediated miRNA pathway in normal spermatogenesis and in severe teratozoospermia. Specifically, we revealed complex SA structural–functional gene module composing the protein-coding genes, WDR6, DALRD3, NDUFAF3 and ncRNA precursors, mir-425 and mir-191, which could provide downregulation of ncRNA pathway via direct targeting DICER1 and basonuclin 2 transcripts by mir-425 and mir-191 in normal spermatogenesis, but this mechanism is switched off in severe teratozoospermia. The database is available from http://globalisland.bii.a-star.edu.sg/~jiangtao/sas/index3.php?link=about
Affiliation(s)
- Oleg V Grinchuk
- Bioinformatics Institute, 30 Biopolis Street #07-01, Singapore 138672, Singapore
|
77
|
Pallejà A, Reverter T, Garcia-Vallvé S, Romeu A. PairWise Neighbours database: overlaps and spacers among prokaryote genomes. BMC Genomics 2009; 10:281. [PMID: 19555467 PMCID: PMC2716372 DOI: 10.1186/1471-2164-10-281] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/14/2009] [Accepted: 06/25/2009] [Indexed: 05/25/2023] Open
Abstract
Background Although prokaryotes live in a variety of habitats and differ in metabolic and genomic complexity, they have several genomic architectural features in common. Overlapping genes are a common feature of prokaryote genomes. Overlaps tend to be short because longer overlaps carry a greater risk of deleterious mutations. The spacers between genes tend to be short too because of the tendency to reduce non-coding DNA among prokaryotes. However, they must be long enough to maintain essential regulatory signals such as the Shine-Dalgarno (SD) sequence, which is responsible for efficient translation. Description PairWise Neighbours is an interactive and intuitive database used for retrieving information about the spacers and overlapping genes among bacterial and archaeal genomes. It contains 1,956,294 gene pairs from 678 fully sequenced prokaryote genomes and is freely available at the URL . This database provides information about the overlaps and their conservation across species. Furthermore, it allows wide analysis of the intergenic regions, providing useful information such as the location and strength of the SD sequence. Conclusion There are experiments and bioinformatic analyses that rely on correct annotation of the initiation site. Therefore, a database that studies the overlaps and spacers among prokaryotes appears to be desirable. The PairWise Neighbours database permits reliability analysis of the overlapping structures and the study of SD presence and location among adjacent genes, which may help to check the annotation of the initiation sites.
Affiliation(s)
- Albert Pallejà
- Department of Biochemistry and Biotechnology, Rovira i Virgili University, Tarragona, Catalunya, Spain.
|
78
|
Singh TR, Pardasani KR. Ambush hypothesis revisited: Evidences for phylogenetic trends. Comput Biol Chem 2009; 33:239-44. [PMID: 19473880 DOI: 10.1016/j.compbiolchem.2009.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 08/28/2008] [Revised: 04/15/2009] [Accepted: 04/23/2009] [Indexed: 10/20/2022]
Abstract
Recoding events occur in competition with standard readout of the transcript, and are site-specific. Recoding is the reprogramming of mRNA translation by localized alterations in the standard translational rules. Frame-shifting is one class of recoding and is defined as protein translation that starts not at the first, but at either the second (+1 frame-shift) or the third (-1 frame-shift) nucleotide of the codon. Coding sequences lack stop codons, but frame-shifted sequences contain many stop codons, termed off-frame stops or hidden stops. These hidden stops terminate frame-shifted translation, potentially decreasing energy and resource waste on non-functional proteins. Our results support this putative ancient adaptive event for the selection of codons that can be part of hidden stop codons. All taxonomic groups show a positive correlation between codon usage frequencies and the contribution of codons to hidden stops in off-frame context. Our analysis of nuclear and mitochondrial genomic data revealed phylogenomic selection of the ambush mechanism. The strongest impact of this event was found in viruses and bacteria. It has been suggested that this mechanism arose and was utilized in the early stages of evolution.
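Hidden stops as defined above can be counted directly from a coding sequence. The sketch below follows the abstract's convention that reading from the second nucleotide is the +1 frame-shift and reading from the third is the -1 frame-shift. It assumes the standard stop codons TAA, TAG and TGA and an in-frame DNA input; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def hidden_stops(cds):
    """Count stop codons in the two shifted frames of an in-frame sequence.

    Returns a dict keyed by shift: 1 reads from the second nucleotide
    (+1 frame-shift), 2 reads from the third (-1 frame-shift).
    """
    cds = cds.upper()
    counts = {}
    for shift in (1, 2):
        counts[shift] = sum(
            1
            for i in range(shift, len(cds) - 2, 3)
            if cds[i:i + 3] in STOP_CODONS
        )
    return counts
```

For example, `hidden_stops("ATGCTGATAGCC")` finds TGA and TAG in the +1 frame and no stops in the -1 frame.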
Affiliation(s)
- Tiratha Raj Singh
- Department of Zoology, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel.
|
79
|
Torres C, Galián C, Freiberg C, Fantino JR, Jault JM. The YheI/YheH heterodimer from Bacillus subtilis is a multidrug ABC transporter. Biochim Biophys Acta Biomembr 2009; 1788:615-22. [DOI: 10.1016/j.bbamem.2008.12.012] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 08/07/2008] [Revised: 12/12/2008] [Accepted: 12/22/2008] [Indexed: 12/12/2022]
|
80
|
Abstract
Marine eukaryotic photosynthesis is dominated by a diverse group of unicellular organisms collectively called microalgae. Microalgae include cells derived from a primary endosymbiotic event (similar to land plants) and cells derived from subsequent secondary and/or tertiary endosymbiotic events. These latter cells are chimeras of several genomes and dominate primary production in the marine environment. Two consequences of multiple endosymbiotic events include complex targeting mechanisms to allow nuclear-encoded proteins to be imported into the plastid and coordination of enzymes, potentially from disparate originator cells, to form complete metabolic pathways. In this review, we discuss the forces that shaped the genomes of marine microalgae and then discuss some of the metabolic consequences of such a complex evolutionary history. We focus our metabolic discussion on carbon, nitrogen, and iron. We then discuss biomineralization and new evidence for programmed cell death in microalgae. We conclude with a short summary on advances in genetic manipulation of microalgae and thoughts on the future directions of marine algal genomics.
Affiliation(s)
- Micaela S Parker
- School of Oceanography, University of Washington, Seattle, Washington 98195, USA.
|
81
|
Lin GN, Cai Z, Lin G, Chakraborty S, Xu D. ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets. BMC Bioinformatics 2009; 10 Suppl 1:S5. [PMID: 19208152 PMCID: PMC2648732 DOI: 10.1186/1471-2105-10-s1-s5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Background With the increasing availability of whole genome sequences, it is becoming more and more important to use complete genome sequences for inferring species phylogenies. We developed a new tool ComPhy, 'Composite Distance Phylogeny', based on a composite distance matrix calculated from the comparison of complete gene sets between genome pairs to produce a prokaryotic phylogeny. Results The composite distance between two genomes is defined by three components: Gene Dispersion Distance (GDD), Genome Breakpoint Distance (GBD) and Gene Content Distance (GCD). GDD quantifies the dispersion of orthologous genes along the genomic coordinates from one genome to another; GBD measures the shared breakpoints between two genomes; GCD measures the level of shared orthologs between two genomes. The phylogenetic tree is constructed from the composite distance matrix using a neighbor joining method. We tested our method on 9 datasets from 398 completely sequenced prokaryotic genomes. We have achieved above 90% agreement in quartet topologies between the tree created by our method and the tree from the Bergey's taxonomy. In comparison to several other phylogenetic analysis methods, our method showed consistently better performance. Conclusion ComPhy is a fast and robust tool for genome-wide inference of evolutionary relationship among genomes. It can be downloaded from .
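The abstract names the three components of the composite distance but does not give their formulas or how they are combined. The sketch below is therefore only an illustration under stated assumptions: gene content distance is taken as one minus the shared fraction of the smaller ortholog set, and the composite is a weighted mean. Both choices and both function names are assumptions, not ComPhy's definitions.

```python
def gene_content_distance(orthologs_a, orthologs_b):
    """Assumed form: 1 - |shared| / size of the smaller ortholog set."""
    a, b = set(orthologs_a), set(orthologs_b)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / min(len(a), len(b))


def composite_distance(components, weights=None):
    """Weighted mean of component distances (equal weights by default).

    components: e.g. [GDD, GBD, GCD] for one genome pair. The weighted-mean
    rule is an assumption; the abstract does not state the combining rule.
    """
    if weights is None:
        weights = [1.0] * len(components)
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, components)) / total
```

A matrix of such pairwise values is what a neighbor-joining method would take as input.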
Affiliation(s)
- Guan Ning Lin
- Digital Biology Laboratory, Informatics Institute, Computer Science Department and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
|
82
|
Warren AS, Setubal JC. The Genome Reverse Compiler: an explorative annotation tool. BMC Bioinformatics 2009; 10:35. [PMID: 19173744 PMCID: PMC2640359 DOI: 10.1186/1471-2105-10-35] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 07/25/2008] [Accepted: 01/27/2009] [Indexed: 11/19/2022] Open
Abstract
Background As sequencing costs have decreased, whole genome sequencing has become a viable and integral part of biological laboratory research. However, the tools with which genes can be found and functionally characterized have not been readily adapted to be part of the everyday biological sciences toolkit. Most annotation pipelines remain as a service provided by large institutions or come as an unwieldy conglomerate of independent components, each requiring their own setup and maintenance. Results To address this issue we have created the Genome Reverse Compiler, an easy-to-use, open-source, automated annotation tool. The GRC is independent of third party software installs and only requires a Linux operating system. This stands in contrast to most annotation packages, which typically require installation of relational databases, sequence similarity software, and a number of other programming language modules. We provide details on the methodology used by GRC and evaluate its performance on several groups of prokaryotes using GRC's built in comparison module. Conclusion Traditionally, to perform whole genome annotation a user would either set up a pipeline or take advantage of an online service. With GRC the user need only provide the genome he or she wants to annotate and the function resource files to use. The result is high usability and a very minimal learning curve for the intended audience of life science researchers and bioinformaticians. We believe that the GRC fills a valuable niche in allowing users to perform explorative, whole-genome annotation.
Affiliation(s)
- Andrew S Warren
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA.
|
83
|
Han D, Krauss G. Characterization of the endonuclease SSO2001 from Sulfolobus solfataricus P2. FEBS Lett 2009; 583:771-6. [DOI: 10.1016/j.febslet.2009.01.024] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 12/16/2008] [Revised: 01/16/2009] [Accepted: 01/16/2009] [Indexed: 10/21/2022]
|
84
|
Sabath N, Landan G, Graur D. A method for the simultaneous estimation of selection intensities in overlapping genes. PLoS One 2008; 3:e3996. [PMID: 19098983 PMCID: PMC2601044 DOI: 10.1371/journal.pone.0003996] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 09/22/2008] [Accepted: 11/21/2008] [Indexed: 11/18/2022] Open
Abstract
Inferring the intensity of positive selection in protein-coding genes is important since it is used to shed light on the process of adaptation. Recently, it has been reported that overlapping genes, which are ubiquitous in all domains of life, seem to exhibit inordinate degrees of positive selection. Here, we present a new method for the simultaneous estimation of selection intensities in overlapping genes. We show that the appearance of positive selection is caused by assuming that selection operates independently on each gene in an overlapping pair, thereby ignoring the unique evolutionary constraints on overlapping coding regions. Our method uses an exact evolutionary model, thereby voiding the need for approximation or intensive computation. We test the method by simulating the evolution of overlapping genes of different types as well as under diverse evolutionary scenarios. Our results indicate that the independent estimation approach leads to the false appearance of positive selection even though the gene is in reality subject to negative selection. Finally, we use our method to estimate selection in two influenza A genes for which positive selection was previously inferred. We find no evidence for positive selection in both cases.
Affiliation(s)
- Niv Sabath
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America.
|
85
|
Sabath N, Graur D, Landan G. Same-strand overlapping genes in bacteria: compositional determinants of phase bias. Biol Direct 2008; 3:36. [PMID: 18717987 PMCID: PMC2542354 DOI: 10.1186/1745-6150-3-36] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 08/19/2008] [Accepted: 08/21/2008] [Indexed: 11/24/2022] Open
Abstract
Background Same-strand overlapping genes may occur in frameshifts of one (phase 1) or two nucleotides (phase 2). In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. This bias was explained by either genomic location or an unspecified selection advantage. Models that focused on the ability of the two genes to evolve independently did not predict this phase bias. Here, we propose that a purely compositional model explains the phase bias in a more parsimonious manner. Same-strand overlapping genes may arise through either a mutation at the termination codon of the upstream gene or a mutation at the initiation codon of the downstream gene. We hypothesized that given these two scenarios, the frequencies of initiation and termination codons in the two phases may determine the number for overlapping genes. Results We examined the frequencies of initiation- and termination-codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases. We show that the frequencies of start codons in each of the two phases, and, hence, the potential for the creation of overlapping genes, are determined by a universal amino-acid frequency and species-specific codon usage, leading to a correlation between long phase-1 overlaps and genomic GC content. Conclusion Our model explains the phase bias in same-strand overlapping genes by compositional factors without invoking selection. Therefore, it can be used as a null model of neutral evolution to test selection hypotheses concerning the evolution of overlapping genes. Reviewers This article was reviewed by Bill Martin, Itai Yanai, and Mikhail Gelfand.
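The compositional argument above rests on how often potential initiation codons appear in each shifted frame of existing coding sequences. A minimal sketch of that count follows. It assumes ATG, GTG and TTG as candidate bacterial start codons and reports counts by nucleotide shift (1 or 2) rather than by named phase, since phase labels depend on the convention used. The function name is hypothetical.

```python
START_CODONS = ("ATG", "GTG", "TTG")


def start_codons_by_shift(cds, starts=START_CODONS):
    """Count candidate start codons in the two shifted frames of a CDS.

    cds: in-frame coding DNA. Returns {1: count, 2: count}, where the key is
    the number of nucleotides by which the reading frame is shifted.
    """
    cds = cds.upper()
    return {
        shift: sum(
            1 for i in range(shift, len(cds) - 2, 3) if cds[i:i + 3] in starts
        )
        for shift in (1, 2)
    }
```

Summing these counts over all genes in a genome would give the kind of per-frame start-codon frequencies that the abstract relates to amino-acid frequencies and codon usage.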
Affiliation(s)
- Niv Sabath
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA.
|
86
|
Pallejà A, Harrington ED, Bork P. Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? BMC Genomics 2008; 9:335. [PMID: 18627618 PMCID: PMC2478687 DOI: 10.1186/1471-2164-9-335] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 03/10/2008] [Accepted: 07/15/2008] [Indexed: 11/20/2022] Open
Abstract
Background Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation. Results We analysed the genes that overlap by 60 bps or more among 338 fully-sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene showed a significantly different length from its orthologs it was considered unlikely to be functional and therefore the result of an error either in sequencing or gene prediction. Focusing on 715 co-directional overlaps longer than 60 bps, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at 5'-end of the gene (409 overlaps), ii) fragmentation of a gene caused by a frameshift (163), iii) 3'-end extension of the upstream gene due to either a frameshift at 3'-end of a gene or point mutation at the stop codon (68), iv) Redundant gene predictions (4), v) 5' & 3'-end extension which is a combination of i) and iii) (71). We also studied 75 divergent overlaps that could be classified as misannotations of group i). Nevertheless we found some convergent long overlaps (54) that might be true overlaps, although an important part of convergent overlaps could be classified as group iii) (124). 
Conclusion Among the 968 overlaps larger than 60 bps which we analysed, we did not find a single real one among the co-directional and divergent orientations and concluded that there had been an excessive number of misannotations. Only convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
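The orientation classes and the 60 bp threshold used in this study can be sketched as follows. The flagging rule here is a simplified illustration based only on the conclusion that long co-directional and divergent overlaps were all misannotations. It does not reproduce the ortholog length comparison described in the results, and the function names are hypothetical.

```python
def orientation(strand_first, strand_second):
    """Classify a gene pair ordered by coordinate; strands are '+' or '-'."""
    if strand_first == strand_second:
        return "co-directional"
    # '+' then '-' means the genes point toward each other.
    return "convergent" if strand_first == "+" else "divergent"


def flag_long_overlap(first, second, min_overlap=60):
    """Return (orientation, overlap length, suspicious flag).

    first, second: (start, end, strand), 1-based inclusive, with first
    starting no later than second. A pair is flagged when it overlaps by at
    least min_overlap bp and is not convergent (assumed simplification).
    """
    overlap = max(0, min(first[1], second[1]) - second[0] + 1)
    kind = orientation(first[2], second[2])
    suspicious = overlap >= min_overlap and kind != "convergent"
    return kind, overlap, suspicious
```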
Affiliation(s)
- Albert Pallejà
- Biochemistry and Biotechnology Department, Rovira i Virgili University, C/Marcel.lí Domingo s/n, 43007 Tarragona, Catalunya, Spain.
|
87
|
Jiang LW, Lin KL, Lu CL. OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes. Nucleic Acids Res 2008; 36:W475-80. [PMID: 18456706 PMCID: PMC2447762 DOI: 10.1093/nar/gkn240] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
OGtree is a web-based tool for constructing genome trees of prokaryotic species based on a measure of combining overlapping-gene content and overlapping-gene order in their whole genomes. The overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, OGs are ubiquitous in microbial genomes and more conserved between species than non-OGs. Based on these properties, it has been suggested that OGs can serve as better phylogenetic characters than non-OGs for reconstructing the evolutionary relationships among microbial genomes. OGtree takes the accession numbers of prokaryotic genomes as its input. It then downloads their complete genomes from the National Centre for Biotechnology Information and identifies OGs in each genome and their orthologous OGs in other genomes. Next, OGtree computes an overlapping-gene distance between each pair of input genomes based on a combination of their OG content and orthologous OG order. Finally, it utilizes distance-based methods of building tree to reconstruct the genome trees of input prokaryotic genomes according to their pairwise OG distance. OGtree is available online at http://bioalgorithm.life.nctu.edu.tw/OGtree/.
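The OG identification step described above, finding adjacent genes whose coding sequences overlap partially or entirely, can be sketched from gene coordinates alone. The example assumes a single linear replicon with 1-based inclusive coordinates and checks only coordinate-adjacent pairs. The function name is hypothetical and this is not the OGtree code.

```python
def find_overlapping_pairs(genes):
    """Return (left name, right name, overlap length) for adjacent overlaps.

    genes: iterable of (name, start, end) with 1-based inclusive coordinates.
    Only neighbours in coordinate order are compared, so a gene embedded
    inside a much longer one is reported once, against its left neighbour.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if right[1] <= left[2]:
            overlap = min(left[2], right[2]) - right[1] + 1
            pairs.append((left[0], right[0], overlap))
    return pairs
```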
Affiliation(s)
- Li-Wei Jiang
- Institute of Bioinformatics and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
|
88
|
Sanna CR, Li WH, Zhang L. Overlapping genes in the human and mouse genomes. BMC Genomics 2008; 9:169. [PMID: 18410680 PMCID: PMC2335118 DOI: 10.1186/1471-2164-9-169] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 11/29/2007] [Accepted: 04/14/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Increasing evidence suggests that overlapping genes are much more common in eukaryotic genomes than previously thought. In this study we identified and characterized the overlapping genes in a set of 13,484 pairs of human-mouse orthologous genes. RESULTS About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps. The majority of the same-strand overlaps are embedded forms, whereas most different-strand overlaps are not embedded and in the convergent transcription orientation. Most of the same-strand overlapping gene pairs show at least a tenfold difference in length, much larger than the length difference between non-overlapping neighboring gene pairs. The length difference between the two different-strand overlapping genes is less dramatic. Over 27% of the different-strand-overlap relationships are shared between human and mouse, compared to only approximately 8% conservation for same-strand-overlap relationships. More than 96% of the same-strand and different-strand overlaps that are not shared between human and mouse have both genes located on the same chromosomes in the species that does not show the overlap. We examined the causes of transition between the overlapping and non-overlapping states in the two species and found that 3' UTR change plays an important role in the transition. CONCLUSION Our study contributes to the understanding of the evolutionary transition between overlapping genes and non-overlapping genes and demonstrates the high rates of evolutionary changes in the un-translated regions.
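The categories used in the results (same-strand versus different-strand, embedded versus not embedded) can be expressed as a small classifier. The Python sketch below rests on assumptions: genes are represented as `(start, end, strand)` tuples over their full genomic span, and the label strings are chosen here for illustration rather than taken from the paper.

```python
def classify_overlap(a, b):
    """Classify the overlap between two genes given as (start, end, strand).

    Returns None when the genes do not overlap, otherwise a tuple
    (strand_relation, containment) where containment is "embedded" when
    one gene lies entirely within the other and "partial" otherwise.
    """
    (s1, e1, strand1), (s2, e2, strand2) = sorted((a, b))
    if s2 > e1:
        return None  # the second gene starts after the first one ends
    strand_relation = "same-strand" if strand1 == strand2 else "different-strand"
    # With equal starts the shorter gene sorts first and sits inside the longer one.
    embedded = e2 <= e1 or s1 == s2
    containment = "embedded" if embedded else "partial"
    return (strand_relation, containment)


print(classify_overlap((100, 1000, "+"), (300, 400, "+")))
print(classify_overlap((100, 500, "+"), (400, 900, "-")))
```

The first call reports a same-strand embedded pair and the second a different-strand partial overlap, the two forms the abstract describes as most common in their respective classes.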
|
89
|
Analysis of a novel spore antigen in Bacillus anthracis that contributes to spore opsonization. Microbiology (Reading) 2008; 154:619-632. [DOI: 10.1099/mic.0.2007/008292-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]
|
90
|
Delaye L, Deluna A, Lazcano A, Becerra A. The origin of a novel gene through overprinting in Escherichia coli. BMC Evol Biol 2008; 8:31. [PMID: 18226237 PMCID: PMC2268670 DOI: 10.1186/1471-2148-8-31] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 03/22/2007] [Accepted: 01/28/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Overlapped genes originate by a) loss of a stop codon among contiguous genes coded in different frames; b) shift to an upstream initiation codon of one of the contiguous genes; or c) overprinting, whereby a novel open reading frame originates through point mutation inside an existing gene. Although overlapped genes are common in viruses, it is not clear whether overprinting has led to new genes in prokaryotes. RESULTS Here we report the origin of a new gene through overprinting in Escherichia coli K12. The htgA gene coding for a positive regulator of the sigma 32 heat shock promoter arose by point mutation in a 123/213 phase within an open reading frame (yaaW) of unknown function, most likely in the lineage leading to E. coli and Shigella sp. Further, we show that yaaW sequences coding for htgA genes have a slower evolutionary rate than those lacking an overlapped htgA gene. CONCLUSION While overprinting has been shown to be rather frequent in the evolution of new genes in viruses, our results suggest that this mechanism has also contributed to the origin of a novel gene in a prokaryote. We propose the term janolog (from Jano, the two-faced Roman god) to describe the homology relationship that holds between two genes when one originated through overprinting of the other. One cannot dismiss the possibility that at least a small fraction of the large number of novel ORFan genes detected in pan-genome and metagenomic studies arose by overprinting.
Affiliation(s)
- Luis Delaye
- Facultad de Ciencias, Universidad Nacional Autónoma de México, Apdo. Postal 70-407, Cd. Universitaria, 04510 México DF, México.
|
91
|
McCauley S, de Groot S, Mailund T, Hein J. Annotation of selection strengths in viral genomes. Bioinformatics 2007; 23:2978-86. [PMID: 17921171 DOI: 10.1093/bioinformatics/btm472] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION Viral genomes tend to code in overlapping reading frames to maximize informational content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of K(a)/K(s) ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley and Hein, we develop a method for annotating a viral genome coding in overlapping reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single sequence HMM methodology, extending it now to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. RESULTS We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as of three Hepatitis B sequences. We obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. Whilst we are encouraged to see in HIV2, that the known to be conserved genes gag and pol are indeed annotated as such, we also discover several sites of less stringent negative selection within the env gene. To the best of our knowledge, we are the first to subsequently provide a full selection annotation of the Hepatitis B genome by explicitly modelling the evolution within overlapping reading frames, and not relying on simple K(a)/K(s) ratios.
Affiliation(s)
- Stephen McCauley
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, UK
|
92
|
Lillo F, Krakauer DC. A statistical analysis of the three-fold evolution of genomic compression through frame overlaps in prokaryotes. Biol Direct 2007; 2:22. [PMID: 17877818 PMCID: PMC2174442 DOI: 10.1186/1745-6150-2-22] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 07/12/2007] [Accepted: 09/18/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Among microbial genomes, genetic information is frequently compressed, exploiting redundancies in the genetic code in order to store information in overlapping genes. We investigate the length, phase and orientation properties of overlap in 58 prokaryotic species, evaluating neutral and selective mechanisms of evolution. RESULTS Using a variety of statistical null models, we find patterns of compressive coding that cannot be explained purely in terms of the selective processes favoring genome minimization or translational coupling. The distribution of overlap lengths follows a fat-tailed distribution, in which a significant proportion of overlaps are in excess of 100 base pairs in length. The phase of overlap (the pairing of codon positions in complementary reading frames) is strongly predicted by the translation orientation of each gene. We find that as overlapping genes become longer, they have a tendency to alternate among alternative overlap phases. Some phases seem to reflect codon pairings reducing the probability of non-synonymous substitution. We analyze the lineage-dependent features of overlapping genes by tracing a number of different continuous characters through the prokaryotic phylogeny using squared-change parsimony and observe both clade-specific and species-specific patterns. CONCLUSION Overlapping reading frames preserve, in their structure, features relating to the mutational origination of new genes, but have undergone modification for both immediate benefits and for variational buffering and amplification. Genomes come under a variety of different mutational and selectional pressures, and the structure of redundancies in overlapping genes can be used to detect these pressures. No single mechanism is able to account for all the variability observed among the set of prokaryotic overlapping genes, but a three-fold analysis of evolutionary events provides a more integrative framework.
Affiliation(s)
- Fabrizio Lillo
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Dipartimento di Fisica e Tecnologie Relative, Università di Palermo, Viale delle Scienze, I-90128 Palermo, Italy
- David C Krakauer
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
|
93
|
Belshaw R, Pybus OG, Rambaut A. The evolution of genome compression and genomic novelty in RNA viruses. Genome Res 2007; 17:1496-504. [PMID: 17785537 PMCID: PMC1987338 DOI: 10.1101/gr.6305707] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.6] [Indexed: 11/25/2022]
Abstract
The genomes of RNA viruses are characterized by their extremely small size and extremely high mutation rates (typically 10 kb and 10^-4 per base per replication cycle, respectively), traits that are thought to be causally linked. One aspect of their small size is the genome compression caused by the use of overlapping genes (where some nucleotides code for two genes). Using a comparative analysis of all known RNA viral species, we show that viruses with larger genomes tend to have less gene overlap. We provide a numerical model to show how a high mutation rate could lead to gene overlap, and we discuss the factors that might explain the observed relationship between gene overlap and genome size. We also propose a model for the evolution of gene overlap based on the co-opting of previously unused ORFs, which gives rise to two types of overlap: (1) the creation of novel genes inside older genes, predominantly via +1 frameshifts, and (2) the incremental increase in overlap between originally contiguous genes, with no frameshift preference. Both types of overlap are viewed as the creation of genomic novelty under pressure for genome compression. Simulations based on our model generate the empirical size distributions of overlaps and explain the observed frameshift preferences. We suggest that RNA viruses are a good model system for investigating the general evolutionary relationships between genome attributes such as mutational robustness, mutation rate, and size.
Affiliation(s)
- Robert Belshaw
- Department of Zoology, University of Oxford, Oxford OX1 3PS, United Kingdom.
|
94
|
BPhyOG: an interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes. BMC Bioinformatics 2007; 8:266. [PMID: 17650344 PMCID: PMC1940028 DOI: 10.1186/1471-2105-8-266] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 10/23/2006] [Accepted: 07/25/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Overlapping genes (OGs) in bacterial genomes are pairs of adjacent genes of which the coding sequences overlap partly or entirely. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified. Indeed, these might prove a consistent feature across all microbial genomes. Our previous work suggests that OGs can be considered as robust markers at the whole genome level for the construction of phylogenies. An online, interactive web server for inferring phylogenies is needed for biologists to analyze phylogenetic relationships among a set of bacterial genomes of interest. DESCRIPTION BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ) and Unweighted Pair-Group Method using Arithmetic averages (UPGMA). Users can apply the desired method to generate phylogenetic trees, which are based on an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs that were used to infer the phylogenetic relationships. It provides detailed annotation for each OG pair and the features of the component genes through hyperlinks. Users can also retrieve each of the homologous OG pairs that have been determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint. CONCLUSION BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, the annotation and homologous OG pairs of which are integrated comprehensively. 
The reliability of the phylogenies, complemented by annotations, makes BPhyOG a powerful web server for genomic and genetic studies. It is freely available at http://cmb.bnu.edu.cn/BPhyOG.
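The abstract states that the distance between two genomes is defined by the normalized number of their shared OG pairs, but it does not give the normalization. The Python sketch below therefore assumes one plausible choice, 1 - shared / min(|A|, |B|), purely for illustration. The genome names and OG-pair identifiers are hypothetical, and the resulting matrix would be the input to a method such as NJ or UPGMA.

```python
from itertools import combinations


def og_distance(pairs_a: set, pairs_b: set) -> float:
    """Distance from shared OG pairs.

    Assumed normalization (not taken from the paper):
    1 - shared / size of the smaller OG-pair set.
    """
    smaller = min(len(pairs_a), len(pairs_b))
    if smaller == 0:
        return 1.0  # no OG pairs to share: treat as maximally distant
    return 1.0 - len(pairs_a & pairs_b) / smaller


def distance_matrix(genomes: dict) -> dict:
    """Pairwise distances keyed by (genome_a, genome_b), names in sorted order."""
    return {
        (a, b): og_distance(genomes[a], genomes[b])
        for a, b in combinations(sorted(genomes), 2)
    }


# Hypothetical genomes; each integer stands for one orthologous OG pair.
genomes = {"A": {1, 2, 3, 4}, "B": {1, 2, 5, 6}, "C": {7, 8}}
print(distance_matrix(genomes))
```

Genomes A and B share two of four OG pairs and come out closer to each other (0.5) than either is to C (1.0), so a distance-based method would join A and B first.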
|
95
|
Kingsford C, Delcher AL, Salzberg SL. A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes. Mol Biol Evol 2007; 24:2091-8. [PMID: 17642473 PMCID: PMC2429982 DOI: 10.1093/molbev/msm145] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation and repair of overlaps among adjacent genes where the 3' ends either overlap or nearly overlap. Our model, derived from a comprehensive analysis of complete prokaryotic genomes in GenBank, explains the nonuniform distribution of the lengths of such overlap regions far more simply than previously proposed models. Specifically, we explain the distribution of overlap lengths based on random extensions of genes to the next occurring downstream stop codon. Our model also provides an explanation for a newly observed (here) pattern in the distribution of the separation distances of closely spaced nonoverlapping genes. We provide evidence that the newly described biased distribution of separation distances is driven by the same phenomenon that creates the uneven distribution of overlap lengths. This suggests a dynamic picture of continual overlap creation and elimination.
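The core operation in the model, extending a gene to the next occurring downstream stop codon, can be illustrated with a short Python sketch. It shows only that single step, under the assumption of the standard stop codons TAA, TAG and TGA on the coding strand; the function name and the toy sequence are hypothetical, and the paper's full model is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons, assumed here


def next_in_frame_stop(seq: str, pos: int):
    """Index of the first stop codon at pos, pos+3, pos+6, ... or None.

    seq is the coding strand; pos is the 0-based start of the first codon
    to inspect (for example, the codon right after a lost stop codon).
    """
    for i in range(pos, len(seq) - 2, 3):
        if seq[i:i + 3].upper() in STOP_CODONS:
            return i
    return None


# Toy gene: ATG AAA TAC GGG CCC TGA AAA, where TAC stands in for a
# mutated stop codon (TAA -> TAC) at index 6.
seq = "ATGAAATACGGGCCCTGAAAA"
new_stop = next_in_frame_stop(seq, 9)
print(new_stop)  # start index of the new in-frame stop codon
```

If a neighbouring gene's 3' end lies before the end of the new stop codon, the extension has created an overlap. Because the extension length depends only on where the next in-frame stop happens to fall, repeated random extensions would give a nonuniform distribution of overlap lengths, as the abstract argues.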
Affiliation(s)
- Carl Kingsford
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, USA.
|
96
|
Abstract
Identification of microRNAs (miRNAs) is essential to studying their physiological functions. Due to the difficulties in discovering truly expressed miRNAs from genomic random hairpin secondary structure sequences, it is beneficial to predict them from expressed sequences--expressed sequence tags (ESTs) and intronic sequences. We used a modified scanning pipeline using criteria based on the features of known pre-miRNAs and phylogenetic conservation for predicting intronic miRNAs. Upon examination, we found that 25% of known human miRNAs belong to intronic regions of known protein-coding genes. About 50% of these intronic miRNAs reside in introns whose length is longer than 5,000 bps. It is likely that these intronic miRNAs can have their own independently regulated transcription units, which can be regulated by RNA polymerase II (Pol II) or RNA polymerase III (Pol III). It was recently demonstrated that RNA Pol III could transcribe human miRNAs through associated repetitive elements. Since various repetitive elements are often found to be present in the intronic regions, the distribution of intronic miRNAs and their possible transcription regulation are presented. Although the intronic miRNAs and their host genes could be regulated independently, it is possible that the intronic miRNA can still down-regulate its own host protein-coding gene by targeting the untranslated region (UTR) of the host gene. Another biological implication is that intronic miRNAs could play an important role as negative feedback regulators. We propose hypothetical models of such feedback regulation on host protein-coding genes by selecting the transcription factors as miRNA targets or by protein-protein interactions between intronic miRNA host gene product and miRNA target gene products.
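Deciding whether a miRNA is intronic, and how long its host intron is, reduces to a coordinate check. The Python sketch below shows one way to do this and is not the authors' scanning pipeline: the exon and miRNA coordinates are hypothetical, and strand and alternative transcripts are ignored.

```python
def introns(exons):
    """Derive intron intervals from exon intervals (1-based, inclusive)."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start - prev_end > 1
    ]


def host_intron(mirna, exons):
    """Return the intron that fully contains the miRNA precursor, else None."""
    start, end = mirna
    for intron_start, intron_end in introns(exons):
        if intron_start <= start and end <= intron_end:
            return (intron_start, intron_end)
    return None


exons = [(100, 200), (6000, 6100), (6150, 6300)]  # hypothetical host gene
hit = host_intron((3000, 3080), exons)
if hit is not None:
    length = hit[1] - hit[0] + 1
    print(hit, length, length > 5000)
```

Here the precursor falls in an intron of 5,799 bp, which would place it among the intronic miRNAs in introns longer than 5,000 bp discussed in the abstract.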
Affiliation(s)
- Sung-Chou Li
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, Republic of China
|
97
|
Jothi R, Przytycka TM, Aravind L. Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics 2007; 8:173. [PMID: 17521444 PMCID: PMC1904249 DOI: 10.1186/1471-2105-8-173] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 02/28/2007] [Accepted: 05/23/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND A widely used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution. RESULTS Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding a few eukaryotes to the reference set, but a noticeable drop-off in performance is observed with an increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and multiple strains of the same species in the reference set does not necessarily contribute to improved sensitivity or accuracy. Interestingly, we also found that the evolutionary histories of individual pathways have a significant effect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set composed solely of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation, for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false positives.
CONCLUSION Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally makes eukaryotes less useful in the reference sets. Reference sets composed of a highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets composed solely of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. The differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities, suggest that caution should be exercised when interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set.
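A phylogenetic profile comparison can be illustrated with presence/absence sets restricted to a chosen reference set. The abstract does not say which similarity measure is used, so the Python sketch below takes Jaccard similarity as an assumption made for illustration. The genome labels and protein profiles are hypothetical.

```python
def profile_similarity(present_p: set, present_q: set, reference: list) -> float:
    """Jaccard similarity of two phylogenetic profiles over a reference set.

    present_p / present_q: genomes in which each protein has a homolog.
    reference: the genomes that make up the reference set.
    Jaccard is an assumed measure, chosen for simplicity.
    """
    both = sum(1 for g in reference if g in present_p and g in present_q)
    either = sum(1 for g in reference if g in present_p or g in present_q)
    return both / either if either else 0.0


# Hypothetical presence calls for two proteins.
protein_p = {"E_coli", "B_subtilis", "S_cerevisiae"}
protein_q = {"E_coli", "B_subtilis", "H_sapiens"}

all_genomes = ["E_coli", "B_subtilis", "S_cerevisiae", "H_sapiens"]
prokaryotes_only = ["E_coli", "B_subtilis"]

print(profile_similarity(protein_p, protein_q, all_genomes))
print(profile_similarity(protein_p, protein_q, prokaryotes_only))
```

The same pair of proteins scores 0.5 against the mixed reference set and 1.0 against the prokaryote-only set, a toy version of the abstract's point that the composition of the reference set, and not only its size, affects the inferred linkages.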
Affiliation(s)
- Raja Jothi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
98
|
Kapranov P, Willingham AT, Gingeras TR. Genome-wide transcription and the implications for genomic organization. Nat Rev Genet 2007; 8:413-23. [PMID: 17486121 DOI: 10.1038/nrg2083] [Citation(s) in RCA: 529] [Impact Index Per Article: 31.1] [Indexed: 12/16/2022]
Abstract
Recent evidence of genome-wide transcription in several species indicates that the amount of transcription that occurs cannot be entirely accounted for by current sets of genome-wide annotations. Evidence indicates that most of both strands of the human genome might be transcribed, implying extensive overlap of transcriptional units and regulatory elements. These observations suggest that genomic architecture is not colinear, but is instead interleaved and modular, and that the same genomic sequences are multifunctional: that is, used for multiple independently regulated transcripts and as regulatory regions. What are the implications and consequences of such an interleaved genomic architecture in terms of increased information content, transcriptional complexity, evolution and disease states?
Affiliation(s)
- Philipp Kapranov
- Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA
|
99
|
Cock PJA, Whitworth DE. Evolution of gene overlaps: relative reading frame bias in prokaryotic two-component system genes. J Mol Evol 2007; 64:457-62. [PMID: 17479344 DOI: 10.1007/s00239-006-0180-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 08/03/2006] [Accepted: 01/22/2007] [Indexed: 10/23/2022]
Abstract
During a survey of two-component system genes, a list of neighboring histidine kinase and response regulator genes, encoded on the same strand, was compiled from over 200 fully sequenced bacteria. It was observed that many gene pairs overlapped, and although such overlaps can potentially occur in two phases (relative reading frames), one phase predominated for overlaps of seven or more nucleotides. Preference for a particular phase cannot be explained by arguments of sequence restraint (mutations in one gene differentially affect an overlapping gene, depending on phase). We have therefore investigated a potential explanation of the observed phase bias. For phase +1 gene overlaps, simulated point mutations in the overlapping region result in more severe changes to the downstream gene product than to the upstream gene product; vice versa in phase +2. Additionally, codon usage frequencies in nonoverlapping regions are more similar to those at the end of the upstream gene than the beginning of the downstream gene in overlaps. Taking both observations together, we propose that new gene overlaps generally arise by N-terminal extension of a downstream gene, creating a novel sequence at the start of the downstream gene. Sequence changes in this newly coding sequence will alter the sequences of both the new and the original coding sequence (the C-terminal region of the upstream gene). However, these changes will be less detrimental to the original coding sequence if the two genes overlap in phase +1, leading to selective retention during evolution of phase +1 overlaps relative to phase +2 overlaps.
Affiliation(s)
- Peter J A Cock
- MOAC Doctoral Training Centre, University of Warwick, Coventry, UK
|
100
|
de Groot S, Mailund T, Hein J. Comparative annotation of viral genomes with non-conserved gene structure. Bioinformatics 2007; 23:1080-9. [PMID: 17341494 DOI: 10.1093/bioinformatics/btm078] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded for simultaneously in one direction. Conventional hidden Markov model (HMM)-based gene-finding algorithms may typically find it difficult to identify multiple coding regions, since in general their topologies do not allow for the presence of overlapping or nested genes. Comparative methods have therefore been restricted to likelihood ratio tests on potential regions as to being double or single coding, using the fact that the constrictions forced upon multiple-coding nucleotides will result in atypical sequence evolution. Exploiting these same constraints, we present an HMM based gene-finding program, which allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. RESULTS We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of approximately 84-89% and specificity of approximately 97-99.9%. We additionally annotate three pairwise alignments of the more distantly related HIV1 and HIV2, as well as of two different hepatitis viruses, attaining results of approximately 87% sensitivity and approximately 98.5% specificity. We subsequently incorporate prior knowledge by 'knowing' the gene structure of one sequence and annotating the other conditional on it. 
Boosting accuracy close to perfect, we demonstrate that conservation of gene structure, on top of nucleotide sequence, is a valuable source of information, especially in distantly related genomes. AVAILABILITY The Java code is available from the authors.
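The sensitivity and specificity figures quoted in the results can be made concrete with a per-nucleotide comparison of predicted and reference annotations. The Python sketch below assumes the common definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The abstract does not state which definitions the authors used, and some gene-finding papers report TP / (TP + FP) as specificity instead. The label vectors are hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Per-position sensitivity and specificity for binary coding labels.

    truth, predicted: equal-length sequences of 0 (non-coding) / 1 (coding).
    Definitions assumed here: TP/(TP+FN) and TN/(TN+FP).
    """
    if len(truth) != len(predicted):
        raise ValueError("annotations must cover the same positions")
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


truth = [1, 1, 1, 0, 0, 0, 1, 1]      # hypothetical reference annotation
predicted = [1, 1, 0, 0, 0, 1, 1, 1]  # hypothetical gene-finder output
print(sensitivity_specificity(truth, predicted))
```

For overlapping reading frames, the same calculation could be repeated once per frame so that a position coding in two frames is scored in both.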
|