Reference Citation Analysis

For: Huynen MA, Snel B. Gene and context: integrative approaches to genome analysis. Adv Protein Chem 2000;54:345-79. [PMID: 10829232 DOI: 10.1016/s0065-3233(00)54010-8] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.0] [Indexed: 12/04/2022]

Cited by Other Article(s)

Gao Y, Ma B, Xu Q, Peng Y, Gong H, Guan A, Hua K, Langford PR, Jin H, Luo R. Spatial proximity and gene function: a new dimension in prokaryotic gene association network analysis with 3D-GeneNet. Brief Bioinform 2024;25:bbae320. [PMID: 38975892 PMCID: PMC11229033 DOI: 10.1093/bib/bbae320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 05/22/2024] [Accepted: 06/18/2024] [Indexed: 07/09/2024]

Abstract

Understanding the biological functions and processes of genes, particularly those not yet characterized, is crucial for advancing molecular biology and identifying therapeutic targets. The hypothesis guiding this study is that the 3D proximity of genes correlates with their functional interactions and relevance in prokaryotes. We introduced 3D-GeneNet, an innovative software tool that uses high-throughput sequencing data from chromosome conformation capture techniques and integrates topological metrics to construct gene association networks. Through a series of comparative analyses of spatial versus linear distances in the model organism Escherichia coli K-12, we examined topological structure, functional enrichment levels, the distribution of linear distances among gene pairs, and the area under the receiver operating characteristic curve. Furthermore, 3D-GeneNet was shown to maintain good accuracy compared with multiple algorithms (neighbourhood, co-occurrence, coexpression, and fusion) across multiple bacteria, including E. coli, Brucella abortus, and Vibrio cholerae. In addition, the accuracy of 3D-GeneNet's predictions of long-distance gene interactions was assessed by bacterial two-hybrid assays in E. coli K-12 MG1655: 3D-GeneNet not only tripled the accuracy relative to linear genomic distance but also achieved 60% accuracy when run alone. Finally, we conclude that the applicability of 3D-GeneNet will extend to various bacterial forms, including Gram-negative, Gram-positive, single-, and multi-chromosomal bacteria, through Hi-C sequencing and analysis. These findings highlight the broad applicability and significant promise of this method for gene association network analysis. 3D-GeneNet is freely accessible at https://github.com/gaoyuanccc/3D-GeneNet.
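The abstract compares spatial (Hi-C) proximity and linear genomic distance as predictors of gene association using the area under the ROC curve. The following is a minimal, hypothetical Python sketch of that kind of comparison, not 3D-GeneNet's implementation: the gene pairs, contact counts, distances and labels are invented, and AUC is computed with the Mann-Whitney rank formulation (the probability that a random associated pair outscores a random non-associated one).

```python
# Hypothetical sketch: all gene pairs and values are invented for illustration.


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney formulation.

    scores: higher means "more likely functionally associated".
    labels: 1 for associated pairs, 0 for non-associated pairs.
    Ties between a positive and a negative count as half an ordering.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical gene pairs: (Hi-C contact count, linear distance in bp, associated?)
pairs = [
    (120, 800_000, 1),   # far apart on the chromosome but often in contact
    (95, 1_500, 1),
    (80, 2_000, 1),
    (10, 3_000, 0),      # linearly close but rarely in contact
    (5, 900_000, 0),
    (8, 1_200_000, 0),
]

contact_scores = [c for c, _, _ in pairs]      # more contacts -> higher score
linear_scores = [-d for _, d, _ in pairs]      # shorter distance -> higher score
labels = [y for _, _, y in pairs]

print(f"AUC, spatial contacts: {roc_auc(contact_scores, labels):.2f}")  # 1.00
print(f"AUC, linear distance:  {roc_auc(linear_scores, labels):.2f}")   # 0.89
```

On these invented values the contact-based score separates associated from non-associated pairs perfectly, while linear distance reaches 8/9 because the associated pair 800 kb apart ranks below the non-associated pair only 3 kb apart.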

Affiliation(s)

Yuan Gao, Bin Ma, Qianshuai Xu, Yuna Peng, Huimin Gong, Aohan Guan, Hui Jin, Rui Luo: State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China; College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China; Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
Kexin Hua: Swine Genome and Breeding Team, Yazhouwan National Laboratory, No. 8 Huanjin Road, Yazhou District, Sanya City, Hainan Province 572024, China
Paul R Langford: Section of Paediatric Infectious Disease, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, United Kingdom

Gumerov VM, Zhulin IB. TREND: a platform for exploring protein function in prokaryotes based on phylogenetic, domain architecture and gene neighborhood analyses. Nucleic Acids Res 2020;48:W72-W76. [PMID: 32282909 PMCID: PMC7319448 DOI: 10.1093/nar/gkaa243] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/12/2020] [Revised: 03/16/2020] [Accepted: 04/01/2020] [Indexed: 01/16/2023]

Bhatt V, Mohapatra A, Anand S, Kuntal BK, Mande SS. FLIM-MAP: Gene Context Based Identification of Functional Modules in Bacterial Metabolic Pathways. Front Microbiol 2018;9:2183. [PMID: 30283416 PMCID: PMC6157337 DOI: 10.3389/fmicb.2018.02183] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/08/2018] [Accepted: 08/24/2018] [Indexed: 01/18/2023]

Crawley AB, Barrangou R. Conserved Genome Organization and Core Transcriptome of the Lactobacillus acidophilus Complex. Front Microbiol 2018;9:1834. [PMID: 30150974 PMCID: PMC6099100 DOI: 10.3389/fmicb.2018.01834] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/19/2018] [Accepted: 07/23/2018] [Indexed: 01/08/2023]

Gabaldón T. Evolution of Proteins and Proteomes: A Phylogenetics Approach. Evol Bioinform Online 2017. [DOI: 10.1177/117693430500100004] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]

Bouyioukos C, Elati M, Képès F. Analysis tools for the interplay between genome layout and regulation. BMC Bioinformatics 2016;17 Suppl 5:191. [PMID: 27294345 PMCID: PMC4905612 DOI: 10.1186/s12859-016-1047-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]

Abstract

BACKGROUND

Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. First, the analysis of contiguous genome segments across species has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Second, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets, which tend to be incomplete and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes.

RESULTS

Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation.

GREAT:SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding site (TFBS) and gene regulatory network predictions based on gene positional information.

CONCLUSIONS

We demonstrate the capabilities of these tools, on the one hand, by studying the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli and, on the other hand, by improving TFBS prediction in microbes. Finally, through visualisation with multivariate techniques, we highlight the interplay between position and sequence information for effective transcription regulation.

Bouyioukos C, Bucchini F, Elati M, Képès F. GREAT: a web portal for Genome Regulatory Architecture Tools. Nucleic Acids Res 2016;44:W77-82. [PMID: 27151196 PMCID: PMC4987929 DOI: 10.1093/nar/gkw384] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/23/2016] [Accepted: 04/26/2016] [Indexed: 11/15/2022]

Jahn K, Winter S, Stoye J, Böcker S. Statistics for approximate gene clusters. BMC Bioinformatics 2013;14 Suppl 15:S14. [PMID: 24564620 PMCID: PMC3908651 DOI: 10.1186/1471-2105-14-s15-s14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/02/2022]

Galperin MY, Koonin EV. Comparative Genomics Approaches to Identifying Functionally Related Genes. Algorithms for Computational Biology 2014. [DOI: 10.1007/978-3-319-07953-0_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]

Vey G. Metagenomic guilt by association: an operonic perspective. PLoS One 2013;8:e71484. [PMID: 23940763 PMCID: PMC3735515 DOI: 10.1371/journal.pone.0071484] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/27/2013] [Accepted: 06/28/2013] [Indexed: 11/22/2022]

Abstract

Next-generation sequencing projects continue to drive a vast accumulation of metagenomic sequence data. Given the growth rate of these data, automated approaches to functional annotation are indispensable, and a cornerstone heuristic of many computational protocols is the concept of guilt by association. The guilt-by-association paradigm has been heavily exploited by genomic context methods, which offer functional predictions complementary to homology-based annotations and thereby extend functional annotation. In particular, operon methods that exploit co-directional intergenic distances can provide homology-free functional annotation through the transfer of functions among co-operonic genes, under the assumption that guilt by association is indeed applicable. Although guilt by association is a well-accepted annotative device, its applicability to metagenomic functional annotation has not been definitively demonstrated. Here, a large-scale assessment of metagenomic guilt by association is undertaken in which functional associations are predicted on the basis of co-directional intergenic distances. Specifically, functional annotations are compared within pairs of adjacent co-directional genes, as well as within operons of various lengths (i.e. number of member genes), to reveal new information about annotative cohesion versus operon length. The results suggest that co-directional gene pairs offer reduced confidence for metagenomic guilt by association, because it is difficult to resolve whether functional associations exist when intergenic distance is the sole predictor of pairwise gene interactions. However, metagenomic operons, particularly long ones, appear capable of providing a superior basis for metagenomic guilt by association owing to increased annotative stability. The need for improved recognition of metagenomic operons is discussed, as well as the limitations of the present work.
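The abstract treats adjacent co-directional genes separated by short intergenic distances as co-operonic and asks how consistently their annotations agree. A minimal Python sketch of that idea follows. It is not the study's pipeline: the 50 bp cutoff, the `Gene` record, the helper names `predict_operons` and `annotation_cohesion`, and the toy genes are all hypothetical, and cohesion is scored simply as the fraction of within-operon gene pairs whose annotation strings are identical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Gene:
    name: str
    start: int        # 1-based start coordinate
    end: int          # 1-based end coordinate (inclusive)
    strand: str       # "+" or "-"
    annotation: str   # hypothetical functional label


def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap bp.

    max_gap=50 is an assumed cutoff chosen for illustration only.
    """
    genes = sorted(genes, key=lambda g: g.start)
    operons = [[genes[0]]] if genes else []
    for prev, cur in zip(genes, genes[1:]):
        gap = cur.start - prev.end - 1   # negative when genes overlap
        if cur.strand == prev.strand and gap <= max_gap:
            operons[-1].append(cur)      # extend the current operon
        else:
            operons.append([cur])        # strand switch or long gap: new operon
    return operons


def annotation_cohesion(operon):
    """Fraction of gene pairs within one operon that share an annotation."""
    pairs = list(combinations(operon, 2))
    if not pairs:
        return None   # single-gene operons carry no pairwise information
    shared = sum(a.annotation == b.annotation for a, b in pairs)
    return shared / len(pairs)


# Hypothetical toy genes, in chromosomal order.
genes = [
    Gene("a", 1, 900, "+", "transport"),
    Gene("b", 921, 1800, "+", "transport"),     # gap 20 bp
    Gene("c", 1811, 2700, "+", "metabolism"),   # gap 10 bp
    Gene("d", 3000, 3900, "+", "metabolism"),   # gap 299 bp
    Gene("e", 3921, 4800, "-", "regulation"),   # opposite strand
]

ops = predict_operons(genes)
print([[g.name for g in op] for op in ops])   # [['a', 'b', 'c'], ['d'], ['e']]
print(annotation_cohesion(ops[0]))             # 0.3333333333333333
```

Here `a`, `b` and `c` form one predicted operon, `d` starts a new one because of its 299 bp gap, and `e` is split off by its opposite strand; the first operon's cohesion is 1/3 because only the a-b pair shares an annotation.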

Cohen O, Ashkenazy H, Levy Karin E, Burstein D, Pupko T. CoPAP: Coevolution of presence-absence patterns. Nucleic Acids Res 2013;41:W232-7. [PMID: 23748951 PMCID: PMC3692100 DOI: 10.1093/nar/gkt471] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]

Cohen O, Ashkenazy H, Burstein D, Pupko T. Uncovering the co-evolutionary network among prokaryotic genes. Bioinformatics 2012;28:i389-i394. [PMID: 22962457 PMCID: PMC3436823 DOI: 10.1093/bioinformatics/bts396] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 01/09/2023]

Jahn K. Efficient computation of approximate gene clusters based on reference occurrences. J Comput Biol 2011;18:1255-74. [PMID: 21899430 DOI: 10.1089/cmb.2011.0132] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]

Yelton AP, Thomas BC, Simmons SL, Wilmes P, Zemla A, Thelen MP, Justice N, Banfield JF. A semi-quantitative, synteny-based method to improve functional predictions for hypothetical and poorly annotated bacterial and archaeal genes. PLoS Comput Biol 2011;7:e1002230. [PMID: 22028637 PMCID: PMC3197636 DOI: 10.1371/journal.pcbi.1002230] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 03/29/2011] [Accepted: 08/30/2011] [Indexed: 11/19/2022]

Zhang Y, Gladyshev VN. Comparative Genomics of Trace Elements: Emerging Dynamic View of Trace Element Utilization and Function. Chem Rev 2009;109:4828-61. [DOI: 10.1021/cr800557s] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Indexed: 12/26/2022]

Koonin EV, Wolf YI. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res 2008;36:6688-719. [PMID: 18948295 PMCID: PMC2588523 DOI: 10.1093/nar/gkn668] [Citation(s) in RCA: 534] [Impact Index Per Article: 33.4] [Indexed: 01/04/2023]

Gonzalez O, Zimmer R. Assigning functional linkages to proteins using phylogenetic profiles and continuous phenotypes. Bioinformatics 2008;24:1257-63. [PMID: 18381403 DOI: 10.1093/bioinformatics/btn106] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]

Gabaldón T. Computational approaches for the prediction of protein function in the mitochondrion. Am J Physiol Cell Physiol 2006;291:C1121-8. [PMID: 16870830 DOI: 10.1152/ajpcell.00225.2006] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]

Ettema TJG, de Vos WM, van der Oost J. Discovering novel biology by in silico archaeology. Nat Rev Microbiol 2005;3:859-69. [PMID: 16175172 DOI: 10.1038/nrmicro1268] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]

Huynen MA, Gabaldón T, Snel B. Variation and evolution of biomolecular systems: Searching for functional relevance. FEBS Lett 2005;579:1839-45. [PMID: 15763561 DOI: 10.1016/j.febslet.2005.02.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 01/18/2005] [Revised: 01/18/2005] [Accepted: 02/01/2005] [Indexed: 11/29/2022]

Zientz E, Dandekar T, Gross R. Metabolic interdependence of obligate intracellular bacteria and their insect hosts. Microbiol Mol Biol Rev 2004;68:745-70. [PMID: 15590782 PMCID: PMC539007 DOI: 10.1128/mmbr.68.4.745-770.2004] [Citation(s) in RCA: 231] [Impact Index Per Article: 12.2] [Indexed: 01/17/2023]

Korbel JO, Jensen LJ, von Mering C, Bork P. Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat Biotechnol 2004;22:911-7. [PMID: 15229555 DOI: 10.1038/nbt988] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.2] [Indexed: 11/08/2022]

Tasneem A, Iyer LM, Jakobsson E, Aravind L. Identification of the prokaryotic ligand-gated ion channels and their implications for the mechanisms and origins of animal Cys-loop ion channels. Genome Biol 2004;6:R4. [PMID: 15642096 PMCID: PMC549065 DOI: 10.1186/gb-2004-6-1-r4] [Citation(s) in RCA: 191] [Impact Index Per Article: 9.6] [Received: 08/12/2004] [Revised: 10/26/2004] [Accepted: 11/24/2004] [Indexed: 11/24/2022]

Abstract

BACKGROUND

Acetylcholine receptor type ligand-gated ion channels (ART-LGIC; also known as Cys-loop receptors) are a superfamily of proteins that include the receptors for major neurotransmitters such as acetylcholine, serotonin, glycine, GABA, glutamate and histamine, and for Zn²⁺ ions. They play a central role in fast synaptic signaling in animal nervous systems and so far have not been found outside of the Metazoa.

RESULTS

Using sensitive sequence-profile searches we have identified homologs of ART-LGICs in several bacteria and a single archaeal genus, Methanosarcina. The homology between the animal receptors and the prokaryotic homologs spans the entire length of the former, including both the ligand-binding and channel-forming transmembrane domains. A sequence-structure analysis using the structure of Lymnaea stagnalis acetylcholine-binding protein and the newly detected prokaryotic versions indicates the presence of at least one aromatic residue in the ligand-binding boxes of almost all representatives of the superfamily. Investigation of the domain architectures of the bacterial forms shows that they are often fused with other small-molecule-binding domains, such as the periplasmic binding protein superfamily I (PBP-I), Cache and MCP-N domains. Some of the bacterial forms also occur in predicted operons with the genes of the PBP-II superfamily and the Cache domains. Analysis of phyletic patterns suggests that the ART-LGICs are currently absent in all other eukaryotic lineages except animals. Moreover, phylogenetic analysis and conserved sequence motifs also suggest that a subset of the bacterial forms is closer to the metazoan forms.

CONCLUSIONS

From the bacterial forms, we infer that cation-pi or hydrophobic interactions with the ligand are likely to be a pervasive feature of the entire superfamily, even though the individual residues involved in the process may vary. The conservation pattern in the channel-forming transmembrane domains also suggests similar channel-gating mechanisms in the prokaryotic versions. From the distribution of charged residues in the prokaryotic M2 transmembrane segments, we expect that there will be examples of both cation and anion selectivity within the prokaryotic members. Contextual connections suggest that the prokaryotic forms may function as chemotactic receptors for low molecular weight solutes. The phyletic patterns and phylogenetic relationships suggest the possibility that the metazoan receptors emerged through an early lateral transfer from a prokaryotic source, before the divergence of extant metazoan lineages.

Iyer LM, Leipe DD, Koonin EV, Aravind L. Evolutionary history and higher order classification of AAA+ ATPases. J Struct Biol 2004;146:11-31. [PMID: 15037234 DOI: 10.1016/j.jsb.2003.10.010] [Citation(s) in RCA: 608] [Impact Index Per Article: 30.4] [Received: 09/02/2003] [Revised: 10/08/2003] [Indexed: 12/29/2022]

Abstract

The AAA+ ATPases are enzymes containing a P-loop NTPase domain, and function as molecular chaperones, ATPase subunits of proteases, helicases or nucleic-acid-stimulated ATPases. All available sequences and structures of AAA+ protein domains were compared with the aim of identifying the definitive sequence and structure features of these domains and inferring the principal events in their evolution. An evolutionary classification of the AAA+ class was developed using standard phylogenetic methods, analysis of shared sequence and structural signatures, and similarity-based clustering. This analysis resulted in the identification of 26 major families within the AAA+ ATPase class. We also describe the position of the AAA+ ATPases with respect to the RecA/F1, helicase superfamilies I/II, PilT, and ABC classes of P-loop NTPases. The AAA+ class appears to have undergone an early radiation into the clamp-loader, DnaA/Orc/Cdc6, classic AAA, and "pre-sensor 1 beta-hairpin" (PS1BH) clades. Within the PS1BH clade, chelatases, MoxR, YifB, McrB, Dynein-midasin, NtrC, and MCMs form a monophyletic assembly defined by a distinct insert in helix-2 of the conserved ATPase core, and an additional helical segment between the core ATPase domain and the C-terminal alpha-helical bundle. At least 6 distinct AAA+ proteins, which represent the different major clades, are traceable to the last universal common ancestor (LUCA) of extant cellular life. Additionally, superfamily III helicases, which belong to the PS1BH assemblage, were probably present at this stage in virus-like "selfish" replicons. The next major radiation, at the base of the two prokaryotic kingdoms, bacteria and archaea, gave rise to several distinct chaperones, ATPase subunits of proteases, DNA helicases, and transcription factors. The third major radiation, at the outset of eukaryotic evolution, contributed to the origin of several eukaryote-specific adaptations related to nuclear and cytoskeletal functions. The new relationships and previously undetected domains reported here might provide new leads for investigating the biology of AAA+ ATPases.

Iyer LM, Makarova KS, Koonin EV, Aravind L. Comparative genomics of the FtsK-HerA superfamily of pumping ATPases: implications for the origins of chromosome segregation, cell division and viral capsid packaging. Nucleic Acids Res 2004;32:5260-79. [PMID: 15466593 PMCID: PMC521647 DOI: 10.1093/nar/gkh828] [Citation(s) in RCA: 246] [Impact Index Per Article: 12.3] [Indexed: 11/12/2022]

Abstract

Recently, it has been shown that a predicted P-loop ATPase (the HerA or MlaA protein), which is highly conserved in archaea and also present in many bacteria but absent in eukaryotes, has a bidirectional helicase activity and forms hexameric rings similar to those described for the TrwB ATPase. In this study, the FtsK-HerA superfamily of P-loop ATPases, in which the HerA clade comprises one of the major branches, is analyzed in detail. We show that, in addition to the FtsK and HerA clades, this superfamily includes several families of characterized or predicted ATPases which are predominantly involved in extrusion of DNA and peptides through membrane pores. The DNA-packaging ATPases of various bacteriophages and eukaryotic double-stranded DNA viruses also belong to the FtsK-HerA superfamily. The FtsK protein is the essential bacterial ATPase that is responsible for the correct segregation of daughter chromosomes during cell division. The structural and evolutionary relationship between HerA and FtsK and the nearly perfect complementarity of their phyletic distributions suggest that HerA similarly mediates DNA pumping into the progeny cells during archaeal cell division. It appears likely that the HerA and FtsK families diverged concomitantly with the archaeal-bacterial division and that the last universal common ancestor of modern life forms had an ancestral DNA-pumping ATPase that gave rise to these families. Furthermore, the relationship of these cellular proteins with the packaging ATPases of diverse DNA viruses suggests that a common DNA pumping mechanism might be operational in both cellular and viral genome segregation. The herA gene forms a highly conserved operon with the gene for the NurA nuclease and, in many archaea, also with the orthologs of eukaryotic double-strand break repair proteins MRE11 and Rad50. HerA is predicted to function in a complex with these proteins in DNA pumping and repair of double-stranded breaks introduced during this process and, possibly, also during DNA replication. Extensive comparative analysis of the 'genomic context' combined with in-depth sequence analysis led to the prediction of numerous previously unnoticed nucleases of the NurA superfamily, including a specific version that is likely to be the endonuclease component of a novel restriction-modification system. This analysis also led to the identification of previously uncharacterized nucleases, such as a novel predicted nuclease of the Sir2-type Rossmann fold, and phosphatases of the HAD superfamily that are likely to function as partners of the FtsK-HerA superfamily ATPases.

Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Rogozin IB, Smirnov S, Sorokin AV, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol 2004;5:R7. [PMID: 14759257 PMCID: PMC395751 DOI: 10.1186/gb-2004-5-2-r7] [Citation(s) in RCA: 676] [Impact Index Per Article: 33.8] [Received: 10/23/2003] [Revised: 12/01/2003] [Accepted: 12/04/2003] [Indexed: 11/10/2022]

Abstract

We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs from seven eukaryotic genomes. The analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes.

Background

Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes.

Results

We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups, or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For the approximately 20 of these that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which point to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and is made up largely of proteins involved in genome replication and expression and in central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.
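The conservation analysis above rests on phyletic patterns, i.e. which of the seven genomes contribute members to each KOG. The minimal Python sketch below, assuming a hypothetical mapping from KOG IDs to the set of species they include (the KOG IDs and three-letter species codes are made up for illustration and are not taken from the KOG database), tallies how many KOGs span a given number of species and what fraction are represented in six or seven.

```python
from collections import Counter

# The seven analyzed genomes; these short codes are illustrative labels only.
SPECIES = {"Cel", "Dme", "Hsa", "Ath", "Sce", "Spo", "Ecu"}

def phyletic_breadth(kogs: dict[str, set[str]]) -> Counter:
    """Count KOGs by the number of species represented in each."""
    return Counter(len(members & SPECIES) for members in kogs.values())

def fraction_broadly_conserved(kogs: dict[str, set[str]], min_species: int = 6) -> float:
    """Fraction of KOGs represented in at least `min_species` of the genomes."""
    breadth = phyletic_breadth(kogs)
    broad = sum(count for n_species, count in breadth.items() if n_species >= min_species)
    return broad / len(kogs)

# Hypothetical toy data: four KOGs with different phyletic patterns.
toy = {
    "KOG_A": set(SPECIES),              # present in all seven genomes
    "KOG_B": SPECIES - {"Ecu"},         # six (missing from the microsporidian)
    "KOG_C": {"Cel", "Dme", "Hsa"},     # animal-specific
    "KOG_D": {"Sce", "Spo"},            # fungus-specific
}
print(sorted(phyletic_breadth(toy).items()))  # [(2, 1), (3, 1), (6, 1), (7, 1)]
print(fraction_broadly_conserved(toy))        # 0.5
```

On real data the same tally would be applied to the full KOG membership table; the threshold of six species simply mirrors the "six or seven species" grouping in the abstract.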

Conclusions

The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.

Lawrence JG. Gene Organization: Selection, Selfishness, and Serendipity. Annu Rev Microbiol 2003;57:419-40. [PMID: 14527286 DOI: 10.1146/annurev.micro.57.030502.090816] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.2] [Indexed: 11/09/2022]

Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003;4:41. [PMID: 12969510 PMCID: PMC222959 DOI: 10.1186/1471-2105-4-41] [Citation(s) in RCA: 3221] [Impact Index Per Article: 153.4] [Received: 05/20/2003] [Accepted: 09/11/2003] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.

RESULTS

We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.
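The coverage percentages quoted above follow directly from the protein counts in the abstract. A quick Python check, using only those published numbers (the helper name `coverage` is ours, for illustration):

```python
def coverage(clustered: int, total: int) -> float:
    """Percentage of proteins that fall into orthologous clusters."""
    return 100.0 * clustered / total

cog_cov = coverage(138_458, 185_505)  # COGs: proteins from 66 unicellular genomes
kog_cov = coverage(59_838, 110_655)   # KOGs: proteins from 7 eukaryotic genomes
print(f"COG coverage: {cog_cov:.1f}%")  # COG coverage: 74.6%
print(f"KOG coverage: {kog_cov:.1f}%")  # KOG coverage: 54.1%
```

The roughly 20-point gap between the two figures is the "considerably smaller fraction" of eukaryotic genes referred to in the abstract.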

CONCLUSION

The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.

Affiliation(s)

Roman L Tatusov National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Natalie D Fedorova National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
John D Jackson National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Aviva R Jacobs National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Boris Kiryutin National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Eugene V Koonin National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Dmitri M Krylov National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Raja Mazumder Protein Information Resource, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007, USA
Sergei L Mekhedov National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Anastasia N Nikolskaya Protein Information Resource, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007, USA
B Sridhar Rao National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Sergei Smirnov National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Alexander V Sverdlov National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Sona Vasudevan National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Yuri I Wolf National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Jodie J Yin National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA
Darren A Natale Protein Information Resource, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007, USA

Schmidt S, Sunyaev S, Bork P, Dandekar T. Metabolites: a helping hand for pathway evolution? Trends Biochem Sci 2003;28:336-41. [PMID: 12826406 DOI: 10.1016/s0968-0004(03)00114-2] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.1] [Indexed: 11/20/2022]

Makarova KS, Wolf YI, Koonin EV. Potential genomic determinants of hyperthermophily. Trends Genet 2003;19:172-6. [PMID: 12683966 DOI: 10.1016/s0168-9525(03)00047-7] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]

Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely LA, Koonin EV. Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res 2002;30:2212-23. [PMID: 12000841 PMCID: PMC115289 DOI: 10.1093/nar/30.10.2212] [Citation(s) in RCA: 130] [Impact Index Per Article: 5.9] [Indexed: 11/13/2022]

Abstract

A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods that include additional genes adjacent to the genes from the clusters and transcribed in the same direction; this brought the total number of COGs included in the neighborhoods to 2387. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon 'genomic hitchhiking'. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genomic hitchhiking is particularly typical of this neighborhood and of other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.
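The idea of a "connected" neighborhood, one that never occurs in full in any single genome but is chained together by overlapping partial arrays, can be illustrated by merging gene arrays that share at least one COG. The sketch below uses a plain union-find over hypothetical COG arrays. It is not the authors' procedure, which also compares gene orders across genomes and applies its own criteria for which arrays count as conserved; the function name and data are illustrative assumptions.

```python
def connected_neighborhoods(arrays: list[list[str]]) -> list[set[str]]:
    """Merge gene arrays (lists of COG IDs) that share any COG into connected groups."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for array in arrays:
        for cog in array:
            parent.setdefault(cog, cog)
        # Every gene in one array belongs to the same neighborhood as the first gene.
        for cog in array[1:]:
            union(array[0], cog)

    groups: dict[str, set[str]] = {}
    for cog in parent:
        groups.setdefault(find(cog), set()).add(cog)
    return list(groups.values())

# Hypothetical arrays from different genomes: none contains all of COG1-COG5,
# but pairwise overlaps chain them into a single neighborhood.
arrays = [["COG1", "COG2", "COG3"], ["COG3", "COG4"], ["COG4", "COG5"], ["COG8", "COG9"]]
for hood in connected_neighborhoods(arrays):
    print(sorted(hood))
```

Union-find is a natural fit here because neighborhoods only ever grow by merging, and each merge costs close to constant time.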

Rison SCG, Teichmann SA, Thornton JM. Homology, pathway distance and chromosomal localization of the small molecule metabolism enzymes in Escherichia coli. J Mol Biol 2002;318:911-32. [PMID: 12054833 DOI: 10.1016/s0022-2836(02)00140-7] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]

Abstract

Here, we analyse Escherichia coli enzymes involved in small molecule metabolism (SMM). We introduce the concept of pathway distance as a measure of the number of distinct metabolic steps separating two SMM enzymes, and we consider protein homology (as determined by assigning enzymes to structural and sequence families) and gene interval (the number of genes separating two genes on the E. coli chromosome). The relationships between these three contexts (pathway distance, homology and chromosomal localisation) are investigated extensively. We make use of these relationships to suggest possible SMM evolution mechanisms. Homology between enzyme pairs close in the SMM network was higher than expected by chance but was still rare. When observed, homologues usually conserved their reaction mechanism and/or co-factor binding rather than their substrate binding. The correlation between pathway distance and gene interval was clear: enzymes catalysing nearby SMM reactions were usually encoded by genes close together on the E. coli chromosome. We found many co-regulated blocks of three to four genes (usually non-homologous) encoding enzymes occurring within four metabolic steps of one another; nearly all of these blocks formed part of known or predicted operons. The "inline reuse" of enzymes (i.e. the use of the same enzyme to catalyse two or more different steps of a metabolic pathway) is also discussed: of these enzymes, four were multifunctional (i.e. catalysed a different reaction in each instance), nine had multiple substrate specificity (i.e. catalysed the same reaction on different substrates in each instance) and one catalysed the same reaction on the same substrate but as part of two different complexes. We also identified 59 sets of isozymic proteins, most commonly duplicated to function under different conditions or with a different preferred or minor substrate. In addition to transcriptional units, isozymes and inline reuse of enzymes provide mechanisms for controlling the SMM network. Our data suggest that several pathway evolution mechanisms may occur in concert, although chemistry-driven duplication/recruitment is favoured. SMM exploits regulatory strategies involving chromosomal location, isozymes and the reuse of enzymes.
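The two distance measures defined above can be made concrete. In the minimal sketch below, pathway distance is taken as the fewest reaction steps between two enzymes in an undirected graph where enzymes are linked when one's product is the other's substrate, and gene interval as the number of genes lying between two loci on a circular chromosome (taking the shorter arc). The graph, gene order and function names are hypothetical toy data, and the paper's exact definitions (for example, how shared cofactors are handled) may differ.

```python
from collections import deque
from typing import Optional

def pathway_distance(graph: dict[str, list[str]], a: str, b: str) -> Optional[int]:
    """Fewest metabolic steps between enzymes a and b, found by breadth-first search.

    `graph` is assumed to be an undirected adjacency list: two enzymes are
    neighbours when one's product is the other's substrate (a simplification).
    """
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two enzymes are not connected

def gene_interval(order: list[str], g1: str, g2: str) -> int:
    """Number of genes separating g1 and g2 on a circular chromosome (shorter arc)."""
    i, j = order.index(g1), order.index(g2)
    steps = abs(i - j)
    steps = min(steps, len(order) - steps)  # go around the origin if that is shorter
    return steps - 1

# Hypothetical toy data, not taken from E. coli.
enzymes = {"E1": ["E2"], "E2": ["E1", "E3"], "E3": ["E2", "E4"], "E4": ["E3"]}
genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
print(pathway_distance(enzymes, "E1", "E4"))   # 3
print(gene_interval(genes, "geneA", "geneC"))  # 1
print(gene_interval(genes, "geneA", "geneF"))  # 0 (adjacent across the origin)
```

With both measures computed for every enzyme pair, the correlation the abstract describes amounts to asking whether small pathway distances co-occur with small gene intervals more often than expected by chance.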

Yanai I, Mellor JC, DeLisi C. Identifying functional links between genes using conserved chromosomal proximity. Trends Genet 2002;18:176-9. [PMID: 11932011 DOI: 10.1016/s0168-9525(01)02621-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.4] [Indexed: 10/18/2022]

Korbel JO, Snel B, Huynen MA, Bork P. SHOT: a web server for the construction of genome phylogenies. Trends Genet 2002;18:158-62. [PMID: 11858840 DOI: 10.1016/s0168-9525(01)02597-5] [Citation(s) in RCA: 138] [Impact Index Per Article: 6.3] [Indexed: 11/26/2022]

Yanai I, Wolf YI, Koonin EV. Evolution of gene fusions: horizontal transfer versus independent events. Genome Biol 2002;3:research0024. [PMID: 12049665 PMCID: PMC115226 DOI: 10.1186/gb-2002-3-5-research0024] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Received: 11/12/2001] [Revised: 02/07/2002] [Accepted: 03/26/2002] [Indexed: 12/21/2022]

Snel B, Bork P, Huynen MA. Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res 2002;12:17-25. [PMID: 11779827 DOI: 10.1101/gr.176501] [Citation(s) in RCA: 272] [Impact Index Per Article: 12.4] [Indexed: 11/25/2022]

Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol 2001;1:8. [PMID: 11734060 PMCID: PMC60490 DOI: 10.1186/1471-2148-1-8] [Citation(s) in RCA: 234] [Impact Index Per Article: 10.2] [Received: 09/20/2001] [Accepted: 10/23/2001] [Indexed: 12/04/2022]

Abstract

BACKGROUND

The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes.

RESULTS

Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in the relative contributions to tree topology of phylogenetic relationships versus similarities in gene repertoires caused by similar lifestyles and horizontal gene transfer. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs, and particularly the tree made of concatenated ribosomal protein sequences, seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria. The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins. Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota.
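The first approach above, trees from presence-absence of genomes in orthologous clusters, starts from a pairwise distance between gene repertoires. The sketch below uses the Jaccard distance (one minus shared COGs over COGs present in either genome) purely as an illustrative choice; the paper may define its gene-content distance differently, and the genome names and COG sets are hypothetical. A tree-building step such as neighbor joining would then be run on the resulting matrix, and, as the abstract notes, such distances are sensitive to gene loss and horizontal transfer.

```python
from itertools import combinations

def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |COGs shared| / |COGs present in either genome| (assumed metric)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def content_distances(genomes: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Pairwise gene-content distances computed from COG presence-absence."""
    return {
        (x, y): jaccard_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }

# Hypothetical gene repertoires (sets of COG IDs), not real genomes.
genomes = {
    "bacterium1": {"COG1", "COG2", "COG3", "COG4"},
    "bacterium2": {"COG1", "COG2", "COG3", "COG5"},
    "archaeon1": {"COG1", "COG6", "COG7", "COG8"},
}
for pair, d in content_distances(genomes).items():
    print(pair, round(d, 2))
```

In this toy example the two bacteria share three of five COGs (distance 0.4), while each shares only one of seven with the archaeon, so the bacteria group together as expected.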

CONCLUSIONS

We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.

Teichmann SA, Rison SC, Thornton JM, Riley M, Gough J, Chothia C. The evolution and structural anatomy of the small molecule metabolic pathways in Escherichia coli. J Mol Biol 2001;311:693-708. [PMID: 11518524 DOI: 10.1006/jmbi.2001.4912] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.2] [Indexed: 11/22/2022]

Abstract

The 106 small molecule metabolic (SMM) pathways in Escherichia coli are formed by the protein products of 581 genes. We can define 722 domains, nearly all of which are homologous to proteins of known structure, that form all or part of 510 of these proteins. This information allows us to answer general questions on the structural anatomy of the SMM pathway proteins and to trace family relationships and recruitment events within and across pathways. Half the gene products contain a single domain and half are formed by combinations of between two and six domains. The 722 domains belong to one of 213 families that have between one and 51 members. Family members usually conserve their catalytic or cofactor binding properties; substrate recognition is rarely conserved. Of the 213 families, members of only a quarter occur in isolation, i.e. they form single-domain proteins. Most members of the other families combine with domains from just one or two other families and a few more versatile families can combine with several different partners. Excluding isoenzymes, more than twice as many homologues are distributed across pathways as within pathways. However, serial recruitment, with two consecutive enzymes both being recruited to another pathway, is rare and recruitment of three consecutive enzymes is not observed. Only eight of the 106 pathways have a high number of homologues. Homology between consecutive pairs of enzymes with conservation of the main substrate-binding site but change in catalytic mechanism (which would support a simple model of retrograde pathway evolution) occurs only six times in the whole set of enzymes. Most of the domains that form SMM pathways have homologues in non-SMM pathways. Taken together, these results imply a pervasive "mosaic" model for the formation of protein repertoires and pathways.

Koonin EV, Wolf YI, Aravind L. Prediction of the archaeal exosome and its connections with the proteasome and the translation and transcription machineries by a comparative-genomic approach. Genome Res 2001;11:240-52. [PMID: 11157787 PMCID: PMC311015 DOI: 10.1101/gr.162001] [Citation(s) in RCA: 205] [Impact Index Per Article: 8.9] [Indexed: 11/24/2022]

Abstract

By comparing the gene order in the completely sequenced archaeal genomes, complemented by sequence profile analysis, we predict the existence and protein composition of the archaeal counterpart of the eukaryotic exosome, a complex of RNases, RNA-binding proteins, and helicases that mediates processing and 3'->5' degradation of a variety of RNA species. The majority of the predicted archaeal exosome subunits are encoded in what appears to be a previously undetected superoperon. In Methanobacterium thermoautotrophicum, this predicted superoperon consists of 15 genes; in the crenarchaea Sulfolobus solfataricus and Aeropyrum pernix, one and two of the genes from the superoperon, respectively, are relocated in the genome, whereas in other Euryarchaeota, the superoperon is split into a variable number of predicted operons and solitary genes. Methanococcus jannaschii partially retains the superoperon but lacks the three core exosome subunits, and in Halobacterium sp., the superoperon is divided into two predicted operons, with the same three exosome subunits missing. This suggests concerted gene loss and an alteration of the structure and function of the predicted exosome in the Methanococcus and Halobacterium lineages. Additional potential components of the exosome are encoded by partially conserved predicted small operons. Along with the orthologs of eukaryotic exosome subunits, namely an RNase PH and two RNA-binding proteins, the predicted archaeal exosomal superoperon also encodes orthologs of two protein subunits of RNase P. This suggests a functional and possibly a physical interaction between RNase P and the postulated archaeal exosome, a connection that has not been reported in eukaryotes. In a pattern of apparent gene loss complementary to that seen in Methanococcus and Halobacterium, Thermoplasma acidophilum lacks the RNase P subunits. Unexpectedly, the identified exosomal superoperon, in addition to the predicted exosome components, encodes the catalytic subunits of the archaeal proteasome, two ribosomal proteins and a DNA-directed RNA polymerase subunit. These observations suggest that in archaea, a tight functional coupling exists between translation, RNA processing and degradation (apparently mediated by the predicted exosome), and protein degradation (mediated by the proteasome), and they may have implications for cross-talk between these processes in eukaryotes.
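The superoperon prediction above begins with comparing gene order across genomes. A minimal sketch, assuming each genome has already been reduced to an ordered list of orthologous-group labels (hypothetical labels A, B, C and so on; strand, intergenic spacing and the sequence profile analysis are all ignored here), counts in how many genomes each unordered pair of neighboring genes occurs adjacently and keeps pairs seen in at least two genomes.

```python
from collections import Counter

def adjacent_pairs(order: list[str]) -> set[frozenset[str]]:
    """Unordered pairs of orthologous groups that are immediate neighbors in one genome."""
    return {frozenset(pair) for pair in zip(order, order[1:]) if pair[0] != pair[1]}

def conserved_pairs(genomes: dict[str, list[str]], min_genomes: int = 2) -> dict[frozenset[str], int]:
    """Neighbor pairs observed in at least `min_genomes` genomes, with their counts."""
    counts: Counter = Counter()
    for order in genomes.values():
        counts.update(adjacent_pairs(order))  # each genome contributes a pair at most once
    return {pair: n for pair, n in counts.items() if n >= min_genomes}

# Hypothetical gene orders; X and Y stand for lineage-specific insertions.
genomes = {
    "genome1": ["A", "B", "C", "D"],
    "genome2": ["B", "C", "X", "D"],
    "genome3": ["C", "B", "Y"],
}
print(conserved_pairs(genomes))  # {frozenset({'B', 'C'}): 3}
```

Using unordered pairs treats B-C and C-B as the same neighborhood, which tolerates local inversions; a real analysis would also check transcription direction before calling a predicted operon.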

Tamames J. Evolution of gene order conservation in prokaryotes. Genome Biol 2001;2:RESEARCH0020. [PMID: 11423009 PMCID: PMC33396 DOI: 10.1186/gb-2001-2-6-research0020] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.0] [Received: 03/07/2001] [Revised: 04/09/2001] [Accepted: 04/12/2001] [Indexed: 11/11/2022]

Suyama M, Bork P. Evolution of prokaryotic gene order: genome rearrangements in closely related species. Trends Genet 2001;17:10-3. [PMID: 11163906 DOI: 10.1016/s0168-9525(00)02159-4] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.0] [Indexed: 11/22/2022]

Snel B, Lehmann G, Bork P, Huynen MA. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res 2000;28:3442-4. [PMID: 10982861 PMCID: PMC110752 DOI: 10.1093/nar/28.18.3442] [Citation(s) in RCA: 795] [Impact Index Per Article: 33.1] [Received: 07/14/2000] [Accepted: 08/02/2000] [Indexed: 11/14/2022]

Huynen M, Snel B, Lathe W, Bork P. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 2000;10:1204-10. [PMID: 10958638 PMCID: PMC310926 DOI: 10.1101/gr.10.8.1204] [Citation(s) in RCA: 347] [Impact Index Per Article: 14.5] [Indexed: 11/24/2022]

Abstract

Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes.
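Of the three context types, phylogenetic profiles (Type III) are the simplest to express in code: each gene is represented by its presence (1) or absence (0) across a set of genomes, and genes with identical or near-identical profiles are predicted to interact functionally. The sketch below uses hypothetical profiles and a Hamming-distance threshold chosen only for illustration; the paper does not prescribe this particular scoring. In practice, profiles present in all or nearly all genomes carry little information and are usually down-weighted.

```python
from itertools import combinations

def hamming(p: str, q: str) -> int:
    """Number of genomes in which two equal-length presence/absence profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))

def linked_by_profile(profiles: dict[str, str], max_diff: int = 0) -> list[tuple[str, str]]:
    """Gene pairs whose profiles differ in at most `max_diff` genomes."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if hamming(profiles[g1], profiles[g2]) <= max_diff
    ]

# Hypothetical 1/0 profiles over eight genomes.
profiles = {
    "geneA": "11010011",
    "geneB": "11010011",
    "geneC": "11110011",
    "geneD": "00101100",
}
print(linked_by_profile(profiles))              # [('geneA', 'geneB')]
print(linked_by_profile(profiles, max_diff=1))
```

Relaxing the threshold from exact matches to one mismatch adds geneC to the linked set, illustrating the coverage versus false-positive trade-off the abstract describes for less stringent context criteria.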

Comparative Genome Analysis: Exploiting the Context of Genes to Infer Evolution and Predict Function. COMPARATIVE GENOMICS 2000. [DOI: 10.1007/978-94-011-4309-7_25] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/11/2023]