Reference Citation Analysis

For: Siew N, Fischer D. Analysis of singleton ORFans in fully sequenced microbial genomes. Proteins 2003;53:241-51. [PMID: 14517975 DOI: 10.1002/prot.10423] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.7] [Indexed: 11/07/2022]

Cited by Other Article(s)

Zielezinski A, Dobrychlop W, Karlowski WM. TRGdb: a universal resource for the exploration of taxonomically restricted genes in bacteria. Database (Oxford) 2023;2023:baad058. [PMID: 37555549 PMCID: PMC10410690 DOI: 10.1093/database/baad058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 06/30/2023] [Accepted: 07/31/2023] [Indexed: 08/10/2023]

Sanejouand YH. On the Unknown Proteins of Eukaryotic Proteomes. J Mol Evol 2023:10.1007/s00239-023-10116-1. [PMID: 37219573 DOI: 10.1007/s00239-023-10116-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 05/07/2023] [Indexed: 05/24/2023]

Jiang M, Li X, Dong X, Zu Y, Zhan Z, Piao Z, Lang H. Research Advances and Prospects of Orphan Genes in Plants. Front Plant Sci 2022;13:947129. [PMID: 35874010 PMCID: PMC9305701 DOI: 10.3389/fpls.2022.947129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 06/23/2022] [Indexed: 06/15/2023]

Lobb B, Tremblay BJM, Moreno-Hagelsieb G, Doxey AC. An assessment of genome annotation coverage across the bacterial tree of life. Microb Genom 2020;6. [PMID: 32124724 PMCID: PMC7200070 DOI: 10.1099/mgen.0.000341] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 12/13/2022]

Abstract

Although gene-finding in bacterial genomes is relatively straightforward, the automated assignment of gene function is still challenging, resulting in a vast quantity of hypothetical sequences of unknown function. But how prevalent are hypothetical sequences across bacteria, what proportion of genes in different bacterial genomes remain unannotated, and what factors affect annotation completeness? To address these questions, we surveyed over 27 000 bacterial genomes from the Genome Taxonomy Database, and measured genome annotation completeness as a function of annotation method, taxonomy, genome size, 'research bias' and publication date. Our analysis revealed that 52 and 79 % of the average bacterial proteome could be functionally annotated based on protein and domain-based homology searches, respectively. Annotation coverage using protein homology search varied significantly from as low as 14 % in some species to as high as 98 % in others. We found that taxonomy is a major factor influencing annotation completeness, with distinct trends observed across the microbial tree (e.g. the lowest level of completeness was found in the Patescibacteria lineage). Most lineages showed a significant association between genome size and annotation incompleteness, likely reflecting a greater degree of uncharacterized sequences in 'accessory' proteomes than in 'core' proteomes. Finally, research bias, as measured by publication volume, was also an important factor influencing genome annotation completeness, with early model organisms showing high completeness levels relative to other genomes in their own taxonomic lineages. Our work highlights the disparity in annotation coverage across the bacterial tree of life and emphasizes a need for more experimental characterization of accessory proteomes as well as understudied lineages.

Chen K, Tian Z, Chen P, He H, Jiang F, Long CA. Genome-wide identification, characterization and expression analysis of lineage-specific genes within Hanseniaspora yeasts. FEMS Microbiol Lett 2020;367:5837084. [PMID: 32407480 DOI: 10.1093/femsle/fnaa077] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/27/2019] [Accepted: 05/12/2020] [Indexed: 12/13/2022]

Orphan Genes Shared by Pathogenic Genomes Are More Associated with Bacterial Pathogenicity. mSystems 2019;4:mSystems00290-18. [PMID: 30801025 PMCID: PMC6372840 DOI: 10.1128/msystems.00290-18] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/15/2018] [Accepted: 01/08/2019] [Indexed: 11/20/2022]

Abstract

Orphan genes (also known as ORFans [i.e., orphan open reading frames]) are new genes that enable an organism to adapt to its specific living environment. Our focus in this study is to compare ORFans between pathogens (P) and nonpathogens (NP) of the same genus. Using the pangenome idea, we have identified 130,169 ORFans in nine bacterial genera (505 genomes) and classified these ORFans into four groups: (i) SS-ORFans (P), which are only found in a single pathogenic genome; (ii) SS-ORFans (NP), which are only found in a single nonpathogenic genome; (iii) PS-ORFans (P), which are found in multiple pathogenic genomes; and (iv) NS-ORFans (NP), which are found in multiple nonpathogenic genomes. Within the same genus, pathogens do not always have more genes, more ORFans, or more pathogenicity-related genes (PRGs)—including prophages, pathogenicity islands (PAIs), virulence factors (VFs), and horizontal gene transfers (HGTs)—than nonpathogens. Interestingly, in pathogens of the nine genera, the percentages of PS-ORFans are consistently higher than those of SS-ORFans, which is not true in nonpathogens. Similarly, in pathogens of the nine genera, the percentages of PS-ORFans matching the four types of PRGs are also always higher than those of SS-ORFans, but this is not true in nonpathogens. All of these findings suggest the greater importance of PS-ORFans for bacterial pathogenicity.
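
The four-way grouping described in this abstract can be sketched in a few lines. This is a minimal illustration only: the function name, the input shapes, and the choice to leave genes found in both pathogenic and nonpathogenic genomes unclassified are assumptions made here, not details given in the abstract, and no upper limit on "multiple" genomes is applied.

```python
def classify_orfan(genomes_with_gene, is_pathogenic):
    """Assign an ORFan to one of the four groups named in the abstract.

    genomes_with_gene: set of genome identifiers that carry the gene
                       (hypothetical input format).
    is_pathogenic:     dict mapping genome identifier -> bool
                       (hypothetical input format).
    Returns a group label, or None when the gene occurs in both
    pathogenic and nonpathogenic genomes (an assumption of this sketch;
    the abstract does not say how such genes are handled).
    """
    statuses = {is_pathogenic[g] for g in genomes_with_gene}
    if len(statuses) != 1:
        # Empty input, or a gene shared across pathogens and nonpathogens.
        return None
    pathogen = statuses.pop()
    if len(genomes_with_gene) == 1:
        # Found in a single genome only.
        return "SS-ORFan (P)" if pathogen else "SS-ORFan (NP)"
    # Found in multiple genomes that share the same pathogenicity status.
    return "PS-ORFan (P)" if pathogen else "NS-ORFan (NP)"


# Toy example with made-up genome identifiers.
status = {"g1": True, "g2": True, "g3": False, "g4": False}
labels = [
    classify_orfan({"g1"}, status),
    classify_orfan({"g3"}, status),
    classify_orfan({"g1", "g2"}, status),
    classify_orfan({"g3", "g4"}, status),
    classify_orfan({"g1", "g3"}, status),
]
```

Comparing the share of PS-ORFans with the share of SS-ORFans within each genus would then be a matter of counting these labels per genome set.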

IMPORTANCE Recent pangenome analyses of numerous bacterial species have suggested that each genome of a single species may have a significant fraction of its gene content unique or shared by a very few genomes (i.e., ORFans). We selected nine bacterial genera, each containing at least five pathogenic and five nonpathogenic genomes, to compare their ORFans in relation to pathogenicity-related genes. Pathogens in these genera are known to cause a number of common and devastating human diseases such as pneumonia, diphtheria, melioidosis, and tuberculosis. Thus, they are worthy of in-depth systems microbiology investigations, including the comparative study of ORFans between pathogens and nonpathogens. We provide direct evidence to suggest that ORFans shared by more pathogens are more associated with pathogenicity-related genes and thus are more important targets for development of new diagnostic markers or therapeutic drugs for bacterial infectious diseases.

Description and genomic characterization of Massiliimalia massiliensis gen. nov., sp. nov., and Massiliimalia timonensis gen. nov., sp. nov., two new members of the family Ruminococcaceae isolated from the human gut. Antonie van Leeuwenhoek 2019;112:905-918. [DOI: 10.1007/s10482-018-01223-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/11/2018] [Accepted: 12/29/2018] [Indexed: 12/16/2022]

Angers A, Ouimet P, Tsyvian-Dzyabko A, Nock T, Breton S. [The underestimated coding potential of mitochondrial DNA]. Med Sci (Paris) 2019;35:46-54. [PMID: 30672456 DOI: 10.1051/medsci/2018308] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]

Basile W, Sachenkova O, Light S, Elofsson A. High GC content causes orphan proteins to be intrinsically disordered. PLoS Comput Biol 2017;13:e1005375. [PMID: 28355220 PMCID: PMC5389847 DOI: 10.1371/journal.pcbi.1005375] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 07/30/2016] [Revised: 04/12/2017] [Accepted: 01/21/2017] [Indexed: 01/29/2023]

Abstract

De novo creation of protein coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should for instance not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account we noted that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. During evolution the properties of a protein change faster than the GC level causing the relationship between disorder and GC to gradually weaken.

We show that the GC content of a genome is of great importance for the properties of an orphan protein. GC content affects the frequency of the codons and this affects the probability for each amino acid to be included in a de novo created protein. The codons encoding for Ala, Pro and Gly contain 80% GC, while codons for Lys, Phe, Asn, Tyr and Ile contain 20% or less. The three high GC amino acids are all disorder promoting, while Phe, Tyr and Ile are order promoting. Therefore, random protein sequences at a high GC will be more disordered than the ones created at a low GC. The structural properties of the youngest proteins match to a large degree the properties of random proteins when the GC content is taken into account. In contrast, structural properties of ancient proteins only show a weak correlation with GC content. This suggests that even after fixation in the population, proteins largely resemble random proteins given a certain GC content. Thereafter, during evolution the correlation between structural properties and GC weakens.
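
The codon arithmetic in this summary can be checked directly. The following is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and an unweighted mean over each amino acid's synonymous codons. The function names are illustrative and not taken from the paper, whose own calculation may weight codons differently.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons
# enumerated in T, C, A, G order at each position; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def codon_table():
    """Map each of the 64 codons to its one-letter amino acid code."""
    codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
    return dict(zip(codons, AMINO_ACIDS))


def mean_codon_gc():
    """Unweighted mean GC fraction of the codons for each amino acid.

    Assumption of this sketch: every synonymous codon counts equally,
    regardless of how often a given genome actually uses it.
    """
    per_amino_acid = {}
    for codon, amino_acid in codon_table().items():
        gc_fraction = sum(base in "GC" for base in codon) / 3
        per_amino_acid.setdefault(amino_acid, []).append(gc_fraction)
    return {aa: sum(v) / len(v) for aa, v in per_amino_acid.items()}


gc = mean_codon_gc()
high_gc = {aa: round(gc[aa], 3) for aa in "APG"}    # Ala, Pro, Gly
low_gc = {aa: round(gc[aa], 3) for aa in "KFNYI"}   # Lys, Phe, Asn, Tyr, Ile
```

Under these assumptions Ala, Pro and Gly each average 5/6 (about 83%) GC, Lys, Phe, Asn and Tyr average 1/6 (about 17%), and Ile averages 1/9 (about 11%), consistent with the rounded 80% and 20% figures quoted above.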

Gupta RS. Impact of genomics on the understanding of microbial evolution and classification: the importance of Darwin's views on classification. FEMS Microbiol Rev 2016;40:520-53. [PMID: 27279642 DOI: 10.1093/femsre/fuw011] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Accepted: 05/14/2016] [Indexed: 12/24/2022]

Lobb B, Doxey AC. Novel function discovery through sequence and structural data mining. Curr Opin Struct Biol 2016;38:53-61. [DOI: 10.1016/j.sbi.2016.05.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/19/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 01/30/2023]

Xu Y, Wu G, Hao B, Chen L, Deng X, Xu Q. Identification, characterization and expression analysis of lineage-specific genes within sweet orange (Citrus sinensis). BMC Genomics 2015;16:995. [PMID: 26597278 PMCID: PMC4657247 DOI: 10.1186/s12864-015-2211-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/07/2015] [Accepted: 11/13/2015] [Indexed: 11/23/2022]

Hugon P, Dufour JC, Colson P, Fournier PE, Sallah K, Raoult D. A comprehensive repertoire of prokaryotic species identified in human beings. Lancet Infect Dis 2015;15:1211-1219. [PMID: 26311042 DOI: 10.1016/s1473-3099(15)00293-5] [Citation(s) in RCA: 209] [Impact Index Per Article: 23.2] [Received: 07/04/2014] [Revised: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 02/07/2023]

Lobb B, Kurtz DA, Moreno-Hagelsieb G, Doxey AC. Remote homology and the functions of metagenomic dark matter. Front Genet 2015;6:234. [PMID: 26257768 PMCID: PMC4508852 DOI: 10.3389/fgene.2015.00234] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/05/2015] [Accepted: 06/22/2015] [Indexed: 01/26/2023]

Abstract

Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. 
All remote homology predictions are available at http://doxey.uwaterloo.ca/ORFans.

Zhou K, Huang B, Zou M, Lu D, He S, Wang G. Genome-wide identification of lineage-specific genes within Caenorhabditis elegans. Genomics 2015;106:242-8. [PMID: 26188256 DOI: 10.1016/j.ygeno.2015.07.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/15/2015] [Revised: 07/08/2015] [Accepted: 07/09/2015] [Indexed: 11/19/2022]

Yang YS, Fernandez B, Lagorce A, Aloin V, De Guillen KM, Boyer JB, Dedieu A, Confalonieri F, Armengaud J, Roumestand C. Prioritizing targets for structural biology through the lens of proteomics: the archaeal protein TGAM_1934 from Thermococcus gammatolerans. Proteomics 2015;15:114-23. [PMID: 25359407 DOI: 10.1002/pmic.201300535] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/03/2013] [Revised: 10/01/2014] [Accepted: 10/24/2014] [Indexed: 11/09/2022]

Boros Á, Pankovics P, Reuter G. Avian picornaviruses: molecular evolution, genome diversity and unusual genome features of a rapidly expanding group of viruses in birds. Infect Genet Evol 2014;28:151-66. [PMID: 25278047 DOI: 10.1016/j.meegid.2014.09.027] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 07/29/2014] [Revised: 09/15/2014] [Accepted: 09/21/2014] [Indexed: 12/29/2022]

Abstract

Picornaviridae is one of the most diverse families of viruses infecting vertebrate species. Although mammals account for a relatively small number of species compared with other vertebrates, mammal-infecting picornaviruses are significantly overrepresented among the presently known picornaviruses. Most of the current knowledge about genome diversity, organization patterns and common genome features is therefore based on the analysis of mammal-infecting picornaviruses. Despite the well-known role of birds as reservoirs for several emerging viral pathogens, little is known about the diversity of picornaviruses circulating among birds, although in the last decade the number of known avian picornavirus species with complete genomes has increased from one to at least 15. Little is also known about the geographic distribution, host spectrum or pathogenic potential of the recently described picornaviruses of birds. Despite the low number of known avian picornaviruses, their phylogenetic and genome organization diversity is remarkable. Besides the common L-4-3-4 and 4-3-4 genome layouts, unusual genome patterns (3-4-4; 3-5-4, 3-6-4; 3-8-4) with variable, multicistronic 2A genome regions were found among avian picornaviruses. Phylogenetic and genomic analysis revealed several conserved structures in the untranslated regions of phylogenetically distant avian and non-avian picornaviruses, as well as at least five different avian picornavirus phylogenetic clusters, located in every main picornavirus lineage and with characteristic genome layouts, which suggests a complex evolutionary history for these viruses. Given the remarkable genetic diversity of the few known avian picornaviruses, further divergent picornaviruses are expected to emerge and to challenge both the current taxonomy and the understanding of picornavirus evolution and genome organization. This review summarizes the current knowledge about the taxonomy, pathogenic potential, phylogenetic and genomic diversity, and evolutionary relationships of avian picornaviruses.

Mewalal R, Mizrachi E, Mansfield SD, Myburg AA. Cell wall-related proteins of unknown function: missing links in plant cell wall development. Plant Cell Physiol 2014;55:1031-43. [PMID: 24683037 DOI: 10.1093/pcp/pcu050] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 05/18/2023]

Jeanniard A, Dunigan DD, Gurnon JR, Agarkova IV, Kang M, Vitek J, Duncan G, McClung OW, Larsen M, Claverie JM, Van Etten JL, Blanc G. Towards defining the chloroviruses: a genomic journey through a genus of large DNA viruses. BMC Genomics 2013;14:158. [PMID: 23497343 PMCID: PMC3602175 DOI: 10.1186/1471-2164-14-158] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 10/25/2012] [Accepted: 02/22/2013] [Indexed: 11/29/2022]

Abstract

Background

Giant viruses in the genus Chlorovirus (family Phycodnaviridae) infect eukaryotic green microalgae. The prototype member of the genus, Paramecium bursaria chlorella virus 1, was sequenced more than 15 years ago, and to date there are only 6 fully sequenced chloroviruses in public databases. Presented here are the draft genome sequences of 35 additional chloroviruses (287 – 348 Kb/319 – 381 predicted protein encoding genes) collected across the globe; they infect one of three different green algal species. These new data allowed us to analyze the genomic landscape of 41 chloroviruses, which revealed some remarkable features about these viruses.

Results

Genome colinearity, nucleotide conservation and phylogenetic affinity were limited to chloroviruses infecting the same host, confirming the validity of the three previously known subgenera. Clues for the existence of a fourth new subgenus indicate that the boundaries of chlorovirus diversity are not completely determined. Comparison of the chlorovirus phylogeny with that of the algal hosts indicates that chloroviruses have changed hosts in their evolutionary history. Reconstruction of the ancestral genome suggests that the last common chlorovirus ancestor had a slightly more diverse protein repertoire than modern chloroviruses. However, more than half of the defined chlorovirus gene families have a potential recent origin (after Chlorovirus divergence), among which a portion shows compositional evidence for horizontal gene transfer. Only a few of the putative acquired proteins had close homologs in databases raising the question of the true donor organism(s). Phylogenomic analysis identified only seven proteins whose genes were potentially exchanged between the algal host and the chloroviruses.

Conclusion

The present evaluation suggests that the pattern of genomic evolution in chloroviruses differs from that described in the related Poxviridae and Mimiviridae. Our study shows that the fixation of algal host genes has been anecdotal in the evolutionary history of chloroviruses. We finally discuss the incongruence between compositional evidence of horizontal gene transfer and the lack of closely related sequences in the databases, which suggests that the recently acquired genes originate from a still largely unsequenced reservoir of genomes, possibly other unknown viruses that infect the same hosts.

Yang L, Zou M, Fu B, He S. Genome-wide identification, characterization, and expression analysis of lineage-specific genes within zebrafish. BMC Genomics 2013;14:65. [PMID: 23368736 PMCID: PMC3599513 DOI: 10.1186/1471-2164-14-65] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 09/22/2012] [Accepted: 01/29/2013] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The genomic basis of teleost phenotypic complexity remains obscure, despite increasing availability of genome and transcriptome sequence data. Fish-specific genome duplication cannot provide sufficient explanation for the morphological complexity of teleosts, considering the relatively large number of extinct basal ray-finned fishes.

RESULTS

In this study, we performed comparative genomic analysis to discover the Conserved Teleost-Specific Genes (CTSGs) and orphan genes within zebrafish and found that these two sets of lineage-specific genes may have played important roles during zebrafish embryogenesis. Lineage-specific genes within zebrafish share many of the characteristics of their counterparts in other species: shorter length, fewer exons, higher GC content, and less transcript support. Chromosomal location analysis indicated that neither the CTSGs nor the orphan genes were distributed evenly across the chromosomes of zebrafish. The significant enrichment of immunity proteins among CTSGs, whether annotated by gene ontology (GO) or predicted ab initio, may imply that defense against pathogens was an important reason for the diversification of teleosts. The evolutionary origin of the lineage-specific genes was determined, and a very high percentage of them were generated via gene duplication. The temporal and spatial expression profiles of lineage-specific genes, obtained from expressed sequence tag (EST) and RNA-seq data, revealed two novel properties: in addition to showing highly tissue-preferred expression, lineage-specific genes are also highly temporally restricted, namely they are expressed in narrower time windows than evolutionarily conserved genes and are specifically enriched in later-stage embryos and early larval stages.

CONCLUSIONS

Our study provides the first systematic identification of two different sets of lineage-specific genes within zebrafish and provides valuable information leading towards a better understanding of the molecular mechanisms of the genomic basis of teleost phenotypic complexity for future studies.

Georgiades K, Raoult D. How microbiology helps define the rhizome of life. Front Cell Infect Microbiol 2012;2:60. [PMID: 22919651 PMCID: PMC3417629 DOI: 10.3389/fcimb.2012.00060] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/18/2012] [Accepted: 04/16/2012] [Indexed: 01/24/2023]

Evolutionary link between the mycobacterial plasmid pAL5000 replication protein RepB and the extracytoplasmic function family of σ factors. J Bacteriol 2012;194:1331-41. [PMID: 22247504 DOI: 10.1128/jb.06218-11] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]

Modeling gene family evolution and reconciling phylogenetic discord. Methods Mol Biol 2012;856:29-51. [PMID: 22399454 DOI: 10.1007/978-1-61779-585-5_2] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022]

Georgiades K, Merhej V, Raoult D. The influence of rickettsiologists on post-modern microbiology. Front Cell Infect Microbiol 2011;1:8. [PMID: 22919574 PMCID: PMC3417371 DOI: 10.3389/fcimb.2011.00008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/05/2011] [Accepted: 10/10/2011] [Indexed: 11/29/2022]

Donoghue MT, Keshavaiah C, Swamidatta SH, Spillane C. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol Biol 2011;11:47. [PMID: 21332978 PMCID: PMC3049755 DOI: 10.1186/1471-2148-11-47] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.7] [Received: 05/10/2010] [Accepted: 02/18/2011] [Indexed: 11/21/2022]

Abstract

Background

All sequenced genomes contain a proportion of lineage-specific genes, which exhibit no sequence similarity to any genes outside the lineage. Despite their prevalence, the origins and functions of most lineage-specific genes remain largely unknown. As more genomes are sequenced opportunities for understanding evolutionary origins and functions of lineage-specific genes are increasing.

Results

This study provides a comprehensive analysis of the origins of lineage-specific genes (LSGs) in Arabidopsis thaliana that are restricted to the Brassicaceae family. In this study, lineage-specific genes within the nuclear (1761 genes) and mitochondrial (28 genes) genomes are identified. The evolutionary origins of two thirds of the lineage-specific genes within the Arabidopsis thaliana genome are also identified. Almost a quarter of lineage-specific genes originate from non-lineage-specific paralogs, while the origins of ~10% of lineage-specific genes are partly derived from DNA exapted from transposable elements (twice the proportion observed for non-lineage-specific genes). Lineage-specific genes are also enriched in genes that have overlapping CDS, which is consistent with such novel genes arising from overprinting. Over half of the subset of the 958 lineage-specific genes found only in Arabidopsis thaliana have alignments to intergenic regions in Arabidopsis lyrata, consistent with either de novo origination or differential gene loss and retention, with both evolutionary scenarios explaining the lineage-specific status of these genes. A smaller number of lineage-specific genes with an incomplete open reading frame across different Arabidopsis thaliana accessions are further identified as accession-specific genes, most likely of recent origin in Arabidopsis thaliana. Putative de novo origination for two of the Arabidopsis thaliana-only genes is identified via additional sequencing across accessions of Arabidopsis thaliana and closely related sister species lineages. We demonstrate that lineage-specific genes have high tissue specificity and low expression levels across multiple tissues and developmental stages. Finally, stress responsiveness is identified as a distinct feature of Brassicaceae-specific genes; where these LSGs are enriched for genes responsive to a wide range of abiotic stresses.

Conclusion

Improving our understanding of the origins of lineage-specific genes is key to gaining insights regarding how novel genes can arise and acquire functionality in different lineages. This study comprehensively identifies all of the Brassicaceae-specific genes in Arabidopsis thaliana and identifies how the majority of such lineage-specific genes have arisen. The analysis allows the relative importance (and prevalence) of different evolutionary routes to the genesis of novel ORFs within lineages to be assessed. Insights regarding the functional roles of lineage-specific genes are further advanced through identification of enrichment for stress responsiveness in lineage-specific genes, highlighting their likely importance for environmental adaptation strategies.

Capra JA, Pollard KS, Singh M. Novel genes exhibit distinct patterns of function acquisition and network integration. Genome Biol 2010;11:R127. [PMID: 21187012 PMCID: PMC3046487 DOI: 10.1186/gb-2010-11-12-r127] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 07/08/2010] [Revised: 11/18/2010] [Accepted: 12/27/2010] [Indexed: 01/03/2023] Open

Molecular signatures for the Crenarchaeota and the Thaumarchaeota. Antonie van Leeuwenhoek 2010;99:133-57. [PMID: 20711675 DOI: 10.1007/s10482-010-9488-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 06/04/2010] [Accepted: 07/26/2010] [Indexed: 10/19/2022]

Abstract

Crenarchaeotes found in mesophilic marine environments were recently placed into a new phylum of Archaea called the Thaumarchaeota. However, very few molecular characteristics are currently known that can distinguish this new phylum from the Crenarchaeota. In addition, their relationships to deep-branching archaeal lineages are unclear. We report here detailed analyses of protein sequences from Crenarchaeota and Thaumarchaeota that have identified many conserved signature indels (CSIs) and signature proteins (SPs) (i.e., proteins for which all significant BLAST hits are from these groups) that are specific for these archaeal groups. Of the identified signatures, 6 CSIs and 13 SPs are specific for the Crenarchaeota phylum; 6 CSIs and >250 SPs are uniquely found in various Thaumarchaeota (viz. Cenarchaeum symbiosum, Nitrosopumilus maritimus and a number of uncultured marine crenarchaeotes); and 3 CSIs and ~10 SPs are found in both Thaumarchaeota and Crenarchaeota species. Some of the molecular signatures are also present in Korarchaeum cryptofilum, which forms the independent phylum Korarchaeota. Although some of these molecular signatures suggest a distant shared ancestry between Thaumarchaeota and Crenarchaeota, our identification of large numbers of Thaumarchaeota-specific proteins and their deep branching between the Crenarchaeota and Euryarchaeota phyla in phylogenetic trees shows that they are distinct from both Crenarchaeota and Euryarchaeota in both genetic and phylogenetic terms. These observations support the placement of marine mesophilic archaea into the separate phylum Thaumarchaeota. Additionally, many CSIs and SPs have been found that are specific for different orders within Crenarchaeota (viz. Sulfolobales: 3 CSIs and 169 SPs; Thermoproteales: 5 CSIs and 25 SPs; Desulfurococcales: 4 SPs; and Sulfolobales and Desulfurococcales: 2 CSIs and 18 SPs).
The signatures described here provide novel means for distinguishing the Crenarchaeota and the Thaumarchaeota and for the classification of related and novel species in different environments. Functional studies on these signature proteins could lead to discovery of novel biochemical properties that are unique to these groups of archaea.

Ellrott K, Jaroszewski L, Li W, Wooley JC, Godzik A. Expansion of the protein repertoire in newly explored environments: human gut microbiome specific protein families. PLoS Comput Biol 2010;6:e1000798. [PMID: 20532204 PMCID: PMC2880560 DOI: 10.1371/journal.pcbi.1000798] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 07/16/2009] [Accepted: 04/27/2010] [Indexed: 02/01/2023] Open

Abstract

The microbes that inhabit particular environments must be able to perform molecular functions that provide them with a competitive advantage to thrive in those environments. As most molecular functions are performed by proteins and are conserved between related proteins, we can expect that organisms successful in a given environmental niche would contain protein families that are specific for functions that are important in that environment. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain a highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These not only include many families described previously but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide us with new information about an environment critical to our health and well-being.

Metagenomics provides a unique opportunity to sample the gene content of microbial communities adapted to specific environments and to study the correlations between the presence or absence of gene families that occur in organisms within that environment. Such studies provide detailed information about the adaptation of microbes to a given environment and, indirectly, provide clues about the most important molecular processes that are specific for that environment. Having performed such an analysis for the community of the human distal gut, we report many new protein families and identify many others that are highly specific for this particular environment. The function of most of these proteins is unknown, which illustrates the extent of our ignorance about the organisms within this environment that are so important for human health and well-being.

Yomtovian I, Teerakulkittipong N, Lee B, Moult J, Unger R. Composition bias and the origin of ORFan genes. Bioinformatics 2010;26:996-9. [PMID: 20231229 PMCID: PMC2853687 DOI: 10.1093/bioinformatics/btq093] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]

Lin H, Moghe G, Ouyang S, Iezzoni A, Shiu SH, Gu X, Buell CR. Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana. BMC Evol Biol 2010;10:41. [PMID: 20152032 PMCID: PMC2829037 DOI: 10.1186/1471-2148-10-41] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Received: 05/08/2009] [Accepted: 02/12/2010] [Indexed: 11/25/2022] Open

Abstract

Background

The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes such as lineage-specific genes which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes among species, lineage-specific genes provide insights into evolutionary processes and biological functions that are likely clade or species specific.

Results

Comparative analyses using the Arabidopsis thaliana genome and sequences from 178 other species within the Plant Kingdom enabled the identification of 24,624 A. thaliana genes (91.7%) that were termed Evolutionary Conserved (EC) as defined by sequence similarity to a database entry as well as two sets of lineage-specific genes within A. thaliana. One of the A. thaliana lineage-specific gene sets shares sequence similarity only with sequences from species within the Brassicaceae family; its members are termed Conserved Brassicaceae-Specific Genes (914, 3.4%, CBSG). The other set of A. thaliana lineage-specific genes, the Arabidopsis Lineage-Specific Genes (1,324, 4.9%, ALSG), lacks sequence similarity to any sequence outside A. thaliana. While many CBSGs (76.7%) and ALSGs (52.9%) are transcribed, the majority of the CBSGs (76.1%) and ALSGs (94.4%) have no annotated function. Co-expression analysis indicated significant enrichment of the CBSGs and ALSGs in multiple functional categories, suggesting their involvement in a wide range of biological functions. Subcellular localization prediction revealed that the CBSGs were significantly enriched in proteins targeted to the secretory pathway (412, 45.1%). Among the 107 putatively secreted CBSGs with known functions, 67 encode a putative pollen coat protein or cysteine-rich protein with sequence similarity to the S-locus cysteine-rich protein that is the pollen determinant controlling allele-specific pollen rejection in self-incompatible Brassicaceae species. Overall, the ALSGs and CBSGs were more highly methylated in floral tissue compared to the ECs. Single Nucleotide Polymorphism (SNP) analysis showed an elevated ratio of non-synonymous to synonymous SNPs within the ALSGs (1.99) and CBSGs (1.65) relative to the EC set (0.92), mainly caused by an elevated number of non-synonymous SNPs, indicating that they are fast-evolving at the protein sequence level.

Conclusions

Our analyses suggest that while a significant fraction of the A. thaliana proteome is conserved within the Plant Kingdom, evolutionarily distinct sets of genes that may function in defining biological processes unique to these lineages have arisen within the Brassicaceae and A. thaliana.

Gupta RS, Mathews DW. Signature proteins for the major clades of Cyanobacteria. BMC Evol Biol 2010;10:24. [PMID: 20100331 PMCID: PMC2823733 DOI: 10.1186/1471-2148-10-24] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 04/27/2009] [Accepted: 01/25/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The phylogeny and taxonomy of cyanobacteria is currently poorly understood due to paucity of reliable markers for identification and circumscription of its major clades.

RESULTS

A combination of phylogenomic and protein signature based approaches was used to characterize the major clades of cyanobacteria. Phylogenetic trees were constructed for 44 cyanobacteria based on 44 conserved proteins. In parallel, Blastp searches were carried out on each ORF in the genomes of Synechococcus WH8102, Synechocystis PCC6803, Nostoc PCC7120, Synechococcus JA-3-3Ab, Prochlorococcus MIT9215 and Prochlorococcus marinus subsp. marinus CCMP1375 to identify proteins that are specific for various main clades of cyanobacteria. These studies have identified 39 proteins that are specific for all (or most) cyanobacteria and large numbers of proteins for other cyanobacterial clades. The identified signature proteins include: (i) 14 proteins for a deep branching clade (Clade A) of Gloeobacter violaceus and two diazotrophic Synechococcus strains (JA-3-3Ab and JA2-3-B'a); (ii) 5 proteins that are present in all other cyanobacteria except those from Clade A; (iii) 60 proteins that are specific for a clade (Clade C) consisting of various marine unicellular cyanobacteria (viz. Synechococcus and Prochlorococcus); (iv) 14 and 19 signature proteins that are specific for the Clade C Synechococcus and Prochlorococcus strains, respectively; (v) 67 proteins that are specific for the Low B/A ecotype Prochlorococcus strains, containing a lower ratio of chl b/a2 and adapted to growth at high light intensities; (vi) 65 and 8 proteins that are specific for the Nostocales and Chroococcales orders, respectively; and (vii) 22 and 9 proteins that are uniquely shared by various Nostocales and Oscillatoriales orders, or by these two orders and the Chroococcales, respectively. We also describe 3 conserved indels in flavoprotein, heme oxygenase and protochlorophyllide oxidoreductase proteins that are specific for either Clade C cyanobacteria or for various subclades of Prochlorococcus. Many other conserved indels for cyanobacterial clades have been described recently.

CONCLUSIONS

These signature proteins and indels provide novel means for circumscription of various cyanobacterial clades in clear molecular terms. Their functional studies should lead to discovery of novel properties that are unique to these groups of cyanobacteria.

Mazza R, Strozzi F, Caprera A, Ajmone-Marsan P, Williams JL. The other side of comparative genomics: genes with no orthologs between the cow and other mammalian species. BMC Genomics 2009;10:604. [PMID: 20003425 PMCID: PMC2808326 DOI: 10.1186/1471-2164-10-604] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/13/2009] [Accepted: 12/14/2009] [Indexed: 11/10/2022] Open

Ekman D, Elofsson A. Identifying and quantifying orphan protein sequences in fungi. J Mol Biol 2009;396:396-405. [PMID: 19944701 DOI: 10.1016/j.jmb.2009.11.053] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 03/16/2009] [Revised: 11/17/2009] [Accepted: 11/20/2009] [Indexed: 11/15/2022]

Abstract

For large regions of many proteins, and even entire proteins, no homology to known domains or proteins can be detected. These sequences are often referred to as orphans. Surprisingly, it has been reported that the large number of orphans is sustained in spite of a rapid increase of available genomic sequences. However, it is believed that de novo creation of coding sequences is rare in comparison to mechanisms such as domain shuffling and gene duplication; hence, most sequences should have homologs in other genomes. To investigate this, the sequences of 19 complete fungal genomes were compared. By using the phylogenetic relationship between these genomes, we could identify potentially de novo created orphans in Saccharomyces cerevisiae. We found that only a small fraction, <2%, of the S. cerevisiae proteome is orphan, which confirms that de novo creation of coding sequences is indeed rare. Furthermore, we found it necessary to compare the most closely related species to distinguish between de novo created sequences and rapidly evolving sequences where homologs are present but cannot be detected. Next, the orphan proteins (OPs) and orphan domains (ODs) were characterized. First, it was observed that both OPs and ODs are short. In addition, at least some of the OPs have been shown to be functional in experimental assays, showing that they are not pseudogenes. Furthermore, in contrast to what has been reported before and what is seen for older orphans, S. cerevisiae-specific ODs and proteins are not more disordered than other proteins. This might indicate that many of the older, and earlier classified, orphans are indeed fast-evolving sequences. Finally, >90% of the detected ODs are located at the protein termini, which suggests that these orphans could have been created by mutations that have affected the start or stop codons.

Cortez D, Forterre P, Gribaldo S. A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes. Genome Biol 2009;10:R65. [PMID: 19531232 PMCID: PMC2718499 DOI: 10.1186/gb-2009-10-6-r65] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.3] [Received: 03/25/2009] [Revised: 06/04/2009] [Accepted: 06/16/2009] [Indexed: 11/10/2022] Open

Abstract

A large-scale survey of potential recently acquired integrative elements in 119 archaeal and bacterial genomes reveals that many recently acquired genes have originated from integrative elements.

Background

Archaeal and bacterial genomes contain a number of genes of foreign origin that arose from recent horizontal gene transfer, but the role of integrative elements (IEs), such as viruses, plasmids, and transposable elements, in this process has not been extensively quantified. Moreover, it is not known whether IEs play an important role in the origin of ORFans (open reading frames without matches in current sequence databases), whose proportion remains stable despite the growing number of complete sequenced genomes.

Results

We have performed a large-scale survey of potential recently acquired IEs in 119 archaeal and bacterial genomes. We developed an accurate in silico Markov model-based strategy to identify clusters of genes that show atypical sequence composition (clusters of atypical genes or CAGs) and are thus likely to be recently integrated foreign elements, including IEs. Our method identified a high number of new CAGs. Probabilistic analysis of gene content indicates that 56% of these new CAGs are likely IEs, whereas only 7% likely originated via horizontal gene transfer from distant cellular sources. Thirty-four percent of CAGs remain unassigned, which may reflect a still poor sampling of IEs associated with bacterial and archaeal diversity. Moreover, our study contributes to the issue of the origin of ORFans, because 39% of these are found inside CAGs, many of which likely represent recently acquired IEs.

Conclusions

Our results strongly indicate that archaeal and bacterial genomes contain an impressive proportion of recently acquired foreign genes (including ORFans) coming from a still largely unexplored reservoir of IEs.

Punta M, Rost B. Neural networks predict protein structure and function. Methods Mol Biol 2009;458:203-30. [PMID: 19065812 DOI: 10.1007/978-1-60327-101-1_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 03/19/2023]

Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, Albà MM. Origin of primate orphan genes: a comparative genomics approach. Mol Biol Evol 2008;26:603-12. [PMID: 19064677 DOI: 10.1093/molbev/msn281] [Citation(s) in RCA: 182] [Impact Index Per Article: 11.4] [Indexed: 12/18/2022] Open

Kuo CH, Ochman H. The fate of new bacterial genes. FEMS Microbiol Rev 2008;33:38-43. [PMID: 19054121 DOI: 10.1111/j.1574-6976.2008.00140.x] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.4] [Indexed: 12/01/2022] Open

Luhua S, Ciftci-Yilmaz S, Harper J, Cushman J, Mittler R. Enhanced tolerance to oxidative stress in transgenic Arabidopsis plants expressing proteins of unknown function. Plant Physiol 2008;148:280-92. [PMID: 18614705 PMCID: PMC2528079 DOI: 10.1104/pp.108.124875] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 06/16/2008] [Accepted: 07/02/2008] [Indexed: 05/19/2023]

Peregrín-Alvarez JM, Parkinson J. The global landscape of sequence diversity. Genome Biol 2008;8:R238. [PMID: 17996061 PMCID: PMC2258180 DOI: 10.1186/gb-2007-8-11-r238] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/25/2007] [Revised: 10/18/2007] [Accepted: 11/08/2007] [Indexed: 11/10/2022] Open

Wasmuth J, Schmid R, Hedley A, Blaxter M. On the extent and origins of genic novelty in the phylum Nematoda. PLoS Negl Trop Dis 2008;2:e258. [PMID: 18596977 PMCID: PMC2432500 DOI: 10.1371/journal.pntd.0000258] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 02/13/2008] [Accepted: 06/09/2008] [Indexed: 11/18/2022] Open

van Passel MWJ, Marri PR, Ochman H. The emergence and fate of horizontally acquired genes in Escherichia coli. PLoS Comput Biol 2008;4:e1000059. [PMID: 18404206 PMCID: PMC2275313 DOI: 10.1371/journal.pcbi.1000059] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Received: 10/03/2007] [Accepted: 03/14/2008] [Indexed: 11/18/2022] Open

Merkeev IV, Mironov AA. Orphan genes: Function, evolution, and composition. Mol Biol 2008. [DOI: 10.1134/s0026893308010196] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]

Yin Y, Fischer D. Identification and investigation of ORFans in the viral world. BMC Genomics 2008;9:24. [PMID: 18205946 PMCID: PMC2245933 DOI: 10.1186/1471-2164-9-24] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.9] [Received: 03/09/2007] [Accepted: 01/19/2008] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Genome-wide studies have already shed light on the evolution and enormous diversity of the viral world. Nevertheless, one of the unresolved mysteries in comparative genomics today is the abundance of ORFans - ORFs with no detectable sequence similarity to any other ORF in the databases. Recently, studies attempting to understand the origin and functions of bacterial ORFans have been reported. Here we present a first genome-wide identification and analysis of ORFans in the viral world, with a focus on bacteriophages.

RESULTS

Almost one-third of all ORFs in 1,456 complete virus genomes correspond to ORFans, a figure significantly larger than that observed in prokaryotes. Like prokaryotic ORFans, viral ORFans are shorter and have a lower GC content than non-ORFans. Nevertheless, a statistically significant lower GC content is found in only a minority of viruses. By focusing on phages, we find that 38.4% of phage ORFs have no homologs in other phages, and 30.1% have no homologs in either the viral or the prokaryotic world. Phages with different host ranges have different percentages of ORFans, reflecting different sampling status and suggesting various diversities. Similarity searches of the phage ORFeome (ORFans and non-ORFans) against prokaryotic genomes show that almost half of the phage ORFs have prokaryotic homologs, suggesting the major role that horizontal transfer plays in bacterial evolution. Surprisingly, the percentage of phage ORFans with prokaryotic homologs is only 18.7%. This suggests that phage ORFans play a lesser role in horizontal transfer to prokaryotes, but may be among the major players contributing to the vast phage diversity.

CONCLUSION

Although the current sampling of viral genomes is extremely low, ORFans and near-ORFans are likely to continue to grow in number as more genomes are sequenced. The abundance of phage ORFans may be partially due to the expected vast viral diversity, and may be instrumental in understanding viral evolution. The functions, origins and fates of the majority of viral ORFans remain a mystery. Further computational and experimental studies are likely to shed light on the mechanisms that have given rise to so many bacterial and viral ORFans.

Gupta RS, Mok A. Phylogenomics and signature proteins for the alpha proteobacteria and its main groups. BMC Microbiol 2007;7:106. [PMID: 18045498 PMCID: PMC2241609 DOI: 10.1186/1471-2180-7-106] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.0] [Received: 07/20/2007] [Accepted: 11/28/2007] [Indexed: 01/11/2023] Open

Abstract

Background

Alpha proteobacteria are one of the largest and most extensively studied groups within bacteria. However, for these bacteria as a whole and for all of their major subgroups (viz. Rhizobiales, Rhodobacterales, Rhodospirillales, Rickettsiales, Sphingomonadales and Caulobacterales), very few or no distinctive molecular or biochemical characteristics are known.

Results

We have carried out comprehensive phylogenomic analyses by means of Blastp and PSI-Blast searches on the open reading frames in the genomes of several α-proteobacteria (viz. Bradyrhizobium japonicum, Brucella suis, Caulobacter crescentus, Gluconobacter oxydans, Mesorhizobium loti, Nitrobacter winogradskyi, Novosphingobium aromaticivorans, Rhodobacter sphaeroides 2.4.1, Silicibacter sp. TM1040, Rhodospirillum rubrum and Wolbachia (Drosophila) endosymbiont). These studies have identified several proteins that are distinctive characteristics of all α-proteobacteria, as well as numerous proteins that are unique repertoires of all of its main orders (viz. Rhizobiales, Rhodobacterales, Rhodospirillales, Rickettsiales, Sphingomonadales and Caulobacterales) and many families (viz. Rickettsiaceae, Anaplasmataceae, Rhodospirillaceae, Acetobacteraceae, Bradyrhizobiaceae, Brucellaceae and Bartonellaceae). Many other proteins that are present at different phylogenetic depths in α-proteobacteria provide important information regarding their evolution. The evolutionary relationships among α-proteobacteria as deduced from these studies are in excellent agreement with their branching pattern in the phylogenetic trees and character compatibility cliques based on concatenated sequences for many conserved proteins. These studies provide evidence that the major groups within α-proteobacteria have diverged in the following order: (Rickettsiales (Rhodospirillales (Sphingomonadales (Rhodobacterales (Caulobacterales-Parvularculales (Rhizobiales)))))). We also describe two conserved inserts in DNA Gyrase B and RNA polymerase beta subunit that are distinctive characteristics of the Sphingomonadales and Rhodospirillales species, respectively. The results presented here also provide support for the grouping of Hyphomonadaceae and Parvularcula species with the Caulobacterales and the placement of Stappia aggregata with the Rhizobiaceae group.

Conclusion

The α-proteobacteria-specific proteins and indels described here provide novel and powerful means for the taxonomic, biochemical and molecular biological studies on these bacteria. Their functional studies should prove helpful in identifying novel biochemical and physiological characteristics that are unique to these bacteria.

Chin KH, Ruan SK, Wang AHJ, Chou SH. XC5848, an ORFan protein from Xanthomonas campestris, adopts a novel variant of Sm-like motif. Proteins 2007;68:1006-10. [PMID: 17546661 DOI: 10.1002/prot.21375] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]

Jain M, Khurana P, Tyagi AK, Khurana JP. Genome-wide analysis of intronless genes in rice and Arabidopsis. Funct Integr Genomics 2007;8:69-78. [PMID: 17578610 DOI: 10.1007/s10142-007-0052-9] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Received: 02/03/2007] [Revised: 04/07/2007] [Accepted: 05/06/2007] [Indexed: 10/23/2022]

Gollery M, Harper J, Cushman J, Mittler T, Girke T, Zhu JK, Bailey-Serres J, Mittler R. What makes species unique? The contribution of proteins with obscure features. Genome Biol 2007;7:R57. [PMID: 16859532 PMCID: PMC1779552 DOI: 10.1186/gb-2006-7-7-r57] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Received: 03/06/2006] [Revised: 04/28/2006] [Accepted: 06/27/2006] [Indexed: 11/23/2022] Open

Luban S, Kihara D. Comparative genomics of small RNAs in bacterial genomes. OMICS 2007;11:58-73. [PMID: 17411396 DOI: 10.1089/omi.2006.0005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]

Marsden RL, Lewis TA, Orengo CA. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics 2007;8:86. [PMID: 17349043 PMCID: PMC1829165 DOI: 10.1186/1471-2105-8-86] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 10/31/2006] [Accepted: 03/09/2007] [Indexed: 11/25/2022] Open

Marri PR, Hao W, Golding GB. The role of laterally transferred genes in adaptive evolution. BMC Evol Biol 2007;7 Suppl 1:S8. [PMID: 17288581 PMCID: PMC1796617 DOI: 10.1186/1471-2148-7-s1-s8] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 11/23/2022] Open