1
Abstract
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the process of identifying clusters of orthologous genes from precomputed phylogenetic trees and classifying gene families. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can be used to inform comparative genomics and gene family evolution analyses.
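The species overlap rule mentioned in the abstract can be illustrated with a minimal sketch. This is not Possvm's own code: the nested-tuple tree, the `Species_gene` leaf naming scheme, and the function names are all assumptions made for illustration, and the tree is assumed to be rooted and strictly binary. A node whose two child clades share no species is treated as a speciation, and every gene pair across it is called orthologous; a node whose child clades share a species is treated as a duplication.

```python
from itertools import product

def species_of(leaf):
    # Hypothetical naming scheme: "Species_gene", e.g. "Hsap_A1".
    return leaf.split("_", 1)[0]

def leaves(node):
    # A leaf is a string; an internal node is a 2-tuple of child nodes.
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out

def ortholog_pairs(node, pairs=None):
    """Collect ortholog pairs below `node` using the species overlap rule."""
    if pairs is None:
        pairs = set()
    if isinstance(node, str):
        return pairs
    left, right = node  # assumes a strictly binary tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    left_species = {species_of(x) for x in left_leaves}
    right_species = {species_of(x) for x in right_leaves}
    if not (left_species & right_species):
        # No shared species: speciation node, so pairs across it are orthologs.
        for a, b in product(left_leaves, right_leaves):
            pairs.add(frozenset((a, b)))
    # Shared species would mean a duplication node: no pairs are added here.
    ortholog_pairs(left, pairs)
    ortholog_pairs(right, pairs)
    return pairs

# Toy gene tree: a duplication in the Hsap/Mmus ancestor, with Dmel as outgroup.
tree = ((("Hsap_A1", "Mmus_A1"), ("Hsap_A2", "Mmus_A2")), "Dmel_A")
pairs = ortholog_pairs(tree)
```

In this toy tree, `Hsap_A1` and `Mmus_A1` are paired, while `Hsap_A1` and `Mmus_A2` are not, because their last common ancestor is a duplication node. The abstract states that Possvm then clusters such relationships with MCL; that step is not sketched here.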
Affiliation(s)
- Xavier Grau-Bové
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, 08003, Spain
- Arnau Sebé-Pedrós
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, 08003, Spain
2
Lallemand T, Leduc M, Landès C, Rizzon C, Lerat E. An Overview of Duplicated Gene Detection Methods: Why the Duplication Mechanism Has to Be Accounted for in Their Choice. Genes (Basel) 2020; 11:E1046. [PMID: 32899740 PMCID: PMC7565063 DOI: 10.3390/genes11091046] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 07/30/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/11/2022]
Abstract
Gene duplication is an important evolutionary mechanism that provides new genetic material and thus opportunities for an organism to acquire new gene functions, with major implications such as speciation events. Various processes are known to allow a gene to be duplicated, and different models explain how duplicated genes can be maintained in genomes. Due to their particular importance, the identification of duplicated genes is essential when studying genome evolution, but it can still be a challenge due to the various fates duplicated genes can encounter. In this review, we first describe the evolutionary processes that give rise to duplicated genes and then describe the various bioinformatic approaches that can be used to identify them in genome sequences. Indeed, these bioinformatic approaches differ according to the underlying duplication mechanism. Hence, understanding the specificity of the duplicated genes of interest is a great asset for tool selection and should be taken into account when exploring a biological question.
Affiliation(s)
- Tanguy Lallemand
  - IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
- Martin Leduc
  - IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
- Claudine Landès
  - IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
- Carène Rizzon
  - Laboratoire de Mathématiques et Modélisation d’Evry (LaMME), Université d’Evry Val d’Essonne, Université Paris-Saclay, UMR CNRS 8071, ENSIIE, USC INRAE, 23 bvd de France, CEDEX, 91037 Evry Paris, France
- Emmanuelle Lerat
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France
3
Rangel LT, Marden J, Colston S, Setubal JC, Graf J, Gogarten JP. Identification and characterization of putative Aeromonas spp. T3SS effectors. PLoS One 2019; 14:e0214035. [PMID: 31163020 PMCID: PMC6548356 DOI: 10.1371/journal.pone.0214035] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/02/2019] [Accepted: 05/21/2019] [Indexed: 11/23/2022]
Abstract
The genetic determinants of bacterial pathogenicity are highly variable between species and strains. However, a factor that is commonly associated with virulent Gram-negative bacteria, including many Aeromonas spp., is the type 3 secretion system (T3SS), which is used to inject effector proteins into target eukaryotic cells. In this study, we developed a bioinformatics pipeline to identify T3SS effector proteins, applied this approach to the genomes of 105 Aeromonas strains isolated from environmental, mutualistic, or pathogenic contexts and evaluated the cytotoxicity of the identified effectors through their heterologous expression in yeast. The developed pipeline uses a two-step approach, where candidate Aeromonas gene families are initially selected using Hidden Markov Model (HMM) profile searches against the Virulence Factors DataBase (VFDB), followed by strict comparisons against positive and negative control datasets, greatly reducing the number of false positives. This approach identified 21 Aeromonas T3SS likely effector families, of which 8 represent known or characterized effectors, while the remaining 13 have not previously been described in Aeromonas. We experimentally validated our in silico findings by assessing the cytotoxicity of representative effectors in Saccharomyces cerevisiae BY4741, with 15 out of 21 assayed proteins eliciting a cytotoxic effect in yeast. The results of this study demonstrate the utility of our approach, combining a novel in silico search method with in vivo experimental validation, and will be useful in future research aimed at identifying and authenticating bacterial effector proteins from other genera.
Affiliation(s)
- Luiz Thiberio Rangel
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
  - Interunidades em Bioinformática, Universidade de São Paulo, São Paulo, Brasil
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
- Jeremiah Marden
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Sophie Colston
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- João Carlos Setubal
  - Interunidades em Bioinformática, Universidade de São Paulo, São Paulo, Brasil
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
- Joerg Graf
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
  - Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, United States of America
- Johann Peter Gogarten
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
  - Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, United States of America
4
Abstract
The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.
5
Trail F, Wang Z, Stefanko K, Cubba C, Townsend JP. The ancestral levels of transcription and the evolution of sexual phenotypes in filamentous fungi. PLoS Genet 2017; 13:e1006867. [PMID: 28704372 PMCID: PMC5509106 DOI: 10.1371/journal.pgen.1006867] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/05/2017] [Accepted: 06/13/2017] [Indexed: 12/29/2022]
Abstract
Changes in gene expression have been hypothesized to play an important role in the evolution of divergent morphologies. To test this hypothesis in a model system, we examined differences in fruiting body morphology of five filamentous fungi in the Sordariomycetes, culturing them in a common garden environment and profiling genome-wide gene expression at five developmental stages. We reconstructed ancestral gene expression phenotypes, identifying genes with the largest evolved increases in gene expression across development. Conducting knockouts and performing phenotypic analysis in two divergent species typically demonstrated altered fruiting body development in the species that had evolved increased expression. Our evolutionary approach to finding relevant genes proved far more efficient than other gene deletion studies targeting whole genomes or gene families. Combining gene expression measurements with knockout phenotypes facilitated the refinement of Bayesian networks of the genes underlying fruiting body development, regulation of which is one of the least understood processes of multicellular development.
Affiliation(s)
- Frances Trail
  - Department of Plant Biology, Michigan State University, East Lansing, MI, United States of America
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, United States of America
- Zheng Wang
  - Department of Biostatistics, Yale University, New Haven, CT, United States of America
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States of America
- Kayla Stefanko
  - Department of Plant Biology, Michigan State University, East Lansing, MI, United States of America
- Caitlyn Cubba
  - Department of Plant Biology, Michigan State University, East Lansing, MI, United States of America
- Jeffrey P. Townsend
  - Department of Biostatistics, Yale University, New Haven, CT, United States of America
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States of America
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
6
Marcelletti S, Scortichini M. Xylella fastidiosa CoDiRO strain associated with the olive quick decline syndrome in southern Italy belongs to a clonal complex of the subspecies pauca that evolved in Central America. Microbiology (Reading) 2016; 162:2087-2098. [DOI: 10.1099/mic.0.000388] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Affiliation(s)
- Simone Marcelletti
  - Council for Agricultural Research and Analysis of Agricultural Economics (CREA), Research Centre for Fruit Trees, Via di Fioranello 52, I-00134 Roma, Italy
- Marco Scortichini
  - Council for Agricultural Research and Analysis of Agricultural Economics (CREA), Research Centre for Fruit Trees, Via Torrino 3, I-81100 Caserta, Italy
  - Council for Agricultural Research and Analysis of Agricultural Economics (CREA), Research Centre for Fruit Trees, Via di Fioranello 52, I-00134 Roma, Italy
7
Marcelletti S, Scortichini M. Genome-wide comparison and taxonomic relatedness of multiple Xylella fastidiosa strains reveal the occurrence of three subspecies and a new Xylella species. Arch Microbiol 2016; 198:803-12. [PMID: 27209415 DOI: 10.1007/s00203-016-1245-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 03/04/2016] [Revised: 04/13/2016] [Accepted: 05/16/2016] [Indexed: 11/30/2022]
Abstract
A total of 21 Xylella fastidiosa strains were assessed by comparing their genomes to infer their taxonomic relationships. The whole-genome-based average nucleotide identity and tetranucleotide frequency correlation coefficient analyses were performed. In addition, a consensus tree based on comparisons of 956 core gene families, and a genome-wide phylogenetic tree and a Neighbor-net network were constructed with 820,088 nucleotides (i.e., approximately 30-33% of the entire X. fastidiosa genome). All approaches revealed the occurrence of three well-demarcated genetic clusters that represent X. fastidiosa subspecies fastidiosa, multiplex and pauca, with the latter appearing to diverge. We suggest that the proposed but never formally described subspecies 'sandyi' and 'morus' are instead members of the subspecies fastidiosa. These analyses support the view that the Xylella strain isolated from Pyrus pyrifolia in Taiwan is likely to be a new species. A widely used multilocus sequence typing analysis yielded conflicting results.
Affiliation(s)
- Simone Marcelletti
  - Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per le Colture Arboree, Via di Fioranello, 52, 00134, Rome, Italy
- Marco Scortichini
  - Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per le Colture Arboree, Via di Fioranello, 52, 00134, Rome, Italy
  - Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per le Colture Arboree, Via Torrino, 3, 81100, Caserta, Italy
8
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties to assess orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detecting speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
  - Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
9
Marcelletti S, Scortichini M. Comparative Genomic Analyses of Multiple Pseudomonas Strains Infecting Corylus avellana Trees Reveal the Occurrence of Two Genetic Clusters with Both Common and Distinctive Virulence and Fitness Traits. PLoS One 2015; 10:e0131112. [PMID: 26147218 PMCID: PMC4492584 DOI: 10.1371/journal.pone.0131112] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 02/26/2015] [Accepted: 05/28/2015] [Indexed: 01/26/2023]
Abstract
The European hazelnut (Corylus avellana) is threatened in Europe by several pseudomonads which cause symptoms ranging from twig dieback to tree death. A comparison of the draft genomes of nine Pseudomonas strains isolated from symptomatic C. avellana trees was performed to identify common and distinctive genomic traits. The thorough assessment of genetic relationships among the strains revealed two clearly distinct clusters: P. avellanae and P. syringae, the latter including the pathovars avellanae, coryli and syringae. Between these two clusters, no recombination event was found. A genomic island of approximately 20 kb, containing the hrp/hrc type III secretion system gene cluster, was found to be present without any genomic difference in all nine pseudomonads. The type III secretion system effector repertoires were remarkably different in the two groups, with P. avellanae showing a higher number of effectors. Homologous genes of the antimetabolite mangotoxin and ice nucleation activity clusters were found solely in all P. syringae pathovar strains, whereas the siderophore yersiniabactin was only present in P. avellanae. All nine strains have genes coding for pectic enzymes and sucrose metabolism. By contrast, they do not have genes coding for indoleacetic acid and anti-insect toxin. Collectively, this study reveals that genomically different Pseudomonas can converge on the same host plant by suppressing the host defence mechanisms with the use of different virulence weapons. The integration into their genomes of a horizontally acquired genomic island could play a fundamental role in their evolution, perhaps giving them the ability to exploit new ecological niches.
Affiliation(s)
- Simone Marcelletti
  - Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria (C.R.A.)-Centro di Ricerca per la Frutticoltura, Via di Fioranello 52, I-00134, Roma, Italy
- Marco Scortichini
  - Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria (C.R.A.)-Centro di Ricerca per la Frutticoltura, Via di Fioranello 52, I-00134, Roma, Italy
  - Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria (C.R.A.)-Unità di Ricerca per la Frutticoltura, Via Torrino 3, I-81100, Caserta, Italy
10
Lehr NA, Wang Z, Li N, Hewitt DA, López-Giráldez F, Trail F, Townsend JP. Gene expression differences among three Neurospora species reveal genes required for sexual reproduction in Neurospora crassa. PLoS One 2014; 9:e110398. [PMID: 25329823 PMCID: PMC4203796 DOI: 10.1371/journal.pone.0110398] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 06/19/2014] [Accepted: 09/16/2014] [Indexed: 12/23/2022]
Abstract
Many fungi form complex three-dimensional fruiting bodies, within which the meiotic machinery for sexual spore production has been considered to be largely conserved over evolutionary time. Indeed, much of what we know about meiosis in plant and animal taxa has been deeply informed by studies of meiosis in Saccharomyces and Neurospora. Nevertheless, the genetic basis of fruiting body development and its regulation in relation to meiosis in fungi is barely known, even within the best studied multicellular fungal model Neurospora crassa. We characterized morphological development and genome-wide transcriptomics in the closely related species Neurospora crassa, Neurospora tetrasperma, and Neurospora discreta, across eight stages of sexual development. Despite diverse life histories within the genus, all three species produce vase-shaped perithecia. Transcriptome sequencing provided gene expression levels of orthologous genes among all three species. Expression of key meiosis genes and sporulation genes corresponded to known phenotypic and developmental differences among these Neurospora species during sexual development. We assembled a list of genes putatively relevant to the recent evolution of fruiting body development by sorting genes whose relative expression across developmental stages increased more in N. crassa relative to the other species. Then, in N. crassa, we characterized the phenotypes of fruiting bodies arising from crosses of homozygous knockout strains of the top genes. Eight N. crassa genes were found to be critical for the successful formation of perithecia. The absence of these genes in these crosses resulted in either no perithecium formation or in arrested development at an early stage. 
Our results provide insight into the genetic basis of Neurospora sexual reproduction, which is also of great importance with regard to other multicellular ascomycetes, including perithecium-forming pathogens, such as Claviceps purpurea, Ophiostoma ulmi, and Glomerella graminicola.
Affiliation(s)
- Nina A. Lehr
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Zheng Wang
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
  - Department of Biostatistics, Yale University, New Haven, Connecticut, United States of America
- Ning Li
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- David A. Hewitt
  - Department of Botany, Academy of Natural Sciences, Philadelphia, Pennsylvania, United States of America
  - Wagner Free Institute of Science, Philadelphia, Pennsylvania, United States of America
- Francesc López-Giráldez
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Frances Trail
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
- Jeffrey P. Townsend
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
  - Department of Biostatistics, Yale University, New Haven, Connecticut, United States of America
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Program in Microbiology, Yale University, New Haven, Connecticut, United States of America
11
Rusin LY, Lyubetskaya EV, Gorbunov KY, Lyubetsky VA. Reconciliation of gene and species trees. BioMed Research International 2014; 2014:642089. [PMID: 24800245 PMCID: PMC3985182 DOI: 10.1155/2014/642089] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 08/11/2013] [Accepted: 11/27/2013] [Indexed: 11/18/2022]
Abstract
The first part of the paper briefly overviews the problem of gene and species trees reconciliation with the focus on defining and algorithmic construction of the evolutionary scenario. Basic ideas are discussed for the aspects of mapping definitions, costs of the mapping and evolutionary scenario, imposing time scales on a scenario, incorporating horizontal gene transfers, binarization and reconciliation of polytomous trees, and construction of species trees and scenarios. The review does not intend to cover the vast diversity of literature published on these subjects. Instead, the authors strived to overview the problem of the evolutionary scenario as a central concept in many areas of evolutionary research. The second part provides detailed mathematical proofs for the solutions of two problems: (i) inferring a gene evolution along a species tree accounting for various types of evolutionary events and (ii) trees reconciliation into a single species tree when only gene duplications and losses are allowed. All proposed algorithms have a cubic time complexity and are mathematically proved to find exact solutions. Solving algorithms for problem (ii) can be naturally extended to incorporate horizontal transfers, other evolutionary events, and time scales on the species tree.
Affiliation(s)
- L. Y. Rusin
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
  - Faculty of Biology, Moscow State University, Leninskie Gory 1-12, Moscow 119234, Russia
- E. V. Lyubetskaya
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
- K. Y. Gorbunov
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
- V. A. Lyubetsky
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
12
Reconstructed ancestral Myo-inositol-3-phosphate synthases indicate that ancestors of the Thermococcales and Thermotoga species were more thermophilic than their descendants. PLoS One 2013; 8:e84300. [PMID: 24391933 PMCID: PMC3877268 DOI: 10.1371/journal.pone.0084300] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 07/16/2013] [Accepted: 11/19/2013] [Indexed: 01/06/2023]
Abstract
The bacterial genomes of Thermotoga species show evidence of significant interdomain horizontal gene transfer from the Archaea. Members of this genus acquired many genes from the Thermococcales, which grow at higher temperatures than Thermotoga species. In order to study the functional history of an interdomain horizontally acquired gene we used ancestral sequence reconstruction to examine the thermal characteristics of reconstructed ancestral proteins of the Thermotoga lineage and its archaeal donors. Several ancestral sequence reconstruction methods were used to determine the possible sequences of the ancestral Thermotoga and Archaea myo-inositol-3-phosphate synthase (MIPS). These sequences were predicted to be more thermostable than the extant proteins using an established sequence composition method. We verified these computational predictions by measuring the activities and thermostabilities of purified proteins from the Thermotoga and the Thermococcales species, and eight ancestral reconstructed proteins. We found that the ancestral proteins from both the archaeal donor and the Thermotoga most recent common ancestor recipient were more thermostable than their descendants. We show that there is a correlation between the thermostability of MIPS protein and the optimal growth temperature (OGT) of its host, which suggests that the ancestors of these species of Archaea and the Thermotoga grew at higher OGTs than their descendants.
13
Metabolic analysis of Chlorobium chlorochromatii CaD3 reveals clues of the symbiosis in 'Chlorochromatium aggregatum'. ISME Journal 2013; 8:991-8. [PMID: 24285361 DOI: 10.1038/ismej.2013.207] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/24/2013] [Revised: 09/25/2013] [Accepted: 10/07/2013] [Indexed: 11/08/2022]
Abstract
A symbiotic association occurs in 'Chlorochromatium aggregatum', a phototrophic consortium consisting of two phylogenetically distant bacterial species, in which the green-sulfur Chlorobium chlorochromatii CaD3 epibiont surrounds a central β-proteobacterium. The non-motile chlorobia can perform nitrogen and carbon fixation, using sulfide as the electron donor for anoxygenic photosynthesis. The consortium can move due to the flagella present in the central β-proteobacterium. Although Chl. chlorochromatii CaD3 is never found as free-living bacteria in nature, previous transcriptomic and proteomic studies have revealed that there are differential transcription patterns between the symbiotic and free-living status of Chl. chlorochromatii CaD3 when grown in laboratory conditions. The differences occur mainly in genes encoding the enzymatic reactions involved in nitrogen and amino acid metabolism. We performed a metabolic reconstruction of Chl. chlorochromatii CaD3 and an in silico analysis of its amino acid metabolism using an elementary flux modes approach (EFM). Our study suggests that in symbiosis, Chl. chlorochromatii CaD3 is under limited nitrogen conditions where the GS/GOGAT (glutamine synthetase/glutamate synthetase) pathway is actively assimilating ammonia obtained via N2 fixation. In contrast, when free-living, Chl. chlorochromatii CaD3 is in a condition of nitrogen excess and ammonia is assimilated by the alanine dehydrogenase (AlaDH) pathway. We postulate that 'Chlorochromatium aggregatum' originated from a parasitic interaction where the N2 fixation capacity of the chlorobia would be enhanced by injection of 2-oxoglutarate from the β-proteobacterium via the periplasm. This consortium would have the advantage of motility, which is fundamental to a phototrophic bacterium, and the syntrophy of nitrogen and carbon sources.
14
Scortichini M, Marcelletti S, Ferrante P, Firrao G. A Genomic redefinition of Pseudomonas avellanae species. PLoS One 2013; 8:e75794. [PMID: 24086635 PMCID: PMC3783423 DOI: 10.1371/journal.pone.0075794] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 07/11/2013] [Accepted: 08/20/2013] [Indexed: 11/18/2022]
Abstract
The circumscription of bacterial species is a complex task. At present, DNA-DNA hybridization (DDH), 16S rRNA gene sequencing, and multilocus sequence typing analysis (MLSA) are the preferred techniques for their genetic determination. However, the average nucleotide identity (ANI) analysis of conserved and shared genes between two bacterial strains based on the pair-wise genome comparisons, with support of the tetranucleotide frequency correlation coefficients (TETRA) value, has recently been proposed as a reliable substitute for DDH. The species demarcation boundary has been set to a value of 95-96% of the ANI identity, with further confirmation through the assessment of the corresponding TETRA value. In this study, we performed a genome-wide MLSA of 14 phytopathogenic pseudomonad genomes, and assessed the ANI and TETRA values of 27 genomes, representing seven out of the nine genomospecies of Pseudomonas spp. sensu Gardan et alii, and their phylogenetic relationships using maximum likelihood and Bayesian approaches. The results demonstrate the existence of a well demarcated genomic cluster that includes strains classified as P. avellanae, P. syringae pv. theae, P. s. pv. actinidiae and one P. s. pv. morsprunorum strain all belonging to the single species P. avellanae. In addition, when compared with P. avellanae, five strains of P. s. pv. tomato, including the model strain DC3000, and one P. s. pv. lachrymans strain, appear as very closely related to P. avellanae, with ANI values of nearly 96% as confirmed by the TETRA analysis. Conversely, one representative strain, previously classified as P. avellanae and isolated in central Italy, is a genuine member of the P. syringae species complex and can be defined as P. s. pv. avellanae. Currently, the core and pan genomes of P. avellanae species consist of 3,995 and 5,410 putative protein-coding genes, respectively.
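The idea behind a tetranucleotide frequency correlation can be illustrated with a simplified sketch. This is an assumption-laden toy, not the TETRA method used in the study: published TETRA implementations correlate z-scores of observed versus expected tetranucleotide usage and also count the reverse strand, whereas this sketch correlates raw single-strand frequencies. The function names are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_freqs(seq):
    """Relative frequency of each tetranucleotide on the given strand."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def tetra_correlation(seq_a, seq_b):
    # Simplification: raw frequencies instead of z-scores (see lead-in).
    return pearson(tetra_freqs(seq_a), tetra_freqs(seq_b))

# Toy sequences, far shorter than real genomes.
seq_a = "ATGCGTACGTTAGC" * 20
seq_b = "TTGACCGGTAACGT" * 20
r_self = tetra_correlation(seq_a, seq_a)
r_cross = tetra_correlation(seq_a, seq_b)
```

A sequence compared with itself gives a correlation of 1 up to floating-point error. In the study described above, the TETRA value is used only to support ANI values near the 95-96% species boundary, not as a standalone criterion.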
Affiliation(s)
- Marco Scortichini
- Consiglio per la Ricerca e la Sperimentazione in Agricultura - Centro di Ricerca per la Frutticoltura, Roma, Italy
- Consiglio per la Ricerca e la Sperimentazione in Agricultura – Unità di Ricerca per la Frutticoltura, Caserta, Italy
- Simone Marcelletti
- Consiglio per la Ricerca e la Sperimentazione in Agricultura - Centro di Ricerca per la Frutticoltura, Roma, Italy
- Patrizia Ferrante
- Consiglio per la Ricerca e la Sperimentazione in Agricultura - Centro di Ricerca per la Frutticoltura, Roma, Italy
- Giuseppe Firrao
- Dipartimento di Scienze Agrarie ed Ambientali, Università di Udine, Udine, Italy
|
15
|
Firrao G, Martini M, Ermacora P, Loi N, Torelli E, Foissac X, Carle P, Kirkpatrick BC, Liefting L, Schneider B, Marzachì C, Palmano S. Genome wide sequence analysis grants unbiased definition of species boundaries in "Candidatus Phytoplasma". Syst Appl Microbiol 2013; 36:539-48. [PMID: 24034865 DOI: 10.1016/j.syapm.2013.07.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/02/2013] [Revised: 07/08/2013] [Accepted: 07/18/2013] [Indexed: 10/26/2022]
Abstract
The phytoplasmas are currently named using the Candidatus category, as the inability to grow them in vitro prevented (i) the performance of tests, such as DNA-DNA hybridization, that are regarded as necessary to establish species boundaries, and (ii) the deposition of type strains in culture collections. The recent accession to complete or nearly complete genome sequence information disclosed the opportunity to apply to the uncultivable phytoplasmas the same taxonomic approaches used for other bacteria. In this work, the genomes of 14 strains, belonging to the 16SrI, 16SrIII, 16SrV and 16SrX groups, including the species "Ca. P. asteris", "Ca. P. mali", "Ca. P. pyri", "Ca. P. pruni", and "Ca. P. australiense", were analyzed along with Acholeplasma laidlawii, to determine their taxonomic relatedness. Average nucleotide index (ANIm), tetranucleotide signature frequency correlation index (Tetra), and multilocus sequence analysis of 107 shared genes, using both phylogenetic inference of concatenated (DNA and amino acid) sequences and consensus networks, were carried out. The results were in large agreement with the previously established 16S rDNA based classification schemes. Moreover, the taxonomic relationships within the 16SrI, 16SrIII and 16SrX groups, which represent clusters of strains whose relatedness could not be determined by 16S rDNA analysis, could be comparatively evaluated with non-subjective criteria. "Ca. P. mali" and "Ca. P. pyri" were found to meet the genome characteristics for the retention into two different, yet strictly related species; representatives of subgroups 16SrI-A and 16SrI-B were also found to meet the standards used in other bacteria to distinguish separate species; the genomes of the strains belonging to 16SrIII were found more closely related, suggesting that their subdivision into Candidatus species should be approached with caution.
Affiliation(s)
- Giuseppe Firrao
- Dipartimento di Scienze Agrarie ed Ambientali, Università di Udine, Udine, Italy; Istituto Nazionale di Biostrutture e Biosistemi, Interuniversity Consortium, Italy.
|
16
|
Williams D, Gogarten JP, Papke RT. Quantifying homologous replacement of loci between haloarchaeal species. Genome Biol Evol 2013; 4:1223-44. [PMID: 23160063 PMCID: PMC3542582 DOI: 10.1093/gbe/evs098] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 01/14/2023]
Abstract
In vitro studies of the haloarchaeal genus Haloferax have demonstrated their ability to frequently exchange DNA between species, whereas rates of homologous recombination estimated from natural populations in the genus Halorubrum are high enough to maintain random association of alleles between five loci. To quantify the effects of gene transfer and recombination of commonly held (relaxed core) genes during the evolution of the class Halobacteria (haloarchaea), we reconstructed the history of 21 genomes representing all major groups. Using a novel algorithm and a concatenated ribosomal protein phylogeny as a reference, we created a directed horizontal genetic transfer (HGT) network of contemporary and ancestral genomes. Gene order analysis revealed that 90% of testable HGTs were by direct homologous replacement, rather than nonhomologous integration followed by a loss. Network analysis revealed an inverse log-linear relationship between HGT frequency and ribosomal protein evolutionary distance that is maintained across the deepest divergences in Halobacteria. We use this mathematical relationship to estimate the total transfers and amino acid substitutions delivered by HGTs in each genome, providing a measure of chimerism. For the relaxed core genes of each genome, we conservatively estimate that 11–20% of their evolution occurred in other haloarchaea. Our findings are unexpected, because the transfer and homologous recombination of relaxed core genes between members of the class Halobacteria disrupts the coevolution of genes; however, the generation of new combinations of divergent but functionally related genes may lead to adaptive phenotypes not available through cumulative mutations and recombination within a single population.
Affiliation(s)
- David Williams
- Department of Molecular and Cell Biology, University of Connecticut, CT, USA
|
17
|
Ding Y, Cai Y, Han Y, Zhao B, Zhu L. Application of principal component analysis to determine the key structural features contributing to iron superoxide dismutase thermostability. Biopolymers 2012; 97:864-72. [PMID: 22899361 DOI: 10.1002/bip.22093] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
Abstract
Iron superoxide dismutase (Fe-SOD) is predominantly found in bacteria and mitochondria. The thermal stability of Fe-SOD from different sources can vary dramatically. We have studied the influence of structural parameters on Fe-SOD thermostability by principal component analysis (PCA). The results show that an increased α-helical and turn content, an increased α-helix and loop length, an increase in the number of main chain-main chain and charged-uncharged hydrogen bonds, a decrease in the 3(10)-helix content, and a decreased β-strand and loop length are all important factors for Fe-SOD thermostability. Interestingly, thermophilic Fe-SOD shows a tendency to use charged residues to form salt bridges. Positively charged Arg and negatively charged Glu are efficiently used to form salt bridges. The cooperative action of the exposed area, the hydrogen bonds, and the secondary structure plays a crucial role in resisting high temperatures, which demonstrates that the increased stability of thermophilic Fe-SOD is provided by several structural factors acting together.
Affiliation(s)
- Yanrui Ding
- Jiangnan University, Wuxi, People's Republic of China.
|
18
|
Saccardo F, Martini M, Palmano S, Ermacora P, Scortichini M, Loi N, Firrao G. Genome drafts of four phytoplasma strains of the ribosomal group 16SrIII. MICROBIOLOGY-SGM 2012; 158:2805-2814. [PMID: 22936033 DOI: 10.1099/mic.0.061432-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Indexed: 01/20/2023]
Abstract
By applying a coverage-based read selection and filtration through a healthy plant dataset, and a post-assembly contig selection based on homology and linkage, genome sequence drafts were obtained for four phytoplasma strains belonging to the 16SrIII group (X disease clade), namely Vaccinium Witches' Broom phytoplasma (647 754 nt in 272 contigs), Italian Clover Phyllody phytoplasma strain MA (597 245 nt in 197 contigs), Poinsettia branch-inducing phytoplasma strain JR1 (631 440 nt in 185 contigs) and Milkweed Yellows phytoplasma (583 806 nt in 158 contigs). Despite assignment to different 16SrIII subgroups, the genomes of the four strains were similar, comprising a highly conserved core (92-98 % similar in their nucleotide sequence among each other over alignments about 500 kb in length) and a minor strain-specific component. As far as their protein complement was concerned, they did not differ significantly in their basic metabolism potential from the genomes of other wide-host-range phytoplasmas sequenced previously, but were distinct from strains of other species, as well as among each other, in genes encoding functions conceivably related to interactions with the host, such as membrane trafficking components, proteases, DNA methylases, effectors and several hypothetical proteins of unknown function, some of which are likely secreted through the Sec-dependent secretion system. The four genomes displayed a group of genes encoding hypothetical proteins with high similarity to a central domain of IcmE/DotG, a core component of the type IVB secretion system of Gram-negative Legionella spp. Conversely, genes encoding functional GroES/GroEL chaperones were not detected in any of the four drafts. The results also indicated the significant role of horizontal gene transfer among different 'Candidatus Phytoplasma' species in shaping phytoplasma genomes and promoting their diversity.
Affiliation(s)
- Federica Saccardo
- Dipartimento di Scienze Agrarie ed Ambientali, Università di Udine, via Scienze 208, Udine, Italy
- Marta Martini
- Dipartimento di Scienze Agrarie ed Ambientali, Università di Udine, via Scienze 208, Udine, Italy
- Sabrina Palmano
- Istituto di Virologia Vegetale, CNR, Strada delle Cacce 73, 10135 Torino, Italy
- Paolo Ermacora
- Dipartimento di Scienze Agrarie ed Ambientali, Università di Udine, via Scienze 208, Udine, Italy
- Marco Scortichini
- Centro di Ricerca per la Frutticoltura, CRA, via di Fioranello 54, Roma, Italy
- Nazia Loi
- Dipartimento di Scienze Agrarie ed Ambientali, Università di Udine, via Scienze 208, Udine, Italy
- Giuseppe Firrao
- Istituto Nazionale di Biostrutture e Biosistemi, Interuniversity Consortium, Italy; Dipartimento di Scienze Agrarie ed Ambientali, Università di Udine, via Scienze 208, Udine, Italy
|
19
|
Abstract
Methods for identifying alien genes in genomes fall into two general classes. Phylogenetic methods examine the distribution of a gene's homologues among genomes to find those with relationships not consistent with vertical inheritance. These approaches include identifying orphan genes which lack homologues in closely related genomes and genes with unduly high levels of similarity to genes in otherwise unrelated genomes. Rigorous statistical tests are available to place confidence intervals for predicted alien genes. Parametric methods examine the compositional properties of genes within a genome to find those with atypical properties, likely indicating the directional mutational pressures of a donor genome. These methods may compare the properties of genes to genomic averages, properties of genes to each other, or properties of large, multigene regions of the chromosome. Here, we discuss the strengths and weaknesses of each approach.
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
|
20
|
Abstract
The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases, and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.
|
21
|
Tekaia F, Yeramian E. SuperPartitions: detection and classification of orthologs. Gene 2011; 492:199-211. [PMID: 22056699 DOI: 10.1016/j.gene.2011.10.027] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2011] [Revised: 10/08/2011] [Accepted: 10/11/2011] [Indexed: 10/16/2022]
Abstract
The proper detection of orthologs is crucial for evolutionary studies of genes and species. Despite large efforts to solve this problem the methodological situation appears unsettled to a large extent and the "quest for orthologs" is still an ongoing task in large-scale genome comparisons. Here, we introduce a simple operational framework for the detection of orthologs and their classification. The operational framework relies on well-established principles, optimizing their implementation for the considered purposes, and chaining components in coherent procedures: 1) We take advantage of the efficiency and simplicity of the Reciprocal Best Hit (RBH) detections, remedying (by design) the drawback concerning the limitations in terms of 1:1 detections. The procedure is based on the partitioning of Reciprocal Best Hits, with the further merging of partitions including members of the same paralogous classes ("SuperPartition of Orthologs" (SPOs)). 2) We then resort to the conservation profiles of the obtained clusters, allowing simple detection of SPOs containing duplicated members. Based on accepted evolutionary principles, such members can be further tagged as in-paralogs (co-orthologs) or out-paralogs. The method is illustrated and validated by extensive genomic analyses. The performances of the overall approach are characterized in global terms for three sets of species (Chlamydiae, Mycobacteria, Aspergilli), showing that at least 75% of the sets of orthologs contain at most one protein from a given species. The sets including more than one protein from a given species are shown to contain in-paralogs in proportions varying from 28% to 58%. The characterizations also show that the large majority of SPOs are associated with ancestral motifs, and accordingly not prone to chaining effects that might be triggered by multi-domain proteins. Further the SPO formulation is compared to other similarity based ortholog detection methods. 
Beyond core common results, significant differences are observed between various methods, which can be accounted for to a large extent on conceptual grounds, relative to the different merging schemes involved. Such comparisons highlight a major advantage of the SPO approach concerning the proper clustering of associated paralogs, which appear to be often dispatched spuriously into distinct orthologous classes. Finally the perspectives for future applications and elaborations of SPO-based compositional analyses are discussed.
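The Reciprocal Best Hit (RBH) step that this abstract builds on can be sketched as below. This is only a minimal illustration of plain RBH detection between two genomes; it does not implement the partition merging that produces SuperPartitions of Orthologs. The input layout (nested dictionaries of similarity scores), the gene identifiers, and the function names are assumptions made for the example.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: {query: {subject: similarity_score}}
    """
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: a1 and a2 are paralogs in genome A that both hit b1 best.
a_vs_b = {"a1": {"b1": 200, "b2": 150}, "a2": {"b1": 180, "b2": 90}}
b_vs_a = {"b1": {"a1": 200, "a2": 180}, "b2": {"a1": 150, "a2": 90}}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
# pairs == [("a1", "b1")]; a2 and b2 are left unpaired
```

The unpaired `a2` and `b2` show the 1:1 limitation of plain RBH that the abstract says the SPO procedure is designed to remedy.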
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unité de Génétique Moléculaire des Levures (URA 2171 CNRS and UFR927 Univ. P.M. Curie), 25, Rue du Dr Roux, 75724 Paris Cedex 15, France.
|
22
|
Abstract
BACKGROUND Genome sequencing has revolutionized our view of the relationships among genomes, particularly in revealing the confounding effects of lateral genetic transfer (LGT). Phylogenomic techniques have been used to construct purported trees of microbial life. Although such trees are easily interpreted and allow the use of a subset of genomes as "proxies" for the full set, LGT and other phenomena impact the positioning of different groups in genome trees, confounding and potentially invalidating attempts to construct a phylogeny-based taxonomy of microorganisms. Network and graph approaches can reveal complex sets of relationships, but applying these techniques to large data sets is a significant challenge. Notwithstanding the question of what exactly it might represent, generating and interpreting a Tree or Network of All Genomes will only be feasible if current algorithms can be improved upon. RESULTS Complex relationships among even the most-similar genomes demonstrate that proxy-based approaches to simplifying large sets of genomes are not alone sufficient to solve the analysis problem. A phylogenomic analysis of 1173 sequenced bacterial and archaeal genomes generated phylogenetic trees for 159,905 distinct homologous gene sets. The relationships inferred from this set can be heavily dependent on the inclusion of other taxa: for example, phyla such as Spirochaetes, Proteobacteria and Firmicutes are recovered as cohesive groups or split depending on the presence of other specific lineages. Furthermore, named groups such as Acidithiobacillus, Coprothermobacter and Brachyspira show a multitude of affiliations that are more consistent with their ecology than with small subunit ribosomal DNA-based taxonomy. Network and graph representations can illustrate the multitude of conflicting affinities, but all methods impose constraints on the input data and create challenges of construction and interpretation. 
CONCLUSIONS These complex relationships highlight the need for an inclusive approach to genomic data, and current methods with minor alterations will likely scale to allow the analysis of data sets with 10,000 or more genomes. The main challenges lie in the visualization and interpretation of genomic relationships, and the redefinition of microbial taxonomy when subsets of genomic data are so evidently in conflict with one another, and with the "canonical" molecular taxonomy.
Affiliation(s)
- Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 1W5 Canada.
|
23
|
Plett D, Toubia J, Garnett T, Tester M, Kaiser BN, Baumann U. Dichotomy in the NRT gene families of dicots and grass species. PLoS One 2010; 5:e15289. [PMID: 21151904 PMCID: PMC2997785 DOI: 10.1371/journal.pone.0015289] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Received: 09/07/2010] [Accepted: 11/04/2010] [Indexed: 11/19/2022]
Abstract
A large proportion of the nitrate (NO₃⁻) acquired by plants from soil is actively transported via members of the NRT families of NO₃⁻ transporters. In Arabidopsis, the NRT1 family has eight functionally characterised members and predominantly comprises low-affinity transporters; the NRT2 family contains seven members which appear to be high-affinity transporters; and there are two NRT3 (NAR2) family members which are known to participate in high-affinity transport. A modified reciprocal best hit (RBH) approach was used to identify putative orthologues of the Arabidopsis NRT genes in the four fully sequenced grass genomes (maize, rice, sorghum, Brachypodium). We also included the poplar genome in our analysis to establish whether differences between Arabidopsis and the grasses may be generally applicable to monocots and dicots. Our analysis reveals fundamental differences between Arabidopsis and the grass species in the gene number and family structure of all three families of NRT transporters. All grass species possessed additional NRT1.1 orthologues and appear to lack NRT1.6/NRT1.7 orthologues. There is significant separation in the NRT2 phylogenetic tree between NRT2 genes from dicots and grass species. This indicates that determination of function of NRT2 genes in grass species will not be possible in cereals based simply on sequence homology to functionally characterised Arabidopsis NRT2 genes and that proper functional analysis will be required. Arabidopsis has a unique NRT3.2 gene which may be a fusion of the NRT3.1 and NRT3.2 genes present in all other species examined here. This work provides a framework for future analysis of NO₃⁻ transporters and NO₃⁻ transport in grass crop species.
Affiliation(s)
- Darren Plett
- Australian Centre for Plant Functional Genomics, Waite Research Institute, University of Adelaide, Adelaide, South Australia, Australia
- John Toubia
- Australian Centre for Plant Functional Genomics, Waite Research Institute, University of Adelaide, Adelaide, South Australia, Australia
- Trevor Garnett
- Australian Centre for Plant Functional Genomics, Waite Research Institute, University of Adelaide, Adelaide, South Australia, Australia
- Mark Tester
- Australian Centre for Plant Functional Genomics, Waite Research Institute, University of Adelaide, Adelaide, South Australia, Australia
- Brent N. Kaiser
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, South Australia, Australia
- Ute Baumann
- Australian Centre for Plant Functional Genomics, Waite Research Institute, University of Adelaide, Adelaide, South Australia, Australia
|
24
|
Poptsova MS, Gogarten JP. Using comparative genome analysis to identify problems in annotated microbial genomes. Microbiology (Reading) 2010; 156:1909-1917. [DOI: 10.1099/mic.0.033811-0] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022]
Abstract
Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.
Affiliation(s)
- Maria S. Poptsova
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
- J. Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
|
25
|
Jun J, Ryvkin P, Hemphill E, Nelson C. Duplication mechanism and disruptions in flanking regions determine the fate of Mammalian gene duplicates. J Comput Biol 2010; 16:1253-66. [PMID: 19772436 DOI: 10.1089/cmb.2009.0074] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
Here we identify duplicated genes in five mammalian genomes and classify these duplicates based on the mechanisms by which they were generated. Retrotransposition accounts for at least half of all predicted duplicate genes in these genomes, with tandem and interspersed DNA-mediated duplicates comprising the other half. Estimation of the evolutionary rates in each class revealed greater rate asymmetry between retrotransposed and interspersed DNA duplicate pairs than between tandem duplicates, suggesting that retrotransposed and interspersed DNA duplicates are diverging more quickly. In an attempt to understand the basis of this asymmetry, we identified disruption of flanking DNA as an indicator of new duplicate fate: loss of local synteny accelerates the asymmetry of divergence of interspersed DNA duplicates. We also show that intact retrogenes are enriched in intergenic regions and indel-purified regions of the human genome. Moreover, intact retrogenes closest to annotated genes show the greatest levels of purifying selective pressure. Together, these findings suggest that the differential evolution of duplicate genes may be significantly influenced by changes in local genome architecture.
Affiliation(s)
- Jin Jun
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
|
26
|
Jun J, Mandoiu II, Nelson CE. Identification of mammalian orthologs using local synteny. BMC Genomics 2009; 10:630. [PMID: 20030836 PMCID: PMC2807883 DOI: 10.1186/1471-2164-10-630] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 06/23/2009] [Accepted: 12/23/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based upon analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny and sequence based definitions of orthology has been reported in vertebrates, or, more specifically, in mammals. RESULTS We test a simple method for the measurement and utilization of gene order (local synteny) in the identification of mammalian orthologs by investigating the agreement between coding sequence based orthology (Inparanoid) and local synteny based orthology. In the 5 mammalian genomes studied, 93% of the sampled inter-species pairs were found to be concordant between the two orthology methods, illustrating that local synteny is a robust substitute to coding sequence for identifying orthologs. However, 7% of pairs were found to be discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements. 
CONCLUSIONS By analyzing cases of discordance between local synteny and Inparanoid we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.
Affiliation(s)
- Jin Jun
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
|
27
|
Zhaxybayeva O, Doolittle WF, Papke RT, Gogarten JP. Intertwined evolutionary histories of marine Synechococcus and Prochlorococcus marinus. Genome Biol Evol 2009; 1:325-39. [PMID: 20333202 PMCID: PMC2817427 DOI: 10.1093/gbe/evp032] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Accepted: 08/28/2009] [Indexed: 02/04/2023]
Abstract
Prochlorococcus is a genus of marine cyanobacteria characterized by small cell and genome size, an evolutionary trend toward low GC content, the possession of chlorophyll b, and the absence of phycobilisomes. Whereas many shared derived characters define Prochlorococcus as a clade, many genome-based analyses recover them as paraphyletic, with some low-light adapted Prochlorococcus spp. grouping with marine Synechococcus. Here, we use 18 Prochlorococcus and marine Synechococcus genomes to analyze gene flow within and between these taxa. We introduce embedded quartet scatter plots as a tool to screen for genes whose phylogeny agrees or conflicts with the plurality phylogenetic signal, with accepted taxonomy and naming, with GC content, and with the ecological adaptation to high and low light intensities. We find that most gene families support high-light adapted Prochlorococcus spp. as a monophyletic clade and low-light adapted Prochlorococcus sp. as a paraphyletic group. But we also detect 16 gene families that were transferred between high-light adapted and low-light adapted Prochlorococcus sp. and 495 gene families, including 19 ribosomal proteins, that do not cluster designated Prochlorococcus and Synechococcus strains in the expected manner. To explain the observed data, we propose that frequent gene transfer between marine Synechococcus spp. and low-light adapted Prochlorococcus spp. has created a “highway of gene sharing” (Beiko RG, Harlow TJ, Ragan MA. 2005. Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA. 102:14332–14337) that tends to erode genus boundaries without erasing the Prochlorococcus-specific ecological adaptations.
Affiliation(s)
- Olga Zhaxybayeva
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
|
28
|
Similarity clustering of proteins using substantive knowledge and reconstruction of evolutionary gene histories in herpesvirus. Theor Chem Acc 2009. [DOI: 10.1007/s00214-009-0614-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
|
29
|
Poptsova MS, Larionov SA, Ryadchenko EV, Rybalko SD, Zakharov IA, Loskutov A. Hidden chromosome symmetry: in silico transformation reveals symmetry in 2D DNA walk trajectories of 671 chromosomes. PLoS One 2009; 4:e6396. [PMID: 19636424 PMCID: PMC2712679 DOI: 10.1371/journal.pone.0006396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/07/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022]
Abstract
Maps of 2D DNA walk of 671 examined chromosomes show composition complexity change from symmetrical half-turn in bacteria to pseudo-random trajectories in archaea, fungi and humans. In silico transformation of gene order and strand position returns most of the analyzed chromosomes to a symmetrical bacterial-like state with one transition point. The transformed chromosomal sequences also reveal remarkable segmental compositional symmetry between regions from different strands located equidistantly from the transition point. Despite extensive chromosome rearrangement the relation of gene numbers on opposite strands for chromosomes of different taxa varies in narrow limits around unity with Pearson coefficient r = 0.98. Similar relation is observed for total genes' length (r = 0.86) and cumulative GC (r = 0.95) and AT (r = 0.97) skews. This is also true for human coding sequences (CDS), which comprise only several percent of the entire chromosome length. We found that frequency distributions of the length of gene clusters, continuously located on the same strand, have close values for both strands. Eukaryotic gene distribution is believed to be non-random. Contribution of different subsystems to the noted symmetries and distributions, and evolutionary aspects of symmetry are discussed.
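To make the 2D DNA walk and skew quantities in this abstract concrete, here is a small sketch. The base-to-direction mapping is an assumption chosen for illustration (the abstract does not state which convention the authors used), and the function names are hypothetical. GC skew is computed with the common definition (G - C) / (G + C).

```python
# Assumed step convention: A = left, T = right, G = up, C = down.
STEPS = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(seq):
    """Return the list of (x, y) points visited by a 2D walk along seq."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases do not move the walker
        x += dx
        y += dy
        path.append((x, y))
    return path


def gc_skew(seq):
    """Overall GC skew (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def cumulative_gc_skew(seq):
    """Running count of G minus C along the sequence."""
    total = 0
    out = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        out.append(total)
    return out
```

Under this convention the vertical coordinate of the walk tracks the cumulative G minus C count, so a change in strand composition appears as a turn in the trajectory.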
Affiliation(s)
- Maria S Poptsova
- University of Connecticut, Storrs, Connecticut, United States of America.
|
30
|
Abstract
This chapter discusses the pros and cons of the existing computational methods for the detection of horizontal (or lateral) gene transfer and highlights the genome-wide studies utilizing these methods. The impact of horizontal gene transfer (HGT) on prokaryote genome evolution is discussed.
|
31
|
Commins J, Toft C, Fares MA. Computational biology methods and their application to the comparative genomics of endocellular symbiotic bacteria of insects. Biol Proced Online 2009; 11:52-78. [PMID: 19495914 PMCID: PMC3055744 DOI: 10.1007/s12575-009-9004-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/23/2009] [Accepted: 02/17/2009] [Indexed: 12/02/2022]
Abstract
Comparative genomics has become a truly tantalizing challenge in the postgenomic era. This has been magnified by the plethora of new genomes becoming available on a daily basis. The overwhelming list of new genomes to compare has pushed the field of bioinformatics and computational biology toward the design and development of methods capable of identifying patterns in a sea of data noise. Despite many advances in this endeavor, persistent exceptions to the general patterns continue to make it difficult to generalize methods for comparative genomics. In this review, we discuss the different tools devised to undertake the challenge of comparative genomics and some of the exceptions that compromise the generality of such methods. We focus on endosymbiotic bacteria of insects because of the peculiarities of their genome dynamics compared to free-living organisms.
Affiliation(s)
- Jennifer Commins
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, Trinity College, University of Dublin, Dublin, Ireland
- Christina Toft
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, Trinity College, University of Dublin, Dublin, Ireland
- Mario A Fares
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, Trinity College, University of Dublin, Dublin, Ireland
|
32
|
Ramsay H, Rieseberg LH, Ritland K. The correlation of evolutionary rate with pathway position in plant terpenoid biosynthesis. Mol Biol Evol 2009; 26:1045-53. [PMID: 19188263 DOI: 10.1093/molbev/msp021] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.8] [Indexed: 01/23/2023]
Abstract
Genes are expected to face stronger selective constraint and to evolve more slowly if they encode enzymes upstream as opposed to downstream in metabolic pathways, because upstream genes are more pleiotropic, being required for a wider range of end products. However, few clear examples of this trend in evolutionary rate variation exist. We examined whether genes involved in plant terpenoid biosynthesis exhibit such a pattern, using data for 40 genes from five fully sequenced angiosperms: Oryza, Vitis, Arabidopsis, Populus, and Ricinus. Our results show that d(N)/d(S) does in fact correlate with pathway position along pathways converting glucose to the terpenoid phytohormones abscisic acid, gibberellic acid (GA), and brassinosteroids. Upstream versus downstream rate variation is particularly strong in the GA pathway. In contrast, we found little or no apparent variation in d(N)/d(S) with gene copy number. We also introduce a new measure of pathway position, the Pathway Pleiotropy Index (PPI), which counts groups of enzymes between pathway branch points. We found that this measure is superior to pathway position in explaining variation in d(N)/d(S) along each pathway. Although at least 8 of the 40 genes showed evidence of positive selection, correlations of d(N)/d(S) with PPI remain significant when these genes are removed. Therefore, our results are consistent with the prediction that selective constraint is progressively relaxed along metabolic pathways.
Affiliation(s)
- Heather Ramsay
- Faculty of Forestry, Department of Forest Sciences, University of British Columbia, Vancouver, British Columbia, Canada
|
33
|
Sato N. Gclust: trans-kingdom classification of proteins using automatic individual threshold setting. Bioinformatics 2009; 25:599-605. [DOI: 10.1093/bioinformatics/btp047] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 11/15/2022]
|
34
|
Beiko RG, Ragan MA. Untangling hybrid phylogenetic signals: horizontal gene transfer and artifacts of phylogenetic reconstruction. Methods Mol Biol 2009; 532:241-256. [PMID: 19271189 DOI: 10.1007/978-1-60327-853-9_14] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/27/2023]
Abstract
Phylogenomic methods can be used to investigate the tangled evolutionary relationships among genomes. Building 'all the trees of all the genes' can potentially identify common pathways of horizontal gene transfer (HGT) among taxa at varying levels of phylogenetic depth. Phylogenetic affinities can be aggregated and merged with the information about genetic linkage and biochemical function to examine hypotheses of adaptive evolution via HGT. Additionally, the use of many genetic data sets increases the power of statistical tests for phylogenetic artifacts. However, large-scale phylogenetic analyses pose several challenges, including the necessary abandonment of manual validation techniques, the need to translate inferred phylogenetic discordance into inferred HGT events, and the challenges involved in aggregating results from search-based inference methods. In this chapter we describe a tree search procedure to recover the most parsimonious pathways of HGT, and examine some of the assumptions that are made by this method.
Affiliation(s)
- Robert G Beiko
- Department of Computer Science, Dalhousie University, Halifax, NS, Canada
|
35
|
Abstract
This chapter describes the methodology for assessing the power of phylogenetic HGT detection methods. Detection power is defined in the framework of hypothesis testing. Rates of false positives and false negatives can be estimated by testing HGT detection methods on HGT-free orthologous sets, and on the same sets with in silico simulated HGT events. The whole process can be divided into three steps: obtaining HGT-free orthologous sets, in silico simulation of HGT events in the same set, and submitting both sets for evaluation by any of the tested methods. Phylogenetic methods of HGT detection can be roughly divided into three types: likelihood-based tests of topologies (Kishino-Hasegawa (KH), Shimodaira-Hasegawa (SH), and Approximately Unbiased (AU) tests), tree distance methods (symmetrical difference of Robinson and Foulds (RF), and Subtree Pruning and Regrafting (SPR) distances), and genome spectral approaches (bipartition and quartet decomposition analysis). Restrictions that are inherent to phylogenetic methods of HGT detection in general, and the power and precision of each method, are discussed, and comparative analyses of different approaches are provided, along with examples of assessing the power of phylogenetic HGT detection methods from a case study of orthologous sets from gamma-proteobacteria (Poptsova and Gogarten, BMC Evol Biol 7, 45, 2007) and cyanobacteria (Zhaxybayeva et al., Genome Res 16, 1099-108, 2006).
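Of the tree distance methods named in this abstract, the Robinson-Foulds (RF) symmetric difference is the simplest to illustrate: it counts the non-trivial bipartitions (splits) of the leaf set that appear in one tree but not the other. The following is a minimal sketch, assuming trees are given as nested Python tuples of leaf names over the same taxa. The representation and the function names `leaf_set`, `splits`, and `rf_distance` are hypothetical choices made here and are not taken from the chapter.

```python
def leaf_set(tree):
    """Collect all leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions induced by internal branches."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        # A split is non-trivial when both sides hold at least two leaves.
        if len(clade) > 1 and len(other) > 1:
            # Store one canonical side so the same split always compares equal,
            # whichever side of the branch it was discovered from.
            found.add(min(clade, other, key=sorted))
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(species_tree, gene_tree))
```

A distance of zero means the two topologies share every split. A gene tree that differs from the reference topology is only a candidate for HGT, since reconstruction artifacts can produce the same signal.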
Affiliation(s)
- Maria Poptsova
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
|
36
|
Wu H, Mao F, Olman V, Xu Y. On application of directons to functional classification of genes in prokaryotes. Comput Biol Chem 2008; 32:176-84. [PMID: 18440870 DOI: 10.1016/j.compbiolchem.2008.02.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/24/2007] [Accepted: 02/15/2008] [Indexed: 11/30/2022]
Abstract
Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some of the popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution gene functional annotations. We have developed a method to integrate genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. The application of our method to 93 proteobacterial genomes has shown that (i) the genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and such conservation can be utilized to improve functional classification of genes; (ii) while our method is consistent with the existing popular schemes as much as they are among themselves, it does provide functional classification at higher resolution and hence allows functional assignments of (new) genes at a more specific level; and (iii) our method is fairly stable when being applied to different genomes.
Affiliation(s)
- Hongwei Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Savannah, GA 31407, USA
|
37
|
Poptsova MS, Gogarten JP. BranchClust: a phylogenetic algorithm for selecting gene families. BMC Bioinformatics 2007; 8:120. [PMID: 17425803 PMCID: PMC1853112 DOI: 10.1186/1471-2105-8-120] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 12/08/2006] [Accepted: 04/10/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The former are easy to automate, but they usually do not distinguish between paralogs and orthologs or have restrictions on the number of taxa. Phylogenetic methods are often based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in the case of prokaryotes, is that the species tree is often unknown, and that the branching order between related organisms is frequently unresolved in analyses of single gene families. RESULTS: Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at http://bioinformatics.org/branchclust. CONCLUSION: BranchClust outperforms the Reciprocal Best Blast hit method in selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees.
Affiliation(s)
- Maria S Poptsova
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
- J Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
|