1
Kayani MUR, Huang W, Feng R, Chen L. Genome-resolved metagenomics using environmental and clinical samples. Brief Bioinform 2021; 22:bbab030. [PMID: 33758906 PMCID: PMC8425419 DOI: 10.1093/bib/bbab030] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/21/2020] [Revised: 11/29/2020] [Accepted: 01/20/2021] [Indexed: 12/25/2022]
Abstract
Recent advances in high-throughput sequencing technologies and computational methods have added a new dimension to metagenomic data analysis, i.e., genome-resolved metagenomics. In general terms, it refers to the recovery of draft or high-quality microbial genomes and their taxonomic classification and functional annotation. In recent years, several studies have utilized the genome-resolved metagenome analysis approach and identified previously unknown microbial species from human and environmental metagenomes. In this review, we describe genome-resolved metagenome analysis as a series of four necessary steps: (i) preprocessing of the sequencing reads, (ii) de novo metagenome assembly, (iii) genome binning and (iv) taxonomic and functional analysis of the recovered genomes. For each of these four steps, we discuss the most commonly used tools and the currently available pipelines to guide the scientific community in the recovery and subsequent analyses of genomes from any metagenome sample. Furthermore, we discuss the tools required for validating assembly quality and for improving the quality of the recovered genomes. We also highlight the currently available pipelines that can be used to automate the whole analysis without advanced bioinformatics knowledge. Finally, we highlight the most widely adopted and actively maintained tools and pipelines that can help the scientific community make decisions before commencing the analysis.
Affiliation(s)
- Masood ur Rehman Kayani
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2000025, China
- Wanqiu Huang
- Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200000, China
- Ru Feng
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2000025, China
- Lei Chen
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2000025, China
2
von Meijenfeldt FAB, Arkhipova K, Cambuy DD, Coutinho FH, Dutilh BE. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol 2019; 20:217. [PMID: 31640809 PMCID: PMC6805573 DOI: 10.1186/s13059-019-1817-x] [Citation(s) in RCA: 242] [Impact Index Per Article: 48.4] [Received: 03/22/2019] [Accepted: 09/10/2019] [Indexed: 01/08/2023]
Abstract
Current-day metagenomics analyses increasingly involve de novo taxonomic classification of long DNA sequences and metagenome-assembled genomes. Here, we show that the conventional best-hit approach often leads to classifications that are too specific, especially when the sequences represent novel deep lineages. We present a classification method that integrates multiple signals to classify sequences (Contig Annotation Tool, CAT) and metagenome-assembled genomes (Bin Annotation Tool, BAT). Classifications are automatically made at low taxonomic ranks if closely related organisms are present in the reference database and at higher ranks otherwise. The result is a high classification precision even for sequences from considerably unknown organisms.
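The rank-adaptive behaviour described above can be illustrated with a minimal lowest-common-ancestor (LCA) sketch in Python. It is not the CAT/BAT implementation: it assumes each open reading frame (ORF) already has database hits given as (lineage, bitscore) pairs, and the cutoff `r` (keep only hits scoring within a fraction `r` of the best bitscore) is a hypothetical parameter chosen for illustration.

```python
from typing import List, Sequence, Tuple

# A lineage is a root-to-tip tuple of taxon names (hypothetical format).
Lineage = Tuple[str, ...]


def lowest_common_ancestor(lineages: Sequence[Lineage]) -> Lineage:
    """Return the longest root-side prefix shared by every lineage."""
    shared: List[str] = []
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break
    return tuple(shared)


def classify_orf(hits: Sequence[Tuple[Lineage, float]], r: float = 0.10) -> Lineage:
    """Classify one ORF from its (lineage, bitscore) hits.

    Only hits scoring at least (1 - r) * best bitscore are kept; their LCA is
    returned, so the call is deep only when all near-top hits agree.
    """
    if not hits:
        return ()
    best = max(score for _, score in hits)
    near_top = [lineage for lineage, score in hits if score >= (1 - r) * best]
    return lowest_common_ancestor(near_top)


if __name__ == "__main__":
    gamma = ("Bacteria", "Proteobacteria", "Gammaproteobacteria")
    bacilli = ("Bacteria", "Firmicutes", "Bacilli")
    # Close relatives in the database: near-top hits agree to class level.
    print(classify_orf([(gamma, 500.0), (gamma, 480.0), (bacilli, 200.0)]))
    # Only distant relatives: near-top hits disagree below the domain.
    print(classify_orf([(gamma, 300.0), (bacilli, 290.0)]))
```

In the first call the near-top hits agree down to class level; in the second they disagree below the domain, so the call stays at Bacteria, whereas a best-hit rule would still report the top hit's full lineage.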
Affiliation(s)
- Ksenia Arkhipova
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Diego D Cambuy
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Felipe H Coutinho
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Present Address: Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Campus San Juan, San Juan, 03550, Alicante, Spain
- Bas E Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
3
von Meijenfeldt FAB, Arkhipova K, Cambuy DD, Coutinho FH, Dutilh BE. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol 2019; 20:217. [PMID: 31640809 DOI: 10.1101/530188] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/22/2019] [Accepted: 09/10/2019] [Indexed: 05/23/2023]
4
Global phylogeography and ancient evolution of the widespread human gut virus crAssphage. Nat Microbiol 2019; 4:1727-1736. [PMID: 31285584 DOI: 10.1038/s41564-019-0494-6] [Citation(s) in RCA: 149] [Impact Index Per Article: 29.8] [Received: 10/14/2018] [Accepted: 05/22/2019] [Indexed: 12/22/2022]
Abstract
Microbiomes are vast communities of microorganisms and viruses that populate all natural ecosystems. Viruses have been considered to be the most variable component of microbiomes, as supported by virome surveys and examples of high genomic mosaicism. However, recent evidence suggests that the human gut virome is remarkably stable compared with that of other environments. Here, we investigate the origin, evolution and epidemiology of crAssphage, a widespread human gut virus. Through a global collaboration, we obtained DNA sequences of crAssphage from more than one-third of the world's countries and showed that the phylogeography of crAssphage is locally clustered within countries, cities and individuals. We also found fully colinear crAssphage-like genomes in both Old-World and New-World primates, suggesting that the association of crAssphage with primates may be millions of years old. Finally, by exploiting a large cohort of more than 1,000 individuals, we tested whether crAssphage is associated with bacterial taxonomic groups of the gut microbiome, diverse human health parameters and a wide range of dietary factors. We identified strong correlations with different clades of bacteria that are related to Bacteroidetes and weak associations with several diet categories, but no significant association with health or disease. We conclude that crAssphage is a benign cosmopolitan virus that may have coevolved with the human lineage and is an integral part of the normal human gut virome.
5
Meirelles PM, Soares AC, Oliveira L, Leomil L, Appolinario LR, Francini-Filho RB, de Moura RL, de Barros Almeida RT, Salomon PS, Amado-Filho GM, Kruger R, Siegle E, Tschoeke DA, Kudo I, Mino S, Sawabe T, Thompson CC, Thompson FL. Metagenomics of Coral Reefs Under Phase Shift and High Hydrodynamics. Front Microbiol 2018; 9:2203. [PMID: 30337906 PMCID: PMC6180206 DOI: 10.3389/fmicb.2018.02203] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/02/2018] [Accepted: 08/29/2018] [Indexed: 01/06/2023]
Abstract
Local and global stressors have affected coral reef ecosystems worldwide. Switches from coral to algal dominance states and microbialization are the major processes underlying the global decline of coral reefs. However, most of the knowledge concerning microbialization has not considered physical disturbances (e.g., typhoons, waves, and currents). Southern Japan reef systems have developed under extreme physical disturbances. Here, we present analyses of a three-year investigation on the coral reefs of Ishigaki Island that comprised benthic and fish surveys, water quality analyses, metagenomics and microbial abundance data. At the four studied sites, inorganic nutrient concentrations were high and exceeded eutrophication thresholds. The dissolved organic carbon (DOC) concentration (up to 233.3 μM) and microbial abundance (up to 2.5 × 10⁵ cells/mL) values were relatively high. The highest vibrio counts coincided with the highest turf cover (∼55-85%) and the lowest coral cover (∼4.4-10.2%) and fish biomass (0.06 individuals/m²). Microbiome compositions were similar among all sites and were dominated by heterotrophs. Our data suggest that a synergistic effect among several regional stressors is driving coral decline. In a high-hydrodynamics reef environment, high algal/turf cover, stimulated by eutrophication, and low fish abundance due to overfishing promote microbialization. Together with crown-of-thorns starfish (COTS) outbreaks and possible climate change impacts, these coral reefs are likely to collapse.
Affiliation(s)
- Pedro Milet Meirelles
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Ana Carolina Soares
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Louisi Oliveira
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Luciana Leomil
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Luciana Reis Appolinario
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Rodrigo Leão de Moura
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Paulo S. Salomon
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Ricardo Kruger
- Department of Cellular Biology, University of Brasília, Brasília, Brazil
- Eduardo Siegle
- Oceanographic Institute, University of São Paulo, São Paulo, Brazil
- Diogo A. Tschoeke
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Isao Kudo
- Graduate School of Environmental Science, Hokkaido University, Sapporo, Japan
- Sayaka Mino
- Laboratory of Microbiology, Faculty of Fisheries Sciences, Hokkaido University, Hakodate, Japan
- Tomoo Sawabe
- Laboratory of Microbiology, Faculty of Fisheries Sciences, Hokkaido University, Hakodate, Japan
- Cristiane C. Thompson
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Fabiano L. Thompson
- Institute of Biology and SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
6
Salt and drought stress and ABA responses related to bZIP genes from V. radiata and V. angularis. Gene 2018; 651:152-160. [PMID: 29425824 DOI: 10.1016/j.gene.2018.02.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/11/2017] [Revised: 01/08/2018] [Accepted: 02/02/2018] [Indexed: 12/11/2022]
Abstract
Mung bean and adzuki bean are warm-season legumes widely cultivated in China. However, bean production in major producing regions is limited by biotic and abiotic stresses, such as drought and salt stress. Basic leucine zipper (bZIP) genes play key roles in responses to various biotic and abiotic stresses. However, only a few bZIP genes involved in drought and salt stress in legumes, especially Vigna radiata and Vigna angularis, have been identified. In this study, we identified 54 and 50 bZIP proteins from the whole-genome sequences of V. radiata and V. angularis, respectively. First, we comprehensively surveyed the characteristics of all bZIP genes, including their gene structure, chromosome distribution and motif composition. Phylogenetic trees showed that VrbZIP and VabZIP proteins were divided into ten clades comprising nine known and one unknown subgroup. The nucleotide substitution rates of the orthologous gene pairs showed that the bZIP genes have undergone strong purifying selection and that V. radiata and V. angularis diverged 1.25 to 9.20 million years ago (mya) (average, 4.95 mya). We also detected many cis-acting regulatory elements (CAREs) involved in abiotic stress and plant hormone responses in the putative promoter regions of the bZIP genes. Finally, using quantitative real-time PCR (qRT-PCR), we performed expression profiling of the bZIP genes in response to drought, salt and abscisic acid (ABA). We identified several bZIP genes that may be involved in drought and salt responses. Overall, our results provide a useful resource of VrbZIP and VabZIP genes for the functional characterization and understanding of bZIP transcription factors (TFs) in warm-season legumes. In addition, our results include the expression profiles of a subset of VrbZIP and VabZIP genes in response to drought, salt and ABA stress.
These results provide gene expression evidence for the selection of candidate genes under drought and salt stress for future study.
7
Abstract
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
8
Gutleben J, Chaib De Mares M, van Elsas JD, Smidt H, Overmann J, Sipkema D. The multi-omics promise in context: from sequence to microbial isolate. Crit Rev Microbiol 2017; 44:212-229. [PMID: 28562180 DOI: 10.1080/1040841x.2017.1332003] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Indexed: 10/19/2022]
Abstract
The numbers and diversity of microbes in ecosystems within and around us are unmatched, yet most of these microorganisms remain recalcitrant to in vitro cultivation. Various high-throughput molecular techniques, collectively termed multi-omics, provide insights into the genomic structure and metabolic potential as well as activity of complex microbial communities. Nonetheless, pure or defined cultures are needed to (1) decipher microbial physiology and thus test multi-omics-based ecological hypotheses, (2) curate and improve database annotations and (3) realize novel applications in biotechnology. Cultivation thus provides context. In turn, we argue here that multi-omics information awaits integration into the development of novel cultivation strategies. This can build the foundation for a new era of omics information-guided microbial cultivation technology and reduce the inherent trial-and-error search space. This review discusses how information that can be extracted from multi-omics data can be applied for the cultivation of hitherto uncultured microorganisms. Furthermore, we summarize groundbreaking studies that successfully translated information derived from multi-omics into specific media formulations, screening techniques and selective enrichments in order to obtain novel targeted microbial isolates. By integrating these examples, we conclude with a proposed workflow to facilitate future omics-aided cultivation strategies that are inspired by the microbial complexity of the environment.
Affiliation(s)
- Johanna Gutleben
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
- Maryam Chaib De Mares
- Department of Microbial Ecology, Groningen Institute for Evolutionary Life Sciences (GELIFES), Rijksuniversiteit Groningen, Groningen, The Netherlands
- Jan Dirk van Elsas
- Department of Microbial Ecology, Groningen Institute for Evolutionary Life Sciences (GELIFES), Rijksuniversiteit Groningen, Groningen, The Netherlands
- Hauke Smidt
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
- Jörg Overmann
- Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen, Braunschweig, Germany
- Detmer Sipkema
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
9
Petersen M, Meusemann K, Donath A, Dowling D, Liu S, Peters RS, Podsiadlowski L, Vasilikopoulos A, Zhou X, Misof B, Niehuis O. Orthograph: a versatile tool for mapping coding nucleotide sequences to clusters of orthologous genes. BMC Bioinformatics 2017; 18:111. [PMID: 28209129 PMCID: PMC5312442 DOI: 10.1186/s12859-017-1529-8] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 08/25/2016] [Accepted: 02/06/2017] [Indexed: 01/16/2023]
Abstract
BACKGROUND: Orthology characterizes genes of different organisms that arose from a single ancestral gene via speciation, in contrast to paralogy, which is assigned to genes that arose via gene duplication. An accurate orthology assignment is a crucial step for comparative genomic studies. Orthologous genes in two organisms can be identified by applying a so-called reciprocal search strategy, provided that complete information on the organisms' gene repertoire is available. In many investigations, however, only a fraction of the gene content of the organisms under study is examined (e.g., RNA sequencing). Here, identification of orthologous nucleotide or amino acid sequences can be achieved using a graph-based approach that maps nucleotide sequences to genes of known orthology. Existing implementations of this approach, however, suffer from algorithmic issues that may cause problems in downstream analyses.

RESULTS: We present a new software pipeline, Orthograph, that addresses and solves the above problems and implements useful features for a wide range of comparative genomic and transcriptomic analyses. Orthograph applies a best reciprocal hit search strategy using profile hidden Markov models and maps nucleotide sequences to the globally best matching cluster of orthologous genes, thus enabling researchers to conveniently and reliably delineate orthologs and paralogs from transcriptomic and genomic sequence data. We demonstrate the performance of our approach on de novo-sequenced and assembled transcript libraries of 24 species of apoid wasps (Hymenoptera: Aculeata) as well as on published genomic datasets.

CONCLUSION: With Orthograph, we implemented a best reciprocal hit approach to reference-based orthology prediction for coding nucleotide sequences such as RNAseq data. Orthograph is flexible, easy to use, open source and freely available at https://mptrsen.github.io/Orthograph. Additionally, we release 24 de novo-sequenced and assembled transcript libraries of apoid wasp species.
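The reciprocal best-hit criterion named in the abstract can be sketched in a few lines of Python. This is a generic illustration, not Orthograph's code: it assumes similarity scores have already been computed in both directions (for example, transcripts against reference genes and back) and stored in dictionaries keyed by (query, target). Orthograph itself searches with profile hidden Markov models and maps sequences to clusters of orthologous genes, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, target) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring target (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(forward: Scores, reverse: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores: transcripts t1, t2 versus reference genes geneA, geneB.
    forward = {("t1", "geneA"): 250.0, ("t1", "geneB"): 90.0, ("t2", "geneB"): 120.0}
    reverse = {("geneA", "t1"): 240.0, ("geneB", "t1"): 100.0, ("geneB", "t2"): 80.0}
    print(reciprocal_best_hits(forward, reverse))
```

Here the best hit of t2 is geneB, but the best hit of geneB is t1, so (t2, geneB) is not reported; this asymmetry is the kind of signal a reciprocal search uses to keep putative paralogs out of ortholog assignments.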
Affiliation(s)
- Malte Petersen
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Karen Meusemann
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Australian National Insect Collection, CSIRO National Research Collections Australia (NRCA), Clunies Ross Street, Canberra, ACT 2601, Australia
- Department for Evolutionary Biology & Ecology, Institute for Biology I (Zoology), University of Freiburg, Hauptstraße 1, Freiburg, 79104, Germany
- Alexander Donath
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Daniel Dowling
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Institute of Molecular Biology (IMB), Ackermannweg 4, Mainz, 55128, Germany
- Shanlin Liu
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Ralph S Peters
- Arthropod Department, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Lars Podsiadlowski
- Institute of Evolutionary Biology and Ecology, Zoology and Evolutionary Biology, University of Bonn, An der Immenburg 1, Bonn, 53121, Germany
- Alexandros Vasilikopoulos
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Xin Zhou
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, 100193, China
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing, 100083, China
- Bernhard Misof
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Oliver Niehuis
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany
- Department for Evolutionary Biology & Ecology, Institute for Biology I (Zoology), University of Freiburg, Hauptstraße 1, Freiburg, 79104, Germany
10
Song H, Wang P, Li C, Han S, Zhao C, Xia H, Bi Y, Guo B, Zhang X, Wang X. Comparative analysis of NBS-LRR genes and their response to Aspergillus flavus in Arachis. PLoS One 2017; 12:e0171181. [PMID: 28158222 PMCID: PMC5291535 DOI: 10.1371/journal.pone.0171181] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 12/15/2016] [Accepted: 01/17/2017] [Indexed: 12/31/2022]
Abstract
Studies have demonstrated that nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes respond to pathogen attack in plants. Characterization of NBS-LRR genes in peanut is not well documented. The newly released whole genome sequences of Arachis duranensis and Arachis ipaënsis have allowed a global analysis of this important gene family in peanut to be conducted. In this study, we identified 393 (AdNBS) and 437 (AiNBS) NBS-LRR genes from A. duranensis and A. ipaënsis, respectively, using bioinformatics approaches. Full-length sequences of 278 AdNBS and 303 AiNBS were identified. Fifty-one orthologous, four AdNBS paralogous, and six AiNBS paralogous gene pairs were predicted. All paralogous gene pairs were located on the same chromosomes, indicating that tandem duplication was the most likely mechanism forming these paralogs. The paralogs mainly underwent purifying selection, but most LRR 8 domains underwent positive selection. More gene clusters were found in A. ipaënsis than in A. duranensis, possibly owing to tandem duplication events occurring more frequently in A. ipaënsis. The expression profiles of NBS-LRR genes differed between A. duranensis and A. hypogaea after Aspergillus flavus infection. The up-regulated expression of NBS-LRR in A. duranensis was continuous, while these genes responded to the pathogen temporally in A. hypogaea.
Affiliation(s)
- Hui Song
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- Pengfei Wang
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- Changsheng Li
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- College of Life Science, Shandong Normal University, Jinan, China
- Suoyi Han
- Henan Academy of Agricultural Sciences, Zhengzhou, China
- Chuanzhi Zhao
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- Han Xia
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- Yuping Bi
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- Baozhu Guo
- Crop Protection and Management Research Unit, USDA-ARS, Tifton, Georgia, United States of America
- Xinyou Zhang
- Henan Academy of Agricultural Sciences, Zhengzhou, China
- Xingjun Wang
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- College of Life Science, Shandong Normal University, Jinan, China
11
Nagy LG, Szöllősi G. Fungal Phylogeny in the Age of Genomics: Insights Into Phylogenetic Inference From Genome-Scale Datasets. Adv Genet 2017; 100:49-72. [DOI: 10.1016/bs.adgen.2017.09.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/26/2023]
12
Song H, Wang P, Lin JY, Zhao C, Bi Y, Wang X. Genome-Wide Identification and Characterization of WRKY Gene Family in Peanut. Front Plant Sci 2016; 7:534. [PMID: 27200012 PMCID: PMC4845656 DOI: 10.3389/fpls.2016.00534] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 12/30/2015] [Accepted: 04/04/2016] [Indexed: 05/18/2023]
Abstract
WRKY, an important transcription factor family, is widely distributed in the plant kingdom. Many reports have focused on the phylogenetic relationships and biological functions of WRKY proteins at the whole-genome level in different plant species. However, little is known about WRKY proteins in the genomes of Arachis species and their response to salicylic acid (SA) and jasmonic acid (JA) treatment. In this study, we identified 77 and 75 WRKY proteins from the two wild ancestral diploid genomes of cultivated tetraploid peanut, Arachis duranensis and Arachis ipaënsis, respectively, using bioinformatics approaches. Most peanut WRKY coding genes were located on A. duranensis chromosome A6 and A. ipaënsis chromosome B3, while the fewest WRKY genes were found on chromosome 9. The WRKY orthologous gene pairs in A. duranensis and A. ipaënsis chromosomes were highly syntenic. Our analysis indicated that segmental duplication events played a major role in the evolution of the AdWRKY and AiWRKY genes, and strong purifying selection was observed in gene duplication pairs. Furthermore, we applied the knowledge gained from the genome-wide analysis of wild ancestral peanut to cultivated peanut and found that the activities of specific cultivated peanut WRKY genes changed after SA and JA treatment. Peanut WRKY7, 8 and 13 genes were down-regulated, whereas WRKY1 and 12 genes were up-regulated by SA and JA treatment. These results could provide valuable information for peanut improvement.
Affiliation(s)
- Hui Song
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
- Pengfei Wang
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
- Jer-Young Lin
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Chuanzhi Zhao
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
- Yuping Bi
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
- Xingjun Wang
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
13
Whelan NV, Kocot KM, Halanych KM. Employing Phylogenomics to Resolve the Relationships among Cnidarians, Ctenophores, Sponges, Placozoans, and Bilaterians. Integr Comp Biol 2015; 55:1084-95. [PMID: 25972566 DOI: 10.1093/icb/icv037] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Abstract
Despite an explosion in the amount of sequence data, phylogenomics has failed to settle controversy regarding some critical nodes on the animal tree of life. Understanding relationships among Bilateria, Ctenophora, Cnidaria, Placozoa, and Porifera is essential for studying how complex traits such as neurons, muscles, and gastrulation evolved. Recent studies have cast doubt on the historical view that sponges are the sister group to all other animal lineages, instead recovering ctenophores in that position. However, the ctenophore-sister hypothesis has been criticized as unrealistic and as an artifact of systematic error. We review past phylogenomic studies and potential causes of systematic error in an effort to identify areas that can be improved in future studies. Increased taxon sampling, less missing data, and a priori removal of sequences and taxa that may cause systematic error in phylogenomic inference will likely be the most fruitful areas of focus when assembling future datasets. Ultimately, we foresee metazoan relationships being resolved with higher support in the near future, and we caution against dismissing novel hypotheses merely because they conflict with historical views of animal evolution.
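The abstract names less missing data and a priori removal of problematic taxa as levers for better datasets. The sketch below shows one simple form of such filtering, assuming a concatenated amino-acid supermatrix stored as a dict of equal-length strings. The function names, the 0.5 threshold and the treatment of `-`, `?` and `X` as missing characters are illustrative assumptions, not criteria taken from the review.

```python
def missing_fraction(seq: str, missing: str = "-?X") -> float:
    """Fraction of alignment columns in seq that are gaps or unknown residues.

    Assumes amino-acid data, so 'X' (not 'N', which is asparagine) marks unknowns.
    """
    if not seq:
        return 1.0
    return sum(ch in missing for ch in seq.upper()) / len(seq)


def drop_sparse_taxa(supermatrix: dict, max_missing: float = 0.5) -> dict:
    """Keep only taxa whose missing-data fraction is at most max_missing (hypothetical threshold)."""
    return {taxon: seq for taxon, seq in supermatrix.items()
            if missing_fraction(seq) <= max_missing}


# Hypothetical toy supermatrix (not real data).
toy = {"taxon1": "MK--", "taxon2": "MKLV", "taxon3": "M-?X"}
print(drop_sparse_taxa(toy))  # {'taxon1': 'MK--', 'taxon2': 'MKLV'}
```

In practice the threshold trades taxon sampling against missing data, which is exactly the tension the abstract describes, so it is best treated as a parameter to vary rather than a fixed rule.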
Affiliation(s)
- Nathan V Whelan
- Department of Biological Sciences, Molette Biology Laboratory for Environmental and Climate Change Studies, Auburn University, 101 Life Sciences Building, Auburn, AL 36849, USA
- Kevin M Kocot
- School of Biological Sciences, The University of Queensland, 325 Goddard Building, St Lucia, QLD 4101, Australia
- Kenneth M Halanych
- Department of Biological Sciences, Molette Biology Laboratory for Environmental and Climate Change Studies, Auburn University, 101 Life Sciences Building, Auburn, AL 36849, USA
14
Sebé-Pedrós A, Grau-Bové X, Richards TA, Ruiz-Trillo I. Evolution and classification of myosins, a paneukaryotic whole-genome approach. Genome Biol Evol 2015; 6:290-305. [PMID: 24443438 PMCID: PMC3942036 DOI: 10.1093/gbe/evu013] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022]
Abstract
Myosins are key components of the eukaryotic cytoskeleton, providing motility for a broad diversity of cargoes. Therefore, understanding the origin and evolutionary history of myosin classes is crucial to address the evolution of eukaryote cell biology. Here, we revise the classification of myosins using an updated taxon sampling that includes newly or recently sequenced genomes and transcriptomes from key taxa. We performed a survey of eukaryotic genomes and phylogenetic analyses of the myosin gene family, reconstructing the myosin toolkit at different key nodes in the eukaryotic tree of life. We also identified the phylogenetic distribution of myosin diversity in terms of number of genes, associated protein domains and number of classes in each taxon. Our analyses show that new classes (i.e., paralogs) and domain architectures were continuously generated throughout eukaryote evolution, with a significant expansion of myosin abundance and domain architectural diversity at the stem of Holozoa, predating the origin of animal multicellularity. Indeed, single-celled holozoans have the most complex myosin complement among eukaryotes, with paralogs of most myosins previously considered animal specific. We recover a dynamic evolutionary history, with several lineage-specific expansions (e.g., the myosin III-like gene family diversification in choanoflagellates), convergence in protein domain architectures (e.g., fungal and animal chitin synthase myosins), and important secondary losses. Overall, our evolutionary scheme demonstrates that the ancestral eukaryote likely had a complex myosin repertoire that included six genes with different protein domain architectures. Finally, we provide an integrative and robust classification, useful for future genomic and functional studies on this crucial eukaryotic gene family.
Affiliation(s)
- Arnau Sebé-Pedrós
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Passeig Marítim de la Barceloneta, Barcelona, Catalonia, Spain
15
Ward N, Moreno-Hagelsieb G. Quickly finding orthologs as reciprocal best hits with BLAT, LAST, and UBLAST: how much do we miss? PLoS One 2014; 9:e101850. [PMID: 25013894 PMCID: PMC4094424 DOI: 10.1371/journal.pone.0101850] [Citation(s) in RCA: 107] [Impact Index Per Article: 10.7] [Received: 04/12/2014] [Accepted: 06/11/2014] [Indexed: 11/30/2022]
Abstract
Reciprocal best hits (RBHs) are a common proxy for orthology in comparative genomics. An RBH is found when the proteins encoded by two genes, each in a different genome, find each other as the best-scoring match in the other genome. NCBI's BLAST is the software most commonly used for the sequence comparisons needed to find RBHs. Because sequence comparison can be time consuming, we compared the number and quality of RBHs detected using algorithms that run in a fraction of the time required by BLAST. We tested BLAT, LAST and UBLAST. All three programs ran in one-hundredth to one-twenty-fifth of the time required to run BLAST. A reduction in the number of homologs and RBHs found by the faster algorithms compared with BLAST becomes apparent as the compared genomes become more dissimilar, with BLAT, a program optimized for quickly finding very similar sequences, missing both the most homologs and the most RBHs. Although LAST produced the numbers of homologs and RBHs closest to those produced with BLAST, UBLAST was very close: between dissimilar genomes, either program recovered between 0.6 and 0.8 of the RBHs found by BLAST, while between more similar genomes the differences were barely apparent. UBLAST ran faster than LAST, making it the best option among the programs tested.
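The RBH definition above can be sketched independently of the search tool that produced the scores. This minimal version assumes the hits of each search direction have already been parsed into a dict of `(query, subject) -> bit score`; the function names and toy scores are hypothetical, and ties are broken by taking the first pair encountered, which is a simplification.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject for one search direction.

    scores: dict mapping (query, subject) -> bit score. On ties, the first pair
    encountered wins (a simplification; real pipelines need an explicit rule).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs whose best hits point at each other."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores for genome A searched against genome B and vice versa.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0, ("a2", "b2"): 180.0, ("a3", "b2"): 60.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 65.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Here `a3` hits `b2` best, but `b2` prefers `a2`, so `(a3, b2)` is not an RBH. A faster but less sensitive search tool changes the score tables, which is how it can lose RBHs.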
Affiliation(s)
- Natalie Ward
- Department of Biology, Wilfrid Laurier University, Waterloo, Ontario, Canada
16
Blank CE. Phylogenetic distribution of compatible solute synthesis genes support a freshwater origin for cyanobacteria. J Phycol 2013; 49:880-95. [PMID: 27007313 DOI: 10.1111/jpy.12098] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/26/2012] [Accepted: 06/22/2013] [Indexed: 06/05/2023]
Abstract
Previous work using ancestral state reconstruction of habitat salinity preference proposed that the early cyanobacteria likely lived in a freshwater environment. The aim of this study was to test that hypothesis by performing phylogenetic analyses of the genes underlying salinity preferences in the cyanobacteria. Phylogenetic analysis of compatible solute genes shows that sucrose synthesis genes were likely ancestral in the cyanobacteria, and were also likely inherited through the cyanobacterial endosymbiosis into the photosynthetic algae and land plants. In addition, the genes for the synthesis of compatible solutes that are necessary for survival in marine and hypersaline environments (such as glucosylglycerol, glucosylglycerate, and glycine betaine) were likely acquired independently high up (i.e., more recently) in the cyanobacterial tree. Because sucrose synthesis is strongly associated with growth in a low-salinity environment, this independently supports a freshwater origin for the cyanobacteria. It is also consistent with geologic evidence showing that the early oceans were much warmer and saltier than modern oceans; sucrose synthesis alone would have been insufficient for early cyanobacteria to colonize early Precambrian oceans, which had a higher ionic strength. Indeed, the acquisition of an expanded set of new compatible solute genes may have enabled the historical colonization of marine and hypersaline environments by cyanobacteria, midway through their evolutionary history.
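The argument above rests on ancestral state reconstruction. To illustrate what such a reconstruction computes, the sketch below implements the bottom-up pass of Fitch parsimony for one discrete character on a rooted binary tree. This is a swapped-in textbook technique, not necessarily the method the study used (likelihood-based reconstruction is another common choice), and the tree, taxon names and habitat labels are hypothetical toy data.

```python
def fitch(tree, states):
    """Bottom-up pass of Fitch parsimony on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict leaf name -> observed state.
    Returns (candidate root states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change is needed here.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and habitat states (not data from the study).
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
habitat = {"t1": "freshwater", "t2": "freshwater", "t3": "freshwater",
           "t4": "marine", "t5": "freshwater"}
print(fitch(tree, habitat))  # ({'freshwater'}, 1)
```

A single derived marine lineage costs one change, and the root is reconstructed as freshwater. A full reconstruction would add a top-down pass to assign states to every internal node.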
Affiliation(s)
- Carrine E Blank
- Department of Geosciences, University of Montana, 32 Campus Drive #1296, Missoula, Montana, 59812-1296, USA
17
Dutilh BE, Backus L, Edwards RA, Wels M, Bayjanov JR, van Hijum SAFT. Explaining microbial phenotypes on a genomic scale: GWAS for microbes. Brief Funct Genomics 2013; 12:366-80. [PMID: 23625995 PMCID: PMC3743258 DOI: 10.1093/bfgp/elt008] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 12/24/2022]
Abstract
There is an increasing availability of complete or draft genome sequences for microbial organisms. These data form a potentially valuable resource for genotype-phenotype association and gene function prediction, provided that phenotypes are consistently annotated for all the sequenced strains. In this review, we address the requirements for successful gene-trait matching. We outline a basic protocol for microbial functional genomics, including genome assembly, annotation of genotypes (including single nucleotide polymorphisms, orthologous groups and prophages), data pre-processing, genotype-phenotype association, visualization and interpretation of results. The methodologies for association described herein can be applied to other data types, opening up possibilities to analyze transcriptome-phenotype associations, and correlate microbial population structure or activity, as measured by metagenomics, to environmental parameters.
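The genotype-phenotype association step can be illustrated with one of the simplest tests for a gene's presence/absence against a binary trait: a two-sided Fisher's exact test on a 2x2 table. This is a minimal sketch with hypothetical counts, not the protocol from the review; it ignores population structure and multiple-testing correction, both of which matter in real microbial GWAS.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: gene present / gene absent; columns: phenotype positive / negative.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Two-sided p: sum every table no more probable than the observed one
    # (with a small relative tolerance for floating-point ties).
    return sum(p for p in map(table_prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Hypothetical counts: 8 of 10 strains carrying a gene show the trait,
# versus 1 of 6 strains lacking it.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))  # 0.035
```

Running this over every gene family or SNP yields one p-value per genotype, which then needs correction for the number of tests before interpretation.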
Affiliation(s)
- Bas E Dutilh
- CMBI, NCMLS, Radboud University Medical Centre. Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands.
18
Kuenne C, Billion A, Mraheil MA, Strittmatter A, Daniel R, Goesmann A, Barbuddhe S, Hain T, Chakraborty T. Reassessment of the Listeria monocytogenes pan-genome reveals dynamic integration hotspots and mobile genetic elements as major components of the accessory genome. BMC Genomics 2013; 14:47. [PMID: 23339658 PMCID: PMC3556495 DOI: 10.1186/1471-2164-14-47] [Citation(s) in RCA: 154] [Impact Index Per Article: 14.0] [Received: 09/07/2012] [Accepted: 12/15/2012] [Indexed: 12/14/2022]
Abstract
Background: Listeria monocytogenes is an important food-borne pathogen and a model organism for host-pathogen interaction, and thus an invaluable target for research on the forces governing the evolution of such microbes. The diversity of this species has not yet been exhaustively explored, as previous efforts have focused on serotypes primarily implicated in human listeriosis. We conducted complete genome sequencing of 11 strains using 454 GS FLX technology, thereby achieving full coverage of all serotypes, including the first complete strains of serotypes 1/2b, 3c, 3b, 4c, 4d, and 4e. These were analyzed comparatively together with publicly available data and assessed for pathogenicity in the Galleria mellonella insect model. Results: The species pan-genome of L. monocytogenes is highly stable but open, suggesting an ability to adapt to new niches by generating or acquiring new genetic information. Most gene-scale differences represented by the accessory genome resulted from nine hypervariable hotspots, a similar number of different prophages, three transposons (Tn916, Tn554, IS3-like), and two mobilizable islands. Only a subset of strains carried CRISPR/Cas bacteriophage resistance systems of different subtypes, suggesting a supplementary function in maintaining chromosomal stability. Multiple phylogenetic branches of the genus Listeria imply long common histories of the strains of each lineage, as revealed by a SNP-based core genome tree highlighting the impact of small mutations on the evolution of L. monocytogenes. Frequent loss or truncation of genes described as vital for virulence or pathogenicity was confirmed as a recurring pattern, especially in strains belonging to lineages III and II. New candidate genes implicated in virulence function were predicted based on functional domains and phylogenetic distribution.
A comparative analysis of small regulatory RNA candidates supports observations of a differential distribution of trans-encoded RNA, hinting at a diverse range of adaptations and regulatory impact. Conclusions: This study identified commonly occurring hypervariable hotspots and mobile elements as the primary effectors of quantitative gene-scale evolution in L. monocytogenes, while gene decay and SNPs seem to be major factors influencing long-term evolution. The discovery of genes that are common or disparately distributed across lineages, serogroups, serotypes and strains of L. monocytogenes will assist diagnostic, phylogenetic and functional research, supported by the comparative genomic GECO-LisDB analysis server (http://bioinfo.mikrobio.med.uni-giessen.de/geco2lisdb).
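The core/accessory distinction used above can be sketched on a presence/absence table of gene families. This minimal version assumes gene families have already been clustered and assigned IDs. The family names are hypothetical, and strain-unique families are reported separately from the rest of the accessory genome, a convention some pipelines follow and others do not. Judging whether a pan-genome is "open" would additionally require rarefaction over strains, which is not shown.

```python
def partition_pangenome(presence):
    """Split gene families into core, accessory and strain-unique sets.

    presence: non-empty dict mapping strain name -> set of gene-family IDs.
    """
    all_families = set().union(*presence.values())
    # Core: present in every strain.
    core = set.intersection(*presence.values())
    counts = {f: sum(f in fams for fams in presence.values()) for f in all_families}
    # Unique: found in exactly one strain (excluded from core if only one strain exists).
    unique = {f for f, n in counts.items() if n == 1} - core
    accessory = all_families - core - unique
    return core, accessory, unique


# Hypothetical presence/absence data for three strains.
presence = {
    "strain1": {"famA", "famB", "famC"},
    "strain2": {"famA", "famB", "famD"},
    "strain3": {"famA", "famB", "famC"},
}
core, accessory, unique = partition_pangenome(presence)
print(sorted(core), sorted(accessory), sorted(unique))  # ['famA', 'famB'] ['famC'] ['famD']
```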
Affiliation(s)
- Carsten Kuenne
- Institute of Medical Microbiology, German Centre for Infection Research, Justus-Liebig-University, D-35392, Giessen, Germany
19
Low rates of lateral gene transfer among metabolic genes define the evolving biogeochemical niches of archaea through deep time. Archaea 2012; 2012:843539. [PMID: 23226971 PMCID: PMC3512248 DOI: 10.1155/2012/843539] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 05/31/2012] [Revised: 09/02/2012] [Accepted: 10/02/2012] [Indexed: 01/26/2023]
Abstract
Phylogenomic analyses of archaeal genome sequences are providing windows into the group's evolutionary past, even though most archaeal taxa lack a conventional fossil record. Here, phylogenetic analyses were performed using key metabolic genes that define the metabolic niche of microorganisms. Such genes are generally considered to have undergone high rates of lateral gene transfer. Many gene sequences formed clades that were identical, or similar, to the tree constructed using large numbers of genes from the stable core of the genome. Surprisingly, lateral transfer events were readily identified and quantifiable, occurring only a relatively small number of times in the archaeal domain of life. By placing gene acquisition events into a temporal framework, the rates at which new metabolic genes were acquired can be quantified. The highest lateral transfer rates were among cytochrome oxidase genes that use oxygen as a terminal electron acceptor (a total of 12–14 lateral transfer events, or 3.4–4.0 events per billion years, across the entire archaeal domain). Genes involved in sulfur or nitrogen metabolism had much lower rates, on the order of one lateral transfer event per billion years. This suggests that lateral transfer of key metabolic genes is rare rather than rampant.
20
Dutilh BE, Schmieder R, Nulton J, Felts B, Salamon P, Edwards RA, Mokili JL. Reference-independent comparative metagenomics using cross-assembly: crAss. Bioinformatics 2012; 28:3225-31. [PMID: 23074261 PMCID: PMC3519457 DOI: 10.1093/bioinformatics/bts613] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Abstract
Motivation: Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known microorganisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Comparative metagenomics studies the interrelationships between metagenomes from different samples. More of the data can be used in such a comparison by creating a cross-assembly of all reads, i.e. a single assembly of reads from different samples. Using an assembly algorithm is a fast and intuitive way to link (partially) homologous reads without requiring a database of reference sequences. Results: Here, we introduce crAss, a novel bioinformatic tool that enables fast, simple analysis of cross-assembly files, yielding distances between all metagenomic sample pairs and an insightful image displaying the similarities.
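The cross-assembly idea can be illustrated as follows: after assembling reads from all samples together, each contig records how many reads each sample contributed, and samples are compared through these contig profiles. This is a minimal sketch under that assumption. crAss defines its own distance measures, so the plain cosine distance used here is a stand-in for illustration only, and the contig counts are hypothetical.

```python
from math import sqrt


def sample_profiles(contig_reads, samples):
    """Turn contig -> {sample: read count} into sample -> count vector over contigs."""
    contigs = sorted(contig_reads)
    return {s: [contig_reads[c].get(s, 0) for c in contigs] for s in samples}


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0


def pairwise_distances(contig_reads, samples):
    """Distance for every unordered pair of samples."""
    profiles = sample_profiles(contig_reads, samples)
    return {(a, b): cosine_distance(profiles[a], profiles[b])
            for i, a in enumerate(samples) for b in samples[i + 1:]}


# Hypothetical read counts per contig of a cross-assembly of samples S1-S3.
contig_reads = {
    "contig1": {"S1": 10, "S2": 10},
    "contig2": {"S1": 5, "S2": 5, "S3": 8},
    "contig3": {"S3": 12},
}
for pair, dist in pairwise_distances(contig_reads, ["S1", "S2", "S3"]).items():
    print(pair, round(dist, 3))
```

Because no reference database is involved, reads from unknown organisms still shape the profiles, which is the point the abstract makes about using more of the data.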
Affiliation(s)
- Bas E Dutilh
- Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, 6525 GA Nijmegen, The Netherlands.
21
Meinel T, Krause A. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling. Evol Bioinform Online 2012; 8:489-525. [PMID: 22915837 PMCID: PMC3422217 DOI: 10.4137/ebo.s9642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models, together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes, has led to a huge diversity in published phylogenies. Comparing these phylogenies, let alone identifying the impact of inference properties (underlying data model, inference technique) on particular reconstructions, is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. The particular topology alternatives found for an ordered list of bacterial clans form a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies, we generated a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After applying tree topology profiling to a total of 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with their inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work, we introduce tree topology profiling into the scientific field of comparative phylogenomics.
Affiliation(s)
- Thomas Meinel
- Charité-University Medicine Berlin, Institute for Physiology, Structural Bioinformatics Group, Thielallee 71, 14195 Berlin, Germany
22
Taxonomic and functional microbial signatures of the endemic marine sponge Arenosclera brasiliensis. PLoS One 2012; 7:e39905. [PMID: 22768320 PMCID: PMC3388064 DOI: 10.1371/journal.pone.0039905] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 03/19/2012] [Accepted: 05/29/2012] [Indexed: 11/19/2022]
Abstract
The endemic marine sponge Arenosclera brasiliensis (Porifera, Demospongiae, Haplosclerida) is a known source of secondary metabolites such as arenosclerins A-C. In the present study, we established the composition of the A. brasiliensis microbiome and the metabolic pathways associated with this community. We used 454 shotgun pyrosequencing to generate approximately 640,000 high-quality sponge-derived sequences (∼150 Mb). Clustering analysis including sponge, seawater and twenty-three other metagenomes derived from marine animal microbiomes shows that A. brasiliensis contains a specific microbiome. Fourteen bacterial phyla (including Proteobacteria, Cyanobacteria, Actinobacteria, Bacteroidetes, Firmicutes and Chloroflexi) were consistently found in the A. brasiliensis metagenomes. The A. brasiliensis microbiome is enriched for Betaproteobacteria (e.g., Burkholderia) and Gammaproteobacteria (e.g., Pseudomonas and Alteromonas) compared with the surrounding planktonic microbial communities. Functional analysis based on Rapid Annotation using Subsystem Technology (RAST) indicated that the A. brasiliensis microbiome is enriched for sequences associated with membrane transport and one-carbon metabolism. In addition, sequences associated with aerobic and anaerobic metabolism, as well as with the synthesis and degradation of secondary metabolites, were overrepresented. This study represents the first analysis of sponge-associated microbial communities via shotgun pyrosequencing, a strategy commonly applied in similar analyses of other marine hosts, such as corals and algae. We demonstrate that A. brasiliensis has a unique microbiome that is distinct from that of the surrounding planktonic microbes and from other marine organisms, indicating a species-specific microbiome.
23
Rosenfeld JA, DeSalle R. E value cutoff and eukaryotic genome content phylogenetics. Mol Phylogenet Evol 2012; 63:342-50. [PMID: 22306824 DOI: 10.1016/j.ympev.2012.01.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 05/22/2011] [Revised: 01/02/2012] [Accepted: 01/03/2012] [Indexed: 10/14/2022]
Abstract
Genome content analysis has been used as a source of phylogenetic information in large prokaryotic tree of life studies. Recently, the sequencing of many eukaryotic genomes has allowed genome content analysis to be used for these organisms as well. In this communication, we examine the utility of genome content analysis for recovering phylogenetic patterns in several eukaryotic groups. By constructing multiple matrices using different e value cutoffs, we examine the effect of altering the e value cutoff on five eukaryotic genome data sets. Our analysis indicates that the e value cutoff used as a criterion in the construction of the genome content matrix is a critical factor in both the accuracy and the information content of the analysis. Strikingly, genome content by itself is not a reliable or accurate source of characters for phylogenetic analysis of the taxa in the five data sets we analyzed. We discuss two problems, small genome attraction and genome duplication, as contributors to the rather poor performance of genome content data in recovering eukaryotic phylogeny.
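How the e value cutoff shapes a genome content matrix can be shown with a toy example: a gene family is scored present in a genome only if at least one homology-search hit meets the cutoff, so loosening the cutoff turns absences into presences. This is a minimal sketch assuming hits are already available as `(family, genome, e_value)` tuples; the function name and all hits are hypothetical.

```python
def genome_content_matrix(hits, genomes, families, cutoff):
    """Presence/absence matrix: genome -> list of 0/1 scores, one per family.

    hits: iterable of (family, genome, e_value). A family counts as present in a
    genome if at least one hit reaches e_value <= cutoff.
    """
    present = {(family, genome) for family, genome, e in hits if e <= cutoff}
    return {g: [1 if (f, g) in present else 0 for f in families] for g in genomes}


# Hypothetical search hits (family, genome, e-value).
hits = [("famA", "g1", 1e-50), ("famA", "g2", 1e-8),
        ("famB", "g2", 1e-3), ("famB", "g3", 1e-40)]
for cutoff in (1e-10, 1e-2):
    print(cutoff, genome_content_matrix(hits, ["g1", "g2", "g3"], ["famA", "famB"], cutoff))
```

With the strict cutoff, `g2` scores no families at all; with the loose one it scores both. The same genomes therefore yield different character matrices, and potentially different trees, depending only on the cutoff.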
Affiliation(s)
- Jeffrey A Rosenfeld
- IST/High Performance and Research Computing, University of Medicine and Dentistry of New Jersey, Newark, NJ 07103, United States.
24
Rintoul TL, Eggertson QA, Lévesque CA. Multigene phylogenetic analyses to delimit new species in fungal plant pathogens. Methods Mol Biol 2012; 835:549-69. [PMID: 22183677 DOI: 10.1007/978-1-61779-501-5_34] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Supporting the identification of unknown strains or specimens by sequencing a genetic marker commonly used for phylogenetics or DNA barcoding is now standard practice for mycologists and plant pathologists. Does one have a new species when a strain differs by a few base pairs when compared to reference sequences from taxonomically well-characterized species that do not differ morphologically from this new strain? If variation at the intra- and interspecific levels for the locus used for identification is already understood for all the closely related species, it is possible to make a reliable prediction of a new species status, but ultimately this question can only be properly addressed by determining the presence or absence of gene flow among a group of strains of the putative new species and strains of previously delimited species. The Phylogenetic Species Concept (PSC) and its assessment using multigene phylogeny and Genealogical Concordance Phylogenetic Species Recognition (GCPSR) are the basis for this chapter. The theoretical framework and a variety of tools to apply these concepts are explained, to assist in the assessment of whether a species is distinct or new when confronted with some sequence divergence from reference data.
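The genealogical concordance idea behind GCPSR can be sketched as a clade filter over several single-locus trees. This simplified version assumes each locus is reduced to its set of well-supported clades (frozensets of strain names). It keeps a clade if it appears in a majority of loci (concordance), or if no well-supported clade in any locus contradicts it (non-discordance). Support thresholds, the exhaustive-subdivision step and real tree parsing are omitted, and the function names and loci are hypothetical.

```python
def conflicts(c1, c2):
    """Two clades (sets of taxa) are incompatible if they overlap without nesting."""
    return bool(c1 & c2) and not (c1 <= c2 or c2 <= c1)


def recognized_clades(gene_trees):
    """Simplified GCPSR-style screen.

    gene_trees: one set per locus, holding that locus's well-supported clades as
    frozensets of taxon names. A clade is kept if it appears in a majority of loci
    (concordance) or is contradicted by no well-supported clade in any locus
    (non-discordance).
    """
    candidates = set().union(*gene_trees)
    kept = set()
    for clade in candidates:
        support = sum(clade in tree for tree in gene_trees)
        majority = support > len(gene_trees) / 2
        contradicted = any(conflicts(clade, other)
                           for tree in gene_trees for other in tree)
        if majority or not contradicted:
            kept.add(clade)
    return kept


# Hypothetical well-supported clades from three loci for strains a-d.
loci = [
    {frozenset({"a", "b"}), frozenset({"c", "d"})},
    {frozenset({"a", "b"})},
    {frozenset({"a", "c"})},
]
print(recognized_clades(loci))
```

Here `{a, b}` is kept by majority, while `{c, d}` and `{a, c}` each appear once and contradict each other (or `{a, b}`), so both are rejected.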
Affiliation(s)
- Tara L Rintoul
- Biodiversity (Mycology), Central Experimental Farm, Agriculture and Agri-Food Canada, Ottawa, ON, Canada
25
Blank CE. An expansion of age constraints for microbial clades that lack a conventional fossil record using phylogenomic dating. J Mol Evol 2011; 73:188-208. [PMID: 22105429 DOI: 10.1007/s00239-011-9467-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/26/2011] [Accepted: 10/24/2011] [Indexed: 01/22/2023]
Abstract
Most microbial taxa lack a conventional microfossil or biomarker record, so we currently have little information about how old most microbial clades and their associated traits are. Building on the previously published oxygen age constraint, two new age constraints are proposed based on the ability of microbial clades to metabolize chitin and aromatic compounds derived from lignin. Using the archaeal domain of life as a test case, phylogenetic analyses, along with published metabolic and genetic data, showed that members of the Halobacteriales and Thermococcales are able to metabolize chitin. Ancestral state reconstruction combined with phylogenetic analysis of the genes underlying chitin degradation predicted that the ancestors of these two groups were also likely able to metabolize chitin or chitin-related compounds. These two clades were therefore assigned a maximum age of 1.0 Ga (when chitin likely first appeared). Similar analyses also predicted that the ancestor of the Sulfolobus solfataricus-Sulfolobus islandicus clade was able to metabolize phenol using catechol dioxygenase, so this clade was assigned a maximum age of 475 Ma. Inferred ages of archaeal clades using relaxed molecular clocks with the new age constraints were consistent with those inferred with the oxygen age constraints. This work expands our current toolkit to include Paleoproterozoic, Neoproterozoic, and Paleozoic age constraints, and should aid our ability to phylogenetically reconstruct the antiquity of a wide array of microbial clades and their associated morphological and biogeochemical traits, spanning deep geologic time. Such hypotheses, although built upon evolutionary inferences, are fundamentally testable.
Affiliation(s)
- Carrine E Blank
- Department of Geosciences, University of Montana, 32 Campus Drive #1296, Missoula, MT 59812-1296, USA.
26
Thuillard M, Moulton V. Identifying and reconstructing lateral transfers from distance matrices by combining the minimum contradiction method and Neighbor-Net. J Bioinform Comput Biol 2011; 9:453-70. [DOI: 10.1142/s0219720011005409] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/06/2010] [Revised: 02/01/2011] [Accepted: 02/13/2011] [Indexed: 11/18/2022]
Abstract
Identifying lateral gene transfers is an important problem in evolutionary biology. Under a simple model of evolution, the expected values of an evolutionary distance matrix describing a phylogenetic tree fulfill the so-called Kalmanson inequalities. The Minimum Contradiction method for identifying lateral gene transfers exploits the fact that lateral transfers may generate large deviations from the Kalmanson inequalities. Here, a new approach to such cases is presented that combines the Neighbor-Net algorithm for computing phylogenetic networks with the Minimum Contradiction method. A subset of taxa, prescribed using Neighbor-Net, is obtained by measuring how closely each taxon fulfills the Kalmanson inequalities. A criterion is then used to identify taxa possibly involved in a lateral transfer between nonconsecutive taxa. We illustrate the utility of the new approach by applying it to a distance matrix for Archaea, Bacteria, and Eukaryota.
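The Kalmanson inequalities mentioned above can be checked directly for a given circular ordering of taxa: for any four taxa w, x, y, z in that order, the two "non-crossing" sums of distances must not exceed the "crossing" sum d(w,y) + d(x,z). This minimal sketch only reports violations for a fixed ordering. It does not implement the Minimum Contradiction method or Neighbor-Net, and the toy tree distances are hypothetical.

```python
from itertools import combinations


def kalmanson_violations(d, order, tol=1e-9):
    """List quadruples of `order` that violate the Kalmanson inequalities.

    d: symmetric dict-of-dicts of distances; order: a circular ordering of taxa.
    For taxa w, x, y, z appearing in that order, a Kalmanson metric satisfies
    max(d(w,x) + d(y,z), d(w,z) + d(x,y)) <= d(w,y) + d(x,z).
    Returns (w, x, y, z, excess) for each violated quadruple.
    """
    violations = []
    for w, x, y, z in combinations(order, 4):  # keeps the positional order
        cross = d[w][y] + d[x][z]
        worst = max(d[w][x] + d[y][z], d[w][z] + d[x][y])
        if worst > cross + tol:
            violations.append((w, x, y, z, worst - cross))
    return violations


def tree_distances():
    """Path lengths on the unrooted tree ((A,B),(C,D)) with every branch of length 1."""
    pairs = {("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 3,
             ("A", "D"): 3, ("B", "C"): 3, ("B", "D"): 3}
    d = {t: {t: 0} for t in "ABCD"}
    for (u, v), dist in pairs.items():
        d[u][v] = d[v][u] = dist
    return d


d = tree_distances()
print(kalmanson_violations(d, ["A", "B", "C", "D"]))  # []
print(kalmanson_violations(d, ["A", "C", "B", "D"]))  # [('A', 'C', 'B', 'D', 2)]
```

A tree metric satisfies the inequalities for an ordering compatible with the tree, while an incompatible ordering, or a distance matrix distorted by a transfer, produces positive excesses.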
Affiliation(s)
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
27
Casaregola S, Weiss S, Morel G. New perspectives in hemiascomycetous yeast taxonomy. C R Biol 2011; 334:590-8. [DOI: 10.1016/j.crvi.2011.05.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/08/2010] [Accepted: 04/01/2011] [Indexed: 12/26/2022]
28
Abstract
We have developed a semi-automatic methodology to reconstruct the phylogenetic species tree in Protozoa, integrating different phylogenetic algorithms and programs, and demonstrating the utility of a supermatrix approach to construct phylogenomics-based trees using 31 universal orthologs (UO). The species tree obtained was formed by three major clades that corresponded to three groups of data: i) species containing at least 80% of the UO (25/31) in the concatenated multiple alignment, or supermatrix, called C1; ii) species containing 50%–79% (15–24/31) of the UO, called C2; and iii) species containing less than 50% (1–14/31) of the UO, called C3. C1 was composed only of protozoan species, C2 of species related to Protozoa, and C3 of some species from C1 (Protozoa) and C2 (related to Protozoa). Our phylogenomics-based methodology using a supermatrix approach proved to be reliable with protozoan genome data when at least 25 UO were used, suggesting that (a) the more UO used, the better, and (b) using the entire UO sequence or just a conserved block of it for the supermatrix produced similar phylogenomic trees.
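The three data groups above follow directly from how many of the 31 universal orthologs each species contributes to the supermatrix. This minimal sketch assigns the C1/C2/C3 labels from those counts. It uses the count boundaries (25–31, 15–24, 1–14) rather than the percentages, because 15/31 is about 48%, so the stated percentage bands are approximate. The function names and toy species are hypothetical.

```python
def coverage_class(n_uo):
    """Assign C1/C2/C3 from the number of the 31 universal orthologs present."""
    if not 1 <= n_uo <= 31:
        raise ValueError("expected between 1 and 31 universal orthologs")
    if n_uo >= 25:
        return "C1"
    if n_uo >= 15:
        return "C2"
    return "C3"


def classify(species_uos):
    """species_uos: dict species -> set of UO identifiers found in that species."""
    return {species: coverage_class(len(uos)) for species, uos in species_uos.items()}


# Hypothetical species with 28, 20 and 9 of the 31 UOs.
toy = {"sp1": set(range(28)), "sp2": set(range(20)), "sp3": set(range(9))}
print(classify(toy))  # {'sp1': 'C1', 'sp2': 'C2', 'sp3': 'C3'}
```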
29
Kupczok A, Schmidt HA, von Haeseler A. Accuracy of phylogeny reconstruction methods combining overlapping gene data sets. Algorithms Mol Biol 2010; 5:37. [PMID: 21134245 PMCID: PMC3022592 DOI: 10.1186/1748-7188-5-37] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 03/09/2010] [Accepted: 12/06/2010] [Indexed: 11/17/2022]
Abstract
Background The availability of many gene alignments with overlapping taxon sets raises the question of which strategy is the best to infer species phylogenies from multiple gene information. Methods and programs abound that use the gene alignment in different ways to reconstruct the species tree. In particular, different methods combine the original data at different points along the way from the underlying sequences to the final tree. Accordingly, they are classified into superalignment, supertree and medium-level approaches. Here, we present a simulation study to compare different methods from each of these three approaches. Results We observe that superalignment methods usually outperform the other approaches over a wide range of parameters including sparse data and gene-specific evolutionary parameters. In the presence of high incongruency among gene trees, however, other combination methods show better performance than the superalignment approach. Surprisingly, some supertree and medium-level methods exhibit, on average, worse results than a single gene phylogeny with complete taxon information. Conclusions For some methods, using the reconstructed gene tree as an estimation of the species tree is superior to the combination of incomplete information. Superalignment usually performs best since it is less susceptible to stochastic error. Supertree methods can outperform superalignment in the presence of gene-tree conflict.
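Superalignment combines the data earliest, by concatenating the per-gene alignments into one matrix before any tree is built. The sketch below assumes each gene alignment is a mapping from taxon name to aligned sequence. Taxa absent from a gene are filled with gap characters, which is a common convention for sparse data sets but an assumption here. `build_superalignment` is a hypothetical name.

```python
def build_superalignment(genes: dict[str, dict[str, str]]) -> dict[str, str]:
    """Concatenate per-gene alignments into one superalignment (supermatrix).

    genes maps gene name -> {taxon -> aligned sequence}. Sequences within one
    gene alignment must share a length; taxa missing from a gene get gaps.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    rows: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for name in sorted(genes):  # fixed gene order so partitions are reproducible
        aln = genes[name]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {name!r} is empty or not a proper alignment")
        width = lengths.pop()
        for taxon in taxa:
            # Missing data: pad absent taxa with a block of gaps of the gene's width.
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}
```

Padding with gaps keeps every partition aligned across taxa, so the concatenated columns still correspond to homologous positions within each gene.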
30
Valdivia-Granda WA. Bioinformatics for biodefense: challenges and opportunities. Biosecur Bioterror 2010; 8:69-77. [PMID: 20230234 DOI: 10.1089/bsp.2009.0024]
Abstract
The intentional release of traditional or combinatorial bioweapons remains one of the most important challenges that will continue to shape homeland security. The misuse of dual-use and how-to methods and techniques in the fields of molecular, synthetic, and computational biology can lessen the technical barriers for launching attacks, even for small groups or individuals. Bioinformatics is guiding the implementation of several biodefense countermeasures. However, existing algorithms have not effectively translated available pathogen genomic data into standardized diagnostics, rational vaccine development, or broad spectrum therapeutics. Despite its potential, bioinformatics has a limited impact on forensic and intelligence operations. More than 12 biodefense databases and information exchange architectures lack interoperability and a common layer that restricts scalability and the development of biodefense enterprises. Therefore, in order to use next-generation genome sequencing for medical intelligence, forensic operations, biothreat awareness, and mitigation, the attention has to be redirected toward the development of computational biology applications. This article debates some of the challenges that the bioinformatics field confronts in terms of biodefense problems and proposes potential opportunities to use pathogen genomic data. Issues related to the analysis of pathogen genomes and emerging methods including genomic barcoding, active curation, and knowledge management and their impact on intelligence, forensics, and policymaking are discussed.
31
En route to a genome-based classification of Archaea and Bacteria? Syst Appl Microbiol 2010; 33:175-82. [PMID: 20409658 DOI: 10.1016/j.syapm.2010.03.003]
Abstract
Given the considerable promise whole-genome sequencing offers for phylogeny and classification, it is surprising that microbial systematics and genomics have not yet been reconciled. This might be due to the intrinsic difficulties in inferring reasonable phylogenies from genomic sequences, particularly in the light of the significant amount of lateral gene transfer in prokaryotic genomes. However, recent studies indicate that the species tree and the hierarchical classification based on it are still meaningful concepts, and that state-of-the-art phylogenetic inference methods are able to provide reliable estimates of the species tree to the benefit of taxonomy. Conversely, we suspect that the current lack of completely sequenced genomes for many of the major lineages of prokaryotes and for most type strains is a major obstacle in progress towards a genome-based classification of microorganisms. We conclude that phylogeny-driven microbial genome sequencing projects such as the Genomic Encyclopaedia of Archaea and Bacteria (GEBA) project are likely to rectify this situation.
32
Ebersberger I, Strauss S, von Haeseler A. HaMStR: profile hidden markov model based search for orthologs in ESTs. BMC Evol Biol 2009; 9:157. [PMID: 19586527 PMCID: PMC2723089 DOI: 10.1186/1471-2148-9-157]
Abstract
Background EST sequencing is a versatile approach for rapidly gathering protein coding sequences. ESTs provide direct access to an organism's gene repertoire, bypassing the still error-prone procedure of gene prediction from genomic data. Therefore, ESTs are often the only source for biological sequence data from taxa outside mainstream interest. The widespread use of ESTs in evolutionary studies and particularly in molecular systematics studies is still hindered by the lack of efficient and reliable approaches for automated ortholog predictions in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data. Results We present a novel approach (HaMStR) to mine EST data for the presence of orthologs to a curated set of genes. HaMStR combines a profile Hidden Markov Model search and a subsequent BLAST search to extend existing ortholog clusters with sequences from further taxa. We show that the HaMStR results are consistent with those obtained with existing orthology prediction methods that require completely sequenced genomes. A case study on the phylogeny of 35 fungal taxa illustrates that HaMStR is well suited to compile informative data sets for phylogenomic studies from ESTs and protein sequence data. Conclusion HaMStR extends in a standardized manner a pre-defined set of orthologs with ESTs from further taxa. In the same fashion HaMStR can be applied to protein sequence data, and thus provides a comprehensive approach to compile ortholog clusters from any protein coding data. The resulting orthology predictions serve as the data basis for a variety of evolutionary studies. Here, we have demonstrated the application of HaMStR in a molecular systematics study. However, we envision that studies tracing the evolutionary fate of individual genes or functional complexes of genes will greatly benefit from HaMStR orthology predictions as well.
Affiliation(s)
- Ingo Ebersberger
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Vienna, Austria.
33
The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A 2009; 106:7273-80. [PMID: 19351897 DOI: 10.1073/pnas.0901808106]
Abstract
The evolutionary rates of protein-coding genes in an organism span, approximately, 3 orders of magnitude and show a universal, approximately log-normal distribution in a broad variety of species from prokaryotes to mammals. This universal distribution implies a steady-state process, with identical distributions of evolutionary rates among genes that are gained and genes that are lost. A mathematical model of such a process is developed under the single assumption of the constancy of the distributions of the propensities for gene loss (PGL). This model predicts that genes of different ages, that is, genes with homologs detectable at different phylogenetic depths, substantially differ in those variables that correlate with PGL. We computationally partition protein-coding genes from humans, flies, and Aspergillus fungus into age classes, and show that genes of different ages retain the universal log-normal distribution of evolutionary rates, with a shift toward higher rates in "younger" classes but also with a substantial overlap. The only exception involves human primate-specific genes that show a heavy tail of rapidly evolving genes, probably owing to gene annotation artifacts. As predicted, the gene age classes differ in characteristics correlated with PGL. Compared with "young" genes (e.g., mammal-specific human ones), "old" genes (e.g., eukaryote-specific), on average, are longer, are expressed at a higher level, possess a higher intron density, evolve slower on the short time scale, and are subject to stronger purifying selection. Thus, genome evolution fits a simple model with approximately uniform rates of gene gain and loss, without major bursts of genomic innovation.
34
Langille MGI, Brinkman FSL. Bioinformatic detection of horizontally transferred DNA in bacterial genomes. F1000 Biol Rep 2009; 1:25. [PMID: 20948661 PMCID: PMC2920674 DOI: 10.3410/b1-25]
Abstract
We highlight a selection of recent research on computational methods and associated challenges surrounding the prediction of bacterial horizontal gene transfer. This research area continues to face controversy, but is becoming more critical as the importance of horizontal gene transfer in medically and ecologically important prokaryotic evolution is further appreciated.
Affiliation(s)
- Morgan G I Langille
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada V5A 1S6
35
Nikolaou E, Agrafioti I, Stumpf M, Quinn J, Stansfield I, Brown AJP. Phylogenetic diversity of stress signalling pathways in fungi. BMC Evol Biol 2009; 9:44. [PMID: 19232129 PMCID: PMC2666651 DOI: 10.1186/1471-2148-9-44]
Abstract
Background Microbes must sense environmental stresses, transduce these signals and mount protective responses to survive in hostile environments. In this study we have tested the hypothesis that fungal stress signalling pathways have evolved rapidly in a niche-specific fashion that is independent of phylogeny. To test this hypothesis we have compared the conservation of stress signalling molecules in diverse fungal species with their stress resistance. These fungi, which include ascomycetes, basidiomycetes and microsporidia, occupy highly divergent niches from saline environments to plant or mammalian hosts. Results The fungi displayed significant variation in their resistance to osmotic (NaCl and sorbitol), oxidative (H2O2 and menadione) and cell wall stresses (Calcofluor White and Congo Red). There was no strict correlation between fungal phylogeny and stress resistance. Rather, the human pathogens tended to be more resistant to all three types of stress, an exception being the sensitivity of Candida albicans to the cell wall stress, Calcofluor White. In contrast, the plant pathogens were relatively sensitive to oxidative stress. The degree of conservation of osmotic, oxidative and cell wall stress signalling pathways amongst the eighteen fungal species was examined. Putative orthologues of functionally defined signalling components in Saccharomyces cerevisiae were identified by performing reciprocal BLASTP searches, and the percent amino acid identities of these orthologues recorded. This revealed that in general, central components of the osmotic, oxidative and cell wall stress signalling pathways are relatively well conserved, whereas the sensors lying upstream and transcriptional regulators lying downstream of these modules have diverged significantly. There was no obvious correlation between the degree of conservation of stress signalling pathways and the resistance of a particular fungus to the corresponding stress. 
Conclusion Our data are consistent with the hypothesis that fungal stress signalling components have undergone rapid recent evolution to tune the stress responses in a niche-specific fashion.
Affiliation(s)
- Elissavet Nikolaou
- Aberdeen Fungal Group, School of Medical Sciences, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen, AB25 2ZD, UK.
36
The tree versus the forest: the fungal tree of life and the topological diversity within the yeast phylome. PLoS One 2009; 4:e4357. [PMID: 19190756 PMCID: PMC2629814 DOI: 10.1371/journal.pone.0004357]
Abstract
A recurrent topic in phylogenomics is the combination of various sequence alignments to reconstruct a tree that describes the evolutionary relationships within a group of species. However, such an approach has been criticized for not being able to properly represent the topological diversity found among gene trees. To evaluate the representativeness of species trees based on concatenated alignments, we reconstruct several fungal species trees and compare them with the complete collection of phylogenies of genes encoded in the Saccharomyces cerevisiae genome. We found that, despite high levels of among-gene topological variation, the species trees do represent widely supported phylogenetic relationships. Most topological discrepancies between gene and species trees are concentrated in certain conflicting nodes. We propose to map such information on the species tree so that it accounts for the levels of congruence across the genome. We identified the lack of sufficient accuracy of current alignment and phylogenetic methods as an important source for the topological diversity encountered among gene trees. Finally, we discuss the implications of the high levels of topological variation for phylogeny-based orthology prediction strategies.
37
Abstract
A universal Tree of Life has been a longstanding goal of the biosciences. The most common Tree of Life, based on the small subunit rRNA gene, may or may not represent the phylogenetic history of microorganisms. The horizontal transfer of genes from one taxon to another provides a means by which each gene may tell of an independent history. When complete genomes became available, the extent to which horizontal gene transfer (HGT) has occurred became more evident. When using genomic data to study the Tree of Life, one can use any of four broad approaches: (i) build lots of individual gene trees ("phylogenomics"), (ii) concatenate genes together for an analysis yielding one "supergene" tree, (iii) form a single tree based on the "gene content" within genomes using either orthologs or homologs, or (iv) investigate the order of genes within genomes to discern some aspects of microbial evolution. The application of whole genome tree building has suggested that there is a core tree, that such a core tree can be investigated using these varied methods, and that the results are largely similar to those of the rRNA universal Tree of Life. Some of the most interesting features of the rRNA tree, such as early diverging hyperthermophilic lineages, are still uncertain but remain a possibility. Genomic trees and geologic evidence together suggest that the vertical descent of genes and the horizontal transfer of genes between genetically similar lineages ultimately result in a core Tree of Life with at least some lineages that have phenotypic characteristics recognizable for billions of years.
Affiliation(s)
- Christopher H House
- Department of Geosciences and Pennsylvania State Astrobiology Research Center, Pennsylvania State University, University Park, PA, USA
38
39
Liu Y, Leigh JW, Brinkmann H, Cushion MT, Rodriguez-Ezpeleta N, Philippe H, Lang BF. Phylogenomic analyses support the monophyly of Taphrinomycotina, including Schizosaccharomyces fission yeasts. Mol Biol Evol 2008; 26:27-34. [PMID: 18922765 DOI: 10.1093/molbev/msn221]
Abstract
Several morphologically dissimilar ascomycete fungi including Schizosaccharomyces, Taphrina, Saitoella, Pneumocystis, and Neolecta have been grouped into the taxon Taphrinomycotina (Archiascomycota or Archiascomycotina), originally based on rRNA phylogeny. These analyses lack statistically significant support for the monophyly of this grouping, and although confirmed by more recent multigene analyses, this topology is contradicted by mitochondrial phylogenies. To resolve this inconsistency, we have assembled phylogenomic mitochondrial and nuclear data sets from four distantly related taphrinomycotina taxa: Schizosaccharomyces pombe, Pneumocystis carinii, Saitoella complicata, and Taphrina deformans. Our phylogenomic analyses based on nuclear data (113 proteins) conclusively support the monophyly of Taphrinomycotina, diverging as a sister group to Saccharomycotina + Pezizomycotina. However, despite the improved taxon sampling, Taphrinomycotina continue to be paraphyletic with the mitochondrial data set (13 proteins): Schizosaccharomyces species associate with budding yeasts (Saccharomycotina) and the other Taphrinomycotina group as a sister group to Saccharomycotina + Pezizomycotina. Yet, as Schizosaccharomyces and Saccharomycotina species are fast evolving, the mitochondrial phylogeny may be influenced by a long-branch attraction (LBA) artifact. After removal of fast-evolving sequence positions from the mitochondrial data set, we recover the monophyly of Taphrinomycotina. Our combined results suggest that Taphrinomycotina is a legitimate taxon, that this group of species diverges as a sister group to Saccharomycotina + Pezizomycotina, and that phylogenetic positioning of yeasts and fission yeasts with mitochondrial data is plagued by a strong LBA artifact.
Affiliation(s)
- Yu Liu
- Robert Cedergren Centre, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
40
Abstract
Minimum contradiction matrices are a useful complement to distance-based phylogenies. A minimum contradiction matrix represents phylogenetic information in the form of an ordered distance matrix $Y_{i,j}^{n}$. A matrix element corresponds to the distance from a reference vertex $n$ to the path $(i, j)$. For an X-tree or a split network, the minimum contradiction matrix is a Robinson matrix. It therefore fulfills all the inequalities defining perfect order: $Y_{i,j}^{n} \geq Y_{i,k}^{n}$, $Y_{k,j}^{n} \geq Y_{k,l}^{n}$, $i$ …
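The perfect-order property can be checked mechanically. This sketch assumes a symmetric matrix that is already arranged in the chosen taxon order. It uses the similarity-style Robinson convention implied by the inequalities in the abstract, in which values must not increase when moving away from the diagonal. It lists every index triple i < j < k that breaks that order. The function name is hypothetical, and the sketch does not attempt to recover the truncated end of the abstract.

```python
from itertools import combinations
from typing import Sequence


def robinson_violations(y: Sequence[Sequence[float]]) -> list[tuple[int, int, int]]:
    """Return the triples (i, j, k), i < j < k, that break perfect order.

    Assumes y is symmetric. For a Robinson matrix in this convention, moving
    right along a row or up along a column away from the diagonal must never
    increase the value, i.e. y[i][k] <= min(y[i][j], y[j][k]).
    """
    bad = []
    for i, j, k in combinations(range(len(y)), 3):  # yields i < j < k
        if y[i][j] < y[i][k] or y[j][k] < y[i][k]:
            bad.append((i, j, k))
    return bad
```

Because the matrix is symmetric, checking the upper triangle this way covers both the row-wise and column-wise inequalities. The number of violating triples gives a crude measure of how far a given ordering is from perfect order.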
41
Zwick ME, Kiley MP, Stewart AC, Mateczun A, Read TD. Genotyping of Bacillus cereus strains by microarray-based resequencing. PLoS One 2008; 3:e2513. [PMID: 18596941 PMCID: PMC2438477 DOI: 10.1371/journal.pone.0002513]
Abstract
The ability to distinguish microbial pathogens from closely related but nonpathogenic strains is key to understanding the population biology of these organisms. In this regard, Bacillus anthracis, the bacterium that causes inhalational anthrax, is of interest because it is closely related to, and often difficult to distinguish from, other members of the B. cereus group that can cause diverse diseases. We employed custom-designed resequencing arrays (RAs) based on the genome sequence of Bacillus anthracis to generate 422 kb of genomic sequence from a panel of 41 Bacillus cereus sensu lato strains. Here we show that RAs represent a “one reaction” genotyping technology with the ability to discriminate between highly similar B. anthracis isolates and more divergent strains of the B. cereus s.l. Clade 1. Our data show that RAs can be an efficient genotyping technology for pre-screening the genetic diversity of large strain collections to select the best candidates for whole genome sequencing.
Affiliation(s)
- Michael E Zwick
- Biological Defense Research Directorate, Naval Medical Research Center, Silver Spring, Maryland, United States of America.
42
An Exact Algorithm for the Geodesic Distance between Phylogenetic Trees. J Comput Biol 2008; 15:577-91. [DOI: 10.1089/cmb.2008.0068]
43
Abstract
Gene content has been shown to contain a strong phylogenetic signal, yet its usage for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss and until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition. We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that ∼92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. Summarizing, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
Affiliation(s)
- Bas E Dutilh
- Center for Molecular and Biomolecular Informatics/Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
44
Dutilh BE, He Y, Hekkelman ML, Huynen MA. Signature, a web server for taxonomic characterization of sequence samples using signature genes. Nucleic Acids Res 2008; 36:W470-4. [PMID: 18487625 PMCID: PMC2447722 DOI: 10.1093/nar/gkn277]
Abstract
Signature genes are genes that are unique to a taxonomic clade and are common within it. They contain a wealth of information about clade-specific processes and hold a strong evolutionary signal that can be used to phylogenetically characterize a set of sequences, such as a metagenomics sample. As signature genes are based on gene content, they provide a means to assess the taxonomic origin of a sequence sample that is complementary to sequence-based analyses. Here, we introduce Signature (http://www.cmbi.ru.nl/signature), a web server that identifies the signature genes in a set of query sequences, and therewith phylogenetically characterizes it. The server produces a list of taxonomic clades that share signature genes with the set of query sequences, along with an insightful image of the tree of life, in which the clades are color coded based on the number of signature genes present. This allows the user to quickly see from which part(s) of the taxonomy the query sequences likely originate.
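The definition above, genes unique to a clade and common within it, can be turned into a small presence/absence filter. The following is a minimal sketch under stated assumptions, not the Signature server's algorithm. It assumes gene identifiers are already orthologous-group IDs shared across genomes. The `min_within` threshold of 0.9 and both function names are hypothetical. It counts how many signature genes of each clade occur in a query sample, mirroring the per-clade counts the server colour-codes.

```python
def signature_genes(presence: dict[str, set[str]], clade: set[str],
                    min_within: float = 0.9) -> set[str]:
    """Genes present in >= min_within of the clade's genomes and in no genome outside it."""
    if not clade:
        raise ValueError("clade must contain at least one genome")
    inside = [presence[genome] for genome in clade]
    outside = [genes for genome, genes in presence.items() if genome not in clade]
    # "Unique to the clade": never observed in any genome outside it.
    candidates = set().union(*inside) - set().union(*outside)
    # "Common within it": present in a large enough share of clade members.
    return {
        gene for gene in candidates
        if sum(gene in genes for genes in inside) / len(inside) >= min_within
    }


def score_sample(query_genes: set[str], clades: dict[str, set[str]],
                 presence: dict[str, set[str]]) -> dict[str, int]:
    """Number of each clade's signature genes found among the query genes."""
    return {name: len(signature_genes(presence, members) & query_genes)
            for name, members in clades.items()}
```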
Affiliation(s)
- Bas E Dutilh
- Center for Molecular and Biomolecular Informatics/Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands.
45
Wu Q, James SA, Roberts IN, Moulton V, Huber KT. Exploring contradictory phylogenetic relationships in yeasts. FEMS Yeast Res 2008; 8:641-50. [DOI: 10.1111/j.1567-1364.2008.00362.x]
46
Kensche PR, van Noort V, Dutilh BE, Huynen MA. Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution. J R Soc Interface 2008; 5:151-70. [PMID: 17535793 PMCID: PMC2405902 DOI: 10.1098/rsif.2007.1047]
Abstract
The gap between the amount of genome information released by genome sequencing projects and our knowledge about the proteins' functions is rapidly increasing. To fill this gap, various 'genomic-context' methods have been proposed that exploit sequenced genomes to predict the functions of the encoded proteins. One class of methods, phylogenetic profiling, predicts protein function by correlating the phylogenetic distribution of genes with that of other genes or phenotypic characteristics. The functions of a number of proteins, including ones of medical relevance, have thus been predicted and subsequently confirmed experimentally. Additionally, various approaches to measure the similarity of phylogenetic profiles and to account for the phylogenetic bias in the data have been proposed. We review the successful applications of phylogenetic profiling and analyse the performance of various profile similarity measures with a set of one microsporidial and 25 fungal genomes. In the fungi, phylogenetic profiling yields high-confidence predictions for the highest and only the highest scoring gene pairs illustrating both the power and the limitations of the approach. Both practical examples and theoretical considerations suggest that in order to get a reliable and specific picture of a protein's function, results from phylogenetic profiling have to be combined with other sources of evidence.
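Phylogenetic profiling can be illustrated with binary presence/absence vectors across a fixed list of genomes. The sketch below uses Pearson correlation as the profile similarity measure. This is only one of the several measures the abstract says were compared, and choosing it here is an assumption. The code ranks all gene pairs so that the highest-scoring pairs, the ones the abstract reports as reliable, come first. Function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def profile_correlation(p: list[int], q: list[int]) -> float:
    """Pearson correlation between two 0/1 presence profiles over the same genomes."""
    n = len(p)
    if n == 0 or n != len(q):
        raise ValueError("profiles must be non-empty and cover the same genomes")
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    var_p = sum((a - mp) ** 2 for a in p)
    var_q = sum((b - mq) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0  # a gene present (or absent) in every genome carries no profile signal
    return cov / sqrt(var_p * var_q)


def rank_pairs(profiles: dict[str, list[int]]) -> list[tuple[float, str, str]]:
    """Score every gene pair and return them with the highest correlation first."""
    scored = [(profile_correlation(profiles[a], profiles[b]), a, b)
              for a, b in combinations(sorted(profiles), 2)]
    return sorted(scored, reverse=True)
```

Only the top of the ranked list would be treated as candidate functional links. The abstract itself notes that such predictions need to be combined with other sources of evidence.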
Affiliation(s)
- Philip R. Kensche
- Centre for Molecular and Biomolecular Informatics/Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Vera van Noort
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Bas E. Dutilh
- Centre for Molecular and Biomolecular Informatics/Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Martijn A. Huynen
- Centre for Molecular and Biomolecular Informatics/Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
47
Phylogenomics, Protein Family Evolution, and the Tree of Life: An Integrated Approach between Molecular Evolution and Computational Intelligence. Applications of Computational Intelligence in Biology 2008. [DOI: 10.1007/978-3-540-78534-7_11]
48
Marcet-Houben M, Puigbò P, Romeu A, Garcia-Vallve S. Towards reconstructing a metabolic tree of life. Bioinformation 2007; 2:135-44. [PMID: 21670791 PMCID: PMC2255071 DOI: 10.6026/97320630002135]
Abstract
Using information from several metabolic databases, we have built our own metabolic database containing 434 pathways and 1157 different enzymes. We have used this information to construct a dendrogram that demonstrates the metabolic similarities between 282 species. The resulting species distribution and the clusters defined in the tree show a certain taxonomic congruence, especially in recent relationships between species. This dendrogram is another representation of the tree of life, based on metabolism that may complement the trees constructed by other methods. For example, the metabolic dissimilarity we demonstrate between Symbiobacterium thermophilum (previously defined as Actinobacteria) and the other Actinobacteria species, and the metabolic similarity between S. thermophilum and Clostridia, combined with other evidence, suggest that S. thermophilum may be re-classified as Firmicutes, Clostridia.
Affiliation(s)
- Marina Marcet-Houben
- Evolutionary Genomics Group, Biochemistry and Biotechnology Department, Rovira i Virgili University, Campus Sescelades, c/ Marcel·lí Domingo s/n, 43007 Tarragona, Spain
- Pere Puigbò
- Evolutionary Genomics Group, Biochemistry and Biotechnology Department, Rovira i Virgili University, Campus Sescelades, c/ Marcel·lí Domingo s/n, 43007 Tarragona, Spain
- Antoni Romeu
- Evolutionary Genomics Group, Biochemistry and Biotechnology Department, Rovira i Virgili University, Campus Sescelades, c/ Marcel·lí Domingo s/n, 43007 Tarragona, Spain
- Santiago Garcia-Vallve
- Evolutionary Genomics Group, Biochemistry and Biotechnology Department, Rovira i Virgili University, Campus Sescelades, c/ Marcel·lí Domingo s/n, 43007 Tarragona, Spain
49
Lemoine F, Lespinet O, Labedan B. Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data. BMC Evol Biol 2007; 7:237. [PMID: 18047665 PMCID: PMC2238764 DOI: 10.1186/1471-2148-7-237]
Abstract
Background Comparison of completely sequenced microbial genomes has revealed how fluid these genomes are. Detecting synteny blocks requires reliable methods for determining the orthologs among the whole set of homologs detected by exhaustive comparisons between each pair of completely sequenced genomes. This is a complex and difficult problem in comparative genomics, but solving it will help us better understand how prokaryotic genomes evolve. Results We have developed a suite of programs that automate three essential steps in studying the conservation of gene order, and validated them with a set of 107 bacteria and archaea that cover the majority of the prokaryotic taxonomic space. We identified the whole set of homologs shared between two or more species and computed the evolutionary distance separating each pair of homologs. We applied two strategies to extract from the set of homologs a collection of valid orthologs shared by at least two genomes. The first computes the Reciprocal Smallest Distance (RSD) using the PAM distances separating pairs of homologs. The second groups homologs into families and reconstructs each family's evolutionary tree, distinguishing bona fide orthologs as well as paralogs created after the last speciation event. Although the phylogenetic tree method often succeeds where RSD fails, the reverse is occasionally true. Accordingly, we used the data obtained with either method, or their intersection, to enumerate, for each pair of genomes, the orthologs that are adjacent in both genomes, the Positional Orthologous Genes (POGs), and to further study their properties. Once all these synteny blocks had been detected, we showed that POGs are subject to more evolutionary constraints than orthologs outside synteny groups, whatever the taxonomic distance separating the compared organisms.
Conclusion The suite of programs described in this paper allows reliable detection of orthologs and is useful for evaluating gene order conservation in prokaryotes, whatever their taxonomic distance. Thus, our approach will facilitate the rapid identification of POGs in the coming years, when we expect to be inundated with thousands of completely sequenced microbial genomes.
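The two ideas summarised in this abstract, pairing orthologs by reciprocal smallest distance and then keeping orthologs that sit next to each other in both genomes, can be illustrated with a short Python sketch. This is not the authors' program suite and covers only the RSD strategy, not the tree-based one. It assumes the pairwise distances (PAM distances in the paper) are already computed and supplied as a dictionary, breaks distance ties by input order, and uses a simplified, hypothetical adjacency rule: an ortholog pair counts as positional if some other ortholog pair is its immediate neighbour in both genomes, with optional wrap-around for circular chromosomes. All function names and the toy data are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: dist[(a, b)] is a precomputed evolutionary
# distance (e.g. PAM) between gene a of genome A and gene b of genome B.
# Pairs absent from the dict are treated as non-homologous.
Distances = Dict[Tuple[str, str], float]
Pair = Tuple[str, str]


def reciprocal_smallest_distance(dist: Distances) -> Set[Pair]:
    """Keep (a, b) when b is a's closest homolog in B and a is b's closest in A."""
    best_in_b: Dict[str, Tuple[float, str]] = {}  # a -> (distance, closest b)
    best_in_a: Dict[str, Tuple[float, str]] = {}  # b -> (distance, closest a)
    for (a, b), d in dist.items():
        # Strict '<' means ties are broken by dict insertion order.
        if a not in best_in_b or d < best_in_b[a][0]:
            best_in_b[a] = (d, b)
        if b not in best_in_a or d < best_in_a[b][0]:
            best_in_a[b] = (d, a)
    return {(a, b) for a, (_, b) in best_in_b.items() if best_in_a[b][1] == a}


def positional_orthologs(
    orthologs: Set[Pair],
    order_a: List[str],
    order_b: List[str],
    circular: bool = True,
) -> Set[Pair]:
    """Keep ortholog pairs that have another ortholog pair as a neighbour in both genomes."""
    pos_a = {gene: i for i, gene in enumerate(order_a)}
    pos_b = {gene: i for i, gene in enumerate(order_b)}

    def adjacent(i: int, j: int, n: int) -> bool:
        gap = abs(i - j)
        # On a circular chromosome the first and last genes are also neighbours.
        return gap == 1 or (circular and n > 2 and gap == n - 1)

    pogs: Set[Pair] = set()
    for a1, b1 in orthologs:
        for a2, b2 in orthologs:
            if (a1, b1) == (a2, b2):
                continue
            if adjacent(pos_a[a1], pos_a[a2], len(order_a)) and adjacent(
                pos_b[b1], pos_b[b2], len(order_b)
            ):
                pogs.add((a1, b1))
                break
    return pogs


if __name__ == "__main__":
    toy = {("a1", "b1"): 0.1, ("a1", "b2"): 0.5, ("a2", "b2"): 0.2,
           ("a2", "b1"): 0.4, ("a3", "b4"): 0.3, ("a4", "b4"): 0.6}
    orth = reciprocal_smallest_distance(toy)
    print(sorted(orth))  # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b4')]
    print(sorted(positional_orthologs(orth, ["a1", "a2", "a3", "a4"],
                                      ["b1", "b2", "b3", "b4"])))
    # [('a1', 'b1'), ('a2', 'b2')]
```

With the toy distances, a4 is rejected by RSD because the closest gene in A to b4 is a3, and a3/b4 is not positional because its neighbours in A and B do not correspond. The nested loop is quadratic in the number of orthologs, which is acceptable for a sketch but not for genome-scale comparisons.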
Affiliation(s)
- Frédéric Lemoine
- Institut de Génétique et Microbiologie, CNRS UMR 8621, Bâtiment 400, Université Paris Sud XI, 91405 Orsay Cedex, France.
50
|
Koonin EV. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007; 2:21. [PMID: 17708768 PMCID: PMC1973067 DOI: 10.1186/1745-6150-2-21] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Received: 08/15/2007] [Accepted: 08/20/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Major transitions in biological evolution show the same pattern of sudden emergence of diverse forms at a new level of complexity. The relationships between major groups within an emergent new class of biological entities are hard to decipher and do not seem to fit the tree pattern that, following Darwin's original proposal, remains the dominant description of biological evolution. The cases in point include the origin of complex RNA molecules and protein folds; major groups of viruses; archaea and bacteria, and the principal lineages within each of these prokaryotic domains; eukaryotic supergroups; and animal phyla. In each of these pivotal nexuses in life's history, the principal "types" seem to appear rapidly and fully equipped with the signature features of the respective new level of biological organization. No intermediate "grades" or intermediate forms between different types are detectable. Usually, this pattern is attributed to cladogenesis compressed in time, combined with the inevitable erosion of the phylogenetic signal. HYPOTHESIS I propose that most or all major evolutionary transitions that show the "explosive" pattern of emergence of new types of biological entities correspond to a boundary between two qualitatively distinct evolutionary phases. The first, inflationary phase is characterized by extremely rapid evolution driven by various processes of genetic information exchange, such as horizontal gene transfer, recombination, fusion, fission, and spread of mobile elements. These processes give rise to a vast diversity of forms from which the main classes of entities at the new level of complexity emerge independently, through a sampling process. In the second phase, evolution dramatically slows down, the respective process of genetic information exchange tapers off, and multiple lineages of the new type of entities emerge, each of them evolving in a tree-like fashion from that point on. 
This biphasic model of evolution incorporates the previously developed concepts of the emergence of protein folds by recombination of small structural units and of the origin of viruses and cells from a pre-cellular compartmentalized pool of recombining genetic elements. The model is extended to encompass other major transitions. It is proposed that bacterial and archaeal phyla emerged independently from two distinct populations of primordial cells that originally possessed leaky membranes, which made the cells prone to rampant gene exchange; and that the eukaryotic supergroups emerged through distinct, secondary endosymbiotic events (as opposed to the primary, mitochondrial endosymbiosis). This biphasic model of evolution is substantially analogous to the scenario of the origin of universes in the eternal inflation version of modern cosmology. Under this model, universes like ours emerge in the infinite multiverse when the eternal process of exponential expansion, known as inflation, ceases in a particular region as a result of false vacuum decay, a first-order phase transition. The result is the nucleation of a new universe, which is traditionally called the Big Bang, although this scenario is radically different from the Big Bang of the traditional model of an expanding universe. Hence I call the phase transitions at the end of each inflationary epoch in the history of life Biological Big Bangs (BBB).
CONCLUSION A Biological Big Bang (BBB) model is proposed for the major transitions in life's evolution. According to this model, each transition is a BBB such that new classes of biological entities emerge at the end of a rapid phase of evolution (inflation) that is characterized by extensive exchange of genetic information, which takes distinct forms for different BBBs. The major types of new forms emerge independently, via a sampling process, from the pool of recombining entities of the preceding generation. This process is envisaged as being qualitatively different from tree-pattern cladogenesis.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.