1
Salichos L, Warrell J, Cevasco H, Chung A, Gerstein M. Genetic determination of regional connectivity in modelling the spread of COVID-19 outbreak for more efficient mitigation strategies. Sci Rep 2023; 13:8470. [PMID: 37231011 DOI: 10.1038/s41598-023-34959-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 05/10/2023] [Indexed: 05/27/2023] Open
Abstract
For the COVID-19 pandemic, viral transmission has been documented in many historical and geographical contexts. Nevertheless, few studies have explicitly modeled the spatiotemporal flow based on genetic sequences, to develop mitigation strategies. Additionally, thousands of SARS-CoV-2 genomes have been sequenced with associated records, potentially providing a rich source for such spatiotemporal analysis, an unprecedented amount during a single outbreak. Here, in a case study of seven states, we model the first wave of the outbreak by determining regional connectivity from phylogenetic sequence information (i.e. "genetic connectivity"), in addition to traditional epidemiologic and demographic parameters. Our study shows nearly all of the initial outbreak can be traced to a few lineages, rather than disconnected outbreaks, indicative of a mostly continuous initial viral flow. While the geographic distance from hotspots is initially important in the modeling, genetic connectivity becomes increasingly significant later in the first wave. Moreover, our model predicts that isolated local strategies (e.g. relying on herd immunity) can negatively impact neighboring regions, suggesting more efficient mitigation is possible with unified, cross-border interventions. Finally, our results suggest that a few targeted interventions based on connectivity can have an effect similar to that of an overall lockdown. They also suggest that while successful lockdowns are very effective in mitigating an outbreak, less disciplined lockdowns quickly decrease in effectiveness. Our study provides a framework for combining phylodynamic and computational methods to identify targeted interventions.
Affiliation(s)
- Leonidas Salichos
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
  - Biological and Chemical Sciences, New York Institute of Technology, Manhattan, NY, 10023, USA.
- Jonathan Warrell
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Hannah Cevasco
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Alvin Chung
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA.
  - Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
  - Center for Biomedical Data Science, Yale University, New Haven, CT, 06520, USA.
  - Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA.
2
Wedell E, Cai Y, Warnow T. SCAMPP: Scaling Alignment-Based Phylogenetic Placement to Large Trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1417-1430. [PMID: 35471888 DOI: 10.1109/tcbb.2022.3170386] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/04/2023]
Abstract
Phylogenetic placement, the problem of placing a "query" sequence into a precomputed phylogenetic "backbone" tree, is useful for constructing large trees, performing taxon identification of newly obtained sequences, and other applications. The most accurate current methods, such as pplacer and EPA-ng, are based on maximum likelihood and require that the query sequence be provided within a multiple sequence alignment that includes the leaf sequences in the backbone tree. This approach enables high accuracy but also makes these likelihood-based methods computationally intensive on large backbone trees, and can even lead to them failing when the backbone trees are very large (e.g., having 50,000 or more leaves). We present SCAMPP (SCaling AlignMent-based Phylogenetic Placement), a technique to extend the scalability of these likelihood-based placement methods to ultra-large backbone trees. We show that pplacer-SCAMPP and EPA-ng-SCAMPP both scale well to ultra-large backbone trees (even up to 200,000 leaves), with accuracy that improves on APPLES and APPLES-2, two recently developed fast phylogenetic placement methods that scale to ultra-large datasets. EPA-ng-SCAMPP and pplacer-SCAMPP are available at https://github.com/chry04/PLUSplacer.
3
African mitochondrial haplogroup L7: a 100,000-year-old maternal human lineage discovered through reassessment and new sequencing. Sci Rep 2022; 12:10747. [PMID: 35750688 PMCID: PMC9232647 DOI: 10.1038/s41598-022-13856-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Accepted: 05/30/2022] [Indexed: 11/17/2022] Open
Abstract
Archaeological and genomic evidence suggest that modern Homo sapiens have roamed the planet for some 300–500 thousand years. In contrast, global human mitochondrial (mtDNA) diversity coalesces to one African female ancestor (“Mitochondrial Eve”) some 145 thousand years ago, owing to the ¼ gene pool size of our matrilineally inherited haploid genome. Therefore, most of human prehistory was spent in Africa where early ancestors of Southern African Khoisan and Central African rainforest hunter-gatherers (RFHGs) segregated into smaller groups. Their subdivisions followed climatic oscillations, new modes of subsistence, local adaptations, and cultural-linguistic differences, all prior to their exodus out of Africa. Seven African mtDNA haplogroups (L0–L6) traditionally captured this ancient structure—these L haplogroups have formed the backbone of the mtDNA tree for nearly two decades. Here we describe L7, an eighth haplogroup that we estimate to be ~ 100 thousand years old and which has been previously misclassified in the literature. In addition, L7 has a phylogenetic sublineage L7a*, the oldest singleton branch in the human mtDNA tree (~ 80 thousand years). We found that L7 and its sister group L5 are both low-frequency relics centered around East Africa, but in different populations (L7: Sandawe; L5: Mbuti). Although three small subclades of African foragers hint at the population origins of L5'7, the majority of subclades are divided into Afro-Asiatic and eastern Bantu groups, indicative of more recent admixture. A regular re-estimation of the entire mtDNA haplotype tree is needed to ensure correct cladistic placement of new samples in the future.
4
Czech L, Stamatakis A, Dunthorn M, Barbera P. Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade. Frontiers in Bioinformatics 2022; 2:871393. [PMID: 36304302 PMCID: PMC9580882 DOI: 10.3389/fbinf.2022.871393] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 12/20/2022] Open
Abstract
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. Thereby, one can increase the accuracy of metagenomic surveys and eliminate the requirement for having exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool per se, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first 10 years. In particular, the goals of this review are 1) to motivate the usage of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, 5) to showcase typical placement-based analyses, and how they can help to analyze, visualize, and interpret phylogenetic placement data.
Affiliation(s)
- Lucas Czech
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Micah Dunthorn
  - Natural History Museum, University of Oslo, Oslo, Norway
5
Pérez-Losada M, Narayanan DB, Kolbe AR, Ramos-Tapia I, Castro-Nallar E, Crandall KA, Domínguez J. Comparative Analysis of Metagenomics and Metataxonomics for the Characterization of Vermicompost Microbiomes. Front Microbiol 2022; 13:854423. [PMID: 35620097 PMCID: PMC9127802 DOI: 10.3389/fmicb.2022.854423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Accepted: 04/21/2022] [Indexed: 11/21/2022] Open
Abstract
The study of microbial communities or microbiotas in animals and environments is important because of their impact in a broad range of industrial applications, diseases and ecological roles. High throughput sequencing (HTS) is the best strategy to characterize microbial composition and function. Microbial profiles can be obtained either by shotgun sequencing of genomes, or through amplicon sequencing of target genes (e.g., 16S rRNA for bacteria and ITS for fungi). Here, we compared both HTS approaches at assessing taxonomic and functional diversity of bacterial and fungal communities during vermicomposting of white grape marc. We applied specific HTS workflows to the same 12 microcosms, with and without earthworms, sampled at two distinct phases of the vermicomposting process occurring at 21 and 63 days. Metataxonomic profiles were inferred in DADA2, with bacterial metabolic pathways predicted via PICRUSt2. Metagenomic taxonomic profiles were inferred in PathoScope, while bacterial functional profiles were inferred in Humann2. Microbial profiles inferred by metagenomics and metataxonomics showed similarities and differences in composition, structure, and metabolic function at different taxonomic levels. Microbial composition and abundance estimated by both HTS approaches agreed reasonably well at the phylum level, but larger discrepancies were observed at lower taxonomic ranks. Shotgun HTS identified ~1.8 times more bacterial genera than 16S rRNA HTS, while ITS HTS identified two times more fungal genera than shotgun HTS. This is mainly a consequence of the difference in resolution and reference richness between amplicon and genome sequencing approaches and databases, respectively. Our study also revealed great differences and even opposite trends in alpha- and beta-diversity between amplicon and shotgun HTS. Interestingly, amplicon PICRUSt2-imputed functional repertoires overlapped ~50% with shotgun Humann2 profiles. 
Finally, both approaches indicated that although bacteria and fungi are the main drivers of biochemical decomposition, earthworms also play a key role in plant vermicomposting. In summary, our study highlights the strengths and weaknesses of metagenomics and metataxonomics and provides new insights on the vermicomposting of white grape marc. Since both approaches may target different biological aspects of the communities, combining them will provide a better understanding of the microbiotas under study.
Affiliation(s)
- Marcos Pérez-Losada
  - Computational Biology Institute, The George Washington University, Washington, DC, United States
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States
  - CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
- Dhatri Badri Narayanan
  - Computational Biology Institute, The George Washington University, Washington, DC, United States
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States
- Allison R. Kolbe
  - Computational Biology Institute, The George Washington University, Washington, DC, United States
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States
- Ignacio Ramos-Tapia
  - Instituto de Investigación Interdisciplinaria (I3), Universidad de Talca, Talca, Chile
- Eduardo Castro-Nallar
  - CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
  - Instituto de Investigación Interdisciplinaria (I3), Universidad de Talca, Talca, Chile
  - Departamento de Microbiología, Facultad de Ciencias de la Salud, Universidad de Talca, Talca, Chile
- Keith A. Crandall
  - Computational Biology Institute, The George Washington University, Washington, DC, United States
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States
- Jorge Domínguez
  - Grupo de Ecoloxía Animal (GEA), Universidade de Vigo, Vigo, Spain
6
Abstract
Determining the evolutionary relationships between genes is fundamental to comparative biological research. Here, we present SHOOT. SHOOT searches a user query sequence against a database of phylogenetic trees and returns a tree with the query sequence correctly placed within it. We show that SHOOT performs this analysis with comparable speed to a BLAST search. We demonstrate that SHOOT phylogenetic placements are as accurate as conventional tree inference, and it can identify orthologs with high accuracy. In summary, SHOOT is a fast and accurate tool for phylogenetic analyses of novel query sequences. It is available online at www.shoot.bio.
Affiliation(s)
- David Mark Emms
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
- Steven Kelly
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
7
Zafeiropoulos H, Gargan L, Hintikka S, Pavloudi C, Carlsson J. The Dark mAtteR iNvestigator (DARN) tool: getting to know the known unknowns in COI amplicon data. Metabarcoding and Metagenomics 2021. [DOI: 10.3897/mbmg.5.69657] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023] Open
Abstract
The mitochondrial cytochrome C oxidase subunit I gene (COI) is commonly used in environmental DNA (eDNA) metabarcoding studies, especially for assessing metazoan diversity. Yet, a great number of COI operational taxonomic units (OTUs) and/or amplicon sequence variants (ASVs) retrieved from such studies do not get a taxonomic assignment with a reference sequence. To assess and investigate such sequences, we have developed the Dark mAtteR iNvestigator (DARN) software tool. For this purpose, a reference COI-oriented phylogenetic tree was built from 1,593 consensus sequences covering all three domains of life. With respect to eukaryotes, consensus sequences at the family level were constructed from 183,330 sequences retrieved from the Midori reference 2 database, which represented 70% of the initial number of reference sequences. Similarly, sequences from 431 bacterial and 15 archaeal taxa at the family level (29% and 1% of the initial number of reference sequences respectively) were retrieved from the BOLD and the PFam databases. DARN makes use of this phylogenetic tree to investigate COI pre-processed sequences of amplicon samples to provide both a tabular and a graphical overview of their phylogenetic assignments. To evaluate DARN, both environmental and bulk metabarcoding samples from different aquatic environments using various primer sets were analysed. We demonstrate that a large proportion of non-target prokaryotic organisms, such as bacteria and archaea, are also amplified in eDNA samples and we suggest prokaryotic COI sequences to be included in the reference databases used for the taxonomy assignment to allow for further analyses of dark matter. DARN source code is available on GitHub at https://github.com/hariszaf/darn and as a Docker image at https://hub.docker.com/r/hariszaf/darn.
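The abstract mentions family-level consensus sequences but does not say how they were built. As a rough illustration only, the sketch below computes a per-column majority-rule consensus from sequences that are assumed to be already aligned. The function name `majority_consensus`, the tie-handling rule, and the toy alignment are all hypothetical and are not taken from DARN.

```python
from collections import Counter


def majority_consensus(aligned_seqs, gap="-", ambiguous="N"):
    """Per-column majority-rule consensus of equal-length aligned sequences.

    Assumption for illustration: gaps are ignored when counting, and a tie
    between the two most frequent residues is reported as `ambiguous`.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(base for base in column if base != gap)
        if not counts:
            consensus.append(gap)  # the whole column is gaps
            continue
        (top, top_n), *rest = counts.most_common(2)
        if rest and rest[0][1] == top_n:
            consensus.append(ambiguous)  # tie between the top two residues
        else:
            consensus.append(top)
    return "".join(consensus)


# Toy alignment standing in for the sequences of one family (made-up data).
family_alignment = ["ACGT-A", "ACGTTA", "ACCTTA", "A-GTTC"]
print(majority_consensus(family_alignment))
```

Real pipelines typically also apply a frequency threshold and IUPAC ambiguity codes; this sketch leaves those out for brevity.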
8
Blanke M, Morgenstern B. App-SpaM: phylogenetic placement of short reads without sequence alignment. Bioinformatics Advances 2021; 1:vbab027. [PMID: 36700102 PMCID: PMC9710606 DOI: 10.1093/bioadv/vbab027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/31/2021] [Revised: 09/27/2021] [Accepted: 10/11/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Phylogenetic placement is the task of placing a query sequence of unknown taxonomic origin into a given phylogenetic tree of a set of reference sequences. A major field of application of such methods is, for example, the taxonomic identification of reads in metabarcoding or metagenomic studies. Several approaches to phylogenetic placement have been proposed in recent years. The most accurate of them requires a multiple sequence alignment of the references as input. However, calculating multiple alignments is not only time-consuming but also limits the applicability of these approaches.
Results: Herein, we propose Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM), an efficient algorithm for the phylogenetic placement of short sequencing reads on a tree of a set of reference sequences. App-SpaM produces results of high quality that are on a par with the best available approaches to phylogenetic placement, while our software is two orders of magnitude faster than these existing methods. Our approach neither requires a multiple alignment of the reference sequences nor alignments of the queries to the references. This enables App-SpaM to perform phylogenetic placement on a broad variety of datasets.
Availability and implementation: The source code of App-SpaM is freely available on GitHub at https://github.com/matthiasblanke/App-SpaM together with detailed instructions for installation and settings. App-SpaM is furthermore available as a Conda-package on the Bioconda channel.
Contact: matthias.blanke@biologie.uni-goettingen.de.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
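To make the term "spaced-word match" more concrete, here is a minimal sketch of the underlying idea: a binary pattern marks match positions (`1`) and don't-care positions (`0`), and two sequences share a spaced-word match wherever they agree at all match positions. The functions, the pattern, and the sequences are illustrative assumptions. This is not App-SpaM's implementation, which additionally filters matches and uses them to estimate distances for placement.

```python
def spaced_words(seq, pattern):
    """Map each spaced word of `seq` to the start positions where it occurs.

    `pattern` is a string of '1' (match position) and '0' (don't-care).
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    words = {}
    for start in range(len(seq) - len(pattern) + 1):
        word = "".join(seq[start + i] for i in match_positions)
        words.setdefault(word, []).append(start)
    return words


def count_spaced_word_matches(query, reference, pattern):
    """Count position pairs whose spaced words agree at every match position."""
    q_words = spaced_words(query, pattern)
    r_words = spaced_words(reference, pattern)
    shared = q_words.keys() & r_words.keys()
    return sum(len(q_words[w]) * len(r_words[w]) for w in shared)


# Toy example: with pattern "101" the middle base is ignored, so "ACG" in the
# query and "ATG" in the reference form one spaced-word match ("AG").
print(count_spaced_word_matches("ACGT", "ATGT", "101"))
```

Because mismatches are tolerated at don't-care positions, such matches can be found without computing any alignment, which is the property the abstract emphasizes.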
Affiliation(s)
- Matthias Blanke
  - Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen 37077, Germany
  - International Max Planck Research School for Genome Science, Göttingen 37077, Germany
- Burkhard Morgenstern
  - Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen 37077, Germany
  - Campus-Institute Data Science (CIDAS), Göttingen 37077, Germany
9
Type II Photosynthetic Reaction Center Genes of Avocado (Persea americana Mill.) Bark Microbial Communities are Dominated by Aerobic Anoxygenic Alphaproteobacteria. Curr Microbiol 2021; 78:2623-2630. [PMID: 33990868 DOI: 10.1007/s00284-021-02525-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2020] [Accepted: 04/28/2021] [Indexed: 10/21/2022]
Abstract
The tree bark environment is an important microbial habitat distributed worldwide on trillions of trees. However, the microbial communities of tree bark are largely unknown, with most studies on plant aerial surfaces focused on the leaves. Recently, we presented a metagenomic study of bark microbial communities from avocado. In these communities, oxygenic and anoxygenic photosynthesis genes were very abundant, especially when compared to rhizospheric soil from the same trees. In this work, Evolutionary Placement Algorithm analysis was performed on metagenomic reads orthologous to the PufLM gene cluster, encoding the bacterial type II photosynthetic reaction center. These photosynthetic genes were found to be affiliated with different groups of bacteria, mostly aerobic anoxygenic photosynthetic Alphaproteobacteria, including Sphingomonas, Methylobacterium and several Rhodospirillales. These results suggest that anoxygenic photosynthesis in avocado bark microbial communities functions primarily as an additional energy source for heterotrophic growth. Together with our previous results, showing a large abundance of cyanobacteria in these communities, a picture emerges of the tree holobiont, where light penetrating the tree canopies and reaching the inner stems, including the trunk, is probably utilized by cyanobacteria for oxygenic photosynthesis, and the far-red light aids the growth of aerobic anoxygenic photosynthetic bacteria.
10
Morel B, Barbera P, Czech L, Bettisworth B, Hübner L, Lutteropp S, Serdari D, Kostaki EG, Mamais I, Kozlov AM, Pavlidis P, Paraskevis D, Stamatakis A. Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult. Mol Biol Evol 2021; 38:1777-1791. [PMID: 33316067 PMCID: PMC7798910 DOI: 10.1093/molbev/msaa314] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Indexed: 02/06/2023] Open
Abstract
Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
Affiliation(s)
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Pierre Barbera
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Lucas Czech
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
- Ben Bettisworth
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Lukas Hübner
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Sarah Lutteropp
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Dora Serdari
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Evangelia-Georgia Kostaki
  - Department of Hygiene Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Ioannis Mamais
  - Department of Health Sciences, European University Cyprus, Nicosia, Cyprus
- Alexey M Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Pavlos Pavlidis
  - Institute of Computer Science, Foundation for Research and Technology-Hellas, Crete, Greece
- Dimitrios Paraskevis
  - Department of Hygiene Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
11
Hleap JS, Littlefair JE, Steinke D, Hebert PDN, Cristescu ME. Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Mol Ecol Resour 2021; 21:2190-2203. [PMID: 33905615 DOI: 10.1111/1755-0998.13407] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/14/2020] [Revised: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 01/04/2023]
Abstract
The effective use of metabarcoding in biodiversity science has brought important analytical challenges due to the need to generate accurate taxonomic assignments. The assignment of sequences to genus or species level is critical for biodiversity surveys and biomonitoring, but it is particularly challenging as researchers must select the approach that best recovers information on species composition. This study evaluates the performance and accuracy of seven methods in recovering the species composition of mock communities by using COI barcode fragments. The mock communities varied in species number and specimen abundance, while upstream molecular and bioinformatic variables were held constant, and using a set of COI fragments. We evaluated the impact of parameter optimization on the quality of the predictions. Our results indicate that BLAST top hit competes well with more complex approaches if optimized for the mock community under study. For example, the two machine learning methods that were benchmarked proved more sensitive to reference database heterogeneity and completeness than methods based on sequence similarity. The accuracy of assignments was impacted by both species and specimen counts (query compositional heterogeneity) which ultimately influence the selection of appropriate software. We urge researchers to: (i) use realistic mock communities to allow optimization of parameters, regardless of the taxonomic assignment method employed; (ii) carefully choose and curate the reference databases including completeness; and (iii) use QIIME, BLAST or LCA methods, in conjunction with parameter tuning to better assign taxonomy to diverse communities, especially when information on species diversity is lacking for the area under study.
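The difference between a "top hit" and an "LCA" assignment, both named in the abstract, can be sketched as follows. This is a simplified illustration under stated assumptions: hits are plain dictionaries with a `score` and a root-to-tip `lineage`, and the score threshold is the kind of parameter the authors recommend tuning on mock communities. The hit list, field names, and threshold values are hypothetical and do not come from the study.

```python
def top_hit(hits):
    """Return the lineage of the single highest-scoring hit."""
    return max(hits, key=lambda h: h["score"])["lineage"]


def lowest_common_ancestor(hits, min_score):
    """Return the longest lineage prefix shared by all hits scoring >= min_score."""
    kept = [h["lineage"] for h in hits if h["score"] >= min_score]
    if not kept:
        return []  # nothing passes the threshold: leave the query unassigned
    shared = []
    for ranks in zip(*kept):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this rank
        shared.append(ranks[0])
    return shared


# Made-up hits for one query sequence (scores are percent identities).
hits = [
    {"score": 98.5, "lineage": ["Animalia", "Arthropoda", "Insecta", "Diptera",
                                "Culicidae", "Aedes", "Aedes aegypti"]},
    {"score": 98.1, "lineage": ["Animalia", "Arthropoda", "Insecta", "Diptera",
                                "Culicidae", "Aedes", "Aedes albopictus"]},
    {"score": 90.0, "lineage": ["Animalia", "Arthropoda", "Insecta", "Diptera",
                                "Chironomidae", "Chironomus", "Chironomus riparius"]},
]

print(top_hit(hits)[-1])                       # species of the best hit
print(lowest_common_ancestor(hits, 97.0)[-1])  # strict threshold
print(lowest_common_ancestor(hits, 85.0)[-1])  # permissive threshold
```

In this toy case the top hit commits to a species, while the LCA result backs off to genus or order depending on the threshold, which illustrates why parameter tuning matters for either approach.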
Affiliation(s)
- Jose S Hleap
  - Department of Biology, McGill University, Montreal, QC, Canada
  - SHARCNET, University of Guelph, Guelph, ON, Canada
  - Fundacion SQUALUS, Cali, Colombia
- Joanne E Littlefair
  - Department of Biology, McGill University, Montreal, QC, Canada
  - Queen Mary University of London, London, UK
- Dirk Steinke
  - Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
- Paul D N Hebert
  - Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
12
Botnen SS, Mundra S, Kauserud H, Eidesen PB. Glacier retreat in the High Arctic: opportunity or threat for ectomycorrhizal diversity? FEMS Microbiol Ecol 2021; 96:5894921. [PMID: 32816005 DOI: 10.1093/femsec/fiaa171] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/29/2020] [Accepted: 08/17/2020] [Indexed: 11/13/2022] Open
Abstract
Climate change causes Arctic glaciers to retreat faster, exposing new areas for colonization. Several pioneer plants likely to colonize recently deglaciated, nutrient-poor areas depend on fungal partners for successful establishment. Little is known about general patterns or characteristics of facilitating fungal pioneers and how they vary with regional climate in the Arctic. The High Arctic Archipelago Svalbard represents an excellent study system to address these questions, as glaciers cover ∼60% of the land surface and recent estimates suggest at least a 7% reduction of glacier area since the 1960s. Roots of two ectomycorrhizal (ECM) plants (Salix polaris and Bistorta vivipara) were sampled in eight glacier forelands. Associated ECM fungi were assessed using DNA metabarcoding. About 25% of the diversity was unknown at family level, indicating the presence of undescribed species. Seven genera dominated based on richness and abundance, but their relative importance varied with local factors. The genus Geopora showed surprisingly high richness and abundance, particularly in dry, nutrient-poor forelands. Such forelands will diminish along with increasing temperature and precipitation, and faster succession. Our results support a taxonomical shift in pioneer ECM diversity with climate change, and we are likely to lose unknown fungal diversity, without knowing their identity or ecological importance.
Affiliation(s)
- S S Botnen
  - Section for Genetics and Evolutionary Biology (EVOGENE), Department of Biosciences, University of Oslo, PO Box 1066 Blindern, NO-0316 Oslo, Norway
  - The University Centre in Svalbard, PO Box 156, NO-9171 Longyearbyen, Norway
- S Mundra
  - Section for Genetics and Evolutionary Biology (EVOGENE), Department of Biosciences, University of Oslo, PO Box 1066 Blindern, NO-0316 Oslo, Norway
  - The University Centre in Svalbard, PO Box 156, NO-9171 Longyearbyen, Norway
  - Department of Biology, College of Science, United Arab Emirates University, PO Box 15551, Al-Ain, Abu Dhabi, UAE
- H Kauserud
  - Section for Genetics and Evolutionary Biology (EVOGENE), Department of Biosciences, University of Oslo, PO Box 1066 Blindern, NO-0316 Oslo, Norway
- P B Eidesen
  - The University Centre in Svalbard, PO Box 156, NO-9171 Longyearbyen, Norway
13
Morgan-Lang C, McLaughlin R, Armstrong Z, Zhang G, Chan K, Hallam SJ. TreeSAPP: the Tree-based Sensitive and Accurate Phylogenetic Profiler. Bioinformatics 2021; 36:4706-4713. [PMID: 32637989 PMCID: PMC7695126 DOI: 10.1093/bioinformatics/btaa588] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/20/2020] [Revised: 06/11/2020] [Accepted: 06/30/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation: Microbial communities drive matter and energy transformations integral to global biogeochemical cycles, yet many taxonomic groups facilitating these processes remain poorly represented in biological sequence databases. Due to this missing information, taxonomic assignment of sequences from environmental genomes remains inaccurate. Results: We present the Tree-based Sensitive and Accurate Phylogenetic Profiler (TreeSAPP) software for functionally and taxonomically classifying genes, reactions and pathways from genomes of cultivated and uncultivated microorganisms using reference packages representing coding sequences mediating multiple globally relevant biogeochemical cycles. TreeSAPP uses linear regression of evolutionary distance on taxonomic rank to improve classifications, assigning both closely related and divergent query sequences at the appropriate taxonomic rank. TreeSAPP is able to provide quantitative functional and taxonomic classifications for both assembled and unassembled sequences and files supporting interactive tree of life visualizations. Availability and implementation: TreeSAPP was developed in Python 3 as an open-source Python package and is available on GitHub at https://github.com/hallamlab/TreeSAPP. Supplementary information: Supplementary data are available at Bioinformatics online.
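The regression idea in the abstract above can be illustrated with a minimal sketch. It is not TreeSAPP's code or API: the rank list, the toy distances, and the function names `fit_line` and `recommend_rank` are all assumptions made for illustration. The sketch fits evolutionary distance as a linear function of taxonomic rank depth by ordinary least squares, then inverts the fitted line to suggest how deep a rank a query at a given distance might plausibly be assigned to.

```python
from statistics import mean

# Rank depth index: 0 = broadest, 6 = most specific (illustrative ordering).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def recommend_rank(distance, slope, intercept):
    """Invert the fitted line to map a query distance to a rank name."""
    depth = round((distance - intercept) / slope)
    depth = max(0, min(len(RANKS) - 1, depth))  # clamp to valid ranks
    return RANKS[depth]


# Toy training data (hypothetical): rank depth at which a reference pair
# first shares a taxon, and the evolutionary distance between the pair.
depths = [6, 5, 4, 3, 2, 1, 0]
distances = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65]

# Regress distance on rank, as the abstract describes.
slope, intercept = fit_line(depths, distances)

if __name__ == "__main__":
    for d in (0.05, 0.32, 2.0):
        print(d, recommend_rank(d, slope, intercept))
```

The design point is that a divergent query (large distance) is pushed to a broad rank rather than being forced into a species-level label, while a close match keeps its specific rank.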
Affiliation(s)
- Connor Morgan-Lang
- Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, Vancouver, British Columbia V5Z 4S6, Canada
- Ryan McLaughlin
- Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, Vancouver, British Columbia V5Z 4S6, Canada
- Zachary Armstrong
- Genome Science and Technology Program, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Grace Zhang
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver BC V6T 1Z4, Canada
- Kevin Chan
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver BC V6T 1Z4, Canada
- Steven J Hallam
- Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, Vancouver, British Columbia V5Z 4S6, Canada.,Genome Science and Technology Program, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.,Department of Electrical and Computer Engineering, University of British Columbia, Vancouver BC V6T 1Z4, Canada.,Department of Microbiology and Immunology, University of British Columbia, 2552-2350 Health Sciences Mall, Vancouver, British Columbia V6T 1Z3, Canada.,ECOSCOPE Training Program, University of British Columbia, Vancouver, British Columbia V6T 1Z, Canada
14
Gottschling M, Czech L, Mahé F, Adl S, Dunthorn M. The Windblown: Possible Explanations for Dinophyte DNA in Forest Soils. J Eukaryot Microbiol 2020; 68:e12833. [PMID: 33155377 DOI: 10.1111/jeu.12833] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/23/2020] [Revised: 10/12/2020] [Accepted: 10/28/2020] [Indexed: 11/28/2022]
Abstract
Dinophytes are widely distributed in marine- and fresh-waters, but have yet to be conclusively documented in terrestrial environments. Here, we evaluated the presence of these protists from an environmental DNA metabarcoding dataset of Neotropical rainforest soils. Using a phylogenetic placement approach with a reference alignment and tree, we showed that the numerous sequencing reads that were phylogenetically placed as dinophytes did not correlate with taxonomic assignment, environmental preference, nutritional mode, or dormancy. All the dinophytes in the soils are rather windblown dispersal units of aquatic species and are not biologically active residents of terrestrial environments.
Affiliation(s)
- Marc Gottschling
- Department Biologie, Systematische Botanik und Mykologie, GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, D-80638, Germany
- Lucas Czech
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, D-69118, Germany.,Department of Plant Biology, Carnegie Institution for Science, Stanford, California, 94305, USA
- Frédéric Mahé
- CIRAD, UMR BGPI, Montpellier, F-34398, France.,BGPI, Université de Montpellier, CIRAD, IRD, Montpellier SupAgro, Montpellier, France
- Sina Adl
- Department of Soil Sciences, College of Agriculture and Bioresources, University of Saskatchewan, Saskatoon, SK, S7N 5A8, Canada
- Micah Dunthorn
- Eukaryotic Microbiology, Faculty of Biology, Universität Duisburg-Essen, Essen, D-45141, Germany.,Centre for Water and Environmental Research (ZWU), Universität Duisburg-Essen, Essen, D-45141, Germany
15
Martijn J, Schön ME, Lind AE, Vosseberg J, Williams TA, Spang A, Ettema TJG. Hikarchaeia demonstrate an intermediate stage in the methanogen-to-halophile transition. Nat Commun 2020; 11:5490. [PMID: 33127909 PMCID: PMC7599335 DOI: 10.1038/s41467-020-19200-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/07/2020] [Accepted: 10/01/2020] [Indexed: 01/09/2023] Open
Abstract
Halobacteria (henceforth: Haloarchaea) are predominantly aerobic halophiles that are thought to have evolved from anaerobic methanogens. This remarkable transformation most likely involved an extensive influx of bacterial genes. Whether it entailed a single massive transfer event or a gradual stream of transfers remains a matter of debate. To address this, genomes that descend from methanogen-to-halophile intermediates are necessary. Here, we present five such near-complete genomes of Marine Group IV archaea (Hikarchaeia), the closest known relatives of Haloarchaea. Their inclusion in gene tree-aware ancestral reconstructions reveals an intermediate stage that had already lost a large number of genes, including nearly all of those involved in methanogenesis and the Wood-Ljungdahl pathway. In contrast, the last Haloarchaea common ancestor gained a large number of genes and expanded its aerobic respiration and salt/UV resistance gene repertoire. Our results suggest that complex and gradual patterns of gain and loss shaped the methanogen-to-halophile transition.
Affiliation(s)
- Joran Martijn
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Max E Schön
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Anders E Lind
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Julian Vosseberg
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Theoretical Biology and Bioinformatics, Department of Biology, Utrecht University, Utrecht, The Netherlands
- Tom A Williams
- School of Biological Sciences, University of Bristol, Bristol, UK
- Anja Spang
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- NIOZ, Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and Biogeochemistry, Utrecht University, Den Burg, The Netherlands
- Thijs J G Ettema
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden.
- Laboratory of Microbiology, Wageningen University and Research, Wageningen, The Netherlands.
16
Lücking R, Aime MC, Robbertse B, Miller AN, Ariyawansa HA, Aoki T, Cardinali G, Crous PW, Druzhinina IS, Geiser DM, Hawksworth DL, Hyde KD, Irinyi L, Jeewon R, Johnston PR, Kirk PM, Malosso E, May TW, Meyer W, Öpik M, Robert V, Stadler M, Thines M, Vu D, Yurkov AM, Zhang N, Schoch CL. Unambiguous identification of fungi: where do we stand and how accurate and precise is fungal DNA barcoding? IMA Fungus 2020; 11:14. [PMID: 32714773 PMCID: PMC7353689 DOI: 10.1186/s43008-020-00033-z] [Citation(s) in RCA: 185] [Impact Index Per Article: 46.3] [Indexed: 01/23/2023] Open
Abstract
True fungi (Fungi) and fungus-like organisms (e.g. Mycetozoa, Oomycota) constitute the second largest group of organisms based on global richness estimates, with around 3 million predicted species. Compared to plants and animals, fungi have simple body plans with often morphologically and ecologically obscure structures. This poses challenges for accurate and precise identifications. Here we provide a conceptual framework for the identification of fungi, encouraging the approach of integrative (polyphasic) taxonomy for species delimitation, i.e. the combination of genealogy (phylogeny), phenotype (including autecology), and reproductive biology (when feasible). This allows objective evaluation of diagnostic characters, either phenotypic or molecular or both. Verification of identifications is crucial but often neglected. Because of clade-specific evolutionary histories, there is currently no single tool for the identification of fungi, although DNA barcoding using the internal transcribed spacer (ITS) remains a first diagnosis, particularly in metabarcoding studies. Secondary DNA barcodes are increasingly implemented for groups where ITS does not provide sufficient precision. Issues of pairwise sequence similarity-based identifications and OTU clustering are discussed, and multiple sequence alignment-based phylogenetic approaches with subsequent verification are recommended as more accurate alternatives. In metabarcoding approaches, the trade-off between speed and accuracy and precision of molecular identifications must be carefully considered. Intragenomic variation of the ITS and other barcoding markers should be properly documented, as phylotype diversity is not necessarily a proxy of species richness. 
Important strategies to improve molecular identification of fungi are: (1) broadly document intraspecific and intragenomic variation of barcoding markers; (2) substantially expand sequence repositories, focusing on undersampled clades and missing taxa; (3) improve curation of sequence labels in primary repositories and substantially increase the number of sequences based on verified material; (4) link sequence data to digital information of voucher specimens including imagery. In parallel, technological improvements to genome sequencing offer promising alternatives to DNA barcoding in the future. Despite the prevalence of DNA-based fungal taxonomy, phenotype-based approaches remain an important strategy to catalog the global diversity of fungi and establish initial species hypotheses.
Affiliation(s)
- Robert Lücking
- Botanischer Garten und Botanisches Museum, Freie Universität Berlin, Königin-Luise-Straße 6–8, 14195 Berlin, Germany
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- M. Catherine Aime
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907 USA
- Barbara Robbertse
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892 USA
- Andrew N. Miller
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, Champaign, IL 61820-6970 USA
- Hiran A. Ariyawansa
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Department of Plant Pathology and Microbiology, College of Bio-Resources and Agriculture, National Taiwan University, Taipei City, Taiwan
- Takayuki Aoki
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- National Agriculture and Food Research Organization, Genetic Resources Center, 2-1-2 Kannondai, Tsukuba, Ibaraki, 305-8602 Japan
- Gianluigi Cardinali
- Department Pharmaceutical Sciences, University of Perugia, Via Borgo 20 Giugno, 74, Perugia, Italy
- Pedro W. Crous
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
- Wageningen University and Research Centre (WUR), Laboratory of Phytopathology, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Irina S. Druzhinina
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
- Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China
- David M. Geiser
- Department of Plant Pathology & Environmental Microbiology, The Pennsylvania State University, University Park, PA 16802 USA
- David L. Hawksworth
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Department of Life Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD UK
- Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Surrey, TW9 3DS UK
- Geography and Environment, University of Southampton, Southampton, SO17 1BJ UK
- Jilin Agricultural University, Changchun, 130118 Jilin Province China
- Kevin D. Hyde
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Science, Kunming, 650201 Yunnan China
- Center of Excellence in Fungal Research, Mae Fah Luang University, Chiang Rai, 57100 Thailand
- World Agroforestry Centre, East and Central Asia, Kunming, 650201 Yunnan China
- Mushroom Research Foundation, 128 M.3 Ban Pa Deng T. Pa Pae, A. Mae Taeng, Chiang Rai, 50150 Thailand
- Laszlo Irinyi
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Westmead Hospital (Research and Education Network), Westmead Institute for Medical Research, Sydney, NSW Australia
- Rajesh Jeewon
- Department of Health Sciences, Faculty of Science, University of Mauritius, Reduit, Mauritius
- Peter R. Johnston
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Manaaki Whenua – Landcare Research, Private Bag 92170, Auckland, 1142 New Zealand
- Elaine Malosso
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Universidade Federal de Pernambuco, Centro de Biociências, Departamento de Micologia, Laboratório de Hifomicetos de Folhedo, Avenida da Engenharia, s/n Cidade Universitária, Recife, PE 50.740-600 Brazil
- Tom W. May
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, Victoria 3004 Australia
- Wieland Meyer
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Westmead Hospital (Research and Education Network), Westmead Institute for Medical Research, Sydney, NSW Australia
- Maarja Öpik
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- University of Tartu, 40 Lai Street, 51 005 Tartu, Estonia
- Vincent Robert
- Department Pharmaceutical Sciences, University of Perugia, Via Borgo 20 Giugno, 74, Perugia, Italy
- Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
- Marc Stadler
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Department Microbial Drugs, Helmholtz Centre for Infection Research, and German Centre for Infection Research (DZIF), partner site Hannover-Braunschweig, Inhoffenstrasse 7, 38124 Braunschweig, Germany
- Marco Thines
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Institute of Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Straße 9, 60439 Frankfurt (Main); Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325 Frankfurt (Main), Germany
- Duong Vu
- Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
- Andrey M. Yurkov
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Ning Zhang
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- Department of Plant Biology, Rutgers University, New Brunswick, NJ 08901 USA
- Conrad L. Schoch
- International Commission on the Taxonomy of Fungi, Champaign, IL USA
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892 USA
17
Gill MS, Lemey P, Suchard MA, Rambaut A, Baele G. Online Bayesian Phylodynamic Inference in BEAST with Application to Epidemic Reconstruction. Mol Biol Evol 2020; 37:1832-1842. [PMID: 32101295 PMCID: PMC7253210 DOI: 10.1093/molbev/msaa047] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022] Open
Abstract
Reconstructing pathogen dynamics from genetic data as they become available during an outbreak or epidemic represents an important statistical scenario in which observations arrive sequentially in time and one is interested in performing inference in an "online" fashion. Widely used Bayesian phylogenetic inference packages are not set up for this purpose, generally requiring one to recompute trees and evolutionary model parameters de novo when new data arrive. To accommodate increasing data flow in a Bayesian phylogenetic framework, we introduce a methodology to efficiently update the posterior distribution with newly available genetic data. Our procedure is implemented in the BEAST 1.10 software package, and relies on a distance-based measure to insert new taxa into the current estimate of the phylogeny and imputes plausible values for new model parameters to accommodate growing dimensionality. This augmentation creates informed starting values and re-uses optimally tuned transition kernels for posterior exploration of growing data sets, reducing the time necessary to converge to target posterior distributions. We apply our framework to data from the recent West African Ebola virus epidemic and demonstrate a considerable reduction in time required to obtain posterior estimates at different time points of the outbreak. Beyond epidemic monitoring, this framework easily finds other applications within the phylogenetics community, where changes in the data (in terms of alignment changes, sequence addition or removal) present common scenarios that can benefit from online inference.
Affiliation(s)
- Mandev S Gill
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Marc A Suchard
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA
- Department of Biostatistics, School of Public Health, University of California, Los Angeles, CA
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA
- Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, United Kingdom
- Fogarty International Center, National Institutes of Health, Bethesda, MD
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
18
Czech L, Barbera P, Stamatakis A. Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data. Bioinformatics 2020; 36:3263-3265. [PMID: 32016344 PMCID: PMC7214027 DOI: 10.1093/bioinformatics/btaa070] [Citation(s) in RCA: 141] [Impact Index Per Article: 35.3] [Received: 05/29/2019] [Revised: 01/22/2020] [Accepted: 01/28/2020] [Indexed: 11/14/2022] Open
Abstract
SUMMARY: We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. AVAILABILITY AND IMPLEMENTATION: Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lucas Czech
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Pierre Barbera
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany
19
Sato N, Kakuta M, Uchino E, Hasegawa T, Kojima R, Kobayashi W, Sawada K, Tamura Y, Tokuda I, Imoto S, Nakaji S, Murashita K, Yanagita M, Okuno Y. The relationship between cigarette smoking and the tongue microbiome in an East Asian population. J Oral Microbiol 2020; 12:1742527. [PMID: 32341759 PMCID: PMC7170382 DOI: 10.1080/20002297.2020.1742527] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/12/2019] [Revised: 11/28/2019] [Accepted: 12/17/2019] [Indexed: 01/27/2023] Open
Abstract
Background: The oral microbiome, which consists of various habitats, has been shown to be influenced by smoking. However, differences in the tongue microbiomes of current and former smokers, as well as their resultant functional consequences, have rarely been investigated in East Asian populations. Methods: We used 16S rRNA amplicon sequencing of tongue-coating samples obtained from East Asian subjects who were current, former, or never smokers to identify differences in their tongue microbiomes and related metagenomic functions. Two sets of participants from 2016 to 2017 (n = 657 and n = 187, respectively) were analyzed separately. Results: We found significant differences between the overall microbiome compositions of current versus never smokers (p = 0.0015), but not between former versus never smokers (p = 0.43) based on the weighted UniFrac distance. Twenty-nine of 43 investigated genera showed significantly different expression levels in current versus never smokers. Neisseria and Capnocytophaga were less abundant, and Streptococcus and Megasphaera were more abundant in current smokers. Moreover, the abundances of metagenomic pathways, including those related to nitrate reduction and the tricarboxylic acid cycle, were significantly different between current and never smokers. Conclusions: The tongue microbiomes and related metagenomic pathways of current smokers differ from those of never smokers among East Asians.
Affiliation(s)
- Noriaki Sato
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan.,Department of Nephrology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Masanori Kakuta
- Human Genome Center, The Institute of Medical Science, the University of Tokyo, Tokyo, Japan
- Eiichiro Uchino
- Department of Nephrology, Graduate School of Medicine, Kyoto University, Kyoto, Japan.,Department of Medical Intelligent Systems, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Takanori Hasegawa
- Health Intelligence Center, the Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Ryosuke Kojima
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Wataru Kobayashi
- Department of Oral and Maxillofacial Surgery, Hirosaki University Graduate School of Medicine, Aomori, Japan
- Kaori Sawada
- Department of Social Medicine, Hirosaki University Graduate School of Medicine, Aomori, Japan
- Yoshihiro Tamura
- Department of Oral and Maxillofacial Surgery, Hirosaki University Graduate School of Medicine, Aomori, Japan
- Itoyo Tokuda
- Department of Oral Health Care, Hirosaki University Graduate School of Medicine, Aomori, Japan
- Seiya Imoto
- Health Intelligence Center, the Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Shigeyuki Nakaji
- Department of Social Medicine, Hirosaki University Graduate School of Medicine, Aomori, Japan
- Motoko Yanagita
- Department of Nephrology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yasushi Okuno
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan
20
Zafeiropoulos H, Viet HQ, Vasileiadou K, Potirakis A, Arvanitidis C, Topalis P, Pavloudi C, Pafilis E. PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes. Gigascience 2020; 9:giaa022. [PMID: 32161947 PMCID: PMC7066391 DOI: 10.1093/gigascience/giaa022] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/18/2019] [Revised: 01/05/2020] [Accepted: 02/14/2020] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND: Environmental DNA and metabarcoding allow the identification of a mixture of species and launch a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these, a plethora of tools are available; each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy. Adding to this complexity, the computation capacity of high-performance computing systems is frequently required for such analyses. To address the difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with an easy to get-set-use framework, allowing researchers to tune each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Likewise, programming languages specialized for big data pipelines incorporate features like roll-back checkpoints and on-demand partial pipeline execution. FINDINGS: PEMA is a containerized assembly of key metabarcoding analysis tools that requires low effort in setting up, running, and customizing to researchers' needs. Based on third-party tools, PEMA performs read pre-processing, (molecular) operational taxonomic unit clustering, amplicon sequence variant inference, and taxonomy assignment for 16S and 18S ribosomal RNA, as well as ITS and COI marker gene data. Owing to its simplified parameterization and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against both mock communities and previously published datasets and achieved results of comparable quality. CONCLUSIONS: A high-performance computing-based approach was used to develop PEMA; however, it can be used on personal computers as well. PEMA's time-efficient performance and good results will allow it to be used for accurate environmental DNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.
Affiliation(s)
- Haris Zafeiropoulos
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003, Heraklion, Crete, Greece
- Department of Biology, University of Crete, Voutes University Campus, Heraklion, Greece
- Ha Quoc Viet
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003, Heraklion, Crete, Greece
- Katerina Vasileiadou
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003, Heraklion, Crete, Greece
- Charles University, Department of Ecology, Faculty of Science, Viničná 7, CZ-12844, Prague, Czech Republic
- Antonis Potirakis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003, Heraklion, Crete, Greece
- Christos Arvanitidis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003, Heraklion, Crete, Greece
- LifeWatch ERIC, Plaza España SN, SECTOR II-III 41013, Seville, Spain
- Pantelis Topalis
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology (FORTH), Foundation for Research and Technology – Hellas, N. Plastira 100, GR-70013, Heraklion, Crete, Greece
- Christina Pavloudi
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003, Heraklion, Crete, Greece
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003, Heraklion, Crete, Greece
21
Bremges A, Fritz A, McHardy AC. CAMITAX: Taxon labels for microbial genomes. Gigascience 2020; 9:giz154. [PMID: 31909794 PMCID: PMC6946028 DOI: 10.1093/gigascience/giz154] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/08/2019] [Revised: 11/23/2019] [Accepted: 12/10/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: The number of microbial genome sequences is increasing exponentially, especially thanks to recent advances in recovering complete or near-complete genomes from metagenomes and single cells. Assigning reliable taxon labels to genomes is key and often a prerequisite for downstream analyses. FINDINGS: We introduce CAMITAX, a scalable and reproducible workflow for the taxonomic labelling of microbial genomes recovered from isolates, single cells, and metagenomes. CAMITAX combines genome distance-, 16S ribosomal RNA gene-, and gene homology-based taxonomic assignments with phylogenetic placement. It uses Nextflow to orchestrate reference databases and software containers and thus combines ease of installation and use with computational reproducibility. We evaluated the method on several hundred metagenome-assembled genomes with high-quality taxonomic annotations from the TARA Oceans project, and we show that the ensemble classification method in CAMITAX improved on all individual methods across tested ranks. CONCLUSIONS: While we initially developed CAMITAX to aid the Critical Assessment of Metagenome Interpretation (CAMI) initiative, it evolved into a comprehensive software package to reliably assign taxon labels to microbial genomes. CAMITAX is available under Apache License 2.0 at https://github.com/CAMI-challenge/CAMITAX.
Affiliation(s)
- Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
- German Center for Infection Research (DZIF), Partner Site Hannover-Braunschweig, Inhoffenstraße 7, 38124 Braunschweig, Germany
- Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
- Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
|
22
|
|
23
|
Czech L, Stamatakis A. Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples. PLoS One 2019; 14:e0217050. [PMID: 31136592 PMCID: PMC6538146 DOI: 10.1371/journal.pone.0217050] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 06/25/2018] [Accepted: 05/05/2019] [Indexed: 01/09/2023]
Abstract
BACKGROUND: The exponential decrease in molecular sequencing cost generates unprecedented amounts of data. Hence, scalable methods to analyze these data are required. Phylogenetic (or Evolutionary) Placement methods identify the evolutionary provenance of anonymous sequences with respect to a given reference phylogeny. This increasingly popular method is deployed for scrutinizing metagenomic samples from environments such as water, soil, or the human gut. NOVEL METHODS: Here, we present novel and, more importantly, highly scalable methods for analyzing phylogenetic placements of metagenomic samples. More specifically, we introduce methods for (a) visualizing differences between samples and their correlation with associated meta-data on the reference phylogeny, (b) clustering similar samples using a variant of the k-means method, and (c) finding phylogenetic factors using an adaptation of the Phylofactorization method. These methods enable to interpret metagenomic data in a phylogenetic context, to find patterns in the data, and to identify branches of the phylogeny that are driving these patterns. RESULTS: To demonstrate the scalability and utility of our methods, as well as to provide exemplary interpretations of our methods, we applied them to 3 publicly available datasets comprising 9782 samples with a total of approximately 168 million sequences. The results indicate that new biological insights can be attained via our methods.
Affiliation(s)
- Lucas Czech
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
|