1
|
Intragenomic variation in nuclear ribosomal markers and its implication in species delimitation, identification and barcoding in fungi. Fungal Biol Rev 2022. [DOI: 10.1016/j.fbr.2022.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
2
|
Van Dam AR, Covas Orizondo JO, Lam AW, McKenna DD, Van Dam MH. Metagenomic clustering reveals microbial contamination as an essential consideration in ultraconserved element design for phylogenomics with insect museum specimens. Ecol Evol 2022; 12:e8625. [PMID: 35342556 PMCID: PMC8932080 DOI: 10.1002/ece3.8625] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/26/2021] [Revised: 01/03/2022] [Accepted: 01/17/2022] [Indexed: 11/30/2022]
Abstract
Phylogenomics via ultraconserved elements (UCEs) has led to improved phylogenetic reconstructions across the tree of life. However, inadvertently incorporating non‐targeted DNA into the UCE marker design will lead to misinformation being incorporated into subsequent analyses. To date, the effectiveness of basic metagenomic filtering strategies has not been assessed in arthropods. Designing markers from museum specimens requires careful consideration of methods due to the high levels of microbial contamination typically found in such specimens. We investigate if contaminant sequences are carried forward into a UCE marker set we developed from insect museum specimens using a standard bioinformatics pipeline. We find that the methods currently employed by most researchers do not exclude contamination from the final set of targets. Lastly, we highlight several paths forward for reducing contamination in UCE marker design.
Affiliation(s)
- Alex R. Van Dam
  - Department of Biology, University of Puerto Rico Mayagüez, Mayagüez, Puerto Rico
- Athena W. Lam
  - Department of Entomology, California Academy of Sciences, San Francisco, California, USA
- Duane D. McKenna
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee, USA
  - Center for Biodiversity Research, University of Memphis, Memphis, Tennessee, USA
- Matthew H. Van Dam
  - Department of Entomology, California Academy of Sciences, San Francisco, California, USA
|
3
|
Rossi N, Colautti A, Iacumin L, Piazza C. WGA-LP: a pipeline for whole genome assembly of contaminated reads. Bioinformatics 2022; 38:846-848. [PMID: 34668528 DOI: 10.1093/bioinformatics/btab719] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/27/2021] [Revised: 09/22/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023]
Abstract
Summary: Whole genome assembly (WGA) of bacterial genomes with short reads has become a common task as advances in DNA sequencing technology have made it cheaper. Genome assembly has no absolute gold standard and requires a sequence of steps, each of which can involve combinations of many different tools. However, the quality of the final assembly is always strongly related to the quality of the input data. With this in mind, we built WGA-LP, a package that connects state-of-the-art programs for microbial analysis and novel scripts to check and improve the quality of both samples and resulting assemblies. WGA-LP, with its conservative decontamination approach, has been shown to be capable of creating high-quality assemblies even in the case of contaminated reads.
Availability and implementation: WGA-LP is available on GitHub (https://github.com/redsnic/WGA-LP) and Docker Hub (https://hub.docker.com/r/redsnic/wgalp). The web app for node visualization is hosted by shinyapps.io (https://redsnic.shinyapps.io/ContigCoverageVisualizer/).
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- N Rossi
  - Department of Mathematics, Computer Science, and Physics, University of Udine, 33100 Udine, Italy
- A Colautti
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, University of Udine, 33100 Udine, Italy
- L Iacumin
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, University of Udine, 33100 Udine, Italy
- C Piazza
  - Department of Mathematics, Computer Science, and Physics, University of Udine, 33100 Udine, Italy
|
4
|
A high-quality fungal genome assembly resolved from a sample accidentally contaminated by multiple taxa. Biotechniques 2021; 72:39-50. [PMID: 34846173 DOI: 10.2144/btn-2021-0097] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
Contamination in sequenced genomes is a relatively common problem and several methods to remove non-target sequences have been devised. Typically, the target and contaminating organisms reside in different kingdoms, simplifying their separation. The authors present the case of a genome for the ascomycete fungus Teratosphaeria eucalypti, contaminated by another ascomycete fungus and a bacterium. Approaching the problem as a low-complexity metagenomics project, the authors used two available software programs, BlobToolKit and anvi'o, to filter the contaminated genome. Both the de novo and reference-assisted approaches yielded a high-quality draft genome assembly for the target fungus. Incorporating reference sequences increased assembly completeness and visualization elucidated previously unknown genome features. The authors suggest that visualization should be routine in any sequencing project, regardless of suspected contamination.
|
5
|
Bell KL, Petit RA, Cutler A, Dobbs EK, Macpherson JM, Read TD, Burgess KS, Brosi BJ. Comparing whole-genome shotgun sequencing and DNA metabarcoding approaches for species identification and quantification of pollen species mixtures. Ecol Evol 2021; 11:16082-16098. [PMID: 34824813 PMCID: PMC8601920 DOI: 10.1002/ece3.8281] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/27/2021] [Revised: 10/04/2021] [Accepted: 10/08/2021] [Indexed: 12/12/2022]
Abstract
Molecular identification of mixed-species pollen samples has a range of applications in various fields of research. To date, such molecular identification has primarily been carried out via amplicon sequencing, but whole-genome shotgun (WGS) sequencing of pollen DNA has potential advantages, including (1) more genetic information per sample and (2) the potential for better quantitative matching. In this study, we tested the performance of WGS sequencing methodology and publicly available reference sequences in identifying species and quantifying their relative abundance in pollen mock communities. Using mock communities previously analyzed with DNA metabarcoding, we sequenced approximately 200 Mbp for each sample using Illumina HiSeq and MiSeq. Taxonomic identifications were based on the Kraken k-mer identification method with reference libraries constructed from full-genome and short read archive data from the NCBI database. We found WGS to be a reliable method for taxonomic identification of pollen, with near 100% identification of species in mixtures, but it generated higher rates of false positives (reads not identified to the correct taxon at the required taxonomic level) relative to rbcL and ITS2 amplicon sequencing. For quantification of relative species abundance, WGS data provided a stronger correlation between pollen grain proportion and sequence read proportion, but diverged more from a 1:1 relationship, likely due to the higher rate of false positives. Currently, a limitation of WGS-based pollen identification is the lack of representation of plant diversity in publicly available genome databases. As databases improve and costs drop, we expect that eventually genomics methods will become the methods of choice for species identification and quantification of mixed-species pollen samples.
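The distinction the abstract draws between a strong correlation and agreement with a 1:1 relationship can be illustrated with a minimal sketch. The function name and the example proportions below are hypothetical and are not taken from the study; the sketch only shows that read proportions can track grain proportions perfectly (r = 1) while the fitted slope is far from 1.

```python
from math import sqrt
from typing import Sequence, Tuple


def correlation_and_slope(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson r, least-squares slope of y on x)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical proportions, for illustration only.
grain_proportion = [0.1, 0.2, 0.3, 0.4]
read_proportion = [0.2, 0.25, 0.3, 0.35]
r, slope = correlation_and_slope(grain_proportion, read_proportion)
print(f"r = {r:.3f}, slope = {slope:.3f}")
```

A slope of 1 with an intercept of 0 would correspond to the 1:1 relationship; here the correlation is perfect but the slope is about 0.5.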
Affiliation(s)
- Karen L Bell
  - Department of Environmental Sciences, Emory University, Atlanta, Georgia, USA
  - Present address: School of Biological Sciences, University of Western Australia, Perth, Australia
  - Present address: CSIRO Land & Water and CSIRO Health & Biosecurity, Floreat, WA, Australia
- Robert A Petit
  - Division of Infectious Diseases, Department of Medicine, Emory University, Atlanta, Georgia, USA
- Anya Cutler
  - Department of Environmental Sciences, Emory University, Atlanta, Georgia, USA
- Emily K Dobbs
  - Department of Environmental Sciences, Emory University, Atlanta, Georgia, USA
  - Present address: Department of Biology, Northern Kentucky University, Highland Heights, Kentucky, USA
- J Michael Macpherson
  - Department of Biology, Chapman University, Orange, California, USA
  - Present address: 23andMe, Mountain View, California, USA
- Timothy D Read
  - Division of Infectious Diseases, Department of Medicine, Emory University, Atlanta, Georgia, USA
- Kevin S Burgess
  - Department of Biology, Columbus State University, Columbus, Georgia, USA
- Berry J Brosi
  - Department of Environmental Sciences, Emory University, Atlanta, Georgia, USA
  - Present address: Department of Biology, University of Washington, Seattle, Washington, USA
|
6
|
Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper enforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
|
7
|
Dovrolis N, Kassela K, Konstantinidis K, Kouvela A, Veletza S, Karakasiliotis I. ZWA: Viral genome assembly and characterization hindrances from virus-host chimeric reads; a refining approach. PLoS Comput Biol 2021; 17:e1009304. [PMID: 34370725 PMCID: PMC8376068 DOI: 10.1371/journal.pcbi.1009304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 08/19/2021] [Accepted: 07/24/2021] [Indexed: 11/19/2022]
Abstract
Viral metagenomics, also known as virome studies, have yielded an unprecedented number of novel sequences, essential in recognizing and characterizing the etiological agent and the origin of emerging infectious diseases. Several tools and pipelines have been developed, to date, for the identification and assembly of viral genomes. Assembly pipelines often result in viral genomes contaminated with host genetic material, some of which are currently deposited into public databases. In the current report, we present a group of deposited sequences that encompass ribosomal RNA (rRNA) contamination. We highlight the detrimental role of chimeric next generation sequencing reads, between host rRNA sequences and viral sequences, in virus genome assembly and we present the hindrances these reads may pose to current methodologies. We have further developed a refining pipeline, the Zero Waste Algorithm (ZWA), which assists in the assembly of low abundance viral genomes. ZWA performs context-dependent trimming of chimeric reads, precisely removing their rRNA moiety. These otherwise discarded reads were fed to the assembly pipeline and assisted in the construction of larger and cleaner contigs, making a substantial impact on current assembly methodologies. The ZWA pipeline may significantly enhance virus genome assembly from low abundance samples and virus metagenomics approaches in which a small number of reads determine genome quality and integrity.
For years now the study of viruses and their genetic composition has been important in their identification and classification. Especially in these times of the pandemic turmoil, accurate knowledge of a virus’ exact genetic composition can help identify its strengths and weaknesses allowing us to track its evolution and assist in the development of vaccines and antiviral agents. The reconstruction of these genomic sequences is called the assembly process, a bioinformatics approach which can be complicated and full of pitfalls. This work identifies one such issue, concerning artifacts introduced in viral genomes from the new technologies of nucleic acid sequencing. The proposed algorithm helps alleviate this problem by tentatively removing these problematic regions while keeping the vast majority of the genetic information required to produce a more complete viral genome. This work is anticipated to assist in the submission of higher integrity and accuracy viral genomes in public databases used for novel virus identification and characterization.
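The idea of trimming the rRNA portion of a chimeric read, rather than discarding the whole read, can be sketched as follows. This is not the ZWA implementation: the function name, the `min_length` parameter, and the assumption that the rRNA interval is already known from a prior alignment step are all hypothetical choices made for illustration.

```python
from typing import Optional


def trim_rrna_moiety(read: str, rrna_start: int, rrna_end: int,
                     min_length: int = 30) -> Optional[str]:
    """Remove the rRNA-matching interval [rrna_start, rrna_end) from a read.

    The interval is assumed to come from an alignment against host rRNA
    (not shown here). The longer remaining flank is kept if it is at least
    min_length bases long; otherwise the read is dropped (None).
    """
    if not 0 <= rrna_start <= rrna_end <= len(read):
        raise ValueError("rRNA interval must lie within the read")
    left_flank = read[:rrna_start]
    right_flank = read[rrna_end:]
    # Keep the longer non-rRNA flank as the candidate viral portion.
    remainder = left_flank if len(left_flank) >= len(right_flank) else right_flank
    return remainder if len(remainder) >= min_length else None


# Hypothetical chimeric read: 40 "viral" bases followed by 60 "rRNA" bases.
chimeric_read = "A" * 40 + "G" * 60
trimmed = trim_rrna_moiety(chimeric_read, rrna_start=40, rrna_end=100)
print(len(trimmed) if trimmed else "read dropped")
```

In this sketch a read that would otherwise be discarded as host-derived contributes its non-rRNA flank to the assembly input, which is the general behaviour the abstract describes.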
Affiliation(s)
- Nikolas Dovrolis
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
  - Corresponding author
- Katerina Kassela
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
- Adamantia Kouvela
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
- Stavroula Veletza
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
- Ioannis Karakasiliotis
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
  - Corresponding author
|
8
|
Sutton JM, Millwood JD, Case McCormack A, Fierst JL. Optimizing experimental design for genome sequencing and assembly with Oxford Nanopore Technologies. GigaByte 2021; 2021:gigabyte27. [PMID: 36824342 PMCID: PMC9650304 DOI: 10.46471/gigabyte.27] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/17/2021] [Accepted: 07/05/2021] [Indexed: 11/09/2022]
Abstract
High-quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) produces inexpensive DNA sequences, but has high error rates, which make sequence assembly and analysis difficult as genome size and complexity increase. Robust experimental design is necessary for ONT genome sequencing and assembly, but few studies have addressed eukaryotic organisms. Here, we present novel results using simulated and empirical ONT and DNA libraries to identify best practices for sequencing and assembly for several model species. We find that the unique error structure of ONT libraries causes errors to accumulate and assembly statistics to plateau as sequence depth increases. High-quality assembled eukaryotic sequences require high-molecular-weight DNA extractions that increase sequence read length, and computational protocols that reduce error through pre-assembly correction and read selection. Our quantitative results will be helpful for researchers seeking guidance for de novo assembly projects.
Affiliation(s)
- John M. Sutton
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- Joshua D. Millwood
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- A. Case McCormack
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- Janna L. Fierst
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA (corresponding author)
|
9
|
Smith CA. Macrosynteny analysis between Lentinula edodes and Lentinula novae-zelandiae reveals signals of domestication in Lentinula edodes. Sci Rep 2021; 11:9845. [PMID: 33972587 PMCID: PMC8110776 DOI: 10.1038/s41598-021-89146-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 04/08/2021] [Indexed: 12/03/2022]
Abstract
The basidiomycete fungus Lentinula novae-zelandiae is endemic to New Zealand and is a sister taxon to Lentinula edodes, the second most cultivated mushroom in the world. To explore the biology of this organism, a high-quality chromosome-level reference genome of L. novae-zelandiae was produced. Macrosyntenic comparisons between the genome assembly of L. novae-zelandiae, L. edodes and a set of three genome assemblies of diverse species from the Agaricomycota reveal a high degree of macrosyntenic restructuring within L. edodes, consistent with a signal of domestication. These results show that L. edodes has undergone significant genomic change during the course of its evolutionary history, likely as a result of its cultivation and domestication over the last 1000 years.
|
10
|
Garrido-Sanz L, Senar MÀ, Piñol J. Estimation of the relative abundance of species in artificial mixtures of insects using low-coverage shotgun metagenomics. Metabarcoding and Metagenomics 2020. [DOI: 10.3897/mbmg.4.48281] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
Amplicon metabarcoding is an established technique to analyse the taxonomic composition of communities of organisms using high-throughput DNA sequencing, but there are doubts about its ability to quantify the relative proportions of the species, as opposed to the species list. Here, we bypass the enrichment step and avoid the PCR bias by directly sequencing the extracted DNA using shotgun metagenomics. This approach is common practice in prokaryotes, but not in eukaryotes, because of the low number of sequenced genomes of eukaryotic species. We tested the metagenomics approach using insect species whose genomes are already sequenced and assembled to an advanced degree. We shotgun-sequenced, at low coverage, 18 species of insects in 22 single-species and 6 mixed-species libraries and mapped the reads against 110 reference genomes of insects. We used the single-species libraries to calibrate the assignment of reads to species and the libraries created from species mixtures to evaluate the ability of the method to quantify the relative species abundance. Our results showed that the shotgun metagenomic method can easily distinguish closely related insect species, such as the four species of Drosophila included in the artificial libraries. However, to avoid counting rare misclassified reads in samples, it was necessary to use a rather stringent detection limit of 0.001, so species with a lower relative abundance are ignored. We also found that approximately half the raw reads were informative for taxonomic purposes. Finally, using the mixed-species libraries, we showed that it was feasible to quantify with confidence the relative abundance of individual species in the mixtures.
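How a detection limit of 0.001 affects relative-abundance estimates can be sketched in a few lines. The function name, the example read counts, and the decision to renormalise the surviving proportions are assumptions made for illustration; the abstract states only that species below the limit are ignored.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int],
                       detection_limit: float = 0.001) -> Dict[str, float]:
    """Convert per-species read counts to relative abundances.

    Species whose share of assigned reads falls below detection_limit are
    dropped, and the remaining shares are renormalised to sum to 1
    (the renormalisation is an assumption of this sketch).
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    proportions = {species: n / total for species, n in read_counts.items()}
    kept = {s: p for s, p in proportions.items() if p >= detection_limit}
    kept_total = sum(kept.values())
    return {s: p / kept_total for s, p in kept.items()}


# Hypothetical counts: species_c holds 0.05% of reads and is filtered out.
counts = {"species_a": 9000, "species_b": 995, "species_c": 5}
for species, share in relative_abundance(counts).items():
    print(f"{species}: {share:.4f}")
```

The trade-off is the one the abstract notes: the threshold suppresses rare misclassified reads, at the cost of ignoring genuinely present species whose relative abundance is below the limit.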
|
11
|
Miller IJ, Rees ER, Ross J, Miller I, Baxa J, Lopera J, Kerby RL, Rey FE, Kwan JC. Autometa: automated extraction of microbial genomes from individual shotgun metagenomes. Nucleic Acids Res 2019; 47:e57. [PMID: 30838416 PMCID: PMC6547426 DOI: 10.1093/nar/gkz148] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 05/31/2018] [Revised: 02/15/2019] [Accepted: 02/21/2019] [Indexed: 12/28/2022]
Abstract
Shotgun metagenomics is a powerful, high-resolution technique enabling the study of microbial communities in situ. However, species-level resolution is only achieved after a process of 'binning' where contigs predicted to originate from the same genome are clustered. Such culture-independent sequencing frequently unearths novel microbes, and so various methods have been devised for reference-free binning. As novel microbiomes of increasing complexity are explored, sometimes associated with non-model hosts, robust automated binning methods are required. Existing methods struggle with eukaryotic contamination and cannot handle highly complex single metagenomes. We therefore developed an automated binning pipeline, termed 'Autometa', to address these issues. This command-line application integrates sequence homology, nucleotide composition, coverage and the presence of single-copy marker genes to separate microbial genomes from non-model host genomes and other eukaryotic contaminants, before deconvoluting individual genomes from single metagenomes. The method is able to effectively separate over 1000 genomes from a metagenome, allowing the study of previously intractably complex environments at the level of single species. Autometa is freely available at https://bitbucket.org/jason_c_kwan/autometa and as a docker image at https://hub.docker.com/r/jasonkwan/autometa under the GNU Affero General Public License 3 (AGPL 3).
Affiliation(s)
- Ian J Miller
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Evan R Rees
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Jennifer Ross
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Izaak Miller
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Jared Baxa
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Juan Lopera
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Robert L Kerby
  - Department of Bacteriology, University of Wisconsin–Madison, 1550 Linden Drive, Madison, WI 53706, USA
- Federico E Rey
  - Department of Bacteriology, University of Wisconsin–Madison, 1550 Linden Drive, Madison, WI 53706, USA
- Jason C Kwan
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
|