1
Siderius NL, Sapula SA, Hart BJ, Hutchings JL, Venter H. Enterobacter adelaidei sp. nov. Isolation of an extensively drug resistant strain from hospital wastewater in Australia and the global distribution of the species. Microbiol Res 2024; 288:127867. [PMID: 39163716 DOI: 10.1016/j.micres.2024.127867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 08/02/2024] [Accepted: 08/03/2024] [Indexed: 08/22/2024]
Abstract
BACKGROUND Enterobacter species are included among the normal human gut microflora and persist in a diverse range of other environmental niches. They have become important opportunistic nosocomial pathogens known to harbour plasmid-mediated multi-class antimicrobial resistance (AMR) determinants. Global AMR surveillance of Enterobacterales isolates shows the genus is second to Klebsiella in terms of frequency of carbapenem resistance. Enterobacter taxonomy is confusing and standard species identification methods are largely inaccurate or insufficient. There are currently 27 named species and a total of 46 taxa in the genus distinguishable via average nucleotide identity (ANI) calculation between pairs of genomic sequences. Here we describe an Enterobacter strain, ECC3473, isolated from the wastewater of an Australian hospital whose species could not be determined by standard methods nor by ribosomal RNA gene multi-locus typing.
AIM To characterise ECC3473 in terms of phenotypic and genotypic antimicrobial resistance, biochemical characteristics and taxonomy as well as to determine the global distribution of the novel species to which it belongs.
METHODS Standard broth dilution and disk diffusion were used to determine phenotypic AMR. The strain's complete genome, including plasmids, was obtained following long- and short-read sequencing and a novel long/short read hybrid assembly and polishing, and the genomic basis of AMR was determined. Phylogenomic analysis and quantitative measures of relatedness (ANI, digital DNA-DNA hybridisation, and difference in G+C content) were used to study the taxonomic relationship between ECC3473 and Enterobacter type-strains. NCBI and PubMLST databases and the literature were searched for additional members of the novel species to determine its global distribution.
RESULTS ECC3473 is one of 21 strains isolated globally belonging to a novel Enterobacter species for which the name Enterobacter adelaidei sp. nov. is proposed. The novel species was found to be resilient in its capacity to persist in contaminated water and adaptable in its ability to accumulate multiple transmissible AMR determinants.
CONCLUSION E. adelaidei sp. nov. may become increasingly important to the dissemination of AMR.
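One of the quantitative measures of relatedness named in the abstract, the difference in G+C content between two genomes, is simple enough to illustrate directly. The following is a minimal Python sketch and not the authors' pipeline; the function names `gc_content` and `gc_difference` are hypothetical, the input sequences are toy strings rather than real genomes, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence.

    Assumption: only unambiguous A, C, G and T bases are counted, so
    symbols such as N do not affect the percentage.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


def gc_difference(seq_a: str, seq_b: str) -> float:
    """Return the absolute difference in G+C percentage between two sequences."""
    return abs(gc_content(seq_a) - gc_content(seq_b))


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(gc_difference("GGCC", "ATGC"))
```

In practice the two inputs would be complete genome assemblies, with all contigs or replicons concatenated before counting.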
Affiliation(s)
- Naomi L Siderius
  - UniSA Clinical and Health Sciences, Health and Biomedical Innovation, University of South Australia, Adelaide, SA 5000, Australia.
- Sylvia A Sapula
  - UniSA Clinical and Health Sciences, Health and Biomedical Innovation, University of South Australia, Adelaide, SA 5000, Australia.
- Bradley J Hart
  - UniSA Clinical and Health Sciences, Health and Biomedical Innovation, University of South Australia, Adelaide, SA 5000, Australia.
- Joshua L Hutchings
  - UniSA Clinical and Health Sciences, Health and Biomedical Innovation, University of South Australia, Adelaide, SA 5000, Australia.
- Henrietta Venter
  - UniSA Clinical and Health Sciences, Health and Biomedical Innovation, University of South Australia, Adelaide, SA 5000, Australia.
2
Williams SK, Jerlström Hultqvist J, Eglit Y, Salas-Leiva DE, Curtis B, Orr RJS, Stairs CW, Atalay TN, MacMillan N, Simpson AGB, Roger AJ. Extreme mitochondrial reduction in a novel group of free-living metamonads. Nat Commun 2024; 15:6805. [PMID: 39122691 PMCID: PMC11316075 DOI: 10.1038/s41467-024-50991-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 07/15/2024] [Indexed: 08/12/2024] Open
Abstract
Metamonads are a diverse group of heterotrophic microbial eukaryotes adapted to living in hypoxic environments. All metamonads but one harbour metabolically altered 'mitochondrion-related organelles' (MROs) with reduced functions; however, the degree of reduction varies. Here, we generate high-quality draft genomes, transcriptomes, and predicted proteomes for five recently discovered free-living metamonads. Phylogenomic analyses placed these organisms in a group we name the 'BaSk' (Barthelonids+Skoliomonads) clade, a deeply branching sister group to the Fornicata, a phylum that includes parasitic and free-living flagellates. Bioinformatic analyses of gene models show that these organisms are predicted to have extremely reduced MRO proteomes in comparison to other free-living metamonads. Loss of the mitochondrial iron-sulfur cluster assembly system in some organisms in this group appears to be linked to the acquisition in their common ancestral lineage of a SUF-like minimal system Fe/S cluster pathway by lateral gene transfer. One of the isolates, Skoliomonas litria, appears to have lost all other known MRO pathways. No proteins were confidently assigned to the predicted MRO proteome of this organism, suggesting that the organelle has been lost. The extreme mitochondrial reduction observed within this free-living anaerobic protistan clade demonstrates that mitochondrial functions may be completely lost even in free-living organisms.
Affiliation(s)
- Shelby K Williams
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
- Jon Jerlström Hultqvist
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Yana Eglit
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biology, Dalhousie University, Halifax, Canada
- Dayana E Salas-Leiva
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biochemistry, Cambridge University, Cambridge, UK
- Bruce Curtis
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
- Russell J S Orr
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
- Tuğba N Atalay
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
- Naomi MacMillan
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
- Alastair G B Simpson
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biology, Dalhousie University, Halifax, Canada
- Andrew J Roger
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
3
Bouras G, Judd LM, Edwards RA, Vreugde S, Stinear TP, Wick RR. How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies. Microb Genom 2024; 10:001254. [PMID: 38833287 PMCID: PMC11261834 DOI: 10.1099/mgen.0.001254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 04/30/2024] [Indexed: 06/06/2024] Open
Abstract
It is now possible to assemble near-perfect bacterial genomes using Oxford Nanopore Technologies (ONT) long reads, but short-read polishing is usually required for perfection. However, the effect of short-read depth on polishing performance is not well understood. Here, we introduce Pypolca (with default and careful parameters) and Polypolish v0.6.0 (with a new careful parameter). We then show that: (1) all polishers other than Pypolca-careful, Polypolish-default and Polypolish-careful commonly introduce false-positive errors at low read depth; (2) most of the benefit of short-read polishing occurs by 25× depth; (3) Polypolish-careful almost never introduces false-positive errors at any depth; and (4) Pypolca-careful is the single most effective polisher. Overall, we recommend the following polishing strategies: Polypolish-careful alone when depth is very low (<5×), Polypolish-careful and Pypolca-careful when depth is low (5-25×), and Polypolish-default and Pypolca-careful when depth is sufficient (>25×).
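The depth-based recommendation at the end of this abstract can be written as a small lookup. The following is a minimal Python sketch, not part of Pypolca or Polypolish; the function name `recommend_polishers` is hypothetical, it returns the strategy names used in the abstract rather than any command-line invocation, and treating depths of exactly 5× and 25× as belonging to the "low" band is an assumption based on the stated ranges (<5×, 5-25×, >25×).

```python
from typing import List


def recommend_polishers(short_read_depth: float) -> List[str]:
    """Map short-read depth (x-fold coverage) to the polishing strategy
    recommended in the abstract.

    Assumption: the boundary values 5x and 25x fall in the "low" band.
    """
    if short_read_depth < 0:
        raise ValueError("depth must be non-negative")
    if short_read_depth < 5:
        # Very low depth: Polypolish-careful alone.
        return ["Polypolish-careful"]
    if short_read_depth <= 25:
        # Low depth: both polishers in their careful modes.
        return ["Polypolish-careful", "Pypolca-careful"]
    # Sufficient depth: Polypolish-default plus Pypolca-careful.
    return ["Polypolish-default", "Pypolca-careful"]


if __name__ == "__main__":
    for depth in (3, 15, 60):
        print(depth, recommend_polishers(depth))
```

The strategy names are returned in the order they are listed in the abstract; the abstract itself does not say how the two tools should be chained.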
Affiliation(s)
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, South Australia, Australia
- Louise M. Judd
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Robert A. Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Sarah Vreugde
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, South Australia, Australia
- Timothy P. Stinear
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Ryan R. Wick
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
4
Bouras G, Houtak G, Wick RR, Mallawaarachchi V, Roach MJ, Papudeshi B, Judd LM, Sheppard AE, Edwards RA, Vreugde S. Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies. Microb Genom 2024; 10:001244. [PMID: 38717808 PMCID: PMC11165638 DOI: 10.1099/mgen.0.001244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/16/2024] [Indexed: 05/21/2024] Open
Abstract
Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-reads) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter which allows for the fast, automatic and scalable recovery of near-perfect complete bacterial genomes using a long-read first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read only assembler. We compared Hybracter to existing automated hybrid and long-read only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long-reads only is the most accurate long-read only assembler and is comparable to hybrid methods in accurately recovering small plasmids.
Affiliation(s)
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- Ghais Houtak
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- Ryan R. Wick
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Vijini Mallawaarachchi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Michael J. Roach
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
  - Adelaide Centre for Epigenetics and South Australian Immunogenomics Cancer Institute, The University of Adelaide, Adelaide, Australia
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Louise M. Judd
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Anna E. Sheppard
  - School of Biological Sciences, The University of Adelaide, Adelaide, Australia
- Robert A. Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Sarah Vreugde
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
5
Bouras G, Houtak G, Wick RR, Mallawaarachchi V, Roach MJ, Papudeshi B, Judd LM, Sheppard AE, Edwards RA, Vreugde S. Hybracter: Enabling Scalable, Automated, Complete and Accurate Bacterial Genome Assemblies. bioRxiv: the preprint server for biology 2024:2023.12.12.571215. [PMID: 38168369 PMCID: PMC10760025 DOI: 10.1101/2023.12.12.571215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-reads) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants (SNVs). They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance (AMR) genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter which allows for the fast, automatic, and scalable recovery of near-perfect complete bacterial genomes using a long-read first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read only assembler. We compared Hybracter to existing automated hybrid and long-read only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long-reads only is the most accurate long-read only assembler and is comparable to hybrid methods in accurately recovering small plasmids.
Affiliation(s)
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery - Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, South Australia, Australia
- Ghais Houtak
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery - Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, South Australia, Australia
- Ryan R. Wick
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Vijini Mallawaarachchi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Michael J. Roach
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
  - Adelaide Centre for Epigenetics and South Australian Immunogenomics Cancer Institute, The University of Adelaide, Adelaide, Australia
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Louise M. Judd
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Anna E. Sheppard
  - School of Biological Sciences, The University of Adelaide, Adelaide, Australia
- Robert A. Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Sarah Vreugde
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery - Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, South Australia, Australia
6
Li K, Xu P, Wang J, Yi X, Jiao Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat Commun 2023; 14:6556. [PMID: 37848433 PMCID: PMC10582259 DOI: 10.1038/s41467-023-42336-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 10/05/2023] [Indexed: 10/19/2023] Open
Abstract
Assembly of a high-quality genome is important for downstream comparative and functional genomic studies. However, most tools for genome assembly assessment only give qualitative reports, which do not pinpoint assembly errors at specific regions. Here, we develop a new reference-free tool, Clipping information for Revealing Assembly Quality (CRAQ), which maps raw reads back to assembled sequences to identify regional and structural assembly errors based on effective clipped alignment information. Error counts are transformed into corresponding assembly evaluation indexes to reflect the assembly quality at single-nucleotide resolution. Notably, CRAQ distinguishes assembly errors from heterozygous sites or structural differences between haplotypes. This tool can clearly indicate low-quality regions and potential structural error breakpoints; thus, it can identify misjoined regions that should be split for further scaffold building and improvement of the assembly. We have benchmarked CRAQ on multiple genomes assembled using different strategies, and demonstrated the misjoin correction for improving the constructed pseudomolecules.
Affiliation(s)
- Kunpeng Li
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Peng Xu
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinpeng Wang
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xin Yi
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - China National Botanical Garden, Beijing, China
- Yuannian Jiao
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - China National Botanical Garden, Beijing, China
7
Rafique Q, Rehman A, Afghan MS, Ahmad HM, Zafar I, Fayyaz K, Ain Q, Rayan RA, Al-Aidarous KM, Rashid S, Mushtaq G, Sharma R. Reviewing methods of deep learning for diagnosing COVID-19, its variants and synergistic medicine combinations. Comput Biol Med 2023; 163:107191. [PMID: 37354819 PMCID: PMC10281043 DOI: 10.1016/j.compbiomed.2023.107191] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/24/2023] [Revised: 05/28/2023] [Accepted: 06/19/2023] [Indexed: 06/26/2023]
Abstract
The COVID-19 pandemic has necessitated the development of reliable diagnostic methods for accurately detecting the novel coronavirus and its variants. Deep learning (DL) techniques have shown promising potential as screening tools for COVID-19 detection. In this study, we explore the realistic development of DL-driven COVID-19 detection methods and focus on the fully automatic framework using available resources, which can effectively investigate various coronavirus variants through modalities. We conducted an exploration and comparison of several diagnostic techniques that are widely used and globally validated for the detection of COVID-19. Furthermore, we explore review-based studies that provide detailed information on synergistic medicine combinations for the treatment of COVID-19. We recommend DL methods that effectively reduce time, cost, and complexity, providing valuable guidance for utilizing available synergistic combinations in clinical and research settings. This study also highlights the implication of innovative diagnostic technical and instrumental strategies, exploring public datasets, and investigating synergistic medicines using optimised DL rules. By summarizing these findings, we aim to assist future researchers in their endeavours by providing a comprehensive overview of the implication of DL techniques in COVID-19 detection and treatment. Integrating DL methods with various diagnostic approaches holds great promise in improving the accuracy and efficiency of COVID-19 diagnostics, thus contributing to effective control and management of the ongoing pandemic.
Affiliation(s)
- Qandeel Rafique
  - Department of Internal Medicine, Sahiwal Medical College, Sahiwal, 57040, Pakistan.
- Ali Rehman
  - Department of General Medicine Govt. Eye and General Hospital Lahore, 54000, Pakistan.
- Muhammad Sher Afghan
  - Department of Internal Medicine District Headquarter Hospital Faisalabad, 62300, Pakistan.
- Hafiz Muhamad Ahmad
  - Department of Internal Medicine District Headquarter Hospital Bahawalnagar, 62300, Pakistan.
- Imran Zafar
  - Department of Bioinformatics and Computational Biology, Virtual University Pakistan, 44000, Pakistan.
- Kompal Fayyaz
  - Department of National Centre for Bioinformatics, Quaid-I-Azam University Islamabad, 45320, Pakistan.
- Quratul Ain
  - Department of Chemistry, Government College Women University Faisalabad, 03822, Pakistan.
- Rehab A Rayan
  - Department of Epidemiology, High Institute of Public Health, Alexandria University, 21526, Egypt.
- Khadija Mohammed Al-Aidarous
  - Department of Computer Science, College of Science and Arts in Sharurah, Najran University, 51730, Saudi Arabia.
- Summya Rashid
  - Department of Pharmacology & Toxicology, College of Pharmacy, Prince Sattam Bin Abdulaziz University, P.O. Box 173, Al-Kharj, 11942, Saudi Arabia.
- Gohar Mushtaq
  - Center for Scientific Research, Faculty of Medicine, Idlib University, Idlib, Syria.
- Rohit Sharma
  - Department of Rasashastra and Bhaishajya Kalpana, Faculty of Ayurveda, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India.
8
Medvedev P. Theoretical Analysis of Sequencing Bioinformatics Algorithms and Beyond. Communications of the ACM 2023; 66:118-125. [PMID: 38736702 PMCID: PMC11087067 DOI: 10.1145/3571723] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/14/2024]
Abstract
A case study reveals the theoretical analysis of algorithms is not always as helpful as standard dogma might suggest.
Affiliation(s)
- Paul Medvedev
  - Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology and the Director of the Center for Computational Biology and Bioinformatics at Pennsylvania State University, University Park, PA, USA
9
Žárský V, Karnkowska A, Boscaro V, Trznadel M, Whelan TA, Hiltunen-Thorén M, Onut-Brännström I, Abbott CL, Fast NM, Burki F, Keeling PJ. Contrasting outcomes of genome reduction in mikrocytids and microsporidians. BMC Biol 2023; 21:137. [PMID: 37280585 DOI: 10.1186/s12915-023-01635-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/01/2023] [Accepted: 05/26/2023] [Indexed: 06/08/2023] Open
Abstract
BACKGROUND Intracellular symbionts often undergo genome reduction, losing both coding and non-coding DNA in a process that ultimately produces small, gene-dense genomes with few genes. Among eukaryotes, an extreme example is found in microsporidians, which are anaerobic, obligate intracellular parasites related to fungi that have the smallest nuclear genomes known (except for the relic nucleomorphs of some secondary plastids). Mikrocytids are superficially similar to microsporidians: they are also small, reduced, obligate parasites; however, as they belong to a very different branch of the tree of eukaryotes, the rhizarians, such similarities must have evolved in parallel. Since little genomic data are available from mikrocytids, we assembled a draft genome of the type species, Mikrocytos mackini, and compared the genomic architecture and content of microsporidians and mikrocytids to identify common characteristics of reduction and possible convergent evolution.
RESULTS At the coarsest level, the genome of M. mackini does not exhibit signs of extreme genome reduction; at 49.7 Mbp with 14,372 genes, the assembly is much larger and more gene-rich than those of microsporidians. However, much of the genomic sequence and most (8075) of the protein-coding genes code for transposons, and may not contribute much of functional relevance to the parasite. Indeed, the energy and carbon metabolism of M. mackini share several similarities with those of microsporidians. Overall, the predicted proteome involved in cellular functions is quite reduced and gene sequences are extremely divergent. Microsporidians and mikrocytids also share highly reduced spliceosomes that have retained a strikingly similar subset of proteins despite having reduced independently. In contrast, the spliceosomal introns in mikrocytids are very different from those of microsporidians in that they are numerous, conserved in sequence, and constrained to an exceptionally narrow size range (all 16 or 17 nucleotides long) at the shortest extreme of known intron lengths.
CONCLUSIONS Nuclear genome reduction has taken place many times and has proceeded along different routes in different lineages. Mikrocytids show a mix of similarities and differences with other extreme cases, including uncoupling the actual size of a genome from its functional reduction.
Affiliation(s)
- Vojtěch Žárský
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Anna Karnkowska
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
  - Institute of Evolutionary Biology, Faculty of Biology, University of Warsaw, 02-089, Warsaw, Poland
- Vittorio Boscaro
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Morelia Trznadel
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Thomas A Whelan
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Markus Hiltunen-Thorén
  - Department of Organismal Biology, Uppsala University, Norbyv. 18D, 752 36, Uppsala, Sweden
  - Department of Ecology, Environment and Plant Sciences, Stockholm University, SE-106 91, Stockholm, Sweden
- Ioana Onut-Brännström
  - Department of Organismal Biology, Uppsala University, Norbyv. 18D, 752 36, Uppsala, Sweden
  - Department of Ecology and Genetics, Uppsala University, 752 36, Uppsala, Sweden
  - Natural History Museum, University of Oslo, 0562, Oslo, Norway
- Cathryn L Abbott
  - Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC, V9T 6N7, Canada
- Naomi M Fast
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Fabien Burki
  - Department of Organismal Biology, Uppsala University, Norbyv. 18D, 752 36, Uppsala, Sweden
- Patrick J Keeling
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
10
Mineeva O, Danciu D, Schölkopf B, Ley RE, Rätsch G, Youngblut ND. ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning. PLoS Comput Biol 2023; 19:e1011001. [PMID: 37126495 PMCID: PMC10174551 DOI: 10.1371/journal.pcbi.1011001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 05/11/2023] [Accepted: 03/06/2023] [Indexed: 05/02/2023] Open
Abstract
The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization.
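The abstract reports accuracy as AUPRC, the area under the precision-recall curve. The following is a minimal Python sketch of average precision, one common estimator of that area; it is not ResMiCo's evaluation code, which the abstract does not describe. The function name `average_precision` is hypothetical, the labels and scores are toy values, and the sketch assumes there are no tied scores.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate the area under the precision-recall curve.

    labels: 1 for a misassembled contig (positive), 0 otherwise.
    scores: model scores, higher meaning more likely positive.
    Assumption: scores contain no ties, so ranking is unambiguous.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_pos / rank
    return precision_sum / n_pos


if __name__ == "__main__":
    # Toy data for illustration only.
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A value of 1.0 means every positive is ranked above every negative. The baseline for a random ranking is roughly the fraction of positives, so an AUPRC of 0.57 has to be read against how rare misassembled contigs are in the evaluation data.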
Affiliation(s)
- Olga Mineeva
  - Department of Computer Science, ETH Zürich, Zürich, Switzerland
  - Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, Germany
  - Swiss Institute for Bioinformatics, Lausanne, Switzerland
- Daniel Danciu
  - Department of Computer Science, ETH Zürich, Zürich, Switzerland
- Bernhard Schölkopf
  - Department of Computer Science, ETH Zürich, Zürich, Switzerland
  - Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, Germany
  - ETH AI center, ETH Zürich, Zürich, Switzerland
- Ruth E Ley
  - Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Gunnar Rätsch
  - Department of Computer Science, ETH Zürich, Zürich, Switzerland
  - Swiss Institute for Bioinformatics, Lausanne, Switzerland
  - ETH AI center, ETH Zürich, Zürich, Switzerland
  - Department of Biology, ETH Zürich, Zürich, Switzerland
  - Medical Informatics Unit, Zürich University Hospital, Zürich, Switzerland
- Nicholas D Youngblut
  - Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
11
Jia L, Wu Y, Dong Y, Chen J, Chen WH, Zhao XM. A survey on computational strategies for genome-resolved gut metagenomics. Brief Bioinform 2023; 24:7145904. [PMID: 37114640 DOI: 10.1093/bib/bbad162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 03/20/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023] Open
Abstract
Recovering high-quality metagenome-assembled genomes (HQ-MAGs) is critical for exploring microbial compositions and microbe-phenotype associations. However, multiple sequencing platforms and computational tools for this purpose may confuse researchers and thus call for extensive evaluation. Here, we systematically evaluated a total of 40 combinations of popular computational tools and sequencing platforms (i.e. strategies), involving eight assemblers, eight metagenomic binners and four sequencing technologies, including short-, long-read and metaHiC sequencing. We identified the best tools for the individual tasks (e.g. the assembly and binning) and combinations (e.g. generating more HQ-MAGs) depending on the availability of the sequencing data. We found that the combination of the hybrid assemblies and metaHiC-based binning performed best, followed by the hybrid and long-read assemblies. More importantly, both long-read and metaHiC sequencings link more mobile elements and antibiotic resistance genes to bacterial hosts and improve the quality of public human gut reference genomes with 32% (34/105) HQ-MAGs that were either of better quality than those in the Unified Human Gastrointestinal Genome catalog version 2 or novel.
Affiliation(s)
- Longhao Jia
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Yingjian Wu
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yanqi Dong
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Jingchao Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Wei-Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
  - Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai 264003, China
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai 200433, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
12
Bizic M, Brad T, Ionescu D, Barbu-Tudoran L, Zoccarato L, Aerts JW, Contarini PE, Gros O, Volland JM, Popa R, Ody J, Vellone D, Flot JF, Tighe S, Sarbu SM. Cave Thiovulum (Candidatus Thiovulum stygium) differs metabolically and genomically from marine species. THE ISME JOURNAL 2023; 17:340-353. [PMID: 36528730 PMCID: PMC9938260 DOI: 10.1038/s41396-022-01350-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2021] [Revised: 11/29/2022] [Accepted: 12/02/2022] [Indexed: 12/23/2022]
Abstract
Thiovulum spp. (Campylobacterota) are large sulfur bacteria that form veil-like structures in aquatic environments. The sulfidic Movile Cave (Romania), sealed from the atmosphere for ~5 million years, has several aqueous chambers, some with low atmospheric O2 (~7%). The cave's surface-water microbial community is dominated by bacteria we identified as Thiovulum. We show that this strain, and others from subsurface environments, are phylogenetically distinct from marine Thiovulum. We assembled a closed genome of the Movile strain and confirmed its metabolism using RNAseq. We compared the genome of this strain and one we assembled from public data from the sulfidic Frasassi caves to four marine genomes, including Candidatus Thiovulum karukerense and Ca. T. imperiosus, whose genomes we sequenced. Despite great spatial and temporal separation, the genomes of the Movile and Frasassi Thiovulum were highly similar, differing greatly from the very diverse marine strains. We concluded that cave Thiovulum represent a new species, named here Candidatus Thiovulum stygium. Based on their genomes, cave Thiovulum can switch between aerobic and anaerobic sulfide oxidation using O2 and NO3- as electron acceptors, the latter likely via dissimilatory nitrate reduction to ammonia. Thus, Thiovulum is likely important to both S and N cycles in sulfidic caves. Electron microscopy analysis suggests that at least some of the short peritrichous structures typical of Thiovulum are type IV pili, for which genes were found in all strains. These pili may play a role in veil formation, by connecting adjacent cells, and in the motility of these exceptionally fast swimmers.
Affiliation(s)
- Mina Bizic
  - Leibniz Institute for Freshwater Ecology and Inland Fisheries, IGB, Dep 3, Plankton and Microbial Ecology, Zur Alte Fischerhütte 2, OT Neuglobsow, 16775, Stechlin, Germany
  - Berlin-Brandenburg Institute of Advanced Biodiversity Research (BBIB), Berlin, Germany
- Traian Brad
  - “Emil Racoviţă” Institute of Speleology, Clinicilor 5-7, 400006, Cluj-Napoca, Romania
- Danny Ionescu
  - Leibniz Institute for Freshwater Ecology and Inland Fisheries, IGB, Dep 3, Plankton and Microbial Ecology, Zur Alte Fischerhütte 2, OT Neuglobsow, 16775, Stechlin, Germany
  - Berlin-Brandenburg Institute of Advanced Biodiversity Research (BBIB), Berlin, Germany
- Lucian Barbu-Tudoran
  - Center for Electron Microscopy, “Babeș-Bolyai” University, Clinicilor 5, 400006 Cluj-Napoca, Romania
- Luca Zoccarato
  - Leibniz Institute for Freshwater Ecology and Inland Fisheries, IGB, Dep 3, Plankton and Microbial Ecology, Zur Alte Fischerhütte 2, OT Neuglobsow, 16775 Stechlin, Germany
  - Institute of Computational Biology, University of Natural Resources and Life Sciences, Gregor-Mendel-Straße 3, 31180 Vienna, Austria
- Joost W. Aerts
  - Department of Molecular Cell Physiology, Faculty of Earth and Life sciences, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands
- Paul-Emile Contarini
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d’Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 97110 Pointe-à-Pitre, France
  - Laboratory for Research in Complex Systems, Menlo Park, CA USA
- Olivier Gros
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d’Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 97110 Pointe-à-Pitre, France
- Jean-Marie Volland
  - Laboratory for Research in Complex Systems, Menlo Park, CA USA
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 94720 Berkeley, CA USA
- Radu Popa
  - River Road Research, 62 Leslie St, Buffalo, NY 1421 USA
- Jessica Ody
  - Evolutionary Biology and Ecology, Université libre de Bruxelles (ULB), C.P. 160/12, Avenue F.D. Roosevelt 50, 1050 Brussels, Belgium
- Daniel Vellone
  - Vermont Integrative Genomics Lab, University of Vermont Cancer Center, Health Science Research Facility, Burlington, Vermont, VT 05405 USA
- Jean-François Flot
  - Evolutionary Biology and Ecology, Université libre de Bruxelles (ULB), C.P. 160/12, Avenue F.D. Roosevelt 50, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels (IB)², Brussels, Belgium
- Scott Tighe
  - Vermont Integrative Genomics Lab, University of Vermont Cancer Center, Health Science Research Facility, Burlington, Vermont, VT 05405 USA
- Serban M. Sarbu
  - “Emil Racoviţă” Institute of Speleology, Frumoasă 31-B, 010986 Bucureşti, Romania
  - Department of Biological Sciences, California State University, Chico, CA 95929 USA
13
Wick RR, Judd LM, Holt KE. Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing. PLoS Comput Biol 2023; 19:e1010905. [PMID: 36862631 PMCID: PMC9980784 DOI: 10.1371/journal.pcbi.1010905] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Indexed: 03/03/2023] Open
Abstract
A perfect bacterial genome assembly is one where the assembled sequence is an exact match for the organism's genome: each replicon sequence is complete and contains no errors. While this has been difficult to achieve in the past, improvements in long-read sequencing, assemblers, and polishers have brought perfect assemblies within reach. Here, we describe our recommended approach for assembling a bacterial genome to perfection using a combination of Oxford Nanopore Technologies long reads and Illumina short reads: Trycycler long-read assembly, Medaka long-read polishing, Polypolish short-read polishing, followed by other short-read polishing tools and manual curation. We also discuss potential pitfalls one might encounter when assembling challenging genomes, and we provide an online tutorial with sample data (github.com/rrwick/perfect-bacterial-genome-tutorial).
Affiliation(s)
- Ryan R. Wick
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Australia
- Louise M. Judd
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Kathryn E. Holt
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Australia
  - Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, United Kingdom
14
Lai S, Pan S, Sun C, Coelho LP, Chen WH, Zhao XM. metaMIC: reference-free misassembly identification and correction of de novo metagenomic assemblies. Genome Biol 2022; 23:242. [PMID: 36376928 PMCID: PMC9661791 DOI: 10.1186/s13059-022-02810-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/10/2021] [Accepted: 11/01/2022] [Indexed: 11/16/2022] Open
Abstract
Evaluating the quality of metagenomic assemblies is important for constructing reliable metagenome-assembled genomes and downstream analyses. Here, we present metaMIC (https://github.com/ZhaoXM-Lab/metaMIC), a machine learning-based tool for identifying and correcting misassemblies in metagenomic assemblies. Benchmarking results on both simulated and real datasets demonstrate that metaMIC outperforms existing tools when identifying misassembled contigs. Furthermore, metaMIC is able to localize the misassembly breakpoints, and the correction of misassemblies by splitting at misassembly breakpoints can improve downstream scaffolding and binning results.
Affiliation(s)
- Senying Lai
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Shaojun Pan
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Chuqing Sun
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Luis Pedro Coelho
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Wei-Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
  - College of Life Science, Henan Normal University, Xinxiang, Henan, China
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
  - International Human Phenome Institutes (Shanghai), Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China
15
Fan J, Chan S, Patro R. Perplexity: evaluating transcript abundance estimation in the absence of ground truth. Algorithms Mol Biol 2022; 17:6. [PMID: 35331283 PMCID: PMC8951746 DOI: 10.1186/s13015-022-00214-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 03/01/2022] [Indexed: 11/20/2022] Open
Abstract
Background There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpins gene expression-based analysis routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best. Results We derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. Furthermore, we demonstrate theoretically and experimentally that perplexity can be computed for arbitrary transcript abundance estimation models. Conclusions Alongside the derivation and implementation of perplexity for transcript abundance estimation, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.
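For orientation, the language-model metric that the abstract says was adapted can be sketched as below. This is only the generic definition of perplexity (the inverse geometric mean of the probabilities a model assigns to held-out items), not the authors' RNA-seq derivation or their handling of corner cases. The function name `perplexity` and the idea of passing per-fragment probabilities as a plain list are assumptions made for illustration.

```python
import math

def perplexity(fragment_probs):
    """Generic perplexity: exp of the negative mean log-probability.

    `fragment_probs` is a hypothetical list holding the probability that
    some fitted model assigns to each held-out fragment.
    """
    if not fragment_probs:
        raise ValueError("need at least one held-out fragment")
    if any(p < 0.0 or p > 1.0 for p in fragment_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if any(p == 0.0 for p in fragment_probs):
        # A fragment the model considers impossible makes perplexity infinite.
        return math.inf
    mean_log = sum(math.log(p) for p in fragment_probs) / len(fragment_probs)
    return math.exp(-mean_log)
```

Lower values mean the model found the held-out fragments less surprising; a model assigning probability 0.25 to every fragment has a perplexity of 4.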
16
Wick RR, Holt KE. Polypolish: Short-read polishing of long-read bacterial genome assemblies. PLoS Comput Biol 2022; 18:e1009802. [PMID: 35073327 PMCID: PMC8812927 DOI: 10.1371/journal.pcbi.1009802] [Citation(s) in RCA: 209] [Impact Index Per Article: 104.5] [Received: 10/21/2021] [Revised: 02/03/2022] [Accepted: 01/03/2022] [Indexed: 12/12/2022] Open
Abstract
Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all-per-read alignments to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers. Recent improvements in Oxford Nanopore Technologies sequencing platforms and assembly algorithms have made it easier than ever to generate complete bacterial genome sequences. However, Oxford Nanopore genome sequences suffer from errors that limit their utility in downstream analyses. To fix these errors, one can ‘polish’ the genome with Illumina sequencing, exploiting the fact that Oxford Nanopore and Illumina sequencing have different error profiles. There are several polishing tools which can fix most errors in an Oxford Nanopore genome, but they struggle with errors in repetitive regions of the genome. With this in mind, we have developed a polisher, Polypolish, which uses a novel approach that allows it to fix more errors in genomic repeats. Our results show that Polypolish is both effective at repairing sequence errors and very unlikely to introduce new errors. Polypolish can often fix errors that other polishers cannot and vice versa, so the best results come from using a combination of tools. Polypolish therefore has an important role in bacterial genome assembly methods that aim for the highest possible sequence accuracy.
Affiliation(s)
- Ryan R. Wick
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia
- Kathryn E. Holt
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia
  - Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, United Kingdom
17
Genome assembly and annotation. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00013-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
18
Poláková E, Albanaz ATS, Zakharova A, Novozhilova TS, Gerasimov ES, Yurchenko V. Ku80 is involved in telomere maintenance but dispensable for genomic stability in Leishmania mexicana. PLoS Negl Trop Dis 2021; 15:e0010041. [PMID: 34965251 PMCID: PMC8716037 DOI: 10.1371/journal.pntd.0010041] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/13/2021] [Accepted: 11/30/2021] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Telomeres are indispensable for genome stability maintenance. They are maintained by the telomere-associated protein complex, which includes Ku proteins and a telomerase among others. Here, we investigated the role of Ku80 in Leishmania mexicana. Leishmania is a genus of parasitic protists of the family Trypanosomatidae causing a vector-borne disease called leishmaniasis. METHODOLOGY/PRINCIPAL FINDINGS We used the previously established CRISPR/Cas9 system to mediate ablation of Ku80- and Ku70-encoding genes in L. mexicana. Complete knock-outs of both genes were confirmed by Southern blotting, whole-genome Illumina sequencing, and RT-qPCR. Resulting telomeric phenotypes were subsequently investigated using Southern blotting detection of terminal restriction fragments. The genome integrity in the Ku80-deficient cells was further investigated by whole-genome sequencing. Our work revealed that telomeres in the ΔKu80 L. mexicana are elongated compared to those of the wild type. This is a surprising finding considering that in another model trypanosomatid, Trypanosoma brucei, they are shortened upon ablation of the same gene. A telomere elongation phenotype has been documented in other species and associated with the presence of a telomerase-independent alternative telomere lengthening pathway. Our results also showed that Ku80 does not appear to be involved in genome stability maintenance in L. mexicana. CONCLUSION/SIGNIFICANCE Ablation of the Ku proteins in L. mexicana triggers telomere elongation, but does not have an adverse impact on genome integrity.
Affiliation(s)
- Ester Poláková
  - Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Amanda T. S. Albanaz
  - Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Alexandra Zakharova
  - Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Evgeny S. Gerasimov
  - Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
  - Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Vyacheslav Yurchenko
  - Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector Borne Diseases, Sechenov University, Moscow, Russia
19
MacDonald ML, Lee KH. EvalDNA: a machine learning-based tool for the comprehensive evaluation of mammalian genome assembly quality. BMC Bioinformatics 2021; 22:570. [PMID: 34837948 PMCID: PMC8627028 DOI: 10.1186/s12859-021-04480-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2020] [Accepted: 11/15/2021] [Indexed: 11/16/2022] Open
Abstract
Background To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the quality scoring of genome assemblies and does not require an existing reference genome for accuracy assessment. Results EvalDNA calculates a list of quality metrics from an assembled sequence and applies a model created from supervised machine learning methods to integrate various metrics into a comprehensive quality score. A well-tested, accurate model for scoring mammalian genome sequences is provided as part of EvalDNA. This random forest regression model evaluates an assembled sequence based on continuity, completeness, and accuracy, and was able to explain 86% of the variation in reference-based quality scores within the testing data. EvalDNA was applied to human chromosome 14 assemblies from the GAGE study to rank genome assemblers and to compare EvalDNA to two other quality evaluation tools. In addition, EvalDNA was used to evaluate several genome assemblies of the Chinese hamster genome to help establish a better reference genome for the biopharmaceutical manufacturing community. EvalDNA was also used to assess more recent human assemblies from the QUAST-LG study completed in 2018, and its ability to score bacterial genomes was examined through application on bacterial assemblies from the GAGE-B study. Conclusions EvalDNA enables scientists to easily identify the best available genome assembly for their organism of interest without requiring a reference assembly. EvalDNA sets itself apart from other quality assessment tools by producing a quality score that enables direct comparison among assemblies from different species. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04480-2.
Affiliation(s)
- Madolyn L MacDonald
  - Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, 19711, USA
  - Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave., Newark, 19716, USA
  - Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Newark, 19711, USA
- Kelvin H Lee
  - Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Newark, 19711, USA
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, 19716, USA
20
D’aes J, Fraiture MA, Bogaerts B, De Keersmaecker SCJ, Roosens NHC, Vanneste K. Characterization of Genetically Modified Microorganisms Using Short- and Long-Read Whole-Genome Sequencing Reveals Contaminations of Related Origin in Multiple Commercial Food Enzyme Products. Foods 2021; 10:2637. [PMID: 34828918 PMCID: PMC8624754 DOI: 10.3390/foods10112637] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/08/2021] [Revised: 10/22/2021] [Accepted: 10/28/2021] [Indexed: 12/02/2022] Open
Abstract
Despite their presence being unauthorized on the European market, contaminations with genetically modified (GM) microorganisms have repeatedly been reported in diverse commercial microbial fermentation product types. Several of these contaminations are related to a GM Bacillus velezensis used to synthesize a food enzyme protease, for which genomic characterization remains currently incomplete, and it is unknown whether these contaminations have a common origin. In this study, GM B. velezensis isolates from multiple food enzyme products were characterized by short- and long-read whole-genome sequencing (WGS), demonstrating that they harbor a free recombinant pUB110-derived plasmid carrying antimicrobial resistance genes. Additionally, single-nucleotide polymorphism (SNP) and whole-genome based comparative analyses showed that the isolates likely originate from the same parental GM strain. This study highlights the added value of a hybrid WGS approach for accurate genomic characterization of GMM (e.g., genomic location of the transgenic construct), and of SNP-based phylogenomic analysis for source-tracking of GMM.
Affiliation(s)
- Jolien D’aes
  - Transversal Activities in Applied Genomics (TAG), Department Expertise and Service Provision, Sciensano, J. Wytsmanstraat 14, 1050 Brussels, Belgium
- Marie-Alice Fraiture
  - Transversal Activities in Applied Genomics (TAG), Department Expertise and Service Provision, Sciensano, J. Wytsmanstraat 14, 1050 Brussels, Belgium
- Bert Bogaerts
  - Transversal Activities in Applied Genomics (TAG), Department Expertise and Service Provision, Sciensano, J. Wytsmanstraat 14, 1050 Brussels, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9000 Ghent, Belgium
- Sigrid C. J. De Keersmaecker
  - Transversal Activities in Applied Genomics (TAG), Department Expertise and Service Provision, Sciensano, J. Wytsmanstraat 14, 1050 Brussels, Belgium
- Nancy H. C. Roosens
  - Transversal Activities in Applied Genomics (TAG), Department Expertise and Service Provision, Sciensano, J. Wytsmanstraat 14, 1050 Brussels, Belgium
- Kevin Vanneste
  - Transversal Activities in Applied Genomics (TAG), Department Expertise and Service Provision, Sciensano, J. Wytsmanstraat 14, 1050 Brussels, Belgium
21
Music of metagenomics: a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper enforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
22
Tihelka E, Cai C, Giacomelli M, Lozano-Fernandez J, Rota-Stabelli O, Huang D, Engel MS, Donoghue PCJ, Pisani D. The evolution of insect biodiversity. Curr Biol 2021; 31:R1299-R1311. [PMID: 34637741 DOI: 10.1016/j.cub.2021.08.057] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/24/2022]
Abstract
Insects comprise over half of all described animal species. Together with the Protura (coneheads), Collembola (springtails) and Diplura (two-pronged bristletails), insects form the Hexapoda, a terrestrial arthropod lineage characterised by possessing six legs. Exponential growth of genome-scale data for the hexapods has substantially altered our understanding of the origin and evolution of insect biodiversity. Phylogenomics has provided a new framework for reconstructing insect evolutionary history, resolving their position among the arthropods and some long-standing internal controversies such as the placement of the termites, twisted-winged insects, lice and fleas. However, despite the greatly increased size of phylogenomic datasets, contentious relationships among key insect clades remain unresolved. Further advances in insect phylogeny cannot rely on increased depth and breadth of genome and taxon sequencing. Improved modelling of the substitution process is fundamental to countering tree-reconstruction artefacts, while gene content, modelling of duplications and deletions, and comparative morphology all provide complementary lines of evidence to test hypotheses emerging from the analysis of sequence data. Finally, the integration of molecular and morphological data is key to the incorporation of fossil species within insect phylogeny. The emerging integrated framework of insect evolution will help explain the origins of insect megadiversity in terms of the evolution of their body plan, species diversity and ecology. Future studies of insect phylogeny should build upon an experimental, hypothesis-driven approach where the robustness of hypotheses generated is tested against increasingly realistic evolutionary models as well as complementary sources of phylogenetic evidence.
Affiliation(s)
- Erik Tihelka
- School of Earth Sciences, University of Bristol, Bristol, UK; State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing, China.
- Chenyang Cai
- School of Earth Sciences, University of Bristol, Bristol, UK; State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing, China.
- Jesus Lozano-Fernandez
- School of Biological Sciences, University of Bristol, Bristol, UK; Institute of Evolutionary Biology (CSIC-UPF), Barcelona, Spain
- Omar Rota-Stabelli
- Research and Innovation Centre, Fondazione Edmund Mach, 38010 San Michele all Adige, Italy; Center Agriculture Food Environment, University of Trento, 38010 San Michele all Adige, Italy
- Diying Huang
- State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing, China
- Michael S Engel
- Division of Entomology, Natural History Museum, University of Kansas, Lawrence, KS, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
- Davide Pisani
- School of Earth Sciences, University of Bristol, Bristol, UK; School of Biological Sciences, University of Bristol, Bristol, UK.
23
Wick RR, Judd LM, Cerdeira LT, Hawkey J, Méric G, Vezina B, Wyres KL, Holt KE. Trycycler: consensus long-read assemblies for bacterial genomes. Genome Biol 2021; 22:266. [PMID: 34521459 PMCID: PMC8442456 DOI: 10.1186/s13059-021-02483-z] [Citation(s) in RCA: 166] [Impact Index Per Article: 55.3] [Received: 07/08/2021] [Accepted: 08/31/2021] [Indexed: 01/23/2023] Open
Abstract
While long-read sequencing allows for the complete assembly of bacterial genomes, long-read assemblies contain a variety of errors. Here, we present Trycycler, a tool which produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking showed that Trycycler assemblies contained fewer errors than assemblies constructed with a single tool. Post-assembly polishing further reduced errors and Trycycler+polishing assemblies were the most accurate genomes in our study. As Trycycler requires manual intervention, its output is not deterministic. However, we demonstrated that multiple users converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools.
Affiliation(s)
- Ryan R Wick
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia.
- Louise M Judd
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Louise T Cerdeira
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Jane Hawkey
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Guillaume Méric
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Cambridge Baker Systems Genomics Initiative, Baker Heart & Diabetes Institute, Melbourne, VIC, 3004, Australia
- Ben Vezina
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Kelly L Wyres
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Kathryn E Holt
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Department of Infection Biology, London School of Hygiene & Tropical Medicine, WC1E 7HT, London, UK
24
Urban JM, Foulk MS, Bliss JE, Coleman CM, Lu N, Mazloom R, Brown SJ, Spradling AC, Gerbi SA. High contiguity de novo genome assembly and DNA modification analyses for the fungus fly, Sciara coprophila, using single-molecule sequencing. BMC Genomics 2021; 22:643. [PMID: 34488624 PMCID: PMC8419958 DOI: 10.1186/s12864-021-07926-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/14/2021] [Accepted: 08/08/2021] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND The lower Dipteran fungus fly, Sciara coprophila, has many unique biological features that challenge the rule of genome DNA constancy. For example, Sciara undergoes paternal chromosome elimination and maternal X chromosome nondisjunction during spermatogenesis, paternal X elimination during embryogenesis, intrachromosomal DNA amplification of DNA puff loci during larval development, and germline-limited chromosome elimination from all somatic cells. Paternal chromosome elimination in Sciara was the first observation of imprinting, though the mechanism remains a mystery. Here, we present the first draft genome sequence for Sciara coprophila to take a large step forward in addressing these features. RESULTS We assembled the Sciara genome using PacBio, Nanopore, and Illumina sequencing. To find an optimal assembly using these datasets, we generated 44 short-read and 50 long-read assemblies. We ranked assemblies using 27 metrics assessing contiguity, gene content, and dataset concordance. The highest-ranking assemblies were scaffolded using BioNano optical maps. RNA-seq datasets from multiple life stages and both sexes facilitated genome annotation. A set of 66 metrics was used to select the first draft assembly for Sciara. Nearly half of the Sciara genome sequence was anchored into chromosomes, and all scaffolds were classified as X-linked or autosomal by coverage. CONCLUSIONS We determined that X-linked genes in Sciara males undergo dosage compensation. An entire bacterial genome from the Rickettsia genus, a group known to be endosymbionts in insects, was co-assembled with the Sciara genome, opening the possibility that Rickettsia may function in sex determination in Sciara. Finally, the signal level of the PacBio and Nanopore data support the presence of cytosine and adenine modifications in the Sciara genome, consistent with a possible role in imprinting.
Affiliation(s)
- John M Urban
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA.
- Department of Embryology, Carnegie Institution for Science, Howard Hughes Medical Institute Research Laboratories, 3520 San Martin Drive, Baltimore, MD, 21218, USA.
- Michael S Foulk
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA
- Present Address: Department of Biology, Mercyhurst University, Erie, PA, 16546, USA
- Jacob E Bliss
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA
- C Michelle Coleman
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Nanyan Lu
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Reza Mazloom
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Susan J Brown
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Allan C Spradling
- Department of Embryology, Carnegie Institution for Science, Howard Hughes Medical Institute Research Laboratories, 3520 San Martin Drive, Baltimore, MD, 21218, USA
- Susan A Gerbi
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA.
25
Kayani MUR, Huang W, Feng R, Chen L. Genome-resolved metagenomics using environmental and clinical samples. Brief Bioinform 2021; 22:bbab030. [PMID: 33758906 PMCID: PMC8425419 DOI: 10.1093/bib/bbab030] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/21/2020] [Revised: 11/29/2020] [Accepted: 01/20/2021] [Indexed: 12/25/2022] Open
Abstract
Recent advances in high-throughput sequencing technologies and computational methods have added a new dimension to metagenomic data analysis i.e. genome-resolved metagenomics. In general terms, it refers to the recovery of draft or high-quality microbial genomes and their taxonomic classification and functional annotation. In recent years, several studies have utilized the genome-resolved metagenome analysis approach and identified previously unknown microbial species from human and environmental metagenomes. In this review, we describe genome-resolved metagenome analysis as a series of four necessary steps: (i) preprocessing of the sequencing reads, (ii) de novo metagenome assembly, (iii) genome binning and (iv) taxonomic and functional analysis of the recovered genomes. For each of these four steps, we discuss the most commonly used tools and the currently available pipelines to guide the scientific community in the recovery and subsequent analyses of genomes from any metagenome sample. Furthermore, we also discuss the tools required for validation of assembly quality as well as for improving quality of the recovered genomes. We also highlight the currently available pipelines that can be used to automate the whole analysis without having advanced bioinformatics knowledge. Finally, we will highlight the most widely adapted and actively maintained tools and pipelines that can be helpful to the scientific community in decision making before they commence the analysis.
Affiliation(s)
- Masood ur Rehman Kayani
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
- Wanqiu Huang
- Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200,000, China
- Ru Feng
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
- Lei Chen
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
26
Ayling M, Clark MD, Leggett RM. New approaches for metagenome assembly with short reads. Brief Bioinform 2021; 21:584-594. [PMID: 30815668 PMCID: PMC7299287 DOI: 10.1093/bib/bbz020] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Received: 11/12/2018] [Revised: 01/31/2019] [Accepted: 02/01/2019] [Indexed: 02/07/2023] Open
Abstract
In recent years, the use of longer range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic data sets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.
Affiliation(s)
- Martin Ayling
- Earlham Institute, Norwich Research Park, Norwich, UK
27
Re-examination of two diatom reference genomes using long-read sequencing. BMC Genomics 2021; 22:379. [PMID: 34030633 PMCID: PMC8147415 DOI: 10.1186/s12864-021-07666-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/22/2020] [Accepted: 04/26/2021] [Indexed: 12/03/2022] Open
Abstract
Background The marine diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum are valuable model organisms for exploring the evolution, diversity and ecology of this important algal group. Their reference genomes, published in 2004 and 2008, respectively, were the product of traditional Sanger sequencing. In the case of T. pseudonana, optical restriction site mapping was employed to further clarify and contextualize chromosome-level scaffolds. While both genomes are considered highly accurate and reasonably contiguous, they still contain many unresolved regions and unordered/unlinked scaffolds. Results We have used Oxford Nanopore Technologies long-read sequencing to update and validate the quality and contiguity of the T. pseudonana and P. tricornutum genomes. Fine-scale assessment of our long-read derived genome assemblies allowed us to resolve previously uncertain genomic regions, further characterize complex structural variation, and re-evaluate the repetitive DNA content of both genomes. We also identified 1862 previously undescribed genes in T. pseudonana. In P. tricornutum, we used transposable element detection software to identify 33 novel copia-type LTR-RT insertions, indicating ongoing activity and rapid expansion of this superfamily as the organism continues to be maintained in culture. Finally, Bionano optical mapping of P. tricornutum chromosomes was combined with long-read sequence data to explore the potential of long-read sequencing and optical mapping for resolving haplotypes. Conclusion Despite its potential to yield highly contiguous scaffolds, long-read sequencing is not a panacea. Even for relatively small nuclear genomes such as those investigated herein, repetitive DNA sequences cause problems for current genome assembly algorithms. Determining whether a long-read derived genomic assembly is ‘better’ than one produced using traditional sequence data is not straightforward. Our revised reference genomes for P. tricornutum and T. pseudonana nevertheless provide additional insight into the structure and evolution of both genomes, thereby providing a more robust foundation for future diatom research. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07666-3.
28
Meyer F, Lesker TR, Koslicki D, Fritz A, Gurevich A, Darling AE, Sczyrba A, Bremges A, McHardy AC. Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. Nat Protoc 2021; 16:1785-1801. [PMID: 33649565 DOI: 10.1038/s41596-020-00480-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/10/2020] [Accepted: 11/26/2020] [Indexed: 01/31/2023]
Abstract
Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.
Affiliation(s)
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Till-Robin Lesker
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; German Center for Infection Research (DZIF), Braunschweig, Germany
- David Koslicki
- Computer Science and Engineering, Biology, and The Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
- Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Alexey Gurevich
- Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
- Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, Australia
- Alexander Sczyrba
- Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; German Center for Infection Research (DZIF), Braunschweig, Germany
- Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.
29
Complete and Circularized Bacterial Genome Sequence of Gordonia sp. Strain X0973. Microbiol Resour Announc 2021; 10:10/9/e01479-20. [PMID: 33664146 PMCID: PMC7936644 DOI: 10.1128/mra.01479-20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Gordonia sp. strain X0973 is a Gram-positive, weakly acid-fast, aerobic actinomycete obtained from a human abscess with Gordonia araii NBRC 100433T as its closest phylogenetic neighbor. Here, we report using Illumina MiSeq and PacBio reads to assemble the complete and circular genome sequence of 3.75 Mbp with 3,601 predicted coding sequences.
30
Lipworth S, Pickford H, Sanderson N, Chau KK, Kavanagh J, Barker L, Vaughan A, Swann J, Andersson M, Jeffery K, Morgan M, Peto TEA, Crook DW, Stoesser N, Walker AS. Optimized use of Oxford Nanopore flowcells for hybrid assemblies. Microb Genom 2020; 6:mgen000453. [PMID: 33174830 PMCID: PMC7725331 DOI: 10.1099/mgen.0.000453] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/16/2020] [Accepted: 09/25/2020] [Indexed: 01/16/2023] Open
Abstract
Hybrid assemblies are highly valuable for studies of Enterobacteriaceae due to their ability to fully resolve the structure of mobile genetic elements, such as plasmids, which are involved in the carriage of clinically important genes (e.g. those involved in antimicrobial resistance/virulence). The widespread application of this technique is currently primarily limited by cost. Recent data have suggested that non-inferior, and even superior, hybrid assemblies can be produced using a fraction of the total output from a multiplexed nanopore [Oxford Nanopore Technologies (ONT)] flowcell run. In this study we sought to determine the optimal minimal running time for flowcells when acquiring reads for hybrid assembly. We then evaluated whether the ONT wash kit might allow users to exploit shorter running times by sequencing multiple libraries per flowcell. After 24 h of sequencing, most chromosomes and plasmids had circularized and there was no benefit associated with longer running times. Quality was similar at 12 h, suggesting that shorter running times are likely to be acceptable for certain applications (e.g. plasmid genomics). The ONT wash kit was highly effective in removing DNA between libraries. Contamination between libraries did not appear to affect subsequent hybrid assemblies, even when the same barcodes were used successively on a single flowcell. Utilizing shorter run times in combination with between-library nuclease washes allows at least 36 Enterobacteriaceae isolates to be sequenced per flowcell, significantly reducing the per-isolate sequencing cost. Ultimately this will facilitate large-scale studies utilizing hybrid assembly, advancing our understanding of the genomics of key human pathogens.
Affiliation(s)
- Samuel Lipworth
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- Hayleah Pickford
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- Nicholas Sanderson
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford, UK
- Kevin K. Chau
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- James Kavanagh
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- Leanne Barker
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- Alison Vaughan
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford, UK
- Jeremy Swann
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at the University of Oxford in partnership with Public Health England, Oxford, UK
- Monique Andersson
- Department of Clinical Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
- Katie Jeffery
- Department of Clinical Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
- Marcus Morgan
- Department of Clinical Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
- Timothy E. A. Peto
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford, UK
- Derrick W. Crook
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford, UK
- Department of Clinical Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
- Nicole Stoesser
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- Department of Clinical Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
- A. Sarah Walker
- Modernising Medical Microbiology, Nuffield Department of Medicine, University of Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford, UK
31
Mineeva O, Rojas-Carulla M, Ley RE, Schölkopf B, Youngblut ND. DeepMAsED: evaluating the quality of metagenomic assemblies. Bioinformatics 2020; 36:3011-3017. [PMID: 32096824 DOI: 10.1093/bioinformatics/btaa124] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/09/2019] [Revised: 01/19/2020] [Accepted: 02/18/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Methodological advances in metagenome assembly are rapidly increasing in the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. RESULTS We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. CONCLUSIONS DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straight-forward, as well as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. AVAILABILITY AND IMPLEMENTATION DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Olga Mineeva
- Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen 72076, Germany; Department of Computer Science, ETH Zürich, Zürich 8092, Switzerland
- Mateo Rojas-Carulla
- Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen 72076, Germany
- Ruth E Ley
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen 72076, Germany
- Bernhard Schölkopf
- Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen 72076, Germany
- Nicholas D Youngblut
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen 72076, Germany
32
Complete and Circularized Genome Assemblies of the Kroppenstedtia eburnea Genus Type Strain and the Kroppenstedtia pulmonis Species Type Strain with MiSeq and MinION Sequence Data. Microbiol Resour Announc 2020; 9:9/44/e00650-20. [PMID: 33122418 PMCID: PMC7595940 DOI: 10.1128/mra.00650-20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Kroppenstedtia eburnea DSM 45196T and Kroppenstedtia pulmonis W9323T are aerobic, Gram-positive, filamentous, chemoorganotrophic thermoactinomycetes. Here, we report on the complete and circular genome assemblies generated using Illumina MiSeq and Oxford Nanopore Technologies MinION reads. Putative gene clusters predicted to be involved in the production of secondary metabolites were also identified.
33
Padovani de Souza K, Setubal JC, Ponce de Leon F de Carvalho AC, Oliveira G, Chateau A, Alves R. Machine learning meets genome assembly. Brief Bioinform 2020; 20:2116-2129. [PMID: 30137230 DOI: 10.1093/bib/bby072] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/18/2018] [Revised: 07/11/2018] [Accepted: 07/22/2018] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION With the recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible for researchers. Several advances have been achieved because of it, especially in the health sciences. However, many challenges which emerge from the complexity of sequencing projects remain unsolved. Among them is the task of assembling DNA fragments from previously unsequenced organisms, which is classified as an NP-hard (nondeterministic polynomial time hard) problem, for which no efficient computational solution with reasonable execution time exists. However, several tools that produce approximate solutions have been used with results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been one of the approaches used in recent years in an attempt to find better solutions to the DNA fragment assembly problem, although still at a low scale. RESULTS This paper presents a broad review of pioneering literature comprising artificial intelligence-based DNA assemblers-particularly the ones that use machine learning-to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.
Affiliation(s)
- João Carlos Setubal
- University of São Paulo, Brazil; Department of Computer Science, University of São Paulo, Brazil
- Annie Chateau
- Vale Technology Institute-Sustainable Development, Brazil
- Ronnie Alves
- Federal University of Pará, Brazil; University of Montpellier, LIRMM, France
34
Prussing C, Snavely EA, Singh N, Lapierre P, Lasek-Nesselquist E, Mitchell K, Haas W, Owsiak R, Nazarian E, Musser KA. Nanopore MinION Sequencing Reveals Possible Transfer of bla KPC-2 Plasmid Across Bacterial Species in Two Healthcare Facilities. Front Microbiol 2020; 11:2007. [PMID: 32973725 PMCID: PMC7466660 DOI: 10.3389/fmicb.2020.02007] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/03/2020] [Accepted: 07/29/2020] [Indexed: 11/13/2022] Open
Abstract
Carbapenemase-producing Enterobacteriaceae are a major threat to global public health. Klebsiella pneumoniae carbapenemase (KPC) is the most commonly identified carbapenemase in the United States and is frequently found on mobile genetic elements including plasmids, which can be horizontally transmitted between bacteria of the same or different species. Here we describe the results of an epidemiological investigation of KPC-producing bacteria at two healthcare facilities. Using a combination of short-read and long-read whole-genome sequencing, we identified an identical 44 kilobase plasmid carrying the bla KPC-2 gene in four bacterial isolates belonging to three different species (Citrobacter freundii, Klebsiella pneumoniae, and Escherichia coli). The isolates in this investigation were collected from patients who were epidemiologically linked in a region in which KPC was uncommon, suggesting that the antibiotic resistance plasmid was transmitted between these bacterial species. This investigation highlights the importance of long-read sequencing in investigating the relatedness of bacterial plasmids, and in elucidating potential plasmid-mediated outbreaks caused by antibiotic resistant bacteria.
Affiliation(s)
- Catharine Prussing
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States
- Emily A. Snavely
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States
- Navjot Singh
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States
- Pascal Lapierre
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States
- Kara Mitchell
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States
- Wolfgang Haas
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States
- Rita Owsiak
  - Maine Center for Disease Control and Prevention, Department of Health and Human Services, Augusta, ME, United States
- Elizabeth Nazarian
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States
- Kimberlee A. Musser
  - Wadsworth Center, New York State Department of Health, Albany, NY, United States

35
Jung H, Jeon MS, Hodgett M, Waterhouse P, Eyun SI. Comparative Evaluation of Genome Assemblers from Long-Read Sequencing for Plants and Crops. J Agric Food Chem 2020; 68:7670-7677. [PMID: 32530283 DOI: 10.1021/acs.jafc.0c01647] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 05/23/2023]
Abstract
The availability of recent state-of-the-art long-read sequencing technologies has significantly increased the ease and speed of producing high-quality plant genome assemblies. A wide variety of genome-related software tools are now available, and they are typically benchmarked using microbial or model eukaryotic genomes such as Arabidopsis and rice. However, many plant species have much larger and more complex genomes than these, and the choice of tools, parameters, and/or strategies is not always obvious. We therefore compared the metrics of assemblies generated by various pipelines to discuss how assembly quality can be affected by two different assembly strategies. First, we focused on optimizing read preprocessing and assembler variables using eight different de novo assemblers on five different Pacific Biosciences long-read datasets of diploid and tetraploid species. Next, we examined a single scaffolding tool (quickmerge) that has been employed for the postprocessing step, and merged the outputs from multiple assemblies to produce a higher-quality consensus assembly. Finally, we benchmarked the assemblies for completeness and accuracy (assembly metrics and BUSCO), computer memory, and CPU times. Two lightweight assemblers, Miniasm/Minimap/Racon and WTDBG, were deemed good for novice users because they have gentler learning curves and require light computational resources. However, two heavyweight tools, CANU and Flye, should be the first choice when the goal is to achieve accurate and complete assemblies. Our results will provide valuable guidance in future plant genome projects and beyond.
Affiliation(s)
- Hyungtaek Jung
  - Centre for Agriculture and Biocommodities, Queensland University of Technology, Brisbane, Queensland 4001, Australia
- Min-Seung Jeon
  - Department of Life Science, Chung-Ang University, Seoul 06974, Korea
- Matthew Hodgett
  - Information Technology Services, Queensland University of Technology, Brisbane, Queensland 4001, Australia
- Peter Waterhouse
  - Centre for Agriculture and Biocommodities, Queensland University of Technology, Brisbane, Queensland 4001, Australia
- Seong-Il Eyun
  - Department of Life Science, Chung-Ang University, Seoul 06974, Korea

36
Mikheenko A, Bzikadze AV, Gurevich A, Miga KH, Pevzner PA. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 2020; 36:i75-i83. [PMID: 32657355 PMCID: PMC7355294 DOI: 10.1093/bioinformatics/btaa440] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/31/2022]
Abstract
MOTIVATION Extra-long tandem repeats (ETRs) are widespread in eukaryotic genomes and play an important role in fundamental cellular processes, such as chromosome segregation. Although emerging long-read technologies have enabled ETR assemblies, the accuracy of such assemblies is difficult to evaluate since there are no tools for their quality assessment. Moreover, since the mapping of error-prone reads to ETRs remains an open problem, it is not clear how to polish draft ETR assemblies. RESULTS To address these problems, we developed the TandemTools software that includes the TandemMapper tool for mapping reads to ETRs and the TandemQUAST tool for polishing ETR assemblies and their quality assessment. We demonstrate that TandemTools not only reveals errors in ETR assemblies but also improves the recently generated assemblies of human centromeres. AVAILABILITY AND IMPLEMENTATION https://github.com/ablab/TandemTools. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alla Mikheenko
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Alexey Gurevich
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA

37
Bohr LL, Mortimer TD, Pepperell CS. Lateral Gene Transfer Shapes Diversity of Gardnerella spp. Front Cell Infect Microbiol 2020; 10:293. [PMID: 32656099 PMCID: PMC7324480 DOI: 10.3389/fcimb.2020.00293] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/17/2019] [Accepted: 05/18/2020] [Indexed: 12/13/2022]
Abstract
Gardnerella spp. are pathognomonic for bacterial vaginosis, which increases the risk of preterm birth and the transmission of sexually transmitted infections. Gardnerella spp. are genetically diverse, comprising what have recently been defined as distinct species with differing functional capacities. Disease associations with Gardnerella spp. are not straightforward: patients with BV are usually infected with multiple species, and Gardnerella spp. are also found in the vaginal microbiome of healthy women. Genome comparisons of Gardnerella spp. show evidence of lateral gene transfer (LGT), but patterns of LGT have not been characterized in detail. Here we sought to define the role of LGT in shaping the genetic structure of Gardnerella spp. We analyzed whole genome sequencing data for 106 Gardnerella strains and used these data for pan genome analysis and to characterize LGT in the core and accessory genomes, over recent and remote timescales. In our diverse sample of Gardnerella strains, we found that both the core and accessory genomes are clearly differentiated in accordance with newly defined species designations. We identified putative competence and pilus assembly genes across most species; we also found them to be differentiated between species. Competence machinery has diverged in parallel with the core genome, with selection against deleterious mutations as a predominant influence on their evolution. By contrast, the virulence factor vaginolysin, which encodes a toxin, appears to be readily exchanged among species. We identified five distinct prophage clusters in Gardnerella genomes, two of which appear to be exchanged between Gardnerella species. Differences among species are apparent in their patterns of LGT, including their exchange with diverse gene pools. Despite frequent LGT and co-localization in the same niche, our results show that Gardnerella spp. are clearly genetically differentiated and yet capable of exchanging specific genetic material. 
This likely reflects complex interactions within bacterial communities associated with the vaginal microbiome. Our results provide insight into how such interactions evolve and are maintained, allowing these multi-species communities to colonize and invade human tissues and adapt to antibiotics and other stressors.
Affiliation(s)
- Lindsey L Bohr
  - Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
- Tatum D Mortimer
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Caitlin S Pepperell
  - Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States; Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States

38
Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, Pop M. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2020; 20:1140-1150. [PMID: 28968737 DOI: 10.1093/bib/bbx098] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 05/02/2017] [Revised: 07/13/2017] [Indexed: 01/09/2023]
Abstract
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.

39
Abstract
By using next-generation sequencing technologies, it is possible to quickly and inexpensively generate large numbers of relatively short reads from both the nuclear and mitochondrial DNA (mtDNA) contained in a biological sample. Unfortunately, assembling such whole-genome sequencing (WGS) data with standard de novo assemblers often fails to generate high-quality mitochondrial genome sequences due to the large difference in copy number (and hence sequencing depth) between the mitochondrial and nuclear genomes. Assembly of complete mitochondrial genome sequences is further complicated by the fact that many de novo assemblers are not designed for circular genomes and by the presence of repeats in the mitochondrial genomes of some species. In this article, we describe the Statistical Mitogenome Assembly with RepeaTs (SMART) pipeline for automated assembly of mitochondrial genomes from WGS data. SMART uses an efficient coverage-based filter to first select a subset of reads enriched in mtDNA sequences. Contigs produced by an initial assembly step are filtered using Basic Local Alignment Search Tool (BLAST) searches against a comprehensive mitochondrial genome database and are used as "baits" for an alignment-based filter that produces the set of reads used in a second de novo assembly and scaffolding step. In the presence of repeats, the possible paths through the assembly graph are evaluated using a maximum likelihood model. Additionally, the assembly process is repeated a user-specified number of times on resampled subsets of reads, and the reconstructed sequences with the highest bootstrap support are selected for annotation. Experiments on WGS data sets from a variety of species show that the SMART pipeline produces complete circular mitochondrial genome sequences with a higher success rate than current state-of-the-art tools, particularly for low-coverage WGS data sets.
Affiliation(s)
- Fahad Alqahtani
  - Computer Science & Engineering Department, University of Connecticut, Storrs, Connecticut, USA; National Center for Artificial Intelligence and Big Data Technology, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
- Ion I Măndoiu
  - Computer Science & Engineering Department, University of Connecticut, Storrs, Connecticut, USA

40
Luo Y, Liao X, Wu FX, Wang J. Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190410155603] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genomes in specific cells. It is also the basis of many downstream analyses. With the increase of speed and the decrease in cost, massive sequencing data continues to accumulate. A large number of assembly strategies based on different computational methods and experiments have been developed. How to efficiently perform transcriptome assembly with high sensitivity and accuracy becomes a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. The examples of different species are used to illustrate that long reads produced by the third-generation sequencing technologies can cover full-length transcripts without assemblies. In addition, different transcriptome assemblies using the Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assemblies.
Affiliation(s)
- Yuwen Luo
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Xingyu Liao
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, Canada
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, China

41
Nethery MA, Henriksen ED, Daughtry KV, Johanningsmeier SD, Barrangou R. Comparative genomics of eight Lactobacillus buchneri strains isolated from food spoilage. BMC Genomics 2019; 20:902. [PMID: 31775607 PMCID: PMC6881996 DOI: 10.1186/s12864-019-6274-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/12/2019] [Accepted: 11/12/2019] [Indexed: 12/22/2022]
Abstract
BACKGROUND Lactobacillus buchneri is a lactic acid bacterium frequently associated with food bioprocessing and fermentation and has been found to be either beneficial or detrimental to industrial food processes depending on the application. The ability to metabolize lactic acid into acetic acid and 1,2-propanediol makes L. buchneri invaluable to the ensiling process; however, this metabolic activity leads to spoilage in other applications, and is especially damaging to the cucumber fermentation industry. This study aims to augment our genomic understanding of L. buchneri in order to make better use of the species in a wide range of applicable industrial settings. RESULTS Whole-genome sequencing (WGS) was performed on seven phenotypically diverse strains isolated from spoiled, fermented cucumber and the ATCC type strain for L. buchneri, ATCC 4005. Here, we present our findings from the comparison of eight newly sequenced and assembled genomes against two publicly available closed reference genomes, L. buchneri CD034 and NRRL B-30929. Overall, we see ~50% of all coding sequences are conserved across these ten strains. When these coding sequences are clustered by functional description, the strains appear to be enriched in mobile genetic elements, namely transposons. All isolates harbor at least one CRISPR-Cas system, and many contain putative prophage regions, some of which are targeted by the host's own DNA-encoded spacer sequences. CONCLUSIONS Our findings provide new insights into the genomics of L. buchneri through whole genome sequencing and subsequent characterization of genomic features, building a platform for future studies and identifying elements for potential strain manipulation or engineering.
Affiliation(s)
- Matthew A Nethery
  - Genomic Sciences Graduate Program, North Carolina State University, Raleigh, NC, USA; Department of Food, Bioprocessing & Nutrition Sciences, North Carolina State University, Raleigh, NC, USA
- Katheryne V Daughtry
  - Department of Food, Bioprocessing & Nutrition Sciences, North Carolina State University, Raleigh, NC, USA; United States Department of Agriculture, Agricultural Research Service, Southeast Area, Food Science Research Unit, North Carolina State University, 322 Schaub Hall, Box 7624, Raleigh, NC, 27695-7624, USA
- Suzanne D Johanningsmeier
  - United States Department of Agriculture, Agricultural Research Service, Southeast Area, Food Science Research Unit, North Carolina State University, 322 Schaub Hall, Box 7624, Raleigh, NC, 27695-7624, USA
- Rodolphe Barrangou
  - Genomic Sciences Graduate Program, North Carolina State University, Raleigh, NC, USA; Department of Food, Bioprocessing & Nutrition Sciences, North Carolina State University, Raleigh, NC, USA

42
Royo-Llonch M, Sánchez P, González JM, Pedrós-Alió C, Acinas SG. Ecological and functional capabilities of an uncultured Kordia sp. Syst Appl Microbiol 2019; 43:126045. [PMID: 31831198 DOI: 10.1016/j.syapm.2019.126045] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/04/2019] [Revised: 10/28/2019] [Accepted: 11/12/2019] [Indexed: 01/07/2023]
Abstract
Cultivable bacteria represent only a fraction of the diversity in microbial communities. However, the official procedures for classification and characterization of a novel prokaryotic species still rely on isolates. Nevertheless, due to single cell genomics, it is possible to retrieve genomes from environmental samples by sequencing them individually, and to assign specific genes to a specific taxon, regardless of their ability to grow in culture. In this study, a complete description was performed for uncultured Kordia sp. TARA_039_SRF, a proposed novel species within the genus Kordia, using culture-independent techniques. The type material was a high-quality draft genome (94.97% complete, 4.65% gene redundancy) co-assembled using ten nearly identical single amplified genomes (SAGs) from surface seawater in the North Indian Ocean during the Tara Oceans Expedition. The assembly process was optimized to obtain the best possible assembly metrics and a less fragmented genome. The closest relative of the species was Kordia periserrulae, which shared 97.56% similarity of the 16S rRNA gene, 75% orthologs and 89.13% average nucleotide identity. The functional potential of the proposed novel species included proteorhodopsin, the ability to incorporate nitrate, cytochrome oxidases with high affinity for oxygen, and CAZymes that were unique features within the genus. Its abundance at different depths and size fractions was also evaluated together with its functional annotation, revealing that its putative ecological niche could be particles of phytoplanktonic origin. It could putatively attach to these particles and consume them while sinking to the deeper and oxygen depleted layers of the North Indian Ocean.
Affiliation(s)
- M Royo-Llonch
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM), CSIC, Barcelona, Spain
- P Sánchez
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM), CSIC, Barcelona, Spain
- J M González
  - Department of Microbiology, University of La Laguna, La Laguna, Spain
- C Pedrós-Alió
  - Systems Biology Program, Centro Nacional de Biotecnología (CNB), CSIC, Madrid, Spain
- S G Acinas
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM), CSIC, Barcelona, Spain

43
Athena: Automated Tuning of k-mer based Genomic Error Correction Algorithms using Language Models. Sci Rep 2019; 9:16157. [PMID: 31695060 PMCID: PMC6834855 DOI: 10.1038/s41598-019-52196-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/01/2019] [Accepted: 10/07/2019] [Indexed: 01/30/2023]
Abstract
The performance of most error-correction (EC) algorithms that operate on genomics reads is dependent on the proper choice of its configuration parameters, such as the value of k in k-mer based techniques. In this work, we target the problem of finding the best values of these configuration parameters to optimize error correction and consequently improve genome assembly. We perform this in an adaptive manner, adapted to different datasets and to EC tools, due to the observation that different configuration parameters are optimal for different datasets, i.e., from different platforms and species, and vary with the EC algorithm being applied. We use language modeling techniques from the Natural Language Processing (NLP) domain in our algorithmic suite, Athena, to automatically tune the performance-sensitive configuration parameters. Through the use of N-Gram and Recurrent Neural Network (RNN) language modeling, we validate the intuition that the EC performance can be computed quantitatively and efficiently using the “perplexity” metric, repurposed from NLP. After training the language model, we show that the perplexity metric calculated from a sample of the test (or production) data has a strong negative correlation with the quality of error correction of erroneous NGS reads. Therefore, we use the perplexity metric to guide a hill climbing-based search, converging toward the best configuration parameter value. Our approach is suitable for both de novo and comparative sequencing (resequencing), eliminating the need for a reference genome to serve as the ground truth. We find that Athena can automatically find the optimal value of k with a very high accuracy for 7 real datasets and using 3 different k-mer based EC algorithms, Lighter, Blue, and Racer. 
The inverse relation between the perplexity metric and alignment rate exists under all our tested conditions: for real and synthetic datasets, for all kinds of sequencing errors (insertion, deletion, and substitution), and for high and low error rates. The absolute value of that correlation is at least 73%. In our experiments, the best value of k found by Athena achieves an alignment rate within 0.53% of the oracle best value of k found through brute force searching (i.e., scanning through the entire range of k values). Athena's selected value of k lies within the top-3 best k values using N-Gram models and the top-5 best k values using RNN models. With best parameter selection by Athena, the assembly quality (NG50) is improved by a geometric mean of 4.72X across the 7 real datasets.
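The perplexity-guided hill-climbing search described in this abstract can be illustrated with a minimal Python sketch. This is not Athena's implementation: the function names (`hill_climb_k`, `toy_perplexity`), the unit step size, and the toy scoring function are all assumptions made for illustration. In a real setting the score would come from a trained language model evaluated on reads corrected with a given k; here a convex stand-in is used, with lower perplexity treated as better, in line with the negative correlation the abstract reports.

```python
from typing import Callable


def hill_climb_k(perplexity: Callable[[int], float],
                 k_min: int, k_max: int, start: int, step: int = 1) -> int:
    """Greedy hill climbing over integer k: move to the neighbour with the
    lowest perplexity until no neighbour improves on the current k."""
    if not k_min <= start <= k_max:
        raise ValueError("start must lie in [k_min, k_max]")
    best_k, best_score = start, perplexity(start)
    while True:
        # Candidate neighbours of the current k that stay inside the range.
        neighbours = [k for k in (best_k - step, best_k + step)
                      if k_min <= k <= k_max]
        if not neighbours:
            return best_k
        cand_score, cand_k = min((perplexity(k), k) for k in neighbours)
        if cand_score >= best_score:
            return best_k  # local optimum: no neighbour is strictly better
        best_k, best_score = cand_k, cand_score


def toy_perplexity(k: int) -> float:
    # Hypothetical stand-in for a language-model perplexity score;
    # constructed so that the minimum sits at k = 25.
    return (k - 25) ** 2 + 10.0


chosen_k = hill_climb_k(toy_perplexity, k_min=15, k_max=41, start=17)
print(chosen_k)
```

Hill climbing only guarantees a local optimum, so on a score curve with several dips the result depends on the starting k; restarting from several start values is a common mitigation.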

44
Sydenham TV, Overballe-Petersen S, Hasman H, Wexler H, Kemp M, Justesen US. Complete hybrid genome assembly of clinical multidrug-resistant Bacteroides fragilis isolates enables comprehensive identification of antimicrobial-resistance genes and plasmids. Microb Genom 2019; 5:e000312. [PMID: 31697231 PMCID: PMC6927303 DOI: 10.1099/mgen.0.000312] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/22/2019] [Accepted: 10/17/2019] [Indexed: 02/06/2023]
Abstract
Bacteroides fragilis constitutes a significant part of the normal human gut microbiota and can also act as an opportunistic pathogen. Antimicrobial resistance (AMR) and the prevalence of AMR genes are increasing, and prediction of antimicrobial susceptibility based on sequence information could support targeted antimicrobial therapy in a clinical setting. Complete identification of insertion sequence (IS) elements carrying promoter sequences upstream of resistance genes is necessary for prediction of AMR. However, de novo assemblies from short reads alone are often fractured due to repeat regions and the presence of multiple copies of identical IS elements. Identification of plasmids in clinical isolates can aid in the surveillance of the dissemination of AMR, and comprehensive sequence databases support microbiome and metagenomic studies. We tested several short-read, hybrid and long-read assembly pipelines by assembling the type strain B. fragilis CCUG4856T (=ATCC25285=NCTC9343) with Illumina short reads and long reads generated by Oxford Nanopore Technologies (ONT) MinION sequencing. Hybrid assembly with Unicycler, using quality-filtered Illumina reads and Filtlong-filtered and Canu-corrected ONT reads, produced the assembly of highest quality. This approach was then applied to six clinical multidrug-resistant B. fragilis isolates and, with minimal manual finishing of chromosomal assemblies of three isolates, complete, circular assemblies of all isolates were produced. Eleven circular, putative plasmids were identified in the six assemblies, of which only three corresponded to a known cultured Bacteroides plasmid. Complete IS elements could be identified upstream of AMR genes; however, there was not complete correlation between the absence of IS elements and antimicrobial susceptibility. 
As our knowledge on factors that increase expression of resistance genes in the absence of IS elements is limited, further research is needed prior to implementing AMR prediction for B. fragilis from whole-genome sequencing.
Affiliation(s)
- Thomas V. Sydenham
  - Research Unit of Clinical Microbiology, Department of Clinical Research, University of Southern Denmark, Odense, Denmark
  - Department of Clinical Microbiology, Odense University Hospital, Odense, Denmark
  - Department of Clinical Microbiology, Lillebaelt Hospital, Vejle, Denmark
- Henrik Hasman
  - Bacteria, Parasites and Fungi, Statens Serum Institut, Copenhagen, Denmark
- Hannah Wexler
  - GLAVA Health Care System and David Geffen School of Medicine, UCLA (University of California, Los Angeles), Los Angeles, CA, USA
- Michael Kemp
  - Research Unit of Clinical Microbiology, Department of Clinical Research, University of Southern Denmark, Odense, Denmark
  - Department of Clinical Microbiology, Odense University Hospital, Odense, Denmark
- Ulrik S. Justesen
  - Research Unit of Clinical Microbiology, Department of Clinical Research, University of Southern Denmark, Odense, Denmark
  - Department of Clinical Microbiology, Odense University Hospital, Odense, Denmark

45
De Maio N, Shaw LP, Hubbard A, George S, Sanderson ND, Swann J, Wick R, AbuOun M, Stubberfield E, Hoosdally SJ, Crook DW, Peto TEA, Sheppard AE, Bailey MJ, Read DS, Anjum MF, Walker AS, Stoesser N. Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes. Microb Genom 2019; 5:e000294. [PMID: 31483244 PMCID: PMC6807382 DOI: 10.1099/mgen.0.000294] [Citation(s) in RCA: 121] [Impact Index Per Article: 24.2] [Received: 01/31/2019] [Accepted: 08/19/2019] [Indexed: 01/23/2023]
Abstract
Illumina sequencing allows rapid, cheap and accurate whole genome bacterial analyses, but short reads (<300 bp) do not usually enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods affect hybrid assembly accuracy. Relative automation of the assembly process is also crucial to facilitating high-throughput complete bacterial genome reconstruction, avoiding multiple bespoke filtering and data manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either Oxford Nanopore Technologies (ONT) or SMRT Pacific Biosciences (PacBio) sequencing platforms. We chose isolates from the family Enterobacteriaceae, as these frequently have highly plastic, repetitive genetic structures, and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We de novo assembled genomes using the hybrid assembler Unicycler and compared different read processing strategies, as well as comparing to long-read-only assembly with Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads facilitated high-quality genome reconstruction, and was superior to the long-read assembly and polishing approach evaluated with respect to accuracy and completeness. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.
Collapse
Affiliation(s)
- Nicola De Maio
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Liam P. Shaw
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Alasdair Hubbard
  - Department of Tropical Disease Biology, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
- Sophie George
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - NIHR HPRU Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, UK
- Jeremy Swann
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Ryan Wick
  - Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Melbourne, Australia
- Manal AbuOun
  - Department of Bacteriology, Animal and Plant Health Agency, Addlestone, Surrey, KT15 3NB, UK
- Emma Stubberfield
  - Department of Bacteriology, Animal and Plant Health Agency, Addlestone, Surrey, KT15 3NB, UK
- Derrick W. Crook
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - NIHR HPRU Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, UK
- Timothy E. A. Peto
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - NIHR HPRU Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, UK
- Anna E. Sheppard
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - NIHR HPRU Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, UK
- Mark J. Bailey
  - Centre for Ecology & Hydrology, Benson Lane, Crowmarsh Gifford, Wallingford, OX10 8BB, UK
- Daniel S. Read
  - Centre for Ecology & Hydrology, Benson Lane, Crowmarsh Gifford, Wallingford, OX10 8BB, UK
- Muna F. Anjum
  - Department of Bacteriology, Animal and Plant Health Agency, Addlestone, Surrey, KT15 3NB, UK
- A. Sarah Walker
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - NIHR HPRU Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, UK
- Nicole Stoesser
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
|
46
|
Grosmaire M, Launay C, Siegwald M, Brugière T, Estrada-Virrueta L, Berger D, Burny C, Modolo L, Blaxter M, Meister P, Félix MA, Gouyon PH, Delattre M. Males as somatic investment in a parthenogenetic nematode. Science 2019; 363:1210-1213. [PMID: 30872523 DOI: 10.1126/science.aau0099] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/27/2018] [Accepted: 02/13/2019] [Indexed: 12/20/2022]
Abstract
We report the reproductive strategy of the nematode Mesorhabditis belari. This species produces only 9% males, whose sperm is necessary to fertilize and activate the eggs. However, most of the fertilized eggs develop without using the sperm DNA and produce female individuals. Only in 9% of eggs is the male DNA utilized, producing sons. We found that mixing of parental genomes only gives rise to males because the Y-bearing sperm of males are much more competent than the X-bearing sperm for penetrating the eggs. In this previously unrecognized strategy, asexual females produce few sexual males whose genes never reenter the female pool. Here, production of males is of interest only if sons are more likely to mate with their sisters. Using game theory, we show that in this context, the production of 9% males by M. belari females is an evolutionarily stable strategy.
Affiliation(s)
- Manon Grosmaire
  - Laboratoire de Biologie et Modélisation de la Cellule, Université de Lyon, ENS, UCBL, CNRS, INSERM, UMR 5239, U 1210, F-69364 Lyon, France
- Caroline Launay
  - Laboratoire de Biologie et Modélisation de la Cellule, Université de Lyon, ENS, UCBL, CNRS, INSERM, UMR 5239, U 1210, F-69364 Lyon, France
- Marion Siegwald
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, SU, EPHE, UA, CP 39, 57 rue Cuvier, 75005 Paris, France
- Thibault Brugière
  - Laboratoire de Biologie et Modélisation de la Cellule, Université de Lyon, ENS, UCBL, CNRS, INSERM, UMR 5239, U 1210, F-69364 Lyon, France
- Lilia Estrada-Virrueta
  - Laboratoire de Biologie et Modélisation de la Cellule, Université de Lyon, ENS, UCBL, CNRS, INSERM, UMR 5239, U 1210, F-69364 Lyon, France
- Duncan Berger
  - The Ashworth Laboratories, Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Claire Burny
  - Laboratoire de Biologie et Modélisation de la Cellule, Université de Lyon, ENS, UCBL, CNRS, INSERM, UMR 5239, U 1210, F-69364 Lyon, France. Present address: Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna A-1210, Austria
- Laurent Modolo
  - Laboratoire de Biologie et Modélisation de la Cellule, Université de Lyon, ENS, UCBL, CNRS, INSERM, UMR 5239, U 1210, F-69364 Lyon, France
- Mark Blaxter
  - The Ashworth Laboratories, Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Peter Meister
  - Cell Fate and Nuclear Organization, Institute of Cell Biology, University of Bern, 3012 Bern, Switzerland
- Marie-Anne Félix
  - Département de Biologie, Ecole Normale Supérieure, IBENS, CNRS, Inserm, PSL Research University, 75005 Paris, France
- Pierre-Henri Gouyon
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, SU, EPHE, UA, CP 39, 57 rue Cuvier, 75005 Paris, France
- Marie Delattre
  - Laboratoire de Biologie et Modélisation de la Cellule, Université de Lyon, ENS, UCBL, CNRS, INSERM, UMR 5239, U 1210, F-69364 Lyon, France
|
47
|
Plastome based phylogenetics and younger crown node age in Pelargonium. Mol Phylogenet Evol 2019; 137:33-43. [DOI: 10.1016/j.ympev.2019.03.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/10/2018] [Revised: 03/23/2019] [Accepted: 03/25/2019] [Indexed: 11/20/2022]
|
48
|
Nieuwenhuis M, van de Peppel LJJ, Bakker FT, Zwaan BJ, Aanen DK. Enrichment of G4DNA and a Large Inverted Repeat Coincide in the Mitochondrial Genomes of Termitomyces. Genome Biol Evol 2019; 11:1857-1869. [PMID: 31209489 PMCID: PMC6609731 DOI: 10.1093/gbe/evz122] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Accepted: 06/11/2019] [Indexed: 12/20/2022]
Abstract
Mitochondria retain their own genome, a hallmark of their bacterial ancestry. Mitochondrial genomes (mtDNA) are highly diverse in size, shape, and structure, despite their conserved function across most eukaryotes. Exploring extreme cases of mtDNA architecture can yield important information on fundamental aspects of genome biology. We discovered that the mitochondrial genomes of a basidiomycete fungus (Termitomyces spp.) contain an inverted repeat (IR), a duplicated region half the size of the complete genome. In addition, we found an abundance of sequences capable of forming G-quadruplexes (G4DNA); structures that can disrupt the double helical formation of DNA. G4DNA is implicated in replication fork stalling, double-stranded breaks, altered gene expression, recombination, and other effects. To determine whether this occurrence of IR and G4DNA was correlated within the genus Termitomyces, we reconstructed the mitochondrial genomes of 11 additional species including representatives of several closely related genera. We show that the mtDNA of all sampled species of Termitomyces and its sister group, represented by the species Tephrocybe rancida and Blastosporella zonata, are characterized by a large IR and enrichment of G4DNA. To determine whether high mitochondrial G4DNA content is common in fungi, we conducted the first broad survey of G4DNA content in fungal mtDNA, revealing it to be a highly variable trait. The results of this study provide important direction for future research on the function and evolution of G4DNA and organellar IRs.
Affiliation(s)
- Freek T Bakker
  - Biosystematics Group, Wageningen University & Research, The Netherlands
- Bas J Zwaan
  - Laboratory of Genetics, Wageningen University & Research, The Netherlands
- Duur K Aanen
  - Laboratory of Genetics, Wageningen University & Research, The Netherlands
|
49
|
Marijon P, Chikhi R, Varré JS. Graph analysis of fragmented long-read bacterial genome assemblies. Bioinformatics 2019; 35:4239-4246. [DOI: 10.1093/bioinformatics/btz219] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/09/2018] [Revised: 02/19/2019] [Accepted: 03/26/2019] [Indexed: 11/14/2022]
Abstract
Motivation
Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost.
Results
We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies.
Availability and implementation
https://gitlab.inria.fr/pmarijon/knot
Supplementary information
Supplementary data are available at Bioinformatics online.
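The contig-ordering idea mentioned in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rank_contig_orders`, the contig labels, and the meaning of the weights (an assumed cost for placing two contigs next to each other, lower being better) are all hypothetical. The sketch simply enumerates every Hamiltonian cycle over a small set of contigs and ranks the cycles by total weight.

```python
from itertools import permutations


def rank_contig_orders(contigs, weights):
    """Enumerate Hamiltonian cycles over contigs and rank them by total weight.

    weights maps frozenset({a, b}) to an assumed adjacency cost (lower is
    better); pairs missing from weights are treated as not connectable.
    """
    first, rest = contigs[0], contigs[1:]
    ranked = []
    # Fixing the first contig removes rotations of the same cycle.
    for perm in permutations(rest):
        # Skip the mirror image of a cycle that is also enumerated forwards.
        if len(perm) > 1 and perm[0] > perm[-1]:
            continue
        cycle = (first, *perm)
        total = 0
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            w = weights.get(frozenset((a, b)))
            if w is None:
                break  # no link between a and b: not a valid cycle
            total += w
        else:
            ranked.append((total, cycle))
    ranked.sort()
    return ranked


# Hypothetical example: four contigs with cheap "ring" links and costly chords.
example_weights = {
    frozenset(("c1", "c2")): 1,
    frozenset(("c2", "c3")): 1,
    frozenset(("c3", "c4")): 1,
    frozenset(("c4", "c1")): 1,
    frozenset(("c1", "c3")): 5,
    frozenset(("c2", "c4")): 5,
}
for total, order in rank_contig_orders(["c1", "c2", "c3", "c4"], example_weights):
    print(total, " -> ".join(order))
```

Exhaustive enumeration grows factorially with the number of contigs, so a sketch like this is only practical for the handful of contigs typical of a fragmented bacterial assembly.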
Affiliation(s)
- Pierre Marijon
  - Inria, Université de Lille, CNRS, Centrale Lille, UMR 9189 – CRIStAL, Lille F-59000, France
- Rayan Chikhi
  - Institut Pasteur, C3BI USR 3756 IP CNRS, Paris, France
- Jean-Stéphane Varré
  - Université de Lille, CNRS, Centrale Lille, Inria, UMR 9189 – CRIStAL, Lille F-59000, France
|
50
|
Complete Genome Sequence of Nocardia farcinica W6977 T Obtained by Combining Illumina and PacBio Reads. Microbiol Resour Announc 2019; 8:MRA01373-18. [PMID: 30687825 PMCID: PMC6346157 DOI: 10.1128/mra.01373-18] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/24/2018] [Accepted: 12/03/2018] [Indexed: 12/11/2022]
Abstract
The complete genome sequence of the Nocardia farcinica type strain was obtained by combining Illumina HiSeq and PacBio reads, producing a single 6.29-Mb chromosome and 2 circular plasmids. Bioinformatic analysis identified 5,991 coding sequences, including putative genes for virulence, microbial resistance, transposons, and biosynthesis gene clusters.
|