1
Wu X, Luo H, Ge C, Xu F, Deng X, Wiedmann M, Baker RC, Stevenson AE, Zhang G, Tang S. Evaluation of multiplex nanopore sequencing for Salmonella serotype prediction and antimicrobial resistance gene and virulence gene detection. Front Microbiol 2023; 13:1073057. [PMID: 36817104 PMCID: PMC9930645 DOI: 10.3389/fmicb.2022.1073057] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/18/2022] [Accepted: 12/22/2022] [Indexed: 02/04/2023]
Abstract
In a previous study, multiplex nanopore sequencing-based whole genome sequencing (WGS) allowed accurate in silico serotype prediction of Salmonella within one day for five multiplexed isolates, using both SISTR and SeqSero2. Because only ten serotypes were tested in that study, these conclusions had yet to be evaluated in a larger-scale test. In the current study, we evaluated this workflow with 69 Salmonella serotypes and also explored the feasibility of using multiplex nanopore sequencing-based WGS for antimicrobial resistance (AMR) gene and virulence gene detection. We found that accurate in silico serotype prediction with nanopore WGS data was achieved within about five hours of sequencing at a minimum of 30× Salmonella genome coverage, with SeqSero2 as the serotype prediction tool. For each tested isolate, small variations were observed between the AMR/virulence gene profiles from the Illumina and nanopore sequencing platforms. Taking results generated using Illumina data as the benchmark, the average precision per isolate was 0.99 for both AMR and virulence gene detection. We found that the Resistance Gene Identifier (RGI) identified AMR genes with nanopore data at a much lower accuracy than Abricate, possibly because RGI's default minimum similarity and coverage thresholds for database matching are less stringent. This study is an evaluation of multiplex nanopore sequencing-based WGS as a cost-efficient and rapid Salmonella classification method, and a starting point for future validation and verification of its use as an AMR/virulence gene profiling tool for the food industry. This study paves the way for the application of nanopore sequencing in surveillance, tracking, and risk assessment of Salmonella across the food supply chain.
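The per-isolate precision mentioned in this abstract can be illustrated with a minimal sketch. It assumes each platform's gene calls are available as a set of gene names and treats the Illumina calls as the benchmark, as the abstract describes. The function names and the example gene names are hypothetical and are not taken from the study.

```python
import math


def precision_recall(benchmark: set[str], observed: set[str]) -> tuple[float, float]:
    """Compare observed gene calls against a benchmark set.

    precision = shared genes / genes called in `observed`
    recall    = shared genes / genes called in `benchmark`
    A ratio with a zero denominator is returned as NaN.
    """
    shared = len(benchmark & observed)
    precision = shared / len(observed) if observed else math.nan
    recall = shared / len(benchmark) if benchmark else math.nan
    return precision, recall


def mean_precision(calls: dict[str, tuple[set[str], set[str]]]) -> float:
    """Average precision per isolate; `calls` maps isolate -> (benchmark, observed)."""
    values = [precision_recall(bench, obs)[0] for bench, obs in calls.values()]
    return sum(values) / len(values)


# Hypothetical example data (illustrative gene names only, not study results).
calls = {
    "isolate_1": ({"tet(A)", "sul1"}, {"tet(A)", "sul1"}),
    "isolate_2": ({"blaTEM-1", "sul2"}, {"blaTEM-1", "sul2", "aadA1", "qnrS1"}),
}
print(mean_precision(calls))
```

In this toy input, the second isolate has two extra nanopore-only calls, so its precision drops while its recall stays at 1.0. Reporting both values per isolate makes it easier to see whether disagreements are extra calls or missed genes.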
Affiliation(s)
- Xingwen Wu
  - Mars Global Food Safety Center, Beijing, China
- Hao Luo
  - Mars Global Food Safety Center, Beijing, China
- Chongtao Ge
  - Mars Global Food Safety Center, Beijing, China
- Feng Xu
  - Mars Global Food Safety Center, Beijing, China
- Xiangyu Deng
  - Center for Food Safety, University of Georgia, Griffin, GA, United States
- Martin Wiedmann
  - Department of Food Science, Cornell University, Ithaca, NY, United States
- Silin Tang
  - Mars Global Food Safety Center, Beijing, China (*Correspondence: Silin Tang)
2
Leff LG, Fasina K, Engohang-Ndong J. Detecting antibiotic resistance genes in anthropogenically impacted streams and rivers. Curr Opin Biotechnol 2023; 79:102878. [PMID: 36621219 DOI: 10.1016/j.copbio.2022.102878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 12/01/2022] [Accepted: 12/05/2022] [Indexed: 01/09/2023]
Abstract
Streams and rivers are widely impacted by human activities ranging from hydrological modifications to point and nonpoint pollution. Among the pollutants that enter lotic ecosystems are pharmaceuticals and personal care products, including antibiotics, that may play a role in the occurrence of antibiotic resistance genes (ARGs). Oftentimes, ARGs are detected based on culturing of bacteria or by using quantitative polymerase chain reaction; the limitations of these methods create barriers to our understanding. Use of more exhaustive methods, such as metagenomics, may overcome some of these barriers. The public health and ecological impacts of ARGs may be profound but are largely understudied. Antibiotic resistance is a growing concern for public health.
Affiliation(s)
- Laura G Leff
  - Department of Biological Sciences and School of Biomedical Sciences, Kent State University, Kent, OH 44236, USA
- Kolapo Fasina
  - Department of Biological Sciences and School of Biomedical Sciences, Kent State University, Kent, OH 44236, USA
- Jean Engohang-Ndong
  - Department of Biological Sciences, Kent State University - Tuscarawas, 330 University Dr. NE, New Philadelphia, OH 44663, USA
3
Senanayake A, Gamaarachchi H, Herath D, Ragel R. DeepSelectNet: deep neural network based selective sequencing for oxford nanopore sequencing. BMC Bioinformatics 2023; 24:31. [PMID: 36709261 PMCID: PMC9883605 DOI: 10.1186/s12859-023-05151-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/24/2022] [Accepted: 01/17/2023] [Indexed: 01/30/2023]
Abstract
BACKGROUND Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; the classification of species from a pool of species is one example. Existing methods for selective sequencing for species classification are still immature, and their accuracy varies widely depending on the dataset. For the five datasets we tested, the accuracy of existing methods varied in the range of ~77 to 97% (average accuracy < 89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and improved neural network architecture for regularization. RESULTS For the five datasets tested, DeepSelectNet's accuracy varied between ~91 and 99% (average accuracy ~95%). At its best performance, DeepSelectNet achieved a nearly 12% accuracy increase compared to its deep learning-based predecessor SquiggleNet. Furthermore, precision and recall evaluated for DeepSelectNet were always > 89% (average ~95%). In terms of execution performance, DeepSelectNet outperformed SquiggleNet by ~13% on average. Thus, DeepSelectNet is a practically viable method to improve the effectiveness of selective sequencing. CONCLUSIONS Compared to base alignment and deep learning predecessors, DeepSelectNet can significantly improve the accuracy to enable real-time species classification using selective sequencing. The source code of DeepSelectNet is available at https://github.com/AnjanaSenanayake/DeepSelectNet.
Affiliation(s)
- Anjana Senanayake
  - Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka
- Hasindu Gamaarachchi
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia; School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
- Damayanthi Herath
  - Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka
- Roshan Ragel
  - Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka
4
Validation and Application of Long-Read Whole-Genome Sequencing for Antimicrobial Resistance Gene Detection and Antimicrobial Susceptibility Testing. Antimicrob Agents Chemother 2023; 67:e0107222. [PMID: 36533931 PMCID: PMC9872642 DOI: 10.1128/aac.01072-22] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/23/2022]
Abstract
Next-generation sequencing applications are increasingly used for detection and characterization of antimicrobial-resistant pathogens in clinical settings. Oxford Nanopore Technologies (ONT) sequencing offers advantages for clinical use compared with other sequencing methodologies because it enables real-time basecalling, produces long sequencing reads that increase the ability to correctly assemble DNA fragments, provides short turnaround times, and requires relatively uncomplicated sample preparation. A drawback of ONT sequencing, however, is its lower per-read accuracy than short-read sequencing. We sought to identify best practices in ONT sequencing protocols. As some variability in sequencing results may be introduced by the DNA extraction methodology, we tested three DNA extraction kits across three independent laboratories using a representative set of six bacterial isolates to investigate accuracy and reproducibility of ONT technology. All DNA extraction techniques showed comparable performance; however, the DNeasy PowerSoil Pro kit had the highest sequencing yield. This kit was subsequently applied to 42 sequentially collected bacterial isolates from blood cultures to assess Ares Genetics's pipelines for predictive whole-genome sequencing antimicrobial susceptibility testing (WGS-AST) performance compared to phenotypic triplicate broth microdilution results. WGS-AST results ranged across the organisms and resulted in an overall categorical agreement of 95% for penicillins, 82.4% for cephalosporins, 76.7% for carbapenems, 86.9% for fluoroquinolones, and 96.2% for aminoglycosides. Very major errors/major errors were 0%/16.7% (penicillins), 11.7%/3.6% (cephalosporins), 0%/24.4% (carbapenems), 2.5%/7.7% (fluoroquinolones), and 0%/4.1% (aminoglycosides), respectively. This work showed that, although additional refinements are necessary, ONT sequencing demonstrates potential as a method to perform WGS-AST on cultured isolates for patient care.
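The agreement figures in this abstract can be made concrete with a short sketch. It assumes the commonly used definitions: categorical agreement is the share of tests where the prediction matches the phenotype, a very major error is a predicted-susceptible call on a phenotypically resistant isolate (rate over resistant isolates), and a major error is a predicted-resistant call on a phenotypically susceptible isolate (rate over susceptible isolates). The abstract does not spell out its denominators, so these are assumptions, the intermediate category is ignored, and the function name and example data are hypothetical.

```python
from typing import Optional


def ast_agreement(pairs: list[tuple[str, str]]) -> dict[str, Optional[float]]:
    """Summarise WGS-AST performance against phenotypic results.

    `pairs` holds one (phenotype, prediction) tuple per isolate-drug test,
    each value being "S" (susceptible) or "R" (resistant).
    Rates with an empty denominator are returned as None.
    """
    predictions_for_resistant = [pred for pheno, pred in pairs if pheno == "R"]
    predictions_for_susceptible = [pred for pheno, pred in pairs if pheno == "S"]
    matches = sum(1 for pheno, pred in pairs if pheno == pred)

    def rate(count: int, total: int) -> Optional[float]:
        return count / total if total else None

    return {
        "categorical_agreement": rate(matches, len(pairs)),
        # Resistant by phenotype but predicted susceptible.
        "very_major_error": rate(predictions_for_resistant.count("S"),
                                 len(predictions_for_resistant)),
        # Susceptible by phenotype but predicted resistant.
        "major_error": rate(predictions_for_susceptible.count("R"),
                            len(predictions_for_susceptible)),
    }


# Hypothetical example: 4 resistant and 4 susceptible tests, one error of each kind.
example = [("R", "R"), ("R", "R"), ("R", "R"), ("R", "S"),
           ("S", "S"), ("S", "S"), ("S", "S"), ("S", "R")]
print(ast_agreement(example))
```

Keeping the two error types separate matters because their consequences differ: a very major error could lead to an ineffective drug being chosen, while a major error only rules out a drug that would have worked.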
5
Liu CC, Hsiao WWL. Large-scale comparative genomics to refine the organization of the global Salmonella enterica population structure. Microb Genom 2022; 8:mgen000906. [PMID: 36748524 PMCID: PMC9837569 DOI: 10.1099/mgen.0.000906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
The White-Kauffmann-Le Minor (WKL) scheme is the most widely used Salmonella typing scheme for reporting the disease prevalence of the enteric pathogen. With the advent of whole-genome sequencing (WGS), in silico methods have increasingly replaced traditional serotyping due to reproducibility, speed and coverage. However, despite integrating genomic-based typing by in silico serotyping tools such as SISTR, in silico serotyping in certain contexts remains ambiguous and insufficiently informative. Specifically, in silico serotyping does not attempt to resolve polyphyly. Furthermore, in spite of the widespread acknowledgement of polyphyly from genomic studies, the prevalence of polyphyletic serovars is not well characterized. Here, we applied a genomics approach to acquire the necessary resolution to classify genetically discordant serovars and propose an alternative typing scheme that consistently reflects natural Salmonella populations. By accessing the unprecedented volume of bacterial genomic data publicly available in GenomeTrakr and PubMLST databases (>180 000 genomes representing 723 serovars), we characterized the global Salmonella population structure and systematically identified putative non-monophyletic serovars. The proportion of putative non-monophyletic serovars was estimated to be higher than in previous reports, reinforcing the inability of antigenic determinants to depict the complexity of Salmonella evolutionary history. We explored the extent of genetic diversity masked by serotyping labels and found significant intra-serovar molecular differences across many clinically important serovars. To avoid false discovery due to incorrect in silico serotyping calls, we cross-referenced reported serovar labels and found a low error rate in in silico serotyping. The combined application of clustering statistics and genome-wide association methods demonstrated effective characterization of stable bacterial populations and explained functional differences. The collective methods adopted in our study have practical value in establishing genomic-based typing nomenclatures for an entire microbial species or closely related subpopulations. Ultimately, we foresee an improved typing scheme to be a hybrid that integrates both genomic and antigenic information such that the resolution from WGS is leveraged to improve the precision of subpopulation classification while preserving the common names defined by the WKL scheme.
Affiliation(s)
- Chao Chun Liu
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- William W. L. Hsiao
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada; Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada (*Correspondence: William W. L. Hsiao)
6
Lin Y, Yang L, Qiu S, Yang C, Wang K, Li J, Jia L, Li P, Song H. Rapid Identification and Source Tracing of a Salmonella Typhimurium Outbreak in China by Metagenomic and Whole-Genome Sequencing. Foodborne Pathog Dis 2022; 19:259-265. [PMID: 35420907 DOI: 10.1089/fpd.2021.0072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Salmonella spp. are among the most prevalent foodborne pathogens. Rapid identification of etiologic agents during foodborne outbreaks is of great importance. In this study, we report a traceback investigation of a Salmonella outbreak in China. Metagenomic sequencing of suspected food samples was performed on MinION and MiSeq platforms. Real-time nanopore sequencing analysis identified reads belonging to the Enterobacteriaceae family. MiSeq sequencing identified 63 reads specifically mapped to Salmonella. Conventional methods, including quantitative PCR and culture-based isolation, confirmed the agent as Salmonella enterica serovar Typhimurium. The foodborne outbreak of Salmonella Typhimurium was further characterized by whole-genome sequencing and pulsed-field gel electrophoresis analysis. Our study demonstrates the ability of metagenomic sequencing to rapidly identify enteric pathogens directly from food samples. These results highlight the capacity of metagenomic sequencing to deliver actionable information rapidly and to expedite the tracing and identification of etiologic agents during foodborne outbreaks.
Affiliation(s)
- Yanfeng Lin
  - Academy of Military Medical Sciences, Academy of Military Sciences, Beijing, China; Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Lang Yang
  - Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Shaofu Qiu
  - Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Chaojie Yang
  - Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Kaiying Wang
  - Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Jinhui Li
  - Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Leili Jia
  - Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Peng Li
  - Chinese PLA Center for Disease Control and Prevention, Beijing, China
- Hongbin Song
  - Academy of Military Medical Sciences, Academy of Military Sciences, Beijing, China; Chinese PLA Center for Disease Control and Prevention, Beijing, China
7
A microfluidic genoserotyping strategy for fast and objective identification of common Salmonella serotypes isolated from retail food samples in China. Anal Chim Acta 2022; 1201:339657. [DOI: 10.1016/j.aca.2022.339657] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/14/2021] [Revised: 02/14/2022] [Accepted: 02/24/2022] [Indexed: 11/17/2022]
8
Lamb HJ, Hayes BJ, Randhawa IAS, Nguyen LT, Ross EM. Genomic prediction using low-coverage portable Nanopore sequencing. PLoS One 2021; 16:e0261274. [PMID: 34910782 PMCID: PMC8673642 DOI: 10.1371/journal.pone.0261274] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/03/2021] [Accepted: 11/26/2021] [Indexed: 11/18/2022]
Abstract
Most traits in livestock, crops and humans are polygenic, that is, a large number of loci contribute to genetic variation. Effects at these loci lie along a continuum ranging from common low-effect to rare high-effect variants that cumulatively contribute to the overall phenotype. Statistical methods to calculate the effect of these loci have been developed and can be used to predict phenotypes in new individuals. In agriculture, these methods are used to select superior individuals using genomic breeding values; in humans, these methods are used to quantitatively measure an individual’s disease risk, termed polygenic risk scores. Both fields typically use SNP array genotypes for the analysis. Recently, genotyping-by-sequencing has become popular, due to lower cost and greater genome coverage (including structural variants). Oxford Nanopore Technologies’ (ONT) portable sequencers have the potential to combine the benefits of genotyping-by-sequencing with portability and decreased turn-around time. This introduces the potential for in-house clinical genetic disease risk screening in humans or calculating genomic breeding values on-farm in agriculture. Here we demonstrate the potential of the latter by calculating genomic breeding values for four traits in cattle using low-coverage ONT sequence data and comparing these breeding values to breeding values calculated from SNP arrays. At sequencing coverages between 2X and 4X, the correlation between ONT breeding values and SNP array-based breeding values was > 0.92 when imputation was used and > 0.88 when no imputation was used. With an average sequencing coverage of 0.5X, the correlation between the two methods was between 0.85 and 0.92 using imputation, depending on the trait. This suggests that ONT sequencing has potential for in-clinic or on-farm genomic prediction; however, further work is needed to validate these findings in a larger population.
Affiliation(s)
- Harrison J. Lamb
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Ben J. Hayes
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Imtiaz A. S. Randhawa
  - School of Veterinary Science, The University of Queensland, Brisbane, QLD, Australia
- Loan T. Nguyen
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Elizabeth M. Ross
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
9
Thomassen GMB, Krych L, Knøchel S, Mehli L. ON-rep-seq as a rapid and cost-effective alternative to whole-genome sequencing for species-level identification and strain-level discrimination of Listeria monocytogenes contamination in a salmon processing plant. Microbiologyopen 2021; 10:e1246. [PMID: 34964295 PMCID: PMC8591450 DOI: 10.1002/mbo3.1246] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/17/2021] [Accepted: 10/19/2021] [Indexed: 12/28/2022]
Abstract
Identification, source tracking, and surveillance of food pathogens are crucial factors for the food-producing industry. Over the last decade, the techniques used for this have moved from conventional enrichment methods, through species-specific detection by PCR to sequencing-based methods, whole-genome sequencing (WGS) being the ultimate method. However, using WGS requires the right infrastructure, high computational power, and bioinformatics expertise. Therefore, there is a need for faster, more cost-effective, and more user-friendly methods. A newly developed method, ON-rep-seq, combines the classical rep-PCR method with nanopore sequencing, resulting in a highly discriminating set of sequences that can be used for species identification and also strain discrimination. This study is essentially a real industry case from a salmon processing plant. Twenty Listeria monocytogenes isolates were analyzed both by ON-rep-seq and WGS to identify and differentiate putative L. monocytogenes from a routine sampling of processing equipment and products, and finally, compare the strain-level discriminatory power of ON-rep-seq to different analyzing levels delivered from the WGS data. The analyses revealed that among the isolates tested there were three different strains. The isolates of the most frequently detected strain (n = 15) were all detected in the problematic area in the processing plant. The strain level discrimination done by ON-rep-seq was in full accordance with the interpretation of WGS data. Our findings also demonstrate that ON-rep-seq may serve as a primary screening method alternative to WGS for identification and strain-level differentiation for surveillance of potential pathogens in a food-producing environment.
Affiliation(s)
- Lukasz Krych
  - Department of Food Science, University of Copenhagen, Frederiksberg, Denmark
- Susanne Knøchel
  - Department of Food Science, University of Copenhagen, Frederiksberg, Denmark
- Lisbeth Mehli
  - Department of Biotechnology and Food Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
10
Xu F, Ge C, Li S, Tang S, Wu X, Luo H, Deng X, Zhang G, Stevenson A, Baker RC. Evaluation of nanopore sequencing technology to identify Salmonella enterica Choleraesuis var. Kunzendorf and Orion var. 15+, 34+. Int J Food Microbiol 2021; 346:109167. [PMID: 33774575 DOI: 10.1016/j.ijfoodmicro.2021.109167] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/08/2020] [Revised: 03/04/2021] [Accepted: 03/08/2021] [Indexed: 02/06/2023]
Abstract
Our previous study demonstrated that whole genome sequencing (WGS) data generated by Oxford Nanopore Technologies (ONT) can be used for rapid and accurate prediction of selected Salmonella serotypes. However, one limitation is that established methods for WGS-based serotype prediction, utilizing data from either ONT or Illumina, cannot differentiate certain serotypes and serotype variants with the same or closely related antigenic formulae. This study aimed to evaluate nanopore sequencing and additional data analysis for identification of Salmonella enterica Choleraesuis var. Kunzendorf and S. enterica Orion var. 15+, 34+, thus overcoming this limitation. Five workflows that combined different flow cells, library construction methods and basecaller models were evaluated and compared. The workflow that consisted of the R9 flow cell, rapid sequencing library construction kit and guppy basecaller with base modified model performed best for single nucleotide polymorphism (SNP) analysis. With this workflow, 99.98% identity between the genomes assembled from ONT data and those assembled from Illumina data was achieved. Fewer than five high-quality SNPs differed when comparing sequencing data between ONT and Illumina. SNP typing successfully identified Choleraesuis var. Kunzendorf, while prophage prediction further differentiated Orion var. 15+, 34+ from the other two Orion variants. Our study improves the readiness of ONT as a Salmonella subtyping and source tracking tool for food industry applications.
Affiliation(s)
- Feng Xu
  - Mars Global Food Safety Center, Beijing 101407, China
- Chongtao Ge
  - Mars Global Food Safety Center, Beijing 101407, China
- Shaoting Li
  - Center for Food Safety, University of Georgia, Griffin, GA 30223, USA
- Silin Tang
  - Mars Global Food Safety Center, Beijing 101407, China
- Xingwen Wu
  - Mars Global Food Safety Center, Beijing 101407, China
- Hao Luo
  - Mars Global Food Safety Center, Beijing 101407, China
- Xiangyu Deng
  - Center for Food Safety, University of Georgia, Griffin, GA 30223, USA
11
Chen Z, Erickson DL, Meng J. Polishing the Oxford Nanopore long-read assemblies of bacterial pathogens with Illumina short reads to improve genomic analyses. Genomics 2021; 113:1366-1377. [PMID: 33716184 DOI: 10.1016/j.ygeno.2021.03.018] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/18/2020] [Revised: 01/18/2021] [Accepted: 03/08/2021] [Indexed: 11/18/2022]
Abstract
Oxford Nanopore sequencing has been widely used to achieve complete genomes of bacterial pathogens. However, the error rates of Oxford Nanopore long reads are high. Various polishing algorithms using Illumina short reads to correct the errors in Oxford Nanopore long-read assemblies have been developed. The impact of polishing the Oxford Nanopore long-read assemblies of bacterial pathogens with Illumina short reads on improving genomic analyses was evaluated using both simulated and real reads. Ten species (10 strains) were selected for simulated reads, while real reads were tested on 11 species (11 strains). Oxford Nanopore long reads were assembled with Unicycler to produce a draft assembly, followed by three rounds of polishing with Illumina short reads using two polishing tools, Pilon and NextPolish. One round of NextPolish polishing generated genome completeness and accuracy parameters similar to the reference genomes, whereas two or three rounds of Pilon polishing were needed, though contiguity remained unchanged after polishing. The polished assemblies of Escherichia coli O157:H7, Salmonella Typhimurium, and Cronobacter sakazakii with simulated reads did not provide accurate plasmid identifications. One round of NextPolish polishing was needed for accurately identifying plasmids in Staphylococcus aureus and E. coli O26:H11 with real reads, whereas one and two rounds of Pilon polishing were necessary for these two strains, respectively. Polishing failed to provide an accurate antimicrobial resistance (AMR) genotype for S. aureus with real reads. One round of polishing recovered an accurate AMR genotype for Klebsiella pneumoniae with real reads. The reference genome and draft assembly of Citrobacter braakii with real reads differed, which carried blaCMY-83 and fosA6, respectively, while both genes were present after one round of polishing. However, polishing did not improve the assembly of E. coli O26:H11 with real reads to achieve numbers of virulence genes similar to the reference genome. The draft and polished assemblies showed a phylogenetic tree topology comparable with the reference genomes. For multilocus sequence typing and pan-genome analyses, one round of NextPolish polishing was sufficient to obtain accurate results, while two or three rounds of Pilon polishing were needed. Overall, NextPolish outperformed Pilon for polishing the Oxford Nanopore long-read assemblies of bacterial pathogens, though both polishing strategies improved genomic analyses compared to the draft assemblies.
Affiliation(s)
- Zhao Chen
  - Joint Institute for Food Safety and Applied Nutrition, Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA
- David L Erickson
  - Joint Institute for Food Safety and Applied Nutrition, Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA
- Jianghong Meng
  - Joint Institute for Food Safety and Applied Nutrition, Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA; Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
12
Wu X, Luo H, Xu F, Ge C, Li S, Deng X, Wiedmann M, Baker RC, Stevenson A, Zhang G, Tang S. Evaluation of Salmonella Serotype Prediction With Multiplex Nanopore Sequencing. Front Microbiol 2021; 12:637771. [PMID: 33776971 PMCID: PMC7987803 DOI: 10.3389/fmicb.2021.637771] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/04/2020] [Accepted: 02/08/2021] [Indexed: 12/13/2022]
Abstract
The use of whole genome sequencing (WGS) data generated by the long-read sequencing platform Oxford Nanopore Technologies (ONT) has been shown to provide reliable results for Salmonella serotype prediction in a previous study. To further meet the needs of industry for accurate, rapid, and cost-efficient Salmonella confirmation and serotype classification, we evaluated the serotype prediction accuracy of using WGS data from multiplex ONT sequencing with three, four, five, seven, or ten Salmonella isolates (each isolate represented one Salmonella serotype) pooled in one R9.4.1 flow cell. Each multiplexing strategy was repeated with five flow cells, and the loaded samples were sequenced simultaneously in a GridION sequencer for 48 h. In silico serotype prediction was performed using both SeqSero2 (for raw reads and genome assemblies) and SISTR (for genome assemblies) software suites. An average of 10.63 Gbp of clean sequencing data was obtained per flow cell. We found that uneven data yield among the multiplexed isolates was a major barrier to shortening sequencing time. Using genome assemblies, both SeqSero2 and SISTR accurately predicted all the multiplexed isolates under each multiplexing strategy when the depth of genome coverage was ≥50× for each isolate. We identified cross-sample barcode assignment as a major cause of prediction errors when raw sequencing data were used for prediction. This study also demonstrated that (i) sequence data generated by ONT multiplex sequencing can be used to simultaneously predict serotype for three to ten Salmonella isolates, (ii) with three to ten Salmonella isolates multiplexed, genome coverage of ≥50× per isolate was obtained within an average of 6 h of ONT multiplex sequencing, and (iii) with five isolates multiplexed, the cost per isolate might be reduced to 23% of that incurred with single ONT sequencing. This study is a starting point for future validation of multiplex ONT WGS as a cost-efficient and rapid Salmonella confirmation and serotype classification tool for the food industry.
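The ≥50× per-isolate coverage criterion in this abstract can be checked with a minimal sketch. Depth of coverage is estimated here as sequenced bases divided by genome size. The 4.8 Mbp genome size is an assumed approximate value for Salmonella, and the function names, barcode labels and yields are hypothetical, not data from the study.

```python
# Assumption: approximate Salmonella genome size in base pairs.
ASSUMED_GENOME_SIZE_BP = 4_800_000


def depth_of_coverage(total_bases: int,
                      genome_size_bp: int = ASSUMED_GENOME_SIZE_BP) -> float:
    """Estimate mean depth as sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def isolates_below_threshold(yield_by_barcode: dict[str, int],
                             min_depth: float = 50.0,
                             genome_size_bp: int = ASSUMED_GENOME_SIZE_BP) -> list[str]:
    """Return the barcodes whose estimated depth is below `min_depth`."""
    return sorted(
        barcode
        for barcode, bases in yield_by_barcode.items()
        if depth_of_coverage(bases, genome_size_bp) < min_depth
    )


# Hypothetical, deliberately uneven yields (in bases) for three multiplexed isolates.
yields = {
    "barcode01": 480_000_000,
    "barcode02": 240_000_000,
    "barcode03": 120_000_000,
}
print(isolates_below_threshold(yields))
```

Because the run can only stop once the lowest-yield barcode reaches the threshold, a check of this kind shows why uneven yield across multiplexed isolates lengthens the required sequencing time.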
Affiliation(s)
- Xingwen Wu
  - Mars Global Food Safety Center, Beijing, China
- Hao Luo
  - Mars Global Food Safety Center, Beijing, China
- Feng Xu
  - Mars Global Food Safety Center, Beijing, China
- Chongtao Ge
  - Mars Global Food Safety Center, Beijing, China
- Shaoting Li
  - Center for Food Safety, University of Georgia, Griffin, GA, United States
- Xiangyu Deng
  - Center for Food Safety, University of Georgia, Griffin, GA, United States
- Martin Wiedmann
  - Department of Food Science, Cornell University, Ithaca, NY, United States
- Silin Tang
  - Mars Global Food Safety Center, Beijing, China
13
Baugher JD. SeroTools: a Python package for Salmonella serotype data analysis. Journal of Open Source Software 2020; 5:2556. [PMID: 33817546 PMCID: PMC8017488 DOI: 10.21105/joss.02556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Affiliation(s)
- Joseph D Baugher
  - Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration