1. Hernandez SI, Berezin CT, Miller KM, Peccoud SJ, Peccoud J. Sequencing Strategy to Ensure Accurate Plasmid Assembly. bioRxiv 2024:2024.03.25.586694. [PMID: 38585828 PMCID: PMC10996661 DOI: 10.1101/2024.03.25.586694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Despite the wide use of plasmids in research and clinical production, the need to verify plasmid sequences is a bottleneck that is too often underestimated in the manufacturing process. Although sequencing platforms continue to improve, the method and assembly pipeline chosen still influence the final plasmid assembly sequence. Furthermore, few dedicated tools exist for plasmid assembly, especially for de novo assembly. Here, we evaluated short-read, long-read, and hybrid (both short and long reads) de novo assembly pipelines across three replicates of a 24-plasmid library. Consistent with previous characterizations of each sequencing technology, short-read assemblies had issues resolving GC-rich regions, and long-read assemblies commonly had small insertions and deletions, especially in repetitive regions. The hybrid approach facilitated the most accurate, consistent assembly generation and identified mutations relative to the reference sequence. Although Sanger sequencing can be used to verify specific regions, some GC-rich and repetitive regions were difficult to resolve using any method, suggesting that easily sequenced genetic parts should be prioritized in the design of new genetic constructs.
Affiliation(s)
- Sarah I. Hernandez
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
- Casey-Tyler Berezin
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
- Katie M. Miller
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
- Samuel J. Peccoud
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
- Jean Peccoud
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
2. Peccoud S, Berezin CT, Hernandez SI, Peccoud J. PlasCAT: Plasmid Cloud Assembly Tool. Bioinformatics 2024; 40:btae299. [PMID: 38696761 PMCID: PMC11101281 DOI: 10.1093/bioinformatics/btae299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 04/04/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024] Open
Abstract
Summary: PlasCAT (Plasmid Cloud Assembly Tool) is an easy-to-use cloud-based bioinformatics tool that enables de novo plasmid sequence assembly from raw sequencing data. Nontechnical users can now assemble sequences from long reads and short reads without ever touching a line of code. PlasCAT uses high-performance computing servers to reduce run times on assemblies and deliver results faster. Availability and implementation: PlasCAT is freely available on the web at https://sequencing.genofab.com. The assembly pipeline source code and server code are available for download at https://bitbucket.org/genofabinc/workspace/projects/PLASCAT. Click the Cancel button to access the source code without authenticating. Web servers implemented in React.js and Python, with all major browsers supported.
Affiliation(s)
- Casey-Tyler Berezin
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, United States
- Sarah I Hernandez
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, United States
- Jean Peccoud
  - GenoFAB, Inc., Fort Collins, CO 80528, United States
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, United States
3. Sun L, Zhuang H, Chen M, Chen Y, Chen Y, Shi K, Yu Y. Vancomycin heteroresistance caused by unstable tandem amplifications of the vanM gene cluster on linear conjugative plasmids in a clinical Enterococcus faecium. Antimicrob Agents Chemother 2024; 68:e0115923. [PMID: 38506549 PMCID: PMC11064493 DOI: 10.1128/aac.01159-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 01/20/2024] [Indexed: 03/21/2024] Open
Abstract
Vancomycin heteroresistance is prone to missed detection and poses a risk of clinical treatment failure. We encountered one clinical Enterococcus faecium strain, SRR12, that carried a complete vanM gene cluster but was determined as susceptible to vancomycin using the broth microdilution method. However, distinct subcolonies appeared within the clear zone of inhibition in the E-test assay, one of which, named SRR12-v1, showed high-level resistance to vancomycin. SRR12 was confirmed as heteroresistant to vancomycin using population analysis profiling and displayed "revive" growth curves with a lengthy lag phase of over 13 hours when exposed to 2-32 mg/L vancomycin. The resistant subcolony SRR12-v1 was found to carry an identical vanM gene cluster to that of SRR12 but a significantly increased vanM copy number in the genome. Long-read whole genome sequencing revealed that a one-copy vanM gene cluster was located on a pELF1-like linear plasmid in SRR12. In comparison, tandem amplification of the vanM gene cluster jointed with IS1216E was seated on a linear plasmid in the genome of SRR12-v1. These amplifications of the vanM gene cluster were demonstrated as unstable and would decrease accompanied by fitness reversion after serial passaging for 50 generations under increasing vancomycin pressure or without antibiotic pressure but were relatively stable under constant vancomycin pressure. Further, vanM resistance in resistant variants was verified to be carried by conjugative plasmids with variable sizes using conjugation assays and S1-pulsed field gel electrophoresis blotting, suggesting the instability/flexibility of vanM cluster amplification in the genome and an increased risk of vanM resistance dissemination.
Affiliation(s)
- Lingyan Sun
  - Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, Hangzhou, China
  - Institute of Laboratory Medicine, Zhejiang University, Hangzhou, China
- Hemu Zhuang
  - Department of Respiratory and Critical Medicine, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Mengzhen Chen
  - Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Yan Chen
  - Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Yiyi Chen
  - Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Keren Shi
  - Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Yunsong Yu
  - Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
4. Lerminiaux N, Fakharuddin K, Mulvey MR, Mataseje L. Do we still need Illumina sequencing data? Evaluating Oxford Nanopore Technologies R10.4.1 flow cells and the Rapid v14 library prep kit for Gram negative bacteria whole genome assemblies. Can J Microbiol 2024; 70:178-189. [PMID: 38354391 DOI: 10.1139/cjm-2023-0175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2024]
Abstract
The best whole genome assemblies are currently built from a combination of highly accurate short-read sequencing data and long-read sequencing data that can bridge repetitive and problematic regions. Oxford Nanopore Technologies (ONT) produce long-read sequencing platforms and they are continually improving their technology to obtain higher quality read data that is approaching the quality obtained from short-read platforms such as Illumina. As these innovations continue, we evaluated how much ONT read coverage produced by the Rapid Barcoding Kit v14 (SQK-RBK114) is necessary to generate high-quality hybrid and long-read-only genome assemblies for a panel of carbapenemase-producing Enterobacterales bacterial isolates. We found that 30× long-read coverage is sufficient if Illumina data are available, and that more (at least 100×) long-read coverage is recommended for long-read-only assemblies. Illumina polishing is still improving single nucleotide variants (SNVs) and INDELs in long-read-only assemblies. We also examined if antimicrobial resistance genes could be accurately identified in long-read-only data, and found that Flye assemblies regardless of ONT coverage detected >96% of resistance genes at 100% identity and length. Overall, the Rapid Barcoding Kit v14 and long-read-only assemblies can be an optimal sequencing strategy (i.e., plasmid characterization and AMR detection) but finer-scale analyses (i.e., SNV) still benefit from short-read data.
Affiliation(s)
- Nicole Lerminiaux
  - National Microbiology Lab, Public Health Agency of Canada, Winnipeg, MB, Canada
- Ken Fakharuddin
  - National Microbiology Lab, Public Health Agency of Canada, Winnipeg, MB, Canada
- Michael R Mulvey
  - National Microbiology Lab, Public Health Agency of Canada, Winnipeg, MB, Canada
- Laura Mataseje
  - National Microbiology Lab, Public Health Agency of Canada, Winnipeg, MB, Canada
5. Nieto-Rosado M, Sands K, Portal EAR, Thomson KM, Carvalho MJ, Mathias J, Milton R, Dyer C, Akpulu C, Boostrom I, Hogan P, Saif H, Sanches Ferreira AD, Hender T, Portal B, Andrews R, Watkins WJ, Zahra R, Shirazi H, Muhammad A, Ullah SN, Jan MH, Akif S, Iregbu KC, Modibbo F, Uwaezuoke S, Audu L, Edwin CP, Yusuf AH, Adeleye A, Mukkadas AS, Mazarati JB, Rucogoza A, Gaju L, Mehtar S, Bulabula ANH, Whitelaw A, Roberts L, Chan G, Bekele D, Solomon S, Abayneh M, Metaferia G, Walsh TR. Colonisation of hospital surfaces from low- and middle-income countries by extended spectrum β-lactamase- and carbapenemase-producing bacteria. Nat Commun 2024; 15:2758. [PMID: 38553439 PMCID: PMC10980694 DOI: 10.1038/s41467-024-46684-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 03/06/2024] [Indexed: 04/02/2024] Open
Abstract
Hospital surfaces can harbour bacterial pathogens, which may disseminate and cause nosocomial infections, contributing towards mortality in low- and middle-income countries (LMICs). During the BARNARDS study, hospital surfaces from neonatal wards were sampled to assess the degree of environmental surface and patient care equipment colonisation by Gram-negative bacteria (GNB) carrying antibiotic resistance genes (ARGs). Here, we perform PCR screening for extended-spectrum β-lactamases (blaCTX-M-15) and carbapenemases (blaNDM, blaOXA-48-like and blaKPC), MALDI-TOF MS identification of GNB carrying ARGs, and further analysis by whole genome sequencing of bacterial isolates. We determine presence of consistently dominant clones and their relatedness to strains causing neonatal sepsis. Higher prevalence of carbapenemases is observed in Pakistan, Bangladesh, and Ethiopia, compared to other countries, and they are mostly found on surfaces near the sink drain. Klebsiella pneumoniae, Enterobacter hormaechei, Acinetobacter baumannii, Serratia marcescens and Leclercia adecarboxylata are dominant; ST15 K. pneumoniae is identified from the same ward on multiple occasions suggesting clonal persistence within the same environment, and is found to be identical to isolates causing neonatal sepsis in Pakistan over similar time periods. Our data suggest persistence of dominant clones across multiple time points, highlighting the need for assessment of Infection Prevention and Control guidelines.
Affiliation(s)
- Maria Nieto-Rosado
  - Department of Biology, Ineos Oxford Institute for Antimicrobial Research, University of Oxford, Oxford, UK
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Kirsty Sands
  - Department of Biology, Ineos Oxford Institute for Antimicrobial Research, University of Oxford, Oxford, UK
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Edward A R Portal
  - Department of Biology, Ineos Oxford Institute for Antimicrobial Research, University of Oxford, Oxford, UK
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Kathryn M Thomson
  - Department of Biology, Ineos Oxford Institute for Antimicrobial Research, University of Oxford, Oxford, UK
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Maria J Carvalho
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
  - Department of Medical Sciences, Institute of Biomedicine, University of Aveiro, Aveiro, Portugal
- Jordan Mathias
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Rebecca Milton
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
  - Centre for Trials Research, Cardiff University, Cardiff, UK
- Calie Dyer
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
  - Centre for Trials Research, Cardiff University, Cardiff, UK
- Chinenye Akpulu
  - Department of Biology, Ineos Oxford Institute for Antimicrobial Research, University of Oxford, Oxford, UK
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Ian Boostrom
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Patrick Hogan
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Habiba Saif
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Ana D Sanches Ferreira
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
  - Parasites and Microbes Programme, Wellcome Sanger Institute Hinxton, Hinxton, UK
- Thomas Hender
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Barbra Portal
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Robert Andrews
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- W John Watkins
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
- Rabaab Zahra
  - Department of Microbiology, Quaid-i-Azam University, Islamabad, Pakistan
- Haider Shirazi
  - Pakistan Institute of Medical Sciences, Islamabad, Pakistan
- Adil Muhammad
  - Department of Microbiology, Quaid-i-Azam University, Islamabad, Pakistan
- Syed Najeeb Ullah
  - Department of Microbiology, Quaid-i-Azam University, Islamabad, Pakistan
- Muhammad Hilal Jan
  - Department of Microbiology, Quaid-i-Azam University, Islamabad, Pakistan
- Shermeen Akif
  - Department of Microbiology, Quaid-i-Azam University, Islamabad, Pakistan
- Chinago P Edwin
  - Department of Microbiology, Medway Maritime Hospital NHS Foundation Trust, Gillingham, Kent, UK
  - Aminu Kano Teaching Hospital, Kano, Nigeria
- Adeola Adeleye
  - Murtala Muhammad Specialist Hospital, Kano City, Nigeria
- Aniceth Rucogoza
  - The National Reference Laboratory, Rwanda Biomedical Centre, Kigali, Rwanda
- Lucie Gaju
  - The National Reference Laboratory, Rwanda Biomedical Centre, Kigali, Rwanda
- Shaheen Mehtar
  - Unit of IPC, Stellenbosch University, Cape Town, South Africa
  - Infection Control Africa Network, Cape Town, South Africa
- Andrew N H Bulabula
  - Infection Control Africa Network, Cape Town, South Africa
  - Department of Global Health, Stellenbosch University, Cape Town, South Africa
- Andrew Whitelaw
  - Division of Medical Microbiology, Stellenbosch University, Cape Town, South Africa
  - National Health Laboratory Service, Tygerberg Hospital, Cape Town, South Africa
- Lauren Roberts
  - Division of Medical Microbiology, Stellenbosch University, Cape Town, South Africa
- Grace Chan
  - Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pediatrics and Child Health, St Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia
- Delayehu Bekele
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, USA
  - Department of Obstetrics and Gynecology, St Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia
- Semaria Solomon
  - Department of Microbiology, Immunology and Parasitology, St Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia
- Mahlet Abayneh
  - Department of Pediatrics and Child Health, St Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia
- Gesit Metaferia
  - Department of Microbiology, Immunology and Parasitology, St Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia
- Timothy R Walsh
  - Department of Biology, Ineos Oxford Institute for Antimicrobial Research, University of Oxford, Oxford, UK
  - Division of Infection and Immunity, Cardiff University, Cardiff, UK
6. Schäfer L, Jehle JA, Kleespies RG, Wennmann JT. A practical guide and Galaxy workflow to avoid inter-plasmidic repeat collapse and false gene loss in Unicycler's hybrid assemblies. Microb Genom 2024; 10:001173. [PMID: 38197876 PMCID: PMC10868617 DOI: 10.1099/mgen.0.001173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/18/2023] [Indexed: 01/11/2024] Open
Abstract
Generating complete, high-quality genome assemblies is key for any downstream analysis, such as comparative genomics. For bacterial genome assembly, various algorithms and fully automated pipelines exist, which are free-of-charge and easily accessible. However, these assembly tools often cannot unambiguously resolve a bacterial genome, for example due to the presence of sequence repeat structures on the chromosome or on plasmids. Then, a more sophisticated approach and/or manual curation is needed. Such modifications can be challenging, especially for non-bioinformaticians, because they are generally not considered as a straightforward process. In this study, we propose a standardized approach for manual genome completion focusing on the popular hybrid assembly pipeline Unicycler. The provided Galaxy workflow addresses two weaknesses in Unicycler's hybrid assemblies: (i) collapse of inter-plasmidic repeats and (ii) false loss of single-copy sequences. To demonstrate and validate how to detect and resolve these assembly errors, we use two genomes from the Bacillus cereus group. By applying the proposed pipeline following an automated assembly, the genome sequence quality can be significantly improved.
Affiliation(s)
- Lea Schäfer
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Johannes A. Jehle
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Regina G. Kleespies
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Jörg T. Wennmann
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
7. Lamas A, Garrido-Maestu A, Prieto A, Cepeda A, Franco CM. Whole genome sequencing in the palm of your hand: how to implement a MinION Galaxy-based workflow in a food safety laboratory for rapid Salmonella spp. serotyping, virulence, and antimicrobial resistance gene identification. Front Microbiol 2023; 14:1254692. [PMID: 38107857 PMCID: PMC10722185 DOI: 10.3389/fmicb.2023.1254692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 11/02/2023] [Indexed: 12/19/2023] Open
Abstract
Introduction: Whole Genome Sequencing (WGS) implementation in food safety laboratories is a significant advancement in food pathogen control and outbreak tracking. However, the initial investment for acquiring next-generation sequencing platforms and the need for bioinformatic skills represented an obstacle for the widespread use of WGS. Long-read technologies, such as the one developed by Oxford Nanopore Technologies, can be easily implemented with a minor initial investment and with simple protocols that can be performed with basic laboratory equipment. Methods: Herein, we report a simple MinION Galaxy-based workflow with analysis parameters that allow its implementation in food safety laboratories with limited computer resources and without previous knowledge in bioinformatics for rapid Salmonella serotyping, virulence, and identification of antimicrobial resistance genes. For that purpose, the single-use Flongle flow cells, along with the MinION Mk1B for WGS, and the community-driven web-based analysis platform Galaxy for bioinformatic analysis were used. Three strains belonging to three different serotypes, monophasic S. Typhimurium, S. Grancanaria, and S. Senftenberg, were sequenced. Results: After 24 h of sequencing, enough coverage was achieved to perform de novo assembly in all three strains. After evaluating different tools, Flye de novo assemblies with medaka polishing were shown to be optimal for in silico Salmonella spp. serotyping with the SISTR tool followed by antimicrobial and virulence gene identification with ABRicate. Discussion: The implementation of the present workflow in food safety laboratories with limited computer resources allows a rapid characterization of Salmonella spp. isolates.
Affiliation(s)
- Alexandre Lamas
  - Food Hygiene, Inspection and Control Laboratory (Lhica), Department of Analytical Chemistry, Nutrition and Bromatology, Veterinary School, Universidade da Santiago de Compostela, Lugo, Spain
- Alejandro Garrido-Maestu
  - Food Quality and Safety Research Group, International Iberian Nanotechnology Laboratory, Braga, Portugal
- Alberto Prieto
  - Department of Animal Pathology (INVESAGA Group), Faculty of Veterinary Sciences, Universidade de Santiago de Compostela, Lugo, Spain
- Alberto Cepeda
  - Food Hygiene, Inspection and Control Laboratory (Lhica), Department of Analytical Chemistry, Nutrition and Bromatology, Veterinary School, Universidade da Santiago de Compostela, Lugo, Spain
- Carlos Manuel Franco
  - Food Hygiene, Inspection and Control Laboratory (Lhica), Department of Analytical Chemistry, Nutrition and Bromatology, Veterinary School, Universidade da Santiago de Compostela, Lugo, Spain
8. Sielemann J, Sielemann K, Brejová B, Vinař T, Chauve C. plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph. Front Microbiol 2023; 14:1267695. [PMID: 37869681 PMCID: PMC10587606 DOI: 10.3389/fmicb.2023.1267695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 09/08/2023] [Indexed: 10/24/2023] Open
Abstract
Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate the information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On one hand, our study provides a new accurate and easy to use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof-of-concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets.
Affiliation(s)
- Janik Sielemann
  - Computational Biology, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany
- Katharina Sielemann
  - Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany
- Broňa Brejová
  - Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
- Tomáš Vinař
  - Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
9. Zhao W, Zeng W, Pang B, Luo M, Peng Y, Xu J, Kan B, Li Z, Lu X. Oxford nanopore long-read sequencing enables the generation of complete bacterial and plasmid genomes without short-read sequencing. Front Microbiol 2023; 14:1179966. [PMID: 37256057 PMCID: PMC10225699 DOI: 10.3389/fmicb.2023.1179966] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/05/2023] [Accepted: 04/27/2023] [Indexed: 06/01/2023] Open
Abstract
Introduction: Genome-based analysis is crucial in monitoring antibiotic-resistant bacteria (ARB) and antibiotic-resistance genes (ARGs). Short-read sequencing is typically used to obtain incomplete draft genomes, while long-read sequencing can obtain genomes of multidrug resistance (MDR) plasmids and track the transmission of plasmid-borne antimicrobial resistance genes in bacteria. However, long-read sequencing suffers from low-accuracy base calling, and short-read sequencing is often required to improve genome accuracy. This increases costs and turnaround time. Methods: In this study, a novel ONT sequencing method is described, which uses the latest ONT chemistry with improved accuracy to assemble genomes of MDR strains and plasmids from long-read sequencing data only. Three strains of Salmonella carrying MDR plasmids were sequenced using the ONT SQK-LSK114 kit with flow cell R10.4.1, and de novo genome assembly was performed with average read accuracy (Q > 10) of 98.9%. Results and Discussion: For a 5-Mb-long bacterial genome, finished genome sequences with accuracy of >99.99% could be obtained at 75× sequencing coverage depth using Flye and Medaka software. Thus, this new ONT method greatly improves base-calling accuracy, allowing for the de novo assembly of high-quality finished bacterial or plasmid genomes without the need for short-read sequencing. This saves both money and time and supports the application of ONT data in critical genome-based epidemiological analyses. The novel ONT approach described in this study can take the place of traditional combination genome assembly based on short- and long-read sequencing, enabling pangenomic analyses based on high-quality complete bacterial and plasmid genomes to monitor the spread of antibiotic-resistant bacteria and antibiotic resistance genes.
Affiliation(s)
- Wenxuan Zhao
- National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Jinzhong, Shanxi, China
| | - Wei Zeng
- National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- School of Public Health, Shandong University, Jinan, China
| | - Bo Pang
- National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
| | - Ming Luo
- Yulin Center for Disease Control and Prevention, Yulin, Shanxi, China
| | - Yao Peng
- National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
| | - Jialiang Xu
- School of Food and Chemical Engineering, Beijing Technology and Business University, Beijing, China
| | - Biao Kan
- National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- School of Public Health, Shandong University, Jinan, China
| | - Zhenpeng Li
- National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
| | - Xin Lu
- National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
|
10
|
Johnson J, Soehnlen M, Blankenship HM. Long read genome assemblers struggle with small plasmids. Microb Genom 2023; 9:mgen001024. [PMID: 37224062 PMCID: PMC10272865 DOI: 10.1099/mgen.0.001024] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/02/2023] [Accepted: 03/30/2023] [Indexed: 05/26/2023]
Abstract
Whole-genome sequencing has become a preferred method for studying bacterial plasmids, as it is generally assumed to capture the entire genome. However, long-read genome assemblers have been shown to sometimes miss plasmid sequences - an issue that has been associated with plasmid size. The purpose of this study was to investigate the relationship between plasmid size and plasmid recovery by the long-read-only assemblers, Flye, Raven, Miniasm, and Canu. This was accomplished by determining the number of times each assembler successfully recovered 33 plasmids, ranging from 1919 to 194 062 bp in size and belonging to 14 bacterial isolates from six bacterial genera, using Oxford Nanopore long reads. These results were additionally compared to plasmid recovery rates by the short-read-first assembler, Unicycler, using both Oxford Nanopore long reads and Illumina short reads. Results from this study indicate that Canu, Flye, Miniasm, and Raven are prone to missing plasmid sequences, whereas Unicycler was successful at recovering 100 % of plasmid sequences. Excluding Canu, most plasmid loss by long-read-only assemblers was due to failure to recover plasmids smaller than 10 kb. As such, it is recommended that Unicycler be used to increase the likelihood of plasmid recovery during bacterial genome assembly.
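The size-dependent comparison described in this abstract can be illustrated with a small tally. This is a hedged sketch only: the `recovery_by_size` helper, the record layout, and the recovered/missed flags in the example are hypothetical toy values, not data or code from the study. Only the 10 kb threshold and the two extreme plasmid sizes come from the abstract.

```python
from typing import Dict, Iterable, Optional, Tuple


def recovery_by_size(
    records: Iterable[Tuple[int, bool]],
    threshold_bp: int = 10_000,
) -> Dict[str, Optional[float]]:
    """Fraction of plasmids recovered, split at a size threshold.

    Each record is (plasmid_size_bp, recovered_by_assembler).
    Returns None for a bin that contains no plasmids.
    """
    counts = {"small": [0, 0], "large": [0, 0]}  # [recovered, total]
    for size_bp, recovered in records:
        key = "small" if size_bp < threshold_bp else "large"
        counts[key][1] += 1
        if recovered:
            counts[key][0] += 1
    return {
        key: (hit / total if total else None)
        for key, (hit, total) in counts.items()
    }


if __name__ == "__main__":
    # Toy input: sizes 1919 and 194062 bp are the extremes quoted in the
    # abstract; the other sizes and all recovery flags are invented.
    toy = [(1919, False), (4500, True), (52_000, True), (194_062, True)]
    print(recovery_by_size(toy))
```

Running the same tally once per assembler would give the kind of per-tool, per-size-class recovery rates the abstract summarizes.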
Affiliation(s)
- Jared Johnson
- Michigan Department of Health and Human Services, Bureau of Laboratories, Lansing, MI, 48906, USA
- Marty Soehnlen
- Michigan Department of Health and Human Services, Bureau of Laboratories, Lansing, MI, 48906, USA
- Heather M. Blankenship
- Michigan Department of Health and Human Services, Bureau of Laboratories, Lansing, MI, 48906, USA
|
11
|
Foster-Nyarko E, Cottingham H, Wick RR, Judd LM, Lam MMC, Wyres KL, Stanton TD, Tsang KK, David S, Aanensen DM, Brisse S, Holt KE. Nanopore-only assemblies for genomic surveillance of the global priority drug-resistant pathogen, Klebsiella pneumoniae. Microb Genom 2023; 9:mgen000936. [PMID: 36752781 PMCID: PMC9997738 DOI: 10.1099/mgen.0.000936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
Abstract
Oxford Nanopore Technologies (ONT) sequencing has rich potential for genomic epidemiology and public health investigations of bacterial pathogens, particularly in low-resource settings and at the point of care, due to its portability and affordability. However, low base-call accuracy has limited the reliability of ONT data for critical tasks such as antimicrobial resistance (AMR) and virulence gene detection and typing, serotype prediction, and cluster identification. Thus, Illumina sequencing remains the standard for genomic surveillance despite higher capital and running costs. We tested the accuracy of ONT-only assemblies for common applied bacterial genomics tasks (genotyping and cluster detection, implemented via Kleborate, Kaptive and Pathogenwatch), using data from 54 unique Klebsiella pneumoniae isolates. ONT reads generated via MinION with R9.4.1 flowcells were basecalled using three alternative models [Fast, High-accuracy (HAC) and Super-accuracy (SUP), available within ONT's Guppy software], assembled with Flye and polished using Medaka. Accuracy of typing using ONT-only assemblies was compared with that of Illumina-only and hybrid ONT+Illumina assemblies, constructed from the same isolates as reference standards. The most resource-intensive ONT-assembly approach (SUP basecalling, with or without Medaka polishing) performed best, yielding reliable capsule (K) type calls for all strains (100 % exact or best matching locus), reliable multi-locus sequence type (MLST) assignment (98.3 % exact match or single-locus variants), and good detection of acquired AMR genes and mutations (88-100 % correct identification across the various drug classes). Distance-based trees generated from SUP+Medaka assemblies accurately reflected overall genetic relationships between isolates. The definition of outbreak clusters from ONT-only assemblies was problematic due to inflation of SNP counts by high base-call errors. 
However, ONT data could be reliably used to 'rule out' isolates of distinct lineages from suspected transmission clusters. HAC basecalling + Medaka polishing performed similarly to SUP basecalling without polishing. Therefore, we recommend investing compute resources into basecalling (SUP model), wherever compute resources and time allow, and note that polishing is also worthwhile for improved performance. Overall, our results show that MLST, K type and AMR determinants can be reliably identified with ONT-only R9.4.1 flowcell data. However, cluster detection remains challenging with this technology.
Affiliation(s)
- Ebenezer Foster-Nyarko
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, UK
- *Correspondence: Ebenezer Foster-Nyarko,
- Hugh Cottingham
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Ryan R. Wick
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Louise M. Judd
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Margaret M. C. Lam
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Kelly L. Wyres
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Thomas D. Stanton
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, UK
- Kara K. Tsang
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, UK
- Sophia David
- Centre for Genomic Pathogen Surveillance, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Oxford University, Oxford OX3 7LF, UK
- David M. Aanensen
- Centre for Genomic Pathogen Surveillance, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Oxford University, Oxford OX3 7LF, UK
- Sylvain Brisse
- Institut Pasteur, Université Paris Cité, Biodiversity and Epidemiology of Bacterial Pathogens, Paris, France
- Kathryn E. Holt
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, UK
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
|
12
|
Lang J. MAECI: A pipeline for generating consensus sequence with nanopore sequencing long-read assembly and error correction. PLoS One 2022; 17:e0267066. [PMID: 35594250 PMCID: PMC9122195 DOI: 10.1371/journal.pone.0267066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Accepted: 05/09/2022] [Indexed: 11/18/2022]
Abstract
Nanopore sequencing produces long reads and offers unique advantages over next-generation sequencing, especially for the assembly of draft bacterial genomes with improved completeness. However, assembly errors can occur due to data characteristics and assembly algorithms. To address these issues, we developed MAECI, a pipeline for generating consensus sequences from multiple assemblies of the same nanopore sequencing data and error correction. Systematic evaluation showed that MAECI is an efficient and effective pipeline to improve the accuracy and completeness of bacterial genome assemblies. The available codes and implementation are at https://github.com/langjidong/MAECI.
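The abstract does not say how the consensus is built from the multiple assemblies. As a generic illustration of the idea only, and not MAECI's actual procedure, the sketch below takes a column-wise majority vote over sequences that are assumed to be already aligned to equal length, with `-` as the gap character. The function name and the example sequences are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned: Sequence[str], gap: str = "-") -> str:
    """Column-wise majority vote over pre-aligned, equal-length sequences.

    Columns where the gap character wins the vote are dropped from the
    output. Ties are resolved in favour of the symbol seen first in the
    column, i.e. the earliest sequence in `aligned`.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Three toy "assemblies" of the same region: one carries a substitution
    # and one carries a single-base insertion relative to the others.
    assemblies = ["ACGT-A", "ACGTTA", "ACCT-A"]
    print(majority_consensus(assemblies))
```

A vote of this kind removes errors that appear in only a minority of the input assemblies, which is the intuition behind combining several assemblies of the same data. A real pipeline would also have to produce the alignment and handle structural differences between assemblies.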
Affiliation(s)
- Jidong Lang
- Department of Bioinformatics, Qitan Technology (Beijing) Co., Ltd, Beijing, China
|