1
Do V, Nguyen S, Le D, Nguyen T, Nguyen C, Ho T, Vo N, Nguyen T, Nguyen H, Cao M. Pasa: leveraging population pangenome graph to scaffold prokaryote genome assemblies. Nucleic Acids Res 2024; 52:e15. [PMID: 38084888 PMCID: PMC10853769 DOI: 10.1093/nar/gkad1170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/07/2023] [Accepted: 11/22/2023] [Indexed: 02/10/2024]
Abstract
Whole genome sequencing has increasingly become the essential method for studying the genetic mechanisms of antimicrobial resistance and for surveillance of drug-resistant bacterial pathogens. Most bacterial genomes sequenced to date have been sequenced with Illumina technology, owing to its high throughput, excellent sequence accuracy, and low cost. However, because the reads are short, these assemblies are fragmented into large numbers of contigs, which prevents recovery of the complete genome. We develop Pasa, a graph-based algorithm that uses both the pangenome graph and the assembly graph to improve scaffolding quality. By leveraging population information for the bacterial species, Pasa exploits the linkage information of the species' gene families to resolve the contig graph of the assembly. We show that our method outperforms the current state of the art in accuracy and, at the same time, is computationally efficient enough to be applied to large numbers of existing draft assemblies.
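The abstract describes Pasa only at a high level. To illustrate the general idea of using population-level gene-family linkage to decide which contig ends to join, here is a toy Python sketch. It is not Pasa's algorithm: the input format (a hypothetical `end_family` map from each contig end to the gene family annotated there, and a hypothetical `adjacency` table counting how many genomes in a population place two gene families next to each other), the `min_support` threshold, and the greedy join rule are all assumptions made for illustration.

```python
# Toy illustration only: NOT the Pasa algorithm.
# end_family: {(contig, side): gene_family} -- hypothetical annotation of contig ends
# adjacency:  {frozenset({fam1, fam2}): n_genomes} -- hypothetical population counts


def propose_joins(end_family, adjacency, min_support=2):
    """Greedily pair contig ends whose gene families are most often adjacent
    across the population. Each end is used at most once, and ends of the same
    contig are never paired. Returns a list of (end_a, end_b, support)."""
    candidates = []
    ends = sorted(end_family)
    for i, a in enumerate(ends):
        for b in ends[i + 1:]:
            if a[0] == b[0]:
                continue  # skip pairs of ends from the same contig
            key = frozenset((end_family[a], end_family[b]))
            support = adjacency.get(key, 0)
            if support >= min_support:
                candidates.append((support, a, b))
    # Strongest population support first; ties broken by end name for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used, joins = set(), []
    for support, a, b in candidates:
        if a in used or b in used:
            continue  # an end can take part in only one join
        used.update((a, b))
        joins.append((a, b, support))
    return joins


# Hypothetical example data: three contigs c1, c2, c3 with left (L) and right (R) ends.
end_family = {
    ("c1", "L"): "famE", ("c1", "R"): "famA",
    ("c2", "L"): "famB", ("c2", "R"): "famC",
    ("c3", "L"): "famD", ("c3", "R"): "famF",
}
adjacency = {
    frozenset(("famA", "famB")): 40,
    frozenset(("famC", "famD")): 35,
    frozenset(("famA", "famD")): 3,  # weak, conflicting link that loses to famA-famB
}
print(propose_joins(end_family, adjacency))
# -> [(('c1', 'R'), ('c2', 'L'), 40), (('c2', 'R'), ('c3', 'L'), 35)]
```

In this toy case the population evidence orders the contigs as c1, c2, c3; the weak famA-famD link is discarded because end c1-R is already used by a better-supported join.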
Affiliation(s)
- Van Hoan Do
- Center for Applied Mathematics and Informatics, Le Quy Don Technical University, Hanoi, Vietnam
- Duc Quang Le
- Faculty of IT, Hanoi University of Civil Engineering, Hanoi, Vietnam
- Tam Thi Nguyen
- Oxford University Clinical Research Unit, Hanoi, Vietnam
- Canh Hao Nguyen
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Tho Huu Ho
- Department of Medical Microbiology, The 103 Military Hospital, Vietnam Military Medical University, Hanoi, Vietnam
- Department of Genomics & Cytogenetics, Institute of Biomedicine & Pharmacy, Vietnam Military Medical University, Hanoi, Vietnam
- Nam S Vo
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
2
Arredondo-Alonso S, Pöntinen AK, Cléon F, Gladstone RA, Schürch AC, Johnsen PJ, Samuelsen Ø, Corander J. A high-throughput multiplexing and selection strategy to complete bacterial genomes. Gigascience 2021; 10:giab079. [PMID: 34891160 PMCID: PMC8673558 DOI: 10.1093/gigascience/giab079] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/14/2021] [Revised: 09/29/2021] [Accepted: 11/12/2021] [Indexed: 12/27/2022]
Abstract
BACKGROUND Bacterial whole-genome sequencing based on short-read technologies often results in a draft assembly composed of many contiguous sequences (contigs). Long-read sequencing technologies allow these contigs to be unambiguously bridged into complete genomes. However, the high cost of long-read sequencing frequently limits the number of bacterial isolates that can be sequenced this way. Here we evaluated the recently released 96-sample barcoding kit from Oxford Nanopore Technologies (ONT) for generating complete genomes on a high-throughput basis. In addition, we propose an isolate selection strategy that takes large-scale bacterial collections as input and optimizes a representative selection of isolates for long-read sequencing.
RESULTS Despite an uneven distribution of long reads per barcode, near-complete chromosomal sequences (assembly contiguity = 0.89) were generated for 96 Escherichia coli isolates with associated short-read sequencing data. The assembly contiguity of the plasmid replicons was even higher (0.98), indicating that the multiplexing strategy is suitable for studies focused on resolving plasmid sequences. We benchmarked hybrid and ONT-only assemblies and showed that combining ONT data with short-read data is still highly desirable (i) to perform an unbiased selection of isolates for long-read sequencing, (ii) to achieve optimal genome accuracy and completeness, and (iii) to include small plasmids underrepresented in the ONT library.
CONCLUSIONS The proposed long-read isolate selection ensures the completion of bacterial genomes that span the genome diversity inherent in large collections of bacterial isolates. We show the potential of this multiplexing approach to close bacterial genomes on a high-throughput basis.
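The abstract does not spell out how the representative isolates are chosen. One generic way to pick a small set of isolates that spans a collection's diversity is greedy maximum coverage over per-isolate features (for example gene clusters or lineage labels). The Python sketch below shows that technique under that assumption; it is not necessarily the selection strategy proposed in the paper, and the function name, isolate names, and feature sets are hypothetical.

```python
# Generic greedy maximum-coverage selection; an illustrative stand-in,
# not necessarily the selection strategy used by Arredondo-Alonso et al.


def greedy_representatives(features, k):
    """Pick up to k isolates, each time choosing the isolate that adds the most
    not-yet-covered features. features: {isolate: set_of_features}.
    Returns (chosen isolates in selection order, set of covered features)."""
    covered, chosen = set(), []
    remaining = dict(features)
    for _ in range(k):
        if not remaining:
            break
        # Largest marginal gain; ties broken alphabetically for reproducibility.
        best = min(remaining, key=lambda iso: (-len(remaining[iso] - covered), iso))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining isolate adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Hypothetical isolates described by hypothetical gene-cluster sets.
features = {
    "iso1": {"g1", "g2", "g3"},
    "iso2": {"g3", "g4"},
    "iso3": {"g4", "g5", "g6", "g7"},
    "iso4": {"g1", "g2"},
}
chosen, covered = greedy_representatives(features, 2)
print(chosen)        # -> ['iso3', 'iso1']
print(len(covered))  # -> 7
```

Greedy selection is a common choice for this kind of coverage problem because it is simple and, for maximum coverage, has a well-known (1 - 1/e) approximation guarantee; a real study might instead select on phylogenetic clusters or sequence types.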
Affiliation(s)
- Sergio Arredondo-Alonso
- Department of Biostatistics, University of Oslo, 0317, Oslo, Norway
- Parasites and Microbes, Wellcome Sanger Institute, Cambridgeshire CB10 1RQ, UK
- Anna K Pöntinen
- Department of Biostatistics, University of Oslo, 0317, Oslo, Norway
- François Cléon
- Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, 9037, Tromsø, Norway
- Anita C Schürch
- Department of Medical Microbiology, UMC Utrecht, 3584 CX, Utrecht, the Netherlands
- Pål J Johnsen
- Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, 9037, Tromsø, Norway
- Ørjan Samuelsen
- Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, 9037, Tromsø, Norway
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, 9038, Tromsø, Norway
- Jukka Corander
- Department of Biostatistics, University of Oslo, 0317, Oslo, Norway
- Parasites and Microbes, Wellcome Sanger Institute, Cambridgeshire CB10 1RQ, UK
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, 02130, Espoo, Helsinki, Finland
3
Juma M, Sankaradoss A, Ndombi R, Mwaura P, Damodar T, Nazir J, Pandit A, Khurana R, Masika M, Chirchir R, Gachie J, Krishna S, Sowdhamini R, Anzala O, Meenakshi IS. Antimicrobial Resistance Profiling and Phylogenetic Analysis of Neisseria gonorrhoeae Clinical Isolates From Kenya in a Resource-Limited Setting. Front Microbiol 2021; 12:647565. [PMID: 34385981 PMCID: PMC8353456 DOI: 10.3389/fmicb.2021.647565] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/12/2021] [Accepted: 05/31/2021] [Indexed: 11/13/2022]
Abstract
Background Africa has one of the highest incidences of gonorrhea. Neisseria gonorrhoeae is gaining resistance to most of the available antibiotics, compromising treatment across the world. Whole-genome sequencing (WGS) is an efficient way of predicting AMR determinants and their spread in the population. Recent advances in next-generation sequencing technologies such as Oxford Nanopore Technology (ONT) have made it possible to generate longer DNA reads in less time and at lower cost. Increasingly accurate base-calling algorithms, high throughput, error-correction strategies, and the ease of using the mobile MinION sequencer in remote areas have led to its adoption for routine microbial genome sequencing. To investigate whether MinION-only sequencing is sufficient for WGS and downstream analysis in resource-limited settings, we sequenced the genomes of 14 suspected N. gonorrhoeae isolates from Nairobi, Kenya.
Methods Using WGS, the isolates were confirmed to be cases of N. gonorrhoeae (n = 9), and there were three co-occurrences of N. gonorrhoeae with Moraxella osloensis and N. meningitidis (n = 2). N. meningitidis has been implicated in sexually transmitted infections in recent years. The near-complete N. gonorrhoeae genomes (n = 10) were further analyzed for mutations and other factors causing AMR using an in-house database of mutations curated from the literature.
Results We observe that ciprofloxacin resistance is associated with multiple mutations in both gyrA and parC. Mutations conferring tetracycline (rpsJ) and sulfonamide (folP) resistance, as well as plasmids encoding beta-lactamase, were seen in all strains, and tet(M)-containing plasmids were identified in nine strains. Phylogenetic analysis clustered the 10 isolates into clades containing previously sequenced genomes from Kenya and countries across the world. Based on homology modeling of AMR targets, we see that the mutations in GyrA and ParC disrupt hydrogen bonding with quinolone drugs, and that mutations in FolP may affect interaction with the antibiotic.
Conclusion Here, we demonstrate the utility of mobile DNA sequencing technology in producing a consensus genome for sequence typing and detection of genetic determinants of AMR. The workflow followed in this study, including AMR mutation dataset creation and genome identification, assembly, and analysis, can be used for any clinical isolate. Further studies are required to determine the utility of real-time sequencing in outbreak investigations, diagnosis, and management of infections, especially in resource-limited settings.
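The abstract mentions screening genomes against an in-house database of AMR mutations curated from the literature. A minimal Python sketch of that kind of lookup follows, assuming the curated table is keyed by gene, 1-based protein position, and reference/resistance amino acid, and that each query protein has already been translated and aligned gap-free to its reference. The `Mutation` record, the function name, and all gene names, sequences, and table entries are hypothetical toy values, not the authors' database or real N. gonorrhoeae data.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    """One entry of a hypothetical curated AMR mutation table."""
    gene: str
    position: int  # 1-based residue position in the reference protein
    ref: str       # reference amino acid
    alt: str       # amino acid associated with resistance
    drug: str


def find_amr_mutations(proteins, references, catalogue):
    """Return the catalogue entries whose resistance residue is present.
    proteins / references: {gene: amino-acid string}, assumed gap-free and
    aligned position-for-position to each other."""
    hits = []
    for m in catalogue:
        query, ref = proteins.get(m.gene), references.get(m.gene)
        if query is None or ref is None:
            continue  # gene not found in this genome or missing a reference
        if m.position < 1 or m.position > min(len(query), len(ref)):
            continue  # position outside the sequence
        i = m.position - 1
        if ref[i] == m.ref and query[i] == m.alt:
            hits.append(m)
    return hits


# Toy data for illustration only.
references = {"geneX": "MSDFA", "geneY": "MKTLV"}
proteins = {"geneX": "MSFFA", "geneY": "MKTLV"}
catalogue = [
    Mutation("geneX", 3, "D", "F", "drugA"),
    Mutation("geneY", 4, "L", "P", "drugB"),
    Mutation("geneZ", 1, "M", "V", "drugC"),
]
hits = find_amr_mutations(proteins, references, catalogue)
print([(m.gene, m.position, m.drug) for m in hits])  # -> [('geneX', 3, 'drugA')]
```

Checking the reference residue as well as the query residue guards against a table entry that was curated against a different reference numbering.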
Affiliation(s)
- Meshack Juma
- KAVI Institute of Clinical Research, University of Nairobi, Nairobi, Kenya
- Arun Sankaradoss
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India
- Redcliff Ndombi
- KAVI Institute of Clinical Research, University of Nairobi, Nairobi, Kenya
- Patrick Mwaura
- KAVI Institute of Clinical Research, University of Nairobi, Nairobi, Kenya
- Tina Damodar
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India
- Junaid Nazir
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India
- Awadhesh Pandit
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India
- Rupsy Khurana
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India
- Moses Masika
- KAVI Institute of Clinical Research, University of Nairobi, Nairobi, Kenya
- Ruth Chirchir
- KAVI Institute of Clinical Research, University of Nairobi, Nairobi, Kenya
- John Gachie
- KAVI Institute of Clinical Research, University of Nairobi, Nairobi, Kenya
- Sudhir Krishna
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India
- School of Interdisciplinary Life Sciences, Indian Institute of Technology Goa, Ponda, India
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India
- Omu Anzala
- KAVI Institute of Clinical Research, University of Nairobi, Nairobi, Kenya
- Iyer S Meenakshi
- National Centre for Biological Sciences, Tata Institute of Fundamental Research (TIFR), Bengaluru, India