1
Jugas R, Vitkova H. ProcaryaSV: structural variation detection pipeline for bacterial genomes using short-read sequencing. BMC Bioinformatics 2024; 25:233. [PMID: 38982375 PMCID: PMC11234778 DOI: 10.1186/s12859-024-05843-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 06/13/2024] [Indexed: 07/11/2024] Open
Abstract
BACKGROUND: Structural variations play an important role in bacterial genomes. They can mediate genome adaptation quickly in response to the external environment and thus can also play a role in antibiotic resistance. The detection of structural variations in bacteria is challenging, and the recognition of even small rearrangements can be important. Even though most detection tools are aimed at and benchmarked on eukaryotic genomes, they can also be used on prokaryotic genomes. The key features of detection are the ability to detect small rearrangements and support haploid genomes. Because of the limited performance of a single detection tool, combining the detection abilities of multiple tools can lead to more robust results. There are already available workflows for structural variation detection for long-read technologies and for the detection of single-nucleotide variation and indels, both aimed at bacteria. Yet we are unaware of structural variation detection workflows for short-read sequencing platforms. Motivated by this gap, we created our workflow. Further, we were interested in increasing the detection performance and providing more robust results.
RESULTS: We developed an open-source bioinformatics pipeline, ProcaryaSV, for the detection of structural variations in bacterial isolates from paired-end short sequencing reads. Multiple tools, starting with quality control and trimming of sequencing data, alignment to the reference genome, and multiple structural variation detection tools, are integrated. All the partial results are then processed and merged with an in-house merging algorithm. Compared with a single detection approach, ProcaryaSV has improved detection performance and is a reproducible, easy-to-use tool.
CONCLUSIONS: The ProcaryaSV pipeline provides an integrative approach to structural variation detection from paired-end next-generation sequencing of bacterial samples. It can be easily installed and used on Linux machines.
It is publicly available on GitHub at https://github.com/robinjugas/ProcaryaSV .
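The abstract above does not describe the in-house merging algorithm, so the following is only a minimal sketch of one common way to combine calls from several detection tools: cluster calls of the same type by reciprocal overlap and keep clusters supported by at least two callers. The `Call` record, the function names, and the `min_overlap` and `min_callers` thresholds are all hypothetical and are not taken from ProcaryaSV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One structural variation call (hypothetical record layout)."""
    caller: str   # name of the detection tool that produced the call
    sv_type: str  # e.g. "DEL", "INV", "DUP"
    start: int    # 0-based start on the reference
    end: int      # exclusive end on the reference


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Fraction of overlap relative to the longer-overlap-poor call (0.0 if disjoint)."""
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def merge_calls(calls, min_overlap=0.5, min_callers=2):
    """Greedy clustering: a call joins the first cluster whose first member
    has the same type and sufficient reciprocal overlap. Assumed scheme only."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.sv_type, c.start, c.end)):
        for cluster in clusters:
            rep = cluster[0]
            if rep.sv_type == call.sv_type and reciprocal_overlap(rep, call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = {c.caller for c in cluster}
        if len(callers) >= min_callers:
            merged.append((
                cluster[0].sv_type,
                min(c.start for c in cluster),
                max(c.end for c in cluster),
                sorted(callers),
            ))
    return merged
```

Requiring support from more than one caller trades sensitivity for precision; lowering `min_callers` to 1 would turn the sketch into a plain union of all calls.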
Affiliation(s)
- Robin Jugas
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Helena Vitkova
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
2
Tam YL, Cameron S, Preston A, Cowley L. GWarrange: a pre- and post- genome-wide association studies pipeline for detecting phenotype-associated genome rearrangement events. Microb Genom 2024; 10. [PMID: 38980151 DOI: 10.1099/mgen.0.001268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024] Open
Abstract
The use of k-mers to capture genetic variation in bacterial genome-wide association studies (bGWAS) has demonstrated its effectiveness in overcoming the plasticity of bacterial genomes by providing a comprehensive array of genetic variants in a genome set that is not confined to a single reference genome. However, little attempt has been made to interpret k-mers in the context of genome rearrangements, partly due to challenges in the exhaustive and high-throughput identification of genome structure and individual rearrangement events. Here, we present GWarrange, a pre- and post-bGWAS processing methodology that leverages the unique properties of k-mers to facilitate bGWAS for genome rearrangements. Repeat sequences are common instigators of genome rearrangements through intragenomic homologous recombination, and they are commonly found at rearrangement boundaries. Using whole-genome sequences, repeat sequences are replaced by short placeholder sequences, allowing the regions flanking repeats to be incorporated into relatively short k-mers. Then, locations of flanking regions in significant k-mers are mapped back to complete genome sequences to visualise genome rearrangements. Four case studies based on two bacterial species (Bordetella pertussis and Enterococcus faecium) and a simulated genome set are presented to demonstrate the ability to identify phenotype-associated rearrangements. GWarrange is available at https://github.com/DorothyTamYiLing/GWarrange.
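The repeat-replacement idea in this abstract can be illustrated with a small sketch: once a long repeat is swapped for a short placeholder, a k-mer of modest length can contain sequence from both flanks. The function names, the `"NN"` placeholder, and the interval format are assumptions made for illustration, not GWarrange's actual implementation.

```python
def mask_repeats(seq, repeats, placeholder="NN"):
    """Replace each repeat interval [start, end) with a short placeholder.

    Assumes the intervals are non-overlapping; the placeholder string is
    hypothetical and must not occur naturally in the sequence.
    """
    pieces, prev = [], 0
    for start, end in sorted(repeats):
        pieces.append(seq[prev:start])
        pieces.append(placeholder)
        prev = end
    pieces.append(seq[prev:])
    return "".join(pieces)


def flank_spanning_kmers(masked, k, placeholder="NN"):
    """Return k-mers that contain the placeholder with sequence on both sides."""
    kmers = []
    for i in range(len(masked) - k + 1):
        kmer = masked[i:i + k]
        if (placeholder in kmer
                and not kmer.startswith(placeholder)
                and not kmer.endswith(placeholder)):
            kmers.append(kmer)
    return kmers
```

In this toy setup, a 10-base repeat between two 4-base flanks cannot be spanned by a 6-mer, but after masking the same 6-mer length captures both flanks at once.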
Affiliation(s)
- Yi Ling Tam
- The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Sarah Cameron
- The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Andrew Preston
- The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Lauren Cowley
- The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
3
Singh V, West G, Fiocchi C, Good CE, Katz J, Jacobs MR, Dichosa AEK, Flask C, Wesolowski M, McColl C, Grubb B, Ahmed S, Bank NC, Thamma K, Bederman I, Erokwu B, Yang X, Sundrud MS, Menghini P, Basson AR, Ezeji J, Viswanath SE, Veloo A, Sykes DB, Cominelli F, Rodriguez-Palacios A. Clonal Parabacteroides from Gut Microfistulous Tracts as Transmissible Cytotoxic Succinate-Commensal Model of Crohn's Disease Complications. bioRxiv: the preprint server for biology 2024:2024.01.09.574896. [PMID: 38260564 PMCID: PMC10802508 DOI: 10.1101/2024.01.09.574896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Crohn's disease (CD) has been traditionally viewed as a chronic inflammatory disease that causes gut wall thickening and complications, including fistulas, by mechanisms not understood. By focusing on Parabacteroides distasonis (presumed modern succinate-producing commensal probiotic), recovered from intestinal microfistulous tracts (cavernous fistulous micropathologies, CavFT, proposed as intermediate between 'mucosal fissures' and 'fistulas') in two patients who required surgery to remove CD-damaged ilea, we demonstrate that such isolates exert pathogenic/pathobiont roles in mouse models of CD. Our isolates are clonally related; potentially emerging as transmissible in the community and mice; proinflammatory and adapted to the ileum of germ-free mice prone to CD-like ileitis (SAMP1/YitFc) but not healthy mice (C57BL/6J); and cytotoxic/ATP-depleting to HoxB8-immortalized bone marrow derived myeloid cells from SAMP1/YitFc mice when concurrently exposed to succinate and extracts from CavFT-derived E. coli, but not to cells from healthy mice. With unique genomic features supporting recent genetic exchange with Bacteroides fragilis-BGF539 and evidence of international presence in primarily human metagenome databases, these CavFT Pdis isolates could represent a new opportunistic Parabacteroides species, or subspecies ('cavitamuralis'), adapted to microfistulous niches in CD.
4
Pereira Zanetti JP, Peres Oliveira L, Chindelevitch L, Meidanis J. Generalizations of the genomic rank distance to indels. Bioinformatics 2023; 39:7039678. [PMID: 36790056 PMCID: PMC9985151 DOI: 10.1093/bioinformatics/btad087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 12/25/2022] [Accepted: 02/13/2023] [Indexed: 02/16/2023] Open
Abstract
MOTIVATION: The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications.
RESULTS: We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree.
AVAILABILITY AND IMPLEMENTATION: Code and instructions are available at https://github.com/meidanis-lab/rank-indel.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
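For orientation, the basic rank distance that this work generalizes (genomes with equal gene content) can be sketched as follows: each genome is encoded as a symmetric 0/1 matrix over gene extremities, with an off-diagonal 1 for each adjacency and a diagonal 1 for each telomere, and the distance is the rank of the difference of the two matrices. The helper names and the integer indexing of extremities are assumptions for illustration; the indel generalizations described in the abstract are not implemented here.

```python
from fractions import Fraction


def genome_matrix(n, adjacencies):
    """Build the n x n genome matrix over extremities 0..n-1.

    adjacencies: pairs (i, j) of adjacent extremities. Any extremity that
    appears in no adjacency is treated as a telomere (diagonal entry 1).
    """
    m = [[0] * n for _ in range(n)]
    paired = set()
    for i, j in adjacencies:
        m[i][j] = m[j][i] = 1
        paired.update((i, j))
    for t in range(n):
        if t not in paired:
            m[t][t] = 1
    return m


def matrix_rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def rank_distance(a, b):
    """Rank distance between two genome matrices of the same size."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return matrix_rank(diff)
```

With a single gene (extremities 0 and 1), a linear chromosome is the identity matrix and a circular one is the swap matrix, so circularizing the chromosome has rank distance 1.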
Affiliation(s)
- Leonid Chindelevitch
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College, London, UK
- João Meidanis
- Institute of Computing, University of Campinas, Campinas, Brazil
5
D’Iorio M, Dewar K. Replication-associated inversions are the dominant form of bacterial chromosome structural variation. Life Sci Alliance 2022; 6:6/1/e202201434. [PMID: 36261227 PMCID: PMC9584773 DOI: 10.26508/lsa.202201434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 09/29/2022] [Accepted: 09/30/2022] [Indexed: 11/24/2022] Open
Abstract
The structural arrangements of bacterial chromosomes vary widely between closely related species and can result in significant phenotypic outcomes. The appearance of large-scale chromosomal inversions that are symmetric relative to markers for the origin of replication (OriC) has been previously observed; however, the overall prevalence of replication-associated structural rearrangements (RASRs) in bacteria and their causal mechanisms are currently unknown. Here, we systematically identify the locations of RASRs in species with multiple complete-sequenced genomes and investigate potential mediating biological mechanisms. We found that 247 of 313 species contained sequences with at least one large (>50 Kb) inversion in their sequence comparisons, and the aggregated inversion distances away from symmetry were normally distributed with a mean of zero. Many inversions that were offset from dnaA were found to be centered on a different marker for the OriC. Instances of flanking repeats provide evidence that breaks formed during the replication process could be repaired to opposing positions. We also found a strong relationship between the later stages of replication and the range in distance variation from symmetry.
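The notion of an inversion's distance away from symmetry can be made concrete with a small sketch. The abstract does not give the exact definition used, so the formula below is an assumption: for an inversion on a circular chromosome that spans the origin marker, compare the lengths of the two arms on either side of the origin. The function name and parameter names are hypothetical.

```python
def symmetry_offset(start, end, origin, genome_length):
    """Signed offset from symmetry for an inversion spanning the origin.

    Assumes a circular chromosome and an inversion that runs clockwise from
    `start`, through `origin`, to `end`. Returns 0 when both arms are equal,
    a positive value when the arm after the origin is longer.
    """
    left_arm = (origin - start) % genome_length   # start -> origin
    right_arm = (end - origin) % genome_length    # origin -> end
    return right_arm - left_arm
```

Under this assumed definition, an inversion from position 900 to 100 around an origin at 0 on a 1000-base circle is perfectly symmetric, while one ending at 150 is offset by 50 bases.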
Affiliation(s)
- Matthew D’Iorio
- Quantitative Life Sciences, McGill University, Montreal, Canada
- Ken Dewar
- Department of Human Genetics, McGill University, Montreal, Canada; Centre for Microbiome Research, McGill University, Montreal, Canada
6
Cao S, Brandis G, Huseby DL, Hughes D. Positive selection during niche adaptation results in large-scale and irreversible rearrangement of chromosomal gene order in bacteria. Mol Biol Evol 2022; 39:6554941. [PMID: 35348727 PMCID: PMC9016547 DOI: 10.1093/molbev/msac069] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022] Open
Abstract
Analysis of bacterial genomes shows that, whereas diverse species share many genes in common, their linear order on the chromosome is often not conserved. Whereas rearrangements in gene order could occur by genetic drift, an alternative hypothesis is rearrangement driven by positive selection during niche adaptation (SNAP). Here, we provide the first experimental support for the SNAP hypothesis. We evolved Salmonella to adapt to growth on malate as the sole carbon source and followed the evolutionary trajectories. The initial adaptation to growth in the new environment involved the duplication of 1.66 Mb, corresponding to one-third of the Salmonella chromosome. This duplication is selected to increase the copy number of a single gene, dctA, involved in the uptake of malate. Continuing selection led to the rapid loss or mutation of duplicate genes from either copy of the duplicated region. After 2000 generations, only 31% of the originally duplicated genes remained intact and the gene order within the Salmonella chromosome has been significantly and irreversibly altered. These results experimentally validate predictions made by the SNAP hypothesis and show that SNAP can be a strong driving force for rearrangements in chromosomal gene order.
Affiliation(s)
- Sha Cao
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. These authors contributed equally: Sha Cao, Gerrit Brandis
- Gerrit Brandis
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden. These authors contributed equally: Sha Cao, Gerrit Brandis
- Douglas L Huseby
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Diarmaid Hughes
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
7
Noureen M, Kawashima T, Arita M. Genetic Markers of Genome Rearrangements in Helicobacter pylori. Microorganisms 2021; 9:621. [PMID: 33802974 PMCID: PMC8002640 DOI: 10.3390/microorganisms9030621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Revised: 03/11/2021] [Accepted: 03/12/2021] [Indexed: 11/16/2022] Open
Abstract
Helicobacter pylori exhibits a diverse genomic structure with high mutation and recombination rates. Various genetic elements function as drivers of this genomic diversity, including genome rearrangements. Identifying the association of these elements with rearrangements can pave the way to understanding its genome evolution. We analyzed the order of orthologous genes among 72 publicly available complete genomes to identify large genome rearrangements, and rearrangement breakpoints were compared with the positions of insertion sequences, genomic islands, and restriction modification genes. Comparison of the shared inversions revealed the conserved genomic elements across strains from different geographical locations. Some were region-specific and others were global, indicating that highly shared rearrangements and their markers were more ancestral than strain- or region-specific ones. The locations of genomic islands were an important factor for the occurrence of the rearrangements. Comparative genomics helps to evaluate the conservation of various elements contributing to the diversity across genomes.
Affiliation(s)
- Mehwish Noureen
- Department of Genetics, SOKENDAI University, Yata 1111, Mishima 411-8540, Shizuoka, Japan
- Takeshi Kawashima
- Bioinformation and DDBJ Center, National Institute of Genetics, Yata 1111, Mishima 411-8540, Shizuoka, Japan
- Masanori Arita
- Bioinformation and DDBJ Center, National Institute of Genetics, Yata 1111, Mishima 411-8540, Shizuoka, Japan
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama 230-0045, Kanagawa, Japan