1
Ortigas-Vasquez A, Szpara M. Embracing Complexity: What Novel Sequencing Methods Are Teaching Us About Herpesvirus Genomic Diversity. Annu Rev Virol 2024; 11:67-87. [PMID: 38848592 DOI: 10.1146/annurev-virology-100422-010336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024]
Abstract
The arrival of novel sequencing technologies throughout the past two decades has led to a paradigm shift in our understanding of herpesvirus genomic diversity. Previously, herpesviruses were seen as a family of DNA viruses with low genomic diversity. However, a growing body of evidence now suggests that herpesviruses exist as dynamic populations that possess standing variation and evolve at much faster rates than previously assumed. In this review, we explore how strategies such as deep sequencing, long-read sequencing, and haplotype reconstruction are allowing scientists to dissect the genomic composition of herpesvirus populations. We also discuss the challenges that need to be addressed before a detailed picture of herpesvirus diversity can emerge.
Affiliation(s)
- Alejandro Ortigas-Vasquez
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Moriah Szpara
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
2
Shaw J, Gounot JS, Chen H, Nagarajan N, Yu YW. Floria: fast and accurate strain haplotyping in metagenomes. Bioinformatics 2024; 40:i30-i38. [PMID: 38940183 PMCID: PMC11211831 DOI: 10.1093/bioinformatics/btae252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
SUMMARY Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for the recovery of bacterial strain genomes from microbiomes remain a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes show that Floria is >3× faster and recovers 21% more strain content than base-level assembly methods (Strainberry) while being over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took <20 min on average per sample and identified several species that have consistent strain heterogeneity. Applying Floria's short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses. AVAILABILITY AND IMPLEMENTATION Floria is available at https://github.com/bluenote-1577/floria, and the Floria-PL pipeline is available at https://github.com/jsgounot/Floria_analysis_workflow along with code for reproducing the benchmarks.
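The minimum error correction (MEC) criterion named in the Floria abstract can be illustrated with a toy example. The sketch below is not Floria's implementation: it assumes biallelic sites coded 0/1 (with `None` for uncovered sites), a majority-vote consensus per cluster, and hypothetical names (`consensus`, `mec_score`, `toy_reads`) chosen only for illustration. It scores a given read-to-cluster assignment rather than searching for the best one.

```python
from typing import Dict, List, Optional, Sequence

# One read = allele observed at each SNP site: 0, 1, or None if the site is uncovered.
Read = Sequence[Optional[int]]


def consensus(reads: Sequence[Read]) -> List[Optional[int]]:
    """Majority-vote allele per site for one cluster (ties resolve to 0)."""
    hap: List[Optional[int]] = []
    for j in range(len(reads[0])):
        calls = [r[j] for r in reads if r[j] is not None]
        if not calls:
            hap.append(None)  # no read in this cluster covers the site
        else:
            hap.append(1 if 2 * sum(calls) > len(calls) else 0)
    return hap


def mec_score(reads: Sequence[Read], assignment: Sequence[int]) -> int:
    """Count allele calls that disagree with the consensus of their cluster."""
    clusters: Dict[int, List[Read]] = {}
    for read, label in zip(reads, assignment):
        clusters.setdefault(label, []).append(read)
    total = 0
    for members in clusters.values():
        hap = consensus(members)
        for read in members:
            total += sum(
                1 for allele, h in zip(read, hap)
                if allele is not None and allele != h
            )
    return total


# Toy data (made up for illustration): six reads over three SNP sites.
toy_reads = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, None, 0],
]
```

A lower score means the clustering explains the reads with fewer assumed sequencing errors, so splitting `toy_reads` into two groups of three scores better than forcing all six reads into one cluster.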
Affiliation(s)
- Jim Shaw
  - Department of Mathematics, University of Toronto, Toronto, Ontario, M5S 2E4, Canada
- Jean-Sebastien Gounot
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Hanrong Chen
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Niranjan Nagarajan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117597, Republic of Singapore
- Yun William Yu
  - Department of Mathematics, University of Toronto, Toronto, Ontario, M5S 2E4, Canada
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, United States
3
Williamson CHD, Vazquez AJ, Nunnally AE, Kyger K, Fofanov VY, Furstenau TN, Hornstra HM, Terriquez J, Keim P, Sahl JW. ColiSeq: a multiplex amplicon assay that provides strain level resolution of Escherichia coli directly from clinical specimens. Microbiol Spectr 2024; 12:e0413923. [PMID: 38651881 PMCID: PMC11237721 DOI: 10.1128/spectrum.04139-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 04/01/2024] [Indexed: 04/25/2024] Open
Abstract
Escherichia coli is a diverse pathogen, causing a range of disease in humans, from self-limiting diarrhea to urinary tract infections (UTIs). Uropathogenic E. coli (UPEC) is the most frequently observed uropathogen in UTIs, a common disease in high-income countries, incurring billions of dollars yearly in treatment costs. Although E. coli is easily grown and identified in the clinical laboratory, genotyping the pathogen is more complicated, yet critical for reducing the incidence of disease. These goals can be achieved through whole-genome sequencing of E. coli isolates, but this approach is relatively slow and typically requires culturing the pathogen in the laboratory. To genotype E. coli rapidly and inexpensively directly from clinical samples, including but not limited to urine, we developed and validated a multiplex amplicon sequencing assay, called ColiSeq. The assay consists of targets designed for E. coli species confirmation, high resolution genotyping, and mixture deconvolution. To demonstrate its utility, we screened the ColiSeq assay against 230 clinical urine samples collected from a hospital system in Flagstaff, Arizona, USA. A limit of detection analysis demonstrated the ability of ColiSeq to identify E. coli at a concentration of ~2 genomic equivalents (GEs)/mL and to generate high-resolution genotyping at a concentration of 1 × 10^5 GEs/mL. The results of this study suggest that ColiSeq could be a valuable method to understand the source of UPEC strains and guide infection mitigation efforts. As sequence-based diagnostics become accepted in the clinical laboratory, workflows such as ColiSeq will provide actionable information to improve patient outcomes. IMPORTANCE Urinary tract infections (UTIs), caused primarily by Escherichia coli, create an enormous health care burden in the United States and other high-income countries. The early detection of E. coli from clinical samples, including urine, is important to target therapy and prevent further patient complications. Additionally, understanding the source of E. coli exposure will help with future mitigation efforts. In this study, we developed, tested, and validated an amplicon sequencing assay focused on direct detection of E. coli from urine. The resulting sequence data were demonstrated to provide strain level resolution of the pathogen, not only confirming the presence of E. coli, which can focus treatment efforts, but also providing data needed for source attribution and contact tracing. This assay will generate inexpensive, rapid, and reproducible data that can be deployed by public health agencies to track, diagnose, and potentially mitigate future UTIs caused by E. coli.
Affiliation(s)
- Adam J. Vazquez
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
- Amalee E. Nunnally
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
- Kristen Kyger
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
- Viacheslav Y. Fofanov
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, USA
- Tara N. Furstenau
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, USA
- Heidie M. Hornstra
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
- Paul Keim
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
- Jason W. Sahl
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
4
Ju N, Liu J, He Q. SNP-slice resolves mixed infections: simultaneously unveiling strain haplotypes and linking them to hosts. Bioinformatics 2024; 40:btae344. [PMID: 38885409 PMCID: PMC11187496 DOI: 10.1093/bioinformatics/btae344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 05/09/2024] [Accepted: 06/14/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Multi-strain infection is a common yet under-investigated phenomenon of many pathogens. Currently, biologists analyzing SNP information sometimes have to discard mixed infection samples as many downstream analyses require monogenomic inputs. Such a protocol impedes our understanding of the underlying genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A scalable tool to learn and resolve the SNP-haplotypes from polygenomic data is an urgent need in molecular epidemiology. RESULTS We develop a slice sampling Markov Chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP-haplotypes of all strains in the populations but also which strains infect which hosts. Our method reconstructs SNP-haplotypes and individual heterozygosities accurately without reference panels and outperforms the state-of-the-art methods at estimating the multiplicity of infections and allele frequencies. Thus, SNP-Slice introduces a novel approach to address polygenomic data and opens a new avenue for resolving complex infection patterns in molecular surveillance. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets and provide recommendations for using our method on empirical datasets. AVAILABILITY AND IMPLEMENTATION The implementation of the SNP-Slice algorithm, as well as scripts to analyze SNP-Slice outputs, are available at https://github.com/nianqiaoju/snp-slice.
Affiliation(s)
- Nianqiao Ju
  - Department of Statistics, Purdue University, West Lafayette, IN 47907, United States
- Jiawei Liu
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, United States
- Qixin He
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, United States
5
Ventolero M, Wang S, Hu H, Li X. Are the predicted known bacterial strains in a sample really present? A case study. PLoS One 2023; 18:e0291964. [PMID: 37831725 PMCID: PMC10575510 DOI: 10.1371/journal.pone.0291964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 09/10/2023] [Indexed: 10/15/2023] Open
Abstract
With mutations constantly accumulating in bacterial genomes, it is unclear whether the previously identified bacterial strains are really present in an extant sample. To address this question, we did a case study on the known strains of the bacterial species S. aureus and S. epidermidis in 68 atopic dermatitis shotgun metagenomic samples. We evaluated the likelihood of the presence of all sixteen known strains predicted in the original study and by two popular tools in this study. We found that even with the same tool, only two known strains were predicted by both the original study and this study. Moreover, none of the sixteen known strains was likely present in these 68 samples. Our study thus indicates the limitation of known-strain-based studies, especially those on rapidly evolving bacterial species. It implies the unlikely presence of the previously identified known strains in a current environmental sample. It also calls for de novo bacterial strain identification directly from shotgun metagenomic reads.
Affiliation(s)
- Minerva Ventolero
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America
- Saidi Wang
  - Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
- Haiyan Hu
  - Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, Florida, United States of America
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America
6
Liao H, Ji Y, Sun Y. High-resolution strain-level microbiome composition analysis from short reads. Microbiome 2023; 11:183. [PMID: 37587527 PMCID: PMC10433603 DOI: 10.1186/s40168-023-01615-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/24/2022] [Accepted: 07/07/2023] [Indexed: 08/18/2023]
Abstract
BACKGROUND Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition in host-associated or environmental samples. Although there are a plethora of composition analysis tools, they are not optimized to address the challenges in strain-level analysis: highly similar strain genomes and the presence of multiple strains under one species in a sample. Thus, this work aims to provide a high-resolution and more accurate strain-level analysis tool for short reads. RESULTS In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mer indexing structure to strike a balance between the strain identification accuracy and the computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing datasets and benchmarked StrainScan against popular strain-level analysis tools including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1 score by 20% in identifying multiple strains at the strain level. CONCLUSIONS By using a novel k-mer indexing structure, StrainScan is able to provide strain-level analysis with higher resolution than existing tools, enabling it to return more informative strain composition analysis in one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input, and its source code is freely available at https://github.com/liaoherui/StrainScan.
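The F1 score quoted in the StrainScan abstract combines the precision and recall of the set of strains a tool reports as present. A minimal sketch follows, assuming strain calls are compared as plain sets of labels; the function name and the strain labels are hypothetical and are not taken from StrainScan or its benchmarks.

```python
from typing import Set


def strain_f1(predicted: Set[str], truth: Set[str]) -> float:
    """Harmonic mean of precision and recall for a set of strain calls."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or an empty truth set
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three strains called, four truly present, two shared.
example_f1 = strain_f1({"A", "B", "C"}, {"A", "B", "D", "E"})
```

In this example precision is 2/3 and recall is 1/2, so the score is 4/7; a tool that reports many extra strains is penalized through precision, and one that misses co-occurring strains is penalized through recall.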
Affiliation(s)
- Herui Liao
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yongxin Ji
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
7
Zhao C, Shi ZJ, Pollard KS. Pitfalls of genotyping microbial communities with rapidly growing genome collections. Cell Syst 2023; 14:160-176.e3. [PMID: 36657438 PMCID: PMC9957970 DOI: 10.1016/j.cels.2022.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 10/15/2022] [Accepted: 12/19/2022] [Indexed: 01/20/2023]
Abstract
Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments versus limiting incorrect alignments, many of which map reads to the wrong species. Then we evaluate several actionable mitigation strategies and review emerging methods showing promise to further improve metagenotyping in response to the rapid growth in genome collections. Our results have implications beyond metagenotyping to the many tools in microbial genomics that depend upon accurate read mapping.
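The trade-off described in this abstract, retaining correct alignments versus limiting incorrect ones, can be made concrete with a toy filter. The sketch below assumes a minimum mapping-quality (MAPQ) cutoff as the filter, which is one common choice and not necessarily a strategy the authors evaluate; the alignments and their correctness labels are invented for illustration.

```python
from typing import List, Tuple

# (mapping quality, aligned to the correct species?) for each toy alignment.
Alignment = Tuple[int, bool]


def filter_tradeoff(alignments: List[Alignment], min_mapq: int) -> Tuple[int, int]:
    """Return (correct kept, incorrect kept) after a MAPQ >= min_mapq filter."""
    kept = [a for a in alignments if a[0] >= min_mapq]
    correct = sum(1 for _, is_correct in kept if is_correct)
    return correct, len(kept) - correct


# Invented data: multi-mapping reads tend to receive low MAPQ values.
toy_alignments: List[Alignment] = [
    (60, True), (60, True), (30, True), (10, True),
    (10, False), (0, False), (0, True), (30, False),
]

for cutoff in (0, 30, 60):
    correct, incorrect = filter_tradeoff(toy_alignments, cutoff)
    print(f"MAPQ >= {cutoff}: {correct} correct, {incorrect} incorrect kept")
```

Raising the cutoff removes the wrong-species alignments but also discards correct ones, which is the tension the abstract points to.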
Affiliation(s)
- Chunyu Zhao
  - Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Zhou Jason Shi
  - Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Katherine S Pollard
  - Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA
8
Venturini C, Pang J, Tamuri AU, Roy S, Atkinson C, Griffiths P, Breuer J, Goldstein RA. Haplotype assignment of longitudinal viral deep sequencing data using covariation of variant frequencies. Virus Evol 2022; 8:veac093. [PMID: 36478783 PMCID: PMC9719071 DOI: 10.1093/ve/veac093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Revised: 09/15/2022] [Accepted: 10/05/2022] [Indexed: 11/13/2022] Open
Abstract
Longitudinal deep sequencing of viruses can provide detailed information about intra-host evolutionary dynamics including how viruses interact with and transmit between hosts. Many analyses require haplotype reconstruction, identifying which variants are co-located on the same genomic element. Most current methods to perform this reconstruction are based on a high density of variants and cannot perform this reconstruction for slowly evolving viruses. We present a new approach, HaROLD (HAplotype Reconstruction Of Longitudinal Deep sequencing data), which performs this reconstruction based on identifying co-varying variant frequencies using a probabilistic framework. We illustrate HaROLD on both RNA and DNA viruses with synthetic Illumina paired read data created from mixed human cytomegalovirus (HCMV) and norovirus genomes, and clinical datasets of HCMV and norovirus samples, demonstrating high accuracy, especially when longitudinal samples are available.
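The idea of using co-varying variant frequencies can be sketched in a few lines. The example below is a deliberate simplification and not HaROLD's probabilistic framework: it assumes that variants whose frequencies rise and fall together across time points are candidates for the same haplotype, measures this with a Pearson correlation, and groups variants greedily. The function names, the 0.95 threshold, and the frequency values are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def group_covarying(
    freqs: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[List[str]]:
    """Greedily place each variant in the first group whose seed it tracks."""
    groups: List[List[str]] = []
    for name, trajectory in freqs.items():
        for group in groups:
            if pearson(freqs[group[0]], trajectory) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no existing group fits: start a new one
    return groups


# Invented variant frequencies at four sampling time points.
toy_freqs = {
    "pos100": [0.10, 0.30, 0.60, 0.80],
    "pos2500": [0.12, 0.31, 0.58, 0.79],
    "pos7000": [0.90, 0.70, 0.40, 0.20],
}
toy_groups = group_covarying(toy_freqs)
```

Because the grouping relies on frequency changes over time rather than on variants sharing a read, it can link sites that are far apart, which is why the approach suits slowly evolving viruses with sparse variants.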
Affiliation(s)
- Cristina Venturini
  - Infection, Immunity, Inflammation, Institute of Child Health, University College London, London WC1E 6BT, UK
- Juanita Pang
  - Division of Infection and Immunity, University College London, London WC1E 6BT, UK
- Asif U Tamuri
  - Research IT Services, University College London, London WC1E 6BT, UK
- Sunando Roy
  - Infection, Immunity, Inflammation, Institute of Child Health, University College London, London WC1E 6BT, UK
- Claire Atkinson
  - Institute for Immunity and Transplantation, University College London, London NW3 2PP, UK
- Paul Griffiths
  - Institute for Immunity and Transplantation, University College London, London NW3 2PP, UK
- Judith Breuer
  - Infection, Immunity, Inflammation, Institute of Child Health, University College London, London WC1E 6BT, UK
  - Great Ormond Street Hospital for Children, London WC1N 3JH, UK
- Richard A Goldstein
  - Division of Infection and Immunity, University College London, London WC1E 6BT, UK
  - Infection, Immunity, Inflammation, Institute of Child Health, University College London, London WC1E 6BT, UK
9
Purushothaman S, Meola M, Egli A. Combination of Whole Genome Sequencing and Metagenomics for Microbiological Diagnostics. Int J Mol Sci 2022; 23:9834. [PMID: 36077231 PMCID: PMC9456280 DOI: 10.3390/ijms23179834] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 07/04/2022] [Revised: 08/24/2022] [Accepted: 08/26/2022] [Indexed: 12/21/2022] Open
Abstract
Whole genome sequencing (WGS) provides the highest resolution for genome-based species identification and can provide insight into the antimicrobial resistance and virulence potential of a single microbiological isolate during the diagnostic process. In contrast, metagenomic sequencing allows the analysis of DNA segments from multiple microorganisms within a community, either using an amplicon- or shotgun-based approach. However, WGS and shotgun metagenomic data are rarely combined, although such an approach may generate additive or synergistic information, critical for, e.g., patient management, infection control, and pathogen surveillance. To produce a combined workflow with actionable outputs, we need to understand the pre-to-post analytical process of both technologies. This will require specific databases storing interlinked sequencing and metadata, and also involves customized bioinformatic analytical pipelines. This review article will provide an overview of the critical steps and potential clinical application of combining WGS and metagenomics together for microbiological diagnosis.
Affiliation(s)
- Srinithi Purushothaman
  - Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
  - Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
- Marco Meola
  - Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
  - Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
  - Swiss Institute of Bioinformatics, University of Basel, 4031 Basel, Switzerland
- Adrian Egli
  - Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
  - Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
  - Clinical Bacteriology and Mycology, University Hospital Basel, 4031 Basel, Switzerland
10
A revisit to universal single-copy genes in bacterial genomes. Sci Rep 2022; 12:14550. [PMID: 36008577 PMCID: PMC9411617 DOI: 10.1038/s41598-022-18762-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/28/2022] [Accepted: 08/18/2022] [Indexed: 11/08/2022] Open
Abstract
Universal single-copy genes (USCGs) are widely used for species classification and taxonomic profiling. Despite many studies on USCGs, our understanding of USCGs in bacterial genomes might be out of date, especially how different the USCGs are in different studies, how well a set of USCGs can distinguish two bacterial species, whether USCGs can separate different strains of a bacterial species, to name a few. To fill the void, we studied USCGs in the most updated complete bacterial genomes. We showed that different USCG sets are quite different while coming from highly similar functional categories. We also found that although USCGs occur once in almost all bacterial genomes, each USCG does occur multiple times in certain genomes. We demonstrated that USCGs are reliable markers to distinguish different species while they cannot distinguish different strains of most bacterial species. Our study sheds new light on the usage and limitations of USCGs, which will facilitate their applications in evolutionary, phylogenomic, and metagenomic studies.
11
Li D, He M, Tang Q, Tian S, Zhang J, Li Y, Wang D, Jin L, Ning C, Zhu W, Hu S, Long K, Ma J, Liu J, Zhang Z, Li M. Comparative 3D genome architecture in vertebrates. BMC Biol 2022; 20:99. [PMID: 35524220 PMCID: PMC9077971 DOI: 10.1186/s12915-022-01301-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/08/2021] [Accepted: 04/20/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND The three-dimensional (3D) architecture of the genome has a highly ordered and hierarchical nature, which influences the regulation of essential nuclear processes at the basis of gene expression, such as gene transcription. While the hierarchical organization of heterochromatin and euchromatin can underlie differences in gene expression that determine evolutionary differences among species, the way 3D genome architecture is affected by evolutionary forces within major lineages remains unclear. Here, we report a comprehensive comparison of 3D genomes, using high resolution Hi-C data in fibroblast cells of fish, chickens, and 10 mammalian species. RESULTS This analysis shows a correlation between genome size and chromosome length that affects chromosome territory (CT) organization in the upper hierarchy of genome architecture, whereas lower hierarchical features, including local transcriptional availability of DNA, are selected through the evolution of vertebrates. Furthermore, conservation of topologically associating domains (TADs) appears strongly associated with the modularity of expression profiles across species. Additionally, LINE and SINE transposable elements likely contribute to heterochromatin and euchromatin organization, respectively, during the evolution of genome architecture. CONCLUSIONS Our analysis uncovers organizational features that appear to determine the conservation and transcriptional regulation of functional genes across species. These findings can guide ongoing investigations of genome evolution by extending our understanding of the mechanisms shaping genome architecture.
Affiliation(s)
- Diyan Li
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Mengnan He
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Qianzi Tang
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Shilin Tian
  - Department of Ecology, Tibetan Centre for Ecology and Conservation at WHU-TU, Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
  - Novogene Bioinformatics Institute, Beijing, 100000, China
- Jiaman Zhang
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Yan Li
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Danyang Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Long Jin
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Chunyou Ning
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Wei Zhu
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Silu Hu
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Keren Long
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Jideng Ma
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Jing Liu
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhihua Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Mingzhou Li
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
12
Altermann E, Tegetmeyer HE, Chanyi RM. The evolution of bacterial genome assemblies - where do we need to go next? Microbiome Research Reports 2022; 1:15. [PMID: 38046358 PMCID: PMC10688829 DOI: 10.20517/mrr.2022.02] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/05/2022] [Revised: 03/08/2022] [Accepted: 03/24/2022] [Indexed: 12/05/2023]
Abstract
Genome sequencing has fundamentally changed our ability to decipher and understand the genetic blueprint of life and how it changes over time in response to environmental and evolutionary pressures. The pace of sequencing is still increasing in response to advances in technologies, paving the way from sequenced genes to genomes to metagenomes to metagenome-assembled genomes (MAGs). Our ability to interrogate increasingly complex microbial communities through metagenomes and MAGs is opening up a tantalizing future where we may be able to delve deeper into the mechanisms and genetic responses emerging over time. In the near future, we will be able to detect MAG assembly variations within strains originating from diverging sub-populations, and one of the emerging challenges will be to capture these variations in a biologically relevant way. Here, we present a brief overview of sequencing technologies and the current state of metagenome assemblies to suggest the need to develop new data formats that can capture the genetic variations within strains and communities, which previously remained invisible due to sequencing technology limitations.
Affiliation(s)
- Eric Altermann
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
  - Massey University, School of Veterinary Science, Palmerston North 4100, New Zealand
- Halina E. Tegetmeyer
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Center for Biotechnology, Bielefeld University, Universitaetsstrasse 27, Bielefeld 33615, Germany
- Ryan M. Chanyi
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
13
Strain identification and quantitative analysis in microbial communities. J Mol Biol 2022; 434:167582. [DOI: 10.1016/j.jmb.2022.167582] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/01/2022] [Revised: 03/31/2022] [Accepted: 04/03/2022] [Indexed: 12/14/2022]
14
Ventolero MF, Wang S, Hu H, Li X. Computational analyses of bacterial strains from shotgun reads. Brief Bioinform 2022; 23:6524011. [PMID: 35136954 DOI: 10.1093/bib/bbac013] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022]
Abstract
Shotgun sequencing is routinely employed to study bacteria in microbial communities. With the vast amount of shotgun sequencing reads generated in a metagenomic project, it is crucial to determine the microbial composition at the strain level. This study investigated 20 computational tools that attempt to infer bacterial strain genomes from shotgun reads. For the first time, we discussed the methodology behind these tools. We also systematically evaluated six novel-strain-targeting tools on the same datasets and found that BHap, mixtureS and StrainFinder performed better than other tools. Because the performance of the best tools is still suboptimal, we discussed future directions that may address the limitations.
Affiliation(s)
- Saidi Wang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA; Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32816, USA
15
Reconstruction of evolving gene variants and fitness from short sequencing reads. Nat Chem Biol 2021; 17:1188-1198. [PMID: 34635842 PMCID: PMC8551035 DOI: 10.1038/s41589-021-00876-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/25/2020] [Accepted: 08/09/2021] [Indexed: 12/23/2022]
Abstract
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency 'rising stars', well before they are identifiable from consensus mutations.
16
Li X, Hu H, Li X. mixtureS: a novel tool for bacterial strain genome reconstruction from reads. Bioinformatics 2021; 37:575-577. [PMID: 32805048 PMCID: PMC8599889 DOI: 10.1093/bioinformatics/btaa728] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/18/2020] [Revised: 07/14/2020] [Accepted: 08/10/2020] [Indexed: 01/08/2023]
Abstract
MOTIVATION: It is essential to study bacterial strains in environmental samples. Existing methods and tools often depend on known strains or known variations, cannot work on individual samples, are unreliable, or are not easy to use. It is thus important to develop more user-friendly tools that can identify bacterial strains more accurately. RESULTS: We developed a new tool called mixtureS that can de novo identify bacterial strains from shotgun reads of a clonal or metagenomic sample, without prior knowledge about the strains and their variations. Tested on 243 simulated datasets and 195 experimental datasets, mixtureS reliably identified the strains, their numbers and their abundance. Compared with three tools, mixtureS showed better performance in almost all simulated datasets and the vast majority of experimental datasets. AVAILABILITY AND IMPLEMENTATION: The mixtureS source code and tool are available at http://www.cs.ucf.edu/~xiaoman/mixtureS/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xin Li
  - Department of Computer Science
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
17
Pelizzola M, Behr M, Li H, Munk A, Futschik A. Multiple haplotype reconstruction from allele frequency data. Nature Computational Science 2021; 1:262-271. [PMID: 38217170 DOI: 10.1038/s43588-021-00056-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/14/2020] [Accepted: 03/12/2021] [Indexed: 01/15/2024]
Abstract
Because haplotype information is of widespread interest in biomedical applications, effort has been put into their reconstruction. Here, we propose an efficient method, called haploSep, that is able to accurately infer major haplotypes and their frequencies just from multiple samples of allele frequency data. Even the accuracy of experimentally obtained allele frequencies can be improved by re-estimating them from our reconstructed haplotypes. From a methodological point of view, we model our problem as a multivariate regression problem where both the design matrix and the coefficient matrix are unknown. Compared to other methods, haploSep is very fast, with linear computational complexity in the haplotype length. We illustrate our method on simulated and real data focusing on experimental evolution and microbial data.
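The regression framing in this abstract can be illustrated with a toy forward model. The sketch below is not the haploSep algorithm: it assumes the observed allele-frequency matrix is the product of a binary haplotype-structure matrix and a matrix of per-sample haplotype frequencies, and it recovers the frequencies by ordinary least squares only for the simplified case where the structure matrix is already known. The names `allele_frequencies`, `estimate_weights`, `S`, `W`, and `Y` are hypothetical.

```python
import numpy as np


def allele_frequencies(structure, weights):
    """Forward model (assumed): Y = S @ W.

    structure: (n_snps, n_haplotypes) 0/1 matrix, 1 = haplotype carries the allele.
    weights:   (n_haplotypes, n_samples) haplotype frequencies, columns sum to 1.
    """
    return structure @ weights


def estimate_weights(structure, observed):
    """Least-squares estimate of W for a KNOWN S (a simplification).

    In the setting described above both matrices are unknown; this only
    shows the regression step for a fixed design matrix.
    """
    w, *_ = np.linalg.lstsq(structure, observed, rcond=None)
    w = np.clip(w, 0.0, None)                  # frequencies cannot be negative
    return w / w.sum(axis=0, keepdims=True)    # renormalise each sample


# Toy data: 5 SNPs, 3 haplotypes, 2 samples (all values invented).
S = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 1]], dtype=float)
W = np.array([[0.5, 0.2],
              [0.3, 0.3],
              [0.2, 0.5]])
Y = allele_frequencies(S, W)
W_hat = estimate_weights(S, Y)
```

With noise-free toy data and a full-rank structure matrix, `W_hat` matches `W`; real allele frequencies are noisy and the structure matrix must itself be inferred, which is the harder part the abstract refers to.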
Affiliation(s)
- Marta Pelizzola
  - Vetmeduni Vienna, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vienna, Austria
- Merle Behr
  - University of California, Berkeley, CA, USA
- Housen Li
  - University of Göttingen, Göttingen, Germany
  - Cluster of Excellence 'Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells' (MBExC), University of Göttingen, Göttingen, Germany
- Axel Munk
  - University of Göttingen, Göttingen, Germany
  - Cluster of Excellence 'Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells' (MBExC), University of Göttingen, Göttingen, Germany
  - Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
18
Abstract
Cystic fibrosis patients frequently suffer from recurring respiratory infections caused by colonizing pathogenic and commensal bacteria. Although modern therapies can sometimes alleviate respiratory symptoms by ameliorating residual function of the protein responsible for the disorder, management of chronic respiratory infections remains an issue. In cystic fibrosis, dynamic and complex communities of microbial pathogens and commensals can colonize the lung. Cultured isolates from lung sputum reveal high inter- and intraindividual variability in pathogen strains, sequence variants, and phenotypes; disease progression likely depends on the precise combination of infecting lineages. Routine clinical protocols, however, provide a limited overview of the colonizer populations. Therefore, a more comprehensive and precise identification and characterization of infecting lineages could assist in making corresponding decisions on treatment. Here, we describe longitudinal tracking for four cystic fibrosis patients who exhibited extreme clinical phenotypes and, thus, were selected from a pilot cohort of 11 patients with repeated sampling for more than a year. Following metagenomics sequencing of lung sputum, we find that the taxonomic identity of individual colonizer lineages can be easily established. Crucially, even superficially clonal pathogens can be subdivided into multiple sublineages at the sequence level. By tracking individual allelic differences over time, an assembly-free clustering approach allows us to reconstruct multiple lineage-specific genomes with clear structural differences. Our study showcases a culture-independent shotgun metagenomics approach for longitudinal tracking of sublineage pathogen dynamics, opening up the possibility of using such methods to assist in monitoring disease progression through providing high-resolution routine characterization of the cystic fibrosis lung microbiome.
19
Cao C, He J, Mak L, Perera D, Kwok D, Wang J, Li M, Mourier T, Gavriliuc S, Greenberg M, Morrissy AS, Sycuro LK, Yang G, Jeffares DC, Long Q. Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding. Mol Biol Evol 2021; 38:2660-2672. [PMID: 33547786 PMCID: PMC8136496 DOI: 10.1093/molbev/msab037] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
Affiliation(s)
- Chen Cao
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Jingni He
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada; Department of Cardiology, Xiangya Hospital, Central South University, Changsha, China
- Lauren Mak
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada; Present address: Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, NY, USA
- Deshan Perera
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Devin Kwok
  - Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
- Jia Wang
  - Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
- Minghao Li
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Tobias Mourier
  - Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Stefan Gavriliuc
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Matthew Greenberg
  - Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
- A Sorana Morrissy
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Laura K Sycuro
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada; Department of Microbiology, Immunology, and Infectious Diseases, Snyder Institute for Chronic Diseases, University of Calgary, Calgary, AB, Canada
- Guang Yang
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada; Department of Medical Genetics, University of Calgary, Calgary, AB, Canada
- Daniel C Jeffares
  - Department of Biology, York Biomedical Research Institute, University of York, York, United Kingdom
- Quan Long
  - Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada; Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada; Department of Medical Genetics, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, O’Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
  - Corresponding author
20
Anyansi C, Straub TJ, Manson AL, Earl AM, Abeel T. Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data. Front Microbiol 2020; 11:1925. [PMID: 33013732 PMCID: PMC7507117 DOI: 10.3389/fmicb.2020.01925] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 02/27/2020] [Accepted: 07/22/2020] [Indexed: 01/17/2023]
Abstract
Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. Most widely used tools for taxonomic profiling of metagenomic sequence data allow for a species-level overview of the composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence and growth rate. Therefore, the ability to resolve microbial communities down to the level of individual strains within a species is critical to interpreting metagenomic data for clinical and environmental applications, where identifying a particular strain, or tracking a particular strain across a set of samples, can aid clinical diagnosis and treatment, or help characterize as yet unstudied strains across novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species presents a bigger challenge, due to the diversity of databases, genetic relatedness, and goals when conducting these analyses. Selection of which metagenomic tool to employ for a specific application should be performed on a case-by-case basis, as these tools have strengths and weaknesses that affect their performance on specific tasks. A comprehensive benchmark across different use case scenarios is vital to validate performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, development of more fine-grained, high-resolution algorithms will continue to be in demand for the future.
Affiliation(s)
- Christine Anyansi
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Timothy J. Straub
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Abigail L. Manson
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Ashlee M. Earl
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Thomas Abeel
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
21
Saltykova A, Buytaers FE, Denayer S, Verhaegen B, Piérard D, Roosens NHC, Marchal K, De Keersmaecker SCJ. Strain-Level Metagenomic Data Analysis of Enriched In Vitro and In Silico Spiked Food Samples: Paving the Way towards a Culture-Free Foodborne Outbreak Investigation Using STEC as a Case Study. Int J Mol Sci 2020; 21:E5688. [PMID: 32784459 PMCID: PMC7460976 DOI: 10.3390/ijms21165688] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/13/2020] [Revised: 08/04/2020] [Accepted: 08/06/2020] [Indexed: 12/13/2022]
Abstract
Culture-independent diagnostics, such as metagenomic shotgun sequencing of food samples, could not only reduce the turnaround time of samples in an outbreak investigation, but also allow the detection of multi-species and multi-strain outbreaks. For successful foodborne outbreak investigation using a metagenomic approach, it is, however, necessary to bioinformatically separate the genomes of individual strains, including strains belonging to the same species, present in a microbial community, which has not previously been demonstrated for this application. The current work shows the feasibility of strain-level metagenomics of enriched food matrix samples making use of data analysis tools that classify reads against a sequence database. It includes a brief comparison of two database-based read classification tools, Sigma and Sparse, using a mock community obtained by in vitro spiking minced meat with a Shiga toxin-producing Escherichia coli (STEC) isolate originating from a described outbreak. Sigma, the better-performing of the two tools, was further evaluated using in silico simulated metagenomic data to explore the possibilities and limitations of this data analysis approach. The performed analysis allowed us to link the pathogenic strains from food samples to human isolates previously collected during the same outbreak, demonstrating that the metagenomic approach could be applied for the rapid source tracking of foodborne outbreaks. To our knowledge, this is the first study demonstrating a data analysis approach for detailed characterization and phylogenetic placement of multiple bacterial strains of one species from shotgun metagenomic WGS data of an enriched food sample.
Affiliation(s)
- Assia Saltykova
  - Transversal Activities in Applied Genomics (TAG), Sciensano, 1050 Brussels, Belgium
  - IDLab, Department of Information Technology, Ghent University, IMEC, 9052 Ghent, Belgium
- Florence E Buytaers
  - Transversal Activities in Applied Genomics (TAG), Sciensano, 1050 Brussels, Belgium
  - IDLab, Department of Information Technology, Ghent University, IMEC, 9052 Ghent, Belgium
- Sarah Denayer
  - National Reference Laboratory for Shiga Toxin-Producing Escherichia coli (NRL STEC), Foodborne Pathogens, Sciensano, 1050 Brussels, Belgium
- Bavo Verhaegen
  - National Reference Laboratory for Shiga Toxin-Producing Escherichia coli (NRL STEC), Foodborne Pathogens, Sciensano, 1050 Brussels, Belgium
- Denis Piérard
  - National Reference Center for Shiga Toxin-Producing Escherichia coli (NRC STEC), Department of Microbiology and Infection Control, Universitair Ziekenhuis Brussel (UZ Brussel), Vrije Universiteit Brussel (VUB), 1090 Brussels, Belgium
- Nancy H C Roosens
  - Transversal Activities in Applied Genomics (TAG), Sciensano, 1050 Brussels, Belgium
- Kathleen Marchal
  - IDLab, Department of Information Technology, Ghent University, IMEC, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - Department of Genetics, University of Pretoria, Pretoria 0083, South Africa
22
Li X, Saadat S, Hu H, Li X. BHap: a novel approach for bacterial haplotype reconstruction. Bioinformatics 2020; 35:4624-4631. [PMID: 31004480 DOI: 10.1093/bioinformatics/btz280] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/26/2018] [Revised: 03/07/2019] [Accepted: 04/13/2019] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Bacterial haplotype reconstruction is critical for selecting proper treatments for diseases caused by unknown haplotypes. Existing methods and tools do not work well on this task, because they are usually developed for viral instead of bacterial populations. RESULTS: In this study, we developed BHap, a novel algorithm based on fuzzy flow networks, for reconstructing bacterial haplotypes from next generation sequencing data. Tested on simulated and experimental datasets, we showed that BHap was capable of reconstructing haplotypes of bacterial populations with an average F1 score of 0.87, an average precision of 0.87 and an average recall of 0.88. We also demonstrated that BHap had a low susceptibility to sequencing errors, was capable of reconstructing haplotypes with low coverage and could handle a wide range of mutation rates. BHap outperformed existing approaches, with higher F1 scores, better precision, better recall and more accurate estimation of the number of haplotypes. AVAILABILITY AND IMPLEMENTATION: The BHap tool is available at http://www.cs.ucf.edu/~xiaoman/BHap/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
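For readers unfamiliar with the evaluation metrics quoted in this abstract, the sketch below shows one common way to score a set of reconstructed haplotypes against a known truth set. It assumes a reconstructed haplotype counts as correct only if it matches a true haplotype exactly; the abstract does not state the matching criterion BHap's authors used, so treat this as an illustration of precision, recall and F1 rather than their protocol. The function name `haplotype_prf` and the toy sequences are hypothetical.

```python
def haplotype_prf(predicted, truth):
    """Precision, recall and F1 for reconstructed haplotypes.

    Assumption: a prediction is a true positive only on an exact
    sequence match with one of the true haplotypes.
    """
    pred, true = set(predicted), set(truth)
    tp = len(pred & true)                        # correctly reconstructed
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (invented sequences): 2 of 4 predictions are correct,
# and 2 of the 3 true haplotypes are recovered.
truth = ["ACGT", "ACGA", "TCGA"]
predicted = ["ACGT", "ACGA", "ACCA", "TTGA"]
precision, recall, f1 = haplotype_prf(predicted, truth)
```

Here precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7. An average F1 reported across many datasets, as in the abstract, need not equal the F1 computed from the average precision and average recall.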
Affiliation(s)
- Xin Li
  - Department of Computer Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Samaneh Saadat
  - Department of Computer Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
  - Department of Computer Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
23
Idowu AO, Oyibo WA, Bhattacharyya S, Khubbar M, Mendie UE, Bumah VV, Black C, Igietseme J, Azenabor AA. Rare mutations in Pfmdr1 gene of Plasmodium falciparum detected in clinical isolates from patients treated with anti-malarial drug in Nigeria. Malar J 2019; 18:319. [PMID: 31533729 PMCID: PMC6751857 DOI: 10.1186/s12936-019-2947-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/22/2019] [Accepted: 09/06/2019] [Indexed: 01/18/2023]
Abstract
Background: Plasmodium falciparum, the deadliest causative agent of malaria, has high prevalence in Nigeria. Drug resistance causing failure of previously effective drugs has compromised anti-malarial treatment. On this basis, there is a need for proactive surveillance of resistance markers to the currently recommended artemisinin-based combination therapy (ACT), for early detection of resistance before it becomes widespread. Methods: This study assessed polymorphisms in anti-malarial resistance genes in patients with uncomplicated P. falciparum malaria in Lagos, Nigeria. Sanger and Next Generation Sequencing (NGS) methods were used to screen for mutations in thirty-seven malaria positive blood samples targeting the P. falciparum chloroquine-resistance transporter (Pfcrt), P. falciparum multidrug-resistance 1 (Pfmdr1), and P. falciparum kelch 13 (Pfk13) genes, which have been previously associated with anti-malarial resistance. Results: As expected, the NGS method was more proficient, detecting six Pfmdr1, seven Pfcrt and three Pfk13 mutations in the studied clinical isolates from Nigeria, a malaria endemic area. These mutations included rare Pfmdr1 mutations, N504K, N649D, F938Y and S967N, which were previously unreported. In addition, there was moderate prevalence of the K76T mutation (34.6%) associated with chloroquine and amodiaquine resistance, and high prevalence of the N86 wild type allele (92.3%) associated with lumefantrine resistance. Conclusion: Widespread circulation of mutations associated with resistance to current anti-malarial drugs could potentially limit effective malaria therapy in endemic populations.
Affiliation(s)
- Abel O Idowu
  - Department of Biomedical Sciences, College of Health Sciences, University of Wisconsin, 2400 E. Hartford Avenue, Milwaukee, WI, 53211, USA; Department of Pharmaceutics and Pharmaceutical Technology, Faculty of Pharmacy, University of Lagos, Lagos, Nigeria
- Wellington A Oyibo
  - ANDI Centre of Excellence in Malaria Diagnosis, College of Medicine, University of Lagos, Lagos, Nigeria
- Manjeet Khubbar
  - City of Milwaukee Health Department Laboratory, Milwaukee, USA
- Udoma E Mendie
  - Department of Pharmaceutics and Pharmaceutical Technology, Faculty of Pharmacy, University of Lagos, Lagos, Nigeria
- Violet V Bumah
  - Department of Biology, North Life Science 317, San Diego State University, San Diego, CA, 92182, USA
- Carolyn Black
  - Molecular Pathogenesis Laboratory, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Joseph Igietseme
  - Molecular Pathogenesis Laboratory, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Anthony A Azenabor
  - Department of Biomedical Sciences, College of Health Sciences, University of Wisconsin, 2400 E. Hartford Avenue, Milwaukee, WI, 53211, USA; Department of Pharmaceutics and Pharmaceutical Technology, Faculty of Pharmacy, University of Lagos, Lagos, Nigeria
24
dos Santos LF, Costa Polveiro R, Scatamburlo Moreira T, Pereira Vidigal PM, Chang YF, Scatamburlo Moreira MA. Polymorphism analysis of the apxIA gene of Actinobacillus pleuropneumoniae serovar 5 isolated in swine herds from Brazil. PLoS One 2018; 13:e0208789. [PMID: 30562362 PMCID: PMC6298653 DOI: 10.1371/journal.pone.0208789] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/26/2018] [Accepted: 11/26/2018] [Indexed: 11/30/2022]
Abstract
The bacterium Actinobacillus pleuropneumoniae is the etiological agent of Contagious Porcine Pleuropneumonia, a disease responsible for economic losses in the swine industry worldwide. A. pleuropneumoniae is capable of producing proteinaceous exotoxins responsible for inducing hemorrhagic lesions, one of which is ApxI. Few studies have conducted an in-depth evaluation of polymorphisms of the nucleotides that make up the ApxI toxin gene. Here we analyze the polymorphisms of the apxIA gene region of A. pleuropneumoniae serovar 5 isolated from swine in different regions in Brazil and report the results of molecular sequencing and phylogenetic analysis. Analysis of the apxIA gene in 60 isolates revealed the presence of genetic diversity and variability. The polymorphisms in the nucleotide sequences determined the grouping of the Brazilian sequences and five more sequences from the GenBank database into 14 different haplotypes, which formed three main groups and revealed the presence of mutations in the nucleotide sequences. The estimation of selection pressures suggests the occurrence of genetic variations by positive selective pressure on A. pleuropneumoniae in large groups of animals in relatively small spaces. These conditions presumably favor the horizontal dissemination of apxIA gene mutations within bacterial populations with host reservoirs. As a result, the same serovar can demonstrate different antigenic capacities due to mutations in the apxIA gene. These alterations in sequences of the apxIA gene could occur in other areas of countries with intense swine production, which could lead to differences in the pathogenicity and immunogenicity of each serovar and have implications for the clinical status or diagnosis of A. pleuropneumoniae.
Affiliation(s)
- Lucas Fernando dos Santos
  - Laboratory of Bacterial Diseases, Sector of Preventive Veterinary Medicine and Public Health, Veterinary Department, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
  - Microbiologia Veterinária Especial LTDA (Microvet), Viçosa, Minas Gerais, Brazil
- Richard Costa Polveiro
  - Laboratory of Bacterial Diseases, Sector of Preventive Veterinary Medicine and Public Health, Veterinary Department, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Thalita Scatamburlo Moreira
  - Laboratory of Bacterial Diseases, Sector of Preventive Veterinary Medicine and Public Health, Veterinary Department, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Pedro Marcus Pereira Vidigal
  - Núcleo de Análise de Biomoléculas (NuBioMol), Center of Biological Sciences, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Yung-Fu Chang
  - Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca, New York, United States of America
- Maria Aparecida Scatamburlo Moreira
  - Laboratory of Bacterial Diseases, Sector of Preventive Veterinary Medicine and Public Health, Veterinary Department, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
  - Corresponding author
25
Identifying Mixed Mycobacterium tuberculosis Infection and Laboratory Cross-Contamination during Mycobacterial Sequencing Programs. J Clin Microbiol 2018; 56:JCM.00923-18. [PMID: 30209183 PMCID: PMC6204665 DOI: 10.1128/jcm.00923-18] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/11/2018] [Accepted: 08/28/2018] [Indexed: 11/20/2022]
Abstract
The detection of laboratory cross-contamination and mixed tuberculosis infections is an important goal of clinical mycobacteriology laboratories. The objective of this study was to develop a method to detect mixtures of different Mycobacterium tuberculosis lineages in laboratories performing mycobacterial next-generation sequencing (NGS). The setting was the Public Health England National Mycobacteriology Laboratory Birmingham, which performs Illumina sequencing on DNA extracted from positive mycobacterial growth indicator tubes. We analyzed 4,156 samples yielding M. tuberculosis from 663 MiSeq runs, which were obtained during development and production use of a diagnostic process using NGS. The counts of the most common (major) variant and all other variants (nonmajor variants) were determined from reads mapping to positions defining M. tuberculosis lineages. Expected variation was estimated during process development. For each sample, we determined the nonmajor variant proportions at 55 sets of lineage-defining positions. The nonmajor variant proportion in the two most mixed lineage-defining sets (F2 metric) was compared with that of the 47 least-mixed lineage-defining sets (F47 metric). The following three patterns were observed: (i) not mixed by either metric; (ii) high F47 metric, suggesting mixtures of multiple lineages; and (iii) samples compatible with mixtures of two lineages, detected by differential F2 metric elevations relative to F47. Pattern ii was observed in batches, with similar patterns in the M. tuberculosis H37Rv control present in each run, and is likely to reflect cross-contamination. During production, the proportions of samples in the patterns were 97%, 2.8%, and 0.001%, respectively. The F2 and F47 metrics described could be used for laboratory process control in laboratories sequencing M. tuberculosis genomes.
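The F2 and F47 metrics described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state how per-set proportions are combined, so pooling read counts across the selected sets is an assumption here, and the function name `f2_f47` and the `(nonmajor_reads, total_reads)` input layout are hypothetical.

```python
from typing import List, Tuple


def f2_f47(sets: List[Tuple[int, int]],
           n_top: int = 2,
           n_bottom: int = 47) -> Tuple[float, float]:
    """Return (F2, F47) for one sample.

    `sets` holds one (nonmajor_reads, total_reads) pair per
    lineage-defining set of positions (55 sets in the abstract).
    Pooling counts within each group is an assumption of this sketch.
    """
    def proportion(pair: Tuple[int, int]) -> float:
        nonmajor, total = pair
        return nonmajor / total if total else 0.0

    # Rank sets from most mixed to least mixed.
    ranked = sorted(sets, key=proportion, reverse=True)

    def pooled(group: List[Tuple[int, int]]) -> float:
        total = sum(t for _, t in group)
        return sum(n for n, _ in group) / total if total else 0.0

    # F2: the two most mixed sets; F47: the 47 least mixed sets.
    return pooled(ranked[:n_top]), pooled(ranked[-n_bottom:])
```

Under these assumptions, a sample in which F2 is elevated while F47 stays near the background level would correspond to pattern iii (a mixture of two lineages), whereas a high F47 would correspond to pattern ii.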
|
26
|
Van den Bergh B, Swings T, Fauvart M, Michiels J. Experimental Design, Population Dynamics, and Diversity in Microbial Experimental Evolution. Microbiol Mol Biol Rev 2018; 82:e00008-18. [PMID: 30045954 PMCID: PMC6094045 DOI: 10.1128/mmbr.00008-18] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Indexed: 12/14/2022] Open
Abstract
In experimental evolution, laboratory-controlled conditions select for the adaptation of species, which can be monitored in real time. Despite the current popularity of such experiments, nature's most pervasive biological force was long believed to be observable only on time scales that transcend a researcher's life-span, and studying evolution by natural selection was therefore carried out solely by comparative means. Eventually, microorganisms' propensity for fast evolutionary changes proved us wrong, displaying strong evolutionary adaptations over a limited time, nowadays massively exploited in laboratory evolution experiments. Here, we formulate a guide to experimental evolution with microorganisms, explaining experimental design and discussing evolutionary dynamics and outcomes and how it is used to assess ecoevolutionary theories, improve industrially important traits, and untangle complex phenotypes. Specifically, we give a comprehensive overview of the setups used in experimental evolution. Additionally, we address population dynamics and genetic or phenotypic diversity during evolution experiments and expand upon contributing factors, such as epistasis and the consequences of (a)sexual reproduction. Dynamics and outcomes of evolution are most profoundly affected by the spatiotemporal nature of the selective environment, where changing environments might lead to generalists and structured environments could foster diversity, aided by, for example, clonal interference and negative frequency-dependent selection. We conclude with future perspectives, with an emphasis on possibilities offered by fast-paced technological progress. This work is meant to serve as an introduction to those new to the field of experimental evolution, as a guide to the budding experimentalist, and as a reference work to the seasoned expert.
Affiliation(s)
- Bram Van den Bergh
- Laboratory of Symbiotic and Pathogenic Interactions, Centre of Microbial and Plant Genetics, KU Leuven-University of Leuven, Leuven, Belgium
- Michiels Lab, Center for Microbiology, VIB, Leuven, Belgium
- Douglas Lab, Department of Entomology, Cornell University, Ithaca, New York, USA
| | - Toon Swings
- Laboratory of Symbiotic and Pathogenic Interactions, Centre of Microbial and Plant Genetics, KU Leuven-University of Leuven, Leuven, Belgium
- Michiels Lab, Center for Microbiology, VIB, Leuven, Belgium
| | - Maarten Fauvart
- Laboratory of Symbiotic and Pathogenic Interactions, Centre of Microbial and Plant Genetics, KU Leuven-University of Leuven, Leuven, Belgium
- Michiels Lab, Center for Microbiology, VIB, Leuven, Belgium
- imec, Leuven, Belgium
| | - Jan Michiels
- Laboratory of Symbiotic and Pathogenic Interactions, Centre of Microbial and Plant Genetics, KU Leuven-University of Leuven, Leuven, Belgium
- Michiels Lab, Center for Microbiology, VIB, Leuven, Belgium
| |
|
27
|
Mahomed S, Naidoo K, Dookie N, Padayatchi N. Whole genome sequencing for the management of drug-resistant TB in low income high TB burden settings: Challenges and implications. Tuberculosis (Edinb) 2017; 107:137-143. [DOI: 10.1016/j.tube.2017.09.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/06/2017] [Revised: 08/26/2017] [Accepted: 09/13/2017] [Indexed: 12/18/2022]
|
28
|
Abstract
Computer-assisted technologies of the genomic structure, biological function, and evolution of viruses remain a largely neglected area of research. The attention of bioinformaticians to this challenging field is currently unsatisfying in respect to its medical and biological importance. The power of new genome sequencing technologies, associated with new tools to handle "big data", provides unprecedented opportunities to address fundamental questions in virology. Here, we present an overview of the current technologies, challenges, and advantages of Next-Generation Sequencing (NGS) in relation to the field of virology. We present how viral sequences can be detected de novo out of current short-read NGS data. Furthermore, we discuss the challenges and applications of viral quasispecies and how secondary structures, commonly shaped by RNA viruses, can be computationally predicted. The phylogenetic analysis of viruses, as another ubiquitous field in virology, forms an essential element of describing viral epidemics and challenges current algorithms. Recently, the first specialized virus-bioinformatic organizations have been established. We need to bring together virologists and bioinformaticians and provide a platform for the implementation of interdisciplinary collaborative projects at local and international scales. Above all, there is an urgent need for dedicated software tools to tackle various challenges in virology.
Affiliation(s)
- Martin Hölzer
- RNA Bioinformatics and High Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; European Virus Bioinformatics Center (EVBC), Jena, Germany
| | - Manja Marz
- RNA Bioinformatics and High Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; European Virus Bioinformatics Center (EVBC), Jena, Germany; FLI Leibniz Institute for Age Research, Jena, Germany.
| |
|
29
|
Russell SL, Cavanaugh CM. Intrahost Genetic Diversity of Bacterial Symbionts Exhibits Evidence of Mixed Infections and Recombinant Haplotypes. Mol Biol Evol 2017; 34:2747-2761. [DOI: 10.1093/molbev/msx188] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 12/15/2022] Open
|
30
|
Swings T, Van den Bergh B, Wuyts S, Oeyen E, Voordeckers K, Verstrepen KJ, Fauvart M, Verstraeten N, Michiels J. Adaptive tuning of mutation rates allows fast response to lethal stress in Escherichia coli. eLife 2017; 6. [PMID: 28460660 PMCID: PMC5429094 DOI: 10.7554/elife.22939] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 11/03/2016] [Accepted: 04/18/2017] [Indexed: 12/17/2022] Open
Abstract
While specific mutations allow organisms to adapt to stressful environments, most changes in an organism's DNA negatively impact fitness. The mutation rate is therefore strictly regulated and often considered a slowly-evolving parameter. In contrast, we demonstrate an unexpected flexibility in cellular mutation rates as a response to changes in selective pressure. We show that hypermutation independently evolves when different Escherichia coli cultures adapt to high ethanol stress. Furthermore, hypermutator states are transitory and repeatedly alternate with decreases in mutation rate. Specifically, population mutation rates rise when cells experience higher stress and decline again once cells are adapted. Interestingly, we identified cellular mortality as the major force driving the quick evolution of mutation rates. Together, these findings show how organisms balance robustness and evolvability and help explain the prevalence of hypermutation in various settings, ranging from emergence of antibiotic resistance in microbes to cancer relapses upon chemotherapy.
Affiliation(s)
- Toon Swings
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium
| | - Bram Van den Bergh
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium
| | - Sander Wuyts
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium
| | - Eline Oeyen
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium
| | - Karin Voordeckers
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium; VIB Laboratory for Genetics and Genomics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
| | - Kevin J Verstrepen
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium; VIB Laboratory for Genetics and Genomics, Vlaams Instituut voor Biotechnologie, Leuven, Belgium
| | - Maarten Fauvart
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium; Smart Systems and Emerging Technologies Unit, Imec (Interuniversity Micro-Electronics Centre), Leuven, Belgium
| | - Natalie Verstraeten
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium
| | - Jan Michiels
- Centre of Microbial and Plant Genetics, KU Leuven - University of Leuven, Leuven, Belgium
| |
|
31
|
Molecular Epidemiology of Plasmodium falciparum kelch13 Mutations in Senegal Determined by Using Targeted Amplicon Deep Sequencing. Antimicrob Agents Chemother 2017; 61:AAC.02116-16. [PMID: 28069653 DOI: 10.1128/aac.02116-16] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 09/30/2016] [Accepted: 12/27/2016] [Indexed: 12/19/2022] Open
Abstract
The emergence of Plasmodium falciparum resistance to artemisinin in Southeast Asia threatens malaria control and elimination activities worldwide. Multiple polymorphisms in the P. falciparum kelch gene found in chromosome 13 (Pfk13) have been associated with artemisinin resistance. Surveillance of potential drug resistance loci within a population that may emerge under increasing drug pressure is an important public health activity. In this context, P. falciparum infections from an observational surveillance study in Senegal were genotyped using targeted amplicon deep sequencing (TADS) for Pfk13 polymorphisms. The results were compared to previously reported Pfk13 polymorphisms from around the world. A total of 22 Pfk13 propeller domain polymorphisms were identified in this study, of which 12 have previously not been reported. Interestingly, of the 10 polymorphisms identified in the present study that were also previously reported, all had a different amino acid substitution at these codon positions. Most of the polymorphisms were present at low frequencies and were confined to single isolates, suggesting they are likely transient polymorphisms that are part of naturally evolving parasite populations. The results of this study underscore the need to identify potential drug resistance loci existing within a population, which may emerge under increasing drug pressure.
|
32
|
Zojer M, Schuster LN, Schulz F, Pfundner A, Horn M, Rattei T. Variant profiling of evolving prokaryotic populations. PeerJ 2017; 5:e2997. [PMID: 28224054 PMCID: PMC5316281 DOI: 10.7717/peerj.2997] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/13/2016] [Accepted: 01/17/2017] [Indexed: 12/30/2022] Open
Abstract
Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing reads. Recent advances in NGS technologies improved the speed and coverage and thus allowed for deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. In order to predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by a union of different tools, however at the price of increased false positives. We identified possible reasons for false predictions and used this knowledge to improve the accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that best precision was achieved by using an intersection of at least two tools per variant. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for being routinely used within experimental evolution experiments or for clinical diagnostics. The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap. 
In order to provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at.
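The post-filtering idea in this abstract (keep a variant only when at least two tools agree, and only above a minimum relative abundance of 2%) can be sketched in Python as follows. This is not VarCap's code: the function name `consensus_variants`, the input layout, and the choice to average the frequencies reported by the supporting tools are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A variant is keyed by (position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


def consensus_variants(calls: Dict[str, Dict[Variant, float]],
                       min_tools: int = 2,
                       min_freq: float = 0.02) -> Dict[Variant, float]:
    """Keep variants reported by at least `min_tools` callers.

    `calls` maps a (hypothetical) tool name to the variants it
    reported and their estimated frequencies. The returned frequency
    is the mean over supporting tools, an assumption of this sketch.
    """
    support: Dict[Variant, List[float]] = defaultdict(list)
    for variants in calls.values():
        for variant, freq in variants.items():
            support[variant].append(freq)

    kept: Dict[Variant, float] = {}
    for variant, freqs in support.items():
        mean_freq = sum(freqs) / len(freqs)
        # Intersection of callers plus a minimum relative abundance.
        if len(freqs) >= min_tools and mean_freq >= min_freq:
            kept[variant] = mean_freq
    return kept
```

Requiring agreement between callers trades sensitivity for precision, which matches the abstract's observation that a union of tools maximizes sensitivity at the cost of more false positives.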
Affiliation(s)
- Markus Zojer
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
| | - Lisa N Schuster
- Department of Microbiology and Ecosystems Science, Division of Microbial Ecology, University of Vienna, Vienna, Austria
| | - Frederik Schulz
- DOE Joint Genome Institute, Lawrence Berkeley National Lab, Walnut Creek, CA, United States
| | - Alexander Pfundner
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
| | - Matthias Horn
- Department of Microbiology and Ecosystems Science, Division of Microbial Ecology, University of Vienna, Vienna, Austria
| | - Thomas Rattei
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
| |
|
33
|
Choi YJ, Tyagi R, McNulty SN, Rosa BA, Ozersky P, Martin J, Hallsworth-Pepin K, Unnasch TR, Norice CT, Nutman TB, Weil GJ, Fischer PU, Mitreva M. Genomic diversity in Onchocerca volvulus and its Wolbachia endosymbiont. Nat Microbiol 2016; 2:16207. [PMID: 27869792 PMCID: PMC5512550 DOI: 10.1038/nmicrobiol.2016.207] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 04/06/2016] [Accepted: 09/19/2016] [Indexed: 01/03/2023]
Abstract
Ongoing elimination efforts have altered the global distribution of Onchocerca volvulus, the agent of river blindness, and further population restructuring is expected as efforts continue. Therefore, a better understanding of population genetic processes and their effect on biogeography is needed to support elimination goals. We describe O. volvulus genome variation in 27 isolates from the early 1990s (before widespread mass treatment) from four distinct locales: Ecuador, Uganda, the West African forest and the West African savanna. We observed genetic substructuring between Ecuador and West Africa and between the West African forest and savanna bioclimes, with evidence of unidirectional gene flow from savanna to forest strains. We identified forest:savanna-discriminatory genomic regions and report a set of ancestry informative loci that can be used to differentiate between forest, savanna and admixed isolates, which has not previously been possible. We observed mito-nuclear discordance possibly stemming from incomplete lineage sorting. The catalogue of the nuclear, mitochondrial and endosymbiont DNA variants generated in this study will support future basic and translational onchocerciasis research, with particular relevance for ongoing control programmes, and boost efforts to characterize drug, vaccine and diagnostic targets.
Affiliation(s)
- Young-Jun Choi
- McDonnell Genome Institute, Washington University in St. Louis, MO, USA
| | - Rahul Tyagi
- McDonnell Genome Institute, Washington University in St. Louis, MO, USA
| | - Bruce A. Rosa
- McDonnell Genome Institute, Washington University in St. Louis, MO, USA
| | - Philip Ozersky
- McDonnell Genome Institute, Washington University in St. Louis, MO, USA
| | - John Martin
- McDonnell Genome Institute, Washington University in St. Louis, MO, USA
| | - Thomas R. Unnasch
- Global Health Infectious Disease Research Program, Department of Global Health, University of South Florida, Tampa, FL, USA
| | - Carmelle T. Norice
- Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
| | - Thomas B. Nutman
- Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
| | - Gary J. Weil
- Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Peter U. Fischer
- Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Makedonka Mitreva
- McDonnell Genome Institute, Washington University in St. Louis, MO, USA
- Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| |
|
34
|
Gan M, Liu Q, Yang C, Gao Q, Luo T. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis. PLoS One 2016; 11:e0159029. [PMID: 27391214 PMCID: PMC4938208 DOI: 10.1371/journal.pone.0159029] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 04/12/2016] [Accepted: 06/24/2016] [Indexed: 11/18/2022] Open
Abstract
Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB, however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has been proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public database. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections and 45 of them were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates.
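The core test in this abstract, flagging a sample as mixed when its SNVs support more than one evolutionary path, can be sketched in Python under strong simplifying assumptions. The tree encoding (a child-to-parent map), the per-branch SNV sets, the `min_support` threshold, and the function names are hypothetical and are not taken from the paper; each branch is identified by its child node.

```python
from typing import Dict, Optional, Set


def lineage(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the node together with all of its ancestors."""
    path: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        path.add(current)
        current = parent.get(current)
    return path


def is_mixed(sample_snvs: Set[int],
             branch_snvs: Dict[str, Set[int]],
             parent: Dict[str, Optional[str]],
             min_support: int = 1) -> bool:
    """Flag a sample whose SNVs support branches on more than one path."""
    supported = [branch for branch, snvs in branch_snvs.items()
                 if len(snvs & sample_snvs) >= min_support]
    if not supported:
        return False
    # A single strain should only support branches that lie on one
    # root-to-tip path, i.e. the ancestors of the deepest supported branch.
    deepest = max(supported, key=lambda b: len(lineage(b, parent)))
    single_path = lineage(deepest, parent)
    return any(branch not in single_path for branch in supported)
```

A real analysis would also need to account for sequencing error and for heterogeneous calls at low depth, which this sketch ignores.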
Affiliation(s)
- Mingyu Gan
- Key Laboratory of Medical Molecular Virology of Ministries of Education and Health, Institutes of Biomedical Sciences and Institute of Medical Microbiology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Qingyun Liu
- Key Laboratory of Medical Molecular Virology of Ministries of Education and Health, Institutes of Biomedical Sciences and Institute of Medical Microbiology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Chongguang Yang
- Key Laboratory of Medical Molecular Virology of Ministries of Education and Health, Institutes of Biomedical Sciences and Institute of Medical Microbiology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Qian Gao
- Key Laboratory of Medical Molecular Virology of Ministries of Education and Health, Institutes of Biomedical Sciences and Institute of Medical Microbiology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Tao Luo
- Laboratory of Infection and Immunity, School of Basic Medical Science, West China Center of Medical Sciences, Sichuan University, Chengdu, Sichuan, China
| |
|
35
|
Vincent AT, Derome N, Boyle B, Culley AI, Charette SJ. Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money. J Microbiol Methods 2016; 138:60-71. [PMID: 26995332 DOI: 10.1016/j.mimet.2016.02.016] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 09/30/2015] [Revised: 01/26/2016] [Accepted: 02/24/2016] [Indexed: 12/16/2022]
Abstract
The Sanger sequencing method produces relatively long DNA sequences of unmatched quality and has long been considered the gold standard for sequencing DNA. Many improvements of the Sanger method that culminated with fluorescent dyes coupled with automated capillary electrophoresis enabled the sequencing of the first genomes. Nevertheless, using this technology to sequence whole genomes was costly, laborious and time consuming even for genomes that are relatively small in size. A major technological advance was the introduction of next-generation sequencing (NGS) pioneered by 454 Life Sciences in the early part of the 21st century. NGS allowed scientists to sequence thousands to millions of DNA molecules in a single machine run. Since then, new NGS technologies have emerged and existing NGS platforms have been improved, enabling the production of genome sequences at an unprecedented rate as well as broadening the spectrum of NGS applications. The current affordability of generating genomic information, especially with microbial samples, has resulted in a false sense of simplicity that belies the fact that many researchers still consider these technologies a black box. In this review, our objective is to identify and discuss four steps that we consider crucial to the success of any NGS-related project. These steps are: (1) the definition of the research objectives beyond sequencing and appropriate experimental planning, (2) library preparation, (3) sequencing and (4) data analysis. The goal of this review is to give an overview of the process, from sample to analysis, and discuss how to optimize your resources to achieve the most from your NGS-based research. Regardless of the evolution and improvement of the sequencing technologies, these four steps will remain relevant.
Affiliation(s)
- Antony T Vincent
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, QC G1V 4G5, Canada
| | - Nicolas Derome
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biologie, Faculté des sciences et de génie, Université Laval, Quebec City G1V 0A6, Canada
| | - Brian Boyle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
| | - Alexander I Culley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Groupe de Recherche en Écologie Buccale (GREB), Faculté de médecine dentaire, Université Laval, Quebec City, QC G1V 0A6, Canada
| | - Steve J Charette
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, QC G1V 4G5, Canada.
| |
|
36
|
Steenackers HP, Parijs I, Dubey A, Foster KR, Vanderleyden J. Experimental evolution in biofilm populations. FEMS Microbiol Rev 2016; 40:373-97. [PMID: 26895713 PMCID: PMC4852284 DOI: 10.1093/femsre/fuw002] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Accepted: 01/21/2016] [Indexed: 12/19/2022] Open
Abstract
Biofilms are a major form of microbial life in which cells form dense surface associated communities that can persist for many generations. The long-life of biofilm communities means that they can be strongly shaped by evolutionary processes. Here, we review the experimental study of evolution in biofilm communities. We first provide an overview of the different experimental models used to study biofilm evolution and their associated advantages and disadvantages. We then illustrate the vast amount of diversification observed during biofilm evolution, and we discuss (i) potential ecological and evolutionary processes behind the observed diversification, (ii) recent insights into the genetics of adaptive diversification, (iii) the striking degree of parallelism between evolution experiments and real-life biofilms and (iv) potential consequences of diversification. In the second part, we discuss the insights provided by evolution experiments in how biofilm growth and structure can promote cooperative phenotypes. Overall, our analysis points to an important role of biofilm diversification and cooperation in bacterial survival and productivity. Deeper understanding of both processes is of key importance to design improved antimicrobial strategies and diagnostic techniques. This review paper provides an overview of (i) the different experimental models used to study biofilm evolution, (ii) the vast amount of diversification observed during biofilm evolution (including potential causes and consequences) and (iii) recent insights in how growth in biofilms can lead to the evolution of cooperative phenotypes.
Affiliation(s)
- Hans P Steenackers
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Leuven 3001, Belgium
| | - Ilse Parijs
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Leuven 3001, Belgium
| | - Kevin R Foster
- Department of Zoology, University of Oxford, Oxford OX1 3PS, UK; Oxford Centre for Integrative Systems Biology, University of Oxford, Oxford OX1 3QU, UK
| | - Jozef Vanderleyden
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Leuven 3001, Belgium
| |
|
37
|
Kuleshov V, Jiang C, Zhou W, Jahanbani F, Batzoglou S, Snyder M. Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nat Biotechnol 2015; 34:64-9. [PMID: 26655498 PMCID: PMC4884093 DOI: 10.1038/nbt.3416] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 12/13/2014] [Accepted: 10/23/2015] [Indexed: 01/30/2023]
Abstract
Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequence remains a difficult problem. Here, we present an analysis of a human gut microbiome using Tru-seq synthetic long reads combined with new computational tools for metagenomic long-read assembly, variant-calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species of which 51 were not found using short sequence reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1Mbp. Extensive intraspecies variation among microbial strains in the form of haplotypes that span up to hundreds of Kbp can be observed using our approach. Our method incorporates synthetic long-read sequencing technology with standard shotgun approaches to move towards rapid, precise and comprehensive analyses of metagenome and microbiome samples.
Affiliation(s)
- Volodymyr Kuleshov
- Department of Computer Science, Stanford University, Stanford, California, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Chao Jiang
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Wenyu Zhou
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Fereshteh Jahanbani
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Serafim Batzoglou
- Department of Computer Science, Stanford University, Stanford, California, USA
| | - Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| |
|