1
Yu D, Andersson-Li M, Maes S, Andersson-Li L, Neumann NF, Odlare M, Jonsson A. Development of a logic regression-based approach for the discovery of host- and niche-informative biomarkers in Escherichia coli and their application for microbial source tracking. Appl Environ Microbiol 2024; 90:e0022724. [PMID: 38940567 PMCID: PMC11267920 DOI: 10.1128/aem.00227-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 06/07/2024] [Indexed: 06/29/2024] Open
Abstract
Microbial source tracking leverages a wide range of approaches designed to trace the origins of fecal contamination in aquatic environments. Although source tracking methods are typically employed within the laboratory setting, computational techniques can be leveraged to advance microbial source tracking methodology. Herein, we present a logic regression-based supervised learning approach for the discovery of source-informative genetic markers within intergenic regions across the Escherichia coli genome that can be used for source tracking. With just single intergenic loci, logic regression was able to identify highly source-specific (i.e., exceeding 97.00%) biomarkers for a wide range of host and niche sources, with sensitivities reaching as high as 30.00% to 50.00% for certain source categories, including pig, sheep, mouse, and wastewater, depending on the specific intergenic locus analyzed. Restricting the source range to reflect the most prominent zoonotic sources of E. coli transmission (i.e., bovine, chicken, human, and pig) allowed for the generation of informative biomarkers for all host categories, with specificities of at least 90.00% and sensitivities between 12.50% and 70.00%, using the sequence data from key intergenic regions, including emrKY-evgAS, ibsB-(mdtABCD-baeSR), ompC-rcsDB, and yedS-yedR, that appear to be involved in antibiotic resistance. Remarkably, we were able to use this approach to classify 48 out of 113 river water E. coli isolates collected in Northwestern Sweden as either beaver, human, or reindeer in origin with a high degree of consensus, highlighting the potential of logic regression modeling as a novel approach for augmenting current source tracking efforts.
IMPORTANCE The presence of microbial contaminants, particularly from fecal sources, within water poses a serious risk to public health. The health and economic burden of waterborne pathogens can be substantial; as such, the ability to detect and identify the sources of fecal contamination in environmental waters is crucial for the control of waterborne diseases. This can be accomplished through microbial source tracking, which involves the use of various laboratory techniques to trace the origins of microbial pollution in the environment. Building on current source tracking methodology, we describe a novel workflow that uses logic regression, a supervised machine learning method, to discover genetic markers in Escherichia coli, a common fecal indicator bacterium, that can be used for source tracking efforts. Importantly, our research provides an example of how the rise in prominence of machine learning algorithms can be applied to improve upon current microbial source tracking methodology.
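The abstract above reports marker performance as sensitivity and specificity. As a reading aid, the sketch below shows how those two figures are conventionally computed for a single presence/absence biomarker against a target source. It is a minimal illustration and not the authors' pipeline; the function name and the toy isolate data are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    marker_present: Sequence[bool],
    is_target_source: Sequence[bool],
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a binary marker for one source.

    sensitivity = TP / (TP + FN): share of target-source isolates carrying the marker.
    specificity = TN / (TN + FP): share of non-target isolates lacking the marker.
    """
    if len(marker_present) != len(is_target_source):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(marker_present, is_target_source))
    tp = sum(m and t for m, t in pairs)
    fn = sum((not m) and t for m, t in pairs)
    tn = sum((not m) and (not t) for m, t in pairs)
    fp = sum(m and (not t) for m, t in pairs)

    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy data: 4 isolates from the target source, 6 from other sources.
marker = [True, True, False, False, False, False, False, False, False, True]
target = [True, True, True, True, False, False, False, False, False, False]

sens, spec = sensitivity_specificity(marker, target)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

A marker can therefore be highly specific while still missing many isolates from its source, which is the pattern the abstract describes (specificity above 97% with sensitivities of 30% to 50%).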
Affiliation(s)
- Daniel Yu
  - School of Public Health, University of Alberta, Edmonton, Alberta, Canada
- Sharon Maes
  - Department of Natural Sciences, Design and Sustainable Development, Mid Sweden University, Östersund, Sweden
- Lili Andersson-Li
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solna, Sweden
- Norman F. Neumann
  - School of Public Health, University of Alberta, Edmonton, Alberta, Canada
- Monica Odlare
  - Department of Natural Sciences, Design and Sustainable Development, Mid Sweden University, Östersund, Sweden
- Anders Jonsson
  - Department of Natural Sciences, Design and Sustainable Development, Mid Sweden University, Östersund, Sweden
2
Yu D, Stothard P, Neumann NF. Emergence of potentially disinfection-resistant, naturalized Escherichia coli populations across food- and water-associated engineered environments. Sci Rep 2024; 14:13478. [PMID: 38866876 PMCID: PMC11169474 DOI: 10.1038/s41598-024-64241-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 06/06/2024] [Indexed: 06/14/2024] Open
Abstract
The Escherichia coli species is comprised of several 'ecotypes' inhabiting a wide range of host and natural environmental niches. Recent studies have suggested that novel naturalized ecotypes have emerged across wastewater treatment plants and meat processing facilities. Phylogenetic and multilocus sequence typing analyses clustered naturalized wastewater and meat plant E. coli strains into two main monophyletic clusters corresponding to the ST635 and ST399 sequence types, with several serotypes identified by serotyping, potentially representing distinct lineages that have naturalized across wastewater treatment plants and meat processing facilities. This evidence, taken alongside ecotype prediction analyses that distinguished the naturalized strains from their host-associated counterparts, suggests these strains may collectively represent a novel ecotype that has recently emerged across food- and water-associated engineered environments. Interestingly, pan-genomic analyses revealed that the naturalized strains exhibited an abundance of biofilm formation, defense, and disinfection-related stress resistance genes, but lacked various virulence and colonization genes, indicating that their naturalization has come at the cost of fitness in the original host environment.
Affiliation(s)
- Daniel Yu
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Antimicrobial Resistance-One Health Consortium, Calgary, AB, Canada
- Paul Stothard
  - Department of Agriculture, Food and Nutritional Sciences, University of Alberta, Edmonton, AB, Canada
- Norman F Neumann
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Antimicrobial Resistance-One Health Consortium, Calgary, AB, Canada
3
Lagerstrom KM, Hadly EA. Under-Appreciated Phylogroup Diversity of Escherichia coli within and between Animals at the Urban-Wildland Interface. Appl Environ Microbiol 2023:e0014223. [PMID: 37191541 DOI: 10.1128/aem.00142-23] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/17/2023] Open
Abstract
Wild animals have been implicated as reservoirs and even "melting pots" of pathogenic and antimicrobial-resistant bacteria of concern to human health. Though Escherichia coli is common among vertebrate guts and plays a role in the propagation of such genetic information, few studies have explored its diversity beyond humans or the ecological factors that influence its diversity and distribution in wild animals. We characterized an average of 20 E. coli isolates per scat sample (n = 84) from a community of 14 wild and 3 domestic species. The phylogeny of E. coli comprises 8 phylogroups that are differentially associated with pathogenicity and antibiotic resistance, and we uncovered all of them in one small biological preserve surrounded by intense human activity. Challenging previous assumptions that a single isolate is representative of within-host phylogroup diversity, 57% of individual animals sampled carried multiple phylogroups simultaneously. Host species' phylogroup richness saturated at different levels across species and encapsulated vast within-sample and within-species variation, indicating that distribution patterns are influenced both by isolation source and laboratory sampling depth. Using ecological methods that ensure statistical relevance, we identify trends in phylogroup prevalence associated with host and environmental factors. The vast genetic diversity and broad distribution of E. coli in wildlife populations have implications for biodiversity conservation, agriculture, and public health, as well as for gauging unknown risks at the urban-wildland interface. We propose critical directions for future studies of the "wild side" of E. coli that will expand our understanding of its ecology and evolution beyond the human environment.
IMPORTANCE To our knowledge, neither the phylogroup diversity of E. coli within individual wild animals nor that within an interacting multispecies community has previously been assessed. In doing so, we uncovered the globally known phylogroup diversity from an animal community on a preserve embedded in a human-dominated landscape. We revealed that the phylogroup composition in domestic animals differed greatly from that in their wild counterparts, implying potential human impacts on the domestic animal gut. Significantly, many wild individuals hosted multiple phylogroups simultaneously, indicating the potential for strain-mixing and zoonotic spillback, especially as human encroachment into wildlands increases in the Anthropocene. We reason that due to extensive anthropogenic environmental contamination, wildlife is increasingly exposed to our waste, including E. coli and antibiotics. The gaps in the ecological and evolutionary understanding of E. coli thus necessitate a significant uptick in research to better understand human impacts on wildlife and the risk for zoonotic pathogen emergence.
Affiliation(s)
- Elizabeth A Hadly
  - Department of Biology, Stanford University, Stanford, California, USA
  - Jasper Ridge Biological Preserve, Stanford University, Stanford, California, USA
  - Center for Innovation in Global Health, Stanford University, Stanford, California, USA
4
Yu D, Ryu K, Zhi S, Otto SJG, Neumann NF. Naturalized Escherichia coli in Wastewater and the Co-evolution of Bacterial Resistance to Water Treatment and Antibiotics. Front Microbiol 2022; 13:810312. [PMID: 35707173 PMCID: PMC9189398 DOI: 10.3389/fmicb.2022.810312] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/08/2021] [Accepted: 05/09/2022] [Indexed: 12/30/2022] Open
Abstract
Antibiotic resistance represents one of the most pressing concerns facing public health today. While the current antibiotic resistance crisis has been driven primarily by the anthropogenic overuse of antibiotics in human and animal health, recent efforts have revealed several important environmental dimensions underlying this public health issue. Antibiotic resistant (AR) microbes, AR genes, and antibiotics have all been found to be widespread in natural environments, reflecting the ancient origins of this phenomenon. In addition, modern societal advancements in sanitation engineering (i.e., sewage treatment) have also contributed to the dissemination of resistance, and concerningly, may also be promoting the evolution of resistance to water treatment. This is reflected in the recent characterization of naturalized wastewater strains of Escherichia coli, strains that appear to be adapted to live in wastewater (and meat packing plants). These strains carry a plethora of stress-resistance genes against common treatment processes, such as chlorination, heat, UV light, and advanced oxidation, mechanisms which potentially facilitate their survival during sewage treatment. These strains also carry an abundance of common antibiotic resistance genes, and evidence suggests that resistance to some antibiotics is linked to resistance to treatment (e.g., tetracycline resistance and chlorine resistance). As such, these naturalized E. coli populations may be co-evolving resistance against both antibiotics and water treatment. Recently, extraintestinal pathogenic strains of E. coli (ExPEC) have also been shown to exhibit phenotypic resistance to water treatment, seemingly associated with the presence of various shared genetic elements with naturalized wastewater E. coli. Consequently, some pathogenic microbes may also be evolving resistance to the two most important public health interventions for controlling infectious disease in modern society: antibiotic therapy and water treatment.
Affiliation(s)
- Daniel Yu
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Antimicrobial Resistance – One Health Consortium, Calgary, AB, Canada
- Kanghee Ryu
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Antimicrobial Resistance – One Health Consortium, Calgary, AB, Canada
- Shuai Zhi
  - School of Medicine, Ningbo University, Ningbo, China
  - The Affiliated Hospital of Medical School, Ningbo University, Ningbo, China
- Simon J. G. Otto
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Antimicrobial Resistance – One Health Consortium, Calgary, AB, Canada
  - Human-Environment-Animal Transdisciplinary Antimicrobial Resistance Research Group, School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Healthy Environments, Centre for Health Communities, School of Public Health, University of Alberta, Edmonton, AB, Canada
- Norman F. Neumann
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Antimicrobial Resistance – One Health Consortium, Calgary, AB, Canada
5
Yu D, Banting G, Neumann NF. A review of the taxonomy, genetics, and biology of the genus Escherichia and the type species Escherichia coli. Can J Microbiol 2021; 67:553-571. [PMID: 33789061 DOI: 10.1139/cjm-2020-0508] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 01/15/2023]
Abstract
Historically, bacteriologists have relied heavily on biochemical and structural phenotypes for bacterial taxonomic classification. However, advances in comparative genomics have led to greater insights into the remarkable genetic diversity within the microbial world, and even within well-accepted species such as Escherichia coli. The extraordinary genetic diversity in E. coli recapitulates the evolutionary radiation of this species in exploiting a wide range of niches (i.e., ecotypes), including the gastrointestinal system of diverse vertebrate hosts as well as non-host natural environments (soil, natural waters, wastewater), which drives the adaptation, natural selection, and evolution of intragenotypic conspecific specialism as a strategy for survival. Over the last few years, there has been increasing evidence that many E. coli strains are very host (or niche)-specific. While biochemical and phylogenetic evidence support the classification of E. coli as a distinct species, the vast genomic (diverse pan-genome and intragenotypic variability), phenotypic (e.g., metabolic pathways), and ecotypic (host-/niche-specificity) diversity, comparable to the diversity observed in known species complexes, suggest that E. coli is better represented as a complex. Herein we review the taxonomic classification of the genus Escherichia and discuss how phenotype, genotype, and ecotype recapitulate our understanding of the biology of this remarkable bacterium.
Affiliation(s)
- Daniel Yu
  - School of Public Health, University of Alberta, Edmonton, AB T6G 1C9, Canada
- Graham Banting
  - School of Public Health, University of Alberta, Edmonton, AB T6G 1C9, Canada
- Norman F Neumann
  - School of Public Health, University of Alberta, Edmonton, AB T6G 1C9, Canada
6
Garabetian F, Vitte I, Sabourin A, Moussard H, Jouanillou A, Mornet L, Lesne M, Lyautey E. Uneven genotypic diversity of Escherichia coli in fecal sources limits the performance of a library-dependent method of microbial source tracking on the southwestern French Atlantic coast. Can J Microbiol 2020; 66:698-712. [PMID: 32730720 DOI: 10.1139/cjm-2020-0244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
To develop a library-dependent method of tracking fecal sources of contamination of beaches on the Atlantic coast of southwestern France, a library of 6368 Escherichia coli isolates was constructed from samples of feces, from 40 known human or animal sources collected in the vicinity of Arcachon Bay in 2010, and in French Basque Country, Landes, and Béarn, between 2017 and 2018. Different schemes of source identification were tested: use of the complete or filtered reference library; characterization of the isolates by genotypic or proteomic profiling based on ERIC-PCR or MALDI-TOF mass spectrometry, respectively; isolate by isolate assignment using either classifiers based on the Pearson similarity or SVM (support vector machine). With the exception of one source identification scheme, which was discarded since it used self-assignment, all tested schemes resulted in low rates of correct classification (<35%) and significant rates of incorrect classification (>15%). The heterogeneous coverage of E. coli genotypic diversity between sources and the uneven distribution of E. coli genotypes in the library likely explain the difficulties encountered in identifying the sources of fecal contamination. Shannon diversity index of sources ranged from 0 for several wildlife species sampled once to 3.03 for sewage treatment plant effluents sampled on various occasions, showing discrepancies between sources. The uneven genotypic composition of the library was attested by the value of the Pielou index (0.54), the high proportion of nondiscriminatory genotypes (>91% of the isolates), and the very low proportion of discriminatory genotypes (<3%). Since efforts made to constitute such a library are not affordable for routine analyses, the results question the relevance of developing such a method for identifying sources of fecal contamination on such a coastline.
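The abstract above summarizes library composition with the Shannon diversity index and the Pielou evenness index. The sketch below shows the standard definitions, H = sum over genotypes of p_i ln(1/p_i) and J = H / ln(S), where S is the number of distinct genotypes. It assumes natural logarithms (the abstract does not state the base), and the function names and toy genotype labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_index(genotypes: Iterable[Hashable]) -> float:
    """Shannon diversity H (natural log) of a collection of genotype labels."""
    counts = Counter(genotypes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * ln(1/p) summed over genotypes; equals 0.0 when only one genotype is present.
    return sum((n / total) * math.log(total / n) for n in counts.values())


def pielou_evenness(genotypes: Iterable[Hashable]) -> float:
    """Pielou evenness J = H / ln(S); returned as 0.0 by convention when S <= 1."""
    labels = list(genotypes)
    richness = len(set(labels))
    if richness <= 1:
        return 0.0  # J is undefined for a single genotype; 0.0 is an assumed convention
    return shannon_index(labels) / math.log(richness)


# Hypothetical toy sources: one dominated by a single genotype, one perfectly even.
single = ["g1"] * 5
even = ["g1", "g2", "g3", "g4"]

print(shannon_index(single))   # a single genotype gives H = 0
print(pielou_evenness(even))   # equal abundances give J = 1 (up to rounding)
```

A source whose isolates all share one genotype has H = 0, which matches the value the abstract reports for wildlife species sampled once, and an evenness well below 1 indicates a library dominated by a few genotypes.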
Affiliation(s)
- Isabelle Vitte
  - Laboratoires des Pyrénées et des Landes, F-64150 Lagor, France
- Antoine Sabourin
  - Université de Bordeaux, CNRS, EPOC, EPHE, UMR 5805, F-33600 Pessac, France
  - Laboratoires des Pyrénées et des Landes, F-64150 Lagor, France
- Hélène Moussard
  - Université de Bordeaux, CNRS, EPOC, EPHE, UMR 5805, F-33600 Pessac, France
- Line Mornet
  - Université de Bordeaux, CNRS, EPOC, EPHE, UMR 5805, F-33600 Pessac, France
- Mélanie Lesne
  - Laboratoires des Pyrénées et des Landes, F-64150 Lagor, France
- Emilie Lyautey
  - Université Savoie Mont Blanc, INRAE, CARRTEL, 74200 Thonon-les-Bains, France
7
Loayza F, Graham JP, Trueba G. Factors Obscuring the Role of E. coli from Domestic Animals in the Global Antimicrobial Resistance Crisis: An Evidence-Based Review. Int J Environ Res Public Health 2020; 17:E3061. [PMID: 32354184 PMCID: PMC7246672 DOI: 10.3390/ijerph17093061] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 03/29/2020] [Revised: 04/21/2020] [Accepted: 04/21/2020] [Indexed: 01/01/2023]
Abstract
Recent studies have found limited associations between antimicrobial resistance (AMR) in domestic animals (and animal products), and AMR in human clinical settings. These studies have primarily used Escherichia coli, a critically important bacterial species associated with significant human morbidity and mortality. E. coli is found in domestic animals and the environment, and it can be easily transmitted between these compartments. Additionally, the World Health Organization has highlighted E. coli as a "highly relevant and representative indicator of the magnitude and the leading edge of the global antimicrobial resistance (AMR) problem". In this paper, we discuss the weaknesses of current research that aims to link E. coli from domestic animals to the current AMR crisis in humans. Fundamental gaps remain in our understanding of the complexities of E. coli population genetics and the magnitude of phenomena such as horizontal gene transfer (HGT) or DNA rearrangements (transposition and recombination). The dynamic and intricate interplay between bacterial clones, plasmids, transposons, and genes likely blurs the evidence of AMR transmission from E. coli in domestic animals to human microbiota and vice versa. We describe key factors that are frequently neglected when carrying out studies of AMR sources and transmission dynamics.
Affiliation(s)
- Fernanda Loayza
  - Microbiology Institute, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito, Diego de Robles y Pampite, Cumbayá-Quito P.O. BOX 170901, Ecuador
- Jay P. Graham
  - Berkeley School of Public Health, University of California, 2121 Berkeley Way, Room 5302, Berkeley, CA 94720-7360, USA
- Gabriel Trueba
  - Microbiology Institute, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito, Diego de Robles y Pampite, Cumbayá-Quito P.O. BOX 170901, Ecuador
8
Zhi S, Banting G, Stothard P, Ashbolt NJ, Checkley S, Meyer K, Otto S, Neumann NF. Evidence for the evolution, clonal expansion and global dissemination of water treatment-resistant naturalized strains of Escherichia coli in wastewater. Water Res 2019; 156:208-222. [PMID: 30921537 DOI: 10.1016/j.watres.2019.03.024] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 08/10/2018] [Revised: 03/12/2019] [Accepted: 03/15/2019] [Indexed: 06/09/2023]
Abstract
We previously demonstrated the existence of naturalized strains of E. coli in wastewater and herein perform an in-depth comparative whole genome analysis of these strains (n = 17). Fourteen of the Canadian E. coli strains, isolated from geographically separated wastewater treatment plants, were virtually identical at the core genome and were ≥96% similar at the whole genome level, suggesting clonal relatedness among these isolates. Remarkably, these strains were shown to be extremely similar to the genome of an E. coli isolated from wastewater in Switzerland, suggesting a global distribution of these strains. The genomes of three other Canadian wastewater strains were more diverse but very similar to the genomes of E. coli isolates collected from U.S. wastewater samples. Based on maximum likelihood phylogenetic analysis, wastewater strains from Canada, the U.S. and Switzerland formed a clade separate from other known enteric phylogroups (i.e., A, B1, B2, D, E) and the cryptic clades. All Canadian, Swiss and U.S. wastewater strains possessed a common SNP biomarker pattern across their genomes, and a sub-population (i.e., 14 Canadian and 1 Swiss strain) also possessed a previously identified wastewater-specific marker known as the uspC-IS30-flhDC element. Biochemical heat mapping of 518 categories of genes recapitulated phylogeny, with wastewater strains phenotypically clustering separately from enteric and cryptic clades. Wastewater strains were enriched for stress-response genes (i.e., nutrient acquisition/deprivation, DNA repair, oxidative stress, and UV resistance), elements reflective of their environmental survival challenges. Wastewater strains were shown to carry a plethora of known antibiotic resistance (AR) genes, the patterns of which were remarkably similar among all Canadian, U.S. and Swiss wastewater strains. Virulence gene composition was also similar among all the wastewater strains, with an abundant representation of virulence genes commonly associated with urinary pathogenic E. coli (UPEC) as well as enterohemorrhagic (EHEC) E. coli. The remarkable degree of similarity between all wastewater strains from Canada, Switzerland and the U.S. suggests the evolution and global dissemination of a water treatment-resistant clone of E. coli. These findings, along with others, raise some important concerns about the potential for emergence of E. coli pathotypes resistant to water treatment.
Affiliation(s)
- Shuai Zhi
  - School of Public Health, Room 3-57, South Academic Building, University of Alberta, Edmonton, Alberta, T6G 2G7, Canada
- Graham Banting
  - School of Public Health, Room 3-57, South Academic Building, University of Alberta, Edmonton, Alberta, T6G 2G7, Canada
- Paul Stothard
  - Faculty of Agricultural, Life and Environmental Sciences, 1400 College Plaza, University of Alberta, Edmonton, Alberta, Canada
- Nicholas J Ashbolt
  - School of Public Health, Room 3-57, South Academic Building, University of Alberta, Edmonton, Alberta, T6G 2G7, Canada
- Sylvia Checkley
  - Faculty of Veterinary Medicine, Department of Ecosystem and Public Health, University of Calgary, Calgary, Alberta, Canada
- Kelsey Meyer
  - Faculty of Veterinary Medicine, Department of Ecosystem and Public Health, University of Calgary, Calgary, Alberta, Canada
- Simon Otto
  - School of Public Health, Room 3-57, South Academic Building, University of Alberta, Edmonton, Alberta, T6G 2G7, Canada
- Norman F Neumann
  - School of Public Health, Room 3-57, South Academic Building, University of Alberta, Edmonton, Alberta, T6G 2G7, Canada
9
Tietz T, Selinski S, Golka K, Hengstler JG, Gripp S, Ickstadt K, Ruczinski I, Schwender H. Identification of interactions of binary variables associated with survival time using survivalFS. Arch Toxicol 2019; 93:585-602. [PMID: 30694373 DOI: 10.1007/s00204-019-02398-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2018] [Accepted: 01/16/2019] [Indexed: 12/01/2022]
Abstract
Many medical studies aim to identify factors associated with a time to an event such as survival time or time to relapse. Often, in particular when binary variables are considered in such studies, interactions of these variables might be the actual relevant factors for predicting, e.g., the time to recurrence of a disease. Testing all possible interactions is often not possible, so procedures such as logic regression, which avoid such an exhaustive search, are required. In this article, we present an ensemble method based on logic regression that can cope with the instability of the regression models generated by logic regression. This procedure, called survivalFS, also provides measures for quantifying the importance of the interactions forming the logic regression models on the time to an event and for the assessment of the individual variables that take the multivariate data structure into account. In this context, we introduce a new performance measure, which is an adaptation of Harrell's concordance index. The performance of survivalFS and the proposed importance measures is evaluated in a simulation study as well as in an application to genotype data from a urinary bladder cancer study. Furthermore, we compare the performance of survivalFS and its importance measures for the individual variables with the variable importance measure used in random survival forests, a popular procedure for the analysis of survival data. These applications show that survivalFS is able to identify interactions associated with time to an event and to outperform random survival forests.
Affiliation(s)
- Tobias Tietz
  - Mathematical Institute, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Silvia Selinski
  - Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund University, IfADo, 44139, Dortmund, Germany
- Klaus Golka
  - Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund University, IfADo, 44139, Dortmund, Germany
- Jan G Hengstler
  - Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund University, IfADo, 44139, Dortmund, Germany
- Stephan Gripp
  - Department of Radiation Oncology, Heinrich Heine University Hospital, 44225, Düsseldorf, Germany
- Katja Ickstadt
  - Faculty of Statistics, TU Dortmund University, 44221, Dortmund, Germany
- Ingo Ruczinski
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205, USA
- Holger Schwender
  - Mathematical Institute, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
10
Quainoo S, Coolen JPM, van Hijum SAFT, Huynen MA, Melchers WJG, van Schaik W, Wertheim HFL. Whole-Genome Sequencing of Bacterial Pathogens: the Future of Nosocomial Outbreak Analysis. Clin Microbiol Rev 2017; 30:1015-1063. [PMID: 28855266 PMCID: PMC5608882 DOI: 10.1128/cmr.00016-17] [Citation(s) in RCA: 228] [Impact Index Per Article: 32.6] [Indexed: 12/14/2022] Open
Abstract
Outbreaks of multidrug-resistant bacteria present a frequent threat to vulnerable patient populations in hospitals around the world. Intensive care unit (ICU) patients are particularly susceptible to nosocomial infections due to indwelling devices such as intravascular catheters, drains, and intratracheal tubes for mechanical ventilation. The increased vulnerability of infected ICU patients demonstrates the importance of having effective outbreak management protocols in place. Understanding the transmission of pathogens via genotyping methods is an important tool for outbreak management. Recently, whole-genome sequencing (WGS) of pathogens has become more accessible and affordable as a tool for genotyping. Analysis of the entire pathogen genome via WGS could provide unprecedented resolution in discriminating even highly related lineages of bacteria and revolutionize outbreak analysis in hospitals. Nevertheless, clinicians have long been hesitant to implement WGS in outbreak analyses due to the expensive and cumbersome nature of early sequencing platforms. Recent improvements in sequencing technologies and analysis tools have rapidly increased the output and analysis speed as well as reduced the overall costs of WGS. In this review, we assess the feasibility of WGS technologies and bioinformatics analysis tools for nosocomial outbreak analyses and provide a comparison to conventional outbreak analysis workflows. Moreover, we review advantages and limitations of sequencing technologies and analysis tools and present a real-world example of the implementation of WGS for antimicrobial resistance analysis. We aimed to provide health care professionals with a guide to WGS outbreak analysis that highlights its benefits for hospitals and assists in the transition from conventional to WGS-based outbreak analysis.
Affiliation(s)
- Scott Quainoo
  - Department of Microbiology, Radboud University, Nijmegen, The Netherlands
- Jordy P M Coolen
  - Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
- Sacha A F T van Hijum
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
  - NIZO, Ede, The Netherlands
- Martijn A Huynen
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- Willem J G Melchers
  - Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
- Willem van Schaik
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Heiman F L Wertheim
  - Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
|
11
|
Lorenz MW, Abdi NA, Scheckenbach F, Pflug A, Bülbül A, Catapano AL, Agewall S, Ezhov M, Bots ML, Kiechl S, Orth A. Automatic identification of variables in epidemiological datasets using logic regression. BMC Med Inform Decis Mak 2017; 17:40. [PMID: 28407816 PMCID: PMC5390441 DOI: 10.1186/s12911-017-0429-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2016] [Accepted: 03/23/2017] [Indexed: 12/11/2022] Open
Abstract
Background: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g., using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve the data quality. For semi-automation, high sensitivity in the recognition of matching variables is particularly important, because it allows the creation of software that, for a target variable, presents a choice of source variables from which a user can choose the matching one, with only a low risk of having missed a correct source variable. Methods: For each variable in a set of target variables, a number of simple rules were manually created. With logic regression, an optimal Boolean combination of these rules was searched for every target variable, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). In a second subset of this database (validation subset), these optimal rule combinations were validated. Results: In the construction sample, 41 target variables were allocated on average with a positive predictive value (PPV) of 34% and a negative predictive value (NPV) of 95%. In the validation sample, PPV was 33%, whereas NPV remained at 94%. In the construction sample, PPV was 50% or less in 63% of all variables; in the validation sample, this was the case in 71% of all variables. Conclusions: We demonstrated that the application of logic regression in a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies. Electronic supplementary material: The online version of this article (doi:10.1186/s12911-017-0429-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Matthias W Lorenz
  - Department of Neurology, University Clinic Frankfurt, Schleusenweg 2-16, D-60528, Frankfurt/Main, Germany
- Negin Ashtiani Abdi
  - Faculty of Computer Science and Engineering, Frankfurt University of Applied Sciences, Frankfurt/Main, Germany
- Frank Scheckenbach
  - Department of Neurology, University Clinic Frankfurt, Schleusenweg 2-16, D-60528, Frankfurt/Main, Germany
- Anja Pflug
  - Department of Neurology, University Clinic Frankfurt, Schleusenweg 2-16, D-60528, Frankfurt/Main, Germany
- Alpaslan Bülbül
  - Department of Neurology, University Clinic Frankfurt, Schleusenweg 2-16, D-60528, Frankfurt/Main, Germany
- Alberico L Catapano
  - IRCSS Multimedica, Milan, Italy
  - Department of Pharmacological and Biomolecular Sciences, University of Milan, Milan, Italy
- Stefan Agewall
  - Institute of Clinical Sciences, University of Oslo, Oslo, Norway
  - Department of Cardiology, Oslo University Hospital Ullevål, Oslo, Norway
- Marat Ezhov
  - Atherosclerosis Department, Cardiology Research Center, Moscow, Russia
- Michiel L Bots
  - University Medical Center Utrecht, Utrecht, The Netherlands
  - Department of Epidemiology and Biostatistics, Erasmus Medical Center, Rotterdam, The Netherlands
- Stefan Kiechl
  - Department of Neurology, Medical University Innsbruck, Innsbruck, Austria
- Andreas Orth
  - Faculty of Computer Science and Engineering, Frankfurt University of Applied Sciences, Frankfurt/Main, Germany
|
12
|
Zhi S, Li Q, Yasui Y, Banting G, Edge TA, Topp E, McAllister TA, Neumann NF. An evaluation of logic regression-based biomarker discovery across multiple intergenic regions for predicting host specificity in Escherichia coli. Mol Phylogenet Evol 2016; 103:133-142. [DOI: 10.1016/j.ympev.2016.07.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/10/2016] [Revised: 06/23/2016] [Accepted: 07/14/2016] [Indexed: 10/21/2022]
|
13
|
Evidence of Naturalized Stress-Tolerant Strains of Escherichia coli in Municipal Wastewater Treatment Plants. Appl Environ Microbiol 2016; 82:5505-18. [PMID: 27371583 PMCID: PMC5007776 DOI: 10.1128/aem.00143-16] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 01/27/2016] [Accepted: 06/23/2016] [Indexed: 01/06/2023] Open
Abstract
Escherichia coli has been proposed to have two habitats: the intestines of mammals/birds and the nonhost environment. Our goal was to assess whether certain strains of E. coli have evolved toward adaptation and survival in wastewater. Raw sewage samples from different treatment plants were subjected to chlorine stress, and ∼59% of the surviving E. coli strains were found to contain a genetic insertion element (IS30) located within the uspC-flhDC intergenic region. The positional location of the IS30 element was not observed across a library of 845 E. coli isolates collected from various animal hosts or within GenBank or whole-genome reference databases for human and animal E. coli isolates (n = 1,177). Phylogenetics clustered the IS30 element-containing wastewater E. coli isolates into a distinct clade, and biomarker analysis revealed that these wastewater isolates contained a single nucleotide polymorphism (SNP) biomarker pattern that was specific for wastewater. These isolates belonged to phylogroup A, possessed generalized stress response (RpoS) activity, and carried the locus of heat resistance, features likely relevant to nonhost environmental survival. Isolates were screened for 28 virulence genes but carried only the fimH marker. Our data suggest that wastewater contains a naturalized resident population of E. coli. We developed an endpoint PCR targeting the IS30 element within the uspC-flhDC intergenic region, and all raw sewage samples (n = 21) were positive for this marker. Conversely, the prevalence of this marker in E. coli-positive surface and groundwater samples was low (≤5%). This simple PCR assay may represent a convenient microbial source-tracking tool for identification of water samples affected by municipal wastewater. IMPORTANCE The results of this study demonstrate that some strains of E. coli appear to have evolved to become naturalized populations in the wastewater environment and possess a number of stress-related genetic elements likely important for survival in this nonhost environment. The presence of non-host-adapted strains in wastewater challenges our understanding of using E. coli as a microbial indicator of wastewater treatment performance, suggesting that the E. coli strains present in human and animal feces may be very different from those found in treated wastewater.
|