1
Carriço JA, Rossi M, Moran-Gilad J, Van Domselaar G, Ramirez M. A primer on microbial bioinformatics for nonbioinformaticians. Clin Microbiol Infect 2018; 24:342-349. [PMID: 29309933 DOI: 10.1016/j.cmi.2017.12.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/15/2017] [Revised: 11/13/2017] [Accepted: 12/22/2017] [Indexed: 01/19/2023]
Abstract
BACKGROUND Presently, the bottleneck in the deployment of high-throughput sequencing technology is the ability to analyse the increasing amount of data produced in a fit-for-purpose manner. The field of microbial bioinformatics is thriving and quickly adapting to technological changes, which creates difficulties for nonbioinformaticians in following the complexity and increasingly obscure jargon of this field. AIMS This review is directed towards nonbioinformaticians who wish to gain understanding of the overall microbial bioinformatic processes, from raw data obtained from sequencers to final outputs. SOURCES The software and analytical strategies reviewed are based on the personal experience of the authors. CONTENT The bioinformatic process of transforming raw reads into actionable information in a clinical and epidemiologic context is explained. We review the advantages and limitations of two major strategies currently applied: read mapping, which is the comparison with a predefined reference genome, and de novo assembly, which is the unguided assembly of the raw data. Finally, we discuss the main analytical methodologies and the most frequently used freely available software and its application in the context of bacterial infectious disease management. IMPLICATIONS High-throughput sequencing technologies are overhauling outbreak investigation and epidemiologic surveillance while creating new challenges due to the amount and complexity of data generated. The continuously evolving field of microbial bioinformatics is required for stakeholders to fully harness the power of these new technologies.
Affiliation(s)
- J A Carriço
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal.
- M Rossi
- Department of Food Hygiene and Environmental Health, Faculty of Veterinary Medicine, University of Helsinki, Helsinki, Finland
- J Moran-Gilad
- Department of Health Systems Management, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Public Health Services, Ministry of Health, Jerusalem, Israel; ESCMID Study Group for Genomic and Molecular Diagnostics (ESGMD), Basel, Switzerland
- G Van Domselaar
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington St, Winnipeg, MB, R3E 3R2, Canada; Department of Medical Microbiology and Infectious Diseases, University of Manitoba, 745 Bannatyne Avenue, Winnipeg, MB, R3E 0J9, Canada
- M Ramirez
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
2
Carriço JA, Sabat AJ, Friedrich AW, Ramirez M. Bioinformatics in bacterial molecular epidemiology and public health: databases, tools and the next-generation sequencing revolution. Euro Surveill 2013; 18:20382. [PMID: 23369390 DOI: 10.2807/ese.18.04.20382-en] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 11/20/2022]
Abstract
Advances in typing methodologies have been the driving force in the field of molecular epidemiology of pathogens. The development of molecular methodologies, and more recently of DNA sequencing methods to complement and improve phenotypic identification methods, was accompanied by the generation of large amounts of data and the need to develop ways of storing and analysing them. Simultaneously, advances in computing allowed the development of specialised algorithms for image analysis, data sharing and integration, and for mining the ever larger amounts of accumulated data. In this review, we will discuss how bioinformatics accompanied the changes in bacterial molecular epidemiology. We will discuss the benefits for public health of specialised online typing databases and algorithms allowing for real-time data analysis and visualisation. The impact of the new and disruptive next-generation sequencing methodologies will be evaluated, and we will look ahead into these novel challenges.
Affiliation(s)
- J A Carriço
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal.
3
Carriço JA, Silva-Costa C, Melo-Cristino J, Pinto FR, de Lencastre H, Almeida JS, Ramirez M. Illustration of a common framework for relating multiple typing methods by application to macrolide-resistant Streptococcus pyogenes. J Clin Microbiol 2006; 44:2524-32. [PMID: 16825375 PMCID: PMC1489512 DOI: 10.1128/jcm.02536-05] [Citation(s) in RCA: 242] [Impact Index Per Article: 13.4] [Indexed: 11/20/2022]
Abstract
The studies that correlate the results obtained by different typing methodologies rely solely on qualitative comparisons of the groups defined by each methodology. We propose a framework of measures for the quantitative assessment of correspondences between different typing methods as a first step to the global mapping of type equivalences. A collection of 325 macrolide-resistant Streptococcus pyogenes isolates associated with pharyngitis cases in Portugal was used to benchmark the proposed measures. All isolates were characterized by macrolide resistance phenotyping, T serotyping, emm sequence typing, and pulsed-field gel electrophoresis (PFGE), using SmaI or Cfr9I and SfiI. A subset of 41 isolates, representing each PFGE cluster, was also characterized by multilocus sequence typing (MLST). The application of Adjusted Rand and Wallace indices allowed the evaluation of the strength and the directionality of the correspondences between the various typing methods and showed that if PFGE or MLST data are available one can confidently predict the emm type (Wallace coefficients of 0.952 for both methods). In contrast, emm typing was a poor predictor of PFGE cluster or MLST sequence type (Wallace coefficients of 0.803 and 0.655, respectively). This was confirmed by the analysis of the larger data set available from http://spyogenes.mlst.net and underscores the necessity of performing PFGE or MLST to unambiguously define clones in S. pyogenes.
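As a rough illustration of the two measures named in this abstract, the sketch below computes the Wallace coefficient and the Adjusted Rand index from pair counts. The function names and the toy PFGE and emm labels are hypothetical and are not data from the study; this is a minimal sketch, not the authors' implementation.

```python
from collections import Counter
from math import comb


def pair_counts(a, b):
    """Count isolate pairs grouped together by both methods, by a, by b, and all pairs."""
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    in_a = sum(comb(c, 2) for c in Counter(a).values())
    in_b = sum(comb(c, 2) for c in Counter(b).values())
    return both, in_a, in_b, comb(len(a), 2)


def wallace(a, b):
    """Probability that two isolates sharing a type in method a also share one in method b."""
    both, in_a, _, _ = pair_counts(a, b)
    if in_a == 0:
        raise ValueError("method a groups no pair of isolates together")
    return both / in_a


def adjusted_rand(a, b):
    """Chance-corrected overall agreement between two partitions (symmetric)."""
    both, in_a, in_b, total = pair_counts(a, b)
    expected = in_a * in_b / total
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        raise ValueError("index undefined for these partitions")
    return (both - expected) / (maximum - expected)


# Hypothetical toy labels for six isolates (not from the paper).
pfge = ["P1", "P1", "P1", "P2", "P2", "P3"]
emm = ["e1", "e1", "e1", "e1", "e1", "e2"]

print(wallace(pfge, emm))        # PFGE cluster predicting emm type
print(wallace(emm, pfge))        # emm type predicting PFGE cluster
print(adjusted_rand(pfge, emm))
```

The Wallace coefficient is directional, which is why the abstract can report PFGE as a good predictor of emm type while emm type is a poor predictor of PFGE cluster.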
Affiliation(s)
- J A Carriço
- Grupo de Biomatemática, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Rua da Quinta Grande 6 2780-156 Oeiras, Portugal.
4
Mato R, Sanches IS, Simas C, Nunes S, Carriço JA, Sousa NG, Frazão N, Saldanha J, Brito-Avô A, Almeida JS, de Lencastre H. Natural history of drug-resistant clones of Streptococcus pneumoniae colonizing healthy children in Portugal. Microb Drug Resist 2006; 11:309-22. [PMID: 16359190 DOI: 10.1089/mdr.2005.11.309] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
A total of 3,539 Streptococcus pneumoniae (Pn) were recovered from 4,969 nasopharyngeal samples of children attending 13 day-care centers (DCCs) located in Lisbon, Portugal, during a surveillance study from January 2001 through March 2003, integrated in the European intervention project (EURIS, European Resistance Intervention Study). All Pn isolates were tested by antibiotyping, and drug-resistant pneumococci (DRPn) were further tested by serotyping and pulsed-field gel electrophoresis (PFGE). Overall carriage of Pn was very high (71.2%) and 39.9% of the isolates were resistant to antimicrobials (22.5% with decreased susceptibility to penicillin and 17.4% susceptible to penicillin and resistant to other antimicrobials). Serotypes 6B, 14, 23F, 19F, and 19A were prevalent among the 1,287 DRPn and 5.8% of the isolates were non-typeable. Eighty PFGE patterns were identified among 1,285 DRPn, and 93.1% of the DRPn belonged to 26 major clonal types that comprised: Pneumococcal Molecular Epidemiology Network (PMEN) clones (76.3%), Portuguese (PT)-DCC clones, previously detected in 1996-1999 (14.3%), and EURIS PT-DCC new clones, identified for the first time in the EURIS study, during 2001-2003 (9.4%). Compared with previous Portuguese surveillance studies carried out since 1996, we observed that carriage increased from 47% to 71%, but no major changes were detected in the prevalence of pneumococcal serotypes. Moreover, although PMEN clones were predominant in all DCCs, in the present study the majority of them were gradually decreasing over time whereas several PT-DCC and new clones seemed to be increasing.
Affiliation(s)
- R Mato
- Instituto de Tecnologia Química e Biológica da Universidade Nova de Lisboa (ITQB/UNL), 2780-156 Oeiras, Portugal
5
Carriço JA, Pinto FR, Simas C, Nunes S, Sousa NG, Frazão N, de Lencastre H, Almeida JS. Assessment of band-based similarity coefficients for automatic type and subtype classification of microbial isolates analyzed by pulsed-field gel electrophoresis. J Clin Microbiol 2005; 43:5483-90. [PMID: 16272474 PMCID: PMC1287802 DOI: 10.1128/jcm.43.11.5483-5490.2005] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.7] [Indexed: 11/20/2022]
Abstract
Pulsed-field gel electrophoresis (PFGE) has been the typing method of choice for strain identification in epidemiological studies of several bacterial species of medical importance. The usual procedure for the comparison of strains and assignment of strain type and subtype relies on visual assessment of band difference number, followed by an incremental assignment to the group hosting the most similar type previously seen. Band-based similarity coefficients, such as the Dice or the Jaccard coefficient, are then used for dendrogram construction, which provides a quantitative assessment of strain similarity. PFGE type assignment is based on the definition of a threshold linkage value, below which strains are assigned to the same group. This is typically performed empirically by inspecting the hierarchical cluster analysis dendrogram containing the strains of interest. This approach has the problem that the threshold value selected is dependent on the linkage method used for dendrogram construction. Furthermore, the use of a linkage method skews the original similarity values between strains. In this paper we assess the goodness of classification of several band-based similarity coefficients by comparing it with the band difference number for PFGE type and subtype classification using receiver operating characteristic curves. The procedure described was applied to a collection of PFGE results for 1,798 isolates of Streptococcus pneumoniae, which documented 96 types and 396 subtypes. The band-based similarity coefficients were found to perform equally well for type classification, but with different proportions of false-positive and false-negative classifications in their minimal false discovery rate when they were used for subtype classification.
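For readers unfamiliar with the coefficients mentioned here, the sketch below shows how the Dice and Jaccard coefficients and the band difference number could be computed for two banding patterns. It assumes, purely for illustration, that each pattern has already been reduced to a set of matched band positions; the function names and the example patterns are hypothetical and do not come from the paper.

```python
def dice(a, b):
    """Dice coefficient: twice the shared bands over the total bands in both patterns."""
    if not a and not b:
        raise ValueError("both patterns are empty")
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient: shared bands over all distinct bands seen in either pattern."""
    if not a and not b:
        raise ValueError("both patterns are empty")
    return len(a & b) / len(a | b)


def band_difference(a, b):
    """Number of bands present in exactly one of the two patterns."""
    return len(a ^ b)


# Hypothetical patterns: integers stand for matched band positions.
strain_x = {1, 2, 3, 4}
strain_y = {3, 4, 5}

print(dice(strain_x, strain_y))             # 2*2 / (4+3)
print(jaccard(strain_x, strain_y))          # 2 / 5
print(band_difference(strain_x, strain_y))  # bands 1, 2 and 5 differ
```

Both coefficients rise with the number of shared bands, but Dice weights shared bands twice, so the same pair of patterns generally receives a higher Dice than Jaccard value.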
Affiliation(s)
- J A Carriço
- Biomathematics Group, Universidade Nova de Lisboa, Rua da Quinta Grande 6, 2780-156 Oeiras, Portugal.
6
Serrano I, Melo-Cristino J, Carriço JA, Ramirez M. Characterization of the genetic lineages responsible for pneumococcal invasive disease in Portugal. J Clin Microbiol 2005; 43:1706-15. [PMID: 15814989 PMCID: PMC1081348 DOI: 10.1128/jcm.43.4.1706-1715.2005] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Indexed: 11/20/2022]
Abstract
The availability of a conjugate vaccine has the potential to reduce the disease burden of pneumococci and to alter the serotype frequency in the disease-causing population through immunoselection. These changes will probably be reflected in the distributions of individual genetic lineages within the population. We present a characterization of a collection of recent (1999 to 2002) invasive isolates from Portugal (n = 465) by macrorestriction profiling with pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing. During this time, serotypes 14, 1, 3, 4, 8, 9V, 23F, 7F, 19A, and 12B were the 10 most prevalent overall by decreasing rank order. By combining the PFGE data with the sequence types (STs) of 104 isolates, we were able to identify the genetic lineages of the majority of the isolates. We found 66 STs, including 20 novel STs, corresponding to 47 different lineages by e-BURST analysis. We found in our collection a number of previously identified internationally disseminated lineages, especially among macrolide-resistant and penicillin-resistant isolates, and these accounted for most of the isolates. Most of the major lineages (17 of 25) were identified in all years of the study, suggesting that the pneumococcal population associated with invasive disease was stable. This study provides a characterization of the pneumococcal population associated with invasive disease that will be useful for detecting potential selective effects of the novel conjugate vaccine.
Affiliation(s)
- I Serrano
- Institute of Molecular Medicine, Lisbon Faculty of Medicine, Lisbon, Portugal
7
Abstract
MOTIVATION Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to find the coordinates for their position in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates such that distance between positions measures similarity between the corresponding sequences. The possibility of using the latter property to identify succession schemes has been entirely overlooked in previous studies, which raises the possibility that CGR may be upgraded from a mere representation technique to a sequence modeling tool. RESULTS The distribution of positions in the CGR plane was shown to be a generalization of Markov chain probability tables that accommodates non-integer orders. Therefore, Markov models are particular cases of CGR models rather than the reverse, as currently accepted. In addition, the CGR generalization has both practical (computational efficiency) and fundamental (scale independence) advantages. These results are illustrated by using Escherichia coli K-12 as a test data-set, in particular, the genes thrA, thrB and thrC of the threonine operon.
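The iterative mapping and the recoverability property described above can be sketched for DNA as follows. The corner assignment (A, C, G, T on the corners of the unit square), the starting point at the centre, and the function names are assumptions made for illustration and are not taken from the paper. Exact fractions are used because floating-point coordinates would lose the early bases of long sequences.

```python
from fractions import Fraction

# Assumed corner layout of the unit square; other layouts work the same way.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
HALF = Fraction(1, 2)


def cgr_positions(seq):
    """Move halfway from the current point to each base's corner, starting at the centre."""
    x = y = HALF
    positions = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        positions.append((x, y))
    return positions


def recover(point, length):
    """Rebuild the last `length` bases from a single CGR coordinate."""
    base_at = {corner: base for base, corner in CORNERS.items()}
    x, y = point
    bases = []
    for _ in range(length):
        # The quadrant holding the point identifies the most recent base.
        cx, cy = int(x > HALF), int(y > HALF)
        bases.append(base_at[(cx, cy)])
        # Undo the halving step to reach the previous position.
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(bases))


seq = "ATGACCGT"
final_point = cgr_positions(seq)[-1]
print(final_point)
print(recover(final_point, len(seq)))  # reproduces seq
```

Under this mapping, two sequences that end in the same k bases land within 2^-k of each other along each axis, which is one way to read the abstract's statement that distance between positions measures similarity.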
Affiliation(s)
- J S Almeida
- ITQB/Universidade Nova Lisboa, PO Box 127, 2780 Oeiras, Portugal.