1
Liu YY, Chen CC. A machine learning-based typing scheme refinement for Listeria monocytogenes core genome multilocus sequence typing with high discriminatory power for common source outbreak tracking. PLoS One 2021; 16:e0260293. [PMID: 34797875 PMCID: PMC8604304 DOI: 10.1371/journal.pone.0260293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2021] [Accepted: 11/05/2021] [Indexed: 11/18/2022]
Abstract
Background: As whole-genome sequencing of pathogen genomes becomes increasingly popular, gene-by-gene typing methods, such as core genome multilocus sequence typing (cgMLST) and whole-genome multilocus sequence typing (wgMLST), are being routinely implemented in molecular epidemiology. However, some intrinsic problems remain. For example, differences in read depth, read length, and assembler influence the genome assemblies, introducing erroneous or missing alleles into the generated allelic profiles. These errors and missing alleles might create “specious discrepancy” among closely related isolates, making accurate epidemiological interpretation challenging. In addition, the rapid growth of the cgMLST allelic profile database can cause problems related to storage and maintenance as well as long query search times.
Methods: We attempted to resolve these issues by decreasing the scheme size to reduce the occurrence of erroneous and missing alleles, alleviate the storage burden, and improve the query search time. The challenge in this approach is maintaining the typing resolution when using fewer loci. We achieved this by using a popular machine learning technique, XGBoost, coupled with Shapley additive explanations for feature selection. In total, 370 loci were selected from the original 1701 cgMLST loci of Listeria monocytogenes.
Results: Although the final scheme (LmScheme_370) was approximately 80% smaller than the original cgMLST scheme, its discriminatory power, tested on 35 outbreaks, was concordant with that of the original cgMLST scheme. Although we used L. monocytogenes as a demonstration in this study, the approach can be applied to other schemes and pathogens. Our findings might help elucidate gene-by-gene–based epidemiology.
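The effect of a missing allele on gene-by-gene comparison can be illustrated with a minimal sketch. This is not the authors' pipeline: the locus names, allele numbers, the two distance functions, and the reduced locus list are all hypothetical, and the XGBoost and Shapley-based selection step is not reproduced here. The sketch only shows how an uncalled allele can produce a spurious difference between two otherwise identical profiles, and how a smaller scheme that omits the affected locus avoids it.

```python
from typing import Dict, Optional, Sequence

# A profile maps locus name -> allele number; None marks a missing (uncalled) allele.
Profile = Dict[str, Optional[int]]


def naive_distance(a: Profile, b: Profile, loci: Sequence[str]) -> int:
    """Count loci whose entries differ, treating a missing allele as a mismatch."""
    return sum(1 for locus in loci if a.get(locus) != b.get(locus))


def allelic_distance(a: Profile, b: Profile, loci: Sequence[str]) -> int:
    """Count loci where both alleles are called and the allele numbers differ."""
    return sum(
        1
        for locus in loci
        if a.get(locus) is not None
        and b.get(locus) is not None
        and a[locus] != b[locus]
    )


# Hypothetical toy scheme; real cgMLST schemes have hundreds to thousands of loci.
full_scheme = ["locus_1", "locus_2", "locus_3", "locus_4", "locus_5"]
# Hypothetical reduced scheme (hand-picked here, not selected by any model).
reduced_scheme = ["locus_1", "locus_3", "locus_4"]

isolate_a: Profile = {"locus_1": 1, "locus_2": 4, "locus_3": 2, "locus_4": 7, "locus_5": 3}
# Same strain as isolate_a, but locus_2 was not called in this assembly.
isolate_b: Profile = {"locus_1": 1, "locus_2": None, "locus_3": 2, "locus_4": 7, "locus_5": 3}
# A genuinely different strain.
isolate_c: Profile = {"locus_1": 2, "locus_2": 4, "locus_3": 5, "locus_4": 7, "locus_5": 3}

print(naive_distance(isolate_a, isolate_b, full_scheme))       # spurious difference
print(allelic_distance(isolate_a, isolate_b, full_scheme))     # missing allele ignored
print(naive_distance(isolate_a, isolate_b, reduced_scheme))    # affected locus not in scheme
print(allelic_distance(isolate_a, isolate_c, full_scheme))
print(allelic_distance(isolate_a, isolate_c, reduced_scheme))
```

In this toy case the reduced scheme separates the unrelated isolate as well as the full scheme does, which is the property the abstract reports for LmScheme_370 at a much larger scale; whether that holds for real data depends entirely on which loci are retained.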
Affiliation(s)
- Yen-Yi Liu
  - Department of Public Health, China Medical University, Taichung, Taiwan
- Chih-Chieh Chen
  - Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan
  - Rapid Screening Research Center for Toxicology and Biomedicine, National Sun Yat-sen University, Kaohsiung, Taiwan
2
Huang CH, Chen CC, Liou JS, Lee AY, Blom J, Lin YC, Huang L, Watanabe K. Genome-based reclassification of Lactobacillus casei: emended classification and description of the species Lactobacillus zeae. Int J Syst Evol Microbiol 2020; 70:3755-3762. [DOI: 10.1099/ijsem.0.003969] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/17/2022]
Abstract
Taxonomic relationships between Lactobacillus casei, Lactobacillus paracasei and Lactobacillus zeae have long been debated. Results of previous analyses have shown that overall genome relatedness indices (average nucleotide identity and core nucleotide identity) between the type strains L. casei ATCC 393T and L. zeae ATCC 15820T were 94.6 and 95.3 %, respectively, which are borderline for species definition. However, the digital DNA-DNA hybridization value was 57.3 %, which was clearly lower than the species delineation threshold of 70 %, and hence raised the possibility that L. casei could be reclassified into two species. To re-evaluate the taxonomic relationship of these taxa, the following analyses were carried out: multilocus sequence analysis (MLSA) based on the concatenated sequences of five housekeeping genes (dnaJ, dnaK, mutL, pheS and yycH), phylogenomic and core genome multilocus sequence typing analyses, gene presence and absence profiling using pan-genome analysis, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) profiling, cellular fatty acid composition analysis, and phenotype analysis. The results of phenotypic characterization, MLSA, whole-genome sequence-based analyses and MALDI-TOF MS profiling justified an independent species designation for the L. zeae strains, and supported an emended description of Lactobacillus zeae (ex Kuznetsov 1956) Dicks et al. 1996, with ATCC 15820T (=DSM 20178T=BCRC 17942T) as the type strain.
Affiliation(s)
- Chien-Hsun Huang
  - Bioresource Collection and Research Center, Food Industry Research and Development Institute, 331 Shih-Pin Rd, Hsinchu 30062, Taiwan, ROC
- Chih-Chieh Chen
  - Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung 80424, Taiwan, ROC
  - General Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan, ROC
  - Rapid Screening Research Center for Toxicology and Biomedicine, National Sun Yat-sen University, Kaohsiung 80424, Taiwan, ROC
- Jong-Shian Liou
  - Bioresource Collection and Research Center, Food Industry Research and Development Institute, 331 Shih-Pin Rd, Hsinchu 30062, Taiwan, ROC
- Ai-Yun Lee
  - Bioresource Collection and Research Center, Food Industry Research and Development Institute, 331 Shih-Pin Rd, Hsinchu 30062, Taiwan, ROC
- Jochen Blom
  - Bioinformatics and Systems Biology, Justus-Liebig-University Giessen, Giessen, 35392, Germany
- Yu-Chun Lin
  - Livestock Research Institute, Council of Agriculture, Executive Yuan, Tainan, Taiwan, ROC
- Lina Huang
  - Bioresource Collection and Research Center, Food Industry Research and Development Institute, 331 Shih-Pin Rd, Hsinchu 30062, Taiwan, ROC
- Koichi Watanabe
  - Bioresource Collection and Research Center, Food Industry Research and Development Institute, 331 Shih-Pin Rd, Hsinchu 30062, Taiwan, ROC
  - Department of Animal Science and Technology, College of Bioresources and Agriculture, National Taiwan University, Taipei 10673, Taiwan, ROC