Ju N, Liu J, He Q. SNP-slice resolves mixed infections: simultaneously unveiling strain haplotypes and linking them to hosts.
Bioinformatics 2024;
40:btae344. [PMID:
38885409 PMCID:
PMC11187496 DOI:
10.1093/bioinformatics/btae344]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Revised: 05/09/2024] [Accepted: 06/14/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION
Multi-strain infection is a common yet under-investigated phenomenon of many pathogens. Currently, biologists analyzing SNP information sometimes have to discard mixed infection samples as many downstream analyses require monogenomic inputs. Such a protocol impedes our understanding of the underlying genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A scalable tool to learn and resolve the SNP-haplotypes from polygenomic data is an urgent need in molecular epidemiology.
RESULTS
We develop a slice sampling Markov Chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP-haplotypes of all strains in the populations but also which strains infect which hosts. Our method reconstructs SNP-haplotypes and individual heterozygosities accurately without reference panels and outperforms the state-of-the-art methods at estimating the multiplicity of infections and allele frequencies. Thus, SNP-Slice introduces a novel approach to address polygenomic data and opens a new avenue for resolving complex infection patterns in molecular surveillance. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets and provide recommendations for using our method on empirical datasets.
AVAILABILITY AND IMPLEMENTATION
The implementation of the SNP-Slice algorithm, as well as scripts to analyze SNP-Slice outputs, are available at https://github.com/nianqiaoju/snp-slice.
Collapse