Zheng X, Qin Y, Wang J. A Poisson model of sequence comparison and its application to coronavirus phylogeny.
Math Biosci 2008;217:159-66. [PMID: 19073197 PMCID: PMC7094598 DOI: 10.1016/j.mbs.2008.11.006]
[Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 01/23/2008] [Revised: 09/30/2008] [Accepted: 11/14/2008] [Indexed: 11/18/2022]
Abstract
In this paper, we propose two metrics for comparing DNA and protein sequences, both based on a Poisson model of word occurrences. Instead of comparing the frequencies of all fixed-length words in two sequences, we consider (1) the probability of ‘generating’ one sequence under the Poisson model estimated from the other, and (2) the difference in the expression levels of words between the two sequences. To illustrate the approach, we construct phylogenetic trees of 25 viruses, including SARS-CoVs.
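The first idea in the abstract (scoring one sequence under a Poisson word model estimated from the other) can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes independent Poisson counts for each word, a per-position rate estimated from the other sequence, and additive smoothing. The function names, the word length `k`, the `pseudo` constant, and the symmetrised per-position normalisation are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import product


def word_counts(seq, k):
    """Count overlapping k-letter words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def poisson_log_likelihood(target, source, k=2, alphabet="ACGT", pseudo=0.5):
    """Log-probability of target's word counts under Poisson rates
    estimated from source (assumed model, not taken from the paper)."""
    c_t = word_counts(target, k)
    c_s = word_counts(source, k)
    n_t = len(target) - k + 1  # number of word positions in target
    n_s = len(source) - k + 1  # number of word positions in source
    n_words = len(alphabet) ** k
    ll = 0.0
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        # Smoothed per-position rate from source, scaled to target's length.
        rate = (c_s[w] + pseudo) / (n_s + pseudo * n_words)
        lam = rate * n_t
        x = c_t[w]
        # log of the Poisson pmf: x*log(lam) - lam - log(x!)
        ll += x * math.log(lam) - lam - math.lgamma(x + 1)
    return ll


def poisson_dissimilarity(a, b, k=2, alphabet="ACGT"):
    """Symmetrised negative log-likelihood per word position (hypothetical)."""
    d_ab = -poisson_log_likelihood(a, b, k, alphabet) / (len(a) - k + 1)
    d_ba = -poisson_log_likelihood(b, a, k, alphabet) / (len(b) - k + 1)
    return 0.5 * (d_ab + d_ba)
```

Under these assumptions the score is symmetric but not zero for identical sequences, so it is a dissimilarity rather than a true metric. A matrix of such pairwise values could be passed to a standard distance-based tree-building method. For protein sequences, `alphabet` would be the 20 amino-acid letters.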