Qi Z, Wen X. Novel Protein Sequence Comparison Method Based on Transition Probability Graph and Information Entropy. Comb Chem High Throughput Screen 2020; 25:392-400. [PMID: 32875978 DOI: 10.2174/1386207323666200901103001]
[Received: 03/31/2020] [Revised: 07/17/2020] [Accepted: 07/17/2020] [Indexed: 11/22/2022]
Abstract
AIM AND OBJECTIVE
Sequence analysis is one of the foundations of bioinformatics and is widely used to uncover feature metrics hidden in a sequence. Graphical representation of biological sequences is an important tool for sequence analysis. This study was undertaken to develop a new graphical representation of biosequences.
MATERIALS AND METHODS
Transition probabilities are used to describe amino acid combinations in protein sequences. The combinations consist of amino acids that are either directly adjacent to each other or separated by multiple amino acids. A transition probability graph is built from the transition probabilities of these combinations. Next, a map from the transition probability graph to a transition probability vector is defined using the k-order transition probability graph. Transition entropy vectors are then derived from the transition probability vector and information entropy. Finally, the proposed method is applied to two separate datasets: 499 HA genes of H1N1 and 95 coronaviruses.
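The abstract does not give the exact definitions, so the following Python sketch is only one plausible reading of the pipeline, not the authors' implementation. It assumes that the k-order transition probability is the conditional frequency of residue b occurring k positions after residue a, and that each entry of the transition entropy vector is the Shannon entropy of one residue's outgoing transition distribution. The function names, the `max_k` parameter, and the Euclidean distance are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

# The 20 standard amino acids; any other symbol gets no entry in the vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transition_probabilities(seq, k):
    """Estimate P(b | a) for residues a and b that lie k positions apart.

    Assumption: k = 1 means directly adjacent residues, and k > 1 means
    residues separated by k - 1 other residues.
    """
    pair_counts = defaultdict(Counter)
    for a, b in zip(seq, seq[k:]):
        pair_counts[a][b] += 1
    probs = {}
    for a, counter in pair_counts.items():
        total = sum(counter.values())
        probs[a] = {b: n / total for b, n in counter.items()}
    return probs


def transition_entropy_vector(seq, max_k=3):
    """Concatenate per-residue Shannon entropies (in bits) for k = 1..max_k.

    Assumption: the vector has 20 * max_k entries, one per (k, residue) pair.
    Residues with no outgoing transitions contribute an entropy of 0.
    """
    vector = []
    for k in range(1, max_k + 1):
        probs = transition_probabilities(seq, k)
        for a in AMINO_ACIDS:
            row = probs.get(a, {})
            vector.append(sum(-p * math.log2(p) for p in row.values()))
    return vector


def euclidean_distance(u, v):
    """Distance between two entropy vectors (an illustrative choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

Under these assumptions, pairwise distances between the entropy vectors of a set of sequences would form a distance matrix that a standard tree-building method could take as input.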
RESULTS
The phylogenetic trees constructed for each application are consistent with the results of other studies.
CONCLUSION
The graphical representation proposed in this article is a practical and correct method.