A new graph-theoretic approach to determine the similarity of genome sequences based on nucleotide triplets.
Genomics 2020; 112:4701-4714. [PMID: 32827671 PMCID: PMC7437474 DOI: 10.1016/j.ygeno.2020.08.023]
[Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/09/2019] [Revised: 07/15/2020] [Accepted: 08/17/2020] [Indexed: 11/22/2022]
Abstract
Methods for measuring sequence similarity play a significant role in computational biology. Owing to the rapid growth in the number of genome sequences in public databases, determining the evolutionary relationships among species has become more challenging. Traditional alignment-based methods are ill-suited to this task because they are time-consuming. A faster method that can be applied to species phylogeny is therefore needed. In this paper, a new graph-theory-based, alignment-free sequence comparison method is proposed. Each genome sequence is represented by a complete bipartite graph built from its nucleotide triplets. A vector descriptor is then formed from the weights of the graph's edges. Finally, the phylogenetic tree is constructed using the UPGMA algorithm. The method is evaluated on datasets of mammals, viruses, and bacteria. In most cases, the resulting phylogeny is more satisfactory than those obtained with earlier methods.
A new graph-theory based alignment-free genome sequence comparison.
Use of a complete bipartite graph to represent genome sequences.
Descriptor based on the weights of the edges of the graph.
Comparison of the phylogenetic trees of different mammals, viruses, and bacteria.
Lower time complexity than earlier methods.
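The pipeline summarized above (triplet-based descriptor, pairwise distances, UPGMA tree) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the descriptor here is simply the normalized count of each of the 64 overlapping nucleotide triplets, used as a stand-in for the paper's edge-weight vector (the actual bipartite-graph construction is not given in the abstract); the distance is Euclidean; and the function names and toy sequences are hypothetical. The only fixed part is that UPGMA is average-linkage clustering with size-weighted distance updates.

```python
from itertools import product
from math import sqrt

# All 64 nucleotide triplets in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_descriptor(seq):
    """Normalized counts of overlapping triplets.

    Assumption: a stand-in for the paper's edge-weight descriptor,
    whose exact construction is not described in the abstract.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TRIPLETS]


def euclidean(u, v):
    """Euclidean distance between two descriptors (an assumed choice)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree.

    `dist` is a square matrix (list of lists) aligned with `names`.
    """
    trees = {i: names[i] for i in range(len(names))}
    sizes = {i: 1 for i in trees}
    d = {(i, j): dist[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        merged_size = sizes[i] + sizes[j]
        for k in trees:
            if k in (i, j):
                continue
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # Size-weighted average keeps distances equal to the mean
            # over all leaf pairs, which is what defines UPGMA.
            d[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / merged_size
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        merged_tree = (trees[i], trees[j])
        for old in (i, j):
            del trees[old]
            del sizes[old]
        trees[next_id] = merged_tree
        sizes[next_id] = merged_size
        next_id += 1
    return next(iter(trees.values()))


# Toy sequences (hypothetical, for illustration only).
sequences = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTACGA",
    "c": "GGGGGGGGGGGG",
}
names = list(sequences)
vectors = [triplet_descriptor(sequences[n]) for n in names]
matrix = [[euclidean(u, v) for v in vectors] for u in vectors]
tree = upgma(names, matrix)
print(tree)
```

Because the descriptor is a fixed-length vector computed in a single pass over each sequence, no alignment is needed, and the cost of comparing two genomes does not depend on their lengths once the descriptors are built. Swapping in a different descriptor or distance only requires replacing `triplet_descriptor` or `euclidean`.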