Suvorova YM, Korotkov EV. New Method for Potential Fusions Detection in Protein-Coding Sequences.
J Comput Biol 2019;26:1253-1261. [PMID: 31211597 DOI: 10.1089/cmb.2019.0122]
Abstract
Gene fusion is known to be one of the mechanisms of new gene formation. Most bioinformatics methods for studying fused genes rely on sequence similarity search. However, if the ancestral sequences were lost during evolution or have diverged too much, such methods cannot detect the fusion. We previously developed a method for finding triplet periodicity (TP) change points in protein-coding sequences (CDS) and showed that this phenomenon may be related to gene formation through fusion. In this study, we improved the TP change point detection method and analyzed the genes of six eukaryotic genomes. At a type I error probability of 2%-3%, TP change points were found in 20%-40% of genes. Further analysis showed that about 30% of the TP change points can be explained by amino acid repeats. Another 30% may correspond to fused genes for which the BLAST program detected alignments. We believe that the remaining cases may be fused genes whose ancestral sequences have been lost. The improved method is more sensitive to TP changes and found up to two to three times more significant TP change points than our previous method.
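To make the idea of a TP change point concrete, here is a minimal Python sketch. It is not the authors' method, since the abstract does not state their statistic. It assumes a simple stand-in: count each nucleotide at codon positions 1, 2 and 3 (a 3 x 4 matrix) on either side of every codon-aligned split, then compare the two matrices with a chi-square homogeneity statistic. Significance is estimated by shuffling whole codons. All names (`tp_matrix`, `homogeneity_chi2`, `best_split`, `permutation_p_value`, `min_codons`) are hypothetical. Because the statistic compares whole count matrices, it also reacts to plain compositional shifts, not only to changes in periodicity.

```python
import random

NUCLEOTIDES = "ACGT"


def tp_matrix(seq):
    """3x4 counts: rows are codon positions 1-3, columns are A, C, G, T.

    Assumes seq starts at a codon boundary, so i % 3 is the codon position.
    Characters other than A/C/G/T are ignored.
    """
    m = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        j = NUCLEOTIDES.find(base)
        if j >= 0:
            m[i % 3][j] += 1
    return m


def homogeneity_chi2(a, b):
    """Chi-square statistic for a 2 x 12 table built from two 3x4 matrices."""
    cells_a = [x for row in a for x in row]
    cells_b = [x for row in b for x in row]
    n_a, n_b = sum(cells_a), sum(cells_b)
    if n_a == 0 or n_b == 0:
        return 0.0
    total = n_a + n_b
    chi2 = 0.0
    for xa, xb in zip(cells_a, cells_b):
        col = xa + xb
        if col == 0:
            continue  # empty cell in both segments contributes nothing
        for obs, n in ((xa, n_a), (xb, n_b)):
            exp = n * col / total
            chi2 += (obs - exp) ** 2 / exp
    return chi2


def best_split(seq, min_codons=10):
    """Scan codon-aligned splits; return (position, statistic) of the maximum."""
    best_pos, best_stat = None, 0.0
    for pos in range(3 * min_codons, len(seq) - 3 * min_codons + 1, 3):
        stat = homogeneity_chi2(tp_matrix(seq[:pos]), tp_matrix(seq[pos:]))
        if stat > best_stat:
            best_pos, best_stat = pos, stat
    return best_pos, best_stat


def permutation_p_value(seq, n_perm=200, min_codons=10, seed=0):
    """Estimate how often shuffled codon orders reach the observed maximum."""
    seq = seq[: len(seq) - len(seq) % 3]  # keep whole codons only
    _, observed = best_split(seq, min_codons)
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(codons)  # keeps codons intact, destroys any change point
        _, stat = best_split("".join(codons), min_codons)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy CDS whose codon-position pattern switches abruptly after 30 codons.
    toy = "GAT" * 30 + "TCG" * 30
    print(best_split(toy))
    print(permutation_p_value(toy, n_perm=50))
```

Shuffling whole codons keeps the overall 3 x 4 matrix of the sequence unchanged, so the null sequences retain their triplet structure while losing any single change point; the resulting p-value is only a rough analogue of the type I error level quoted in the abstract.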