Wu CY, Li QZ, Feng ZX. Non-coding RNA identification based on topology secondary structure and reading frame in organelle genome level. Genomics 2015; 107:9-15. [PMID: 26697761 DOI: 10.1016/j.ygeno.2015.12.002]
[Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/16/2015] [Revised: 12/08/2015] [Accepted: 12/12/2015] [Indexed: 10/22/2022]
Abstract
Non-coding RNA (ncRNA) genes are transcribed just as protein-coding genes are, but ncRNAs function directly as RNAs rather than serving as blueprints for proteins. Because the function of an ncRNA is closely related to the organelle genome it comes from, identifying its provenance is a useful step toward exploring its function. In this paper, topological secondary structure, motifs, and triplets under the three reading frames are used as features of ncRNAs. A support vector machine (SVM) combined with the increment of diversity (ID) algorithm is used to construct the classifier. When the method is applied to an ncRNA dataset with less than 80% sequence identity, the overall accuracies reach 95.57% and 96.40% in five-fold cross-validation and the jackknife test, respectively. Further, on the independent testing dataset, the average prediction success rate of the method reaches 93.24%. These high prediction success rates indicate that the method is helpful for distinguishing ncRNAs from various organelle genomes.
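As an illustration of two ingredients named in the abstract, the sketch below counts triplets under each of the three reading frames and computes an increment of diversity between two count vectors. It is not the authors' implementation. It assumes the commonly used definition D(X) = N ln N - sum(n_i ln n_i) and ID(X, Y) = D(X + Y) - D(X) - D(Y), with natural logarithms. The function names (`frame_triplet_counts`, `diversity`, `increment_of_diversity`) are hypothetical, and the example sequence is made up.

```python
import math
from collections import Counter
from itertools import product

# All 64 RNA triplets in a fixed order, so count vectors are comparable.
TRIPLETS = ["".join(p) for p in product("ACGU", repeat=3)]


def frame_triplet_counts(seq, frame):
    """Count non-overlapping triplets read from offset `frame` (0, 1 or 2).

    DNA input is accepted: T is mapped to U. Triplets containing
    characters other than A, C, G, U are ignored.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return [counts.get(t, 0) for t in TRIPLETS]


def diversity(counts):
    """Assumed diversity measure: D = N ln N - sum(n_i ln n_i)."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """Assumed ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


if __name__ == "__main__":
    query = "AUGGCCAUG"  # made-up sequence, for illustration only
    frames = [frame_triplet_counts(query, f) for f in range(3)]
    # ID of each frame against frame 0; frame 0 against itself gives 0.
    print([round(increment_of_diversity(frames[0], f), 3) for f in frames])
```

In a pipeline of the kind the abstract describes, ID values computed against reference count vectors for each organelle class could be passed to an SVM as input features. How the paper combines them with the structure and motif features is not stated in the abstract.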