Amin N, McGrath A, Chen YPP. FexRNA: Exploratory Data Analysis and Feature Selection of Non-Coding RNA. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2795-2801. [PMID: 33539302 DOI: 10.1109/tcbb.2021.3057128]
[Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Non-coding RNA (ncRNA) is involved in many biological processes and diseases in all species. Many ncRNA datasets provide data in FASTA format, which is well suited for biomedical purposes. However, for ncRNA analysis and classification, statistical learning methods require numerical features that are hidden in the sequence data. Furthermore, a wealth of sequence-intrinsic features has been proposed in the literature for ncRNA identification. Extracting these hidden features, analysing them, and choosing a suitable feature set are crucial for the performance of any statistical learning method. To address these challenges, we generated 96 feature datasets from widely used ncRNA features. The feature datasets are based on RNACentral, cover species, ncRNA types, and expert databases, and are available on the FexRNA platform. Additionally, the feature datasets are explored and analysed to provide statistical information together with univariate and bivariate analyses. We sought to determine which of the 17 features considered would be most appropriate for developing ncRNA classification approaches. For feature selection (FS), a two-phase hierarchical FS framework based on correlation and majority voting is proposed and evaluated on 5 species. The FexRNA platform provides information about ncRNA feature analysis and selection.
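To illustrate what deriving numerical features from FASTA records can look like, the following is a minimal Python sketch. It is not the FexRNA implementation: the abstract does not name its 17 features, so sequence length, GC content, and dinucleotide frequencies are assumed here as typical sequence-intrinsic features, and the function names and example records are hypothetical.

```python
from collections import Counter
from itertools import product


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Normalise to upper-case RNA letters (T -> U).
            chunks.append(line.upper().replace("T", "U"))
    if header is not None:
        yield header, "".join(chunks)


def sequence_features(seq, k=2):
    """Return a dict of assumed features: length, GC content, k-mer frequencies."""
    length = len(seq)
    counts = Counter(seq)
    features = {
        "length": length,
        "gc_content": (counts["G"] + counts["C"]) / length if length else 0.0,
    }
    # Overlapping k-mers, normalised by the number of windows.
    windows = max(length - k + 1, 0)
    kmers = Counter(seq[i:i + k] for i in range(windows))
    for kmer in ("".join(p) for p in product("ACGU", repeat=k)):
        features[f"freq_{kmer}"] = kmers[kmer] / windows if windows else 0.0
    return features


# Hypothetical two-record FASTA input, not taken from RNACentral.
EXAMPLE = ">seq1 hypothetical record\nGGCUA\nGC\n>seq2\nAUAU\n"

# One feature row per sequence, keyed by the first token of the header.
rows = {h.split()[0]: sequence_features(s) for h, s in parse_fasta(EXAMPLE)}
```

Each row is a fixed-length numerical vector (18 values with k=2), which is the kind of input a statistical learning method or a correlation-based feature selection step would consume.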