Liu B, Long R, Chou KC. iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework.
Bioinformatics 2016;32:2411-8. [PMID: 27153623 DOI: 10.1093/bioinformatics/btw186]
[Citation(s) in RCA: 174] [Impact Index Per Article: 21.8] [Received: 01/22/2016] [Accepted: 04/03/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION
Regulatory DNA elements are associated with DNase I hypersensitive sites (DHSs). Accordingly, identification of DHSs will provide useful insights for in-depth investigation into the function of noncoding genomic regions.
RESULTS
In this study, we used an ensemble learning framework to develop a new predictor, called iDHS-EL, for identifying the locations of DHSs in the human genome. It was formed by fusing three individual Random Forest (RF) classifiers into an ensemble predictor. The three RF classifiers were based, respectively, on three special modes of the general pseudo nucleotide composition (PseKNC): (i) kmer, (ii) reverse complement kmer and (iii) pseudo dinucleotide composition. The new predictor was shown to remarkably outperform the relevant state-of-the-art methods in both accuracy and stability.
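As a rough illustration of the first two feature modes named above, the following sketch computes kmer and reverse complement kmer frequency vectors for a DNA sequence and fuses binary predictions by majority vote. The function names, the default k = 2, and the voting rule are assumptions made for illustration; the abstract does not specify the parameters or the fusion rule used in iDHS-EL, and pseudo dinucleotide composition is omitted because it requires physicochemical property tables not given here.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_features(seq, k=2):
    """Normalized frequency of every k-mer, in lexicographic order (4**k values)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[w] / total for w in kmers]


def revc_kmer_features(seq, k=2):
    """Like kmer_features, but each k-mer is merged with its reverse complement."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Represent each k-mer / reverse-complement pair by its smaller member.
    canonical = sorted({min(w, reverse_complement(w)) for w in kmers})
    counts = dict.fromkeys(canonical, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in COMPLEMENT for base in word):
            counts[min(word, reverse_complement(word))] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in canonical]


def fuse_votes(votes):
    """Majority vote over binary labels (1 = DHS, 0 = non-DHS); an assumed fusion rule."""
    return 1 if sum(votes) * 2 > len(votes) else 0
```

In a setup like the one described, each feature vector would feed its own classifier, and `fuse_votes` would combine the per-classifier labels into one prediction. For k = 2 the kmer vector has 16 entries and the reverse complement kmer vector has 10, since complementary pairs such as AC and GT share one entry.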
AVAILABILITY AND IMPLEMENTATION
For the convenience of most experimental scientists, a web server for iDHS-EL has been established at http://bioinformatics.hitsz.edu.cn/iDHS-EL. It is the first web-server predictor established for identifying DHSs, and it lets users obtain their desired results without going through the mathematical details. We anticipate that iDHS-EL will become a very useful high-throughput tool for genome analysis.
CONTACT
bliu@gordonlifescience.org or bliu@insun.hit.edu.cn
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.