1
Amirkhah R, Farazmand A, Gupta SK, Ahmadi H, Wolkenhauer O, Schmitz U. Naïve Bayes classifier predicts functional microRNA target interactions in colorectal cancer. Molecular BioSystems 2016; 11:2126-34. [PMID: 26086375 DOI: 10.1039/c5mb00245a] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/26/2022]
Abstract
Alterations in the expression of miRNAs have been extensively characterized in several cancers, including human colorectal cancer (CRC). Recent publications provide evidence for tissue-specific miRNA target recognition. Several computational methods have been developed to predict miRNA targets; however, all of these methods assume a general pattern underlying these interactions and therefore tolerate reduced prediction accuracy and a significant number of false predictions. The motivation underlying the presented work was to unravel the relationship between miRNAs and their target mRNAs in CRC. We developed a novel computational algorithm for miRNA-target prediction in CRC using a Naïve Bayes classifier. The algorithm, referred to as CRCmiRTar, was trained with data from validated miRNA target interactions in CRC and other cancer entities. Furthermore, we identified a set of position-based, sequence, structural, and thermodynamic features that identify CRC-specific miRNA target interactions. Evaluation of the algorithm showed a significant improvement in AUC and sensitivity compared with other widely used machine learning-based algorithms. Based on miRNA and gene expression profiles in CRC tissues with similar clinical and pathological features, our classifier predicted 204 functional interactions, which involve 11 miRNAs and 41 mRNAs in this cancer entity. While the approach is validated here for CRC, disease-specific miRNA target prediction algorithms of this kind can easily be adapted to other applications. The identification of disease-specific miRNA target interactions may also facilitate the identification of potential drug targets.
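To make the classification step in this abstract concrete, the following is a minimal sketch of a Gaussian Naïve Bayes classifier in plain Python. It is not the CRCmiRTar implementation, which the abstract does not describe in detail. The two features (seed match count and duplex free energy), the toy values, and the function names `fit_gaussian_nb` and `predict` are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one feature vector."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical toy data: [seed_matches, duplex_free_energy_kcal_per_mol].
# Label 1 = functional interaction, 0 = non-functional.
X = [[7, -25.0], [8, -28.0], [6, -22.0], [2, -10.0], [3, -12.0], [1, -8.0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
```

Calling `predict(model, [7, -24.0])` classifies a new candidate pair. The "naïve" part is the assumption that features are independent given the class, which is why the log-likelihoods of the individual features are simply summed.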
Affiliation(s)
- Raheleh Amirkhah
- Department of Cell and Molecular Biology, School of Biology, College of Science, University of Tehran, Tehran, Iran
2
Mousavi R, Eftekhari M, Haghighi MG. A new approach to human microRNA target prediction using ensemble pruning and rotation forest. J Bioinform Comput Biol 2016; 13:1550017. [DOI: 10.1142/s0219720015500171] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that have important functions in gene regulation. Since finding miRNA targets experimentally is costly and time-consuming, the use of machine learning methods for miRNA target prediction is a growing research area. In this paper, a new approach is proposed that combines two popular ensemble strategies, Ensemble Pruning and Rotation Forest (EP-RTF), to predict human miRNA targets. For EP, the approach uses a Genetic Algorithm (GA): a subset of classifiers from the heterogeneous ensemble is first selected by the GA. Next, the selected classifiers are trained with the RTF method and then combined using weighted majority voting. In addition to seeking a better subset of classifiers, the GA also optimizes the parameter of RTF. The findings of the present study confirm that the newly developed EP-RTF outperforms previously applied methods in terms of classification accuracy, sensitivity, and specificity on four human miRNA target datasets. Diversity-error diagrams reveal that the proposed ensemble approach constructs individual classifiers that are more accurate and usually more diverse than those of the other ensemble approaches. Given these experimental results, we highly recommend EP-RTF for improving the performance of miRNA target prediction.
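The final combination step named in this abstract, weighted majority voting, can be sketched as follows. This is only an illustration of the voting rule: the GA-based classifier selection and the rotation forest training are omitted, and the function name `weighted_majority_vote` and the idea of using something like validation accuracy as a weight are assumptions, not details taken from the paper.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Combine one label per classifier into a single label.

    predictions: list of class labels, one from each classifier.
    weights: list of non-negative weights, one per classifier
             (hypothetically, each classifier's validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # The label with the largest accumulated weight wins.
    return max(totals, key=totals.get)


# Three hypothetical classifiers vote on one miRNA-target candidate.
strong_minority = weighted_majority_vote([1, 0, 0], [0.9, 0.3, 0.4])
weak_minority = weighted_majority_vote([1, 0, 0], [0.5, 0.3, 0.4])
```

In the first call a single highly weighted classifier overrides two weaker ones (0.9 against 0.7); in the second the two agreeing classifiers prevail (0.7 against 0.5). This is the behaviour that distinguishes weighted voting from a plain head count.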
Affiliation(s)
- Reza Mousavi
- Department of Electrical and Computer Engineering, Graduate University of Advanced Technology, Kerman, Iran
- Mahdi Eftekhari
- Department of Computer Engineering, Shahid Bahonar University of Kerman, Iran
- Mehdi Ghezelbash Haghighi
- Department of Electrical and Computer Engineering, Graduate University of Advanced Technology, Kerman, Iran
3
Yang X, Guo Y, Luo J, Pu X, Li M. Effective identification of Gram-negative bacterial type III secreted effectors using position-specific residue conservation profiles. PLoS One 2013; 8:e84439. [PMID: 24391954 PMCID: PMC3877298 DOI: 10.1371/journal.pone.0084439] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 02/20/2013] [Accepted: 11/07/2013] [Indexed: 11/18/2022] [Open access]
Abstract
BACKGROUND: Type III secretion systems (T3SSs) are central to pathogenesis and specifically deliver their secreted substrates (type III secreted proteins, T3SPs) into host cells. Since T3SPs play a crucial role in pathogen-host interactions, identifying them is essential to our understanding of the pathogenic mechanisms of T3SSs. This study reports a novel and effective method for identifying distinctive residues whose conservation differs from that in other SPs, for use in T3SP prediction. In addition, the importance of several sequence features was evaluated and a promising prediction model was constructed. RESULTS: Based on conservation profiles constructed from a position-specific scoring matrix (PSSM), 52 distinctive residues were identified. To our knowledge, this is the first attempt to identify the distinctive residues of T3SPs. The 52 distinctive residues include all of the first 30 amino acid residues, which is consistent with previous studies reporting that the secretion signal generally occurs within the first 30 residue positions. The remaining 22 positions, which span residues 30-100, were also shown by our method to contain important signal information for T3SP secretion, because the translocation of many effectors also depends on the chaperone-binding residues that follow the secretion signal. For further feature optimisation and compression, permutation importance analysis was conducted to select 62 optimal sequence features. A prediction model across 16 species was developed using random forest to classify T3SPs and non-T3SPs, with a high area under the receiver operating characteristic curve of 0.93 in 10-fold cross-validation and an accuracy of 94.29% on the test set. Moreover, when applied to a common independent dataset, our method outperforms all others published to date. Finally, novel, experimentally confirmed T3 effectors were used to further demonstrate the correct application of the model.
The model and all data used in this paper are freely available at http://cic.scu.edu.cn/bioinformatics/T3SPs.zip.
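The permutation importance analysis mentioned in this abstract can be illustrated with a short, self-contained sketch. It is a generic version of the technique, not the authors' pipeline: the paper uses a random forest, whereas the stand-in `toy_model` below is a hypothetical threshold rule, and the data values and function names are invented for illustration.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: feature 0 determines the label, feature 1 is noise.
X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_model(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
```

Shuffling the noise feature leaves accuracy unchanged, so its importance is zero, while shuffling the informative feature degrades accuracy. Ranking features by this drop and keeping the top ones is the general idea behind selecting a reduced feature set.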
Affiliation(s)
- Xiaojiao Yang
- College of Chemistry, Sichuan University, Chengdu, P.R.China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, P.R.China
- Jiesi Luo
- College of Chemistry, Sichuan University, Chengdu, P.R.China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, P.R.China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, P.R.China
4
Mendoza MR, da Fonseca GC, Loss-Morais G, Alves R, Margis R, Bazzan ALC. RFMirTarget: predicting human microRNA target genes with a random forest classifier. PLoS One 2013; 8:e70153. [PMID: 23922946 PMCID: PMC3724815 DOI: 10.1371/journal.pone.0070153] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 03/10/2013] [Accepted: 06/16/2013] [Indexed: 12/27/2022] [Open access]
Abstract
MicroRNAs are key regulators of eukaryotic gene expression whose fundamental role has already been identified in many cell pathways. The correct identification of miRNA targets is still a major challenge in bioinformatics and has motivated the development of several computational methods to overcome inherent limitations of experimental analysis. Indeed, the best results reported so far in terms of specificity and sensitivity are associated with machine learning-based methods for microRNA-target prediction. Following this trend, in the current paper we discuss and explore a microRNA-target prediction method based on a random forest classifier, namely RFMirTarget. Despite their well-known robustness in general classification tasks, to the best of our knowledge, random forests have not been deeply explored in the specific context of predicting microRNA targets. Our framework first analyzes alignments between candidate microRNA-target pairs and extracts a set of structural, thermodynamic, alignment, seed and position-based features, upon which classification is performed. Experiments have shown that RFMirTarget outperforms several well-known classifiers with statistical significance, and that its performance is not impaired by the class imbalance problem or by feature correlation. Moreover, comparing it against other algorithms for microRNA target prediction using independent test data sets from TarBase and starBase, we observe a very promising performance, with higher sensitivity than other methods. Finally, tests performed with RFMirTarget show the benefits of feature selection even for a classifier with embedded feature importance analysis, and the consistency between the relevant features identified and important biological properties for effective microRNA-target gene alignment.
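As an illustration of what a seed feature extracted from an alignment might look like, the sketch below counts Watson-Crick pairs, G:U wobbles, and mismatches over miRNA positions 2 to 8. This is not RFMirTarget's feature extractor, whose details are not given in the abstract. The function name `seed_features`, the input convention (the target site is supplied already aligned, so that index i pairs with miRNA position i + 1), and the 8-nt sequences are assumptions for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def seed_features(mirna, target_aligned, seed_start=2, seed_end=8):
    """Count pair types over the seed region of an ungapped alignment.

    mirna: miRNA sequence written 5'->3'.
    target_aligned: target site written 3'->5' so that index i pairs with
                    miRNA index i (an assumed, simplified input convention).
    seed_start, seed_end: 1-based inclusive seed positions.
    """
    if len(mirna) < seed_end or len(target_aligned) < seed_end:
        raise ValueError("sequences are shorter than the seed region")
    counts = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    for pos in range(seed_start - 1, seed_end):
        pair = (mirna[pos], target_aligned[pos])
        if pair in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif pair in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Illustrative 8-nt miRNA 5' end and two hypothetical aligned target sites.
mirna = "UGAGGUAG"
perfect_site = "ACUCCAUC"   # fully complementary across positions 2-8
wobble_site = "AUUCCAUC"    # G:U wobble at miRNA position 2

perfect = seed_features(mirna, perfect_site)
with_wobble = seed_features(mirna, wobble_site)
```

Counts like these would form a few columns of the feature vector handed to the classifier, alongside the structural, thermodynamic, and position-based features the abstract lists.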
Affiliation(s)
- Mariana R. Mendoza
- Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Guilherme C. da Fonseca
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Guilherme Loss-Morais
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Ronnie Alves
- Instituto Tecnológico Vale Desenvolvimento Sustentável, Belém, Pará, Brazil
- Rogerio Margis
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Ana L. C. Bazzan
- Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
5
Singh PK, Singh AV, Chauhan DS. Current understanding on micro RNAs and its regulation in response to Mycobacterial infections. J Biomed Sci 2013; 20:14. [PMID: 23448104 PMCID: PMC3599176 DOI: 10.1186/1423-0127-20-14] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 01/08/2013] [Accepted: 02/23/2013] [Indexed: 12/11/2022] [Open access]
Abstract
MicroRNAs (miRNAs) are evolutionarily conserved, naturally abundant, small, regulatory non-coding RNAs that inhibit gene expression at the post-transcriptional level in a sequence-specific manner. Because of their involvement in a broad range of biological processes and diseases, miRNAs are now commanding considerable attention. Although much of the focus has been on the role of miRNAs in different types of cancer, recent evidence also points to a critical role of miRNAs in infectious diseases, including those of bacterial origin. miRNA research is now expanding rapidly as a new thrust area of biomedical research with relevance to deadly bacterial diseases such as tuberculosis (caused by Mycobacterium tuberculosis). The purpose of this review is to highlight current developments in the area of miRNA regulation in mycobacterial diseases, and how these might influence diagnosis, understanding of disease biology, control and management in the future.
Affiliation(s)
- Pravin Kumar Singh
- Department of Microbiology & Molecular Biology, National JALMA Institute for Leprosy & Other Mycobacterial Diseases, Tajganj, Agra UP Pin- 282001, India
6
Xiao J, Tang X, Li Y, Fang Z, Ma D, He Y, Li M. Identification of microRNA precursors based on random forest with network-level representation method of stem-loop structure. BMC Bioinformatics 2011; 12:165. [PMID: 21575268 PMCID: PMC3118167 DOI: 10.1186/1471-2105-12-165] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 07/20/2010] [Accepted: 05/17/2011] [Indexed: 11/17/2022] [Open access]
Abstract
Background: MicroRNAs (miRNAs) play a key role in regulating various biological processes, for example by participating in the post-transcriptional pathway and affecting the stability and/or translation of mRNA. Current methods have extracted feature information at different levels, among which the characteristic stem-loop structure makes the greatest contribution to the prediction of putative miRNA precursors (pre-miRNAs). We find that none of these features alone is capable of identifying new pre-miRNAs accurately. Results: In the present work, a pre-miRNA stem-loop secondary structure is translated into a network, which provides a novel perspective for its structural analysis. Network parameters are used to construct a prediction model, achieving an area under the receiver operating characteristic curve (AUC) of 0.956. Moreover, by repeating the same method on two independent datasets, accuracies of 0.976 and 0.913 are achieved, respectively. Conclusions: Network parameters effectively characterize pre-miRNA secondary structure, which improves our prediction model in both prediction ability and computational efficiency. Additionally, as a complement to the feature extraction methods of previous studies, these multifaceted features can reflect natural properties of miRNAs and be used for comprehensive and systematic analysis of miRNAs.
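One plausible way to translate a stem-loop secondary structure into a network is sketched below. The abstract does not specify the authors' construction or which network parameters they used, so everything here is an assumption: nucleotides are taken as nodes, backbone neighbours and base pairs as edges, the structure is given in dot-bracket notation, and average degree stands in for a network parameter.

```python
def structure_to_network(dot_bracket):
    """Build an undirected edge set from a dot-bracket secondary structure.

    Nodes are nucleotide indices. Edges connect backbone neighbours
    (i, i + 1) and base-paired positions given by matching parentheses.
    """
    edges = set()
    stack = []
    for i, symbol in enumerate(dot_bracket):
        if i > 0:
            edges.add((i - 1, i))  # backbone edge
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            edges.add((stack.pop(), i))  # base-pair edge
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return edges


def average_degree(n_nodes, edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * len(edges) / n_nodes


# Hypothetical hairpin: a 4-bp stem closing a 3-nt loop.
hairpin = "((((...))))"
edges = structure_to_network(hairpin)
mean_degree = average_degree(len(hairpin), edges)
```

For this hairpin the network has 10 backbone edges and 4 base-pair edges. Parameters computed on such a graph could then serve as input features for a classifier such as the random forest named in the title.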
Affiliation(s)
- Jiamin Xiao
- College of Chemistry and State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610064, PR China
7
Feature importance analysis in guide strand identification of microRNAs. Comput Biol Chem 2011; 35:131-6. [PMID: 21704258 DOI: 10.1016/j.compbiolchem.2011.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/22/2011] [Revised: 03/22/2011] [Accepted: 04/23/2011] [Indexed: 11/22/2022]
Abstract
MicroRNA (miRNA), a negative regulator of gene expression, is also known as the guide strand of the transient miRNA:miRNA* duplex. It is critical in maintaining normal physiological processes such as development, differentiation, and apoptosis in many organisms. With increasing miRNA data, it is desirable to design methods that identify the guide strand using machine learning algorithms. In this study, random forest models based on local sequence-structure features were proposed to identify miRNAs in four species. The accuracies achieved were 86.51% for Homo sapiens, 81.66% for Ornithorhynchus anatinus, 82.33% for Mus musculus and 85.71% for Schmidtea mediterranea. Furthermore, an importance analysis of the feature elements was carried out using the conditional feature importance strategy. The results revealed that most of the significant elements were related to guanine-cytosine (GC) base pairs. We believe that our method could help annotate the function of miRNAs and further the understanding of the RNA interference mechanism.