Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cummings MP, Myers DS. Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA. BMC Bioinformatics 2004;5:132. [PMID: 15373947 PMCID: PMC521485 DOI: 10.1186/1471-2105-5-132] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Received: 04/07/2004] [Accepted: 09/16/2004] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Mohammed T, Firoz A, Ramadan AM. RNA Editing in Chloroplast: Advancements and Opportunities. Curr Issues Mol Biol 2022;44:5593-5604. [PMID: 36421663 PMCID: PMC9688838 DOI: 10.3390/cimb44110379] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/27/2022] [Revised: 11/05/2022] [Accepted: 11/10/2022] [Indexed: 07/25/2023] Open

Qin S, Fan Y, Hu S, Wang Y, Wang Z, Cao Y, Liu Q, Tan S, Dai Z, Zhou W. iPReditor-CMG: Improving a predictive RNA editor for crop mitochondrial genomes using genomic sequence features and an optimal support vector machine. Phytochemistry 2022;200:113222. [PMID: 35561852 DOI: 10.1016/j.phytochem.2022.113222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 04/29/2022] [Accepted: 04/30/2022] [Indexed: 06/15/2023]

Abstract

In crops, RNA editing is one of the most important post-transcriptional processes, in which specific cytidines (C) in virtually all mitochondrial protein-coding genes are converted to uridines (U). Despite extensive recent research on RNA editing, efficiently exploring all C-to-U editing events on a genomic scale remains challenging. Accurate methods for predicting RNA editing sites would dramatically reduce the need for experimental determination. Therefore, we propose a novel method, iPReditor-CMG (improved predictive RNA editor for crop mitochondrial genomes), to predict crop mitochondrial editing sites using genome sequence and an optimised support vector machine (SVM). We first selected three mitochondrial genomes with known RNA editing sites from Arabidopsis thaliana, Brassica napus and Oryza sativa, released by NCBI, as the training and test sets. The genes and their transcripts from self-sequenced tobacco mitochondrial ATPase were selected as the validation set. iPReditor-CMG first coded the genome sequences as numerical vectors and then performed an efficient feature selection on the high-dimensional feature space, with the SVM employed in both feature selection and subsequent modelling. The average independent prediction accuracy for intraspecific editing sites across the three species was 0.85, and up to 0.91 in A. thaliana, which outperformed the reference models. For interspecific independent prediction, the accuracy between dicotyledons was 0.78 and the accuracy between dicotyledons and monocotyledons was 0.56, which implies that the C-to-U editing mechanism might be similar in close relatives. Finally, the best model was identified with an independent test accuracy of 0.91 and an AUC of 0.88, which suggested that five unreported feature sequences, i.e. TGACA, ACAAC, GTAGA, CCGTT and TAACA, are closely associated with the editing phenomenon.
Multiple tests indicated that iPReditor-CMG can be effectively applied to predict editing sites in crop mitochondria, which may further contribute to understanding the mechanisms of site editing and post-transcriptional events in crop mitochondria.
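
The abstract says that genome sequences were coded as numerical vectors but does not give the encoding. As a minimal sketch, assuming a plain k-mer count encoding with k = 5 (chosen only because the reported feature sequences are five nucleotides long), a flanking window could be vectorised as below. The function name `kmer_vector` and the example window are hypothetical, and this is not presented as the authors' implementation.

```python
from itertools import product

def kmer_vector(sequence, k=5, alphabet="ACGT"):
    """Count occurrences of every possible k-mer, in a fixed lexicographic order."""
    # Map each of the len(alphabet)**k possible k-mers to a vector index.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vector = [0] * len(index)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip windows containing ambiguous bases such as N
            vector[index[kmer]] += 1
    return vector, index

# Hypothetical flanking sequence around a candidate cytidine (assumption).
window = "GGTGACAACCGTT"
vector, index = kmer_vector(window)

print(len(vector))             # dimensionality of the feature space
print(vector[index["TGACA"]])  # count of one 5-mer named in the abstract
```

A vector of this kind could then be passed to any classifier; the feature selection and SVM tuning described in the abstract are not reproduced here.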

Affiliation(s)

Sidong Qin Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
Yanjun Fan Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China; Shanxi Province Jincheng City Landscaping Service Center, Shanxi, 048000, China
Shengnan Hu Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
Yongqiang Wang Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
Ziqi Wang Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
Yixiang Cao Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
Qiyuan Liu Key Laboratory of Crop Physiology, Ecology and Genetic Breeding, Ministry of Education, College of Agronomy, Jiangxi Agricultural University, Nanchang, 330045, China
Siqiao Tan College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
Zhijun Dai Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
Wei Zhou Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China.

Loecher M. Unbiased variable importance for random forests. COMMUN STAT-THEOR M 2020. [DOI: 10.1080/03610926.2020.1764042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/24/2022]

Lo Giudice C, Hernández I, Ceci LR, Pesole G, Picardi E. RNA editing in plants: A comprehensive survey of bioinformatics tools and databases. Plant Physiology and Biochemistry 2019;137:53-61. [PMID: 30738217 DOI: 10.1016/j.plaphy.2019.02.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/05/2018] [Revised: 01/30/2019] [Accepted: 02/02/2019] [Indexed: 06/09/2023]

Couronné R, Probst P, Boulesteix AL. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinformatics 2018;19:270. [PMID: 30016950 PMCID: PMC6050737 DOI: 10.1186/s12859-018-2264-5] [Citation(s) in RCA: 265] [Impact Index Per Article: 44.2] [Received: 12/04/2017] [Accepted: 06/27/2018] [Indexed: 11/10/2022] Open

Edera AA, Gandini CL, Sanchez-Puerta MV. Towards a comprehensive picture of C-to-U RNA editing sites in angiosperm mitochondria. Plant Molecular Biology 2018;97:215-231. [PMID: 29761268 DOI: 10.1007/s11103-018-0734-9] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 11/16/2017] [Accepted: 05/02/2018] [Indexed: 06/08/2023]

Abstract

Our understanding of the dynamics and evolution of RNA editing in angiosperms is in part limited by the few editing sites identified to date. This study identified 10,217 editing sites from 17 diverse angiosperms. Our analyses confirmed the universality of certain features of RNA editing, and offered new evidence behind the loss of editing sites in angiosperms. RNA editing is a post-transcriptional process that converts cytidines (C) to uridines (U) in organellar transcripts of angiosperms. These substitutions mostly take place in mitochondrial messenger RNAs at specific positions called editing sites. By means of publicly available RNA-seq data, this study identified 10,217 editing sites in mitochondrial protein-coding genes of 17 diverse angiosperms. Even though other types of mismatches were also identified, we did not find evidence of non-canonical editing processes. The results showed an uneven distribution of editing sites among species, genes, and codon positions. The analyses revealed that editing sites were conserved across angiosperms but there were some species-specific sites. Non-synonymous editing sites were particularly highly conserved (~80%) across the plant species and were efficiently edited (80% editing extent). In contrast, editing sites at third codon positions were poorly conserved (~30%) and only partially edited (~40% editing extent). We found that the loss of editing sites along angiosperm evolution occurs mainly through replacement of editing sites with thymidines, rather than through degradation of the editing recognition motif around editing sites. Consecutive and highly conserved editing sites had been replaced by thymidines as a result of retroprocessing, by which edited transcripts are reverse transcribed to cDNA and then integrated into the genome by homologous recombination. This phenomenon was more pronounced in eudicots, and in the gene cox1.
These results suggest that retroprocessing is a widespread driving force underlying the loss of editing sites in angiosperm mitochondria.

Epifanio I. Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics 2017;18:230. [PMID: 28464827 PMCID: PMC5414143 DOI: 10.1186/s12859-017-1650-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/13/2017] [Accepted: 04/25/2017] [Indexed: 12/20/2022] Open

Abstract

Background

Random forests are a popular method in many fields since they can be successfully applied to complex data, with a small sample size, complex interactions and correlations, mixed type predictors, etc. Furthermore, they provide variable importance measures that aid qualitative interpretation and also the selection of relevant predictors. However, most of these measures rely on the choice of a performance measure, and measures of prediction performance are not unique; in some cases, such as multivariate response random forests, there is not even a clear definition.

Methods

A new alternative importance measure, called Intervention in Prediction Measure, is investigated. It depends on the structure of the trees, without depending on performance measures. It is compared with other well-known variable importance measures in different contexts, such as a classification problem with variables of different types, another classification problem with correlated predictor variables, and problems with multivariate responses and predictors of different types.

Results

Several simulation studies are carried out, showing the new measure to be very competitive. In addition, it is applied in two well-known bioinformatics applications previously used in other papers. Improvements in performance are also provided for these applications by the use of this new measure.

Conclusions

This new measure is expressed as a percentage, which makes it attractive in terms of interpretability. It can be used with new observations. It can be defined globally, for each class (in a classification problem) and case-wise. It can easily be computed for any kind of response, including multivariate responses. Furthermore, it can be used with any algorithm employed to grow each individual tree. It can be used in place of (or in addition to) other variable importance measures.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1650-8) contains supplementary material, which is available to authorized users.

Cahoon AB, Nauss JA, Stanley CD, Qureshi A. Deep Transcriptome Sequencing of Two Green Algae, Chara vulgaris and Chlamydomonas reinhardtii, Provides No Evidence of Organellar RNA Editing. Genes (Basel) 2017;8:genes8020080. [PMID: 28230734 PMCID: PMC5333069 DOI: 10.3390/genes8020080] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/23/2017] [Accepted: 02/13/2017] [Indexed: 11/16/2022] Open

Janitza S, Strobl C, Boulesteix AL. An AUC-based permutation variable importance measure for random forests. BMC Bioinformatics 2013;14:119. [PMID: 23560875 PMCID: PMC3626572 DOI: 10.1186/1471-2105-14-119] [Citation(s) in RCA: 148] [Impact Index Per Article: 13.5] [Received: 11/23/2012] [Accepted: 03/21/2013] [Indexed: 11/30/2022] Open

Abstract

Background

The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in the case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance.

Results

We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings.

Conclusions

The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html.
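
The idea of an AUC-based permutation importance can be sketched as the drop in AUC after shuffling one predictor. The sketch below is a simplification under stated assumptions: it uses a single hypothetical fixed scorer (`predict`) on toy data, whereas the abstract describes a measure for random forests as implemented in the R package party. All names and data here are illustrative.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties count as half a win, matching the Mann-Whitney formulation.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_permutation_importance(predict, rows, labels, column, seed=0):
    """Drop in AUC after shuffling one predictor column across observations."""
    rng = random.Random(seed)
    baseline = auc([predict(r) for r in rows], labels)
    shuffled_values = [r[column] for r in rows]
    rng.shuffle(shuffled_values)
    permuted_rows = [r[:column] + [v] + r[column + 1:]
                     for r, v in zip(rows, shuffled_values)]
    return baseline - auc([predict(r) for r in permuted_rows], labels)

# Unbalanced toy data: 10 positives, 190 negatives; only column 0 is informative.
rng = random.Random(42)
labels = [1] * 10 + [0] * 190
rows = [[rng.gauss(3.0 * y, 1.0), rng.gauss(0.0, 1.0)] for y in labels]

def predict(row):
    """Hypothetical fixed scorer standing in for a fitted forest (assumption)."""
    return row[0]

vim_informative = auc_permutation_importance(predict, rows, labels, column=0)
vim_noise = auc_permutation_importance(predict, rows, labels, column=1)
print(vim_informative, vim_noise)
```

Because AUC depends only on how positives rank against negatives, it does not reward a scorer for simply predicting the majority class, which is the intuition behind its expected robustness to class imbalance.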

Lenz H, Knoop V. PREPACT 2.0: Predicting C-to-U and U-to-C RNA Editing in Organelle Genome Sequences with Multiple References and Curated RNA Editing Annotation. Bioinform Biol Insights 2013;7:1-19. [PMID: 23362369 PMCID: PMC3547502 DOI: 10.4137/bbi.s11059] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 02/02/2023] Open

Land Plant RNA Editing or: Don’t Be Fooled by Plant Organellar DNA Sequences. 2012. [DOI: 10.1007/978-94-007-2920-9_13] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 04/11/2023]

Takala SL, Coulibaly D, Thera MA, Batchelor AH, Cummings MP, Escalante AA, Ouattara A, Traoré K, Niangaly A, Djimdé AA, Doumbo OK, Plowe CV. Extreme polymorphism in a vaccine antigen and risk of clinical malaria: implications for vaccine development. Sci Transl Med 2010;1:2ra5. [PMID: 20165550 DOI: 10.1126/scitranslmed.3000257] [Citation(s) in RCA: 138] [Impact Index Per Article: 9.9] [Indexed: 11/02/2022]

Salmans ML, Chaw SM, Lin CP, Shih ACC, Wu YW, Mulligan RM. Editing site analysis in a gymnosperm mitochondrial genome reveals similarities with angiosperm mitochondrial genomes. Curr Genet 2010;56:439-46. [PMID: 20617318 PMCID: PMC2943580 DOI: 10.1007/s00294-010-0312-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 02/10/2010] [Revised: 06/11/2010] [Accepted: 06/16/2010] [Indexed: 11/30/2022]

Altmann A, Toloşi L, Sander O, Lengauer T. Permutation importance: a corrected feature importance measure. Bioinformatics 2010;26:1340-7. [PMID: 20385727 DOI: 10.1093/bioinformatics/btq134] [Citation(s) in RCA: 618] [Impact Index Per Article: 44.1] [Indexed: 12/17/2022]

Abstract

MOTIVATION

In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred.

RESULTS

In this work, we introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures for two real-world case studies. We propose an improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models.

AVAILABILITY

R code for the method presented in this article is available at http://www.mpi-inf.mpg.de/~altmann/download/PIMP.R

CONTACT

altmann@mpi-inf.mpg.de, laura.tolosi@mpi-inf.mpg.de

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
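
The permutation scheme described under RESULTS can be sketched as follows: shuffle the outcome vector repeatedly, recompute the importance each time to build a null distribution, and report the fraction of null values at least as large as the observed one. This is a minimal sketch under stated assumptions, not the authors' R code: the `importance` function is a simple stand-in (absolute difference in class means) rather than an RF-based measure, the P-value is purely empirical with an add-one correction, and the toy data are invented.

```python
import random
from statistics import mean

def importance(feature, outcome):
    """Stand-in importance: absolute difference in feature means between classes."""
    pos = [x for x, y in zip(feature, outcome) if y == 1]
    neg = [x for x, y in zip(feature, outcome) if y == 0]
    return abs(mean(pos) - mean(neg))

def pimp_p_value(feature, outcome, n_permutations=500, seed=0):
    """Empirical P-value of the observed importance under permuted outcomes."""
    rng = random.Random(seed)
    observed = importance(feature, outcome)
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any feature-outcome association
        if importance(feature, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)

# Toy data (assumption): one informative and one non-informative predictor.
rng = random.Random(1)
outcome = [0] * 50 + [1] * 50
informative = [rng.gauss(2.0 * y, 1.0) for y in outcome]
noise = [rng.gauss(0.0, 1.0) for _ in outcome]

p_informative = pimp_p_value(informative, outcome)
p_noise = pimp_p_value(noise, outcome)
print(p_informative, p_noise)
```

Permuting the outcome rather than a single predictor leaves the predictors and their mutual relationships intact, so the null distribution reflects whatever bias the importance measure has for that variable.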

Hammani K, Okuda K, Tanz SK, Chateigner-Boutin AL, Shikanai T, Small I. A study of new Arabidopsis chloroplast RNA editing mutants reveals general features of editing factors and their target sites. The Plant Cell 2009;21:3686-99. [PMID: 19934379 PMCID: PMC2798323 DOI: 10.1105/tpc.109.071472] [Citation(s) in RCA: 145] [Impact Index Per Article: 9.7] [Received: 09/17/2009] [Revised: 10/09/2009] [Accepted: 10/30/2009] [Indexed: 05/18/2023]

Yura K, Sulaiman S, Hatta Y, Shionyu M, Go M. RESOPS: a database for analyzing the correspondence of RNA editing sites to protein three-dimensional structures. Plant & Cell Physiology 2009;50:1865-73. [PMID: 19808808 PMCID: PMC2775959 DOI: 10.1093/pcp/pcp132] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 09/15/2009] [Accepted: 09/24/2009] [Indexed: 05/21/2023]

Mower JP. The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments. Nucleic Acids Res 2009;37:W253-9. [PMID: 19433507 PMCID: PMC2703948 DOI: 10.1093/nar/gkp337] [Citation(s) in RCA: 240] [Impact Index Per Article: 16.0] [Indexed: 11/14/2022] Open

Du P, Jia L, Li Y. CURE-Chloroplast: a chloroplast C-to-U RNA editing predictor for seed plants. BMC Bioinformatics 2009;10:135. [PMID: 19422723 PMCID: PMC2688514 DOI: 10.1186/1471-2105-10-135] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 10/23/2008] [Accepted: 05/08/2009] [Indexed: 12/04/2022] Open

A Molecular Footprint of Limb Loss: Sequence Variation of the Autopodial Identity Gene Hoxa-13. J Mol Evol 2008;67:581-93. [DOI: 10.1007/s00239-008-9156-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/31/2008] [Accepted: 08/05/2008] [Indexed: 10/21/2022]

Yura K, Miyata Y, Arikawa T, Higuchi M, Sugita M. Characteristics and prediction of RNA editing sites in transcripts of the Moss Takakia lepidozioides chloroplast. DNA Res 2008;15:309-21. [PMID: 18650260 PMCID: PMC2575889 DOI: 10.1093/dnares/dsn016] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open

Millar AH, Small ID, Day DA, Whelan J. Mitochondrial biogenesis and function in Arabidopsis. The Arabidopsis Book 2008;6:e0111. [PMID: 22303236 DOI: 10.1199/tab.0105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/23/2023]

Millar AH, Small ID, Day DA, Whelan J. Mitochondrial biogenesis and function in Arabidopsis. The Arabidopsis Book 2008;6:e0111. [PMID: 22303236 PMCID: PMC3243404 DOI: 10.1199/tab.0111] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 05/04/2023]

Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 2008. [DOI: 10.1016/j.csda.2007.08.015] [Citation(s) in RCA: 604] [Impact Index Per Article: 37.8] [Indexed: 11/19/2022]

Mower JP. Modeling Sites of RNA Editing as a Fifth Nucleotide State Reveals Progressive Loss of Edited Sites from Angiosperm Mitochondria. Mol Biol Evol 2007;25:52-61. [DOI: 10.1093/molbev/msm226] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 01/20/2023] Open

Mulligan RM, Chang KLC, Chou CC. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol 2007;24:1971-81. [PMID: 17591603 DOI: 10.1093/molbev/msm125] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022] Open

Abstract

A computational analysis of RNA editing sites was performed on protein-coding sequences of plant mitochondrial genomes from Arabidopsis thaliana, Beta vulgaris, Brassica napus, and Oryza sativa. The distribution of nucleotides around edited and unedited cytidines was compared in 41-nucleotide segments and included 1481 edited cytidines and 21,390 unedited cytidines in the 4 genomes. The distribution of nucleotides was examined in 1-, 2-, and 3-nucleotide windows by comparison of nucleotide frequency ratios and relative entropy. The relative entropy analyses indicate that information is encoded in the nucleotide sequences in the 5' flank (-18 to -14, -13 to -10, -6 to -4, -2/-1) and the immediate 3' flanking nucleotide (+1), and these regions may be important in editing site recognition. The relative entropy was large when 2- or 3-nucleotide windows were analyzed, suggesting that several contiguous nucleotides may be involved in editing site recognition. RNA editing sites were frequently preceded by 2 pyrimidines or AU and followed by a guanosine (HYCG) in the monocot and dicot mitochondrial genomes, and rarely preceded by 2 purines. Analysis of chloroplast editing sites from a dicot, Nicotiana tabacum, and a monocot, Zea mays, revealed a similar distribution of nucleotides around editing sites (HYCA). The similarity of this motif around editing sites in monocots and dicots in both mitochondria and chloroplasts suggests that a mechanistic basis for this motif exists that is common in these different organelle and phylogenetic systems. The preferred sequence distribution around RNA editing sites may have an important impact on the acquisition of editing sites in evolution because the immediate sequence context of a cytidine residue may render a cytidine editable or uneditable, and consequently determine whether a T to C mutation at a specific position may be corrected by RNA editing.
The distribution of editing sites in many protein-coding sequences is shown to be non-random with editing sites clustered in groups separated by regions with no editing sites. The sporadic distribution of editing sites could result from a mechanism of editing site loss by gene conversion utilizing edited sequence information, possibly through an edited cDNA intermediate.
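
The per-position relative entropy comparison can be illustrated with a small sketch. It assumes aligned single-nucleotide windows, a pseudocount of 1 to avoid division by zero, and invented 4-nt toy sequences spanning positions -2, -1, 0 (the cytidine) and +1. The abstract does not state how zero counts were handled or give the exact formula, so the function names and these choices are assumptions.

```python
from collections import Counter
from math import log2

NUCLEOTIDES = "ACGT"

def position_frequencies(sequences, position, pseudocount=1.0):
    """Smoothed nucleotide frequencies at one aligned position."""
    counts = Counter(seq[position] for seq in sequences)
    total = len(sequences) + pseudocount * len(NUCLEOTIDES)
    return {n: (counts[n] + pseudocount) / total for n in NUCLEOTIDES}

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[n] * log2(p[n] / q[n]) for n in NUCLEOTIDES)

# Hypothetical 4-nt windows covering positions -2, -1, 0 (the cytidine) and +1.
edited = ["TTCG", "CTCG", "ATCG", "TCCG"]
unedited = ["GACA", "AGCT", "TACC", "GGCA"]

profile = [
    relative_entropy(position_frequencies(edited, i),
                     position_frequencies(unedited, i))
    for i in range(4)
]
print([round(v, 3) for v in profile])
```

Position 0 of the window (index 2) is a cytidine in every sequence, so its relative entropy is zero; positions whose composition differs between edited and unedited sites score higher.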

Strobl C, Boulesteix AL, Zeileis A, Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 2007;8:25. [PMID: 17254353 PMCID: PMC1796903 DOI: 10.1186/1471-2105-8-25] [Citation(s) in RCA: 1173] [Impact Index Per Article: 69.0] [Received: 09/18/2006] [Accepted: 01/25/2007] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories.

RESULTS

Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand.

CONCLUSION

We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.

Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 2007. [PMID: 17254353 DOI: 10.1186/1471-2105-8-25] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open

Thompson J, Gopal S. Correction: genetic algorithm learning as a robust approach to RNA editing site site prediction. BMC Bioinformatics 2006;7:406. [PMID: 16956416 PMCID: PMC1569880 DOI: 10.1186/1471-2105-7-406] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 08/16/2006] [Accepted: 09/06/2006] [Indexed: 11/10/2022] Open

Thompson J, Gopal S. Genetic algorithm learning as a robust approach to RNA editing site prediction. BMC Bioinformatics 2006;7:145. [PMID: 16542417 PMCID: PMC1459874 DOI: 10.1186/1471-2105-7-145] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 09/15/2005] [Accepted: 03/16/2006] [Indexed: 11/10/2022] Open

Mower JP. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics 2005;6:96. [PMID: 15826309 PMCID: PMC1087475 DOI: 10.1186/1471-2105-6-96] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.7] [Received: 02/10/2005] [Accepted: 04/12/2005] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

In plants, RNA editing is a process that converts specific cytidines to uridines and uridines to cytidines in transcripts from virtually all mitochondrial protein-coding genes. There are thousands of plant mitochondrial genes in the sequence databases, but sites of RNA editing have not been determined for most. Accurate methods of RNA editing site prediction will be important in filling in this information gap and could reduce or even eliminate the need for experimental determination of editing sites for many sequences. Because RNA editing tends to increase protein conservation across species by "correcting" codons that specify unconserved amino acids, this principle can be used to predict editing sites by identifying positions where an RNA editing event would increase the conservation of a protein to homologues from other plants. PREP-Mt takes this approach to predict editing sites for any protein-coding gene in plant mitochondria.
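The conservation principle described above can be illustrated with a minimal sketch. It assumes the standard genetic code, a single homologous protein, and an ungapped codon-by-codon alignment. The function name `candidate_edit_sites` and the toy sequences are hypothetical, and this is not PREP-Mt's actual scoring procedure, which the abstract does not detail.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption
# of this sketch; the abstract does not state which table is used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def candidate_edit_sites(dna, homolog_protein):
    """Return 0-based positions of C's whose C-to-U (C-to-T in DNA) change
    makes the encoded amino acid match the homolog at that codon.

    Only single edits per codon are considered, and the alignment is
    assumed to be ungapped: codon i is compared with homolog residue i.
    """
    sites = []
    for codon_index, target_aa in enumerate(homolog_protein):
        start = 3 * codon_index
        codon = dna[start:start + 3]
        # Skip incomplete codons and codons that already match the homolog.
        if len(codon) < 3 or CODON_TABLE[codon] == target_aa:
            continue
        for offset, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:offset] + "T" + codon[offset + 1:]
            if CODON_TABLE[edited] == target_aa:
                sites.append(start + offset)
    return sites


# Toy example: the gene encodes Met-Pro-Ser, the homolog has Met-Leu-Leu.
predicted = candidate_edit_sites("ATGCCATCA", "MLL")
print(predicted)
```

In the toy input, editing the second C of CCA gives CTA (Leu) and editing the C of TCA gives TTA (Leu), so both positions are reported as candidates. Silent edits never change the amino acid, so a purely conservation-based sketch like this one cannot detect them.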

RESULTS

To test the general applicability of the PREP-Mt methodology, RNA editing sites were predicted for 370 full-length or nearly full-length DNA sequences and then compared to the known sites of RNA editing for these sequences. Of 60,263 cytidines in this test set, PREP-Mt correctly classified 58,994 as either an edited or unedited site (accuracy = 97.9%). PREP-Mt properly identified 3,038 of the 3,698 known sites of RNA editing (sensitivity = 82.2%) and 55,956 of the 56,565 known unedited sites (specificity = 98.9%). Accuracy and sensitivity increased to 98.7% and 94.7%, respectively, after excluding the 489 silent editing sites (which have no effect on protein sequence or function) from the test set.
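The headline figures in this paragraph follow directly from the reported counts, as the short check below shows. The function name `confusion_metrics` is hypothetical; the counts are taken from the text above, with false negatives and false positives derived by subtraction.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts reported in the abstract.
edited_total = 3698      # known edited sites
unedited_total = 56565   # known unedited sites
tp = 3038                # edited sites correctly identified
tn = 55956               # unedited sites correctly identified

metrics = confusion_metrics(
    tp=tp,
    fn=edited_total - tp,    # edited sites that were missed
    tn=tn,
    fp=unedited_total - tn,  # unedited sites wrongly called edited
)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The figures after excluding the 489 silent sites are not recomputed here, because the abstract does not state how many of those sites were among the correct predictions.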

CONCLUSION

These results indicate that PREP-Mt is effective at identifying C to U RNA editing sites in plant mitochondrial protein-coding genes. Thus, PREP-Mt should be useful in predicting protein sequences for use in molecular, biochemical, and phylogenetic analyses. In addition, PREP-Mt could be used to determine functionality of a mitochondrial gene or to identify particular sequences with unusual editing properties. The PREP-Mt methodology should be applicable to any system where RNA editing increases protein conservation across species.


Cummings MP, Segal MR. Few amino acid positions in rpoB are associated with most of the rifampin resistance in Mycobacterium tuberculosis. BMC Bioinformatics 2004;5:137. [PMID: 15453919 PMCID: PMC524371 DOI: 10.1186/1471-2105-5-137] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 04/15/2004] [Accepted: 09/28/2004] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Mutations in rpoB, the gene encoding the beta subunit of DNA-dependent RNA polymerase, are associated with rifampin resistance in Mycobacterium tuberculosis. Several studies have been conducted in which the minimum inhibitory concentration (MIC, which is defined as the minimum concentration of the antibiotic in a given culture medium below which bacterial growth is not inhibited) of rifampin has been measured and partial DNA sequences have been determined for rpoB in different isolates of M. tuberculosis. However, no model has been constructed to predict rifampin resistance based on sequence information alone. Such a model might provide the basis for quantifying rifampin resistance status based exclusively on DNA sequence data and thus eliminate the requirements for time-consuming culturing and antibiotic testing of clinical isolates.

RESULTS

Sequence data for amino acid positions 511-533 of rpoB and associated MIC of rifampin for different isolates of M. tuberculosis were taken from studies examining rifampin resistance in clinical samples from New York City and throughout Japan. We used tree-based statistical methods and random forests to generate models of the relationships between rpoB amino acid sequence and rifampin resistance. The proportion of variance explained by a relatively simple tree-based cross-validated regression model involving two amino acid positions (526 and 531) is 0.679. The first partition in the data, based on position 531, results in groups that differ one hundredfold in mean MIC (1.596 micrograms/ml and 159.676 micrograms/ml). The subsequent partition based on position 526, the most variable in this region, results in a > 354-fold difference in MIC. When considered as a classification problem (susceptible or resistant), a cross-validated tree-based model correctly classified most (0.884) of the observations and was very similar to the regression model. Random forest analysis of the MIC data as a continuous variable, a regression problem, produced a model that explained 0.861 of the variance. The random forest analysis of the MIC data as discrete classes produced a model that correctly classified 0.942 of the observations with sensitivity of 0.958 and specificity of 0.885.
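The idea of a single tree partition and its "proportion of variance explained" can be sketched in a few lines. This is a simplified illustration, not the authors' cross-validated model: the function name `variance_explained` is hypothetical, and the residues and MIC values in the toy data are invented for the example. Only the two group means used for the fold-change check come from the text above.

```python
from collections import defaultdict


def variance_explained(labels, values):
    """Proportion of variance in `values` explained by grouping on `labels`
    (1 minus within-group sum of squares over total sum of squares)."""
    mean = sum(values) / len(values)
    total_ss = sum((v - mean) ** 2 for v in values)

    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)

    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)

    return 1.0 - within_ss / total_ss


# Hypothetical toy data: residue observed at position 531 and MIC (ug/ml).
residue_531 = ["S", "S", "S", "L", "L", "L"]
mic = [1.0, 2.0, 1.5, 150.0, 160.0, 170.0]

r2 = variance_explained(residue_531, mic)
print(f"variance explained by the split: {r2:.3f}")

# Fold difference between the two group means reported in the abstract.
fold = 159.676 / 1.596
print(f"fold difference in mean MIC: {fold:.1f}")
```

A regression tree chooses, at each step, the partition that maximizes this kind of reduction in within-group variance; a random forest averages many such trees grown on resampled data.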

CONCLUSIONS

Highly accurate regression and classification models of rifampin resistance can be made based on this short sequence region. Models may be better with improved (and consistent) measurements of MIC and more sequence data.
