1
|
Chen CW, Lin MH, Liao CC, Chang HP, Chu YW. iStable 2.0: Predicting protein thermal stability changes by integrating various characteristic modules. Comput Struct Biotechnol J 2020; 18:622-630. [PMID: 32226595 PMCID: PMC7090336 DOI: 10.1016/j.csbj.2020.02.021] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 10/15/2019] [Revised: 02/25/2020] [Accepted: 02/27/2020] [Indexed: 11/15/2022] Open
Abstract
Protein mutations can lead to structural changes that affect protein function and result in disease. In protein engineering, drug design, or optimization industries, mutations are often used to improve protein stability or to change protein properties while maintaining stability. To provide possible candidates for novel protein design, several computational tools for predicting protein stability changes have been developed. Although many prediction tools are available, each tool employs different algorithms and features. This can produce conflicting prediction results that make it difficult for users to decide upon the correct protein design. Therefore, this study proposes an integrated prediction tool, iStable 2.0, which integrates 11 sequence-based and structure-based prediction tools by machine learning and adds protein sequence information as features. Three coding modules are designed for the system, an Online Server Module, a Stand-alone Module and a Sequence Coding Module, to improve the prediction performance of the previous version of the system. The final integrated structure-based classification model has a higher Matthews correlation coefficient than that of the single prediction tool (0.708 vs 0.547, respectively), and the Pearson correlation coefficient of the regression model likewise improves from 0.669 to 0.714. The sequence-based model not only successfully integrates off-the-shelf predictors but also improves the Matthews correlation coefficient of the best single prediction tool by at least 0.161, which is better than the individual structure-based prediction tools. In addition, both the Sequence Coding Module and the Stand-alone Module maintain performance with only a 5% decrease in the Matthews correlation coefficient when the integrated online tools are unavailable. iStable 2.0 is available at http://ncblab.nchu.edu.tw/iStable2.
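The two metrics quoted in this abstract, the Matthews correlation coefficient (for classification) and the Pearson correlation coefficient (for regression), can be computed as in the minimal Python sketch below. The function names and the toy numbers are illustrative assumptions and are not taken from iStable 2.0 or its datasets.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical counts for a stabilizing/destabilizing classifier.
    print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
    # Hypothetical predicted vs. measured stability changes.
    print(pearson_r([0.5, -1.2, 2.0, -0.3], [0.7, -0.9, 1.6, 0.1]))
```

MCC ranges from -1 to 1 and stays informative on imbalanced classes, which is one common reason it is reported alongside or instead of plain accuracy.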
Affiliation(s)
- Chi-Wei Chen
  - Department of Computer Science and Engineering, National Chung-Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
- Meng-Han Lin
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
- Chi-Chou Liao
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Institute of Molecular Biology, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
- Hsung-Pin Chang
  - Department of Computer Science and Engineering, National Chung-Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
- Yen-Wei Chu
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Institute of Molecular Biology, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Agricultural Biotechnology Center, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Biotechnology Center, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Ph.D. Program in Translational Medicine, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Rong Hsing Research Center for Translational Medicine, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
  - Corresponding author at: Institute of Genomics and Bioinformatics, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan
|
2
|
miRgo: integrating various off-the-shelf tools for identification of microRNA-target interactions by heterogeneous features and a novel evaluation indicator. Sci Rep 2020; 10:1466. [PMID: 32001758 PMCID: PMC6992741 DOI: 10.1038/s41598-020-58336-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/16/2019] [Accepted: 01/15/2020] [Indexed: 12/20/2022] Open
Abstract
MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression and biological processes through binding to messenger RNAs. Predicting the relationship between miRNAs and their targets is crucial for research and clinical applications. Many tools have been developed to predict miRNA-target interactions, but variable results among the different prediction tools have caused confusion for users. To solve this problem, we developed miRgo, an application that integrates many of these tools. To train the prediction model, extreme values and median values from four different data combinations, which were obtained via an energy distribution function, were used to find the most representative dataset. Support vector machines were used to integrate 11 prediction tools, and numerous feature types used in these tools were classified into six categories (binding energy, scoring function, evolution evidence, binding type, sequence property, and structure) to simplify feature selection. In addition, a novel evaluation indicator, the Chu-Hsieh-Liang (CHL) index, was developed to improve the prediction power in positive data for feature selection. miRgo achieved better results than all other prediction tools in evaluation by an independent testing set and by its subset of functionally important genes. The tool is available at http://predictor.nchu.edu.tw/miRgo.
|
3
|
Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions is critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine different aspects of RNA-protein interactions. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, experimental techniques alone cannot uncover the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques to study the interactions between RNAs and proteins. Much important progress has been achieved with the application of novel techniques and strategies, such as machine learning. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions have become available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors for RNA-protein interaction is summarized in the following aspects: datasets, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.
Affiliation(s)
- Amit Sagar
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
- Bin Xue
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
|
4
|
Chen L, Heikkinen L, Wang C, Yang Y, Sun H, Wong G. Trends in the development of miRNA bioinformatics tools. Brief Bioinform 2019; 20:1836-1852. [PMID: 29982332 PMCID: PMC7414524 DOI: 10.1093/bib/bby054] [Citation(s) in RCA: 326] [Impact Index Per Article: 65.2] [Received: 04/06/2018] [Revised: 05/18/2018] [Indexed: 12/13/2022] Open
Abstract
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression via recognition of cognate sequences and interference of transcriptional, translational or epigenetic processes. Bioinformatics tools developed for miRNA study include those for miRNA prediction and discovery, structure, analysis and target prediction. We manually curated 95 review papers and ∼1000 miRNA bioinformatics tools published since 2003. We classified and ranked them based on citation number or PageRank score, and then performed network analysis and text mining (TM) to study the miRNA tools development trends. Five key trends were observed: (1) miRNA identification and target prediction have been hot spots in the past decade; (2) manual curation and TM are the main methods for collecting miRNA knowledge from literature; (3) most early tools are well maintained and widely used; (4) classic machine learning methods retain their utility; however, novel ones have begun to emerge; (5) disease-associated miRNA tools are emerging. Our analysis yields significant insight into the past development and future directions of miRNA tools.
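Ranking tools by PageRank score, as mentioned in this abstract, can be illustrated with a small power-iteration sketch in Python. The citation graph, the tool names, and the damping factor of 0.85 are assumptions made for the example; the abstract does not state which graph or parameters the authors used.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it cites. Nodes without
    outgoing links spread their rank uniformly over all nodes.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank mass held by nodes that cite nothing.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical citation graph: tool_c cites tool_a and tool_b, and so on.
    citations = {
        "tool_a": [],
        "tool_b": ["tool_a"],
        "tool_c": ["tool_a", "tool_b"],
    }
    scores = pagerank(citations)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 4))
```

Unlike a raw citation count, PageRank weights each citation by the rank of the citing node, so a tool cited by a few highly ranked tools can outrank one cited by many obscure ones.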
Affiliation(s)
- Liang Chen
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Liisa Heikkinen
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Changliang Wang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Yang Yang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Huiyan Sun
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Garry Wong
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
|
5
|
Vanahalli MK, Patil N. An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.08.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/25/2022]
|
6
|
Zhao B, Xue B. Significant improvement of miRNA target prediction accuracy in large datasets using meta-strategy based on comprehensive voting and artificial neural networks. BMC Genomics 2019; 20:158. [PMID: 30813885 PMCID: PMC6391818 DOI: 10.1186/s12864-019-5528-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/04/2019] [Accepted: 02/13/2019] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Identifying mRNA targets of miRNAs is critical for studying gene expression regulation at the whole-genome level. Multiple computational tools have been developed to predict miRNA:mRNA interactions. Nonetheless, many of these tools were developed on various small datasets, each of which represents a limited sample space. Thus, the prediction accuracy of these tools has not been systematically validated at a larger scale. Accordingly, comparing the prediction accuracy of these tools and determining their applicability has become challenging. In addition, the accuracy of these tools, especially in large datasets, needs to be improved for broader applications. RESULTS In this project, a large dataset containing more than 46,600 miRNA:mRNA interactions was assembled and split into eleven subsets based on the availability of prediction scores from four individual predictors, namely miRanda, miRDB, PITA, and TargetScan. In each of these subsets, the predictive results of the four individual predictors were integrated using decision-tree-based artificial neural networks to make the meta-prediction. The decision tree is used here to sort the predictive results of the four individual predictors, and artificial neural networks are applied to make the meta-prediction based on the outputs of the individual predictors. In the decision tree, dual-threshold and two-step significance-voting were incorporated, and information gain was analysed to select threshold values. The prediction performance of this new strategy was improved significantly in most of the eleven datasets compared with the individual predictors and other meta-predictors, such as ComiR, under multi-fold cross-validation, as well as in independent datasets. The overall improvement of prediction accuracy in independent datasets is at least 9 percentage points compared with the other predictors, and the percentage of improvement of F1 and MCC scores is at least 40% compared with the other predictors.
CONCLUSIONS The combination of dual-threshold, two-step significance-voting, and analysis of information gain is very effective in optimizing the outcome of the decision tree, and further integration with artificial neural networks is critical for further improving the performance of the meta-predictor. A new pipeline based on this integration for miRNA target prediction has been developed. A strategy that uses the outputs of individual predictors to reorganize large-scale miRNA:mRNA interaction datasets has also been validated and used to evaluate the prediction accuracy of predictors. The predictor is available at: https://github.com/xueLab/mirTarDANN.
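The abstract states that information gain was analysed to select threshold values. The Python sketch below shows the generic idea for a single predictor score and binary labels: try each candidate cut-off and keep the one that maximizes information gain. It is an illustration under stated assumptions, not the authors' pipeline; the function names and toy scores are hypothetical, and the dual-threshold and two-step voting details of the paper are not reproduced here.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def information_gain(scores, labels, threshold):
    """Entropy reduction from splitting samples at score < threshold."""
    below = [y for s, y in zip(scores, labels) if s < threshold]
    above = [y for s, y in zip(scores, labels) if s >= threshold]
    n = len(labels)
    weighted = len(below) / n * entropy(below) + len(above) / n * entropy(above)
    return entropy(labels) - weighted


def best_threshold(scores, labels):
    """Pick the midpoint between adjacent distinct scores with maximal gain."""
    distinct = sorted(set(scores))
    candidates = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        raise ValueError("need at least two distinct scores")
    return max(candidates, key=lambda t: information_gain(scores, labels, t))


if __name__ == "__main__":
    # Hypothetical predictor scores and true interaction labels (1 = target).
    scores = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
    labels = [0, 0, 1, 1, 1, 1]
    cut = best_threshold(scores, labels)
    print(cut, information_gain(scores, labels, cut))
```

A dual-threshold variant would presumably select two cut-offs per predictor in a similar way, but how the paper defines and combines them is not specified in the abstract.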
Affiliation(s)
- Bi Zhao
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL, 33620, USA
- Bin Xue
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL, 33620, USA
|
7
|
Zhao B, Xue B. Decision-Tree Based Meta-Strategy Improved Accuracy of Disorder Prediction and Identified Novel Disordered Residues Inside Binding Motifs. Int J Mol Sci 2018; 19:E3052. [PMID: 30301243 PMCID: PMC6213717 DOI: 10.3390/ijms19103052] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/20/2018] [Revised: 09/24/2018] [Accepted: 10/04/2018] [Indexed: 02/06/2023] Open
Abstract
Using computational techniques to identify intrinsically disordered residues is practical and effective in biological studies. Therefore, designing novel high-accuracy strategies is always preferable when existing strategies have a lot of room for improvement. Among many possibilities, a meta-strategy that integrates the results of multiple individual predictors has been broadly used to improve the overall performance of predictors. Nonetheless, a simple and direct integration of individual predictors may not effectively improve the performance. In this project, dual-threshold two-step significance voting and neural networks were used to integrate the predictive results of four individual predictors: DisEMBL, IUPred, VSL2, and ESpritz. The new meta-strategy significantly improved the prediction of intrinsically disordered residues compared with all four individual predictors and with four other recently designed predictors. The improvement was validated using five-fold cross-validation and in independent test datasets.
Affiliation(s)
- Bi Zhao
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL 33620, USA
- Bin Xue
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL 33620, USA
|
8
|
Zhao B, Xue B. Improving prediction accuracy using decision-tree-based meta-strategy and multi-threshold sequential-voting exemplified by miRNA target prediction. Genomics 2017; 109:227-232. [PMID: 28435088 DOI: 10.1016/j.ygeno.2017.04.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/16/2017] [Revised: 03/28/2017] [Accepted: 04/19/2017] [Indexed: 01/12/2023]
Abstract
Many computational predictors have been developed for fast and large-scale analysis of biological data. However, many of them were developed a long time ago, when training datasets or sets of input features were rather small. Consequently, the utility of these predictors on much larger datasets, which are very common nowadays, needs to be examined carefully. In addition, with the rapid development of scientific research, expectations for the prediction accuracy of computational predictors keep rising. Therefore, developing novel strategies to improve the prediction accuracy of computational predictors becomes critical. In this study, the predictive results of existing individual miRNA target predictors were integrated into a decision tree to make a meta-prediction. When the multi-threshold sequential-voting technique was used, the prediction accuracy of the decision tree was significantly improved by at least thirty percentage points compared to the individual predictors.
Affiliation(s)
- Bi Zhao
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 East Fowler Ave. ISA2015, Tampa, Florida, 33620, USA
- Bin Xue
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 East Fowler Ave. ISA2015, Tampa, Florida, 33620, USA
|