Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zou Q, Wang Z, Guan X, Liu B, Wu Y, Lin Z. An approach for identifying cytokines based on a novel ensemble classifier. Biomed Res Int 2013;2013:686090. [PMID: 24027761 DOI: 10.1155/2013/686090] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 05/12/2013] [Revised: 07/02/2013] [Accepted: 07/15/2013] [Indexed: 11/18/2022]



Cited by Other Article(s)

Zhong G, Liu H, Deng L. Ensemble Machine Learning and Predicted Properties Promote Antimicrobial Peptide Identification. Interdiscip Sci 2024:10.1007/s12539-024-00640-z. [PMID: 38972032 DOI: 10.1007/s12539-024-00640-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 06/04/2024] [Accepted: 06/07/2024] [Indexed: 07/08/2024]

Zhong G, Deng L. ACPScanner: Prediction of Anticancer Peptides by Integrated Machine Learning Methodologies. J Chem Inf Model 2024;64:1092-1104. [PMID: 38277774 DOI: 10.1021/acs.jcim.3c01860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2024]

Gu X, Ding Y, Xiao P, He T. A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins. Front Genet 2022;13:935717. [PMID: 36506312 PMCID: PMC9727185 DOI: 10.3389/fgene.2022.935717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 11/02/2022] [Indexed: 11/24/2022]

Liu S, Cui C, Chen H, Liu T. Ensemble learning-based feature selection for phosphorylation site detection. Front Genet 2022;13:984068. [PMID: 36338976 PMCID: PMC9634105 DOI: 10.3389/fgene.2022.984068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/01/2022] [Accepted: 10/05/2022] [Indexed: 11/18/2022]

A Novel Ensemble-Based Technique for the Preemptive Diagnosis of Rheumatoid Arthritis Disease in the Eastern Province of Saudi Arabia Using Clinical Data. Comput Math Methods Med 2022;2022:2339546. [PMID: 36158117 PMCID: PMC9492338 DOI: 10.1155/2022/2339546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 07/20/2022] [Accepted: 08/02/2022] [Indexed: 11/17/2022]

Abstract

Rheumatoid arthritis (RA) is a chronic inflammatory disease caused by numerous genetic and environmental factors leading to musculoskeletal system pain. RA may damage other tissues and organs, causing complications that severely reduce patients' quality of life. According to the World Health Organization (WHO), over 1.71 billion individuals worldwide had musculoskeletal problems in 2021. Rheumatologists face challenges in the early detection of RA since its symptoms are similar to those of other illnesses, and there is no definitive test to diagnose the disease. Accordingly, it is preferable to exploit computational intelligence techniques that can identify hidden patterns to diagnose RA early. Although multiple studies have sought to diagnose RA early, they showed unsatisfactory performance, with the highest accuracy of 87.5% obtained using imaging data. Yet imaging data requires diagnostic tools, is challenging to collect and examine, and is more costly. Recent studies indicated that neither a blood test nor a physical finding could confirm the diagnosis early. Therefore, this study proposes a novel ensemble technique for the preemptive prediction of RA and investigates the possibility of diagnosing the disease using clinical data before the symptoms appear. Two datasets were obtained from King Fahad University Hospital (KFUH), Dammam, Saudi Arabia, including 446 patients, with 251 positive cases of RA and 195 negative cases of RA. Two experiments were conducted: the former was developed without upsampling the dataset, and the latter was carried out using an upsampled dataset. Multiple machine learning (ML) algorithms were utilized to assemble the novel voting ensemble, including support vector machine (SVM), logistic regression (LR), and adaptive boosting (Adaboost).
The results indicated that clinical laboratory tests fed to the proposed voting ensemble technique could accurately diagnose RA preemptively with an accuracy, recall, and precision of 94.03%, 96.00%, and 93.51%, respectively, with 30 clinical features when utilizing the original data and the sequential forward feature selection (SFFS) technique. It is concluded that deploying the proposed model in local hospitals can aid medical specialists in preemptively diagnosing RA and in stopping or delaying its course using clinical laboratory tests.
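
The voting step and the three reported metrics can be illustrated with a minimal sketch. It assumes hard (majority) voting over binary labels, which the abstract does not specify, and the label lists below are invented toy data rather than results from the study.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting: for each sample, return the label most base models chose.

    `predictions` holds one list of labels per base model.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy_recall_precision(y_true, y_pred, positive=1):
    """Compute accuracy, recall and precision for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, recall, precision

# Toy labels standing in for SVM, LR and AdaBoost outputs (invented data).
svm_pred = [1, 0, 1, 1, 0, 0]
lr_pred = [1, 1, 1, 0, 0, 0]
ada_pred = [0, 0, 1, 1, 0, 1]
y_true = [1, 0, 1, 1, 0, 1]

ensemble_pred = majority_vote([svm_pred, lr_pred, ada_pred])
print(ensemble_pred)
print(accuracy_recall_precision(y_true, ensemble_pred))
```

With three voters and two classes a tie cannot occur; an even number of base models would need an explicit tie-breaking rule.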


Zhao S, Meng J, Kang Q, Luan Y. Identifying LncRNA-Encoded Short Peptides Using Optimized Hybrid Features and Ensemble Learning. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2873-2881. [PMID: 34383651 DOI: 10.1109/tcbb.2021.3104288] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]

Zhou H, Wang H, Ding Y, Tang J. Multivariate Information Fusion for Identifying Antifungal Peptides with Hilbert-Schmidt Independence Criterion. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210727161003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]

Jiao S, Zou Q, Guo H, Shi L. iTTCA-RF: a random forest predictor for tumor T cell antigens. J Transl Med 2021;19:449. [PMID: 34706730 PMCID: PMC8554859 DOI: 10.1186/s12967-021-03084-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/23/2021] [Accepted: 09/16/2021] [Indexed: 12/21/2022]

Abstract

BACKGROUND

Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methodology remains challenging.

METHODS

In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm.

RESULTS

Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA .
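
As a quick consistency check on the figures above, balanced accuracy is conventionally the mean of sensitivity and specificity. The short sketch below, which assumes that standard definition (the abstract does not spell it out), reproduces both reported values from the reported sensitivity and specificity.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (both given in percent)."""
    return (sensitivity + specificity) / 2

# Values reported in the abstract for tenfold cross-validation and independent tests.
cv = balanced_accuracy(sensitivity=88.69, specificity=78.73)
independent = balanced_accuracy(sensitivity=83.61, specificity=62.67)

print(round(cv, 2), round(independent, 2))  # 83.71 73.14
```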

CONCLUSIONS

We have proven that the proposed predictor iTTCA-RF is superior to the other latest models, and will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.


Su R, Hu J, Zou Q, Manavalan B, Wei L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief Bioinform 2021;21:408-420. [PMID: 30649170 DOI: 10.1093/bib/bby124] [Citation(s) in RCA: 107] [Impact Index Per Article: 35.7] [Received: 10/17/2018] [Revised: 11/30/2018] [Accepted: 11/30/2018] [Indexed: 12/16/2022]

Zhao S, Ju Y, Ye X, Zhang J, Han S. Bioluminescent Proteins Prediction with Voting Strategy. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200601122328] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/01/2023]

Nath A, Leier A. Improved cytokine-receptor interaction prediction by exploiting the negative sample space. BMC Bioinformatics 2020;21:493. [PMID: 33129275 PMCID: PMC7603689 DOI: 10.1186/s12859-020-03835-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/04/2020] [Accepted: 10/23/2020] [Indexed: 01/19/2023]

Abstract

Background

Cytokines act by binding to specific receptors in the plasma membrane of target cells. Knowledge of cytokine–receptor interaction (CRI) is very important for understanding the pathogenesis of various human diseases—notably autoimmune, inflammatory and infectious diseases—and identifying potential therapeutic targets. Recently, machine learning algorithms have been used to predict CRIs. “Gold Standard” negative datasets are still lacking and strong biases in negative datasets can significantly affect the training of learning algorithms and their evaluation. To mitigate the unrepresentativeness and bias inherent in the negative sample selection (non-interacting proteins), we propose a clustering-based approach for representative negative sample selection.

Results

We used deep autoencoders to investigate the effect of different sampling approaches for non-interacting pairs on the training and the performance of machine learning classifiers. By using the anomaly detection capabilities of deep autoencoders we deduced the effects of different categories of negative samples on the training of learning algorithms. Random sampling for selecting non-interacting pairs results in either over- or under-representation of hard or easy to classify instances. When K-means based sampling of negative datasets is applied to mitigate the inadequacies of random sampling, random forest (RF) together with the combined feature set of atomic composition, physicochemical-2grams and two different representations of evolutionary information performs best. Average model performances based on leave-one-out cross validation (loocv) over ten different negative sample sets that each model was trained with, show that RF models significantly outperform the previous best CRI predictor in terms of accuracy (+ 5.1%), specificity (+ 13%), mcc (+ 0.1) and g-means value (+ 5.1). Evaluations using tenfold cv and training/testing splits confirm the competitive performance.

Conclusions

A comparative analysis was performed to assess the effect of three different sampling methods (random, K-means and uniform sampling) on the training of learning algorithms using different evaluation methods. Models trained on K-means sampled datasets generally show a significantly improved performance compared to those trained on random selections—with RF seemingly benefiting most in our particular setting. Our findings on the sampling are highly relevant and apply to many applications of supervised learning approaches in bioinformatics.


Dou L, Li X, Zhang L, Xiang H, Xu L. iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier. J Proteome Res 2020;20:191-201. [PMID: 33090794 DOI: 10.1021/acs.jproteome.0c00314] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/01/2023]

Nguyen TTD, Le NQK, Ho QT, Phan DV, Ou YY. TNFPred: identifying tumor necrosis factors using hybrid features based on word embeddings. BMC Med Genomics 2020;13:155. [PMID: 33087125 PMCID: PMC7579990 DOI: 10.1186/s12920-020-00779-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]

Abstract

Background

Cytokines are a class of small proteins that act as chemical messengers and play a significant role in essential cellular processes including immunity regulation, hematopoiesis, and inflammation. As one important family of cytokines, tumor necrosis factors are associated with the regulation of various biological processes such as proliferation and differentiation of cells, apoptosis, lipid metabolism, and coagulation. The implication of these cytokines can also be seen in various diseases such as insulin resistance, autoimmune diseases, and cancer. Considering the interdependence between this kind of cytokine and others, classifying tumor necrosis factors from other cytokines is a challenge for biological scientists.

Methods

In this research, we employed a word embedding technique to create hybrid features that proved to efficiently identify tumor necrosis factors given cytokine sequences. We segmented each protein sequence into protein words and created a corresponding word embedding for each word. Then, a word embedding-based vector for each sequence was created and input into machine learning classification models. When extracting feature sets, we not only diversified the segmentation sizes of the protein sequence but also tried different combinations of the resulting n-grams to find the features that generated the optimal prediction. Furthermore, our methodology follows a well-defined procedure to build a reliable classification tool.
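
The segmentation and vectorization steps described above might look roughly like the sketch below. It assumes overlapping fixed-length words and a simple average of word vectors, neither of which is stated in the abstract; the embedding table is a hypothetical two-dimensional toy, not a trained model.

```python
def split_into_words(sequence, k):
    """Split a protein sequence into overlapping k-mers ("protein words")."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def sequence_vector(sequence, k, embeddings, dim):
    """Average the embeddings of all known words; zero vector if none is known."""
    vectors = [embeddings[w] for w in split_into_words(sequence, k) if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical embeddings, invented purely for illustration.
toy_embeddings = {"MK": [1.0, 0.0], "KV": [0.0, 1.0], "VL": [1.0, 1.0]}

words = split_into_words("MKVL", 2)
vector = sequence_vector("MKVL", 2, toy_embeddings, dim=2)
print(words)   # ['MK', 'KV', 'VL']
print(vector)
```

Varying `k` and concatenating the resulting vectors would be one way to combine several segmentation sizes into a hybrid feature set.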

Results

With our proposed hybrid features, prediction models obtain more promising performance compared to seven prominent sequence-based feature types. Results from 10 independent runs on the surveyed dataset show that, on average, our optimal models obtain an area under the curve of 0.984 and 0.998 on 5-fold cross-validation and independent test, respectively.

Conclusions

These results show that biologists can use our model to identify tumor necrosis factors from other cytokines efficiently. Moreover, this study proves that natural language processing techniques can be applied reasonably to help biologists solve bioinformatics problems efficiently.


Gu X, Chen Z, Wang D. Prediction of G Protein-Coupled Receptors With CTDC Extraction and MRMD2.0 Dimension-Reduction Methods. Front Bioeng Biotechnol 2020;8:635. [PMID: 32671038 PMCID: PMC7329982 DOI: 10.3389/fbioe.2020.00635] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/21/2020] [Accepted: 05/26/2020] [Indexed: 11/13/2022]

Hou R, Wang L, Wu YJ. Predicting ATP-Binding Cassette Transporters Using the Random Forest Method. Front Genet 2020;11:156. [PMID: 32269586 PMCID: PMC7109328 DOI: 10.3389/fgene.2020.00156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/26/2019] [Accepted: 02/11/2020] [Indexed: 12/21/2022]

iPromoter-2L2.0: Identifying Promoters and Their Types by Combining Smoothing Cutting Window Algorithm and Sequence-Based Features. Mol Ther Nucleic Acids 2019;18:80-87. [PMID: 31536883 PMCID: PMC6796744 DOI: 10.1016/j.omtn.2019.08.008] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 06/07/2019] [Revised: 07/17/2019] [Accepted: 08/02/2019] [Indexed: 11/23/2022]

Wei L, Xing P, Shi G, Ji Z, Zou Q. Fast Prediction of Protein Methylation Sites Using a Sequence-Based Feature Selection Technique. IEEE/ACM Trans Comput Biol Bioinform 2019;16:1264-1273. [PMID: 28222000 DOI: 10.1109/tcbb.2017.2670558] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Indexed: 05/09/2023]

Lin Y, Cai Y, Liu J, Lin C, Liu X. An advanced approach to identify antimicrobial peptides and their function types for penaeus through machine learning strategies. BMC Bioinformatics 2019;20:291. [PMID: 31182007 PMCID: PMC6557738 DOI: 10.1186/s12859-019-2766-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022]

Ru X, Li L, Zou Q. Incorporating Distance-Based Top-n-gram and Random Forest To Identify Electron Transport Proteins. J Proteome Res 2019;18:2931-2939. [DOI: 10.1021/acs.jproteome.9b00250] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Indexed: 02/01/2023]

Han K, Wang M, Zhang L, Wang Y, Guo M, Zhao M, Zhao Q, Zhang Y, Zeng N, Wang C. Predicting Ion Channels Genes and Their Types With Machine Learning Techniques. Front Genet 2019;10:399. [PMID: 31130983 PMCID: PMC6510169 DOI: 10.3389/fgene.2019.00399] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/14/2019] [Accepted: 04/12/2019] [Indexed: 02/01/2023]

Gao YC, Zhou XH, Zhang W. An Ensemble Strategy to Predict Prognosis in Ovarian Cancer Based on Gene Modules. Front Genet 2019;10:366. [PMID: 31068972 PMCID: PMC6491874 DOI: 10.3389/fgene.2019.00366] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/11/2019] [Accepted: 04/05/2019] [Indexed: 12/15/2022]

Ru X, Li L, Wang C. Identification of Phage Viral Proteins With Hybrid Sequence Features. Front Microbiol 2019;10:507. [PMID: 30972038 PMCID: PMC6443926 DOI: 10.3389/fmicb.2019.00507] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/24/2018] [Accepted: 02/27/2019] [Indexed: 02/01/2023]

Li Y, Niu M, Zou Q. ELM-MHC: An Improved MHC Identification Method with Extreme Learning Machine Algorithm. J Proteome Res 2019;18:1392-1401. [DOI: 10.1021/acs.jproteome.9b00012] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 01/28/2023]

Wei L, Hu J, Li F, Song J, Su R, Zou Q. Comparative analysis and prediction of quorum-sensing peptides using feature representation learning and machine learning algorithms. Brief Bioinform 2018;21:106-119. [PMID: 30383239 DOI: 10.1093/bib/bby107] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 08/21/2018] [Revised: 09/18/2018] [Accepted: 10/05/2018] [Indexed: 12/11/2022]

Niu M, Li Y, Wang C, Han K. RFAmyloid: A Web Server for Predicting Amyloid Proteins. Int J Mol Sci 2018;19:ijms19072071. [PMID: 30013015 PMCID: PMC6073578 DOI: 10.3390/ijms19072071] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/12/2018] [Revised: 07/10/2018] [Accepted: 07/12/2018] [Indexed: 12/22/2022]

Kumar P, Joy J, Pandey A, Gupta D. PRmePRed: A protein arginine methylation prediction tool. PLoS One 2017;12:e0183318. [PMID: 28813517 PMCID: PMC5557562 DOI: 10.1371/journal.pone.0183318] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/11/2017] [Accepted: 08/02/2017] [Indexed: 12/16/2022]

Xu ZC, Wang P, Qiu WR, Xiao X. iSS-PC: Identifying Splicing Sites via Physical-Chemical Properties Using Deep Sparse Auto-Encoder. Sci Rep 2017;7:8222. [PMID: 28811565 PMCID: PMC5557945 DOI: 10.1038/s41598-017-08523-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 05/15/2017] [Accepted: 07/10/2017] [Indexed: 12/13/2022]

Wei L, Xing P, Su R, Shi G, Ma ZS, Zou Q. CPPred-RF: A Sequence-based Predictor for Identifying Cell-Penetrating Peptides and Their Uptake Efficiency. J Proteome Res 2017;16:2044-2053. [PMID: 28436664 DOI: 10.1021/acs.jproteome.7b00019] [Citation(s) in RCA: 129] [Impact Index Per Article: 18.4] [Indexed: 01/24/2023]

Wei L, Tang J, Zou Q. Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.06.026] [Citation(s) in RCA: 196] [Impact Index Per Article: 28.0] [Indexed: 11/26/2022]

Detecting N⁶-methyladenosine sites from RNA transcriptomes using ensemble Support Vector Machines. Sci Rep 2017;7:40242. [PMID: 28079126 PMCID: PMC5227715 DOI: 10.1038/srep40242] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 08/05/2016] [Accepted: 12/05/2016] [Indexed: 12/22/2022]

Zou Q, Wan S, Ju Y, Tang J, Zeng X. Pretata: predicting TATA binding proteins with novel features and dimensionality reduction strategy. BMC Syst Biol 2016;10:114. [PMID: 28155714 PMCID: PMC5259984 DOI: 10.1186/s12918-016-0353-5] [Citation(s) in RCA: 135] [Impact Index Per Article: 16.9] [Indexed: 02/08/2023]

Cai Y, Liao Z, Ju Y, Liu J, Mao Y, Liu X. Resistance gene identification from Larimichthys crocea with machine learning techniques. Sci Rep 2016;6:38367. [PMID: 27922074 PMCID: PMC5138596 DOI: 10.1038/srep38367] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/01/2016] [Accepted: 11/08/2016] [Indexed: 12/11/2022]

Wei L, Liao M, Gao X, Wang J, Lin W. mGOF-loc: A novel ensemble learning method for human protein subcellular localization prediction. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.09.137] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 10/21/2022]

Liu B, Liu Y, Jin X, Wang X, Liu B. iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance. Sci Rep 2016;6:33483. [PMID: 27641752 PMCID: PMC5027590 DOI: 10.1038/srep33483] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/17/2016] [Accepted: 08/25/2016] [Indexed: 01/01/2023]

Li YH, Xu JY, Tao L, Li XF, Li S, Zeng X, Chen SY, Zhang P, Qin C, Zhang C, Chen Z, Zhu F, Chen YZ. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity. PLoS One 2016;11:e0155290. [PMID: 27525735 PMCID: PMC4985167 DOI: 10.1371/journal.pone.0155290] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 03/21/2016] [Accepted: 04/27/2016] [Indexed: 12/20/2022]

Affiliation(s)

Ying Hong Li: Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
Jing Yu Xu: Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China; School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
Lin Tao: Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China; Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
Xiao Feng Li: Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
Shuang Li: Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
Xian Zeng: Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
Shang Ying Chen: Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
Peng Zhang: Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
Chu Qin: Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
Cheng Zhang: Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
Zhe Chen: Zhejiang Key Laboratory of Gastro-intestinal Pathophysiology, Zhejiang Hospital of Traditional Chinese Medicine, Zhejiang Chinese Medical University, Hangzhou, P. R. China
Feng Zhu: Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
Yu Zong Chen: Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore


In Silico Prediction of Gamma-Aminobutyric Acid Type-A Receptors Using Novel Machine-Learning-Based SVM and GBDT Approaches. Biomed Res Int 2016;2016:2375268. [PMID: 27579307 PMCID: PMC4992803 DOI: 10.1155/2016/2375268] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/24/2016] [Revised: 06/08/2016] [Accepted: 06/19/2016] [Indexed: 11/17/2022]

Ensemble Feature Learning of Genomic Data Using Support Vector Machine. PLoS One 2016;11:e0157330. [PMID: 27304923 PMCID: PMC4909287 DOI: 10.1371/journal.pone.0157330] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/25/2015] [Accepted: 05/28/2016] [Indexed: 11/29/2022]

Abstract

The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that underlies the RFE algorithm. The rationale is that building ensemble SVM models using randomly drawn bootstrap samples from the training set will produce different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate features is based on the ranking of multiple SVM models instead of one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves on average 9% better accuracy than SVM-RFE, and 5% better than a random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data.
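
The loop described in this abstract (balanced bootstrap samples, one feature ranking per sample, aggregation, backward elimination) can be sketched in a few lines. This is an illustration under several assumptions, not the authors' implementation: the ranker is a stand-in that scores features by the absolute difference of class means rather than by SVM weights, ranks are aggregated by simple summation, one feature is dropped per round, and labels are assumed to be 0 and 1. The toy data are invented.

```python
import random

def balanced_bootstrap(X, y, rng):
    """Draw equally many rows, with replacement, from each class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n = min(len(rows) for rows in by_class.values())
    sample_X, sample_y = [], []
    for label, rows in by_class.items():
        for _ in range(n):
            sample_X.append(rng.choice(rows))
            sample_y.append(label)
    return sample_X, sample_y

def rank_features(X, y, features):
    """Stand-in ranker (NOT an SVM): best-first order by class-mean difference."""
    def score(f):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(features, key=score, reverse=True)

def ensemble_rfe(X, y, n_keep, n_models=10, seed=0):
    """Backward elimination driven by rankings aggregated over bootstrap models."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        total_rank = {f: 0 for f in features}
        for _ in range(n_models):
            boot_X, boot_y = balanced_bootstrap(X, y, rng)
            for position, f in enumerate(rank_features(boot_X, boot_y, features)):
                total_rank[f] += position
        # Drop the feature whose aggregated rank is worst.
        features.remove(max(features, key=lambda f: total_rank[f]))
    return features

# Invented toy data: feature 0 separates the classes, features 1 and 2 barely do.
X = [[0.9, 0.5, 0.6], [1.0, 0.4, 0.7], [0.8, 0.5, 0.5],
     [0.1, 0.5, 0.4], [0.0, 0.4, 0.5], [0.2, 0.5, 0.3], [0.1, 0.4, 0.4]]
y = [1, 1, 1, 0, 0, 0, 0]
print(ensemble_rfe(X, y, n_keep=1))  # [0]
```

Swapping `rank_features` for a ranking derived from a linear SVM's weight magnitudes would bring the sketch closer to the method the abstract describes.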


A Novel Peptide Binding Prediction Approach for HLA-DR Molecule Based on Sequence and Structural Information. Biomed Res Int 2016;2016:3832176. [PMID: 27340658 PMCID: PMC4906198 DOI: 10.1155/2016/3832176] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/10/2016] [Accepted: 05/04/2016] [Indexed: 11/18/2022]

Liu B, Wang S, Dong Q, Li S, Liu X. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning. IEEE Trans Nanobioscience 2016;15:328-334. [PMID: 28113908 DOI: 10.1109/tnb.2016.2555951] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Che Y, Ju Y, Xuan P, Long R, Xing F. Identification of Multi-Functional Enzyme with Multi-Label Classifier. PLoS One 2016;11:e0153503. [PMID: 27078147 PMCID: PMC4831692 DOI: 10.1371/journal.pone.0153503] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2016] [Accepted: 03/30/2016] [Indexed: 11/23/2022] Open

The Virtual Screening of the Drug Protein with a Few Crystal Structures Based on the Adaboost-SVM. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2016;2016:4809831. [PMID: 27127534 PMCID: PMC4834164 DOI: 10.1155/2016/4809831] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/27/2015] [Revised: 03/06/2016] [Accepted: 03/07/2016] [Indexed: 11/27/2022]

Chen L, Zhang YH, Huang T, Cai YD. Identifying novel protein phenotype annotations by hybridizing protein-protein interactions and protein sequence similarities. Mol Genet Genomics 2016;291:913-34. [PMID: 26728152 DOI: 10.1007/s00438-015-1157-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2015] [Accepted: 12/08/2015] [Indexed: 01/18/2023]

Abstract

Studies of protein phenotypes represent a central challenge of modern genetics in the post-genome era. Effective and accurate investigation of protein phenotypes is one of the most critical procedures for identifying functional biological processes at the microscale; it involves the analysis of multifactorial traits and has greatly contributed to the development of modern biology. We have therefore developed a novel computational method that identifies novel proteins associated with certain phenotypes in yeast based on the protein-protein interaction network. Unlike some existing network-based computational methods that identify the phenotype of a query protein based on its direct neighbors in the local network, the proposed method identifies novel candidate proteins for a certain phenotype by considering all annotated proteins with this phenotype on the global network using a shortest path (SP) algorithm. The identified proteins are further filtered using both a permutation test and their interactions and sequence similarities to annotated proteins. We compared our method with another widely used method called random walk with restart (RWR). The biological functions of the proteins identified for each phenotype by our SP method and by the RWR method were analyzed and compared. The results confirmed a large proportion of our novel protein phenotype annotations, and the RWR method showed a higher false positive rate than the SP method. Our method is equally effective for predicting proteins involved in all eleven clustered yeast phenotypes, with a quite low false positive rate. Considering the universality and generalizability of our supporting materials and computing strategies, our method can be further applied to study other organisms, and the new functions we predicted can provide pertinent guidance for further experimental verification.
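The global shortest-path idea in this abstract can be sketched as follows. The sketch rests on assumptions the abstract does not confirm: the network is treated as unweighted (so breadth-first search suffices), candidates are taken to be the unannotated proteins lying inside a shortest path between two annotated proteins, and the function names are hypothetical. The permutation test and similarity filters mentioned in the abstract are not shown.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Return one shortest path from source to target via BFS, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def candidate_proteins(graph, annotated):
    """Count how often each unannotated protein lies inside a shortest
    path connecting two proteins annotated with the phenotype."""
    counts = {}
    for a, b in combinations(sorted(annotated), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue  # the two proteins are in different components
        for node in path[1:-1]:
            if node not in annotated:
                counts[node] = counts.get(node, 0) + 1
    return counts
```

Here `graph` is an adjacency dictionary mapping each protein to its interaction partners. Proteins with high counts sit between many annotated proteins and would be the ones passed on to later filtering steps.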

Pai PP, Mondal S. MOWGLI: prediction of protein-MannOse interacting residues With ensemble classifiers usinG evoLutionary Information. J Biomol Struct Dyn 2015;34:2069-83. [PMID: 26457920 DOI: 10.1080/07391102.2015.1106978] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]

Survey of Natural Language Processing Techniques in Bioinformatics. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015;2015:674296. [PMID: 26525745 PMCID: PMC4615216 DOI: 10.1155/2015/674296] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/10/2015] [Revised: 06/12/2015] [Accepted: 06/21/2015] [Indexed: 01/02/2023]

A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015;2015:178572. [PMID: 26508990 PMCID: PMC4609795 DOI: 10.1155/2015/178572] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/10/2015] [Accepted: 08/25/2015] [Indexed: 11/29/2022]

Sample Selection for Training Cascade Detectors. PLoS One 2015. [PMID: 26197221 PMCID: PMC4510611 DOI: 10.1371/journal.pone.0133059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]

Zou Q, Li J, Song L, Zeng X, Wang G. Similarity computation strategies in the microRNA-disease network: a survey. Brief Funct Genomics 2015;15:55-64. [PMID: 26134276 DOI: 10.1093/bfgp/elv024] [Citation(s) in RCA: 141] [Impact Index Per Article: 15.7] [Indexed: 12/27/2022]

Toward a Literature-Driven Definition of Big Data in Healthcare. BIOMED RESEARCH INTERNATIONAL 2015;2015:639021. [PMID: 26137488 PMCID: PMC4468280 DOI: 10.1155/2015/639021] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 11/13/2014] [Accepted: 02/04/2015] [Indexed: 11/17/2022]

A linear-RBF multikernel SVM to classify big text corpora. BIOMED RESEARCH INTERNATIONAL 2015;2015:878291. [PMID: 25879039 PMCID: PMC4386713 DOI: 10.1155/2015/878291] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/22/2014] [Revised: 11/10/2014] [Accepted: 11/13/2014] [Indexed: 11/23/2022]

Xu R, Zhou J, Wang H, He Y, Wang X, Liu B. Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC SYSTEMS BIOLOGY 2015;9 Suppl 1:S10. [PMID: 25708928 PMCID: PMC4331676 DOI: 10.1186/1752-0509-9-s1-s10] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Indexed: 02/05/2023]

Abstract

BACKGROUND

DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification. However, most of them cannot provide an invaluable knowledge base for our understanding of DNA-protein interactions.

RESULTS

We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by a distance transformation scheme. Lastly, the resulting uniform numeric representations are fed into an SVM classifier for prediction, which determines whether or not a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, and outperformed some existing state-of-the-art methods.
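The abstract does not spell out the distance transformation, so the following is only a sketch under an assumed form: for each PSSM column and each lag, average the product of the scores at positions that lag apart. The function name and the `max_lag` parameter are hypothetical. The point it illustrates is the one the abstract makes, namely that profiles of different lengths map to vectors of the same length.

```python
def distance_transform(pssm, max_lag):
    """Map an L x K score matrix to a fixed-length vector of K * max_lag values.

    Assumed form: for column c and lag d, the feature is the mean of
    pssm[j][c] * pssm[j + d][c] over all valid positions j.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for col in range(n_cols):
        for lag in range(1, max_lag + 1):
            total = sum(
                pssm[j][col] * pssm[j + lag][col] for j in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

A real PSSM has one row per residue and 20 columns, so with `max_lag=5` every protein would be encoded as 100 numbers under this assumed form, whatever its length. Such vectors are what a classifier with fixed-size input, such as an SVM, requires.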

CONCLUSIONS

The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT was constructed, which is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
