Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang G, Fang B. LogitBoost classifier for discriminating thermophilic and mesophilic proteins. J Biotechnol 2007;127:417-24. [PMID: 17045354 DOI: 10.1016/j.jbiotec.2006.07.020] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 04/26/2006] [Revised: 07/04/2006] [Accepted: 07/19/2006] [Indexed: 11/17/2022]

Cited by Other Article(s)

Yu H, Luo X. ThermoFinder: A sequence-based thermophilic proteins prediction framework. Int J Biol Macromol 2024;270:132469. [PMID: 38761901 DOI: 10.1016/j.ijbiomac.2024.132469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]

Susanty M, Naim Mursalim MK, Hertadi R, Purwarianti A, Rajab TLE. Classifying alkaliphilic proteins using embeddings from protein language model. Comput Biol Med 2024;173:108385. [PMID: 38547659 DOI: 10.1016/j.compbiomed.2024.108385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]

Martínez‐Mauricio KL, García‐Jacas CR, Cordoves‐Delgado G. Examining evolutionary scale modeling-derived different-dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow. Protein Sci 2024;33:e4928. [PMID: 38501511 PMCID: PMC10949403 DOI: 10.1002/pro.4928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 01/28/2024] [Accepted: 01/30/2024] [Indexed: 03/20/2024]

Abstract

Molecular features play an important role in different bio-chem-informatics tasks, such as Quantitative Structure-Activity Relationships (QSAR) modeling. Several pre-trained models have recently been created for use in downstream tasks, either by fine-tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed ESM-2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different-dimensional embeddings derived from the ESM-2 models to classify antimicrobial peptides (AMPs). To this end, we built a KNIME workflow to use the same modeling methodology across experiments in order to guarantee fair analyses. As a result, the 640- and 1280-dimensional embeddings derived from the 30- and 33-layer ESM-2 models, respectively, are the most valuable, since statistically better performances were achieved by the QSAR models built from them. We also fused features of the different ESM-2 models and concluded that the fusion yields better QSAR models than using features of a single ESM-2 model. Frequency studies revealed that only a portion of the ESM-2 embeddings is valuable for modeling tasks, since between 43% and 66% of the features were never used. Comparisons with state-of-the-art deep learning (DL) models confirm that, when performing methodologically principled studies in the prediction of AMPs, non-DL based QSAR models yield comparable-to-superior performances to DL-based QSAR models. The developed KNIME workflow is freely available at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can be valuable to avoid unfair comparisons regarding new computational methods, as well as to propose new non-DL based QSAR models.

Xiang X, Gao J, Ding Y. DeepPPThermo: A Deep Learning Framework for Predicting Protein Thermostability Combining Protein-Level and Amino Acid-Level Features. J Comput Biol 2024;31:147-160. [PMID: 38100126 DOI: 10.1089/cmb.2023.0097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2024]

Haselbeck F, John M, Zhang Y, Pirnay J, Fuenzalida-Werner J, Costa R, Grimm D. Superior protein thermophilicity prediction with protein language model embeddings. NAR Genom Bioinform 2023;5:lqad087. [PMID: 37829176 PMCID: PMC10566323 DOI: 10.1093/nargab/lqad087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 07/14/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023]

Zhao J, Yan W, Yang Y. DeepTP: A Deep Learning Model for Thermophilic Protein Prediction. Int J Mol Sci 2023;24:ijms24032217. [PMID: 36768540 PMCID: PMC9917291 DOI: 10.3390/ijms24032217] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 12/10/2022] [Revised: 01/19/2023] [Accepted: 01/19/2023] [Indexed: 01/26/2023]

Kumar S, Duggineni VK, Singhania V, Misra SP, Deshpande PA. Unravelling and Quantifying the Biophysical–Biochemical Descriptors Governing Protein Thermostability by Machine Learning. ADVANCED THEORY AND SIMULATIONS 2023. [DOI: 10.1002/adts.202200703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]

Converting the genomic knowledge base to build protein specific machine learning prediction models; a classification study on thermophilic serine protease. Biologia (Bratisl) 2022. [DOI: 10.1007/s11756-022-01214-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins. Comput Biol Med 2022;146:105704. [PMID: 35690478 DOI: 10.1016/j.compbiomed.2022.105704] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 03/19/2022] [Revised: 05/15/2022] [Accepted: 06/04/2022] [Indexed: 11/22/2022]

Charoenkwan P, Schaduangrat N, Hasan MM, Moni MA, Lió P, Shoombuatong W. Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI JOURNAL 2022;21:554-570. [PMID: 35651661 PMCID: PMC9150013 DOI: 10.17179/excli2022-4723] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/28/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]

Abstract

Thermophilic proteins (TPPs) are critical for basic research and in the food industry due to their ability to maintain a thermodynamically stable fold at extremely high temperatures. Thus, the expeditious identification of novel TPPs through computational models from protein sequences is very desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, for in silico prediction of TPPs have been developed. Therefore, it is desirable to revisit these methods and summarize their advantages and disadvantages in order to further develop new computational approaches to achieve more accurate and improved prediction of TPPs. With this goal in mind, we comprehensively investigate a large collection of fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review on the development of ML-based methods for in silico prediction of TPPs. These TPP predictors can be classified into two groups according to the interpretability of the ML algorithms employed (i.e., computational black-box methods and computational white-box methods). For the comparative analysis, we evaluated several currently available TPP predictors on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will help researchers select the TPP predictor most suitable for their purposes and provide useful perspectives for the development of more effective and accurate TPP predictors.

Zeng B, Zhao S, Zhou R, Zhou Y, Jin W, Yi Z, Zhang G. Engineering and screening of novel β-1,3-xylanases with desired hydrolysate type by optimized ancestor sequence reconstruction and data mining. Comput Struct Biotechnol J 2022;20:3313-3321. [PMID: 35832630 PMCID: PMC9251504 DOI: 10.1016/j.csbj.2022.06.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 06/22/2022] [Accepted: 06/22/2022] [Indexed: 11/03/2022]

Ahmed Z, Zulfiqar H, Khan AA, Gul I, Dao FY, Zhang ZY, Yu XL, Tang L. iThermo: A Sequence-Based Model for Identifying Thermophilic Proteins Using a Multi-Feature Fusion Strategy. Front Microbiol 2022;13:790063. [PMID: 35273581 PMCID: PMC8902591 DOI: 10.3389/fmicb.2022.790063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Accepted: 01/10/2022] [Indexed: 01/20/2023]

Charoenkwan P, Chotpatiwetchkul W, Lee VS, Nantasenamat C, Shoombuatong W. A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides. Sci Rep 2021;11:23782. [PMID: 34893688 PMCID: PMC8664844 DOI: 10.1038/s41598-021-03293-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/23/2021] [Accepted: 12/01/2021] [Indexed: 02/08/2023]

Abstract

Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TPPs) play a critical role in basic research and a variety of applications in the food industry. As a result, the development of computational models for rapidly and accurately identifying novel TPPs from a large number of uncharacterized protein sequences is desirable. Although computational models have already been developed for characterizing thermophilic proteins, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906-0.910) and 2-17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. In terms of interpretability and generalizability, comparative results showed that SCMTPP was effective for identifying and characterizing TPPs. We have implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP in order to allow easy access to the model. SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and guiding experimental characterization of TPPs.

Bell TH, Bell T. Many roads to bacterial generalism. FEMS Microbiol Ecol 2021;97:6006266. [PMID: 33238305 DOI: 10.1093/femsec/fiaa240] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/02/2020] [Accepted: 11/23/2020] [Indexed: 12/13/2022]

Land Subsidence Susceptibility Mapping in Jakarta Using Functional and Meta-Ensemble Machine Learning Algorithm Based on Time-Series InSAR Data. REMOTE SENSING 2020. [DOI: 10.3390/rs12213627] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 11/17/2022]

Guo Z, Wang P, Liu Z, Zhao Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front Bioeng Biotechnol 2020;8:584807. [PMID: 33195148 PMCID: PMC7642589 DOI: 10.3389/fbioe.2020.584807] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 07/18/2020] [Accepted: 09/11/2020] [Indexed: 01/19/2023]

Wu L, Hu Y, Zhang X, Chen W, Yu ASL, Kellum JA, Waitman LR, Liu M. Changing relative risk of clinical factors for hospital-acquired acute kidney injury across age groups: a retrospective cohort study. BMC Nephrol 2020;21:321. [PMID: 32741377 PMCID: PMC7397647 DOI: 10.1186/s12882-020-01980-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/04/2020] [Accepted: 07/23/2020] [Indexed: 12/14/2022]

Feng C, Ma Z, Yang D, Li X, Zhang J, Li Y. A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features. Front Bioeng Biotechnol 2020;8:285. [PMID: 32432088 PMCID: PMC7214540 DOI: 10.3389/fbioe.2020.00285] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/15/2020] [Accepted: 03/18/2020] [Indexed: 11/13/2022]

Hybrid Computational Intelligence Models for Improvement Gully Erosion Assessment. REMOTE SENSING 2020. [DOI: 10.3390/rs12010140] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 01/30/2023]

Abstract

Gullying is a type of soil erosion that currently represents a major threat at the societal scale and will likely increase in the future. In Iran, soil erosion, and specifically gullying, is already causing significant distress to local economies by affecting agricultural productivity and infrastructure. Recognizing this threat has recently led the Iranian geomorphology community to focus on the problem across the whole country. This study is in line with other efforts where the optimal method to map gully-prone areas is sought by testing state-of-the-art machine learning tools. In this study, we compare the performance of three machine learning algorithms, namely Fisher’s linear discriminant analysis (FLDA), logistic model tree (LMT) and naïve Bayes tree (NBTree). We also introduce three novel ensemble models by combining the aforementioned base classifiers to the Random SubSpace (RS) meta-classifier namely RS-FLDA, RS-LMT and RS-NBTree. The area under the receiver operating characteristic (AUROC), true skill statistics (TSS) and kappa criteria are used for calibration (goodness-of-fit) and validation (prediction accuracy) datasets to compare the performance of the different algorithms. In addition to susceptibility mapping, we also study the association between gully erosion and a set of morphometric, hydrologic and thematic properties by adopting the evidential belief function (EBF). The results indicate that hydrology-related factors contribute the most to gully formation, which is also confirmed by the susceptibility patterns displayed by the RS-NBTree ensemble. The RS-NBTree is the model that outperforms the other five models, as indicated by the prediction accuracy (area under curve (AUC) = 0.898, Kappa = 0.748 and TSS = 0.697), and goodness-of-fit (AUC = 0.780, Kappa = 0.682 and TSS = 0.618). The analyses are performed with the same gully presence/absence balanced modeling design. Therefore, the differences in performance are dependent on the algorithm architecture. Overall, the EBF model can detect strong and reasonable dependencies towards gully-prone conditions. The RS-NBTree ensemble model performed significantly better than the others, suggesting greater flexibility towards unknown data, which may support the applications of these methods in transferable susceptibility models in areas that are potentially erodible but currently lack gully data.

Establishment and evaluation of prediction model for multiple disease classification based on gut microbial data. Sci Rep 2019;9:10189. [PMID: 31308384 PMCID: PMC6629854 DOI: 10.1038/s41598-019-46249-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/05/2018] [Accepted: 04/12/2019] [Indexed: 12/17/2022]

Land Subsidence Susceptibility Mapping Using Bayesian, Functional, and Meta-Ensemble Machine Learning Models. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9061248] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]

Panja AS, Nag A, Bandopadhyay B, Maiti S. Protein Stability Determination (PSD): A Tool for Proteomics Analysis. Curr Bioinform 2018. [DOI: 10.2174/1574893613666180315121614] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]

Tang H, Cao RZ, Wang W, Liu TS, Wang LM, He CM. A two-step discriminated method to identify thermophilic proteins. INT J BIOMATH 2017. [DOI: 10.1142/s1793524517500504] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Indexed: 11/18/2022]

Fan GL, Liu YL, Wang H. Identification of thermophilic proteins by incorporating evolutionary and acid dissociation information into Chou's general pseudo amino acid composition. J Theor Biol 2016;407:138-142. [DOI: 10.1016/j.jtbi.2016.07.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/16/2016] [Revised: 06/24/2016] [Accepted: 07/07/2016] [Indexed: 10/21/2022]

Insights into the molecular basis of piezophilic adaptation: Extraction of piezophilic signatures. J Theor Biol 2015;390:117-26. [PMID: 26656108 DOI: 10.1016/j.jtbi.2015.11.021] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/04/2015] [Revised: 11/06/2015] [Accepted: 11/21/2015] [Indexed: 11/20/2022]

Ofer D, Linial M. ProFET: Feature engineering captures high-level protein functions. Bioinformatics 2015;31:3429-36. [DOI: 10.1093/bioinformatics/btv345] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 03/04/2015] [Accepted: 05/29/2015] [Indexed: 11/13/2022]

Nath A, Subbiah K. Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier. Comput Biol Chem 2014;53PB:198-203. [PMID: 25462328 DOI: 10.1016/j.compbiolchem.2014.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/28/2014] [Revised: 09/02/2014] [Accepted: 10/06/2014] [Indexed: 11/19/2022]

Wang L, Li C. Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification. Biotechnol Lett 2014;36:1963-9. [PMID: 24930111 DOI: 10.1007/s10529-014-1577-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/26/2014] [Accepted: 05/28/2014] [Indexed: 10/25/2022]

House MJ, Bangma SJ, Thomas M, Gan EK, Ayonrinde OT, Adams LA, Olynyk JK, St Pierre TG. Texture-based classification of liver fibrosis using MRI. J Magn Reson Imaging 2013;41:322-8. [PMID: 24347292 DOI: 10.1002/jmri.24536] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 08/19/2013] [Accepted: 11/15/2013] [Indexed: 12/23/2022]

Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. Comput Biol Chem 2013;46:16-22. [DOI: 10.1016/j.compbiolchem.2013.05.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 04/12/2013] [Revised: 04/24/2013] [Accepted: 05/03/2013] [Indexed: 01/15/2023]

Zuo YC, Chen W, Fan GL, Li QZ. A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids 2012;44:573-80. [PMID: 22851052 DOI: 10.1007/s00726-012-1374-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 11/28/2011] [Accepted: 07/17/2012] [Indexed: 11/25/2022]

Nakariyakul S, Liu ZP, Chen L. Detecting thermophilic proteins through selecting amino acid and dipeptide composition features. Amino Acids 2011;42:1947-53. [DOI: 10.1007/s00726-011-0923-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 01/25/2011] [Accepted: 04/20/2011] [Indexed: 11/29/2022]

Lin H, Chen W. Prediction of thermophilic proteins using feature selection technique. J Microbiol Methods 2010;84:67-70. [PMID: 21044646 DOI: 10.1016/j.mimet.2010.10.013] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 08/17/2010] [Revised: 10/15/2010] [Accepted: 10/19/2010] [Indexed: 11/16/2022]

Zhang G, Li H, Fang B. Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition. Process Biochem 2009. [DOI: 10.1016/j.procbio.2009.02.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 10/21/2022]

Jahandideh M, Barkooie SMH, Jahandideh S, Abdolmaleki P, Movahedi MM, Hoseini S, Asadabadi EB, Jouni FJ, Karami Z, Firoozabadi NH. Elucidating the protein cold-adaptation: Investigation of the parameters enhancing protein psychrophilicity. J Theor Biol 2008;255:113-8. [PMID: 18718477 DOI: 10.1016/j.jtbi.2008.07.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/15/2008] [Revised: 06/10/2008] [Accepted: 07/27/2008] [Indexed: 11/28/2022]

Ge G, Wong GW. Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles. BMC Bioinformatics 2008;9:275. [PMID: 18547427 PMCID: PMC2440392 DOI: 10.1186/1471-2105-9-275] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Received: 09/04/2007] [Accepted: 06/11/2008] [Indexed: 01/14/2023]

Abstract

Background

Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data combined with their relatively small sample sizes poses a significant challenge to current data mining methodology, where many of the standard methods cannot be applied directly. Here, we propose a novel methodological framework using machine learning, in which decision-tree-based classifier ensembles coupled with feature selection methods are applied to proteomics data generated from premalignant pancreatic cancer.

Results

This study explores the utility of three different feature selection schemas (Student t test, Wilcoxon rank sum test and genetic algorithm) to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected from each method, we compared the prediction performances of a single decision tree algorithm C4.5 with six different decision-tree based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost). We show that ensemble classifiers always outperform a single decision tree classifier, with greater accuracies and smaller prediction errors, when applied to a pancreatic cancer proteomics dataset.

Conclusion

In our cross validation framework, classifier ensembles generally have better classification accuracies than a single decision tree when applied to a pancreatic cancer proteomic dataset, thus suggesting their utility in future proteomics data analysis. Additionally, the use of feature selection methods allows us to select biomarkers with potentially important roles in cancer development, therefore highlighting the validity of this approach.
