Reference Citation Analysis

For: Postma G, Krooshof P, Buydens L. Opening the kernel of kernel partial least squares and support vector machines. Anal Chim Acta 2011;705:123-34. [DOI: 10.1016/j.aca.2011.04.025] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 01/14/2011] [Revised: 03/31/2011] [Accepted: 04/14/2011] [Indexed: 02/08/2023]

Cited by Other Article(s)

Gjelsvik EL, Tøndel K. Increased interpretation of deep learning models using hierarchical cluster-based modelling. PLoS One 2023;18:e0295251. [PMID: 38060472 PMCID: PMC10703235 DOI: 10.1371/journal.pone.0295251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/20/2023] [Indexed: 12/18/2023]

Shan P, Bi Y, Li Z, Wang Q, He Z, Zhao Y, Peng S. Unsupervised model adaptation for multivariate calibration by domain adaptation-regularization based kernel partial least square. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2023;292:122418. [PMID: 36736045 DOI: 10.1016/j.saa.2023.122418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 06/18/2023]

The Correlation Analysis between Air Quality and Construction Sites: Evaluation in the Urban Environment during the COVID-19 Pandemic. SUSTAINABILITY 2022. [DOI: 10.3390/su14127075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]

Stavropoulos G, van Vorstenbosch R, Jonkers DMAE, Penders J, Hill JE, van Schooten FJ, Smolinska A. Advanced data fusion: Random forest proximities and pseudo-sample principle towards increased prediction accuracy and variable interpretation. Anal Chim Acta 2021;1183:339001. [PMID: 34627524 DOI: 10.1016/j.aca.2021.339001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/05/2020] [Revised: 08/24/2021] [Accepted: 08/25/2021] [Indexed: 11/26/2022]

Abstract

Data fusion has gained much attention in the field of life sciences, and this is because analysis of biological samples may require the use of data coming from multiple complementary sources to express the samples fully. Data fusion lies in the idea that different data platforms detect different biological entities. Therefore, if these different biological compounds are then combined, they can provide comprehensive profiling and understanding of the research question in hand. Data fusion can be performed in three different traditional ways: low-level, mid-level, and high-level data fusion. However, the increasing complexity and amount of generated data require the development of more sophisticated fusion approaches. In that regard, the current study presents an advanced data fusion approach (i.e. proximities stacking) based on random forest proximities coupled with the pseudo-sample principle. Four different data platforms of 130 samples each (faecal microbiome, blood, blood headspace, and exhaled breath samples of patients who have Crohn's disease) were used to demonstrate the classification performance of this new approach. More specifically, 104 samples were used to train and validate the models, whereas the remaining 26 samples were used to validate the models externally. Mid-level, high-level, as well as individual platform classification predictions, were made and compared against the proximities stacking approach. The performance of each approach was assessed by calculating the sensitivity and specificity of each model for the external test set, and visualized by performing principal component analysis on the proximity matrices of the training samples to then, subsequently, project the test samples onto that space. The implementation of pseudo-samples allowed for the identification of the most important variables per platform, finding relations among variables of the different data platforms, and the examination of how variables behave in the samples. 
The proximities stacking approach outperforms both mid-level and high-level fusion approaches, as well as all individual platform predictions. Concurrently, it tackles significant bottlenecks of the traditional ways of fusion and of another advanced fusion way discussed in the paper, and finally, it contradicts the general belief that the more data, the merrier the result, and therefore, considerations have to be taken into account before any data fusion analysis is conducted.
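
The abstract above assesses each model by its sensitivity and specificity on the external test set. As a minimal sketch of those two metrics (not the authors' code; the function name and the example labels are hypothetical), they can be computed from true and predicted binary labels:

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A metric is NaN when its denominator is zero.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for a small held-out set (1 = disease, 0 = control).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```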

Guo HN, Wu SB, Tian YJ, Zhang J, Liu HT. Application of machine learning methods for the prediction of organic solid waste treatment and recycling processes: A review. BIORESOURCE TECHNOLOGY 2021;319:124114. [PMID: 32942236 DOI: 10.1016/j.biortech.2020.124114] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Received: 07/22/2020] [Revised: 09/04/2020] [Accepted: 09/07/2020] [Indexed: 05/23/2023]

Chemometric Strategies for Spectroscopy-Based Food Authentication. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10186544] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 01/05/2023]

Zhang H, Deng X, Zhang Y, Hou C, Li C. Dynamic nonlinear batch process fault detection and identification based on two‐directional dynamic kernel slow feature analysis. CAN J CHEM ENG 2020. [DOI: 10.1002/cjce.23832] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]

Constructing bi-plots for random forest: Tutorial. Anal Chim Acta 2020;1131:146-155. [PMID: 32928475 DOI: 10.1016/j.aca.2020.06.043] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/16/2020] [Revised: 06/15/2020] [Accepted: 06/16/2020] [Indexed: 01/29/2023]

Narayanan H, Sokolov M, Butté A, Morbidelli M. Decision Tree-PLS (DT-PLS) algorithm for the development of process: Specific local prediction models. Biotechnol Prog 2019;35:e2818. [PMID: 30969466 DOI: 10.1002/btpr.2818] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/10/2018] [Revised: 03/15/2019] [Accepted: 03/25/2019] [Indexed: 12/26/2022]

Sanz H, Valim C, Vegas E, Oller JM, Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics 2018;19:432. [PMID: 30453885 PMCID: PMC6245920 DOI: 10.1186/s12859-018-2451-4] [Citation(s) in RCA: 233] [Impact Index Per Article: 38.8] [Received: 05/07/2018] [Accepted: 10/30/2018] [Indexed: 02/02/2023]

Abstract

Background

Support vector machines (SVM) are a powerful tool to analyze data with a number of predictors approximately equal to or larger than the number of observations. However, originally, application of SVM to analyze biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables. Creating predictor models based on only the most relevant variables is essential in biomedical research. Currently, substantial work has been done to allow assessment of variable importance in SVM models, but this work has focused on SVM implemented with linear kernels. The power of SVM as a prediction model is associated with the flexibility generated by use of non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and SVM for survival analysis.

Results

The proposed algorithms allow visualization of each of the RFE iterations, and hence, identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. The three algorithms we proposed performed generally better than the gold standard RFE for non-linear kernels, when comparing the truly most relevant variables with the variable ranks produced by each algorithm in simulation studies. Generally, the RFE-pseudo-samples outperformed the other three methods, even when variables were assumed to be correlated in all tested scenarios.

Conclusions

The proposed approaches can be implemented with accuracy to select variables and assess direction and strength of associations in analysis of biomedical data using SVM for categorical or time-to-event responses. Conducting variable selection and interpreting direction and strength of associations between predictors and outcomes with the proposed approaches, particularly with the RFE-pseudo-samples approach, can be implemented with accuracy when analyzing biomedical data. These approaches perform better than the classical RFE of Guyon for realistic scenarios about the structure of biomedical data.

Zhang H, Tian X, Deng X, Cao Y. Batch process fault detection and identification based on discriminant global preserving kernel slow feature analysis. ISA TRANSACTIONS 2018;79:108-126. [PMID: 29776590 DOI: 10.1016/j.isatra.2018.05.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/29/2017] [Revised: 05/01/2018] [Accepted: 05/08/2018] [Indexed: 06/08/2023]

Song W, Wang H, Maguire P, Nibouche O. Nearest clusters based partial least squares discriminant analysis for the classification of spectral data. Anal Chim Acta 2018;1009:27-38. [DOI: 10.1016/j.aca.2018.01.023] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 12/18/2017] [Accepted: 01/15/2018] [Indexed: 11/29/2022]

Differentiation Between Organic and Non-Organic Apples Using Diffraction Grating and Image Processing-A Cost-Effective Approach. SENSORS 2018;18:s18061667. [PMID: 29789501 PMCID: PMC6021810 DOI: 10.3390/s18061667] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/15/2018] [Revised: 05/15/2018] [Accepted: 05/20/2018] [Indexed: 11/17/2022]

Chemometric Methods for Classification and Feature Selection. COMPREHENSIVE ANALYTICAL CHEMISTRY 2018. [DOI: 10.1016/bs.coac.2018.08.006] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 12/05/2022]

Wongsaipun S, Krongchai C, Jakmunee J, Kittiwachana S. Rice Grain Freshness Measurement Using Rapid Visco Analyzer and Chemometrics. FOOD ANAL METHOD 2017. [DOI: 10.1007/s12161-017-1031-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022]

Bian X, Li S, Lin L, Tan X, Fan Q, Li M. High and low frequency unfolded partial least squares regression based on empirical mode decomposition for quantitative analysis of fuel oil samples. Anal Chim Acta 2016;925:16-22. [DOI: 10.1016/j.aca.2016.04.029] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/26/2015] [Revised: 03/31/2016] [Accepted: 04/21/2016] [Indexed: 12/26/2022]

Tan C, Chen H, Lin Z, Wu T, Wang L, Zhang K. Classification of Liquor Using Near-Infrared Spectroscopy and Chemometrics. ANAL LETT 2014. [DOI: 10.1080/00032719.2014.938343] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]

Chen H, Tan C, Wu H, Lin Z, Wu T. Feasibility of Rapid Diagnosis of Colorectal Cancer by Near-Infrared Spectroscopy and Support Vector Machine. ANAL LETT 2014. [DOI: 10.1080/00032719.2014.915410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]

Singh KP, Gupta S, Rai P. Predicting dissolved oxygen concentration using kernel regression modeling approaches with nonlinear hydro-chemical data. ENVIRONMENTAL MONITORING AND ASSESSMENT 2014;186:2749-2765. [PMID: 24338099 DOI: 10.1007/s10661-013-3576-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/10/2013] [Accepted: 11/28/2013] [Indexed: 06/03/2023]

A quantitative structure-activity relationship study of anti-HIV activity of substituted HEPT using nonlinear models. Med Chem Res 2013;22:5442-5452. [PMID: 24098069 PMCID: PMC3785711 DOI: 10.1007/s00044-013-0525-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/31/2012] [Accepted: 01/31/2013] [Indexed: 11/27/2022]

Platikanov S, Martín J, Tauler R. Linear and non-linear chemometric modeling of THM formation in Barcelona's water treatment plant. THE SCIENCE OF THE TOTAL ENVIRONMENT 2012;432:365-374. [PMID: 22750183 DOI: 10.1016/j.scitotenv.2012.05.097] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/12/2012] [Revised: 05/22/2012] [Accepted: 05/31/2012] [Indexed: 06/01/2023]

Interpretation and visualization of non-linear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis. PLoS One 2012;7:e38163. [PMID: 22715376 PMCID: PMC3371049 DOI: 10.1371/journal.pone.0038163] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 01/03/2012] [Accepted: 05/01/2012] [Indexed: 11/22/2022]

Abstract

Background

In the last decade data fusion has become widespread in the field of metabolomics. Linear data fusion is performed most commonly. However, many data display non-linear parameter dependences. The linear methods are bound to fail in such situations. We used proton Nuclear Magnetic Resonance and Gas Chromatography-Mass Spectrometry, two well established techniques, to generate metabolic profiles of Cerebrospinal fluid of Multiple Sclerosis (MScl) individuals. These datasets represent non-linearly separable groups. Thus, to extract relevant information and to combine them a special framework for data fusion is required.

Methodology

The main aim is to demonstrate a novel approach for data fusion for classification; the approach is applied to metabolomics datasets coming from patients suffering from MScl at a different stage of the disease. The approach involves data fusion in kernel space and consists of four main steps. The first one is to extract the significant information per data source using Support Vector Machine Recursive Feature Elimination. This method allows one to select a set of relevant variables. In the next step the optimized kernel matrices are merged by linear combination. In step 3 the merged datasets are analyzed with a classification technique, namely Kernel Partial Least Square Discriminant Analysis. In the final step, the variables in kernel space are visualized and their significance established.

Conclusions

We find that fusion in kernel space allows for efficient and reliable discrimination of classes (MScl and early stage). This data fusion approach achieves better class prediction accuracy than analysis of individual datasets and the commonly used mid-level fusion. The prediction accuracy on an independent test set (8 samples) reaches 100%. Additionally, the classification model obtained on fused kernels is simpler in terms of complexity, i.e. just one latent variable was sufficient. Finally, visualization of variable importance in kernel space was achieved.
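
The Methodology above merges the per-source kernel matrices by linear combination. The following is a minimal sketch of that single step, not the authors' implementation: the abstract does not name the kernel type or the weights, so the RBF kernel, the `gamma` values, the equal weights, and the toy data blocks are all assumptions made for illustration.

```python
import math


def rbf_kernel(X, gamma):
    """Return the n x n RBF (Gaussian) kernel matrix for the rows of X."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def fuse_kernels(kernels, weights):
    """Merge same-sized kernel matrices as a weighted sum: sum_k w_k * K_k."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for K, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * K[i][j]
    return fused


# Two hypothetical data blocks measured on the same three samples
# (rows are samples, columns are variables).
block_a = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8]]
block_b = [[1.0, 0.0, 0.5], [0.8, 0.1, 0.4], [0.1, 0.9, 0.2]]

K_a = rbf_kernel(block_a, gamma=1.0)   # assumed kernel width
K_b = rbf_kernel(block_b, gamma=0.5)   # assumed kernel width
K_fused = fuse_kernels([K_a, K_b], weights=[0.5, 0.5])  # assumed equal weights
```

Because both blocks describe the same samples, the kernel matrices share the same n x n shape regardless of how many variables each block has, which is what makes the weighted sum well defined. The fused matrix would then be passed to a kernel classifier; that step is not shown here.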

Cristescu SM, Gietema HA, Blanchet L, Kruitwagen CLJJ, Munnik P, van Klaveren RJ, Lammers JWJ, Buydens L, Harren FJM, Zanen P. Screening for emphysema via exhaled volatile organic compounds. J Breath Res 2011;5:046009. [PMID: 22071870 DOI: 10.1088/1752-7155/5/4/046009] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]

Noorizadeh H, Farmany A, Noorizadeh M. Application of GA–KPLS and L–M ANN calculations for the prediction of the capacity factor of hazardous psychoactive designer drugs. Med Chem Res 2011. [DOI: 10.1007/s00044-011-9794-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]