1
Li Y, Zhou H, Liu J, Tan X. A Hierarchical Matrix Factorization-Based Method for Intelligent Industrial Fault Diagnosis. SENSORS (BASEL, SWITZERLAND) 2024; 24:5408. [PMID: 39205102 PMCID: PMC11360107 DOI: 10.3390/s24165408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 07/29/2024] [Accepted: 08/06/2024] [Indexed: 09/04/2024]
Abstract
Data-driven fault diagnosis, which identifies the causes of abnormalities from collected industrial data, is one of the challenging tasks in intelligent industrial safety management. Practical industrial data usually reflect a mixture of several physical attributes, such as the operating environment, product quality and working conditions. However, traditional models may not be able to leverage this coherent information to enhance diagnostic performance, owing to their shallow architecture. This paper presents a hierarchical matrix factorization (HMF) that relies on a succession of matrix factorizations to find an efficient representation of industrial data for fault diagnosis. Specifically, HMF consecutively decomposes data into several hierarchies. The intermediate hierarchies play the role of analysis operators that automatically learn implicit characteristics of industrial data; the final hierarchy outputs high-level, discriminative features. Furthermore, HMF is extended in a nonlinear manner by introducing activation functions, referred to as NHMF, to deal with nonlinearities in practical industrial processes. The applications of HMF and NHMF to fault diagnosis are evaluated on the multiple-phase flow process. The experimental results show that our models achieve competitive performance against the considered shallow and deep models while consuming less computing time than deep models.
Affiliation(s)
- Yanxia Li
  - School of Automation, Chengdu University of Information Technology, Chengdu 610225, China
- Han Zhou
  - School of Automation, Chongqing University, Chongqing 400044, China
- Jiajia Liu
  - School of Automation, Chengdu University of Information Technology, Chengdu 610225, China
- Xuemin Tan
  - School of Automation, Chengdu University of Information Technology, Chengdu 610225, China
2
Han S, Kim M, Jung S, Ahn J. Sparse ordinal discriminant analysis. Biometrics 2024; 80:ujad040. [PMID: 38412301 DOI: 10.1093/biomtc/ujad040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Revised: 10/27/2023] [Accepted: 12/26/2023] [Indexed: 02/29/2024]
Abstract
Ordinal class labels are frequently observed in classification studies across various fields. In medical science, patients' responses to a drug can be arranged in the natural order, reflecting their recovery postdrug administration. The severity of the disease is often recorded using an ordinal scale, such as cancer grades or tumor stages. We propose a method based on the linear discriminant analysis (LDA) that generates a sparse, low-dimensional discriminant subspace reflecting the class orders. Unlike existing approaches that focus on predictors marginally associated with ordinal labels, our proposed method selects variables that collectively contribute to the ordinal labels. We employ the optimal scoring approach for LDA as a regularization framework, applying an ordinality penalty to the optimal scores and a sparsity penalty to the coefficients for the predictors. We demonstrate the effectiveness of our approach using a glioma dataset, where we predict cancer grades based on gene expression. A simulation study with various settings validates the competitiveness of our classification performance and demonstrates the advantages of our approach in terms of the interpretability of the estimated classifier with respect to the ordinal class labels.
Affiliation(s)
- Sangil Han
  - Department of Statistics, Seoul National University, 08826 Seoul, South Korea
- Minwoo Kim
  - Department of Statistics, Seoul National University, 08826 Seoul, South Korea
- Sungkyu Jung
  - Department of Statistics, Seoul National University, 08826 Seoul, South Korea
- Jeongyoun Ahn
  - Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology, 34141 Daejeon, South Korea
3
Kupek E, Liberali R. Food patterns associated with overweight in 7-11-year old children: machine-learning approach. CIENCIA & SAUDE COLETIVA 2024; 29:e14712022. [PMID: 38198326 DOI: 10.1590/1413-81232024291.14712022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 03/21/2023] [Indexed: 01/12/2024] Open
Abstract
This longitudinal study aimed to present a better strategy and statistical methods, and to demonstrate their use with data from the 2013-2015 period on schoolchildren aged 7 to 11 years, collected with the same food questionnaire (WebCAAFE) survey in Florianopolis, southern Brazil. Six meals/snacks and 32 foods/beverages yielded 192 possible combinations, termed meal/snack-specific food/beverage items (MSFIs). The LASSO algorithm (LASSO-logistic regression) was used to determine the MSFIs predictive of overweight/obesity, and binary (logistic) regression was then used to further analyze a subset of these variables. Late breakfast, lunch and dinner were all associated with increased overweight/obesity risk, as was an early lunch. Time-of-day or meal-tagged food/beverage intake results in a large number of variables whose predictive patterns regarding weight status can be analyzed by machine learning methods such as LASSO, which in turn may identify patterns not amenable to other popular statistical methods such as binary logistic regression.
Affiliation(s)
- Emil Kupek
  - Departamento de Saúde Pública, Universidade Federal de Santa Catarina. Florianópolis SC Brasil.
- Rafaela Liberali
  - Programa de Pós-Graduação em Ciências Médicas, Universidade Federal de Santa Catarina. Florianópolis SC Brasil
4
Zou J, Shah O, Chiu YC, Ma T, Atkinson JM, Oesterreich S, Lee AV, Tseng GC. Systems approach for congruence and selection of cancer models towards precision medicine. PLoS Comput Biol 2024; 20:e1011754. [PMID: 38198519 PMCID: PMC10805322 DOI: 10.1371/journal.pcbi.1011754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 01/23/2024] [Accepted: 12/12/2023] [Indexed: 01/12/2024] Open
Abstract
Cancer models are instrumental as a substitute for human studies and to expedite basic, translational, and clinical cancer research. For a given cancer type, a wide selection of models, such as cell lines, patient-derived xenografts, organoids and genetically modified murine models, are often available to researchers. However, how to quantify their congruence to human tumors and to select the most appropriate cancer model is a largely unsolved issue. Here, we present Congruence Analysis and Selection of CAncer Models (CASCAM), a statistical and machine learning framework for authenticating and selecting the most representative cancer models in a pathway-specific manner using transcriptomic data. CASCAM provides harmonization between human tumor and cancer model omics data, systematic congruence quantification, and pathway-based topological visualization to determine the most appropriate cancer model selection. The systems approach is presented using invasive lobular breast carcinoma (ILC) subtype and suggesting CAMA1 followed by UACC3133 as the most representative cell lines for ILC research. Two additional case studies for triple negative breast cancer (TNBC) and patient-derived xenograft/organoid (PDX/PDO) are further investigated. CASCAM is generalizable to any cancer subtype and will authenticate cancer models for faithful non-human preclinical research towards precision medicine.
Affiliation(s)
- Jian Zou
  - Department of Statistics, School of Public Health, Chongqing Medical University, Chongqing, China
- Osama Shah
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Yu-Chiao Chiu
  - Cancer Therapeutics Program, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, University of Maryland, College Park, Maryland, United States of America
- Jennifer M. Atkinson
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Steffi Oesterreich
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Adrian V. Lee
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- George C. Tseng
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
5
Lai Z, Chen X, Zhang J, Kong H, Wen J. Maximal Margin Support Vector Machine for Feature Representation and Classification. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:6700-6713. [PMID: 37018685 DOI: 10.1109/tcyb.2022.3232800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
High-dimensional small sample size data, which may lead to singularity in computation, are becoming increasingly common in the field of pattern recognition. Moreover, it is still an open problem how to extract the most suitable low-dimensional features for the support vector machine (SVM) while avoiding singularity, so as to enhance the SVM's performance. To address these problems, this article designs a novel framework that integrates discriminative feature extraction and sparse feature selection into the support vector framework, making full use of the classifier's characteristics to find the optimal/maximal classification margin. As such, the low-dimensional features extracted from high-dimensional data are more suitable for SVM to obtain good performance. Thus, a novel algorithm, called the maximal margin SVM (MSVM), is proposed to achieve this goal. An alternating iterative learning strategy is adopted in MSVM to learn the optimal discriminative sparse subspace and the corresponding support vectors. The mechanism and the essence of the designed MSVM are revealed. The computational complexity and convergence are also analyzed and validated. Experimental results on some well-known databases (including breastmnist, pneumoniamnist, colon-cancer, etc.) show the great potential of MSVM against classical discriminant analysis methods and SVM-related methods, and the code is available at https://www.scholat.com/laizhihui.
6
Kim J, Lee Y, Liang Z. The Geometry of Nonlinear Embeddings in Kernel Discriminant Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:5203-5217. [PMID: 35857735 DOI: 10.1109/tpami.2022.3192726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Fisher's linear discriminant analysis is a classical method for classification, yet it is limited to capturing linear features only. Kernel discriminant analysis as an extension is known to successfully alleviate the limitation through a nonlinear feature mapping. We study the geometry of nonlinear embeddings in discriminant analysis with polynomial kernels and Gaussian kernel by identifying the population-level discriminant function that depends on the data distribution and the kernel. In order to obtain the discriminant function, we solve a generalized eigenvalue problem with between-class and within-class covariance operators. The polynomial discriminants are shown to capture the class difference through the population moments explicitly. For approximation of the Gaussian discriminant, we use a particular representation of the Gaussian kernel by utilizing the exponential generating function for Hermite polynomials. We also show that the Gaussian discriminant can be approximated using randomized projections of the data. Our results illuminate how the data distribution and the kernel interact in determination of the nonlinear embedding for discrimination, and provide a guideline for choice of the kernel and its parameters.
7
Hirose K, Miura K, Koie A. Hierarchical clustered multiclass discriminant analysis via cross-validation. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
8
Del Real Mata C, Jeanne O, Jalali M, Lu Y, Mahshid S. Nanostructured-Based Optical Readouts Interfaced with Machine Learning for Identification of Extracellular Vesicles. Adv Healthc Mater 2023; 12:e2202123. [PMID: 36443009 DOI: 10.1002/adhm.202202123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/14/2022] [Indexed: 11/30/2022]
Abstract
Extracellular vesicles (EVs) are shed from cancer cells into body fluids, enclosing molecular information about the underlying disease with the potential for being the target cancer biomarker in emerging diagnosis approaches such as liquid biopsy. Still, the study of EVs presents major challenges due to their heterogeneity, complexity, and scarcity. Recently, liquid biopsy platforms have allowed the study of tumor-derived materials, holding great promise for early-stage diagnosis and monitoring of cancer when interfaced with novel adaptations of optical readouts and advanced machine learning analysis. Here, recent advances in labeled and label-free optical techniques such as fluorescence, plasmonic, and chromogenic-based systems interfaced with nanostructured sensors like nanoparticles, nanoholes, and nanowires, and diverse machine learning analyses are reviewed. The adaptability of the different optical methods discussed is compared and insights are provided into prospective avenues for the translation of the technological approaches for cancer diagnosis. It is discussed that the inherent augmented properties of nanostructures enhance the sensitivity of the detection of EVs. It is concluded by reviewing recent integrations of nanostructured-based optical readouts with diverse machine learning models as novel analysis ventures that can potentially increase the capability of the methods to the point of translation into diagnostic applications.
Affiliation(s)
- Olivia Jeanne
  - McGill University, Department of Bioengineering, Montreal, QC, H3A 0E9, Canada
- Mahsa Jalali
  - McGill University, Department of Bioengineering, Montreal, QC, H3A 0E9, Canada
- Yao Lu
  - McGill University, Department of Bioengineering, Montreal, QC, H3A 0E9, Canada
- Sara Mahshid
  - McGill University, Department of Bioengineering, Montreal, QC, H3A 0E9, Canada
9
Xue K, Yang J, Yao F. Optimal linear discriminant analysis for high-dimensional functional data. J Am Stat Assoc 2023. [DOI: 10.1080/01621459.2022.2164288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Affiliation(s)
- Kaijie Xue
  - School of Statistics and Data Science, Nankai University, Tianjin, China
- Jin Yang
  - Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20852, U.S.A.
- Fang Yao
  - Department of Probability and Statistics, School of Mathematical Sciences, Center for Statistical Science, Peking University, Beijing, China
10
Bat algorithm for variable selection in multivariate classification modeling using linear discriminant analysis. Microchem J 2023. [DOI: 10.1016/j.microc.2022.108382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
11
Atkins S, Einarsson G, Clemmensen L, Ames B. Proximal methods for sparse optimal scoring and discriminant analysis. ADV DATA ANAL CLASSI 2022. [DOI: 10.1007/s11634-022-00530-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
12
Wang J, Wang H, Nie F, Li X. Ratio Sum Versus Sum Ratio for Linear Discriminant Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:10171-10185. [PMID: 34874851 DOI: 10.1109/tpami.2021.3133351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Dimension reduction is a critical technology for high-dimensional data processing, where Linear Discriminant Analysis (LDA) and its variants are effective supervised methods. However, LDA favors features with smaller variance, which causes features with weak discriminative ability to be retained. In this paper, we propose a novel Ratio Sum for Linear Discriminant Analysis (RSLDA), which aims at maximizing the discriminative ability of each feature in the subspace. Specifically, it maximizes the sum of the ratios of the between-class distance to the within-class distance in each dimension of the subspace. Since a closed-form solution to the original RSLDA problem is difficult to obtain, an equivalent problem is developed, which can be solved by an alternating optimization algorithm. To solve the equivalent problem, it is transformed into two sub-problems: one can be solved directly, and the other is converted into a convex optimization problem in which singular value decomposition is employed instead of matrix inversion. Consequently, the performance of the algorithm is not affected by whether the covariance matrix is nonsingular. Furthermore, Kernel RSLDA (KRSLDA) is presented to improve the robustness of RSLDA. Additionally, the time complexities of RSLDA and KRSLDA are analyzed. Extensive experiments show that RSLDA and KRSLDA outperform other comparison methods on toy datasets and multiple public datasets.
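To make the distinction in the title concrete, the following is a minimal sketch of the two objectives for a fixed projection matrix. It is not the authors' RSLDA solver: the function names, the toy data and the use of scatter matrices to measure between-class and within-class distance are assumptions made for illustration only.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return between-class (Sb) and within-class (Sw) scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)   # class mean vs. global mean
        Sw += (Xc - mc).T @ (Xc - mc)     # samples vs. their class mean
    return Sb, Sw

def ratio_sum(W, Sb, Sw):
    """Sum over subspace dimensions k of (w_k' Sb w_k) / (w_k' Sw w_k)."""
    between = np.einsum("ik,ij,jk->k", W, Sb, W)
    within = np.einsum("ik,ij,jk->k", W, Sw, W)
    return float(np.sum(between / within))

def sum_ratio(W, Sb, Sw):
    """Single ratio of summed scatters: tr(W' Sb W) / tr(W' Sw W)."""
    return float(np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W))

# Toy data (assumed for illustration): two classes in three dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(2.0, 1.0, (40, 3))])
y = np.repeat([0, 1], 40)
Sb, Sw = scatter_matrices(X, y)
W = np.eye(3)[:, :2]  # project onto the first two coordinates
print("ratio sum:", ratio_sum(W, Sb, Sw))
print("sum ratio:", sum_ratio(W, Sb, Sw))
```

With a single projection direction the two quantities coincide; they differ once the subspace has more than one dimension, because the ratio sum scores each dimension separately.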
13
Anzarmou Y, Mkhadri A, Oualkacha K. Sparse overlapped linear discriminant analysis. TEST-SPAIN 2022. [DOI: 10.1007/s11749-022-00839-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
14
Hu L, Zhang W, Dai Z. Joint Sparse Locality-Aware Regression for Robust Discriminative Learning. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:12245-12258. [PMID: 34166212 DOI: 10.1109/tcyb.2021.3080128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
With the dramatic increase of dimensions in the data representation, extracting latent low-dimensional features becomes of the utmost importance for efficient classification. Aiming at the problems of weakly discriminating marginal representation and difficulty in revealing the data manifold structure in most of the existing linear discriminant methods, we propose a more powerful discriminant feature extraction framework, namely, joint sparse locality-aware regression (JSLAR). In our model, we formulate a new strategy induced by the nonsquared L2 norm for enhancing the local intraclass compactness of the data manifold, which can achieve the joint learning of the locality-aware graph structure and the desirable projection matrix. Besides, we formulate a weighted retargeted regression to perform the marginal representation learning adaptively instead of using the general average interclass margin. To alleviate the disturbance of outliers and prevent overfitting, we measure the regression term and locality-aware term together with the regularization term by forcing the row sparsity with the joint L2,1 norms. Then, we derive an effective iterative algorithm for solving the proposed model. The experimental results over a range of benchmark databases demonstrate that the proposed JSLAR outperforms some state-of-the-art approaches.
15
Costa VG, Pedreira CE. Recent advances in decision trees: an updated survey. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10275-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
16
NetDA: An R Package for Network-Based Discriminant Analysis Subject to Multilabel Classes. JOURNAL OF PROBABILITY AND STATISTICS 2022. [DOI: 10.1155/2022/1041752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
In this paper, we introduce the R package NetDA, which handles multiclass classification while accommodating network structures in the predictors. To address the natural feature of network structures, we apply Gaussian graphical models to characterize the dependence structures of the predictors and directly estimate the precision matrix. The estimated precision matrix is then plugged into linear discriminant functions and quadratic discriminant functions. The R package NetDA is now available on CRAN, and the demonstration of its functions is summarized as a vignette in the online documentation.
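The plug-in step described in this abstract can be sketched as follows. This is an illustrative Python sketch, not the NetDA package or its API: the function names are hypothetical, and a ridge-regularised inverse of the pooled covariance stands in for the Gaussian graphical model estimate of the precision matrix.

```python
import numpy as np

def fit_precision_lda(X, y, ridge=1e-3):
    """Estimate class means, priors and a shared precision matrix."""
    classes = np.unique(y)
    n, d = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled within-class covariance.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (n - len(classes))
    # Assumption: ridge-regularised inverse as a stand-in precision estimate.
    precision = np.linalg.inv(cov + ridge * np.eye(d))
    return classes, means, priors, precision

def predict_precision_lda(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, priors, precision = model
    # delta_k(x) = x' P mu_k - 0.5 * mu_k' P mu_k + log(pi_k), with P the precision.
    linear = X @ precision @ means.T
    offset = 0.5 * np.einsum("kd,de,ke->k", means, precision, means)
    scores = linear - offset + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Toy data (assumed for illustration): two well-separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.repeat([0, 1], 50)
model = fit_precision_lda(X, y)
accuracy = float(np.mean(predict_precision_lda(model, X) == y))
print("training accuracy:", accuracy)
```

Any sparse precision estimator could replace the ridge inverse here without changing the discriminant function itself.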
17
Milligan K, Deng X, Ali-Adeeb R, Shreeves P, Punch S, Costie N, Crook JM, Brolo AG, Lum JJ, Andrews JL, Jirasek A. Prediction of disease progression indicators in prostate cancer patients receiving HDR-brachytherapy using Raman spectroscopy and semi-supervised learning: a pilot study. Sci Rep 2022; 12:15104. [PMID: 36068275 PMCID: PMC9448740 DOI: 10.1038/s41598-022-19446-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 08/29/2022] [Indexed: 11/09/2022] Open
Abstract
This work combines Raman spectroscopy (RS) with supervised learning methods, namely group and basis restricted non-negative matrix factorisation (GBR-NMF) and linear discriminant analysis (LDA), to aid in the prediction of clinical indicators of disease progression in a cohort of 9 patients receiving high dose rate brachytherapy (HDR-BT) as the primary treatment for intermediate risk (D'Amico) prostate adenocarcinoma. The combination of Raman spectroscopy and GBR-NMF-sparseLDA modelling allowed for the prediction of the following clinical information: Gleason score, cancer of the prostate risk assessment (CAPRA) score of pre-treatment biopsies and a Ki67 score of < 3.5% or > 3.5% in post-treatment biopsies. The three clinical indicators of disease progression investigated in this study were predicted using a single set of Raman spectral data acquired from each individual biopsy, obtained before HDR-BT treatment. This work highlights the potential of RS, combined with supervised learning, as a tool for the simultaneous prediction of multiple types of clinically relevant information from pre-treatment biopsies, opening up the potential for avoiding multiple immunohistochemistry (IHC) staining procedures (H&E, Ki67) and blood sample analysis (PSA) to aid in CAPRA scoring.
Affiliation(s)
- Kirsty Milligan
  - Department of Physics, University of British Columbia, Kelowna, BC, Canada
- Xinchen Deng
  - Department of Physics, University of British Columbia, Kelowna, BC, Canada
- Ramie Ali-Adeeb
  - Department of Physics, University of British Columbia, Kelowna, BC, Canada
- Phillip Shreeves
  - Department of Statistics, University of British Columbia, Kelowna, Canada
- Samantha Punch
  - Trev and Joyce Deeley Research Centre, BC Cancer, Victoria, BC, Canada
- Nathalie Costie
  - Trev and Joyce Deeley Research Centre, BC Cancer, Victoria, BC, Canada
- Juanita M Crook
  - Department of Radiation Oncology, University of British Columbia, Kelowna, BC, Canada
- Alexandre G Brolo
  - Department of Chemistry, University of Victoria, British Columbia, Canada
- Julian J Lum
  - Trev and Joyce Deeley Research Centre, BC Cancer, Victoria, BC, Canada
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, Canada
- Jeffrey L Andrews
  - Department of Statistics, University of British Columbia, Kelowna, Canada
- Andrew Jirasek
  - Department of Physics, University of British Columbia, Kelowna, BC, Canada
18
Rahmani N, Mani-Varnosfaderani A. Quality control, classification, and authentication of Iranian rice varieties using FT-IR spectroscopy and sparse chemometric methods. J Food Compost Anal 2022. [DOI: 10.1016/j.jfca.2022.104650] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
19
Yu W, Wade S, Bondell HD, Azizi L. Non-stationary Gaussian process discriminant analysis with variable selection for high-dimensional functional data. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2098136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Weichang Yu
  - Melbourne Centre for Data Science, University of Melbourne
- Sara Wade
  - School of Mathematics, University of Edinburgh
- Lamiae Azizi
  - School of Mathematics and Statistics, University of Sydney
20
Chen LP. Nonparametric discriminant analysis with network structures in predictor. J STAT COMPUT SIM 2022. [DOI: 10.1080/00949655.2022.2084618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Li-Pang Chen
  - Department of Statistics, National Chengchi University, Taipei, Taiwan
21
Dornaika F, Khoder A, Moujahid A, Khoder W. A supervised discriminant data representation: application to pattern classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07332-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The performance of machine learning and pattern recognition algorithms generally depends on data representation. That is why much of the current effort in applying machine learning algorithms goes into the design of preprocessing frameworks and data transformations able to support effective machine learning. The method proposed in this work consists of a hybrid linear feature extraction scheme to be used in supervised multi-class classification problems. Inspired by two recent linear discriminant methods, robust sparse linear discriminant analysis (RSLDA) and inter-class sparsity-based discriminative least square regression (ICS_DLSR), we propose a unifying criterion that is able to retain the advantages of these two powerful methods. The resulting transformation relies on sparsity-promoting techniques both to select the features that most accurately represent the data and to preserve the row-sparsity consistency property of samples from the same class. The linear transformation and the orthogonal matrix are estimated using an iterative alternating minimization scheme based on the steepest descent gradient method and different initialization schemes. The proposed framework is generic in the sense that it allows the combination and tuning of other linear discriminant embedding methods. According to the experiments conducted on several datasets including faces, objects, and digits, the proposed method was able to outperform competing methods in most cases.
22
Dornaika F, Khoder A, Khoder W. Data representation via refined discriminant analysis and common class structure. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.12.068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
23
Lee M, Song YH, Li L, Lee KY, Yang SB. Detecting fake reviews with supervised machine learning algorithms. SERVICE INDUSTRIES JOURNAL 2022. [DOI: 10.1080/02642069.2022.2054996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Minwoo Lee
  - Conrad N. Hilton College of Hotel and Restaurant Management, University of Houston, Houston, TX, USA
- Young Ho Song
  - Odette School of Business, University of Windsor, Windsor, Ontario, Canada
- Lin Li
  - Business School, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
- Kyung Young Lee
  - Rowe School of Business, Dalhousie University, Halifax, Nova Scotia, Canada
- Sung-Byung Yang
  - School of Management, Kyung Hee University, Seoul, Republic of Korea
24
Fujikoshi Y. High-dimensional consistencies of KOO methods in multivariate regression model and discriminant analysis. J MULTIVARIATE ANAL 2022. [DOI: 10.1016/j.jmva.2021.104860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
25
Hu X, Sun Y, Gao J, Hu Y, Ju F, Yin B. Probabilistic Linear Discriminant Analysis Based on L1-Norm and Its Bayesian Variational Inference. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1616-1627. [PMID: 32386179 DOI: 10.1109/tcyb.2020.2985997] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Probabilistic linear discriminant analysis (PLDA) is a very effective feature extraction approach and has obtained extensive and successful applications in supervised learning tasks. It employs the squared L2-norm to measure the model errors, which assumes a Gaussian noise distribution implicitly. However, the noise in real-life applications may not follow a Gaussian distribution. Particularly, the squared L2-norm could extremely exaggerate data outliers. To address this issue, this article proposes a robust PLDA model under the assumption of a Laplacian noise distribution, called L1-PLDA. The learning process employs the approach by expressing the Laplacian density function as a superposition of an infinite number of Gaussian distributions via introducing a new latent variable and then adopts the variational expectation-maximization (EM) algorithm to learn parameters. The most significant advantage of the new model is that the introduced latent variable can be used to detect data outliers. The experiments on several public databases show the superiority of the proposed L1-PLDA model in terms of classification and outlier detection.
|
26
|
Fop M, Mattei PA, Bouveyron C, Murphy TB. Unobserved classes and extra variables in high-dimensional discriminant analysis. ADV DATA ANAL CLASSI 2022; 16:55-92. [PMID: 35308632 PMCID: PMC8924148 DOI: 10.1007/s11634-021-00474-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2021] [Revised: 07/15/2021] [Accepted: 10/03/2021] [Indexed: 11/30/2022]
Abstract
In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the same units in the test data may be measured on a set of additional variables recorded at a subsequent stage with respect to when the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for data of large dimensions. A simulation study and an artificial experiment related to classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.
Affiliation(s)
- Michael Fop
  - School of Mathematics & Statistics, University College Dublin, Dublin, Ireland
- Charles Bouveyron
  - Université Côte d'Azur, Inria, CNRS, Laboratoire J.A. Dieudonné, Maasai team, Nice, France
- Thomas Brendan Murphy
  - Université Côte d'Azur, Inria, CNRS, Laboratoire J.A. Dieudonné, Maasai team, Nice, France
|
27
|
Benchmarking Eliminative Radiomic Feature Selection for Head and Neck Lymph Node Classification. Cancers (Basel) 2022; 14:477. [PMID: 35158745 PMCID: PMC8833684 DOI: 10.3390/cancers14030477] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Revised: 01/13/2022] [Accepted: 01/16/2022] [Indexed: 12/12/2022]
Abstract
Simple Summary: Pathologic cervical lymph nodes (LN) in head and neck squamous cell carcinoma (HNSCC) deteriorate prognosis. Current radiologic criteria for LN-classification are primarily shape-based. Radiomics is an emerging data-driven technique that aids in extracting, processing and analyzing features and is potentially capable of LN-classification. Currently available sets of features are too complex for clinical applicability. We identified the combination of sparse discriminant analysis and genetic algorithms as a potentially useful algorithm for eliminative feature selection. In this retrospective cohort study of 252 LNs with over 30,000 extracted features, this algorithm retained a classification accuracy of up to 90% with only 10% of the original number of features. From a clinical perspective, the selected features appeared plausible and potentially capable of correctly classifying LNs. Both the identified algorithm and features need further exploration of their potential as prospective classifiers for LNs in HNSCC.
Abstract: In head and neck squamous cell carcinoma (HNSCC) pathologic cervical lymph nodes (LN) remain important negative predictors. Current criteria for LN-classification in contrast-enhanced computed-tomography scans (contrast-CT) are shape-based; contrast-CT imagery allows extraction of additional quantitative data (“features”). The data-driven technique to extract, process, and analyze features from contrast-CTs is termed “radiomics”. Extracted features from contrast-CTs at various levels are typically redundant and correlated. Current sets of features for LN-classification are too complex for clinical application. Effective eliminative feature selection (EFS) is a crucial preprocessing step to reduce the complexity of sets identified. We aimed at exploring EFS-algorithms for their potential to identify sets of features, which were as small as feasible and yet retained as much accuracy as possible for LN-classification.
In this retrospective cohort-study, which adhered to the STROBE guidelines, in total 252 LNs were classified as “non-pathologic” (n = 70), “pathologic” (n = 182) or “pathologic with extracapsular spread” (n = 52) by two experienced head-and-neck radiologists based on established criteria which served as a reference. The combination of sparse discriminant analysis and genetic optimization retained up to 90% of the classification accuracy with only 10% of the original numbers of features. From a clinical perspective, the selected features appeared plausible and potentially capable of correctly classifying LNs. Both the identified EFS-algorithm and the identified features need further exploration to assess their potential to prospectively classify LNs in HNSCC.
|
28
|
Cai Z, Xia Y, Hang W. An Outer-Product-of-Gradient Approach to Dimension Reduction and its Application to Classification in High Dimensional Space. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2021.2003202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Zhibo Cai
  - National University of Singapore, Singapore
- Yingcun Xia
  - National University of Singapore, Singapore
  - University of Electronic Science and Technology of China, Chengdu, China
|
29
|
Lu J, Lai Z, Wang H, Chen Y, Zhou J, Shen L. Generalized Embedding Regression: A Framework for Supervised Feature Extraction. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:185-199. [PMID: 33147149 DOI: 10.1109/tnnls.2020.3027602] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Sparse discriminative projection learning has attracted much attention due to its good performance in recognition tasks. In this article, a framework called generalized embedding regression (GER) is proposed, which can simultaneously perform low-dimensional embedding and sparse projection learning in a joint objective function with a generalized orthogonal constraint. Moreover, the label information is integrated into the model to preserve the global structure of data, and a rank constraint is imposed on the regression matrix to explore the underlying correlation structure of classes. Theoretical analysis shows that GER can obtain the same or approximate solution as some related methods with special settings. By utilizing this framework as a general platform, we design a novel supervised feature extraction approach called jointly sparse embedding regression (JSER). In JSER, we construct an intrinsic graph to characterize the intraclass similarity and a penalty graph to indicate the interclass separability. Then, the penalty graph Laplacian is used as the constraint matrix in the generalized orthogonal constraint to deal with interclass marginal points. Moreover, the L2,1-norm is imposed on the regression terms for robustness to outliers and data's variations and the regularization term for jointly sparse projection learning, leading to interesting semantic interpretability. An effective iterative algorithm is elaborately designed to solve the optimization problem of JSER. Theoretically, we prove that the subproblem of JSER is essentially an unbalanced Procrustes problem and can be solved iteratively. The convergence of the designed algorithm is also proved. Experimental results on six well-known data sets indicate the competitive performance and latent properties of JSER.
|
30
|
Fujiwara T, Wei X, Zhao J, Ma KL. Interactive Dimensionality Reduction for Comparative Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:758-768. [PMID: 34591765 DOI: 10.1109/tvcg.2021.3114807] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]
Abstract
Finding the similarities and differences between groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. This paper presents an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, the interactive visualization interface facilitates interpretation and refinement of the ULCA results. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of this framework.
|
31
|
Min K, Mai Q. A general framework for tensor screening through smoothing. Electron J Stat 2022. [DOI: 10.1214/21-ejs1954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Keqian Min
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306, U.S.A
- Qing Mai
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306, U.S.A
|
32
|
Kang X, Kang L, Chen W, Deng X. A generative approach to modeling data with quantitative and qualitative responses. J MULTIVARIATE ANAL 2022. [DOI: 10.1016/j.jmva.2022.104952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
33
|
Ren S, Mai Q. The robust nearest shrunken centroids classifier for high-dimensional heavy-tailed data. Electron J Stat 2022. [DOI: 10.1214/22-ejs2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Shaokang Ren
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306, U.S.A
- Qing Mai
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306, U.S.A
|
34
|
Deep Transfer Learning for Parkinson’s Disease Monitoring by Image-Based Representation of Resting-State EEG Using Directional Connectivity. ALGORITHMS 2021. [DOI: 10.3390/a15010005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Parkinson’s disease (PD) is characterized by abnormal brain oscillations that can change rapidly. Tracking neural alterations with high temporal resolution electrophysiological monitoring methods such as EEG can lead to valuable information about alterations observed in PD. Concomitantly, there have been advances in the high-accuracy performance of deep neural networks (DNNs) using few-patient data. In this study, we propose a method to transform resting-state EEG data into a deep latent space to classify PD subjects from healthy cases. We first used a general orthogonalized directed coherence (gOPDC) method to compute directional connectivity (DC) between all pairwise EEG channels in four frequency bands (Theta, Alpha, Beta, and Gamma) and then converted the DC maps into 2D images. We then used the VGG-16 architecture (trained on the ImageNet dataset) as our pre-trained model, enlisted weights of convolutional layers as initial weights, and fine-tuned all layer weights with our data. After training, the classification achieved 99.62% accuracy, 100% precision, 99.17% recall, 0.9958 F1 score, and 0.9958 AUC averaged for 10 random repetitions of training/evaluating on the proposed deep transfer learning (DTL) network. Using the latent features learned by the network and employing LASSO regression, we found that latent features (as opposed to the raw DC values) were significantly correlated with five clinical indices routinely measured: left and right finger tapping, left and right tremor, and body bradykinesia. Our results demonstrate the power of transfer learning and latent space derivation for the development of oscillatory biomarkers in PD.
|
35
|
|
36
|
Elsten T, de Rooij M. SUBiNN: a stacked uni- and bivariate kNN sparse ensemble. ADV DATA ANAL CLASSI 2021. [DOI: 10.1007/s11634-021-00462-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Nearest Neighbor classification is an intuitive distance-based classification method. It has, however, two drawbacks: (1) it is sensitive to the number of features, and (2) it does not give information about the importance of single features or pairs of features. In stacking, a set of base-learners is combined in one overall ensemble classifier by means of a meta-learner. In this manuscript we combine univariate and bivariate nearest neighbor classifiers that are by themselves easily interpretable. Furthermore, we combine these classifiers by a Lasso method that results in a sparse ensemble of nonlinear main and pairwise interaction effects. We christened the new method SUBiNN: Stacked Uni- and Bivariate Nearest Neighbors. SUBiNN overcomes the two drawbacks of simple nearest neighbor methods. In extensive simulations and using benchmark data sets, we evaluate the predictive performance of SUBiNN and compare it to other nearest neighbor ensemble methods as well as Random Forests and Support Vector Machines. Results indicate that SUBiNN often outperforms other nearest neighbor methods, that SUBiNN is well capable of identifying noise features, but that Random Forests is often, but not always, the best classifier.
|
37
|
Harmless label noise and informative soft-labels in supervised classification. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
38
|
Long Y, Xu X. Classification by likelihood accordance functions. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2021.1955258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Yuqi Long
  - School of Mathematics, Northwest University, Xi’an, China
  - School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
- Xingzhong Xu
  - School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
  - Beijing Key Laboratory on MCAACI, Beijing Institute of Technology, Beijing, China
|
39
|
Li G, Duan X, Wu Z, Wu C. Generalized elastic net optimal scoring problem for feature selection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
40
|
Zhang MQ, Luo XL. Novel dynamic enhanced robust principal subspace discriminant analysis for high-dimensional process fault diagnosis with industrial applications. ISA TRANSACTIONS 2021; 114:1-14. [PMID: 33388145 DOI: 10.1016/j.isatra.2020.12.025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/17/2020] [Revised: 11/30/2020] [Accepted: 12/10/2020] [Indexed: 06/12/2023]
Abstract
Because the data are often contaminated by measurement noise and outliers, traditional subspace discriminant analysis has difficulty extracting optimal diagnostic information. To alleviate this problem, a robust principal subspace discriminant analysis algorithm for fault diagnosis is designed. On the premise of decreasing the impact of redundant information, the optimal latent features can be calculated. Specifically, in the algorithm, dual constraints of the weighted principal subspace center and l2,1-norm are introduced into the objective function to suppress outliers and noise. Besides, considering that the current changes of the data in a dynamic process rely on past observations, merely analyzing the current data may lead to an incorrect interpretation of the mechanism model, especially in the presence of similar variable data under two different conditions. Therefore, based on the robust principal subspace discriminant analysis, we further develop its dynamic enhanced version. The dynamic enhanced method utilizes the dynamic augmented matrix to enhance the latent features of historical data into current shifted features, so as to enlarge the difference between similar modes. Finally, the experimental results on the Tennessee Eastman process and a commercial multi-phase flow process demonstrate that the proposed method has advanced diagnostic performance and satisfactory convergence speed.
Affiliation(s)
- Ming-Qing Zhang
  - Department of Automation, China University of Petroleum, Beijing, 102249, China.
- Xiong-Lin Luo
  - Department of Automation, China University of Petroleum, Beijing, 102249, China.
|
41
|
Generalising combinatorial discriminant analysis through conditioning truncated Rayleigh flow. Knowl Inf Syst 2021. [DOI: 10.1007/s10115-021-01587-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
42
|
Nakagawa T, Watanabe H, Hyodo M. Kick-one-out-based variable selection method for Euclidean distance-based classifier in high-dimensional settings. J MULTIVARIATE ANAL 2021. [DOI: 10.1016/j.jmva.2021.104756] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
|
43
|
Comparison between visual assessments and different variants of linear discriminant analysis to the classification of Raman patterns of inkjet printer inks. Forensic Chem 2021. [DOI: 10.1016/j.forc.2021.100336] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/26/2023]
|
44
|
Peng C, Cheng Q. Discriminative Ridge Machine: A Classifier for High-Dimensional Data or Imbalanced Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:2595-2609. [PMID: 32692682 PMCID: PMC8219475 DOI: 10.1109/tnnls.2020.3006877] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 05/26/2023]
Abstract
In this article, we introduce a discriminative ridge regression approach to supervised classification. It estimates a representation model while accounting for discriminativeness between classes, thereby enabling accurate derivation of categorical information. This new type of regression model extends the existing models, such as ridge, lasso, and group lasso, by explicitly incorporating discriminative information. As a special case, we focus on a quadratic model that admits a closed-form analytical solution. The corresponding classifier is called the discriminative ridge machine (DRM). Three iterative algorithms are further established for the DRM to enhance the efficiency and scalability for real applications. Our approach and the algorithms are applicable to general types of data including images, high-dimensional data, and imbalanced data. We compare the DRM with current state-of-the-art classifiers. Our extensive experimental results show the superior performance of the DRM and confirm the effectiveness of the proposed approach.
|
45
|
Statistical and Machine Learning Link Selection Methods for Brain Functional Networks: Review and Comparison. Brain Sci 2021; 11:735. [PMID: 34073098 PMCID: PMC8227272 DOI: 10.3390/brainsci11060735] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/22/2021] [Revised: 05/24/2021] [Accepted: 05/28/2021] [Indexed: 11/28/2022]
Abstract
Network-based representations have introduced a revolution in neuroscience, expanding the understanding of the brain from the activity of individual regions to the interactions between them. This augmented network view comes at the cost of high dimensionality, which hinders both our capacity of deciphering the main mechanisms behind pathologies, and the significance of any statistical and/or machine learning task used in processing this data. A link selection method, allowing to remove irrelevant connections in a given scenario, is an obvious solution that provides improved utilization of these network representations. In this contribution we review a large set of statistical and machine learning link selection methods and evaluate them on real brain functional networks. Results indicate that most methods perform in a qualitatively similar way, with NBS (Network Based Statistics) winning in terms of quantity of retained information, AnovaNet in terms of stability and ExT (Extra Trees) in terms of lower computational cost. While machine learning methods are conceptually more complex than statistical ones, they do not yield a clear advantage. At the same time, the high heterogeneity in the set of links retained by each method suggests that they are offering complementary views to the data. The implications of these results in neuroscience tasks are finally discussed.
|
46
|
Partition-based feature screening for categorical data via RKHS embeddings. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
47
|
Zhu L, Spachos P, Pensini E, Plataniotis KN. Deep learning and machine vision for food processing: A survey. Curr Res Food Sci 2021; 4:233-249. [PMID: 33937871 PMCID: PMC8079277 DOI: 10.1016/j.crfs.2021.03.009] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 12/21/2020] [Revised: 03/24/2021] [Accepted: 03/25/2021] [Indexed: 11/21/2022]
Abstract
The quality and safety of food is an important issue to the whole society, since it is at the basis of human health, social development and stability. Ensuring food quality and safety is a complex process, and all stages of food processing must be considered, from cultivating, harvesting and storage to preparation and consumption. However, these processes are often labour-intensive. Nowadays, the development of machine vision can greatly assist researchers and industries in improving the efficiency of food processing. As a result, machine vision has been widely used in all aspects of food processing. At the same time, image processing is an important component of machine vision. Image processing can take advantage of machine learning and deep learning models to effectively identify the type and quality of food. Subsequently, follow-up design in the machine vision system can address tasks such as food grading, detecting locations of defective spots or foreign objects, and removing impurities. In this paper, we provide an overview on the traditional machine learning and deep learning methods, as well as the machine vision techniques that can be applied to the field of food processing. We present the current approaches and challenges, and the future trends.
Affiliation(s)
- Lili Zhu
  - School of Engineering, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Petros Spachos
  - School of Engineering, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Erica Pensini
  - School of Engineering, University of Guelph, Guelph, ON, N1G 2W1, Canada
|
48
|
Frias M, Moyano JM, Rivero-Juarez A, Luna JM, Camacho Á, Fardoun HM, Machuca I, Al-Twijri M, Rivero A, Ventura S. Classification Accuracy of Hepatitis C Virus Infection Outcome: Data Mining Approach. J Med Internet Res 2021; 23:e18766. [PMID: 33624609 PMCID: PMC7946589 DOI: 10.2196/18766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2020] [Revised: 11/02/2020] [Accepted: 12/17/2020] [Indexed: 11/30/2022]
Abstract
BACKGROUND: The dataset from genes used to predict hepatitis C virus outcome was evaluated in a previous study using a conventional statistical methodology.
OBJECTIVE: The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied.
METHODS: We built predictive models using different subsets of factors, selected according to their importance in predicting patient classification. We then evaluated each independent model and also a combination of them, leading to a better predictive model.
RESULTS: Our data mining approach identified genetic patterns that escaped detection using conventional statistics. More specifically, the partial decision trees and ensemble models increased the classification accuracy of hepatitis C virus outcome compared with conventional methods.
CONCLUSIONS: Data mining can be used more extensively in biomedicine, facilitating knowledge building and management of human diseases.
Affiliation(s)
- Mario Frias
  - Department of Clinical Virology and Zoonoses, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
- Jose M Moyano
  - Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain
  - Knowledge Discovery and Intelligent Systems in Biomedicine Laboratory, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
- Antonio Rivero-Juarez
  - Department of Clinical Virology and Zoonoses, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
- Jose M Luna
  - Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain
  - Knowledge Discovery and Intelligent Systems in Biomedicine Laboratory, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
- Ángela Camacho
  - Department of Clinical Virology and Zoonoses, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
- Habib M Fardoun
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Isabel Machuca
  - Department of Clinical Virology and Zoonoses, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
- Mohamed Al-Twijri
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Antonio Rivero
  - Department of Clinical Virology and Zoonoses, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
- Sebastian Ventura
  - Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain
  - Knowledge Discovery and Intelligent Systems in Biomedicine Laboratory, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain
|
49
|
|
50
|
Affiliation(s)
- Na Cui
  - CStone pharmaceuticals Suzhou Jiangsu China
- Jianjun Hu
  - Pinterest, Inc. San Francisco California USA
- Feng Liang
  - Department of Statistics University of Illinois at Urbana‐Champaign Champaign Illinois USA
|