Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wang Q, Luo Z, Huang J, Feng Y, Liu Z. A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM. Comput Intell Neurosci 2017;2017:1827016. [PMID: 28250765 DOI: 10.1155/2017/1827016] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 12/04/2016] [Revised: 12/23/2016] [Accepted: 12/28/2016] [Indexed: 11/17/2022]

Cited by Other Article(s)

Thölke P, Mantilla-Ramos YJ, Abdelhedi H, Maschke C, Dehgan A, Harel Y, Kemtur A, Mekki Berrada L, Sahraoui M, Young T, Bellemare Pépin A, El Khantour C, Landry M, Pascarella A, Hadid V, Combrisson E, O'Byrne J, Jerbi K. Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data. Neuroimage 2023:120253. [PMID: 37385392 DOI: 10.1016/j.neuroimage.2023.120253] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/18/2023] [Revised: 06/05/2023] [Accepted: 06/26/2023] [Indexed: 07/01/2023] Open

Abstract

Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance. Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
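
The contrast between Acc and BAcc described in this abstract can be illustrated with a small sketch. It is not the authors' open-source code; the function names and the 90/10 toy labels are assumptions chosen for illustration. A classifier that always votes for the majority class scores high on Acc but only at chance level on BAcc.

```python
def accuracy(y_true, y_pred):
    """Overall proportion of correct predictions (Acc)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Arithmetic mean of sensitivity and specificity (BAcc), binary case."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 90 majority samples (label 0), 10 minority (label 1).
y_true = [0] * 90 + [1] * 10
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)            # reflects the 90/10 imbalance
bacc = balanced_accuracy(y_true, y_pred)  # sensitivity 0, specificity 1

print(f"Acc = {acc:.2f}, BAcc = {bacc:.2f}")
```

On balanced labels the two functions return the same value, which matches the abstract's remark that BAcc reduces to standard Acc in that case.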

Affiliation(s)

Philipp Thölke Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institute of Cognitive Science, Osnabrück University, Neuer Graben 29/Schloss, Osnabrück, 49074, Lower Saxony, Germany.
Yorguin-Jose Mantilla-Ramos Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Neuropsychology and Behavior Group (GRUNECO), Faculty of Medicine, Universidad de Antioquia, 53-108, Medellin, Aranjuez, Medellin, 050010, Colombia
Hamza Abdelhedi Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Charlotte Maschke Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Integrated Program in Neuroscience, McGill University, 1033 Pine Ave, Montreal, H3A 0G4, Canada
Arthur Dehgan Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institut de Neurosciences de la Timone (INT), CNRS, Aix Marseille University, Marseille, 13005, France
Yann Harel Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Anirudha Kemtur Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Loubna Mekki Berrada Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Myriam Sahraoui Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Tammy Young Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Department of Computing Science, University of Alberta, 116 St & 85 Ave, Edmonton, T6G 2R3, AB, Canada
Antoine Bellemare Pépin Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Department of Music, Concordia University, 1550 De Maisonneuve Blvd. W., Montreal, H3H 1G8, QC, Canada
Clara El Khantour Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Mathieu Landry Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Annalisa Pascarella Institute for Applied Mathematics Mauro Picone, National Research Council, Roma, Italy
Vanessa Hadid Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Etienne Combrisson Institut de Neurosciences de la Timone (INT), CNRS, Aix Marseille University, Marseille, 13005, France
Jordan O'Byrne Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Karim Jerbi Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Mila (Quebec Machine Learning Institute), 6666 Rue Saint-Urbain, Montreal, H2S 3H1, QC, Canada; UNIQUE Centre (Quebec Neuro-AI Research Centre), 3744 rue Jean-Brillant, Montreal, H3T 1P1, QC, Canada

Xu Y, Yu Z, Chen CLP, Liu Z. Adaptive Subspace Optimization Ensemble Method for High-Dimensional Imbalanced Data Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023;34:2284-2297. [PMID: 34469316 DOI: 10.1109/tnnls.2021.3106306] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/04/2023]

Shim JG, Ryu KH, Cho EA, Ahn JH, Cha YB, Lim G, Lee SH. Machine learning for prediction of postoperative nausea and vomiting in patients with intravenous patient-controlled analgesia. PLoS One 2022;17:e0277957. [PMID: 36548346 PMCID: PMC9778492 DOI: 10.1371/journal.pone.0277957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 11/07/2022] [Indexed: 12/24/2022] Open

Minimally overfitted learners: A general framework for ensemble learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109669] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]

Venkataramana L, Prasad DVV, Saraswathi S, Mithumary CM, Karthikeyan R, Monika N. Classification of COVID-19 from tuberculosis and pneumonia using deep learning techniques. Med Biol Eng Comput 2022;60:2681-2691. [PMID: 35834050 PMCID: PMC9281341 DOI: 10.1007/s11517-022-02632-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 07/05/2022] [Indexed: 12/02/2022]

Abstract

Deep learning provides the healthcare industry with the ability to analyse data at exceptional speeds without compromising on accuracy. These techniques are applicable to the healthcare domain for accurate and timely prediction. The convolutional neural network is a class of deep learning methods that has become dominant in various computer vision tasks and is attracting interest across a variety of domains, including radiology. Lung diseases such as tuberculosis (TB), bacterial and viral pneumonias, and COVID-19 are not predicted accurately because very few samples are available for each of them. These diseases could be easily diagnosed using X-ray or CT scan images, but the number of images available differs from one disease to another, resulting in imbalanced input data. Conventional supervised machine learning methods do not achieve high accuracy when trained using a small number of COVID-19 data samples. Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset. Data augmentation helped reduce overfitting when training a deep neural network. The SMOTE (Synthetic Minority Oversampling Technique) algorithm is used to balance the classes. The novelty of this research work is to apply combined data augmentation and class balancing techniques before classification of tuberculosis, pneumonia, and COVID-19. The classification accuracy obtained with the proposed multi-level classification after training the model is 97.4% for TB and pneumonia and 88% for bacterial, viral, and COVID-19 classifications. The proposed multi-level classification method produced an improvement of ~8 to ~10% in classification accuracy when compared with the existing methods in this area of research. The results reveal that the proposed system is scalable to growing medical data and classifies lung diseases and their sub-types in less time with higher accuracy.
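
The class-balancing step mentioned in this abstract can be sketched as follows. This is a minimal illustration of the classic SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the pipeline used in the cited study; the function name `smote`, its parameters, and the toy points are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        # Place the new point at a random position on the segment x -> nb.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical two-dimensional minority class with four samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)

for point in new_points:
    print(point)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.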

Choi HS, Jung D, Kim S, Yoon S. Imbalanced Data Classification via Cooperative Interaction Between Classifier and Generator. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022;33:3343-3356. [PMID: 33531305 DOI: 10.1109/tnnls.2021.3052243] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]

PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Malavolta M, Pallante L, Mavkov B, Stojceski F, Grasso G, Korfiati A, Mavroudi S, Kalogeras A, Alexakos C, Martos V, Amoroso D, Di Benedetto G, Piga D, Theofilatos K, Deriu MA. A survey on computational taste predictors. Eur Food Res Technol 2022;248:2215-2235. [PMID: 35637881 PMCID: PMC9134981 DOI: 10.1007/s00217-022-04044-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 04/29/2022] [Accepted: 04/30/2022] [Indexed: 11/29/2022]

Huang K, Wang X. CCR-GSVM: A boundary data generation algorithm for support vector machine in imbalanced majority noise problem. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03408-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Merhbene G, Nath S, Puttick AR, Kurpicz-Briki M. BurnoutEnsemble: Augmented Intelligence to Detect Indications for Burnout in Clinical Psychology. Front Big Data 2022;5:863100. [PMID: 35449532 PMCID: PMC9016321 DOI: 10.3389/fdata.2022.863100] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/26/2022] [Accepted: 02/25/2022] [Indexed: 11/24/2022] Open

Fusco R, Di Bernardo E, Piccirillo A, Rubulotta MR, Petrosino T, Barretta ML, Mattace Raso M, Vallone P, Raiano C, Di Giacomo R, Siani C, Avino F, Scognamiglio G, Di Bonito M, Granata V, Petrillo A. Radiomic and Artificial Intelligence Analysis with Textural Metrics Extracted by Contrast-Enhanced Mammography and Dynamic Contrast Magnetic Resonance Imaging to Detect Breast Malignant Lesions. Curr Oncol 2022;29:1947-1966. [PMID: 35323359 PMCID: PMC8947713 DOI: 10.3390/curroncol29030159] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/24/2022] [Revised: 03/07/2022] [Accepted: 03/10/2022] [Indexed: 11/16/2022] Open

Affiliation(s)

Roberta Fusco Medical Oncology Division, Igea SpA, 80013 Naples, Italy
Elio Di Bernardo Medical Oncology Division, Igea SpA, 80013 Naples, Italy
Adele Piccirillo Department of Electrical Engineering and Information Technologies, Università degli Studi di Napoli Federico II, 80125 Naples, Italy
Maria Rosaria Rubulotta Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Teresa Petrosino Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Maria Luisa Barretta Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Mauro Mattace Raso Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Paolo Vallone Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Concetta Raiano Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Raimondo Di Giacomo Senology Surgical Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Claudio Siani Senology Surgical Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Franca Avino Senology Surgical Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Giosuè Scognamiglio Pathology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Maurizio Di Bonito Pathology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy
Vincenza Granata Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy. Correspondence: Tel.: +39-081-590-714; Fax: +39-081-590-3825
Antonella Petrillo Radiology Division, Istituto Nazionale Tumori-IRCCS-Fondazione G. Pascale, 80131 Naples, Italy

Li DC, Shi QS, Lin YS, Lin LS. A Boundary-Information-Based Oversampling Approach to Improve Learning Performance for Imbalanced Datasets. ENTROPY 2022;24:e24030322. [PMID: 35327833 PMCID: PMC8947752 DOI: 10.3390/e24030322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 02/19/2022] [Accepted: 02/21/2022] [Indexed: 11/16/2022]

A new clustering mining algorithm for multi-source imbalanced location data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.10.029] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]

Shim JG, Ryu KH, Cho EA, Ahn JH, Kim HK, Lee YJ, Lee SH. Machine Learning Approaches to Predict Chronic Lower Back Pain in People Aged over 50 Years. Medicina (B Aires) 2021;57:medicina57111230. [PMID: 34833448 PMCID: PMC8618953 DOI: 10.3390/medicina57111230] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/22/2021] [Accepted: 11/09/2021] [Indexed: 11/16/2022] Open

Yang L, Heiselman C, Quirk JG, Djurić PM. CLASS-IMBALANCED CLASSIFIERS USING ENSEMBLES OF GAUSSIAN PROCESSES AND GAUSSIAN PROCESS LATENT VARIABLE MODELS. PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. ICASSP (CONFERENCE) 2021;2021. [PMID: 34712104 DOI: 10.1109/icassp39728.2021.9414754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

Tripathi MK, Nath A, Singh TP, Ethayathulla AS, Kaur P. Evolving scenario of big data and Artificial Intelligence (AI) in drug discovery. Mol Divers 2021;25:1439-1460. [PMID: 34159484 PMCID: PMC8219515 DOI: 10.1007/s11030-021-10256-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/31/2021] [Accepted: 06/14/2021] [Indexed: 12/24/2022]

Radiomics and Artificial Intelligence Analysis with Textural Metrics Extracted by Contrast-Enhanced Mammography in the Breast Lesions Classification. Diagnostics (Basel) 2021;11:diagnostics11050815. [PMID: 33946333 PMCID: PMC8146084 DOI: 10.3390/diagnostics11050815] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/20/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 12/29/2022] Open

Abstract

The aim of the study was to estimate the diagnostic accuracy of textural features extracted by dual-energy contrast-enhanced mammography (CEM) images, by carrying out univariate and multivariate statistical analyses including artificial intelligence approaches. In total, 80 patients with a known breast lesion were enrolled in this prospective study according to regulations issued by the local Institutional Review Board. All patients underwent dual-energy CEM examination in both craniocaudal (CC) and double acquisition of mediolateral oblique (MLO) projections (early and late). The reference standard was pathology from a surgical specimen for malignant lesions and pathology from a surgical specimen or fine needle aspiration cytology, core or Tru-Cut needle biopsy, and vacuum assisted breast biopsy for benign lesions. In total, 104 samples of 80 patients were analyzed. Furthermore, 48 textural parameters were extracted by manually segmenting regions of interest. Univariate and multivariate approaches were performed: the non-parametric Wilcoxon–Mann–Whitney test, receiver operating characteristic (ROC) analysis, linear classifier (LDA), decision tree (DT), k-nearest neighbors (KNN), artificial neural network (NNET), and support vector machine (SVM) were utilized. A balancing approach and feature selection methods were used. The univariate analysis showed low accuracy and area under the curve (AUC) for all considered features. Instead, in the multivariate textural analysis, the best performance considering the CC view (accuracy (ACC) = 0.75; AUC = 0.82) was reached with a DT trained with leave-one-out cross-validation (LOOCV) and balanced data (with adaptive synthetic (ADASYN) function) and a subset of three robust textural features (MAD, VARIANCE, and LRLGE). The best performance (ACC = 0.77; AUC = 0.83) considering the early-MLO view was reached with a NNET trained with LOOCV and balanced data (with ADASYN function) and a subset of ten robust features (MEAN, MAD, RANGE, IQR, VARIANCE, CORRELATION, RLV, COARSNESS, BUSYNESS, and STRENGTH). The best performance (ACC = 0.73; AUC = 0.82) considering the late-MLO view was reached with a NNET trained with LOOCV and balanced data (with ADASYN function) and a subset of eleven robust features (MODE, MEDIAN, RANGE, RLN, LRLGE, RLV, LZLGE, GLV_GLSZM, ZSV, COARSNESS, and BUSYNESS). Multivariate analyses using pattern recognition approaches, considering 144 textural features extracted from all three mammographic projections (CC, early MLO, and late MLO), optimized by adaptive synthetic sampling and feature selection operations obtained the best results (ACC = 0.87; AUC = 0.90) and showed the best performance in the discrimination of benign and malignant lesions.
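
The leave-one-out cross-validation (LOOCV) scheme named in this abstract can be sketched in a few lines. The example below is only an illustration: it swaps in a 1-nearest-neighbour classifier for the DT and NNET models of the study, leaves out the ADASYN balancing step, and uses hypothetical two-feature toy data. Each sample is held out once, the model is fit on the remaining samples, and the held-out prediction is scored.

```python
import math


def predict_1nn(train_X, train_y, x):
    """Return the label of the training sample closest to x (1-NN stand-in)."""
    nearest = min(range(len(train_X)), key=lambda j: math.dist(train_X[j], x))
    return train_y[nearest]


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each sample exactly once."""
    hits = 0
    for i in range(len(X)):
        # Train on everything except sample i.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Score the prediction for the single held-out sample.
        hits += predict_1nn(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


# Hypothetical toy data: two well-separated groups of three samples each.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (2.0, 2.1), (2.2, 1.9), (1.9, 2.2)]
y = [0, 0, 0, 1, 1, 1]

score = loocv_accuracy(X, y)
print(f"LOOCV accuracy = {score:.2f}")
```

LOOCV is a natural fit for small cohorts such as the 104 samples reported here, since every sample is used for testing while almost all data remain available for training in each round.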

WUZHENG XIAOLEI, ZUO SHIGANG, YAO LI, ZHAO XIAOJIE. SEMI-SUPERVISED SPARSE REPRESENTATION CLASSIFICATION FOR SLEEP EEG RECOGNITION WITH IMBALANCED SAMPLE SETS. J MECH MED BIOL 2021. [DOI: 10.1142/s0219519421400066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]

Chen PW, Baune NA, Zwir I, Wang J, Swamidass V, Wong AW. Measuring Activities of Daily Living in Stroke Patients with Motion Machine Learning Algorithms: A Pilot Study. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021;18:ijerph18041634. [PMID: 33572116 PMCID: PMC7915561 DOI: 10.3390/ijerph18041634] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/30/2020] [Revised: 02/04/2021] [Accepted: 02/05/2021] [Indexed: 11/20/2022]

Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19. Intell Based Med 2020;3:100023. [PMID: 33289013 PMCID: PMC7710484 DOI: 10.1016/j.ibmed.2020.100023] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/27/2020] [Revised: 11/10/2020] [Accepted: 11/16/2020] [Indexed: 11/20/2022]

Wang Z, Cao C, Zhu Y. Entropy and Confidence-Based Undersampling Boosting Random Forests for Imbalanced Problems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020;31:5178-5191. [PMID: 31995503 DOI: 10.1109/tnnls.2020.2964585] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 06/10/2023]

Moustakidis S, Papandrianos NI, Christodolou E, Papageorgiou E, Tsaopoulos D. Dense neural networks in knee osteoarthritis classification: a study on accuracy and fairness. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05459-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]

Pulido JV, Guleria S, Ehsan L, Fasullo M, Lippman R, Mutha P, Shah T, Syed S, Brown DE. Semi-Supervised Classification of Noisy, Gigapixel Histology Images. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING 2020;2020:563-568. [PMID: 34046246 PMCID: PMC8144886 DOI: 10.1109/bibe50027.2020.00097] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]

Classification of Dermoscopy Skin Lesion Color-Images Using Fractal-Deep Learning Features. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10175954] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]

Ladeira Marques M, Moraes Villela S, Hasenclever Borges CC. Large margin classifiers to generate synthetic data for imbalanced datasets. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01719-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]

Cost-sensitive sample shifting in feature space. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-020-00890-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]

Yu K, Shi W, Santoro N. Designing a Streaming Algorithm for Outlier Detection in Data Mining-An Incremental Approach. SENSORS 2020;20:s20051261. [PMID: 32110907 PMCID: PMC7085525 DOI: 10.3390/s20051261] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/01/2020] [Revised: 02/05/2020] [Accepted: 02/19/2020] [Indexed: 11/16/2022]

Abstract

Designing algorithms for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, environmental monitoring and so forth. Because real-time data may arrive in the form of streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and feasible with limited memory resources. These facts create big challenges for existing outlier detection algorithms in terms of their accuracy when they are implemented in an incremental fashion, especially in the streaming environment. To address these problems, we first propose C_KDE_WR, which uses a sliding window and a kernel function to process the streaming data online, and report results demonstrating high throughput on real-time streaming data, implemented in a CUDA framework on a Graphics Processing Unit (GPU). We also present another algorithm, C_LOF, based on a very popular and effective outlier detection algorithm called Local Outlier Factor (LOF), which unfortunately works only on batched data. Using a novel incremental approach that compensates for the drawback of high complexity in LOF, we show how to implement it in a streaming context and obtain results in a timely manner. Like C_KDE_WR, C_LOF also employs a sliding window and a statistical summary to help make decisions based on the data in the current window. It also addresses all the challenges of streaming data that C_KDE_WR addresses. In addition, we report a comparative evaluation of the accuracy of C_KDE_WR against the state-of-the-art SOD_GPU using Precision, Recall and F-score metrics. Furthermore, a t-test is performed to demonstrate the significance of the improvement. We further report the testing results of C_LOF under different parameter settings and draw ROC and PR curves, with their area under the curve (AUC) and Average Precision (AP) values calculated respectively. Experimental results show that C_LOF can overcome the masquerading problem, which often exists in outlier detection on streaming data. We provide complexity analysis and report experimental results on the accuracy of both C_KDE_WR and C_LOF algorithms in order to evaluate their effectiveness as well as their efficiency.
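
The general idea of scoring stream items against a sliding window with a kernel function can be sketched as below. This is a simplified, single-threaded illustration and not the C_KDE_WR or C_LOF algorithms of the paper; the function name `stream_outliers`, the Gaussian kernel, and the window, bandwidth, and threshold values are assumptions chosen for the example.

```python
import math
from collections import deque


def stream_outliers(stream, window=20, bandwidth=1.0, threshold=0.05):
    """Flag items whose kernel density w.r.t. the recent window is low."""
    recent = deque(maxlen=window)  # fixed memory: only the last `window` items
    flagged = []
    for t, x in enumerate(stream):
        if len(recent) == window:
            # Unnormalised Gaussian kernel score, averaged over the window.
            density = sum(
                math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in recent
            ) / window
            if density < threshold:
                flagged.append(t)
        # Incremental update: the oldest item drops out automatically.
        recent.append(x)
    return flagged


# Hypothetical one-dimensional stream with a single injected spike.
stream = [10.0 + 0.1 * (i % 5) for i in range(40)]
stream[30] = 25.0

flagged = stream_outliers(stream)
print(flagged)
```

The bounded `deque` keeps memory use constant regardless of stream length, which mirrors the limited-memory, incremental processing requirement described in the abstract.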

Yoo TK, Ryu IH, Choi H, Kim JK, Lee IS, Kim JS, Lee G, Rim TH. Explainable Machine Learning Approach as a Tool to Understand Factors Used to Select the Refractive Surgery Technique on the Expert Level. Transl Vis Sci Technol 2020;9:8. [PMID: 32704414 PMCID: PMC7346876 DOI: 10.1167/tvst.9.2.8] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 07/25/2019] [Accepted: 11/18/2019] [Indexed: 12/23/2022] Open

Abstract

Purpose

Recently, laser refractive surgery options, including laser epithelial keratomileusis, laser in situ keratomileusis, and small incision lenticule extraction, successfully improved patients' quality of life. Evidence-based recommendation for an optimal surgery technique is valuable in increasing patient satisfaction. We developed an interpretable multiclass machine learning model that selects the laser surgery option on the expert level.

Methods

A multiclass XGBoost model was constructed to classify patients into four categories including laser epithelial keratomileusis, laser in situ keratomileusis, small incision lenticule extraction, and contraindication groups. The analysis included 18,480 subjects who intended to undergo refractive surgery at the B&VIIT Eye center. Training (n = 10,561) and internal validation (n = 2640) were performed using subjects who visited between 2016 and 2017. The model was trained based on clinical decisions of highly experienced experts and ophthalmic measurements. External validation (n = 5279) was conducted using subjects who visited in 2018. The SHapley Additive ex-Planations technique was adopted to explain the output of the XGBoost model.

Results

The multiclass XGBoost model exhibited an accuracy of 81.0% and 78.9% when tested on the internal and external validation datasets, respectively. The SHapley Additive ex-Planations explanations for the results were consistent with prior knowledge from ophthalmologists. The explanations from one-versus-one and one-versus-rest XGBoost classifiers were effective in helping users easily understand the multicategorical classification problem.

Conclusions

This study suggests an expert-level multiclass machine learning model for selecting the refractive surgery for patients. It also provided a clinical understanding in a multiclass problem based on an explainable artificial intelligence technique.

Translational Relevance

Explainable machine learning exhibits a promising future for increasing the practical use of artificial intelligence in ophthalmic clinics.

Learning imbalanced datasets based on SMOTE and Gaussian distribution. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.10.048] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Standard Decision Boundary in a Support-Domain of Fuzzy Classifier Prediction for the Task of Imbalanced Data Classification. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7303698 DOI: 10.1007/978-3-030-50423-6_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Susan S, Kumar A. SSOMaj-SMOTE-SSOMin: Three-step intelligent pruning of majority and minority samples for learning from imbalanced datasets. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.02.028] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Integrating Growth and Environmental Parameters to Discriminate Powdery Mildew and Aphid of Winter Wheat Using Bi-Temporal Landsat-8 Imagery. REMOTE SENSING 2019. [DOI: 10.3390/rs11070846] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Abstract

Monitoring and discriminating co-epidemic diseases and pests at regional scales are of practical importance in guiding differential treatment. A combination of vegetation and environmental parameters could improve the accuracy for discriminating crop diseases and pests. Different diseases and pests could cause similar stresses and symptoms during the same crop growth period, so combining growth period information can be useful for discerning different changes in crop diseases and pests. Additionally, problems associated with imbalanced data often have detrimental effects on the performance of image classification. In this study, we developed an approach for discriminating crop diseases and pests based on bi-temporal Landsat-8 satellite imagery integrating both crop growth and environmental parameters. As a case study, the approach was applied to data during a period of typical co-epidemic outbreak of winter wheat powdery mildew and aphids in the Shijiazhuang area of Hebei Province, China. Firstly, bi-temporal remotely sensed features characterizing growth indices and environmental factors were calculated based on two Landsat-8 images. The synthetic minority oversampling technique (SMOTE) algorithm was used to resample the imbalanced training data set before model construction. Then, a back propagation neural network (BPNN) based on a new training data set balanced by the SMOTE approach (SMOTE-BPNN) was developed to generate the regional wheat disease and pest distribution maps. The original training data set-based BPNN and support vector machine (SVM) methods were used for comparison and testing of the initial results. Our findings suggest that the proposed approach incorporating both growth and environmental parameters of different crop periods could distinguish wheat powdery mildew and aphids at the regional scale. The bi-temporal growth indices and environmental factors-based SMOTE-BPNN, BPNN, and SVM models all had an overall accuracy higher than 80%. Meanwhile, the SMOTE-BPNN method had the highest G-means among the three methods. These results revealed that the combination of bi-temporal crop growth and environmental parameters is essential for improving the accuracy of the crop disease and pest discriminating models. The combination of SMOTE and BPNN could effectively improve the discrimination accuracy of the minor disease or pest.
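The two ingredients named in this abstract, SMOTE resampling and the G-mean metric, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the sample points, `k=2`, and the seed are made-up assumptions, and the G-mean is shown in its binary form (geometric mean of sensitivity and specificity).

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (the core SMOTE step)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Binary G-mean; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical minority-class samples (two features each).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority, n_new=4)

# Hypothetical labels and predictions for the metric.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = g_mean(y_true, y_pred)
print(new_points, score)
```

Because the G-mean multiplies the per-class recall values, a model that ignores the minority class scores zero no matter how high its overall accuracy is.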

Predicting Surgical Complications in Patients Undergoing Elective Adult Spinal Deformity Procedures Using Machine Learning. Spine Deform 2019;6:762-770. [PMID: 30348356 DOI: 10.1016/j.jspd.2018.03.003] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/30/2017] [Revised: 02/25/2018] [Accepted: 03/01/2018] [Indexed: 11/22/2022]

Abstract

STUDY DESIGN

Cross-sectional database study.

OBJECTIVE

To train and validate machine learning models to identify risk factors for complications following surgery for adult spinal deformity (ASD).

SUMMARY OF BACKGROUND DATA

Machine learning models such as logistic regression (LR) and artificial neural networks (ANNs) are valuable tools for analyzing and interpreting large and complex data sets. ANNs have yet to be used for risk factor analysis in orthopedic surgery.

METHODS

The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database was queried for patients who underwent surgery for ASD. This query returned 4,073 patients, whose data were used to train and evaluate our models. The predictive variables used included sex, age, ethnicity, diabetes, smoking, steroid use, coagulopathy, functional status, American Society of Anesthesiologists (ASA) class >3, body mass index (BMI), pulmonary comorbidities, and cardiac comorbidities. The models were used to predict cardiac complications, wound complications, venous thromboembolism (VTE), and mortality. Using ASA class as a benchmark for prediction, area under receiver operating characteristic curves (AUC) was used to determine the accuracy of our machine learning models.
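For readers unfamiliar with the metric, the AUC can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below shows that pairwise definition on made-up labels and scores. It is not the study's code, and the data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case is scored higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcome labels (1 = complication) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of cases, so a rank-based formulation is preferable for large cohorts. The definition is the same.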

RESULTS

The mean age of patients was 59.5 years. Forty-one percent of patients were male whereas 59.0% of patients were female. ANN and LR outperformed ASA scoring in predicting every complication (p<.05). The ANN outperformed LR in predicting cardiac complication, wound complication, and mortality (p<.05).

CONCLUSIONS

Machine learning algorithms outperform ASA scoring for predicting individual risk prognosis. These algorithms also outperform LR in predicting individual risk for all complications except VTE. With the growing size of medical data, training machine learning models on these large data sets promises to improve risk prognostication, and their ability to learn continuously makes them excellent tools in complex clinical scenarios.

LEVEL OF EVIDENCE

Level III.

Arvind V, Kim JS, Oermann EK, Kaji D, Cho SK. Predicting Surgical Complications in Adult Patients Undergoing Anterior Cervical Discectomy and Fusion Using Machine Learning. Neurospine 2018;15:329-337. [PMID: 30554505 PMCID: PMC6347343 DOI: 10.14245/ns.1836248.124] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2018] [Accepted: 11/27/2018] [Indexed: 11/25/2022] Open

Examining the Ability of Artificial Neural Networks Machine Learning Models to Accurately Predict Complications Following Posterior Lumbar Spine Fusion. Spine (Phila Pa 1976) 2018;43:853-860. [PMID: 29016439 PMCID: PMC6252089 DOI: 10.1097/brs.0000000000002442] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]

Abstract

STUDY DESIGN

A cross-sectional database study.

OBJECTIVE

The aim of this study was to train and validate machine learning models to identify risk factors for complications following posterior lumbar spine fusion.

SUMMARY OF BACKGROUND DATA

Machine learning models such as artificial neural networks (ANNs) are valuable tools for analyzing and interpreting large and complex datasets. ANNs have yet to be used for risk factor analysis in orthopedic surgery.

METHODS

The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database was queried for patients who underwent posterior lumbar spine fusion. This query returned 22,629 patients, 70% of whom were used to train our models, and 30% were used to evaluate the models. The predictive variables used included sex, age, ethnicity, diabetes, smoking, steroid use, coagulopathy, functional status, American Society of Anesthesiologists (ASA) class ≥3, body mass index (BMI), pulmonary comorbidities, and cardiac comorbidities. The models were used to predict cardiac complications, wound complications, venous thromboembolism (VTE), and mortality. Using ASA class as a benchmark for prediction, area under receiver operating characteristic curves (AUC) was used to determine the accuracy of our machine learning models.
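A 70%/30% hold-out split like the one described can be sketched as a seeded shuffle followed by a cut. The abstract does not say how patients were assigned to each portion, so the random shuffle, the seed, and the rounding rule below are assumptions. The integer range stands in for patient records.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and cut it into 70% training and
    30% evaluation portions (the cut index is rounded to the nearest
    integer)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder patient identifiers; the cohort size is taken from the text.
train, evaluation = split_70_30(range(22629))
print(len(train), len(evaluation))
```

When the outcome is rare, as surgical complications often are, a split stratified by outcome keeps the event rate similar in both portions. That refinement is not shown here.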

RESULTS

On the basis of AUC values, ANN and LR both outperformed ASA class for predicting all four types of complications. ANN was the most accurate for predicting cardiac complications, and LR was most accurate for predicting wound complications, VTE, and mortality, though ANN and LR had comparable AUC values for predicting all types of complications. ANN had greater sensitivity than LR for detecting wound complications and mortality.

CONCLUSION

Machine learning in the form of logistic regression and ANNs were more accurate than benchmark ASA scores for identifying risk factors of developing complications following posterior lumbar spine fusion, suggesting they are potentially great tools for risk factor analysis in spine surgery.

LEVEL OF EVIDENCE
