1
Liu Z, He X. Dynamic Submodular-Based Learning Strategy in Imbalanced Drifting Streams for Real-Time Safety Assessment in Nonstationary Environments. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:3038-3051. [PMID: 37494171 DOI: 10.1109/tnnls.2023.3294788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023]
Abstract
Designing real-time safety assessment (RTSA) approaches for nonstationary environments is important for reducing the possibility of significant losses. However, several challenging problems must be considered. The performance of existing approaches degrades in imbalanced drifting streams. In this setting, model design with incremental updates should also be explored, and the query strategy should be well designed. This article investigates a dynamic submodular-based learning strategy to address these issues. Specifically, an efficient incremental update procedure is designed with the structure of the broad learning system (BLS), which is beneficial to the detection of concept drift. Furthermore, a novel dynamic submodular-based annotation with an activation interval strategy is proposed to select valuable samples in imbalanced drifting streams. The lower bound of the annotation value is also proven theoretically with a novel drift adaptation mechanism. Numerous experiments are conducted with realistic data from the JiaoLong deep-sea manned submersible. The experimental results show that the proposed approach can achieve better assessment accuracy than typical existing approaches.
2
Lou J, Jiang Y, Shen Q, Wang R, Li Z. Probabilistic Regularized Extreme Learning for Robust Modeling of Traffic Flow Forecasting. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:1732-1741. [PMID: 33064658 DOI: 10.1109/tnnls.2020.3027822] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
The adaptive neurofuzzy inference system (ANFIS) is a structured multioutput learning machine that has been successfully adopted in learning problems without noise or outliers. However, it does not work well for learning problems with noise or outliers. High-accuracy real-time forecasting of traffic flow is extremely difficult due to the effect of noise or outliers from complex traffic conditions. In this study, a novel probabilistic learning system, probabilistic regularized extreme learning machine combined with ANFIS (probabilistic R-ELANFIS), is proposed to capture the correlations among traffic flow data and, thereby, improve the accuracy of traffic flow forecasting. The new learning system adopts a fantastic objective function that minimizes both the mean and the variance of the model bias. The results from an experiment based on real-world traffic flow data showed that, compared with some kernel-based approaches, neural network approaches, and conventional ANFIS learning systems, the proposed probabilistic R-ELANFIS achieves competitive performance in terms of forecasting ability and generalizability.
3
Chen S, Wang R, Lu J. A meta-framework for multi-label active learning based on deep reinforcement learning. Neural Netw 2023; 162:258-270. [PMID: 36913822 DOI: 10.1016/j.neunet.2023.02.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 01/02/2023] [Accepted: 02/28/2023] [Indexed: 03/09/2023]
Abstract
Multi-label Active Learning (MLAL) is an effective method to improve the performance of the classifier on multi-label problems with less annotation effort by allowing the learning system to actively select high-quality examples (example-label pairs) for labeling. Existing MLAL algorithms mainly focus on designing reasonable algorithms to evaluate the potential value (that is, the quality mentioned above) of the unlabeled data. These manually designed methods may show very different results on different types of datasets because of defects in the methods or the particularities of the datasets. In this paper, instead of manually designing an evaluation method, we propose a deep reinforcement learning (DRL) model to explore a general evaluation method on several seen datasets and eventually apply it to unseen datasets based on a meta framework. In addition, a self-attention mechanism along with a reward function is integrated into the DRL structure to address the label correlation and data imbalance problems in MLAL. Comprehensive experiments show that our proposed DRL-based MLAL method produces results comparable to those of other methods reported in the literature.
Affiliation(s)
- Shuyue Chen
- College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China.
- Ran Wang
- College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China; Shenzhen Key Lab. of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, 518060, China; Guangdong Key Lab. of Intelligent Information Process, Shenzhen University, Shenzhen, 518060, China.
- Jian Lu
- College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China; Shenzhen Key Lab. of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, 518060, China.
4
Improving Active Learning Performance through the Use of Data Augmentation. INT J INTELL SYST 2023. [DOI: 10.1155/2023/7941878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Active learning (AL) is a well-known technique to optimize data usage in training, through the interactive selection of unlabeled observations, out of a large pool of unlabeled data, to be labeled by a supervisor. Its focus is to find the unlabeled observations that, once labeled, will maximize the informativeness of the training dataset, therefore reducing data-related costs. The literature describes several methods to improve the effectiveness of this process. Nonetheless, there is a paucity of research developed around the application of artificial data sources in AL, especially outside image classification or NLP. This paper proposes a new AL framework, which relies on the effective use of artificial data. It may be used with any classifier, generation mechanism, and data type and can be integrated with multiple other state-of-the-art AL contributions. This combination is expected to increase the ML classifier’s performance and reduce both the supervisor’s involvement and the amount of required labeled data at the expense of a marginal increase in computational time. The proposed method introduces a hyperparameter optimization component to improve the generation of artificial instances during the AL process as well as an uncertainty-based data generation mechanism. We compare the proposed method to the standard framework and an oversampling-based active learning method for more informed data generation in an AL context. The models’ performance was tested using four different classifiers, two AL-specific performance metrics, and three classification performance metrics over 15 different datasets. We demonstrated that the proposed framework, using data augmentation, significantly improved the performance of AL, both in terms of classification performance and data selection efficiency (all the codes and preprocessed data developed for this study are available at https://github.com/joaopfonseca/publications/).
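The pool-based selection step that this abstract describes can be illustrated with a generic uncertainty-sampling sketch. This is not the authors' framework (their code is at the repository linked above); it only assumes that a classifier has already produced class probabilities for each unlabeled observation, and it ranks observations by predictive entropy. The function names, the `pool_probs` values, and the budget `k` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, k):
    """Return indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted probabilities for four unlabeled observations.
pool_probs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain for two classes
]

# Indices of the observations a supervisor would be asked to label next.
queries = select_queries(pool_probs, k=2)
print(queries)
```

Entropy is only one possible uncertainty score; margin or least-confidence scores would slot into the same `key` function.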
5
Chen X, Wujek B. A Unified Framework for Automatic Distributed Active Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:9774-9786. [PMID: 34813465 DOI: 10.1109/tpami.2021.3129793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
We propose a novel unified framework for automated distributed active learning (AutoDAL) to address multiple challenging problems in active learning such as limited labeled data, imbalanced datasets, automatic hyperparameter selection as well as scalability to big data. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes and jointly optimizing hyperparameters in both the classification and query selection stages. For dense datasets, clustering-based uncertainty sampling with maximum entropy (CME) loss is applied in the optimization. For sparse and imbalanced datasets, shrinkage optimized KL-divergence regularization and local selection based active learning (SOAR) loss are further naturally adapted in AutoDAL. The optimization is efficiently resolved by iteratively executing a genetic algorithm (GA) refined with a local generating set search (GSS) and solving an integer linear programming (ILP) problem. Moreover, we propose an efficient distributed active learning algorithm which is scalable for big data. The proposed AutoDAL algorithm is applied to multiple benchmark datasets and two real-world datasets including an electrocardiogram (ECG) dataset and a credit fraud detection dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance compared to several state-of-the-art AutoML approaches and active learning algorithms.
6
Active Learning by Extreme Learning Machine with Considering Exploration and Exploitation Simultaneously. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11089-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
7
Ji W, Zhang Y, Cheng Y, Wang Y, Zhou Y. Development and validation of prediction models for hypertension risks: A cross-sectional study based on 4,287,407 participants. Front Cardiovasc Med 2022; 9:928948. [PMID: 36225955 PMCID: PMC9548597 DOI: 10.3389/fcvm.2022.928948] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/26/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022] Open
Abstract
Objective: To develop an optimal screening model to identify individuals with a high risk of hypertension in China by comparing tree-based machine learning models, such as classification and regression tree, random forest, AdaBoost with a decision tree, and extreme gradient boosting decision tree, with other machine learning models, such as an artificial neural network and naive Bayes, and with traditional logistic regression models. Methods: A total of 4,287,407 adults participating in the national physical examination were included in the study. Features were selected using least absolute shrinkage and selection operator regression. The borderline synthetic minority over-sampling technique was used for data balancing. Non-laboratory and semi-laboratory analyses were carried out in combination with the selected features. The tree-based machine learning models, other machine learning models, and traditional logistic regression models were each constructed to identify individuals with hypertension. Top features selected using the best algorithm and the corresponding variable importance scores were visualized. Results: A total of 24 variables were finally included for analyses after the least absolute shrinkage and selection operator regression model. The sample size of hypertensive patients in the training set was expanded from 689,025 to 2,312,160 using the borderline synthetic minority over-sampling technique algorithm. The extreme gradient boosting decision tree algorithm showed the best results (area under the receiver operating characteristic curve of non-laboratory: 0.893 and area under the receiver operating characteristic curve of semi-laboratory: 0.894). This study found that age, systolic blood pressure, waist circumference, diastolic blood pressure, albumin, drinking frequency, electrocardiogram, ethnicity (Uyghur, Hui, and other), body mass index, sex (female), exercise frequency, diabetes mellitus, and total bilirubin are important factors reflecting hypertension. In addition, some algorithms included in the semi-laboratory analyses showed less improvement in predictive performance compared to the non-laboratory analyses. Conclusion: Using multiple methods, a more significant prediction model can be built, which discovers risk factors and provides new insights into the prediction and prevention of hypertension.
Affiliation(s)
- Weidong Ji
- Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yushan Zhang
- Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Yinlin Cheng
- Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yushan Wang
- Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- *Correspondence: Yushan Wang
- Yi Zhou
- Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
8
Back-propagation extreme learning machine. Soft comput 2022. [DOI: 10.1007/s00500-022-07331-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
9
Stable Matching-Based Two-Way Selection in Multi-Label Active Learning with Imbalanced Data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
10
Ni J, Huang Z, Yu C, Lv D, Wang C. Comparative Convolutional Dynamic Multi-Attention Recommendation Model. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:3510-3521. [PMID: 33556019 DOI: 10.1109/tnnls.2021.3053245] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
Recently, an attention mechanism has been used to help recommender systems grasp user interests more accurately. It focuses on their pivotal interests from a psychology perspective. However, most current studies based on it only focus on part of user interests; they have not mined user preferences thoroughly. To address the above problem, we propose a novel recommendation model: comparative convolutional dynamic multi-attention (CCDMA). This model provides a more accurate approach to represent user and item features and uses multi-attention-based convolutional neural networks to extract user and item latent feature vectors dynamically. The multi-attention mechanism considers both self-attention and cross-attention. Self-attention refers to the internal attention within users and items; cross-attention is the mutual attention between users and items. Moreover, we propose an optimized comparative learning framework that can mine the ternary relationships between one user and a pair of items, focusing on their relative relationship and the internal link between a pair of items. Extensive experiments on several real-world data sets show that the CCDMA model significantly outperforms state-of-the-art baselines in terms of different evaluation metrics.
11
Li Y, Zhang J, Zhang S, Xiao W, Zhang Z. Multi-objective optimization-based adaptive class-specific cost extreme learning machine for imbalanced classification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
12
Huang X, Cao T, Chen L, Li J, Tan Z, Xu B, Xu R, Song Y, Zhou Z, Wang Z, Wei Y, Zhang Y, Li J, Huo Y, Qin X, Wu Y, Wang X, Wang H, Cheng X, Xu X, Liu L. Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults. Front Cardiovasc Med 2022; 9:901240. [PMID: 35600480 PMCID: PMC9120532 DOI: 10.3389/fcvm.2022.901240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/21/2022] [Accepted: 04/05/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Stroke is a major global health burden, and risk prediction is essential for the primary prevention of stroke. However, uncertainty remains about the optimal prediction model for analyzing stroke risk. In this study, we aim to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and establish a general methodological pipeline for future analysis. Methods: The training set included 70% of data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed with the remaining 30% of CSPPT data (n = 6,211), and external validation was conducted using a nested case–control (NCC) dataset (n = 2,568). The primary outcome was the first stroke. Four established analysis methods were applied and compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data with and without laboratory variables were analyzed separately. Accuracy, sensitivity, specificity, kappa, and area under receiver operating characteristic curves (AUCs) were used to assess the models, with AUCs the top concern. Data balancing techniques, including random under-sampling (RUS) and synthetic minority over-sampling technique (SMOTE), were applied to process this unbalanced training set. Results: The best model performance was observed in the RUS-applied RF model with laboratory variables. Compared with null models (sensitivity = 0, specificity = 100, and mean AUCs = 0.643), data balancing techniques improved overall performance, with RUS demonstrating a more satisfactory effect in the current study (RUS: sensitivity = 63.9; specificity = 53.7; and mean AUCs = 0.624). Adding laboratory variables improved the performance of the analysis methods. All results were reconfirmed in validation sets. The top 10 important variables were determined by the analysis method with the best performance. Conclusion: Among the tested methods, the most effective stroke prediction model in the targeted population is RUS-applied RF. Drawing on the insights of the current study, we provide general frameworks for building machine learning-based prediction models.
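Random under-sampling (RUS), the balancing technique this abstract reports as most effective, can be sketched in a few lines. The version below is a generic illustration and not the study's pipeline: it assumes the goal is to shrink every class to the size of the rarest class by sampling without replacement, and the function name, the toy data, and the `seed` argument are hypothetical.

```python
import random
from collections import defaultdict


def random_under_sample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    # Size of the smallest class sets the per-class budget.
    n_keep = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_keep))  # sampling without replacement
    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy imbalanced data: eight rows of class 0 and two rows of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_under_sample(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Under-sampling is applied to the training set only; validation data keep their original class balance so that sensitivity and specificity remain comparable with the unbalanced baseline.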
Affiliation(s)
- Xiao Huang
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- *Correspondence: Xiao Huang
- Tianyu Cao
- Biological Anthropology, University of California, Santa Barbara, Santa Barbara, CA, United States
- Liangziqian Chen
- Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Junpei Li
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Ziheng Tan
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Benjamin Xu
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Richard Xu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Yun Song
- Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Institute of Biomedicine, Anhui Medical University, Hefei, China
- Ziyi Zhou
- Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
- Zhuo Wang
- Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yaping Wei
- Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yan Zhang
- Department of Cardiology, Peking University First Hospital, Beijing, China
- Jianping Li
- Department of Cardiology, Peking University First Hospital, Beijing, China
- Yong Huo
- Department of Cardiology, Peking University First Hospital, Beijing, China
- Xianhui Qin
- National Clinical Research Study Center for Kidney Disease, The State Key Laboratory for Organ Failure Research, Renal Division, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yanqing Wu
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiaobin Wang
- Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, United States
- Hong Wang
- Department of Cardiovascular Science, Temple University Lewis Katz School of Medicine, Philadelphia, PA, United States
- Xiaoshu Cheng
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiping Xu
- Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Lishun Liu
- Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
13
Wang H, Li L, Wang W, Wang H, Zhuang Y, Lu X, Zhang G, Wang S, Lin P, Chen C, Bai Y, Chen Q, Chen H, Qu J, Xu L. Simulations to Assess the Performance of Multifactor Risk Scores for Predicting Myopia Prevalence in Children and Adolescents in China. Front Genet 2022; 13:861164. [PMID: 35480319 PMCID: PMC9035486 DOI: 10.3389/fgene.2022.861164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 03/09/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Myopia is the most common visual impairment among Chinese children and adolescents. The purpose of this study is to explore key interventions for myopia prevalence, especially for early-onset myopia and high myopia. Methods: Univariate and multivariate analyses were conducted to evaluate potential associations between risk factor exposure and myopia. LASSO was performed to prioritize the risk features, and the selected leading factors were used to establish the assembled simulation model. Finally, two forecasting models were constructed to predict the risk of myopia and high myopia. Results: Children and adolescents with persistently incorrect posture had a high risk of myopia (OR 7.205, 95% CI 5.999–8.652), which was 2.8 times higher than that in students who always maintained correct posture. In the cohort with high myopia, sleep time of less than 7 h per day (OR 9.789, 95% CI 6.865–13.958), incorrect sitting posture (OR 8.975, 95% CI 5.339–15.086), and siblings with spherical equivalent <−6.00 D (OR 8.439, 95% CI 5.420–13.142) were the top three risk factors. The AUCs of integrated simulation models for myopia and high myopia were 0.8716 and 0.8191, respectively. Conclusion: The findings illustrate that keeping incorrect posture is the leading risk factor for myopia onset, while the onset age of myopia is the primary factor affecting high myopia progression. The age between 8 and 12 years is the crucial stage for clinical intervention, especially for children with parental myopia.
Affiliation(s)
- Hong Wang
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Center of Optometry International Innovation of Wenzhou, Wenzhou, China
- Liansheng Li
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Wenzhou Realdata Medical Research Co., Ltd, Wenzhou, China
- Wencan Wang
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Wenzhou PSI Medical Laboratory Co., Ltd, Wenzhou, China
- Hao Wang
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Youyuan Zhuang
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Xiaoyan Lu
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Guosi Zhang
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Siyu Wang
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Peng Lin
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Chong Chen
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Yu Bai
- Center of Optometry International Innovation of Wenzhou, Wenzhou, China
- Qi Chen
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Hao Chen
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- *Correspondence: Liangde Xu; Jia Qu; Hao Chen
- Jia Qu
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Center of Optometry International Innovation of Wenzhou, Wenzhou, China
- *Correspondence: Liangde Xu; Jia Qu; Hao Chen
- Liangde Xu
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Center of Optometry International Innovation of Wenzhou, Wenzhou, China
- *Correspondence: Liangde Xu; Jia Qu; Hao Chen
14
Ji W, Xue M, Zhang Y, Yao H, Wang Y. A Machine Learning Based Framework to Identify and Classify Non-alcoholic Fatty Liver Disease in a Large-Scale Population. Front Public Health 2022; 10:846118. [PMID: 35444985 PMCID: PMC9013842 DOI: 10.3389/fpubh.2022.846118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Accepted: 02/23/2022] [Indexed: 12/12/2022] Open
Abstract
Non-alcoholic fatty liver disease (NAFLD) is a common and serious health problem worldwide that lacks efficient medical treatment. We aimed to develop and validate machine learning (ML) models that could be used for accurate screening of large numbers of people. This paper included 304,145 adults who took part in the national physical examination and used their questionnaire and physical measurement parameters as the model's candidate covariates. Least absolute shrinkage and selection operator (LASSO) regression was used to select features from the candidate covariates. Four ML algorithms were then used to build the screening model for NAFLD, and the classifier with the best performance was used to output the importance score of each covariate in NAFLD. Among the four ML algorithms, XGBoost achieved the best performance (accuracy = 0.880, precision = 0.801, recall = 0.894, F-1 = 0.882, and AUC = 0.951), and the importance ranking of covariates is, in order, BMI, age, waist circumference, gender, type 2 diabetes, gallbladder disease, smoking, hypertension, dietary status, physical activity, oil-loving and salt-loving. ML classifiers could help medical agencies achieve the early identification and classification of NAFLD, which is particularly useful in economically disadvantaged areas, and the importance ranking of the covariates will be helpful for the prevention and treatment of NAFLD.
Affiliation(s)
- Weidong Ji
- Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Mingyue Xue
- Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China
- Yushan Zhang
- Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Hua Yao
- Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Yushan Wang
- Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- *Correspondence: Yushan Wang
15
RoiSeg: An Effective Moving Object Segmentation Approach Based on Region-of-Interest with Unsupervised Learning. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12052674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
Traditional video object segmentation often has low detection speed and inaccurate results due to the jitter caused by pan-and-tilt or hand-held devices. Deep neural networks (DNNs) have been widely adopted to address these problems; however, they rely on a large amount of annotated data and high-performance computing units. Therefore, DNNs are not suitable for some special scenarios (e.g., no prior knowledge or no powerful computing resources). In this paper, we propose RoiSeg, an effective moving object segmentation approach based on Region-of-Interest (ROI), which uses an unsupervised learning method to achieve automatic segmentation of moving objects. Specifically, we first hypothesize that the central n × n pixels of images act as the ROI to represent the features of the segmented moving object. Second, we pool the ROI to a central point of the foreground to simplify the segmentation problem into a classification problem based on ROI. Third, we implement a trajectory-based classifier and an online updating mechanism to address the classification problem and the compensation of class imbalance, respectively. We conduct extensive experiments to evaluate the performance of RoiSeg, and the experimental results demonstrate that RoiSeg is more accurate and faster than other segmentation algorithms. Moreover, RoiSeg not only effectively handles ambient lighting changes, fog, and salt-and-pepper noise, but also copes well with camera jitter and windy scenes.
16
Zheng Y, Chen B, Wang S, Wang W, Qin W. Mixture Correntropy-Based Kernel Extreme Learning Machines. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:811-825. [PMID: 33079685 DOI: 10.1109/tnnls.2020.3029198] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
Kernel-based extreme learning machine (KELM), as a natural extension of ELM to kernel learning, has achieved outstanding performance in addressing various regression and classification problems. Compared with the basic ELM, KELM has a better generalization ability because it requires neither the number of hidden nodes to be given beforehand nor a random projection mechanism. Since KELM is derived under the minimum mean square error (MMSE) criterion with a Gaussian assumption on the noise, its performance may deteriorate seriously in non-Gaussian cases. To improve the robustness of KELM, this article proposes a mixture correntropy-based KELM (MC-KELM), which adopts the recently proposed maximum mixture correntropy criterion as the optimization criterion instead of the MMSE criterion. In addition, an online sequential version of MC-KELM (MCOS-KELM) is developed to deal with the case in which the data arrive sequentially (one-by-one or chunk-by-chunk). Experimental results on regression and classification data sets are reported to validate the superior performance of the new methods.
|
17
|
Zhang J, Dai Q. A cost-sensitive active learning algorithm: toward imbalanced time series forecasting. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06837-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
|
18
|
Wei C, Zhang L, Feng Y, Ma A, Kang Y. Machine learning model for predicting acute kidney injury progression in critically ill patients. BMC Med Inform Decis Mak 2022; 22:17. [PMID: 35045840 PMCID: PMC8772216 DOI: 10.1186/s12911-021-01740-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/06/2021] [Accepted: 12/21/2021] [Indexed: 12/30/2022] Open
Abstract
Background: Acute kidney injury (AKI) is a severe and harmful syndrome in the intensive care unit. Compared with patients with AKI stage 1/2, patients with AKI stage 3 have higher in-hospital mortality and risk of progression to chronic kidney disease. The purpose of this study is to develop a prediction model that predicts whether patients with AKI stage 1/2 will progress to AKI stage 3. Methods: Patients with AKI stage 1/2 at the time they were first diagnosed with AKI in the Medical Information Mart for Intensive Care were included. We used logistic regression and extreme gradient boosting (XGBoost) to build two models that predict which patients will progress to AKI stage 3. The established models were evaluated by cross-validation, receiver operating characteristic curves, and precision–recall curves. Results: We included 25,711 patients, of whom 2130 (8.3%) progressed to AKI stage 3. Creatinine and multiple organ failure syndromes were the most important factors in predicting AKI progression. The XGBoost model performed better than the logistic regression model in predicting progression to AKI stage 3. We therefore built software based on our data that can predict AKI progression in real time. Conclusions: The XGBoost model can better identify patients with AKI progression than the logistic regression model. Machine learning techniques may improve predictive modeling in medical research.
|
19
|
Comparison of the Meta-Active Machine Learning Model Applied to Biological Data-Driven Experiments with Other Models. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:8014850. [PMID: 34938423 PMCID: PMC8687783 DOI: 10.1155/2021/8014850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2021] [Revised: 10/11/2021] [Accepted: 11/14/2021] [Indexed: 11/30/2022]
Abstract
Currently, many methods that could estimate the effects of conditions on a given biological target require either strong modelling assumptions or separate screens. Traditionally, many conditions and targets, without doing all possible experiments, could be achieved by driven experimentation or several mathematical methods, especially conversational machine learning methods. However, these methods still could not avoid and replace manual labels completely. This paper presented a meta-active machine learning method to resolve this problem. This project has used nine traditional machine learning methods to compare their accuracy and running time. In addition, this paper analyzes the meta-active machine learning method (MAML) compared with a classical screening method and progressive experiments. The obtained results show that applying this method yields the best experimental results on the current dataset.
|
20
|
Zhang W, Wu QMJ, Yang Y, Akilan T. Multimodel Feature Reinforcement Framework Using Moore-Penrose Inverse for Big Data Analysis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:5008-5021. [PMID: 33021948 DOI: 10.1109/tnnls.2020.3026621] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Fully connected representation learning (FCRL) is one of the widely used network structures in multimodel image classification frameworks. However, most FCRL-based structures, for instance, stacked autoencoders, encode features and find the final cognition with separate building blocks, resulting in loosely connected feature representation. This article achieves a robust representation by considering a low-dimensional feature and the classifier model simultaneously. Thus, a new hierarchical subnetwork-based neural network (HSNN) is proposed in this article. The novelties of this framework are as follows: 1) it is an iterative learning process, instead of stacking separate blocks to obtain the discriminative encoding and the final classification results. In this sense, the optimal global features are generated; 2) it applies a Moore-Penrose (MP) inverse-based batch-by-batch learning strategy to handle large-scale data sets, so that a large data set, such as Place365 containing 1.8 million images, can be processed effectively. The experimental results on multiple domains with a varying number of training samples from ∼1 K to ∼2 M show that the proposed feature reinforcement framework achieves better generalization performance compared with most state-of-the-art FCRL methods.
|
21
|
Active learning with extreme learning machine for online imbalanced multiclass classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107385] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/23/2022]
|
22
|
Cai L, Wang L, Fu X, Zeng X. Active Semisupervised Model for Improving the Identification of Anticancer Peptides. ACS OMEGA 2021; 6:23998-24008. [PMID: 34568678 PMCID: PMC8459422 DOI: 10.1021/acsomega.1c03132] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/15/2021] [Indexed: 06/13/2023]
Abstract
Cancer is one of the most dangerous threats to human health. Accurate identification of anticancer peptides (ACPs) is valuable for the development and design of new anticancer agents. However, most machine-learning algorithms have limited ability to identify ACPs, and their accuracy is sensitive to the amount of labeled data. In this paper, we construct a new technique, called ACP-ALPM, that combines active learning (AL) and a label propagation (LP) algorithm to solve this problem. First, we develop an efficient feature representation method based on various descriptor information and coding information of the peptide sequence. Then, an AL strategy is used to filter out the most informative data for model training, and a more powerful LP classifier is cast through continuous iterations. Finally, we evaluate the performance of ACP-ALPM and compare it with that of some state-of-the-art and classic methods; experimental results show that our method is significantly superior to them. In addition, an experimental comparison of random selection and AL on three public data sets shows that the AL strategy is more effective. Notably, a visualization experiment further verified that AL can utilize unlabeled data to improve the performance of the model. We hope that our method can be extended to other types of peptides and provide more inspiration for other similar work.
Affiliation(s)
- Lijun Cai: Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
- Li Wang: Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
- Xiangzheng Fu: Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
- Xiangxiang Zeng: Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
|
23
|
The Use of Transfer Learning for Activity Recognition in Instances of Heterogeneous Sensing. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11167660] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Transfer learning is a growing field that can address the variability of activity recognition problems by reusing the knowledge from previous experiences to recognise activities from different conditions, resulting in the leveraging of resources such as training and labelling efforts. Although integrating ubiquitous sensing technology and transfer learning seems promising, there are some research opportunities that, if addressed, could accelerate the development of activity recognition. This paper presents TL-FmRADLs, a framework that converges the feature fusion strategy with a teacher/learner approach over the active learning technique to automatise the self-training process of the learner models. Evaluation of TL-FmRADLs is conducted over InSync, an open access dataset introduced for the first time in this paper. Results show promising effects towards mitigating the insufficiency of labelled data available by enabling the learner model to outperform the teacher’s performance.
|
24
|
Wang T, Cao J, Lai X, Wu QMJ. Hierarchical One-Class Classifier With Within-Class Scatter-Based Autoencoders. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:3770-3776. [PMID: 32822309 DOI: 10.1109/tnnls.2020.3015860] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Autoencoding is a vital branch of representation learning in deep neural networks (DNNs). The extreme learning machine-based autoencoder (ELM-AE) has been recently developed and has gained popularity for its fast learning speed and ease of implementation. However, the ELM-AE uses random hidden node parameters without tuning, which may generate meaningless encoded features. In this brief, we first propose a within-class scatter information constraint-based AE (WSI-AE) that minimizes both the reconstruction error and the within-class scatter of the encoded features. We then build stacked WSI-AEs into a one-class classification (OCC) algorithm based on the hierarchical regularized least-squared method. The effectiveness of our approach was experimentally demonstrated in comparisons with several state-of-the-art AEs and OCC algorithms. The evaluations were performed on several benchmark data sets.
|
25
|
A comprehensive active learning method for multiclass imbalanced data streams with concept drift. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106778] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/24/2022]
|
26
|
SMOTE-Based Weighted Deep Rotation Forest for the Imbalanced Hyperspectral Data Classification. REMOTE SENSING 2021. [DOI: 10.3390/rs13030464] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
Abstract
Conventional classification algorithms have shown great success in balanced hyperspectral data classification. However, the imbalanced class distribution is a fundamental problem of hyperspectral data, and it is regarded as one of the great challenges in classification tasks. To solve this problem, a non-ANN-based deep learning method, namely SMOTE-Based Weighted Deep Rotation Forest (SMOTE-WDRoF), is proposed in this paper. First, the neighboring pixels of instances are introduced as the spatial information, and balanced datasets are created by using the SMOTE algorithm. Second, these datasets are fed into the WDRoF model, which consists of the rotation forest and the multi-level cascaded random forests. Specifically, the rotation forest is used to generate rotation feature vectors, which are input into the subsequent cascade forest. Furthermore, the output probability of each level and the original data are stacked as the dataset of the next level, and the sample weights are automatically adjusted according to the dynamic weight function constructed from the classification results of each level. Compared with traditional deep learning approaches, the proposed method consumes much less training time. The experimental results on four public hyperspectral datasets demonstrate that the proposed method achieves better performance than support vector machine, random forest, rotation forest, SMOTE combined rotation forest, convolutional neural network, and rotation-based deep forest in multiclass imbalance learning.
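The SMOTE step mentioned in the abstract above is commonly described as interpolating between a minority-class sample and one of its nearest same-class neighbors. The following is a minimal Python sketch of that generic idea only; the function names (`smote_sample`, `oversample`) and parameters are hypothetical, and it does not reproduce the paper's spatial-neighborhood features or the WDRoF model.

```python
import random


def smote_sample(x, neighbor, rng):
    """Return a synthetic point on the segment between x and a same-class neighbor."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


def oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples (illustrative, needs >= 2 points)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbor = rng.choice(others[:k])  # pick one of the k nearest neighbors
        synthetic.append(smote_sample(x, neighbor, rng))
    return synthetic
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies on the segment joining them; this is the property that lets the minority class be enlarged without leaving its observed range.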
|
27
|
Camargo G, Bugatti PH, Saito PTM. Active semi-supervised learning for biological data classification. PLoS One 2020; 15:e0237428. [PMID: 32813738 PMCID: PMC7437865 DOI: 10.1371/journal.pone.0237428] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/12/2019] [Accepted: 07/27/2020] [Indexed: 11/18/2022] Open
Abstract
As datasets have continuously grown, efforts have been made to address the large amount of unlabeled data relative to the scarcity of labeled data. Another important issue is the trade-off between the difficulty of obtaining annotations provided by a specialist and the need for a significant amount of annotated data to obtain a robust classifier. In this context, active learning techniques combined with semi-supervised learning are of interest. A smaller number of more informative samples, previously selected (by the active learning strategy) and labeled by a specialist, can propagate the labels to a set of unlabeled data (through the semi-supervised one). However, most works in the literature neglect the need for interactive response times that can be required by certain real applications. We propose a more effective and efficient active semi-supervised learning framework, including a new active learning method. An extensive experimental evaluation was performed in the biological context (using the ALL-AML, Escherichia coli and PlantLeaves II datasets), comparing our proposals with state-of-the-art works from the literature and different supervised (SVM, RF, OPF) and semi-supervised (YATSI-SVM, YATSI-RF and YATSI-OPF) classifiers. From the obtained results, we can observe the benefits of our framework, which allows the classifier to achieve higher accuracies more quickly with a reduced number of annotated samples. Moreover, the selection criterion adopted by our active learning method, based on diversity and uncertainty, enables the prioritization of the most informative boundary samples for the learning process. We obtained a gain of up to 20% against other learning techniques. The active semi-supervised learning approaches presented a better trade-off (accuracies and competitive and viable computational times) when compared with the active supervised learning ones.
Affiliation(s)
- Guilherme Camargo: Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
- Pedro H. Bugatti: Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
- Priscila T. M. Saito: Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil; Institute of Computing, University of Campinas, Campinas, SP, Brazil
|
28
|
Multiclass Non-Randomized Spectral–Spatial Active Learning for Hyperspectral Image Classification. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10144739] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/16/2022]
Abstract
Active Learning (AL) for Hyperspectral Image Classification (HSIC) has been extensively studied. However, several issues remain. First, traditional AL methods do not consider randomness among the existing and new samples. Second, very limited AL research has been carried out on joint spectral–spatial information. Third, a minor but still noteworthy factor is the stopping criterion. This study addresses all these issues using a spatial prior Fuzziness concept coupled with a Multinomial Logistic Regression via a Splitting and Augmented Lagrangian (MLR-LORSAL) classifier with dual stopping criteria. This work further compares several sample selection methods across classifiers of different natures, i.e., probabilistic and non-probabilistic. The sample selection methods include Breaking Ties (BT), Mutual Information (MI) and Modified Breaking Ties (MBT). The comparative classifiers include Support Vector Machine (SVM), Extreme Learning Machine (ELM), K-Nearest Neighbour (KNN) and Ensemble Learning (EL). The experimental results on three benchmark hyperspectral datasets reveal that the proposed pipeline significantly increases the classification accuracy and generalization performance. To further validate the performance, several statistical measures, such as Precision, Recall and F1-Score, are also considered.
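The Breaking Ties (BT) criterion named in the abstract above is usually defined as the gap between the two largest class posteriors of an unlabeled sample, with the smallest gaps queried first. The following Python sketch illustrates that generic definition under this assumption; the function names are hypothetical, and the paper's fuzziness-based selection and MLR-LORSAL classifier are not reproduced.

```python
def breaking_ties_margin(probs):
    """Gap between the two largest class posteriors (small gap = ambiguous sample)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_queries(posteriors, budget):
    """Return indices of the `budget` samples with the smallest BT margin."""
    ranked = sorted(range(len(posteriors)),
                    key=lambda i: breaking_ties_margin(posteriors[i]))
    return ranked[:budget]


# Illustrative posteriors for three unlabeled samples over three classes.
posteriors = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # two classes nearly tied, smallest margin
    [0.50, 0.30, 0.20],  # moderately uncertain
]
queries = select_queries(posteriors, budget=2)
```

Here the second and third samples would be sent for labeling, since their top two classes are closest to a tie.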
|
29
|
Toward Enhanced State of Charge Estimation of Lithium-ion Batteries Using Optimized Machine Learning Techniques. Sci Rep 2020; 10:4687. [PMID: 32170100 PMCID: PMC7070070 DOI: 10.1038/s41598-020-61464-7] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 11/22/2019] [Accepted: 02/24/2020] [Indexed: 11/09/2022] Open
Abstract
State of charge (SOC) is a crucial index used in the assessment of electric vehicle (EV) battery storage systems. Thus, SOC estimation of lithium-ion batteries has been widely investigated because of their fast charging, long-life cycle, and high energy density characteristics. However, precise SOC assessment of lithium-ion batteries remains challenging because of their varying characteristics under different working environments. Machine learning techniques have been widely used to design an advanced SOC estimation method without the information of battery chemical reactions, battery models, internal properties, and additional filters. Here, the capacity of optimized machine learning techniques is presented toward enhanced SOC estimation in terms of learning capability, accuracy, generalization performance, and convergence speed. We validate the proposed method through lithium-ion battery experiments, EV drive cycles, temperature, noise, and aging effects. We show that the proposed method outperforms several state-of-the-art approaches in terms of accuracy, adaptability, and robustness under diverse operating conditions.
|
30
|
Xue M, Su Y, Li C, Wang S, Yao H. Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. J Diabetes Res 2020; 2020:6873891. [PMID: 33029536 PMCID: PMC7532405 DOI: 10.1155/2020/6873891] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2020] [Revised: 08/01/2020] [Accepted: 09/02/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: An estimated 425 million people globally have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas. METHODS: A total of 584,168 adult subjects who had participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurements and a questionnaire. Combined with the risk factors selected by LR, we used a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output the variable importance scores for T2DM. RESULTS: The results indicated that XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F-1 = 0.906, and AUC = 0.968). The variable importance scores in XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). CONCLUSIONS: We proposed a classifier based on LR-XGBoost that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen the risk of diabetes in the early phase, and the variable importance scores offer clues for preventing the occurrence of diabetes.
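The precision, recall, and F-1 values reported in the abstract above can be cross-checked with the standard definition of F1 as the harmonic mean of precision and recall. The short Python sketch below does only that arithmetic; the helper name `f1_score` is illustrative and no data from the study beyond the three reported numbers is used.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the XGBoost classifier.
reported = {"precision": 0.910, "recall": 0.902, "f1": 0.906}

computed_f1 = f1_score(reported["precision"], reported["recall"])
consistent = round(computed_f1, 3) == reported["f1"]
```

With these inputs the harmonic mean is about 0.90598, which rounds to the reported 0.906, so the three figures are mutually consistent.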
Affiliation(s)
- Mingyue Xue: Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China; College of Public Health, Xinjiang Medical University, Urumqi, China
- Yinxia Su: College of Public Health, Xinjiang Medical University, Urumqi, China
- Chen Li: The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Shuxia Wang: Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
- Hua Yao: Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
|
31
|
Khawaja A, Khan TM, Khan MAU, Nawaz SJ. A Multi-Scale Directional Line Detector for Retinal Vessel Segmentation. SENSORS (BASEL, SWITZERLAND) 2019; 19:E4949. [PMID: 31766276 PMCID: PMC6891360 DOI: 10.3390/s19224949] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/09/2019] [Revised: 11/02/2019] [Accepted: 11/08/2019] [Indexed: 11/16/2022]
Abstract
The assessment of transformations in the retinal vascular structure has a strong potential in indicating a wide range of underlying ocular pathologies. Correctly identifying the retinal vessel map is a crucial step in disease identification, severity progression assessment, and appropriate treatment. Marking the vessels manually by a human expert is a tedious and time-consuming task, thereby reinforcing the need for automated algorithms capable of quick segmentation of retinal features and any possible anomalies. Techniques based on unsupervised learning methods utilize vessel morphology to classify vessel pixels. This study proposes a directional multi-scale line detector technique for the segmentation of retinal vessels, with the prime focus on the tiny vessels that are most difficult to segment out. Constructing a directional line detector, and using it on images having only the features oriented along the detector's direction, significantly improves the detection accuracy of the algorithm. The finishing step, a binarization operation that is again directional in nature, helps achieve further performance improvements in terms of key performance indicators. The proposed method is observed to obtain a sensitivity of 0.8043, 0.8011, and 0.7974 for the Digital Retinal Images for Vessel Extraction (DRIVE), STructured Analysis of the Retina (STARE), and Child Heart And health Study in England (CHASE_DB1) datasets, respectively. These results, along with other performance enhancements demonstrated by the conducted experimental evaluation, establish the validity and applicability of directional multi-scale line detectors as a competitive framework for retinal image segmentation.
Affiliation(s)
- Ahsan Khawaja: Department of Electrical and Computer Engineering, COMSATS University Islamabad (CUI), Islamabad 45550, Pakistan
- Tariq M. Khan: Department of Electrical and Computer Engineering, COMSATS University Islamabad (CUI), Islamabad 45550, Pakistan
- Syed Junaid Nawaz: Department of Electrical and Computer Engineering, COMSATS University Islamabad (CUI), Islamabad 45550, Pakistan
|
32
|
A fast and accurate approach for bankruptcy forecasting using squared logistics loss with GPU-based extreme gradient boosting. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.04.060] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 11/21/2022]
|
33
|
Spatial Prior Fuzziness Pool-Based Interactive Classification of Hyperspectral Images. REMOTE SENSING 2019. [DOI: 10.3390/rs11091136] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 11/16/2022]
Abstract
Acquisition of labeled data for supervised Hyperspectral Image (HSI) classification is expensive in terms of both time and cost. Moreover, manual selection and labeling are often subjective and tend to induce redundancy into the classifier. Active learning (AL) can be a suitable approach for HSI classification, as it integrates data acquisition into the classifier design by ranking the unlabeled data to provide advice for the next query that has the highest training utility. However, multiclass AL techniques tend to include redundant samples into the classifier to some extent. This paper addresses this problem by introducing an AL pipeline which preserves the most representative and spatially heterogeneous samples. The adopted strategy for sample selection utilizes fuzziness to assess the mapping between actual output and the approximated a-posteriori probabilities, computed by a marginal probability distribution based on discriminative random fields. The samples selected in each iteration are then provided to the spectral angle mapper-based objective function to reduce the inter-class redundancy. Experiments on five HSI benchmark datasets confirmed that the proposed Fuzziness and Spectral Angle Mapper (FSAM)-AL pipeline presents competitive results compared to the state-of-the-art sample selection techniques, leading to lower computational requirements.
|
34
|
Deep Learning in the Biomedical Applications: Recent and Future Status. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9081526] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 02/06/2023]
Abstract
Deep neural networks represent, nowadays, the most effective machine learning technology in biomedical domain. In this domain, the different areas of interest concern the Omics (study of the genome—genomics—and proteins—transcriptomics, proteomics, and metabolomics), bioimaging (study of biological cell and tissue), medical imaging (study of the human organs by creating visual representations), BBMI (study of the brain and body machine interface) and public and medical health management (PmHM). This paper reviews the major deep learning concepts pertinent to such biomedical applications. Concise overviews are provided for the Omics and the BBMI. We end our analysis with a critical discussion, interpretation and relevant open challenges.
|