1
Naderi Yaghouti AR, Zamanian H, Shalbaf A. Machine learning approaches for early detection of non-alcoholic steatohepatitis based on clinical and blood parameters. Sci Rep 2024; 14:2442. [PMID: 38287043 PMCID: PMC10824722 DOI: 10.1038/s41598-024-51741-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 01/09/2024] [Indexed: 01/31/2024]
Abstract
This study aims to develop a machine learning approach that uses clinical data and blood parameters to predict non-alcoholic steatohepatitis (NASH) based on the NAFLD Activity Score (NAS). Using a dataset of 181 patients, we performed preprocessing including normalization and categorical encoding. To identify predictive features, we applied sequential forward selection (SFS), chi-square, analysis of variance (ANOVA), and mutual information (MI). The selected features were used to train machine learning classifiers including SVM, random forest, AdaBoost, LightGBM, and XGBoost. Hyperparameter tuning was done for each classifier using randomized search. Model evaluation was performed using leave-one-out cross-validation over 100 repetitions. Among the classifiers, random forest, combined with SFS feature selection and 10 features, obtained the best performance: accuracy 81.32% ± 6.43%, sensitivity 86.04% ± 6.21%, specificity 70.49% ± 8.12%, precision 81.59% ± 6.23%, and F1-score 83.75% ± 6.23%. Our findings highlight the promise of machine learning in enhancing early, non-invasive diagnosis of NASH from readily available clinical and blood data, and offer a compelling alternative to conventional diagnostic techniques. They also provide the basis for developing scalable approaches that can improve screening and monitoring of NASH progression.
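The sequential forward selection step named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the feature names, the weights, and `toy_score` are hypothetical stand-ins for a real scoring function such as cross-validated classifier accuracy on a candidate feature subset.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    k: int,
) -> List[str]:
    """Greedily add the feature that most improves `score` until k are chosen."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Evaluate every candidate subset formed by adding one more feature.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature usefulness; in practice the score would come from
# training and validating a classifier on the candidate subset.
weights = {"alt": 0.5, "ast": 0.45, "bmi": 0.3, "age": 0.1}


def toy_score(subset: List[str]) -> float:
    """Additive toy score with a penalty when two redundant features co-occur."""
    total = sum(weights[f] for f in subset)
    if "alt" in subset and "ast" in subset:
        total -= 0.4  # assumed redundancy penalty, for illustration only
    return total


chosen = sequential_forward_selection(list(weights), toy_score, k=2)
print(chosen)
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search, at the cost of possibly missing the best combination.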
Affiliation(s)
- Amir Reza Naderi Yaghouti
- Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Hamed Zamanian
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ahmad Shalbaf
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2
Predicting survival after radiosurgery in patients with lung cancer brain metastases using deep learning of radiomics and EGFR status. Phys Eng Sci Med 2023; 46:585-596. [PMID: 36857023 DOI: 10.1007/s13246-023-01234-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/11/2022] [Accepted: 02/13/2023] [Indexed: 03/02/2023]
Abstract
The early prediction of overall survival (OS) in patients with lung cancer brain metastases (BMs) after Gamma Knife radiosurgery (GKRS) can facilitate patient management and outcome improvement. However, the disease progression is influenced by multiple factors, such as patient characteristics and treatment strategies, and hence satisfactory performance of OS prediction remains challenging. Accordingly, we proposed a deep learning approach based on comprehensive predictors, including clinical, imaging, and genetic information, to accomplish reliable and personalized OS prediction in patients with BMs after receiving GKRS. Overall 1793 radiomic features extracted from pre-GKRS magnetic resonance images (MRI), clinical information, and epidermal growth factor receptor (EGFR) mutation status were retrospectively collected from 237 BM patients who underwent GKRS. DeepSurv, a multi-layer perceptron model, with 4 different aggregation methods of radiomics was applied to predict personalized survival curves and survival status at 3, 6, 12, and 24 months. The model combining clinical features, EGFR status, and radiomics from the largest BM showed the best prediction performance with concordance index of 0.75 and achieved areas under the curve of 0.82, 0.80, 0.84, and 0.92 for predicting survival status at 3, 6, 12, and 24 months, respectively. The DeepSurv model showed a significant improvement (p < 0.001) in concordance index compared to the validated lung cancer BM prognostic molecular markers. Furthermore, the model provided a novel estimate of the risk-of-death period for patients. The personalized survival curves generated by the DeepSurv model effectively predicted the risk-of-death period which could facilitate personalized management of patients with lung cancer BMs.
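The concordance index reported in this abstract can be illustrated with a short sketch of Harrell's C. This is a simplified version under stated assumptions: tied event times are ignored, a higher risk score is taken to mean shorter expected survival, and the sample data are invented for illustration.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs whose predicted risks are correctly ordered.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's last follow-up time.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: months of follow-up, event flags (1 = death observed),
# and model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported 0.75.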
3
Lash MT, Sajeesh S, Araz OM. Predicting mobility using limited data during early stages of a pandemic. JOURNAL OF BUSINESS RESEARCH 2023; 157:113413. [PMID: 36628355 PMCID: PMC9815965 DOI: 10.1016/j.jbusres.2022.113413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 10/24/2022] [Accepted: 10/28/2022] [Indexed: 06/17/2023]
Abstract
The COVID-19 pandemic has changed consumer behavior substantially. In this study, we explore the drivers of consumer mobility in several metropolitan areas in the United States under the perceived risks of COVID-19. We capture multiple dimensions of perceived risk using local and national cases and death counts of COVID-19, along with real-time Google Trends data for personal protective equipment (PPE). While Google Trends data are popular inputs in many studies, the risk of multicollinearity escalates with the addition of more relevant terms. Therefore, multicollinearity-alleviating methods are needed to appropriately leverage information provided by Google Trends data. We develop and utilize a novel optimization scheme to induce linear models containing strictly significant covariates and minimal multicollinearity. We find that there are a variety of unique factors that drive mobility in different geographic locations, as well as several factors that are common to all locations.
Affiliation(s)
- Michael T Lash
- School of Business, University of Kansas, Lawrence, KS 66045, United States
- S Sajeesh
- College of Business, University of Nebraska - Lincoln, Lincoln, NE 68588, United States
- Ozgur M Araz
- College of Business, University of Nebraska - Lincoln, Lincoln, NE 68588, United States
4
Lu CF, Liao CY, Chao HS, Chiu HY, Wang TW, Lee Y, Chen JR, Shiao TH, Chen YM, Wu YT. A radiomics-based deep learning approach to predict progression free-survival after tyrosine kinase inhibitor therapy in non-small cell lung cancer. Cancer Imaging 2023; 23:9. [PMID: 36670497 PMCID: PMC9854198 DOI: 10.1186/s40644-023-00522-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 01/05/2023] [Indexed: 01/22/2023]
Abstract
BACKGROUND The epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs) are a first-line therapy for non-small cell lung cancer (NSCLC) with EGFR mutations. Approximately half of the patients with EGFR-mutated NSCLC are treated with EGFR-TKIs and develop disease progression within 1 year. Therefore, the early prediction of tumor progression in patients who receive EGFR-TKIs can facilitate patient management and development of treatment strategies. We proposed a deep learning approach based on both quantitative computed tomography (CT) characteristics and clinical data to predict progression-free survival (PFS) in patients with advanced NSCLC after EGFR-TKI treatment. METHODS A total of 593 radiomic features were extracted from pretreatment chest CT images. The DeepSurv models for the progression risk stratification of EGFR-TKI treatment were proposed based on CT radiomic and clinical features from 270 stage IIIB-IV EGFR-mutant NSCLC patients. Time-dependent PFS predictions at 3, 12, 18, and 24 months and estimated personalized PFS curves were calculated using the DeepSurv models. RESULTS The model combining clinical and radiomic features demonstrated better prediction performance than the clinical model. The model achieving areas under the curve of 0.76, 0.77, 0.76, and 0.86 can predict PFS at 3, 12, 18, and 24 months, respectively. The personalized PFS curves showed significant differences (p < 0.003) between groups with good (PFS > median) and poor (PFS < median) tumor control. CONCLUSIONS The DeepSurv models provided reliable multi-time-point PFS predictions for EGFR-TKI treatment. The personalized PFS curves can help make accurate and individualized predictions of tumor progression. The proposed deep learning approach holds promise for improving the pre-TKI personalized management of patients with EGFR-mutated NSCLC.
Affiliation(s)
- Chia-Feng Lu
- Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chien-Yi Liao
- Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Heng-Sheng Chao
- Department of Chest Medicine, Taipei Veteran General Hospital, Taipei, Taiwan
- Hwa-Yen Chiu
- Department of Chest Medicine, Taipei Veteran General Hospital, Taipei, Taiwan; Institute of Biophotonics, National Yang Ming Chiao Tung University, Taipei, Taiwan; School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ting-Wei Wang
- Institute of Biophotonics, National Yang Ming Chiao Tung University, Taipei, Taiwan; School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yen Lee
- Institute of Biophotonics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Jyun-Ru Chen
- Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Tsu-Hui Shiao
- Department of Chest Medicine, Taipei Veteran General Hospital, Taipei, Taiwan
- Yuh-Min Chen
- Department of Chest Medicine, Taipei Veteran General Hospital, Taipei, Taiwan; School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yu-Te Wu
- Institute of Biophotonics, National Yang Ming Chiao Tung University, Taipei, Taiwan; Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
5
Zhao H, Xu Y, Wu Y, Ma Z, Ding Z, Sun Y. Modeling of the Rating of Perceived Exertion Based on Heart Rate Using Machine Learning Methods. AN ACAD BRAS CIENC 2023; 95:e20201723. [PMID: 37018836 DOI: 10.1590/0001-3765202320201723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2020] [Accepted: 04/08/2021] [Indexed: 04/07/2023]
Abstract
Rating of perceived exertion (RPE) can serve as a more convenient and economical alternative to heart rate (HR) for exercise intensity control. This study aims to explore how demographic, anthropometric, body composition, cardiovascular function, and basic exercise ability indicators influence the relationship between HR and RPE, and to develop a model predicting RPE from HR. Forty-eight healthy participants were recruited to perform an incremental 6-stage pedaling test. HR and RPE were collected during each stage. The influencing factors were identified with the forward selection method and used to train Gaussian process regression (GPR), support vector machine (SVM), and linear regression models. R2, adjusted R2, and RMSE were calculated to evaluate the performance of the models. The GPR model outperformed the SVM and linear regression models, achieving an R2 of 0.95, an adjusted R2 of 0.89, and an RMSE of 0.52. Age, resting heart rate (RHR), central arterial pressure (CAP), body fat rate (BFR), and body mass index (BMI) were identified as the factors that best predicted the relationship between RPE and HR. A GPR model can estimate RPE from HR accurately after adjusting for age, RHR, CAP, BFR, and BMI.
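The three evaluation metrics named in this abstract follow standard definitions, sketched below. The function name and the sample values are illustrative assumptions and are not taken from the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    n_predictors: int,
) -> Dict[str, float]:
    """Return R2, adjusted R2, and RMSE for a set of predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalizes the number of predictors relative to sample size.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    return {"r2": r2, "adjusted_r2": adj_r2, "rmse": rmse}


# Invented RPE-like values for illustration only.
metrics = regression_metrics([1, 2, 3, 4, 5], [1, 2, 3, 4, 6], n_predictors=1)
print(metrics)
```

Adjusted R2 is always at most R2 when at least one predictor is used, which is consistent with the gap between the reported 0.95 and 0.89.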
Affiliation(s)
- Huanhuan Zhao
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, 350 Shushan Lake Road, Hefei, 230031, China
- University of Science and Technology of China, 96 Jinzhai Road, Hefei, 230026, China
- School of Computer and Information Engineering, Chuzhou University, 1 Huifeng West Road, Chuzhou, 239000, China
- Yang Xu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, 350 Shushan Lake Road, Hefei, 230031, China
- Yichen Wu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, 350 Shushan Lake Road, Hefei, 230031, China
- University of Science and Technology of China, 96 Jinzhai Road, Hefei, 230026, China
- School of Electronic and Information Engineering, Anhui Jianzhu University, 292 Ziyun Road, Hefei, 230601, China
- Zuchang Ma
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, 350 Shushan Lake Road, Hefei, 230031, China
- Zenghui Ding
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, 350 Shushan Lake Road, Hefei, 230031, China
- Yining Sun
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, 350 Shushan Lake Road, Hefei, 230031, China
6
Kasim S, Malek S, Song C, Wan Ahmad WA, Fong A, Ibrahim KS, Safiruz MS, Aziz F, Hiew JH, Ibrahim N. In-hospital mortality risk stratification of Asian ACS patients with artificial intelligence algorithm. PLoS One 2022; 17:e0278944. [PMID: 36508425 PMCID: PMC9744311 DOI: 10.1371/journal.pone.0278944] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Accepted: 11/25/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND Conventional risk scores for predicting in-hospital mortality following acute coronary syndrome (ACS) are not tailored to Asian patients and require different scoring algorithms for STEMI and NSTEMI patients. OBJECTIVE To derive a single algorithm using deep learning and machine learning for the prediction and identification of factors associated with in-hospital mortality in Asian patients with ACS and to compare its performance to a conventional risk score. METHODS The Malaysian National Cardiovascular Disease Database (NCVD) registry is a multi-ethnic, heterogeneous database spanning 2006-2017. It was used for in-hospital mortality model development, with 54 variables considered for patients with STEMI and non-STEMI (NSTEMI). Mortality prediction was analyzed using feature selection methods with machine learning algorithms. A deep learning algorithm using features selected by machine learning was compared to the Thrombolysis in Myocardial Infarction (TIMI) score. RESULTS A total of 68528 patients were included in the analysis. Deep learning models constructed using all features and using features selected by machine learning performed better than machine learning models and the TIMI risk score (p < 0.0001 for all). The best model in this study combines features selected by the SVM algorithm with a deep learning classifier. The DL (SVM selected var) algorithm demonstrated the highest predictive performance with the fewest predictors (14 predictors) for in-hospital prediction of STEMI patients (AUC = 0.96, 95% CI: 0.95-0.96). In NSTEMI in-hospital prediction, DL (RF selected var) (AUC = 0.96, 95% CI: 0.95-0.96) reported a slightly higher AUC than DL (SVM selected var) (AUC = 0.95, 95% CI: 0.94-0.95). There was no significant difference between the DL (SVM selected var) algorithm and the DL (RF selected var) algorithm (p = 0.5).
When compared to the DL (SVM selected var) model, the TIMI score underestimates patients' risk of mortality. The TIMI risk score correctly identified 13.08% of high-risk non-survivors versus 24.7% for the DL model, and 4.65% versus 19.7% of high-risk non-survivors for NSTEMI. Age, heart rate, Killip class, cardiac catheterization, oral hypoglycemic agent use, and antiarrhythmic agent use were found to be common predictors of in-hospital mortality across all ML feature selection models in this study. The final algorithm was converted into an online tool with a database for continuous data archiving for prospective validation. CONCLUSIONS ACS patients were better classified using a combination of machine learning and deep learning in a multi-ethnic Asian population when compared to TIMI scoring. Machine learning enables the identification of distinct factors in individual Asian populations to improve mortality prediction. Continuous testing and validation will allow for better risk stratification in the future, potentially altering management and outcomes.
Affiliation(s)
- Sazzli Kasim
- Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
- Faculty of Medicine, Universiti Teknologi MARA (UiTM), Sungai Buloh Campus, Sungai Buloh, Malaysia
- Sorayya Malek
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Cheen Song
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Wan Azman Wan Ahmad
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
- Division of Cardiology, University Malaya Medical Centre, Kuala Lumpur, Malaysia
- Alan Fong
- Sarawak Heart Centre, Kota Samarahan, Sarawak, Malaysia
- Clinical Research Centre, Sarawak General Hospital, Institute for Clinical Research, National Institutes of Health, Jalan Hospital, Kuching, Sarawak, Malaysia
- Swinburne University of Technology, Sarawak Campus, Kuching, Malaysia
- Khairul Shafiq Ibrahim
- Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
- Muhammad Shahreeza Safiruz
- Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
- Firdaus Aziz
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Jia Hui Hiew
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Nurulain Ibrahim
- Faculty of Medicine, Universiti Teknologi MARA (UiTM), Sungai Buloh Campus, Sungai Buloh, Malaysia
7
Kasim S, Malek S, Cheen S, Safiruz MS, Ahmad WAW, Ibrahim KS, Aziz F, Negishi K, Ibrahim N. In-hospital risk stratification algorithm of Asian elderly patients. Sci Rep 2022; 12:17592. [PMID: 36266376 PMCID: PMC9584943 DOI: 10.1038/s41598-022-18839-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 08/22/2022] [Indexed: 01/13/2023]
Abstract
Limited research has been conducted in Asian elderly patients (aged 65 years and above) on in-hospital mortality prediction after an ST-segment elevation myocardial infarction (STEMI) using Deep Learning (DL) and Machine Learning (ML). We used DL and ML to predict in-hospital mortality in Asian elderly STEMI patients and compared the results to a conventional risk score for myocardial infarction outcomes. Malaysia's National Cardiovascular Disease Registry comprises an ethnically diverse Asian elderly population (3991 patients). Fifty variables were considered in establishing the in-hospital death prediction model. The TIMI score was used to predict mortality using DL and feature selection methods from ML algorithms. The main performance metric was the area under the receiver operating characteristic curve (AUC). The DL and ML models constructed using ML feature selection outperformed the conventional TIMI risk score (AUC 0.75). DL built from ML features (AUC ranging from 0.93 to 0.95) outscored DL built from all features (AUC 0.93). The TIMI score underestimates mortality in the elderly. TIMI predicts 18.4% higher mortality than the DL algorithm (44.7%). All ML feature selection algorithms identify age, fasting blood glucose, heart rate, Killip class, oral hypoglycemic agent, systolic blood pressure, and total cholesterol as common predictors of mortality in the elderly. In a multi-ethnic population, DL outperformed the TIMI risk score in classifying elderly STEMI patients. ML improves death prediction by identifying separate characteristics in older Asian populations. Continuous testing and validation will improve future risk classification, management, and results.
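The AUC used as the main metric in this abstract can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that pairwise definition; the labels and scores are invented for illustration and do not come from the registry.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Invented example: 1 = in-hospital death, scores = predicted risk.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(labels, scores), 4))
```

The quadratic loop is fine for small examples; a rank-based formula gives the same value more efficiently on large cohorts.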
Affiliation(s)
- Sazzli Kasim
- Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia; Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia; National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia; Faculty of Medicine, Universiti Teknologi MARA (UiTM), Sungai Buloh Campus, Sungai Buloh, Malaysia
- Sorayya Malek
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Song Cheen
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Muhammad Shahreeza Safiruz
- Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
- Wan Azman Wan Ahmad
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia; Division of Cardiology, University Malaya Medical Centre, Kuala Lumpur, Malaysia
- Khairul Shafiq Ibrahim
- Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia; Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia; National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
- Firdaus Aziz
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Kazuaki Negishi
- Sydney Medical School Nepean, Faculty of Medicine and Health, Charles Perkins Centre Nepean, The University of Sydney, Sydney, NSW, Australia; Nepean Hospital, Sydney, NSW, Australia
- Nurulain Ibrahim
- Faculty of Medicine, Universiti Teknologi MARA (UiTM), Sungai Buloh Campus, Sungai Buloh, Malaysia
8
Multichannel Acoustic Spectroscopy of the Human Body for Inviolable Biometric Authentication. BIOSENSORS 2022; 12:bios12090700. [PMID: 36140085 PMCID: PMC9496529 DOI: 10.3390/bios12090700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/26/2022] [Accepted: 08/29/2022] [Indexed: 11/17/2022]
Abstract
Specific features of the human body, such as fingerprint, iris, and face, are extensively used in biometric authentication. Conversely, the internal structure and material features of the body have not been explored extensively in biometrics. Bioacoustics technology is suitable for extracting information about the internal structure and biological and material characteristics of the human body. Herein, we report a biometric authentication method that enables multichannel bioacoustic signal acquisition with a systematic approach to study the effects of selectively distilled frequency features, increasing the number of sensing channels with respect to multiple fingers. The accuracy of identity recognition according to the number of sensing channels and the number of selectively chosen frequency features was evaluated using exhaustive combination searches and forward-feature selection. The technique was applied to test the accuracy of machine learning classification using 5,232 datasets from 54 subjects. By optimizing the scanning frequency and sensing channels, our method achieved an accuracy of 99.62%, which is comparable to existing biometric methods. Overall, the proposed biometric method not only provides an unbreakable, inviolable biometric but also can be applied anywhere in the body and can substantially broaden the use of biometrics by enabling continuous identity recognition on various body parts for biometric identity authentication.
9
Hybrid Machine Learning Approach for Gully Erosion Mapping Susceptibility at a Watershed Scale. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2022. [DOI: 10.3390/ijgi11070401] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Gully erosion is a serious threat to the state of ecosystems all around the world. Safeguarding the soil for our own benefit and from our own actions is therefore a must for guaranteeing the long-term viability of a variety of ecosystem services, and developing gully erosion susceptibility maps (GESM) is both suggested and necessary. In this study, we compared the effectiveness of three hybrid machine learning (ML) algorithms that incorporate the bivariate statistical index frequency ratio (FR), namely random forest-frequency ratio (RF-FR), support vector machine-frequency ratio (SVM-FR), and naïve Bayes-frequency ratio (NB-FR), in mapping gully erosion in the GHISS watershed in the northern part of Morocco. The models were implemented based on the inventory mapping of a total of 178 gully erosion points randomly divided into 2 groups (70% of points were used for training the models and 30% for the validation process), and 12 conditioning variables (i.e., elevation, slope, aspect, plane curvature, topographic wetness index (TWI), stream power index (SPI), precipitation, distance to road, distance to stream, drainage density, land use, and lithology). Using the equal interval reclassification method, the spatial distribution of gully erosion was categorized into five classes: very high, high, moderate, low, and very low. Our results showed that the very high susceptibility classes derived using the RF-FR, SVM-FR, and NB-FR models covered 25.98%, 22.62%, and 27.10% of the total area, respectively. The area under the receiver operating characteristic curve (AUC), precision, and accuracy were employed to evaluate the performance of these models. Based on the receiver operating characteristic (ROC), the results showed that RF-FR achieved the best performance (AUC = 0.91), followed by SVM-FR (AUC = 0.87) and then NB-FR (AUC = 0.82).
Our contribution, in line with the Sustainable Development Goals (SDGs), plays a crucial role in understanding and identifying "where and why" gully erosion occurs, and hence it can serve as a first pathway to reducing gully erosion in this particular area.
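The frequency ratio (FR) index mentioned in this abstract is commonly defined, for each class of a conditioning variable, as the share of gully points falling in the class divided by the share of the study area the class occupies. The sketch below applies that common definition; the class names and counts are invented, and the study's exact FR formulation is an assumption.

```python
from typing import Dict, Tuple


def frequency_ratio(classes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute FR per class from (gully_points, area_pixels) counts.

    FR > 1 suggests the class is over-represented among gully locations;
    FR < 1 suggests it is under-represented.
    """
    total_points = sum(points for points, _ in classes.values())
    total_area = sum(area for _, area in classes.values())
    ratios = {}
    for name, (points, area) in classes.items():
        point_share = points / total_points
        area_share = area / total_area
        ratios[name] = point_share / area_share
    return ratios


# Hypothetical slope classes: (number of gully points, number of raster pixels).
slope_classes = {"steep": (30, 200), "flat": (10, 800)}
print(frequency_ratio(slope_classes))
```

In a hybrid workflow of the kind described, such per-class ratios would typically replace the raw class labels as numeric inputs to the classifier.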
10
Kleinerman A, Rosenfeld A, Rosemarin H. Machine-learning based routing of callers in an Israeli mental health hotline. Isr J Health Policy Res 2022; 11:25. [PMID: 35659290 PMCID: PMC9164346 DOI: 10.1186/s13584-022-00534-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Accepted: 05/19/2022] [Indexed: 11/15/2022]
Abstract
Background Mental health contact centers (also known as hotlines) offer crisis intervention and counselling by phone calls and online chats. These mental health helplines have shown great success in improving the mental state of callers, and are becoming increasingly popular in Israel and worldwide. Unfortunately, our knowledge about how to route callers to counselling agents successfully has been limited by the lack of large-scale data with labeled outcomes of the interactions. To date, many of these contact centers are overwhelmed by chat requests and operate under a simple first-come-first-serve (FCFS) scheduling policy; combined, these factors may lead to many callers receiving suboptimal counselling or abandoning the service before being treated. In this work our goal is to improve the efficiency of mental health contact centers by using a novel machine-learning based routing policy. Methods We present a large-scale machine learning-based analysis of real-world data from the online contact center of ERAN, the Israeli Association for Emotional First Aid. The data include over 35,000 conversations over a 2-year period. Based on this analysis, we present a novel call routing method that integrates advanced AI techniques, including the Monte Carlo tree search algorithm. We conducted an experiment that included various realistic simulations of incoming calls to contact centers, based on data from ERAN. We divided the simulations into two common settings: standard call flow and heavy call flow. To establish a baseline, we compared our proposed solution to two methods: (1) the FCFS method; and (2) a greedy solution based on machine learning predictions. Our comparison focuses on two metrics: the number of calls served and the average feedback of the callers (i.e., quality of the chats).
Results In the preliminary analysis, we identify indicative features that significantly contribute to the effectiveness of a conversation and demonstrate high accuracy in predicting the expected duration and the callers’ feedback. In the routing methods evaluation, we find that in heavy call flow settings, our proposed method significantly outperforms the other methods in both the quantity of served calls and average feedback. Most notably, we find that in the heavy call flow settings, our method improves the average feedback by 24% compared to FCFS and by 4% compared to the greedy solution. In the standard call flow setting, we find that our proposed method significantly outperforms the FCFS method in the callers’ average feedback, with a 12% improvement. However, in this setting, we did not find a significant difference among the methods in the quantity of served calls, and no significant difference was found between our proposed method and the greedy solution. Conclusion The proposed routing policy has the potential to significantly improve the performance of mental health contact centers, especially in peak hours. Leveraging artificial intelligence techniques, such as machine learning algorithms, combined with real-world data can bring about a significant and necessary leap forward in the way mental health hotlines operate and consequently reduce the burden of mental illnesses on health systems. However, implementation and evaluation in an operational contact center is necessary in order to verify that the results replicate in practice.
|
11
|
Forensic Analysis on Internet of Things (IoT) Device Using Machine-to-Machine (M2M) Framework. ELECTRONICS 2022. [DOI: 10.3390/electronics11071126] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The versatility of IoT devices increases the probability of continuous attacks on them. The low processing power and low memory of IoT devices have made it difficult for security analysts to keep records of various attacks performed on these devices during forensic analysis. The forensic analysis estimates how much damage has been done to the devices due to various attacks. In this paper, we have proposed an intelligent forensic analysis mechanism that automatically detects the attack performed on IoT devices using a machine-to-machine (M2M) framework. Further, the M2M framework has been developed using different forensic analysis tools and machine learning to detect the type of attacks. Additionally, the problem of an evidence acquisition (attack on IoT devices) has been resolved by introducing a third-party logging server. Forensic analysis is also performed on logs using forensic server (security onion) to determine the effect and nature of the attacks. The proposed framework incorporates different machine learning (ML) algorithms for the automatic detection of attacks. The performance of these models is measured in terms of accuracy, precision, recall, and F1 score. The results indicate that the decision tree algorithm shows the optimum performance as compared to the other algorithms. Moreover, comprehensive performance analysis and results presented validate the proposed model.
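The four reported metrics can be sketched as follows. This is a minimal illustration, assuming a binary setting with one positive class; the function name `classification_metrics`, the label strings, and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive="attack"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only (not data from the paper).
y_true = ["attack", "attack", "normal", "normal", "attack", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "attack", "normal"]
metrics = classification_metrics(y_true, y_pred)
```

For a multi-class attack detector, the same counts would typically be computed per class and then averaged; which averaging the authors used is not stated in the abstract.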
|
12
|
Kaur K, Singh P. Impact of Feature Extraction and Feature Selection Algorithms on Punjabi Speech Emotion Recognition Using Convolutional Neural Network. ACM T ASIAN LOW-RESO 2022. [DOI: 10.1145/3511888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Speech emotion recognition has become a major area of research, driven by the challenge of making human-machine interaction more natural and productive. The reliability and performance of such emotion recognition depend largely on the feature extraction and selection processes. The feature extraction phase plays an important role in exploring and distinguishing audio content. The extracted features should also be robust to a number of disturbances and reliable enough for an adequate classification system. This paper focuses on three main components of a Speech Emotion Recognition (SER) process. The first is the optimal feature extraction method for a Punjabi SER system. The second is the use of an appropriate feature selection method that aims to select effective features from those extracted in the first step and to remove redundant features, in order to improve emotion recognition performance. The third is the classification model that is then used for emotion recognition. The scope of this paper is therefore to explain the three main steps of a Punjabi SER system: feature extraction, feature selection, and emotion recognition with a classifier. The results have been calculated and compared for a number of feature set combinations, with and without the feature selection process. A total of 10 experiments are carried out, and various performance metrics such as precision, recall, F1-score, and accuracy are used to demonstrate the results.
Affiliation(s)
- Kamaldeep Kaur
- Research Scholar, IKG Punjab Technical University, Punjab, India and Department of Computer Science & Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
| | - Parminder Singh
- Department of Computer Science & Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
| |
|
13
|
Gong L, Xu M, Fang M, He B, Li H, Fang X, Dong D, Tian J. The potential of prostate gland radiomic features in identifying the gleason score. Comput Biol Med 2022; 144:105318. [DOI: 10.1016/j.compbiomed.2022.105318] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 02/08/2022] [Accepted: 02/09/2022] [Indexed: 12/17/2022]
|
14
|
Alp N, Ozkan H. Neural correlates of integration processes during dynamic face perception. Sci Rep 2022; 12:118. [PMID: 34996892 PMCID: PMC8742062 DOI: 10.1038/s41598-021-02808-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Accepted: 11/22/2021] [Indexed: 11/10/2022] Open
Abstract
Integrating the spatiotemporal information acquired from the highly dynamic world around us is essential to navigate, reason, and decide properly. Although this is particularly important in a face-to-face conversation, very little research to date has specifically examined the neural correlates of temporal integration in dynamic face perception. Here we present statistically robust observations regarding the brain activations measured via electroencephalography (EEG) that are specific to the temporal integration. To that end, we generate videos of neutral faces of individuals and non-face objects, modulate the contrast of the even and odd frames at two specific frequencies ($f_1$ and $f_2$) in an interlaced manner, and measure the steady-state visual evoked potential as participants view the videos. Then, we analyze the intermodulation components (IMs: ($nf_1 \pm mf_2$), a linear combination of the fundamentals with integer multipliers) that consequently reflect the nonlinear processing and indicate temporal integration by design. We show that electrodes around the medial temporal, inferior, and medial frontal areas respond strongly and selectively when viewing dynamic faces, which manifests the essential processes underlying our ability to perceive and understand our social world. The generation of IMs is only possible if even and odd frames are processed in succession and integrated temporally, therefore, the strong IMs in our frequency spectrum analysis show that the time between frames (1/60 s) is sufficient for temporal integration.
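To make the intermodulation terms concrete, the sketch below enumerates the frequencies $nf_1 \pm mf_2$ for small integer multipliers. The function name `intermodulation_components`, the `max_order` cut-off, and the example values of $f_1$ and $f_2$ are assumptions chosen for illustration; the abstract does not state the tagging frequencies used in the study.

```python
def intermodulation_components(f1, f2, max_order=2):
    """Map labels such as '2f1-1f2' to |n*f1 +/- m*f2| for n, m in 1..max_order.

    The absolute value is taken because a spectrum shows a negative
    combination at the corresponding positive frequency.
    """
    components = {}
    for n in range(1, max_order + 1):
        for m in range(1, max_order + 1):
            for sign, symbol in ((1, "+"), (-1, "-")):
                freq = abs(n * f1 + sign * m * f2)
                if freq > 0:  # skip combinations that cancel to 0 Hz
                    components[f"{n}f1{symbol}{m}f2"] = freq
    return components


# Hypothetical tagging frequencies in Hz (not the values used in the study).
ims = intermodulation_components(f1=6.0, f2=7.5, max_order=2)
```

With these assumed inputs, the difference term `1f1-1f2` falls at 1.5 Hz and the sum term `1f1+1f2` at 13.5 Hz; neither is a harmonic of 6 Hz or 7.5 Hz alone, which is why power at such frequencies is read as evidence of nonlinear combination of the two inputs.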
Affiliation(s)
- Nihan Alp
- Psychology, Sabanci University, Istanbul, Turkey.
| | - Huseyin Ozkan
- Electronics Engineering, Sabanci University, Istanbul, Turkey
| |
|
15
|
Abstract
We apply the Support Vector Regression (SVR) machine learning model to estimate surface roughness on a large alluvial fan of the Kosi River in the Himalayan Foreland from satellite images. To train the model, we used input features such as radar backscatter values in Vertical–Vertical (VV) and Vertical–Horizontal (VH) polarisation, incidence angle from Sentinel-1, Normalised Difference Vegetation Index (NDVI) from Sentinel-2, and surface elevation from the Shuttle Radar Topographic Mission (SRTM). We generated additional features (VH/VV and VH–VV) through a linear data fusion of the existing features. For the training and validation of our model, we conducted a field campaign during 11–20 December 2019. We measured surface roughness at 78 different locations over the entire fan surface using an in-house-developed mechanical pin-profiler. We used the regression tree ensemble approach to assess the relative importance of individual input features in predicting surface soil roughness with the SVR model. We eliminated the irrelevant input features using an iterative backward elimination approach. We then performed a feature sensitivity analysis to evaluate the riskiness of the selected features. Finally, we applied dimension reduction and scaling to minimise data redundancy and bring the features to a similar level. Based on these, we proposed five SVR methods (PCA-NS-SVR, PCA-CM-SVR, PCA-ZM-SVR, PCA-MM-SVR, and PCA-S-SVR). We trained and evaluated the performance of all variants of SVR with a 60:40 ratio using the input features and the in-situ surface roughness. We compared the performance of the SVR models with six different benchmark machine learning models (i.e., Gaussian Process Regression (GPR), Generalised Regression Neural Network (GRNN), Binary Decision Tree (BDT), Bagging Ensemble Learning, Boosting Ensemble Learning, and Automated Machine Learning (AutoML)).
We observed that PCA-MM-SVR performs best, with a coefficient of correlation (R = 0.74), Root Mean Square Error (RMSE = 0.16 cm), and Mean Square Error (MSE = 0.025 cm²). To ensure a fair selection of the machine learning model, we evaluated Akaike’s Information Criterion (AIC), corrected AIC (AICc), and Bayesian Information Criterion (BIC). We observed that SVR exhibits the lowest values of AIC, AICc, and BIC among all the methods; this indicates the best goodness-of-fit. Finally, we also compared the result of PCA-MM-SVR with the surface roughness estimated from different empirical and semi-empirical radar backscatter models. The accuracy of the PCA-MM-SVR model is better than that of the backscatter models. This study provides a robust approach to measure surface roughness at high spatial and temporal resolutions solely from satellite data.
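The error measures and information criteria named above can be sketched as below. This is only an illustration under a stated assumption: it uses the common least-squares forms AIC = n·ln(RSS/n) + 2k, AICc = AIC + 2k(k+1)/(n−k−1), and BIC = n·ln(RSS/n) + k·ln(n), which hold for Gaussian errors up to an additive constant. The abstract does not say which exact formulation the authors used, and the function name `fit_criteria` and the sample values are hypothetical.

```python
import math


def fit_criteria(observed, predicted, n_params):
    """Return MSE, RMSE, AIC, AICc and BIC for a fitted regression model.

    Assumes least-squares forms of the criteria; requires a non-zero
    residual sum of squares and more samples than n_params + 1.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mse = rss / n
    aic = n * math.log(mse) + 2 * n_params
    aicc = aic + (2 * n_params * (n_params + 1)) / (n - n_params - 1)
    bic = n * math.log(mse) + n_params * math.log(n)
    return {"mse": mse, "rmse": math.sqrt(mse),
            "aic": aic, "aicc": aicc, "bic": bic}


# Hypothetical roughness values in cm (not measurements from the study).
observed = [1.0, 1.2, 0.8, 1.5, 1.1, 0.9]
predicted = [1.1, 1.0, 0.9, 1.3, 1.2, 1.0]
criteria = fit_criteria(observed, predicted, n_params=2)
```

Lower values of each criterion indicate a better trade-off between fit and complexity, which is how the abstract ranks the models. As a quick consistency check on the reported figures, 0.16² ≈ 0.0256, which agrees with the reported MSE of 0.025 cm² up to rounding.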
|
16
|
Enhancement of Radiosurgical Treatment Outcome Prediction Using MRI Radiomics in Patients with Non-Small Cell Lung Cancer Brain Metastases. Cancers (Basel) 2021; 13:cancers13164030. [PMID: 34439186 PMCID: PMC8392266 DOI: 10.3390/cancers13164030] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2021] [Revised: 07/01/2021] [Accepted: 08/09/2021] [Indexed: 12/24/2022] Open
Abstract
Simple Summary Non-small cell lung cancer (NSCLC) is the most common cause of brain metastasis (BM). Approximately 50% of patients with metastatic NSCLC harbor BMs. Within the past decade, Gamma Knife radiosurgery (GKRS) has become one of the first-line treatments for BMs. The ability to predict treatment response after GKRS can therefore guide treatment strategy. This study aimed to determine whether pre-radiosurgical neuroimaging radiomics can predict survival and local tumor control after GKRS. Based on the collected magnetic resonance images and clinical characteristics of 237 NSCLC patients with BMs (for survival prediction) and 256 NSCLC patients with 976 BMs (for prediction of local tumor control), we concluded that the identified radiomic features could provide valuable additional information to enhance the prediction of BM responses after GKRS. The proposed approach provided physicians with an intuitive way to predict the patient outcome based on pre-radiosurgical magnetic resonance images. Abstract Brain metastasis (BM) is commonly diagnosed in non-small cell lung cancer (NSCLC) and is associated with poor outcomes. Accordingly, developing an approach for early prediction of BM response to Gamma Knife radiosurgery (GKRS) may benefit patient treatment and monitoring. A total of 237 NSCLC patients with BMs (for survival prediction) and 256 patients with 976 BMs (for prediction of local tumor control) treated with GKRS were retrospectively analyzed. All the survival data were recorded without censoring, and the status of local tumor control was determined by comparing the last MRI follow-up in patients’ lives with the pre-GKRS MRI. Overall, 1763 radiomic features were extracted from pre-radiosurgical magnetic resonance images. Three prediction models were constructed, using (1) clinical data, (2) radiomic features, and (3) clinical and radiomic features. Support vector machines with a 30% hold-out validation approach were constructed.
For treatment outcome predictions, the models derived from both the clinical and radiomics data achieved the best results. For local tumor control, the combined model achieved an area under the curve (AUC) of 0.95, an accuracy of 90%, a sensitivity of 91%, and a specificity of 89%. For patient survival, the combined model achieved an AUC of 0.81, an accuracy of 77%, a sensitivity of 78%, and a specificity of 80%. The pre-radiosurgical radiomics data enhanced the performance of local tumor control and survival prediction models in NSCLC patients with BMs treated with GKRS. An outcome prediction model based on radiomics combined with clinical features may guide therapy in these patients.
|
17
|
Aziz F, Malek S, Ibrahim KS, Raja Shariff RE, Wan Ahmad WA, Ali RM, Liu KT, Selvaraj G, Kasim S. Short- and long-term mortality prediction after an acute ST-elevation myocardial infarction (STEMI) in Asians: A machine learning approach. PLoS One 2021; 16:e0254894. [PMID: 34339432 PMCID: PMC8328310 DOI: 10.1371/journal.pone.0254894] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2021] [Accepted: 07/07/2021] [Indexed: 12/22/2022] Open
Abstract
Background Conventional risk score for predicting short and long-term mortality following an ST-segment elevation myocardial infarction (STEMI) is often not population specific. Objective Apply machine learning for the prediction and identification of factors associated with short and long-term mortality in Asian STEMI patients and compare with a conventional risk score. Methods The National Cardiovascular Disease Database for Malaysia registry, of a multi-ethnic, heterogeneous Asian population was used for in-hospital (6299 patients), 30-days (3130 patients), and 1-year (2939 patients) model development. 50 variables were considered. Mortality prediction was analysed using feature selection methods with machine learning algorithms and compared to Thrombolysis in Myocardial Infarction (TIMI) score. Invasive management of varying degrees was selected as important variables that improved mortality prediction. Results Model performance using a complete and reduced variable produced an area under the receiver operating characteristic curve (AUC) from 0.73 to 0.90. The best machine learning model for in-hospital, 30 days, and 1-year outperformed TIMI risk score (AUC = 0.88, 95% CI: 0.846–0.910; vs AUC = 0.81, 95% CI:0.772–0.845, AUC = 0.90, 95% CI: 0.870–0.935; vs AUC = 0.80, 95% CI: 0.746–0.838, AUC = 0.84, 95% CI: 0.798–0.872; vs AUC = 0.76, 95% CI: 0.715–0.802, p < 0.0001 for all). TIMI score underestimates patients’ risk of mortality. 90% of non-survival patients are classified as high risk (>50%) by machine learning algorithm compared to 10–30% non-survival patients by TIMI. Common predictors identified for short- and long-term mortality were age, heart rate, Killip class, fasting blood glucose, prior primary PCI or pharmaco-invasive therapy and diuretics. The final algorithm was converted into an online tool with a database for continuous data archiving for algorithm validation. 
Conclusions In a multi-ethnic population, patients with STEMI were better classified using the machine learning method compared to TIMI scoring. Machine learning allows for the identification of distinct factors in individual Asian populations for better mortality prediction. Ongoing continuous testing and validation will allow for better risk stratification and potentially alter management and outcomes in the future.
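The AUC values compared above have a simple rank interpretation: the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. The sketch below computes it directly from that definition. The function name `roc_auc` and the sample labels and scores are hypothetical, and this pairwise version is meant for illustration on small samples rather than as the procedure the authors used.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = died) and predicted risks, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the machine learning models and the TIMI score are compared.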
Affiliation(s)
- Firdaus Aziz
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
| | - Sorayya Malek
- Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- * E-mail: (SM); (SK)
| | - Khairul Shafiq Ibrahim
- Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
| | - Raja Ezman Raja Shariff
- Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
| | - Wan Azman Wan Ahmad
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
- Division of Cardiology, University Malaya Medical Centre, Kuala Lumpur, Malaysia
| | - Rosli Mohd Ali
- Cardiac Vascular Sentral Kuala Lumpur, Kuala Lumpur, Malaysia
| | - Kien Ting Liu
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
| | - Gunavathy Selvaraj
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
| | - Sazzli Kasim
- Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
- National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
- * E-mail: (SM); (SK)
| |
|
18
|
Gao Y, Yan P, Kruger U, Cavuoto L, Schwaitzberg S, De S, Intes X. Functional Brain Imaging Reliably Predicts Bimanual Motor Skill Performance in a Standardized Surgical Task. IEEE Trans Biomed Eng 2021; 68:2058-2066. [PMID: 32755850 PMCID: PMC8265734 DOI: 10.1109/tbme.2020.3014299] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Currently, there is a dearth of objective metrics for assessing bi-manual motor skills, which are critical for high-stakes professions such as surgery. Recently, functional near-infrared spectroscopy (fNIRS) has been shown to be effective at classifying motor task types, which can be potentially used for assessing motor performance level. In this work, we use fNIRS data for predicting the performance scores in a standardized bi-manual motor task used in surgical certification and propose a deep-learning framework 'Brain-NET' to extract features from the fNIRS data. Our results demonstrate that the Brain-NET is able to predict bi-manual surgical motor skills based on neuroimaging data accurately (R² = 0.73). Furthermore, the classification ability of the Brain-NET model is demonstrated based on receiver operating characteristic (ROC) curves and area under the curve (AUC) values of 0.91. Hence, these results establish that fNIRS associated with deep learning analysis is a promising method for a bedside, quick and cost-effective assessment of bi-manual skill levels.
|
19
|
Nieveen J, Brinton M, Warren DJ, Mathews VJ. A Nonlinear Latching Filter to Remove Jitter From Movement Estimates for Prostheses. IEEE Trans Neural Syst Rehabil Eng 2021; 28:2849-2858. [PMID: 33201823 DOI: 10.1109/tnsre.2020.3038706] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Continuous movement intent decoders are critical for precise control of hand and wrist prostheses. Noise in biological signals (e.g., myoelectric or neural signals) can lead to undesirable jitter in the output of these types of decoders. A low-pass filter (LPF) at the output of the decoder effectively reduces jitter, but also substantially slows intended movements. This paper introduces an alternative, the latching filter (LF), a recursive, nonlinear filter that provides smoothing of small-amplitude jitter but allows quick changes to its output in response to large input changes. The performance of a Kalman filter (KF) decoder smoothed with an LF is compared with that of both a KF decoder without an additional smoother and a KF decoder smoothed with an LPF. These three algorithms were tested in real-time on target holding and target reaching tasks using surface electromyographic signals recorded from 5 non-amputee subjects, and intramuscular electromyographic and peripheral neural signals recorded from an amputee subject. When compared with the LPF, the LF provided a statistically significant improvement in amputee and non-amputee subjects' ability to hold the hand steady at requested positions and achieve movement goals faster. The KF decoder with LF provided a statistically significant improvement in all subjects' ability to hold the prosthetic hand steady, with only slightly lower speeds, when compared to the unsmoothed KF.
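The behaviour described above (heavy smoothing of small changes, fast tracking of large ones) can be illustrated with a toy recursive filter. This is not the published latching filter, whose equations are not given in the abstract; it is a hypothetical sketch in which the function name `latching_smooth` and the parameters `threshold` and `small_gain` are assumptions made purely for illustration.

```python
def latching_smooth(samples, threshold=0.2, small_gain=0.05):
    """Toy nonlinear recursive smoother (illustrative, not the published LF).

    Small input changes move the output only slightly, which suppresses
    jitter; changes larger than `threshold` pass through immediately, so
    intended movements are not slowed down.
    """
    output = []
    state = None
    for x in samples:
        if state is None:
            state = x  # initialise on the first sample
        else:
            error = x - state
            if abs(error) > threshold:
                state = x  # large change: follow the input at once
            else:
                state += small_gain * error  # small change: damp it
        output.append(state)
    return output


# Hypothetical decoder output: jitter around 0.5, then a deliberate move to 1.0.
decoded = [0.5, 0.52, 0.48, 0.51, 1.0, 1.02, 0.98]
smoothed = latching_smooth(decoded)
```

A plain low-pass filter applies the same small gain to every sample, so the jump to 1.0 in this example would be reached only gradually; the thresholded update is what lets the output stay steady while holding and still move quickly on demand.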
|
20
|
Pradhan D, Sahoo B, Misra BB, Padhy S. A multiclass SVM classifier with teaching learning based feature subset selection for enzyme subclass classification. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106664] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
21
|
Near-infrared Spectroscopy and Hyperspectral Imaging for Sugar Content Evaluation in Potatoes over Multiple Growing Seasons. FOOD ANAL METHOD 2020. [DOI: 10.1007/s12161-020-01886-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
Sugar content is one of the most important properties of potato tubers as it directly affects their processing and the final product quality, especially for fried products. In this study, data obtained from spectroscopic (interactance and reflectance) and hyperspectral imaging systems were used individually or fused to develop regression and classification models for potato tubers, based on glucose and sucrose concentration, that are specific to neither cultivar nor growing season. Data were acquired over three growing seasons for two potato cultivars. The most influential wavelengths were selected from the imaging systems using interval partial least squares for regression and sequential forward selection for classification. Hyperspectral imaging showed the highest regression performance for glucose, with a correlation coefficient (ratio of performance to deviation), or r(RPD), of 91.8% (2.41), which increased to 94% (2.91) when the data were fused with the interactance data. The sucrose regression results had the highest accuracy using data obtained from the interactance system, with r(RPD) values of 74.5% (1.40) that increased to 84.4% (1.82) when the data were fused with the reflectance data. Classification was performed to identify tubers with either high or low sugar content. Classification performance showed accuracy values as high as 95% for glucose and 80.1% for sucrose using hyperspectral imaging, with no noticeable improvement when data were fused from the other spectroscopic systems. When testing the robustness of the developed models over different seasons, it was found that the regression models had r(RPD) values of 55% (1.19) to 90.3% (2.34) for glucose and 35.8% (1.07) to 82.2% (1.29) for sucrose. Results obtained in this study demonstrate the feasibility of developing a rapid monitoring system using multispectral imaging and data fusion methods for online evaluation of potato sugar content.
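Sequential forward selection, used above to pick wavelengths for classification, is a greedy procedure: start from an empty set and repeatedly add the single feature that most improves a subset score. The sketch below shows the generic loop. The scoring function, the wavelength labels, and the redundancy penalty are all hypothetical stand-ins; in practice the score would usually be a cross-validated classifier accuracy, and the abstract does not describe the authors' exact criterion.

```python
def sequential_forward_selection(candidates, score, k):
    """Greedily build a subset of up to k features that maximises `score`."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one giving the best subset.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-wavelength usefulness and one redundant pair (toy values).
usefulness = {"w950": 0.60, "w960": 0.58, "w1200": 0.40, "w1450": 0.30}
redundant_pairs = [{"w950", "w960"}]


def toy_score(subset):
    """Sum of usefulness minus a penalty for each redundant pair included."""
    total = sum(usefulness[f] for f in subset)
    penalty = 0.5 * sum(1 for pair in redundant_pairs if pair <= set(subset))
    return total - penalty


chosen = sequential_forward_selection(usefulness, toy_score, k=2)
```

In this toy case the second-best single wavelength (`w960`) is skipped because it is redundant with the first pick, which illustrates why subset-based selection can differ from simply ranking features one at a time.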
|
22
|
Hou Y, Li X, Zheng Y, Zhou J, Tan J, Chen X. A Method for Detecting the Randomness of Barkhausen Noise in a Material Fatigue Test Using Sensitivity and Uncertainty Analysis. SENSORS 2020; 20:s20185383. [PMID: 32962228 PMCID: PMC7571059 DOI: 10.3390/s20185383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/27/2020] [Revised: 09/04/2020] [Accepted: 09/17/2020] [Indexed: 11/16/2022]
Abstract
The magnetic Barkhausen noise (MBN) signal provides interesting clues about the evolution of the microstructure of a magnetic material (internal stresses, level of degradation, etc.). This makes it widely used in the non-destructive evaluation of ferromagnetic materials. Although researchers have made great efforts to explore the intrinsic random characteristics and stable features of MBN signals, they have failed to provide a deterministic definition of the stochastic quality of the MBN signals. Because many features are not reproducible, there is no quantitative description of the stochastic nature of MBN, and no uniform standards to evaluate the performance of features. We aim to study the stochastic characteristics of the MBN signal further and transform them into a quantification of signal uncertainty and sensitivity, to solve the above problems for fatigue state prediction. For the case of parameter uncertainty in the prediction model, a prior approximation method is proposed. Thus, two distinct sources of uncertainty are discussed: feature (observation) uncertainty and model uncertainty. We define feature uncertainty from the perspective of a probability distribution using a confidence interval sensitivity analysis, and uniformly quantize and re-parameterize the feature matrix from the feature probability distribution space. We also incorporate informed priors into the estimation process by optimizing the Kullback-Leibler divergence between the prior and posterior distributions, approximating the prior to the posterior. Thus, in an insufficient data situation, informed priors can improve prediction accuracy. Experiments show that our proposed confidence interval sensitivity analysis for capturing feature uncertainty has the potential to determine the instability in MBN signals quantitatively and reduce the dispersion of features, so that all features can produce positive additive effects. The false prediction rate can be reduced to almost 0.
The proposed priors can not only measure model parameter uncertainties but also show superior performance similar to that of maximum likelihood estimation (MLE). The results also show that improvements in parameter uncertainties cannot be directly propagated to improve prediction uncertainties.
Affiliation(s)
- Yuting Hou
- School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, China; (Y.H.); (X.C.)
| | - Xiang Li
- School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, China; (Y.H.); (X.C.)
- Correspondence:
| | - Yang Zheng
- China Special Equipment Inspection and Research Institute, Beijing 100029, China; (Y.Z.); (J.T.)
| | - Jinjie Zhou
- School of Mechanical Engineering, North University of China, Taiyuan 030051, China;
| | - Jidong Tan
- China Special Equipment Inspection and Research Institute, Beijing 100029, China; (Y.Z.); (J.T.)
| | - Xiaoping Chen
- School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, China; (Y.H.); (X.C.)
| |
|
23
|
Tang J, Wang Y, Luo Y, Fu J, Zhang Y, Li Y, Xiao Z, Lou Y, Qiu Y, Zhu F. Computational advances of tumor marker selection and sample classification in cancer proteomics. Comput Struct Biotechnol J 2020; 18:2012-2025. [PMID: 32802273 PMCID: PMC7403885 DOI: 10.1016/j.csbj.2020.07.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 07/06/2020] [Accepted: 07/08/2020] [Indexed: 12/11/2022] Open
Abstract
Cancer proteomics has become a powerful technique for characterizing the protein markers driving transformation of malignancy, tracing proteome variation triggered by therapeutics, and discovering the novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis/prognosis and accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied to cancer proteomic studies. This review article describes the most recent advances in those various approaches together with their current applications in cancer-related studies. Firstly, a number of popular feature selection methods are overviewed with objective evaluation of their advantages and disadvantages. Secondly, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. This review provides a comprehensive overview of the advances in tumor marker identification and patient/sample/tissue separation, which could serve as guidance for researchers in cancer proteomics.
Key Words
- ANN, Artificial Neural Network
- ANOVA, Analysis of Variance
- CFS, Correlation-based Feature Selection
- Cancer proteomics
- Computational methods
- DAPC, Discriminant Analysis of Principal Component
- DT, Decision Trees
- EDA, Estimation of Distribution Algorithm
- FC, Fold Change
- GA, Genetic Algorithms
- GR, Gain Ratio
- HC, Hill Climbing
- HCA, Hierarchical Cluster Analysis
- IG, Information Gain
- LDA, Linear Discriminant Analysis
- LIMMA, Linear Models for Microarray Data
- MBF, Markov Blanket Filter
- MWW, Mann–Whitney–Wilcoxon test
- OPLS-DA, Orthogonal Partial Least Squares Discriminant Analysis
- PCA, Principal Component Analysis
- PLS-DA, Partial Least Square Discriminant Analysis
- RF, Random Forest
- RF-RFE, Random Forest with Recursive Feature Elimination
- SA, Simulated Annealing
- SAM, Significance Analysis of Microarrays
- SBE, Sequential Backward Elimination
- SFS, Sequential Forward Selection
- SOM, Self-organizing Map
- SU, Symmetrical Uncertainty
- SVM, Support Vector Machine
- SVM-RFE, Support Vector Machine with Recursive Feature Elimination
- Sample classification
- Tumor marker selection
- sPLSDA, Sparse Partial Least Squares Discriminant Analysis
- t-SNE, t-distributed Stochastic Neighbor Embedding
- χ2, Chi-square
Affiliation(s)
- Jing Tang
- Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China.,College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,School of Pharmaceutical Sciences and Innovative Drug Research Centre, Chongqing University, Chongqing 401331, China
| | - Yi Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ziyu Xiao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yan Lou
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
| | - Yunqing Qiu
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
| | - Feng Zhu
- Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China.,College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| |
|
24
|
Nhu VH, Shirzadi A, Shahabi H, Singh SK, Al-Ansari N, Clague JJ, Jaafari A, Chen W, Miraki S, Dou J, Luu C, Górski K, Thai Pham B, Nguyen HD, Ahmad BB. Shallow Landslide Susceptibility Mapping: A Comparison between Logistic Model Tree, Logistic Regression, Naïve Bayes Tree, Artificial Neural Network, and Support Vector Machine Algorithms. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:E2749. [PMID: 32316191 PMCID: PMC7215797 DOI: 10.3390/ijerph17082749] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2020] [Revised: 04/09/2020] [Accepted: 04/13/2020] [Indexed: 11/21/2022]
Abstract
Shallow landslides damage buildings and other infrastructure, disrupt agricultural practices, and can cause social upheaval and loss of life. As a result, many scientists study the phenomenon, and some of them have focused on producing landslide susceptibility maps that can be used by land-use managers to reduce injury and damage. This paper contributes to this effort by comparing the power and effectiveness of five benchmark machine learning algorithms (Logistic Model Tree, Logistic Regression, Naïve Bayes Tree, Artificial Neural Network, and Support Vector Machine) in creating a reliable shallow landslide susceptibility map for Bijar City in Kurdistan province, Iran. Twenty conditioning factors were applied to 111 shallow landslides and tested using the One-R attribute evaluation (ORAE) technique for modeling and validation processes. The performance of the models was assessed by statistical indexes including sensitivity, specificity, accuracy, mean absolute error (MAE), root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). Results indicate that all five machine learning models performed well for shallow landslide susceptibility assessment, but the Logistic Model Tree model (AUC = 0.932) had the highest goodness-of-fit and prediction accuracy, followed by the Logistic Regression (AUC = 0.932), Naïve Bayes Tree (AUC = 0.864), ANN (AUC = 0.860), and Support Vector Machine (AUC = 0.834) models. Therefore, we recommend the use of the Logistic Model Tree model in shallow landslide mapping programs in semi-arid regions to help decision makers, planners, land-use managers, and government agencies mitigate the hazard and risk.
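The evaluation indexes named in this abstract can be illustrated with a minimal Python sketch. It assumes binary labels (1 = landslide, 0 = non-landslide), a 0.5 decision threshold, and made-up toy values; none of these come from the cited study, and the function names are hypothetical.

```python
import math

def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }

def error_metrics(y_true, y_prob):
    """MAE and RMSE between labels and predicted probabilities."""
    errors = [p - t for t, p in zip(y_true, y_prob)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}

# Toy data (illustrative only, not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed threshold

cls = classification_metrics(y_true, y_pred)
err = error_metrics(y_true, y_prob)
print(cls, err)
```

AUC is omitted here because it requires ranking all positive/negative pairs; a library routine would normally be used for it.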
Affiliation(s)
- Viet-Ha Nhu
- Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City 758307, Vietnam;
- Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City 758307, Vietnam
| | - Ataollah Shirzadi
- Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran;
| | - Himan Shahabi
- Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran;
- Board Member of Department of Zrebar Lake Environmental Research, Kurdistan Studies Institute, University of Kurdistan, Sanandaj 66177-15175, Iran
| | - Sushant K. Singh
- Virtusa Corporation, 10 Marshall Street, Irvington, NJ 07111, USA;
| | - Nadhir Al-Ansari
- Department of Civil, Environmental and Natural Resources Engineering, Lulea University of Technology, 971 87 Lulea, Sweden
| | - John J. Clague
- Department of Earth Sciences, Simon Fraser University, Burnaby, BC V5A 1S6, Canada;
| | - Abolfazl Jaafari
- Research Institute of Forests and Rangelands, Agricultural Research, Education, and Extension Organization (AREEO), Tehran 13185-116, Iran;
| | - Wei Chen
- College of Geology & Environment, Xi’an University of Science and Technology, Xi’an 710054, China;
- Key Laboratory of Coal Resources Exploration and Comprehensive Utilization, Ministry of Natural Resources, Xi’an 710021, China
| | - Shaghayegh Miraki
- Department of Watershed Sciences Engineering, Faculty of Natural Resources, University of Agricultural Science and Natural Resources of Sari, Mazandaran 48181-68984, Iran;
| | - Jie Dou
- Department of Civil and Environmental Engineering, Nagaoka University of Technology, 1603-1, Kami-Tomioka, Nagaoka, Niigata 940-2188, Japan;
| | - Chinh Luu
- Faculty of Hydraulic Engineering, National University of Civil Engineering, Hanoi 112000, Vietnam;
| | - Krzysztof Górski
- Faculty of Mechanical Engineering, Kazimierz Pulaski University of Technology and Humanities in Radom, Chrobrego 45 Street, 26-200 Radom, Poland;
| | - Binh Thai Pham
- Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
| | - Huu Duy Nguyen
- Faculty of Geography, VNU University of Science, 334 Nguyen Trai, Ha Noi 100000, Vietnam;
| | - Baharin Bin Ahmad
- Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Malaysia;
| |
|
25
|
Yang W, Wang W, Zhang R, Zhang F, Xiong Y, Wu T, Chen W, DU Y. A Modified Moving-Window Partial Least-Squares Method by Coupling with Sampling Error Profile Analysis for Variable Selection in Near-Infrared Spectral Analysis. ANAL SCI 2020; 36:303-309. [PMID: 31611474 DOI: 10.2116/analsci.19p283] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
In this study, a new variable selection method, named moving-window partial least-squares coupled with sampling error profile analysis (SEPA-MWPLS), is developed. With a moving window, moving-window partial least-squares (MWPLS) is used to find window intervals which show low residual sums of squares (RSS) of a calibration set. Sampling error profile analysis (SEPA) is a useful method based on Monte-Carlo Sampling and profile analysis for cross validation (CV). By combining MWPLS with SEPA, we can obtain more stable and reliable results. Besides, we simplify the plot of the RSS line so that it is easier to determine the informative intervals. In addition, a backward elimination strategy is used to optimize the combination of subintervals. The performance of SEPA-MWPLS was tested with two near-infrared (NIR) spectra datasets and was compared with PLS, MWPLS and Monte Carlo uninformative variable elimination (MC-UVE). The results show that SEPA-MWPLS can improve model performances significantly compared with MWPLS in the number of variables, root-mean-squared errors of CV, calibration and prediction (RMSECVs, RMSECs and RMSEPs). Meanwhile it also exhibits better performances than MC-UVE.
Affiliation(s)
- Wuye Yang
- Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry & Molecular Engineering, East China University of Science and Technology
| | - Wenming Wang
- Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry & Molecular Engineering, East China University of Science and Technology
| | - Ruoqiu Zhang
- Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry & Molecular Engineering, East China University of Science and Technology
| | - Feiyu Zhang
- Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry & Molecular Engineering, East China University of Science and Technology
| | - Yinran Xiong
- Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry & Molecular Engineering, East China University of Science and Technology
| | - Ting Wu
- Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry & Molecular Engineering, East China University of Science and Technology
| | - Wanchao Chen
- Institute of Edible Fungi, Shanghai Academy of Agriculture Sciences, National Engineering Research Center of Edible Fungi, Key Laboratory of Edible Fungi Resources and Utilization (South), Ministry of Agriculture
| | - Yiping DU
- Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry & Molecular Engineering, East China University of Science and Technology
| |
|
26
|
Rady A, Adedeji AA. Application of Hyperspectral Imaging and Machine Learning Methods to Detect and Quantify Adulterants in Minced Meats. FOOD ANAL METHOD 2020. [DOI: 10.1007/s12161-020-01719-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
27
|
Discretization and Feature Selection Based on Bias Corrected Mutual Information Considering High-Order Dependencies. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING 2020. [PMCID: PMC7206174 DOI: 10.1007/978-3-030-47426-3_64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
Mutual Information (MI) based feature selection methods are popular due to their ability to capture the nonlinear relationship among variables. However, existing works rarely address the error (bias) that occurs due to the use of finite samples during the estimation of MI. To the best of our knowledge, none of the existing methods address the bias issue for the high-order interaction term which is essential for better approximation of joint MI. In this paper, we first calculate the amount of bias of this term. Moreover, to select features using $\chi^2$ based search, we also show that this term follows $\chi^2$ distribution. Based on these two theoretical results, we propose Discretization and feature Selection based on bias corrected Mutual information (DSbM). DSbM is extended by adding simultaneous forward selection and backward elimination (DSbM$_\mathrm{fb}$). We demonstrate the superiority of DSbM over four state-of-the-art methods in terms of accuracy and the number of selected features on twenty benchmark datasets. Experimental results also demonstrate that DSbM outperforms the existing methods in terms of accuracy, Pareto Optimality and Friedman test. We also observe that compared to DSbM, in some dataset DSbM$_\mathrm{fb}$ selects fewer features and increases accuracy.
|
28
|
Wan H, Li JM, Ding H, Lin SX, Tu SQ, Tian XH, Hu JP, Chang S. An Overview of Computational Tools of Nucleic Acid Binding Site Prediction for Site-specific Proteins and Nucleases. Protein Pept Lett 2019; 27:370-384. [PMID: 31746287 DOI: 10.2174/0929866526666191028162302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2019] [Revised: 05/24/2019] [Accepted: 09/24/2019] [Indexed: 12/26/2022]
Abstract
Understanding the interaction mechanism of proteins and nucleic acids is one of the most fundamental problems for genome editing with engineered nucleases. Because experimental investigations have limitations, computational methods have played an important role in building knowledge of protein-nucleic acid interaction. Over the past few years, dozens of computational tools have been used to identify nucleic acid binding sites for site-specific proteins and to design site-specific nucleases because of their significant advantages in genome editing. Here, we review widely used computational tools for target prediction of site-specific proteins as well as off-target prediction of site-specific nucleases. This article provides a list of on-line prediction tools according to their features, followed by a description of the computational methods used by these tools, which range from various sequence mapping algorithms (like Bowtie, FetchGWI and BLAST) to different machine learning methods (such as Support Vector Machine, hidden Markov models, Random Forest, elastic network and deep neural networks). We also make suggestions for further development to improve the accuracy of prediction methods. This survey will provide a reference guide for computational biologists working in the field of genome editing.
Affiliation(s)
- Hua Wan
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
| | - Jian-Ming Li
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
| | - Huang Ding
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
| | - Shuo-Xin Lin
- Department of Electrical and Computer Engineering, James Clark School of Engineering, University of Maryland, College Park, MD 20742, United States
| | - Shu-Qin Tu
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
| | - Xu-Hong Tian
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
| | - Jian-Ping Hu
- College of Pharmacy and Biological Engineering, Sichuan Industrial Institute of Antibiotics, Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, Antibiotics Research and Re-Evaluation Key Laboratory of Sichuan Province, Chengdu University, Chengdu 610106, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| |
|
29
|
Xia LY, Wang QY, Cao Z, Liang Y. Descriptor Selection Improvements for Quantitative Structure-Activity Relationships. Int J Neural Syst 2019; 29:1950016. [PMID: 31390912 DOI: 10.1142/s0129065719500163] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Molecular descriptor selection is an essential procedure to improve a predictive quantitative structure–activity relationship (QSAR) model. However, a QSAR model typically contains a number of redundant, noisy and irrelevant descriptors. In this study, we propose a novel descriptor selection framework using self-paced learning (SPL) via sparse logistic regression (LR) with Logsum penalty (SPL-Logsum), which can adaptively identify simple and complex samples while avoiding over-fitting. SPL is inspired by the learning process of humans and animals, in which models are trained gradually from simple to complex samples, and the Logsum penalized LR helps to select a small subset of significant molecular descriptors for improving the QSAR models. Experimental results on some simulations and three public QSAR datasets show that our proposed SPL-Logsum framework outperforms other existing sparse methods regarding the area under the curve, sensitivity, specificity, accuracy, and [Formula: see text]-values.
Affiliation(s)
- Liang-Yong Xia
- Faculty of Information Technology, Macau University of Science and Technology, Macau, P. R. China
| | - Qing-Yong Wang
- Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, P. R. China
| | - Zehong Cao
- Discipline of ICT, School of Technology, Environments and Design, College of Sciences and Engineering, University of Tasmania, TAS, Australia
| | - Yong Liang
- University of Science and Technology, Macau, P. R. China
| |
|
30
|
Leclercq M, Vittrant B, Martin-Magniette ML, Scott Boyer MP, Perin O, Bergeron A, Fradet Y, Droit A. Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data. Front Genet 2019; 10:452. [PMID: 31156708 PMCID: PMC6532608 DOI: 10.3389/fgene.2019.00452] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Accepted: 04/30/2019] [Indexed: 12/11/2022] Open
Abstract
The identification of biomarker signatures in omics molecular profiling is usually performed to predict outcomes in a precision medicine context, such as patient disease susceptibility, diagnosis, prognosis, and treatment response. To identify these signatures, we have developed a biomarker discovery tool, called BioDiscML. From a collection of samples and their associated characteristics, i.e., the biomarkers (e.g., gene expression, protein levels, clinico-pathological data), BioDiscML exploits various feature selection procedures to produce signatures associated with machine learning models that efficiently predict a specified outcome. To this end, BioDiscML uses a large variety of machine learning algorithms to select the best combination of biomarkers for predicting categorical or continuous outcomes from highly unbalanced datasets. The software has been implemented to automate all machine learning steps, including data pre-processing, feature selection, model selection, and performance evaluation. BioDiscML is delivered as a stand-alone program and is available for download at https://github.com/mickaelleclercq/BioDiscML.
Affiliation(s)
- Mickael Leclercq
- Centre de Recherche du CHU de Québec-Université Laval, Québec City, QC, Canada.,Département de Médecine Moléculaire, Université Laval, Québec City, QC, Canada
| | - Benjamin Vittrant
- Centre de Recherche du CHU de Québec-Université Laval, Québec City, QC, Canada.,Département de Médecine Moléculaire, Université Laval, Québec City, QC, Canada
| | - Marie Laure Martin-Magniette
- Institute of Plant Sciences Paris Saclay IPS2, CNRS, INRA, Université Paris-Sud, Université Evry, Université Paris-Saclay, Paris Diderot, Sorbonne Paris-Cité, Orsay, France.,UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
| | - Marie Pier Scott Boyer
- Centre de Recherche du CHU de Québec-Université Laval, Québec City, QC, Canada.,Département de Médecine Moléculaire, Université Laval, Québec City, QC, Canada
| | - Olivier Perin
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
| | - Alain Bergeron
- Centre de Recherche du CHU de Québec-Université Laval, Québec City, QC, Canada.,Département de Chirurgie, Oncology Axis, Université Laval, Québec City, QC, Canada
| | - Yves Fradet
- Centre de Recherche du CHU de Québec-Université Laval, Québec City, QC, Canada.,Département de Chirurgie, Oncology Axis, Université Laval, Québec City, QC, Canada
| | - Arnaud Droit
- Centre de Recherche du CHU de Québec-Université Laval, Québec City, QC, Canada.,Département de Médecine Moléculaire, Université Laval, Québec City, QC, Canada
| |
|
31
|
Peleg Y, Shefer S, Anavy L, Chudnovsky A, Israel A, Golberg A, Yakhini Z. Sparse NIR optimization method (SNIRO) to quantify analyte composition with visible (VIS)/near infrared (NIR) spectroscopy (350 nm-2500 nm). Anal Chim Acta 2019; 1051:32-40. [DOI: 10.1016/j.aca.2018.11.038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2018] [Revised: 11/14/2018] [Accepted: 11/21/2018] [Indexed: 12/01/2022]
|
32
|
Abstract
A large variety of issues influence the success of data mining on a given problem. Two primary and important issues are the representation and the quality of the dataset. Specifically, if much redundant and unrelated or noisy and unreliable information is presented, then knowledge discovery becomes a very difficult problem. It is well-known that data preparation steps require significant processing time in machine learning tasks. It would be very helpful and quite useful if there were various preprocessing algorithms with the same reliable and effective performance across all datasets, but this is impossible. To this end, we present the most well-known and widely used up-to-date algorithms for each step of data preprocessing in the framework of predictive data mining.
|
33
|
Machine Learning-Based Slum Mapping in Support of Slum Upgrading Programs: The Case of Bandung City, Indonesia. REMOTE SENSING 2018. [DOI: 10.3390/rs10101522] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The survey-based slum mapping (SBSM) program conducted by the Indonesian government to reach the national target of “cities without slums” by 2019 shows mapping inconsistencies due to several reasons, e.g., the dependency on the surveyor’s experience and the complexity of the slum indicator set. Relying on such inconsistent maps makes it difficult to monitor the progress of the national slum upgrading program. Remote sensing imagery combined with machine learning algorithms could help reduce these inconsistencies. This study evaluates the performance of two machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF), for slum mapping in support of the slum mapping campaign in Bandung, Indonesia. Recognizing the complexity in differentiating slum and formal areas in Indonesia, the study used a combination of spectral, contextual, and morphological features. In addition, sequential feature selection (SFS) combined with the Hilbert–Schmidt independence criterion (HSIC) was used to select significant features for classifying slums. Overall, the highest accuracy (88.5%) was achieved by the SVM with SFS using contextual, morphological, and spectral features, which is higher than the estimated accuracy of the SBSM. To evaluate the potential of machine learning-based slum mapping (MLBSM) in support of slum upgrading programs, interviews were conducted with several local and national stakeholders. Results show that local acceptance of a remote sensing-based slum mapping approach varies among stakeholder groups. Therefore, a locally adapted framework is required that combines ground surveys with robust and consistent machine learning methods, can deal with big data, and allows the rapid extraction of consistent information on the dynamics of slums at a large scale.
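The greedy idea behind sequential feature selection can be sketched as follows. This is a minimal illustration only: the study scores candidate subsets with HSIC, whereas the `toy_score` function, the relevance values, and the redundancy penalty below are hypothetical stand-ins chosen so the example runs without external libraries.

```python
from itertools import combinations

def sequential_forward_selection(features, score, k):
    """Greedily add the feature that most improves score(subset) until k are chosen."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical per-feature relevance and pairwise redundancy (not from the paper).
RELEVANCE = {"spectral": 0.6, "contextual": 0.5, "morphological": 0.4, "noise": 0.05}
REDUNDANCY = {frozenset({"spectral", "contextual"}): 0.3}

def toy_score(subset):
    """Stand-in criterion: total relevance minus redundancy between chosen pairs."""
    gain = sum(RELEVANCE[f] for f in subset)
    penalty = sum(REDUNDANCY.get(frozenset(pair), 0.0) for pair in combinations(subset, 2))
    return gain - penalty

top2 = sequential_forward_selection(RELEVANCE, toy_score, k=2)
top3 = sequential_forward_selection(RELEVANCE, toy_score, k=3)
print(top2, top3)
```

Because the search is greedy, a feature that is individually strong but redundant with an already selected one (here the hypothetical "contextual" after "spectral") can be deferred in favor of a weaker, complementary feature.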
|
34
|
Peng H, Zheng Y, Blumenstein M, Tao D, Li J. CRISPR/Cas9 cleavage efficiency regression through boosting algorithms and Markov sequence profiling. Bioinformatics 2018; 34:3069-3077. [PMID: 29672669 DOI: 10.1093/bioinformatics/bty298] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2017] [Accepted: 04/12/2018] [Indexed: 12/26/2022] Open
Abstract
Motivation: The CRISPR/Cas9 system is a widely used genome editing tool. A prediction problem of great interest for this system is how to select optimal single-guide RNAs (sgRNAs) such that cleavage efficiency is high while the off-target effect is low. Results: This work proposed a two-step averaging method (TSAM) for the regression of cleavage efficiencies of a set of sgRNAs by averaging the predicted efficiency scores of a boosting algorithm and those of a support vector machine (SVM). We also proposed to use profiled Markov properties as novel features to capture the global characteristics of sgRNAs. These new features are combined with the outstanding features ranked by the boosting algorithm for the training of the SVM regressor. TSAM improved the mean Spearman correlation coefficients compared with the state-of-the-art performance on benchmark datasets containing thousands of human, mouse and zebrafish sgRNAs. Our method can also be converted to make binary distinctions between efficient and inefficient sgRNAs with superior performance to the existing methods. The analysis reveals that highly efficient sgRNAs have lower melting temperature at the middle of the spacer, cut at 5'-end closer parts of the genome and contain more 'A' but less 'G' compared with inefficient ones. Comprehensive further analysis also demonstrates that our tool can predict an sgRNA's cutting efficiency with consistently good performance whether it is expressed from a U6 promoter in cells or from a T7 promoter in vitro. Availability and implementation: An online tool is available at http://www.aai-bioinfo.com/CRISPR/. Python and Matlab source codes are freely available at https://github.com/penn-hui/TSAM. Supplementary information: Supplementary data are available at Bioinformatics online.
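The averaging step and the Spearman evaluation described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two score lists stand in for the outputs of a boosting regressor and an SVM regressor, all numbers are invented, and the function names are hypothetical rather than taken from the TSAM source code.

```python
import math

def average_scores(boost_scores, svm_scores):
    """Second step of a two-step averaging scheme: mean of two models' predictions."""
    return [(b + s) / 2 for b, s in zip(boost_scores, svm_scores)]

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented efficiencies and predictions for five sgRNAs (illustrative only).
measured = [0.1, 0.4, 0.35, 0.8, 0.6]
boost = [0.2, 0.5, 0.3, 0.7, 0.9]   # assumed boosting-model predictions
svm = [0.0, 0.2, 0.5, 0.9, 0.4]     # assumed SVM-regressor predictions

averaged = average_scores(boost, svm)
rho = spearman(measured, averaged)
print(round(rho, 3))
```

Spearman correlation is a natural fit for this task because it rewards getting the ordering of sgRNAs right, which is what matters when picking the most efficient guides.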
Affiliation(s)
- Hui Peng
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
| | - Yi Zheng
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
| | - Michael Blumenstein
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
| | - Dacheng Tao
- Faculty of Engineering and Information Technologies, School of Information Technologies, University of Sydney, Darlington, NSW, Australia
| | - Jinyan Li
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
| |
|
35
|
Chen Y, Yuan Z, Bi S, Wang X, Ye Y, Svenning JC. Macrofungal species distributions depend on habitat partitioning of topography, light, and vegetation in a temperate mountain forest. Sci Rep 2018; 8:13589. [PMID: 30206254 PMCID: PMC6134103 DOI: 10.1038/s41598-018-31795-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2018] [Accepted: 08/28/2018] [Indexed: 11/09/2022] Open
Abstract
The habitat partitioning hypothesis provides a conceptual framework for explaining the maintenance of plant and animal diversity. Its central tenet assumes environmental conditions are spatially structured, and that this structure is reflected in species distributions through associations with different habitats. Studies confirming habitat partitioning effects have focused primarily on spatial distributions of plants and animals, and the habitat partitioning hypothesis remains underexplored for macrofungi. Here, we examined the sporocarps of macrofungi in a 5-ha forest dynamics plot in China. We used four different methods to define microhabitats for habitat partitioning analyses based on topography, understory light availability, plant community, or a combination of these factors, and analyzed the effect of microhabitat partitioning on the epigeous macrofungal community. Our results showed that the characteristics of the macrofungal assemblages varied among the habitats. A total of 85 of the species examined were associated with one or more of the habitat types (85/125, 68%). The factors related to the sporocarp composition differed among the various microhabitats. Our findings suggest that different microhabitats favor the occurrence of different macrofungal species, and that sporocarp-environment relations varied among the different microhabitats at this temperate mountain forest locality. These findings shed new light on the conservation of macrofungal biodiversity in temperate deciduous broad-leaved forest and point to the potential importance of microhabitat partitioning for sporocarp formation.
Affiliation(s)
- Yun Chen
- College of Life Sciences, Henan Agricultural University, No.63 Agricultural Road, Zhengzhou, 450002, China.,Section for Ecoinformatics and Biodiversity, Department of Bioscience, Aarhus University, Aarhus, Denmark.,Center for Biodiversity Dynamics in a Changing World (BIOCHANGE), Aarhus University, Aarhus, Denmark
| | - Zhiliang Yuan
- College of Life Sciences, Henan Agricultural University, No.63 Agricultural Road, Zhengzhou, 450002, China
| | - Shuai Bi
- College of Life Sciences, Henan Agricultural University, No.63 Agricultural Road, Zhengzhou, 450002, China
| | - Xueying Wang
- College of Life Sciences, Henan Agricultural University, No.63 Agricultural Road, Zhengzhou, 450002, China
| | - Yongzhong Ye
- College of Life Sciences, Henan Agricultural University, No.63 Agricultural Road, Zhengzhou, 450002, China.
| | - Jens-Christian Svenning
- Section for Ecoinformatics and Biodiversity, Department of Bioscience, Aarhus University, Aarhus, Denmark.,Center for Biodiversity Dynamics in a Changing World (BIOCHANGE), Aarhus University, Aarhus, Denmark
| |
|
36
|
Bruccoleri M, Riccobono F, Größler A. Shared Leadership Regulates Operational Team Performance in the Presence of Extreme Decisional Consensus/Conflict: Evidences from Business Process Reengineering. DECISION SCIENCES 2018. [DOI: 10.1111/deci.12325] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Manfredi Bruccoleri
- Department of Industrial and Digital Innovation; University of Palermo; Viale delle Scienze Ed.8 Palermo 90128 Italy
| | - Francesca Riccobono
- Department of Industrial and Digital Innovation; University of Palermo; Viale delle Scienze Ed.8 Palermo 90128 Italy
| | - Andreas Größler
- Institute of Business Administration; Operations Management Department University of Stuttgart; Keplerstraße 17, Room 10.037 70174 Stuttgart Germany
| |
|
37
|
|
38
|
Palma SICJ, Traguedo AP, Porteira AR, Frias MJ, Gamboa H, Roque ACA. Machine learning for the meta-analyses of microbial pathogens' volatile signatures. Sci Rep 2018; 8:3360. [PMID: 29463885 PMCID: PMC5820279 DOI: 10.1038/s41598-018-21544-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2017] [Accepted: 02/06/2018] [Indexed: 11/11/2022] Open
Abstract
Non-invasive and fast diagnostic tools based on volatolomics hold great promise in the control of infectious diseases. However, the tools to identify microbial volatile organic compounds (VOCs) discriminating between human pathogens are still missing. Artificial intelligence is increasingly recognised as an essential tool in health sciences. Machine learning algorithms based on support vector machines and feature selection tools were applied here to find sets of microbial VOCs with pathogen-discrimination power. Studies reporting VOCs emitted by human microbial pathogens published between 1977 and 2016 were used as source data. A set of 18 VOCs is sufficient to predict the identity of 11 microbial pathogens with high accuracy (77%) and precision (62-100%). There is one set of VOCs associated with each of the 11 pathogens which can predict the presence of that pathogen in a sample with high accuracy and precision (86-90%). The implemented pathogen classification methodology supports future database updates to include new pathogen-VOC data, which will enrich the classifiers. The sets of VOCs identified potentiate the improvement of the selectivity of non-invasive infection diagnostics using artificial olfaction devices.
Affiliation(s)
- Susana I C J Palma
- UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
| | - Ana P Traguedo
- UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
| | - Ana R Porteira
- UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
| | - Maria J Frias
- UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
| | - Hugo Gamboa
- LIBPhys-UNL, Departamento de Física, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
| | - Ana C A Roque
- UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal.
| |
|
39
|
Rady A, Adedeji A. Assessing different processed meats for adulterants using visible-near-infrared spectroscopy. Meat Sci 2017; 136:59-67. [PMID: 29096288 DOI: 10.1016/j.meatsci.2017.10.014] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 05/18/2017] [Revised: 10/22/2017] [Accepted: 10/23/2017] [Indexed: 11/19/2022]
Abstract
The main objective of this study was to investigate the use of spectroscopic systems in the ranges of 400-1000 nm (visible/near-infrared, or Vis-NIR) and 900-1700 nm (NIR) to assess and estimate plant and animal proteins as potential adulterants in minced beef and pork. Multiple machine learning techniques were used for classification, adulterant prediction, and wavelength selection. Samples were first evaluated for the presence or absence of adulterants (2 classes), and secondly for adulterant type (6 classes) and level. Models built on selected wavelengths generally resulted in better classification and prediction outputs than full-wavelength models. The first-stage classification rates were 96% and 100% for pure/unadulterated and adulterated samples, respectively, whereas the second stage had classification rates of 69-100%. The optimal models for predicting adulterant levels yielded a correlation coefficient, r, of 0.78-0.86 and a ratio of performance to deviation, RPD, of 1.19-1.98. The results from this study illustrate the potential application of spectroscopic technology to rapidly and accurately detect adulterants in minced beef and pork.
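The two regression metrics quoted in this abstract can be computed as in the following minimal sketch. It assumes RPD is the sample standard deviation of the reference values divided by the root-mean-square error of prediction; definitions vary between authors (some use the bias-corrected standard error of prediction instead), and the abstract does not say which one was used. The adulterant levels and predictions are hypothetical values chosen for illustration, not data from the study.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSE of prediction.

    Assumption: sample SD (n - 1 denominator) and plain RMSE (n denominator).
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / (n - 1))
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return sd / rmse

# Hypothetical adulterant levels (%) and model predictions, for illustration only.
reference = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
predicted = [2.0, 4.0, 12.0, 13.0, 22.0, 24.0]

print("r   =", round(pearson_r(reference, predicted), 3))
print("RPD =", round(rpd(reference, predicted), 2))
```

Under this definition, an RPD close to 1 means the prediction error is about as large as the natural spread of the reference values, so a higher RPD indicates a more useful quantitative model.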
Affiliation(s)
- Ahmed Rady
- Department of Biosystems and Agricultural Engineering, University of Kentucky, Lexington, KY, USA; Department of Biosystems and Agricultural Engineering, Alexandria University, Alexandria, Egypt
| | - Akinbode Adedeji
- Department of Biosystems and Agricultural Engineering, University of Kentucky, Lexington, KY, USA.
| |
|
40
|
Pradhan D, Padhy S, Sahoo B. Enzyme classification using multiclass support vector machine and feature subset selection. Comput Biol Chem 2017; 70:211-219. [PMID: 28934693 DOI: 10.1016/j.compbiolchem.2017.08.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/25/2017] [Revised: 07/15/2017] [Accepted: 08/15/2017] [Indexed: 10/19/2022]
Abstract
Proteins are the macromolecules responsible for almost all biological processes in a cell. With the availability of a large number of protein sequences from different sequencing projects, the challenge for scientists is to characterize their functions. As wet lab methods are time consuming and expensive, many computational methods such as FASTA, PSI-BLAST, DNA microarray clustering, and Nearest Neighborhood classification on protein-protein interaction networks have been proposed. The support vector machine (SVM) is one such method and has been used successfully for several problems such as protein fold recognition and protein structure prediction. In 2003, Cai et al. used SVM to classify proteins into different functional classes and to predict their function, representing the protein sequences by their physico-chemical properties. In this paper, a model comprising feature subset selection followed by a multiclass SVM is proposed to determine the functional class of a newly generated protein sequence. To train and test the model, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine the features that contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used. Of the 32 properties considered initially, only 20 features are sufficient to classify the proteins into their functional classes with an accuracy ranging from 91% to 94%. In comparison, OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the most significant features affecting protein function.
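The wrapper-style feature subset selection described in this abstract can be illustrated with a minimal sketch. This is plain sequential forward selection, not the floating variant (SFFS), OFS, or SVM-RFE used in the paper, and a nearest-centroid classifier scored by leave-one-out accuracy stands in for the multiclass SVM so that the example needs only the standard library. The data set, function names, and labels are hypothetical.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        # Assign the held-out sample to the closest centroid (squared Euclidean distance).
        def dist(label):
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == y[i]
    return correct / len(X)

def sequential_forward_selection(X, y, max_features):
    """Greedily add the feature that most improves the score; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # Ties on score are broken in favour of the lowest feature index.
        score, neg_f = max((loo_accuracy(X, y, selected + [f]), -f) for f in remaining)
        if score <= best_score:
            break
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best_score = score
    return selected, best_score

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [
    [1.0, 5.0, 0.3], [1.2, 1.0, 0.1], [0.9, 9.0, 0.2], [1.1, 3.0, 0.4],
    [3.0, 4.0, 0.2], [3.2, 8.0, 0.3], [2.9, 2.0, 0.1], [3.1, 6.0, 0.4],
]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

selected, score = sequential_forward_selection(X, y, max_features=2)
print("selected feature indices:", selected, "LOO accuracy:", score)
```

The floating variant would additionally try removing previously selected features after each addition, which lets it escape some of the poor early choices that a purely greedy search cannot undo.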
Affiliation(s)
- Debasmita Pradhan
- Department of Computer Science and Engineering, Silicon Institute of Technology, Silicon Hills, Patia, Bhubaneswar, 751024, India
| | - Sudarsan Padhy
- Department of Computer Science and Engineering, Silicon Institute of Technology, Silicon Hills, Patia, Bhubaneswar, 751024, India
| | - Biswajit Sahoo
- School of Computer Engineering, KIIT University, Bhubaneswar, 751024, India
| |
|
41
|
EHR-based phenotyping: Bulk learning and evaluation. J Biomed Inform 2017; 70:35-51. [PMID: 28410982 DOI: 10.1016/j.jbi.2017.04.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 09/18/2016] [Revised: 03/09/2017] [Accepted: 04/10/2017] [Indexed: 01/29/2023]
Abstract
In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.
|
42
|
Application of binary quantum-inspired gravitational search algorithm in feature subset selection. Appl Intell 2017. [DOI: 10.1007/s10489-017-0894-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 11/26/2022]
|
43
|
Yuan Y, Zheng X, Lu X. Discovering Diverse Subset for Unsupervised Hyperspectral Band Selection. IEEE Trans Image Process 2017; 26:51-64. [PMID: 28113180 DOI: 10.1109/tip.2016.2617462] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 06/06/2023]
Abstract
Band selection, as a special case of the feature selection problem, tries to remove redundant bands and select a few important bands to represent the whole image cube. This has attracted much attention, since the selected bands provide discriminative information for further applications and reduce the computational burden. Though hyperspectral band selection has gained rapid development in recent years, it is still a challenging task because of the following requirements: 1) an effective model can capture the underlying relations between different high-dimensional spectral bands; 2) a fast and robust measure function can adapt to general hyperspectral tasks; and 3) an efficient search strategy can find the desired selected bands in reasonable computational time. To satisfy these requirements, a multigraph determinantal point process (MDPP) model is proposed to capture the full structure between different bands and efficiently find the optimal band subset in extensive hyperspectral applications. There are three main contributions: 1) graphical model is naturally transferred to address band selection problem by the proposed MDPP; 2) multiple graphs are designed to capture the intrinsic relationships between hyperspectral bands; and 3) mixture DPP is proposed to model the multiple dependencies in the proposed multiple graphs, and offers an efficient search strategy to select the optimal bands. To verify the superiority of the proposed method, experiments have been conducted on three hyperspectral applications, such as hyperspectral classification, anomaly detection, and target detection. The reliability of the proposed method in generic hyperspectral tasks is experimentally proved on four real-world hyperspectral data sets.
|
44
|
Chen W, Zheng L, Li K, Wang Q, Liu G, Jiang Q. A Novel and Effective Method for Congestive Heart Failure Detection and Quantification Using Dynamic Heart Rate Variability Measurement. PLoS One 2016; 11:e0165304. [PMID: 27835634 PMCID: PMC5105944 DOI: 10.1371/journal.pone.0165304] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 04/21/2016] [Accepted: 10/10/2016] [Indexed: 01/01/2023] Open
Abstract
Risk assessment of congestive heart failure (CHF) is essential for detection, especially helping patients make informed decisions about medications, devices, transplantation, and end-of-life care. The majority of studies have focused on disease detection between CHF patients and normal subjects using short-/long-term heart rate variability (HRV) measures but not much on quantification. We downloaded 116 nominal 24-hour RR interval records from the MIT/BIH database, including 72 normal people and 44 CHF patients. These records were analyzed under a 4-level risk assessment model: no risk (normal people, N), mild risk (patients with New York Heart Association (NYHA) class I-II, P1), moderate risk (patients with NYHA III, P2), and severe risk (patients with NYHA III-IV, P3). A novel multistage classification approach is proposed for risk assessment and rating CHF using the non-equilibrium decision-tree-based support vector machine classifier. We propose dynamic indices of HRV to capture the dynamics of 5-minute short term HRV measurements for quantifying autonomic activity changes of CHF. We extracted 54 classical measures and 126 dynamic indices and selected from these using backward elimination to detect and quantify CHF patients. Experimental results show that the multistage risk assessment model can realize CHF detection and quantification analysis with total accuracy of 96.61%. The multistage model provides a powerful predictor between predicted and actual ratings, and it could serve as a clinically meaningful outcome providing an early assessment and a prognostic marker for CHF patients.
Affiliation(s)
- Wenhui Chen
- School of Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China; Science and Technology Planning Project of Guangdong Province, Guangzhou, Guangdong, China; Guangdong Provincial Engineering and Technology Centre of Advanced and Portable Medical Device, Guangzhou, Guangdong, China
| | - Lianrong Zheng
- School of Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China; Science and Technology Planning Project of Guangdong Province, Guangzhou, Guangdong, China; Guangdong Provincial Engineering and Technology Centre of Advanced and Portable Medical Device, Guangzhou, Guangdong, China
| | - Kunyang Li
- School of Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China; Science and Technology Planning Project of Guangdong Province, Guangzhou, Guangdong, China; Guangdong Provincial Engineering and Technology Centre of Advanced and Portable Medical Device, Guangzhou, Guangdong, China
| | - Qian Wang
- School of Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China; Science and Technology Planning Project of Guangdong Province, Guangzhou, Guangdong, China; Guangdong Provincial Engineering and Technology Centre of Advanced and Portable Medical Device, Guangzhou, Guangdong, China
| | - Guanzheng Liu
- School of Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China; Science and Technology Planning Project of Guangdong Province, Guangzhou, Guangdong, China; Guangdong Provincial Engineering and Technology Centre of Advanced and Portable Medical Device, Guangzhou, Guangdong, China
| | - Qing Jiang
- School of Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China; Science and Technology Planning Project of Guangdong Province, Guangzhou, Guangdong, China; Guangdong Provincial Engineering and Technology Centre of Advanced and Portable Medical Device, Guangzhou, Guangdong, China
| |
|
45
|
Merkle R, Steiert B, Salopiata F, Depner S, Raue A, Iwamoto N, Schelker M, Hass H, Wäsch M, Böhm ME, Mücke O, Lipka DB, Plass C, Lehmann WD, Kreutz C, Timmer J, Schilling M, Klingmüller U. Identification of Cell Type-Specific Differences in Erythropoietin Receptor Signaling in Primary Erythroid and Lung Cancer Cells. PLoS Comput Biol 2016; 12:e1005049. [PMID: 27494133 PMCID: PMC4975441 DOI: 10.1371/journal.pcbi.1005049] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 04/01/2016] [Accepted: 07/05/2016] [Indexed: 01/23/2023] Open
Abstract
Lung cancer, with its most prevalent form non-small-cell lung carcinoma (NSCLC), is one of the leading causes of cancer-related deaths worldwide, and is commonly treated with chemotherapeutic drugs such as cisplatin. Lung cancer patients frequently suffer from chemotherapy-induced anemia, which can be treated with erythropoietin (EPO). However, studies have indicated that EPO not only promotes erythropoiesis in hematopoietic cells, but may also enhance survival of NSCLC cells. Here, we verified that the NSCLC cell line H838 expresses functional erythropoietin receptors (EPOR) and that treatment with EPO reduces cisplatin-induced apoptosis. To pinpoint differences in EPO-induced survival signaling in erythroid progenitor cells (CFU-E, colony forming unit-erythroid) and H838 cells, we combined mathematical modeling with a method for feature selection, the L1 regularization. Utilizing an example model and simulated data, we demonstrated that this approach enables the accurate identification and quantification of cell type-specific parameters. We applied our strategy to quantitative time-resolved data of EPO-induced JAK/STAT signaling generated by quantitative immunoblotting, mass spectrometry and quantitative real-time PCR (qRT-PCR) in CFU-E and H838 cells as well as H838 cells overexpressing human EPOR (H838-HA-hEPOR). The established parsimonious mathematical model was able to simultaneously describe the data sets of CFU-E, H838 and H838-HA-hEPOR cells. Seven cell type-specific parameters were identified that included for example parameters for nuclear translocation of STAT5 and target gene induction. Cell type-specific differences in target gene induction were experimentally validated by qRT-PCR experiments. 
The systematic identification of pathway differences and sensitivities of EPOR signaling in CFU-E and H838 cells revealed potential targets for intervention to selectively inhibit EPO-induced signaling in the tumor cells but leave the responses in erythroid progenitor cells unaffected. Thus, the proposed modeling strategy can be employed as a general procedure to identify cell type-specific parameters and to recommend treatment strategies for the selective targeting of specific cell types. A major challenge in the development of therapeutic interventions is the selective inhibition of a signal transduction pathway in one cell type such as a cancer cell leaving the other cell type such as a healthy cell as unaffected as possible. Here, we propose a new approach that combines mathematical modeling based on quantitative experimental data with statistical methods. We demonstrate based on simulated data that our approach can determine which parameters are the same and which parameters differ in two exemplary cell types. We compare a lung cancer cell line to the precursor cells of red blood cells. We show that the same signal transduction network induced by erythropoietin (EPO), a hormone that is frequently employed to treat anemia in cancer patients, regulates survival of both cell types. Based on our experimental data in combination with our computational approach, we identify seven cell type-specific differences in this signaling pathway. Our strategy allows predicting therapeutic targets that could be inhibited to interfere with survival of lung cancer cells while leaving production of red blood cells unaffected.
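The idea of using L1 regularization to decide which parameters differ between two cell types can be illustrated with a deliberately simplified sketch. It assumes each difference is expressed as a log fold-change and that the data term is a one-dimensional quadratic, in which case the L1-penalized estimate has the closed-form soft-thresholding solution. The abstract does not specify the parameterization, and the paper fits a full signaling model rather than this toy objective. All parameter names and values below are hypothetical.

```python
def soft_threshold(value, lam):
    """Minimizer over d of 0.5 * (d - value)**2 + lam * abs(d)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def cell_type_specific(log_fold_changes, lam):
    """Keep only the parameters whose penalized log fold-change is non-zero."""
    shrunk = {name: soft_threshold(v, lam) for name, v in log_fold_changes.items()}
    return {name: d for name, d in shrunk.items() if d != 0.0}

# Hypothetical unpenalized log fold-changes between two cell types (illustrative names).
log_fold_changes = {
    "receptor_abundance": 1.8,
    "stat5_nuclear_import": -0.9,
    "phosphorylation_rate": 0.05,
    "target_gene_induction": 0.6,
    "degradation_rate": -0.08,
}

specific = cell_type_specific(log_fold_changes, lam=0.2)
for name, d in sorted(specific.items()):
    print(name, round(d, 3))
```

Small differences are shrunk exactly to zero, so the corresponding parameters are treated as shared between the cell types. Larger differences survive, shrunk by the penalty, and are reported as cell type-specific. The penalty strength `lam` controls how parsimonious the resulting model is.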
Affiliation(s)
- Ruth Merkle
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
- Translational Lung Research Center (TLRC), German Center for Lung Research (DZL), Heidelberg, Germany
| | - Bernhard Steiert
- Institute of Physics, University of Freiburg, Germany & BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
| | - Florian Salopiata
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
- Translational Lung Research Center (TLRC), German Center for Lung Research (DZL), Heidelberg, Germany
| | - Sofia Depner
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
- Translational Lung Research Center (TLRC), German Center for Lung Research (DZL), Heidelberg, Germany
| | - Andreas Raue
- Institute of Physics, University of Freiburg, Germany & BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
| | - Nao Iwamoto
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
| | - Max Schelker
- Institute of Physics, University of Freiburg, Germany & BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
| | - Helge Hass
- Institute of Physics, University of Freiburg, Germany & BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
| | - Marvin Wäsch
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
- Translational Lung Research Center (TLRC), German Center for Lung Research (DZL), Heidelberg, Germany
| | - Martin E. Böhm
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
| | - Oliver Mücke
- Division Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
| | - Daniel B. Lipka
- Regulation of Cellular Differentiation Group, Division Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
| | - Christoph Plass
- Division Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
| | - Wolf D. Lehmann
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
| | - Clemens Kreutz
- Institute of Physics, University of Freiburg, Germany & BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
| | - Jens Timmer
- Institute of Physics, University of Freiburg, Germany & BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
- * E-mail: (JT); (MS); (UK)
| | - Marcel Schilling
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
- * E-mail: (JT); (MS); (UK)
| | - Ursula Klingmüller
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), INF 280, Heidelberg, Germany
- Translational Lung Research Center (TLRC), German Center for Lung Research (DZL), Heidelberg, Germany
- * E-mail: (JT); (MS); (UK)
| |
|
46
|
Hadjerci O, Hafiane A, Conte D, Makris P, Vieyres P, Delbos A. Computer-aided detection system for nerve identification using ultrasound images: A comparative study. Inform Med Unlocked 2016. [DOI: 10.1016/j.imu.2016.06.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022] Open
|
47
|
Poucke SV, Zhang Z, Schmitz M, Vukicevic M, Laenen MV, Celi LA, Deyne CD. Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open Data Analysis Platform. PLoS One 2016; 11:e0145791. [PMID: 26731286 PMCID: PMC4701479 DOI: 10.1371/journal.pone.0145791] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/22/2015] [Accepted: 12/08/2015] [Indexed: 02/07/2023] Open
Abstract
With the accumulation of large amounts of health-related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized (PPPM) Medicine, ultimately affecting both cost and quality of care. However, the high dimensionality and high complexity of the data involved prevent data-driven methods from being easily translated into clinically relevant models. Additionally, the application of cutting-edge predictive methods and data manipulation requires substantial programming skills, limiting their direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, the authors address this problem by focusing on open, visual environments suited to be applied by the medical community. Moreover, we review code-free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner's Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL process (Extract, Transform, Load) was initiated by retrieving data from the MIMIC-II tables of interest. As a use case, the correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for automatic building, parameter optimization and evaluation of various predictive models, under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
Affiliation(s)
- Sven Van Poucke
- Department of Anesthesiology, Intensive Care, Emergency Medicine and Pain Therapy, Ziekenhuis Oost-Limburg, Genk, Belgium
| | - Zhongheng Zhang
- Department of Critical Care Medicine, Jinhua Hospital of Zhejiang University, Zhejiang, P.R. China
| | | | - Milan Vukicevic
- Department of Organizational Sciences, University of Belgrade, Belgrade, Serbia
| | - Margot Vander Laenen
- Department of Anesthesiology, Intensive Care, Emergency Medicine and Pain Therapy, Ziekenhuis Oost-Limburg, Genk, Belgium
| | - Leo Anthony Celi
- MIT Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
| | - Cathy De Deyne
- Department of Anesthesiology, Intensive Care, Emergency Medicine and Pain Therapy, Ziekenhuis Oost-Limburg, Genk, Belgium
- Limburg Clinical Research Program, Faculty of Medicine, University Hasselt UH, Hasselt, Belgium
| |
|
48
|
|
49
|
|
50
|
Abdoos A, Hemmati M, Abdoos AA. Short term load forecasting using a hybrid intelligent method. Knowl Based Syst 2015. [DOI: 10.1016/j.knosys.2014.12.008] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Indexed: 10/24/2022]
|