201
Hong W, Lu Y, Zhou X, Jin S, Pan J, Lin Q, Yang S, Basharat Z, Zippi M, Goyal H. Usefulness of Random Forest Algorithm in Predicting Severe Acute Pancreatitis. Front Cell Infect Microbiol 2022; 12:893294. [PMID: 35755843 PMCID: PMC9226542 DOI: 10.3389/fcimb.2022.893294] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/10/2022] [Accepted: 04/29/2022] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND AND AIMS: This study aimed to develop an interpretable random forest model for predicting severe acute pancreatitis (SAP). METHODS: Clinical and laboratory data of 648 patients with acute pancreatitis were retrospectively reviewed, and the patients were randomly assigned to a training set and a test set in a 3:1 ratio. Univariate analysis was used to select candidate predictors of SAP. Random forest (RF) and logistic regression (LR) models were developed on the training sample. The prediction models were then applied to the test sample. The performance of the risk models was measured by calculating the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall curve. We provide visualized interpretation by using local interpretable model-agnostic explanations (LIME). RESULTS: The LR model was developed to predict SAP as the following function: −1.10 − 0.13 × albumin (g/L) + 0.016 × serum creatinine (μmol/L) + 0.14 × glucose (mmol/L) + 1.63 × pleural effusion (0 = No, 1 = Yes). The coefficients of this formula were used to build a nomogram. The RF model consists of 16 variables identified by univariate analysis. It was developed and validated by tenfold cross-validation on the training sample. Variable importance analysis suggested that blood urea nitrogen, serum creatinine, albumin, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, calcium, and glucose were the seven most important predictors of SAP. The AUCs of the RF model in tenfold cross-validation of the training set and in the test set were 0.89 and 0.96, respectively. Both the area under the precision-recall curve and the diagnostic accuracy of the RF model were higher than those of both the LR model and the BISAP score. LIME plots were used to explain individualized predictions of the RF model. CONCLUSIONS: An interpretable RF model exhibited the highest discriminatory performance in predicting SAP. Interpretation with LIME plots could be useful for individualized prediction in a clinical setting. A nomogram consisting of albumin, serum creatinine, glucose, and pleural effusion was useful for prediction of SAP.
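The LR function quoted in the abstract can be evaluated directly. The sketch below is illustrative only: it assumes the quoted expression is the linear predictor (log-odds) of a standard logistic regression, so that a probability follows from the logistic function; the abstract does not state this explicitly. The function names and the example patient values are hypothetical, and nothing here is the authors' code or intended for clinical use.

```python
import math


def sap_linear_predictor(albumin_g_l, creatinine_umol_l, glucose_mmol_l, pleural_effusion):
    """Evaluate the function quoted in the abstract (coefficients taken verbatim).

    Assumption: the result is the log-odds of SAP, as in a standard LR model.
    """
    return (
        -1.10
        - 0.13 * albumin_g_l
        + 0.016 * creatinine_umol_l
        + 0.14 * glucose_mmol_l
        + 1.63 * (1 if pleural_effusion else 0)
    )


def sap_probability(albumin_g_l, creatinine_umol_l, glucose_mmol_l, pleural_effusion):
    """Map the linear predictor to a probability with the logistic function."""
    z = sap_linear_predictor(albumin_g_l, creatinine_umol_l, glucose_mmol_l, pleural_effusion)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient values, chosen only to exercise the formula.
with_effusion = sap_probability(30.0, 150.0, 10.0, True)
without_effusion = sap_probability(30.0, 150.0, 10.0, False)
print(f"with effusion: {with_effusion:.3f}, without: {without_effusion:.3f}")
```

Because the pleural effusion coefficient is positive (1.63), the predicted probability is higher with effusion than without when the other inputs are held fixed, which matches the direction of the coefficient in the abstract.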
Affiliation(s)
- Wandong Hong
  - Department of Gastroenterology and Hepatology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
  - *Correspondence: Wandong Hong
- Yajing Lu
  - Department of Gastroenterology and Hepatology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Xiaoying Zhou
  - School of the First Clinical Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Shengchun Jin
  - School of the First Clinical Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Jingyi Pan
  - School of the First Clinical Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Qingyi Lin
  - School of the First Clinical Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Shaopeng Yang
  - School of the First Clinical Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Zarrin Basharat
  - Jamil-ur-Rahman Center for Genome Research, Dr. Panjwani Centre for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
- Maddalena Zippi
  - Unit of Gastroenterology and Digestive Endoscopy, Sandro Pertini Hospital, Rome, Italy
- Hemant Goyal
  - Department of Medicine, The Wright Center for Graduate Medical Education, Scranton, PA, United States
202
Buenafe RJ, Rathnam A, Añonuevo JJ, Sundar S, Sreenivasulu N. Application of classification models in screening superior rice grain quality in male sterile and pollen parents. J Food Compost Anal 2021. [DOI: 10.1016/j.jfca.2021.104137] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]
203
Using a Random Forest Model to Predict the Location of Potential Damage on Asphalt Pavement. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app112110396] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]
Abstract
Potential damage, which eventually manifests as moisture damage inside in-situ road structures, is the most complex problem to predict; it costs considerable money, time, and natural resources for maintenance and can even lead to safety problems. Traditional linear regression analysis cannot fit this multi-factor task well under such field conditions. Random Forest (RF) is an advanced nonlinear algorithm that can combine all relevant factors to achieve accurate prediction and good interpretability. In this study, an RF model is constructed to predict potential damage. In addition, relative variable importance is analyzed to assess the relationship between each factor and potential damage. The results show that, after optimization, the model achieved an average accuracy of 83.33%. Finally, a method for controlling moisture damage is proposed by combining the traditional analysis method with the RF model. In short, RF is a promising method for prediction and data mining in highway engineering. Trained with effective data, it can be a versatile and powerful tool for solving hard problems.
204
Çakıroğlu MA, Kaplan AN, Süzen AA. Experimental and DBN-Based neural network extraction of radiation attenuation coefficient of dry mixture shotcrete produced using different additives. Radiat Phys Chem Oxf Engl 1993 2021. [DOI: 10.1016/j.radphyschem.2021.109636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
205
Potentials and Limitations of WorldView-3 Data for the Detection of Invasive Lupinus polyphyllus Lindl. in Semi-Natural Grasslands. REMOTE SENSING 2021. [DOI: 10.3390/rs13214333] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/06/2023]
Abstract
Semi-natural grasslands contribute highly to biodiversity and other ecosystem services, but they are threatened by the spread of invasive plant species, which alter their habitat structure. Large-area grassland monitoring can be a powerful tool for managing invaded ecosystems. Therefore, WorldView-3 multispectral sensor data were used to train multiple machine learning algorithms in an automatic machine learning workflow called ‘H2O AutoML’ to detect L. polyphyllus in a nature protection grassland ecosystem. Different degrees of L. polyphyllus cover were recorded on 3 × 3 m² reference plots, and multispectral bands, indices, and texture features were used in a feature selection process to identify the most promising classification model and machine learning algorithm based on mean per-class error, log loss, and AUC metrics. The best performance was achieved with a binary classification of lupin-free vs. fully invaded 3 × 3 m² plots using a set of 7 features out of 763. The findings reveal that L. polyphyllus detection from WorldView-3 sensor data is limited to large dominant spots and is not recommended for lower plant coverage, especially single-plant detection. Further research is needed to clarify whether different phenological stages of L. polyphyllus, as well as time series, increase classification performance.
206
Guo JN, Chen D, Deng SH, Huang JR, Song JX, Li XY, Cui BB, Liu YL. Identification and quantification of immune infiltration landscape on therapy and prognosis in left- and right-sided colon cancer. Cancer Immunol Immunother 2021; 71:1313-1330. [PMID: 34657172 PMCID: PMC9122887 DOI: 10.1007/s00262-021-03076-2] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 05/05/2021] [Accepted: 09/30/2021] [Indexed: 01/22/2023]
Abstract
Background: Left-sided and right-sided colon cancers (LCCs and RCCs, respectively) have unique molecular features and clinical heterogeneity. This study aimed to identify the characteristics of immune cell infiltration (ICI) subtypes for evaluating prognosis and therapeutic benefits. Methods: The independent gene datasets and the corresponding somatic mutation and clinical information were collected from The Cancer Genome Atlas and Gene Expression Omnibus. The ICI contents were evaluated by "ESTIMATE" and "CIBERSORT." We applied two computational algorithms to identify the ICI landscape related to prognosis and found the unique infiltration characteristics. Next, principal component analysis was conducted to construct an ICI score based on three ICI patterns. We analyzed the correlation between ICI score and tumor mutation burden (TMB), and stratified patients into prognosis-related high- and low-ICI score groups (HSG and LSG, respectively). The role of ICI scores in the prediction of therapeutic benefits was investigated by "pRRophetic" and verified by Immunophenoscores (IPS) (TCIA database) and an independent immunotherapy cohort (IMvigor210). The key genes were preliminarily screened by weighted gene co-expression network analysis based on ICI scores. They were further identified at various levels, including single cell, protein, and immunotherapy response. The predictive ability of the ICI score for prognosis was also verified in the IMvigor210 cohort. Results: The ICI features associated with a better prognosis were marked by high levels of plasma cells, dendritic cells, and mast cells, and low levels of memory CD4+ T cells, M0 macrophages, M1 macrophages, and M2 macrophages. A high ICI score was characterized by increased TMB and genomic instability-related signaling pathways. Prognosis, sensitivity to targeted inhibitors and immunotherapy, IPS, and expression of immune checkpoints differed significantly between the HSG and LSG. The key genes identified by ICI scores and at the various levels included CA2 and TSPAN1. Conclusion: The identification of ICI subtypes and ICI scores will help gain insights into the heterogeneity in LCC and RCC, and identify patients likely to benefit from treatments. ICI scores and the key genes could serve as effective biomarkers to predict prognosis and sensitivity to immunotherapy. Supplementary Information: The online version contains supplementary material available at 10.1007/s00262-021-03076-2.
Affiliation(s)
- Jun-Nan Guo
  - Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, 150086, People's Republic of China
- Du Chen
  - The First Department of Oncological Surgery, The First People's Hospital of Xiangtan City, Xiangtan, 411100, People's Republic of China
- Shen-Hui Deng
  - Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, 150086, People's Republic of China
- Jia-Rong Huang
  - Department of Clinical Medicine, North Sichuan Medical College, Nanchong, 637000, People's Republic of China
- Jin-Xuan Song
  - Department of Clinical Medicine, North Sichuan Medical College, Nanchong, 637000, People's Republic of China
- Xiang-Yu Li
  - Department of Clinical Medicine, North Sichuan Medical College, Nanchong, 637000, People's Republic of China
- Bin-Bin Cui
  - Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, 150086, People's Republic of China
- Yan-Long Liu
  - Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, 150086, People's Republic of China
207
Feng W, Quan Y, Dauphin G, Li Q, Gao L, Huang W, Xia J, Zhu W, Xing M. Semi-supervised rotation forest based on ensemble margin theory for the classification of hyperspectral image with limited training data. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.06.059] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/17/2022]
208
López-Castro T, Zhao Y, Fitzpatrick S, Ruglass LM, Hien DA. Seeing the forest for the trees: Predicting attendance in trials for co-occurring PTSD and substance use disorders with a machine learning approach. J Consult Clin Psychol 2021; 89:869-884. [PMID: 34807661 PMCID: PMC9426719 DOI: 10.1037/ccp0000688] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/08/2022]
Abstract
Objective: High dropout rates are common in randomized clinical trials (RCTs) for comorbid posttraumatic stress disorder and substance use disorders (PTSD + SUD). Optimizing attendance is a priority for PTSD + SUD treatment development, yet research has found few consistent associations to guide responsive strategies. In this study, we employed a data-driven pipeline for identifying salient and reliable predictors of attendance. Method: In a novel application of the iterative Random Forest algorithm (iRF), we investigated the association between individual-level characteristics and session attendance in a completed RCT for PTSD + SUD (n = 70; women = 22 [31.4%]). iRF identified a group of potential predictor candidates for the total trial sessions attended; then, a Poisson regression model assessed the association between the iRF-identified factors and attendance. As a validation set, a parallel regression of significant predictors was conducted on a second, independent RCT for PTSD + SUD (n = 60; women = 48 [80%]). Results: Two testable hypotheses were derived from iRF's variable importance measures. Faster within-treatment improvement of PTSD symptoms was associated with greater session attendance, with age moderating this relationship (p = .01): faster PTSD symptom improvement predicted fewer sessions attended among younger patients and more sessions among older patients. Full-time employment was also associated with fewer sessions attended (p = .02). In the validation set, the interaction between age and speed of PTSD improvement was significant (p = .05) and the employment association was not. Conclusions: Results demonstrate the potential of data-driven methods for identifying meaningful predictors, as well as the dynamic contribution of symptom change during treatment to understanding RCT attendance. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Affiliation(s)
- Yihong Zhao
  - The Center of Alcohol and Substance Use Studies, Rutgers University – New Brunswick
- Denise A. Hien
  - The Center of Alcohol and Substance Use Studies, Rutgers University – New Brunswick
209
Forecasting Solar Radiation. JOURNAL OF CASES ON INFORMATION TECHNOLOGY 2021. [DOI: 10.4018/jcit.296263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Renewable energy, such as solar and wind, has been increasing in popularity for over a decade. This is especially true in rural and underdeveloped areas and among urban households that desire energy independence. Renewable energy sources, such as solar, provide environmental benefits while minimizing the carbon footprint. One popular technology for capturing solar energy is the solar panel. The demand for solar panels has been on the rise due to increases in energy conversion efficiency, long-term financial advantages, and contributions to decreasing fossil fuel usage. However, solar panels need a steady supply of sunlight, which can be challenging in many situations, geographies, and environments. This paper uses multiple machine learning (ML) algorithms to predict future values of solar radiation from previously observed values and other environmental features that are measured without complex equipment. The methods are computationally efficient, so forecasting can be done on consumer premises.
210
de Abreu Fontes J, Anzanello MJ, Brito JBG, Bucco GB, Fogliatto FS, Puglia FDP. Combining wavelength importance ranking to the random forest classifier to analyze multiclass spectral data. Forensic Sci Int 2021; 328:110998. [PMID: 34551367 DOI: 10.1016/j.forsciint.2021.110998] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2020] [Revised: 09/04/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022]
Abstract
Near-infrared (NIR) spectroscopy is a type of vibrational spectroscopy widely used in different areas to characterize substances. NIR datasets comprise absorbance measurements over a range of wavelengths (λ). Because such datasets are typically noisy and correlated, their use tends to compromise the performance of several statistical techniques; one way to overcome this is to select the portions of the spectra in which wavelengths are more informative. In this paper we investigate the performance of the Random Forest (RF) classifier combined with several wavelength importance ranking approaches on the task of classifying product samples into categories, such as quality levels or authenticity. Our propositions are tested using six NIR datasets comprising two or more classes of food and pharmaceutical products, as well as illegal drugs. Our proposed classification model, an integration of the χ² ranking score and the RF classifier, substantially reduced the number of wavelengths in the dataset while increasing the classification accuracy compared with the use of complete datasets. Our propositions also performed well when compared with competing methods available in the literature.
Affiliation(s)
- Juliana de Abreu Fontes
  - Departamento de Engenharia de Produção e Transportes - Universidade Federal do Rio Grande do Sul, Av. Osvaldo Aranha, 99 - 5° andar, Porto Alegre, RS, Brazil
- Michel José Anzanello
  - Departamento de Engenharia de Produção e Transportes - Universidade Federal do Rio Grande do Sul, Av. Osvaldo Aranha, 99 - 5° andar, Porto Alegre, RS, Brazil
- João B G Brito
  - Departamento de Engenharia de Produção e Transportes - Universidade Federal do Rio Grande do Sul, Av. Osvaldo Aranha, 99 - 5° andar, Porto Alegre, RS, Brazil
- Guilherme Brandelli Bucco
  - Escola de Administração - Universidade Federal do Rio Grande do Sul, Washington Luiz, 855, Porto Alegre, RS, Brazil
- Flavio Sanson Fogliatto
  - Departamento de Engenharia de Produção e Transportes - Universidade Federal do Rio Grande do Sul, Av. Osvaldo Aranha, 99 - 5° andar, Porto Alegre, RS, Brazil
- Fábio do Prado Puglia
  - Departamento de Engenharia de Produção e Transportes - Universidade Federal do Rio Grande do Sul, Av. Osvaldo Aranha, 99 - 5° andar, Porto Alegre, RS, Brazil
211
Zhou X, Lin Q, Gui Y, Wang Z, Liu M, Lu H. Multimodal MR Images-Based Diagnosis of Early Adolescent Attention-Deficit/Hyperactivity Disorder Using Multiple Kernel Learning. Front Neurosci 2021; 15:710133. [PMID: 34594183 PMCID: PMC8477011 DOI: 10.3389/fnins.2021.710133] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/15/2021] [Accepted: 07/30/2021] [Indexed: 11/13/2022] Open
Abstract
Attention-deficit/hyperactivity disorder (ADHD) is one of the most common brain diseases among children. The current criteria for ADHD diagnosis mainly depend on behavior analysis, which is subjective and inconsistent, especially for children. The development of neuroimaging technologies, such as magnetic resonance imaging (MRI), drives the discovery of brain abnormalities in structure and function by analyzing multimodal neuroimages for computer-aided diagnosis of brain diseases. This paper proposes a multimodal machine learning framework that combines Boruta-based feature selection and Multiple Kernel Learning (MKL) to integrate the multimodal features of structural and functional MRIs and Diffusion Tensor Images (DTI) for the diagnosis of early adolescent ADHD. The rich and complementary information of the macrostructural features, microstructural properties, and functional connectivities is integrated at the kernel level, followed by a support vector machine classifier for discriminating children with ADHD from healthy children. Our experiments were conducted on comorbidity-free ADHD subjects and covariate-matched healthy children aged 9-10 chosen from the Adolescent Brain and Cognitive Development (ABCD) study. This paper is the first work to combine structural and functional MRIs with DTI for early adolescents of the ABCD study. The results indicate that the kernel-level fusion of multimodal features achieves an AUC (area under the receiver operating characteristic curve) of 0.698 and a classification accuracy of 64.3% for ADHD diagnosis, showing a significant improvement over early feature fusion and unimodal features. The abnormal functional connectivity predictors, involving the default mode network, attention network, auditory network, sensorimotor mouth network, thalamus, and cerebellum, as well as the anatomical regions in the basal ganglia, are found to encode the most discriminative information, which works together with macrostructure and diffusion alterations to boost the performance of disorder diagnosis.
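Kernel-level fusion, as described in the abstract, combines one kernel (Gram) matrix per modality before classification. The sketch below shows only the combination step as a weighted sum of linear kernels. It is an assumption-laden illustration, not the authors' pipeline: the feature values, the two modality names, and the fixed weights are hypothetical, whereas an actual MKL method would learn the weights from data.

```python
def linear_kernel(rows):
    """Gram matrix K[i][j] = dot(rows[i], rows[j]) for one modality's feature rows."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in rows] for xi in rows]


def fuse_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices.

    Assumption: weights are non-negative, so the fused matrix stays a valid kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical features for three subjects in two modalities.
structural = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
functional = [[0.5], [1.0], [1.5]]

# Fixed weights stand in for the weights an MKL solver would learn.
fused = fuse_kernels([linear_kernel(structural), linear_kernel(functional)], [0.6, 0.4])
for row in fused:
    print(row)
```

A fused matrix of this kind is what a kernel classifier consumes in place of raw features; for example, a support vector machine configured for a precomputed kernel could be trained on it.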
Affiliation(s)
- Xiaocheng Zhou
  - Shanghai Jiao Tong University-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
  - Department of Bioinformatics and Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qingmin Lin
  - Department of Bioinformatics and Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Department of Developmental and Behavioral Pediatrics, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yuanyuan Gui
  - Shanghai Jiao Tong University-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
  - Department of Bioinformatics and Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Zixin Wang
  - Shanghai Jiao Tong University-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
  - Department of Bioinformatics and Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Manhua Liu
  - MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
  - Department of Instrument Science and Engineering, School of EIEE, Shanghai Jiao Tong University, Shanghai, China
- Hui Lu
  - Shanghai Jiao Tong University-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
  - Department of Bioinformatics and Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Center for Biomedical Informatics, Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai Children's Hospital, Shanghai, China
212
Huang Y, Wei L, Hu Y, Shao N, Lin Y, He S, Shi H, Zhang X, Lin Y. Multi-Parametric MRI-Based Radiomics Models for Predicting Molecular Subtype and Androgen Receptor Expression in Breast Cancer. Front Oncol 2021; 11:706733. [PMID: 34490107 PMCID: PMC8416497 DOI: 10.3389/fonc.2021.706733] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/07/2021] [Accepted: 07/28/2021] [Indexed: 12/30/2022] Open
Abstract
Objective: To investigate whether radiomics features extracted from multi-parametric MRI, combined with a machine learning approach, can predict the molecular subtype and androgen receptor (AR) expression of breast cancer in a non-invasive way. Materials and Methods: Patients diagnosed with clinical T2–4 stage breast cancer from March 2016 to July 2020 were retrospectively enrolled. The molecular subtypes and AR expression in pre-treatment biopsy specimens were assessed. A total of 4,198 radiomics features were extracted from the pre-biopsy multi-parametric MRI (including dynamic contrast-enhanced T1-weighted images, fat-suppressed T2-weighted images, and the apparent diffusion coefficient map) of each patient. We applied several feature selection strategies, including the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE), maximum relevance minimum redundancy (mRMR), Boruta, and Pearson correlation analysis, to select the optimal features. We then built 120 diagnostic models using distinct classification algorithms and feature sets divided by MRI sequences and selection strategies to predict the molecular subtype and AR expression of breast cancer in the testing dataset of leave-one-out cross-validation (LOOCV). The performances of binary classification models were assessed via the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The performances of multiclass classification models were assessed via AUC, overall accuracy, precision, recall rate, and F1-score. Results: A total of 162 patients (mean age, 46.91 ± 10.08 years) were enrolled in this study; 30 had low AR expression and 132 had high AR expression. HR+/HER2− cancers were diagnosed in 56 cases (34.6%), HER2+ cancers in 81 cases (50.0%), and TNBC in 25 patients (15.4%). There was no significant difference in clinicopathologic characteristics between the low-AR and high-AR groups (P > 0.05), except for menopausal status, ER, PR, HER2, and Ki-67 index (P = 0.043, P < 0.001, P < 0.001, P = 0.015, and P = 0.006, respectively). No significant difference in clinicopathologic characteristics was observed among the three molecular subtypes except for AR status and Ki-67 (P < 0.001 and P = 0.012, respectively). The Multilayer Perceptron (MLP) showed the best performance in discriminating AR expression, with an AUC of 0.907 and an accuracy of 85.8% in the testing dataset. The highest performances were also obtained using MLP for discriminating TNBC vs. non-TNBC (AUC: 0.965, accuracy: 92.6%), HER2+ vs. HER2− (AUC: 0.840, accuracy: 79.0%), and HR+/HER2− vs. others (AUC: 0.860, accuracy: 82.1%). The micro-AUC of the MLP multiclass classification model was 0.896, and the overall accuracy was 0.735. Conclusions: Multi-parametric MRI-based radiomics combined with machine learning approaches provides a promising method to predict the molecular subtype and AR expression of breast cancer non-invasively.
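The binary-classification metrics named in the abstract (accuracy, sensitivity, specificity, PPV, NPV) all derive from the four cells of a confusion matrix. The sketch below uses their standard definitions; the function name and the example counts are hypothetical and are not taken from the study, and it assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives. Assumes no denominator is zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall for the positive class
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced groups such as the 30 low-AR versus 132 high-AR patients reported above, accuracy alone can be misleading, which is one reason to report sensitivity, specificity, PPV, and NPV alongside it.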
Affiliation(s)
- Yuhong Huang
  - Breast Disease Center, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Lihong Wei
  - Department of Pathology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Yalan Hu
  - Breast Disease Center, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Nan Shao
  - Breast Disease Center, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Yingyu Lin
  - Department of Radiology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Shaofu He
  - Department of Radiology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Huijuan Shi
  - Department of Pathology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Xiaoling Zhang
  - Department of Radiology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Ying Lin
  - Breast Disease Center, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
213
Gu W, Kim M, Wang L, Yang Z, Nakajima T, Tsushima Y. Multi-omics Analysis of Ferroptosis Regulation Patterns and Characterization of Tumor Microenvironment in Patients with Oral Squamous Cell Carcinoma. Int J Biol Sci 2021; 17:3476-3492. [PMID: 34512160 PMCID: PMC8416738 DOI: 10.7150/ijbs.61441] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/10/2021] [Accepted: 07/20/2021] [Indexed: 02/06/2023] Open
Abstract
Ferroptosis is a newly recognized mechanism of regulated cell death. It has been reported to be highly associated with immune therapy and chemotherapy. However, its mechanism of regulation in the tumor microenvironment (TME) and its influence on oral squamous cell carcinoma (OSCC) therapy are unknown. We identified a ferroptosis-specific gene-expression signature, the FPscore, developed with a principal component analysis (PCA) algorithm to evaluate the ferroptosis regulation patterns of individual tumors. Multi-omics analysis of ferroptosis regulation patterns was conducted. Three distinct ferroptosis regulation subtypes, which were linked to the outcomes and clinical relevance of each patient, were established. A high FPscore in patients with OSCC was associated with a favorable prognosis, a ferroptosis-related immune-activation phenotype, and potential sensitivity to chemotherapy and immunotherapy. Importantly, a high FPscore correlated with a low gene copy number burden and high expression of immune checkpoints. We validated the prognostic value of the FPscore using independent immunotherapy and pan-cancer cohorts. Comprehensive evaluation of individual tumors with distinct ferroptosis regulation patterns provides new mechanistic insights, which may be clinically relevant for the application of combination therapies in OSCC.
Affiliation(s)
- Wenchao Gu
  - Department of Diagnostic Radiology and Nuclear Medicine, Gunma University Graduate School of Medicine, Maebashi, Japan
- Mai Kim
  - Department of Oral and Maxillofacial Surgery, and Plastic Surgery, Gunma University Graduate School of Medicine, Maebashi, Japan
- Lei Wang
  - Department of Pathology, Fudan University Shanghai Cancer Center, Shanghai, 200032, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Zongcheng Yang
  - Department of Implantology, School and Hospital of Stomatology, Cheeloo College of Medicine, Shandong University & Shandong Key Laboratory of Oral Tissue Regeneration & Shandong Engineering Laboratory for Dental Materials and Oral Tissue Regeneration, Jinan, Shandong, People's Republic of China
- Takahito Nakajima
  - Department of Diagnostic and Interventional Radiology, University of Tsukuba, Ibaraki, Japan
- Yoshito Tsushima
  - Department of Diagnostic Radiology and Nuclear Medicine, Gunma University Graduate School of Medicine, Maebashi, Japan
214
|
Narváez-Villa P, Arenas-Ramírez B, Mira J, Aparicio-Izquierdo F. Analysis and Prediction of Vehicle Kilometers Traveled: A Case Study in Spain. International Journal of Environmental Research and Public Health 2021; 18:ijerph18168327. [PMID: 34444076 PMCID: PMC8391987 DOI: 10.3390/ijerph18168327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/18/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 11/16/2022]
Abstract
Knowledge of the kilometers traveled by vehicles is essential in transport and road safety studies as an indicator of exposure and mobility. Its application in the determination of user risk indices in a disaggregated manner is of great interest to the scientific community and the authorities in charge of ensuring road safety on highways. This study used a sample of the data recorded during passenger vehicle inspections at Vehicle Technical Inspection stations and housed in a data warehouse managed by the General Directorate for Traffic of Spain. This study has three notable characteristics: (1) a novel data source is explored, (2) the methodology developed applies to other types of vehicles, with the level of disaggregation the data allows, and (3) pattern extraction and the estimate of mobility contribute to the continuous and necessary improvement of road safety indicators and are aligned with goal 3 (Good Health and Well-Being: Target 3.6) of The United Nations Sustainable Development Goals of the 2030 Agenda. An Operational Data Warehouse was created from the sample received, which helped in obtaining inference values for the kilometers traveled by Spanish fleet vehicles with a level of disaggregation that, to the knowledge of the authors, was unreachable with advanced statistical models. Three machine learning methods, CART, random forest, and gradient boosting, were optimized and compared based on the performance metrics of the models. The three methods identified the age, engine size, and tare weight of passenger vehicles as the factors with greatest influence on their travel patterns.
Affiliation(s)
- Paúl Narváez-Villa
  - University Institute for Automobile Research Francisco Aparicio Izquierdo (INSIA-UPM), Universidad Politécnica de Madrid (UPM), 28006 Madrid, Spain
  - Transportation Engineering Research Group, Universidad Politécnica Salesiana, Cuenca 010105, Ecuador
  - Corresponding author
- Blanca Arenas-Ramírez
  - University Institute for Automobile Research Francisco Aparicio Izquierdo (INSIA-UPM), Universidad Politécnica de Madrid (UPM), 28006 Madrid, Spain
- José Mira
  - Statistics Department, Escuela Técnica Superior de Ingenieros Industriales (ETSII-UPM), Universidad Politécnica de Madrid (UPM), 28006 Madrid, Spain
- Francisco Aparicio-Izquierdo
  - University Institute for Automobile Research Francisco Aparicio Izquierdo (INSIA-UPM), Universidad Politécnica de Madrid (UPM), 28006 Madrid, Spain

215
Chen Q, Zhao Y, Liu Y, Sun Y, Yang C, Li P, Zhang L, Gao C. MSLPNet: multi-scale location perception network for dental panoramic X-ray image segmentation. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05790-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]

216
Lu M, Parel JM, Miller D. Interactions between staphylococcal enterotoxins A and D and superantigen-like proteins 1 and 5 for predicting methicillin and multidrug resistance profiles among Staphylococcus aureus ocular isolates. PLoS One 2021; 16:e0254519. [PMID: 34320020 PMCID: PMC8318242 DOI: 10.1371/journal.pone.0254519] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/31/2021] [Accepted: 06/29/2021] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Methicillin-resistant Staphylococcus aureus (MRSA) and multidrug-resistant (MDR) S. aureus strains are well recognized as posing substantial problems in treating ocular infections. S. aureus has a vast array of virulence factors, including superantigens and enterotoxins. Their interactions and ability to signal antibiotic resistance have not been explored. OBJECTIVES To predict the relationship between superantigens and methicillin and multidrug resistance among S. aureus ocular isolates. METHODS We used a DNA microarray to characterize the enterotoxin and superantigen gene profiles of 98 S. aureus isolates collected from common ocular sources. The outcomes contained phenotypic and genotypic expressions of MRSA. We also included the MDR status as an outcome, categorized as resistance to three or more drugs, including oxacillin, penicillin, erythromycin, clindamycin, moxifloxacin, tetracycline, trimethoprim-sulfamethoxazole and gentamicin. We identified gene profiles that predicted each outcome through a classification analysis utilizing Random Forest machine learning techniques. FINDINGS Our machine learning models predicted the outcomes accurately utilizing 67 enterotoxin and superantigen genes. Strong correlates predicting the genotypic expression of MRSA were enterotoxins A, D, J and R and superantigen-like proteins 1, 3, 7 and 10. Among these virulence factors, enterotoxin D and superantigen-like proteins 1, 5 and 10 were also significantly informative for predicting both MDR and MRSA in terms of phenotypic expression. Strong interactions were identified, including enterotoxin A (entA) interacting with superantigen-like protein 1 (set6-var1_11), and enterotoxin D (entD) interacting with superantigen-like protein 5 (ssl05/set3_probe 1): MRSA and MDR S. aureus are associated with the presence of both entA and set6-var1_11, or both entD and ssl05/set3_probe 1, while the absence of these genes in pairs indicates non-multidrug-resistant and methicillin-susceptible S. aureus. CONCLUSIONS MRSA and MDR S. aureus show a different spectrum of ocular pathology than their non-resistant counterparts. When assessing the role of enterotoxins in predicting antibiotic resistance, it is critical to consider both main effects and interactions.
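The MDR outcome described in this abstract (resistance to three or more of the eight listed drugs) can be illustrated with a minimal Python sketch. Only the drug panel and the three-drug threshold come from the abstract; the function name `is_mdr`, the constant names, the input format, and the example isolates are hypothetical and are not taken from the paper.

```python
# Drug panel as listed in the abstract.
DRUG_PANEL = frozenset({
    "oxacillin", "penicillin", "erythromycin", "clindamycin",
    "moxifloxacin", "tetracycline", "trimethoprim-sulfamethoxazole",
    "gentamicin",
})

# Threshold stated in the abstract: resistance to three or more drugs.
MDR_THRESHOLD = 3


def is_mdr(resistant_to):
    """Return True if an isolate is resistant to >= 3 drugs in the panel.

    `resistant_to` is an iterable of drug names (hypothetical input format).
    Names are compared case-insensitively; duplicates and names outside the
    panel are ignored.
    """
    hits = DRUG_PANEL.intersection(name.lower() for name in resistant_to)
    return len(hits) >= MDR_THRESHOLD


# Made-up example isolates, for illustration only.
example_a = ["Oxacillin", "Penicillin", "Erythromycin"]
example_b = ["Penicillin", "Vancomycin"]
print(is_mdr(example_a), is_mdr(example_b))
```

Counting distinct panel drugs through a set intersection means a repeated drug name cannot push an isolate over the threshold.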
Affiliation(s)
- Min Lu
  - Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States of America
- Jean-Marie Parel
  - Department of Ophthalmology, Ophthalmic Biophysics Center, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL, United States of America
- Darlene Miller
  - Department of Ophthalmology, Ocular Microbiology Laboratory, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL, United States of America
  - Corresponding author

217
Biological knowledge-slanted random forest approach for the classification of calcified aortic valve stenosis. BioData Min 2021; 14:35. [PMID: 34301292 PMCID: PMC8305490 DOI: 10.1186/s13040-021-00269-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/02/2021] [Accepted: 07/18/2021] [Indexed: 11/29/2022] Open
Abstract
Background Calcific aortic valve stenosis (CAVS) is a fatal disease and there is no pharmacological treatment to prevent the progression of CAVS. This study aims to identify genes potentially implicated with CAVS in patients with congenital bicuspid aortic valve (BAV) and tricuspid aortic valve (TAV) in comparison with patients having normal valves, using a knowledge-slanted random forest (RF). Results This study implemented a knowledge-slanted random forest (RF) using information extracted from a protein-protein interactions network to rank genes in order to modify their selection probability to draw the candidate split-variables. A total of 15,191 genes were assessed in 19 valves with CAVS (BAV, n = 10; TAV, n = 9) and 8 normal valves. The performance of the model was evaluated using accuracy, sensitivity, and specificity to discriminate cases with CAVS. A comparison with conventional RF was also performed. The performance of this proposed approach reported improved accuracy in comparison with conventional RF to classify cases separately with BAV and TAV (Slanted RF: 59.3% versus 40.7%). When patients with BAV and TAV were grouped against patients with normal valves, the addition of prior biological information was not relevant with an accuracy of 92.6%. Conclusion The knowledge-slanted RF approach reflected prior biological knowledge, leading to better precision in distinguishing between cases with BAV, TAV, and normal valves. The results of this study suggest that the integration of biological knowledge can be useful during difficult classification tasks. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00269-4.

218
Chun HJ, Coutavas E, Pine AB, Lee AI, Yu VL, Shallow MK, Giovacchini CX, Mathews AM, Stephenson B, Que LG, Lee PJ, Kraft BD. Immunofibrotic drivers of impaired lung function in postacute sequelae of SARS-CoV-2 infection. JCI Insight 2021; 6:148476. [PMID: 34111030 PMCID: PMC8410030 DOI: 10.1172/jci.insight.148476] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 02/08/2021] [Accepted: 06/09/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Individuals recovering from COVID-19 frequently experience persistent respiratory ailments, which are key elements of postacute sequelae of SARS-CoV-2 infection (PASC); however, little is known about the underlying biological factors that may direct lung recovery and the extent to which these are affected by COVID-19 severity. METHODS We performed a prospective cohort study of individuals with persistent symptoms after acute COVID-19, collecting clinical data, pulmonary function tests, and plasma samples used for multiplex profiling of inflammatory, metabolic, angiogenic, and fibrotic factors. RESULTS Sixty-one participants were enrolled across 2 academic medical centers at a median of 9 weeks (interquartile range, 6-10 weeks) after COVID-19 illness: n = 13 participants (21%) had mild COVID-19 and were not hospitalized, n = 30 participants (49%) were hospitalized but were considered noncritical, and n = 18 participants (30%) were hospitalized and in the intensive care unit (ICU). Fifty-three participants (85%) had lingering symptoms, most commonly dyspnea (69%) and cough (58%). Forced vital capacity (FVC), forced expiratory volume in 1 second (FEV1), and diffusing capacity for carbon monoxide (DLCO) declined as COVID-19 severity increased (P < 0.05) but these values did not correlate with respiratory symptoms. Partial least-squares discriminant analysis of plasma biomarker profiles clustered participants by past COVID-19 severity. Lipocalin-2 (LCN2), MMP-7, and HGF identified by our analysis were significantly higher in the ICU group (P < 0.05), inversely correlated with FVC and DLCO (P < 0.05), and were confirmed in a separate validation cohort (n = 53). CONCLUSION Subjective respiratory symptoms are common after acute COVID-19 illness but do not correlate with COVID-19 severity or pulmonary function. Host response profiles reflecting neutrophil activation (LCN2), fibrosis signaling (MMP-7), and alveolar repair (HGF) track with lung impairment and may be novel therapeutic or prognostic targets. FUNDING National Heart, Lung, and Blood Institute (K08HL130557 and R01HL142818), American Heart Association (Transformational Project Award), the DeLuca Foundation Award, a donation from Jack Levin to the Benign Hematology Program at Yale University, and Duke University.
Affiliation(s)
- Hyung J. Chun
  - Yale Cardiovascular Research Center, Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Elias Coutavas
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- Alexander B. Pine
  - Section of Hematology, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Alfred I. Lee
  - Section of Hematology, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Vanessa L. Yu
  - Yale Cardiovascular Research Center, Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Marcus K. Shallow
  - Yale Cardiovascular Research Center, Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Coral X. Giovacchini
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- Anne M. Mathews
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- Brian Stephenson
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- Loretta G. Que
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- Patty J. Lee
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- Bryan D. Kraft
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA

219
Applying random forest in a health administrative data context: a conceptual guide. Health Services and Outcomes Research Methodology 2021. [DOI: 10.1007/s10742-021-00255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

220
Morita-Sherman M, Li M, Joseph B, Yasuda C, Vegh D, De Campos BM, Alvim MKM, Louis S, Bingaman W, Najm I, Jones S, Wang X, Blümcke I, Brinkmann BH, Worrell G, Cendes F, Jehi L. Incorporation of quantitative MRI in a model to predict temporal lobe epilepsy surgery outcome. Brain Commun 2021; 3:fcab164. [PMID: 34396113 PMCID: PMC8361423 DOI: 10.1093/braincomms/fcab164] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Accepted: 06/01/2021] [Indexed: 11/23/2022] Open
Abstract
Quantitative volumetric brain MRI measurement is important in research applications, but translating it into patient care is challenging. We explore the incorporation of clinical automated quantitative MRI measurements in statistical models predicting outcomes of surgery for temporal lobe epilepsy. Four hundred and thirty-five patients with drug-resistant epilepsy who underwent temporal lobe surgery at Cleveland Clinic, Mayo Clinic and University of Campinas were studied. We obtained volumetric measurements from the pre-operative T1-weighted MRI using NeuroQuant, a Food and Drug Administration approved software package. We created sets of statistical models to predict the probability of complete seizure-freedom or an Engel score of I at the last follow-up. The cohort was randomly split into training and testing sets, with a ratio of 7:3. Model discrimination was assessed using the concordance statistic (C-statistic). We compared four sets of models and selected the one with the highest concordance index. Volumetric differences in pre-surgical MRI located predominantly in the frontocentral and temporal regions were associated with poorer outcomes. The addition of volumetric measurements to the model with clinical variables alone increased the model’s C-statistic from 0.58 to 0.70 (right-sided surgery) and from 0.61 to 0.66 (left-sided surgery) for complete seizure freedom and from 0.62 to 0.67 (right-sided surgery) and from 0.68 to 0.73 (left-sided surgery) for an Engel I outcome score. 57% of patients with extra-temporal abnormalities were seizure-free at last follow-up, compared to 68% of those with no such abnormalities (P-value = 0.02). Adding quantitative MRI data increases the performance of a model developed to predict post-operative seizure outcomes. 
The distribution of the regions of interest included in the final model supports the notion that focal epilepsies are network disorders and that subtle cortical volume loss outside the surgical site influences seizure outcome.
Affiliation(s)
- Manshi Li
  - Department of Quantitative Health Sciences, Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
- Boney Joseph
  - Department of Neurology, Mayo Clinic, Rochester, MN, USA
- Clarissa Yasuda
  - Department of Neurology, University of Campinas, Campinas, Brazil
- Deborah Vegh
  - Department of Neurology, Epilepsy Center, Cleveland Clinic, Cleveland, OH, USA
- Marina K M Alvim
  - Department of Neurology, University of Campinas, Campinas, Brazil
- Shreya Louis
  - Department of Neurology, Epilepsy Center, Cleveland Clinic, Cleveland, OH, USA
- William Bingaman
  - Department of Neurology, Epilepsy Center, Cleveland Clinic, Cleveland, OH, USA
- Imad Najm
  - Department of Neurology, Epilepsy Center, Cleveland Clinic, Cleveland, OH, USA
- Stephen Jones
  - Department of Neurology, Epilepsy Center, Cleveland Clinic, Cleveland, OH, USA
- Xiaofeng Wang
  - Department of Quantitative Health Sciences, Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
- Ingmar Blümcke
  - Department of Neuropathology, University Hospitals, Erlangen, Germany
- Fernando Cendes
  - Department of Neurology, University of Campinas, Campinas, Brazil
- Lara Jehi
  - Department of Neurology, Epilepsy Center, Cleveland Clinic, Cleveland, OH, USA

221
Wang Y, Guo H, Li S, Wang L, Song X, Zhao X. Identify risk factors and predict the postoperative risk of ESCC using ensemble learning. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]

222
Buckley SJ, Harvey RJ, Shan Z. Application of the random forest algorithm to Streptococcus pyogenes response regulator allele variation: from machine learning to evolutionary models. Sci Rep 2021; 11:12687. [PMID: 34135390 PMCID: PMC8209152 DOI: 10.1038/s41598-021-91941-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Accepted: 05/27/2021] [Indexed: 02/07/2023] Open
Abstract
Group A Streptococcus (GAS) is a globally significant bacterial pathogen. The GAS genotyping gold standard characterises the nucleotide variation of emm, which encodes a surface-exposed protein that is recombinogenic and under immune-based selection pressure. Within a supervised learning methodology, we tested three random forest (RF) algorithms (Guided, Ordinary, and Regularized) and 53 GAS response regulator (RR) allele types to infer six genomic traits (emm-type, emm-subtype, tissue and country of sample, clinical outcomes, and isolate invasiveness). The Guided, Ordinary, and Regularized RF classifiers inferred the emm-type with accuracies of 96.7%, 95.7%, and 95.2%, using ten, three, and four RR alleles in the feature set, respectively. Notably, we inferred the emm-type with 93.7% accuracy using only mga2 and lrp. We demonstrated a utility for inferring emm-subtype (89.9%), country (88.6%), invasiveness (84.7%), but not clinical (56.9%), or tissue (56.4%), which is consistent with the complexity of GAS pathophysiology. We identified a novel cell wall-spanning domain (SF5), and proposed evolutionary pathways depicting the 'contrariwise' and 'likewise' chimeric deletion-fusion of emm and enn. We identified an intermediate strain, which provides evidence of the time-dependent excision of mga regulon genes. Overall, our workflow advances the understanding of the GAS mga regulon and its plasticity.
Affiliation(s)
- Sean J Buckley
  - School of Health and Behavioural Sciences, University of the Sunshine Coast, Locked Bag 4, Maroochydore DC, QLD, 4558, Australia
- Robert J Harvey
  - School of Health and Behavioural Sciences, University of the Sunshine Coast, Locked Bag 4, Maroochydore DC, QLD, 4558, Australia
  - Sunshine Coast Health Institute, Birtinya, QLD, 4575, Australia
- Zack Shan
  - Thompson Institute, University of the Sunshine Coast, Birtinya, QLD, 4575, Australia

223
Speiser JL. A random forest method with feature selection for developing medical prediction models with clustered and longitudinal data. J Biomed Inform 2021; 117:103763. [PMID: 33781921 PMCID: PMC8131242 DOI: 10.1016/j.jbi.2021.103763] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 10/15/2020] [Revised: 03/03/2021] [Accepted: 03/23/2021] [Indexed: 12/22/2022]
Abstract
BACKGROUND Machine learning methodologies are gaining popularity for developing medical prediction models for datasets with a large number of predictors, particularly in the setting of clustered and longitudinal data. Binary Mixed Model (BiMM) forest is a promising machine learning algorithm which may be applied to develop prediction models for clustered and longitudinal binary outcomes. Although machine learning methods for clustered and longitudinal methods such as BiMM forest exist, feature selection has not been analyzed via data simulations. Feature selection improves the practicality and ease of use of prediction models for clinicians by reducing the burden of data collection. Thus, feature selection procedures are not only beneficial, but are often necessary for development of medical prediction models. In this study, we aim to assess feature selection within the BiMM forest setting for modeling clustered and longitudinal binary outcomes. METHODS We conducted a simulation study to compare BiMM forest with feature selection (backward elimination or stepwise selection) to standard generalized linear mixed model feature selection methods (shrinkage and backward elimination). We also evaluated feature selection methods to develop models predicting mobility disability in older adults using the Health, Aging and Body Composition Study dataset as an example utilization of the proposed methodology. RESULTS BiMM forest with backward elimination generally offered higher computational efficiency, similar or higher predictive performance (accuracy and area under the receiver operating curve), and similar or higher ability to identify correct features compared to linear methods for the different simulated scenarios. For predicting mobility disability in older adults, methods generally performed similarly in terms of accuracy, area under the receiver operating curve, and specificity; however, BiMM forest with backward elimination had the highest sensitivity. 
CONCLUSIONS This study is novel because it is the first investigation of feature selection for developing random forest prediction models for clustered and longitudinal binary outcomes. Results from the simulation study reveal that BiMM forest with backward elimination has the highest accuracy (performance and identification of correct features) and lowest computation time compared to other feature selection methods in some scenarios and similar performance in other scenarios. Many informatics datasets have clustered and longitudinal outcomes and results from this study suggest that BiMM forest with backward elimination may be beneficial for developing medical prediction models.
Affiliation(s)
- Jaime Lynn Speiser
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA

224
Ellis CJ, Eaton S. Microclimates hold the key to spatial forest planning under climate change: Cyanolichens in temperate rainforest. Global Change Biology 2021; 27:1915-1926. [PMID: 33421251 DOI: 10.1111/gcb.15514] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2020] [Revised: 12/23/2020] [Accepted: 12/27/2020] [Indexed: 06/12/2023]
Abstract
There is deepening interest in how microclimatic refugia can reduce species threat, if suitable climatic conditions are maintained locally, despite global climate change. Microclimates are a particularly important consideration in topographically heterogeneous landscapes, while in some habitats, such as forests and woodlands, microclimates are also extremely labile and affected by management practices that could consequently be used to offset climate change impact. This study explored a conservation priority guild (cyanolichen epiphytes in temperate rainforest), quantifying the niche response to macroclimate, and landscape or woodland stand structures that determine the microclimate. Based on an epiphyte survey in a core region of European temperate rainforest (western Scotland), a 'random forest' machine-learning model confirmed a strong cyanolichen response to summer dryness, as well as the effects of distance to running water, topographic heatload and tree species identity, which modify the local moisture regime and/or lichen growth rates. By quantifying this response to macroclimate, landscape and stand structures, it was possible to estimate the extent to which woodland may be expanded in the future, to offset a negative effect of increasing summer dryness projected through to the 2080s. Using current policy as a yardstick, sufficient woodland expansion could be delivered relatively quickly for median impacted sites, but with times to woodland delivery extending over 10, 20 and 25 years for sites at the 75th, 90th and 95th percentiles of cyanolichen decline. Furthermore, the extent of new woodland required, and delivery times, increase almost threefold on average, as new woodland becomes distributed over wider riparian zones. These contrasting implications emphasize an urgent need for afforestation that achieves targeted spatial planning responsive to microclimates as refugia.
Affiliation(s)
- Sally Eaton
  - Royal Botanic Garden Edinburgh, Edinburgh, UK

225
Alcalá-Rmz V, Galván-Tejada CE, García-Hernández A, Valladares-Salgado A, Cruz M, Galván-Tejada JI, Celaya-Padilla JM, Luna-Garcia H, Gamboa-Rosales H. Identification of People with Diabetes Treatment through Lipids Profile Using Machine Learning Algorithms. Healthcare (Basel) 2021; 9:422. [PMID: 33917300 PMCID: PMC8067355 DOI: 10.3390/healthcare9040422] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/31/2020] [Revised: 03/02/2021] [Accepted: 03/08/2021] [Indexed: 11/16/2022] Open
Abstract
Diabetes incidence is a growing problem: according to the World Health Organization and the International Diabetes Federation, the number of people with this disease is increasing very fast all over the world. Diabetes treatment is important to prevent the development of several complications, and lipid profile monitoring is also important. For that reason, the aim of this work is the implementation of machine learning algorithms that are able to classify cases, which correspond to patients diagnosed with diabetes who receive diabetes treatment, and controls, which refer to subjects who do not receive diabetes treatment (although some of them have diabetes), based on lipid profile levels. Logistic regression, K-nearest neighbor, decision trees and random forest were implemented, and all of them were evaluated with accuracy, sensitivity, specificity and AUC-ROC curve metrics. The artificial neural network obtained an accuracy of 0.685 and an AUC value of 0.750; logistic regression achieved an accuracy of 0.729 and an AUC value of 0.795; K-nearest neighbor reached an accuracy of 0.669 and an AUC value of 0.709; the decision tree reached an accuracy of 0.691 and an AUC value of 0.683; and random forest achieved an accuracy of 0.704 and an AUC value of 0.776. The performance of all models was statistically significant, but the best-performing model for this problem was logistic regression.
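The comparison reported in this abstract can be checked by tabulating the quoted figures. The following is a minimal Python sketch, assuming the accuracy and AUC values are exactly those quoted in the abstract; the dictionary layout and the helper name `rank_models` are illustrative only and are not from the paper.

```python
# Accuracy and AUC values as quoted in the abstract.
reported = {
    "artificial neural network": {"accuracy": 0.685, "auc": 0.750},
    "logistic regression": {"accuracy": 0.729, "auc": 0.795},
    "K-nearest neighbor": {"accuracy": 0.669, "auc": 0.709},
    "decision tree": {"accuracy": 0.691, "auc": 0.683},
    "random forest": {"accuracy": 0.704, "auc": 0.776},
}


def rank_models(metrics, key):
    """Return model names sorted from best to worst on the chosen metric."""
    return sorted(metrics, key=lambda name: metrics[name][key], reverse=True)


by_auc = rank_models(reported, "auc")
by_accuracy = rank_models(reported, "accuracy")
print(by_auc[0], "|", by_accuracy[0])
```

On both metrics the top-ranked entry is logistic regression, which agrees with the abstract's closing statement; the two rankings differ only in the order of the lower-ranked models.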
Affiliation(s)
- Vanessa Alcalá-Rmz
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Carlos E. Galván-Tejada
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Alejandra García-Hernández
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Adan Valladares-Salgado
  - Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico
- Miguel Cruz
  - Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico
- Jorge I. Galván-Tejada
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Jose M. Celaya-Padilla
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Huizilopoztli Luna-Garcia
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Hamurabi Gamboa-Rosales
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico

226
Kim YJ, Jeon JS, Cho SE, Kim KG, Kang SG. Prediction Models for Obstructive Sleep Apnea in Korean Adults Using Machine Learning Techniques. Diagnostics (Basel) 2021; 11:diagnostics11040612. [PMID: 33808100 PMCID: PMC8066462 DOI: 10.3390/diagnostics11040612] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/11/2021] [Revised: 03/24/2021] [Accepted: 03/26/2021] [Indexed: 12/01/2022] Open
Abstract
This study aimed to investigate the applicability of machine learning to predict obstructive sleep apnea (OSA) among individuals with suspected OSA in South Korea. A total of 92 clinical variables for OSA were collected from 279 South Koreans (OSA, n = 213; no OSA, n = 66), from which seven major clinical indices were selected. The data were randomly divided into training data (OSA, n = 149; no OSA, n = 46) and test data (OSA, n = 64; no OSA, n = 20). Using the seven clinical indices, the OSA prediction models were trained using four types of machine learning models—logistic regression, support vector machine (SVM), random forest, and XGBoost (XGB)—and each model was validated using the test data. In the validation, the SVM showed the best OSA prediction result with a sensitivity, specificity, and area under curve (AUC) of 80.33%, 86.96%, and 0.87, respectively, while the XGB showed the lowest OSA prediction performance with a sensitivity, specificity, and AUC of 78.69%, 73.91%, and 0.80, respectively. The machine learning algorithms showed high OSA prediction performance using data from South Koreans with suspected OSA. Hence, machine learning will be helpful in clinical applications for OSA prediction in the Korean population.
Affiliation(s)
- Young Jae Kim
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
| | - Ji Soo Jeon
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
| | - Seo-Eun Cho
- Department of Psychiatry, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea;
| | - Kwang Gi Kim
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
- Correspondence: (K.G.K.); (S.-G.K.); Tel.: +82-32-458-2818 (S.-G.K.)
| | - Seung-Gul Kang
- Department of Psychiatry, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea;
- Correspondence: (K.G.K.); (S.-G.K.); Tel.: +82-32-458-2818 (S.-G.K.)
| |
|
227
|
Kalina J, Neoral A, Vidnerová P. Effective Automatic Method Selection for Nonlinear Regression Modeling. Int J Neural Syst 2021; 31:2150020. [PMID: 33787471 DOI: 10.1142/s0129065721500209] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Metalearning, an important part of artificial intelligence, represents a promising approach for the task of automatic selection of appropriate methods or algorithms. This paper is interested in recommending a suitable estimator for nonlinear regression modeling, particularly in recommending either the standard nonlinear least squares estimator or one of such available alternative estimators, which is highly robust with respect to the presence of outliers in the data. The authors hold the opinion that theoretical considerations will never be able to formulate such recommendations for the nonlinear regression context. Instead, metalearning is explored here as an original approach suitable for this task. In this paper, four different approaches for automatic method selection for nonlinear regression are proposed and computations over a training database of 643 real publicly available datasets are performed. Particularly, while the metalearning results may be harmed by the imbalanced number of groups, an effective approach yields much improved results, performing a novel combination of supervised feature selection by random forest and oversampling by synthetic minority oversampling technique (SMOTE). As a by-product, the computations bring arguments in favor of the very recent nonlinear least weighted squares estimator, which turns out to outperform other (and much more renowned) estimators in a quite large percentage of datasets.
Affiliation(s)
- Jan Kalina
- The Czech Academy of Sciences, Institute of Computer Science, Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic; Charles University, Faculty of Mathematics and Physics, Sokolovská 83, 186 75 Prague 8, Czech Republic
| | - Aleš Neoral
- The Czech Academy of Sciences, Institute of Computer Science, Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic
| | - Petra Vidnerová
- The Czech Academy of Sciences, Institute of Computer Science, Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic
| |
|
228
|
Buenafe RJQ, Kumanduri V, Sreenivasulu N. Deploying viscosity and starch polymer properties to predict cooking and eating quality models: A novel breeding tool to predict texture. Carbohydr Polym 2021; 260:117766. [PMID: 33712124 PMCID: PMC7973724 DOI: 10.1016/j.carbpol.2021.117766] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2020] [Revised: 01/30/2021] [Accepted: 02/02/2021] [Indexed: 12/15/2022]
Abstract
Multivariate analysis was used to develop twelve cooking and eating quality classes. A two-layered random forest model was used to predict rice classification. High classification accuracy of cooking and eating quality ideotypes was obtained. Mismatches between IRRI-released and consumer-preferred lines were captured by the model.
Acceptance of new rice genotypes demanded by the rice value chain depends on the premium value of varieties that match consumer demands and regional preferences. High-throughput prediction tools are not available to breeders to classify cooking and eating quality (CEQ) ideotypes and to capture the texture of varieties. The pasting properties in combination with starch properties were used to develop two-layered models in order to classify the rice varieties into twelve distinct CEQ ideotypes with unique sensory profiles. Classification models developed using the random forest method achieved an overall accuracy of 96%. These CEQ models were found to be robust in predicting ideotypes in both Indica and Japonica diversity panels grown under dry and wet seasons and across years. We conducted random forest modeling using 1.8 million high-density SNPs and identified the top 1000 SNP features, which explained CEQ model classification with an accuracy of 0.81. Furthermore, these CEQ models were found to be valuable in predicting textural preferences of IRRI breeding lines released during 1960–2013 and mega varieties preferred in South and South East Asia.
Affiliation(s)
- Reuben James Q Buenafe
- Grain Quality and Nutrition Center, International Rice Research Institute, Los Baños, Laguna, 4031, Philippines; School of Chemical, Biological, Materials Engineering and Sciences, Mapua University, Muralla St., Intramuros, Manila, 1002, Philippines.
| | | | - Nese Sreenivasulu
- Grain Quality and Nutrition Center, International Rice Research Institute, Los Baños, Laguna, 4031, Philippines.
| |
|
229
|
Maeda-Gutiérrez V, Galván-Tejada CE, Cruz M, Valladares-Salgado A, Galván-Tejada JI, Gamboa-Rosales H, García-Hernández A, Luna-García H, Gonzalez-Curiel I, Martínez-Acuña M. Distal Symmetric Polyneuropathy Identification in Type 2 Diabetes Subjects: A Random Forest Approach. Healthcare (Basel) 2021; 9:138. [PMID: 33535510 PMCID: PMC7912731 DOI: 10.3390/healthcare9020138] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Revised: 01/23/2021] [Accepted: 01/25/2021] [Indexed: 12/05/2022] Open
Abstract
The prevalence of diabetes mellitus is increasing worldwide, with health and economic implications. One of the principal microvascular complications of type 2 diabetes is Distal Symmetric Polyneuropathy (DSPN), affecting 42.6% of the population in Mexico. Therefore, the purpose of this study was to identify the predictors of this complication. The dataset contained a total of 140 subjects, including clinical and paraclinical features. A multivariate analysis was constructed using Boruta as a feature selection method and Random Forest as a classification algorithm, applying K-Fold Cross Validation and Leave One Out Cross Validation. Then, the models were evaluated through a statistical analysis based on sensitivity, specificity, area under the curve (AUC) and the receiver operating characteristic (ROC) curve. The model obtained with this approach reached an AUC of 67% with only three features as predictors. It is possible to conclude that the proposed methodology can classify patients with DSPN, yielding a preliminary computer-aided diagnosis tool to help identify DSPN in the clinical area.
Affiliation(s)
- Valeria Maeda-Gutiérrez
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, 98000 Zacatecas, Zac, Mexico; (V.M.-G.); (J.I.G.-T.); (H.G.-R.); (A.G.-H.); (H.L.-G.)
| | - Carlos E. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, 98000 Zacatecas, Zac, Mexico; (V.M.-G.); (J.I.G.-T.); (H.G.-R.); (A.G.-H.); (H.L.-G.)
| | - Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI. Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico; (M.C.); (A.V.-S.)
| | - Adan Valladares-Salgado
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI. Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico; (M.C.); (A.V.-S.)
| | - Jorge I. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, 98000 Zacatecas, Zac, Mexico; (V.M.-G.); (J.I.G.-T.); (H.G.-R.); (A.G.-H.); (H.L.-G.)
| | - Hamurabi Gamboa-Rosales
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, 98000 Zacatecas, Zac, Mexico; (V.M.-G.); (J.I.G.-T.); (H.G.-R.); (A.G.-H.); (H.L.-G.)
| | - Alejandra García-Hernández
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, 98000 Zacatecas, Zac, Mexico; (V.M.-G.); (J.I.G.-T.); (H.G.-R.); (A.G.-H.); (H.L.-G.)
| | - Huizilopoztli Luna-García
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, 98000 Zacatecas, Zac, Mexico; (V.M.-G.); (J.I.G.-T.); (H.G.-R.); (A.G.-H.); (H.L.-G.)
| | - Irma Gonzalez-Curiel
- Unidad Académica de Ciencias Químicas, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, Zacatecas 98000, Mexico; (I.G.-C.); (M.M.-A.)
| | - Mónica Martínez-Acuña
- Unidad Académica de Ciencias Químicas, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, Zacatecas 98000, Mexico; (I.G.-C.); (M.M.-A.)
| |
|
230
|
Continual learning classification method with constant-sized memory cells based on the artificial immune system. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106673] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
231
|
Rahman T, Khandakar A, Hoque ME, Ibtehaz N, Kashem SB, Masud R, Shampa L, Hasan MM, Islam MT, Al-Maadeed S, Zughaier SM, Badran S, Doi SAR, Chowdhury MEH. Development and Validation of an Early Scoring System for Prediction of Disease Severity in COVID-19 Using Complete Blood Count Parameters. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2021; 9:120422-120441. [PMID: 34786318 PMCID: PMC8545188 DOI: 10.1109/access.2021.3105321] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2021] [Accepted: 08/07/2021] [Indexed: 05/08/2023]
Abstract
The coronavirus disease 2019 (COVID-19), after its outbreak in Wuhan, spread increasingly throughout the world. Fast, reliable, and easily accessible clinical assessment of the severity of the disease can help in allocating and prioritizing resources to reduce mortality. The objective of the study was to develop and validate an early scoring tool to stratify the risk of death using readily available complete blood count (CBC) biomarkers. A retrospective study was conducted on twenty-three CBC blood biomarkers for predicting disease mortality for 375 COVID-19 patients admitted to Tongji Hospital, China from January 10 to February 18, 2020. Key biomarkers among the CBC parameters were identified as mortality predictors using machine learning. A multivariate logistic regression-based nomogram and a scoring system were developed to categorize the patients into three risk groups (low, moderate, and high) for predicting the mortality risk among COVID-19 patients. Lymphocyte count, neutrophil count, age, white blood cell count, monocytes (%), platelet count, and red blood cell distribution width, collected at hospital admission, were selected as important biomarkers for death prediction using a random forest feature selection technique. A CBC score was devised for calculating the death probability of the patients and was used to categorize the patients into three sub-risk groups: low (<=5%), moderate (>5% and <=50%), and high (>50%). The areas under the curve (AUC) of the model for the development and internal validation cohorts were 0.961 and 0.88, respectively. The proposed model was further validated with an external cohort of 103 patients from Dhaka Medical College, Bangladesh, yielding an AUC of 0.963. The proposed CBC parameter-based prognostic model and the associated web application can help medical doctors improve management through early prediction of the mortality risk of COVID-19 patients in low-resource countries.
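The three-way stratification described in this abstract can be sketched as a threshold function. This is a minimal illustration, assuming a death probability has already been produced by some model; the function name `risk_group` is hypothetical and is not taken from the paper or its web application.

```python
def risk_group(death_probability: float) -> str:
    """Map a predicted death probability (0.0 to 1.0) to a risk group.

    Thresholds follow the abstract: low (<=5%), moderate (>5% and <=50%),
    high (>50%). The function name is a hypothetical choice for illustration.
    """
    if not 0.0 <= death_probability <= 1.0:
        raise ValueError("death_probability must be between 0 and 1")
    if death_probability <= 0.05:
        return "low"
    if death_probability <= 0.50:
        return "moderate"
    return "high"
```

The boundaries are inclusive on the lower group, so a probability of exactly 5% falls in the low group and exactly 50% in the moderate group, matching the inequalities quoted in the abstract.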
Affiliation(s)
- Tawsifur Rahman
- Department of Electrical Engineering, Qatar University, Doha, Qatar
| | - Amith Khandakar
- Department of Electrical Engineering, Qatar University, Doha, Qatar
| | - Md Enamul Hoque
- Department of Biomedical Engineering, Military Institute of Science and Technology, Dhaka 1216, Bangladesh
| | - Nabil Ibtehaz
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - Saad Bin Kashem
- Faculty of Robotics and Advanced Computing, Qatar Armed Forces-Academic Bridge Program, Qatar Foundation, Doha, Qatar
| | - Reehum Masud
- COVID Isolation Unit, United Hospitals, Ltd., Dhaka 1212, Bangladesh
| | - Lutfunnahar Shampa
- Department of Obstetrics and Gynecology, Dhaka Medical College Hospital (COVID UNIT), Dhaka 1000, Bangladesh
| | | | - Mohammad Tariqul Islam
- Department of Electrical, Electronics and Systems Engineering, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia
| | - Somaya Al-Maadeed
- Department of Computer Science and Engineering, Qatar University, Doha, Qatar
| | - Susu M Zughaier
- Department of Basic Medical Sciences, College of Medicine, QU Health, Qatar University, Doha, Qatar
| | - Saif Badran
- Department of Plastic Surgery, Hamad Medical Corporation, Doha, Qatar
- Department of Population Medicine, College of Medicine, QU Health, Qatar University, Doha, Qatar
| | - Suhail A R Doi
- Department of Population Medicine, College of Medicine, QU Health, Qatar University, Doha, Qatar
| | | |
|
232
|
Jiang F, Kutia M, Sarkissian AJ, Lin H, Long J, Sun H, Wang G. Estimating the Growing Stem Volume of Coniferous Plantations Based on Random Forest Using an Optimized Variable Selection Method. SENSORS 2020; 20:s20247248. [PMID: 33348807 PMCID: PMC7766647 DOI: 10.3390/s20247248] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Revised: 12/08/2020] [Accepted: 12/14/2020] [Indexed: 11/16/2022]
Abstract
Forest growing stem volume (GSV) reflects the richness of forest resources as well as the quality of forest ecosystems. Remote sensing technology enables robust and efficient GSV estimation as it greatly reduces the survey time and cost while facilitating periodic monitoring. Given its red edge bands and a short revisit time period, Sentinel-2 images were selected for the GSV estimation in Wangyedian forest farm, Inner Mongolia, China. The variable combination was shown to significantly affect the accuracy of the estimation model. After extracting spectral variables, texture features, and topographic factors, a stepwise random forest (SRF) method was proposed to select variable combinations and establish random forest regressions (RFR) for GSV estimation. The linear stepwise regression (LSR), Boruta, Variable Selection Using Random Forests (VSURF), and random forest (RF) methods were then used as references for comparison with the proposed SRF for selection of predictors and GSV estimation. Combined with the observed GSV data and the Sentinel-2 images, the distributions of GSV were generated by the RFR models with the variable combinations determined by the LSR, RF, Boruta, VSURF, and SRF. The results show that the texture features of Sentinel-2’s red edge bands can significantly improve the accuracy of GSV estimation. The SRF method can effectively select the optimal variable combination, and the SRF-based model results in the highest estimation accuracy with the decreases of relative root mean square error by 16.4%, 14.4%, 16.3%, and 10.6% compared with those from the LSR-, RF-, Boruta-, and VSURF-based models, respectively. The GSV distribution generated by the SRF-based model matched that of the field observations well. The results of this study are expected to provide a reference for GSV estimation of coniferous plantations.
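The comparison in this abstract is stated in terms of relative root mean square error. A minimal sketch of that metric follows, assuming the common definition (RMSE divided by the mean of the observed values, expressed as a percentage); the abstract does not spell out the formula, and the function name `relative_rmse` is hypothetical.

```python
import math


def relative_rmse(observed, predicted):
    """Relative RMSE in percent: 100 * RMSE / mean(observed).

    Assumes the common definition; the abstract does not give the formula.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Mean squared error between observed and predicted values.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean_observed = sum(observed) / n
    return 100.0 * math.sqrt(mse) / mean_observed
```

Normalizing by the observed mean makes errors comparable across stands with different typical volumes, which is why the relative form is often reported alongside the absolute RMSE.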
Affiliation(s)
- Fugen Jiang
- Research Center of Forestry Remote Sensing and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China; (F.J.); (H.L.); (J.L.); (G.W.)
- Key Laboratory of Forestry Remote Sensing Based Big Data and Ecological Security for Hunan Province, Changsha 410004, China
- Key Laboratory of National Forestry and Grassland Administration on Forest Resources Management and Monitoring in Southern Area, Changsha 410004, China
| | - Mykola Kutia
- Bangor College China, Bangor University, 498 Shaoshan Rd., Changsha 410004, China; (M.K.); (A.J.S.)
| | - Arbi J. Sarkissian
- Bangor College China, Bangor University, 498 Shaoshan Rd., Changsha 410004, China; (M.K.); (A.J.S.)
| | - Hui Lin
- Research Center of Forestry Remote Sensing and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China; (F.J.); (H.L.); (J.L.); (G.W.)
- Key Laboratory of Forestry Remote Sensing Based Big Data and Ecological Security for Hunan Province, Changsha 410004, China
- Key Laboratory of National Forestry and Grassland Administration on Forest Resources Management and Monitoring in Southern Area, Changsha 410004, China
| | - Jiangping Long
- Research Center of Forestry Remote Sensing and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China; (F.J.); (H.L.); (J.L.); (G.W.)
- Key Laboratory of Forestry Remote Sensing Based Big Data and Ecological Security for Hunan Province, Changsha 410004, China
- Key Laboratory of National Forestry and Grassland Administration on Forest Resources Management and Monitoring in Southern Area, Changsha 410004, China
| | - Hua Sun
- Research Center of Forestry Remote Sensing and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China; (F.J.); (H.L.); (J.L.); (G.W.)
- Key Laboratory of Forestry Remote Sensing Based Big Data and Ecological Security for Hunan Province, Changsha 410004, China
- Key Laboratory of National Forestry and Grassland Administration on Forest Resources Management and Monitoring in Southern Area, Changsha 410004, China
- Correspondence: ; Tel.: +86-138-758-821-84
| | - Guangxing Wang
- Research Center of Forestry Remote Sensing and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China; (F.J.); (H.L.); (J.L.); (G.W.)
- Department of Geography and Environmental Resources, Southern Illinois University, Carbondale, IL 62901, USA
| |
|
233
|
On the Influence of Reference Mahalanobis Distance Space for Quality Classification of Complex Metal Parts Using Vibrations. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10238620] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Mahalanobis distance (MD) is a well-known metric in multivariate analysis to separate groups or populations. In the context of the Mahalanobis-Taguchi system (MTS), a set of normal observations are used to obtain their MD values and construct a reference Mahalanobis distance space, for which a suitable classification threshold can then be introduced to classify new observations as normal/abnormal. Aiming at enhancing the performance of feature screening and threshold determination in MTS, the authors have recently proposed an integrated Mahalanobis classification system (IMCS) algorithm with robust classification performance. However, the reference MD space considered in either MTS or IMCS is only based on normal samples. In this paper, an investigation on the influence of the reference MD space based on a set of (i) normal samples, (ii) abnormal samples, and (iii) both normal and abnormal samples for classification is performed. The potential of using an alternative MD space is evaluated for sorting complex metallic parts, i.e., good/bad structural quality, based on their broadband vibrational spectra. Results are discussed for a sparse and imbalanced experimental case study of complex-shaped metallic turbine blades with various damage types; a rich and balanced numerical case study of dogbone-cylinders is also considered.
|
234
|
Remote Sensing of Lake Sediment Core Particle Size Using Hyperspectral Image Analysis. REMOTE SENSING 2020. [DOI: 10.3390/rs12233850] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Hyperspectral imaging has recently emerged in the geosciences as a technology that provides rapid, accurate, and high-resolution information from lake sediment cores. Here we introduce a new methodology to infer particle size distribution, an insightful proxy that tracks past changes in aquatic ecosystems and their catchments, from laboratory hyperspectral images of lake sediment cores. The proposed methodology includes data preparation, spectral preprocessing and transformation, variable selection, and model fitting. We evaluated random forest regression and other commonly used statistical methods to find the best model for particle size determination. We tested the performance of combinations of spectral transformation techniques, including absorbance, continuum removal, and first and second derivatives of the reflectance and absorbance, along with different regression models including partial least squares, multiple linear regression, principal component regression, and support vector regression, and evaluated the resulting root mean square error (RMSE), R-squared, and mean relative error (MRE). Our results show that a random forest regression model built on spectra absorbance significantly outperforms all other models. The new workflow demonstrated herein represents a much-improved method for generating inferences from hyperspectral imagery, which opens many new opportunities for advancing the study of sediment archives.
|
235
|
Glacier Mapping Based on Random Forest Algorithm: A Case Study over the Eastern Pamir. WATER 2020. [DOI: 10.3390/w12113231] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
Debris-covered glaciers are common features on the eastern Pamir and serve as prompt and important indicators of climate change. However, mapping of debris-covered glaciers in alpine regions is still challenging due to many factors including the spectral similarity between debris and the adjacent bedrock, shadows cast from mountains and clouds, and seasonal snow cover. Considering that few studies have added movement velocity features when extracting glacier boundaries, we developed an automatic algorithm consisting of rule-based image segmentation and Random Forest to extract information about debris-covered glaciers. The algorithm uses Landsat-8 OLI/TIRS data for spectral, texture and temperature features, multi-digital elevation models (DEMs) for elevation and topographic features, and the Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) for movement velocity features. An accuracy evaluation was performed to determine the optimal feature combination for extracting debris-covered glaciers. The study found that the overall accuracy of extracting debris-covered glaciers using combined movement velocity features is 97.60%, and the Kappa coefficient is 0.9624, which is better than the extraction results using other schemes. The high classification accuracy obtained using our method overcomes most of the above-mentioned challenges and can detect debris-covered glaciers, illustrating that this method can be executed efficiently, which will further help water resources management.
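Overall accuracy and the Kappa coefficient, the two figures reported in this abstract, are both derived from a confusion matrix. The sketch below assumes the standard Cohen's kappa definition and a square matrix with rows as reference classes and columns as predicted classes; the function name and the example matrix are hypothetical and unrelated to the study's data.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the count of samples of reference class i predicted as j.
    """
    n_classes = len(matrix)
    if n_classes == 0 or any(len(row) != n_classes for row in matrix):
        raise ValueError("matrix must be square and non-empty")
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix must contain at least one sample")
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n_classes)
    ) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

Kappa discounts the agreement expected by chance, so it is usually lower than overall accuracy and is more informative when class sizes are unbalanced.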
|
236
|
Amato MP, Portaccio E, De Meo E. Understanding the pathophysiology of cognitive changes in MS: A step forward. Mult Scler 2020; 27:4-5. [PMID: 33146049 DOI: 10.1177/1352458520968038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Maria Pia Amato
- Department NEUROFARBA, University of Florence, Florence, Italy/IRCCS Fondazione Don Carlo Gnocchi, Florence, Italy
| | - Emilio Portaccio
- IRCCS Fondazione Don Carlo Gnocchi, Florence, Italy/Azienda Ospedaliero-Universitaria Careggi, Florence, Italy
| | - Ermelinda De Meo
- Department NEUROFARBA, University of Florence, Florence, Italy/Neuroimaging Research Unit, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy/Vita-Salute San Raffaele University, Milan, Italy
| |
|
237
|
Bai X, Li J. The best configuration of collaborative knowledge innovation management from the perspective of artificial intelligence. KNOWLEDGE MANAGEMENT RESEARCH & PRACTICE 2020. [DOI: 10.1080/14778238.2020.1834886] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Affiliation(s)
- XuJing Bai
- School of Management, Northwestern Polytechnical University , Xi’an, China
| | - JiaJun Li
- School of Management, Northwestern Polytechnical University , Xi’an, China
| |
|
238
|
Modeling Road Accident Severity with Comparisons of Logistic Regression, Decision Tree and Random Forest. INFORMATION 2020. [DOI: 10.3390/info11050270] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
To reduce the damage caused by road accidents, researchers have applied different techniques to explore correlated factors and develop efficient prediction models. The main purpose of this study is to use one statistical and two nonparametric data mining techniques, namely, logistic regression (LR), classification and regression tree (CART), and random forest (RF), to compare their prediction capability, identify the significant variables (identified by LR) and important variables (identified by CART or RF) that are strongly correlated with road accident severity, and distinguish the variables that have significant positive influence on prediction performance. In this study, three prediction performance evaluation measures, accuracy, sensitivity and specificity, are used to find the best integrated method which consists of the most effective prediction model and the input variables that have higher positive influence on accuracy, sensitivity and specificity.
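The three evaluation measures named in this abstract can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name `binary_metrics` and the counts in the usage note are hypothetical and do not come from the study.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from binary confusion counts.

    tp/fn: positives predicted correctly/incorrectly;
    tn/fp: negatives predicted correctly/incorrectly.
    """
    if min(tp, fn, tn, fp) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must have at least one sample")
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

For example, `binary_metrics(tp=8, fn=2, tn=15, fp=5)` gives a sensitivity of 0.8 and a specificity of 0.75. Reporting all three together matters for accident severity data, where severe cases are typically the minority class and accuracy alone can look high while sensitivity is poor.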
|
239
|
An Optimized Object-Based Random Forest Algorithm for Marsh Vegetation Mapping Using High-Spatial-Resolution GF-1 and ZY-3 Data. REMOTE SENSING 2020. [DOI: 10.3390/rs12081270] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Discriminating marsh vegetation is critical for the rapid assessment and management of wetlands. The study area, Honghe National Nature Reserve (HNNR), a typical freshwater wetland, is located in Northeast China. This study optimized the parameters (mtry and ntrees) of an object-based random forest (RF) algorithm to improve the applicability of marsh vegetation classification. Multidimensional datasets were used as the input variables for model training, then variable selection was performed on the variables to eliminate redundancy, which improved classification efficiency and overall accuracy. Finally, the performance of a new generation of Chinese high-spatial-resolution Gaofen-1 (GF-1) and Ziyuan-3 (ZY-3) satellite images for marsh vegetation classification was evaluated using the improved object-based RF algorithm with accuracy assessment. The specific conclusions of this study are as follows: (1) Optimized object-based RF classifications consistently produced more than 70.26% overall accuracy for all scenarios of GF-1 and ZY-3 at the 95% confidence interval. The performance of ZY-3 imagery applied to marsh vegetation mapping is lower than that of GF-1 imagery due to the coarse spatial resolution. (2) Parameter optimization of the object-based RF algorithm effectively improved the stability and classification accuracy of the algorithm. After parameter adjustment, scenario 3 for GF-1 data had the highest classification accuracy of 84% (ZY-3 is 74.72%) at the 95% confidence interval. (3) The introduction of multidimensional datasets improved the overall accuracy of marsh vegetation mapping, but with many redundant variables. Using three variable selection algorithms to remove redundant variables from the multidimensional datasets effectively improved the classification efficiency and overall accuracy. The recursive feature elimination (RFE)-based variable selection algorithm had the best performance. 
(4) Optical spectral bands, spectral indices, mean value of green and NIR bands in textural information, DEM, TWI, compactness, max difference, and shape index are valuable variables for marsh vegetation mapping. (5) GF-1 and ZY-3 images had higher classification accuracy for forest, cropland, shrubs, and open water.
|
240
|
A Random Forest Modelling Procedure for a Multi-Sensor Assessment of Tree Species Diversity. REMOTE SENSING 2020. [DOI: 10.3390/rs12071210] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Earth observation data can provide important information for tree species diversity mapping and monitoring. The relatively recent advances in remote sensing data characteristics and processing systems elevate the potential of satellite imagery for providing accurate, timely, consistent, and robust spatially explicit estimates of tree species diversity over forest ecosystems. This study was conducted in Northern Pindos National Park, the largest terrestrial park in Greece, and aimed to assess the potential of four satellite sensors with different instrumental characteristics for the estimation of tree diversity. Through field measurements, we originally quantified two diversity indices, namely the Shannon diversity index (H’) and Simpson’s diversity (D1). Random forest regression models were developed for associating the remotely sensed spectral signal with tree species diversity within the area. The models generated from the WorldView-2 image were the most accurate, with a coefficient of determination of up to 0.44 for H’ and 0.37 for D1. The Sentinel-2-based models of tree species diversity performed slightly worse, but were better than the Landsat-8 and RapidEye models. The coefficient of variation quantifying internal variability of spectral values within each plot was of little or no use in improving the modelling accuracy. Our results suggest that very-high-spatial-resolution imagery provides the most important information for the assessment of tree species diversity in heterogeneous Mediterranean ecosystems.
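The two field-measured indices named in this abstract can be computed from per-species counts in a plot. The sketch below assumes the natural-log form of the Shannon index and the Gini-Simpson form (one minus the sum of squared proportions) for D1; the abstract does not state which variants were used, and the function name is hypothetical.

```python
import math


def diversity_indices(counts):
    """Return (Shannon H', Simpson D1) for a list of per-species counts.

    Assumes H' = -sum(p * ln p) and D1 = 1 - sum(p^2); both are assumptions,
    since the abstract does not give the exact formulas.
    """
    total = sum(counts)
    if total <= 0 or any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative with a positive total")
    # Species with zero individuals contribute nothing to either index.
    proportions = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in proportions)
    simpson = 1.0 - sum(p * p for p in proportions)
    return shannon, simpson
```

Under these assumptions, a plot with two equally abundant species has H' equal to ln 2 and D1 equal to 0.5, while a single-species plot scores zero on both.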
|
241
|
Antoniadi AM, Galvin M, Heverin M, Hardiman O, Mooney C. Prediction of caregiver burden in amyotrophic lateral sclerosis: a machine learning approach using random forests applied to a cohort study. BMJ Open 2020; 10:e033109. [PMID: 32114464 PMCID: PMC7050406 DOI: 10.1136/bmjopen-2019-033109] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/12/2019] [Revised: 02/05/2020] [Accepted: 02/07/2020] [Indexed: 12/13/2022] Open
Abstract
OBJECTIVES Amyotrophic lateral sclerosis (ALS) is a rare neurodegenerative disease that is characterised by the rapid degeneration of upper and lower motor neurons and has a fatal trajectory 3-4 years from symptom onset. Due to the nature of the condition, patients with ALS require the assistance of informal caregivers, whose task is demanding and can lead to strong feelings of burden. This study aims to predict caregiver burden and identify related features using machine learning techniques. DESIGN This included demographic and socioeconomic information, quality of life, anxiety and depression questionnaires for patients and carers, resource use of patients, and clinical information. The method used for prediction was the random forest algorithm. SETTING AND PARTICIPANTS This study investigates a cohort of 90 patients and their primary caregivers at three different time-points. The patients were attending the National ALS/Motor Neuron Disease Multidisciplinary Clinic at Beaumont Hospital, Dublin. RESULTS The caregiver's quality of life and psychological distress were the most predictive features of burden (0.92 sensitivity and 0.78 specificity). The most predictive features for the Clinical Decision Support model were associated with the weekly caregiving duties of the primary caregiver, as well as their age and health, and with the patient's physical functioning and age of onset. However, this model had lower sensitivity and specificity (0.84 and 0.72, respectively). The ability of patients without gastrostomy to cut food and handle utensils was also highly predictive of burden in this study. Generally, our models are better at predicting the high-risk category, and we suggest that information related to the caregiver's quality of life and psychological distress is required. CONCLUSION This work demonstrates a proof of concept of an informatics solution for identifying caregivers at risk of burden that could be incorporated into future care pathways.
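The sensitivity and specificity figures quoted in this abstract follow the standard confusion-matrix definitions. The sketch below illustrates those definitions only; it assumes binary labels where 1 means "high burden", and the function name and example labels are hypothetical, not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = positive.

    sensitivity = TP / (TP + FN): share of true positives that are detected.
    specificity = TN / (TN + FP): share of true negatives correctly cleared.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = caregiver at high risk of burden, 0 = not.
observed = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(observed, predicted)
```

A model that is "better at predicting the high-risk category", as the abstract puts it, corresponds to sensitivity exceeding specificity under this labelling.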
Affiliation(s)
- Anna Markella Antoniadi
  - UCD School of Computer Science, University College Dublin, Dublin, Ireland
  - FutureNeuro SFI Research Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
- Miriam Galvin
  - Academic Unit of Neurology, Trinity Biomedical Sciences Institute, University of Dublin Trinity College, Dublin, Ireland
- Mark Heverin
  - Academic Unit of Neurology, Trinity Biomedical Sciences Institute, University of Dublin Trinity College, Dublin, Ireland
- Orla Hardiman
  - FutureNeuro SFI Research Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
  - Academic Unit of Neurology, Trinity Biomedical Sciences Institute, University of Dublin Trinity College, Dublin, Ireland
  - Department of Neurology, National Neuroscience Centre, Beaumont Hospital, Dublin, Ireland
- Catherine Mooney
  - UCD School of Computer Science, University College Dublin, Dublin, Ireland
  - FutureNeuro SFI Research Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
|
242
|
Predicting Microhabitat Suitability for an Endangered Small Mammal Using Sentinel-2 Data. REMOTE SENSING 2020. [DOI: 10.3390/rs12030562] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/20/2022]
Abstract
Accurate mapping is a major challenge for endangered small-sized terrestrial species. Freely available high-resolution spatio-temporal data from multispectral satellites offer excellent opportunities for improving predictive distribution models of such species based on fine-scale habitat features, thus making it easier to achieve comprehensive biodiversity conservation goals. However, there are still few examples showing the utility of remote-sensing-based products in mapping microhabitat suitability for small species of conservation concern. Here, we address this issue using Sentinel-2 sensor-derived habitat variables, in combination with more commonly used explanatory variables (e.g., topography), to predict the distribution of the endangered Cabrera vole (Microtus cabrerae) in agrosilvopastoral systems. Based on vole surveys conducted in two different seasons over a ~176,000 ha landscape in Southern Portugal, we assessed the significance of each predictor in explaining Cabrera vole occurrence using the Boruta algorithm, a novel random forest variant for dealing with high dimensionality of explanatory variables. Overall, results showed a strong contribution of Sentinel-2-derived variables to predicting microhabitat suitability for Cabrera voles. In particular, we found that photosynthetic activity (NDI45), specific spectral signal (SWIR1), and landscape heterogeneity (Rao’s Q) were good proxies of Cabrera voles’ microhabitat, mostly during temporally greener and wetter conditions. In addition to remote-sensing-based variables, the presence of road verges was also an important driver of voles’ distribution, highlighting their potential role as refuges and/or corridors. Overall, our study supports the use of remote-sensing data to predict microhabitat suitability for endangered small-sized species in marginal areas that potentially hold most of the biodiversity found in human-dominated landscapes.
We believe our approach can be widely applied to other species for which detailed habitat mapping over large spatial extents is difficult to obtain using traditional descriptors. This would help improve conservation planning and thereby support global conservation efforts in landscapes that are managed for multiple purposes.
|
243
|
Chen J, Li Q, Wang H, Deng M. A Machine Learning Ensemble Approach Based on Random Forest and Radial Basis Function Neural Network for Risk Evaluation of Regional Flood Disaster: A Case Study of the Yangtze River Delta, China. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 17:E49. [PMID: 31861677 PMCID: PMC6982166 DOI: 10.3390/ijerph17010049] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 11/12/2019] [Revised: 12/07/2019] [Accepted: 12/17/2019] [Indexed: 11/16/2022]
Abstract
The Yangtze River Delta (YRD) is one of the most developed regions in China. It is also a flood-prone area where flood disasters are frequent, and the relationships within the people-land nexus and the people-water nexus are very complicated. Therefore, the accurate assessment of flood risk is of great significance to regional development. The paper took the YRD urban agglomeration as the research case. The driving force, pressure, state, impact and response (DPSIR) conceptual framework was established to analyze the indexes of flood disasters. The random forest (RF) algorithm was used to screen important indexes of flood risk, and a risk assessment model based on the radial basis function (RBF) neural network was constructed to evaluate the flood risk level in this region from 2009 to 2018. Risk maps produced with a geographic information system (GIS) showed flood risk levels I-V in the YRD urban agglomeration from 2016 to 2018. Further analysis indicated that indexes such as flood season rainfall, urban impervious area ratio, gross domestic product (GDP) per square kilometer of land, water area ratio, population density and emergency rescue capacity of public administration departments have an important influence on flood risk. Flood risk has been increasing in the YRD urban agglomeration during the past ten years against the background of urbanization, and economic development status showed a significant positive correlation with flood risk. In addition, there were marked differences among provinces in the rate of increase of flood risk and in its current status. A few cities nevertheless stabilized at a better flood-risk level through urban flood control measures from 2016 to 2018. These results were broadly in line with the actual situation, which validated the effectiveness of the model. 
Finally, countermeasures and suggestions for reducing the urban flood risk in the YRD region were proposed, in order to provide decision support for flood control, disaster reduction and emergency management in the YRD region.
Affiliation(s)
- Junfei Chen
  - State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing 210098, China
  - Business School, Hohai University, Nanjing 211100, China
- Qian Li
  - Business School, Hohai University, Nanjing 211100, China
- Huimin Wang
  - State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing 210098, China
  - Business School, Hohai University, Nanjing 211100, China
- Menghua Deng
  - State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing 210098, China
  - Business School, Hohai University, Nanjing 211100, China
|