1
Paula DP, Camacho M, Barbosa O, Marques L, Harter Griep R, da Fonseca MJM, Barreto S, Lekadir K. Sex and population differences in the cardiometabolic continuum: a machine learning study using the UK Biobank and ELSA-Brasil cohorts. BMC Public Health 2024; 24:2131. [PMID: 39107721 PMCID: PMC11304673 DOI: 10.1186/s12889-024-19395-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 04/08/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND The temporal relationships across cardiometabolic diseases (CMDs) were recently conceptualized as the cardiometabolic continuum (CMC), a sequence of cardiovascular events that stems from gene-environment interactions, unhealthy lifestyle influences, and metabolic diseases such as diabetes and hypertension. While the physiological pathways linking metabolic and cardiovascular diseases have been investigated, sex and population differences in the CMC have not yet been described. METHODS We present a machine learning approach to model the CMC and investigate sex and population differences in two distinct cohorts: the UK Biobank (17,700 participants) and the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil) (7162 participants). We consider the following CMDs: hypertension (Hyp), diabetes (DM), heart diseases (HD: angina, myocardial infarction, or heart failure), and stroke (STK). To identify CMC patterns, individual trajectories containing the time of disease occurrence were clustered using k-means. Based on clinical, sociodemographic, and lifestyle characteristics, we built multiclass random forest classifiers and used the SHAP methodology to evaluate feature importance. RESULTS Five CMC patterns were identified across both sexes and cohorts: EarlyHyp, FirstDM, FirstHD, Healthy, and LateHyp, named according to the prevalence and time of disease occurrence, which described around 95%, 78%, 75%, 88%, and 99% of individuals, respectively. Within the UK Biobank, more women were classified in the Healthy cluster and more men in all others. In the EarlyHyp and LateHyp clusters, isolated hypertension occurred earlier among women. Smoking habits and education had high importance and clear directionality for both sexes. For ELSA-Brasil, more men were classified in the Healthy cluster and more women in the FirstDM cluster. When diabetes was followed by hypertension, diabetes occurred earlier among women.
Education and ethnicity had high importance and clear directionality for women, while for men these features were smoking, alcohol, and coffee consumption. CONCLUSIONS There are clear sex differences in the CMC that varied across the UK and Brazilian cohorts. In particular, disadvantages for women in the incidence and time to onset of diseases were more pronounced in Brazil. The results show the need to strengthen public health policies to prevent and control the time course of CMD, with an emphasis on women.
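The abstract states only that individual trajectories with times of disease occurrence were clustered using k-means. As a minimal sketch, assuming each participant is encoded as a vector of onset times (years from baseline) for Hyp, DM, HD, and STK, with a disease that never occurred set to a hypothetical follow-up length of 20 years, plain Lloyd's k-means could look like this. The encoding, the deterministic farthest-first seeding, and the toy data are illustrative assumptions, not the authors' pipeline.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_farthest_first(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = init_farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical onset times in years: [Hyp, DM, HD, STK]; 20 = not observed.
trajectories = [
    (2, 20, 20, 20), (3, 20, 20, 20), (2, 20, 19, 20),     # early hypertension
    (15, 20, 20, 20), (16, 20, 20, 20), (17, 20, 20, 20),  # late hypertension
    (10, 1, 20, 20), (12, 2, 20, 20), (11, 1, 20, 20),     # diabetes first
]
labels, centroids = kmeans(trajectories, k=3)
print(labels)  # → [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

The study used five clusters on real cohort data; the toy example uses three only so that the grouping is easy to check by eye.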
Affiliation(s)
- Daniela Polessa Paula
- National School of Statistical Sciences, Brazilian Institute of Geography and Statistics, Rio de Janeiro, Brazil.
- Institute of Mathematics and Statistics, University of the Rio de Janeiro State, Rio de Janeiro, Brazil.
- Marina Camacho
- Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
- Odaleia Barbosa
- Institute of Nutrition, University of the Rio de Janeiro State, Rio de Janeiro, Brazil
- Larissa Marques
- Coordination of Information and Communication (CINCO - PEIC), Oswaldo Cruz Foundation, Rio de Janeiro, Rio de Janeiro, Brazil
- Rosane Harter Griep
- Health and Environmental Education Laboratory, Oswaldo Cruz Institute (IOC), Rio de Janeiro, RJ, Brazil
- Sandhi Barreto
- Postgraduate Program in Public Health, School of Medicine & Clinical Hospital, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Karim Lekadir
- Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
2
Anteneh LM, Lokonon BE, Kakaï RG. Modelling techniques in cholera epidemiology: A systematic and critical review. Math Biosci 2024; 373:109210. [PMID: 38777029 DOI: 10.1016/j.mbs.2024.109210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 05/09/2024] [Accepted: 05/13/2024] [Indexed: 05/25/2024]
Abstract
Diverse modelling techniques in cholera epidemiology have been developed and used to (1) study its transmission dynamics, (2) predict and manage cholera outbreaks, and (3) assess the impact of various control and mitigation measures. In this study, we carry out a critical and systematic review of the approaches used for modelling the dynamics of cholera and discuss the strengths and weaknesses of each. A systematic search of articles was conducted in Google Scholar, PubMed, Science Direct, and Taylor & Francis. Eligible studies were those concerned with the dynamics of cholera, excluding studies focused on models of cholera transmission in animals, socio-economic factors, and genetic and molecular aspects. A total of 476 peer-reviewed articles met the inclusion criteria; about 40% of the studies were carried out in Asia and 32% in Africa. About 52%, 21%, and 9% of the studies were based on compartmental (e.g., SIRB), statistical (time series and regression), and spatial (spatiotemporal clustering) models, respectively, while the remaining studies used other approaches such as network, machine learning and artificial intelligence, Bayesian, and agent-based models. Cholera modelling studies that incorporate vector/housefly transmission of the pathogen are scarce, and only a small proportion of researchers (3.99%) consider the estimation of key epidemiological parameters. Vaccination alone was used as the control measure in more than half (58%) of the studies. Research productivity in cholera epidemiological modelling has increased in recent years, but authors have used a diverse range of models. Future models should consider incorporating vector/housefly transmission of the pathogen and focus on estimating key epidemiological parameters of cholera transmission dynamics.
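Compartmental models were the most common approach in the review. As a minimal sketch of one, the code below integrates a simplified SIRB model (susceptible, infected, recovered, and environmental bacterial concentration B) with forward Euler steps. It assumes a saturating dose-response term B/(κ + B), no births or deaths, and hypothetical parameter values chosen only for illustration; published cholera models usually add demography, vaccination, or further transmission routes.

```python
from typing import List, Tuple

def simulate_sirb(
    s0: float, i0: float, r0: float, b0: float,
    beta: float, kappa: float, gamma: float, xi: float, delta: float,
    days: float, dt: float = 0.01,
) -> List[Tuple[float, float, float, float, float]]:
    """Forward-Euler integration of a simplified SIRB cholera model:
        dS/dt = -beta * S * B / (kappa + B)
        dI/dt =  beta * S * B / (kappa + B) - gamma * I
        dR/dt =  gamma * I
        dB/dt =  xi * I - delta * B
    Returns (t, S, I, R, B) at every step. All parameters are hypothetical."""
    s, i, r, b = s0, i0, r0, b0
    trajectory = [(0.0, s, i, r, b)]
    steps = int(round(days / dt))
    for n in range(1, steps + 1):
        infection = beta * s * b / (kappa + b) * dt  # new infections from contaminated water
        recovery = gamma * i * dt
        shedding = xi * i * dt                       # bacteria shed into the water
        decay = delta * b * dt                       # bacterial die-off
        s, i, r, b = s - infection, i + infection - recovery, r + recovery, b + shedding - decay
        trajectory.append((n * dt, s, i, r, b))
    return trajectory

traj = simulate_sirb(s0=9990, i0=10, r0=0, b0=0,
                     beta=0.5, kappa=1e4, gamma=0.2, xi=10, delta=0.33, days=365)
t, s, i, r, b = traj[-1]
print(f"day {t:.0f}: S={s:.0f} I={i:.2f} R={r:.0f}")
```

Forward Euler is used only for transparency; for real analyses an adaptive ODE solver would be the more usual choice.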
Affiliation(s)
- Leul Mekonnen Anteneh
- Laboratoire de Biomathématiques et d'Estimations Forestières, University of Abomey-Calavi, Cotonou, Benin.
- Bruno Enagnon Lokonon
- Laboratoire de Biomathématiques et d'Estimations Forestières, University of Abomey-Calavi, Cotonou, Benin
- Romain Glèlè Kakaï
- Laboratoire de Biomathématiques et d'Estimations Forestières, University of Abomey-Calavi, Cotonou, Benin
3
Massago M, Massago M, Iora PH, Tavares Gurgel SJ, Conegero CI, Carolino IDR, Mushi MM, Chaves Forato GA, de Souza JVP, Hernandes Rocha TA, Bonfim S, Staton CA, Nihei OK, Vissoci JRN, de Andrade L. Applicability of machine learning algorithm to predict the therapeutic intervention success in Brazilian smokers. PLoS One 2024; 19:e0295970. [PMID: 38437221 PMCID: PMC10911606 DOI: 10.1371/journal.pone.0295970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 12/02/2023] [Indexed: 03/06/2024]
Abstract
Smoking cessation is an important public health policy worldwide. However, to our knowledge, variables related to the success of therapeutic intervention (STI) in Brazilian smokers have not been screened using machine learning (ML) algorithms. To address this gap in the literature, we evaluated the ability of eight ML algorithms to correctly predict the STI in Brazilian smokers treated at a smoking cessation program in Brazil between 2006 and 2017. The dataset comprised 12 variables, and the performance of the algorithms was measured by accuracy, sensitivity, specificity, positive predictive value (PPV), and area under the receiver operating characteristic curve. We plotted a decision tree flowchart and also measured the odds ratio (OR) between each independent variable and the outcome, as well as the variable importance for the best model based on PPV. The mean global values for the metrics described above were, respectively, 0.675±0.028, 0.803±0.078, 0.485±0.146, 0.705±0.035, and 0.680±0.033. Support vector machines were the best-performing algorithm, with a PPV of 0.726±0.031. Smoking cessation drug use was the root of the decision tree, with an OR of 4.42 and a variable importance of 100.00. An increase in the number of relapses also promoted a positive outcome, while higher cigarette consumption had the opposite effect. In summary, the best model correctly predicted 72.6% of positive outcomes. Smoking cessation drug use and a higher number of relapses contributed to quitting smoking, while higher cigarette consumption showed the opposite effect. Expanding services and drug treatment for smokers are important strategies to reduce the number of smokers and increase STI.
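The abstract reports accuracy, sensitivity, specificity, and PPV as mean ± SD. As a sketch of how such figures are typically derived, assuming one binary confusion matrix per resampling fold and the sample standard deviation across folds (the exact resampling scheme is not stated in the abstract), the helpers below compute and summarize the metrics; the fold counts are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from a 2x2 confusion matrix (positive = successful intervention)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall of the positive class
        "specificity": tn / (tn + fp),  # recall of the negative class
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
    }

def summarize(folds: List[Dict[str, float]]) -> Dict[str, str]:
    """Format each metric as mean ± sample SD across folds (needs at least two folds)."""
    return {
        name: f"{mean(f[name] for f in folds):.3f}±{stdev(f[name] for f in folds):.3f}"
        for name in folds[0]
    }

# Hypothetical confusion-matrix counts for three folds.
folds = [
    binary_metrics(tp=30, fp=10, tn=50, fn=10),
    binary_metrics(tp=28, fp=12, tn=48, fn=12),
    binary_metrics(tp=32, fp=8, tn=52, fn=8),
]
print(summarize(folds))
```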
Affiliation(s)
- Miyoko Massago
- PhD Student in the Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
- Mamoru Massago
- Master in Computer Sciences, State University of Maringa, Maringa, Parana, Brazil
- Pedro Henrique Iora
- Professor in the Morphological Sciences Department, State University of Maringa, Maringa, Parana, Brazil
- Celso Ivam Conegero
- Professor in the Department of Medicine, State University of Maringa, Maringa, Parana, Brazil
- Maria Muzanila Mushi
- Global Emergency Medicine Innovation and Implementation Research Center, Duke University School of Medicine, Duke Global Health Institute, Durham, North Carolina, United States of America
- João Vitor Perez de Souza
- Assistant Professor of Emergency Medicine and Global Health, Duke Global Health Institute, Department of Emergency Medicine, Duke University School of Medicine, Durham, North Carolina, United States of America
- Thiago Augusto Hernandes Rocha
- Assistant Professor of Emergency Medicine and Global Health, Duke Global Health Institute, Department of Emergency Medicine, Duke University School of Medicine, Durham, North Carolina, United States of America
- Samile Bonfim
- PhD Student in the Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
- Catherine Ann Staton
- Assistant Professor of Emergency Medicine and Global Health, Duke Global Health Institute, Department of Emergency Medicine, Duke University School of Medicine, Durham, North Carolina, United States of America
- Oscar Kenji Nihei
- Professor in the Center of Education, Literature and Health, Western Parana State University, Foz do Iguaçu, Parana, Brazil
- João Ricardo Nickenig Vissoci
- Assistant Professor of Emergency Medicine and Global Health, Duke Global Health Institute, Department of Emergency Medicine, Duke University School of Medicine, Durham, North Carolina, United States of America
- Luciano de Andrade
- Professor in the Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
4
Budhathoki N, Bhandari R, Bashyal S, Lee C. Predicting asthma using imbalanced data modeling techniques: Evidence from 2019 Michigan BRFSS data. PLoS One 2023; 18:e0295427. [PMID: 38060576 PMCID: PMC10703315 DOI: 10.1371/journal.pone.0295427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 11/10/2023] [Indexed: 12/18/2023]
Abstract
Previous studies have examined asthma prevalence and the associated risk factors in the United States using data from national surveys. However, their findings may not apply to specific states because environmental and socioeconomic factors vary across regions. The 2019 Behavioral Risk Factor Surveillance System (BRFSS) showed that Michigan had a higher asthma prevalence than the national average. We therefore employ various modern machine learning techniques to predict asthma and identify risk factors associated with asthma among Michigan adults using the 2019 BRFSS data. After data cleaning, a sample of 10,337 individuals was selected for analysis, of whom 1,118 (10.8%) reported having asthma during the survey period. Typical machine learning techniques often perform poorly on imbalanced data. To address this challenge, we employed two synthetic data generation techniques, Random Over-Sampling Examples (ROSE) and the Synthetic Minority Over-Sampling Technique (SMOTE), and compared their performance. Both methods improved the overall performance of the machine learning algorithms, with ROSE performing better than SMOTE. Among the ROSE-adjusted models, logistic regression, partial least squares, gradient boosting, LASSO, and elastic net had comparable performance, with sensitivity of around 50% and area under the curve (AUC) of around 63%. Because of its ease of interpretation, logistic regression was chosen for further exploration of risk factors. Presence of chronic obstructive pulmonary disease, lower income, female sex, a financial barrier to seeing a doctor due to cost, having received a flu shot/spray in the past 12 months, the 18-24 age group, the Black, non-Hispanic group, and presence of diabetes were identified as asthma risk factors.
This study demonstrates the potential of machine learning coupled with imbalanced data modeling approaches for predicting asthma from a large survey dataset. We conclude that the findings could guide early screening of patients at risk of asthma and the design of appropriate interventions to improve care practices.
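SMOTE, one of the two oversampling methods compared, creates synthetic minority-class records by interpolating between a minority example and one of its nearest minority-class neighbours. The sketch below shows only that interpolation step, assuming purely numeric features, Euclidean distance, and a fixed random seed; it is not the implementation used in the study, and ROSE (which instead draws from a smoothed bootstrap of the data) is not shown.

```python
import random
from typing import List, Sequence

def smote(minority: List[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x (excluding x itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic

# Hypothetical minority-class (asthma) records with two numeric features each.
asthma_cases = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(asthma_cases, n_new=6, k=2, seed=42)
print(new_points)
```

With k=2 on the unit square, every synthetic point lands on an edge between two adjacent corners, which makes the interpolation easy to verify. Survey variables such as those in BRFSS are largely categorical, so in practice a categorical-aware variant would be needed.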
Affiliation(s)
- Nirajan Budhathoki
- Department of Statistics, Actuarial & Data Sciences, Central Michigan University, Mount Pleasant, Michigan, United States of America
- Ramesh Bhandari
- Department of Physics, Central Michigan University, Mount Pleasant, Michigan, United States of America
- Suraj Bashyal
- Department of Geography & Environmental Studies, Central Michigan University, Mount Pleasant, Michigan, United States of America
- Carl Lee
- Department of Statistics, Actuarial & Data Sciences, Central Michigan University, Mount Pleasant, Michigan, United States of America
5
Chen K, Abtahi F, Carrero JJ, Fernandez-Llatas C, Seoane F. Process mining and data mining applications in the domain of chronic diseases: A systematic review. Artif Intell Med 2023; 144:102645. [PMID: 37783545 DOI: 10.1016/j.artmed.2023.102645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/24/2023] [Accepted: 08/28/2023] [Indexed: 10/04/2023]
Abstract
The widespread use of information technology in healthcare leads to extensive data collection, which can be utilised to enhance patient care and manage chronic illnesses. Our objective is to summarise previous studies that have used data mining or process mining methods in the context of chronic diseases in order to identify research trends and future opportunities. The review covers articles that pertain to the application of data mining or process mining methods on chronic diseases that were published between 2000 and 2022. Articles were sourced from PubMed, Web of Science, EMBASE, and Google Scholar based on predetermined inclusion and exclusion criteria. A total of 71 articles met the inclusion criteria and were included in the review. Based on the literature review results, we detected a growing trend in the application of data mining methods in diabetes research. Additionally, a distinct increase in the use of process mining methods to model clinical pathways in cancer research was observed. Frequently, this takes the form of a collaborative integration of process mining, data mining, and traditional statistical methods. In light of this collaborative approach, the meticulous selection of statistical methods based on their underlying assumptions is essential when integrating these traditional methods with process mining and data mining methods. Another notable challenge is the lack of standardised guidelines for reporting process mining studies in the medical field. Furthermore, there is a pressing need to enhance the clinical interpretation of data mining and process mining results.
Affiliation(s)
- Kaile Chen
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Biomedical Engineering and Health Systems, Division of Ergonomics, KTH Royal Institute of Technology, 14157 Stockholm, Sweden.
- Farhad Abtahi
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Biomedical Engineering and Health Systems, Division of Ergonomics, KTH Royal Institute of Technology, 14157 Stockholm, Sweden; Department of Clinical Physiology, Karolinska University Hospital, 17176 Stockholm, Sweden
- Juan-Jesus Carrero
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden
- Carlos Fernandez-Llatas
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; SABIEN, ITACA, Universitat Politècnica de València, Spain
- Fernando Seoane
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Clinical Physiology, Karolinska University Hospital, 17176 Stockholm, Sweden; Department of Medical Technology, Karolinska University Hospital, 17176 Stockholm, Sweden; Department of Textile Technology, University of Borås, 50190 Borås, Sweden
6
Terabe ML, Massago M, Iora PH, Hernandes Rocha TA, de Souza JVP, Huo L, Massago M, Senda DM, Kobayashi EM, Vissoci JR, Staton CA, de Andrade L. Applicability of machine learning technique in the screening of patients with mild traumatic brain injury. PLoS One 2023; 18:e0290721. [PMID: 37616279 PMCID: PMC10449130 DOI: 10.1371/journal.pone.0290721] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/14/2022] [Accepted: 08/14/2023] [Indexed: 08/26/2023]
Abstract
Even though the demand for head computed tomography (CT) in patients with mild traumatic brain injury (TBI) has progressively increased worldwide, only a small number of these patients have intracranial lesions that require neurosurgical intervention. This study therefore evaluates the applicability of a machine learning (ML) technique in the screening of patients with mild TBI at the Regional University Hospital of Maringá, Paraná state, Brazil. This is an observational, descriptive, cross-sectional, and retrospective study using an ML technique to develop a protocol that predicts which patients with an initial diagnosis of mild TBI should be referred for a head CT. Among the tested models, linear extreme gradient boosting was the best algorithm, with the highest sensitivity (0.70 ± 0.06). Our predictive model can assist in the screening of mild TBI patients, helping health professionals manage resource utilization and improve the quality and safety of patient care.
Affiliation(s)
- Miriam Leiko Terabe
- Postgraduate Program in Management, Technology and Innovation in Urgency and Emergency, State University of Maringa, Maringa, Parana, Brazil
- Miyoko Massago
- Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
- Pedro Henrique Iora
- Department of Medicine, State University of Maringa, Maringa, Parana, Brazil
- João Vitor Perez de Souza
- Postgraduate Program in Biosciences and Physiopathology, State University of Maringa, Maringa, Parana, Brazil
- Lily Huo
- Duke Global Health Institute, Duke University Medical Center, Durham, North Carolina, United States of America
- Mamoru Massago
- Postgraduate Program in Computer Sciences, State University of Maringa, Maringa, Parana, Brazil
- Dalton Makoto Senda
- Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
- João Ricardo Vissoci
- Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
- Duke Global Health Institute, Duke University Medical Center, Durham, North Carolina, United States of America
- Catherine Ann Staton
- Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
- Duke Global Health Institute, Duke University Medical Center, Durham, North Carolina, United States of America
- Luciano de Andrade
- Postgraduate Program in Management, Technology and Innovation in Urgency and Emergency, State University of Maringa, Maringa, Parana, Brazil
- Postgraduate Program in Health Sciences, State University of Maringa, Maringa, Parana, Brazil
- Department of Medicine, State University of Maringa, Maringa, Parana, Brazil
7
Chellappan D, Rajaguru H. Detection of Diabetes through Microarray Genes with Enhancement of Classifiers Performance. Diagnostics (Basel) 2023; 13:2654. [PMID: 37627916 PMCID: PMC10453776 DOI: 10.3390/diagnostics13162654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/06/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023]
Abstract
Diabetes is a life-threatening, non-communicable disease. Diabetes mellitus is a prevalent chronic disease with a significant global impact, and its timely detection is necessary for effective treatment. The primary objective of this study is to propose a novel approach for identifying type II diabetes mellitus using microarray gene data, focusing specifically on enhancing the performance of diabetes detection methods. Four dimensionality reduction techniques, Detrended Fluctuation Analysis (DFA), the Chi-square probability density function (Chi2pdf), the Firefly algorithm, and Cuckoo Search, are used to reduce the high-dimensional data. Metaheuristic algorithms, Particle Swarm Optimization (PSO) and Harmony Search (HS), are used for feature selection. Seven classifiers, Non-Linear Regression (NLR), Linear Regression (LR), Logistic Regression (LoR), Gaussian Mixture Model (GMM), Bayesian Linear Discriminant Classifier (BLDC), Softmax Discriminant Classifier (SDC), and Support Vector Machine with a Radial Basis Function kernel (SVM-RBF), are used to classify the diabetic and non-diabetic classes. The classifiers' performance is analyzed through metrics such as accuracy, recall, precision, F1 score, error rate, Matthews Correlation Coefficient (MCC), Jaccard metric, and kappa. The SVM (RBF) classifier with Chi2pdf dimensionality reduction and PSO feature selection attained a high accuracy of 91%, with a kappa of 0.7961, outperforming all other classifiers.
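Several of the listed metrics (MCC, kappa, Jaccard, F1, error rate) are less familiar than accuracy. As a sketch assuming a binary diabetic/non-diabetic confusion matrix with hypothetical counts (not the study's results), all of them can be computed from the four cells:

```python
import math
from typing import Dict

def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Binary classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, from predicted and actual class marginals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "jaccard": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

result = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
print(result)
```

Unlike accuracy, MCC and kappa both discount agreement that a classifier would reach by chance, which is why they are often reported alongside it.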
Affiliation(s)
- Dinesh Chellappan
- Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India
- Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
8
Shamsutdinova D, Das-Munshi J, Ashworth M, Roberts A, Stahl D. Predicting type 2 diabetes prevalence for people with severe mental illness in a multi-ethnic East London population. Int J Med Inform 2023; 172:105019. [PMID: 36787689 DOI: 10.1016/j.ijmedinf.2023.105019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 01/20/2023] [Accepted: 02/03/2023] [Indexed: 02/10/2023]
Abstract
BACKGROUND AND AIMS The prevalence of type 2 diabetes mellitus (T2DM) in people with severe mental illness (SMI) is 2-3 times higher than in the general population. Predictive modelling has advanced greatly in the past decade, and it is important to apply cutting-edge methods to vulnerable groups. However, few T2DM prediction models account for the presence of mental illness, and none appears to have been developed specifically for people with SMI. We therefore aimed to develop and internally validate a T2DM prevalence model for people with SMI. METHODS We used a large cross-sectional sample representative of a multi-ethnic population from London (674,000 adults); 10,159 people with SMI formed our analytical sample (1,513 T2DM cases). We fitted a linear logistic regression and XGBoost as stand-alone models and as a stacked ensemble. Age, sex, body mass index, ethnicity, area-based deprivation, past hypertension, cardiovascular diseases, prescribed antipsychotics, and SMI illness were the predictors. RESULTS Logistic regression performed well in detecting T2DM presence for people with SMI: the area under the receiver operating characteristic curve (ROC-AUC) was 0.83 (95% CI 0.79-0.87). XGBoost and the LR-XGBoost ensemble performed equally well (ROC-AUC 0.83, 95% CI 0.79-0.87), indicating a negligible contribution of non-linear terms to predictive power. Ethnicity was the most important predictor after age. We demonstrated how the derived models can be used and estimated a 2.14% (95% CI 2.03%-2.24%) increase in T2DM prevalence in the East London SMI population in 20 years' time, driven by projected demographic changes. CONCLUSIONS Primary care data, the setting where prediction models could be most fruitfully used, provide enough information for well-performing T2DM prevalence models for people with SMI. We demonstrated how thorough internal cross-validation of an ensemble of a linear and a machine learning model can quantify the predictive value of non-linearity in the data.
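The models above are compared by ROC-AUC. One way to read that number: the AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen non-case (the Mann-Whitney formulation), with ties counting one half. A minimal sketch with hypothetical predicted probabilities, not data from the study:

```python
from typing import Sequence

def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5.
    Uses O(n*m) pairwise comparisons, which is fine for small illustrations."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted T2DM probabilities.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(roc_auc(cases, controls))  # 8 of 9 pairs ordered correctly
```

For large samples, a rank-based computation (sorting all scores once) gives the same value far more efficiently; the pairwise loop is kept here because it mirrors the definition directly.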
Affiliation(s)
- Diana Shamsutdinova
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
- Jayati Das-Munshi
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, London, United Kingdom; ESRC Centre for Society and Mental Health, King's College London, London, United Kingdom; South London and Maudsley NHS Trust, London, United Kingdom
- Mark Ashworth
- ESRC Centre for Society and Mental Health, King's College London, London, United Kingdom
- Angus Roberts
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Daniel Stahl
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
9
Mistry S, Riches NO, Gouripeddi R, Facelli JC. Environmental exposures in machine learning and data mining approaches to diabetes etiology: A scoping review. Artif Intell Med 2023; 135:102461. [PMID: 36628796 PMCID: PMC9834645 DOI: 10.1016/j.artmed.2022.102461] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Revised: 10/06/2022] [Accepted: 11/23/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND Environmental exposures are implicated in diabetes etiology but are poorly understood due to disease heterogeneity, complexity of exposures, and analytical challenges. Machine learning and data mining are artificial intelligence methods that can address these limitations. Despite their increasing adoption in research on diabetes etiology and prediction, the types of methods and exposures analyzed have not been thoroughly reviewed. OBJECTIVE We aimed to review articles that implemented machine learning and data mining methods to understand environmental exposures in diabetes etiology and disease prediction. METHODS On September 19th, 2022, we queried the PubMed and Scopus databases for machine learning and data mining studies that used environmental exposures to understand diabetes etiology. Exposures were classified into specific external, general external, or internal exposures. We reviewed machine learning and data mining methods and characterized the scope of environmental exposures studied in the etiology of general diabetes, type 1 diabetes, type 2 diabetes, and other types of diabetes. RESULTS We identified 44 articles for inclusion. Specific external exposures were the most commonly studied exposures, and supervised models were the most commonly used methods. Well-established specific external exposures of low physical activity, high cholesterol, and high triglycerides were predictive of general diabetes, type 2 diabetes, and prediabetes, while novel metabolic and gut microbiome biomarkers were implicated in type 1 diabetes. DISCUSSION The use of machine learning and data mining methods to elucidate environmental triggers of diabetes was largely limited to well-established risk factors identified using easily explainable and interpretable models.
Future studies should seek to leverage machine learning and data mining to explore the temporality and co-occurrence of multiple exposures and further evaluate the role of general external and internal exposures in diabetes etiology.
Affiliation(s)
- Sejal Mistry
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA
- Naomi O Riches
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA; Department of Obstetrics and Gynecology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Ramkiran Gouripeddi
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA; Clinical and Translational Science Institute, University of Utah, Salt Lake City, UT, USA
- Julio C Facelli
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA; Clinical and Translational Science Institute, University of Utah, Salt Lake City, UT, USA
10
|
Sinha K, Uddin Z, Kawsar H, Islam S, Deen M, Howlader M. Analyzing chronic disease biomarkers using electrochemical sensors and artificial neural networks. Trends Analyt Chem 2023. [DOI: 10.1016/j.trac.2022.116861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
11
Olusanya MO, Ogunsakin RE, Ghai M, Adeleke MA. Accuracy of Machine Learning Classification Models for the Prediction of Type 2 Diabetes Mellitus: A Systematic Survey and Meta-Analysis Approach. Int J Environ Res Public Health 2022; 19:ijerph192114280. [PMID: 36361161 PMCID: PMC9655196 DOI: 10.3390/ijerph192114280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 05/13/2023]
Abstract
Soft-computing and statistical learning models have gained substantial momentum in predicting type 2 diabetes mellitus (T2DM). This paper reviews recent soft-computing and statistical learning models for T2DM using a meta-analysis approach. We searched three different search engines for papers using soft-computing and statistical learning models focused on T2DM published between 2010 and 2021. Of 1215 studies identified, 34, with 136952 patients, met our inclusion criteria. The pooled algorithms predicted T2DM with an overall accuracy of 0.86 (95% confidence interval [CI] of [0.82, 0.89]). Classification performance was significantly higher in models for screening and diagnosis (pooled proportion [95% CI] = 0.91 [0.74, 0.97]) than in models for nephropathy (pooled proportion = 0.48 [0.76, 0.89] to 0.88 [0.83, 0.91]). For the prediction of T2DM, decision tree (DT) models had a pooled accuracy of 0.88 [95% CI: 0.82, 0.92], and neural network (NN) models had a pooled accuracy of 0.85 [95% CI: 0.79, 0.89]. Meta-regression did not yield any statistically significant findings for the heterogeneous accuracy across studies with different diabetes predictions, sample sizes, and impact factors. Additionally, ML models showed high accuracy for the prediction of T2DM. The predictive accuracy of ML algorithms in T2DM is promising, mainly through DT and NN models; however, there is heterogeneity among ML models. We compared the results and models and concluded that this evidence might help clinicians interpret data and implement optimum models for their datasets for T2DM prediction.
Affiliation(s)
- Micheal O. Olusanya
- Department of Computer Science and Information Technology, Sol Plaatje University, Kimberley 8300, South Africa
- Ropo Ebenezer Ogunsakin
- Biostatistics Unit, Discipline of Public Health Medicine, School of Nursing & Public Health, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Meenu Ghai
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Matthew Adekunle Adeleke
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
12
Polessa Paula D, Barbosa Aguiar O, Pruner Marques L, Bensenor I, Suemoto CK, Mendes da Fonseca MDJ, Griep RH. Comparing machine learning algorithms for multimorbidity prediction: An example from the Elsa-Brasil study. PLoS One 2022; 17:e0275619. [PMID: 36206287 PMCID: PMC9543987 DOI: 10.1371/journal.pone.0275619] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Multimorbidity is a worldwide concern related to greater disability, worse quality of life, and mortality. Early prediction is crucial for designing preventive strategies and for integrative medical practice. However, knowledge about how to predict multimorbidity is limited, possibly due to the complexity involved in predicting multiple chronic diseases. METHODS In this study, we present a machine learning approach to build cost-effective multimorbidity prediction models. Based on predictors easily obtainable in clinical practice (sociodemographic, clinical, family disease history, and lifestyle), we built and compared the performance of seven multilabel classifiers (multivariate random forest, and classifier chain, binary relevance, and binary dependence, with random forest and support vector machine as base classifiers), using a sample of 15105 participants from the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil). We developed a web application for building and using prediction models. RESULTS Classifier chain with random forest as the base classifier performed best (accuracy = 0.34, subset accuracy = 0.15, and Hamming loss = 0.16). Across different feature sets, random forest-based classifiers outperformed those based on support vector machines. BMI, blood pressure, sex, and age were the features most relevant to multimorbidity prediction. CONCLUSIONS Our results support the choice of random forest-based classifiers for multimorbidity prediction.
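The three multilabel metrics reported in this abstract measure different things, so a small sketch may help readers interpret them. The pure-Python code below is an illustration only, not the authors' implementation: it assumes each participant's diseases are encoded as a 0/1 vector, and it assumes that "accuracy" refers to the example-based (Jaccard) accuracy commonly used in the multilabel literature. The function names are hypothetical.

```python
from typing import Sequence

# One 0/1 vector per participant, one position per chronic disease (assumed encoding).
Labels = Sequence[Sequence[int]]


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of participants whose whole predicted label set is exactly right."""
    hits = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual (participant, disease) slots predicted wrongly."""
    n_labels = len(y_true[0])
    errors = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return errors / (len(y_true) * n_labels)


def example_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Mean Jaccard index |T & P| / |T | P| per participant (1.0 when both are empty)."""
    scores = []
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        scores.append(1.0 if union == 0 else inter / union)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
    guess = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(subset_accuracy(truth, guess), hamming_loss(truth, guess), example_accuracy(truth, guess))
```

Subset accuracy is the strictest of the three (a single wrong disease makes the whole row count as a miss), which is consistent with it being the lowest value reported; Hamming loss is an error rate, so lower is better.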
Affiliation(s)
- Daniela Polessa Paula
- National School of Statistical Sciences, Brazilian Institute of Geography and Statistics, Rio de Janeiro, Brazil
- Larissa Pruner Marques
- National School of Public Health, Oswaldo Cruz Foundation, Rio de Janeiro, Rio de Janeiro, Brazil
- Isabela Bensenor
- Department of Internal Medicine, Faculdade de Medicina da Universidade de São Paulo & Hospital Universitário, Universidade de São Paulo, São Paulo, Brazil
- Claudia Kimie Suemoto
- Division of Geriatrics, Department of Clinical Medicine, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Rosane Härter Griep
- Health and Environmental Education Laboratory, Oswaldo Cruz Institute (IOC), Rio de Janeiro, Brazil
13
Suzuki Y, Suzuki H, Ishikawa T, Yamada Y, Yatoh S, Sugano Y, Iwasaki H, Sekiya M, Yahagi N, Hada Y, Shimano H. Exploratory analysis using machine learning of predictive factors for falls in type 2 diabetes. Sci Rep 2022; 12:11965. [PMID: 35831378 PMCID: PMC9279484 DOI: 10.1038/s41598-022-15224-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2021] [Accepted: 06/21/2022] [Indexed: 11/09/2022] Open
Abstract
We aimed to investigate the status of falls and to identify important risk factors for falls in persons with type 2 diabetes (T2D) including the non-elderly. Participants were 316 persons with T2D who were assessed for medical history, laboratory data and physical capabilities during hospitalization and given a questionnaire on falls one year after discharge. Two different statistical models, logistic regression and random forest classifier, were used to identify the important predictors of falls. The response rate to the survey was 72%; of the 226 respondents, there were 129 males and 97 females (median age 62 years). The fall rate during the first year after discharge was 19%. Logistic regression revealed that knee extension strength, fasting C-peptide (F-CPR) level and dorsiflexion strength were independent predictors of falls. The random forest classifier placed grip strength, F-CPR, knee extension strength, dorsiflexion strength and proliferative diabetic retinopathy among the 5 most important variables for falls. Lower extremity muscle weakness, elevated F-CPR levels and reduced grip strength were shown to be important risk factors for falls in T2D. Analysis by random forest can identify new risk factors for falls in addition to logistic regression.
Affiliation(s)
- Yasuhiro Suzuki
- Department of Rehabilitation Medicine, University of Tsukuba Hospital, Tsukuba, Ibaraki, 305-8576, Japan
- Hiroaki Suzuki
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan
- Shigeru Yatoh
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan
- Yoko Sugano
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan
- Hitoshi Iwasaki
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan
- Motohiro Sekiya
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan
- Naoya Yahagi
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan
- Yasushi Hada
- Department of Rehabilitation Medicine, University of Tsukuba Hospital, Tsukuba, Ibaraki, 305-8576, Japan
- Hitoshi Shimano
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan; International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Ibaraki, 305-8575, Japan; Life Science Center of Tsukuba Advanced Research Alliance (TARA), University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan; Japan Agency for Medical Research and Development-Core Research for Evolutional Science and Technology (AMED-CREST), Chiyoda-ku, Tokyo, 100-0004, Japan
14
Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Adv Comput Intell 2022; 2:22. [PMID: 35434723 PMCID: PMC9006199 DOI: 10.1007/s43674-022-00034-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/17/2021] [Revised: 02/27/2022] [Accepted: 03/03/2022] [Indexed: 12/14/2022]
Abstract
Type 2 diabetes has recently acquired the status of an epidemic silent killer, though it is non-communicable. There are two main reasons behind this perception of the disease. First, a gradual but exponential growth in disease prevalence has been witnessed irrespective of age group, geography, or gender. Second, the disease dynamics are very complex in terms of the multifactorial risks involved, the initial asymptomatic period, different short-term and long-term complications posing serious health threats, and related co-morbidities. The majority of its risk factors are lifestyle habits such as physical inactivity, lack of exercise, high body mass index (BMI), poor diet, and smoking, apart from some unavoidable ones such as family history of diabetes, ethnic predisposition, and ageing. Machine learning (ML) is increasingly being applied to alleviate the health burden of diabetes, and many research works in the literature offer clinical decision support in different application areas. In this paper, we present a review of such efforts for the prevention and management of type 2 diabetes. First, we present the medical gaps in the diabetes knowledge base, guidelines, and medical practice identified from relevant articles and highlight those that can be addressed by ML. Further, we review ML research works in three application areas, namely: (1) risk assessment (statistical risk scores and ML-based risk models), (2) diagnosis (using non-invasive and invasive features), and (3) prognosis (from normoglycemia/prior morbidity to incident diabetes, and from incident diabetes to related complications). We discuss and summarize the shortcomings or gaps in existing ML methodologies for diabetes to be addressed in future work. This review provides the breadth of ML predictive modeling applications for diabetes while highlighting the medical and technological gaps as well as the various aspects involved in ML-based diabetes clinical decision support.
Affiliation(s)
- Ashwini Tuppad
- School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
- Shantala Devi Patil
- School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
15
Use of Machine Learning and Routine Laboratory Tests for Diabetes Mellitus Screening. Biomed Res Int 2022; 2022:8114049. [PMID: 35392258 PMCID: PMC8983182 DOI: 10.1155/2022/8114049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/21/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 12/28/2022]
Abstract
Most patients with diabetes mellitus are asymptomatic, which leads to delayed and more complex treatment. At the same time, most individuals routinely undergo standard clinical laboratory examinations, which create large health datasets over a lifetime. Computer processing has been used to search for health anomalies and predict diseases from clinical examinations. This work studied machine learning models to support diabetes screening through routine laboratory tests, using laboratory data from 62,496 patients. The classification and regression models used were k-nearest neighbors, support vector machines, naïve Bayes, random forests, and artificial neural networks. Glycated hemoglobin, a test used for diabetes diagnosis, was used as the target. Regression models estimated glycated hemoglobin directly, and their outputs were then classified. The performance of the classification models was studied under various subdataset partitions and combinations (e.g., healthy, prediabetic, and diabetic, as well as not healthy and not diabetic). The best single performance was achieved with the artificial neural network model when detecting prediabetes or diabetes. When identifying the not-healthy group, the artificial neural network classification model scored 78.1%, 78.7%, and 78.4% for sensitivity, precision, and F1 score, respectively. Other models also performed well, depending on the objective. Machine learning-based models can predict glycated hemoglobin values from routine laboratory tests and can be used as a screening tool to refer patients for further testing.
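The three reported figures are linked: the F1 score is the harmonic mean of precision and recall, and recall is the same quantity as sensitivity. The short check below, a sketch rather than the study's code (the helper name is hypothetical), recomputes F1 from the reported sensitivity and precision and shows that the three values are mutually consistent.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall is the same as sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the neural network when identifying the not-healthy group.
sensitivity, precision = 0.781, 0.787
print(f"F1 = {f1_score(precision, sensitivity):.3f}")  # F1 = 0.784
```

Because precision and sensitivity are nearly equal here, the harmonic mean falls almost exactly halfway between them.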
16
Delpino F, Costa Â, Farias S, Chiavegatto Filho A, Arcêncio R, Nunes B. Machine learning for predicting chronic diseases: a systematic review. Public Health 2022; 205:14-25. [DOI: 10.1016/j.puhe.2022.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2021] [Revised: 10/26/2021] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
17
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022] Open
Abstract
Diabetes mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in previous studies regarding the techniques used, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. The review primarily followed the PRISMA methodology, enriched with the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and reported performance parameters were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performance. Deep neural networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful for increasing model efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
- School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- Luis Montesinos
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- José A. García-García
- Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico
18
Nagpal MS, Barbaric A, Sherifali D, Morita PP, Cafazzo JA. Patient-Generated Data Analytics of Health Behaviors of People Living With Type 2 Diabetes: Scoping Review. JMIR Diabetes 2021; 6:e29027. [PMID: 34783668 PMCID: PMC8726031 DOI: 10.2196/29027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2021] [Revised: 08/01/2021] [Accepted: 10/31/2021] [Indexed: 11/13/2022] Open
Abstract
Background Complications due to type 2 diabetes (T2D) can be mitigated through proper self-management that can positively change health behaviors. Technological tools are available to help people living with, or at risk of developing, T2D to manage their condition, and such tools provide a large repository of patient-generated health data (PGHD). Analytics can provide insights into the health behaviors of people living with T2D. Objective The aim of this review is to investigate what can be learned about the health behaviors of those living with, or at risk of developing, T2D through analytics from PGHD. Methods A scoping review using the Arksey and O’Malley framework was conducted in which a comprehensive search of the literature was conducted by 2 reviewers. In all, 3 electronic databases (PubMed, IEEE Xplore, and ACM Digital Library) were searched using keywords associated with diabetes, behaviors, and analytics. Several rounds of screening using predetermined inclusion and exclusion criteria were conducted, after which studies were selected. Critical examination took place through a descriptive-analytical narrative method, and data extracted from the studies were classified into thematic categories. These categories reflect the findings of this study as per our objective. Results We identified 43 studies that met the inclusion criteria for this review. Although 70% (30/43) of the studies examined PGHD independently, 30% (13/43) combined PGHD with other data sources. Most of these studies used machine learning algorithms to perform their analysis. The themes identified through this review include predicting diabetes or obesity, deriving factors that contribute to diabetes or obesity, obtaining insights from social media or web-based forums, predicting glycemia, improving adherence and outcomes, analyzing sedentary behaviors, deriving behavior patterns, discovering clinical correlations from behaviors, and developing design principles. 
Conclusions The increased volume and availability of PGHD have the potential to derive analytical insights into the health behaviors of people living with T2D. From the literature, we determined that analytics can predict outcomes and identify granular behavior patterns from PGHD. This review determined the broad range of insights that can be examined through PGHD, which constitutes a unique source of data for these applications that would not be possible through the use of other data sources.
Affiliation(s)
- Meghan S Nagpal
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Centre for Global eHealth Innovation, Techna Institute, University Health Network, Toronto, ON, Canada
- Antonia Barbaric
- Centre for Global eHealth Innovation, Techna Institute, University Health Network, Toronto, ON, Canada; Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada
- Diana Sherifali
- School of Nursing, McMaster University, Hamilton, ON, Canada
- Plinio P Morita
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Centre for Global eHealth Innovation, Techna Institute, University Health Network, Toronto, ON, Canada; School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
- Joseph A Cafazzo
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Centre for Global eHealth Innovation, Techna Institute, University Health Network, Toronto, ON, Canada; Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
19
Application of Data Mining Algorithms for Dementia in People with HIV/AIDS. Comput Math Methods Med 2021; 2021:4602465. [PMID: 34335861 PMCID: PMC8286188 DOI: 10.1155/2021/4602465] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/02/2020] [Accepted: 06/21/2021] [Indexed: 11/30/2022]
Abstract
Dementia interferes with an individual's motor, behavioural, and intellectual functions, leaving the person unable to perform instrumental activities of daily living. This study aimed to identify, through data mining, the best-performing algorithm and the most relevant characteristics for categorising individuals with HIV/AIDS at high risk of dementia. The principal component analysis (PCA) algorithm was applied and tested comparatively with the following machine learning algorithms: logistic regression, decision tree, neural network, KNN, and random forest. The database used for this study was built from data collected on 270 individuals infected with HIV/AIDS and followed up at the outpatient clinic of a reference hospital for infectious and parasitic diseases in the State of Ceará, Brazil, from January to April 2019. The performance of the algorithms was also analysed for all 104 characteristics available in the database; dimensionality reduction then improved the quality of the machine learning algorithms during the tests, even though about 30% of the variation was lost. When only 23 characteristics were considered, the precision of the algorithms was 86% for random forest, 56% for logistic regression, 68% for decision tree, 60% for KNN, and 59% for neural network. The random forest algorithm proved to be more effective than the others, obtaining 84% precision and 86% accuracy.
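The remark that dimensionality reduction "lost about 30% of the variation" corresponds to keeping only the principal components whose cumulative explained-variance ratio reaches roughly 70%. The NumPy sketch below illustrates that selection rule under this assumption; it is not the study's pipeline, the 70% threshold is inferred from the abstract, and the function name is hypothetical.

```python
import numpy as np


def pca_reduce(X: np.ndarray, retained_variance: float = 0.70):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches `retained_variance`.

    Returns the projected data and the fraction of variance actually kept.
    """
    Xc = X - X.mean(axis=0)                        # PCA works on centred columns
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                # variance ratio per component
    cumulative = np.cumsum(explained)
    # Index of the first component at which the cumulative ratio reaches the target.
    k = int(np.searchsorted(cumulative, retained_variance)) + 1
    k = min(k, len(s))                             # guard against rounding near 1.0
    return Xc @ vt[:k].T, float(cumulative[k - 1])


if __name__ == "__main__":
    # Random placeholder data with the study's dimensions (270 people x 104 characteristics).
    X = np.random.default_rng(0).normal(size=(270, 104))
    Z, kept = pca_reduce(X, 0.70)
    print(Z.shape, round(kept, 3))
```

The reduced matrix `Z` would then be passed to the classifiers in place of the original characteristics; on real clinical data with correlated features, far fewer components are typically needed than on the uncorrelated random data used here.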
20
Dogan O, Tiwari S, Jabbar MA, Guggari S. A systematic review on AI/ML approaches against COVID-19 outbreak. Complex Intell Syst 2021; 7:2655-2678. [PMID: 34777970 PMCID: PMC8256231 DOI: 10.1007/s40747-021-00424-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 12/09/2020] [Accepted: 06/05/2021] [Indexed: 12/24/2022]
Abstract
COVID-19, a pandemic disease, has caused disruption worldwide by infecting millions of people. Studies applying artificial intelligence (AI) and machine learning (ML) methods for various purposes against the COVID-19 outbreak have increased because of their significant advantages. Although AI/ML applications provide satisfactory solutions to COVID-19, these solutions are highly diverse. The growing number of AI/ML studies and the diversity of their solutions can make it difficult to decide which AI/ML technique is suitable for which COVID-19 purpose. Because there is no comprehensive review study, this study systematically analyzes and summarizes related studies. A research methodology has been proposed to conduct the systematic literature review, covering the framing of research questions, search criteria, and relevant data extraction. In total, 264 studies were included after applying inclusion and exclusion criteria. This research can be regarded as a key element for epidemic and transmission prediction, diagnosis and detection, and drug/vaccine development. Six research questions are explored, with 50 AI/ML approaches in COVID-19, 8 AI/ML methods for patient outcome prediction, 14 AI/ML techniques in disease prediction, and five AI/ML methods for risk assessment of COVID-19. It also covers AI/ML methods in drug development, vaccines for COVID-19, models in COVID-19, and datasets, their usage, and their applications with AI/ML.
Affiliation(s)
- Onur Dogan
- Department of Industrial Engineering, Izmir Bakircay University, 35665 Izmir, Turkey; Research Center for Data Analytics and Spatial Data Modeling (RC-DAS), Izmir Bakircay University, 35665 Izmir, Turkey
- Sanju Tiwari
- Department of Computer Science, Universidad Autonoma de Tamaulipas, Ciudad Victoria, Mexico
- M A Jabbar
- Vardhaman College of Engineering, Kacharam, India
21
Channa R, Wolf R, Abramoff MD. Autonomous Artificial Intelligence in Diabetic Retinopathy: From Algorithm to Clinical Application. J Diabetes Sci Technol 2021; 15:695-698. [PMID: 32126819 PMCID: PMC8120059 DOI: 10.1177/1932296820909900] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Artificial intelligence (AI)-based algorithms are rapidly entering the health care field and have the potential to improve patient care. Our article focuses on the use of autonomous AI algorithms (ie, algorithms that can make clinical decisions without human oversight) in diagnostic imaging. In this article, we have used the example of diabetic retinopathy screening to highlight some important aspects to be considered by developers, policymakers, and end users when bringing autonomous AI algorithms into clinical practice. We have divided these aspects into (1) following the principles of safety, efficacy, and equity in all phases of development and implementation of the algorithm; (2) regulatory processes involving medical records, medical liability, and patient privacy; (3) cost and billing; and (4) the role of health care providers.
Affiliation(s)
- Roomasa Channa
- Department of Ophthalmology, Baylor College of Medicine, Houston, TX, USA
- Michael DeBakey Veterans Affairs Hospital, Houston, TX, USA
- Wilmer Eye Institute, Johns Hopkins University, Baltimore, MD, USA
- Roomasa Channa, MD, Department of Ophthalmology, Baylor College of Medicine, Michael DeBakey Veterans Affairs Hospital, 6501 Fannin Street, Houston, TX 77030, USA.
- Risa Wolf
- Pediatric Endocrinology, Johns Hopkins University, Baltimore, MD, USA
- Michael D. Abramoff
- The Robert C. Watzke Professor of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, USA
- VA Medical Center, Iowa City, IA, USA
- IDx, Coralville, IA, USA
22
Avilés-Santa ML, Monroig-Rivera A, Soto-Soto A, Lindberg NM. Current State of Diabetes Mellitus Prevalence, Awareness, Treatment, and Control in Latin America: Challenges and Innovative Solutions to Improve Health Outcomes Across the Continent. Curr Diab Rep 2020; 20:62. [PMID: 33037442 PMCID: PMC7546937 DOI: 10.1007/s11892-020-01341-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Accepted: 09/10/2020] [Indexed: 02/06/2023]
Abstract
PURPOSE OF REVIEW Latin America is a region of great inequality, where about 32 million people live with diabetes. Through this review, we aimed to describe the current state of the prevalence, awareness, treatment, and control of diabetes mellitus and the completion of selected guidelines of care across Latin America, and to identify opportunities to advance research that promotes better health outcomes. RECENT FINDINGS The prevalence of diabetes mellitus has been consistently increasing across the region, with some variation: higher prevalence in Mexico, Haiti, and Puerto Rico and lower in Colombia, Ecuador, the Dominican Republic, Peru, and Uruguay. Prevalence assessment methods vary and may underestimate the real number of persons with diabetes. Diabetes unawareness varies widely, with up to 50% of persons with diabetes not knowing they may have the disease. Glycemic, blood pressure, and LDL-C control and completion of guidelines to prevent microvascular complications are not consistently assessed across studies, and the achievement of control goals is suboptimal. On the other hand, multiple interventions, point-of-care/rapid assessment tools, and alternative models of health care delivery have been proposed and tested throughout Latin America. The prevalence of diabetes mellitus continues to rise across Latin America, and the number of those with the disease may be underestimated. However, some local governments are embedding more comprehensive diabetes assessments in their national surveys. Clinicians and public health advocates in the region have proposed and initiated various multi-level interventions to address this enormous challenge.
Affiliation(s)
- M Larissa Avilés-Santa
- Division of Extramural Scientific Programs, Clinical and Health Services Research at the National Institute on Minority Health and Health Disparities, Bethesda, MD, USA.
23
Kopitar L, Kocbek P, Cilar L, Sheikh A, Stiglic G. Early detection of type 2 diabetes mellitus using machine learning-based prediction models. Sci Rep 2020; 10:11981. [PMID: 32686721 PMCID: PMC7371679 DOI: 10.1038/s41598-020-68771-z] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 12/23/2019] [Accepted: 06/30/2020] [Indexed: 02/07/2023] Open
Abstract
Most screening tests for T2DM in use today were developed using multivariate regression methods that are often further simplified to allow transformation into a scoring formula. The increasing volume of electronically collected data has opened the opportunity to develop more complex, accurate prediction models that can be continuously updated using machine learning approaches. This study compares machine learning-based prediction models (i.e. Glmnet, RF, XGBoost, LightGBM) to commonly used regression models for prediction of undiagnosed T2DM. The performance in prediction of fasting plasma glucose level was measured using 100 bootstrap iterations in different subsets of data simulating new incoming data in 6-month batches. With 6 months of data available, the simple regression model performed with the lowest average RMSE of 0.838, followed by RF (0.842), LightGBM (0.846), Glmnet (0.859) and XGBoost (0.881). When more data were added, Glmnet improved at the highest rate (+ 3.4%). The highest level of variable selection stability over time was observed with LightGBM models. Our results show no clinically relevant improvement when more sophisticated prediction models were used. Since higher stability of selected variables over time contributes to simpler interpretation of the models, interpretability and model calibration should also be considered in the development of clinical prediction models.
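The evaluation described here reports an average RMSE over 100 bootstrap iterations. The pure-Python sketch below shows only the core of that idea: resampling rows with replacement and averaging the RMSE over the resamples. It assumes a single fixed set of observed and predicted fasting plasma glucose values; the study's 6-month batch structure and any refitting of models are not reproduced, and the function names and seed are hypothetical.

```python
import math
import random
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse(y_true: Sequence[float], y_pred: Sequence[float],
                   n_iter: int = 100, seed: int = 0) -> float:
    """Average RMSE over `n_iter` bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample of row indices
        scores.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return sum(scores) / n_iter


if __name__ == "__main__":
    observed = [5.1, 5.6, 6.3, 7.0, 5.4]   # placeholder glucose values, not study data
    predicted = [5.0, 5.9, 6.0, 6.8, 5.5]
    print(round(bootstrap_rmse(observed, predicted), 3))
```

Fixing the seed makes the resamples reproducible, which matters when comparing several models: each model can be scored on the same bootstrap samples so that differences in average RMSE reflect the models rather than the resampling.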
Affiliation(s)
- Leon Kopitar
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, 6000, Koper, Slovenia.
- Primoz Kocbek
- Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia
- Leona Cilar
- Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia
- Aziz Sheikh
- Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland, EH8 9AG, UK; Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital/Harvard Medical School, Boston, MA, 02115, USA
- Gregor Stiglic
- Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia; Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000, Maribor, Slovenia
24
Musacchio N, Giancaterini A, Guaita G, Ozzello A, Pellegrini MA, Ponzani P, Russo GT, Zilich R, de Micheli A. Artificial Intelligence and Big Data in Diabetes Care: A Position Statement of the Italian Association of Medical Diabetologists. J Med Internet Res 2020; 22:e16922. [PMID: 32568088 PMCID: PMC7338925 DOI: 10.2196/16922] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/05/2019] [Revised: 03/09/2020] [Accepted: 04/12/2020] [Indexed: 12/24/2022]
Abstract
Over the last decade, most of our daily activities have become digital. Digital health takes into account the ever-increasing synergy between advanced medical technologies, innovation, and digital communication. Thanks to machine learning, we are no longer limited to a descriptive analysis of the data, as we can obtain greater value by identifying and predicting patterns resulting from inductive reasoning. Machine learning software programs that disclose the reasoning behind a prediction allow for “what-if” models, with which it is possible to understand whether and how changing certain factors may improve the outcomes, thereby identifying the optimal behavior. Currently, diabetes care is facing several challenges: the decreasing number of diabetologists, the increasing number of patients, the reduced time allowed for medical visits, the growing complexity of the disease from both the clinical and the patient-care standpoints, the difficulty of achieving the relevant clinical targets, the growing burden of disease management for both the health care professional and the patient, and the accessibility and sustainability of health care. In this context, new digital technologies and the use of artificial intelligence are certainly a great opportunity. Herein, we report the results of a careful analysis of the current literature and present the vision of the Italian Association of Medical Diabetologists (AMD) on this controversial topic that, if well used, may be the key to great scientific innovation. AMD believes that the use of artificial intelligence will enable the conversion of data (descriptive) into knowledge of the factors that “affect” the behavior and correlations (predictive), thereby identifying the key aspects that may establish an improvement of the expected results (prescriptive).
Artificial intelligence can therefore become a tool of great technical support to help diabetologists become fully responsible for the individual patient, thereby assuring customized and precise medicine. This, in turn, will allow comprehensive therapies to be built in accordance with the evidence criteria that should always be the basis for any therapeutic choice.
Affiliation(s)
- Annalisa Giancaterini
- Diabetology Service, Muggiò Polyambulatory, Azienda Socio Sanitaria Territoriale, Monza, Italy
- Giacomo Guaita
- Diabetology, Endocrinology and Metabolic Diseases Service, Azienda Tutela Salute Sardegna-Azienda Socio Sanitaria Locale, Carbonia, Italy
- Alessandro Ozzello
- Departmental Structure of Endocrine Diseases and Diabetology, Azienda Sanitaria Locale TO3, Pinerolo, Italy
- Maria A Pellegrini
- Italian Association of Diabetologists, Rome, Italy; New Coram Limited Liability Company, Udine, Italy
- Paola Ponzani
- Operative Unit of Diabetology, La Colletta Hospital, Azienda Sanitaria Locale 3, Genova, Italy
- Giuseppina T Russo
- Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Alberto de Micheli
- Associazione dei Cavalieri Italiani del Sovrano Militare Ordine di Malta, Genova, Italy
25
Zhang Y, Zhang Q, Li L, Thomas R, Li SZ, He MG, Wang NL. Establishment and Comparison of Algorithms for Detection of Primary Angle Closure Suspect Based on Static and Dynamic Anterior Segment Parameters. Transl Vis Sci Technol 2020; 9:16. [PMID: 32821488 PMCID: PMC7401939 DOI: 10.1167/tvst.9.5.16] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/02/2019] [Accepted: 02/12/2020] [Indexed: 12/14/2022]
Abstract
Purpose To establish and evaluate algorithms for detection of primary angle closure suspects (PACS), a risk factor for primary angle closure disease, by combining multiple static and dynamic anterior segment optical coherence tomography (ASOCT) parameters. Methods Observational, cross-sectional study. The right eyes of subjects aged ≥40 years who participated in the 5-year follow-up of the Handan Eye Study and underwent gonioscopy and ASOCT examinations under light and dark conditions were included. All ASOCT images were analyzed with the Zhongshan Angle Assessment Program. Backward logistic regression (BLR) was used to select the variables included in the prediction models. BLR, naïve Bayes’ classification (NBC), and neural network (NN) were evaluated and compared using the area under the receiver operating characteristic curve (AUC). Results Data from 744 subjects (405 eyes with PACS and 339 normal eyes) were analyzed. Angle recess area at 750 µm, anterior chamber volume, lens vault in light, and iris cross-sectional area change/pupil diameter change were included in the prediction models. The AUCs of BLR, NBC, and NN were 0.827 (95% confidence interval [CI], 0.798-0.856), 0.826 (95% CI, 0.797-0.854), and 0.844 (95% CI, 0.817-0.871), respectively. No statistically significant differences were found among the three algorithms (P = 0.622). Conclusions The three algorithms did not meet the requirements for population-based screening of PACS. One possible reason could be the different angle closure mechanisms in the enrolled eyes. Translational Relevance This study provides a basis for future research directed toward the development of an image-based, noncontact method to screen for angle closure.
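The three algorithms above are compared by AUC. A minimal sketch, assuming plain lists of predicted scores for PACS eyes and normal eyes (names hypothetical): the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). It does not reproduce the confidence intervals or the significance test reported in the study.

```python
from itertools import product
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):  # every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

This pairwise loop is quadratic in the number of cases, which is fine for a few hundred eyes; a rank-based version computes the same value in O(n log n).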
Affiliation(s)
- Ye Zhang
- Beijing Tongren Eye Center, Beijing Key Laboratory of Ophthalmology and Visual Science, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Qing Zhang
- Beijing Institute of Ophthalmology, Beijing, China
- Lei Li
- Beijing Tongren Eye Center, Beijing Key Laboratory of Ophthalmology and Visual Science, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Ravi Thomas
- Queensland Eye Institute, Brisbane, Australia; University of Queensland, Brisbane, Australia
- Si Zhen Li
- Nanjing Tongren Hospital, Jiangsu, China
- Ming Guang He
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China; Department of Surgery, University of Melbourne, Melbourne, Australia; Ophthalmology, Centre for Eye Research Australia, Melbourne, Australia
- Ning Li Wang
- Beijing Tongren Eye Center, Beijing Key Laboratory of Ophthalmology and Visual Science, Beijing Tongren Hospital, Capital Medical University, Beijing, China; Beijing Institute of Ophthalmology, Beijing, China
26
Battineni G, Sagaro GG, Chinatalapudi N, Amenta F. Applications of Machine Learning Predictive Models in the Chronic Disease Diagnosis. J Pers Med 2020; 10:jpm10020021. [PMID: 32244292 PMCID: PMC7354442 DOI: 10.3390/jpm10020021] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 02/10/2020] [Revised: 03/09/2020] [Accepted: 03/23/2020] [Indexed: 02/07/2023]
Abstract
This paper reviews applications of machine learning (ML) predictive models in the diagnosis of chronic diseases. Chronic diseases (CDs) are responsible for a major portion of global health costs. Patients who suffer from these diseases need lifelong treatment. Predictive models are now frequently applied in the diagnosis and forecasting of these diseases. In this study, we reviewed the state-of-the-art approaches that use ML models in the primary diagnosis of CD. The analysis covered 453 papers published between 2015 and 2019, retrieved from the PubMed (Medline) and Cumulative Index to Nursing and Allied Health Literature (CINAHL) libraries. Ultimately, 22 studies were selected to describe the modeling methods used in CD diagnosis and the models applied to individual pathologies, with their associated strengths and limitations. Our outcomes suggest that there is no standard method for determining the best approach in real-time clinical practice, since each method has its advantages and disadvantages. Among the methods considered, support vector machines (SVM), logistic regression (LR), and clustering were the most commonly used. These models are highly applicable to the classification and diagnosis of CD and are expected to become more important in medical practice in the near future.
Affiliation(s)
- Gopi Battineni
- Center for Telemedicine and Telepharmacy, School of Medicinal and Health Sciences Products, University of Camerino, Via Madonna delle Carceri 9, 62032 Camerino, Italy
- Correspondence: Tel.: +39-333-172-8206
- Getu Gamo Sagaro
- Center for Telemedicine and Telepharmacy, School of Medicinal and Health Sciences Products, University of Camerino, Via Madonna delle Carceri 9, 62032 Camerino, Italy
- Nalini Chinatalapudi
- Center for Telemedicine and Telepharmacy, School of Medicinal and Health Sciences Products, University of Camerino, Via Madonna delle Carceri 9, 62032 Camerino, Italy
- Francesco Amenta
- Center for Telemedicine and Telepharmacy, School of Medicinal and Health Sciences Products, University of Camerino, Via Madonna delle Carceri 9, 62032 Camerino, Italy
- Research Department, International Medical Radio Center Foundation (C.I.R.M.), 00144 Roma, Italy
27
Ambriola Oku AY, Zimeo Morais GA, Arantes Bueno AP, Fujita A, Sato JR. Potential Confounders in the Analysis of Brazilian Adolescent's Health: A Combination of Machine Learning and Graph Theory. Int J Environ Res Public Health 2019; 17:ijerph17010090. [PMID: 31877700 PMCID: PMC6981403 DOI: 10.3390/ijerph17010090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/30/2019] [Revised: 12/09/2019] [Accepted: 12/16/2019] [Indexed: 12/20/2022]
Abstract
The prevalence of health problems during childhood and adolescence is high in developing countries such as Brazil. Social inequality, violence, and malnutrition have a strong impact on youth health. To better understand these issues, we propose combining machine-learning methods and graph analysis to build predictive networks applied to the Brazilian National Student Health Survey (PenSE 2015) data, a large dataset consisting of questionnaires filled out by students. By combining gradient boosting machines with a hub centrality metric, it was possible to identify potential confounders to be considered when conducting association analyses among variables. The variables were ranked according to their hub centrality for predicting the other variables, from a directed weighted-graph perspective. The top five ranked confounder variables were “gender”, “oral health care”, “intended education level”, and two variables associated with nutrition habits, “eat while watching TV” and “never eat fast-food”. In conclusion, although causal effects cannot be inferred from the data, we believe that the proposed approach might be a useful tool to obtain novel insights on the association between variables and to identify general factors related to health conditions.
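The abstract ranks variables by hub centrality in a directed weighted graph built from gradient boosting models. Assuming that "hub centrality" refers to Kleinberg's HITS hub score (the abstract does not name the exact metric) and that an edge u -> v carries a weight such as how much variable u contributes to predicting v, a minimal power-iteration sketch follows. The graph layout and names are hypothetical.

```python
import math
from typing import Dict

Graph = Dict[str, Dict[str, float]]


def hits_hubs(weights: Graph, n_iter: int = 100) -> Dict[str, float]:
    """HITS hub scores (L2-normalized) for a weighted directed graph.

    weights[u][v] is the weight of edge u -> v, e.g. how strongly variable u
    helps predict variable v (a hypothetical weighting).
    """
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    hubs = {u: 1.0 for u in nodes}
    for _ in range(n_iter):
        # Authority of v: weighted sum of hub scores of the nodes pointing to v.
        auth = {v: 0.0 for v in nodes}
        for u, targets in weights.items():
            for v, w in targets.items():
                auth[v] += w * hubs[u]
        # Hub of u: weighted sum of authority scores of the nodes u points to.
        hubs = {u: sum(w * auth[v] for v, w in weights.get(u, {}).items()) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hubs.values())) or 1.0
        hubs = {u: h / norm for u, h in hubs.items()}
    return hubs
```

Sorting the result with `sorted(h, key=h.get, reverse=True)` gives a ranking in which variables that predict many well-predicted variables come first, which is the sense in which a high hub score flags a potential confounder.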
Affiliation(s)
- Amanda Yumi Ambriola Oku
- Center of Mathematics, Computing and Cognition, Universidade Federal do ABC, Santo André CEP 09210-580, Brazil
- Ana Paula Arantes Bueno
- Center of Mathematics, Computing and Cognition, Universidade Federal do ABC, Santo André CEP 09210-580, Brazil
- André Fujita
- Institute of Mathematics and Statistics, University of São Paulo, São Paulo CEP 05508-090, Brazil
- João Ricardo Sato
- Center of Mathematics, Computing and Cognition, Universidade Federal do ABC, Santo André CEP 09210-580, Brazil
28
Santos HGD, Nascimento CFD, Izbicki R, Duarte YADO, Porto Chiavegatto Filho AD. [Machine learning for predictive analyses in health: an example of an application to predict death in the elderly in São Paulo, Brazil]. Cad Saude Publica 2019; 35:e00050818. [PMID: 31365698 DOI: 10.1590/0102-311x00050818] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/13/2018] [Accepted: 05/20/2019] [Indexed: 01/15/2023]
Abstract
This study aims to present the stages involved in using machine learning algorithms for predictive analyses in health. An application was performed on a database of elderly residents of the city of São Paulo, Brazil, who participated in the Health, Well-Being, and Aging Study (SABE) (n = 2,808). The outcome variable was the occurrence of death within five years of the elder's entry into the study (n = 423), and the predictors were 37 variables related to the elder's demographic, socioeconomic, and health profile. The application was organized into the following stages: division of the data into training (70%) and testing (30%) sets, pre-processing of the predictors, learning, and assessment of the models. The learning stage used 5 algorithms to fit the models: logistic regression with and without penalization, neural networks, gradient boosted trees, and random forest. The algorithms' hyperparameters were optimized by 10-fold cross-validation to select those corresponding to the best models. For each algorithm, the best model was assessed on the test data via the area under the ROC curve (AUC) and related measures. All models presented an AUC ROC greater than 0.70. For the three models with the highest AUC ROC (neural networks and logistic regression with LASSO penalization and without penalization, respectively), quality measures of the predicted probability were also assessed. The expectation is that, with the increased availability of data and trained human capital, it will be possible to develop predictive machine learning models with the potential to help health professionals make the best decisions.
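The first and third stages above (a 70/30 train/test division, then 10-fold cross-validation on the training part for hyperparameter tuning) can be sketched with index lists alone. This is a simplified, unstratified sketch; the abstract does not say whether the split preserved the death rate (423 of 2,808), and the function names are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def train_test_split_indices(n: int, test_frac: float = 0.3,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices 0..n-1 and hold out a test_frac share as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices: List[int], k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; every index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

With n = 2808, the split yields 1,966 training and 842 test rows. Each hyperparameter setting would be scored on the 10 validation folds of the training part only, and the test rows are used once, to assess the selected model.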
Affiliation(s)
- Rafael Izbicki
- Centro de Ciências Exatas e de Tecnologia, Universidade Federal de São Carlos, São Carlos, Brasil
29
Machine Learning Model for Imbalanced Cholera Dataset in Tanzania. ScientificWorldJournal 2019; 2019:9397578. [PMID: 31427903 PMCID: PMC6683776 DOI: 10.1155/2019/9397578] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/24/2019] [Revised: 05/15/2019] [Accepted: 06/09/2019] [Indexed: 11/28/2022]
Abstract
Cholera epidemics have remained a public health threat throughout history, affecting vulnerable populations living with unreliable water and substandard sanitary conditions. Various studies have observed that the occurrence of cholera is strongly linked to environmental factors such as climate change and geographical location. Climate change has been strongly linked to the seasonal occurrence and widespread transmission of cholera through the creation of weather patterns that favor the disease's transmission and infection and the growth of Vibrio cholerae, which causes the disease. Over the past decades, there have been great achievements in developing epidemic models for the proper prediction of cholera. However, weather variables and machine learning techniques have not been explicitly integrated into the modeling of cholera epidemics in Tanzania, owing to the challenges that come with its datasets, such as imbalanced data and missing information. This paper explores the use of machine learning techniques to model cholera epidemics with linkage to seasonal weather changes while overcoming the data imbalance problem. The Adaptive Synthetic Sampling Approach (ADASYN) and Principal Component Analysis (PCA) were used to restore the sampling balance and reduce the dimensionality of the dataset, respectively. In addition, sensitivity, specificity, and balanced-accuracy metrics were used to evaluate the performance of the seven models. Based on the results of the Wilcoxon signed-rank test and features of the models, the XGBoost classifier was selected as the best model for the study. Overall, the results improved our understanding of the significant roles of machine learning strategies in health-care data. However, the study could not be treated as a time series problem due to data collection bias. The study recommends a review of health-care systems in order to facilitate quality data collection and deployment of machine learning techniques.
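Balanced accuracy is a natural choice for the imbalanced data described above, because plain accuracy rewards a model that always predicts the majority class. A minimal sketch from binary confusion-matrix counts (the function names are hypothetical, and the counts in the example are illustrative, not from the study):

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """True-positive rate and true-negative rate from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives (e.g. outbreak periods) detected
    specificity = tn / (tn + fp)  # share of actual negatives correctly rejected
    return sensitivity, specificity


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity, so the majority class cannot dominate."""
    sens, spec = sensitivity_specificity(tp, fn, tn, fp)
    return (sens + spec) / 2
```

For instance, a classifier that labels all of 95 negatives and 5 positives as negative has 95% accuracy, but `balanced_accuracy(tp=0, fn=5, tn=95, fp=0)` is only 0.5, the same as random guessing.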
30
Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019; 110:12-22. [PMID: 30763612 DOI: 10.1016/j.jclinepi.2019.02.004] [Citation(s) in RCA: 793] [Impact Index Per Article: 158.6] [Received: 12/05/2018] [Revised: 01/18/2019] [Accepted: 02/05/2019] [Indexed: 02/06/2023]
Abstract
OBJECTIVES The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature. STUDY DESIGN AND SETTING We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes. RESULTS We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML. CONCLUSION We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.
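The pooled differences above are on the logit(AUC) scale, where logit(AUC) = ln(AUC / (1 - AUC)), so the same difference of 0.34 translates into different AUC gains depending on the baseline. The worked example below uses two hypothetical baseline AUCs (0.75 and 0.90) that are not taken from the review.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def shift_auc(auc: float, delta_logit: float) -> float:
    """AUC obtained by adding delta_logit to logit(auc)."""
    return inv_logit(logit(auc) + delta_logit)


for base in (0.75, 0.90):
    print(f"{base:.2f} -> {shift_auc(base, 0.34):.3f}")
```

It prints `0.75 -> 0.808` and `0.90 -> 0.927`. Adding 0.34 on the logit scale multiplies the odds AUC/(1 - AUC) by e^0.34 ≈ 1.40, which yields a smaller absolute gain when the baseline AUC is already high.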
Affiliation(s)
- Evangelia Christodoulou
- Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK; Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Albinusdreef 2, Leiden, 2333 ZA The Netherlands
- Jan Y Verbakel
- Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium; Department of Public Health & Primary Care, KU Leuven, Kapucijnenvoer 33J box 7001, Leuven, 3000 Belgium; Nuffield Department of Primary Care Health Sciences, University of Oxford, Woodstock Road, Oxford, OX2 6GG UK
- Ben Van Calster
- Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium; Department of Biomedical Data Sciences, Leiden University Medical Centre, Albinusdreef 2, Leiden, 2333 ZA The Netherlands.
31
Becker A. Artificial intelligence in medicine: What is it doing for us today? Health Policy Technol 2019. [DOI: 10.1016/j.hlpt.2019.03.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 02/06/2023]
32
An Accurate Clinical Implication Assessment for Diabetes Mellitus Prevalence Based on a Study from Nigeria. Processes (Basel) 2019. [DOI: 10.3390/pr7050289] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
The rate of diabetes is increasing across the planet. Therefore, the diagnosis of pre-diabetes and diabetes is important in populations with extreme diabetes risk. In this study, a machine learning technique was implemented on a data mining platform, employing rule classifiers (PART and Decision Table) to measure accuracy, and logistic regression on the classification results, to forecast prevalence in diabetes mellitus patients simultaneously suffering from symptoms of other chronic diseases. The real-life data were collected in Nigeria between December 2017 and February 2019 using ten non-intrusive and easily available clinical variables. The results showed that the rule classifiers achieved a mean accuracy of 98.75%. The error rate, precision, recall, F-measure, and Matthews correlation coefficient (MCC) were 0.02, 0.98, 0.98, 0.98, and 0.97, respectively. The forecast decision, obtained with a set of 23 decision rules (DR), indicates that age, gender, glucose level, and body mass are fundamental factors for diabetes, followed by work stress, diet, family diabetes history, physical exercise, and cardiovascular stroke history. The study showed that the proposed set of DR is practical for quick screening of diabetes mellitus patients at the initial stage without intrusive medical tests, and it was found to be effective in the initial diagnosis of diabetes.
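The Matthews correlation coefficient (MCC) listed above is a correlation between predicted and actual binary labels, ranging from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), so it is a plain number rather than a percentage. A minimal sketch from confusion-matrix counts (the function name is hypothetical):

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix (range -1 to 1)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays low for a classifier that only predicts the majority class.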
33
Aranda A, Valencia A. Computational study on the rupture risk in real cerebral aneurysms with geometrical and fluid-mechanical parameters using FSI simulations and machine learning algorithms. J Mech Med Biol 2019. [DOI: 10.1142/s0219519419500143] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Fluid-mechanical and morphological parameters are recognized as major factors in the rupture risk of human aneurysms. Meanwhile, many machine learning tools are available for studying a variety of problems in many fields. In this work, fluid–structure interaction (FSI) simulations were carried out to examine a database of 60 real saccular cerebral aneurysms (30 ruptured and 30 unruptured) reconstructed from angiography images. Using the results of the simulations and geometric analyses, we applied the analysis of variance (ANOVA) test to many variables and found that aspect ratio (AR), bottleneck factor (BNF), maximum height of the aneurysm (MH), relative residence time (RRT), Womersley number (WN) and von Mises strain (VMS) are statistically significant and good predictors for the models. Consequently, these variables were used in five machine learning algorithms to predict the rupture risk of the aneurysms, with adaptive boosting (AdaBoost) achieving the highest area under the receiver operating characteristic (ROC) curve (AUC 0.944).
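The variable screening above relies on ANOVA. Assuming the grouping factor is rupture status (ruptured vs. unruptured, 30 aneurysms each), which is our reading rather than something the abstract states, the one-way F statistic for a single variable such as aspect ratio can be sketched as follows (the function name is hypothetical):

```python
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

The p-value comes from the F distribution with (k - 1, n - k) degrees of freedom. SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value if a library call is preferred.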
Affiliation(s)
- Alfredo Aranda
- Department of Mechanical Engineering, Universidad de Chile, Beauchef 851, Santiago 8370456, Chile
- Alvaro Valencia
- Department of Mechanical Engineering, Universidad de Chile, Beauchef 851, Santiago 8370456, Chile
34
Pei D, Gong Y, Kang H, Zhang C, Guo Q. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak 2019; 19:41. [PMID: 30866905 PMCID: PMC6416888 DOI: 10.1186/s12911-019-0790-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/28/2018] [Accepted: 03/03/2019] [Indexed: 11/26/2022]
Abstract
Background Prediction or early diagnosis of diabetes is crucial for populations with high risk of diabetes. Methods In this study, we assessed the ability of five popular classifiers (J48, AdaboostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features, including age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from annual physical examination reports for adults in the Shengjing Hospital of China Medical University during January–April 2017. Weka data mining software was used to identify the best algorithm for diabetes classification. Results The results indicate that decision tree classifier J48 has the best performance (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most significant feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke. Conclusions Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, which can potentially improve early diabetes diagnosis and reduce burdens on the healthcare system.
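The precision, recall, and F-measure reported above (0.950, 0.950, 0.948) closely track the accuracy (0.9503). One plausible explanation, assuming they are Weka's support-weighted averages over the two classes (the abstract does not say), is that weighted recall equals accuracy by construction: each class's recall TP_c / n_c is weighted by n_c / N, so the sum is the number of correct predictions divided by N. A minimal sketch with a hypothetical confusion matrix:

```python
from typing import List, Tuple

Matrix = List[List[int]]


def per_class_metrics(cm: Matrix) -> List[Tuple[float, float, float, int]]:
    """Precision, recall, F1 and support per class; cm[i][j] = true class i predicted as j."""
    metrics = []
    for i in range(len(cm)):
        tp = cm[i][i]
        support = sum(cm[i])                   # true instances of class i
        predicted = sum(row[i] for row in cm)  # instances predicted as class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1, support))
    return metrics


def weighted_average(cm: Matrix) -> Tuple[float, ...]:
    """Support-weighted mean of per-class precision, recall and F1."""
    metrics = per_class_metrics(cm)
    total = sum(m[3] for m in metrics)
    return tuple(sum(m[k] * m[3] for m in metrics) / total for k in range(3))


def accuracy(cm: Matrix) -> float:
    """Share of all instances on the diagonal (correctly classified)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
```

With `cm = [[80, 20], [5, 45]]`, the weighted recall and the accuracy are both 125/150, even though the per-class recalls (0.8 and 0.9) differ.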
Affiliation(s)
- Dongmei Pei
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Yang Gong
- University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hong Kang
- University of Texas Health Science Center at Houston, Houston, Texas, USA
- Chengpu Zhang
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Qiyong Guo
- Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China.
35
Pei D, Zhang C, Quan Y, Guo Q. Identification of Potential Type II Diabetes in a Chinese Population with a Sensitive Decision Tree Approach. J Diabetes Res 2019; 2019:4248218. [PMID: 30805372 PMCID: PMC6362481 DOI: 10.1155/2019/4248218] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/19/2018] [Revised: 11/20/2018] [Accepted: 12/18/2018] [Indexed: 12/17/2022]
Abstract
BACKGROUND Diabetes mellitus is a chronic disease with a steady increase in prevalence. Because of its chronic course combined with devastating complications, the disorder can easily impose a financial burden. Early diagnosis of diabetes remains one of the major challenges facing medical providers, and satisfactory screening tools or methods are still needed, especially population- or community-based tools. METHODS This is a retrospective cross-sectional study involving 15,323 subjects who underwent the annual check-up in the Department of Family Medicine of Shengjing Hospital of China Medical University from January 2017 to June 2017. After strict data filtering, 10,436 records from the eligible participants were used to develop a prediction model with the J48 decision tree algorithm. Nine variables, including age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work-related stress, and salty food preference, were considered. RESULTS The accuracy, precision, recall, and area under the receiver operating characteristic curve (AUC) value for identifying potential diabetes were 94.2%, 94.0%, 94.2%, and 94.8%, respectively. The structure of the decision tree shows that age is the most significant feature. Among participants aged ≤ 49, 5497 (97%) were identified as nondiabetic, whereas among those aged > 49, 771 (50%) were identified as nondiabetic. In the subgroup with 34 < age ≤ 49 and BMI ≥ 25, 89 (92%) of the 97 individuals with a positive family history of diabetes were identified as diabetic, whereas 576 (58%) of the individuals without a family history of diabetes were identified as nondiabetic. Work-related stress was identified as being associated with diabetes.
In individuals with 34 < age ≤ 49 and BMI ≥ 25 and without a family history of diabetes, 22 (51%) of those with high work-related stress were identified as nondiabetic, while 349 (88%) of those with low or moderate work-related stress were identified as not having diabetes. CONCLUSIONS We proposed a decision tree classifier that uses nine easily obtained, noninvasive patient features as predictor variables to identify potential cases of diabetes. The classifier indicates that decision tree analysis can be successfully applied to screen for diabetes, which will support clinical practitioners in rapid diabetes identification. The model provides a means to target the prevention of diabetes, which could reduce the burden on the health system through effective case management.
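J48 is Weka's implementation of the C4.5 algorithm, which chooses each split by gain ratio: the information gain of the split divided by its split information. This explains why a feature such as age ends up at the root of the tree. A minimal sketch for a categorical feature follows. It omits C4.5's pruning and its search over numeric thresholds, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio of splitting `labels` by a categorical feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: parent entropy minus the size-weighted entropy of the branches.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes features that fragment the data into many branches.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0
```

For a numeric feature such as age, a threshold first turns it into a binary feature, e.g. `gain_ratio(["<=49" if a <= 49 else ">49" for a in ages], diabetic)`, where `ages` and `diabetic` are hypothetical lists. C4.5 evaluates candidate thresholds this way and keeps the best one.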
Affiliation(s)
- Dongmei Pei
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Chengpu Zhang
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Yu Quan
- Department of Informatics, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Qiyong Guo
- Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China