1
Cerono G, Chicco D. Ensemble machine learning reveals key features for diabetes duration from electronic health records. PeerJ Comput Sci 2024; 10:e1896. [PMID: 38435625 PMCID: PMC10909161 DOI: 10.7717/peerj-cs.1896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Accepted: 01/30/2024] [Indexed: 03/05/2024]
Abstract
Diabetes is a metabolic disorder that affects more than 420 million people worldwide, and it is caused by a persistently high level of sugar in the blood. Diabetes can have serious long-term health consequences, such as cardiovascular diseases, strokes, chronic kidney diseases, foot ulcers, retinopathy, and others. Although common, this disease is hard to detect, because it often comes with no symptoms. Especially for type 2 diabetes, which occurs mainly in adults, knowing how long a patient has had diabetes can have a strong impact on the treatment they can receive. This information, although pivotal, might be absent: for some patients, the year when they received the diabetes diagnosis might be well known, but the year of disease onset might be unknown. In this context, machine learning applied to electronic health records can be an effective tool to predict the past duration of diabetes for a patient. In this study, we applied a regression analysis based on several computational intelligence methods to a dataset of electronic health records of 73 patients with type 1 diabetes with 20 variables and another dataset of records of 400 patients with type 2 diabetes with 49 variables. Among the algorithms applied, Random Forests outperformed the others and efficiently predicted diabetes duration for both cohorts, with regression performance measured through the coefficient of determination R². Afterwards, we applied the same method for feature ranking, and we detected the most relevant factors of the clinical records correlated with past diabetes duration: age, insulin intake, and body-mass index. Our findings can have a profound impact on clinical practice: when the information about the duration of a patient's diabetes is missing, medical doctors can use our tool and focus on age, insulin intake, and body-mass index to infer this important aspect. Regarding limitations, unfortunately we were unable to find an additional dataset of EHRs of patients with diabetes having the same variables as the two analyzed here, so we could not verify our findings on a validation cohort.
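The coefficient of determination that the abstract uses to score its regressors can be computed directly from predictions. The following is a minimal sketch, not the authors' code; the function name `r_squared` and the toy numbers are assumptions made purely for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up diabetes durations (years) and made-up model predictions.
durations = [2.0, 5.0, 9.0, 14.0]
predicted = [3.0, 5.0, 8.0, 14.0]
print(round(r_squared(durations, predicted), 3))
```

A value of 1 means perfect predictions, 0 matches a model that always predicts the mean duration, and negative values are worse than that baseline.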
Affiliation(s)
- Gabriel Cerono
  - Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Canada
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
2
Gao W, Xie J, Ke Y, Tian M, Zeng Z, Ma X, Zhi M. A two-stage prediction filling method with support vector technologies optimized competitively in stages by grey wolf optimizer and particle swarm optimization for missing fasting blood glucose. Proc Inst Mech Eng H 2023; 237:1427-1440. [PMID: 37873735 DOI: 10.1177/09544119231206456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Missing values often limit the use of data from epidemiological surveys. In this study, according to the cut-off value of the medical diagnostic standard of fasting blood glucose for diabetes, we divide fasting blood glucose test data from the 2009 China Health and Nutrition Survey (CHNS) of Shandong province into two classes: normal and abnormal. Accordingly, for missing fasting blood glucose values, we propose a two-stage prediction filling method based on support vector technologies optimized competitively by particle swarm optimization (PSO) or grey wolf optimizer (GWO): the first stage predicts the class of the missing value with a support vector machine (SVM), and the second stage predicts the missing value with support vector regression (SVR) within the predicted class. In addition, we use LIBSVM as a gold standard to train both SVM and SVR in the different stages. Comparing the two competing optimizers at each stage, GWO has the highest classification accuracy in the first stage (91.1%), and PSO has the smallest in-class mean absolute error in the second stage (0.48). GWO-SVM-PSO-SVR is therefore selected as the optimal model, and its predicted value serves as the fill value. The comparison of models in the empirical analysis also shows that it outperforms all other filling models in terms of mean absolute error and mean absolute percentage error. In addition, the sensitivity analysis shows that it tolerates changes in sample size well and has good stability.
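The two-stage idea (first predict the class, then predict the value within that class) can be sketched in a few lines. This is only an illustration under stated assumptions: a nearest-centroid rule stands in for the SVM, a within-class mean stands in for the SVR, no PSO or GWO tuning is performed, and the cut-off of 7.0 mmol/L, the single covariate, and all data values are hypothetical rather than taken from the study.

```python
from statistics import mean

# Assumed cut-off in mmol/L; the abstract does not state the value it used.
CUTOFF = 7.0


def fit_two_stage(rows):
    """Fit stand-in models on (covariate, glucose) pairs with observed glucose."""
    groups = {"normal": [], "abnormal": []}
    for x, y in rows:
        groups["abnormal" if y >= CUTOFF else "normal"].append((x, y))
    if not all(groups.values()):
        raise ValueError("need at least one observed row in each class")
    # Stage 1 stand-in for the SVM: one covariate centroid per class.
    centroids = {c: mean(x for x, _ in pts) for c, pts in groups.items()}
    # Stage 2 stand-in for the SVR: the mean glucose within each class.
    fills = {c: mean(y for _, y in pts) for c, pts in groups.items()}
    return centroids, fills


def fill_missing(x, centroids, fills):
    """Predict the class from the covariate, then fill within that class."""
    label = min(centroids, key=lambda c: abs(x - centroids[c]))
    return label, fills[label]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and filled values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up rows: (hypothetical covariate, fasting glucose in mmol/L).
observed = [(1.0, 5.0), (2.0, 5.4), (8.0, 8.0), (9.0, 9.0)]
centroids, fills = fit_two_stage(observed)
print(fill_missing(1.2, centroids, fills))
print(fill_missing(7.0, centroids, fills))
```

Restricting the second stage to the predicted class keeps a fill value on the correct side of the diagnostic cut-off whenever the first stage is right, which is presumably why the abstract reports an in-class error.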
Affiliation(s)
- Wenlong Gao
  - Institute of Health Statistics and Intelligent Analysis, School of Public Health, Lanzhou University, Lanzhou, Gansu, P. R. China
  - Department of Epidemiology and Health Statistics, School of Public Health, Lanzhou University, Lanzhou, Gansu, P. R. China
- Jingxiang Xie
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, P. R. China
- Yongsong Ke
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, P. R. China
- Maoyun Tian
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, P. R. China
- Zhimei Zeng
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, P. R. China
- Xiaojie Ma
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, P. R. China
- Minqian Zhi
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, P. R. China
3
Bernardini M, Doinychko A, Romeo L, Frontoni E, Amini MR. A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets. Comput Biol Med 2023; 163:107188. [PMID: 37393785 DOI: 10.1016/j.compbiomed.2023.107188] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/27/2023] [Revised: 06/13/2023] [Accepted: 06/19/2023] [Indexed: 07/04/2023]
Abstract
The missing data mechanism is a relevant problem in the Machine Learning (ML) and biomedical informatics communities. Real-world Electronic Health Record (EHR) datasets contain many missing values, which results in a high level of spatiotemporal sparsity in the predictors' matrix. Several state-of-the-art approaches have tried to deal with this problem by proposing data imputation strategies that (i) are often unrelated to the ML model, (ii) are not conceived for EHR data, where laboratory exams are not prescribed uniformly over time and the percentage of missing values is high, and (iii) exploit only univariate and linear information on the observed features. Our paper proposes a data imputation strategy based on a clinical conditional Generative Adversarial Network (ccGAN) capable of imputing missing values by exploiting non-linear and multivariate information across patients. Unlike other GAN-based data imputation approaches, our method deals explicitly with the high level of missingness of routine EHR data by conditioning the imputation strategy on the observable values and on those that are fully annotated. We demonstrated statistically significant gains of the ccGAN over other state-of-the-art approaches in terms of imputation (a gain of around 19.79% over the best competitor) and predictive performance (a gain of up to 1.60% over the best competitor) on a real dataset from multiple diabetes centers. We also demonstrated its robustness across different missingness rates (a gain of up to 1.61% over the best competitor at the highest missingness rate) on an additional benchmark EHR dataset.
Affiliation(s)
- Michele Bernardini
  - Department of Information Engineering (DII), Università Politecnica delle Marche, Ancona, Italy
- Anastasiia Doinychko
  - Grenoble Informatics Laboratory, Université Grenoble Alpes, Saint-Martin-d'Hères, France
- Luca Romeo
  - Department of Economics and Law, University of Macerata, Macerata, Italy
- Emanuele Frontoni
  - Department of Political Sciences, Communication and International Relations, University of Macerata, Macerata, Italy
- Massih-Reza Amini
  - Grenoble Informatics Laboratory, Université Grenoble Alpes, Saint-Martin-d'Hères, France
4
Afsaneh E, Sharifdini A, Ghazzaghi H, Ghobadi MZ. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review. Diabetol Metab Syndr 2022; 14:196. [PMID: 36572938 PMCID: PMC9793536 DOI: 10.1186/s13098-022-00969-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022] Open
Abstract
Diabetes is a metabolic illness characterized by elevated blood glucose. This abnormal increase can seriously damage other organs such as the kidneys, eyes, heart, nerves, and blood vessels. Therefore, its prediction, prognosis, and management are essential to prevent harmful effects and to recommend more useful treatments. For these goals, machine learning algorithms have attracted considerable attention and have been developed successfully. This review surveys recently proposed machine learning (ML) and deep learning (DL) models for these objectives. The reported results indicate that ML and DL algorithms are promising approaches for controlling blood glucose and diabetes. However, they should be improved and applied to large datasets to confirm their applicability.
5
Cardozo G, Tirloni SF, Pereira Moro AR, Marques JLB. Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2022; 3:e40473. [PMID: 36644762 PMCID: PMC9828303 DOI: 10.2196/40473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 08/28/2022] [Accepted: 10/31/2022] [Indexed: 11/05/2022]
Abstract
Background: In recent decades, the use of artificial intelligence has been widely explored in health care. Similarly, the amount of data generated in the most varied medical processes has practically doubled every year, requiring new methods of analysis and treatment of these data. Mainly aimed at aiding in the diagnosis and prevention of diseases, this precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always present their results separately as individual values. However, physicians need to analyze a set of results to propose a supposed diagnosis, which leads us to think that sets of laboratory tests may contain more information than those presented separately for each result. In this way, the processes of medical laboratories can be strongly affected by these techniques.
Objective: In this sense, we sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases.
Methods: The methodology adopted used the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms used in the Medical Subject Heading database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. During the analysis, the inclusion and exclusion criteria were independently applied by 2 authors, with a third author being consulted in cases of disagreement.
Results: Following the defined requirements, 40 studies presenting good quality in the analysis process were selected and evaluated. We found that, in recent years, there has been a significant increase in the number of works that have used this methodology, mainly because of COVID-19. In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count.
Conclusions: Finally, we conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. This process has proved to be advantageous and innovative for medical laboratories. It is making it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases.
Affiliation(s)
- Glauco Cardozo
  - Federal Institute of Santa Catarina, Florianópolis, Brazil
6
Salvatori B, Linder T, Eppel D, Morettini M, Burattini L, Göbl C, Tura A. TyGIS: improved triglyceride-glucose index for the assessment of insulin sensitivity during pregnancy. Cardiovasc Diabetol 2022; 21:215. [PMID: 36258194 PMCID: PMC9580191 DOI: 10.1186/s12933-022-01649-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/25/2022] [Accepted: 09/28/2022] [Indexed: 11/21/2022] Open
Abstract
Background: The triglyceride-glucose index (TyG) has been proposed as a surrogate marker of insulin resistance, which is a typical trait of pregnancy. However, very few studies have analyzed TyG performance as a marker of insulin resistance in pregnancy, and they were limited to insulin resistance assessment at fasting rather than in dynamic conditions, i.e., during an oral glucose tolerance test (OGTT), which allows a more reliable assessment of the actual insulin sensitivity impairment. Thus, the first aim of the study was to explore, in pregnancy, the relationships between TyG and OGTT-derived insulin sensitivity. In addition, we developed a new version of TyG, for improved performance as a marker of insulin resistance in pregnancy.
Methods: In early pregnancy, a cohort of 109 women underwent assessment of maternal biometry and fasting blood tests, for measurement of several variables (visit 1). Subsequently (26 weeks of gestation), all visit 1 analyses were repeated (visit 2), and a subgroup of women (84 selected) received a 2 h, 75 g OGTT (30, 60, 90, and 120 min sampling) with measurement of blood glucose, insulin, and C-peptide for reliable assessment of insulin sensitivity (PREDIM index) and insulin secretion/beta-cell function. The dataset was randomly split into a 70% training set and a 30% test set, and with a machine learning approach we identified the optimal model, with TyG included, showing the best relationship with PREDIM. For inclusion in the model, we considered only fasting variables, in agreement with the TyG definition.
Results: The relationship of TyG with PREDIM was weak. Conversely, the improved TyG, called TyGIS (a linear function of TyG, body weight, lean body mass percentage, and fasting insulin), was much more strongly related to PREDIM, in both training and test sets (R² > 0.64, p < 0.0001). Bland–Altman analysis and an equivalence test confirmed the good performance of TyGIS in terms of association with PREDIM. Several further analyses confirmed the superiority of TyGIS over TyG.
Conclusions: We developed an improved version of TyG as a new surrogate marker of insulin sensitivity in pregnancy (TyGIS). Like TyG, TyGIS relies only on fasting variables, but its performance is markedly better than that of TyG.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12933-022-01649-8.
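For orientation, the baseline TyG index and the random 70%/30% split can be sketched as follows. The formula is the commonly cited definition of TyG (an assumption here, since the abstract does not restate it); the TyGIS coefficients are not given in the abstract and are deliberately not reproduced. The function names, the seed, and the input values are hypothetical.

```python
import math
import random


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """TyG = ln(fasting triglycerides [mg/dL] * fasting glucose [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("concentrations must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def split_70_30(items, seed=0):
    """Shuffle a copy of the items and split it into 70% / 30% parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Made-up fasting values, only to show the call.
print(round(tyg_index(100.0, 90.0), 3))

# 84 placeholder subject identifiers, matching the size of the OGTT subgroup.
train, held_out = split_70_30(range(84))
print(len(train), len(held_out))
```

Fixing the seed makes the split reproducible; how the authors rounded the 70% boundary is not stated, so the truncation used here is a design choice of this sketch.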
Affiliation(s)
- Tina Linder
  - Department of Obstetrics and Gynaecology, Medical University of Vienna, 1090, Vienna, Austria
- Daniel Eppel
  - Department of Obstetrics and Gynaecology, Medical University of Vienna, 1090, Vienna, Austria
- Micaela Morettini
  - Department of Information Engineering, Università Politecnica Delle Marche, 60131, Ancona, Italy
- Laura Burattini
  - Department of Information Engineering, Università Politecnica Delle Marche, 60131, Ancona, Italy
- Christian Göbl
  - Department of Obstetrics and Gynaecology, Medical University of Vienna, 1090, Vienna, Austria
- Andrea Tura
  - CNR Institute of Neuroscience, Corso Stati Uniti 4, 35127, Padua, Italy
7
Use of Machine Learning and Routine Laboratory Tests for Diabetes Mellitus Screening. BIOMED RESEARCH INTERNATIONAL 2022; 2022:8114049. [PMID: 35392258 PMCID: PMC8983182 DOI: 10.1155/2022/8114049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/21/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 12/28/2022]
Abstract
Most patients with diabetes mellitus are asymptomatic, which leads to delayed and more complex treatment. At the same time, most individuals routinely undergo standard clinical laboratory examinations, which create large health datasets over a lifetime. Computer processing has been used to search for health anomalies and predict diseases using clinical examinations. This work studied machine learning models to support diabetes screening using routine laboratory test data from 62,496 patients. The classification and regression models used were K-nearest neighbors, support vector machines, naïve Bayes, random forests, and artificial neural networks. Glycated hemoglobin, a test used for diabetes diagnosis, was used as the target. Regression models estimated glycated hemoglobin directly, and their outputs were then classified. The performance of the classification models was studied under various subdataset partitions and combinations (e.g., healthy, prediabetic, and diabetes, as well as not healthy and not diabetic). The best single performance was achieved with the artificial neural network model when detecting prediabetes or diabetes. The artificial neural network classification model scored 78.1%, 78.7%, and 78.4% for sensitivity, precision, and F1 score, respectively, when identifying the not-healthy group. Other models also performed well, depending on the intended use. Machine learning-based models can predict glycated hemoglobin values from routine laboratory tests and can be used as a screening tool to refer a patient for further testing.
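The step in which a regression output is "later classified" can be illustrated with a simple thresholding function. The cut-offs below (5.7% and 6.5% HbA1c) are the widely used American Diabetes Association values and are an assumption here, because the abstract does not state which thresholds the study applied; the function names and sample values are hypothetical.

```python
def classify_hba1c(hba1c_percent):
    """Map an HbA1c value (in %) to a screening class using assumed cut-offs."""
    if hba1c_percent < 5.7:
        return "healthy"
    if hba1c_percent < 6.5:
        return "prediabetic"
    return "diabetes"


def is_not_healthy(hba1c_percent):
    """Binary partition that merges the prediabetic and diabetes classes."""
    return classify_hba1c(hba1c_percent) != "healthy"


# Made-up regression outputs, standing in for a model's predicted HbA1c values.
predicted_hba1c = [5.2, 5.9, 6.4, 7.1]
print([classify_hba1c(v) for v in predicted_hba1c])
print([is_not_healthy(v) for v in predicted_hba1c])
```

Deriving classes from a regression output means one model can serve several partitions (three-class or binary) simply by changing the thresholds.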
8
Ilari L, Piersanti A, Göbl C, Burattini L, Kautzky-Willer A, Tura A, Morettini M. Unraveling the Factors Determining Development of Type 2 Diabetes in Women With a History of Gestational Diabetes Mellitus Through Machine-Learning Techniques. Front Physiol 2022; 13:789219. [PMID: 35250610 PMCID: PMC8892139 DOI: 10.3389/fphys.2022.789219] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/04/2021] [Accepted: 01/11/2022] [Indexed: 11/13/2022] Open
Abstract
Gestational diabetes mellitus (GDM) is a type of diabetes that usually resolves at the end of the pregnancy but exposes women to a higher risk of developing type 2 diabetes mellitus (T2DM). This study aimed to unravel the factors, among those that quantify specific metabolic processes, which determine progression to T2DM by using machine-learning techniques. Women who progressed to T2DM (labeled PROG, n = 19) were classified against those who did not (labeled NON-PROG, n = 59) using Orange software, through a data analysis procedure on a generated data set including anthropometric data and a total of 34 features extracted through mathematical modeling/methods procedures. Feature selection was performed with a decision tree algorithm, and then Naïve Bayes and penalized (L2) logistic regression were used to evaluate the ability of the selected features to solve the classification problem. Performance was evaluated in terms of area under the receiver operating characteristic curve (AUC), classification accuracy (CA), precision, sensitivity, specificity, and F1. Feature selection provided six features, and based on them, classification performance was as follows: AUC of 0.795, 0.831, and 0.884; CA of 0.827, 0.813, and 0.840; precision of 0.830, 0.854, and 0.834; sensitivity of 0.827, 0.813, and 0.840; specificity of 0.700, 0.821, and 0.662; and F1 of 0.828, 0.824, and 0.836 for the tree algorithm, Naïve Bayes, and penalized logistic regression, respectively. Fasting glucose, age, and body mass index, together with features describing insulin action and secretion, may predict the development of T2DM in women with a history of GDM.
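Most of the metrics listed above follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the counts are invented and do not correspond to the study's results, and the averaging scheme applied by the Orange software is not reproduced.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from a binary confusion matrix (positive class = PROG)."""
    total = tp + fp + tn + fn
    if total == 0 or tp + fp == 0 or tp + fn == 0 or tn + fp == 0:
        raise ValueError("counts leave at least one metric undefined")
    precision = tp / (tp + fp)      # share of predicted positives that are right
    sensitivity = tp / (tp + fn)    # share of true positives that are found
    specificity = tn / (tn + fp)    # share of true negatives that are found
    if precision + sensitivity == 0:
        raise ValueError("F1 is undefined when there are no true positives")
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts chosen only to exercise the function.
for name, value in binary_metrics(tp=8, fp=2, tn=18, fn=2).items():
    print(f"{name}: {value:.3f}")
```

With imbalanced groups such as 19 versus 59, accuracy alone can look good while sensitivity for the minority class is poor, which is one reason to report the full set.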
Affiliation(s)
- Ludovica Ilari
  - Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
- Agnese Piersanti
  - Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
- Christian Göbl
  - Department of Obstetrics and Gynecology, Medical University of Vienna, Vienna, Austria
- Laura Burattini
  - Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
- Alexandra Kautzky-Willer
  - Division of Endocrinology and Metabolism, Department of Internal Medicine III, Medical University of Vienna, Vienna, Austria
- Andrea Tura
  - Metabolic Unit, CNR Institute of Neuroscience, Padua, Italy
- Micaela Morettini
  - Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
  - *Correspondence: Micaela Morettini,
9
Berián J, Bravo I, Gardel-Vicente A, Lázaro-Galilea JL, Rigla M. Dynamic Insulin Basal Needs Estimation and Parameters Adjustment in Type 1 Diabetes. SENSORS 2021; 21:s21155226. [PMID: 34372462 PMCID: PMC8347968 DOI: 10.3390/s21155226] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Revised: 07/24/2021] [Accepted: 07/29/2021] [Indexed: 01/25/2023]
Abstract
Technological advances have made possible improvements such as Continuous Glucose Monitors, which give the patient a glucose reading every few minutes, and insulin pumps, which allow more personalized therapies. With the increasing number of available closed-loop systems, new challenges appear regarding algorithms and functionalities. Several of the systems analysed in this paper try to adapt to changes in some patients’ conditions and, in several of these systems, other variables such as basal needs are considered fixed from day to day to simplify the control problem. Therefore, these systems require a correct adjustment of the basal needs profile, which becomes crucial to obtain good results. In this paper, a novel approach dynamically determines the insulin basal needs of the patient and uses this information within a closed-loop algorithm, allowing the system to adjust dynamically in situations of illness, exercise, high-fat-content meals, or even partially blocked infusion sites, and avoiding the need to set a basal profile that approximately matches the basal needs of the patient. The insulin sensitivity factor (ISF) and the glycemic target are also dynamically modified according to the situation of the patient. Basal insulin needs are dynamically determined through linear regression via the decomposition of previously dosed insulin and its effect on the patient’s glycemia. The safety of the algorithm is improved by using the obtained value as basal insulin needs together with other mechanisms, such as basal needs modification through its trend, modification of the ISF and glycemic targets, and a low-glucose-suspend threshold. The dynamic basal insulin needs determination was successfully included in a closed-loop control algorithm and was simulated on 30 virtual patients (10 adults, 10 adolescents, and 10 children) using an open-source Python implementation of the FDA-approved (Food and Drug Administration) UVa (University of Virginia)/Padova Simulator. Simulations showed that the proposed system dynamically determines the basal needs and can adapt to a partial blockage of the insulin infusion, obtaining similar results in terms of time in range to the case in which no blockage was simulated. The proposed algorithm can be incorporated into other current closed-loop control algorithms to directly estimate the patient’s basal insulin needs or as a monitoring channel to detect situations in which basal needs may differ from the expected ones.
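As a rough intuition for estimating basal needs by linear regression, one can regress the observed glucose trend on the insulin delivery rate and read off the rate at which the fitted trend is zero. This is a deliberately simplified, hypothetical illustration and not the authors' decomposition of dosed insulin; the variable names, units, and data points are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_basal_rate(insulin_rates, glucose_trends):
    """Insulin rate at which the fitted glucose trend crosses zero."""
    intercept, slope = fit_line(insulin_rates, glucose_trends)
    if slope >= 0:
        raise ValueError("expected glucose to fall as insulin increases")
    return -intercept / slope


# Made-up observations: insulin rate (U/h) and glucose trend (mg/dL per hour).
rates = [0.5, 1.0, 1.5, 2.0]
trends = [20.0, 0.0, -20.0, -40.0]
print(estimate_basal_rate(rates, trends))
```

Refitting on a sliding window of recent observations would let such an estimate drift with the patient's condition, which is the kind of day-to-day adaptation the abstract describes.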
10
A Non-invasive Approach to Identify Insulin Resistance with Triglycerides and HDL-c Ratio Using Machine learning. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10461-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/14/2022]
11
Basu S, Johnson KT, Berkowitz SA. Use of Machine Learning Approaches in Clinical Epidemiological Research of Diabetes. Curr Diab Rep 2020; 20:80. [PMID: 33270183 DOI: 10.1007/s11892-020-01353-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
Purpose of review: Machine learning approaches, which seek to predict outcomes or classify patient features by recognizing patterns in large datasets, are increasingly applied to clinical epidemiology research on diabetes. Given its novelty and emergence in fields outside of biomedical research, machine learning terminology, techniques, and research findings may be unfamiliar to diabetes researchers. Our aim was to present the use of machine learning approaches in an approachable way, drawing from clinical epidemiological research in diabetes published from 1 Jan 2017 to 1 June 2020.
Recent findings: Machine learning approaches using tree-based learners, which produce decision trees to help guide clinical interventions, frequently have higher sensitivity and specificity than traditional regression models for risk prediction. Machine learning approaches using neural networks and "deep learning" can be applied to medical image data, particularly for the identification and staging of diabetic retinopathy and skin ulcers. Among the machine learning approaches reviewed, researchers identified new strategies to develop standard datasets for rigorous comparisons across older and newer approaches, methods to illustrate how a machine learner was treating underlying data, and approaches to improve the transparency of the machine learning process. Machine learning approaches have the potential to improve risk stratification and outcome prediction for clinical epidemiology applications. Achieving this potential would be facilitated by use of universal open-source datasets for fair comparisons. More work remains in the application of strategies to communicate how the machine learners are generating their predictions.
Affiliation(s)
- Sanjay Basu
  - Center for Primary Care, Harvard Medical School, Boston, MA, USA
  - Research and Population Health, Collective Health, San Francisco, CA, USA
  - School of Public Health, Imperial College London, London, SW7, UK
- Karl T Johnson
  - General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Seth A Berkowitz
  - General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
12
Hu L, Liu B, Li Y. Ranking sociodemographic, health behavior, prevention, and environmental factors in predicting neighborhood cardiovascular health: A Bayesian machine learning approach. Prev Med 2020; 141:106240. [PMID: 32860821 PMCID: PMC7704682 DOI: 10.1016/j.ypmed.2020.106240] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/12/2020] [Revised: 07/19/2020] [Accepted: 08/19/2020] [Indexed: 10/23/2022]
Abstract
Cardiovascular disease is the leading cause of death in the United States. While abundant research has been conducted to identify risk factors for cardiovascular disease at the individual level, less is known about factors that may influence population cardiovascular health outcomes at the neighborhood level. The purpose of this study is to use Bayesian Additive Regression Trees, a state-of-the-art machine learning approach, to rank sociodemographic, health behavior, prevention, and environmental factors in predicting neighborhood cardiovascular health. We created a new neighborhood health dataset by combining three datasets at the census tract level: the 500 Cities Data from the Centers for Disease Control and Prevention, the 2011-2015 American Community Survey 5-Year Estimates from the Census Bureau, and the 2015-2016 Environmental Justice Screening database from the Environmental Protection Agency in the United States. Results showed that neighborhood behavioral factors, such as the proportions of people who are obese, have no leisure-time physical activity, or binge drink, emerged among the top five predictors for most of the neighborhood cardiovascular health outcomes. Findings from this study would allow public health researchers and policymakers to prioritize community-based interventions and efficiently use limited resources to improve neighborhood cardiovascular health.
Affiliation(s)
- Liangyuan Hu
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bian Liu
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yan Li
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Obstetrics, Gynecology, and Reproductive Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
13
Ferracuti F, Fioretti S, Frontoni E, Iarlori S, Mengarelli A, Riccio M, Romeo L, Verdini F. Functional evaluation of triceps surae during heel rise test: from EMG frequency analysis to machine learning approach. Med Biol Eng Comput 2020; 59:41-56. [PMID: 33191440 DOI: 10.1007/s11517-020-02286-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2020] [Accepted: 10/31/2020] [Indexed: 11/28/2022]
Abstract
Soleus muscle flap as coverage tissue is a possible surgical solution adopted to cover the wounds due to open fractures. Although this procedure presents many clinical advantages, relatively little information is available about the loss of functionality of the triceps surae of the treated leg. In this study, a group of patients who underwent a soleus muscle flap surgical procedure was analyzed through the heel rise test (HRT), in order to explore the residual functionality of the triceps surae. A frequency band analysis was performed to assess whether the residual heads of the triceps surae exhibit different characteristics with respect to both the non-treated lower limb and an age-matched control group. Then, an in-depth analysis based on a machine learning approach was proposed for discriminating between groups by generalizing across new unseen subjects. Experimental results showed the reliability of the proposed analyses for discriminating between groups at a specific time epoch, and the high interpretability of the proposed machine learning algorithm allowed the temporal localization of the most discriminative frequency bands. Findings of this study highlighted that significant differences can be recognized in the myoelectric spectral characteristics between the treated and contralateral leg in patients who underwent soleus flap surgery. These experimental results may support clinical decision-making for assessing triceps surae performance and for supporting the choice of treatment in plastic and reconstructive surgery. Graphical Abstract: The graphical abstract presents the scope of the proposed analysis of myoelectric signals of the soleus and gastrocnemius muscles of patient groups during the Heel Rise Test, highlighting the applied methods and the obtained results.
Affiliation(s)
- Francesco Ferracuti
- Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy
- Sandro Fioretti
- Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy
- Emanuele Frontoni
- Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy
- Sabrina Iarlori
- Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy
- Michele Riccio
- Department of Plastic and Reconstructive Hand Surgery, Università Politecnica delle Marche, AOU Ospedali Riuniti, Ancona, Italy
- Luca Romeo
- Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy
- Federica Verdini
- Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy
|
14
|
Frontoni E, Romeo L, Bernardini M, Moccia S, Migliorelli L, Paolanti M, Ferri A, Misericordia P, Mancini A, Zingaretti P. A Decision Support System for Diabetes Chronic Care Models Based on General Practitioner Engagement and EHR Data Sharing. IEEE J Transl Eng Health Med 2020; 8:3000112. [PMID: 33150095 PMCID: PMC7605604 DOI: 10.1109/jtehm.2020.3031107] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2020] [Revised: 09/16/2020] [Accepted: 10/10/2020] [Indexed: 12/19/2022]
Abstract
Objective: Decision support systems (DSS) have been developed and promoted for their potential to improve the quality of health care. However, a common clinical strategy is lacking, clinical resources are poorly managed, and preventive medicine is implemented erroneously.
Methods: To overcome this problem, this work proposed an integrated system that relies on the creation and sharing of a database extracted from GPs' Electronic Health Records (EHRs) within the Netmedica Italian (NMI) cloud infrastructure. Although the proposed system is a pilot application specifically tailored to improving chronic Type 2 Diabetes (T2D) care, it could easily be retargeted to manage other chronic diseases. The proposed DSS is based on the EHR structure used by GPs in their daily activities and follows the most recent guidelines on data protection and sharing. The DSS is equipped with a Machine Learning (ML) method for analyzing the shared EHRs and thus tackling their high variability. A novel set of T2D care-quality indicators is used specifically to determine the economic incentives, and the T2D features are presented as predictors for the proposed ML approach.
Results: The EHRs from 41237 T2D patients were analyzed. No additional data collection, with respect to standard clinical practice, was required. The DSS exhibited competitive performance (up to an overall accuracy of 98%±2% and a macro-recall of 96%±1%) in classifying chronic care quality across the different follow-up phases. The chronic care quality model led to a significant increase (up to 12%) in the number of T2D patients without complications. GPs who agreed to use the proposed system received an economic incentive. A further bonus was assigned when performance targets were achieved.
Conclusions: The quality care evaluation in a clinical use-case scenario demonstrated how the empowerment of GPs through the use of the platform (integrating the proposed DSS), along with the economic incentives, may speed up the improvement of care.
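The results above are reported as overall accuracy and macro-recall. As a reminder of how the two metrics differ, the following minimal sketch computes both from label lists. The class names and the toy predictions are hypothetical and are not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recall values.

    Each class counts equally, regardless of how many samples it has,
    so a poorly recognized minority class lowers the score noticeably.
    """
    recalls = []
    for cls in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: 8 "good" and 2 "poor" care-quality cases.
y_true = ["good"] * 8 + ["poor"] * 2
y_pred = ["good"] * 8 + ["good", "poor"]

acc = accuracy(y_true, y_pred)       # 9 of 10 correct
mrec = macro_recall(y_true, y_pred)  # mean of 1.0 ("good") and 0.5 ("poor")
```

In this toy case accuracy is 0.9 while macro-recall is 0.75, which shows why reporting both is informative when classes are imbalanced.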
Affiliation(s)
- Emanuele Frontoni
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Luca Romeo
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Michele Bernardini
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Sara Moccia
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Lucia Migliorelli
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Marina Paolanti
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Alessandro Ferri
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Adriano Mancini
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
- Primo Zingaretti
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
|
15
|
Early temporal prediction of Type 2 Diabetes Risk Condition from a General Practitioner Electronic Health Record: A Multiple Instance Boosting Approach. Artif Intell Med 2020; 105:101847. [PMID: 32505428 DOI: 10.1016/j.artmed.2020.101847] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/29/2019] [Revised: 02/12/2020] [Accepted: 03/20/2020] [Indexed: 11/22/2022]
Abstract
Early prediction of target patients at high risk of developing Type 2 diabetes (T2D) plays a significant role in preventing the onset of overt disease and its associated comorbidities. Although fundamental in the early phases of T2D natural history, insulin resistance is not usually quantified by General Practitioners (GPs). The triglyceride-glucose (TyG) index has proven useful in clinical studies for quantifying insulin resistance and for the early identification of individuals at T2D risk, but it is still not applied by GPs for diagnostic purposes. The aim of this study is to propose a multiple instance learning boosting algorithm (MIL-Boost) for creating a predictive model capable of early prediction of worsening insulin resistance (low vs high T2D risk) in terms of the TyG index. MIL-Boost is applied to past electronic health record (EHR) patient information stored by a single GP. The proposed MIL-Boost algorithm proved effective in this task, performing better than the other state-of-the-art ML competitors (recall from 0.70 up to 0.83). The proposed MIL-based approach is able to extract hidden patterns from past EHR temporal data, even without directly exploiting triglyceride and glucose measurements. The major advantages of our method lie in its ability to model the temporal evolution of longitudinal EHR data while dealing with small sample sizes and variability in the observations (e.g., a small, variable number of prescriptions for non-hospitalized patients). The proposed algorithm may represent the core of a clinical decision support system.
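The abstract does not spell out the TyG index formula. The sketch below uses the commonly cited definition, the natural logarithm of fasting triglycerides (mg/dL) times fasting glucose (mg/dL) divided by two. The function names are illustrative, and the risk cutoff is left as a caller-supplied parameter because the abstract does not report the threshold used in the study.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index from fasting values in mg/dL.

    Commonly cited form: ln(triglycerides * glucose / 2).
    """
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("measurements must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


def risk_label(tyg, threshold):
    """Binary low/high label; the threshold is a hypothetical input,
    not a value taken from the study."""
    return "high" if tyg >= threshold else "low"


# Hypothetical patient: triglycerides 150 mg/dL, fasting glucose 100 mg/dL.
tyg = tyg_index(150, 100)  # ln(7500), roughly 8.92
```

Note that the study predicts this label from past EHR data without using the triglyceride and glucose measurements directly, so a function like this would only define the target, not the model inputs.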
|