51
Alshammari R, Atiyah N, Daghistani T, Alshammari A. Improving Accuracy for Diabetes Mellitus Prediction by Using Deepnet. Online J Public Health Inform 2020; 12:e11. [PMID: 32908645 PMCID: PMC7462602 DOI: 10.5210/ojphi.v12i1.10611] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/11/2022] Open
Abstract
Diabetes is a salient issue and a significant health care concern for many nations, and its prevalence is forecast to rise. Hence, building a machine learning prediction model to assist in the identification of diabetic patients is of great interest. This study aims to create a machine learning model that is capable of predicting diabetes with high performance. The study used the BigML platform to train four machine learning algorithms, namely Deepnet, Models (decision tree), Ensemble and Logistic Regression, on data sets collected from the Ministry of National Guard Health Affairs (MNGHA) in Saudi Arabia between 2013 and 2015. The comparative evaluation criteria for the four algorithms were accuracy, precision, recall, F-measure and phi coefficient. Results show that the Deepnet algorithm achieved higher performance than the other machine learning algorithms across the evaluation metrics.
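The five evaluation criteria named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of the standard textbook formulas, not the BigML implementation used in the study; the function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute five common criteria from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Phi coefficient (also known as the Matthews correlation coefficient).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "phi": phi,
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = binary_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the phi coefficient uses all four cells of the matrix, which is one reason it is often reported alongside accuracy when classes are imbalanced.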
Affiliation(s)
- Riyad Alshammari
  - Health Informatics Department, College of Public Health and Health Informatics, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Riyadh, KSA
- Noorah Atiyah
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Tahani Daghistani
  - Health Informatics Department, College of Public Health and Health Informatics, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Riyadh, KSA
- Abdulwahhab Alshammari
  - Health Informatics Department, College of Public Health and Health Informatics, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Riyadh, KSA
52
Doyle OM, Leavitt N, Rigg JA. Finding undiagnosed patients with hepatitis C infection: an application of artificial intelligence to patient claims data. Sci Rep 2020; 10:10521. [PMID: 32601354 PMCID: PMC7324575 DOI: 10.1038/s41598-020-67013-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/07/2019] [Accepted: 05/27/2020] [Indexed: 12/12/2022] Open
Abstract
Hepatitis C virus (HCV) remains a significant public health challenge with approximately half of the infected population untreated and undiagnosed. In this retrospective study, predictive models were developed to identify undiagnosed HCV patients using longitudinal medical claims linked to prescription data from approximately ten million patients in the United States (US) between 2010 and 2016. Features capturing information on demographics, risk factors, symptoms, treatments and procedures relevant to HCV were extracted from patients' medical history. Predictive algorithms were developed based on logistic regression, random forests, gradient boosted trees and a stacked ensemble. Descriptive analysis indicated that patients exhibited known symptoms of HCV on average 2-3 years prior to their diagnosis. The precision was at least 95% for all algorithms at low levels of recall (10%). For recall levels >50%, the stacked ensemble performed best with a precision of 97% compared with 87% for the gradient boosted trees and just 31% for the logistic regression. For context, the Centers for Disease Control and Prevention recommends screening in an at-risk sub-population with an estimated HCV prevalence of 2.23%. The artificial intelligence (AI) algorithm presented here has a precision which is substantially higher than the screening rates associated with recommended clinical guidelines, suggesting that AI algorithms have the potential to provide a step change in the effectiveness of HCV screening.
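One way to read the comparison between model precision and guideline-based screening is as a "number needed to screen": the expected number of patients tested per true case found, which is the reciprocal of the positive rate. The sketch below is an illustration only and assumes the reported precision values (at recall above 50%) and the 2.23% prevalence can be compared directly; the function name `number_needed_to_screen` is hypothetical and this calculation does not appear in the abstract.

```python
def number_needed_to_screen(positive_rate: float) -> float:
    """Expected number of patients tested per true case found."""
    if not 0 < positive_rate <= 1:
        raise ValueError("positive_rate must be in (0, 1]")
    return 1.0 / positive_rate


# Rates quoted in the abstract; treating them as directly comparable
# is a simplifying assumption made for this illustration.
reported_rates = {
    "at-risk screening (prevalence)": 0.0223,
    "stacked ensemble": 0.97,
    "gradient boosted trees": 0.87,
    "logistic regression": 0.31,
}

for label, rate in reported_rates.items():
    print(f"{label}: about {number_needed_to_screen(rate):.1f} tested per case")
```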
Affiliation(s)
- Orla M Doyle
  - Predictive Analytics, Real World Solutions, IQVIA, London, N1 9JY, UK
- Nadejda Leavitt
  - Predictive Analytics, Real World Solutions, IQVIA, 1 IMS Drive, Plymouth Meeting, PA, USA
- John A Rigg
  - Predictive Analytics, Real World Solutions, IQVIA, London, N1 9JY, UK
53
Musacchio N, Giancaterini A, Guaita G, Ozzello A, Pellegrini MA, Ponzani P, Russo GT, Zilich R, de Micheli A. Artificial Intelligence and Big Data in Diabetes Care: A Position Statement of the Italian Association of Medical Diabetologists. J Med Internet Res 2020; 22:e16922. [PMID: 32568088 PMCID: PMC7338925 DOI: 10.2196/16922] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/05/2019] [Revised: 03/09/2020] [Accepted: 04/12/2020] [Indexed: 12/24/2022] Open
Abstract
Since the last decade, most of our daily activities have become digital. Digital health takes into account the ever-increasing synergy between advanced medical technologies, innovation, and digital communication. Thanks to machine learning, we are not limited anymore to a descriptive analysis of the data, as we can obtain greater value by identifying and predicting patterns resulting from inductive reasoning. Machine learning software programs that disclose the reasoning behind a prediction allow for “what-if” models by which it is possible to understand if and how, by changing certain factors, one may improve the outcomes, thereby identifying the optimal behavior. Currently, diabetes care is facing several challenges: the decreasing number of diabetologists, the increasing number of patients, the reduced time allowed for medical visits, the growing complexity of the disease both from the standpoints of clinical and patient care, the difficulty of achieving the relevant clinical targets, the growing burden of disease management for both the health care professional and the patient, and the health care accessibility and sustainability. In this context, new digital technologies and the use of artificial intelligence are certainly a great opportunity. Herein, we report the results of a careful analysis of the current literature and represent the vision of the Italian Association of Medical Diabetologists (AMD) on this controversial topic that, if well used, may be the key for a great scientific innovation. AMD believes that the use of artificial intelligence will enable the conversion of data (descriptive) into knowledge of the factors that “affect” the behavior and correlations (predictive), thereby identifying the key aspects that may establish an improvement of the expected results (prescriptive). 
Artificial intelligence can therefore become a tool of great technical support to help diabetologists become fully responsible for the individual patient, thereby assuring customized and precise medicine. This, in turn, will allow for comprehensive therapies to be built in accordance with the evidence criteria that should always be the basis for any therapeutic choice.
Affiliation(s)
- Annalisa Giancaterini
  - Diabetology Service, Muggiò Polyambulatory, Azienda Socio Sanitaria Territoriale, Monza, Italy
- Giacomo Guaita
  - Diabetology, Endocrinology and Metabolic Diseases Service, Azienda Tutela Salute Sardegna-Azienda Socio Sanitaria Locale, Carbonia, Italy
- Alessandro Ozzello
  - Departmental Structure of Endocrine Diseases and Diabetology, Azienda Sanitaria Locale TO3, Pinerolo, Italy
- Maria A Pellegrini
  - Italian Association of Diabetologists, Rome, Italy
  - New Coram Limited Liability Company, Udine, Italy
- Paola Ponzani
  - Operative Unit of Diabetology, La Colletta Hospital, Azienda Sanitaria Locale 3, Genova, Italy
- Giuseppina T Russo
  - Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Alberto de Micheli
  - Associazione dei Cavalieri Italiani del Sovrano Militare Ordine di Malta, Genova, Italy
54
Wang T, Xuan P, Liu Z, Zhang T. Assistant diagnosis with Chinese electronic medical records based on CNN and BiLSTM with phrase-level and word-level attentions. BMC Bioinformatics 2020; 21:230. [PMID: 32503424 PMCID: PMC7275511 DOI: 10.1186/s12859-020-03554-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/18/2019] [Accepted: 05/25/2020] [Indexed: 01/09/2023] Open
Abstract
Background Inferring diseases related to the patient’s electronic medical records (EMRs) is of great significance for assisting doctor diagnosis. Several recent prediction methods have shown that deep learning-based methods can learn the deep and complex information contained in EMRs. However, they do not consider the discriminative contributions of different phrases and words. Moreover, local information and context information of EMRs should be deeply integrated. Results A new method based on the fusion of a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) with attention mechanisms is proposed for predicting a disease related to a given EMR, and it is referred to as FCNBLA. FCNBLA deeply integrates local information, context information of the word sequence and more informative phrases and words. A novel framework based on deep learning is developed to learn the local representation, the context representation and the combination representation. The left side of the framework is constructed based on CNN to learn the local representation of adjacent words. The right side of the framework based on BiLSTM focuses on learning the context representation of the word sequence. Not all phrases and words contribute equally to the representation of an EMR meaning. Therefore, we establish the attention mechanisms at the phrase level and word level, and the middle module of the framework learns the combination representation of the enhanced phrases and words. The macro average f-score and accuracy of FCNBLA achieved 91.29 and 92.78%, respectively. Conclusion The experimental results indicate that FCNBLA yields superior performance compared with several state-of-the-art methods. The attention mechanisms and combination representations are also confirmed to be helpful for improving FCNBLA’s prediction performance. Our method is helpful for assisting doctors in diagnosing diseases in patients.
Affiliation(s)
- Tong Wang
  - School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
- Zonglin Liu
  - School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin, 150080, China
55
Diamanti K, Visvanathar R, Pereira MJ, Cavalli M, Pan G, Kumar C, Skrtic S, Risérus U, Eriksson JW, Kullberg J, Komorowski J, Wadelius C, Ahlström H. Integration of whole-body [18F]FDG PET/MRI with non-targeted metabolomics can provide new insights on tissue-specific insulin resistance in type 2 diabetes. Sci Rep 2020; 10:8343. [PMID: 32433479 PMCID: PMC7239946 DOI: 10.1038/s41598-020-64524-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/15/2019] [Accepted: 03/30/2020] [Indexed: 11/21/2022] Open
Abstract
Alteration of various metabolites has been linked to type 2 diabetes (T2D) and insulin resistance. However, identifying significant associations between metabolites and tissue-specific phenotypes requires a multi-omics approach. In a cohort of 42 subjects with different levels of glucose tolerance (normal, prediabetes and T2D) matched for age and body mass index, we calculated associations between parameters of whole-body positron emission tomography (PET)/magnetic resonance imaging (MRI) during hyperinsulinemic euglycemic clamp and non-targeted metabolomics profiling for subcutaneous adipose tissue (SAT) and plasma. Plasma metabolomics profiling revealed that hepatic fat content was positively associated with tyrosine, and negatively associated with lysoPC(P-16:0). Visceral adipose tissue (VAT) and SAT insulin sensitivity (Ki) were positively associated with several lysophospholipids, while the opposite applied to branched-chain amino acids. The adipose tissue metabolomics revealed a positive association between non-esterified fatty acids and both VAT and liver Ki. Bile acids and carnitines in adipose tissue were inversely associated with VAT Ki. Furthermore, we detected several metabolites that were significantly higher in T2D than normal/prediabetes. In this study we present novel associations between several metabolites from SAT and plasma with the fat fraction, volume and insulin sensitivity of various tissues throughout the body, demonstrating the benefit of an integrative multi-omics approach.
Affiliation(s)
- Klev Diamanti
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Robin Visvanathar
  - Department of Surgical Sciences, section of Radiology, Uppsala University, Uppsala, Sweden
- Maria J Pereira
  - Department of Medical Sciences, Clinical Diabetes and Metabolism, Uppsala University, Uppsala, Sweden
- Marco Cavalli
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Gang Pan
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Chanchal Kumar
  - Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
  - Karolinska Institute/AstraZeneca Integrated CardioMetabolic Centre (KI/AZ ICMC), Department of Medicine, Novum, Huddinge, Sweden
- Stanko Skrtic
  - Pharmaceutical Technology & Development, AstraZeneca AB, Gothenburg, Sweden
  - Department of Medicine, Sahlgrenska University Hospital, Gothenburg, Sweden
- Ulf Risérus
  - Department of Public Health and Caring Sciences, Clinical Nutrition and Metabolism, Uppsala University, Uppsala, Sweden
- Jan W Eriksson
  - Department of Medical Sciences, Clinical Diabetes and Metabolism, Uppsala University, Uppsala, Sweden
- Joel Kullberg
  - Department of Surgical Sciences, section of Radiology, Uppsala University, Uppsala, Sweden
  - Antaros Medical AB, Mölndal, Sweden
- Jan Komorowski
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
  - Institute of Computer Science, PAN, Warsaw, Poland
- Claes Wadelius
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Håkan Ahlström
  - Department of Surgical Sciences, section of Radiology, Uppsala University, Uppsala, Sweden
  - Antaros Medical AB, Mölndal, Sweden
56
Dworzynski P, Aasbrenn M, Rostgaard K, Melbye M, Gerds TA, Hjalgrim H, Pers TH. Nationwide prediction of type 2 diabetes comorbidities. Sci Rep 2020; 10:1776. [PMID: 32019971 PMCID: PMC7000818 DOI: 10.1038/s41598-020-58601-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/19/2019] [Accepted: 01/16/2020] [Indexed: 02/06/2023] Open
Abstract
Identification of individuals at risk of developing disease comorbidities represents an important task in tackling the growing personal and societal burdens associated with chronic diseases. We employed machine learning techniques to investigate to what extent data from longitudinal, nationwide Danish health registers can be used to predict individuals at high risk of developing type 2 diabetes (T2D) comorbidities. Leveraging logistic regression-, random forest- and gradient boosting models and register data spanning hospitalizations, drug prescriptions and contacts with primary care contractors from >200,000 individuals newly diagnosed with T2D, we predicted five-year risk of heart failure (HF), myocardial infarction (MI), stroke (ST), cardiovascular disease (CVD) and chronic kidney disease (CKD). For HF, MI, CVD, and CKD, register-based models outperformed a reference model leveraging canonical individual characteristics by achieving area under the receiver operating characteristic curve improvements of 0.06, 0.03, 0.04, and 0.07, respectively. The top 1,000 patients predicted to be at highest risk exhibited observed incidence ratios exceeding 4.99, 3.52, 1.97 and 4.71 respectively. In summary, prediction of T2D comorbidities utilizing Danish registers led to consistent albeit modest performance improvements over reference models, suggesting that register data could be leveraged to systematically identify individuals at risk of developing disease comorbidities.
Affiliation(s)
- Piotr Dworzynski
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
- Martin Aasbrenn
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Geriatrics and Internal Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Klaus Rostgaard
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
- Mads Melbye
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Henrik Hjalgrim
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
  - Department of Haematology, Rigshospitalet, Copenhagen, Denmark
- Tune H Pers
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
57
Luo G, He S, Stone BL, Nkoy FL, Johnson MD. Developing a Model to Predict Hospital Encounters for Asthma in Asthmatic Patients: Secondary Analysis. JMIR Med Inform 2020; 8:e16080. [PMID: 31961332 PMCID: PMC7001050 DOI: 10.2196/16080] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 09/03/2019] [Revised: 11/01/2019] [Accepted: 12/01/2019] [Indexed: 12/12/2022] Open
Abstract
Background As a major chronic disease, asthma causes many emergency department (ED) visits and hospitalizations each year. Predictive modeling is a key technology to prospectively identify high-risk asthmatic patients and enroll them in care management for preventive care to reduce future hospital encounters, including inpatient stays and ED visits. However, existing models for predicting hospital encounters in asthmatic patients are inaccurate. Usually, they miss over half of the patients who will incur future hospital encounters and incorrectly classify many others who will not. This makes it difficult to match the limited resources of care management to the patients who will incur future hospital encounters, increasing health care costs and degrading patient outcomes. Objective The goal of this study was to develop a more accurate model for predicting hospital encounters in asthmatic patients. Methods Secondary analysis of 334,564 data instances from Intermountain Healthcare from 2005 to 2018 was conducted to build a machine learning classification model to predict the hospital encounters for asthma in the following year in asthmatic patients. The patient cohort included all asthmatic patients who resided in Utah or Idaho and visited Intermountain Healthcare facilities during 2005 to 2018. A total of 235 candidate features were considered for model building. Results The model achieved an area under the receiver operating characteristic curve of 0.859 (95% CI 0.846-0.871). When the cutoff threshold for conducting binary classification was set at the top 10.00% (1926/19,256) of asthmatic patients with the highest predicted risk, the model reached an accuracy of 90.31% (17,391/19,256; 95% CI 89.86-90.70), a sensitivity of 53.7% (436/812; 95% CI 50.12-57.18), and a specificity of 91.93% (16,955/18,444; 95% CI 91.54-92.31). To steer future research on this topic, we pinpointed several potential improvements to our model. 
Conclusions Our model improves the state of the art for predicting hospital encounters for asthma in asthmatic patients. After further refinement, the model could be integrated into a decision support tool to guide asthma care management allocation. International Registered Report Identifier (IRRID) RR2-10.2196/resprot.5039
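The accuracy, sensitivity and specificity quoted in the Results can be re-derived from the counts given in parentheses, which also recovers the confusion-matrix cells the abstract does not state. The sketch below uses only those reported counts; the variable names are illustrative, and the positive predictive value is a derived quantity that the abstract itself does not report.

```python
# Counts reported in the Results sentence of the abstract.
true_positives = 436        # numerator of sensitivity
actual_positives = 812      # denominator of sensitivity
true_negatives = 16_955     # numerator of specificity
actual_negatives = 18_444   # denominator of specificity

# Remaining confusion-matrix cells follow by subtraction.
false_negatives = actual_positives - true_positives
false_positives = actual_negatives - true_negatives
total = actual_positives + actual_negatives

sensitivity = true_positives / actual_positives
specificity = true_negatives / actual_negatives
accuracy = (true_positives + true_negatives) / total

# Derived here, not reported in the abstract: share of flagged
# patients who actually had a hospital encounter.
flagged = true_positives + false_positives
positive_predictive_value = true_positives / flagged

print(f"total patients: {total}")
print(f"accuracy:    {accuracy:.2%}")
print(f"sensitivity: {sensitivity:.2%}")
print(f"specificity: {specificity:.2%}")
print(f"PPV (derived): {positive_predictive_value:.2%}")
```

The high accuracy alongside a sensitivity near 54% reflects the class imbalance: only 812 of the 19,256 patients are positive, so correctly classified negatives dominate the accuracy figure.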
Affiliation(s)
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Shan He
  - Care Transformation, Intermountain Healthcare, Salt Lake City, UT, United States
- Bryan L Stone
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
- Flory L Nkoy
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
- Michael D Johnson
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
58
Itani S, Rossignol M. At the Crossroads Between Psychiatry and Machine Learning: Insights Into Paradigms and Challenges for Clinical Applicability. Front Psychiatry 2020; 11:552262. [PMID: 33192664 PMCID: PMC7541948 DOI: 10.3389/fpsyt.2020.552262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2020] [Accepted: 09/07/2020] [Indexed: 11/27/2022] Open
Affiliation(s)
- Sarah Itani
  - Fund for Scientific Research (F.R.S.-FNRS), Brussels, Belgium
  - Department of Mathematics and Operations Research, Faculty of Engineering, University of Mons, Mons, Belgium
- Mandy Rossignol
  - Department of Cognitive Psychology and Neuropsychology, Faculty of Psychology and Education, University of Mons, Mons, Belgium
59
A Review of Methodological Approaches for Developing Diagnostic Algorithms for Diabetes Screening. J Nurs Meas 2019; 27:433-457. [PMID: 31871284 DOI: 10.1891/1061-3749.27.3.433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
BACKGROUND AND PURPOSE Diagnostic algorithms are invaluable tools for screening diabetes. This review aimed to evaluate and identify the most robust methodological approaches for developing diagnostic algorithms for screening diabetes. METHODS Following a literature search, methodological quality of algorithm development studies was evaluated using the TRIPOD guidelines (Collins, Reitsma, Altman, & Moons, 2015). RESULTS Methods used for developing the algorithms included logistic regression models, classification and regression trees, Random Forest and TreeNet, Artificial Neural Networks, and Naïve Bayes. Methodological issues for algorithm development studies were related to handling of missing values, reporting recruitment methods, categorization of continuous variables, and statistical controls. CONCLUSIONS Most studies exhibited critical methodological flaws and poor adherence to reporting standards. Diabetes screening algorithms can easily be made available electronically and utilized by nurses at minimal cost even in underserved areas.
60
Nguyen BP, Pham HN, Tran H, Nghiem N, Nguyen QH, Do TTT, Tran CT, Simpson CR. Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records. Computer Methods and Programs in Biomedicine 2019; 182:105055. [PMID: 31505379 DOI: 10.1016/j.cmpb.2019.105055] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 03/07/2019] [Revised: 08/17/2019] [Accepted: 08/27/2019] [Indexed: 06/10/2023]
Abstract
OBJECTIVE Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Currently, methods of treating diabetes are inadequate and costly, so prevention becomes an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs) for each individual or a population have become important tools in understanding developing trends of diseases. Using EHRs to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model that combines the strength of a generalised linear model with various features and a deep feed-forward neural network to improve the prediction of the onset of type 2 diabetes mellitus (T2DM). MATERIALS AND METHODS The proposed method was implemented by training various models with a logistic loss function using stochastic gradient descent. We applied this model using public hospital record data provided by the Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, of which 1904 have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data obtained from previous years (2009-2011). Class imbalance was handled by applying the Synthetic Minority Oversampling Technique (SMOTE) to each cross-validation training fold, to analyse the performance when synthetic examples for the minority class are created. We used SMOTE of 150 and 300 percent, in which 300 percent means that three new synthetic instances are created for each minority class instance. This results in approximate diabetes:non-diabetes distributions in the training set of 1:2 and 1:1, respectively. RESULTS Our final ensemble model not using SMOTE obtained an accuracy of 84.28%, area under the receiver operating characteristic curve (AUC) of 84.13%, sensitivity of 31.17% and specificity of 96.85%.
Using SMOTE of 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively) with a moderate decrease in specificity (90.16% and 76.59%, respectively). DISCUSSION AND CONCLUSIONS Our algorithm has further optimised the prediction of diabetes onset using a novel state-of-the-art machine learning algorithm: the wide and deep learning neural network architecture.
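The approximate 1:2 and 1:1 class distributions quoted above follow from simple arithmetic on the reported cohort size. The sketch below assumes the full-dataset counts (1904 of 9948 patients with T2DM) are a fair stand-in for each training fold, which holds only if folds preserve the class proportions; the function name `class_ratio_after_smote` is hypothetical, and the code performs no actual oversampling.

```python
def class_ratio_after_smote(minority: int, majority: int, smote_percent: int):
    """Return the new minority count and the majority:minority ratio.

    A SMOTE setting of N percent adds N/100 synthetic instances
    per original minority instance (300 percent adds three each).
    """
    synthetic = round(minority * smote_percent / 100)
    new_minority = minority + synthetic
    return new_minority, majority / new_minority


# Counts from the abstract: 9948 patients, 1904 diagnosed with T2DM.
diabetic = 1904
non_diabetic = 9948 - diabetic

for percent in (0, 150, 300):
    count, ratio = class_ratio_after_smote(diabetic, non_diabetic, percent)
    print(f"SMOTE {percent:>3}%: {count} diabetic, ratio 1:{ratio:.2f}")
```

Under these assumptions the ratios come out near 1:1.7 and 1:1.06, which is consistent with the abstract's rounded 1:2 and 1:1.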
Affiliation(s)
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Hung N Pham
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Hop Tran
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Nhung Nghiem
  - Department of Public Health, University of Otago, 23A Mein Street, Wellington 6021, New Zealand
- Quang H Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Trang T T Do
  - Institute for Infocomm Research, Agency for Science, Technology and Research, 1 Fusionopolis Way, Singapore 138632, Singapore
- Cao Truong Tran
  - Faculty of Information Technology, Le Quy Don Technical University, 236 Hoang Quoc Viet Street, Hanoi 100000, Vietnam
- Colin R Simpson
  - Faculty of Health, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
  - Usher Institute, The University of Edinburgh, Edinburgh, EH8 9AG, United Kingdom
61
Current Techniques for Diabetes Prediction: Review and Case Study. Applied Sciences-Basel 2019. [DOI: 10.3390/app9214604] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 11/16/2022]
Abstract
Diabetes is one of the most common diseases worldwide. Many Machine Learning (ML) techniques have been utilized in predicting diabetes in the last couple of years. The increasing complexity of this problem has inspired researchers to explore the robust set of Deep Learning (DL) algorithms. The highest accuracy achieved so far was 95.1% by a combined CNN-LSTM model. Even though numerous ML algorithms have been used in solving this problem, there is a set of classifiers that are rarely or never used for it, so it is of interest to determine the performance of these classifiers in predicting diabetes. Moreover, there is no recent survey that has reviewed and compared the performance of all the proposed ML and DL techniques in addition to combined models. This article surveyed all the ML- and DL-based diabetes prediction techniques published in the last six years. In addition, one study was developed that aimed to implement those rarely used and unused ML classifiers on the Pima Indian Dataset to analyze their performance. The classifiers obtained an accuracy of 68%–74%. The recommendation is to use these classifiers in diabetes prediction and enhance them by developing combined models.
62
Preo N, Capobianco E. Significant EHR Feature-Driven T2D Inference: Predictive Machine Learning and Networks. Front Big Data 2019; 2:30. [PMID: 33693353 PMCID: PMC7931876 DOI: 10.3389/fdata.2019.00030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/25/2019] [Accepted: 08/16/2019] [Indexed: 01/11/2023] Open
Abstract
Background: Electronic health records (EHR) play an important role for the redefinition of phenotypes in view of the wealth and heterogeneity of information now available from disparate data sources. A recent cross-sectional retrospective study has described the potential of EHR toward type 2 diabetes mellitus (T2D) screening when ad hoc models are used. About 10,000 US patients have been analyzed through a variety of inference techniques applied to all records with a variable degree of completeness. The analyses conducted in the reference study have indicated that EHR phenotypes significantly improved T2D detection. Methods: With these US patients and the T2D data evidenced in the above study, we propose an integrative inference approach that leverages the prediction power of EHR features selected by two well-known methods, Random Forests and Lasso. The goal is 2-fold: reducing the Big Data redundancies potentially harmful to the predictive learning task and exploiting the interconnectivity of EHR features. A mutual information (MI) network is the inference tool used to identify communities useful to prioritize significant T2D features underlying the similarity between patients. Results: Endowed with a different degree of granularity, the communities detected after the application of both methods were centered especially on T2D comorbidities and risk factors. As such, they appear very relevant for assessment of two main issues, T2D disease burden, and prevention. Conclusions: Our analytical approach offers a solution for managing the EHR scale factor in a complex disease context. EHR are rich sources of phenotypic diversity through which novel stratifications of patients are expected. To enable these results, both pre-screening of variables and calibration of risk prediction methods become necessary steps in EHR analyses. We have presented networks identifying major T2D communities. 
The specific significance assigned to comorbidities and risk factors in relation to T2D can be inferred with accuracy from just a suitably reduced number of EHR features.
Affiliation(s)
| | - Enrico Capobianco
- Center for Computational Science, University of Miami, Miami, FL, United States
|
63
|
Thesmar D, Sraer D, Pinheiro L, Dadson N, Veliche R, Greenberg P. Combining the Power of Artificial Intelligence with the Richness of Healthcare Claims Data: Opportunities and Challenges. PHARMACOECONOMICS 2019; 37:745-752. [PMID: 30848452 DOI: 10.1007/s40273-019-00777-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 06/09/2023]
Abstract
Combinations of healthcare claims data with additional datasets provide large and rich sources of information. The dimensionality and complexity of these combined datasets can be challenging to handle with standard statistical analyses. However, recent developments in artificial intelligence (AI) have led to algorithms and systems that are able to learn and extract complex patterns from such data. AI has already been applied successfully to such combined datasets, with applications such as improving the insurance claim processing pipeline and reducing estimation biases in retrospective studies. Nevertheless, there is still the potential to do much more. The identification of complex patterns within high dimensional datasets may find new predictors for early onset of diseases or lead to a more proactive offering of personalized preventive services. While there are potential risks and challenges associated with the use of AI, these are not insurmountable. As with the introduction of any innovation, it will be necessary to be thoughtful and responsible as we increasingly apply AI methods in healthcare.
Affiliation(s)
- David Thesmar
- MIT Sloan School of Management, MIT, Cambridge, MA, USA
| | - David Sraer
- Department of Economics and Haas School of Business, UC Berkeley, Berkeley, CA, USA
| | - Lisa Pinheiro
- Analysis Group, Inc., 1190 avenue des Canadiens-de-Montréal, Montreal, QC, Canada.
| | - Nick Dadson
- Analysis Group, Inc., 1190 avenue des Canadiens-de-Montréal, Montreal, QC, Canada
|
64
|
Hammond R, Athanasiadou R, Curado S, Aphinyanaphongs Y, Abrams C, Messito MJ, Gross R, Katzow M, Jay M, Razavian N, Elbel B. Predicting childhood obesity using electronic health records and publicly available data. PLoS One 2019; 14:e0215571. [PMID: 31009509 PMCID: PMC6476510 DOI: 10.1371/journal.pone.0215571] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 06/21/2018] [Accepted: 04/05/2019] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Because of the strong link between childhood obesity and adulthood obesity comorbidities, and the difficulty in decreasing body mass index (BMI) later in life, effective strategies are needed to address this condition in early childhood. The ability to predict obesity before age five could be a useful tool, allowing prevention strategies to focus on high risk children. The few existing prediction models for obesity in childhood have primarily employed data from longitudinal cohort studies, relying on difficult to collect data that are not readily available to all practitioners. Instead, we utilized real-world unaugmented electronic health record (EHR) data from the first two years of life to predict obesity status at age five, an approach not yet taken in pediatric obesity research. METHODS AND FINDINGS We trained a variety of machine learning algorithms to perform both binary classification and regression. Following previous studies demonstrating different obesity determinants for boys and girls, we similarly developed separate models for both groups. In each of the separate models for boys and girls we found that weight for length z-score, BMI between 19 and 24 months, and the last BMI measure recorded before age two were the most important features for prediction. The best performing models were able to predict obesity with an Area Under the Receiver Operator Characteristic Curve (AUC) of 81.7% for girls and 76.1% for boys. CONCLUSIONS We were able to predict obesity at age five using EHR data with an AUC comparable to cohort-based studies, reducing the need for investment in additional data collection. Our results suggest that machine learning approaches for predicting future childhood obesity using EHR data could improve the ability of clinicians and researchers to drive future policy, intervention design, and the decision-making process in a clinical setting.
Affiliation(s)
- Robert Hammond
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
| | - Rodoniki Athanasiadou
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
| | - Silvia Curado
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Cell Biology, NYU School of Medicine, New York, New York, United States of America
| | - Yindalon Aphinyanaphongs
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Population Health, NYU School of Medicine, New York, New York, United States of America
| | - Courtney Abrams
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Population Health, NYU School of Medicine, New York, New York, United States of America
| | - Mary Jo Messito
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Pediatrics, NYU School of Medicine, Bellevue Hospital Center, New York, New York, United States of America
| | - Rachel Gross
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Pediatrics, NYU School of Medicine, Bellevue Hospital Center, New York, New York, United States of America
| | - Michelle Katzow
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Pediatrics, NYU School of Medicine, Bellevue Hospital Center, New York, New York, United States of America
| | - Melanie Jay
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Population Health, NYU School of Medicine, New York, New York, United States of America
- Department of Medicine, NYU School of Medicine, New York, New York, United States of America
| | - Narges Razavian
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Population Health, NYU School of Medicine, New York, New York, United States of America
- Department of Radiology, NYU School of Medicine, New York, New York, United States of America
| | - Brian Elbel
- NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Department of Population Health, NYU School of Medicine, New York, New York, United States of America
- NYU Wagner Graduate School of Public Service, New York, New York, United States of America
|
65
|
Identification of Traditional Chinese Medicine Constitutions and Physiological Indexes Risk Factors in Metabolic Syndrome: A Data Mining Approach. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE 2019; 2019:1686205. [PMID: 30854002 PMCID: PMC6378021 DOI: 10.1155/2019/1686205] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/21/2018] [Accepted: 01/13/2019] [Indexed: 12/20/2022]
Abstract
Objective To find predictive indexes for metabolic syndrome (MS), a data mining method was used to identify significant physiological indexes and traditional Chinese medicine (TCM) constitutions. Methods Annual health check-up data, including physical examination data, biochemical tests, and Constitution in Chinese Medicine Questionnaire (CCMQ) measurement data from 2014 to 2016, were screened according to the inclusion and exclusion criteria. A predictive matrix was established from the longitudinal data of three consecutive years. The TreeNet machine learning algorithm was applied to build a prediction model to uncover the dependence relationship between physiological indexes, TCM constitutions, and MS. Results In model testing, the overall accuracy rate of the TreeNet prediction model was 73.23%. The top 12.31% of individuals in the test group (n=325) with the highest predicted probability of MS covered 23.68% of MS patients, showing 0.92 times more risk of having MS than the general population. The top 15 variables were listed in descending order of importance. The top 5 variables of great importance in MS prediction were TBIL difference between 2014 and 2015 (D_TBIL), TBIL in 2014 (TBIL 2014), LDL-C difference between 2014 and 2015 (D_LDL-C), CCMQ scores for balanced constitution in 2015 (balanced constitution 2015), and TCH in 2015 (TCH 2015). When D_TBIL was between 0 and 2, TBIL 2014 was between 10 and 15, D_LDL-C was above 19, balanced constitution 2015 was below 60, or TCH 2015 was above 5.7, the incidence of MS was higher. Furthermore, there were interactions between balanced constitution 2015 score and TBIL 2014 or D_LDL-C in MS prediction. Conclusion Balanced constitution, TBIL, LDL-C, and TCH level can act as predictors for MS. The combination of TCM constitution and physiological indexes can give early warning of MS.
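One reading that is consistent with the reported figures is a lift calculation: if the top 12.31% of the test group captures 23.68% of MS patients, the ratio is about 1.92, which would correspond to roughly 0.92 times more risk than the population baseline. The Python sketch below only reproduces that arithmetic. The function name `lift` is illustrative and is not taken from the paper, and the interpretation itself is an assumption.

```python
def lift(top_fraction, captured_fraction):
    """Share of cases captured divided by share of the population screened."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    return captured_fraction / top_fraction


# Figures quoted in the abstract above.
top_fraction = 0.1231       # top 12.31% of the test group
captured_fraction = 0.2368  # 23.68% of MS patients fall in that group

value = lift(top_fraction, captured_fraction)
excess = value - 1  # risk above the population baseline (assumed reading)

print(f"lift = {value:.2f}, excess risk = {excess:.2f}")
# prints: lift = 1.92, excess risk = 0.92
```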
|
66
|
Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data. Nat Med 2019; 25:57-59. [PMID: 30617317 DOI: 10.1038/s41591-018-0239-8] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 12/11/2017] [Accepted: 10/04/2018] [Indexed: 11/09/2022]
Abstract
Diagnostic procedures, therapeutic recommendations, and medical risk stratifications are based on dedicated, strictly controlled clinical trials. However, a plethora of real-world medical data exists, whereupon the increase in data volume comes at the expense of completeness, uniformity, and control. Here, a case-by-case comparison shows that the predictive power of our real world data-based model for diabetes-related chronic kidney disease outperforms published algorithms, which were derived from clinical study data.
|
67
|
Nirala N, Periyasamy R, Singh BK, Kumar A. Detection of type-2 diabetes using characteristics of toe photoplethysmogram by applying support vector machine. Biocybern Biomed Eng 2019. [DOI: 10.1016/j.bbe.2018.09.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
|
68
|
Abstract
In the future, artificial intelligence (AI) will have the potential to improve outcomes in diabetes care. With the creation of new sensors for physiological monitoring and the introduction of smart insulin pens, novel data relationships based on personal phenotypic and genotypic information will lead to selections of tailored, effective therapies that will transform health care. However, decision-making processes based exclusively on quantitative metrics that ignore qualitative factors could create a quantitative fallacy. Difficult-to-quantify inputs into AI-based therapeutic decision-making processes include empathy, compassion, experience, and unconscious bias. Failure to consider these "softer" variables could lead to important errors. In other words, that which is not quantified about human health and behavior is still part of the calculus for determining therapeutic interventions.
Affiliation(s)
- David Kerr
- Sansum Diabetes Research Institute, Santa Barbara, CA, USA
- David Kerr, MBChB, DM, FRCPE, Sansum Diabetes Research Institute, 2219 Bath St, Santa Barbara, CA 93105, USA.
|
69
|
Bennett CC. REMOVED: Artificial intelligence for diabetes case management: The intersection of physical and mental health. INFORMATICS IN MEDICINE UNLOCKED 2019. [DOI: 10.1016/j.imu.2019.100191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023] Open
|
70
|
An S, Malhotra K, Dilley C, Han-Burgess E, Valdez JN, Robertson J, Clark C, Westover MB, Sun J. Predicting drug-resistant epilepsy - A machine learning approach based on administrative claims data. Epilepsy Behav 2018; 89:118-125. [PMID: 30412924 PMCID: PMC6461470 DOI: 10.1016/j.yebeh.2018.10.013] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 07/24/2018] [Revised: 10/04/2018] [Accepted: 10/08/2018] [Indexed: 11/28/2022]
Abstract
Patients with drug-resistant epilepsy (DRE) are at high risk of morbidity and mortality, yet their referral to specialist care is frequently delayed. The ability to identify patients at high risk of DRE at the time of treatment initiation, and to subsequently steer their treatment pathway toward more personalized interventions, has high clinical utility. Here, we aim to demonstrate the feasibility of developing algorithms for predicting DRE using machine learning methods. Longitudinal, intersected data sourced from US pharmacy, medical, and adjudicated hospital claims from 1,376,756 patients from 2006 to 2015 were analyzed; 292,892 met inclusion criteria for epilepsy, and 38,382 were classified as having DRE using a proxy measure for drug resistance. Patients were characterized using 1270 features reflecting demographics, comorbidities, medications, procedures, epilepsy status, and payer status. Data from 175,735 randomly selected patients were used to train three algorithms and from the remainder to assess the trained models' predictive power. A model with only age and sex was used as a benchmark. The best model, random forest, achieved an area under the receiver operating characteristic curve (95% confidence interval [CI]) of 0.764 (0.759, 0.770), compared with 0.657 (0.651, 0.663) for the benchmark model. Moreover, predicted probabilities for DRE were well-calibrated with the observed frequencies in the data. The model predicted drug resistance approximately 2 years before patients in the test dataset had failed two antiepileptic drugs (AEDs). Machine learning models constructed using claims data predicted which patients are likely to fail ≥3 AEDs and are at risk of developing DRE at the time of the first AED prescription. The use of such models can ensure that patients with predicted DRE receive specialist care with potentially more aggressive therapeutic interventions from diagnosis, to help reduce the serious sequelae of DRE.
Affiliation(s)
- Sungtae An
- Georgia Institute of Technology, College of Computing, Atlanta, GA, USA
| | - Kunal Malhotra
- Georgia Institute of Technology, College of Computing, Atlanta, GA, USA
| | | | | | - Jeffrey N Valdez
- Georgia Institute of Technology, College of Computing, Atlanta, GA, USA
| | | | | | - M Brandon Westover
- Massachusetts General Hospital, Department of Neurology, Boston, MA, USA
| | - Jimeng Sun
- Georgia Institute of Technology, College of Computing, Atlanta, GA, USA.
|
71
|
Maeta K, Nishiyama Y, Fujibayashi K, Gunji T, Sasabe N, Iijima K, Naito T. Prediction of Glucose Metabolism Disorder Risk Using a Machine Learning Algorithm: Pilot Study. JMIR Diabetes 2018; 3:e10212. [PMID: 30478026 PMCID: PMC6288596 DOI: 10.2196/10212] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/24/2018] [Revised: 08/16/2018] [Accepted: 10/17/2018] [Indexed: 01/10/2023] Open
Abstract
Background A 75-g oral glucose tolerance test (OGTT) provides important information about glucose metabolism, although the test is expensive and invasive. Complete OGTT information, such as 1-hour and 2-hour postloading plasma glucose and immunoreactive insulin levels, may be useful for predicting the future risk of diabetes or glucose metabolism disorders (GMD), which includes both diabetes and prediabetes. Objective We trained several classification models for predicting the risk of developing diabetes or GMD using data from thousands of OGTTs and a machine learning technique (XGBoost). The receiver operating characteristic (ROC) curves and their area under the curve (AUC) values for the trained classification models are reported, along with the sensitivity and specificity determined by the cutoff values of the Youden index. We compared the performance of the machine learning techniques with logistic regressions (LR), which are traditionally used in medical research studies. Methods Data were collected from subjects who underwent multiple OGTTs during comprehensive check-up medical examinations conducted at a single facility in Tokyo, Japan, from May 2006 to April 2017. For each examination, a subject was diagnosed with diabetes or prediabetes according to the American Diabetes Association guidelines. Given the data, 2 studies were conducted: predicting the risk of developing diabetes (study 1) or GMD (study 2). For each study, to apply supervised machine learning methods, the required label data was prepared. If a subject was diagnosed with diabetes or GMD at least once during the period, then that subject’s data obtained in previous trials were classified into the risk group (y=1). After data processing, 13,581 and 6760 OGTTs were analyzed for study 1 and study 2, respectively. For each study, a randomly chosen subset representing 80% of the data was used for training 9 classification models and the remaining 20% was used for evaluating the models. 
Three classification models, A to C, used XGBoost with various input variables, some including OGTT data. The other 6 classification models, D to I, used LR for comparison. Results For study 1, the AUC values ranged from 0.78 to 0.93. For study 2, the AUC values ranged from 0.63 to 0.78. The machine learning approach using XGBoost showed better performance compared with traditional LR methods. The AUC values increased when the full OGTT variables were included. In our analysis using a particular setting of input variables, XGBoost showed that the OGTT variables were more important than fasting plasma glucose or glycated hemoglobin. Conclusions A machine learning approach, XGBoost, showed better prediction accuracy compared with LR, suggesting that advanced machine learning methods are useful for detecting the early signs of diabetes or GMD. The prediction accuracy increased when all OGTT variables were added. This indicates that complete OGTT information is important for predicting the future risk of diabetes and GMD accurately.
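The cutoff selection mentioned in the abstract above (sensitivity and specificity at the Youden index) can be sketched in plain Python. This is a minimal illustration, not the study's code: the function name `youden_cutoff` is hypothetical, the toy scores and labels are invented for the example, and the rule "predict positive when score >= threshold" is an assumption.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J, sensitivity, specificity) maximizing
    Youden's J = sensitivity + specificity - 1.

    A case is predicted positive when score >= threshold (assumed rule).
    On ties, the lowest threshold is kept.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (threshold, j, sensitivity, specificity)
    return best


# Toy data, invented for illustration only (not from the study).
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, j, sensitivity, specificity = youden_cutoff(toy_scores, toy_labels)
print(threshold, j, sensitivity, specificity)
```

Scanning every distinct score as a candidate threshold is quadratic in the number of samples, which is fine for a sketch; a library ROC routine would be the usual choice for larger data.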
Affiliation(s)
- Katsutoshi Maeta
- Faculty of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
| | - Yu Nishiyama
- Faculty of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
| | - Kazutoshi Fujibayashi
- Department of General Medicine, School of Medicine, Juntendo University, Tokyo, Japan
| | - Toshiaki Gunji
- Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
| | - Noriko Sasabe
- Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
| | - Kimiko Iijima
- Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
| | - Toshio Naito
- Department of General Medicine, School of Medicine, Juntendo University, Tokyo, Japan
|
72
|
Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting Diabetes Mellitus With Machine Learning Techniques. Front Genet 2018; 9:515. [PMID: 30459809 PMCID: PMC6232260 DOI: 10.3389/fgene.2018.00515] [Citation(s) in RCA: 223] [Impact Index Per Article: 31.9] [Received: 07/29/2018] [Accepted: 10/12/2018] [Indexed: 12/30/2022] Open
Abstract
Diabetes mellitus is a chronic disease characterized by hyperglycemia. It may cause many complications. Given the growing morbidity in recent years, the number of diabetic patients worldwide will reach 642 million in 2040, which means that one in ten adults will have diabetes. There is no doubt that this alarming figure needs great attention. With its rapid development, machine learning has been applied to many aspects of medical health. In this study, we used decision tree, random forest and neural network to predict diabetes mellitus. The dataset is the hospital physical examination data in Luzhou, China. It contains 14 attributes. In this study, five-fold cross validation was used to examine the models. In order to verify the universal applicability of the methods, we chose some of the better-performing methods to conduct independent test experiments. We randomly selected the data of 68994 healthy people and diabetic patients, respectively, as the training set. Because the data were imbalanced, we randomly extracted data five times, and the result is the average of these five experiments. In this study, we used principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) to reduce the dimensionality. The results showed that prediction with random forest could reach the highest accuracy (ACC = 0.8084) when all the attributes were used.
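The repeated random extraction described in the abstract above can be illustrated with a short Python sketch. It assumes the imbalance is handled by undersampling the larger class to the size of the smaller one on each of five draws and averaging a score over the draws; the abstract does not spell out the exact procedure, so this is one plausible reading. The names `balanced_draws`, `averaged_score`, and the toy `evaluate` function are hypothetical.

```python
import random
from statistics import mean


def balanced_draws(majority, minority, repeats=5, seed=0):
    """Yield `repeats` balanced datasets by undersampling the majority class."""
    rng = random.Random(seed)
    for _ in range(repeats):
        sampled = rng.sample(majority, k=len(minority))
        yield sampled + minority


def averaged_score(majority, minority, evaluate, repeats=5, seed=0):
    """Average an evaluation score over the repeated balanced draws."""
    draws = balanced_draws(majority, minority, repeats, seed)
    return mean(evaluate(draw) for draw in draws)


# Toy records, invented for illustration: (label, record id).
healthy = [("healthy", i) for i in range(100)]
diabetic = [("diabetic", i) for i in range(10)]


def evaluate(dataset):
    """Stand-in for model training and testing: returns the diabetic share."""
    return sum(1 for label, _ in dataset if label == "diabetic") / len(dataset)


score = averaged_score(healthy, diabetic, evaluate)
print(score)
```

In a real experiment `evaluate` would train and test a classifier on each draw; the fixed seed is only there to make the sketch reproducible.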
Affiliation(s)
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Kaiyang Qu
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | - Yamei Luo
- School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
| | - Dehui Yin
- School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
| | - Ying Ju
- School of Information Science and Technology, Xiamen University, Xiamen, China
| | - Hua Tang
- Department of Pathophysiology, School of Basic Medicine, Southwest Medical University, Luzhou, China
|
73
|
Murphree DH, Arabmakki E, Ngufor C, Storlie CB, McCoy RG. Stacked classifiers for individualized prediction of glycemic control following initiation of metformin therapy in type 2 diabetes. Comput Biol Med 2018; 103:109-115. [PMID: 30347342 DOI: 10.1016/j.compbiomed.2018.10.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/08/2018] [Revised: 10/14/2018] [Accepted: 10/15/2018] [Indexed: 01/11/2023]
Abstract
OBJECTIVE Metformin is the preferred first-line medication for management of type 2 diabetes and prediabetes. However, over a third of patients experience primary or secondary therapeutic failure. We developed machine learning models to predict which patients initially prescribed metformin will achieve and maintain control of their blood glucose after one year of therapy. MATERIALS AND METHODS We performed a retrospective analysis of administrative claims data for 12,147 commercially-insured adults and Medicare Advantage beneficiaries with prediabetes or diabetes. Several machine learning models were trained using variables available at the time of metformin initiation to predict achievement and maintenance of hemoglobin A1c (HbA1c) < 7.0% after one year of therapy. RESULTS AUC performances based on five-fold cross-validation ranged from 0.58 to 0.75. The most influential variables driving the predictions were baseline HbA1c, starting metformin dosage, and presence of diabetes with complications. CONCLUSIONS Machine learning models can effectively predict primary or secondary metformin treatment failure within one year. This information can help identify effective individualized treatment strategies. Most of the implemented models outperformed traditional logistic regression, highlighting the potential for applying machine learning to problems in medicine.
Affiliation(s)
- Dennis H Murphree
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.
| | - Elaheh Arabmakki
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - Che Ngufor
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - Curtis B Storlie
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - Rozalina G McCoy
- Division of Community Internal Medicine, Department of Medicine, Mayo Clinic, Rochester, MN, 55905, USA; Division of Health Care Policy & Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA; Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN 55905, USA
|
74
|
Won JC, Lee JH, Kim JH, Kang ES, Won KC, Kim DJ, Lee MK. Diabetes Fact Sheet in Korea, 2016: An Appraisal of Current Status. Diabetes Metab J 2018; 42:415-424. [PMID: 30113146 PMCID: PMC6202557 DOI: 10.4093/dmj.2018.0017] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 01/29/2018] [Accepted: 05/02/2018] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND This report presents the recent prevalence and comorbidities related to diabetes in Korea by analyzing the nationally representative data. METHODS Using data from the Korea National Health and Nutrition Examination Survey for 2013 to 2014, the percentages and the total number of subjects over the age of 30 years with diabetes and prediabetes were estimated and applied to the National Population Census in 2014. Diagnosis of diabetes was based on fasting plasma glucose (≥126 mg/dL), current taking of antidiabetic medication, history of previous diabetes, or glycosylated hemoglobin (HbA1c) ≥6.5%. Impaired fasting glucose (IFG) was defined by fasting plasma glucose in the range of 100 to 125 mg/dL among those without diabetes. RESULTS About 4.8 million (13.7%) Korean adults (≥30 years old) had diabetes, and about 8.3 million (24.8%) Korean adults had IFG. However, 29.3% of the subjects with diabetes are not aware of their condition. Of the subjects with diabetes, 48.6% and 54.7% were obese and hypertensive, respectively, and 31.6% had hypercholesterolemia. Although most subjects with diabetes (89.1%) were under medical treatment, and mostly being treated with oral hypoglycemic agents (80.2%), 10.8% have remained untreated. With respect to overall glycemic control, 43.5% reached the target of HbA1c <7%, whereas 23.3% reached the target when the standard was set to HbA1c <6.5%, according to the Korean Diabetes Association guideline. CONCLUSION Diabetes is a major public health threat in Korea, but a significant proportion of adults were not controlling their illness. We need comprehensive approaches to overcome the upcoming diabetes-related disease burden in Korea.
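The diagnostic rules stated in the abstract above can be written as a small Python classifier. This is a sketch under assumptions: the function name, argument names, and the "neither" label are hypothetical, and non-integer fasting glucose values between 125 and 126 mg/dL are treated as IFG, a case the abstract does not address.

```python
def classify_glycemic_status(fpg_mg_dl, hba1c_pct,
                             on_medication=False, prior_diagnosis=False):
    """Apply the criteria quoted in the abstract.

    Diabetes: FPG >= 126 mg/dL, HbA1c >= 6.5%, current antidiabetic
    medication, or a history of previous diabetes.
    IFG: FPG of 100 to 125 mg/dL among those without diabetes.
    """
    if (fpg_mg_dl >= 126 or hba1c_pct >= 6.5
            or on_medication or prior_diagnosis):
        return "diabetes"
    if fpg_mg_dl >= 100:  # already below 126 at this point
        return "IFG"
    return "neither"  # label chosen for this sketch, not from the report


# Illustrative inputs, not survey data.
print(classify_glycemic_status(130, 5.0))
print(classify_glycemic_status(110, 5.5))
print(classify_glycemic_status(95, 6.7))
print(classify_glycemic_status(90, 5.0))
```

Checking the diabetes criteria first mirrors the abstract's wording that IFG is defined only among those without diabetes.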
Affiliation(s)
- Jong Chul Won
- Department of Internal Medicine, Cardiovascular and Metabolic Disease Center, Inje University Sanggye Paik Hospital, Inje University College of Medicine, Seoul, Korea
| | - Jae Hyuk Lee
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Myongji Hospital, Goyang, Korea
| | - Jae Hyeon Kim
- Division of Endocrinology and Metabolism, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - Eun Seok Kang
- Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
| | - Kyu Chang Won
- Department of Internal Medicine, Yeungnam University College of Medicine, Daegu, Korea
| | - Dae Jung Kim
- Department of Endocrinology and Metabolism, Ajou University School of Medicine, Suwon, Korea.
| | - Moon Kyu Lee
- Division of Endocrinology and Metabolism, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea.
|
75
|
Abstract
Population health management and specifically chronic disease management depend on the ability of providers to prevent development of high-cost and high-risk conditions such as diabetes, heart failure, and chronic respiratory diseases and to control them. The advent of big data analytics has potential to empower health care providers to make timely and truly evidence-based informed decisions to provide more effective and personalized treatment while reducing the costs of this care to patients. The goal of this study was to identify real-world health care applications of big data analytics to determine its effectiveness in both patient outcomes and the relief of financial burdens. The methodology for this study was a literature review utilizing 49 articles. Evidence of big data analytics being largely beneficial in the areas of risk prediction, diagnostic accuracy and patient outcome improvement, hospital readmission reduction, treatment guidance, and cost reduction was noted. Initial applications of big data analytics have proved useful in various phases of chronic disease management and could help reduce the chronic disease burden.
|
76
|
Islam MS, Hasan MM, Wang X, Germack HD, Noor-E-Alam M. A Systematic Review on Healthcare Analytics: Application and Theoretical Perspective of Data Mining. Healthcare (Basel) 2018; 6:E54. [PMID: 29882866 PMCID: PMC6023432 DOI: 10.3390/healthcare6020054] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 04/28/2018] [Revised: 05/17/2018] [Accepted: 05/21/2018] [Indexed: 12/17/2022] Open
Abstract
The growing healthcare industry is generating a large volume of useful data on patient demographics, treatment plans, payment, and insurance coverage—attracting the attention of clinicians and scientists alike. In recent years, a number of peer-reviewed articles have addressed different dimensions of data mining application in healthcare. However, the lack of a comprehensive and systematic narrative motivated us to construct a literature review on this topic. In this paper, we present a review of the literature on healthcare analytics using data mining and big data. Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted a database search between 2005 and 2016. Critical elements of the selected studies—healthcare sub-areas, data mining techniques, types of analytics, data, and data sources—were extracted to provide a systematic view of development in this field and possible future directions. We found that the existing literature mostly examines analytics in clinical and administrative decision-making. Use of human-generated data is predominant considering the wide adoption of Electronic Medical Record in clinical care. However, analytics based on website and social media data has been increasing in recent years. Lack of prescriptive analytics in practice and integration of domain expert knowledge in the decision-making process emphasizes the necessity of future research.
Affiliation(s)
- Md Saiful Islam
- Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA.
| | - Md Mahmudul Hasan
- Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA.
| | - Xiaoyi Wang
- Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA.
| | - Hayley D Germack
- Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA.
- National Clinician Scholars Program, Yale University School of Medicine, New Haven, CT 06511, USA.
- Bouvé College of Health Sciences, Northeastern University, Boston, MA 02115, USA.
| | - Md Noor-E-Alam
- Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA.
|
77
|
Cho NH, Shaw JE, Karuranga S, Huang Y, da Rocha Fernandes JD, Ohlrogge AW, Malanda B. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res Clin Pract 2018; 138:271-281. [PMID: 29496507 DOI: 10.1016/j.diabres.2018.02.023] [Citation(s) in RCA: 4332] [Impact Index Per Article: 618.9] [Received: 02/08/2018] [Accepted: 02/16/2018] [Indexed: 02/06/2023]
Abstract
INTRODUCTION Since the year 2000, IDF has been measuring the prevalence of diabetes nationally, regionally and globally. AIM To produce estimates of the global burden of diabetes and its impact for 2017 and projections for 2045. METHODS A systematic literature review was conducted to identify published studies on the prevalence of diabetes, impaired glucose tolerance and hyperglycaemia in pregnancy in the period from 1990 to 2016. The highest quality studies on diabetes prevalence were selected for each country. A logistic regression model was used to generate age-specific prevalence estimates for each country. Estimates for countries without data were extrapolated from similar countries. RESULTS It was estimated that in 2017 there were 451 million (age 18-99 years) people with diabetes worldwide. This figure was expected to increase to 693 million by 2045. It was estimated that almost half of all people (49.7%) living with diabetes are undiagnosed. Moreover, there were an estimated 374 million people with impaired glucose tolerance (IGT) and it was projected that almost 21.3 million live births to women were affected by some form of hyperglycaemia in pregnancy. In 2017, approximately 5 million deaths worldwide were attributable to diabetes in the 20-99 years age range. The global healthcare expenditure on people with diabetes was estimated to be USD 850 billion in 2017. CONCLUSION The new estimates of diabetes prevalence, deaths attributable to diabetes and healthcare expenditure due to diabetes present a large social, financial and health system burden across the world.
Affiliation(s)
- N H Cho
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium; Department of Preventive Medicine, Ajou University School of Medicine, 164 World Cup-ro, Suwon, South Korea.
| | - J E Shaw
- Baker Heart and Diabetes Institute, 75 Commercial Rd, Melbourne, Australia.
| | - S Karuranga
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium.
| | - Y Huang
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium.
| | | | - A W Ohlrogge
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium.
| | - B Malanda
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium.
|
78
|
Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT, Welt J, Foote J, Moseley ET, Grant DW, Tyler PD, Celi LA. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. PLoS One 2018; 13:e0192360. [PMID: 29447188 PMCID: PMC5813927 DOI: 10.1371/journal.pone.0192360] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Received: 06/16/2017] [Accepted: 01/21/2018] [Indexed: 01/22/2023] Open
Abstract
In secondary analysis of electronic health records, a crucial task consists in correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exist only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with an improvement in F1-score of up to 26 and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
Affiliation(s)
- Sebastian Gehrmann
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Harvard SEAS, Harvard University, Cambridge, MA, United States of America
| | - Franck Dernoncourt
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Adobe Research, San Jose, CA, United States of America
| | - Yeran Li
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Harvard T.H. Chan School of Public Health, Cambridge, MA, United States of America
| | - Eric T. Carlson
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Philips Research North America, Cambridge, MA, United States of America
| | - Joy T. Wu
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Harvard T.H. Chan School of Public Health, Cambridge, MA, United States of America
| | - Jonathan Welt
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Wellman Center for Photomedicine, Massachusetts General Hospital, Boston, MA, United States of America
| | - John Foote
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Tufts University School of Medicine, Cambridge, MA, United States of America
| | - Edward T. Moseley
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- College of Science and Mathematics, University of Massachusetts, Boston, MA, United States of America
| | - David W. Grant
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Washington University School of Medicine, St. Louis, MO, United States of America
| | - Patrick D. Tyler
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Department of Internal Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States of America
| | - Leo A. Celi
- MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America
- Massachusetts Institute of Technology, Cambridge, MA, United States of America
|
79
|
Kaur P, Sharma M, Mittal M. Big Data and Machine Learning Based Secure Healthcare Framework. Procedia Comput Sci 2018. [DOI: 10.1016/j.procs.2018.05.020] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 12/11/2022]
|
80
|
Owusu Adjah ES, Montvida O, Agbeve J, Paul SK. Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus. The Open Bioinformatics Journal 2017. [DOI: 10.2174/1875036201710010016] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]
Abstract
Background: Identification of diseased patients from primary care based electronic medical records (EMRs) has methodological challenges that may impact epidemiologic inferences. Objective: To compare deterministic clinically guided selection algorithms with probabilistic machine learning (ML) methodologies for their ability to identify patients with type 2 diabetes mellitus (T2DM) from large population based EMRs from a nationally representative primary care database. Methods: Four cohorts of patients with T2DM were defined by a deterministic approach based on disease codes. The database was mined for a set of best predictors of T2DM, and the performance of six ML algorithms was compared based on cross-validated true positive rate, true negative rate, and area under the receiver operating characteristic curve. Results: In the database of 11,018,025 research suitable individuals, 379,657 (3.4%) were coded to have T2DM. The Logistic Regression classifier was selected as the best ML algorithm and resulted in a cohort of 383,330 patients with potential T2DM. Eighty-three percent (83%) of this cohort had a T2DM code, and 16% of the patients with a T2DM code were not included in this ML cohort. Of those in the ML cohort without a disease code, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication. Conclusion: Deterministic cohort selection based on disease coding potentially introduces a significant misclassification problem. ML techniques allow testing for potential disease predictors and, under meaningful data input, are able to identify diseased cohorts in a holistic way.
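The three evaluation measures named in the Methods above can be illustrated with a minimal Python sketch. The function names and the toy inputs are hypothetical and are not taken from the study; the AUC is computed here with the rank-based (Mann-Whitney) formulation, which is one common way to obtain it and not necessarily the procedure the authors used.

```python
def true_positive_rate(tp, fn):
    """Sensitivity: share of truly diseased patients that the classifier flags."""
    return tp / (tp + fn)


def true_negative_rate(tn, fp):
    """Specificity: share of disease-free patients that the classifier clears."""
    return tn / (tn + fp)


def auc_from_scores(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case receives a higher score than a randomly chosen negative
    case (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts and classifier scores.
    print(true_positive_rate(tp=80, fn=20))
    print(true_negative_rate(tn=90, fp=10))
    print(auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

In a cross-validated setting, each measure would be computed on every held-out fold and then averaged; the pairwise AUC loop is quadratic and is meant only for small illustrative inputs.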
|
81
|
Capobianco E. Systems and precision medicine approaches to diabetes heterogeneity: a Big Data perspective. Clin Transl Med 2017; 6:23. [PMID: 28744848 PMCID: PMC5526830 DOI: 10.1186/s40169-017-0155-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/29/2017] [Accepted: 06/26/2017] [Indexed: 12/15/2022] Open
Abstract
Big Data, and in particular Electronic Health Records, provide the medical community with a great opportunity to analyze multiple pathological conditions at an unprecedented depth for many complex diseases, including diabetes. How can we infer on diabetes from large heterogeneous datasets? A possible solution is provided by invoking next-generation computational methods and data analytics tools within systems medicine approaches. By deciphering the multi-faceted complexity of biological systems, the potential of emerging diagnostic tools and therapeutic functions can be ultimately revealed. In diabetes, a multidimensional approach to data analysis is needed to better understand the disease conditions, trajectories and the associated comorbidities. Elucidation of multidimensionality comes from the analysis of factors such as disease phenotypes, marker types, and biological motifs while seeking to make use of multiple levels of information including genetics, omics, clinical data, and environmental and lifestyle factors. Examining the synergy between multiple dimensions represents a challenge. In such regard, the role of Big Data fuels the rise of Precision Medicine by allowing an increasing number of descriptions to be captured from individuals. Thus, data curations and analyses should be designed to deliver highly accurate predicted risk profiles and treatment recommendations. It is important to establish linkages between systems and precision medicine in order to translate their principles into clinical practice. Equivalently, to realize their full potential, the involved multiple dimensions must be able to process information ensuring inter-exchange, reducing ambiguities and redundancies, and ultimately improving health care solutions by introducing clinical decision support systems focused on reclassified phenotypes (or digital biomarkers) and community-driven patient stratifications.
Affiliation(s)
- Enrico Capobianco
- Center for Computational Science, University of Miami, Miami, FL, USA.
|
82
|
Machine learning in laboratory medicine: waiting for the flood? Clin Chem Lab Med 2017; 56:516-524. [DOI: 10.1515/cclm-2017-0287] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 04/02/2017] [Accepted: 09/05/2017] [Indexed: 02/04/2023]
Abstract
This review focuses on machine learning and on how methods and models combining data analytics and artificial intelligence have been applied to laboratory medicine so far. Although still in its infancy, the potential for applying machine learning to laboratory data for both diagnostic and prognostic purposes deserves more attention by the readership of this journal, as well as by physician-scientists who will want to take advantage of this new computer-based support in pathology and laboratory medicine.
|
83
|
Baum A, Scarpa J, Bruzelius E, Tamler R, Basu S, Faghmous J. Targeting weight loss interventions to reduce cardiovascular complications of type 2 diabetes: a machine learning-based post-hoc analysis of heterogeneous treatment effects in the Look AHEAD trial. Lancet Diabetes Endocrinol 2017; 5:808-815. [PMID: 28711469 PMCID: PMC5815373 DOI: 10.1016/s2213-8587(17)30176-6] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 02/22/2017] [Revised: 04/21/2017] [Accepted: 04/27/2017] [Indexed: 02/08/2023]
Abstract
BACKGROUND The Action for Health in Diabetes (Look AHEAD) trial investigated whether long-term cardiovascular disease morbidity and mortality could be reduced through a weight loss intervention among people with type 2 diabetes. Despite finding no significant reduction in cardiovascular events on average, it is possible that some subpopulations might have derived benefit. In this post-hoc analysis, we test the hypothesis that the overall neutral average treatment effect in the trial masked important heterogeneous treatment effects (HTEs) from intensive weight loss interventions. METHODS We used causal forest modelling, which identifies HTEs, using a random half of the trial data (the training set). We applied Cox proportional hazards models to test the potential HTEs on the remaining half of the data (the testing set). The analysis was deemed exempt from review by the Columbia University Institutional Review Board, Protocol ID# AAAO3003. FINDINGS Between Aug 22, 2001, and April 30, 2004, 5145 patients with type 2 diabetes were enrolled in the Look AHEAD randomised controlled trial, of whom 4901 were included in the National Institute of Diabetes and Digestive and Kidney Diseases Repository and included in our analyses: 2450 for model development and 2451 in the testing dataset. Baseline HbA1c and self-reported general health distinguished participants who differentially benefited from the intervention. 
Cox models for the primary composite cardiovascular outcome revealed a number needed to treat of 28·9 to prevent 1 event over 9·6 years among participants with HbA1c 6·8% or higher, or both HbA1c less than 6·8% and Short Form Health Survey (SF-36) general health score of 48 or more (2101 [86%] of 2451 participants in the testing dataset; 167 [16%] of 1046 primary outcome events for intervention vs 205 [19%] of 1055 for control, absolute risk reduction of 3·46%, 95% CI 0·21-6·73%, p=0·038). By contrast, participants with HbA1c less than 6·8% and baseline SF-36 general health score of less than 48 (350 [14%] of 2451 participants in the testing dataset; 27 [16%] of 171 primary outcome events for intervention vs 15 [8%] of 179 primary outcome events for control) had an absolute risk increase of the primary outcome of 7·41% (0·60 to 14·22, p=0·003). INTERPRETATION Look AHEAD participants with moderately or poorly controlled diabetes (HbA1c 6·8% or higher) and subjects with well controlled diabetes (HbA1c less than 6·8%) and good self-reported health (85% of the overall study population) averted cardiovascular events from a behavioural intervention aimed at weight loss. However, 15% of participants with well controlled diabetes and poor self-reported general health experienced negative effects that rendered the overall study outcome neutral. HbA1c and a short questionnaire on general health might identify people with type 2 diabetes likely to derive benefit from an intensive lifestyle intervention aimed at weight loss. FUNDING None.
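The number needed to treat quoted in the Findings is the reciprocal of the absolute risk reduction, and the arithmetic can be checked from the raw event counts given in the abstract. The following is a minimal sketch; the function names are hypothetical, and crude proportions are used here in place of the Cox models the authors fitted, so small rounding differences from the published figures are expected.

```python
def risk(events, n):
    """Crude event proportion in one trial arm."""
    return events / n


def risk_difference(events_control, n_control, events_treated, n_treated):
    """Control risk minus treated risk.
    Positive values mean fewer events under treatment (a risk reduction);
    negative values mean more events under treatment (a risk increase)."""
    return risk(events_control, n_control) - risk(events_treated, n_treated)


# Subgroup that benefited: 167 of 1046 (intervention) vs 205 of 1055 (control).
arr = risk_difference(205, 1055, 167, 1046)
nnt = 1.0 / arr  # number needed to treat to prevent one event

# Subgroup that was harmed: 27 of 171 (intervention) vs 15 of 179 (control).
ari = -risk_difference(15, 179, 27, 171)  # absolute risk increase

if __name__ == "__main__":
    print(f"absolute risk reduction: {arr:.4f}")
    print(f"number needed to treat:  {nnt:.1f}")
    print(f"absolute risk increase:  {ari:.4f}")
```

The crude counts give a risk reduction of roughly 3.5 percentage points and a number needed to treat of about 28.9, in line with the reported values, and a risk increase of about 7.4 percentage points in the harmed subgroup.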
Affiliation(s)
- Aaron Baum
- Department of Health System Design and Global Health, Arnhold Institute for Global Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Joseph Scarpa
- Department of Health System Design and Global Health, Arnhold Institute for Global Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Emilie Bruzelius
- Department of Health System Design and Global Health, Arnhold Institute for Global Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Epidemiology, Joseph L Mailman School of Public Health, Columbia University, New York, NY, USA
| | - Ronald Tamler
- Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Sanjay Basu
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - James Faghmous
- Department of Health System Design and Global Health, Arnhold Institute for Global Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
84
|
Luo G, Sward K. A Roadmap for Optimizing Asthma Care Management via Computational Approaches. JMIR Med Inform 2017; 5:e32. [PMID: 28951380 PMCID: PMC5635229 DOI: 10.2196/medinform.8076] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/20/2017] [Revised: 07/09/2017] [Accepted: 08/14/2017] [Indexed: 11/26/2022] Open
Abstract
Asthma affects 9% of Americans and incurs US $56 billion in cost, 439,000 hospitalizations, and 1.8 million emergency room visits annually. A small fraction of asthma patients with high vulnerabilities, severe disease, or great barriers to care consume most health care costs and resources. An effective approach is urgently needed to identify high-risk patients and intervene to improve outcomes and to reduce costs and resource use. Care management is widely used to implement tailored care plans for this purpose, but it is expensive and has limited service capacity. To maximize benefit, we should enroll only patients anticipated to have the highest costs or worst prognosis. Effective care management requires correctly identifying high-risk patients, but current patient identification approaches have major limitations. This paper pinpoints these limitations and outlines multiple machine learning techniques to address them, providing a roadmap for future research.
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | - Katherine Sward
- College of Nursing, University of Utah, Salt Lake City, UT, United States
|
85
|
Rotmensch M, Halpern Y, Tlimat A, Horng S, Sontag D. Learning a Health Knowledge Graph from Electronic Medical Records. Sci Rep 2017; 7:5994. [PMID: 28729710 PMCID: PMC5519723 DOI: 10.1038/s41598-017-05778-z] [Citation(s) in RCA: 114] [Impact Index Per Article: 14.3] [Received: 03/03/2017] [Accepted: 06/01/2017] [Indexed: 12/03/2022] Open
Abstract
Demand for clinical decision support systems in medicine and self-diagnostic symptom checkers has substantially increased in recent years. Existing platforms rely on knowledge bases manually compiled through a labor-intensive process or automatically derived using simple pairwise statistics. This study explored an automated process to learn high quality knowledge bases linking diseases and symptoms directly from electronic medical records. Medical concepts were extracted from 273,174 de-identified patient records and maximum likelihood estimation of three probabilistic models was used to automatically construct knowledge graphs: logistic regression, naive Bayes classifier and a Bayesian network using noisy OR gates. A graph of disease-symptom relationships was elicited from the learned parameters and the constructed knowledge graphs were evaluated and validated, with permission, against Google’s manually-constructed knowledge graph and against expert physician opinions. Our study shows that direct and automated construction of high quality health knowledge graphs from medical records using rudimentary concept extraction is feasible. The noisy OR model produces a high quality knowledge graph reaching precision of 0.85 for a recall of 0.6 in the clinical evaluation. Noisy OR significantly outperforms all tested models across evaluation frameworks (p < 0.01).
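The noisy OR gate mentioned above has a compact closed form: a symptom stays absent only if every present disease independently fails to trigger it and a background "leak" cause also fails. The sketch below illustrates that formula only; the parameter names and values are hypothetical, and it does not reproduce the authors' maximum likelihood fitting procedure.

```python
def noisy_or(activation_probs, present, leak=0.0):
    """Probability that a symptom is observed under a noisy OR gate.

    activation_probs[i] : probability that disease i alone triggers the symptom
    present[i]          : truthy if disease i is present in the patient
    leak                : probability that the symptom appears with no modeled cause
    """
    p_symptom_absent = 1.0 - leak
    for p, is_present in zip(activation_probs, present):
        if is_present:
            # Each present disease independently fails to trigger with prob (1 - p).
            p_symptom_absent *= 1.0 - p
    return 1.0 - p_symptom_absent


if __name__ == "__main__":
    # Two present diseases, each triggering the symptom with probability 0.5.
    print(noisy_or([0.5, 0.5], [1, 1]))
    # No disease present: only the leak term contributes.
    print(noisy_or([0.5, 0.5], [0, 0], leak=0.1))
```

Because each disease contributes a single activation parameter, the learned values can be read directly as edge strengths in a disease-symptom graph, which is one reason this form suits knowledge graph construction.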
Affiliation(s)
- Maya Rotmensch
- Center for Data Science, New York University, New York, NY, USA
| | - Yoni Halpern
- Department of Computer Science, New York University, New York, NY, USA
| | - Abdulhakim Tlimat
- Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Steven Horng
- Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Division of Clinical Informatics, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - David Sontag
- Department of Electrical Engineering and Computer Science, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
|
86
|
Ogurtsova K, da Rocha Fernandes JD, Huang Y, Linnenkamp U, Guariguata L, Cho NH, Cavan D, Shaw JE, Makaroff LE. IDF Diabetes Atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res Clin Pract 2017; 128:40-50. [PMID: 28437734 DOI: 10.1016/j.diabres.2017.03.024] [Citation(s) in RCA: 2478] [Impact Index Per Article: 309.8] [Received: 03/18/2017] [Accepted: 03/26/2017] [Indexed: 02/06/2023]
Abstract
AIM To produce current estimates of the national, regional and global impact of diabetes for 2015 and 2040. METHODS A systematic literature review was conducted to identify data sources on the prevalence of diabetes from studies conducted in the period from 1990 to 2015. An analytic hierarchy process was used to select the most appropriate studies for each country, and estimates for countries without data were modelled using extrapolation from similar countries that had available data. A logistic regression model was used to generate smoothed age-specific estimates, which were applied to UN population estimates. RESULTS 540 data sources were reviewed, of which 196 sources from 111 countries were selected. In 2015 it was estimated that there were 415 million (uncertainty interval: 340-536 million) people with diabetes aged 20-79 years, 5.0 million deaths attributable to diabetes, and the total global health expenditure due to diabetes was estimated at 673 billion US dollars. Three quarters (75%) of those with diabetes were living in low- and middle-income countries. The number of people with diabetes aged 20-79 years was predicted to rise to 642 million (uncertainty interval: 521-829 million) by 2040. CONCLUSION Diabetes prevalence, deaths attributable to diabetes, and health expenditure due to diabetes continue to rise across the globe with important social, financial and health system implications.
Affiliation(s)
- K Ogurtsova
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium
| | | | - Y Huang
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium.
| | - U Linnenkamp
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium
| | - L Guariguata
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium.
| | - N H Cho
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium; Department of Preventive Medicine, Ajou University School of Medicine, 164 World Cup-ro, Suwon, South Korea.
| | - D Cavan
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium.
| | - J E Shaw
- Baker Heart and Diabetes Institute, 75 Commercial Rd, Melbourne, Australia.
| | - L E Makaroff
- International Diabetes Federation, Chaussee de la Hulpe 166, Brussels, Belgium; Department of Microbiology and Immunology, University of Leuven, Herestraat 49, Leuven, Belgium.
|
87
|
Ehlers AP, Roy SB, Khor S, Mandagani P, Maria M, Alfonso-Cristancho R, Flum DR. Improved Risk Prediction Following Surgery Using Machine Learning Algorithms. EGEMS 2017; 5:3. [PMID: 29881747 PMCID: PMC5983054 DOI: 10.13063/2327-9214.1278] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022]
Abstract
Background: Machine learning is used to analyze big data, often for the purposes of prediction. Analyzing a patient’s healthcare utilization pattern may provide more precise estimates of risk for adverse events (AE) or death. We sought to characterize healthcare utilization prior to surgery using machine learning for the purposes of risk prediction. Methods: Patients from MarketScan Commercial Claims and Encounters Database undergoing elective surgery from 2007–2012 with ≥1 comorbidity were included. All available healthcare claims occurring within six months prior to surgery were assessed. More than 300 predictors were defined by considering all combinations of conditions, encounter types, and timing along with sociodemographic factors. We used a supervised Naive Bayes algorithm to predict risk of AE or death within 90 days of surgery. We compared the model’s performance to the Charlson’s comorbidity index, a commonly used risk prediction tool. Results: Among 410,521 patients (mean age 52, 52 ± 9.4, 56% female), 4.7% had an AE and 0.01% died. The Charlson’s comorbidity index predicted 57% of AE’s and 59% of deaths. The Naive Bayes algorithm predicted 79% of AE’s and 78% of deaths. Claims for cancer, kidney disease, and peripheral vascular disease were the primary drivers of AE or death following surgery. Conclusions: The use of machine learning algorithms improves upon one commonly used risk estimator. Precisely quantifying the risk of an AE following surgery may better inform patient-centered decision-making and direct targeted quality improvement interventions while supporting activities of accountable care organizations that rely on accurate estimates of population risk.
Affiliation(s)
| | - Senjuti Basu Roy
- Department of Computer Science, New Jersey Institute of Technology
| | - Sara Khor
- University of Washington Surgical Outcomes Research Center
| | - Prathyusha Mandagani
- University of Washington, Seattle Campus; Department of Computer Science, New Jersey Institute of Technology; University of Washington Surgical Outcomes Research Center; GlaxoSmithKline; University of Washington School of Medicine
| | - Moushumi Maria
- University of Washington, Seattle Campus; Department of Computer Science, New Jersey Institute of Technology; University of Washington Surgical Outcomes Research Center; GlaxoSmithKline; University of Washington School of Medicine
|
88
|
Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine Learning and Data Mining Methods in Diabetes Research. Comput Struct Biotechnol J 2017; 15:104-116. [PMID: 28138367 PMCID: PMC5257026 DOI: 10.1016/j.csbj.2016.12.005] [Citation(s) in RCA: 391] [Impact Index Per Article: 48.9] [Received: 09/15/2016] [Revised: 12/20/2016] [Accepted: 12/27/2016] [Indexed: 12/14/2022] Open
Abstract
The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and e) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.
Affiliation(s)
- Ioannis Kavakiotis
  - Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
  - Institute of Applied Biosciences, CERTH, Thessaloniki, Greece
- Olga Tsave
  - Laboratory of Inorganic Chemistry, Department of Chemical Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Athanasios Salifoglou
  - Laboratory of Inorganic Chemistry, Department of Chemical Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Nicos Maglaveras
  - Institute of Applied Biosciences, CERTH, Thessaloniki, Greece
  - Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Ioannis Vlahavas
  - Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Ioanna Chouvarda
  - Institute of Applied Biosciences, CERTH, Thessaloniki, Greece
  - Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
|
89
|
Thompson S, Varvel S, Sasinowski M, Burke JP. From Value Assessment to Value Cocreation: Informing Clinical Decision-Making with Medical Claims Data. Big Data 2016; 4:141-147. [PMID: 27642718 DOI: 10.1089/big.2015.0030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/06/2023]
Abstract
Big data and advances in analytical processes represent an opportunity for the healthcare industry to make better evidence-based decisions on the value generated by various tests, procedures, and interventions. Value-based reimbursement is the process of identifying and compensating healthcare providers based on whether their services improve quality of care without increasing cost of care or maintain quality of care while decreasing costs. In this article, we motivate and illustrate the potential opportunities for payers and providers to collaborate and evaluate the clinical and economic efficacy of different healthcare services. We conduct a case study of a firm that offers advanced biomarker and disease state management services for cardiovascular and cardiometabolic conditions. A value-based analysis that comprised a retrospective case/control cohort design was conducted, and claims data for over 7000 subjects who received these services were compared to a matched control cohort. Study subjects were commercial and Medicare Advantage enrollees with evidence of CHD, diabetes, or a related condition. Analysis of medical claims data showed a lower proportion of patients who received biomarker testing and disease state management services experienced a MI (p < 0.01) or diabetic complications (p < 0.001). No significant increase in cost of care was found between the two cohorts. Our results illustrate the opportunity healthcare payers such as Medicare and commercial insurance companies have in terms of identifying value-creating healthcare interventions. However, payers and providers also need to pursue system integration efforts to further automate the identification and dissemination of clinically and economically efficacious treatment plans to ensure at-risk patients receive the treatments and interventions that will benefit them the most.
Affiliation(s)
- Steven Thompson
  - Robins School of Business, University of Richmond, Richmond, Virginia
- James P Burke
  - Health Economics and Outcomes Research, Optum, Eden Prairie, Minnesota
|