1
Wang L, Xie J, Gu Z, Miao X, Ma L, Yan S, Gong Y, Li C, Sun B, Ruan Y. Predicting isolated impaired glucose tolerance without oral glucose tolerance test using machine learning in Chinese Han men. Front Endocrinol (Lausanne) 2025; 16:1514397. [PMID: 40343071 PMCID: PMC12058868 DOI: 10.3389/fendo.2025.1514397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Accepted: 03/27/2025] [Indexed: 05/11/2025] Open
Abstract
Background: Isolated Impaired Glucose Tolerance (I-IGT) represents a specific prediabetic state that typically requires a standardized oral glucose tolerance test (OGTT) for diagnosis. This study aims to predict glucose tolerance status in Chinese Han men in the fasting state using machine learning (ML) models with demographic, anthropometric, and laboratory data. Methods: The study population consisted of 1,117 Chinese Han men aged 50-87 years. Baseline variables including age, fasting plasma glucose (FPG), high blood pressure (HBP), body mass index (BMI), waist to hip ratio (WHR), total cholesterol (TC), triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C) were collected from electronic medical records (EMRs) for model training and validation. Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), Adaptive Boosting (AdaBoost) and Gradient Boosting Machines (GBM) were compared. Model performance was evaluated using accuracy, recall, F1 score, positive predictive value (PPV), negative predictive value (NPV), and the area under the receiver operating characteristic curve (AUC). Shapley Additive Explanations (SHAP) and confusion matrix plots were used for model interpretation. Results: The RF model demonstrated the best overall performance, with an accuracy of 96.7%, recall of 91.4%, F1 score of 95.7%, PPV of 99.1%, and NPV of 95.6%. The AUC values for the SVM, DT, RF, LR, KNN, NB, AdaBoost, and GBM models were 0.97, 0.92, 0.96, 0.97, 0.88, 0.88, 0.97, and 0.97, respectively. While the RF model showed strong overall performance, the LR model had the highest AUC, indicating superior discriminatory power. FPG was identified as the most important predictor for I-IGT, followed by HDL-C, TC, HBP, BMI, and WHR. Individuals with FPG levels higher than 5.1 mmol/L were more likely to have I-IGT; the performance metrics for this cut-off value were: 89.35% accuracy, 89.79% recall, 85.22% F1 score, 81.09% PPV, 94.38% NPV, and 0.95 AUC. Conclusion: Machine learning models based on demographic and clinical characteristics offer a cost-effective method for predicting I-IGT in Chinese Han men aged over 50, without the need for an OGTT. These models could complement existing early diagnostic strategies, thereby enhancing the early detection and prevention of diabetes. Additionally, FPG alone could serve as an efficient screening tool for the early identification of I-IGT in clinical settings.
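The metrics listed in this abstract are all derived from a confusion matrix, and the F1 score is the harmonic mean of PPV and recall. The following is a minimal sketch of those relationships, not the authors' code: the function names (`classification_metrics`, `f1_from_ppv_recall`, `flag_by_fpg`) are hypothetical, and only the 5.1 mmol/L cut-off and the cut-off's PPV and recall figures are taken from the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common screening metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (illustrative sketch only).
    """
    recall = tp / (tp + fn)          # sensitivity
    ppv = tp / (tp + fp)             # positive predictive value (precision)
    npv = tn / (tn + fn)             # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * recall / (ppv + recall)
    return {"accuracy": accuracy, "recall": recall,
            "ppv": ppv, "npv": npv, "f1": f1}


def f1_from_ppv_recall(ppv, recall):
    """F1 score as the harmonic mean of PPV and recall."""
    return 2 * ppv * recall / (ppv + recall)


def flag_by_fpg(fpg_mmol_per_l, cutoff=5.1):
    """Single-variable screening rule: flag FPG strictly above the cut-off."""
    return fpg_mmol_per_l > cutoff


# PPV and recall reported for the 5.1 mmol/L cut-off in the abstract.
print(round(f1_from_ppv_recall(0.8109, 0.8979), 4))
```

Applied to the PPV (81.09%) and recall (89.79%) reported for the FPG cut-off, the harmonic mean rounds to 0.8522, which agrees with the reported F1 score of 85.22%.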
Affiliation(s)
- Lin Wang
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Jing Xie
  - Department of Special Medical Service, Ninth Medical Center, Chinese People’s Liberation Army General Hospital, Beijing, China
- Zhaoyan Gu
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Xinyu Miao
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Lichao Ma
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Shuangtong Yan
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Yanping Gong
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Chunlin Li
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Banruo Sun
  - Department of Endocrinology, Second Medical Center, Chinese People’s Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Yue Ruan
  - Institute of Biomedical and Health Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
2
Bernstorff M, Hansen L, Enevoldsen K, Damgaard J, Hæstrup F, Perfalk E, Danielsen AA, Østergaard SD. Development and validation of a machine learning model for prediction of type 2 diabetes in patients with mental illness. Acta Psychiatr Scand 2025; 151:245-258. [PMID: 38575118 PMCID: PMC11787919 DOI: 10.1111/acps.13687] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/20/2023] [Revised: 03/08/2024] [Accepted: 03/28/2024] [Indexed: 04/06/2024]
Abstract
BACKGROUND Type 2 diabetes (T2D) is approximately twice as common among individuals with mental illness compared with the background population, but may be prevented by early intervention on lifestyle, diet, or pharmacologically. Such prevention relies on identification of those at elevated risk (prediction). The aim of this study was to develop and validate a machine learning model for prediction of T2D among patients with mental illness. METHODS The study was based on routine clinical data from electronic health records from the psychiatric services of the Central Denmark Region. A total of 74,880 patients with 1.59 million psychiatric service contacts were included in the analyses. We created 1343 potential predictors from 51 source variables, covering patient-level information on demographics, diagnoses, pharmacological treatment, and laboratory results. T2D was operationalised as HbA1c ≥48 mmol/mol, fasting plasma glucose ≥7.0 mmol/L, oral glucose tolerance test ≥11.1 mmol/L or random plasma glucose ≥11.1 mmol/L. Two machine learning models (XGBoost and regularised logistic regression) were trained to predict T2D based on 85% of the included contacts. The predictive performance of the best performing model was tested on the remaining 15% of the contacts. RESULTS The XGBoost model detected patients at high risk 2.7 years before T2D, achieving an area under the receiver operating characteristic curve of 0.84. Of the 996 patients developing T2D in the test set, the model issued at least one positive prediction for 305 (31%). CONCLUSION A machine learning model can accurately predict development of T2D among patients with mental illness based on routine clinical data from electronic health records. A decision support system based on such a model may inform measures to prevent development of T2D in this high-risk population.
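The outcome definition in this abstract is a simple "any threshold met" rule over four laboratory measurements. A minimal sketch of such a rule follows; the function and parameter names are hypothetical and not from the study, and the glucose thresholds are taken here in mmol/L, the conventional unit for plasma glucose, while HbA1c is in mmol/mol.

```python
def meets_t2d_definition(hba1c_mmol_mol=None,
                         fasting_glucose_mmol_l=None,
                         ogtt_glucose_mmol_l=None,
                         random_glucose_mmol_l=None):
    """Return True if any available lab value reaches its threshold.

    Missing measurements are passed as None and are simply ignored.
    Thresholds follow the abstract: HbA1c >= 48, fasting glucose >= 7.0,
    OGTT >= 11.1, random glucose >= 11.1.
    """
    checks = [
        (hba1c_mmol_mol, 48.0),
        (fasting_glucose_mmol_l, 7.0),
        (ogtt_glucose_mmol_l, 11.1),
        (random_glucose_mmol_l, 11.1),
    ]
    return any(value is not None and value >= threshold
               for value, threshold in checks)


# Illustrative values only, not patient data.
print(meets_t2d_definition(hba1c_mmol_mol=50))
print(meets_t2d_definition(hba1c_mmol_mol=42, fasting_glucose_mmol_l=6.1))
```

Treating missing values as "not met" is a design choice of this sketch; routine clinical data rarely contain all four measurements for one patient.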
Affiliation(s)
- Martin Bernstorff
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Center for Humanities Computing, Aarhus University, Aarhus, Denmark
- Lasse Hansen
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Center for Humanities Computing, Aarhus University, Aarhus, Denmark
- Kenneth Enevoldsen
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Center for Humanities Computing, Aarhus University, Aarhus, Denmark
- Jakob Damgaard
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Center for Humanities Computing, Aarhus University, Aarhus, Denmark
- Frida Hæstrup
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Center for Humanities Computing, Aarhus University, Aarhus, Denmark
- Erik Perfalk
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Andreas Aalkjær Danielsen
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Søren Dinesen Østergaard
  - Department of Affective Disorders, Aarhus University Hospital – Psychiatry, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
3
Birdi S, Rabet R, Durant S, Patel A, Vosoughi T, Shergill M, Costanian C, Ziegler CP, Ali S, Buckeridge D, Ghassemi M, Gibson J, John-Baptiste A, Macklin J, McCradden M, McKenzie K, Mishra S, Naraei P, Owusu-Bempah A, Rosella L, Shaw J, Upshur R, Pinto AD. Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review. BMC Public Health 2024; 24:3599. [PMID: 39732655 PMCID: PMC11682638 DOI: 10.1186/s12889-024-21081-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 12/12/2024] [Indexed: 12/30/2024] Open
Abstract
BACKGROUND Machine learning (ML) is increasingly used in population and public health to support epidemiological studies, surveillance, and evaluation. Our objective was to conduct a scoping review to identify studies that use ML in population health, with a focus on its use in non-communicable diseases (NCDs). We also examine potential algorithmic biases in model design, training, and implementation, as well as efforts to mitigate these biases. METHODS We searched the peer-reviewed, indexed literature using Medline, Embase, Cochrane Central Register of Controlled Trials and Cochrane Database of Systematic Reviews, CINAHL, Scopus, ACM Digital Library, Inspec, Web of Science's Science Citation Index, Social Sciences Citation Index, and the Emerging Sources Citation Index, up to March 2022. RESULTS The search identified 27 310 studies and 65 were included. Study aims were separated into algorithm comparison (n = 13, 20%) or disease modelling for population-health-related outputs (n = 52, 80%). We extracted data on NCD type, data sources, technical approach, possible algorithmic bias, and jurisdiction. Type 2 diabetes was the most studied NCD. The most common use of ML was for risk modeling. Mitigating bias was not extensively addressed, with most methods focused on mitigating sex-related bias. CONCLUSION This review examines current applications of ML in NCDs, highlighting potential biases and strategies for mitigation. Future research should focus on communicable diseases and the transferability of ML models in low and middle-income settings. Our findings can guide the development of guidelines for the equitable use of ML to improve population health outcomes.
Affiliation(s)
- Sharon Birdi
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Roxana Rabet
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Steve Durant
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Atushi Patel
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Tina Vosoughi
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Mahek Shergill
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
  - Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Christy Costanian
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Carolyn P Ziegler
  - Library Services, Unity Health Toronto, St. Michael's Hospital, Toronto, ON, Canada
- Shehzad Ali
  - Department of Epidemiology and Biostatistics, Western Centre for Public Health & Family Medicine, Western University, London, ON, Canada
  - Division of Epidemiology, Dalla Lana School of Public Health, Toronto, ON, Canada
  - Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, Toronto, ON, Canada
- David Buckeridge
  - Department of Epidemiology, Biostatistics and Occupational Health, School of Population and Global Health, McGill University, Montreal, QC, Canada
- Marzyeh Ghassemi
  - Department of Electrical Engineering and Computer Science (EECS) and Institute for Medical Engineering & Science (IMES), MIT, Cambridge, MA, USA
- Jennifer Gibson
  - Joint Centre for Bioethics, University of Toronto, Toronto, ON, Canada
- Ava John-Baptiste
  - Departments of Epidemiology & Biostatistics, Anesthesia & Perioperative Medicine, Schulich Interfaculty Program in Public Health, Western University, London, ON, Canada
- Jillian Macklin
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
  - Undergraduate Medical Education, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Melissa McCradden
  - Division of Clinical Public Health, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Department of Bioethics, The Hospital for Sick Children, Toronto, ON, Canada
  - Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Kwame McKenzie
  - Wellesley Institute, Toronto, ON, Canada
  - CAMH, Toronto, ON, Canada
- Sharmistha Mishra
  - Division of Infectious Diseases, Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
  - MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
  - Institute of Medical Science, Faculty of Medicine, University of Toronto, Toronto, Canada
  - Institute of Health Policy, Management and Evaluation, Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
  - ICES, Toronto, ON, Canada
- Parisa Naraei
  - Department of Computer Science, Toronto Metropolitan University, Toronto, ON, Canada
- Akwasi Owusu-Bempah
  - Department of Sociology, Faculty of Arts & Sciences, University of Toronto, Toronto, ON, Canada
- Laura Rosella
  - Division of Clinical Public Health, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Institute for Better Health, Trillium Health Partners, Toronto, ON, Canada
  - Department of Health Sciences, University of York, York, UK
  - WHO Collaborating Centre for Knowledge Translation and Health Technology Assessment in Health Equity, Ottawa Centre for Health Equity, Ottawa, ON, Canada
- James Shaw
  - Department of Physical Therapy, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Ross Upshur
  - Department of Family and Community Medicine, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
  - Division of Clinical Public Health, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Joint Centre for Bioethics, University of Toronto, Toronto, ON, Canada
- Andrew D Pinto
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
  - Department of Family and Community Medicine, St. Michael's Hospital, Toronto, ON, Canada
  - Department of Family and Community Medicine, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
  - Division of Clinical Public Health, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
4
Jayasinghe D, Eshetie S, Beckmann K, Benyamin B, Lee SH. Advancements and limitations in polygenic risk score methods for genomic prediction: a scoping review. Hum Genet 2024; 143:1401-1431. [PMID: 39542907 DOI: 10.1007/s00439-024-02716-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Accepted: 10/31/2024] [Indexed: 11/17/2024]
Abstract
This scoping review aims to identify and evaluate the landscape of Polygenic Risk Score (PRS)-based methods for genomic prediction from 2013 to 2023, highlighting their advancements, key concepts, and existing gaps in knowledge, research, and technology. Over the past decade, various PRS-based methods have emerged, each employing different statistical frameworks aimed at enhancing prediction accuracy, processing speed and memory efficiency. Despite notable advancements, challenges persist, including unrealistic assumptions regarding sample sizes and the polygenicity of traits necessary for accurate predictions, as well as limitations in exploring hyper-parameter spaces and considering environmental interactions. We included studies focusing on PRS-based methods for risk prediction that underwent methodological evaluations using valid approaches and released computational tools/software. Additionally, we restricted our selection to studies involving human participants that were published in the English language. This review followed the standard protocol recommended by the Joanna Briggs Institute Reviewer's Manual, systematically searching the Ovid MEDLINE, Ovid Embase, Scopus and Web of Science databases. Additionally, searches included grey literature sources such as the pre-print server bioRxiv, and articles recommended by experts, to ensure comprehensive and diverse coverage of relevant records. This study identified 34 studies detailing 37 genomic prediction methods, the majority of which rely on linkage disequilibrium (LD) information and necessitate hyper-parameter tuning. Nine methods integrate functional/gene annotation, while 12 are suitable for cross-ancestry genomic prediction, with only one considering gene-environment (GxE) interaction. While some methods require individual-level data, most leverage summary statistics, offering flexibility. Despite progress, challenges remain. These include computational complexity and the need for large sample sizes for high prediction accuracy. Furthermore, recent methods exhibit varying effectiveness across traits, with absolute accuracies often falling short of clinical utility. Transferability across ancestries varies, influenced by trait heritability and diversity of training data, while handling admixed populations remains challenging. Additionally, the absence of standard error measurements for individual PRSs, crucial in clinical settings, underscores a critical gap. Another issue is the lack of customizable graphical visualization tools among current software packages. While genomic prediction methods have advanced significantly, there is still room for improvement. Addressing current challenges and embracing future research directions will lead to the development of more universally applicable, robust, and clinically relevant genomic prediction tools.
Affiliation(s)
- Dovini Jayasinghe
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- Setegn Eshetie
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
  - College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Kerri Beckmann
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- Beben Benyamin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- S Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
5
Nghiem N, Wilson N, Krebs J, Tran T. Predicting the risk of diabetes complications using machine learning and social administrative data in a country with ethnic inequities in health: Aotearoa New Zealand. BMC Med Inform Decis Mak 2024; 24:274. [PMID: 39334279 PMCID: PMC11438423 DOI: 10.1186/s12911-024-02678-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 09/10/2024] [Indexed: 09/30/2024] Open
Abstract
BACKGROUND In the age of big data, linked social and administrative health data in combination with machine learning (ML) is being increasingly used to improve prediction in chronic disease, e.g., cardiovascular diseases (CVD). In this study we aimed to apply ML methods to extensive national-level health and social administrative datasets to assess their utility for predicting future diabetes complications, including by ethnicity. METHODS Five ML models were used to predict CVD events among all people with known diabetes in the population of New Zealand, utilizing nationwide individual-level administrative data. RESULTS The XGBoost ML model had the best predictive power for predicting CVD events three years into the future among the population with diabetes (N = 145,600). The optimization procedure also found limited improvement in prediction by ethnicity (using area under the receiver operating curve [AUC]). The results indicated no trade-off between model predictive performance and the equity gap of prediction by ethnicity (that is, improving model prediction and reducing performance gaps by ethnicity can be achieved simultaneously). The list of variables of importance differed among models and ethnic groups, for example: age, deprivation (neighborhood-level), having had a hospitalization event, and the number of years living with diabetes. DISCUSSION AND CONCLUSIONS We provide further evidence that ML with administrative health data can be used for meaningful future prediction of health outcomes. As such, it could be utilized to inform health planning and healthcare resource allocation for diabetes management and the prevention of CVD events. Our results may suggest limited scope for developing prediction models by ethnic group, and that the major way to reduce inequitable health outcomes is probably via improved delivery of prevention and management to those groups with diabetes at highest need.
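Comparing discrimination across subgroups, as this abstract describes, amounts to computing AUC separately within each group and looking at the spread. The sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, the gap is defined here as the difference between the largest and smallest per-group AUC (the abstract does not give the authors' exact definition), and the data are made up.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately for each value in `groups`."""
    by_group = {}
    for y, s, g in zip(labels, scores, groups):
        ys, ss = by_group.setdefault(g, ([], []))
        ys.append(y)
        ss.append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}


def auc_gap(per_group):
    """Assumed gap measure: largest minus smallest per-group AUC."""
    return max(per_group.values()) - min(per_group.values())


# Made-up labels, scores and group codes for illustration only.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.9, 0.2]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = auc_by_group(labels, scores, groups)
print(per_group, auc_gap(per_group))
```

The pairwise loop is quadratic in group size, which is fine for a demonstration but would need a rank-based implementation at national scale.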
Affiliation(s)
- Nhung Nghiem
  - Department of Public Health, University of Otago Wellington, Wellington City, Wellington, 6021, New Zealand
  - John Curtin School of Medical Research, Australian National University, Canberra City, ACT, 2601, Australia
- Nick Wilson
  - Department of Public Health, University of Otago Wellington, Wellington City, Wellington, 6021, New Zealand
- Jeremy Krebs
  - Department of Medicine, University of Otago Wellington, Wellington City, Wellington, 6021, New Zealand
- Truyen Tran
  - Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong City, VIC, 3216, Australia
6
Abstract
Artificial intelligence (AI) systems have demonstrated impressive performance across a variety of clinical tasks. However, notoriously, sometimes these systems are "black boxes." The initial response in the literature was a demand for "explainable AI." However, recently, several authors have suggested that making AI more explainable or "interpretable" is likely to be at the cost of the accuracy of these systems and that prioritizing interpretability in medical AI may constitute a "lethal prejudice." In this paper, we defend the value of interpretability in the context of the use of AI in medicine. Clinicians may prefer interpretable systems over more accurate black boxes, which in turn is sufficient to give designers of AI reason to prefer more interpretable systems in order to ensure that AI is adopted and its benefits realized. Moreover, clinicians may be justified in this preference. Achieving the downstream benefits from AI is critically dependent on how the outputs of these systems are interpreted by physicians and patients. A preference for the use of highly accurate black box AI systems, over less accurate but more interpretable systems, may itself constitute a form of lethal prejudice that may diminish the benefits of AI to, and perhaps even harm, patients.
Affiliation(s)
- Joshua Hatherley
  - School of Philosophical, Historical, and International Studies, Monash University, Clayton, Victoria, Australia
- Robert Sparrow
  - School of Philosophical, Historical, and International Studies, Monash University, Clayton, Victoria, Australia
- Mark Howard
  - School of Philosophical, Historical, and International Studies, Monash University, Clayton, Victoria, Australia
7
Zhang H, Jethani N, Jones S, Genes N, Major VJ, Jaffe IS, Cardillo AB, Heilenbach N, Ali NF, Bonanni LJ, Clayburn AJ, Khera Z, Sadler EC, Prasad J, Schlacter J, Liu K, Silva B, Montgomery S, Kim EJ, Lester J, Hill TM, Avoricani A, Chervonski E, Davydov J, Small W, Chakravartty E, Grover H, Dodson JA, Brody AA, Aphinyanaphongs Y, Masurkar A, Razavian N. Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores. medRxiv: the preprint server for health sciences 2024:2023.07.10.23292373. [PMID: 38405784 PMCID: PMC10888985 DOI: 10.1101/2023.07.10.23292373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Importance: Large language models (LLMs) are crucial for medical tasks. Ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests like MMSE and CDR. Objective: To evaluate the performance of ChatGPT and LlaMA-2 in extracting MMSE and CDR scores, including their associated dates. Methods: Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' Kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. Results: For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rates of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rates of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of the wrong test being reported instead of MMSE, and 19 cases of reporting a wrong date. Conclusions: In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy, with better performance than LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
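The abstract names Fleiss' kappa as its inter-rater agreement statistic. The sketch below shows the standard formula for it; the function name and the example ratings are hypothetical, and it assumes every item was rated by the same number of raters (as with the notes assigned to two reviewers).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who put item i in category j.
    Every item must have the same total number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [(sum(c * c for c in row) - n_raters)
               / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical example: 4 notes, 2 reviewers, categories (correct, incorrect).
ratings = [[2, 0], [1, 1], [0, 2], [2, 0]]
print(fleiss_kappa(ratings))
```

A value of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance.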
Affiliation(s)
- Abraham A. Brody
  - NYU Rory Meyers College of Nursing, NYU Grossman School of Medicine
8
Mohsen F, Al-Absi HRH, Yousri NA, El Hajj N, Shah Z. A scoping review of artificial intelligence-based methods for diabetes risk prediction. NPJ Digit Med 2023; 6:197. [PMID: 37880301 PMCID: PMC10600138 DOI: 10.1038/s41746-023-00933-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/25/2023] [Accepted: 09/25/2023] [Indexed: 10/27/2023] Open
Abstract
The increasing prevalence of type 2 diabetes mellitus (T2DM) and its associated health complications highlight the need to develop predictive models for early diagnosis and intervention. While many artificial intelligence (AI) models for T2DM risk prediction have emerged, a comprehensive review of their advancements and challenges is currently lacking. This scoping review maps out the existing literature on AI-based models for T2DM prediction, adhering to the PRISMA extension for Scoping Reviews guidelines. A systematic search of longitudinal studies was conducted across four databases, including PubMed, Scopus, IEEE-Xplore, and Google Scholar. Forty studies that met our inclusion criteria were reviewed. Classical machine learning (ML) models dominated these studies, with electronic health records (EHR) being the predominant data modality, followed by multi-omics, while medical imaging was the least utilized. Most studies employed unimodal AI models, with only ten adopting multimodal approaches. Both unimodal and multimodal models showed promising results, with the latter being superior. Almost all studies performed internal validation, but only five conducted external validation. Most studies utilized the area under the curve (AUC) for discrimination measures. Notably, only five studies provided insights into the calibration of their models. Half of the studies used interpretability methods to identify key risk predictors revealed by their models. Although a minority highlighted novel risk predictors, the majority reported commonly known ones. Our review provides valuable insights into the current state and limitations of AI-based models for T2DM prediction and highlights the challenges associated with their development and clinical integration.
Affiliation(s)
- Farida Mohsen
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Hamada R H Al-Absi
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Noha A Yousri
  - Genetic Medicine, Weill Cornell Medicine-Qatar, Qatar Foundation, Doha, Qatar
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
  - Computer and Systems Engineering, Alexandria University, Alexandria, Egypt
- Nady El Hajj
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Zubair Shah
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar.
|
9
|
Naveed I, Kaleem MF, Keshavjee K, Guergachi A. Artificial intelligence with temporal features outperforms machine learning in predicting diabetes. PLOS DIGITAL HEALTH 2023; 2:e0000354. [PMID: 37878561 PMCID: PMC10599553 DOI: 10.1371/journal.pdig.0000354] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2023] [Accepted: 08/19/2023] [Indexed: 10/27/2023]
Abstract
Diabetes mellitus type 2 is increasingly being called a modern preventable pandemic, as even with excellent available treatments, the rate of complications of diabetes is rapidly increasing. Predicting diabetes and identifying it in its early stages could make it easier to prevent, allowing enough time to implement therapies before it gets out of control. Leveraging longitudinal electronic medical record (EMR) data with deep learning has great potential for diabetes prediction. This paper examines the predictive competency of deep learning models in contrast to state-of-the-art machine learning models to incorporate the time dimension of risk. The proposed research investigates a variety of deep learning models and features for predicting diabetes. Model performance was appraised and compared in relation to predominant features, risk factors, training data density and visit history. The framework was implemented on the longitudinal EMR records of over 19K patients extracted from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). Empirical findings demonstrate that deep learning models consistently outperform other state-of-the-art competitors with prediction accuracy of above 91%, without overfitting. Fasting blood sugar, hemoglobin A1c and body mass index are the key predictors of future onset of diabetes. Overweight, middle aged patients and patients with hypertension are more vulnerable to developing diabetes, consistent with what is already known. Model performance improves as training data density or the visit history of a patient increases. This study confirms the ability of the LSTM deep learning model to incorporate the time dimension of risk in its predictive capabilities.
Affiliation(s)
- Iqra Naveed
  - Department of Electrical Engineering, University of Management and Technology, Lahore, Pakistan
- Muhammad Farhat Kaleem
  - Department of Electrical Engineering, University of Management and Technology, Lahore, Pakistan
- Karim Keshavjee
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
- Aziz Guergachi
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
  - Ted Rogers School of Information Technology Management, Toronto Metropolitan University, Toronto, Canada
  - Department of Mathematics and Statistics, York University, Toronto, Canada
|
10
|
Oirbeek RV, Ponnet J, Baesens B, Verdonck T. Computational Efficient Approximations of the Concordance Probability in a Big Data Setting. BIG DATA 2023; 12:243-268. [PMID: 37289184 DOI: 10.1089/big.2022.0107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC is equal to the concordance probability, a frequently used measure to evaluate the discriminatory power of the model. Contrary to AUC, the concordance probability can also be extended to the situation with a continuous response variable. Due to the staggering size of data sets nowadays, determining this discriminatory measure requires a tremendous amount of costly computations and is hence immensely time consuming, certainly in the case of a continuous response variable. Therefore, we propose two estimation methods that calculate the concordance probability in a fast and accurate way and that can be applied to both the discrete and continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the artificial simulations.
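The equivalence between the AUC and the concordance probability for a binary outcome can be illustrated with a minimal Python sketch. It is not one of the estimators proposed in the paper: it is the naive exact computation over all positive/negative pairs, and the function name, the tie-handling convention (ties count one half) and the toy data are assumptions made for illustration.

```python
from itertools import product


def concordance_probability(scores, labels):
    """Exact pairwise concordance for a binary outcome (hypothetical helper).

    Returns the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied scores count as half-concordant.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(pos, neg):  # every positive/negative pair
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data (assumed): two positives, three negatives.
scores = [0.9, 0.4, 0.35, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]
auc = concordance_probability(scores, labels)  # 5 of 6 pairs are concordant
```

The loop visits every positive/negative pair, so the cost grows with the product of the two group sizes. That quadratic cost is the kind of burden that motivates faster approximations on large data sets.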
Affiliation(s)
- Jolien Ponnet
  - Department of Mathematics, Faculty of Science, KU Leuven, Leuven, Belgium
- Bart Baesens
  - Faculty of Economics and Business, KU Leuven, Leuven, Belgium
  - School of Management, University of Southampton, Southampton, United Kingdom
- Tim Verdonck
  - Department of Mathematics, Faculty of Science, KU Leuven, Leuven, Belgium
  - Department of Mathematics, Faculty of Science, UAntwerp-imec, Antwerp, Belgium
|
11
|
Hellmann A, Emmons A, Stewart Prime M, Paranjape K, Heaney DL. Digital Health: Today's Solutions and Tomorrow's Impact. Clin Lab Med 2023; 43:71-86. [PMID: 36764809 DOI: 10.1016/j.cll.2022.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Artificial intelligence (AI) is becoming an indispensable tool to augment decision making in different health care settings and by various members of the patient pathway, including the patient. AI provides the ability to optimize data to bring clinical decision support to clinicians and laboratorians and/or to empower patients to actively participate in their own health care. Though there are many examples of AI in health care, the exact role of AI and digital health solutions is still taking shape. Although AI will not replace the clinician, those who do not adopt AI may, in time, be left behind.
Affiliation(s)
- Alison Hellmann
  - Roche Diagnostics, 9115 Hague Road, Indianapolis, IN 46256, USA.
- Ashley Emmons
  - Roche Diagnostics, 9115 Hague Road, Indianapolis, IN 46256, USA
- Matthew Stewart Prime
  - Roche Information Solutions, Kornfeldstrasse 42, Riehen 4125, Basel Stadt, Switzerland
- Ketan Paranjape
  - Roche Diagnostics, 9115 Hague Road, Indianapolis, IN 46256, USA
- Denise L Heaney
  - Roche Diagnostics, 9115 Hague Road, Indianapolis, IN 46256, USA
|
12
|
Lone IM, Midlej K, Nun NB, Iraqi FA. Intestinal cancer development in response to oral infection with high-fat diet-induced Type 2 diabetes (T2D) in collaborative cross mice under different host genetic background effects. Mamm Genome 2023; 34:56-75. [PMID: 36757430 DOI: 10.1007/s00335-023-09979-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 01/20/2023] [Indexed: 02/10/2023]
Abstract
Type 2 diabetes (T2D) is a metabolic disease characterized by an imbalance in blood glucose concentration. A substantial body of research now shows an association between T2D and intestinal cancer development. A high-fat diet (HFD) plays a part in the development of T2D, intestinal cancer and infectious diseases through many biological mechanisms, including but not limited to inflammation. Understanding the system genetics of the multimorbidity of these diseases will provide important knowledge and a platform for dissecting their complexity. Furthermore, in this study we used machine learning (ML) models to explore more aspects of diabetes mellitus. The ultimate aim of this project is to study the genetic factors that underlie T2D development, associated with intestinal cancer, in response to HFD consumption and oral coinfection, jointly or separately, on the same host genetic background. A cohort of 307 mice of eight different CC mouse lines in the four experimental groups was assessed. The mice were maintained on either HFD or chow diet (CHD) for a 12-week period, while half of each dietary group was coinfected with oral bacteria and the other half was left uninfected. Host response to a glucose load and clearance was assessed using the intraperitoneal glucose tolerance test (IPGTT) at two time points (weeks 6 and 12) during the experiment and was subsequently translated to area under the curve (AUC) values. At week 5 of the experiment, mice of groups two and four were coinfected with Porphyromonas gingivalis (Pg) and Fusobacterium nucleatum (Fn) strains three times a week, while the other, uninfected mice were kept as a control group. At week 12, the mice were killed, the small intestines and colon were extracted, and the polyps were counted; the intestine lengths and sizes were also measured. Our results show significant variation in polyp number among CC lines, ranging from 2.5 to 12.8 total polyps on average. There was a significant correlation between AUC and intestine measurements, including polyp counts, length and size. In addition, our results show a significant sex effect on polyp development and glucose tolerance, with males more susceptible to HFD than females, as shown by their higher AUC in the glucose tolerance test. The ML results showed that classification with random forest reached the highest accuracy when all the attributes were used. These results provide an excellent platform for proceeding toward understanding the nature of the genes involved in resistance and rate of development of intestinal cancer and T2D induced by HFD and oral coinfection. Once obtained, such data can be used to predict individual risk for developing these diseases and to establish a genetically based strategy for their prevention and treatment.
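How a glucose tolerance curve is condensed into a single AUC value can be sketched as follows. The abstract does not state the integration method or the sampling times, so the trapezoidal rule, the time points and the glucose readings below are all assumptions chosen for illustration; the function name is hypothetical.

```python
def glucose_auc(times_min, glucose):
    """Area under a glucose-time curve by the trapezoidal rule.

    times_min: sampling times in minutes, strictly increasing (assumed).
    glucose:   glucose readings at those times (for example mg/dL).
    """
    if len(times_min) != len(glucose) or len(times_min) < 2:
        raise ValueError("need two or more matched time/glucose samples")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Each interval contributes its width times the mean of its end points.
        area += dt * (glucose[i] + glucose[i - 1]) / 2.0
    return area


# Hypothetical readings for one animal, not data from the study.
times = [0, 15, 30, 60, 120]
readings = [100, 250, 300, 200, 120]
auc = glucose_auc(times, readings)  # 2625 + 4125 + 7500 + 9600 = 23850.0
```

A higher value means glucose stayed elevated for longer after the load, which is why a higher AUC is read as poorer glucose tolerance.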
Affiliation(s)
- Iqbal M Lone
  - Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel
- Kareem Midlej
  - Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel
- Nadav Ben Nun
  - Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel
- Fuad A Iraqi
  - Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel.
|
13
|
Nghiem N, Atkinson J, Nguyen BP, Tran-Duy A, Wilson N. Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets. HEALTH ECONOMICS REVIEW 2023; 13:9. [PMID: 36738348 PMCID: PMC9898915 DOI: 10.1186/s13561-023-00422-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/30/2022] [Accepted: 01/23/2023] [Indexed: 06/18/2023]
Abstract
OBJECTIVES To optimise planning of public health services, the impact of high-cost users needs to be considered. However, most of the existing statistical models for costs do not include many of the clinical and social variables from administrative data that are associated with elevated health care resource use and are increasingly available. This study aimed to use machine learning approaches and big data to predict high-cost users among people with cardiovascular disease (CVD). METHODS We used nationally representative linked datasets in New Zealand to predict the prevalent CVD cases with the highest costs, those belonging to the top quintiles by cost. We compared the performance of four popular machine learning models (L1-regularised logistic regression, classification trees, k-nearest neighbours (KNN) and random forest) with the traditional regression models. RESULTS The machine learning models had far better accuracy in predicting high health-cost users compared with the logistic models. The F1 score (the harmonic mean of sensitivity and positive predictive value) of the machine learning models ranged from 30.6% to 41.2% (compared with 8.6-9.1% for the logistic models). Previous health costs, income, age, chronic health conditions, deprivation, and receiving a social security benefit were among the most important predictors of the CVD high-cost users. CONCLUSIONS This study provides additional evidence that machine learning can be used as a tool together with big data in health economics for identification of new risk factors and prediction of high-cost users with CVD. As such, machine learning may potentially assist with health services planning and preventive measures to improve population health while potentially saving healthcare costs.
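The F1 score reported above combines sensitivity and positive predictive value as their harmonic mean. A minimal Python sketch of that arithmetic follows; the function name and the input values are illustrative assumptions, not figures from the study.

```python
def f1_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and positive predictive value.

    Both inputs are proportions in [0, 1]. Returns 0.0 when both are zero,
    a common convention (assumed here) that avoids division by zero.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= ppv <= 1.0):
        raise ValueError("inputs must be proportions between 0 and 1")
    if sensitivity + ppv == 0.0:
        return 0.0
    return 2.0 * sensitivity * ppv / (sensitivity + ppv)


# Hypothetical classifier: finds half of the high-cost users (sensitivity 0.50),
# but only a quarter of the people it flags are truly high-cost (PPV 0.25).
score = f1_score(0.50, 0.25)  # 2 * 0.125 / 0.75, roughly 0.333
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 by maximising sensitivity alone while flagging many false positives.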
Affiliation(s)
- Nhung Nghiem
  - Department of Public Health, University of Otago, Wellington, New Zealand.
- June Atkinson
  - Department of Public Health, University of Otago, Wellington, New Zealand
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
- An Tran-Duy
  - Centre for Health Policy, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia
- Nick Wilson
  - Department of Public Health, University of Otago, Wellington, New Zealand
|
14
|
Zou X, Liu Y, Ji L. Review: Machine learning in precision pharmacotherapy of type 2 diabetes-A promising future or a glimpse of hope? Digit Health 2023; 9:20552076231203879. [PMID: 37786401 PMCID: PMC10541760 DOI: 10.1177/20552076231203879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 09/08/2023] [Indexed: 10/04/2023] Open
Abstract
Precision pharmacotherapy of diabetes requires judicious selection of the optimal therapeutic agent for individual patients. Artificial intelligence (AI), a swiftly expanding discipline, holds substantial potential to transform current practices in diabetes diagnosis and management. This manuscript provides a comprehensive review of contemporary research investigating drug responses in patient subgroups, stratified via either supervised or unsupervised machine learning approaches. The prevalent algorithmic workflow for investigating drug responses using machine learning involves cohort selection, data processing, predictor selection, development and validation of machine learning methods, subgroup allocation, and subsequent analysis of drug response. Despite this promise, current research does not yet provide sufficient evidence to implement machine learning algorithms in routine clinical practice, due to a lack of simplicity, validation, or demonstrated efficacy. Nevertheless, we anticipate that the evolving evidence base will increasingly substantiate the role of machine learning in shaping precision pharmacotherapy for diabetes.
Affiliation(s)
- Xiantong Zou
  - Department of Endocrinology and Metabolism, Peking University People's Hospital, Beijing, 100044, China.
- Linong Ji
  - Department of Endocrinology and Metabolism, Peking University People's Hospital, Beijing, 100044, China.
|
15
|
Detection of factors affecting kidney function using machine learning methods. Sci Rep 2022; 12:21740. [PMID: 36526702 PMCID: PMC9758148 DOI: 10.1038/s41598-022-26160-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
Due to the increasing prevalence of chronic kidney disease and its high mortality rate, the study of risk factors affecting the progression of the disease is of great importance. In this work, we aim to develop a framework for using machine learning methods to identify factors affecting kidney function. To this end, classification methods are trained to assign the serum creatinine level to one of three classes representing different ranges of its values, based on the numerical values of other blood test parameters. Models are trained using blood test results from healthy subjects and patients, including 46 different blood test parameters. The best developed models are random forest and LightGBM. Interpretation of the resulting model reveals a direct relationship between vitamin D and blood creatinine level. The detected relationship between these two parameters is reliable, given the relatively high predictive accuracy of the random forest model, which reaches an AUC of 0.90 and an accuracy of 0.74. Moreover, in this paper we develop a Bayesian network to infer the direct relationships between blood test parameters, whose results are consistent with the classification models. The proposed framework uses an inclusive set of advanced imputation methods to deal with the main challenge of working with electronic health data: missing values. Hence it can be applied to similar clinical studies to investigate and discover the relationships between the factors under study.
|
16
|
Hatherley J, Sparrow R, Howard M. The Virtues of Interpretable Medical Artificial Intelligence. Camb Q Healthc Ethics 2022:1-10. [PMID: 36524245 DOI: 10.1017/s0963180122000305] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Abstract
Artificial intelligence (AI) systems have demonstrated impressive performance across a variety of clinical tasks. However, notoriously, sometimes these systems are "black boxes." The initial response in the literature was a demand for "explainable AI." However, recently, several authors have suggested that making AI more explainable or "interpretable" is likely to be at the cost of the accuracy of these systems and that prioritizing interpretability in medical AI may constitute a "lethal prejudice." In this article, we defend the value of interpretability in the context of the use of AI in medicine. Clinicians may prefer interpretable systems over more accurate black boxes, which in turn is sufficient to give designers of AI reason to prefer more interpretable systems in order to ensure that AI is adopted and its benefits realized. Moreover, clinicians may be justified in this preference. Achieving the downstream benefits from AI is critically dependent on how the outputs of these systems are interpreted by physicians and patients. A preference for the use of highly accurate black box AI systems, over less accurate but more interpretable systems, may itself constitute a form of lethal prejudice that may diminish the benefits of AI to, and perhaps even harm, patients.
Affiliation(s)
- Joshua Hatherley
  - School of Philosophical, Historical, and International Studies, Monash University, Clayton, Victoria 3168, Australia
- Robert Sparrow
  - School of Philosophical, Historical, and International Studies, Monash University, Clayton, Victoria 3168, Australia
- Mark Howard
  - School of Philosophical, Historical, and International Studies, Monash University, Clayton, Victoria 3168, Australia
|
17
|
Kumar M, Bajaj K, Sharma B, Narang S. A Comparative Performance Assessment of Optimized Multilevel Ensemble Learning Model with Existing Classifier Models. BIG DATA 2022; 10:371-387. [PMID: 34881989 DOI: 10.1089/big.2021.0257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Predictive models are used to predict the class label in a classification problem. Usually a single predictive model is built, whereas the current research considers multiple predictive models. In ensemble modeling, instead of a single predictive model, a multilevel predictive model is built that generalizes to predict all the class labels with adequate accuracy, that is, from 70% to 90%, by applying different combinations of classification algorithms. In this article, a multilevel approach for selecting base classifiers for building an ensemble classification model is proposed. The basic concept behind this approach is to drop poorly performing features and collinearity from the selected data set for ensemble modeling. For the evaluation of the proposed multilevel predictive model, different data sets from the University of California, Irvine, repository have been used and comparisons with modern classifier models have been conducted. The implementation analyses demonstrate the effectiveness of the novel approach when compared with other modern classification models (three-layered artificial neural network, Radial Variant Function Neural Network/Fish Swarm Algorithm). The classification accuracy achieved with the selected algorithms lies in the range of 70%-88.3%. Among all the selected classification algorithms, the lowest accuracy, close to 71.9%, is achieved by the naive Bayes algorithm. However, the proposed algorithm (NB-RF-LR-SEMod), which is a combination of different classifiers, achieved a maximum accuracy of 88.3% on the Pima Indians Diabetes data set, which is better than that of any single classifier. Hence, this proposed work can help health care officials detect diabetes at an early stage and protect the affected person from its future complications.
Affiliation(s)
- Mukesh Kumar
  - Department of Computer Science & Engineering, Chitkara University School of Engineering and Technology, Chitkara University, Baddi, Himachal Pradesh, India
- Karan Bajaj
  - Department of Computer Science & Engineering, Chitkara University School of Engineering and Technology, Chitkara University, Baddi, Himachal Pradesh, India
- Bhisham Sharma
  - Department of Computer Science & Engineering, Chitkara University School of Engineering and Technology, Chitkara University, Baddi, Himachal Pradesh, India
- Sushil Narang
  - Department of Computer Science & Engineering, Chitkara University School of Engineering and Technology, Chitkara University, Baddi, Himachal Pradesh, India
|
18
|
Liu J, Jiao X, Zeng S, Li H, Jin P, Chi J, Liu X, Yu Y, Ma G, Zhao Y, Li M, Peng Z, Huo Y, Gao QL. Oncological big data platforms for promoting digital competencies and professionalism in Chinese medical students: a cross-sectional study. BMJ Open 2022; 12:e061015. [PMID: 36109032 PMCID: PMC9478867 DOI: 10.1136/bmjopen-2022-061015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES Advancements in big data technology are reshaping the healthcare system in China. This study aims to explore the role of medical big data in promoting digital competencies and professionalism among Chinese medical students. DESIGN, SETTING AND PARTICIPANTS This study was conducted among 274 medical students who attended a workshop on medical big data held on 8 July 2021 in Tongji Hospital. The workshop was based on the first nationwide multifunction gynecologic oncology medical big data platform in China, the National Union of Real-World Gynecologic Oncology Research & Patient Management Platform (NUWA platform). OUTCOME MEASURES Data on knowledge, attitudes towards big data technology and professionalism were collected before and after the workshop. We measured four skill categories (doctor-patient relationship skills, reflective skills, time management and interprofessional relationship skills) using the Professionalism Mini-Evaluation Exercise (P-MEX) as a measure of professionalism. RESULTS A total of 274 students participated in this workshop and completed all the surveys. Before the workshop, only 27% of them knew the detailed content of medical big data platforms, and 64% knew the potential application of medical big data. The majority of the students believed that big data technology is practical in their clinical practice (77%), medical education (85%) and scientific research (82%). Over 80% of the participants showed positive attitudes toward big data platforms. They also exhibited sufficient professionalism before the workshop. Meanwhile, the workshop significantly promoted students' knowledge of medical big data (p<0.05), and led to more positive attitudes towards big data platforms and higher levels of professionalism. CONCLUSIONS Chinese medical students have a rudimentary familiarity with, and positive attitudes toward, big data technology. The NUWA platform-based workshop may potentially promote their understanding of big data and enhance professionalism, according to the self-measured P-MEX scale.
Affiliation(s)
- Jiahao Liu
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xiaofei Jiao
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Shaoqing Zeng
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Huayi Li
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Ping Jin
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Jianhua Chi
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xingyu Liu
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Yang Yu
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Guanchen Ma
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Yingjun Zhao
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Ming Li
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Zikun Peng
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Yabing Huo
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Qing-Lei Gao
  - Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
|
19
|
Cheheltani R, King N, Lee S, North B, Kovarik D, Evans-Molina C, Leavitt N, Dutta S. Predicting misdiagnosed adult-onset type 1 diabetes using machine learning. Diabetes Res Clin Pract 2022; 191:110029. [PMID: 35940302 PMCID: PMC10631495 DOI: 10.1016/j.diabres.2022.110029] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/24/2022] [Revised: 07/29/2022] [Accepted: 08/01/2022] [Indexed: 11/27/2022]
Abstract
AIMS It is now understood that almost half of newly diagnosed cases of type 1 diabetes are adult-onset. However, type 1 and type 2 diabetes are difficult to initially distinguish clinically in adults, potentially leading to ineffective care. In this study a machine learning model was developed to identify type 1 diabetes patients misdiagnosed as type 2 diabetes. METHODS In this retrospective study, a machine learning model was developed to identify misdiagnosed type 1 diabetes patients from a population of patients with a prior type 2 diabetes diagnosis. Using Ambulatory Electronic Medical Records (AEMR), features capturing relevant information on age, demographics, risk factors, symptoms, treatments, procedures, vitals, or lab results were extracted from patients' medical history. RESULTS The model identified age, BMI/weight, therapy history, and HbA1c/blood glucose values among top predictors of misdiagnosis. Model precision at low levels of recall (10 %) was 17 %, compared to <1 % incidence rate of misdiagnosis at the time of the first type 2 diabetes encounter in AEMR. CONCLUSIONS This algorithm shows potential for being translated into screening guidelines or a clinical decision support tool embedded directly in an EMR system to reduce misdiagnosis of adult-onset type 1 diabetes and implement effective care at the outset.
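The comparison in the results (17 % precision at 10 % recall against an incidence below 1 %) rests on reading precision at a fixed recall level from a ranked list of patients. A minimal Python sketch of that calculation follows; the function name, the ranking convention and the toy data are assumptions for illustration and do not reproduce the study's model or data.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the shortest ranked list that reaches a recall target.

    Cases are ranked by descending score and the list is extended one case at
    a time until the fraction of all true positives found reaches
    target_recall. Tied scores are left in input order (a simplification).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive cases")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / k  # precision among the top-k cases
    # Not reached: recall equals 1.0 once every case has been included.


# Toy cohort (assumed): 10 patients, 2 truly misdiagnosed (label 1).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision = precision_at_recall(scores, labels, 0.5)  # top 2 cases, 1 hit
base_rate = sum(labels) / len(labels)
lift = precision / base_rate  # how much better than flagging at random
```

Dividing the precision by the base rate gives the lift over unselected screening. With the figures in the abstract, a precision of 17 % against an incidence below 1 % corresponds to a lift of more than 17.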
Affiliation(s)
- Rabee Cheheltani
  - Predictive Analytics, Real World Solutions, IQVIA, Wayne, PA, USA
- Nicholas King
  - Predictive Analytics, Real World Solutions, IQVIA, Wayne, PA, USA
- Suyin Lee
  - Predictive Analytics, Real World Solutions, IQVIA, Wayne, PA, USA
- Benjamin North
  - Predictive Analytics, Real World Solutions, IQVIA, Wayne, PA, USA
- Carmella Evans-Molina
  - Center for Diabetes and Metabolic Diseases, Indiana University School of Medicine, Indianapolis, IN, USA
- Nadejda Leavitt
  - Predictive Analytics, Real World Solutions, IQVIA, Wayne, PA, USA
|
20
|
Bro-Jørgensen W, Hamill JM, Bro R, Solomon GC. Trusting our machines: validating machine learning models for single-molecule transport experiments. Chem Soc Rev 2022; 51:6875-6892. [PMID: 35686581 PMCID: PMC9377421 DOI: 10.1039/d1cs00884f] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/29/2022] [Indexed: 11/21/2022]
Abstract
In this tutorial review, we will describe crucial aspects related to the application of machine learning to help users avoid the most common pitfalls. The examples we present will be based on data from the field of molecular electronics, specifically single-molecule electron transport experiments, but the concepts and problems we explore will be sufficiently general for application in other fields with similar data. In the first part of the tutorial review, we will introduce the field of single-molecule transport, and provide an overview of the most common machine learning algorithms employed. In the second part of the tutorial review, we will show, through examples grounded in single-molecule transport, that the promises of machine learning can only be fulfilled by careful application. We will end the tutorial review with a discussion of where we, as a field, could go from here.
Affiliation(s)
- William Bro-Jørgensen
- Department of Chemistry and Nano-Science Center, University of Copenhagen, Universitetsparken 5, DK-2100, Copenhagen Ø, Denmark.
| | - Joseph M Hamill
- Department of Chemistry and Nano-Science Center, University of Copenhagen, Universitetsparken 5, DK-2100, Copenhagen Ø, Denmark.
| | - Rasmus Bro
- Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg, Denmark.
| | - Gemma C Solomon
- Department of Chemistry and Nano-Science Center, University of Copenhagen, Universitetsparken 5, DK-2100, Copenhagen Ø, Denmark.
|
21
|
Supervised Machine Learning Empowered Multifactorial Genetic Inheritance Disorder Prediction. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1051388. [PMID: 35685134 PMCID: PMC9173933 DOI: 10.1155/2022/1051388] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/10/2022] [Revised: 04/15/2022] [Accepted: 05/03/2022] [Indexed: 12/18/2022]
Abstract
Fatal diseases such as cancer, dementia, and diabetes are very dangerous, and they carry a risk of death if they are not diagnosed at an early stage. Computer science draws on biomedical studies to help diagnose cancer, dementia, and diabetes. With the advancement of machine learning, various techniques are available to predict and prognose these diseases based on different datasets. These datasets vary around the world (for example, image datasets and CSV datasets), so there is a need for machine learning classifiers that can predict cancer, dementia, and diabetes in humans. In this paper, we used a multifactorial genetic inheritance disorder dataset to predict cancer, dementia, and diabetes. Several studies have used different machine learning classifiers to predict cancer, dementia, and diabetes separately with the help of different types of datasets. In this paper, the proposed multiclass classification methodology used support vector machine (SVM) and K-nearest neighbor (KNN) machine learning techniques to predict the three diseases and compared these techniques based on accuracy. Simulation results have shown that the proposed SVM and KNN models for prediction of dementia, cancer, and diabetes from multifactorial genetic inheritance disorder achieved 92.8% and 92.5%, 92.8% and 91.2% accuracy during training and testing, respectively. The proposed SVM-based multifactorial genetic inheritance disorder prediction (MGIDP) model for dementia, cancer, and diabetes therefore gives better results than the proposed KNN model. Applying the proposed model supports early prognosis and prediction of cancer, dementia, and diabetes and plays a vital role in minimizing the death ratio around the world.
|
22
|
Huang RJ, Kwon NSE, Tomizawa Y, Choi AY, Hernandez-Boussard T, Hwang JH. A Comparison of Logistic Regression Against Machine Learning Algorithms for Gastric Cancer Risk Prediction Within Real-World Clinical Data Streams. JCO Clin Cancer Inform 2022; 6:e2200039. [PMID: 35763703 DOI: 10.1200/cci.22.00039] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/15/2022] Open
Abstract
PURPOSE Noncardia gastric cancer (NCGC) is a leading cause of global cancer mortality, and is often diagnosed at advanced stages. Development of NCGC risk models within electronic health records (EHR) may allow for improved cancer prevention. There has been much recent interest in use of machine learning (ML) for cancer prediction, but few studies comparing ML with classical statistical models for NCGC risk prediction. METHODS We trained models using logistic regression (LR) and four commonly used ML algorithms to predict NCGC from age-/sex-matched controls in two EHR systems: Stanford University and the University of Washington (UW). The LR model contained well-established NCGC risk factors (intestinal metaplasia histology, prior Helicobacter pylori infection, race, ethnicity, nativity status, smoking history, anemia), whereas ML models agnostically selected variables from the EHR. Models were developed and internally validated in the Stanford data, and externally validated in the UW data. Hyperparameter tuning of models was achieved using cross-validation. Model performance was compared by accuracy, sensitivity, and specificity. RESULTS In internal validation, LR performed with comparable accuracy (0.732; 95% CI, 0.698 to 0.764), sensitivity (0.697; 95% CI, 0.647 to 0.744), and specificity (0.767; 95% CI, 0.720 to 0.809) to penalized lasso, support vector machine, K-nearest neighbor, and random forest models. In external validation, LR continued to demonstrate high accuracy, sensitivity, and specificity. Although K-nearest neighbor demonstrated higher accuracy and specificity, this was offset by significantly lower sensitivity. No ML model consistently outperformed LR across evaluation criteria. CONCLUSION Drawing data from two independent EHRs, we find LR on the basis of established risk factors demonstrated comparable performance to optimized ML algorithms. 
This study demonstrates that classical models built on robust, hand-chosen predictor variables may not be inferior to data-driven models for NCGC risk prediction.
Affiliation(s)
- Robert J Huang
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA
| | - Nicole Sung-Eun Kwon
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA
| | - Yutaka Tomizawa
- Division of Gastroenterology, University of Washington, Seattle, WA
| | - Alyssa Y Choi
- Division of Gastroenterology and Hepatology, University of California Irvine, Irvine, CA
| | | | - Joo Ha Hwang
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA
|
23
|
Dey S, Chakraborty P, Kwon BC, Dhurandhar A, Ghalwash M, Suarez Saiz FJ, Ng K, Sow D, Varshney KR, Meyer P. Human-centered explainability for life sciences, healthcare, and medical informatics. PATTERNS (NEW YORK, N.Y.) 2022; 3:100493. [PMID: 35607616 PMCID: PMC9122967 DOI: 10.1016/j.patter.2022.100493] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/15/2023]
Abstract
Rapid advances in artificial intelligence (AI) and availability of biological, medical, and healthcare data have enabled the development of a wide variety of models. Significant success has been achieved in a wide range of fields, such as genomics, protein folding, disease diagnosis, imaging, and clinical tasks. Although widely used, the inherent opacity of deep AI models has brought criticism from the research field and little adoption in clinical practice. Concurrently, there has been a significant amount of research focused on making such methods more interpretable, reviewed here, but inherent critiques of such explainability in AI (XAI), its requirements, and concerns with fairness/robustness have hampered their real-world adoption. We here discuss how user-driven XAI can be made more useful for different healthcare stakeholders through the definition of three key personas (data scientists, clinical researchers, and clinicians) and present an overview of how different XAI approaches can address their needs. For illustration, we also walk through several research and clinical examples that take advantage of XAI open-source tools, including those that help enhance the explanation of the results through visualization. This perspective thus aims to provide a guidance tool for developing explainability solutions for healthcare by empowering both subject matter experts, by providing them with a survey of available tools, and explainability developers, by providing examples of how such methods can influence the adoption of solutions in practice.
Affiliation(s)
- Sanjoy Dey
- Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Prithwish Chakraborty
- Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Bum Chul Kwon
- Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Amit Dhurandhar
- IBM Research AI, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Mohamed Ghalwash
- Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Ain Shams University, Cairo, Egypt
| | | | - Kenney Ng
- Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Daby Sow
- IBM Research Security and Compliance, AI Industries, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Kush R. Varshney
- IBM Research AI, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Pablo Meyer
- Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
|
24
|
Kodama S, Fujihara K, Horikawa C, Kitazawa M, Iwanaga M, Kato K, Watanabe K, Nakagawa Y, Matsuzaka T, Shimano H, Sone H. Predictive ability of current machine learning algorithms for type 2 diabetes mellitus: A meta-analysis. J Diabetes Investig 2022; 13:900-908. [PMID: 34942059 PMCID: PMC9077721 DOI: 10.1111/jdi.13736] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/13/2021] [Revised: 12/09/2021] [Accepted: 12/13/2021] [Indexed: 11/22/2022] Open
Abstract
AIMS/INTRODUCTION Recently, an increasing number of cohort studies have suggested using machine learning (ML) to predict type 2 diabetes mellitus. However, its predictive ability remains inconclusive. This meta-analysis evaluated the current ability of ML algorithms for predicting incident type 2 diabetes mellitus. MATERIALS AND METHODS We systematically searched longitudinal studies published from 1 January 1950 to 17 May 2020 using MEDLINE and EMBASE. Included studies had to compare ML's classification with the actual incidence of type 2 diabetes mellitus, and present data on the number of true positives, false positives, true negatives and false negatives. The dataset for these four values was pooled with a hierarchical summary receiver operating characteristic and a bivariate random effects model. RESULTS There were 12 eligible studies. The pooled sensitivity, specificity, positive likelihood ratio and negative likelihood ratio were 0.81 (95% confidence interval [CI] 0.67-0.90), 0.82 [95% CI 0.74-0.88], 4.55 [95% CI 3.07-6.75] and 0.23 [95% CI 0.13-0.42], respectively. The area under the summarized receiver operating characteristic curve was 0.88 (95% CI 0.85-0.91). CONCLUSIONS Current ML algorithms have sufficient ability to help clinicians determine whether individuals will develop type 2 diabetes mellitus in the future. However, persons should be cautious before changing their attitude toward future diabetes risk after learning the result of the diabetes prediction test using ML algorithms.
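The likelihood ratios quoted in this abstract follow the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below is a minimal illustration that plugs in the pooled point estimates; the function name `likelihood_ratios` is hypothetical, and because the meta-analysis pools its estimates with a bivariate random-effects model, the reported LR+ of 4.55 need not equal the simple ratio of the pooled sensitivity and specificity.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a diagnostic test.

    Illustrative helper (not from the cited paper); uses the textbook
    definitions of the positive and negative likelihood ratios.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Pooled point estimates quoted in the abstract above.
lr_pos, lr_neg = likelihood_ratios(0.81, 0.82)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The simple ratios come out close to the pooled values in the abstract, which is a quick consistency check on the reported figures rather than a reproduction of the meta-analytic model.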
Affiliation(s)
- Satoru Kodama
- Department of Prevention of Noncommunicable Diseases and Promotion of Health Checkup, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Kazuya Fujihara
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Chika Horikawa
- Department of Health and Nutrition, Faculty of Human Life Studies, University of Niigata Prefecture, Niigata, Japan
| | - Masaru Kitazawa
- Department of Prevention of Noncommunicable Diseases and Promotion of Health Checkup, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Midori Iwanaga
- Department of Prevention of Noncommunicable Diseases and Promotion of Health Checkup, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Kiminori Kato
- Department of Prevention of Noncommunicable Diseases and Promotion of Health Checkup, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Kenichi Watanabe
- Department of Prevention of Noncommunicable Diseases and Promotion of Health Checkup, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Yoshimi Nakagawa
- Division of Complex Biosystem Research, Institute of Natural Medicine, Toyama University, Toyama, Japan
| | - Takashi Matsuzaka
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
| | - Hitoshi Shimano
- Department of Internal Medicine (Endocrinology and Metabolism), Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
| | - Hirohito Sone
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
|
25
|
Rajesh N., Irudayasamy A, Mohideen MSK, Ranjith CP. Classification of Vital Genetic Syndromes Associated With Diabetes Using ANN-Based CapsNet Approach. INTERNATIONAL JOURNAL OF E-COLLABORATION 2022. [DOI: 10.4018/ijec.307133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Diabetes has been linked to a wide range of genetic abnormalities or disorders, such as Cushing syndrome and Wolfram’s syndrome. The significance of these relatively uncommon disorders lies in the insight they provide into the potential processes driving prevalent diabetes. Diabetes-related syndromes are presently classified based on clinical and biochemical characteristics. However, until now, no expert classification strategies have been developed for classifying diabetes-associated syndrome disorders efficiently and accurately. Thus, we introduce an Artificial Neural Network framework based on CapsNets to categorize vital genetic disorders related to diabetes. Here, a capsule represents a bundle or set of neurons used to retain data about an essential subject and provides precise information in each image. The suggested approach was systematically compared with cutting-edge methods and basic classification models. With an overall accuracy of 91.4%, the proposed CapsNets-based method provides the best sensitivity (89.93%), specificity (90.77%), and F1-score (93.10%).
Affiliation(s)
- Rajesh N.
- University of Technology and Applied Science, Shinas, Oman
|
26
|
Zuo M, Zhang W, Xu Q, Chen D. Deep Personal Multitask Prediction of Diabetes Complication with Attentive Interactions Predicting Diabetes Complications by Multitask-Learning. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:5129125. [PMID: 35494508 PMCID: PMC9045985 DOI: 10.1155/2022/5129125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Accepted: 02/08/2022] [Indexed: 11/17/2022]
Abstract
Objective Diabetic complications have brought a tremendous burden for diabetic patients, but the problem of predicting diabetic complications is still unresolved. Our aim is to explore the relationship between hemoglobin A1C (HbA1c), insulin (INS), and glucose (GLU) and diabetic complications in combination with individual factors and to effectively predict multiple complications of diabetes. Methods This was a real-world study. Data were collected from 40,913 participants with an average age of 48 years from the Department of Endocrinology of Ruijin Hospital in Shanghai. We proposed deep personal multitask prediction of diabetes complication with attentive interactions (DPMP-DC) to predict the five complication models of diabetes, including diabetic retinopathy, diabetic nephropathy, diabetic peripheral neuropathy, diabetic foot disease, and diabetic cardiovascular disease. Results Our model has an accuracy rate of 88.01% for diabetic retinopathy, 89.58% for diabetic nephropathy, 85.77% for diabetic neuropathy, 80.56% for diabetic foot disease, and 82.48% for diabetic cardiovascular disease. The multitasking accuracy of multiple complications is 84.67%, and the missed diagnosis rate is 9.07%. Conclusion We put forward the method of interactive integration with individual factors of patients for the first time in diabetic complications, which reflect the differences between individuals. Our multitask model using the hard sharing mechanism provides better prediction than prior single prediction models.
Affiliation(s)
- Ming Zuo
- Glorious Sun School of Business and Management, Donghua University, Shanghai, China
| | - Wei Zhang
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Qi Xu
- Glorious Sun School of Business and Management, Donghua University, Shanghai, China
| | - Dehua Chen
- School of Computer Science and Technology, Donghua University, Shanghai, China
|
27
|
Wesson P, Hswen Y, Valdes G, Stojanovski K, Handley MA. Risks and Opportunities to Ensure Equity in the Application of Big Data Research in Public Health. Annu Rev Public Health 2022; 43:59-78. [PMID: 34871504 PMCID: PMC8983486 DOI: 10.1146/annurev-publhealth-051920-110928] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/09/2022]
Abstract
The big data revolution presents an exciting frontier to expand public health research, broadening the scope of research and increasing the precision of answers. Despite these advances, scientists must be vigilant against also advancing potential harms toward marginalized communities. In this review, we provide examples in which big data applications have (unintentionally) perpetuated discriminatory practices, while also highlighting opportunities for big data applications to advance equity in public health. Here, big data is framed in the context of the five Vs (volume, velocity, veracity, variety, and value), and we propose a sixth V, virtuosity, which incorporates equity and justice frameworks. Analytic approaches to improving equity are presented using social computational big data, fairness in machine learning algorithms, medical claims data, and data augmentation as illustrations. Throughout, we emphasize the biasing influence of data absenteeism and positionality and conclude with recommendations for incorporating an equity lens into big data research.
Affiliation(s)
- Paul Wesson
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA
| | - Yulin Hswen
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA
| | - Gilmer Valdes
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Department of Radiation Oncology, University of California, San Francisco, California, USA
| | - Kristefer Stojanovski
- Department of Health Behavior and Health Education, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Department of Social, Behavioral and Population Sciences, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
| | - Margaret A Handley
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Department of Medicine, University of California, San Francisco, California, USA
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco, California, USA
- Partnerships for Research in Implementation Science for Equity (PRISE), University of California, San Francisco, California, USA
|
28
|
Visual Analytics for Predicting Disease Outcomes Using Laboratory Test Results. INFORMATICS 2022. [DOI: 10.3390/informatics9010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Laboratory tests play an essential role in the early and accurate diagnosis of diseases. In this paper, we propose SUNRISE, a visual analytics system that allows the user to interactively explore the relationships between laboratory test results and a disease outcome. SUNRISE integrates frequent itemset mining (i.e., Eclat algorithm) with extreme gradient boosting (XGBoost) to develop more specialized and accurate prediction models. It also includes interactive visualizations to allow the user to interact with the model and track the decision process. SUNRISE helps the user probe the prediction model by generating input examples and observing how the model responds. Furthermore, it improves the user’s confidence in the generated predictions and provides them the means to validate the model’s response by illustrating the underlying working mechanism of the prediction models through visualization representations. SUNRISE offers a balanced distribution of processing load through the seamless integration of analytical methods with interactive visual representations to support the user’s cognitive tasks. We demonstrate the usefulness of SUNRISE through a usage scenario of exploring the association between laboratory test results and acute kidney injury, using large provincial healthcare databases from Ontario, Canada.
|
29
|
Gervasi SS, Chen IY, Smith-McLallen A, Sontag D, Obermeyer Z, Vennera M, Chawla R. The Potential For Bias In Machine Learning And Opportunities For Health Insurers To Address It. Health Aff (Millwood) 2022; 41:212-218. [PMID: 35130064 DOI: 10.1377/hlthaff.2021.01287] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/21/2022]
Abstract
As the use of machine learning algorithms in health care continues to expand, there are growing concerns about equity, fairness, and bias in the ways in which machine learning models are developed and used in clinical and business decisions. We present a guide to the data ecosystem used by health insurers to highlight where bias can arise along machine learning pipelines. We suggest mechanisms for identifying and dealing with bias and discuss challenges and opportunities to increase fairness through analytics in the health insurance industry.
Affiliation(s)
| | - Irene Y Chen
- Irene Y. Chen, Massachusetts Institute of Technology, Cambridge, Massachusetts
| | | | - David Sontag
- David Sontag, Massachusetts Institute of Technology
| | - Ziad Obermeyer
- Ziad Obermeyer, University of California Berkeley, Berkeley, California
|
30
|
|
31
|
Birjandi SM, Khasteh SH. A survey on data mining techniques used in medicine. J Diabetes Metab Disord 2021; 20:2055-2071. [PMID: 34900841 DOI: 10.1007/s40200-021-00884-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/10/2021] [Accepted: 08/22/2021] [Indexed: 12/15/2022]
Abstract
Data mining is the process of analyzing a massive amount of data to identify meaningful patterns and detect relations, which can lead to future trend prediction and appropriate decision making. Data mining applications are significant in marketing, banking, medicine, etc. In this paper, we present an overview of data mining applications in medicine to provide a clear view of the challenges and previous works in this area for researchers. Data mining techniques such as Decision Tree, Random Forest, K-means Clustering, Support Vector Machine, Logistic Regression, Neural Network, Naive Bayes, and association rule mining are used for diagnosing, prognosis, classifying, constructing predictive models, and analyzing risk factors of various diseases. The main objective of the paper is to analyze and compare different data mining techniques used in the medical applications. We present a summary of the results and provide comparison analysis of the data mining methods employed by the reviewed articles. Supplementary Information The online version contains supplementary material available at 10.1007/s40200-021-00884-2.
Affiliation(s)
- Saba Maleki Birjandi
- School of Computer Engineering, K. N. Toosi University of Technology, 16317-14191 Tehran, Iran
| | - Seyed Hossein Khasteh
- School of Computer Engineering, K. N. Toosi University of Technology, 16317-14191 Tehran, Iran
- Faculty of Computer Engineering, Seyed Khandan, Shariati Ave, Tehran, Iran
|
32
|
Lu H, Uddin S. A weighted patient network-based framework for predicting chronic diseases using graph neural networks. Sci Rep 2021; 11:22607. [PMID: 34799627 PMCID: PMC8604920 DOI: 10.1038/s41598-021-01964-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 08/19/2021] [Accepted: 11/08/2021] [Indexed: 01/16/2023] Open
Abstract
Chronic disease prediction is a critical task in healthcare. Existing studies fulfil this requirement by employing machine learning techniques based on patient features, but they suffer from high dimensional data problems and a high level of bias. We propose a framework for predicting chronic disease based on Graph Neural Networks (GNNs) to address these issues. We begin by projecting a patient-disease bipartite graph to create a weighted patient network (WPN) that extracts the latent relationship among patients. We then use GNN-based techniques to build prediction models. These models use features extracted from WPN to create robust patient representations for chronic disease prediction. We compare the output of GNN-based models to machine learning methods by using cardiovascular disease and chronic pulmonary disease. The results show that our framework enhances the accuracy of chronic disease prediction. The model with attention mechanisms achieves an accuracy of 93.49% for cardiovascular disease prediction and 89.15% for chronic pulmonary disease prediction. Furthermore, the visualisation of the last hidden layers of GNN-based models shows the pattern for the two cohorts, demonstrating the discriminative strength of the framework. The proposed framework can help stakeholders improve health management systems for patients at risk of developing chronic diseases and conditions.
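The projection step described in this abstract, turning a patient-disease bipartite graph into a weighted patient network, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how edge weights are defined, so the sketch assumes the weight between two patients is the number of diseases they share, and the function name `project_to_patient_network` and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_to_patient_network(patient_diseases):
    """Project a patient-disease bipartite graph onto patients.

    patient_diseases maps a patient id to the set of diseases recorded for
    that patient. Returns {(patient_a, patient_b): weight}, where the weight
    is the number of shared diseases (an assumed weighting scheme, not
    necessarily the one used in the cited paper).
    """
    # Invert the mapping: disease -> set of patients who have it.
    patients_by_disease = defaultdict(set)
    for patient, diseases in patient_diseases.items():
        for disease in diseases:
            patients_by_disease[disease].add(patient)

    # Every pair of patients sharing a disease gains one unit of edge weight.
    weights = defaultdict(int)
    for patients in patients_by_disease.values():
        for a, b in combinations(sorted(patients), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy records, invented purely for illustration.
records = {
    "p1": {"hypertension", "t2dm"},
    "p2": {"hypertension", "t2dm", "asthma"},
    "p3": {"asthma"},
}
print(project_to_patient_network(records))
```

In a pipeline like the one the abstract outlines, the resulting weighted edges would feed a graph neural network as the patient graph; that downstream step is not shown here.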
Affiliation(s)
- Haohui Lu
- School of Project Management, Faculty of Engineering, The University of Sydney, 21 Ross St, Forest Lodge, NSW, 2037, Australia
| | - Shahadat Uddin
- School of Project Management, Faculty of Engineering, The University of Sydney, 21 Ross St, Forest Lodge, NSW, 2037, Australia.
|
33
|
Gautier T, Ziegler LB, Gerber MS, Campos-Náñez E, Patek SD. Artificial intelligence and diabetes technology: A review. Metabolism 2021; 124:154872. [PMID: 34480920 DOI: 10.1016/j.metabol.2021.154872] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/18/2021] [Revised: 07/27/2021] [Accepted: 08/28/2021] [Indexed: 12/15/2022]
Abstract
Artificial intelligence (AI) is widely discussed in the popular literature and is portrayed as impacting many aspects of human life, both in and out of the workplace. The potential for revolutionizing healthcare is significant because of the availability of increasingly powerful computational platforms and methods, along with increasingly informative sources of patient data, both in and out of clinical settings. This review aims to provide a realistic assessment of the potential for AI in understanding and managing diabetes, accounting for the state of the art in the methodology and medical devices that collect data, process data, and act accordingly. Acknowledging that many conflicting definitions of AI have been put forth, this article attempts to characterize the main elements of the field as they relate to diabetes, identifying the main perspectives and methods that can (i) affect basic understanding of the disease, (ii) affect understanding of risk factors (genetic, clinical, and behavioral) of diabetes development, (iii) improve diagnosis, (iv) improve understanding of the arc of disease (progression and personal/societal impact), and finally (v) improve treatment.
Affiliation(s)
- Thibault Gautier
- Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America.
| | - Leah B Ziegler
- Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
| | - Matthew S Gerber
- Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
| | - Enrique Campos-Náñez
- Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
| | - Stephen D Patek
- Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
|
34
|
Rufo DD, Debelee TG, Ibenthal A, Negera WG. Diagnosis of Diabetes Mellitus Using Gradient Boosting Machine (LightGBM). Diagnostics (Basel) 2021; 11:1714. [PMID: 34574055 PMCID: PMC8467876 DOI: 10.3390/diagnostics11091714] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 08/08/2021] [Revised: 09/06/2021] [Accepted: 09/17/2021] [Indexed: 12/01/2022] Open
Abstract
Diabetes mellitus (DM) is a severe chronic disease that affects human health and has a high prevalence worldwide. Research has shown that half of the diabetic people throughout the world are unaware that they have DM and its complications are increasing, which presents new research challenges and opportunities. In this paper, we propose a preemptive diagnosis method for diabetes mellitus (DM) to assist or complement the early recognition of the disease in countries with low medical expert densities. Diabetes data are collected from the Zewditu Memorial Hospital (ZMHDD) in Addis Ababa, Ethiopia. Light Gradient Boosting Machine (LightGBM) is one of the most recent successful research findings for the gradient boosting framework that uses tree-based learning algorithms. It has low computational complexity and, therefore, is suited for applications in limited capacity regions such as Ethiopia. Thus, in this study, we apply the principle of LightGBM to develop an accurate model for the diagnosis of diabetes. The experimental results show that the prepared diabetes dataset is informative to predict the condition of diabetes mellitus. With accuracy, AUC, sensitivity, and specificity of 98.1%, 98.1%, 99.9%, and 96.3%, respectively, the LightGBM model outperformed KNN, SVM, NB, Bagging, RF, and XGBoost in the case of the ZMHDD dataset.
Affiliation(s)
- Derara Duba Rufo
- College of Engineering and Technology, Dilla University, Dilla 419, Ethiopia
- Taye Girma Debelee
- College of Electrical and Mechanical Engineering, Addis Ababa Science and Technology University, Addis Ababa 120611, Ethiopia
- Ethiopian Artificial Intelligence Center, Addis Ababa 40782, Ethiopia
- Achim Ibenthal
- Faculty of Engineering and Health, HAWK University of Applied Sciences and Arts, 37085 Göttingen, Germany
35
Cvetko A, Mangino M, Tijardović M, Kifer D, Falchi M, Keser T, Perola M, Spector TD, Lauc G, Menni C, Gornik O. Plasma N-glycome shows continuous deterioration as the diagnosis of insulin resistance approaches. BMJ Open Diabetes Res Care 2021; 9:e002263. [PMID: 34518155 PMCID: PMC8438737 DOI: 10.1136/bmjdrc-2021-002263] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/11/2021] [Accepted: 08/22/2021] [Indexed: 12/12/2022] Open
Abstract
INTRODUCTION Prediction of type 2 diabetes mellitus (T2DM) and its preceding factors, such as insulin resistance (IR), is of great importance as it may allow delay or prevention of onset of the disease. Plasma protein N-glycome has emerged as a promising predictive biomarker. In a prospective longitudinal study, we included patients with a first diagnosis of impaired glucose metabolism (IR or T2DM) to investigate the N-glycosylation's predictive value years before diabetes development. RESEARCH DESIGN AND METHODS Plasma protein N-glycome was profiled by hydrophilic interaction ultra-performance liquid chromatography in 534 TwinsUK participants free from disease at baseline. This included 89 participants with incident diagnosis of IR or T2DM during the follow-up period (7.14±3.04 years) whose last sample prior to diagnosis was compared using general linear regression with 445 age-matched unrelated controls. Findings were replicated in an independent cohort. Changes in N-glycome have also been presented in connection with time to diagnosis. RESULTS Eight groups of plasma N-glycans were different between incident IR or T2DM cases and controls (p<0.05) after adjusting for multiple testing using Benjamini-Hochberg correction. These differences were noticeable up to 10 years prior to diagnosis and changed continuously, becoming more pronounced toward the diagnosis. The prediction model was built using significant glycan traits, displaying a discriminative performance with an area under the receiver operating characteristic curve of 0.77. CONCLUSIONS In addition to previous studies, we showed the diagnostic potential of plasma N-glycome in the prediction of both IR and T2DM development years before the clinical manifestation and indicated the continuous deterioration of N-glycome toward the diagnosis.
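The abstract above adjusts for multiple testing with the Benjamini-Hochberg correction. As a rough illustration of that step-up procedure (not the authors' code), the sketch below flags which hypotheses are rejected at a chosen false discovery rate. The function name, the `alpha` default, and the example p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, not taken from the study.
example = [0.02, 0.03, 0.035, 0.9]
print(benjamini_hochberg(example))
```

Because the procedure is step-up, the first two p-values in this example are rejected even though each exceeds its own rank threshold: the third p-value passes its threshold, so everything ranked at or below it is rejected.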
Affiliation(s)
- Ana Cvetko
- University of Zagreb Faculty of Pharmacy and Biochemistry, Zagreb, Croatia
- Massimo Mangino
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- NIHR Biomedical Research Centre at Guy's and St Thomas' Foundation Trust, London, UK
- Marko Tijardović
- University of Zagreb Faculty of Pharmacy and Biochemistry, Zagreb, Croatia
- Domagoj Kifer
- University of Zagreb Faculty of Pharmacy and Biochemistry, Zagreb, Croatia
- Mario Falchi
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Toma Keser
- University of Zagreb Faculty of Pharmacy and Biochemistry, Zagreb, Croatia
- Markus Perola
- National Institute for Health and Welfare, Helsinki, Finland
- Tim D Spector
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Gordan Lauc
- University of Zagreb Faculty of Pharmacy and Biochemistry, Zagreb, Croatia
- Genos Glycoscience Research Laboratory, Zagreb, Croatia
- Cristina Menni
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Olga Gornik
- University of Zagreb Faculty of Pharmacy and Biochemistry, Zagreb, Croatia
36
Cohen NM, Schwartzman O, Jaschek R, Lifshitz A, Hoichman M, Balicer R, Shlush LI, Barbash G, Tanay A. Personalized lab test models to quantify disease potentials in healthy individuals. Nat Med 2021; 27:1582-1591. [PMID: 34426707 DOI: 10.1038/s41591-021-01468-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/10/2021] [Accepted: 07/12/2021] [Indexed: 12/27/2022]
Abstract
Standardized lab tests are central for patient evaluation, differential diagnosis and treatment. Interpretation of these data is nevertheless lacking quantitative and personalized metrics. Here we report on the modeling of 2.1 billion lab measurements of 92 different lab tests from 2.8 million adults over a span of 18 years. Following unsupervised filtering of 131 chronic conditions and 5,223 drug-test pairs we performed a virtual survey of lab tests distributions in healthy individuals. Age and sex alone explain less than 10% of the within-normal test variance in 89 out of 92 tests. Personalized models based on patients' history explain 60% of the variance for 17 tests and over 36% for half of the tests. This allows for systematic stratification of the risk for future abnormal test levels and subsequent emerging disease. Multivariate modeling of within-normal lab tests can be readily implemented as a basis for quantitative patient evaluation.
Affiliation(s)
- Omer Schwartzman
- Department of Mathematics and Computer Science, Weizmann Institute, Rehovot, Israel
- The Division of Internal Medicine, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Ram Jaschek
- Department of Mathematics and Computer Science, Weizmann Institute, Rehovot, Israel
- Aviezer Lifshitz
- Department of Mathematics and Computer Science, Weizmann Institute, Rehovot, Israel
- Michael Hoichman
- Department of Mathematics and Computer Science, Weizmann Institute, Rehovot, Israel
- Ran Balicer
- Innovation Division, Clalit Research Institute, Clalit Health Services, Tel Aviv, Israel
- Liran I Shlush
- Department of Immunology, Weizmann Institute, Rehovot, Israel
- Gabi Barbash
- Bench to Bedside Program, Weizmann Institute, Rehovot, Israel
- Amos Tanay
- Department of Mathematics and Computer Science, Weizmann Institute, Rehovot, Israel
37
Stiglic G, Wang F, Sheikh A, Cilar L. Development and validation of the type 2 diabetes mellitus 10-year risk score prediction models from survey data. Prim Care Diabetes 2021; 15:699-705. [PMID: 33896755 DOI: 10.1016/j.pcd.2021.04.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2020] [Accepted: 04/13/2021] [Indexed: 12/23/2022]
Abstract
AIMS In this paper, we demonstrate the development and validation of 10-year type 2 diabetes mellitus (T2DM) risk prediction models based on large survey data. METHODS The Survey of Health, Ageing and Retirement in Europe (SHARE) data collected in 12 European countries using 53 variables representing behavioural as well as physical and mental health characteristics of the participants aged 50 or older was used to build and validate prediction models. To account for strongly unbalanced outcome variables, each instance was assigned a weight according to the inverse proportion of the outcome label when the regularized logistic regression model was built. RESULTS A pooled sample of 16,363 individuals was used to build and validate a global regularized logistic regression model that achieved an area under the receiver operating characteristic curve of 0.702 (95% CI: 0.698-0.706). Additionally, we measured performance of local country-specific models where AUROC ranged from 0.578 (0.565-0.592) to 0.768 (0.749-0.787). CONCLUSIONS We have developed and validated a survey-based 10-year T2DM risk prediction model for use across 12 European countries. Our results demonstrate the importance of re-calibration of the models as well as strengths of pooling the data from multiple countries to reduce the variance and consequently increase the precision of the results.
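The weighting step described in the abstract above (each instance weighted by the inverse proportion of its outcome label) can be sketched as follows. The abstract does not give the exact formula, so this assumes the weight is simply 1 divided by the label's share of the sample; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Assign each instance a weight equal to 1 / (share of its label).

    Assumption: weight = n / count(label), so every class ends up with
    the same total weight (n), whatever its frequency.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / counts[y] for y in labels]


# Toy, strongly unbalanced outcome: three negatives and one positive.
labels = [0, 0, 0, 1]
weights = inverse_proportion_weights(labels)
print(weights)
```

Weights of this kind are typically passed to the model fitting routine as per-sample weights, so that errors on the rare outcome count as much in total as errors on the common one.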
Affiliation(s)
- Gregor Stiglic
- University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia
- University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroska cesta 46, 2000 Maribor, Slovenia
- Usher Institute, University of Edinburgh, Old Medical School, Teviot Place, Edinburgh EH8 9AG, UK
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, 425 East 61 Street, New York, NY 10065
- Aziz Sheikh
- Usher Institute, University of Edinburgh, Old Medical School, Teviot Place, Edinburgh EH8 9AG, UK
- Leona Cilar
- University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia
38
Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches. Int J Environ Res Public Health 2021; 18:7346. [PMID: 34299797 PMCID: PMC8306487 DOI: 10.3390/ijerph18147346] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 05/27/2021] [Revised: 07/02/2021] [Accepted: 07/05/2021] [Indexed: 12/27/2022]
Abstract
Diabetes mellitus is one of the most common human diseases worldwide and may cause several health-related complications. It is responsible for considerable morbidity, mortality, and economic loss. A timely diagnosis and prediction of this disease could provide patients with an opportunity to take the appropriate preventive and treatment strategies. To improve the understanding of risk factors, we predict type 2 diabetes for Pima Indian women utilizing a logistic regression model and decision tree—a machine learning algorithm. Our analysis finds five main predictors of type 2 diabetes: glucose, pregnancy, body mass index (BMI), diabetes pedigree function, and age. We further explore a classification tree to complement and validate our analysis. The six-fold classification tree indicates glucose, BMI, and age are important factors, while the ten-node tree implies glucose, BMI, pregnancy, diabetes pedigree function, and age as the significant predictors. Our preferred specification yields a prediction accuracy of 78.26% and a cross-validation error rate of 21.74%. We argue that our model can be applied to make a reasonable prediction of type 2 diabetes, and could potentially be used to complement existing preventive measures to curb the incidence of diabetes and reduce associated costs.
39
A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus. Appl Intell 2021. [DOI: 10.1007/s10489-021-02533-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
40
Ravaut M, Harish V, Sadeghi H, Leung KK, Volkovs M, Kornas K, Watson T, Poutanen T, Rosella LC. Development and Validation of a Machine Learning Model Using Administrative Health Data to Predict Onset of Type 2 Diabetes. JAMA Netw Open 2021; 4:e2111315. [PMID: 34032855 PMCID: PMC8150694 DOI: 10.1001/jamanetworkopen.2021.11315] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 11/20/2020] [Accepted: 04/01/2021] [Indexed: 11/14/2022] Open
Abstract
Importance Systems-level barriers to diabetes care could be improved with population health planning tools that accurately discriminate between high- and low-risk groups to guide investments and targeted interventions. Objective To develop and validate a population-level machine learning model for predicting type 2 diabetes 5 years before diabetes onset using administrative health data. Design, Setting, and Participants This decision analytical model study used linked administrative health data from the diverse, single-payer health system in Ontario, Canada, between January 1, 2006, and December 31, 2016. A gradient boosting decision tree model was trained on data from 1 657 395 patients, validated on 243 442 patients, and tested on 236 506 patients. Costs associated with each patient were estimated using a validated costing algorithm. Data were analyzed from January 1, 2006, to December 31, 2016. Exposures A random sample of 2 137 343 residents of Ontario without type 2 diabetes was obtained at study start time. More than 300 features from data sets capturing demographic information, laboratory measurements, drug benefits, health care system interactions, social determinants of health, and ambulatory care and hospitalization records were compiled over 2-year patient medical histories to generate quarterly predictions. Main Outcomes and Measures Discrimination was assessed using the area under the receiver operating characteristic curve statistic, and calibration was assessed visually using calibration plots. Feature contribution was assessed with Shapley values. Costs were estimated in 2020 US dollars. Results This study trained a gradient boosting decision tree model on data from 1 657 395 patients (12 900 257 instances; 6 666 662 women [51.7%]). 
The developed model achieved a test area under the curve of 80.26 (range, 80.21-80.29), demonstrated good calibration, and was robust to sex, immigration status, area-level marginalization with regard to material deprivation and race/ethnicity, and low contact with the health care system. The top 5% of patients predicted as high risk by the model represented 26% of the total annual diabetes cost in Ontario. Conclusions and Relevance In this decision analytical model study, a machine learning model approach accurately predicted the incidence of diabetes in the population using routinely collected health administrative data. These results suggest that the model could be used to inform decision-making for population health planning and diabetes prevention.
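Several abstracts in this list, including the one above, report discrimination as the area under the ROC curve. A minimal sketch of that statistic is given below, using its pairwise interpretation: the share of positive-negative pairs in which the positive case receives the higher score, with ties counted as half. The function name and the example scores are made up for illustration. The result is on a 0 to 1 scale, whereas the abstract above reports it on a 0 to 100 scale (80.26).

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 (event) or 0 (no event); scores are predicted risks.
    This O(n_pos * n_neg) version is for illustration, not for
    cohorts with millions of patients.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and outcomes.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```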
Affiliation(s)
- Mathieu Ravaut
- Layer 6 AI, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vinyas Harish
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Temerty Centre for Artificial Intelligence Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Kathy Kornas
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Tristan Watson
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Institute of Clinical Evaluative Sciences (ICES), Toronto, Ontario, Canada
- Laura C. Rosella
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Temerty Centre for Artificial Intelligence Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Institute of Clinical Evaluative Sciences (ICES), Toronto, Ontario, Canada
- Institute for Better Health, Trillium Health Partners, Mississauga, Ontario, Canada
41
Toth EG, Gibbs D, Moczygemba J, McLeod A. Decision tree modeling in R software to aid clinical decision making. Health Technol 2021. [DOI: 10.1007/s12553-021-00542-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
42
Lanctin DP, Merced‐Nieves F, Mallett RM, Arensberg MB, Guenter P, Sulo S, Platts‐Mills TF. Prevalence and Economic Burden of Malnutrition Diagnosis Among Patients Presenting to United States Emergency Departments. Acad Emerg Med 2021; 28:325-335. [PMID: 31724782 DOI: 10.1111/acem.13887] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/08/2019] [Revised: 10/23/2019] [Accepted: 11/06/2019] [Indexed: 01/23/2023]
Abstract
BACKGROUND Malnutrition is a potentially remediable condition that when untreated contributes to poor health and economic outcomes. While assessment of malnutrition risk is improving, its identification rate and economic burden in emergency departments (EDs) is largely unknown. We sought to determine prevalence and economic burden of diagnosed malnutrition among patients presenting to U.S. EDs. METHODS This is a retrospective analysis of Healthcare Cost and Utilization Project Nationwide Emergency Department Sample data. Malnutrition prevalence was confirmed via International Classification of Diseases, 9th Edition, diagnosis codes. The economic burden was assessed by comparing probability of hospitalization and the average total charges between propensity-score matched visits with and without a malnutrition diagnosis. RESULTS Data from 238 million ED visits between 2006 and 2014 were analyzed. Over this period, the prevalence of diagnosed malnutrition increased for all demographic categories assessed. For older adults (≥65 years), the prevalence increased from 2.5% (2006) to 3.6% (2014). Older age, high-income community residence, Western region, urban areas, and Medicare coverage were associated with higher diagnosis prevalence. Malnutrition diagnosis was associated with a 4.23 (95% confidence interval [CI] = 3.93 to 4.55) times higher odds of hospitalization and $21,892 higher mean total charges (95% CI = $19,593 to $24,192). CONCLUSIONS While malnutrition is currently diagnosed at a low rate in U.S. EDs, the economic burden of malnutrition is substantial in this care setting. Given the potential for systematic malnutrition screening and treatment protocols to alleviate this burden, future research is warranted.
Affiliation(s)
- Peggi Guenter
- American Society for Parenteral and Enteral Nutrition, Silver Spring, MD
43
Ravaut M, Sadeghi H, Leung KK, Volkovs M, Kornas K, Harish V, Watson T, Lewis GF, Weisman A, Poutanen T, Rosella L. Predicting adverse outcomes due to diabetes complications with machine learning using administrative health data. NPJ Digit Med 2021; 4:24. [PMID: 33580109 PMCID: PMC7881135 DOI: 10.1038/s41746-021-00394-8] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/18/2020] [Accepted: 01/11/2021] [Indexed: 02/07/2023] Open
Abstract
Across jurisdictions, government and health insurance providers hold a large amount of data from patient interactions with the healthcare system. We aimed to develop a machine learning-based model for predicting adverse outcomes due to diabetes complications using administrative health data from the single-payer health system in Ontario, Canada. A Gradient Boosting Decision Tree model was trained on data from 1,029,366 patients, validated on 272,864 patients, and tested on 265,406 patients. Discrimination was assessed using the AUC statistic and calibration was assessed visually using calibration plots overall and across population subgroups. Our model predicting three-year risk of adverse outcomes due to diabetes complications (hyper/hypoglycemia, tissue infection, retinopathy, cardiovascular events, amputation) included 700 features from multiple diverse data sources and had strong discrimination (average test AUC = 77.7, range 77.7-77.9). Through the design and validation of a high-performance model to predict diabetes complications adverse outcomes at the population level, we demonstrate the potential of machine learning and administrative health data to inform health planning and healthcare resource allocation for diabetes management.
Affiliation(s)
- Mathieu Ravaut
- Layer 6 AI, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Kathy Kornas
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Vinyas Harish
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- MD/PhD Program, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Tristan Watson
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- ICES, Toronto, ON, Canada
- Gary F Lewis
- Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Department of Physiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Alanna Weisman
- Lunenfeld-Tanenbaum Research Institute, Mt. Sinai Hospital, Toronto, ON, Canada
- Division of Endocrinology and Metabolism, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Laura Rosella
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- ICES, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Institute for Better Health, Trillium Health Partners, Mississauga, ON, Canada
- Department of Laboratory Medicine & Pathology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
44
Rose S. Intersections of machine learning and epidemiological methods for health services research. Int J Epidemiol 2021; 49:1763-1770. [PMID: 32236476 PMCID: PMC7825941 DOI: 10.1093/ije/dyaa035] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 02/17/2020] [Indexed: 12/15/2022] Open
Abstract
The field of health services research is broad and seeks to answer questions about the health care system. It is inherently interdisciplinary, and epidemiologists have made crucial contributions. Parametric regression techniques remain standard practice in health services research with machine learning techniques currently having low penetrance in comparison. However, studies in several prominent areas, including health care spending, outcomes and quality, have begun deploying machine learning tools for these applications. Nevertheless, major advances in epidemiological methods are also as yet underleveraged in health services research. This article summarizes the current state of machine learning in key areas of health services research, and discusses important future directions at the intersection of machine learning and epidemiological methods for health services research.
Affiliation(s)
- Sherri Rose
- Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA, 02115, USA
45
Appelbaum L, Cambronero JP, Stevens JP, Horng S, Pollick K, Silva G, Haneuse S, Piatkowski G, Benhaga N, Duey S, Stevenson MA, Mamon H, Kaplan ID, Rinard MC. Development and validation of a pancreatic cancer risk model for the general population using electronic health records: An observational study. Eur J Cancer 2021; 143:19-30. [PMID: 33278770 DOI: 10.1016/j.ejca.2020.10.019] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 07/03/2020] [Revised: 10/15/2020] [Accepted: 10/28/2020] [Indexed: 02/07/2023]
Abstract
AIM Pancreatic ductal adenocarcinoma (PDAC) is often diagnosed at a late, incurable stage. We sought to determine whether individuals at high risk of developing PDAC could be identified early using routinely collected data. METHODS Electronic health record (EHR) databases from two independent hospitals in Boston, Massachusetts, providing inpatient, outpatient, and emergency care, from 1979 through 2017, were used with case-control matching. PDAC cases were selected using International Classification of Diseases 9/10 codes and validated with tumour registries. A data-driven feature selection approach was used to develop neural networks and L2-regularised logistic regression (LR) models on training data (594 cases, 100,787 controls) and compared with a published model based on hand-selected diagnoses ('baseline'). Model performance was validated on an external database (408 cases, 160,185 controls). Three prediction lead times (180, 270 and 365 days) were considered. RESULTS The LR model had the best performance, with an area under the curve (AUC) of 0.71 (confidence interval [CI]: 0.67-0.76) for the training set, and AUC 0.68 (CI: 0.65-0.71) for the validation set, 365 days before diagnosis. Data-driven feature selection improved results over 'baseline' (AUC = 0.55; CI: 0.52-0.58). The LR model flags 2692 (CI 2592-2791) of 156,485 as high risk, 365 days in advance, identifying 25 (CI: 16-36) cancer patients. Risk stratification showed that the high-risk group presented a cancer rate 3 to 5 times the prevalence in our data set. CONCLUSION A simple EHR model, based on diagnoses, can identify high-risk individuals for PDAC up to one year in advance. This inexpensive, systematic approach may serve as the first sieve for selection of individuals for PDAC screening programs.
Affiliation(s)
- Limor Appelbaum
- Beth Israel Deaconess Medical Center, Department of Radiation Oncology, 330 Brookline Ave, Boston, MA, 02215, USA
- José P Cambronero
- Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA, 02139, USA
- Jennifer P Stevens
- Beth Israel Deaconess Medical Center, Center for Healthcare Delivery Science, 330 Brookline Ave, Boston, MA, 02215, USA
- Steven Horng
- Beth Israel Deaconess Medical Center, Division of Emergency Medicine Informatics, 330 Brookline Ave, Boston, MA, 02215, USA
- Karla Pollick
- Beth Israel Deaconess Medical Center, Center for Healthcare Delivery Science, 330 Brookline Ave, Boston, MA, 02215, USA
- George Silva
- Beth Israel Deaconess Medical Center, Center for Healthcare Delivery Science, 330 Brookline Ave, Boston, MA, 02215, USA
- Sebastien Haneuse
- Harvard University, T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA, 02115, USA
- Gail Piatkowski
- Beth Israel Deaconess Medical Center, Center for Healthcare Delivery Science, 330 Brookline Ave, Boston, MA, 02215, USA
- Nordine Benhaga
- Beth Israel Deaconess Medical Center, Department of Radiation Oncology, 330 Brookline Ave, Boston, MA, 02215, USA
- Stacey Duey
- Brigham and Women's Hospital, Partners Research IS and Computing, Information Systems Department, 75 Francis Street, Boston, MA, 02115, USA
- Mary A Stevenson
- Beth Israel Deaconess Medical Center, Department of Radiation Oncology, 330 Brookline Ave, Boston, MA, 02215, USA
- Harvey Mamon
- Dana Farber Cancer Institute/Radiation Oncology, Brigham and Women's Hospital, Harvard Medical School, 75 Francis Street, Boston, MA, 02115, USA
- Irving D Kaplan
- Beth Israel Deaconess Medical Center, Department of Radiation Oncology, 330 Brookline Ave, Boston, MA, 02215, USA
- Martin C Rinard
- Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA, 02139, USA
46
Guo K, Fu X, Zhang H, Wang M, Hong S, Ma S. Predicting the postoperative blood coagulation state of children with congenital heart disease by machine learning based on real-world data. Transl Pediatr 2021; 10:33-43. [PMID: 33633935 PMCID: PMC7882284 DOI: 10.21037/tp-20-238] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Postoperative blood coagulation assessment of children with congenital heart disease (CHD) has been developed using a conventional statistical approach. In this study, machine learning (ML) was used to predict the postoperative blood coagulation function of children with CHD and to assess an array of ML models. METHODS This was a retrospective data mining study. Using samples from 1,690 children with CHD and screening data based on demographic characteristics, conventional coagulation tests (CCTs) and complete blood count (CBC), we applied a precise data selection process and ML algorithms including Decision Tree, Naive Bayes, Support Vector Machine (SVM), Adaptive Boosting (AdaBoost) and Random Forest. We explored the best prediction models of postoperative blood coagulation function for children with CHD, measuring model performance by the area under the receiver operating characteristic (ROC) curve (AUC) and by calibration and Lift curves, and further verified the reliability of the models with statistical tests. RESULTS For the primary prediction objective, the AUCs of the Decision Tree, Naive Bayes and SVM models were 0.81, 0.82 and 0.82, respectively. The overall prediction accuracy exceeded 75%. We then built improved models, among which the AdaBoost, Random Forest and SVM prediction models reached a true positive rate of more than 80% on the ROC curve. These overall accuracy rates indicated good classification models. Combining the calibration and Lift curves, the best fit was the SVM model, with Lift = 2.2 for predicting postoperative abnormal coagulation and Lift = 1.8 for postoperative normal coagulation. The statistical results further supported the reliability of the ML models. Age, sex, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), white blood cell count (WBC) and platelet count (PLT) were the key features for predicting the postoperative blood coagulation state of children with CHD. CONCLUSIONS ML technology and data mining algorithms may be used to predict the postoperative blood coagulation state of children with CHD based on large volumes of real-world clinical data, especially CBC indicators.
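The abstract above reports Lift values (2.2 and 1.8) without defining them. One common reading, assumed here, is that Lift for a class is the hit rate among cases predicted to be in that class divided by the class's overall rate in the data. The sketch below computes that single-threshold version; Lift curves are usually built from ranked score quantiles, which this does not attempt. The function name and toy data are hypothetical.

```python
def lift(y_true, y_pred, target):
    """Lift for one class at a fixed decision threshold.

    Assumed definition: P(actual == target | predicted == target)
    divided by P(actual == target).
    """
    n = len(y_true)
    # True labels of the cases the model assigned to the target class.
    selected = [t for t, p in zip(y_true, y_pred) if p == target]
    if not selected:
        raise ValueError("no cases were predicted as the target class")
    hit_rate = sum(1 for t in selected if t == target) / len(selected)
    base_rate = sum(1 for t in y_true if t == target) / n
    if base_rate == 0:
        raise ValueError("target class does not occur in y_true")
    return hit_rate / base_rate


# Toy data: 3 of 10 cases are abnormal (1); the model flags 3 cases,
# 2 of them correctly.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(round(lift(y_true, y_pred, target=1), 2))
```

A Lift above 1 means the flagged group contains the target outcome more often than a random draw from the whole sample would.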
Collapse
Affiliation(s)
- Kai Guo
- Department of Transfusion Medicine, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| | - Xiaoyan Fu
- Department of Transfusion Medicine, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| | - Huimin Zhang
- Department of Transfusion Medicine, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| | - Mengjian Wang
- Department of Transfusion Medicine, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| | - Songlin Hong
- Fane Data Technology Corporation, Tianjin, China
| | - Shuxuan Ma
- Department of Transfusion Medicine, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
|
47
|
Paranjape K, Schinkel M, Hammer RD, Schouten B, Nannan Panday RS, Elbers PWG, Kramer MHH, Nanayakkara P. The Value of Artificial Intelligence in Laboratory Medicine. Am J Clin Pathol 2020; 155:823-831. [PMID: 33313667 PMCID: PMC8130876 DOI: 10.1093/ajcp/aqaa170] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Indexed: 02/07/2023] Open
Abstract
OBJECTIVES As laboratory medicine continues to undergo digitalization and automation, clinical laboratorians will likely be confronted with the challenges associated with artificial intelligence (AI). What AI is good for, how to evaluate it, what its limitations are, and how it can be implemented are not well understood. With a survey, we aimed to evaluate the thoughts of stakeholders in laboratory medicine on the value of AI in the diagnostics space and identify anticipated challenges and solutions to introducing AI. METHODS We conducted a web-based survey on the use of AI with participants from Roche's Strategic Advisory Network that included key stakeholders in laboratory medicine. RESULTS In total, 128 of 302 stakeholders responded to the survey. Most of the participants were medical practitioners (26%) or laboratory managers (22%). AI is currently used in the organizations of 15.6% of respondents, while 66.4% felt they might use it in the future. Most were unsure what they would need to adopt AI in the diagnostics space. High investment costs, lack of proven clinical benefits, number of decision makers, and privacy concerns were identified as barriers to adoption. Education in the value of AI, streamlined implementation and integration into existing workflows, and research to prove clinical utility were identified as solutions needed to mainstream AI in laboratory medicine. CONCLUSIONS This survey demonstrates that specific knowledge of AI in the medical community is poor and that AI education is much needed. One strategy could be to implement new AI tools alongside existing tools.
Affiliation(s)
- Michiel Schinkel
- Section Acute Medicine, Department of Internal Medicine, Amsterdam UMC
- Richard D Hammer
- Department of Pathology and Anatomical Sciences, University of Missouri School of Medicine, Columbia
- Bo Schouten
- Amsterdam UMC
- Department of Public and Occupational Health, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- R S Nannan Panday
- Section Acute Medicine, Department of Internal Medicine, Amsterdam UMC
- Paul W G Elbers
- Department of Intensive Care Medicine, Amsterdam Medical Data Science, Amsterdam Cardiovascular Science, Amsterdam Infection and Immunity Institute, Amsterdam UMC
- Mark H H Kramer
- Board of Directors, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
|
48
|
Metsker O, Magoev K, Yakovlev A, Yanishevskiy S, Kopanitsa G, Kovalchuk S, Krzhizhanovskaya VV. Identification of risk factors for patients with diabetes: diabetic polyneuropathy case study. BMC Med Inform Decis Mak 2020; 20:201. [PMID: 32831065 PMCID: PMC7444272 DOI: 10.1186/s12911-020-01215-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/10/2020] [Accepted: 08/12/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Methods of data mining and analytics can be efficiently applied in medicine to develop models that use patient-specific data to predict the development of diabetic polyneuropathy. However, there is room for improvement in the accuracy of predictive models. Existing studies of diabetic polyneuropathy considered a limited number of predictors in one study to enable a comparison of efficiency of different machine learning methods with different predictors to find the most efficient one. The purpose of this study is the implementation of machine learning methods for identifying the risk of diabetic polyneuropathy based on structured electronic medical records collected in databases of medical information systems. METHODS For the purposes of our study, we developed a structured procedure for predictive modelling, which includes data extraction and preprocessing, model adjustment and performance assessment, selection of the best models and interpretation of results. The dataset contained a total of 238,590 laboratory records. Each record included 27 laboratory tests, age, gender, and presence of retinopathy or nephropathy. The records included information about 5846 patients with diabetes. Diagnosis served as the source of the target class values for classification. RESULTS Including two expressions, namely "nephropathy" and "retinopathy", increased performance, achieving up to 79.82% precision, 81.52% recall, 80.64% F1 score, 82.61% accuracy, and 89.88% AUC using the neural network classifier. Additionally, different models showed different results in terms of interpretation significance: random forest confirmed that the most important risk factor for polyneuropathy is the increased neutrophil level, meaning the presence of inflammation in the body.
Linear models showed linear dependencies of the presence of polyneuropathy on blood glucose levels, which is confirmed by the clinical interpretation of the importance of blood glucose control. CONCLUSION Depending on whether one needs to identify pathophysiological mechanisms for one's prospective study or identify early or late predictors, the choice of model will vary. In comparison with the previous studies, our research makes a comprehensive comparison of different decisions using a large and well-structured dataset applied to different decision support tasks.
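The reported precision, recall, and F1 score are linked by the standard harmonic-mean formula. The following minimal sketch recomputes F1 from the two reported values as a consistency check; the function name is illustrative, and the result (about 80.7%) differs slightly from the reported 80.64%, which could reflect rounding or averaging across folds (an assumption, since the abstract does not say).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the neural network classifier.
reported_precision = 0.7982
reported_recall = 0.8152

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 recomputed from reported precision and recall: {f1:.4f}")
```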
Affiliation(s)
- Oleg Metsker
- Almazov National Medical Research Centre, Saint-Petersburg, Russia
- Kirill Magoev
- ITMO University, Birzhevaya 4, Saint Petersburg, Russia
- University of Amsterdam, Amsterdam, The Netherlands
- Alexey Yakovlev
- Almazov National Medical Research Centre, Saint-Petersburg, Russia
- ITMO University, Birzhevaya 4, Saint Petersburg, Russia
- Valeria V Krzhizhanovskaya
- ITMO University, Birzhevaya 4, Saint Petersburg, Russia
- University of Amsterdam, Amsterdam, The Netherlands
|
49
|
Mattie H, Reidy P, Bachtiger P, Lindemer E, Nikolaev N, Jouni M, Schaefer J, Sherman M, Panch T. A Framework for Predicting Impactability of Digital Care Management Using Machine Learning Methods. Popul Health Manag 2020; 23:319-325. [DOI: 10.1089/pop.2019.0132] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Heather Mattie
- Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Wellframe, Inc., Boston, Massachusetts, USA
- Patrik Bachtiger
- Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Trishan Panch
- Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Wellframe, Inc., Boston, Massachusetts, USA
|
50
|
Zhang L, Shang X, Sreedharan S, Yan X, Liu J, Keel S, Wu J, Peng W, He M. Predicting the Development of Type 2 Diabetes in a Large Australian Cohort Using Machine-Learning Techniques: Longitudinal Survey Study. JMIR Med Inform 2020; 8:e16850. [PMID: 32720912 PMCID: PMC7420582 DOI: 10.2196/16850] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/30/2019] [Revised: 02/20/2020] [Accepted: 02/26/2020] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. OBJECTIVE We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. METHODS We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. RESULTS Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with that of nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% in 3-year prediction and 75% in 10-year prediction). All machine-learning models predicted BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P<.001). CONCLUSIONS A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach.
Achieving a healthy BMI can significantly reduce the risk of developing T2DM.
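The odds ratio quoted for men versus women can be cross-checked from the two reported 10-year incidences. The sketch below computes a crude (unadjusted) odds ratio from those proportions; the helper names are illustrative, and the published estimate and its confidence interval presumably come from the study's own analysis, so this is only a consistency check.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_group_a, p_group_b):
    """Crude odds ratio of group A relative to group B."""
    return odds(p_group_a) / odds(p_group_b)


# 10-year T2DM incidence reported in the abstract.
incidence_men = 0.0830
incidence_women = 0.0620

or_men_vs_women = odds_ratio(incidence_men, incidence_women)
print(f"Crude odds ratio (men vs women): {or_men_vs_women:.2f}")
```

The crude value agrees with the reported odds ratio of 1.37 to two decimal places.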
Affiliation(s)
- Lei Zhang
- China-Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi, China
- Xianwen Shang
- Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Subhashaan Sreedharan
- Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Xixi Yan
- Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Jianbin Liu
- Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Stuart Keel
- Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Jinrong Wu
- Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Wei Peng
- Research Centre for Data Analytics and Cognition, La Trobe University, Melbourne, Australia
- Mingguang He
- Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
|