51
Luo X, Gandhi P, Zhang Z, Shao W, Han Z, Chandrasekaran V, Turzhitsky V, Bali V, Roberts AR, Metzger M, Baker J, La Rosa C, Weaver J, Dexter P, Huang K. Applying interpretable deep learning models to identify chronic cough patients using EHR data. Computer Methods and Programs in Biomedicine 2021; 210:106395. [PMID: 34525412 DOI: 10.1016/j.cmpb.2021.106395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2021] [Accepted: 08/30/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE Chronic cough (CC) affects approximately 10% of adults. Many disease states are associated with chronic cough, such as asthma, upper airway cough syndrome, bronchitis, and gastroesophageal reflux disease. The lack of an ICD code specific for chronic cough makes it challenging to identify such patients from electronic health records (EHRs). For clinical and research purposes, computational methods using EHR data are urgently needed to identify chronic cough cases. This research aims to investigate the data representations and deep learning algorithms for chronic cough prediction. METHODS Utilizing real-world EHR data from a large academic healthcare system from October 2005 to September 2015, we investigated Natural Language Representation of the EHR data and systematically evaluated deep learning and traditional machine learning models to predict chronic cough patients. We built these machine learning models using structured data (medication and diagnosis) and unstructured data (clinical notes). RESULTS The sensitivity and specificity of a transformer-based deep learning algorithm, specifically a BERT with attention model, were 0.856 and 0.866, respectively, using structured data (medication and diagnosis). Sensitivity and specificity improved to 0.952 and 0.930 when we combined structured data with symptoms extracted from clinical notes. We further found that the attention mechanism of deep learning models can be used to extract important features that drive the prediction decisions. Compared with our previously published rule-based algorithm, the deep learning algorithm can identify more chronic cough patients with structured data. CONCLUSIONS By applying deep learning models, chronic cough patients can be reliably identified for prospective or retrospective research through medication and diagnosis data, widely available in EHR and electronic claims data, thus improving the generalizability of the patient identification algorithm. 
Deep learning models can identify chronic cough patients with even higher sensitivity and specificity when structured and unstructured EHR data are utilized. We anticipate language-based data representation and deep learning models developed in this research could also be productively used for other disease prediction and case identification.
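The abstract above reports model quality as sensitivity and specificity. As a reminder of how those two numbers are derived from predictions, here is a minimal Python sketch. The function name and the toy labels are assumptions made for illustration; they are not the study's code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cases correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cases wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels invented for this example (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the fraction of true cases the model finds, and specificity is the fraction of non-cases it correctly leaves alone, which is why both are quoted when the goal is patient identification.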
Affiliation(s)
- Xiao Luo
  - Purdue School of Engineering and Technology, IUPUI, 799 W Michigan St, Indianapolis, IN 46202, United States.
- Priyanka Gandhi
  - Purdue School of Engineering and Technology, IUPUI, 799 W Michigan St, Indianapolis, IN 46202, United States.
- Zuoyi Zhang
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States.
- Wei Shao
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States.
- Zhi Han
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
- Vasu Chandrasekaran
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
- Vladimir Turzhitsky
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
- Vishal Bali
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
- Anna R Roberts
  - Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
- Megan Metzger
  - Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
- Jarod Baker
  - Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
- Carmen La Rosa
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
- Jessica Weaver
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
- Paul Dexter
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States; Eskenazi Health, 720 Eskenazi Ave, Indianapolis, IN 46202, United States.
- Kun Huang
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
52
Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021; 28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022]
Abstract
OBJECTIVE High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs. MATERIALS AND METHODS We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms. RESULTS Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations. DISCUSSION The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease. CONCLUSION Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. 
Our approach starts with user interpretability and works backward to the technology.
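To make the idea of temporal sequence representations more concrete, the sketch below mines ordered pairs of records ("A was first recorded before B") per patient and counts how many patients share each pair. This is one plausible, simplified reading of a transitive sequence and not the authors' algorithm; the record codes, dates, and function name are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def transitive_sequences(records):
    """records: list of (iso_date, code). Return the set of (earlier, later) code pairs."""
    first_seen = {}
    for date, code in sorted(records):
        first_seen.setdefault(code, date)  # keep only the first occurrence of each code
    ordered = sorted(first_seen.items(), key=lambda item: item[1])
    pairs = set()
    for (code_a, date_a), (code_b, date_b) in combinations(ordered, 2):
        if date_a < date_b:  # same-day records have no defined order, so skip them
            pairs.add((code_a, code_b))
    return pairs


# Hypothetical patients with made-up codes and dates.
patients = {
    "p1": [("2015-01-10", "dx:cough"), ("2015-03-02", "rx:inhaler"), ("2015-06-20", "dx:asthma")],
    "p2": [("2016-02-01", "dx:cough"), ("2016-02-01", "rx:inhaler"), ("2016-09-15", "dx:asthma")],
}

# Count, for every ordered pair, the number of patients in which it occurs.
support = Counter()
for recs in patients.values():
    support.update(transitive_sequences(recs))
```

Each pair with enough support could then serve as a binary feature for a downstream classifier, alongside or instead of the aggregated record counts.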
Affiliation(s)
- Hossein Estiri
  - Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
- Zachary H Strasser
  - Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
- Shawn N Murphy
  - Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
53
Xiong Y, Peng W, Chen Q, Huang Z, Tang B. A Unified Machine Reading Comprehension Framework for Cohort Selection. IEEE J Biomed Health Inform 2021; 26:379-387. [PMID: 34236972 DOI: 10.1109/jbhi.2021.3095478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Cohort selection is an essential prerequisite for clinical research, determining whether an individual satisfies given selection criteria. Previous works on cohort selection usually treated each selection criterion independently and ignored not only the meaning of each selection criterion but also the relations among cohort selection criteria. To solve the problems above, we propose a novel unified machine reading comprehension (MRC) framework. In this MRC framework, we design simple rules to generate questions for each criterion from cohort selection guidelines and treat clues extracted by trigger words from patients' medical records as passages. A series of state-of-the-art MRC models based on BiDAF, BIMPM, BERT, BioBERT, NCBI-BERT, and RoBERTa are deployed to determine which question and passage pairs match. We also introduce a cross-criterion attention mechanism on representations of question and passage pairs to model relations among cohort selection criteria. Results on two datasets, that is, the dataset of the 2018 National NLP Clinical Challenge (N2C2) for cohort selection and a dataset from the MIMIC-III dataset, show that our NCBI-BERT MRC model with cross-criterion attention mechanism achieves the highest micro-averaged F1-score of 0.9070 on the N2C2 dataset and 0.8353 on the MIMIC-III dataset. It is competitive with the best system that relies on a large number of rules defined by medical experts on the N2C2 dataset. Comparing these two models, we find that the NCBI-BERT MRC model mainly performs worse on mathematical logic criteria. When using rules instead of the NCBI-BERT MRC model on some criteria regarding mathematical logic on the N2C2 dataset, we obtain a new benchmark with an F1-score of 0.9163, indicating that it is easy to integrate rules into MRC models for improvement.
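The headline metric in this abstract is a micro-averaged F1-score across selection criteria. The sketch below shows the generic positive-class definition, in which true positives, false positives, and false negatives are pooled over all criteria before precision and recall are computed. The criterion names and labels are hypothetical, and the official challenge scorer may define the average differently.

```python
def micro_f1(gold, pred):
    """gold, pred: dicts mapping criterion name -> list of 0/1 labels, one per patient."""
    tp = fp = fn = 0
    for criterion, gold_labels in gold.items():
        for g, p in zip(gold_labels, pred[criterion]):
            if g == 1 and p == 1:
                tp += 1
            elif g == 0 and p == 1:
                fp += 1
            elif g == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical criteria and labels, invented for this example.
gold = {"criterion_a": [1, 0, 1, 1], "criterion_b": [0, 0, 1, 0]}
pred = {"criterion_a": [1, 0, 0, 1], "criterion_b": [0, 1, 1, 0]}
score = micro_f1(gold, pred)
```

Because counts are pooled before averaging, criteria with many positive patients influence the score more than rare criteria, unlike a macro average that weights each criterion equally.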
54
Enriquez JS, Chu Y, Pudakalakatti S, Hsieh KL, Salmon D, Dutta P, Millward NZ, Lurie E, Millward S, McAllister F, Maitra A, Sen S, Killary A, Zhang J, Jiang X, Bhattacharya PK, Shams S. Hyperpolarized Magnetic Resonance and Artificial Intelligence: Frontiers of Imaging in Pancreatic Cancer. JMIR Med Inform 2021; 9:e26601. [PMID: 34137725 PMCID: PMC8277399 DOI: 10.2196/26601] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/18/2020] [Revised: 02/24/2021] [Accepted: 04/03/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND There is an unmet need for noninvasive imaging markers that can help identify the aggressive subtype(s) of pancreatic ductal adenocarcinoma (PDAC) at diagnosis and at an earlier time point, and evaluate the efficacy of therapy prior to tumor reduction. In the past few years, there have been two major developments with potential for a significant impact in establishing imaging biomarkers for PDAC and pancreatic cancer premalignancy: (1) hyperpolarized metabolic (HP)-magnetic resonance (MR), which increases the sensitivity of conventional MR by over 10,000-fold, enabling real-time metabolic measurements; and (2) applications of artificial intelligence (AI). OBJECTIVE Our objective of this review was to discuss these two exciting but independent developments (HP-MR and AI) in the realm of PDAC imaging and detection from the available literature to date. METHODS A systematic review following the PRISMA extension for Scoping Reviews (PRISMA-ScR) guidelines was performed. Studies addressing the utilization of HP-MR and/or AI for early detection, assessment of aggressiveness, and interrogating the early efficacy of therapy in patients with PDAC cited in recent clinical guidelines were extracted from the PubMed and Google Scholar databases. The studies were reviewed following predefined exclusion and inclusion criteria, and grouped based on the utilization of HP-MR and/or AI in PDAC diagnosis. RESULTS Part of the goal of this review was to highlight the knowledge gap of early detection in pancreatic cancer by any imaging modality, and to emphasize how AI and HP-MR can address this critical gap. We reviewed every paper published on HP-MR applications in PDAC, including six preclinical studies and one clinical trial. We also reviewed several HP-MR-related articles describing new probes with many functional applications in PDAC. 
On the AI side, we reviewed all existing papers that met our inclusion criteria on AI applications for evaluating computed tomography (CT) and MR images in PDAC. With the emergence of AI and its unique capability to learn across multimodal data, along with sensitive metabolic imaging using HP-MR, this knowledge gap in PDAC can be adequately addressed. CT is an accessible and widespread imaging modality worldwide as it is affordable; because of this reason alone, most of the data discussed are based on CT imaging datasets. Although there were relatively few MR-related papers included in this review, we believe that with rapid adoption of MR imaging and HP-MR, more clinical data on pancreatic cancer imaging will be available in the near future. CONCLUSIONS Integration of AI, HP-MR, and multimodal imaging information in pancreatic cancer may lead to the development of real-time biomarkers of early detection, assessing aggressiveness, and interrogating early efficacy of therapy in PDAC.
Affiliation(s)
- José S Enriquez
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Yan Chu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Shivanand Pudakalakatti
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Kang Lin Hsieh
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Duncan Salmon
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, United States
- Prasanta Dutta
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Niki Zacharias Millward
  - Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Eugene Lurie
  - Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Steven Millward
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Florencia McAllister
  - Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Clinical Cancer Prevention, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Anirban Maitra
  - Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Subrata Sen
  - Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Ann Killary
  - Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Jian Zhang
  - Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, United States
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Pratip K Bhattacharya
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Shayan Shams
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
55
Zhou L, Zheng X, Yang D, Wang Y, Bai X, Ye X. Application of multi-label classification models for the diagnosis of diabetic complications. BMC Med Inform Decis Mak 2021; 21:182. [PMID: 34098959 PMCID: PMC8182940 DOI: 10.1186/s12911-021-01525-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Accepted: 04/28/2021] [Indexed: 12/23/2022]
Abstract
Background Early diagnosis of diabetic complications is clinically demanding and of great significance. Given the complexity of diabetic complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously using data in modern electronic health records (EHRs), and leveraged the correlations between the complications to further improve the prediction accuracy. Methods We obtained the demographic characteristics and laboratory data from the EHRs for patients admitted to Changzhou No. 2 People’s Hospital, the affiliated hospital of Nanjing Medical University in China, from May 2013 to June 2020. The data included 93 biochemical indicators and 9,765 patients. We used the Pearson correlation coefficient (PCC) to analyze the correlations between different diabetic complications from a statistical perspective. We used an MLC model, based on the Random Forest (RF) technique, to leverage these correlations and predict four complications simultaneously. We explored four different MLC models: Label Power Set (LP), Classifier Chains (CC), Ensemble Classifier Chains (ECC), and Calibrated Label Ranking (CLR). We used traditional Binary Relevance (BR) as a comparison. We used 11 different performance metrics and the area under the receiver operating characteristic curve (AUROC) to evaluate these models. We analyzed the weights of the learned model and illustrated (1) the top 10 key indicators of different complications and (2) the correlations between different diabetic complications. Results The MLC models including CC, ECC and CLR outperformed the traditional BR method in most performance metrics; the ECC models performed the best in Hamming loss (0.1760), Accuracy (0.7020), F1_Score (0.7855), Precision (0.8649), F1_micro (0.8078), F1_macro (0.7773), Recall_micro (0.8631), Recall_macro (0.8009), and AUROC (0.8231). 
The two diabetic complication correlation matrices drawn from the PCC analysis and the MLC models were consistent with each other and indicated that the complications correlated to different extents. The top 10 key indicators given by the model are valuable in medical application. Conclusions Our MLC model can effectively utilize the potential correlation between different diabetic complications to further improve the prediction accuracy. This model should be explored further in other complex diseases with multiple complications. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01525-7.
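Two quantities in this abstract lend themselves to a short worked example: the Hamming loss used to score multi-label predictions and the Pearson correlation coefficient used to relate complications to each other. The sketch below computes both on a tiny label matrix. The data and function names are invented for illustration and do not come from the study.

```python
import math


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label slots (patient x complication) predicted wrongly."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(t != p for row_t, row_p in zip(Y_true, Y_pred) for t, p in zip(row_t, row_p))
    return wrong / total


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Rows are hypothetical patients; the four columns stand for four complications.
Y_true = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
Y_pred = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 0, 1]]

loss = hamming_loss(Y_true, Y_pred)

# Correlation between the first two complications across patients.
first = [row[0] for row in Y_true]
second = [row[1] for row in Y_true]
r = pearson(first, second)
```

A nonzero correlation between label columns is the signal that chained or ensemble multi-label methods can exploit and that Binary Relevance, which trains one independent classifier per label, ignores.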
Affiliation(s)
- Liang Zhou
  - Department of Endocrinology, Changzhou No.2 People's Hospital Affiliated to Nanjing Medical University, 29 Xinglongxiang Road, Changzhou City, 213000, Jiangsu Province, China
- Xiaoyuan Zheng
  - Department of Endocrinology, Changzhou No.2 People's Hospital Affiliated to Nanjing Medical University, 29 Xinglongxiang Road, Changzhou City, 213000, Jiangsu Province, China
- Di Yang
  - Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Ying Wang
  - Department of Endocrinology, Changzhou No.2 People's Hospital Affiliated to Nanjing Medical University, 29 Xinglongxiang Road, Changzhou City, 213000, Jiangsu Province, China
- Xuesong Bai
  - Capital Medical University, Beijing, 100053, China
- Xinhua Ye
  - Department of Endocrinology, Changzhou No.2 People's Hospital Affiliated to Nanjing Medical University, 29 Xinglongxiang Road, Changzhou City, 213000, Jiangsu Province, China.
56
Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Computer Science 2021; 2:160. [PMID: 33778771 PMCID: PMC7983091 DOI: 10.1007/s42979-021-00592-x] [Citation(s) in RCA: 463] [Impact Index Per Article: 154.3] [Received: 01/27/2021] [Accepted: 03/12/2021] [Indexed: 12/16/2022]
Abstract
In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, knowledge of artificial intelligence (AI), particularly machine learning (ML), is key. Various types of machine learning algorithms exist in the area, such as supervised, unsupervised, semi-supervised, and reinforcement learning. In addition, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view on these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study’s key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.
57
Saberi-Karimian M, Khorasanchi Z, Ghazizadeh H, Tayefi M, Saffar S, Ferns GA, Ghayour-Mobarhan M. Potential value and impact of data mining and machine learning in clinical diagnostics. Crit Rev Clin Lab Sci 2021; 58:275-296. [PMID: 33739235 DOI: 10.1080/10408363.2020.1857681] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 12/14/2022]
Abstract
Data mining involves the use of mathematical sciences, statistics, artificial intelligence, and machine learning to determine the relationships between variables from a large sample of data. It has previously been shown that data mining can improve the prediction and diagnostic precision of type 2 diabetes mellitus. A few studies have applied machine learning to assess hypertension and metabolic syndrome-related biomarkers, as well as refine the assessment of cardiovascular disease risk. Machine learning methods have also been applied to assess new biomarkers and survival outcomes in patients with renal diseases to predict the development of chronic kidney disease, disease progression, and renal graft survival. In the latter, random forest methods were found to be the best for the prediction of chronic kidney disease. Some studies have investigated the prognosis of nonalcoholic fatty liver disease and acute liver failure, as well as therapy response prediction in patients with viral disorders, using decision tree models. Machine learning techniques, such as Sparse High-Order Interaction Model with Rejection Option, have been used for diagnosing Alzheimer's disease. Data mining techniques have also been applied to identify the risk factors for serious mental illness, such as depression and dementia, and help to diagnose and predict the quality of life of such patients. In relation to child health, some studies have determined the best algorithms for predicting obesity and malnutrition. Machine learning has determined the important risk factors for preterm birth and low birth weight. Published studies of patients with cancer and bacterial diseases are limited and should perhaps be addressed more comprehensively in future studies. 
Herein, we provide an in-depth review of studies in which biochemical biomarker data were analyzed using machine learning methods to assess the risk of several common diseases, in order to summarize the potential applications of data mining methods in clinical diagnosis. Data mining techniques have now been increasingly applied to clinical diagnostics, and they have the potential to support this field.
Affiliation(s)
- Maryam Saberi-Karimian
  - International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran; Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran
- Zahra Khorasanchi
  - Department of Nutrition, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Hamideh Ghazizadeh
  - International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran; Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran
- Maryam Tayefi
  - Norwegian Center for e-health Research, University Hospital of North Norway, Tromsø, Norway
- Sara Saffar
  - International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
  - Division of Medical Education, Brighton and Sussex Medical School, Falmer, UK
- Majid Ghayour-Mobarhan
  - International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
58
Understanding current states of machine learning approaches in medical informatics: a systematic literature review. Health and Technology 2021. [DOI: 10.1007/s12553-021-00538-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
59
Comparison of Diagnosis Accuracy between a Backpropagation Artificial Neural Network Model and Linear Regression in Digestive Disease Patients: an Empirical Research. Computational and Mathematical Methods in Medicine 2021; 2021:6662779. [PMID: 33727951 PMCID: PMC7937476 DOI: 10.1155/2021/6662779] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2020] [Revised: 12/10/2020] [Accepted: 02/18/2021] [Indexed: 02/08/2023]
Abstract
Introduction A noninvasive diagnosis model for digestive diseases is a vital issue in current clinical research. Our systematic review is aimed at comparing diagnosis accuracy between the BP-ANN algorithm and linear regression in digestive disease patients, including their activation function and data structure. Methods We reported the systematic review according to the PRISMA guidelines. We searched related articles from seven electronic scholarly databases for comparison of the diagnosis accuracy focusing on BP-ANN and linear regression. The characteristics, patient number, input/output marker, diagnosis accuracy, and results/conclusions related to comparison were extracted independently based on inclusion criteria. Results Nine articles met all the criteria and were included in our review. Of those included articles, the publishing year ranged from 1991 to 2017. The sample size ranged from 42 to 3222 digestive disease patients, and all of the patients showed comparable biomarkers between the BP-ANN algorithm and linear regression. According to our study, eight articles demonstrated that the BP-ANN model is superior to linear regression in predicting the disease outcome based on AUROC results. One article reported linear regression to be superior to BP-ANN for the early diagnosis of colorectal cancer. Conclusion The BP-ANN algorithm and linear regression both had high capacity in fitting the diagnostic model and BP-ANN displayed more prediction accuracy for the noninvasive diagnosis model of digestive diseases. We compared the activation functions and data structure between BP-ANN and linear regression for fitting the diagnosis model, and the data suggested that BP-ANN was a comprehensive recommendation algorithm.
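The comparison in this review rests on AUROC. As a rough illustration of what that number measures, the sketch below computes AUROC as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as half. The two score lists stand in for two hypothetical models and are not taken from any reviewed study.

```python
def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for two hypothetical models.
y_true = [1, 1, 1, 0, 0, 0]
model_a_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b_scores = [0.6, 0.5, 0.5, 0.5, 0.4, 0.7]

auc_a = auroc(y_true, model_a_scores)
auc_b = auroc(y_true, model_b_scores)
```

The pairwise form is quadratic in the number of patients, which is fine for a sketch; rank-based implementations give the same value more efficiently on large cohorts.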
60
Annapragada AV, Donaruma-Kwoh MM, Annapragada AV, Starosolski ZA. A natural language processing and deep learning approach to identify child abuse from pediatric electronic medical records. PLoS One 2021; 16:e0247404. [PMID: 33635890 PMCID: PMC7909689 DOI: 10.1371/journal.pone.0247404] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/28/2020] [Accepted: 02/07/2021] [Indexed: 01/16/2023]
Abstract
Child physical abuse is a leading cause of traumatic injury and death in children. In 2017, child abuse was responsible for 1688 fatalities in the United States, of 3.5 million children referred to Child Protection Services and 674,000 substantiated victims. While large referral hospitals maintain teams trained in Child Abuse Pediatrics, smaller community hospitals often do not have such dedicated resources to evaluate patients for potential abuse. Moreover, identification of abuse has a low margin of error, as false positive identifications lead to unwarranted separations, while false negatives allow dangerous situations to continue. This context makes the consistent detection of and response to abuse difficult, particularly given subtle signs in young, non-verbal patients. Here, we describe the development of artificial intelligence algorithms that use unstructured free-text in the electronic medical record (including notes from physicians, nurses, and social workers) to identify children who are suspected victims of physical abuse. Importantly, only the notes from time of first encounter (e.g., birth, routine visit, sickness) to the last record before child protection team involvement were used. This allowed us to develop an algorithm using only information available prior to referral to the specialized child protection team. The study was performed in a multi-center referral pediatric hospital on patients screened for abuse within five different locations between 2015 and 2019. Of 1123 patients, 867 records were available after data cleaning and processing, and 55% were abuse-positive as determined by a multi-disciplinary team of clinical professionals. These electronic medical records were encoded with three natural language processing (NLP) algorithms, namely Bag of Words (BOW), Word Embeddings (WE), and Rules-Based (RB), and used to train multiple neural network architectures. 
The BOW and WE encodings utilize the full free-text, while RB selects crucial phrases as identified by physicians. The best architecture was selected by average classification accuracy for the best performing model from each train-test split of a cross-validation experiment. Natural language processing coupled with neural networks detected cases of likely child abuse using only information available to clinicians prior to child protection team referral with average accuracy of 0.90±0.02 and average area under the receiver operator characteristic curve (ROC-AUC) 0.93±0.02 for the best performing Bag of Words models. The best performing rules-based models achieved average accuracy of 0.77±0.04 and average ROC-AUC 0.81±0.05, while a Word Embeddings strategy was severely limited by lack of representative embeddings. Importantly, the best performing model had a false positive rate of 8%, as compared to rates of 20% or higher in previously reported studies. This artificial intelligence approach can help screen patients for whom an abuse concern exists and streamline the identification of patients who may benefit from referral to a child protection team. Furthermore, this approach could be applied to develop computer-aided-diagnosis platforms for the challenging and often intractable problem of reliably identifying pediatric patients suffering from physical abuse.
Affiliation(s)
- Akshaya V. Annapragada
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, United States of America
| | | | - Ananth V. Annapragada
- The Singleton Department of Pediatric Radiology, Texas Children’s Hospital, Houston, TX, United States of America
- Department of Radiology, Baylor College of Medicine, Houston, TX, United States of America
| | - Zbigniew A. Starosolski
- The Singleton Department of Pediatric Radiology, Texas Children’s Hospital, Houston, TX, United States of America
- Department of Radiology, Baylor College of Medicine, Houston, TX, United States of America
| |
|
61
|
Okui T, Nojiri C, Kimura S, Abe K, Maeno S, Minami M, Maeda Y, Tajima N, Kawamura T, Nakashima N. Performance evaluation of case definitions of type 1 diabetes for health insurance claims data in Japan. BMC Med Inform Decis Mak 2021; 21:52. [PMID: 33573645 PMCID: PMC7879626 DOI: 10.1186/s12911-021-01422-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Accepted: 01/25/2021] [Indexed: 12/18/2022]
Abstract
Background No case definition of Type 1 diabetes (T1D) for claims data has been proposed in Japan yet. This study aimed to evaluate the performance of candidate case definitions for T1D using electronic health care records (EHR) and claims data in a University Hospital in Japan. Methods The EHR and claims data for all the visiting patients in a University Hospital were used. As the candidate case definitions for claims data, we constructed 11 definitions by combinations of the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code of T1D, the claims code of insulin needles for T1D patients, basal insulin, and syringe pump for continuous subcutaneous insulin infusion (CSII). We constructed a predictive model for T1D patients using disease names, medical practices, and medications as explanatory variables. The predictive model was applied to patients in the test group (validation data), and the performance of the candidate case definitions was evaluated. Results In the performance evaluation, the sensitivity of the confirmed disease name of T1D was 32.9 (95% CI: 28.4, 37.2), and positive predictive value (PPV) was 33.3 (95% CI: 38.0, 38.4). By using the case definition of both the confirmed diagnosis of T1D and either of the claims codes of the two insulin treatment methods (i.e., syringe pump for CSII and insulin needles), PPV improved to 90.2 (95% CI: 85.2, 94.4). Conclusions We have established a case definition with high PPV, and the case definition can be used for precisely detecting T1D patients from claims data in Japan.
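As an illustration of how the sensitivity and PPV of a claims-based case definition like the one above are computed, here is a minimal sketch in Python. It is not the authors' code: the `Patient` fields, the `case_definition` function, and the reference label `has_t1d` are all hypothetical names chosen for this example, and the definition shown (confirmed T1D diagnosis plus either insulin needles or a CSII pump) is only a paraphrase of the combination described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All fields are hypothetical stand-ins for claims/EHR variables.
    has_t1d: bool           # reference-standard label used for evaluation
    confirmed_t1d_dx: bool  # confirmed T1D disease name present in claims
    insulin_needles: bool   # claims code for insulin needles
    csii_pump: bool         # claims code for a CSII syringe pump


def case_definition(p: Patient) -> bool:
    """Confirmed T1D diagnosis AND either of the two insulin treatment codes."""
    return p.confirmed_t1d_dx and (p.insulin_needles or p.csii_pump)


def sensitivity_and_ppv(patients, definition):
    """Return (sensitivity, PPV) of a case definition against the reference label."""
    tp = sum(1 for p in patients if definition(p) and p.has_t1d)
    fp = sum(1 for p in patients if definition(p) and not p.has_t1d)
    fn = sum(1 for p in patients if not definition(p) and p.has_t1d)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```

A stricter definition of this kind typically trades sensitivity for PPV, which is the pattern the abstract reports.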
Affiliation(s)
- Tasuku Okui
- Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
| | - Chinatsu Nojiri
- Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
| | - Shinichiro Kimura
- Department of Molecular Medicine and Metabolism, Research Institute of Environmental Medicine, Nagoya University, Nagoya, Japan
| | - Kentaro Abe
- National Hospital Organization Kokura Medical Center, Fukuoka, Japan
| | | | | | | | - Naoko Tajima
- Jikei University School of Medicine, Tokyo, Japan
| | | | - Naoki Nakashima
- Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
| |
|
62
|
Wu B, Chow W, Sakthivel M, Kakade O, Gupta K, Israel D, Chen YW, Kuruvilla AS. Body Mass Index Variable Interpolation to Expand the Utility of Real-world Administrative Healthcare Claims Database Analyses. Adv Ther 2021; 38:1314-1327. [PMID: 33432543 PMCID: PMC7889527 DOI: 10.1007/s12325-020-01605-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/22/2020] [Accepted: 12/11/2020] [Indexed: 12/23/2022]
Abstract
INTRODUCTION Administrative claims data provide an important source for real-world evidence (RWE) generation, but incomplete reporting, such as for body mass index (BMI), limits the sample sizes that can be analyzed to address certain research questions. The objective of this study was to construct models by implementing machine-learning (ML) algorithms to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²) in administrative healthcare claims databases, and then internally and externally validate them. METHODS Five advanced ML algorithms were implemented for each BMI classification on a random sampling of BMI readings from the Optum PanTher Electronic Health Record database (2%) and the Optum Clinformatics Date of Death (20%) database, while incorporating baseline demographic and clinical characteristics. Sensitivity analyses with oversampling ratios were conducted. Model performance was validated internally and externally. RESULTS Models trained on the Super Learner ML algorithm (SLA) yielded the best BMI classification predictive performance. SLA model 1 utilized sociodemographic and clinical characteristics, including baseline BMI values; the area under the receiver operating characteristic curve (ROC AUC) was approximately 88% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 87.9% to 92.8% and specificity ranged from 91.8% to 94.7%. SLA model 2 utilized sociodemographic information and clinical characteristics, excluding baseline BMI values; ROC AUC was approximately 73% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 73.6% to 80.0% and specificity ranged from 71.6% to 85.9%. The external validation on the MarketScan Commercial Claims and Encounters database yielded relatively consistent results with slightly diminished performance. 
CONCLUSION This study demonstrated the feasibility and validity of using ML algorithms to predict BMI classifications in administrative healthcare claims data to expand the utility for RWE generation.
|
63
|
Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021; 9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022]
Abstract
Background Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research. Objective This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions. Methods A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines. Results A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance. Conclusions Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. 
Several strategies to assist with phenotype-based case definitions have been proposed.
Affiliation(s)
- Seungwon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Chelsea Doktorchik
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Elliot Asher Martin
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
| | - Adam Giles D'Souza
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
| | - Cathy Eastwood
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Abdel Aziz Shaheen
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Christopher Naugler
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Joon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Hude Quan
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| |
|
64
|
Deepa N, Prabadevi B, Maddikunta PK, Gadekallu TR, Baker T, Khan MA, Tariq U. An AI-based intelligent system for healthcare analysis using Ridge-Adaline Stochastic Gradient Descent Classifier. THE JOURNAL OF SUPERCOMPUTING 2021; 77:1998-2017. [DOI: 10.1007/s11227-020-03347-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 08/30/2023]
|
65
|
Xiong Y, Shi X, Chen S, Jiang D, Tang B, Wang X, Chen Q, Yan J. Cohort selection for clinical trials using hierarchical neural network. J Am Med Inform Assoc 2021; 26:1203-1208. [PMID: 31305921 DOI: 10.1093/jamia/ocz099] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/17/2019] [Revised: 04/28/2019] [Accepted: 06/13/2019] [Indexed: 12/22/2022]
Abstract
OBJECTIVE Cohort selection for clinical trials is a key step for clinical research. We proposed a hierarchical neural network to determine whether a patient satisfied selection criteria or not. MATERIALS AND METHODS We designed a hierarchical neural network (denoted as CNN-Highway-LSTM or LSTM-Highway-LSTM) for track 1 of the national natural language processing (NLP) clinical challenge (n2c2) on cohort selection for clinical trials in 2018. The neural network is composed of 5 components: (1) sentence representation using a convolutional neural network (CNN) or long short-term memory (LSTM) network; (2) a highway network to adjust information flow; (3) a self-attention neural network to reweight sentences; (4) document representation using LSTM, which takes sentence representations in chronological order as input; (5) a fully connected neural network to determine whether each criterion is met or not. We compared the proposed method with its variants, including the methods only using the first component to represent documents directly and the fully connected neural network for classification (denoted as CNN-only or LSTM-only) and the methods without using the highway network (denoted as CNN-LSTM or LSTM-LSTM). The performance of all methods was measured by micro-averaged precision, recall, and F1 score. RESULTS The micro-averaged F1 scores of CNN-only, LSTM-only, CNN-LSTM, LSTM-LSTM, CNN-Highway-LSTM, and LSTM-Highway-LSTM were 85.24%, 84.25%, 87.27%, 88.68%, 88.48%, and 90.21%, respectively. The highest micro-averaged F1 score is higher than our submitted one of 88.55%, which is one of the top-ranked results in the challenge. The results indicate that the proposed method is effective for cohort selection for clinical trials. DISCUSSION Although the proposed method achieved promising results, some mistakes were caused by word ambiguity, negation, number analysis, and an incomplete dictionary. 
Moreover, imbalanced data was another challenge that needs to be tackled in the future. CONCLUSION In this article, we proposed a hierarchical neural network for cohort selection. Experimental results show that this method is good at selecting cohorts.
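The micro-averaged precision, recall, and F1 score named in the abstract above pool true positives, false positives, and false negatives across all classes before computing the metrics. The Python sketch below shows that pooling in general form; the function name and the `(tp, fp, fn)` input layout are assumptions made for this illustration, and the challenge's official scorer may group criteria and classes differently.

```python
def micro_precision_recall_f1(counts):
    """Micro-average over per-class (tp, fp, fn) tuples.

    counts: iterable of (tp, fp, fn), one tuple per class or criterion.
    Returns (precision, recall, f1) computed from the pooled counts.
    """
    counts = list(counts)  # allow generators; we iterate three times
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because the counts are pooled, frequent classes dominate the micro-average, which is one reason imbalanced data (noted in the abstract's discussion) matters when interpreting these scores.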
Affiliation(s)
- Ying Xiong
- Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
| | - Xue Shi
- Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
| | - Shuai Chen
- Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
| | - Dehuan Jiang
- Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
| | - Buzhou Tang
- Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
| | - Xiaolong Wang
- Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
| | - Qingcai Chen
- Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
| | - Jun Yan
- Yidu Cloud (Beijing) Technology Co., Ltd, Beijing, China
| |
|
66
|
Qayyum A, Qadir J, Bilal M, Al-Fuqaha A. Secure and Robust Machine Learning for Healthcare: A Survey. IEEE Rev Biomed Eng 2021; 14:156-180. [PMID: 32746371 DOI: 10.1109/rbme.2020.3013489] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Indexed: 11/10/2022]
Abstract
Recent years have witnessed widespread adoption of machine learning (ML)/deep learning (DL) techniques due to their superior performance for a variety of healthcare applications ranging from the prediction of cardiac arrest from one-dimensional heart signals to computer-aided diagnosis (CADx) using multi-dimensional medical images. Notwithstanding the impressive performance of ML/DL, there are still lingering doubts regarding the robustness of ML/DL in healthcare settings (which are traditionally considered quite challenging due to the myriad security and privacy issues involved), especially in light of recent results that have shown that ML/DL are vulnerable to adversarial attacks. In this paper, we present an overview of various application areas in healthcare that leverage such techniques from a security and privacy point of view and present associated challenges. In addition, we present potential methods to ensure secure and privacy-preserving ML for healthcare applications. Finally, we provide insight into the current research challenges and promising directions for future research.
|
67
|
Sarker IH. Data Science and Analytics: An Overview from Data-Driven Smart Computing, Decision-Making and Applications Perspective. SN COMPUTER SCIENCE 2021; 2:377. [PMID: 34278328 PMCID: PMC8274472 DOI: 10.1007/s42979-021-00765-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/09/2019] [Accepted: 07/02/2021] [Indexed: 02/07/2023]
Abstract
The digital world has a wealth of data, such as internet of things (IoT) data, business data, health data, mobile data, urban data, security data, and many more, in the current age of the Fourth Industrial Revolution (Industry 4.0 or 4IR). Extracting knowledge or useful insights from these data can be used for smart decision-making in various application domains. In the area of data science, advanced analytics methods including machine learning modeling can provide actionable insights or deeper knowledge about data, which makes the computing process automatic and smart. In this paper, we present a comprehensive view on "Data Science" including various types of advanced analytics methods that can be applied to enhance the intelligence and capabilities of an application through smart decision-making in different scenarios. We also discuss and summarize ten potential real-world application domains including business, healthcare, cybersecurity, urban and rural data science, and so on by taking into account data-driven smart computing and decision making. Based on this, we finally highlight the challenges and potential research directions within the scope of our study. Overall, this paper aims to serve as a reference point on data science and advanced analytics to the researchers and decision-makers as well as application developers, particularly from the data-driven solution point of view for real-world problems.
Affiliation(s)
- Iqbal H. Sarker
- Swinburne University of Technology, Melbourne, VIC 3122, Australia; Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chittagong, 4349, Bangladesh
| |
|
68
|
Leary E, Stoker AM, Cook JL. Classification, Categorization, and Algorithms for Articular Cartilage Defects. J Knee Surg 2020; 33:1069-1077. [PMID: 32663886 DOI: 10.1055/s-0040-1713778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
There is a critical unmet need in the clinical implementation of valid preventative and therapeutic strategies for patients with articular cartilage pathology based on the significant gap in understanding of the relationships between diagnostic data, disease progression, patient-related variables, and symptoms. In this article, the current state of classification and categorization for articular cartilage pathology is discussed with particular focus on machine learning methods and the authors propose a bedside-bench-bedside approach with highly quantitative techniques as a solution to these hurdles. Leveraging computational learning with available data toward articular cartilage pathology patient phenotyping holds promise for clinical research and will likely be an important tool to identify translational solutions into evidence-based clinical applications to benefit patients. Recommendations for successful implementation of these approaches include using standardized definitions of articular cartilage, to include characterization of depth, size, location, and number; using measurements that minimize subjectivity or validated patient-reported outcome measures; considering not just the articular cartilage pathology but the whole joint, and the patient perception and perspective. Application of this approach through a multistep process by a multidisciplinary team of clinicians and scientists holds promise for validating disease mechanism-based phenotypes toward clinically relevant understanding of articular cartilage pathology for evidence-based application to orthopaedic practice.
Affiliation(s)
- Emily Leary
- Thompson Laboratory for Regenerative Orthopaedics, University of Missouri, Columbia, Missouri; Department of Orthopaedic Surgery, University of Missouri, Columbia, Missouri
| | - Aaron M Stoker
- Thompson Laboratory for Regenerative Orthopaedics, University of Missouri, Columbia, Missouri; Department of Orthopaedic Surgery, University of Missouri, Columbia, Missouri
| | - James L Cook
- Thompson Laboratory for Regenerative Orthopaedics, University of Missouri, Columbia, Missouri; Department of Orthopaedic Surgery, University of Missouri, Columbia, Missouri
| |
|
69
|
Frontoni E, Romeo L, Bernardini M, Moccia S, Migliorelli L, Paolanti M, Ferri A, Misericordia P, Mancini A, Zingaretti P. A Decision Support System for Diabetes Chronic Care Models Based on General Practitioner Engagement and EHR Data Sharing. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2020; 8:3000112. [PMID: 33150095 PMCID: PMC7605604 DOI: 10.1109/jtehm.2020.3031107] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2020] [Revised: 09/16/2020] [Accepted: 10/10/2020] [Indexed: 12/19/2022]
Abstract
Objective Decision support systems (DSS) have been developed and promoted for their potential to improve quality of health care. However, there is a lack of common clinical strategy, poor management of clinical resources, and erroneous implementation of preventive medicine. Methods To overcome this problem, this work proposed an integrated system that relies on the creation and sharing of a database extracted from GPs' Electronic Health Records (EHRs) within the Netmedica Italian (NMI) cloud infrastructure. Although the proposed system is a pilot application specifically tailored for improving chronic Type 2 Diabetes (T2D) care, it could be easily adapted to manage different chronic diseases. The proposed DSS is based on the EHR structure used by GPs in their daily activities, following the most up-to-date guidelines in data protection and sharing. The DSS is equipped with a Machine Learning (ML) method for analyzing the shared EHRs and thus tackling the high variability of EHRs. A novel set of T2D care-quality indicators is used specifically to determine the economic incentives, and the T2D features are presented as predictors of the proposed ML approach. Results The EHRs from 41237 T2D patients were analyzed. No additional data collection, with respect to the standard clinical practice, was required. The DSS exhibited competitive performance (up to an overall accuracy of 98%±2% and macro-recall of 96%±1%) for classifying chronic care quality across the different follow-up phases. The chronic care quality model led to a significant increase (up to 12%) in the T2D patients without complications. For GPs who agreed to use the proposed system, there was an economic incentive. A further bonus was assigned when performance targets were achieved. 
Conclusions The quality care evaluation in a clinical use-case scenario demonstrated how the empowerment of the GPs through the use of the platform (integrating the proposed DSS), along with the economic incentives, may speed up the improvement of care.
Affiliation(s)
- Emanuele Frontoni
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | - Luca Romeo
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | - Michele Bernardini
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | - Sara Moccia
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | - Lucia Migliorelli
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | - Marina Paolanti
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | - Alessandro Ferri
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | | | - Adriano Mancini
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| | - Primo Zingaretti
- Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy
| |
|
70
|
D'Adamo GL, Widdop JT, Giles EM. The future is now? Clinical and translational aspects of "Omics" technologies. Immunol Cell Biol 2020; 99:168-176. [PMID: 32924178 DOI: 10.1111/imcb.12404] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 07/29/2020] [Revised: 09/09/2020] [Accepted: 09/09/2020] [Indexed: 12/16/2022]
Abstract
Big data has become a central part of medical research, as well as modern life generally. "Omics" technologies include genomics, proteomics, microbiomics and increasingly other omics. These have been driven by rapid advances in laboratory techniques and equipment. Crucially, improved information handling capabilities have allowed concepts such as artificial intelligence and machine learning to enter the research world. The COVID-19 pandemic has shown how quickly information can be generated and analyzed using such approaches, but also showed its limitations. This review will look at how "omics" has begun to be translated into clinical practice. While there appears almost limitless potential in using big data for "precision" or "personalized" medicine, the reality is that this remains largely aspirational. Oncology is the only field of medicine that is widely adopting such technologies, and even in this field uptake is irregular. There are practical and ethical reasons for this lack of translation of increasingly affordable techniques into the clinic. Undoubtedly, there will be increasing use of large data sets from traditional (e.g. tumor samples, patient genomics) and nontraditional (e.g. smartphone) sources. It is perhaps the greatest challenge of the health-care sector over the coming decade to integrate these resources in an effective, practical and ethical way.
Affiliation(s)
- Gemma L D'Adamo
- Centre for Innate Immunity and Infectious Disease, Hudson Institute of Medical Research, Clayton, VIC, Australia
| | - James T Widdop
- Centre for Innate Immunity and Infectious Disease, Hudson Institute of Medical Research, Clayton, VIC, Australia
| | - Edward M Giles
- Centre for Innate Immunity and Infectious Disease, Hudson Institute of Medical Research, Clayton, VIC, Australia; Department of Paediatrics, Monash University, Clayton, VIC, Australia
| |
|
71
|
Sampa MB, Hossain MN, Hoque MR, Islam R, Yokota F, Nishikitani M, Ahmed A. Blood Uric Acid Prediction With Machine Learning: Model Development and Performance Comparison. JMIR Med Inform 2020; 8:e18331. [PMID: 33030442 PMCID: PMC7582147 DOI: 10.2196/18331] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/20/2020] [Revised: 07/16/2020] [Accepted: 08/10/2020] [Indexed: 02/06/2023]
Abstract
Background Uric acid is associated with noncommunicable diseases such as cardiovascular diseases, chronic kidney disease, coronary artery disease, stroke, diabetes, metabolic syndrome, vascular dementia, and hypertension. Therefore, uric acid is considered to be a risk factor for the development of noncommunicable diseases. Most studies on uric acid have been performed in developed countries, and the application of machine-learning approaches in uric acid prediction in developing countries is rare. Different machine-learning algorithms perform differently depending on the type of data and the disease; therefore, each type of data requires its own investigation to identify the most accurate algorithm. Specifically, no study has yet focused on the urban corporate population in Bangladesh, despite the high risk of developing noncommunicable diseases for this population. Objective The aim of this study was to develop a model for predicting blood uric acid values based on basic health checkup test results, dietary information, and sociodemographic characteristics using machine-learning algorithms. The prediction of health checkup test measurements can be very helpful to reduce health management costs. Methods Various machine-learning approaches were used in this study because clinical input data are not completely independent and exhibit complex interactions. Conventional statistical models are limited in their ability to account for these complex interactions, whereas machine learning can consider all possible interactions among input data. We used boosted decision tree regression, decision forest regression, Bayesian linear regression, and linear regression to predict personalized blood uric acid based on basic health checkup test results, dietary information, and sociodemographic characteristics. We evaluated the performance of these widely used machine-learning models using data collected from 271 employees in the Grameen Bank complex of Dhaka, Bangladesh.
Results The mean uric acid level was 6.63 mg/dL, indicating a borderline result for the majority of the sample (normal range <7.0 mg/dL). Therefore, these individuals should monitor their uric acid regularly. The boosted decision tree regression model showed the best performance among the models tested, with a root mean squared error of 0.03, which is also better than that of any previously reported model. Conclusions A uric acid prediction model was developed based on personal characteristics, dietary information, and some basic health checkup measurements. This model will be useful for improving awareness among high-risk individuals and populations, which can help to reduce medical costs. A future study could include additional features (eg, work stress, daily physical activity, alcohol intake, eating red meat) to improve prediction.
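The models in this abstract are ranked by root mean squared error (RMSE). As a minimal sketch of that metric, assuming measured and predicted uric acid values in mg/dL (the numbers below are made up for illustration and are not the study's data):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical uric acid values in mg/dL, for illustration only.
measured = [6.1, 7.0, 5.8, 6.9]
predicted = [6.0, 7.2, 5.9, 6.7]
print(round(rmse(measured, predicted), 3))
```

A lower RMSE means predictions sit closer to the measured values, in the same unit as the target, which is why it is a convenient way to compare regression models on one dataset.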
Affiliation(s)
- Masuda Begum Sampa
- Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan
| | - Md Nazmul Hossain
- Department of Marketing, Faculty of Business Studies, University of Dhaka, Dhaka, Bangladesh
| | - Md Rakibul Hoque
- School of Business, Emporia State University, Kansas, KS, United States
| | - Rafiqul Islam
- Medical Information Center, Kyushu University Hospital, Fukuoka, Japan
| | - Fumihiko Yokota
- Institute of Decision Science for a Sustainable Society, Kyushu University, Fukuoka, Japan
| | | | - Ashir Ahmed
- Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan
| |
|
72
|
Tang Y, Gao R, Lee HH, Wells QS, Spann A, Terry JG, Carr JJ, Huo Y, Bao S, Landman BA. Prediction of Type II Diabetes Onset with Computed Tomography and Electronic Medical Records. MULTIMODAL LEARNING FOR CLINICAL DECISION SUPPORT AND CLINICAL IMAGE-BASED PROCEDURES: 10TH INTERNATIONAL WORKSHOP, ML-CDS 2020, AND 9TH INTERNATIONAL WORKSHOP, CLIP 2020, HELD IN CONJUNCTION WITH MICCAI 2020, LIMA, PERU, OCTOBER 4-8, ... 2020; 12445:13-23. [PMID: 34113927 PMCID: PMC8188902 DOI: 10.1007/978-3-030-60946-7_2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Abstract
Type II diabetes mellitus (T2DM) is a significant public health concern with multiple known risk factors (e.g., body mass index (BMI), body fat distribution, glucose levels). Improved prediction or prognosis would enable earlier intervention before possibly irreversible damage has occurred. Meanwhile, abdominal computed tomography (CT) is a relatively common imaging technique. Herein, we explore secondary use of the CT imaging data to refine the risk profile of future diagnosis of T2DM. In this work, we use quantitative information and imaging slices from the patient history to predict T2DM onset, identified from ICD-9 codes, at least one year in the future. Furthermore, we investigate the role of five different types of electronic medical records (EMR) in prediction, specifically 1) demographics; 2) pancreas volume; 3) visceral/subcutaneous fat volumes in the L2 region of interest; 4) abdominal body fat distribution; and 5) glucose lab tests. Next, we build a deep neural network to predict T2DM onset from pancreas imaging slices. Finally, motivated by multi-modal machine learning, we construct a merged framework that combines CT imaging slices with EMR information to refine the prediction. We empirically demonstrate that our proposed joint analysis of images and EMR leads to AUC increases of 4.25% and 6.93% in predicting T2DM compared with using only images or only EMR, respectively. In this study, we used a case-control dataset of 997 subjects with CT scans and contextual EMR scores. To the best of our knowledge, this is the first work to show the ability to prognose T2DM using the patients' contextual and imaging history. We believe this study has promising potential for heterogeneous data analysis and multi-modal medical applications.
Affiliation(s)
| | | | | | | | - Ashley Spann
- Vanderbilt University Medical Center, Nashville, USA
| | - James G Terry
- Vanderbilt University Medical Center, Nashville, USA
| | - John J Carr
- Vanderbilt University Medical Center, Nashville, USA
| | | | | | - Bennett A Landman
- Vanderbilt University, Nashville, USA
- Vanderbilt University Medical Center, Nashville, USA
| |
|
73
|
Luo YF, Henry S, Wang Y, Shen F, Uzuner O, Rumshisky A. The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records. J Am Med Inform Assoc 2020; 27:1529-1537. [PMID: 32968800 PMCID: PMC7647359 DOI: 10.1093/jamia/ocaa106] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/10/2020] [Revised: 05/01/2020] [Accepted: 05/14/2020] [Indexed: 01/19/2023] Open
Abstract
OBJECTIVE The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task track 3 focused on medical concept normalization (MCN) in clinical records. This track aimed to assess the state of the art in identifying and matching salient medical concepts to a controlled vocabulary. In this paper, we describe the task, describe the data set used, compare the participating systems, present results, identify the strengths and limitations of the current state of the art, and identify directions for future research. MATERIALS AND METHODS Participating teams were provided with narrative discharge summaries in which text spans corresponding to medical concepts were identified. This paper refers to these text spans as mentions. Teams were tasked with normalizing these mentions to concepts, represented by concept unique identifiers, within the Unified Medical Language System. Submitted systems represented 4 broad categories of approaches: cascading dictionary matching, cosine distance, deep learning, and retrieve-and-rank systems. Disambiguation modules were common across all approaches. RESULTS A total of 33 teams participated in the MCN task. The best-performing team achieved an accuracy of 0.8526. The median and mean performances among all teams were 0.7733 and 0.7426, respectively. CONCLUSIONS Overall performance among the top 10 teams was high. However, several mention types were challenging for all teams. These included mentions requiring disambiguation of misspelled words, acronyms, abbreviations, and mentions with more than 1 possible semantic type. Also challenging were complex mentions of long, multi-word terms that may require new ways of extracting and representing mention meaning, the use of domain knowledge, parse trees, or hand-crafted rules.
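Of the four approach families listed in this abstract, cosine-based matching is the easiest to illustrate. The sketch below is not any participating team's system. The character-trigram representation, the three-entry vocabulary, and the `ID_00x` identifiers are assumptions made for illustration; they are placeholders, not real UMLS concept unique identifiers.

```python
import math
from collections import Counter


def trigrams(text):
    """Bag of character trigrams for a lower-cased, space-padded string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(mention, vocabulary):
    """Return the identifier whose concept name is most similar to the mention."""
    mention_vec = trigrams(mention)
    return max(vocabulary, key=lambda cid: cosine(mention_vec, trigrams(vocabulary[cid])))


def accuracy(predicted, gold):
    """Fraction of mentions mapped to the gold-standard identifier."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical vocabulary with placeholder identifiers (not real CUIs).
vocabulary = {
    "ID_001": "myocardial infarction",
    "ID_002": "hypertension",
    "ID_003": "diabetes mellitus",
}
mentions = ["hypertensive", "diabetes"]
predicted = [normalize(m, vocabulary) for m in mentions]
print(predicted, accuracy(predicted, ["ID_002", "ID_003"]))
```

Surface similarity of this kind has no way to resolve acronyms or mentions with several possible semantic types, which matches the mention types the abstract reports as hardest.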
Affiliation(s)
- Yen-Fu Luo
- Department of Computer Science, University of Massachusetts Lowell, Lowell, Massachusetts, USA
| | - Sam Henry
- Department of Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA
| | - Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, New York, USA
| | - Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, Rochester, New York, USA
| | - Ozlem Uzuner
- Department of Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Anna Rumshisky
- Department of Computer Science, University of Massachusetts Lowell, Lowell, Massachusetts, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
74
|
Alfian G, Syafrudin M, Anshari M, Benes F, Atmaji FTD, Fahrurrozi I, Hidayatullah AF, Rhee J. Blood glucose prediction model for type 1 diabetes based on artificial neural network with time-domain features. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.10.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
|
75
|
Sengupta PP, Shrestha S, Berthon B, Messas E, Donal E, Tison GH, Min JK, D'hooge J, Voigt JU, Dudley J, Verjans JW, Shameer K, Johnson K, Lovstakken L, Tabassian M, Piccirilli M, Pernot M, Yanamala N, Duchateau N, Kagiyama N, Bernard O, Slomka P, Deo R, Arnaout R. Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME): A Checklist: Reviewed by the American College of Cardiology Healthcare Innovation Council. JACC Cardiovasc Imaging 2020; 13:2017-2035. [PMID: 32912474 PMCID: PMC7953597 DOI: 10.1016/j.jcmg.2020.07.015] [Citation(s) in RCA: 125] [Impact Index Per Article: 31.3] [Received: 12/18/2019] [Revised: 07/15/2020] [Accepted: 07/16/2020] [Indexed: 12/20/2022]
Abstract
Machine learning (ML) has been increasingly used within cardiology, particularly in the domain of cardiovascular imaging. Due to the inherent complexity and flexibility of ML algorithms, inconsistencies in the model performance and interpretation may occur. Several review articles have been recently published that introduce the fundamental principles and clinical application of ML for cardiologists. This paper builds on these introductory principles and outlines a more comprehensive list of crucial responsibilities that need to be completed when developing ML models. This paper aims to serve as a scientific foundation to aid investigators, data scientists, authors, editors, and reviewers involved in machine learning research with the intent of uniform reporting of ML investigations. An independent multidisciplinary panel of ML experts, clinicians, and statisticians worked together to review the theoretical rationale underlying 7 sets of requirements that may reduce algorithmic errors and biases. Finally, the paper summarizes a list of reporting items as an itemized checklist that highlights steps for ensuring correct application of ML models and the consistent reporting of model specifications and results. It is expected that the rapid pace of research and development and the increased availability of real-world evidence may require periodic updates to the checklist.
Affiliation(s)
- Partho P Sengupta
- West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia.
| | - Sirish Shrestha
- West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
| | - Béatrice Berthon
- Physique pour la Médecine Paris, Inserm U1273, CNRS FRE 2031, ESPCI Paris, PSL Research University, Paris, France
| | - Emmanuel Messas
- Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Erwan Donal
- Département de Cardiologie et Maladies Vasculaires, Service de Cardiologie et maladies vasculaires, CHU Rennes, Rennes, France
| | - Geoffrey H Tison
- Division of Cardiology, Department of Medicine, University of California San Francisco, San Francisco, California
| | | | - Jan D'hooge
- Laboratory on Cardiovascular Imaging and Dynamics, Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
| | - Jens-Uwe Voigt
- Department of Cardiovascular Science, KU Leuven, Leuven, Belgium; Department of Cardiovascular Diseases, University Hospitals Leuven, Belgium
| | - Joel Dudley
- Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Johan W Verjans
- Australian Institute for Machine Learning, University of Adelaide, North Terrace, Adelaide, South Australia, Australia; Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
| | - Khader Shameer
- Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Kipp Johnson
- Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Lasse Lovstakken
- Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
| | - Mahdi Tabassian
- Laboratory on Cardiovascular Imaging and Dynamics, Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
| | - Marco Piccirilli
- West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
| | - Mathieu Pernot
- Physique pour la Médecine Paris, Inserm U1273, CNRS FRE 2031, ESPCI Paris, PSL Research University, Paris, France
| | - Naveena Yanamala
- West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
| | - Nicolas Duchateau
- CREATIS, CNRS UMR 5220, INSERM U1206, Université Lyon 1, INSA-LYON, France
| | - Nobuyuki Kagiyama
- West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
| | - Olivier Bernard
- CREATIS, CNRS UMR 5220, INSERM U1206, Université Lyon 1, INSA-LYON, France
| | - Piotr Slomka
- Department of Imaging and Medicine, Cedars-Sinai Medical Center, Los Angeles, California
| | - Rahul Deo
- Division of Cardiology, Department of Medicine, University of California San Francisco, San Francisco, California
| | - Rima Arnaout
- Division of Cardiology, Department of Medicine, University of California San Francisco, San Francisco, California
| |
|
76
|
Jadhav AS, Patil PB, Biradar S. Analysis on diagnosing diabetic retinopathy by segmenting blood vessels, optic disc and retinal abnormalities. J Med Eng Technol 2020; 44:299-316. [PMID: 32729345 DOI: 10.1080/03091902.2020.1791986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/11/2023]
Abstract
The main aim of mass screening programmes for Diabetic Retinopathy (DR) is to detect and diagnose the disorder before it leads to vision loss. Automated analysis of retinal images has the potential to improve the efficacy of screening programmes compared with manual image analysis. This article develops a framework for the detection of DR from retinal fundus images using three evaluations based on the optic disc, blood vessels and retinal abnormalities. Initially, pre-processing steps such as green channel conversion and Contrast Limited Adaptive Histogram Equalisation are performed. The segmentation procedure then starts with optic disc segmentation by open-close watershed transform, followed by blood vessel segmentation by grey level thresholding and abnormality segmentation (hard exudates, haemorrhages, microaneurysms and soft exudates) by top hat transform and Gabor filtering mechanisms. From the three segmented images, features such as local binary pattern, texture energy measurement, and Shannon's and Kapur's entropy are extracted and subjected to an optimal feature selection process using a new hybrid optimisation algorithm termed the Trial-based Bypass Improved Dragonfly Algorithm (TB-DA). The selected features are given to a hybrid machine learning algorithm that combines NN and DBN. As a modification, the same hybrid TB-DA is used to enhance the training of the hybrid classifier, which categorises images as normal, mild, moderate or severe based on the three components.
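Two of the steps named in this abstract, green channel conversion and Shannon entropy as a feature, are simple enough to sketch. The code below is an illustration under stated assumptions, not the authors' pipeline: images are represented as nested lists of (R, G, B) tuples, entropy is taken over the grey-level histogram, and the 2x2 image is made up.

```python
import math
from collections import Counter


def green_channel(rgb_image):
    """Keep only the green component of each (R, G, B) pixel."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def shannon_entropy(grey_image):
    """Shannon entropy (in bits) of the grey-level histogram of an image."""
    levels = [value for row in grey_image for value in row]
    if not levels:
        raise ValueError("image must contain at least one pixel")
    total = len(levels)
    histogram = Counter(levels)
    return -sum((n / total) * math.log2(n / total) for n in histogram.values())


# A made-up 2x2 RGB image: two distinct green levels, each used twice.
image = [[(10, 0, 5), (20, 0, 5)],
         [(30, 255, 5), (40, 255, 5)]]
green = green_channel(image)
print(green)
print(shannon_entropy(green))
```

The green channel is a common choice for fundus images because it tends to show vessels and lesions with the most contrast; entropy then summarises how spread out the intensity distribution of a segmented region is.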
Affiliation(s)
- Ambaji S Jadhav
- Department of Electrical and Electronics, B.L.D.E.A's V.P. Dr. P.G. Halakatti College of Engineering & Technology (Affiliated to Visvesvaraya Technological University, Belagavi), Vijayapur, India
| | - Pushpa B Patil
- Department of Computer Science & Engineering, B.L.D.E.A's V.P. Dr. P.G. Halakatti College of Engineering & Technology (Affiliated to Visvesvaraya Technological University, Belagavi), Vijayapur, India
| | - Sunil Biradar
- Department of Ophthalmology, Shri B.M. Patil Medical College Hospital and Research Center, Vijayapur, India
| |
|
77
|
Ruan Y, Bellot A, Moysova Z, Tan GD, Lumb A, Davies J, van der Schaar M, Rea R. Predicting the Risk of Inpatient Hypoglycemia With Machine Learning Using Electronic Health Records. Diabetes Care 2020; 43:1504-1511. [PMID: 32350021 DOI: 10.2337/dc19-1743] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 08/30/2019] [Accepted: 04/04/2020] [Indexed: 02/03/2023]
Abstract
OBJECTIVE We analyzed data from inpatients with diabetes admitted to a large university hospital to predict the risk of hypoglycemia through the use of machine learning algorithms. RESEARCH DESIGN AND METHODS Four years of data were extracted from a hospital electronic health record system. This included laboratory and point-of-care blood glucose (BG) values to identify biochemical and clinically significant hypoglycemic episodes (BG ≤3.9 and ≤2.9 mmol/L, respectively). We used patient demographics, administered medications, vital signs, laboratory results, and procedures performed during the hospital stays to inform the model. Two iterations of the data set included the doses of insulin administered and the past history of inpatient hypoglycemia. Eighteen different prediction models were compared using the area under the receiver operating characteristic curve (AUROC) through a 10-fold cross validation. RESULTS We analyzed data obtained from 17,658 inpatients with diabetes who underwent 32,758 admissions between July 2014 and August 2018. The predictive factors from the logistic regression model included people undergoing procedures, weight, type of diabetes, oxygen saturation level, use of medications (insulin, sulfonylurea, and metformin), and albumin levels. The machine learning model with the best performance was the XGBoost model (AUROC 0.96). This outperformed the logistic regression model, which had an AUROC of 0.75 for the estimation of the risk of clinically significant hypoglycemia. CONCLUSIONS Advanced machine learning models are superior to logistic regression models in predicting the risk of hypoglycemia in inpatients with diabetes. Trials of such models should be conducted in real time to evaluate their utility to reduce inpatient hypoglycemia.
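The outcome definition and the comparison metric in this abstract can both be written down compactly. The sketch below assumes blood glucose in mmol/L and uses the two thresholds quoted above; the AUROC is computed with the rank-based (Mann-Whitney) formulation rather than any particular library. The readings, labels, and scores are made up for illustration.

```python
def classify_bg(bg_mmol_l):
    """Label a blood glucose reading using the thresholds quoted in the abstract."""
    if bg_mmol_l <= 2.9:
        return "clinically significant"
    if bg_mmol_l <= 3.9:
        return "biochemical"
    return "none"


def auroc(labels, scores):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up readings and risk scores, for illustration only.
readings = [5.6, 3.7, 2.8]
print([classify_bg(bg) for bg in readings])

labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(labels, scores))
```

Because the AUROC depends only on how cases are ranked, it allows models with differently scaled outputs, such as logistic regression and boosted trees, to be compared on the same footing.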
Affiliation(s)
- Yue Ruan
- Oxford Centre for Diabetes, Endocrinology and Metabolism, Oxford University Hospitals National Health Service Foundation Trust, Oxford, U.K.; Oxford National Institute for Health Research Biomedical Research Centre, Oxford, U.K.; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, U.K.
| | - Alexis Bellot
- Department of Mathematics, University of Cambridge, Cambridge, U.K.; Alan Turing Institute, London, U.K.
| | - Zuzana Moysova
- Big Data Institute, University of Oxford Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, U.K.
| | - Garry D Tan
- Oxford Centre for Diabetes, Endocrinology and Metabolism, Oxford University Hospitals National Health Service Foundation Trust, Oxford, U.K.; Oxford National Institute for Health Research Biomedical Research Centre, Oxford, U.K.
| | - Alistair Lumb
- Oxford Centre for Diabetes, Endocrinology and Metabolism, Oxford University Hospitals National Health Service Foundation Trust, Oxford, U.K.; Oxford National Institute for Health Research Biomedical Research Centre, Oxford, U.K.
| | - Jim Davies
- Big Data Institute, University of Oxford Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, U.K.
| | - Mihaela van der Schaar
- Department of Mathematics, University of Cambridge, Cambridge, U.K.; Alan Turing Institute, London, U.K.
| | - Rustam Rea
- Oxford Centre for Diabetes, Endocrinology and Metabolism, Oxford University Hospitals National Health Service Foundation Trust, Oxford, U.K.; Oxford National Institute for Health Research Biomedical Research Centre, Oxford, U.K.
| |
|
78
|
Srivastava AK, Kumar Y, Singh PK. A Rule-Based Monitoring System for Accurate Prediction of Diabetes. INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS 2020. [DOI: 10.4018/ijehmc.2020070103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
Diabetes is a chronic disease that affects people's lives through high blood sugar levels. The sugar level rises because of a lack of insulin production in the human body. Large numbers of people are affected by diabetes, and this number can increase tremendously because of lifestyle behavior. Diabetes can also affect other organs, such as the kidneys, heart, and retinas, and lead to the failure of these organs. This article presents a diabetes monitoring system to determine the risk of diabetes based on the personal health records of patients. In this work, several rules are designed based on clinical as well as non-clinical symptoms. The effectiveness of the diabetes monitoring system is tested on a set of 240 people. The simulation results are also compared with well-known techniques available for diabetes prediction. The proposed monitoring system is reported to obtain a 90.41% accuracy rate in comparison with the other techniques.
Affiliation(s)
| | - Yugal Kumar
- Jaypee University of Information Technology, India
| | | |
|
79
|
Yang T, Zhang L, Yi L, Feng H, Li S, Chen H, Zhu J, Zhao J, Zeng Y, Liu H. Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation. JMIR Med Inform 2020; 8:e15431. [PMID: 32554386 PMCID: PMC7333074 DOI: 10.2196/15431] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/10/2019] [Revised: 12/22/2019] [Accepted: 02/07/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Early diabetes screening can effectively reduce the burden of disease. However, natural population-based screening projects require a large number of resources. With the emergence and development of machine learning, researchers have started to pursue more flexible and efficient methods to screen or predict type 2 diabetes. OBJECTIVE The aim of this study was to build prediction models based on the ensemble learning method for diabetes screening to further improve the health status of the population in a noninvasive and inexpensive manner. METHODS The dataset for building and evaluating the diabetes prediction model was extracted from the National Health and Nutrition Examination Survey from 2011-2016. After data cleaning and feature selection, the dataset was split into a training set (80%, 2011-2014), test set (20%, 2011-2014) and validation set (2015-2016). Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and easy ensemble methods were used to build diabetes prediction models. The performance of the models was evaluated through 5-fold cross-validation and external validation. The Delong test (2-sided) was used to test the performance differences between the models. RESULTS We selected 8057 observations and 12 attributes from the database. In the 5-fold cross-validation, the three simple methods yielded highly predictive performance models with areas under the curve (AUCs) over 0.800, wherein the ensemble methods significantly outperformed the simple methods. When we evaluated the models in the test set and validation set, the same trends were observed. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set. 
CONCLUSIONS This study indicates that efficient screening using machine learning methods with noninvasive tests can be applied to a large population and achieve the objective of secondary prevention.
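The accuracy, sensitivity, and specificity reported in this abstract all derive from the four cells of a confusion matrix. A minimal sketch follows, assuming binary screening outcomes; the counts are made up and do not reproduce the study's validation set.

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases that are flagged
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 100 people with diabetes, 100 without.
metrics = screening_metrics(tp=80, fp=30, tn=70, fn=20)
print(metrics)
```

For a screening tool, sensitivity usually gets the most attention because a missed case goes undiagnosed, while a false alarm only triggers a confirmatory test.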
Affiliation(s)
- Tianzhou Yang
- School of Life Science, Liaoning University, Shenyang, China
| | - Li Zhang
- School of Life Science, Liaoning University, Shenyang, China
| | - Liwei Yi
- School of Information, Liaoning University, Shenyang, China
| | - Huawei Feng
- School of Life Science, Liaoning University, Shenyang, China
| | - Shimeng Li
- School of Life Science, Liaoning University, Shenyang, China
| | - Haoyu Chen
- School of Information, Liaoning University, Shenyang, China
| | - Junfeng Zhu
- School of Life Science, Liaoning University, Shenyang, China
| | - Jian Zhao
- School of Life Science, Liaoning University, Shenyang, China
| | - Yingyue Zeng
- School of Life Science, Liaoning University, Shenyang, China
| | - Hongsheng Liu
- School of Life Science, Liaoning University, Shenyang, China; Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Liaoning University, Shenyang, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, China
| |
|
80
|
Tjandra D, Migrino RQ, Giordani B, Wiens J. Cohort discovery and risk stratification for Alzheimer's disease: an electronic health record-based approach. ALZHEIMER'S & DEMENTIA (NEW YORK, N. Y.) 2020; 6:e12035. [PMID: 32548236 PMCID: PMC7293993 DOI: 10.1002/trc2.12035] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/27/2020] [Accepted: 04/18/2020] [Indexed: 11/17/2022]
Abstract
BACKGROUND We sought to leverage data routinely collected in electronic health records (EHRs), with the goal of developing patient risk stratification tools for predicting risk of developing Alzheimer's disease (AD). METHOD Using EHR data from the University of Michigan (UM) hospitals and consensus-based diagnoses from the Michigan Alzheimer's Disease Research Center, we developed and validated a cohort discovery tool for identifying patients with AD. Applied to all UM patients, these labels were used to train an EHR-based machine learning model for predicting AD onset within 10 years. RESULTS Applied to a test cohort of 1697 UM patients, the model achieved an area under the receiver operating characteristics curve of 0.70 (95% confidence interval = 0.63-0.77). Important predictive factors included cardiovascular factors and laboratory blood testing. CONCLUSION Routinely collected EHR data can be used to predict AD onset with modest accuracy. Mining routinely collected data could shed light on early indicators of AD appearance and progression.
Affiliation(s)
- Donna Tjandra
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
| | - Raymond Q. Migrino
- Phoenix Veterans Affairs Health Care System, Phoenix, Arizona, USA
- Department of Medicine, University of Arizona College of Medicine-Phoenix, Phoenix, Arizona, USA
| | - Bruno Giordani
- Department of Psychiatry, Neuropsychology Program, University of Michigan Ann Arbor, Ann Arbor, Michigan, USA
| | - Jenna Wiens
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
| |
|
81
|
Toward Prevention of Adverse Events Using Anticipatory Analytics. PROGRESS IN PREVENTIVE MEDICINE 2020. [DOI: 10.1097/pp9.0000000000000029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
82
|
Wang X, Yang Y, Xu Y, Chen Q, Wang H, Gao H. Predicting hypoglycemic drugs of type 2 diabetes based on weighted rank support vector machine. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105868] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
|
83
|
Dexter GP, Grannis SJ, Dixon BE, Kasthurirathne SN. Generalization of Machine Learning Approaches to Identify Notifiable Conditions from a Statewide Health Information Exchange. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2020; 2020:152-161. [PMID: 32477634 PMCID: PMC7233074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Healthcare analytics is impeded by a lack of machine learning (ML) model generalizability, that is, the ability of a model to predict accurately on varied data sources not included in the model's training dataset. We leveraged free-text laboratory data from a Health Information Exchange network to evaluate ML generalization using Notifiable Condition Detection (NCD) for public health surveillance as a use case. We 1) built ML models for detecting syphilis, salmonella, and histoplasmosis; 2) evaluated generalizability of these models across data from holdout lab systems; and 3) explored factors that influence weak model generalizability. Models for predicting each disease achieved considerable accuracy. However, they demonstrated poor generalizability across data from the holdout lab systems being tested. Our evaluation determined that weak generalization was influenced by the varying syntax of free-text datasets across lab systems. Results highlight the need for actionable methodology to generalize ML solutions for healthcare analytics.
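Evaluating on holdout lab systems means splitting by data source rather than by individual record. The sketch below shows one way to set up such a split, assuming each record carries a source label; the `lab` key, the record layout, and the sample records are hypothetical and not taken from the study.

```python
def leave_one_group_out(records, group_key):
    """Yield (held_out_group, train, test) with one source held out at a time."""
    groups = sorted({record[group_key] for record in records})
    for held_out in groups:
        train = [r for r in records if r[group_key] != held_out]
        test = [r for r in records if r[group_key] == held_out]
        yield held_out, train, test


# Hypothetical free-text lab reports tagged with their source system.
records = [
    {"lab": "A", "text": "RPR reactive 1:8", "label": 1},
    {"lab": "A", "text": "RPR non-reactive", "label": 0},
    {"lab": "B", "text": "Syphilis Ab: POS", "label": 1},
    {"lab": "C", "text": "T. pallidum IgG negative", "label": 0},
]
for held_out, train, test in leave_one_group_out(records, "lab"):
    print(held_out, len(train), len(test))
```

A random record-level split would place reports from every lab in both training and test sets, so it cannot expose the syntax differences between lab systems that the abstract identifies as the cause of weak generalization.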
Affiliation(s)
- Gregory P Dexter
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN, USA
| | - Shaun J Grannis
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN, USA
- Indiana University School of Medicine, Indianapolis, IN, USA
| | - Brian E Dixon
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN, USA
- Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, IN, USA
| | - Suranga N Kasthurirathne
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN, USA
- Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, IN, USA
| |
|
84
|
Early temporal prediction of Type 2 Diabetes Risk Condition from a General Practitioner Electronic Health Record: A Multiple Instance Boosting Approach. Artif Intell Med 2020; 105:101847. [PMID: 32505428 DOI: 10.1016/j.artmed.2020.101847] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/29/2019] [Revised: 02/12/2020] [Accepted: 03/20/2020] [Indexed: 11/22/2022]
Abstract
Early prediction of target patients at high risk of developing Type 2 diabetes (T2D) plays a significant role in preventing the onset of overt disease and its associated comorbidities. Although fundamental in the early phases of T2D natural history, insulin resistance is not usually quantified by General Practitioners (GPs). The triglyceride-glucose (TyG) index has proven useful in clinical studies for quantifying insulin resistance and for the early identification of individuals at T2D risk, but it is still not applied by GPs for diagnostic purposes. The aim of this study is to propose a multiple instance learning boosting algorithm (MIL-Boost) for creating a predictive model capable of early prediction of worsening insulin resistance (low vs high T2D risk) in terms of TyG index. The MIL-Boost is applied to past patient information from the electronic health record (EHR) stored by a single GP. The proposed MIL-Boost algorithm proved to be effective in dealing with this task, performing better than the other state-of-the-art ML competitors (recall from 0.70 up to 0.83). The proposed MIL-based approach is able to extract hidden patterns from past EHR temporal data, even without directly exploiting triglyceride and glucose measurements. The major advantages of our method lie in its ability to model the temporal evolution of longitudinal EHR data while dealing with small sample size and variability in the observations (e.g., a small, variable number of prescriptions for non-hospitalized patients). The proposed algorithm may represent the core of a clinical decision support system.
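The abstract does not state how the TyG index is computed. The sketch below uses the definition commonly given in the literature, the natural logarithm of fasting triglycerides (mg/dL) times fasting glucose (mg/dL) divided by 2; treat that formula, the units, and the sample values as assumptions, since the study's own definition and its low/high risk cutoff are not given here.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """TyG index, assuming the common definition ln(TG * glucose / 2) in mg/dL."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("measurements must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


# Made-up fasting values, for illustration only.
print(round(tyg_index(150, 100), 2))
```

Because the index needs only two routine laboratory values, it is a practical target label for records held by a single general practitioner.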
|
85
|
Lanera C, Berchialla P, Baldi I, Lorenzoni G, Tramontan L, Scamarcia A, Cantarutti L, Giaquinto C, Gregori D. Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study. JMIR Med Inform 2020; 8:e14330. [PMID: 32369038 PMCID: PMC7238079 DOI: 10.2196/14330] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/10/2019] [Revised: 08/28/2019] [Accepted: 12/16/2019] [Indexed: 12/11/2022] [Open Access]
Abstract
Background The detection of infectious diseases through the analysis of free text on electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns. Objective The purpose of this paper is to compare machine learning techniques in their application to EHR analysis for disease detection. Methods The Pedianet database was used as a data source for a real-world scenario on the identification of cases of varicella. The models’ training and test sets were based on two different Italian regions’ (Veneto and Sicilia) data sets of 7631 patients and 1,230,355 records, and 2347 patients and 569,926 records, respectively, for whom a gold standard of varicella diagnosis was available. Elastic-net regularized generalized linear model (GLMNet), maximum entropy (MAXENT), and LogitBoost (boosting) algorithms were implemented in a supervised environment and 5-fold cross-validated. The document-term matrix generated by the training set involves a dictionary of 1,871,532 tokens. The analysis was conducted on a subset of 29,096 tokens, corresponding to a matrix with no more than a 99% sparsity ratio. Results The highest predictive values were achieved through boosting (positive predictive value [PPV] 63.1%, 95% CI 42.7-83.5 and negative predictive value [NPV] 98.8%, 95% CI 98.3-99.3). GLMNet delivered superior predictive capability compared to MAXENT (PPV 24.5% and NPV 98.3% vs PPV 11.0% and NPV 98.0%). MAXENT and GLMNet predictions weakly agree with each other (agreement coefficient 1 [AC1]=0.60, 95% CI 0.58-0.62), as well as with LogitBoost (MAXENT: AC1=0.64, 95% CI 0.63-0.66 and GLMNet: AC1=0.53, 95% CI 0.51-0.55). Conclusions Boosting has demonstrated promising performance in large-scale EHR-based infectious disease identification.
Affiliation(s)
- Corrado Lanera
  - Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Unit of Biostatistics, Epidemiology and Public Health, Padova, Italy
- Paola Berchialla
  - Department of Clinical and Biological Science, University of Turin, Torino, Italy
- Ileana Baldi
  - Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Unit of Biostatistics, Epidemiology and Public Health, Padova, Italy
- Giulia Lorenzoni
  - Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Unit of Biostatistics, Epidemiology and Public Health, Padova, Italy
- Carlo Giaquinto
  - Department of Women's and Children's Health, University of Padova, Padova, Italy
- Dario Gregori
  - Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Unit of Biostatistics, Epidemiology and Public Health, Padova, Italy
|
86
|
Park Y, Ho JC. CaliForest: Calibrated Random Forest for Health Data. Proceedings of the ACM Conference on Health, Inference, and Learning 2020; 2020:40-50. [PMID: 34308443 DOI: 10.1145/3368555.3384461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022]
Abstract
Real-world predictive models in healthcare should be evaluated in terms of discrimination, the ability to differentiate between high and low risk events, and calibration, or the accuracy of the risk estimates. Unfortunately, calibration is often neglected and only discrimination is analyzed. Calibration is crucial for personalized medicine, as these models play an increasing role in the decision-making process. Since random forest is a popular model for many healthcare applications, we propose CaliForest, a new calibrated random forest. Unlike existing calibration methodologies, CaliForest utilizes the out-of-bag samples to avoid the explicit construction of a calibration set. We evaluated CaliForest on two risk prediction tasks obtained from the publicly available MIMIC-III database. Evaluation on these binary prediction tasks demonstrates that CaliForest can achieve the same discriminative power as random forest while obtaining a better-calibrated model evaluated across six different metrics. CaliForest is published on the standard Python software repository and the code is openly available on GitHub.
|
87
|
Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nat Rev Genet 2020; 21:493-502. [PMID: 32235907 DOI: 10.1038/s41576-020-0224-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Accepted: 03/02/2020] [Indexed: 01/03/2023]
Abstract
Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction.
Affiliation(s)
- Ruowang Li
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA.
|
88
|
Zhang L, Wang Y, Niu M, Wang C, Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci Rep 2020; 10:4406. [PMID: 32157171 PMCID: PMC7064542 DOI: 10.1038/s41598-020-61123-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 09/11/2019] [Accepted: 02/19/2020] [Indexed: 01/19/2023] [Open Access]
Abstract
With the development of data mining, machine learning offers opportunities to improve discrimination by analyzing complex interactions among large numbers of variables. To test the ability of machine learning algorithms to predict risk of type 2 diabetes mellitus (T2DM) in a rural Chinese population, we focus on a total of 36,652 eligible participants from the Henan Rural Cohort Study. Risk assessment models for T2DM were developed using six machine learning algorithms: logistic regression (LR), classification and regression tree (CART), artificial neural networks (ANN), support vector machine (SVM), random forest (RF) and gradient boosting machine (GBM). Model performance was measured by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value and area under the precision-recall curve. The importance of variables was identified based on each classifier and the Shapley additive explanations approach. Using all available variables, all models for predicting risk of T2DM demonstrated strong predictive performance, with AUCs ranging between 0.811 and 0.872 using laboratory data and from 0.767 to 0.817 without laboratory data. Among them, the GBM model performed best (AUC: 0.872 with laboratory data and 0.817 without laboratory data). Performance of the models plateaued once 30 variables had been introduced into each model, except for the CART model. Among the top-10 variables across all methods were sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension. New important risk factors (urinary indicators, sweet flavor) that were not found by previous risk prediction methods were identified by machine learning in our study. These results show that machine learning methods are competent in predicting risk of T2DM, offering greater insight into disease risk factors with no a priori assumption of causality.
Affiliation(s)
- Liying Zhang
  - School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, P.R. China
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Yikang Wang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Miaomiao Niu
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Chongjian Wang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Zhenfei Wang
  - School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, P.R. China.
|
89
|
Bernardini M, Romeo L, Misericordia P, Frontoni E. Discovering the Type 2 Diabetes in Electronic Health Records Using the Sparse Balanced Support Vector Machine. IEEE J Biomed Health Inform 2020; 24:235-246. [DOI: 10.1109/jbhi.2019.2899218] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 11/08/2022]
|
90
|
Talaei-Khoei A, Tavana M, Wilson JM. A predictive analytics framework for identifying patients at risk of developing multiple medical complications caused by chronic diseases. Artif Intell Med 2019; 101:101750. [PMID: 31813486 DOI: 10.1016/j.artmed.2019.101750] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/14/2018] [Revised: 07/07/2019] [Accepted: 10/30/2019] [Indexed: 01/22/2023]
Abstract
Chronic diseases often cause several medical complications. This paper aims to predict multiple complications among patients with a chronic disease. The literature uses single-task learning algorithms to predict complications independently and assumes no correlation among complications of chronic diseases. We propose two methods (independent prediction of complications with single-task learning and concurrent prediction of complications with multi-task learning) and show that medical complications of chronic diseases can be correlated. We use a case study and compare the performance of these two methods by predicting complications of hypertrophic cardiomyopathy on 106 predictors in 1078 electronic medical records from April 2009-April 2017, inclusive. The methods are implemented using logistic regression, artificial neural networks, decision trees, and support vector machines. The results show multi-task learning with logistic regression improves the performance of predictions in terms of both discrimination and calibration.
Affiliation(s)
- Amir Talaei-Khoei
  - Department of Information Systems, University of Nevada, Reno, USA; School of Software, University of Technology Sydney, Australia.
- Madjid Tavana
  - Business Systems and Analytics Department, Distinguished Chair of Business Analytics, La Salle University, Philadelphia, USA; Business Information Systems Department, Faculty of Business Administration and Economics, University of Paderborn, Paderborn, Germany.
- James M Wilson
  - School of Community Health Sciences, University of Nevada, Reno, USA.
|
91
|
Lanera C, Berchialla P, Sharma A, Minto C, Gregori D, Baldi I. Screening PubMed abstracts: is class imbalance always a challenge to machine learning? Syst Rev 2019; 8:317. [PMID: 31810495 PMCID: PMC6896747 DOI: 10.1186/s13643-019-1245-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/21/2019] [Accepted: 11/25/2019] [Indexed: 11/17/2022] [Open Access]
Abstract
BACKGROUND The growing volume of medical literature and textual data in online repositories has led to an exponential increase in the workload of researchers involved in citation screening for systematic reviews. This work aims to combine machine learning techniques and data preprocessing for class imbalance to identify the best-performing strategy to screen articles in PubMed for inclusion in systematic reviews. METHODS We trained four binary text classifiers (support vector machines, k-nearest neighbor, random forest, and elastic-net regularized generalized linear models) in combination with four techniques for class imbalance: random undersampling and oversampling with 50:50 and 35:65 positive to negative class ratios, with no resampling as a benchmark. We used textual data of 14 systematic reviews as case studies. The difference between cross-validated area under the receiver operating characteristic curve (AUC-ROC) for machine learning techniques with and without preprocessing (delta AUC) was estimated within each systematic review, separately for each classifier. Meta-analytic fixed-effect models were used to pool delta AUCs separately by classifier and strategy. RESULTS Cross-validated AUC-ROC for machine learning techniques (excluding k-nearest neighbor) without preprocessing was predominantly above 90%. Except for k-nearest neighbor, machine learning techniques achieved the best improvement in conjunction with random oversampling 50:50 and random undersampling 35:65. CONCLUSIONS Resampling techniques slightly improved the performance of the investigated machine learning techniques. From a computational perspective, random undersampling 35:65 may be preferred.
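Random undersampling to a fixed positive:negative ratio, such as the 35:65 split named in the abstract above, can be sketched as follows. This is an illustration only, not the authors' code: the function name, the 0/1 label encoding, and the assumption that positives are the minority class are all hypothetical choices made for the example.

```python
import random


def undersample_indices(labels, positive_share, seed=0):
    """Return sorted row indices after randomly dropping negatives so that
    positives make up roughly `positive_share` of the kept rows.

    Assumes binary labels (1 = positive, 0 = negative) with positives as
    the minority class; all positives are kept.
    """
    if not 0.0 < positive_share < 1.0:
        raise ValueError("positive_share must be strictly between 0 and 1")
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Number of negatives that gives the requested class ratio.
    target = round(len(positives) * (1.0 - positive_share) / positive_share)
    keep = min(len(negatives), target)
    rng = random.Random(seed)  # seeded for reproducibility
    return sorted(positives + rng.sample(negatives, keep))


if __name__ == "__main__":
    # Toy screening set: 35 relevant abstracts among 1000 candidates.
    toy_labels = [1] * 35 + [0] * 965
    kept = undersample_indices(toy_labels, positive_share=0.35)
    n_pos = sum(toy_labels[i] for i in kept)
    print(len(kept), n_pos, len(kept) - n_pos)
```

Because undersampling discards rows rather than creating them, the resampled training set is smaller than the original, which is consistent with the abstract's remark that it may be preferred from a computational perspective.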
Affiliation(s)
- Corrado Lanera
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Via Loredan, 18, 35131, Padova, Italy
- Paola Berchialla
  - Department of Clinical and Biological Sciences, University of Torino, Torino, Italy
- Abhinav Sharma
  - Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, India
- Clara Minto
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Via Loredan, 18, 35131, Padova, Italy
- Dario Gregori
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Via Loredan, 18, 35131, Padova, Italy
- Ileana Baldi
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padova, Via Loredan, 18, 35131, Padova, Italy.
|
92
|
Alexander J, Edwards RA, Manca L, Grugni R, Bonfanti G, Emir B, Whalen E, Watt S, Brodsky M, Parsons B. Integrating Machine Learning With Microsimulation to Classify Hypothetical, Novel Patients for Predicting Pregabalin Treatment Response Based on Observational and Randomized Data in Patients With Painful Diabetic Peripheral Neuropathy. Pragmat Obs Res 2019; 10:67-76. [PMID: 31802967 PMCID: PMC6827520 DOI: 10.2147/por.s214412] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/04/2019] [Accepted: 10/15/2019] [Indexed: 11/23/2022] [Open Access]
Abstract
Purpose Variability in patient treatment responses can be a barrier to effective care. Utilization of available patient databases may improve the prediction of treatment responses. We evaluated machine learning methods to predict novel, individual patient responses to pregabalin for painful diabetic peripheral neuropathy, utilizing an agent-based modeling and simulation platform that integrates real-world observational study (OS) data and randomized clinical trial (RCT) data. Patients and methods The best supervised machine learning methods were selected (through literature review) and combined in a novel way for aligning patients with relevant subgroups that best enable prediction of pregabalin responses. Data were derived from a German OS of pregabalin (N=2642) and nine international RCTs (N=1320). Coarsened exact matching of OS and RCT patients was used and a hierarchical cluster analysis was implemented. We tested which machine learning methods would best align candidate patients with specific clusters that predict their pain scores over time. Cluster alignments would trigger assignments of cluster-specific time-series regressions with lagged variables as inputs in order to simulate "virtual" patients and generate 1000 trajectory variations for given novel patients. Results Instance-based machine learning methods (k-nearest neighbor, supervised fuzzy c-means) were selected for quantitative analyses. Each method alone correctly classified 56.7% and 39.1% of patients, respectively. An "ensemble method" (combining both methods) correctly classified 98.4% and 95.9% of patients in the training and testing datasets, respectively. Conclusion An ensemble combination of two instance-based machine learning techniques best accommodated different data types (dichotomous, categorical, continuous) and performed better than either technique alone in assigning novel patients to subgroups for predicting treatment outcomes using microsimulation. 
Assignment of novel patients to a cluster of similar patients has the potential to improve prediction of patient outcomes for chronic conditions in which initial treatment response can be incorporated using microsimulation. Clinical trial registries www.clinicaltrials.gov: NCT00156078, NCT00159679, NCT00143156, NCT00553475.
Affiliation(s)
- Joe Alexander
  - Global Medical Affairs, Pfizer Inc, New York, NY 10017, USA
- Roger A Edwards
  - Health Services Consulting Corporation, Boxborough, MA 01719, USA
- Birol Emir
  - Global Statistics, Pfizer Inc, New York, NY 10017, USA
- Ed Whalen
  - Global Statistics, Pfizer Inc, New York, NY 10017, USA
- Steve Watt
  - Global Medical Affairs, Pfizer Inc, New York, NY 10017, USA
- Marina Brodsky
  - Global Medical Affairs, Pfizer Inc, Groton, CT 06340, USA
- Bruce Parsons
  - Global Medical Product Evaluation, Pfizer Inc, New York, NY 10017, USA
|
93
|
Nguyen BP, Pham HN, Tran H, Nghiem N, Nguyen QH, Do TTT, Tran CT, Simpson CR. Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records. Computer Methods and Programs in Biomedicine 2019; 182:105055. [PMID: 31505379 DOI: 10.1016/j.cmpb.2019.105055] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 03/07/2019] [Revised: 08/17/2019] [Accepted: 08/27/2019] [Indexed: 06/10/2023]
Abstract
OBJECTIVE Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Currently, methods of treating diabetes are inadequate and costly, so prevention becomes an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs) for each individual or a population have become important tools in understanding developing trends of diseases. Using EHRs to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model that combines the strength of a generalised linear model with various features and a deep feed-forward neural network to improve the prediction of the onset of type 2 diabetes mellitus (T2DM). MATERIALS AND METHODS The proposed method was implemented by training various models with a logistic loss function using stochastic gradient descent. We applied this model using public hospital record data provided by the Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, of which 1904 have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data obtained from previous years (2009-2011). Class imbalance was handled with the Synthetic Minority Oversampling Technique (SMOTE), applied to each cross-validation training fold, to analyse the performance when synthetic examples for the minority class are created. We used SMOTE of 150 and 300 percent, in which 300 percent means that three new synthetic instances are created for each minority class instance. This results in approximate diabetes:non-diabetes distributions in the training set of 1:2 and 1:1, respectively. RESULTS Our final ensemble model not using SMOTE obtained an accuracy of 84.28%, area under the receiver operating characteristic curve (AUC) of 84.13%, sensitivity of 31.17% and specificity of 96.85%. 
Using SMOTE of 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively) with a moderate decrease in specificity (90.16% and 76.59%, respectively). DISCUSSION AND CONCLUSIONS Our algorithm has further optimised the prediction of diabetes onset using a novel state-of-the-art machine learning algorithm: the wide and deep learning neural network architecture.
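The 1:2 and 1:1 class ratios quoted in the abstract above can be checked with simple arithmetic. The sketch below uses the whole-cohort counts from the abstract (9948 patients, 1904 with T2DM) as a stand-in for a training fold, which is an assumption: the study applied SMOTE per cross-validation fold, so the real fold counts are smaller, though the ratios should be similar. The function name is hypothetical.

```python
def smote_class_counts(n_minority: int, n_majority: int, smote_percent: int):
    """Return (minority count after oversampling, majority-per-minority ratio).

    `smote_percent` follows the abstract's convention: 300 means three
    synthetic instances are created per original minority instance.
    """
    synthetic = n_minority * smote_percent // 100
    minority_total = n_minority + synthetic
    return minority_total, n_majority / minority_total


if __name__ == "__main__":
    n_pos = 1904            # patients diagnosed with T2DM
    n_neg = 9948 - n_pos    # remaining patients: 8044
    for pct in (150, 300):
        total, ratio = smote_class_counts(n_pos, n_neg, pct)
        print(f"SMOTE {pct}%: {total} positives, 1:{ratio:.2f}")
```

With these counts the ratios come out near 1:1.69 and 1:1.06, which matches the abstract's description of them as approximately 1:2 and 1:1.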
Affiliation(s)
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand.
- Hung N Pham
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Hop Tran
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Nhung Nghiem
  - Department of Public Health, University of Otago, 23A Mein Street, Wellington 6021, New Zealand
- Quang H Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Trang T T Do
  - Institute for Infocomm Research, Agency for Science, Technology and Research, 1 Fusionopolis Way, Singapore 138632, Singapore
- Cao Truong Tran
  - Faculty of Information Technology, Le Quy Don Technical University, 236 Hoang Quoc Viet Street, Hanoi 100000, Vietnam
- Colin R Simpson
  - Faculty of Health, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand; Usher Institute, The University of Edinburgh, Edinburgh, EH89AG, United Kingdom
|
94
|
Dong Y, Xu L, Fan Y, Xiang P, Gao X, Chen Y, Zhang W, Ge Q. A novel surgical predictive model for Chinese Crohn's disease patients. Medicine (Baltimore) 2019; 98:e17510. [PMID: 31725605 PMCID: PMC6867775 DOI: 10.1097/md.0000000000017510] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023] [Open Access]
Abstract
Due to the complexity of Crohn's disease (CD), it is difficult to predict disease course with a single stratification factor or biomarker. A logistic regression (LR) model has been proposed by Guizzetti et al to stratify patients with CD-related surgical risk, which could help decision-making on disease treatment. However, no relevant studies have been reported in the Chinese population. The aim of the study is to present and validate a novel surgical predictive model to facilitate therapeutic decision-making for Chinese CD patients. Data were extracted from retrospective full-mode electronic medical records, which contained 239 CD patients and 1524 instances. Two sub-datasets were generated according to different attribute selection strategies, both of which were split into training and testing sets randomly. The imbalanced data in the training sets were addressed by the synthetic minority over-sampling technique (SMOTE) algorithm before model development. Seven predictive models were employed using 5 popular machine learning algorithms: random forest (RF), LR, support vector machine (SVM), decision tree (DT) and artificial neural networks (ANN). The performance of each model was evaluated by accuracy, precision, F1-score, true negative (TN) rate, and the area under the receiver operating characteristic curve (AuROC). The results revealed that RF outperformed all other baseline models on both sub-datasets. The 10 leading risk factors for CD-related surgery returned from RF for attribute ranking were changes of radiology, presence of a fistula, presence of an abscess, no infliximab use, enteroscopy findings, C-reactive protein, abdominal pain, white blood cells, erythrocyte sedimentation rate and platelet count. The proposed machine learning model can accurately predict the risk of surgical intervention in Chinese CD patients, which could be used to tailor and modify the treatment strategies for CD patients in clinical practice.
Affiliation(s)
- Li Xu
  - Department of Anorectal Surgery
- Ping Xiang
  - Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University, Zhejiang Provincial Hospital of TCM, Zhejiang International Exchange Center of Clinical TCM
- Xuning Gao
  - Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University, Zhejiang Provincial Hospital of TCM, Zhejiang International Exchange Center of Clinical TCM
- Yong Chen
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou 310018, China
- Wenyu Zhang
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou 310018, China
|
95
|
Abhari S, Niakan Kalhori SR, Ebrahimi M, Hasannejadasl H, Garavand A. Artificial Intelligence Applications in Type 2 Diabetes Mellitus Care: Focus on Machine Learning Methods. Healthc Inform Res 2019; 25:248-261. [PMID: 31777668 PMCID: PMC6859270 DOI: 10.4258/hir.2019.25.4.248] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 05/14/2019] [Revised: 10/06/2019] [Accepted: 10/09/2019] [Indexed: 12/18/2022] [Open Access]
Abstract
Objectives The incidence of type 2 diabetes mellitus has increased significantly in recent years. Artificial intelligence applications in healthcare are increasingly used for diagnosis, therapeutic decision making, and outcome prediction, especially in type 2 diabetes mellitus. This study aimed to identify the artificial intelligence (AI) applications for type 2 diabetes mellitus care. Methods This is a review conducted in 2018. We searched the PubMed, Web of Science, and Embase scientific databases, based on a combination of related MeSH terms. The article selection process was based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Finally, 31 articles were selected after inclusion and exclusion criteria were applied. Data gathering was done by using a data extraction form. Data were summarized and reported based on the study objectives. Results The main applications of AI for type 2 diabetes mellitus care were screening and diagnosis in different stages. Among all of the reviewed AI methods, machine learning methods, at 71% (n = 22), were the most commonly applied techniques. Many applications took a multimethod form (23%). Among the machine learning algorithms applied, support vector machine (21%) and naive Bayesian (19%) were the most commonly used methods. The most important variables that were used in the selected studies were body mass index, fasting blood sugar, blood pressure, HbA1c, triglycerides, low-density lipoprotein, high-density lipoprotein, and demographic variables. Conclusions It is recommended to select optimal algorithms by testing various techniques. Support vector machine and naive Bayesian might achieve better performance than other applications due to the type of variables and targets in diabetes-related outcomes classification.
Affiliation(s)
- Shahabeddin Abhari
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Sharareh R Niakan Kalhori
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Mehdi Ebrahimi
  - Department of Internal Medicine, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
  - Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Hajar Hasannejadasl
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Ali Garavand
  - Department of Health Information Management and Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
96
|
Kim J, Chang H, Kim D, Jang DH, Park I, Kim K. Machine learning for prediction of septic shock at initial triage in emergency department. J Crit Care 2019; 55:163-170. [PMID: 31734491 DOI: 10.1016/j.jcrc.2019.09.024] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 01/28/2019] [Revised: 09/05/2019] [Accepted: 09/23/2019] [Indexed: 12/23/2022]
Abstract
BACKGROUND We hypothesized that utilizing machine learning (ML) algorithms for screening septic shock in the emergency department (ED) would provide better accuracy than qSOFA or MEWS. METHODS The study population was adult (≥20 years) patients visiting the ED for suspected infection. The target event was septic shock within 24 h after arrival. Demographics, vital signs, level of consciousness, chief complaints (CC) and initial blood test results were used as predictors. CC were embedded into a 16-dimensional vector space using singular value decomposition. Six base learners (support vector machine, gradient-boosting machine, random forest, multivariate adaptive regression splines, least absolute shrinkage and selection operator, and ridge regression) and their ensembles were tested. We also trained and tested MLP networks with various settings. RESULTS A total of 49,560 patients were included and 4817 (9.7%) had septic shock within 24 h. All ML classifiers significantly outperformed the qSOFA score, MEWS and their age-sex adjusted versions, with AUROCs ranging from 0.883 to 0.929. The ensembles of the base classifiers showed the best performance, and the addition of CC embedding was associated with statistically significant increases in performance. CONCLUSIONS ML classifiers significantly outperform clinical scores in screening for septic shock at ED triage.
Affiliation(s)
- Joonghee Kim
- Department of Emergency Medicine, Seoul National University Bundang Hospital, 166 Gumi-ro, Bundang-gu, Gyeonggi-do, Seongnam-si 463-707, Republic of Korea
- HyungLan Chang
- Department of Emergency Medicine, CHA Bundang Medical Center, CHA University, 59, Yatap-ro, Bundang-gu, Gyeonggi-do, Seongnam-si 463-712, Republic of Korea
- Doyun Kim
- Department of Emergency Medicine, Seoul National University Bundang Hospital, 166 Gumi-ro, Bundang-gu, Gyeonggi-do, Seongnam-si 463-707, Republic of Korea
- Dong-Hyun Jang
- Department of Emergency Medicine, Seoul National University Bundang Hospital, 166 Gumi-ro, Bundang-gu, Gyeonggi-do, Seongnam-si 463-707, Republic of Korea
- Inwon Park
- Department of Emergency Medicine, Seoul National University Bundang Hospital, 166 Gumi-ro, Bundang-gu, Gyeonggi-do, Seongnam-si 463-707, Republic of Korea
- Kyuseok Kim
- College of Medicine, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, Republic of Korea
|
97
|
Gilvary C, Madhukar N, Elkhader J, Elemento O. The Missing Pieces of Artificial Intelligence in Medicine. Trends Pharmacol Sci 2019; 40:555-564. [DOI: 10.1016/j.tips.2019.06.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/13/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 12/22/2022]
|
98
|
Bernardini M, Morettini M, Romeo L, Frontoni E, Burattini L. TyG-er: An ensemble Regression Forest approach for identification of clinical factors related to insulin resistance condition using Electronic Health Records. Comput Biol Med 2019; 112:103358. [PMID: 31336327 DOI: 10.1016/j.compbiomed.2019.103358] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/21/2019] [Revised: 06/17/2019] [Accepted: 07/15/2019] [Indexed: 01/19/2023]
Abstract
BACKGROUND Insulin resistance is an early-stage deterioration of Type 2 diabetes. Identification and quantification of insulin resistance requires specific blood tests; however, the triglyceride-glucose (TyG) index can provide a surrogate assessment from routine Electronic Health Record (EHR) data. Since insulin resistance is a multi-factorial condition, this study aims to improve its characterisation by discovering non-trivial clinical factors in EHR data and determining where the insulin-resistance condition is encoded. METHODS We proposed a highly interpretable Machine Learning approach (i.e., ensemble Regression Forest combined with data imputation strategies), named TyG-er. We applied three different experimental procedures to test TyG-er reliability on the Italian Federation of General Practitioners dataset, named FIMMG_obs dataset, which is publicly available and reflects the clinical use-case (i.e., not all laboratory exams are prescribed on a regular basis over time). RESULTS Results detected non-conventional clinical factors (i.e., uricemia, leukocytes, gamma-glutamyltransferase and protein profile) and provided novel insight into the best combination of clinical factors for detecting early glucose tolerance deterioration. The robustness of these extracted clinical factors was confirmed by the high agreement (Lin's correlation coefficient (rc) from 0.664 to 0.911) of the TyG-er approach among different experimental procedures. Moreover, the results of the three experimental procedures outlined the predictive power of the TyG-er approach (up to a mean absolute error of 5.68% and rc = 0.666, p < .05). CONCLUSIONS The TyG-er approach is able to carry information about the identification of the TyG index, strictly correlated with the insulin-resistance condition, while extracting the most relevant non-glycemic features from routine data.
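Two quantities named in this abstract can be written down compactly. The sketch below assumes the commonly cited definition of the TyG index, ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2); the abstract does not give the formula, so the exact variant used by the authors is an assumption. Lin's concordance correlation coefficient follows its usual population-moment definition. The function names `tyg_index` and `lin_ccc` are hypothetical.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index, assuming the form ln(TG * glucose / 2)
    with both fasting values in mg/dL."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length series:
    rc = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using 1/n moments."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Unlike Pearson's correlation, `lin_ccc` penalises a systematic offset between the two series, so perfectly correlated but shifted predictions score below 1.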
|
99
|
|
100
|
A Comprehensive Medical Decision–Support Framework Based on a Heterogeneous Ensemble Classifier for Diabetes Prediction. ELECTRONICS 2019. [DOI: 10.3390/electronics8060635] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/20/2022]
Abstract
Early diagnosis of diabetes mellitus (DM) is critical to prevent its serious complications. An ensemble of classifiers is an effective way to enhance classification performance, which can be used to diagnose complex diseases, such as DM. This paper proposes an ensemble framework to diagnose DM by optimally employing multiple classifiers based on bagging and random subspace techniques. The proposed framework combines seven of the most suitable and heterogeneous data mining techniques, each with a separate set of suitable features. These techniques are k-nearest neighbors, naïve Bayes, decision tree, support vector machine, fuzzy decision tree, artificial neural network, and logistic regression. The framework is designed by selecting, for every sub-dataset, the most suitable feature set and the most accurate classifier. It was evaluated using a real dataset collected from electronic health records of Mansura University Hospitals (Mansura, Egypt). The resulting framework achieved 90% accuracy, 90.2% recall, and 94.9% precision. We evaluated and compared the proposed framework with many other classification algorithms. An analysis of the results indicated that the proposed ensemble framework significantly outperforms all other classifiers. It is a successful step towards constructing a personalized decision support system, which could help physicians in daily clinical practice.
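The combination of bagging and random subspaces described in this abstract can be sketched in a few lines. This is only an illustration of the general technique, not the paper's framework: every ensemble member here is a simple nearest-centroid classifier rather than one of the seven heterogeneous techniques listed, bootstrap resampling is done within each class so that no member loses a class, and all function names are hypothetical.

```python
import random
from collections import Counter


def train_member(X, y, n_features, rng):
    """Train one nearest-centroid member on a random feature subset
    (random subspace) of a per-class bootstrap resample (bagging)."""
    features = rng.sample(range(len(X[0])), n_features)
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        boot = [rng.choice(rows) for _ in rows]  # sample with replacement
        centroids[label] = [sum(r[f] for r in boot) / len(boot) for f in features]
    return features, centroids


def predict_member(member, x):
    """Return the label whose centroid is closest on the member's feature subset."""
    features, centroids = member

    def sq_dist(centroid):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroid))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def predict_ensemble(members, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(predict_member(m, x) for m in members)
    return votes.most_common(1)[0][0]
```

A usage example would build several members with a seeded `random.Random` instance, for instance `[train_member(X, y, 2, rng) for _ in range(7)]`, and pass them to `predict_ensemble`; using an odd number of members avoids tied votes in the two-class case.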
|