1
Liu K, Li L, Ma Y, Jiang J, Liu Z, Ye Z, Liu S, Pu C, Chen C, Wan Y. Machine Learning Models for Blood Glucose Level Prediction in Patients With Diabetes Mellitus: Systematic Review and Network Meta-Analysis. JMIR Med Inform 2023; 11:e47833. [PMID: 37983072 PMCID: PMC10696506 DOI: 10.2196/47833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 08/21/2023] [Accepted: 10/12/2023] [Indexed: 11/21/2023]
Abstract
BACKGROUND Machine learning (ML) models provide more choices to patients with diabetes mellitus (DM) to more properly manage blood glucose (BG) levels. However, because of the numerous types of ML algorithms, choosing an appropriate model is vitally important. OBJECTIVE In a systematic review and network meta-analysis, this study aimed to comprehensively assess the performance of ML models in predicting BG levels. In addition, we assessed ML models used to detect and predict adverse BG (hypoglycemia) events by calculating pooled estimates of sensitivity and specificity. METHODS PubMed, Embase, Web of Science, and Institute of Electrical and Electronics Engineers (IEEE) Xplore databases were systematically searched for studies on predicting BG levels and predicting or detecting adverse BG events using ML models, from inception to November 2022. Studies that assessed the performance of different ML models in predicting or detecting BG levels or adverse BG events of patients with DM were included. Studies with no derivation or performance metrics of ML models were excluded. The Quality Assessment of Diagnostic Accuracy Studies tool was applied to assess the quality of included studies. Primary outcomes were the relative ranking of ML models for predicting BG levels in different prediction horizons (PHs) and pooled estimates of the sensitivity and specificity of ML models in detecting or predicting adverse BG events. RESULTS In total, 46 eligible studies were included for meta-analysis. Regarding ML models for predicting BG levels, the means of the absolute root mean square error (RMSE) in a PH of 15, 30, 45, and 60 minutes were 18.88 (SD 19.71), 21.40 (SD 12.56), 21.27 (SD 5.17), and 30.01 (SD 7.23) mg/dL, respectively. The neural network model (NNM) showed the highest relative performance in different PHs.
Furthermore, the pooled estimates of the positive likelihood ratio and the negative likelihood ratio of ML models were 8.3 (95% CI 5.7-12.0) and 0.31 (95% CI 0.22-0.44), respectively, for predicting hypoglycemia and 2.4 (95% CI 1.6-3.7) and 0.37 (95% CI 0.29-0.46), respectively, for detecting hypoglycemia. CONCLUSIONS Statistically significant high heterogeneity was detected in all subgroups, with different sources of heterogeneity. For predicting precise BG levels, the RMSE increases with a rise in the PH, and the NNM shows the highest relative performance among all the ML models. Meanwhile, current ML models have sufficient ability to predict adverse BG events, while their ability to detect adverse BG events needs to be enhanced. TRIAL REGISTRATION PROSPERO CRD42022375250; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=375250.
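The RMSE figures quoted in the results are averages of per-study errors in mg/dL. As a reminder of what the metric measures, here is a minimal sketch of the calculation for one set of paired readings. The function name `rmse` and the sample readings are illustrative assumptions, not data from the review.

```python
import math


def rmse(actual_mg_dl, predicted_mg_dl):
    """Root mean square error between paired glucose readings, in mg/dL."""
    if len(actual_mg_dl) != len(predicted_mg_dl) or not actual_mg_dl:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual_mg_dl, predicted_mg_dl)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical readings, not taken from any study in the review.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [103.0, 106.0, 120.0, 135.0]
print(round(rmse(actual, predicted), 2))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is worth keeping in mind when comparing values across prediction horizons.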
Affiliation(s)
- Kui Liu
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Linyi Li
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Yifei Ma
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Jun Jiang
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Zhenhua Liu
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Zichen Ye
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Shuang Liu
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Chen Pu
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
- Changsheng Chen
- Department of Health Statistics, Air Force Medical University, Xi'an, Shaanxi, China
- Yi Wan
- Department of Health Service, Air Force Medical University, Xi'an, Shaanxi, China
2
Yang H, Li J, Liu S, Yang X, Liu J. Predicting Risk of Hypoglycemia in Patients With Type 2 Diabetes by Electronic Health Record-Based Machine Learning: Development and Validation. JMIR Med Inform 2022; 10:e36958. [PMID: 35708754 PMCID: PMC9247813 DOI: 10.2196/36958] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/31/2022] [Revised: 05/08/2022] [Accepted: 05/31/2022] [Indexed: 02/05/2023]
Abstract
BACKGROUND Hypoglycemia is a common adverse event in the treatment of diabetes. To efficiently cope with hypoglycemia, effective hypoglycemia prediction models need to be developed. OBJECTIVE The aim of this study was to develop and validate machine learning models to predict the risk of hypoglycemia in adult patients with type 2 diabetes. METHODS We used the electronic health records of all adult patients with type 2 diabetes admitted to West China Hospital between November 2019 and December 2021. The prediction model was developed based on XGBoost and natural language processing. F1 score, area under the receiver operating characteristic curve (AUC), and decision curve analysis (DCA) were used as the main criteria to evaluate model performance. RESULTS We included 29,843 patients with type 2 diabetes, of whom 2804 patients (9.4%) developed hypoglycemia. In this study, the embedding machine learning model (XGBoost3) showed the best performance among all the models. The AUC and the accuracy of XGBoost were 0.82 and 0.93, respectively. XGBoost3 was also superior to the other models in DCA. CONCLUSIONS The Paragraph Vector-Distributed Memory model can effectively extract features and improve the performance of the XGBoost model, which can then effectively predict hypoglycemia in patients with type 2 diabetes.
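With 2804 of 29,843 patients (9.4%) developing hypoglycemia, the classes are imbalanced, which is one plausible reason to report F1 and AUC alongside accuracy. The sketch below uses only the counts given in the abstract. The helper `confusion_metrics` and the "predict no hypoglycemia for everyone" baseline are illustrative assumptions, not part of the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts reported in the abstract.
TOTAL_PATIENTS = 29843
HYPOGLYCEMIA_CASES = 2804

# Hypothetical trivial baseline: label every patient as "no hypoglycemia".
baseline = confusion_metrics(
    tp=0,
    fp=0,
    tn=TOTAL_PATIENTS - HYPOGLYCEMIA_CASES,
    fn=HYPOGLYCEMIA_CASES,
)
print(round(baseline["accuracy"], 3), baseline["f1"])
```

A baseline with no predictive value already reaches an accuracy of about 0.906 on these counts while its F1 is 0, so the reported accuracy of 0.93 is best read together with the AUC and F1.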
Affiliation(s)
- Hao Yang
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Jiaxi Li
- Department of Clinical Laboratory Medicine, Jinniu Maternity and Child Health Hospital of Chengdu, Chengdu, China
- Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Xiaoling Yang
- West China School of Nursing, Endocrinology and Metabolism Department, West China Hospital, Sichuan University, Chengdu, China
- Jialin Liu
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Medical Informatics, West China Medical School, Chengdu, China
3
Bright RA, Rankin SK, Dowdy K, Blok SV, Bright SJ, Palmer LAM. Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method. JMIRx Med 2021; 2:e27017. [PMID: 37725533 PMCID: PMC10414364 DOI: 10.2196/27017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2021] [Revised: 04/03/2021] [Accepted: 05/01/2021] [Indexed: 09/21/2023]
Abstract
BACKGROUND Big data tools provide opportunities to monitor adverse events (patient harm associated with medical care) (AEs) in the unstructured text of electronic health care records (EHRs). Writers may explicitly state an apparent association between treatment and adverse outcome ("attributed") or state the simple treatment and outcome without an association ("unattributed"). Many methods for finding AEs in text rely on predefining possible AEs before searching for prespecified words and phrases or manual labeling (standardization) by investigators. We developed a method to identify possible AEs, even if unknown or unattributed, without any prespecifications or standardization of notes. Our method was inspired by word-frequency analysis methods used to uncover the true authorship of disputed works credited to William Shakespeare. We chose two use cases, "transfusion" and "time-based." Transfusion was chosen because new transfusion AE types were becoming recognized during the study data period; therefore, we anticipated an opportunity to find unattributed potential AEs (PAEs) in the notes. With the time-based case, we wanted to simulate near real-time surveillance. We chose time periods in the hope of detecting PAEs due to contaminated heparin from mid-2007 to mid-2008 that were announced in early 2008. We hypothesized that the prevalence of contaminated heparin may have been widespread enough to manifest in EHRs through symptoms related to heparin AEs, independent of clinicians' documentation of attributed AEs. OBJECTIVE We aimed to develop a new method to identify attributed and unattributed PAEs using the unstructured text of EHRs. METHODS We used EHRs for adult critical care admissions at a major teaching hospital (2001-2012). For each case, we formed a group of interest and a comparison group. We concatenated the text notes for each admission into one document sorted by date, and deleted replicate sentences and lists. 
We identified statistically significant words in the group of interest versus the comparison group. Documents in the group of interest were filtered to those words, followed by topic modeling on the filtered documents to produce topics. For each topic, the three documents with the maximum topic scores were manually reviewed to identify PAEs. RESULTS Topics centered around medical conditions that were unique to or more common in the group of interest, including PAEs. In each use case, most PAEs were unattributed in the notes. Among the transfusion PAEs was unattributed evidence of transfusion-associated cardiac overload and transfusion-related acute lung injury. Some of the PAEs from mid-2007 to mid-2008 were increased unattributed events consistent with AEs related to heparin contamination. CONCLUSIONS The Shakespeare method could be a useful supplement to AE reporting and surveillance of structured EHR data. Future improvements should include automation of the manual review process.
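The abstract does not say which statistical test was used to pick out words that distinguish the group of interest from the comparison group. As one possible illustration of that step only, the sketch below applies a Pearson chi-square test to per-document word presence. The test choice, the 3.841 cutoff (df = 1, alpha = 0.05), the function names, and the toy notes are all assumptions. The topic modeling and manual review steps are not shown.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def overrepresented_words(interest_docs, comparison_docs, critical_value=3.841):
    """Words present in a larger share of interest documents, above the cutoff."""
    interest_sets = [set(doc.lower().split()) for doc in interest_docs]
    comparison_sets = [set(doc.lower().split()) for doc in comparison_docs]
    n_interest, n_comparison = len(interest_sets), len(comparison_sets)
    selected = {}
    for word in set().union(*interest_sets):
        a = sum(word in s for s in interest_sets)    # interest docs with word
        c = sum(word in s for s in comparison_sets)  # comparison docs with word
        b, d = n_interest - a, n_comparison - c      # docs without the word
        more_common = a * n_comparison > c * n_interest
        statistic = chi_square_2x2(a, b, c, d)
        if more_common and statistic > critical_value:
            selected[word] = statistic
    return selected


# Toy notes, invented for illustration; far too few for a valid test.
interest = [
    "dyspnea after transfusion",
    "transfusion then dyspnea and edema",
    "dyspnea noted",
    "stable overnight",
]
comparison = ["stable overnight", "stable and alert", "alert overnight", "stable"]
result = overrepresented_words(interest, comparison)
print(result)
```

With real data, each admission's concatenated notes would serve as one document, and a correction for testing many words at once would probably be needed.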
Affiliation(s)
- Roselie A Bright
- US Food and Drug Administration, Silver Spring, MD, United States
- Susan J Bright
- US Food and Drug Administration, Rockville, MD, United States
4
Pilla SJ, Park J, Schwartz JL, Albert MC, Ephraim PL, Boulware LE, Mathioudakis NN, Maruthur NM, Beach MC, Greer RC. Hypoglycemia Communication in Primary Care Visits for Patients with Diabetes. J Gen Intern Med 2021; 36:1533-1542. [PMID: 33479925 PMCID: PMC8175615 DOI: 10.1007/s11606-020-06385-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/14/2020] [Accepted: 12/02/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Hypoglycemia is a common and serious adverse effect of diabetes treatment, especially for patients using insulin or insulin secretagogues. Guidelines recommend that these patients be assessed for interval hypoglycemic events at each clinical encounter and be provided anticipatory guidance for hypoglycemia prevention. OBJECTIVE To determine the frequency and content of hypoglycemia communication in primary care visits. DESIGN Qualitative study. PARTICIPANTS We examined 83 primary care visits from one urban health practice representing 8 clinicians and 33 patients using insulin or insulin secretagogues. APPROACH Using a directed content analysis approach, we analyzed audio-recorded primary care visits collected as part of the Achieving Blood Pressure Control Together study, a randomized trial of behavioral interventions for hypertension. The coding framework included communication about interval hypoglycemia, defined as discussion of hypoglycemic events or symptoms; the components of hypoglycemia anticipatory guidance in diabetes guidelines; and hypoglycemia unawareness. Hypoglycemia documentation in visit notes was compared to visit transcripts. KEY RESULTS Communication about interval hypoglycemia occurred in 24% of visits, and hypoglycemic events were reported in 16%. Despite patients voicing fear of hypoglycemia, clinicians rarely assessed hypoglycemia frequency, severity, or its impact on quality of life. Hypoglycemia anticipatory guidance was provided in 21% of visits and focused on diet and behavior change; clinicians rarely counseled on hypoglycemia treatment or avoidance of driving. Limited discussions of hypoglycemia unawareness occurred in 8% of visits. Documentation in visit notes had low sensitivity but high specificity for ascertaining interval hypoglycemia communication or hypoglycemic events, compared to visit transcripts.
CONCLUSIONS In this high hypoglycemia risk population, communication about interval hypoglycemia and counseling for hypoglycemia prevention occurred in a minority of visits. There is a need to support clinicians to more regularly assess their patients' hypoglycemia burden and enhance counseling practices in order to optimize hypoglycemia prevention in primary care.
Affiliation(s)
- Scott J Pilla
- Department of Medicine, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Welch Center for Prevention, Epidemiology & Clinical Research, Baltimore, MD, USA
- Jenny Park
- Department of Medicine, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica L Schwartz
- Department of Medicine, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Michael C Albert
- Department of Medicine, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Johns Hopkins Community Physicians, Johns Hopkins University, Baltimore, MD, USA
- Patti L Ephraim
- Welch Center for Prevention, Epidemiology & Clinical Research, Baltimore, MD, USA
- Department of Epidemiology, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- L Ebony Boulware
- Division of General Internal Medicine, Duke University, Durham, NC, USA
- Nestoras N Mathioudakis
- Department of Medicine, Division of Endocrinology, Diabetes, & Metabolism, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nisa M Maruthur
- Department of Medicine, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Welch Center for Prevention, Epidemiology & Clinical Research, Baltimore, MD, USA
- Department of Epidemiology, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Mary Catherine Beach
- Department of Medicine, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Welch Center for Prevention, Epidemiology & Clinical Research, Baltimore, MD, USA
- Department of Health, Behavior & Society, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Raquel C Greer
- Department of Medicine, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Welch Center for Prevention, Epidemiology & Clinical Research, Baltimore, MD, USA
- Department of Epidemiology, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
5
Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021; 15:553-560. [PMID: 33736486 PMCID: PMC8120048 DOI: 10.1177/19322968211000831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
BACKGROUND Real-world evidence research plays an increasingly important role in diabetes care. However, a large fraction of real-world data are "locked" in narrative format. Natural language processing (NLP) technology offers a solution for analysis of narrative electronic data. METHODS We conducted a systematic review of studies of NLP technology focused on diabetes. Articles published prior to June 2020 were included. RESULTS We included 38 studies in the analysis. The majority (24; 63.2%) described only development of NLP tools; the remainder used NLP tools to conduct clinical research. A large fraction (17; 44.7%) of studies focused on identification of patients with diabetes; the rest covered a broad range of subjects that included hypoglycemia, lifestyle counseling, diabetic kidney disease, insulin therapy and others. The mean F1 score for all studies where it was available was 0.882. It tended to be lower (0.817) in studies of more linguistically complex concepts. Seven studies reported findings with potential implications for improving delivery of diabetes care. CONCLUSION Research in NLP technology to study diabetes is growing quickly, although challenges (e.g. in analysis of more linguistically complex concepts) remain. Its potential to deliver evidence on treatment and improving quality of diabetes care is demonstrated by a number of studies. Further growth in this area would be aided by deeper collaboration between developers and end-users of natural language processing tools as well as by broader sharing of the tools themselves and related resources.
Affiliation(s)
- Alexander Turchin
- Brigham and Women’s Hospital, Boston, MA, USA
- Alexander Turchin, MD, MS, Brigham and Women’s Hospital, 221 Longwood Avenue, Boston, MA 02115, USA.
6
Kodama S, Fujihara K, Shiozaki H, Horikawa C, Yamada MH, Sato T, Yaguchi Y, Yamamoto M, Kitazawa M, Iwanaga M, Matsubayashi Y, Sone H. Ability of Current Machine Learning Algorithms to Predict and Detect Hypoglycemia in Patients With Diabetes Mellitus: Meta-analysis. JMIR Diabetes 2021; 6:e22458. [PMID: 33512324 PMCID: PMC7880810 DOI: 10.2196/22458] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/13/2020] [Revised: 11/09/2020] [Accepted: 12/07/2020] [Indexed: 12/12/2022]
Abstract
Background Machine learning (ML) algorithms have been widely introduced to diabetes research including those for the identification of hypoglycemia. Objective The objective of this meta-analysis is to assess the current ability of ML algorithms to detect hypoglycemia (ie, alert to hypoglycemia coinciding with its symptoms) or predict hypoglycemia (ie, alert to hypoglycemia before its symptoms have occurred). Methods Electronic literature searches (from January 1, 1950, to September 14, 2020) were conducted using the Dialog platform that covers 96 databases of peer-reviewed literature. Included studies had to train the ML algorithm in order to build a model to detect or predict hypoglycemia and test its performance. The set of 2 × 2 data (ie, number of true positives, false positives, true negatives, and false negatives) was pooled with a hierarchical summary receiver operating characteristic model. Results A total of 33 studies (14 studies for detecting hypoglycemia and 19 studies for predicting hypoglycemia) were eligible. For detection of hypoglycemia, pooled estimates (95% CI) of sensitivity, specificity, positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were 0.79 (0.75-0.83), 0.80 (0.64-0.91), 8.05 (4.79-13.51), and 0.18 (0.12-0.27), respectively. For prediction of hypoglycemia, pooled estimates (95% CI) were 0.80 (0.72-0.86) for sensitivity, 0.92 (0.87-0.96) for specificity, 10.42 (5.82-18.65) for PLR, and 0.22 (0.15-0.31) for NLR. Conclusions Current ML algorithms have insufficient ability to detect ongoing hypoglycemia and considerable ability to predict impending hypoglycemia in patients with diabetes mellitus using hypoglycemic drugs with regard to diagnostic tests in accordance with the Users’ Guide to Medical Literature (PLR should be ≥5 and NLR should be ≤0.2 for moderate reliability).
However, it should be emphasized that the clinical applicability of these ML algorithms should be evaluated according to patients’ risk profiles such as for hypoglycemia and its associated complications (eg, arrhythmia, neuroglycopenia) as well as the average ability of the ML algorithms. Continued research is required to develop more accurate ML algorithms than those that currently exist and to enhance the feasibility of applying ML in clinical settings. Trial Registration PROSPERO International Prospective Register of Systematic Reviews CRD42020163682; http://www.crd.york.ac.uk/PROSPERO/display_record.php?ID=CRD42020163682
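For readers less familiar with likelihood ratios, the standard definitions are PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity. The sketch below applies them to the pooled sensitivity and specificity reported for prediction. The function name `likelihood_ratios` is an assumption, and the output is a simple point calculation rather than a reproduction of the hierarchical pooling used in the meta-analysis.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR) from a sensitivity and specificity pair."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


# Pooled point estimates for predicting hypoglycemia, as quoted in the abstract.
plr, nlr = likelihood_ratios(sensitivity=0.80, specificity=0.92)
print(round(plr, 2), round(nlr, 2))
```

The result is close to, but not the same as, the pooled PLR of 10.42 and NLR of 0.22 quoted above. Pooled likelihood ratios are estimated from the model rather than computed from the pooled sensitivity and specificity, so the two need not agree.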
Affiliation(s)
- Satoru Kodama
- Department of Prevention of Noncommunicable Diseases and Promotion of Health Checkup, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Kazuya Fujihara
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Haruka Shiozaki
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Chika Horikawa
- Department of Health and Nutrition, Faculty of Human Life Studies, University of Niigata Prefecture, Niigata, Japan
- Mayuko Harada Yamada
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Takaaki Sato
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Yuta Yaguchi
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Masahiko Yamamoto
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Masaru Kitazawa
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Midori Iwanaga
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Yasuhiro Matsubayashi
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Hirohito Sone
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
7
Mujahid O, Contreras I, Vehi J. Machine Learning Techniques for Hypoglycemia Prediction: Trends and Challenges. Sensors (Basel) 2021; 21:E546. [PMID: 33466659 PMCID: PMC7828835 DOI: 10.3390/s21020546] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/17/2020] [Revised: 01/08/2021] [Accepted: 01/12/2021] [Indexed: 12/11/2022]
Abstract
(1) Background: the use of machine learning techniques for the purpose of anticipating hypoglycemia has increased considerably in the past few years. Hypoglycemia is the drop in blood glucose below critical levels in diabetic patients. This may cause loss of cognitive ability, seizures, and in extreme cases, death. In almost half of all the severe cases, hypoglycemia arrives unannounced and is essentially asymptomatic. The inability of a diabetic patient to anticipate and intervene in the occurrence of a hypoglycemic event often results in crisis. Hence, the prediction of hypoglycemia is a vital step in improving the life quality of a diabetic patient. The objective of this paper is to review work performed in the domain of hypoglycemia prediction by using machine learning and also to explore the latest trends and challenges that the researchers face in this area; (2) Methods: literature obtained from PubMed and Google Scholar was reviewed. Manuscripts from the last five years were searched for this purpose. A total of 903 papers were initially selected, of which 57 papers were eventually shortlisted for detailed review; (3) Results: a thorough dissection of the shortlisted manuscripts provided an interesting split between the works based on two categories: hypoglycemia prediction and hypoglycemia detection. The entire review was carried out keeping this categorical distinction in perspective while providing a thorough overview of the machine learning approaches used to anticipate hypoglycemia, the type of training data, and the prediction horizon.
Affiliation(s)
- Omer Mujahid
- Model Identification and Control Laboratory, Institut d’Informatica i Applicacions, Universitat de Girona, 17003 Girona, Spain
- Ivan Contreras
- Model Identification and Control Laboratory, Institut d’Informatica i Applicacions, Universitat de Girona, 17003 Girona, Spain
- Josep Vehi
- Model Identification and Control Laboratory, Institut d’Informatica i Applicacions, Universitat de Girona, 17003 Girona, Spain
- Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), 17003 Girona, Spain
8
Jang R, Kim N, Jang M, Lee KH, Lee SM, Lee KH, Noh HN, Seo JB. Assessment of the Robustness of Convolutional Neural Networks in Labeling Noise by Using Chest X-Ray Images From Multiple Centers. JMIR Med Inform 2020; 8:e18089. [PMID: 32749222 PMCID: PMC7435602 DOI: 10.2196/18089] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/02/2020] [Revised: 06/08/2020] [Accepted: 06/21/2020] [Indexed: 12/25/2022]
Abstract
Background Computer-aided diagnosis on chest x-ray images using deep learning is a widely studied modality in medicine. Many studies are based on public datasets, such as the National Institutes of Health (NIH) dataset and the Stanford CheXpert dataset. However, these datasets are preprocessed by classical natural language processing, which may cause a certain extent of label errors. Objective This study aimed to investigate the robustness of deep convolutional neural networks (CNNs) for binary classification of posteroanterior chest x-ray through random incorrect labeling. Methods We trained and validated the CNN architecture with different noise levels of labels in 3 datasets, namely, Asan Medical Center-Seoul National University Bundang Hospital (AMC-SNUBH), NIH, and CheXpert, and tested the models with each test set. Diseases of each chest x-ray in our dataset were confirmed by a thoracic radiologist using computed tomography (CT). Receiver operating characteristic (ROC) and area under the curve (AUC) were evaluated in each test. Randomly chosen chest x-rays of public datasets were evaluated by 3 physicians and 1 thoracic radiologist. Results In comparison with the public datasets of NIH and CheXpert, where AUCs did not drop significantly at label noise levels up to 16%, the AUC of the AMC-SNUBH dataset decreased significantly from 2% label noise. Evaluation of the public datasets by 3 physicians and 1 thoracic radiologist showed an accuracy of 65%-80%. Conclusions The deep learning–based computer-aided diagnosis model is sensitive to label noise, and computer-aided diagnosis with inaccurate labels is not credible. Furthermore, open datasets such as NIH and CheXpert need to be distilled before being used for deep learning–based computer-aided diagnosis.
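The "random incorrect labeling" in the methods can be pictured as flipping a fixed fraction of binary labels before training. The abstract does not describe the exact procedure, so the sketch below is an assumption: the function `flip_labels`, its seeding, and the rounding of the number of flipped labels are all illustrative choices.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary labels with a fixed fraction flipped at random."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0 and 1")
    rng = random.Random(seed)  # local generator keeps runs reproducible
    n_flip = round(noise_rate * len(labels))
    flip_indices = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_indices else y for i, y in enumerate(labels)]


# Hypothetical labels: 0 = normal, 1 = abnormal.
clean = [0, 1] * 50
noisy = flip_labels(clean, noise_rate=0.16)
changed = sum(c != n for c, n in zip(clean, noisy))
print(changed, "of", len(clean), "labels flipped")
```

Training the same architecture on copies produced at several noise rates, then scoring each model on an untouched test set, would give the kind of AUC-versus-noise comparison the abstract reports.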
Affiliation(s)
- Ryoungwoo Jang
- Department of Biomedical Engineering, College of Medicine, University of Ulsan, Seoul, Republic of Korea
- Namkug Kim
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Miso Jang
- Department of Biomedical Engineering, College of Medicine, University of Ulsan, Seoul, Republic of Korea
- Kyung Hwa Lee
- Department of Biomedical Engineering, College of Medicine, University of Ulsan, Seoul, Republic of Korea
- Sang Min Lee
- Department of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Kyung Hee Lee
- Department of Radiology, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Republic of Korea
- Han Na Noh
- Department of Health Screening and Promotion Center, Asan Medical Center, Seoul, Republic of Korea
- Joon Beom Seo
- Department of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
9
Jin Y, Li F, Yu H. BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab. Proc Conf Assoc Comput Linguist Meet 2020; 2020:95-100. [PMID: 33223604 PMCID: PMC7679080 DOI: 10.18653/v1/2020.acl-demos.13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023]
Abstract
CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In the clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, such as tokenization and named entity recognition. Since these steps require different tools, which are usually scattered across different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present BENTO, a workflow management platform with a graphical user interface (GUI) that is built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI enables researchers with limited computer background to compose tools into NLP pipelines and then apply the pipelines on their own datasets in a "what you see is what you get" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible enough to be tailored to other domains.
Affiliation(s)
- Yonghao Jin
- Department of Computer Science, University of Massachusetts Lowell, MA, USA
- Fei Li
- Department of Computer Science, University of Massachusetts Lowell, MA, USA
- Hong Yu
- Department of Computer Science, University of Massachusetts Lowell, MA, USA