1
Xie J, Wang Y, Sheng Q, Liu X, Li J, Sun F, Wang Y, Li S, Li Y, Yu Y, Yu G. Identification of mycoplasma pneumonia in children based on fusion of multi-modal clinical free-text description and structured test data. Health Informatics J 2024; 30:14604582241255818. [PMID: 38779978 DOI: 10.1177/14604582241255818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Mycoplasma pneumonia may lead to hospitalizations and pose life-threatening risks in children. The automated identification of mycoplasma pneumonia from electronic medical records holds significant potential for improving the efficiency of hospital resource allocation. In this study, we proposed a novel method for identifying mycoplasma pneumonia by integrating multi-modal features derived from both free-text descriptions and structured test data in electronic medical records. Our approach begins with the extraction of free-text and structured data from clinical records through a systematic preprocessing pipeline. Subsequently, we employ a pre-trained transformer language model to extract features from the free-text, while multiple additive regression trees are used to transform features from the structured data. An attention-based fusion mechanism is then applied to integrate these multi-modal features for effective classification. We validated our method using clinic records of 7157 patients, retrospectively collected for training and testing purposes. The experimental results demonstrate that our proposed multi-modal fusion approach achieves significant improvements over other methods across four key performance metrics.
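To make the fusion step described in this abstract more concrete, here is a minimal sketch of attention-weighted fusion of two modality feature vectors. It is an illustration only and not the authors' architecture: the function names, the fixed `query` vector (a stand-in for learned attention parameters), and the toy feature values are all assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(text_feat, struct_feat, query):
    """Fuse two equal-length feature vectors with attention weights.

    Each modality gets a relevance score (dot product with `query`, a
    hypothetical stand-in for learned parameters). The scores are
    normalised with softmax and used to form a weighted sum.
    """
    feats = [text_feat, struct_feat]
    scores = [sum(q * x for q, x in zip(query, f)) for f in feats]
    weights = softmax(scores)
    fused = [sum(w * f[i] for w, f in zip(weights, feats))
             for i in range(len(query))]
    return fused, weights

# Toy 3-dimensional features (made-up numbers, for illustration only).
text_feat = [0.2, 0.8, 0.1]    # e.g. output of a language model encoder
struct_feat = [0.5, 0.1, 0.9]  # e.g. output of a tree-based feature transform
fused, weights = attention_fuse(text_feat, struct_feat, query=[1.0, 1.0, 1.0])
```

In this toy setup the structured features score higher against the query, so they receive the larger weight; a classifier would then be trained on `fused`.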
Affiliation(s)
- Jingna Xie
  - The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yingshuo Wang
  - The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qiuyang Sheng
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Xiaoqing Liu
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Jing Li
  - The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China; Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, China
- Fenglei Sun
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Yuqi Wang
  - The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Shuxian Li
  - The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yiming Li
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Yizhou Yu
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China; Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Gang Yu
  - The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China; Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, China; Polytechnic Institute, Zhejiang University, Hangzhou, China
2
Axford D, Sohel F, Abedi V, Zhu Y, Zand R, Barkoudah E, Krupica T, Iheasirim K, Sharma UM, Dugani SB, Takahashi PY, Bhagra S, Murad MH, Saposnik G, Yousufuddin M. Development and internal validation of machine learning-based models and external validation of existing risk scores for outcome prediction in patients with ischaemic stroke. European Heart Journal. Digital Health 2024; 5:109-122. [PMID: 38505491 PMCID: PMC10944684 DOI: 10.1093/ehjdh/ztad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 10/14/2023] [Accepted: 10/30/2023] [Indexed: 03/21/2024]
Abstract
Aims We developed new machine learning (ML) models and externally validated existing statistical models [ischaemic stroke predictive risk score (iScore) and totalled health risks in vascular events (THRIVE) scores] for predicting the composite of recurrent stroke or all-cause mortality at 90 days and at 3 years after hospitalization for first acute ischaemic stroke (AIS). Methods and results In adults hospitalized with AIS from January 2005 to November 2016, with follow-up until November 2019, we developed three ML models [random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBOOST)] and externally validated the iScore and THRIVE scores for predicting the composite outcomes after AIS hospitalization, using data from 721 patients and 90 potential predictor variables. At 90 days and 3 years, 11 and 34% of patients, respectively, reached the composite outcome. For the 90-day prediction, the area under the receiver operating characteristic curve (AUC) was 0.779 for RF, 0.771 for SVM, 0.772 for XGBOOST, 0.720 for iScore, and 0.664 for THRIVE. For 3-year prediction, the AUC was 0.743 for RF, 0.777 for SVM, 0.773 for XGBOOST, 0.710 for iScore, and 0.675 for THRIVE. Conclusion The study provided three ML-based predictive models that achieved good discrimination and clinical usefulness in outcome prediction after AIS and broadened the application of the iScore and THRIVE scoring system for long-term outcome prediction. Our findings warrant comparative analyses of ML and existing statistical method-based risk prediction tools for outcome prediction after AIS in new data sets.
Affiliation(s)
- Daniel Axford
  - Department of Information Technology, Mathematics and Statistics, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Australia
- Ferdous Sohel
  - Department of Information Technology, Mathematics and Statistics, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Australia
- Vida Abedi
  - Department of Public Health Science, Penn State College of Medicine, Hershey, PA, USA
- Ye Zhu
  - Robert D. and Patricia E. Kern Centre for the Science of Healthcare Delivery, Mayo Clinic, Rochester, MN, USA
- Ramin Zand
  - Neuroscience Institute, Geisinger Health System, 100 North Academy Ave, Danville, PA 17822, USA
  - Neuroscience Institute, The Pennsylvania State University, Hershey, PA 17033, USA
- Ebrahim Barkoudah
  - Internal Medicine/Hospital Medicine, Brigham and Women’s Hospital, Harvard University, Boston, MA, USA
- Troy Krupica
  - Internal Medicine/Hospital Medicine, West Virginia University, Morgantown, WV, USA
- Kingsley Iheasirim
  - Internal Medicine/Hospital Internal Medicine, Mayo Clinic Health System, Mankato, MN, USA
- Umesh M Sharma
  - Hospital Internal Medicine, Mayo Clinic, Phoenix, AZ, USA
- Sagar B Dugani
  - Hospital Internal Medicine, Mayo Clinic, Rochester, MN, USA
- Sumit Bhagra
  - Endocrinology, Diabetes and Metabolism, Mayo Clinic Health System, Austin, MN, USA
- Mohammad H Murad
  - Division of Public Health, Infectious Diseases, and Occupational Medicine, Mayo Clinic, Rochester, MN, USA
- Gustavo Saposnik
  - Stroke Outcomes and Decision Neuroscience Research Unit, Division of Neurology, Department of Medicine and Li Ka Shing Knowledge Institute, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
- Mohammed Yousufuddin
  - Hospital Internal Medicine, Mayo Clinic Health System, 1000 1st Drive NW, Austin, MN 55912, USA
3
Wang R, Cai L, Liu Y, Zhang J, Ou X, Xu J. Machine learning algorithms for prediction of ventilator associated pneumonia in traumatic brain injury patients from the MIMIC-III database. Heart Lung 2023; 62:225-232. [PMID: 37595390 DOI: 10.1016/j.hrtlng.2023.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 08/20/2023]
Abstract
BACKGROUND Ventilator-associated pneumonia (VAP) is a common complication in patients with traumatic brain injury (TBI) and is associated with poor prognosis. OBJECTIVES This study explored the predictive performance of different machine learning algorithms for VAP in TBI patients. METHODS TBI patients from the Medical Information Mart for Intensive Care-III (MIMIC-III) database who received mechanical ventilation for more than 48 hours were eligible for the study. VAP was confirmed based on the ICD-9 code. Included patients were split into a training cohort and a validation cohort at a ratio of 7:3. Predictive models based on different machine learning algorithms were developed using 5-fold cross-validation in the training cohort and then verified in the validation cohort by evaluating the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and F score. RESULTS A total of 786 TBI patients from MIMIC-III were included, with a VAP incidence of 44.0%. In the training cohort, random forest performed best with an AUC of 1.000, followed by XGBoost (AUC 0.915) and AdaBoost (AUC 0.789). In the validation cohort, AdaBoost performed best with an AUC of 0.706, followed by XGBoost (AUC 0.685) and random forest (AUC 0.683). Overall, random forest and XGBoost appeared to over-fit, whereas AdaBoost was relatively stable in predicting VAP. CONCLUSIONS AdaBoost performed well and stably in predicting VAP in TBI patients. Programs using AdaBoost on portable electronic devices may help physicians assess the risk of VAP in TBI.
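The over-fitting argument in this abstract rests on the gap between training and validation AUC. The sketch below, which is not taken from the paper, shows one common way to compute AUC from scores (the pairwise, Mann-Whitney formulation) and then tabulates the train-to-validation drop using only the AUC values reported above. The function and variable names are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win. This is the pairwise (Mann-Whitney) form.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# AUC values as reported in the abstract: (training cohort, validation cohort).
reported_auc = {
    "random forest": (1.000, 0.683),
    "XGBoost": (0.915, 0.685),
    "AdaBoost": (0.789, 0.706),
}

def auc_gap(train_auc, valid_auc):
    """Drop in AUC from training to validation; a larger drop suggests over-fitting."""
    return round(train_auc - valid_auc, 3)

gaps = {name: auc_gap(*pair) for name, pair in reported_auc.items()}
most_stable = min(gaps, key=gaps.get)  # model with the smallest drop
```

With the reported figures, AdaBoost has the smallest drop, which matches the abstract's description of it as the most stable model.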
Affiliation(s)
- Ruoran Wang
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, Sichuan province, China
- Linrui Cai
  - Institute of Drug Clinical Trial·GCP, West China Second University Hospital, Sichuan University, Chengdu, China; Diseases of Women and Children, Sichuan University, Ministry of Education, Chengdu, China
- Yan Liu
  - Laboratory Animal Center of Sichuan University, Chengdu, China
- Jing Zhang
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, Sichuan province, China
- Xiaofeng Ou
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan province, China
- Jianguo Xu
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, Sichuan province, China
4
Matsumoto K, Nohara Y, Sakaguchi M, Takayama Y, Fukushige S, Soejima H, Nakashima N, Kamouchi M. Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Model Development and Validation Study. JMIR Perioper Med 2023; 6:e50895. [PMID: 37883164 PMCID: PMC10636625 DOI: 10.2196/50895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 09/24/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND Although machine learning models demonstrate significant potential in predicting postoperative delirium, the advantages of their implementation in real-world settings remain unclear and require a comparison with conventional models in practical applications. OBJECTIVE The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model. METHODS The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. To evaluate the predictive performance of the models, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort before the COVID-19 pandemic and a validation cohort during the COVID-19 pandemic. Postoperative delirium was diagnosed according to the confusion assessment method. RESULTS A total of 6497 patients (mean age 68.5, SD 14.4 years; women n=2627, 40.4%) were included in the derivation cohort, and 5366 patients (mean age 67.8, SD 14.6 years; women n=2105, 39.2%) were included in the validation cohort. Regarding discrimination, the XGBoost model (AUROC 0.87-0.90 and MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89 and MCC 0.34-0.41). The logistic regression model (AUROC 0.84-0.88, MCC 0.33-0.40, slope 1.01-1.19, intercept -0.16 to 0.06, and Brier score 0.06-0.07), with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance), achieved good predictive performance. CONCLUSIONS The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium.
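Two of the metrics named in this abstract, the Brier score and the Matthews correlation coefficient (MCC), are simple to compute by hand. The sketch below uses their standard definitions; the toy labels and probabilities are made up for illustration and have no connection to the study data, and the 0.5 decision threshold is an assumption.

```python
import math

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def matthews_corrcoef(y_true, y_pred):
    """MCC from the binary confusion matrix; returns 0.0 if undefined."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example (hypothetical values).
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed threshold of 0.5

brier = brier_score(y_true, y_prob)
mcc = matthews_corrcoef(y_true, y_pred)
```

A lower Brier score is better (0 is perfect), while MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction.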
Affiliation(s)
- Yasunobu Nohara
  - Big Data Science and Technology, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan
- Mikako Sakaguchi
  - Department of Nursing, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Yohei Takayama
  - Department of Nursing, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Syota Fukushige
  - Department of Inspection, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Hidehisa Soejima
  - Institute for Medical Information Research and Analysis, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Naoki Nakashima
  - Medical Information Center, Kyushu University Hospital, Fukuoka, Japan
- Masahiro Kamouchi
  - Department of Health Care Administration and Management, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
  - Center for Cohort Studies, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
5
Caruana A, Bandara M, Musial K, Catchpoole D, Kennedy PJ. Machine learning for administrative health records: A systematic review of techniques and applications. Artif Intell Med 2023; 144:102642. [PMID: 37783537 DOI: 10.1016/j.artmed.2023.102642] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/26/2022] [Revised: 08/21/2023] [Accepted: 08/25/2023] [Indexed: 10/04/2023]
Abstract
Machine learning provides many powerful and effective techniques for analysing heterogeneous electronic health records (EHR). Administrative Health Records (AHR) are a subset of EHR collected for administrative purposes, and the use of machine learning on AHRs is a growing subfield of EHR analytics. Existing reviews of EHR analytics emphasise that the data-modality of the EHR limits the breadth of suitable machine learning techniques, and pursuable healthcare applications. Despite emphasising the importance of data modality, the literature fails to analyse which techniques and applications are relevant to AHRs. AHRs contain uniquely well-structured, categorically encoded records which are distinct from other data-modalities captured by EHRs, and they can provide valuable information pertaining to how patients interact with the healthcare system. This paper systematically reviews AHR-based research, analysing 70 relevant studies and spanning multiple databases. We identify and analyse which machine learning techniques are applied to AHRs and which health informatics applications are pursued in AHR-based research. We also analyse how these techniques are applied in pursuit of each application, and identify the limitations of these approaches. We find that while AHR-based studies are disconnected from each other, the use of AHRs in health informatics research is substantial and accelerating. Our synthesis of these studies highlights the utility of AHRs for pursuing increasingly complex and diverse research objectives despite a number of pervading data- and technique-based limitations. Finally, through our findings, we propose a set of future research directions that can enhance the utility of AHR data and machine learning techniques for health informatics research.
Affiliation(s)
- Adrian Caruana
  - Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia
- Madhushi Bandara
  - Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia
- Katarzyna Musial
  - Complex Adaptive Systems Lab, Data Science Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia
- Daniel Catchpoole
  - Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia; Biospecimen Research Services, The Children's Cancer Research Unit, The Children's Hospital at Westmead, Australia
- Paul J Kennedy
  - Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Australia; Joint Research Centre in AI for Health and Wellness, University of Technology Sydney, Australia, and Ontario Tech University, Canada
6
Shakibfar S, Andersen M, Sessa M. AI-based disease risk score for community-acquired pneumonia hospitalization. iScience 2023; 26:107027. [PMID: 37426351 PMCID: PMC10329143 DOI: 10.1016/j.isci.2023.107027] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/24/2023] [Revised: 04/03/2023] [Accepted: 05/30/2023] [Indexed: 07/11/2023] Open
Abstract
Community-acquired pneumonia (CAP) is an acute infection of the lung parenchyma that is acquired outside the hospital. Population-wide real-world data and artificial intelligence (AI) were used to develop a disease risk score for CAP hospitalization among older individuals. The source population included residents of Denmark aged 65 years or older between January 1, 1996, and July 30, 2018. During the study period, 137344 individuals were hospitalized for pneumonia, for whom 5 controls were matched, leading to a study population of 620908 individuals. The disease risk score had an average accuracy of 0.79 in predicting CAP hospitalization, based on 5-fold cross-validation. The disease risk score can be useful in clinical practice to identify individuals at higher risk of CAP hospitalization and to intervene to minimize that risk.
Affiliation(s)
- Saeed Shakibfar
  - Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Morten Andersen
  - Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Maurizio Sessa
  - Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
7
Iqbal S, Qureshi AN, Li J, Choudhry IA, Mahmood T. Dynamic learning for imbalanced data in learning chest X-ray and CT images. Heliyon 2023; 9:e16807. [PMID: 37313141 PMCID: PMC10258426 DOI: 10.1016/j.heliyon.2023.e16807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 05/26/2023] [Accepted: 05/29/2023] [Indexed: 06/15/2023] Open
Abstract
Deep learning networks require massive annotated datasets. When a topic is being researched for the first time, as in a viral epidemic, working with limited annotated datasets can be difficult. The datasets are also highly imbalanced in this situation, with few findings available from significant instances of the novel illness. We offer a technique that allows a class-balancing algorithm to understand and detect signs of lung disease from chest X-ray and CT images. Deep learning techniques are used to train on and evaluate images, enabling the extraction of basic visual attributes. The training objects' characteristics, instances, categories, and relative data modeling are all represented probabilistically. An imbalance-based sample analyzer makes it possible to identify a minority category in the classification process. To address the imbalance problem, learning samples from the minority class are examined. A Support Vector Machine (SVM) is used to categorize images in clustering. Physicians and medical professionals can use the CNN model to validate their initial assessments of malignant and benign categorization. The proposed technique for class imbalance, 3-Phase Dynamic Learning (3PDL), and the parallel CNN model, Hybrid Feature Fusion (HFF), for multiple modalities achieve a high F1 score of 96.83 and a precision of 96.87. This accuracy and generalization suggest that the approach may be used to build an assistive tool for pathologists.
Affiliation(s)
- Saeed Iqbal
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  - Department of Computer Science, Faculty of Information Technology & Computer Science, University of Central Punjab, Lahore, Pakistan
- Adnan N. Qureshi
  - Department of Computer Science, Faculty of Information Technology & Computer Science, University of Central Punjab, Lahore, Pakistan
- Jianqiang Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  - Beijing Engineering Research Center for IoT Software and Systems, 100124, China
- Imran Arshad Choudhry
  - Department of Computer Science, Faculty of Information Technology & Computer Science, University of Central Punjab, Lahore, Pakistan
- Tariq Mahmood
  - Faculty of Information Sciences, University of Education, Vehari Campus, Vehari, 61100, Pakistan
  - Artificial Intelligence and Data Analytics (AIDA) Lab, College of Computer & Information Sciences (CCIS), Prince Sultan University, Riyadh, 11586, Kingdom of Saudi Arabia
8
Li J, Wang Y, Sheng Q, Liu X, Xing Z, Sun F, Wang Y, Li S, Li Y, Yu Y, Yu G. Interpretable modeling and discovery of key predictors for pneumonia diagnosis in children based on electronic medical records. Digit Health 2022; 8:20552076221131185. [PMID: 36276188 PMCID: PMC9583222 DOI: 10.1177/20552076221131185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 09/20/2022] [Indexed: 11/15/2022] Open
Abstract
Background Community-acquired pneumonia is one of the most common infectious diseases in children and is a leading cause of death among children under 5 years of age, resulting in high rates of antibiotic usage and hospitalization. It is of extremely practical significance to make full use of the existing electronic medical records to study pneumonia and to establish automatic diagnosis models for pneumonia. Methods We established pneumonia diagnosis models of Bayesian network using a total of 13,448 electronic medical records. We investigated learning network structure and parameter estimation and evaluated different structure learning strategies and various modeling methods. By identifying the key predictors of model, the pneumonia status was analyzed. Results The performance of the proposed Bayesian network was evaluated using a set of 3361 cases with a precision of 0.7861, a recall of 0.9889, and an F1-score of 0.8759. On an independent external validation set containing 4925 cases, Bayesian network achieved a precision of 0.7382, a recall of 0.9947, and an F1-score of 0.8475. Our proposed Bayesian network outperformed all other methods, including CatBoost, XGBoost, LightGBM, logistic regression, and ridge classification. Conclusion The appropriate feature selection improved the performance of Bayesian networks. The proposed Bayesian network had good generalizability and could be directly applied to clinical research centers. And the key predictors identified by the network demonstrated good clinical interpretability, allowing for a better understanding of pneumonia status and complications. This study had important clinical value and practical significance for the research and diagnosis of pediatric pneumonia.
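The F1-scores in this abstract follow directly from the reported precision and recall, since F1 is their harmonic mean. The short check below recomputes both values from the figures given above; the function name is illustrative.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
internal_f1 = round(f1_score(0.7861, 0.9889), 4)  # internal test set, 3361 cases
external_f1 = round(f1_score(0.7382, 0.9947), 4)  # external validation set, 4925 cases

print(internal_f1, external_f1)  # 0.8759 0.8475
```

Both recomputed values agree with the reported F1-scores. The very high recall paired with lower precision indicates a model that rarely misses pneumonia cases at the cost of more false positives.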
Affiliation(s)
- Jing Li
  - Department of Data and Information, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China; Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, China
- Yingshuo Wang
  - Department of Pulmonology, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qiuyang Sheng
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Xiaoqing Liu
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Zijian Xing
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Fenglei Sun
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Yuqi Wang
  - Department of Pulmonology, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Shuxian Li
  - Department of Pulmonology, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yiming Li
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
- Yizhou Yu
  - Deepwise Healthcare Artificial Intelligence Laboratory, Beijing, China
  - Yizhou Yu, Deepwise Healthcare Artificial Intelligence Laboratory, 13th Floor, Building 2, Yard 2, Xisanhuan North Road, Haidian District, Beijing, China.
- Gang Yu
  - Department of Data and Information, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China; Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, China; Polytechnic Institute, Zhejiang University, Hangzhou, China
  - Gang Yu, Department of Data and Information, The Children’s Hospital, Zhejiang University School of Medicine, 3333 Binsheng Road, Binjiang District, Hangzhou 310052, China.
9
Zhao M, Li J, Xiang L, Zhang ZH, Peng SL. A diagnosis model of dementia via machine learning. Front Aging Neurosci 2022; 14:984894. [PMID: 36158565 PMCID: PMC9490175 DOI: 10.3389/fnagi.2022.984894] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/02/2022] [Accepted: 08/11/2022] [Indexed: 11/13/2022] Open
Abstract
As the aging population poses serious challenges to families and societies, dementia has received increasing attention. Dementia detection often requires a series of complex tests and lengthy questionnaires, which are time-consuming. To address this problem, this article focuses on questionnaire-based diagnosis. It aims to establish a machine learning model that helps doctors make a diagnosis, and to use feature selection to identify the important questions and reduce the number of questions in the questionnaire, thereby reducing medical and time costs. The Clinical Dementia Rating (CDR) is used as the data source, and various methods are used for modeling and feature selection, so as to combine similar attributes in the data set and reduce the categories; the confusion matrix is then used to judge the effect. The experimental results show that the model established by the bagging method performs best, with an accuracy rate that can reach 80% of the true diagnosis rate; in terms of feature selection, principal component analysis (PCA) performs best compared with the other methods.
Affiliation(s)
- Ming Zhao
  - School of Computer Science, Yangtze University, Jingzhou, China
- Jie Li
  - School of Computer Science, Yangtze University, Jingzhou, China
- Liuqing Xiang
  - School of Computer Science, Yangtze University, Jingzhou, China
- Zu-hai Zhang
  - Department of Ophthalmology, The First Affiliated Hospital of Yangtze University, Jingzhou, China
  - Correspondence: Zu-hai Zhang
- Sheng-Lung Peng
  - Department of Creative Technologies and Product Design, National Taipei University of Business, Taipei, Taiwan
10
Exploring the Utility of Anonymized EHR Datasets in Machine Learning Experiments in the Context of the MODELHealth Project. Applied Sciences-Basel 2022. [DOI: 10.3390/app12125942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
The object of this paper was the application of machine learning to a clinical dataset that was anonymized using the Mondrian algorithm. (1) Background: The preservation of patient privacy is a necessity rising from the increasing digitization of health data; however, the effect of data anonymization on the performance of machine learning models remains to be explored. (2) Methods: The original EHR derived dataset was subjected to anonymization by applying the Mondrian algorithm for various k values and quasi identifier (QI) set attributes. The logistic regression, decision trees, k-nearest neighbors, Gaussian naive Bayes and support vector machine models were applied to the different dataset versions. (3) Results: The classifiers demonstrated different degrees of resilience to the anonymization, with the decision tree and the KNN models showing remarkably stable performance, as opposed to the Gaussian naïve Bayes model. The choice of the QI set attributes and the generalized information loss value played a more important role than the size of the QI set or the k value. (4) Conclusions: Data anonymization can reduce the performance of certain machine learning models, although the appropriate selection of classifier and parameter values can mitigate this effect.
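The k value mentioned in this abstract refers to k-anonymity: every combination of quasi-identifier (QI) values must be shared by at least k records. The sketch below checks that property on an already generalized toy table. It does not implement the Mondrian partitioning algorithm itself, and the column names and records are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every QI value combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Toy generalized records (made-up data; ages and postcodes already coarsened).
records = [
    {"age": "60-69", "postcode": "115**", "diagnosis": "A"},
    {"age": "60-69", "postcode": "115**", "diagnosis": "B"},
    {"age": "70-79", "postcode": "116**", "diagnosis": "A"},
    {"age": "70-79", "postcode": "116**", "diagnosis": "C"},
]

two_anonymous = is_k_anonymous(records, ["age", "postcode"], k=2)
three_anonymous = is_k_anonymous(records, ["age", "postcode"], k=3)
```

Here each QI group holds exactly two records, so the table satisfies k=2 but not k=3. Raising k or enlarging the QI set forces coarser generalization, which is the information loss the abstract relates to classifier performance.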