1
Elshawi R, Sakr S, Al-Mallah MH, Keteyian SJ, Brawner CA, Ehrman JK. FIT calculator: a multi-risk prediction framework for medical outcomes using cardiorespiratory fitness data. Sci Rep 2024; 14:8745. [PMID: 38627439 PMCID: PMC11021455 DOI: 10.1038/s41598-024-59401-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 04/10/2024] [Indexed: 04/19/2024] Open
Abstract
Accurately predicting patients' risk for specific medical outcomes is paramount for effective healthcare management and personalized medicine. While a substantial body of literature addresses the prediction of diverse medical conditions, existing models predominantly focus on singular outcomes, limiting their scope to one disease at a time. However, clinical reality often entails patients concurrently facing multiple health risks across various medical domains. In response to this gap, our study proposes a novel multi-risk framework adept at simultaneous risk prediction for multiple clinical outcomes, including diabetes, mortality, and hypertension. Leveraging a concise set of features extracted from patients' cardiorespiratory fitness data, our framework minimizes computational complexity while maximizing predictive accuracy. Moreover, we integrate a state-of-the-art instance-based interpretability technique into our framework, providing users with comprehensive explanations for each prediction. These explanations afford medical practitioners invaluable insights into the primary health factors influencing individual predictions, fostering greater trust and utility in the underlying prediction models. Our approach thus stands to significantly enhance healthcare decision-making processes, facilitating more targeted interventions and improving patient outcomes in clinical practice. Our prediction framework utilizes an automated machine learning framework, Auto-Weka, to optimize machine learning models and hyper-parameter configurations for the simultaneous prediction of three medical outcomes: diabetes, mortality, and hypertension. Additionally, we employ a local interpretability technique to elucidate predictions generated by our framework. These explanations manifest visually, highlighting key attributes contributing to each instance's prediction for enhanced interpretability. 
Using automated machine learning techniques, the models simultaneously predict hypertension, mortality, and diabetes risks, utilizing only nine patient features. They achieved an average AUC of 0.90 ± 0.001 on the hypertension dataset, 0.90 ± 0.002 on the mortality dataset, and 0.89 ± 0.001 on the diabetes dataset through tenfold cross-validation. Additionally, the models demonstrated strong performance with an average AUC of 0.89 ± 0.001 on the hypertension dataset, 0.90 ± 0.001 on the mortality dataset, and 0.89 ± 0.001 on the diabetes dataset using bootstrap evaluation with 1000 resamples.
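The evaluation described in this abstract (AUC summarised over 1000 bootstrap resamples) can be sketched in plain Python. This is a minimal illustration, not the authors' Auto-Weka pipeline: the function names `auc` and `bootstrap_auc`, the toy labels and scores, and the choice to report the standard deviation across resamples are all assumptions, since the abstract does not state how its ± values were computed.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_resamples=1000, seed=0):
    """Mean and standard deviation of AUC over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has one class only
        values.append(auc(ys, [scores[i] for i in idx]))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var ** 0.5

# Hypothetical toy data, not taken from the study.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
mean_auc, sd_auc = bootstrap_auc(labels, scores)
```

The pairwise AUC is quadratic in the sample size, which is fine for a sketch but would be replaced by a rank-based computation on a cohort of realistic size.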
Affiliation(s)
- Radwa Elshawi
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Sherif Sakr
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Steven J Keteyian
  - Division of Cardiovascular Medicine, Henry Ford Hospital, 6525 Second Ave., Detroit, MI, 48202, USA
- Clinton A Brawner
  - Division of Cardiovascular Medicine, Henry Ford Hospital, 6525 Second Ave., Detroit, MI, 48202, USA
- Jonathan K Ehrman
  - Division of Cardiovascular Medicine, Henry Ford Hospital, 6525 Second Ave., Detroit, MI, 48202, USA
2
Zad Z, Jiang VS, Wolf AT, Wang T, Cheng JJ, Paschalidis IC, Mahalingaiah S. Predicting polycystic ovary syndrome with machine learning algorithms from electronic health records. Front Endocrinol (Lausanne) 2024; 15:1298628. [PMID: 38356959 PMCID: PMC10866556 DOI: 10.3389/fendo.2024.1298628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 01/08/2024] [Indexed: 02/16/2024] Open
Abstract
Introduction Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis. Methods This is a retrospective cohort study from a safety-net hospital's electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria and merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound. Results We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved an average AUC of 85%, 81%, 80%, and 82%, respectively, in Models I, II, III and IV. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG. Conclusion Machine learning algorithms were used to predict PCOS based on a large at-risk population.
This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.
Affiliation(s)
- Zahra Zad
  - Division of Systems Engineering, Center for Information and Systems Engineering (CISE), Boston University, Brookline, MA, United States
- Victoria S. Jiang
  - Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Massachusetts General Hospital, Boston, MA, United States
- Amber T. Wolf
  - Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Taiyao Wang
  - Division of Systems Engineering, Center for Information and Systems Engineering (CISE), Boston University, Brookline, MA, United States
- J. Jojo Cheng
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, United States
- Ioannis Ch. Paschalidis
  - Division of Systems Engineering, Center for Information and Systems Engineering (CISE), Boston University, Brookline, MA, United States
  - Department of Electrical & Computer Engineering, Department of Biomedical Engineering, and Faculty for Computing & Data Sciences, Boston University, Boston, MA, United States
- Shruthi Mahalingaiah
  - Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Massachusetts General Hospital, Boston, MA, United States
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, MA, United States
3
Chen K, Abtahi F, Carrero JJ, Fernandez-Llatas C, Seoane F. Process mining and data mining applications in the domain of chronic diseases: A systematic review. Artif Intell Med 2023; 144:102645. [PMID: 37783545 DOI: 10.1016/j.artmed.2023.102645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/24/2023] [Accepted: 08/28/2023] [Indexed: 10/04/2023]
Abstract
The widespread use of information technology in healthcare leads to extensive data collection, which can be utilised to enhance patient care and manage chronic illnesses. Our objective is to summarise previous studies that have used data mining or process mining methods in the context of chronic diseases in order to identify research trends and future opportunities. The review covers articles that pertain to the application of data mining or process mining methods on chronic diseases that were published between 2000 and 2022. Articles were sourced from PubMed, Web of Science, EMBASE, and Google Scholar based on predetermined inclusion and exclusion criteria. A total of 71 articles met the inclusion criteria and were included in the review. Based on the literature review results, we detected a growing trend in the application of data mining methods in diabetes research. Additionally, a distinct increase in the use of process mining methods to model clinical pathways in cancer research was observed. Frequently, this takes the form of a collaborative integration of process mining, data mining, and traditional statistical methods. In light of this collaborative approach, the meticulous selection of statistical methods based on their underlying assumptions is essential when integrating these traditional methods with process mining and data mining methods. Another notable challenge is the lack of standardised guidelines for reporting process mining studies in the medical field. Furthermore, there is a pressing need to enhance the clinical interpretation of data mining and process mining results.
Affiliation(s)
- Kaile Chen
  - Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Biomedical Engineering and Health Systems, Division of Ergonomics, KTH Royal Institute of Technology, 14157 Stockholm, Sweden
- Farhad Abtahi
  - Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Biomedical Engineering and Health Systems, Division of Ergonomics, KTH Royal Institute of Technology, 14157 Stockholm, Sweden; Department of Clinical Physiology, Karolinska University Hospital, 17176 Stockholm, Sweden
- Juan-Jesus Carrero
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden
- Carlos Fernandez-Llatas
  - Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; SABIEN, ITACA, Universitat Politècnica de València, Spain
- Fernando Seoane
  - Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Clinical Physiology, Karolinska University Hospital, 17176 Stockholm, Sweden; Department of Medical Technology, Karolinska University Hospital, 17176 Stockholm, Sweden; Department of Textile Technology, University of Borås, 50190 Borås, Sweden
4
Zad Z, Jiang VS, Wolf AT, Wang T, Cheng JJ, Paschalidis IC, Mahalingaiah S. Predicting polycystic ovary syndrome (PCOS) with machine learning algorithms from electronic health records. medRxiv: the preprint server for health sciences 2023:2023.07.27.23293255. [PMID: 37577593 PMCID: PMC10418575 DOI: 10.1101/2023.07.27.23293255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Introduction Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis. Methods This is a retrospective cohort study from a safety-net hospital's electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria and merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound. Results We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved AUC of 85%, 81%, 80%, and 82%, respectively, in Models I, II, III and IV. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG. Conclusions Machine learning algorithms were used to predict PCOS based on a large at-risk population.
This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.
5
Santos CY, Tuboi S, de Jesus Lopes de Abreu A, Abud DA, Lobao Neto AA, Pereira R, Siqueira JB. A machine learning model to assess potential misdiagnosed dengue hospitalization. Heliyon 2023; 9:e16634. [PMID: 37313173 PMCID: PMC10258378 DOI: 10.1016/j.heliyon.2023.e16634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 05/23/2023] [Accepted: 05/23/2023] [Indexed: 06/15/2023] Open
Abstract
Dengue, like other arboviruses with broad clinical spectra, can easily be misdiagnosed as other infectious diseases due to the overlap of signs and symptoms. During large outbreaks, severe dengue cases have the potential to overwhelm the health care system and understanding the burden of dengue hospitalizations is therefore important to better allocate medical care and public health resources. A machine learning model that used data from the Brazilian public healthcare system database and the National Institute of Meteorology (INMET) was developed to estimate potential misdiagnosed dengue hospitalizations in Brazil. The data was modeled into a hospitalization level linked dataset. Then, Random Forest, Logistic Regression and Support Vector Machine algorithms were assessed. The algorithms were trained by dividing the dataset in training/test set and performing a cross validation to select the best hyperparameters in each algorithm tested. The evaluation was done based on accuracy, precision, recall, F1 score, sensitivity, and specificity. The best model developed was Random Forest with an accuracy of 85% on the final reviewed test. This model shows that 3.4% (13,608) of all hospitalizations in the public healthcare system from 2014 to 2020 could have been dengue misdiagnosed as other diseases. The model was helpful in finding potentially misdiagnosed dengue and might be a useful tool to help public health decision makers in planning resource allocation.
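The evaluation measures named in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions; the function name `classification_metrics` and the counts are hypothetical and are not results from the study. Note that sensitivity and recall are the same quantity, so the abstract's list contains five distinct measures.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)      # share of predicted positives that are correct
    recall = tp / (tp + fn)         # share of true positives that are found
    specificity = tn / (tn + fp)    # share of true negatives that are found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,      # sensitivity is another name for recall
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
```

The function assumes every denominator is non-zero; a production version would need to handle a classifier that predicts no positives at all.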
Affiliation(s)
- Claudia Yang Santos
  - Takeda Pharmaceuticals Brazil, Av. das Nações Unidas 14401, São Paulo, SP, Brazil
- Suely Tuboi
  - Takeda Pharmaceuticals Brazil, Av. das Nações Unidas 14401, São Paulo, SP, Brazil
- Denise Alves Abud
  - Takeda Pharmaceuticals Brazil, Av. das Nações Unidas 14401, São Paulo, SP, Brazil
- Ramon Pereira
  - IQVIA Brazil, Rua Verbo Divino 2001, São Paulo, SP, Brazil
6
Prediction of Prednisolone Dose Correction Using Machine Learning. Journal of Healthcare Informatics Research 2023; 7:84-103. [PMID: 36910914 PMCID: PMC9995628 DOI: 10.1007/s41666-023-00128-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 11/20/2022] [Accepted: 02/03/2023] [Indexed: 02/17/2023]
Abstract
Wrong dose, a common prescription error, can cause serious patient harm, especially in the case of high-risk drugs like oral corticosteroids. This study aims to build a machine learning model to predict dose-related prescription modifications for oral prednisolone tablets (i.e., highly imbalanced data with very few positive cases). Prescription data were obtained from the electronic medical records at a single institute. Cluster analysis classified the clinical departments into six clusters with similar patterns of prednisolone prescription. Two patterns of training datasets were created, with and without preprocessing by the SMOTE method. Five ML models (SVM, KNN, GB, RF, and BRF) and logistic regression (LR) models were constructed in Python. The models were internally validated by five-fold stratified cross-validation and then validated with a 30% holdout test dataset. A total of 82,553 prescription records for prednisolone tablets, containing 135 dose-corrected positive cases, were obtained. In the original dataset (without SMOTE), only the BRF model showed good performance (in the test dataset, ROC-AUC: 0.917, recall: 0.951). In the training dataset preprocessed by SMOTE, performance was improved on all models. The highest-performing models with SMOTE were SVM (in the test dataset, ROC-AUC: 0.820, recall: 0.659) and BRF (ROC-AUC: 0.814, recall: 0.634). Although the prescribing data for dose-related correction are highly imbalanced, various techniques such as the following have allowed us to build high-performance prediction models: data preprocessing by SMOTE, stratified cross-validation, and a BRF classifier suited to imbalanced data. ML is useful in complicated dose audits such as those for oral prednisolone. Supplementary Information The online version contains supplementary material available at 10.1007/s41666-023-00128-3.
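With 135 positives among 82,553 prescriptions (about 0.16%), the class imbalance here is severe, which is why the abstract relies on SMOTE. The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbours. The sketch below illustrates only that idea: the function name `smote_sketch`, the toy points, and the parameter defaults are assumptions, and it is not the implementation the authors used.

```python
import random

def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by linear interpolation
    between a sampled point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature minority samples, for illustration only.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=10, k=2)
```

As the abstract indicates, oversampling of this kind is applied to the training data only; the holdout test set keeps its original class balance so that reported metrics reflect the real distribution.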
7
Ding W, Abdel-Basset M, Hawash H, Ali AM. Explainability of artificial intelligence methods, applications and challenges: A comprehensive survey. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
8
Integration of Artificial Intelligence and Blockchain Technology in Healthcare and Agriculture. J Food Quality 2022. [DOI: 10.1155/2022/4228448] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022] Open
Abstract
Over the last decade, the healthcare sector has accelerated its digitization and electronic health records (EHRs). As information technology progresses, the notion of intelligent health also gathers popularity. By combining technologies such as the internet of things (IoT) and artificial intelligence (AI), innovative healthcare modifies and enhances traditional medical systems in terms of efficiency, service, and personalization. On the other side, intelligent healthcare systems are incredibly vulnerable to data breaches and other malicious assaults. Recently, blockchain technology has emerged as a potentially transformative option for enhancing data management, access control, and integrity inside healthcare systems. Integrating these advanced approaches in agriculture is critical for managing food supply chains, drug supply chains, quality maintenance, and intelligent prediction. This study reviews the literature, formulates a research topic, and analyzes the applicability of blockchain to the agriculture/food industry and healthcare, with a particular emphasis on AI and IoT. This article summarizes research on the newest blockchain solutions paired with AI technologies for strengthening and inventing new technological standards for the healthcare ecosystems and food industry.
9
Ann Romalt A, Kumar MS. A Novel Machine Learning Based Probabilistic Classification Model for Heart Disease Prediction. Journal of Medical Imaging and Health Informatics 2022. [DOI: 10.1166/jmihi.2022.3940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
Cardiovascular disease (CVD) is a most dreadful disease that can result in life-threatening events such as heart attacks. Accurate disease prediction is essential, and machine-learning techniques play a major part in predicting its occurrence. In this paper, a novel machine-learning-based model for accurate prediction of cardiovascular disease is developed that applies a unique feature selection technique called Chronic Fatigue Syndrome Best Known Method (CFSBKM). Each feature is ranked based on its feature importance score. The new learning model eliminates the most irrelevant and low-importance features from the datasets, thereby resulting in a robust heart disease risk prediction model. The multinomial Naive Bayes classifier is used for classification. The performance of the CFSBKM model is evaluated using the benchmark Cleveland dataset from the UCI repository, and the proposed models outperform the existing techniques.
Affiliation(s)
- A. Ann Romalt
  - Stella Mary’s College of Engineering, Nagercoil 629202, Tamilnadu, India
10
Yland JJ, Wang T, Zad Z, Willis SK, Wang TR, Wesselink AK, Jiang T, Hatch EE, Wise LA, Paschalidis IC. Predictive models of pregnancy based on data from a preconception cohort study. Hum Reprod 2022; 37:565-576. [PMID: 35024824 PMCID: PMC8888990 DOI: 10.1093/humrep/deab280] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/25/2021] [Revised: 11/30/2021] [Indexed: 01/16/2023] Open
Abstract
STUDY QUESTION Can we derive adequate models to predict the probability of conception among couples actively trying to conceive? SUMMARY ANSWER Leveraging data collected from female participants in a North American preconception cohort study, we developed models to predict pregnancy with performance of ∼70% in the area under the receiver operating characteristic curve (AUC). WHAT IS KNOWN ALREADY Earlier work has focused primarily on identifying individual risk factors for infertility. Several predictive models have been developed in subfertile populations, with relatively low discrimination (AUC: 59-64%). STUDY DESIGN, SIZE, DURATION Study participants were female, aged 21-45 years, residents of the USA or Canada, not using fertility treatment, and actively trying to conceive at enrollment (2013-2019). Participants completed a baseline questionnaire at enrollment and follow-up questionnaires every 2 months for up to 12 months or until conception. We used data from 4133 participants with no more than one menstrual cycle of pregnancy attempt at study entry. PARTICIPANTS/MATERIALS, SETTING, METHODS On the baseline questionnaire, participants reported data on sociodemographic factors, lifestyle and behavioral factors, diet quality, medical history and selected male partner characteristics. A total of 163 predictors were considered in this study. We implemented regularized logistic regression, support vector machines, neural networks and gradient boosted decision trees to derive models predicting the probability of pregnancy: (i) within fewer than 12 menstrual cycles of pregnancy attempt time (Model I), and (ii) within 6 menstrual cycles of pregnancy attempt time (Model II). Cox models were used to predict the probability of pregnancy within each menstrual cycle for up to 12 cycles of follow-up (Model III). We assessed model performance using the AUC and the weighted-F1 score for Models I and II, and the concordance index for Model III. 
MAIN RESULTS AND THE ROLE OF CHANCE Model I and II AUCs were 70% and 66%, respectively, in parsimonious models, and the concordance index for Model III was 63%. The predictors that were positively associated with pregnancy in all models were: having previously breastfed an infant and using multivitamins or folic acid supplements. The predictors that were inversely associated with pregnancy in all models were: female age, female BMI and history of infertility. Among nulligravid women with no history of infertility, the most important predictors were: female age, female BMI, male BMI, use of a fertility app, attempt time at study entry and perceived stress. LIMITATIONS, REASONS FOR CAUTION Reliance on self-reported predictor data could have introduced misclassification, which would likely be non-differential with respect to the pregnancy outcome given the prospective design. In addition, we cannot be certain that all relevant predictor variables were considered. Finally, though we validated the models using split-sample replication techniques, we did not conduct an external validation study. WIDER IMPLICATIONS OF THE FINDINGS Given a wide range of predictor data, machine learning algorithms can be leveraged to analyze epidemiologic data and predict the probability of conception with discrimination that exceeds earlier work. STUDY FUNDING/COMPETING INTEREST(S) The research was partially supported by the U.S. National Science Foundation (under grants DMS-1664644, CNS-1645681 and IIS-1914792) and the National Institutes for Health (under grants R01 GM135930 and UL54 TR004130). In the last 3 years, L.A.W. has received in-kind donations for primary data collection in PRESTO from FertilityFriend.com, Kindara.com, Sandstone Diagnostics and Swiss Precision Diagnostics. L.A.W. also serves as a fibroid consultant to AbbVie, Inc. The other authors declare no competing interests. TRIAL REGISTRATION NUMBER N/A.
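The concordance index reported for the Cox model (Model III) generalises the AUC to time-to-event data with censoring. A minimal sketch of Harrell's version follows, assuming the standard definition; the function name `concordance_index` and the toy values are hypothetical, and in this study the "event" would be conception rather than an adverse outcome.

```python
def concordance_index(times, events, risks):
    """Harrell's C: among comparable pairs, the share in which the subject
    with the earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event (not censored)
            # strictly before subject j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: cycles to event, event indicator (0 = censored),
# and a predicted risk score where higher means an earlier expected event.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the 63% reported for Model III sits on the same scale as the AUCs of Models I and II.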
Affiliation(s)
- Jennifer J Yland
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA. Correspondence address: Department of Epidemiology, Boston University School of Public Health, 715 Albany Street, Boston, MA 02118, USA. E-mail:
- Taiyao Wang
  - Center for Information and Systems Engineering, Boston University, Boston, MA, USA
  - Philips Research North America, Cambridge, MA, USA
- Zahra Zad
  - Center for Information and Systems Engineering, Boston University, Boston, MA, USA
  - Division of Systems Engineering, Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
- Sydney K Willis
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Tanran R Wang
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Amelia K Wesselink
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Tammy Jiang
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Elizabeth E Hatch
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Lauren A Wise
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Center for Information and Systems Engineering, Boston University, Boston, MA, USA
  - Division of Systems Engineering, Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
11
Swain S, Bhushan B, Dhiman G, Viriyasitavat W. Appositeness of Optimized and Reliable Machine Learning for Healthcare: A Survey. Archives of Computational Methods in Engineering: State of the Art Reviews 2022; 29:3981-4003. [PMID: 35342282 PMCID: PMC8939887 DOI: 10.1007/s11831-022-09733-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Accepted: 02/09/2022] [Indexed: 05/04/2023]
Abstract
Machine Learning (ML) has been categorized as a branch of Artificial Intelligence (AI) under the Computer Science domain wherein programmable machines imitate human learning behavior with the help of statistical methods and data. The healthcare industry is one of the largest and busiest sectors in the world, functioning with an extensive amount of manual moderation at every stage. Most of the clinical documents concerning patient care are hand-written by experts; only selected reports are machine-generated. This process elevates the chances of misdiagnosis, thereby imposing a risk to a patient's life. Recent technological adoptions for automating manual operations have witnessed extensive use of ML in its applications. The paper surveys the applicability of ML approaches in automating medical systems. The paper discusses most of the optimized statistical ML frameworks that encourage better service delivery in clinical aspects. The universal adoption of various Deep Learning (DL) and ML techniques as the underlying systems for a variety of wellness applications is delineated by challenges and elevated by myriads of security. This work tries to recognize a variety of vulnerabilities occurring in medical procurement, admitting the concerns over its predictive performance from a privacy point of view. Finally, it provides possible risk-delimiting facts and directions for active challenges in the future.
Affiliation(s)
- Subhasmita Swain
  - Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, India
- Bharat Bhushan
  - Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, India
- Gaurav Dhiman
  - Department of Computer Science, Government Bikram College of Commerce, Patiala, India
  - University Centre for Research and Development, Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, India
  - Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
- Wattana Viriyasitavat
  - Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn Business School, Bangkok, Thailand
12
AIM in Medical Informatics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
13
|
Zhao Y, Paschalidis IC, Hu J. The impact of payer status on hospital admissions: evidence from an academic medical center. BMC Health Serv Res 2021; 21:930. [PMID: 34493261 PMCID: PMC8425077 DOI: 10.1186/s12913-021-06886-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/09/2021] [Accepted: 08/09/2021] [Indexed: 11/10/2022] Open
Abstract
Background There are plenty of studies investigating disparities by payer status in access to care. However, most studies are either disease-specific or cohort-specific. Large controlled studies that quantify the disparity at the facility level are rare. This study aims to examine how payer status affects patient hospitalization from the perspective of a facility. Methods We extracted all patients with a visit record in a medical center between 5/1/2009 and 4/30/2014, and then linked to each patient the outpatient and inpatient records from the three years before the target admission time. We conducted a retrospective observational study using a conditional logistic regression methodology. To control for the illness of patients with different diseases in training the model, we constructed a three-dimensional variable with a data stratification technique. The model was validated on a dataset distinct from the one used for training. Results Patients covered by private insurance or uninsured are less likely to be hospitalized than patients insured by government. For uninsured patients, inequity in access to hospitalization is observed. The value of standardized coefficients indicates that government-sponsored insurance has the greatest impact on improving patients’ hospitalization. Conclusion Attention is needed on improving the access to care for uninsured patients. Also, basic preventive care services should be enhanced, especially for people insured by government. The findings can serve as a baseline from which to measure the anticipated effect of measures to reduce disparity of payer status in hospitalization. Supplementary Information The online version contains supplementary material available at (10.1186/s12913-021-06886-3).
Affiliation(s)
- Yanying Zhao
  - School of Management, Fudan University, 670 Guoshun Road, Yangpu District, Shanghai, 200433, China
- Ioannis Ch Paschalidis
  - Departments of Electrical & Computer Engineering, Systems Engineering, and Biomedical Engineering, Boston University, 8 St Marys Street, Boston, Massachusetts, 02215, USA
- Jianqiang Hu
  - School of Management, Fudan University, 670 Guoshun Road, Yangpu District, Shanghai, 200433, China
|
14
|
Abstract
Smart cities connect people and places using innovative technologies such as Data Mining (DM), Machine Learning (ML), big data, and the Internet of Things (IoT). This paper presents a bibliometric analysis to provide a comprehensive overview of studies associated with DM technologies used in smart city applications. The study aims to identify the main DM techniques used in the context of smart cities and how the research field of DM for smart cities has evolved over time. We adopted both qualitative and quantitative methods to explore the topic. We used the Scopus database to find relevant articles published in scientific journals. This study covers 197 articles published over the period from 2013 to 2021. For the bibliometric analysis, we used the Bibliometrix library, developed in R. Our findings show that a wide range of DM technologies is used in every layer of a smart city project. Several ML algorithms, supervised or unsupervised, are adopted for operating the instrumentation, middleware, and application layers. The bibliometric analysis shows that DM for smart cities is a fast-growing scientific field. Scientists from all over the world show great interest in researching and collaborating in this interdisciplinary scientific field.
|
15
|
Vellameeran FA, Brindha T. An integrated review on machine learning approaches for heart disease prediction: Direction towards future research gaps. BIO-ALGORITHMS AND MED-SYSTEMS 2021. [DOI: 10.1515/bams-2020-0069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Objectives
To make a clear literature review on state-of-the-art heart disease prediction models.
Methods
This study reviews 61 research papers and reports the significant findings. Initially, the analysis addresses the contributions of each work in the literature and observes the simulation environment. It then identifies the different types of machine learning algorithms deployed in each contribution. In addition, the datasets used by existing heart disease prediction models are examined.
Results
The performance measures reported across the papers, such as prediction accuracy, prediction error, specificity, sensitivity, and F-measure, are examined. Further, the best performance is also checked to confirm the effectiveness of the contributions.
Conclusions
The comprehensive research challenges and the gap are portrayed based on the development of intelligent methods concerning the unresolved challenges in heart disease prediction using data mining techniques.
Affiliation(s)
- Thomas Brindha
  - Department of Information Technology, Noorul Islam Centre for Higher Education, Kanyakumari, India
|
16
|
Bruno P, Calimeri F, Greco G. AIM in Medical Informatics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_32-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
17
|
MapReduce-based big data framework using modified artificial neural network classifier for diabetic chronic disease prediction. Soft comput 2020. [DOI: 10.1007/s00500-020-04943-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/04/2023]
|
18
|
Wollenstein-Betech S, Cassandras CG, Paschalidis IC. Personalized predictive models for symptomatic COVID-19 patients using basic preconditions: Hospitalizations, mortality, and the need for an ICU or ventilator. Int J Med Inform 2020; 142:104258. [PMID: 32927229 PMCID: PMC7442577 DOI: 10.1016/j.ijmedinf.2020.104258] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/03/2020] [Revised: 07/26/2020] [Accepted: 08/17/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND The rapid global spread of the SARS-CoV-2 virus has provoked a spike in demand for hospital care. Hospital systems across the world have been over-extended, including in Northern Italy, Ecuador, and New York City, and many other systems face similar challenges. As a result, decisions on how to best allocate very limited medical resources and design targeted policies for vulnerable subgroups have come to the forefront. Specifically, under consideration are decisions on who to test, who to admit into hospitals, who to treat in an Intensive Care Unit (ICU), and who to support with a ventilator. Given today's ability to gather, share, analyze and process data, personalized predictive models based on demographics and information regarding prior conditions can be used to (1) help decision-makers allocate limited resources, when needed, (2) advise individuals how to better protect themselves given their risk profile, (3) differentiate social distancing guidelines based on risk, and (4) prioritize vaccinations once a vaccine becomes available. OBJECTIVE To develop personalized models that predict the following events: (1) hospitalization, (2) mortality, (3) need for ICU, and (4) need for a ventilator. To predict hospitalization, it is assumed that one has access to a patient's basic preconditions, which can be easily gathered without the need to be at a hospital and hence serve citizens and policy makers to assess individual risk during a pandemic. For the remaining models, different versions developed include different sets of a patient's features, with some including information on how the disease is progressing (e.g., diagnosis of pneumonia). MATERIALS AND METHODS National data from a publicly available repository, updated daily, containing information from approximately 91,000 patients in Mexico were used. 
The data for each patient include demographics, prior medical conditions, SARS-CoV-2 test results, hospitalization, mortality and whether a patient has developed pneumonia or not. Several classification methods were applied and compared, including robust versions of logistic regression, and support vector machines, as well as random forests and gradient boosted decision trees. RESULTS Interpretable methods (logistic regression and support vector machines) perform just as well as more complex models in terms of accuracy and detection rates, with the additional benefit of elucidating variables on which the predictions are based. Classification accuracies reached 72%, 79%, 89%, and 90% for predicting hospitalization, mortality, need for ICU and need for a ventilator, respectively. The analysis reveals the most important preconditions for making the predictions. For the four models derived, these are: (1) for hospitalization: age, pregnancy, diabetes, gender, chronic renal insufficiency, and immunosuppression; (2) for mortality: age, immunosuppression, chronic renal insufficiency, obesity and diabetes; (3) for ICU need: development of pneumonia (if available), age, obesity, diabetes and hypertension; and (4) for ventilator need: ICU and pneumonia (if available), age, obesity, and hypertension.
Affiliation(s)
- Salomón Wollenstein-Betech
  - Department of Electrical & Computer Engineering, Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, USA
- Christos G Cassandras
  - Department of Electrical & Computer Engineering, Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, USA
- Ioannis Ch Paschalidis
  - Department of Electrical & Computer Engineering, Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, USA
|
19
|
Wang Q, Guo A. An efficient variance estimator of AUC and its applications to binary classification. Stat Med 2020; 39:4281-4300. [PMID: 32914457 DOI: 10.1002/sim.8725] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/07/2019] [Revised: 05/05/2020] [Accepted: 07/19/2020] [Indexed: 11/11/2022]
Abstract
The area under the ROC (receiver operating characteristic) curve, AUC, is one of the most commonly used measures to evaluate the performance of a binary classifier. Due to sampling variation, the model with the largest observed AUC score is not necessarily optimal, so it is crucial to assess the variation of AUC estimate. We extend the proposal by Wang and Lindsay and devise an unbiased variance estimator of AUC estimate that is of a two-sample U-statistic form. The proposal can be easily generalized to estimate the variance of a K-sample U-statistic (K ≥ 2). To make our developed variance estimator more applicable, we employ a partition-resampling scheme that is computationally efficient. Simulation studies suggest that the developed AUC variance estimator yields much better or comparable performance to jackknife and bootstrap variance estimators, and computational times that are about 10 to 30 times faster than the times of its counterparts. In practice, the proposal can be used in the one-standard-error rule for model selection, or to construct an asymptotic confidence interval of AUC in binary classification. In addition to conducting simulation studies, we illustrate its practical applications using two real datasets in medical sciences.
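In its empirical form, the AUC discussed in this abstract is the Mann-Whitney two-sample U-statistic: the fraction of (positive, negative) score pairs that the classifier ranks correctly, with ties counted as one half. The sketch below illustrates only that point estimate. It does not reproduce the variance estimator or the partition-resampling scheme of Wang and Guo, whose details the abstract does not give. The function name and the sample scores are illustrative assumptions.

```python
from itertools import product


def auc_u_statistic(pos_scores, neg_scores):
    """Empirical AUC as a two-sample U-statistic (Mann-Whitney form).

    pos_scores: classifier scores of the positive-class samples
    neg_scores: classifier scores of the negative-class samples
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each class")
    total = 0.0
    # Kernel: 1 if the positive outranks the negative, 0.5 on a tie, else 0.
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    # Average the kernel over all positive/negative pairs.
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical scores, for illustration only.
example_auc = auc_u_statistic([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

The double loop costs O(mn) for m positives and n negatives, which is acceptable for a sketch; a rank-based formulation would be preferable on large samples.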
Affiliation(s)
- Qing Wang
  - Department of Mathematics, Wellesley College, Wellesley, Massachusetts
- Alexandria Guo
  - Department of Mathematics, Wellesley College, Wellesley, Massachusetts
|
20
|
Bertsimas D, Li ML, Paschalidis IC, Wang T. Prescriptive analytics for reducing 30-day hospital readmissions after general surgery. PLoS One 2020; 15:e0238118. [PMID: 32903282 PMCID: PMC7480861 DOI: 10.1371/journal.pone.0238118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2020] [Accepted: 08/09/2020] [Indexed: 11/18/2022] Open
Abstract
INTRODUCTION New financial incentives, such as reduced Medicare reimbursements, have led hospitals to closely monitor their readmission rates and initiate efforts aimed at reducing them. In this context, many surgical departments participate in the American College of Surgeons National Surgical Quality Improvement Program (NSQIP), which collects detailed demographic, laboratory, clinical, procedure and perioperative occurrence data. The availability of such data enables the development of data science methods which predict readmissions and, as done in this paper, offer specific recommendations aimed at preventing readmissions. MATERIALS AND METHODS This study leverages NSQIP data for 722,101 surgeries to develop predictive and prescriptive models, predicting readmissions and offering real-time, personalized treatment recommendations for surgical patients during their hospital stay, aimed at reducing the risk of a 30-day readmission. We applied a variety of classification methods to predict 30-day readmissions and developed two prescriptive methods to recommend pre-operative blood transfusions to increase the patient's hematocrit with the objective of preventing readmissions. The effect of these interventions was evaluated using several predictive models. RESULTS Predictions of 30-day readmissions based on the entire collection of NSQIP variables achieve an out-of-sample accuracy of 87% (Area Under the Curve-AUC). Predictions based only on pre-operative variables have an accuracy of 74% AUC, out-of-sample. Personalized interventions, in the form of pre-operative blood transfusions identified by the prescriptive methods, reduce readmissions by 12%, on average, for patients considered as candidates for pre-operative transfusion (pre-operative hematocrit <30). The prediction accuracy of the proposed models exceeds results in the literature. 
CONCLUSIONS This study is among the first to develop a methodology for making specific, data-driven, personalized treatment recommendations to reduce the 30-day readmission rate. The reported predicted reduction in readmissions can lead to more than $20 million in savings in the U.S. annually.
Affiliation(s)
- Dimitris Bertsimas
  - Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Michael Lingzhi Li
  - Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Ioannis Ch. Paschalidis
  - Center for Information and Systems Engineering, Boston University, Boston, MA, United States of America
- Taiyao Wang
  - Center for Information and Systems Engineering, Boston University, Boston, MA, United States of America
|
21
|
Payrovnaziri SN, Chen Z, Rengifo-Moreno P, Miller T, Bian J, Chen JH, Liu X, He Z. Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review. J Am Med Inform Assoc 2020; 27:1173-1185. [PMID: 32417928 PMCID: PMC7647281 DOI: 10.1093/jamia/ocaa053] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 01/22/2020] [Revised: 04/01/2020] [Accepted: 04/07/2020] [Indexed: 01/08/2023] Open
Abstract
OBJECTIVE To conduct a systematic scoping review of explainable artificial intelligence (XAI) models that use real-world electronic health record data, categorize these techniques according to different biomedical applications, identify gaps of current studies, and suggest future research directions. MATERIALS AND METHODS We searched MEDLINE, IEEE Xplore, and the Association for Computing Machinery (ACM) Digital Library to identify relevant papers published between January 1, 2009 and May 1, 2019. We summarized these studies based on the year of publication, prediction tasks, machine learning algorithm, dataset(s) used to build the models, the scope, category, and evaluation of the XAI methods. We further assessed the reproducibility of the studies in terms of the availability of data and code and discussed open issues and challenges. RESULTS Forty-two articles were included in this review. We reported the research trend and most-studied diseases. We grouped XAI methods into 5 categories: knowledge distillation and rule extraction (N = 13), intrinsically interpretable models (N = 9), data dimensionality reduction (N = 8), attention mechanism (N = 7), and feature interaction and importance (N = 5). DISCUSSION XAI evaluation is an open issue that requires a deeper focus in the case of medical applications. We also discuss the importance of reproducibility of research work in this field, as well as the challenges and opportunities of XAI from 2 medical professionals' point of view. CONCLUSION Based on our review, we found that XAI evaluation in medicine has not been adequately and formally practiced. Reproducibility remains a critical concern. Ample opportunities exist to advance XAI research in medicine.
Affiliation(s)
- Zhaoyi Chen
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Pablo Rengifo-Moreno
  - College of Medicine, Florida State University, Tallahassee, Florida, USA
  - Tallahassee Memorial Hospital, Tallahassee, Florida, USA
- Tim Miller
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Jonathan H Chen
  - Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, California, USA
  - Division of Hospital Medicine, Department of Medicine, Stanford University, Stanford, California, USA
- Xiuwen Liu
  - Department of Computer Science, Florida State University, Tallahassee, Florida, USA
- Zhe He
  - School of Information, Florida State University, Tallahassee, Florida, USA
|
22
|
Tarekegn A, Ricceri F, Costa G, Ferracin E, Giacobini M. Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches. JMIR Med Inform 2020; 8:e16678. [PMID: 32442149 PMCID: PMC7303829 DOI: 10.2196/16678] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/15/2019] [Revised: 01/07/2020] [Accepted: 02/16/2020] [Indexed: 12/15/2022] Open
Abstract
Background Frailty is one of the most critical age-related conditions in older adults. It is often recognized as a syndrome of physiological decline in late life, characterized by a marked vulnerability to adverse health outcomes. A clear operational definition of frailty, however, has not been agreed so far. There is a wide range of studies on the detection of frailty and their association with mortality. Several of these studies have focused on the possible risk factors associated with frailty in the elderly population while predicting who will be at increased risk of frailty is still overlooked in clinical settings. Objective The objective of our study was to develop predictive models for frailty conditions in older people using different machine learning methods based on a database of clinical characteristics and socioeconomic factors. Methods An administrative health database containing 1,095,612 elderly people aged 65 or older with 58 input variables and 6 output variables was used. We first identify and define six problems/outputs as surrogates of frailty. We then resolve the imbalanced nature of the data through resampling process and a comparative study between the different machine learning (ML) algorithms – Artificial neural network (ANN), Genetic programming (GP), Support vector machines (SVM), Random Forest (RF), Logistic regression (LR) and Decision tree (DT) – was carried out. The performance of each model was evaluated using a separate unseen dataset. Results Predicting mortality outcome has shown higher performance with ANN (TPR 0.81, TNR 0.76, accuracy 0.78, F1-score 0.79) and SVM (TPR 0.77, TNR 0.80, accuracy 0.79, F1-score 0.78) than predicting the other outcomes. On average, over the six problems, the DT classifier has shown the lowest accuracy, while other models (GP, LR, RF, ANN, and SVM) performed better. 
All models have shown lower accuracy in predicting an event of an emergency admission with red code than predicting fracture and disability. In predicting urgent hospitalization, only SVM achieved better performance (TPR 0.75, TNR 0.77, accuracy 0.73, F1-score 0.76) with the 10-fold cross validation compared with other models in all evaluation metrics. Conclusions We developed machine learning models for predicting frailty conditions (mortality, urgent hospitalization, disability, fracture, and emergency admission). The results show that the prediction performance of machine learning models significantly varies from problem to problem in terms of different evaluation metrics. Through further improvement, the model that performs better can be used as a base for developing decision-support tools to improve early identification and prediction of frail older adults.
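The TPR, TNR, accuracy and F1-score figures quoted in this abstract are all derived from the four cells of a binary confusion matrix. As a minimal sketch of how such figures relate to one another, the function below computes them from raw counts. The function name and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return TPR, TNR, accuracy and F1 from confusion-matrix counts.

    tp/fn: positives predicted correctly / missed
    tn/fp: negatives predicted correctly / falsely flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one sample")
    tpr = tp / (tp + fn)                      # sensitivity (recall)
    tnr = tn / (tn + fp)                      # specificity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"tpr": tpr, "tnr": tnr, "accuracy": accuracy, "f1": f1}


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=80, fp=30, tn=70, fn=20)
```

Because accuracy is a prevalence-weighted average of TPR and TNR, it always lies between the two when all three are computed on the same evaluation set.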
Affiliation(s)
- Adane Tarekegn
  - Modeling and Data Science, Department of Mathematics, University of Turin, Turin, Italy
- Fulvio Ricceri
  - Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
  - Unit of Epidemiology, Regional Health Service, Local Health Unit Torino 3, Turin, Italy
- Giuseppe Costa
  - Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
  - Unit of Epidemiology, Regional Health Service, Local Health Unit Torino 3, Turin, Italy
- Elisa Ferracin
  - Unit of Epidemiology, Regional Health Service, Local Health Unit Torino 3, Turin, Italy
- Mario Giacobini
  - Data Analysis and Modeling Unit, Department of Veterinary Sciences, University of Turin, Turin, Italy
|
23
|
Wollenstein-Betech S, Cassandras CG, Paschalidis IC. Personalized Predictive Models for Symptomatic COVID-19 Patients Using Basic Preconditions: Hospitalizations, Mortality, and the Need for an ICU or Ventilator. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2020:2020.05.03.20089813. [PMID: 32511489 PMCID: PMC7273257 DOI: 10.1101/2020.05.03.20089813] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023]
Abstract
BACKGROUND The rapid global spread of the virus SARS-CoV-2 has provoked a spike in demand for hospital care. Hospital systems across the world have been over-extended, including in Northern Italy, Ecuador, and New York City, and many other systems face similar challenges. As a result, decisions on how to best allocate very limited medical resources have come to the forefront. Specifically, under consideration are decisions on who to test, who to admit into hospitals, who to treat in an Intensive Care Unit (ICU), and who to support with a ventilator. Given today's ability to gather, share, analyze and process data, personalized predictive models based on demographics and information regarding prior conditions can be used to (1) help decision-makers allocate limited resources, when needed, (2) advise individuals how to better protect themselves given their risk profile, (3) differentiate social distancing guidelines based on risk, and (4) prioritize vaccinations once a vaccine becomes available. OBJECTIVE To develop personalized models that predict the following events: (1) hospitalization, (2) mortality, (3) need for ICU, and (4) need for a ventilator. To predict hospitalization, it is assumed that one has access to a patient's basic preconditions, which can be easily gathered without the need to be at a hospital. For the remaining models, different versions developed include different sets of a patient's features, with some including information on how the disease is progressing (e.g., diagnosis of pneumonia). MATERIALS AND METHODS Data from a publicly available repository, updated daily, containing information from approximately 91,000 patients in Mexico were used. The data for each patient include demographics, prior medical conditions, SARS-CoV-2 test results, hospitalization, mortality and whether a patient has developed pneumonia or not. 
Several classification methods were applied, including robust versions of logistic regression, and support vector machines, as well as random forests and gradient boosted decision trees. RESULTS Interpretable methods (logistic regression and support vector machines) perform just as well as more complex models in terms of accuracy and detection rates, with the additional benefit of elucidating variables on which the predictions are based. Classification accuracies reached 61%, 76%, 83%, and 84% for predicting hospitalization, mortality, need for ICU and need for a ventilator, respectively. The analysis reveals the most important preconditions for making the predictions. For the four models derived, these are: (1) for hospitalization: age, gender, chronic renal insufficiency, diabetes, immunosuppression; (2) for mortality: age, SARS-CoV-2 test status, immunosuppression and pregnancy; (3) for ICU need: development of pneumonia (if available), cardiovascular disease, asthma, and SARS-CoV-2 test status; and (4) for ventilator need: ICU and pneumonia (if available), age, gender, cardiovascular disease, obesity, pregnancy, and SARS-CoV-2 test result.
Affiliation(s)
- Salomón Wollenstein-Betech
  - Division of Systems Engineering, Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
- Christos G Cassandras
  - Division of Systems Engineering, Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
- Ioannis Ch Paschalidis
  - Division of Systems Engineering, Department of Electrical and Computer Engineering, Department of Biomedical Engineering, Boston University, Boston, MA 02215
|
24
|
Zhang L, Wang Y, Niu M, Wang C, Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci Rep 2020; 10:4406. [PMID: 32157171 PMCID: PMC7064542 DOI: 10.1038/s41598-020-61123-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 09/11/2019] [Accepted: 02/19/2020] [Indexed: 01/19/2023] Open
Abstract
With the development of data mining, machine learning offers opportunities to improve discrimination by analyzing complex interactions among a large number of variables. To test the ability of machine learning algorithms to predict the risk of type 2 diabetes mellitus (T2DM) in a rural Chinese population, we focus on a total of 36,652 eligible participants from the Henan Rural Cohort Study. Risk assessment models for T2DM were developed using six machine learning algorithms, including logistic regression (LR), classification and regression tree (CART), artificial neural networks (ANN), support vector machine (SVM), random forest (RF) and gradient boosting machine (GBM). Model performance was measured by the area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, negative predictive value and area under the precision-recall curve. The importance of variables was identified based on each classifier and the Shapley additive explanations approach. Using all available variables, all models for predicting risk of T2DM demonstrated strong predictive performance, with AUCs ranging between 0.811 and 0.872 using laboratory data and from 0.767 to 0.817 without laboratory data. Among them, the GBM model performed best (AUC: 0.872 with laboratory data and 0.817 without laboratory data). Model performance plateaued once 30 variables were introduced into each model, except for the CART model. Among the top-10 variables across all methods were sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension. New important risk factors (urinary indicators, sweet flavor) were not found in previous risk prediction methods, but were identified by machine learning in our study. These results show that machine learning methods are competent in predicting risk of T2DM, leading to greater insights on disease risk factors with no a priori assumption of causality.
Affiliation(s)
- Liying Zhang
  - School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, P.R. China
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Yikang Wang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Miaomiao Niu
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Chongjian Wang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Zhenfei Wang
  - School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, P.R. China
|
25
|
Soft Clustering for Enhancing the Diagnosis of Chronic Diseases over Machine Learning Algorithms. JOURNAL OF HEALTHCARE ENGINEERING 2020; 2020:4984967. [PMID: 32211144 PMCID: PMC7085388 DOI: 10.1155/2020/4984967] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/08/2019] [Revised: 01/02/2020] [Accepted: 01/18/2020] [Indexed: 11/18/2022]
Abstract
Chronic diseases pose a serious threat to public health worldwide. They are estimated to account for about 60% of all deaths worldwide and approximately 43% of the global burden of disease. The analysis of healthcare data has therefore helped health officials, patients, and healthcare communities to detect these diseases early. Extracting patterns from healthcare data has helped healthcare communities obtain complete medical data for diagnosis. The objective of the present research is to improve the surveillance detection system for chronic diseases, which is used to protect people's lives. For this purpose, the proposed system has been developed to enhance the detection of chronic disease using machine learning algorithms. Standard data related to chronic diseases have been collected from various worldwide resources. In healthcare data, some chronic disease classes include ambiguous objects. The presence of ambiguous objects indicates traits shared by two or more classes, which reduces the accuracy of machine learning algorithms. The novelty of the current research lies in applying noncrisp Rough K-means (RKM) clustering to resolve the ambiguity in chronic disease datasets and thereby improve the performance of the system. The RKM algorithm clusters the data into two sets, namely, the upper approximation and the lower approximation. The objects belonging to the upper approximation are favourable objects, whereas the ones belonging to the lower approximation are excluded and identified as ambiguous. These ambiguous objects are excluded to improve the machine learning algorithms. The machine learning algorithms naïve Bayes (NB), support vector machine (SVM), K-nearest neighbors (KNN), and random forest tree are presented and compared.
The chronic disease data were obtained from the machine learning repository and Kaggle to test and evaluate the proposed model. The experimental results demonstrate that the proposed system can be successfully employed for the diagnosis of chronic diseases. In terms of accuracy, the proposed model achieved its best results with naïve Bayes with RKM for the classification of diabetes (80.55%), SVM with RKM for the classification of kidney disease (100%), and SVM with RKM for the classification of cancer (97.53%). Accuracy, sensitivity, specificity, precision, and F-score are used to evaluate the performance of the proposed system. Furthermore, the proposed system is evaluated and compared with existing machine learning algorithms. Overall, the proposed system enhanced the performance of the machine learning algorithms.
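The five performance measures named in this abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics`, the `BinaryMetrics` container, and the 0/1 label encoding are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f_score: float


def binary_metrics(y_true, y_pred):
    """Compute five standard measures from 0/1 labels (1 = disease present).

    Hypothetical helper: the label encoding is an assumption, not taken
    from the cited paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    precision = ratio(tp, tp + fp)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=sensitivity,
        specificity=ratio(tn, tn + fp),
        precision=precision,
        # F-score as the harmonic mean of precision and sensitivity (F1).
        f_score=ratio(2 * precision * sensitivity, precision + sensitivity),
    )


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the cited study.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

Reporting sensitivity and specificity alongside accuracy matters for disease datasets, where one class is often much rarer than the other and accuracy alone can look high for a classifier that rarely predicts the positive class.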
|
26
|
A classification model for prediction of clinical severity level using qSOFA medical score. INFORMATION DISCOVERY AND DELIVERY 2020. [DOI: 10.1108/idd-02-2019-0013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
The purpose of this study is to develop an efficient prediction model using vital signs and standard medical score systems, which predicts the clinical severity level of the patient in advance based on the quick sequential organ failure assessment (qSOFA) medical score method.
Design/methodology/approach
To predict the clinical severity level of the patient in advance, the authors formulated a training dataset constructed according to the qSOFA medical score method. Along with multiple vital signs, different standard medical scores and their correlation features are used to build the prediction model and improve its accuracy. The authors verified that the constructed training set is suitable for severity level prediction: the formulated dataset contains distinct clusters, each corresponding to a different severity level according to the qSOFA score.
Findings
The experimental results show that including the standard medical scores and their correlations along with multiple vital signs improves the accuracy of the clinical severity level prediction model. In addition, the authors showed that the training dataset formulated from the temporal data (which includes vital signs and medical scores) based on the qSOFA medical scoring system contains clusters that correspond to each severity level in the qSOFA score. Finally, RAndom k-labELsets multi-label classification predicts the severity level better than neural network-based multi-label classification.
Originality/value
This paper helps in identifying patients' clinical status.
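For readers unfamiliar with qSOFA, the score that this abstract uses to label severity can be sketched as below. The three criteria and their thresholds follow the commonly cited Sepsis-3 definition and are not stated in the abstract itself; the function names, the record layout, and the sample values are hypothetical, and how the authors map scores to severity levels is not reproduced here.

```python
def qsofa_score(respiratory_rate, systolic_bp, gcs):
    """Return the qSOFA score (0 to 3) for one set of vital signs.

    Thresholds are the commonly cited Sepsis-3 criteria (an assumption
    relative to the abstract, which does not list them):
      - respiratory rate >= 22 breaths/min
      - systolic blood pressure <= 100 mmHg
      - altered mentation, taken here as Glasgow Coma Scale < 15
    """
    score = 0
    if respiratory_rate >= 22:
        score += 1
    if systolic_bp <= 100:
        score += 1
    if gcs < 15:
        score += 1
    return score


def label_records(records):
    """Attach a qSOFA score to each temporal record.

    `records` is a hypothetical list of dicts with keys
    'respiratory_rate', 'systolic_bp', and 'gcs'.
    """
    return [
        {**r, "qsofa": qsofa_score(r["respiratory_rate"], r["systolic_bp"], r["gcs"])}
        for r in records
    ]


if __name__ == "__main__":
    # Illustrative values only; not patient data from the cited study.
    sample = [
        {"respiratory_rate": 18, "systolic_bp": 120, "gcs": 15},
        {"respiratory_rate": 24, "systolic_bp": 95, "gcs": 14},
    ]
    for row in label_records(sample):
        print(row)
```

Scoring each time-stamped record in this way yields one label per observation, which is one plausible route to the kind of score-derived training set the abstract describes.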
|
27
|
A Method to Detect Type 1 Diabetes Based on Physical Activity Measurements Using a Mobile Device. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9122555] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
Abstract
Type 1 diabetes is a chronic disease marked by high blood glucose levels, called hyperglycemia. Diagnosis of diabetes typically requires one or more blood tests. The aim of this paper is to discuss a non-invasive method of type 1 diabetes detection based on physical activity measurement. We solved a binary classification problem using a variety of computational intelligence methods, including non-linear classification algorithms, which were applied and comparatively assessed. Prediction of disease presence among children and adolescents was evaluated using performance measures such as accuracy, sensitivity, specificity, precision, the goodness index, and AUC. The most satisfying results were obtained with the random forest method. The primary parameters in disease detection were the weekly step count and the weekly number of vigorous activity minutes. The dependence between the weekly number of steps and the presence of type 1 diabetes was established after an in-depth analysis of the data using classification and clustering algorithms. The findings suggest that type 1 diabetes can be diagnosed using physical activity measurement. This matters because the detection method is non-invasive and flexible: it can be applied at any time and anywhere. The proposed technique can be implemented on a mobile device.
|
28
|
Brisimi TS, Xu T, Wang T, Dai W, Paschalidis IC. Predicting diabetes-related hospitalizations based on electronic health records. Stat Methods Med Res 2018; 28:3667-3682. [PMID: 30474497 DOI: 10.1177/0962280218810911] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/03/2023]
Abstract
Objective: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. Methods: A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while simultaneously deriving sparse linear support vector machine classifiers to separate positive samples from negative ones (non-hospitalized). The convergence of the new method was established, and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. Results: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. The new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters that can help interpret the classification results, thus increasing physicians' trust in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and a lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. Conclusions: Predictive models are proposed that can help avert hospitalizations, improve health outcomes, and drastically reduce hospital expenditures. The scope for savings is significant, as it has been estimated that in the USA alone about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.
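Because this abstract reports accuracy as area under the ROC curve, a short sketch of that quantity may help. The code below computes AUC through its rank interpretation: the probability that a randomly chosen positive (hospitalized) sample receives a higher score than a randomly chosen negative one, with ties counted as one half. It is an illustrative implementation of the standard definition, not the authors' code; the function name `roc_auc` and the toy scores are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` are 0/1 (1 = positive class) and `scores` are classifier
    outputs where larger means "more likely positive". Both names are
    hypothetical. The quadratic loop is fine for small examples; a
    rank-based version would be preferable for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores for illustration only; not results from the cited study.
    y = [1, 1, 0, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
    print(f"AUC = {roc_auc(y, s):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the 89% and 92% figures quoted above sit on that scale rather than on the scale of plain classification accuracy.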
Affiliation(s)
- Theodora S Brisimi
- Center for Information and Systems Engineering, Boston University, Boston, MA, USA
- Tingting Xu
- Center for Information and Systems Engineering, Boston University, Boston, MA, USA
- Taiyao Wang
- Center for Information and Systems Engineering, Boston University, Boston, MA, USA
- Wuyang Dai
- Center for Information and Systems Engineering, Boston University, Boston, MA, USA
|