1
Brosula R, Corbin CK, Chen JH. Pathophysiological Features in Electronic Medical Records Sustain Model Performance under Temporal Dataset Shift. AMIA Jt Summits Transl Sci Proc 2024; 2024:95-104. [PMID: 38827052 PMCID: PMC11141811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Access to real-world data streams like electronic medical records (EMRs) has accelerated the development of supervised machine learning (ML) models for clinical applications. However, few studies investigate the differential impact of particular features in the EMR on model performance under temporal dataset shift. To explain how features in the EMR impact models over time, this study aggregates features into feature groups by their source (e.g., medication orders, diagnosis codes, and lab results) and into feature categories according to whether they reflect patient pathophysiology or healthcare processes. We adapt Shapley values to explain the marginal contribution of feature groups and feature categories to initial and sustained model performance. We investigate three standard clinical prediction tasks and find that, while feature contributions to initial performance differ across tasks, pathophysiological features help mitigate temporal deterioration in discrimination. These results provide interpretable insights into how specific feature groups contribute to model performance and robustness to temporal dataset shift.
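The idea of attributing performance to feature groups with Shapley values can be illustrated with a short sketch. It treats each feature group as a player and computes exact Shapley values from a value function v(S), assumed here to return model performance when only the groups in S are available. The function name `shapley_values`, the group names, and the toy value function are hypothetical and do not come from the study; in practice each v(S) would require training and evaluating a model on that coalition of groups, which this sketch does not do.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(players)
    result: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # sizes of coalitions that exclude p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical toy value function (not study data): a 0.5 baseline plus per-group gains,
# with a small interaction bonus when "labs" and "diagnoses" are both present.
GAINS = {"labs": 0.15, "medications": 0.05, "diagnoses": 0.10}


def toy_auroc(groups: FrozenSet[str]) -> float:
    score = 0.5 + sum(GAINS[g] for g in groups)
    if {"labs", "diagnoses"} <= groups:
        score += 0.02
    return score


phi = shapley_values(list(GAINS), toy_auroc)
print({g: round(v, 3) for g, v in phi.items()})
```

The interaction bonus is split equally between the two groups that create it, and the values sum to v(all groups) − v(no groups). Exact enumeration visits 2^(n−1) coalitions per group, which is practical for a handful of feature groups; larger player sets usually call for sampling-based approximations.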
Affiliation(s)
- Raphael Brosula
- Genomic Center for Infectious Diseases, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Conor K Corbin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
2
Maimaiti M, Yang B, Xu T, Cui L, Yang S. Accurate correction model of blood potassium concentration in hemolytic specimens. Clin Chim Acta 2024; 554:117762. [PMID: 38211807 DOI: 10.1016/j.cca.2024.117762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 12/27/2023] [Accepted: 01/04/2024] [Indexed: 01/13/2024]
Abstract
BACKGROUND AND AIMS Blood potassium results can be seriously affected by specimen hemolysis, which may interfere with clinicians' interpretation of test results. Redrawing blood and retesting may delay treatment, and this is not feasible for critically ill patients from whom specimens are difficult to collect. It is therefore important to establish a mathematical model that can quickly correct the blood potassium concentration of hemolytic specimens. MATERIALS AND METHODS Residual blood samples from 107 patients at Peking University Third Hospital were collected to establish the potassium correction model. Samples with different hemolysis indexes (HI) were obtained by ultrasonic crushing. Blood potassium correction models for hemolytic specimens were established by linear regression using SPSS and by curve fitting using MATLAB. In addition, blood samples from another 85 patients were used to verify the accuracy of the models and determine the optimal model. RESULTS In the linear regression model, the variation of potassium was ΔK = 0.003 × HI − 0.03 (R² = 0.9749), indicating a high correlation between ΔK and HI, and the correction formula was Kcorrection = Khemolysis − 0.003 × HI + 0.03. In the curve fitting model, the average rate of potassium change (αaverage) was 0.003 ± 0.0002 mmol/L, and the correction formula was Kcorrection = Khemolysis − 0.003 × HI. Men and women can use the same correction model. The accuracy of the linear regression model was 96.5%, and there was a statistically significant difference between the verification results and the measured values (p < 0.05), whereas the accuracy of the curve fitting model was 100%, with no statistically significant difference between the verification results and the measured values (p = 0.552). The model was validated in an independent set of samples, and all results were within the total allowable error (TEa) of 6%, with an accuracy of 100%. 
CONCLUSIONS Both the linear regression and curve fitting models of potassium correction had high accuracy and can effectively correct the potassium concentration of hemolytic specimens, while the curve fitting model has superior accuracy.
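The two correction formulas reported in this abstract translate directly into code. The sketch below assumes potassium in mmol/L and HI in whatever units the study's analyzer reported (the abstract does not state them). The function names and the example specimen are hypothetical, and no range checks are applied because the abstract does not give the HI range over which the models were validated.

```python
def correct_potassium_linear(k_hemolysis: float, hi: float) -> float:
    """Linear regression model: subtract the fitted ΔK = 0.003 × HI − 0.03."""
    delta_k = 0.003 * hi - 0.03
    return k_hemolysis - delta_k


def correct_potassium_curve(k_hemolysis: float, hi: float,
                            alpha_average: float = 0.003) -> float:
    """Curve fitting model: subtract αaverage × HI (αaverage = 0.003 in the abstract)."""
    return k_hemolysis - alpha_average * hi


# Hypothetical specimen: measured K = 5.0 mmol/L at HI = 100.
print(round(correct_potassium_linear(5.0, 100), 2))  # 4.73
print(round(correct_potassium_curve(5.0, 100), 2))   # 4.7
```

For this hypothetical specimen the linear model gives 5.0 − (0.3 − 0.03) = 4.73 mmol/L and the curve fitting model gives 5.0 − 0.3 = 4.70 mmol/L; the two models differ only by the constant 0.03 mmol/L offset.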
Affiliation(s)
- Mulatijiang Maimaiti
- Department of Laboratory Medicine, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing 100191, PR China
- Boxin Yang
- Department of Laboratory Medicine, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing 100191, PR China
- Tong Xu
- Department of Laboratory Medicine, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing 100191, PR China
- Liyan Cui
- Department of Laboratory Medicine, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing 100191, PR China
- Shuo Yang
- Department of Laboratory Medicine, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing 100191, PR China
3
Zheng J, Li J, Zhang Z, Yu Y, Tan J, Liu Y, Gong J, Wang T, Wu X, Guo Z. Clinical Data based XGBoost Algorithm for infection risk prediction of patients with decompensated cirrhosis: a 10-year (2012-2021) Multicenter Retrospective Case-control study. BMC Gastroenterol 2023; 23:310. [PMID: 37704966 PMCID: PMC10500933 DOI: 10.1186/s12876-023-02949-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 09/07/2023] [Indexed: 09/15/2023]
Abstract
OBJECTIVES To evaluate effective predictors of infection in patients with decompensated cirrhosis (DC) using the XGBoost algorithm in a retrospective case-control study. METHODS Clinical data were retrospectively collected from 6,648 patients with DC admitted to five tertiary hospitals. Indicators with significant differences were identified by univariate analysis and least absolute shrinkage and selection operator (LASSO) regression. A multi-tree extreme gradient boosting (XGBoost) machine learning model was then used to rank the importance of the features selected by LASSO, and an infection risk prediction model was subsequently constructed with a simple-tree XGBoost model. Finally, the simple-tree XGBoost model was compared with a traditional logistic regression (LR) model. Model performance was evaluated by the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. RESULTS Six features, namely total bilirubin, blood sodium, albumin, prothrombin activity, white blood cell count, and neutrophil-to-lymphocyte ratio, were selected as predictors of infection in patients with DC. The simple-tree XGBoost model built on these features predicted infection risk accurately, with an AUROC of 0.971, sensitivity of 0.915, and specificity of 0.900 in the training set. The simple-tree XGBoost model outperformed the traditional LR model in the training set, internal validation set, and external validation set (P < 0.001). CONCLUSIONS The simple-tree XGBoost predictive model, developed from a minimal set of clinical data available for DC patients in settings with restricted medical resources, could help primary healthcare practitioners promptly identify potential infection.
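AUROC, sensitivity, and specificity, the three metrics named in this abstract, can be computed without any ML library. The sketch below uses the Mann-Whitney formulation of AUROC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half) and a fixed decision threshold for sensitivity and specificity. The function names, the 0.5 threshold, and the six-patient example are hypothetical; the abstract does not state how the authors chose their operating threshold.

```python
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 1 = infection, scores from some risk model.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(round(auroc(y_true, scores), 3))  # 0.889
print(sensitivity_specificity(y_true, scores, threshold=0.5))
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration; for large cohorts a rank-based computation or scikit-learn's `sklearn.metrics.roc_auc_score` is the usual choice. For a binary outcome this AUROC is the same quantity reported elsewhere as the C-statistic.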
Affiliation(s)
- Jing Zheng
- Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, 401320, China
- Jianjun Li
- Department of Cardiothoracic Surgery, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, 401320, China
- Zhengyu Zhang
- Medical Records Department, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, China
- Yue Yu
- Senior Bioinformatician Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, 55905, US
- Juntao Tan
- Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, 401320, China
- Yunyu Liu
- Medical Records Department, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, China
- Jun Gong
- Department of Information Center, the University Town Hospital of Chongqing Medical University, Chongqing, 401331, China
- Tingting Wang
- College of Medical Informatics, Chongqing Medical University, Chongqing, 400016, China
- Xiaoxin Wu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Centre for Infectious Diseases, the First Affiliated Hospital, Zhejiang University School of Medicine, 79 Qing Chun Road, Hangzhou, 310003, Zhejiang, China
- Zihao Guo
- Department of Gastroenterology, Chongqing Banan Cancer Hospital, Chongqing, 400054, China
4
Liu J, Glied S, Yakusheva O, Bevin C, Schlak AE, Yoon S, Kulage KM, Poghosyan L. Using machine-learning methods to predict in-hospital mortality through the Elixhauser index: A Medicare data analysis. Res Nurs Health 2023; 46:411-424. [PMID: 37221452 PMCID: PMC10330510 DOI: 10.1002/nur.22322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 05/25/2023]
Abstract
Accurate in-hospital mortality prediction can reflect patients' prognosis, help guide the allocation of clinical resources, and help clinicians make appropriate care decisions. Traditional logistic regression models have limitations when used to assess how well comorbidity measures predict in-hospital mortality. Meanwhile, the use of novel machine-learning methods is growing rapidly. In 2021, the Agency for Healthcare Research and Quality published new guidelines for using the Present-on-Admission (POA) indicator from the International Classification of Diseases, Tenth Revision, in coding comorbidities to predict in-hospital mortality with the Elixhauser comorbidity measurement method. We compared the performance of logistic regression, an elastic net model, and an artificial neural network (ANN) in predicting in-hospital mortality from Elixhauser's measures under the updated POA guidelines. In this retrospective analysis, 1,810,106 adult Medicare inpatient admissions from six US states, admitted after September 23, 2017, and discharged before April 11, 2019, were extracted from the Centers for Medicare and Medicaid Services data warehouse. The POA indicator was used to distinguish pre-existing comorbidities from complications that occurred during hospitalization. All models performed well (C-statistic > 0.77). The elastic net method generated a parsimonious model that selected five fewer comorbidities than the logistic regression model while achieving similar predictive power. The ANN had the highest C-statistic of the three models (0.800 vs. 0.791 and 0.791). The elastic net model and the ANN can be applied successfully to predict in-hospital mortality.
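The parsimony attributed to the elastic net in this abstract comes from its ℓ1 penalty. A standard elastic-net-penalized logistic regression objective, written in the glmnet-style parameterization, is shown below; the abstract does not state the authors' exact formulation, software, or tuning, so this is the generic form rather than their model.

```latex
(\hat{\beta}_0, \hat{\beta}) = \operatorname*{arg\,min}_{\beta_0,\,\beta}\;
  -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \Big[ \alpha \,\lVert \beta \rVert_1 + \tfrac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \Big]
```

Here y_i ∈ {0, 1} would be in-hospital death, x_i the vector of Elixhauser comorbidity indicators, λ ≥ 0 the overall penalty strength, and α ∈ [0, 1] the mixing weight. With α > 0 the non-differentiable ℓ1 term can set some coefficients exactly to zero, which removes those comorbidities from the model; α = 1 recovers the lasso and α = 0 ridge regression, which shrinks coefficients but does not zero them.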
Affiliation(s)
- Jianfang Liu
- Columbia University School of Nursing, New York City, New York, USA
- Sherry Glied
- Robert F. Wagner Graduate School of Public Service, New York University, New York City, New York, USA
- Olga Yakusheva
- University of Michigan School of Nursing, University of Michigan School of Public Health, Ann Arbor, Michigan, USA
- Cohen Bevin
- Mount Sinai Health System, New York City, New York, USA
- Amelia E Schlak
- AAAS Science and Technology Policy Fellow, Office of Research and Development, U.S. Department of Veterans Affairs, Washington, DC, USA
- Sunmoo Yoon
- Division of General Medicine, Department of Medicine, Columbia University Irving Medical Center, New York City, New York, USA
- Kristine M Kulage
- Office of Scholarship and Research Development, Columbia University School of Nursing, New York City, New York, USA
- Lusine Poghosyan
- Columbia University School of Nursing and Professor of Health Policy and Management, Mailman School of Public Health, Columbia University, Executive Director Center for Healthcare Delivery Research & Innovations (HDRI), New York City, New York, USA
5
Kablan R, Miller HA, Suliman S, Frieboes HB. Evaluation of stacked ensemble model performance to predict clinical outcomes: A COVID-19 study. Int J Med Inform 2023; 175:105090. [PMID: 37172507 PMCID: PMC10165871 DOI: 10.1016/j.ijmedinf.2023.105090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 04/17/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023]
Abstract
BACKGROUND The application of machine learning (ML) to analyze clinical data with the goal of predicting patient outcomes has garnered increasing attention. Ensemble learning has been used in conjunction with ML to improve predictive performance. Although stacked generalization (stacking), a type of heterogeneous ensemble of ML models, has emerged in clinical data analysis, it remains unclear how to define the best model combinations for strong predictive performance. This study develops a methodology to evaluate the performance of "base" learner models and their optimized combination using "meta" learner models in stacked ensembles, so that performance can be accurately assessed in the context of clinical outcomes. METHODS De-identified COVID-19 data were obtained from the University of Louisville Hospital, where a retrospective chart review was performed from March 2020 to November 2021. Three feature subsets of different sizes were drawn from the overall dataset to train and evaluate ensemble classification performance. The number of base learners, chosen from several algorithm families and coupled with a complementary meta learner, was varied from a minimum of 2 to a maximum of 8. Predictive performance of these combinations was evaluated for mortality and severe cardiac event outcomes using the area under the receiver operating characteristic curve (AUROC), F1 score, balanced accuracy, and kappa. RESULTS The results highlight the potential to accurately predict clinical outcomes, such as severe cardiac events in COVID-19, from routinely acquired in-hospital patient data. The meta learners Generalized Linear Model (GLM), Multi-Layer Perceptron (MLP), and Partial Least Squares (PLS) had the highest AUROC for both outcomes, while K-Nearest Neighbors (KNN) had the lowest. Performance trended lower in the training set as the number of features increased, and it exhibited less variance in both training and validation across all feature subsets as the number of base learners increased. 
CONCLUSION This study offers a methodology to robustly evaluate ensemble ML performance when analyzing clinical data.
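The base/meta structure of a stacked ensemble described in this abstract can be sketched in plain Python. In this sketch, each base learner is trained on k − 1 folds and scores the held-out fold, so the meta learner (here a logistic regression, i.e. a GLM, fitted by batch gradient descent) is trained only on out-of-fold predictions; the base learners are then refit on all data. The two base learners (nearest centroid and k-nearest neighbors), the round-robin fold assignment, the learning rate, and the toy data are hypothetical choices for illustration; the abstract does not specify the study's base-learner algorithms or its cross-validation scheme. Each training fold must contain both classes.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]                    # maps one input vector to a score in [0, 1]
Learner = Callable[[List[Vector], List[int]], Model]


def _sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def _dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Base learner: score by relative distance to the two class centroids."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_pos, c_neg = centroid(1), centroid(0)

    def predict(x: Vector) -> float:
        d_pos, d_neg = _dist(x, c_pos), _dist(x, c_neg)
        return d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5
    return predict


def knn(k: int) -> Learner:
    """Base learner factory: share of the k nearest training points that are positive."""
    def fit(X: List[Vector], y: List[int]) -> Model:
        def predict(x: Vector) -> float:
            nearest = sorted(range(len(X)), key=lambda i: _dist(x, X[i]))[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return predict
    return fit


def fit_logistic(Z: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000) -> Model:
    """Meta learner: logistic regression (a GLM) fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)                      # w[0] is the intercept

    def score(z: Vector) -> float:
        return _sigmoid(w[0] + sum(wi * zi for wi, zi in zip(w[1:], z)))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = score(z) - t
            grad[0] += err
            for j, zj in enumerate(z, start=1):
                grad[j] += err * zj
        for j in range(len(w)):                      # update weights in place
            w[j] -= lr * grad[j] / len(Z)
    return score


def stack(X: List[Vector], y: List[int], base_learners: List[Learner], n_folds: int = 3) -> Model:
    """Stacked ensemble: the meta learner sees only out-of-fold base-learner scores."""
    oof = [[0.0] * len(base_learners) for _ in X]
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        for j, learner in enumerate(base_learners):
            model = learner([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                oof[i][j] = model(X[i])
    meta = fit_logistic(oof, y)
    full_models = [learner(X, y) for learner in base_learners]  # refit on all data
    return lambda x: meta([m(x) for m in full_models])


# Hypothetical toy data: two well-separated clusters (not study data).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1],
     [1.0, 0.9], [0.8, 1.0], [0.9, 0.7], [0.7, 0.8], [1.0, 0.6], [0.6, 1.0]]
y = [0] * 6 + [1] * 6
ensemble = stack(X, y, [nearest_centroid, knn(3)])
print(ensemble([0.05, 0.05]) < 0.5, ensemble([0.95, 0.95]) > 0.5)
```

In this structure, comparing ensembles with different numbers of base learners, as the study did with 2 to 8, amounts to changing the `base_learners` list, and swapping the meta learner means replacing `fit_logistic`.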
Affiliation(s)
- Rianne Kablan
- Department of Bioengineering, University of Louisville, Louisville, KY, USA
- Hunter A Miller
- Department of Bioengineering, University of Louisville, Louisville, KY, USA
- Hermann B Frieboes
- Department of Bioengineering, University of Louisville, Louisville, KY, USA; James Graham Brown Cancer Center, University of Louisville, Louisville, KY, USA; Center for Predictive Medicine, University of Louisville, Louisville, KY, USA
6
Li H, Chang E, Zheng W, Liu B, Xu J, Gu W, Zhou L, Li J, Liu C, Yu H, Huang W. Multimorbidity and catastrophic health expenditure: Evidence from the China Health and Retirement Longitudinal Study. Front Public Health 2022; 10:1043189. [PMID: 36388267 PMCID: PMC9643627 DOI: 10.3389/fpubh.2022.1043189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 10/05/2022] [Indexed: 01/29/2023]
Abstract
Background Population aging accompanied by multimorbidity imposes a great burden on households and the healthcare system. This study aimed to determine the incidence and determinants of catastrophic health expenditure (CHE) in the households of older people with multimorbidity in China. Methods Data were obtained from the China Health and Retirement Longitudinal Study (CHARLS) conducted in 2018, with 3,511 older people (≥60 years) with multimorbidity responding to the survey on behalf of their households. CHE was identified using two thresholds: out-of-pocket (OOP) health spending of ≥10% of total household expenditure (THE), and OOP health spending of ≥40% of household capacity to pay (CTP), measured by non-food household expenditure. Logistic regression models were established to identify the individual and household characteristics associated with CHE incidence. Results The median values of THE, OOP health spending, and CTP were 19,900, 1,500, and 10,520 Yuan, respectively. The CHE incidence was 31.5% using the ≥40% CTP threshold and 45.6% using the ≥10% THE threshold. It increased with the number of chronic conditions reported by the respondents (aOR = 1.293-1.855, p < 0.05) and decreased with increasing household economic status (aOR = 1.622-4.595 relative to the highest quartile, p < 0.001). Respondents' hospital admissions over the past year (aOR = 6.707, 95% CI: 5.186 to 8.674) and outpatient visits over the past month (aOR = 4.891, 95% CI: 3.822 to 6.259) were the strongest predictors of CHE incidence. 
Respondents who were male (aOR = 1.266, 95% CI: 1.054 to 1.521), married (OR = 1.502, 95% CI: 1.211 to 1.862), older than 70 years (aOR = 1.288-1.458 relative to 60-69 years, p < 0.05), had completed primary (aOR = 1.328 relative to illiterate, 95% CI: 1.079 to 1.635) or secondary school education (aOR = 1.305 relative to illiterate, 95% CI: 1.002 to 1.701), lived in a small (≤2 members) household (aOR = 2.207, 95% CI: 1.825 to 2.669), and resided in the northeast region (aOR = 1.935 relative to eastern, 95% CI: 1.396 to 2.682) were more likely to incur CHE. Conclusion Multimorbidity is a significant risk factor for CHE. Household CHE incidence increases with the number of reported chronic conditions. Socioeconomic and regional disparities in CHE incidence persist in China.
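The two CHE definitions in this abstract reduce to ratio tests on household spending. The sketch below assumes each household record already carries OOP health spending, total household expenditure (THE), and non-food expenditure, used as capacity to pay (CTP) as described above. The `Household` class, the function names, and the three example households are hypothetical and are not CHARLS data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Household:
    oop_health: float            # out-of-pocket (OOP) health spending, Yuan
    total_expenditure: float     # total household expenditure (THE), Yuan
    non_food_expenditure: float  # used as capacity to pay (CTP), Yuan


def che_flags(h: Household, the_threshold: float = 0.10,
              ctp_threshold: float = 0.40) -> Tuple[bool, bool]:
    """Return (CHE under the THE definition, CHE under the CTP definition)."""
    by_the = h.oop_health / h.total_expenditure >= the_threshold
    by_ctp = h.non_food_expenditure > 0 and h.oop_health / h.non_food_expenditure >= ctp_threshold
    return by_the, by_ctp


def che_incidence(households: List[Household], definition: int) -> float:
    """Share of households with CHE; definition 0 = THE threshold, 1 = CTP threshold."""
    flags = [che_flags(h)[definition] for h in households]
    return sum(flags) / len(flags)


# Hypothetical households (not CHARLS data).
sample = [
    Household(oop_health=5000, total_expenditure=20000, non_food_expenditure=10000),
    Household(oop_health=3000, total_expenditure=20000, non_food_expenditure=12000),
    Household(oop_health=1000, total_expenditure=20000, non_food_expenditure=10000),
]
print([che_flags(h) for h in sample])
print(che_incidence(sample, 0), che_incidence(sample, 1))
```

The second hypothetical household spends 15% of THE but only 25% of CTP on health, so it counts as CHE under the 10% THE threshold but not under the 40% CTP threshold; differences of this kind are one way the two definitions can yield different incidence estimates.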
Affiliation(s)
- Haofei Li
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China
- Enxue Chang
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China
- Wanji Zheng
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China
- Bo Liu
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China
- Juan Xu
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China
- Wen Gu
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China
- Lan Zhou
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China
- Jinmei Li
- Heilongjiang Medical Service Management Evaluation Center, Harbin, China
- Chaojie Liu
- School of Psychology and Public Health, La Trobe University, Melbourne, VIC, Australia
- Hongjuan Yu
- Department of Hematology, The First Affiliated Hospital, Harbin Medical University, Harbin, China
- Weidong Huang
- Department of Health Economics, School of Health Management, Harbin Medical University, Harbin, China; *Correspondence: Weidong Huang
7
Shimoni Z, Froom P, Benbassat J. Parameters of the complete blood count predict in hospital mortality. Int J Lab Hematol 2022; 44:88-95. [PMID: 34464032 DOI: 10.1111/ijlh.13684] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2021] [Revised: 07/25/2021] [Accepted: 08/10/2021] [Indexed: 11/27/2022]
Abstract
INTRODUCTION Mortality rates are used to evaluate the quality of hospital care after adjustment for disease severity and, commonly, also for age, comorbidity, and laboratory data that include only a few parameters of the complete blood count (CBC). OBJECTIVE To identify the CBC parameters that independently predict in-hospital mortality in acutely admitted patients. POPULATION All patients admitted to the internal medicine, cardiology, and intensive care departments at Laniado Hospital in Israel in 2018 and 2019. VARIABLES The independent variables were patients' age, sex, and CBC parameters. The outcome variable was in-hospital mortality. ANALYSIS Logistic regression. We identified the variables associated with in-hospital mortality in the 2018 cohort and validated this association in the 2019 cohort. RESULTS In the validation cohort, a model consisting of nine parameters that are commonly available on modern analyzers had a c-statistic (area under the receiver operating characteristic curve) of 0.86 and a 10%-90% risk gradient of 0%-21.4%. After the proportions of large unstained cells and of hypochromic and macrocytic red cells were included, the c-statistic increased to 0.89 and the risk gradient to 0.1%-29.5%. CONCLUSION The commonly available CBC parameters predict in-hospital mortality. Adding the proportions of hypochromic red cells, macrocytic red cells, and large unstained cells may improve the predictive value of the CBC.
Affiliation(s)
- Zvi Shimoni
- Department of Internal Medicine B, Laniado Hospital, Netanya, Israel
- Ruth and Bruce Rappaport School of Medicine, Haifa, Israel
- Paul Froom
- Clinical Utility Department, Sanz Medical Center, Laniado Hospital, Netanya, Israel
- School of Public Health, University of Tel Aviv, Tel Aviv, Israel
- Jochanan Benbassat
- Department of Medicine (retired), Hadassah University Hospital Jerusalem, Jerusalem, Israel
8
Ducatman BS, Ducatman AM, Crawford JM, Laposata M, Sanfilippo F. The Value Proposition for Pathologists: A Population Health Approach. Acad Pathol 2020; 7:2374289519898857. [PMID: 31984223 PMCID: PMC6961144 DOI: 10.1177/2374289519898857] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/26/2019] [Revised: 11/11/2019] [Accepted: 12/04/2019] [Indexed: 01/09/2023]
Abstract
The transition to a value-based payment system offers pathologists the opportunity to play an increased role in population health by improving outcomes and safety as well as by reducing costs. Although laboratory testing itself accounts for a small portion of health-care spending, laboratory data have significant downstream effects on patient management as well as diagnosis. Pathologists are currently heavily engaged in precision medicine and in the use of laboratory and pathology test results (including autopsy data) to reduce diagnostic errors, and they play leading roles in diagnostic management teams. Additionally, pathologists can use aggregate laboratory data to monitor the health of populations and improve health-care outcomes for both individual patients and populations. For the profession to thrive, pathologists will need to extend their roles outside the laboratory, beyond their traditional role in the analytic phase of testing. This should include leadership in ensuring correct ordering and interpretation of laboratory tests and leadership in population health programs. Pathologists in training will need to learn key concepts in informatics and data analytics, health-care economics, public health, implementation science, and health systems science. While these changes may reduce reimbursement for the traditional activities of pathologists, new opportunities arise for value creation and new compensation models. This report reviews these opportunities for pathologist leadership in utilization management, precision medicine, reducing diagnostic errors, and improving health-care outcomes.
Affiliation(s)
- Barbara S. Ducatman
- Department of Pathology, Beaumont Health, Royal Oak, MI, USA
- Oakland University William Beaumont School of Medicine, Rochester, MI, USA
- Alan M. Ducatman
- Department of Occupational and Environmental Health Sciences, West Virginia University School of Public Health, Morgantown, WV, USA
- James M. Crawford
- Department of Pathology and Laboratory Medicine, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
- Michael Laposata
- Department of Pathology, University of Texas Medical Branch, Galveston, TX, USA
- Fred Sanfilippo
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA