1
Rezaei Ghahroodi Z, Eftekhari Mahabadi S, Esberizi A, Sami R, Mansourian M. Association of the medication protocols and longitudinal change of COVID-19 symptoms: a hospital-based mixed-statistical methods study. J Biopharm Stat 2024:1-21. [PMID: 38515283 DOI: 10.1080/10543406.2024.2333527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 03/17/2024] [Indexed: 03/23/2024]
Abstract
The objective of this study was to identify the relationship between in-hospital treatment strategies and change in symptoms during a 12-week follow-up among patients hospitalized during the COVID-19 outbreak. In this article, data from a prospective cohort study on COVID-19 patients admitted to Khorshid Hospital, Isfahan, Iran, from February 2020 to February 2021, were analyzed and reported. Patient characteristics, including socio-demographics, comorbidities, signs and symptoms, and treatments during hospitalization, were investigated. To investigate the treatment effects on symptom change during follow-up, adjusted for confounding factors, binary classification trees, a generalized linear mixed model, machine learning, and joint generalized estimating equation methods were applied. This research scrutinized the effects of various medications on COVID-19 patients in a prospective hospital-based cohort study, and found that heparin, methylprednisolone, ceftriaxone, and hydroxychloroquine were the most frequently prescribed medications. The results indicate that of patients under 65 years of age, 76% had a cough at the time of admission, while of patients with creatinine (Cr) levels of 1.1 or more, 80% had not lost weight at the time of admission. The fitted models showed that, during follow-up, women were more likely than men to have shortness of breath (OR = 1.25; P-value: 0.039), fatigue (OR = 1.31; P-value: 0.013) and cough (OR = 1.29; P-value: 0.019). Additionally, patients with symptoms of chest pain, fatigue and decreased appetite during admission are at a higher risk of experiencing fatigue during follow-up. Each additional day of ceftriaxone multiplies the odds of shortness of breath by 1.15 (P-value: 0.012). With each passing week, the odds of losing weight are multiplied by 1.41 (P-value: 0.038), while the odds of shortness of breath and cough are multiplied by 0.84 (P-value: 0.005) and 0.56 (P-value < 0.001), respectively. In addition, among those who took these medications, each additional day of meropenem or methylprednisolone multiplied the odds of weight loss at follow-up by 0.88 (P-value: 0.026) and 0.91 (P-value: 0.023), respectively. Identified prognostic factors can help clinicians and policymakers adapt management strategies for patients in any pandemic like COVID-19, which ultimately leads to better hospital decision-making and improved patient quality of life outcomes.
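The per-week and per-day figures in this abstract are odds multipliers, so their effect compounds over several weeks or days. The sketch below illustrates that arithmetic only; it is not the authors' model. It assumes the multiplier stays constant per unit of time, and the function names and the 50% baseline probability are hypothetical choices made for illustration.

```python
def compound_odds_ratio(per_unit_or: float, units: float) -> float:
    """Odds ratio implied by `units` steps of a constant per-unit odds ratio."""
    return per_unit_or ** units


def probability_to_odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert non-negative odds back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def probability_after(baseline_probability: float, per_unit_or: float, units: float) -> float:
    """Probability after `units` steps, starting from a baseline probability."""
    odds = probability_to_odds(baseline_probability) * compound_odds_ratio(per_unit_or, units)
    return odds_to_probability(odds)


if __name__ == "__main__":
    # Hypothetical baseline, NOT taken from the study.
    hypothetical_baseline = 0.50
    # Weekly odds multipliers as quoted in the abstract.
    weekly_multipliers = {
        "weight loss": 1.41,
        "shortness of breath": 0.84,
        "cough": 0.56,
    }
    for symptom, weekly_or in weekly_multipliers.items():
        p_week4 = probability_after(hypothetical_baseline, weekly_or, 4)
        print(f"{symptom}: probability after 4 weeks = {p_week4:.3f}")
```

A multiplier below 1 lowers the probability over time and a multiplier above 1 raises it, which is why "multiplied by 0.56" describes a decrease in the odds of cough.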
Affiliation(s)
- Zahra Rezaei Ghahroodi
  - School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran
- Alireza Esberizi
  - School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran
- Ramin Sami
  - Department of Internal Medicine, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Marjan Mansourian
  - Department of Epidemiology and Biostatistics, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran
2
Åkerla J, Nevalainen J, Pesonen JS, Pöyhönen A, Koskimäki J, Häkkinen J, Tammela TLJ, Auvinen A. Do LUTS Predict Mortality? An Analysis Using Random Forest Algorithms. Clin Interv Aging 2024; 19:237-245. [PMID: 38371602 PMCID: PMC10873145 DOI: 10.2147/cia.s432368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/17/2024] [Indexed: 02/20/2024]
Abstract
Purpose To evaluate a random forest (RF) algorithm of lower urinary tract symptoms (LUTS) as a predictor of all-cause mortality in a population-based cohort. Materials and Methods A population-based cohort of 3143 men born in 1924, 1934, and 1944 was evaluated using a mailed questionnaire including the Danish Prostatic Symptom Score (DAN-PSS-1) to assess LUTS as well as questions on medical conditions and behavioral and sociodemographic factors. Surveys were repeated in 1994, 1999, 2004, 2009 and 2015. The cohort was followed up for vital status until the end of 2018. RF uses an ensemble of classification trees for prediction with good flexibility and without overfitting. RF algorithms were developed to predict five-year mortality using LUTS, demographic, medical, and behavioral factors alone and in combinations. Results A total of 2663 men were included in the study, of whom 917 (34%) died during follow-up (median follow-up time 15.0 years). The LUTS-based RF algorithm showed an area under the curve (AUC) of 0.60 (95% CI 0.52-0.69) for five-year mortality. An expanded RF algorithm, including LUTS, medical history, and behavioral and sociodemographic factors, yielded an AUC of 0.73 (0.65-0.81), while an algorithm excluding LUTS yielded an AUC of 0.71 (0.62-0.78). Conclusion An exploratory RF algorithm using LUTS can predict all-cause mortality with acceptable discrimination at the group level. In clinical practice, it is unlikely that LUTS will improve the accuracy of predicting death if the patient's background is well known.
Affiliation(s)
- Jonne Åkerla
  - Department of Urology, Tampere University Hospital, Tampere, Finland
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Jori S Pesonen
  - Department of Surgery, Päijät-Häme Central Hospital, Lahti, Finland
- Antti Pöyhönen
  - Centre for Military Medicine, The Finnish Defence Forces, Riihimäki, Finland
- Juha Koskimäki
  - Department of Urology, Tampere University Hospital, Tampere, Finland
- Jukka Häkkinen
  - Department of Urology, Länsi-Pohja healthcare District, Kemi, Finland
- Teuvo L J Tammela
  - Department of Urology, Tampere University Hospital, Tampere, Finland
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Anssi Auvinen
  - Faculty of Social Sciences, Tampere University, Tampere, Finland
3
Souza-Silva RD, Calixto-Lima L, Varea Maria Wiegert E, de Oliveira LC. Decision tree algorithm to predict mortality in incurable cancer: a new prognostic model. BMJ Support Palliat Care 2024:spcare-2023-004581. [PMID: 38242639 DOI: 10.1136/spcare-2023-004581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 01/08/2024] [Indexed: 01/21/2024]
Abstract
OBJECTIVES To develop and validate a new prognostic model to predict 90-day mortality in patients with incurable cancer. METHODS In this prospective cohort study, patients with incurable cancer receiving palliative care (n = 1322) were randomly divided into two groups: development (n = 926, 70%) and validation (n = 396, 30%). A decision tree algorithm was used to develop a prognostic model with clinical variables. The accuracy and applicability of the proposed model were assessed by the C-statistic, calibration and receiver operating characteristic (ROC) curve. RESULTS Albumin (75.2%), C-reactive protein (CRP) (47.7%) and Karnofsky Performance Status (KPS) ≥50% (26.5%) were the variables that most contributed to the classification power of the prognostic model, named Simple decision Tree algorithm for predicting mortality in patients with Incurable Cancer (acronym STIC). This was used to identify three groups of increasing risk of 90-day mortality: STIC-1 - low risk (probability of death: 0.30): albumin ≥3.6 g/dL, CRP <7.8 mg/dL and KPS ≥50%; STIC-2 - medium risk (probability of death: 0.66 to 0.69): albumin ≥3.6 g/dL, CRP <7.8 mg/dL and KPS <50%, or albumin ≥3.6 g/dL and CRP ≥7.8 mg/dL; STIC-3 - high risk (probability of death: 0.79): albumin <3.6 g/dL. In the validation dataset, good accuracy (C-statistic ≥0.71), Hosmer-Lemeshow p=0.12 and area under the ROC curve=0.707 were found. CONCLUSIONS STIC is a valid, practical tool for stratifying patients with incurable cancer into three risk groups for 90-day mortality.
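The three STIC risk groups are defined by simple thresholds, so the rule can be written as a short function. The following is a minimal sketch built only from the cut-offs quoted in this abstract; the function and parameter names are hypothetical, it is not the authors' software, and it is not intended for clinical use.

```python
def stic_group(albumin_g_dl: float, crp_mg_dl: float, kps_percent: float) -> str:
    """Assign a STIC risk group using the thresholds quoted in the abstract.

    albumin_g_dl: serum albumin in g/dL
    crp_mg_dl:    C-reactive protein in mg/dL
    kps_percent:  Karnofsky Performance Status in percent
    """
    # STIC-3 (high risk): albumin below 3.6 g/dL, regardless of CRP or KPS.
    if albumin_g_dl < 3.6:
        return "STIC-3"
    # From here on albumin is at least 3.6 g/dL.
    # STIC-2 (medium risk): CRP at or above 7.8 mg/dL.
    if crp_mg_dl >= 7.8:
        return "STIC-2"
    # STIC-2 (medium risk): CRP below 7.8 mg/dL but KPS below 50%.
    if kps_percent < 50:
        return "STIC-2"
    # STIC-1 (low risk): albumin >= 3.6, CRP < 7.8 and KPS >= 50%.
    return "STIC-1"


if __name__ == "__main__":
    # Illustrative inputs only; these are not patients from the study.
    for albumin, crp, kps in [(4.0, 2.0, 70), (4.0, 2.0, 40), (4.0, 9.0, 80), (3.0, 1.0, 90)]:
        print(albumin, crp, kps, stic_group(albumin, crp, kps))
```

Checking albumin first mirrors the abstract, where low albumin alone places a patient in the high-risk group.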
4
Mangino AA, Bolin JH, Finch WH. Fixed Effects or Mixed Effects Classifiers? Evidence From Simulated and Archival Data. Educational and Psychological Measurement 2023; 83:710-739. [PMID: 37398843 PMCID: PMC10311958 DOI: 10.1177/00131644221108180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
This study seeks to compare fixed and mixed effects models for the purposes of predictive classification in the presence of multilevel data. The first part of the study utilizes a Monte Carlo simulation to compare fixed and mixed effects logistic regression and random forests. An applied examination of the prediction of student retention in the public-use U.S. PISA data set was considered to verify the simulation findings. Results of this study indicate fixed effects models performed comparably with mixed effects models across both the simulation and PISA examinations. Results broadly suggest that researchers should be cognizant of the type of predictors and data structure being used, as these factors carried more weight than did the model type.
Affiliation(s)
- Anthony A. Mangino
  - Ball State University, Muncie, IN, USA
  - University of Kentucky, Lexington, USA
5
Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Brief Bioinform 2023; 24:6991123. [PMID: 36653905 PMCID: PMC10025446 DOI: 10.1093/bib/bbad002] [Citation(s) in RCA: 42] [Impact Index Per Article: 42.0] [Received: 08/31/2022] [Revised: 12/12/2022] [Accepted: 12/31/2022] [Indexed: 01/20/2023]
Abstract
In longitudinal studies, variables are measured repeatedly over time, leading to clustered and correlated observations. If the goal of the study is to develop prediction models, machine learning approaches such as the powerful random forest (RF) are often promising alternatives to standard statistical methods, especially in the context of high-dimensional data. In this paper, we review extensions of the standard RF method for the purpose of longitudinal data analysis. Extension methods are categorized according to the data structures for which they are designed. We consider both univariate and multivariate response longitudinal data and further categorize the repeated measurements according to whether the time effect is relevant. Even though most extensions are proposed for low-dimensional data, some can be applied to high-dimensional data. Information on available software implementations of the reviewed extensions is also given. We conclude with discussions on the limitations of our review and some future research directions.
Affiliation(s)
- Jianchang Hu
  - Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
- Silke Szymczak
  - Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
6
Sigrist F. Latent Gaussian Model Boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:1894-1905. [PMID: 35439126 DOI: 10.1109/tpami.2022.3168152] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.
7
Mangino AA, Finch WH. Prediction With Mixed Effects Models: A Monte Carlo Simulation Study. Educational and Psychological Measurement 2021; 81:1118-1142. [PMID: 34565818 PMCID: PMC8451021 DOI: 10.1177/0013164421992818] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Oftentimes in many fields of the social and natural sciences, data are obtained within a nested structure (e.g., students within schools). To effectively analyze data with such a structure, multilevel models are frequently employed. The present study utilizes a Monte Carlo simulation to compare several novel multilevel classification algorithms across several varied data conditions for the purpose of prediction. Among these models, the panel neural network and Bayesian generalized mixed effects model (multilevel Bayes) consistently yielded the highest prediction accuracy in test data across nearly all data conditions.
Affiliation(s)
- W Holmes Finch
  - Ball State University, Teachers College, Muncie, IN, USA
8
Speiser JL. A random forest method with feature selection for developing medical prediction models with clustered and longitudinal data. J Biomed Inform 2021; 117:103763. [PMID: 33781921 PMCID: PMC8131242 DOI: 10.1016/j.jbi.2021.103763] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 10/15/2020] [Revised: 03/03/2021] [Accepted: 03/23/2021] [Indexed: 12/22/2022]
Abstract
BACKGROUND Machine learning methodologies are gaining popularity for developing medical prediction models for datasets with a large number of predictors, particularly in the setting of clustered and longitudinal data. Binary Mixed Model (BiMM) forest is a promising machine learning algorithm which may be applied to develop prediction models for clustered and longitudinal binary outcomes. Although machine learning methods for clustered and longitudinal data such as BiMM forest exist, feature selection has not been analyzed via data simulations. Feature selection improves the practicality and ease of use of prediction models for clinicians by reducing the burden of data collection. Thus, feature selection procedures are not only beneficial, but are often necessary for development of medical prediction models. In this study, we aim to assess feature selection within the BiMM forest setting for modeling clustered and longitudinal binary outcomes. METHODS We conducted a simulation study to compare BiMM forest with feature selection (backward elimination or stepwise selection) to standard generalized linear mixed model feature selection methods (shrinkage and backward elimination). We also evaluated feature selection methods to develop models predicting mobility disability in older adults using the Health, Aging and Body Composition Study dataset as an example utilization of the proposed methodology. RESULTS BiMM forest with backward elimination generally offered higher computational efficiency, similar or higher predictive performance (accuracy and area under the receiver operating characteristic curve), and similar or higher ability to identify correct features compared to linear methods for the different simulated scenarios. For predicting mobility disability in older adults, methods generally performed similarly in terms of accuracy, area under the receiver operating characteristic curve, and specificity; however, BiMM forest with backward elimination had the highest sensitivity.
CONCLUSIONS This study is novel because it is the first investigation of feature selection for developing random forest prediction models for clustered and longitudinal binary outcomes. Results from the simulation study reveal that BiMM forest with backward elimination has the highest accuracy (performance and identification of correct features) and lowest computation time compared to other feature selection methods in some scenarios and similar performance in other scenarios. Many informatics datasets have clustered and longitudinal outcomes and results from this study suggest that BiMM forest with backward elimination may be beneficial for developing medical prediction models.
Affiliation(s)
- Jaime Lynn Speiser
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
9
Arboretti R, Ceccato R, Pegoraro L, Salmaso L, Housmekerides C, Spadoni L, Pierangelo E, Quaggia S, Tveit C, Vianello S. Machine learning and design of experiments with an application to product innovation in the chemical industry. J Appl Stat 2021; 49:2674-2699. [DOI: 10.1080/02664763.2021.1907840] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Affiliation(s)
- Rosa Arboretti
  - Department of Civil, Environmental and Architectural Engineering, Università degli Studi di Padova, Padua, Italy
- Riccardo Ceccato
  - Department of Management and Engineering, Università degli Studi di Padova, Vicenza, Italy
- Luca Pegoraro
  - Department of Management and Engineering, Università degli Studi di Padova, Vicenza, Italy
- Luigi Salmaso
  - Department of Management and Engineering, Università degli Studi di Padova, Vicenza, Italy
10
D’Ottaviano F, Yang W. On missing random effects in machine learning. COMMUN STAT-SIMUL C 2020. [DOI: 10.1080/03610918.2020.1801729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Wenzhao Yang
  - Dow Chemical Co, Core R&D, Dow Inc., Lake Jackson, Texas, USA