Reference Citation Analysis

For: Speiser JL, Wolf BJ, Chung D, Karvellas CJ, Koch DG, Durkalski VL. BiMM tree: A decision tree method for modeling clustered and longitudinal binary outcomes. Commun Stat Simul Comput 2018;49:1004-1023. [PMID: 32377032 PMCID: PMC7202553 DOI: 10.1080/03610918.2018.1490429] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/18/2018] [Revised: 06/04/2018] [Accepted: 06/13/2018] [Indexed: 10/28/2022]

Cited by Other Article(s)

Rezaei Ghahroodi Z, Eftekhari Mahabadi S, Esberizi A, Sami R, Mansourian M. Association of the medication protocols and longitudinal change of COVID-19 symptoms: a hospital-based mixed-statistical methods study. J Biopharm Stat 2025;35:386-406. [PMID: 38515283 DOI: 10.1080/10543406.2024.2333527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 03/17/2024] [Indexed: 03/23/2024]

Abstract

The objective of this study was to identify the relationship between hospitalization treatment strategies leading to change in symptoms during 12-week follow-up among hospitalized patients during the COVID-19 outbreak. In this article, data from a prospective cohort study on COVID-19 patients admitted to Khorshid Hospital, Isfahan, Iran, from February 2020 to February 2021, were analyzed and reported. Patient characteristics, including socio-demographics, comorbidities, signs and symptoms, and treatments during hospitalization, were investigated. Also, to investigate the treatment effects adjusted by other confounding factors that lead to symptom change during follow-up, the binary classification trees, generalized linear mixed model, machine learning, and joint generalized estimating equation methods were applied. This research scrutinized the effects of various medications on COVID-19 patients in a prospective hospital-based cohort study, and found that heparin, methylprednisolone, ceftriaxone, and hydroxychloroquine were the most frequently prescribed medications. The results indicate that of patients under 65 years of age, 76% had a cough at the time of admission, while of patients with Cr levels of 1.1 or more, 80% had not lost weight at the time of admission. The results of fitted models showed that, during the follow-up, women are more likely to have shortness of breath (OR = 1.25; P-value: 0.039), fatigue (OR = 1.31; P-value: 0.013) and cough (OR = 1.29; P-value: 0.019) compared to men. Additionally, patients with symptoms of chest pain, fatigue and decreased appetite during admission are at a higher risk of experiencing fatigue during follow-up. Each day increase in the duration of ceftriaxone multiplies the odds of shortness of breath by 1.15 (P-value: 0.012). 
With each passing week, the odds of losing weight are multiplied by 1.41 (P-value: 0.038), while the odds of shortness of breath and cough are multiplied by 0.84 (P-value: 0.005) and 0.56 (P-value < 0.001), respectively. In addition, each additional day of meropenem or methylprednisolone multiplied the odds of weight loss at follow-up by 0.88 (P-value: 0.026) and 0.91 (P-value: 0.023), respectively (among those who took these medications). The identified prognostic factors can help clinicians and policymakers adapt management strategies for patients in any pandemic like COVID-19, which ultimately leads to better hospital decision-making and improved patient quality of life.

O'Connell NS, Speiser JL. OpenClustered: an R package with a benchmark suite of clustered datasets for methodological evaluation and comparison. BMC Med Res Methodol 2025;25:92. [PMID: 40211159 PMCID: PMC11987461 DOI: 10.1186/s12874-025-02548-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Accepted: 04/01/2025] [Indexed: 04/12/2025] Open

Abstract

BACKGROUND

Clustered data arise when observations are correlated within a group or sampling unit and frequently arise in epidemiology, social sciences, education, linguistics, econometrics, and medicine. Given growing interest in clustered data, we developed a data repository offering clustered datasets that can be used for methodologic comparison with open-source, publicly available data. Traditionally, data simulation studies are employed for methodology evaluation and comparison, which can be fraught with issues such as overly simplistic design and potential for bias. Excellent data repositories are available for standard (non-clustered) datasets, such as OpenML and the Penn Machine Learning Benchmark repository, but there is a paucity of resources available that have clustered data.

RESULTS

In this pilot study, we developed an R package called OpenClustered, which includes 19 clustered datasets with binary outcomes arising from various domains and varying in terms of their size and composition. We present tutorials for using OpenClustered, including examples for filtering and summarizing the datasets. We demonstrate the use of OpenClustered with a small benchmarking study comparing Frequentist and Bayesian implementations of generalized linear mixed models. All code and data are contained on the OpenClustered GitHub page.

CONCLUSION

The OpenClustered R package is the start of a useful data resource for conducting benchmarking studies with open-source clustered data. It facilitates empirical methodologic guidance that is less prone to bias compared to data simulation studies, thereby improving rigor across diverse research fields. In the future, we plan to add more datasets, particularly those with continuous outcomes, as well as functionality for users to submit their clustered datasets to be included in the repository.

Åkerla J, Nevalainen J, Pesonen JS, Pöyhönen A, Koskimäki J, Häkkinen J, Tammela TLJ, Auvinen A. Do LUTS Predict Mortality? An Analysis Using Random Forest Algorithms. Clin Interv Aging 2024;19:237-245. [PMID: 38371602 PMCID: PMC10873145 DOI: 10.2147/cia.s432368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/17/2024] [Indexed: 02/20/2024] Open

Souza-Silva RD, Calixto-Lima L, Varea Maria Wiegert E, de Oliveira LC. Decision tree algorithm to predict mortality in incurable cancer: a new prognostic model. BMJ Support Palliat Care 2024:spcare-2023-004581. [PMID: 38242639 DOI: 10.1136/spcare-2023-004581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 01/08/2024] [Indexed: 01/21/2024]

Mangino AA, Bolin JH, Finch WH. Fixed Effects or Mixed Effects Classifiers? Evidence From Simulated and Archival Data. Educ Psychol Meas 2023;83:710-739. [PMID: 37398843 PMCID: PMC10311958 DOI: 10.1177/00131644221108180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]

Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Brief Bioinform 2023;24:6991123. [PMID: 36653905 PMCID: PMC10025446 DOI: 10.1093/bib/bbad002] [Citation(s) in RCA: 138] [Impact Index Per Article: 69.0] [Received: 08/31/2022] [Revised: 12/12/2022] [Accepted: 12/31/2022] [Indexed: 01/20/2023] Open

Sigrist F. Latent Gaussian Model Boosting. IEEE Trans Pattern Anal Mach Intell 2023;45:1894-1905. [PMID: 35439126 DOI: 10.1109/tpami.2022.3168152] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]

Mangino AA, Finch WH. Prediction With Mixed Effects Models: A Monte Carlo Simulation Study. Educ Psychol Meas 2021;81:1118-1142. [PMID: 34565818 PMCID: PMC8451021 DOI: 10.1177/0013164421992818] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]

Speiser JL. A random forest method with feature selection for developing medical prediction models with clustered and longitudinal data. J Biomed Inform 2021;117:103763. [PMID: 33781921 PMCID: PMC8131242 DOI: 10.1016/j.jbi.2021.103763] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 10/15/2020] [Revised: 03/03/2021] [Accepted: 03/23/2021] [Indexed: 12/22/2022]

Abstract

BACKGROUND

Machine learning methodologies are gaining popularity for developing medical prediction models for datasets with a large number of predictors, particularly in the setting of clustered and longitudinal data. Binary Mixed Model (BiMM) forest is a promising machine learning algorithm which may be applied to develop prediction models for clustered and longitudinal binary outcomes. Although machine learning methods for clustered and longitudinal data such as BiMM forest exist, feature selection has not been analyzed via data simulations. Feature selection improves the practicality and ease of use of prediction models for clinicians by reducing the burden of data collection. Thus, feature selection procedures are not only beneficial, but are often necessary for development of medical prediction models. In this study, we aim to assess feature selection within the BiMM forest setting for modeling clustered and longitudinal binary outcomes.

METHODS

We conducted a simulation study to compare BiMM forest with feature selection (backward elimination or stepwise selection) to standard generalized linear mixed model feature selection methods (shrinkage and backward elimination). We also evaluated feature selection methods to develop models predicting mobility disability in older adults using the Health, Aging and Body Composition Study dataset as an example utilization of the proposed methodology.

RESULTS

BiMM forest with backward elimination generally offered higher computational efficiency, similar or higher predictive performance (accuracy and area under the receiver operating characteristic curve), and similar or higher ability to identify correct features compared to linear methods for the different simulated scenarios. For predicting mobility disability in older adults, methods generally performed similarly in terms of accuracy, area under the receiver operating characteristic curve, and specificity; however, BiMM forest with backward elimination had the highest sensitivity.

CONCLUSIONS

This study is novel because it is the first investigation of feature selection for developing random forest prediction models for clustered and longitudinal binary outcomes. Results from the simulation study reveal that BiMM forest with backward elimination has the highest accuracy (performance and identification of correct features) and lowest computation time compared to other feature selection methods in some scenarios and similar performance in other scenarios. Many informatics datasets have clustered and longitudinal outcomes and results from this study suggest that BiMM forest with backward elimination may be beneficial for developing medical prediction models.

Arboretti R, Ceccato R, Pegoraro L, Salmaso L, Housmekerides C, Spadoni L, Pierangelo E, Quaggia S, Tveit C, Vianello S. Machine learning and design of experiments with an application to product innovation in the chemical industry. J Appl Stat 2021;49:2674-2699. [DOI: 10.1080/02664763.2021.1907840] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]

D’Ottaviano F, Yang W. On missing random effects in machine learning. Commun Stat Simul Comput 2020. [DOI: 10.1080/03610918.2020.1801729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]