Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Laqueur HS, Shev AB, Kagawa RMC. SuperMICE: An Ensemble Machine Learning Approach to Multiple Imputation by Chained Equations. Am J Epidemiol 2022;191:516-525. [PMID: 34788362 DOI: 10.1093/aje/kwab271] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/07/2020] [Revised: 09/17/2021] [Accepted: 11/08/2021] [Indexed: 11/13/2022] Open

Cited by Other Article(s)

Cheng X, Wu Z, Lin J, Wang B, Huang S, Liu M, Yang J. A two-stage ensemble learning based prediction and grading model for PD-1/PD-L1 inhibitor-related cardiac adverse events: A multicenter retrospective study. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024;255:108360. [PMID: 39163785 DOI: 10.1016/j.cmpb.2024.108360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 06/12/2024] [Accepted: 07/27/2024] [Indexed: 08/22/2024]

Abstract

BACKGROUND

Immune-related cardiac adverse events (ircAEs) caused by programmed cell death protein-1 (PD-1) and programmed death-ligand-1 (PD-L1) inhibitors can lead to fulminant and even fatal consequences. This study aims to develop a prediction and grading model for ircAEs, enabling graded management of patients.

METHODS

This study utilized medical record systems from two medical institutions to develop a prediction and grading model for ircAEs using ten machine learning algorithms and two variable screening methods. The model was developed based on a two-stage ensemble learning framework. In the first stage, the ircAEs and non-ircAEs cases were classified. In the second stage, ircAEs cases were grouped into grades 1-2 and 3-5. The experiments were evaluated using five-fold cross-validation. The model's prediction performance was assessed using accuracy, precision, recall, F1 value, Brier score, receiver operating characteristic curve area (AUC), and area under the precision-recall curve (AUPR).
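As a rough illustration of the two-stage soft-voting idea described above, the following minimal sketch averages the predicted probabilities of several classifiers and then applies the two decisions in sequence. The function names, the 0.5 threshold, and the input layout are assumptions made for illustration; the abstract does not specify how the authors implemented these steps.

```python
from statistics import fmean

def soft_vote(prob_lists):
    """Soft voting: average each sample's positive-class probability
    across classifiers. prob_lists holds one list per classifier."""
    return [fmean(sample_probs) for sample_probs in zip(*prob_lists)]

def two_stage_predict(stage1_probs, stage2_probs, threshold=0.5):
    """Hypothetical two-stage rule (threshold is an assumption).
    Stage 1 separates ircAE from non-ircAE cases; stage 2 is consulted
    only for predicted ircAE cases and separates grade 1-2 from 3-5."""
    p_event = soft_vote(stage1_probs)
    p_severe = soft_vote(stage2_probs)
    labels = []
    for pe, ps in zip(p_event, p_severe):
        if pe < threshold:
            labels.append("no ircAE")
        elif ps < threshold:
            labels.append("grade 1-2")
        else:
            labels.append("grade 3-5")
    return labels

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return fmean((p - y) ** 2 for p, y in zip(probs, outcomes))
```

Each inner list passed to `soft_vote` would come from one fitted classifier; the probabilities themselves are outside the scope of this sketch.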

RESULTS

A total of 615 patients were included in the study; 147 experienced ircAEs, and 44 experienced grade 3-5 ircAEs. The soft voting classifier trained using the variables screened by feature importance ranking performed better than other classifiers in both stages. The average AUCs for the first and second stages were 84.18% and 85.13%, respectively. In the first stage, the three most important variables were N-terminal pro-B-type natriuretic peptide (NT-proBNP), interleukin-2 (IL-2), and C-reactive protein (CRP). In the second stage, the patient's age, NT-proBNP, and left ventricular ejection fraction (LVEF) were the three most critical variables.

CONCLUSIONS

The prediction and grading model of ircAEs based on two-stage ensemble learning established in this study has good performance and potential clinical application.

El Badisy I, Graffeo N, Khalis M, Giorgi R. Multi-metric comparison of machine learning imputation methods with application to breast cancer survival. BMC Med Res Methodol 2024;24:191. [PMID: 39215245 PMCID: PMC11363416 DOI: 10.1186/s12874-024-02305-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024] Open

Abstract

Handling missing data in clinical prognostic studies is an essential yet challenging task. This study aimed to provide a comprehensive assessment of the effectiveness and reliability of different machine learning (ML) imputation methods across various analytical perspectives. Specifically, it focused on three distinct classes of performance metrics used to evaluate ML imputation methods: post-imputation bias of regression estimates, post-imputation predictive accuracy, and substantive model-free metrics. As an illustration, the methods were applied to data from a real-world breast cancer survival study. A simulated dataset with 30% Missing At Random (MAR) values was used. A number of single imputation (SI) methods - specifically KNN, missMDA, CART, missForest, missRanger, missCforest - and multiple imputation (MI) methods - specifically miceCART and miceRF - were evaluated. The performance metrics used were Gower's distance, estimation bias, empirical standard error, coverage rate, length of confidence interval, predictive accuracy, proportion of falsely classified (PFC), normalized root mean squared error (NRMSE), AUC, and C-index scores. The analysis revealed that in terms of Gower's distance, CART and missForest were the most accurate, while missMDA and CART excelled for binary covariates; missForest and miceCART were superior for continuous covariates. When assessing bias and accuracy in regression estimates, miceCART and miceRF exhibited the least bias. Overall, the various imputation methods demonstrated greater efficiency than complete-case analysis (CCA), with MICE methods providing optimal confidence interval coverage. In terms of predictive accuracy for Cox models, missMDA and missForest had superior AUC and C-index scores. Despite offering better predictive accuracy, the study found that SI methods introduced more bias into the regression coefficients compared to MI methods. This study underlines the importance of selecting appropriate imputation methods based on study goals and data types in time-to-event research. The varying effectiveness of methods across the different performance metrics studied highlights the value of using advanced machine learning algorithms within a multiple imputation framework to enhance research integrity and the robustness of findings.
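Two of the model-free metrics named in this abstract, NRMSE and PFC, can be sketched in a few lines. The version below uses one common definition of NRMSE (root mean squared error normalized by the variance of the true values) and assumes both inputs are restricted to the entries that were originally masked; the abstract does not state which normalization the authors used, so the choice here is an assumption.

```python
from math import sqrt
from statistics import fmean, pvariance

def nrmse(true_vals, imputed_vals):
    """Normalized RMSE for continuous variables.
    Assumption: normalized by the population variance of the true values."""
    mse = fmean((t - i) ** 2 for t, i in zip(true_vals, imputed_vals))
    return sqrt(mse / pvariance(true_vals))

def pfc(true_labels, imputed_labels):
    """Proportion of falsely classified entries for categorical variables."""
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)
```

For both metrics, lower values indicate imputations closer to the true data, with 0 meaning a perfect reconstruction.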

Santipas B, Veerakanjana K, Ittichaiwong P, Chavalparit P, Wilartratsami S, Luksanapruksa P. Development and internal validation of machine-learning models for predicting survival in patients who underwent surgery for spinal metastases. Asian Spine J 2024;18:325-335. [PMID: 38764230 PMCID: PMC11222881 DOI: 10.31616/asj.2023.0314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 01/17/2024] [Accepted: 01/23/2024] [Indexed: 05/21/2024] Open

Abstract

STUDY DESIGN

A retrospective study.

PURPOSE

This study aimed to develop machine-learning algorithms for predicting survival in patients who underwent surgery for spinal metastasis.

OVERVIEW OF LITERATURE

This study develops machine-learning models to predict postoperative survival in spinal metastasis patients, filling the gaps of traditional prognostic systems. Utilizing data from 389 patients, the study highlights XGBoost and CatBoost algorithms' effectiveness for 90-, 180-, and 365-day survival predictions, with preoperative serum albumin as a key predictor. These models offer a promising approach for enhancing clinical decision-making and personalized patient care.

METHODS

A registry of patients who underwent surgery (instrumentation, decompression, or fusion) for spinal metastases between 2004 and 2018 was used. The outcome measure was survival at postoperative days 90, 180, and 365. Preoperative variables were used to develop machine-learning algorithms to predict survival chance in each period. The performance of the algorithms was measured using the area under the receiver operating characteristic curve (AUC).
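For readers unfamiliar with the AUC used here as the performance measure, the sketch below computes it directly from its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the 0/1 label coding are assumptions for illustration, and this quadratic-time version is meant for clarity rather than efficiency.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    labels are assumed to be coded 1 (event) and 0 (no event)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the values reported in the results.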

RESULTS

A total of 389 patients were identified, with 90-, 180-, and 365-day mortality rates of 18%, 41%, and 45% postoperatively, respectively. The XGBoost algorithm showed the best performance for predicting 180-day and 365-day survival (AUCs of 0.744 and 0.693, respectively). The CatBoost algorithm demonstrated the best performance for predicting 90-day survival (AUC of 0.758). Serum albumin had the highest positive correlation with survival after surgery.

CONCLUSIONS

These machine-learning algorithms showed promising results in predicting survival in patients who underwent spinal palliative surgery for spinal metastasis, which may assist surgeons in choosing appropriate treatment and increasing awareness of mortality-related factors before surgery.

Liu J, Duan Z, Hu X, Zhong J, Yin Y. Detracking Autoencoding Conditional Generative Adversarial Network: Improved Generative Adversarial Network Method for Tabular Missing Value Imputation. ENTROPY (BASEL, SWITZERLAND) 2024;26:402. [PMID: 38785651 PMCID: PMC11120050 DOI: 10.3390/e26050402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/20/2024] [Accepted: 04/21/2024] [Indexed: 05/25/2024]

Li J, Hao Y, Liu Y, Wu L, Liang H, Ni L, Wang F, Wang S, Duan Y, Xu Q, Xiao J, Yang D, Gao G, Ding Y, Gao C, Xiao J, Zhao H. Supervised machine learning algorithms to predict the duration and risk of long-term hospitalization in HIV-infected individuals: a retrospective study. Front Public Health 2024;11:1282324. [PMID: 38249414 PMCID: PMC10796994 DOI: 10.3389/fpubh.2023.1282324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 12/13/2023] [Indexed: 01/23/2024] Open

Affiliation(s)

Jialu Li: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Yiwei Hao: Division of Medical Record and Statistics, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Ying Liu: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Liang Wu: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Hongyuan Liang: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Liang Ni: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Fang Wang: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Sa Wang: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Yujiao Duan: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Qiuhua Xu: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Jinjing Xiao: Department of Clinical Medicine, Zhengzhou University, Zhengzhou, China
Di Yang: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Guiju Gao: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Yi Ding: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Chengyu Gao: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Jiang Xiao: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
Hongxin Zhao: Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China

Kondo M, Oba K. Handling of outcome missing data dependent on measured or unmeasured background factors in micro-randomized trial: Simulation and application study. Digit Health 2024;10:20552076241249631. [PMID: 38698826 PMCID: PMC11064756 DOI: 10.1177/20552076241249631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/08/2024] [Indexed: 05/05/2024] Open

Hong WT, Clifton G, Nelson JD. Railway accident causation analysis: Current approaches, challenges and potential solutions. ACCIDENT; ANALYSIS AND PREVENTION 2023;186:107049. [PMID: 36989961 DOI: 10.1016/j.aap.2023.107049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 03/23/2023] [Accepted: 03/24/2023] [Indexed: 06/19/2023]

Abstract

Railway accident causation analysis is fundamental to understanding the nature of railway safety. Although a considerable number of prior studies have investigated this context, many of them suffer from the need to deal with a large amount of textual data given that most railway safety-related information is recorded and stored in the form of text. To gain a better understanding of the limitations imposed by overreliance on textual analysis, a scoping review of the academic literature on how railway accident causation analysis is addressed has been conducted. The results confirm the high frequency of using textual data, a single case study, and in-depth analysis frameworks. While the value of exploring causational factors is clear, the high level of human intervention and the labour-intensive analysis processes based on a large volume of textual data hinder researchers from understanding the complex nature of the rail safety system. Recently, growing attention has been given to the application of Natural Language Processing (NLP) to aid the practice of analysing a large corpus of textual data, but only limited studies to date in railway safety use such techniques and none address railway accident causation analysis. To fill this gap, a supplementary review is conducted to identify opportunities, challenges, boundaries and limitations in the application of NLP approaches to railway accident causation analysis. Findings indicate that novel techniques using off-the-shelf tools have strong potential to overcome the limitations of overreliance on manual analysis in practice and theory, but the absence of shared railway safety-related benchmark corpora restricts implementation. This study sheds light on a new approach to railway accident causation analysis and clarifies future applicable utilisations for further research.

Pelgrims I, Devleesschauwer B, Vandevijvere S, De Clercq EM, Vansteelandt S, Gorasso V, Van der Heyden J. Using random-forest multiple imputation to address bias of self-reported anthropometric measures, hypertension and hypercholesterolemia in the Belgian health interview survey. BMC Med Res Methodol 2023;23:69. [PMID: 36966305 PMCID: PMC10040120 DOI: 10.1186/s12874-023-01892-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 03/16/2023] [Indexed: 03/27/2023] Open

Abstract

BACKGROUND

In many countries, the prevalence of risk factors for non-communicable diseases is commonly assessed through self-reported information from health interview surveys. It has been shown, however, that relying on self-reported instead of objective data leads to an underestimation of the prevalence of obesity, hypertension and hypercholesterolemia. This study aimed to assess the agreement between self-reported and measured height, weight, hypertension and hypercholesterolemia and to identify an adequate approach for valid measurement error correction.

METHODS

Nine thousand four hundred thirty-nine participants of the 2018 Belgian health interview survey (BHIS) older than 18 years, of whom 1184 participated in the 2018 Belgian health examination survey (BELHES), were included in the analysis. Regression calibration was compared with multiple imputation by chained equations based on parametric and non-parametric techniques.
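To make the regression calibration idea concrete, the following minimal sketch fits a simple linear model of measured on self-reported values in a validation subsample and applies it to self-reports from the full sample. This is a univariate, ordinary-least-squares illustration under assumed names and hypothetical inputs; the abstract does not describe the calibration model the authors actually used, and this is not the random-forest multiple imputation the study compares it against.

```python
from statistics import fmean

def fit_calibration(self_reported, measured):
    """Least-squares fit of measured = intercept + slope * self_reported,
    estimated on a validation subsample where both values are available."""
    x_bar, y_bar = fmean(self_reported), fmean(measured)
    sxx = sum((x - x_bar) ** 2 for x in self_reported)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(self_reported, measured))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def calibrate(values, intercept, slope):
    """Replace self-reported values with their calibrated predictions."""
    return [intercept + slope * v for v in values]
```

In this setup the validation subsample plays the role of the participants with both interview and examination data, and `calibrate` would be applied to the remaining interview-only participants.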

RESULTS

This study confirmed the underestimation of risk factor prevalence based on self-reported data. With both regression calibration and multiple imputation, adjusted estimation of these variables in the BHIS made it possible to generate national prevalence estimates that were closer to their BELHES clinical counterparts. For overweight, obesity and hypertension, all methods provided smaller standard errors than those obtained with clinical data. However, for hypercholesterolemia, for which the regression model's accuracy was poor, multiple imputation was the only approach that provided smaller standard errors than those based on clinical data.

CONCLUSIONS

The random-forest multiple imputation proves to be the method of choice to correct the bias related to self-reported data in the BHIS. This method is particularly useful to enable improved secondary analysis of self-reported data by using information included in the BELHES. Whenever feasible, combined information from HIS and objective measurements should be used in risk factor monitoring.

Li D, Wong J, Li X, Toh S, Wang R. Imputing missing covariates in time-to-event analysis within distributed research networks: A simulation study. Pharmacoepidemiol Drug Saf 2023;32:330-340. [PMID: 36380400 DOI: 10.1002/pds.5563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 09/13/2022] [Accepted: 10/26/2022] [Indexed: 11/18/2022]

Kong W, Hui HWH, Peng H, Goh WWB. Dealing with missing values in proteomics data. Proteomics 2022;22:e2200092. [PMID: 36349819 DOI: 10.1002/pmic.202200092] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/14/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]

Leist AK, Klee M, Kim JH, Rehkopf DH, Bordas SPA, Muniz-Terrera G, Wade S. Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences. SCIENCE ADVANCES 2022;8:eabk1942. [PMID: 36260666 PMCID: PMC9581488 DOI: 10.1126/sciadv.abk1942] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/28/2021] [Accepted: 09/01/2022] [Indexed: 05/20/2023]

Carpenito T, Manjourides J. MISL: Multiple imputation by super learning. Stat Methods Med Res 2022;31:1904-1915. [PMID: 35658622 DOI: 10.1177/09622802221104238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]