Reference Citation Analysis

For: Šinkovec H, Heinze G, Blagus R, Geroldinger A. To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets. BMC Med Res Methodol 2021;21:199. [PMID: 34592945 PMCID: PMC8482588 DOI: 10.1186/s12874-021-01374-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/13/2021] [Accepted: 08/19/2021] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Fan Y, Sun N, Lv S, Jiang H, Zhang Z, Wang J, Xie Y, Yue X, Hu B, Ju B, Yu P. Prediction of developmental toxic effects of fine particulate matter (PM2.5) water-soluble components via machine learning through observation of PM2.5 from diverse urban areas. The Science of the Total Environment 2024;946:174027. [PMID: 38906297 DOI: 10.1016/j.scitotenv.2024.174027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/09/2024] [Accepted: 06/13/2024] [Indexed: 06/23/2024]

Gregorich M, Simpson SL, Heinze G. Flexible parametrization of graph-theoretical features from individual-specific networks for prediction. Stat Med 2024;43:2592-2606. [PMID: 38664934 DOI: 10.1002/sim.10091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 03/15/2024] [Accepted: 04/15/2024] [Indexed: 05/24/2024]

Abstract

Statistical techniques are needed to analyze data structures with complex dependencies such that clinically useful information can be extracted. Individual-specific networks, which capture dependencies in complex biological systems, are often summarized by graph-theoretical features. These features, which lend themselves to outcome modeling, can be subject to high variability due to arbitrary decisions in network inference and noise. Correlation-based adjacency matrices often need to be sparsified before meaningful graph-theoretical features can be extracted, requiring the data analysts to determine an optimal threshold. To address this issue, we propose to incorporate a flexible weighting function over the full range of possible thresholds to capture the variability of graph-theoretical features over the threshold domain. The potential of this approach, which extends concepts from functional data analysis to a graph-theoretical setting, is explored in a plasmode simulation study using real functional magnetic resonance imaging (fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) Preprocessed initiative. The simulations show that our modeling approach yields accurate estimates of the functional form of the weight function, improves inference efficiency, and achieves a comparable or reduced root mean square prediction error compared to competitor modeling approaches. This assertion holds true in settings where both complex functional forms underlie the outcome-generating process and a universal threshold value is employed. We demonstrate the practical utility of our approach by using resting-state fMRI data to predict biological age in children. Our study establishes the flexible modeling approach as a statistically principled, serious competitor to ad-hoc methods with superior performance.

Dunias ZS, Van Calster B, Timmerman D, Boulesteix AL, van Smeden M. A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study. Stat Med 2024;43:1119-1134. [PMID: 38189632 DOI: 10.1002/sim.9932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 09/10/2023] [Accepted: 09/21/2023] [Indexed: 01/09/2024]

Abstract

Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.

Patwary AL, Haque AM, Mahdinia I, Khattak AJ. Investigating transportation safety in disadvantaged communities by integrating crash and Environmental Justice data. Accident; Analysis and Prevention 2024;194:107366. [PMID: 37924566 DOI: 10.1016/j.aap.2023.107366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 10/03/2023] [Accepted: 10/24/2023] [Indexed: 11/06/2023]

Jin H, Ranasinghe KG, Prabhu P, Dale C, Gao Y, Kudo K, Vossel K, Raj A, Nagarajan SS, Jiang F. Dynamic functional connectivity MEG features of Alzheimer's disease. Neuroimage 2023;281:120358. [PMID: 37699440 PMCID: PMC10865998 DOI: 10.1016/j.neuroimage.2023.120358] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Revised: 08/14/2023] [Accepted: 08/31/2023] [Indexed: 09/14/2023] Open

Abstract

Dynamic resting state functional connectivity (RSFC) characterizes time-varying fluctuations of functional brain network activity. While many studies have investigated static functional connectivity, it has been unclear whether features of dynamic functional connectivity are associated with neurodegenerative diseases. Popular sliding-window and clustering methods for extracting dynamic RSFC have various limitations that prevent extracting reliable features to address this question. Here, we use a novel and robust time-varying dynamic network (TVDN) approach to extract the dynamic RSFC features from high-resolution magnetoencephalography (MEG) data of participants with Alzheimer's disease (AD) and matched controls. The TVDN algorithm automatically and adaptively learns the low-dimensional spatiotemporal manifold of dynamic RSFC and detects dynamic state transitions in data. We show that amongst all the functional features we investigated, the dynamic manifold features are the most predictive of AD. These include: the temporal complexity of the brain network, given by the number of state transitions and their dwell times, and the spatial complexity of the brain network, given by the number of eigenmodes. These dynamic features have higher sensitivity and specificity in distinguishing AD from healthy subjects than the existing benchmarks do. Intriguingly, we found that AD patients generally have higher spatial complexity but lower temporal complexity compared with healthy controls. We also show that graph theoretic metrics of the dynamic component of TVDN are significantly different in AD versus controls, while static graph metrics are not statistically different. These results indicate that dynamic RSFC features are impacted in neurodegenerative diseases such as Alzheimer's disease, and may be crucial to understanding the pathophysiological trajectory of these diseases.

Tiulpin A, Saarakkala S, Mathiessen A, Hammer HB, Furnes O, Nordsletten L, Englund M, Magnusson K. Predicting total knee arthroplasty from ultrasonography using machine learning. Osteoarthritis and Cartilage Open 2022;4:100319. [DOI: 10.1016/j.ocarto.2022.100319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 10/15/2022] [Accepted: 10/30/2022] [Indexed: 11/09/2022] Open

Ghorashi SM, Fazeli A, Hedayat B, Mokhtari H, Jalali A, Ahmadi P, Chalian H, Bragazzi NL, Shirani S, Omidi N. Comparison of conventional scoring systems to machine learning models for the prediction of major adverse cardiovascular events in patients undergoing coronary computed tomography angiography. Front Cardiovasc Med 2022;9:994483. [PMID: 36386332 PMCID: PMC9643500 DOI: 10.3389/fcvm.2022.994483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2022] [Accepted: 10/05/2022] [Indexed: 08/04/2023] Open

Abstract

BACKGROUND

The study aims to compare the prognostic performance of conventional scoring systems to a machine learning (ML) model on coronary computed tomography angiography (CCTA) to discriminate between the patients with and without major adverse cardiovascular events (MACEs) and to find the most important contributing factor of MACE.

MATERIALS AND METHODS

From November to December 2019, 500 of 1586 CCTA scans were included and analyzed, then six conventional scores were calculated for each participant, and seven ML models were designed. Our study endpoints were all-cause mortality, non-fatal myocardial infarction, late coronary revascularization, and hospitalization for unstable angina or heart failure. Score performance was assessed by area under the curve (AUC) analysis.

RESULTS

Of 500 patients (mean age: 60 ± 10; 53.8% male subjects) referred for CCTA, 416 met the inclusion criteria; 46 patients with early (<90 days) cardiac evaluation (because it could not be determined whether the assessment was prompted by deterioration of the symptoms or by the CCTA result) and 38 patients with missed follow-up were excluded from the final analysis. Forty-six patients (11.0%) developed MACE within 20.5 ± 7.9 months of follow-up. Compared to conventional scores, ML models showed better performance, except for eXtreme Gradient Boosting, which had lower performance than conventional scoring systems (AUC: 0.824, 95% confidence interval (CI): 0.701-0.947). Among the ML models, random forest, ensemble with generalized linear, and ensemble with naive Bayes had the highest prognostic performance (AUC: 0.92, 95% CI: 0.85-0.99; AUC: 0.90, 95% CI: 0.81-0.98; and AUC: 0.89, 95% CI: 0.82-0.97, respectively). Coronary artery calcium score (CACS) had the highest correlation with MACE.

CONCLUSION

Compared to the conventional scoring system, ML models using CCTA scans show improved prognostic prediction for MACE. Anatomical features were more important than clinical characteristics.

Gregorich M, Melograna F, Sunqvist M, Michiels S, Van Steen K, Heinze G. Individual-specific networks for prediction modelling – A scoping review of methods. BMC Med Res Methodol 2022;22:62. [PMID: 35249534 PMCID: PMC8898441 DOI: 10.1186/s12874-022-01544-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Accepted: 02/11/2022] [Indexed: 11/10/2022] Open

Abstract

Background

Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative forms of covariate representations for the features derived from these modern data modalities should be considered that can utilize their intrinsic interconnection. The connectivity information between these features can be represented as an individual-specific network defined by a set of nodes and edges, the strength of which can vary from individual to individual. Global or local graph-theoretical features describing the network may constitute potential prognostic biomarkers instead of or in addition to traditional covariates and may replace the often unsuccessful search for individual biomarkers in a high-dimensional predictor space.

Methods

We conducted a scoping review to identify, collate and critically appraise the state of the art in the use of individual-specific networks for prediction modelling in medicine and applied health research, published during 2000–2020 in the electronic databases PubMed, Scopus and Embase.

Results

Our scoping review revealed the main application areas, namely neurology and pathopsychology, followed by cancer research, cardiology and pathology (N = 148). Network construction was mainly based on Pearson correlation coefficients of repeated measurements, but alternative approaches (e.g. partial correlation, visibility graphs) were also found. For covariates measured only once per individual, network construction was mostly based on quantifying an individual’s contribution to the overall group-level structure. Despite the multitude of identified methodological approaches for individual-specific network inference, the number of studies that were intended to enable the prediction of clinical outcomes for future individuals was quite limited, and most of the models served as proof of concept that network characteristics can in principle be useful for prediction.

Conclusion

The current body of research clearly demonstrates the value of individual-specific network analysis for prediction modelling, but it has not yet been considered as a general tool outside the current areas of application. More methodological research is still needed on well-founded strategies for network inference, especially on adequate network sparsification and outcome-guided graph-theoretical feature extraction and selection, and on how networks can be exploited efficiently for prediction modelling.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-022-01544-6.
