1
Orimaye SO, Schmidtke KA. Combining artificial neural networks and a marginal structural model to predict the progression from depression to Alzheimer's disease. Frontiers in Dementia 2024; 3:1362230. [PMID: 39081615 PMCID: PMC11285640 DOI: 10.3389/frdem.2024.1362230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 03/21/2024] [Indexed: 08/02/2024]
Abstract
Introduction: Decades of research in population health have established depression as a likely precursor to Alzheimer's disease. Combining causal estimates with machine learning methods from artificial intelligence could identify the internal and external mediating mechanisms that contribute to the likelihood of progression from depression to Alzheimer's disease.
Methods: We developed an integrated predictive model, combining a marginal structural model with an artificial intelligence predictive model, that distinguishes patients likely to progress from depressive states to Alzheimer's disease better than either model alone.
Results: The integrated predictive model achieved substantial clinical relevance as measured by the area under the curve (AUC). It performed better than either the traditional statistical method or a single artificial intelligence method alone.
Discussion: The integrated predictive model could form part of a clinical screening tool that identifies patients who are likely to progress from depression to Alzheimer's disease, so that they can receive early behavioral health interventions. Given the high costs of treating Alzheimer's disease, our model could serve as a cost-effective intervention for the early detection of depression before it progresses to Alzheimer's disease.
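The abstract reports performance as an area under the curve without giving the value. Assuming the curve is the usual ROC curve, AUC is the probability that a randomly chosen patient who progressed receives a higher risk score than one who did not. The minimal Python sketch below computes it by pair counting; the labels and scores are made-up illustrative values, not data or results from the paper, and the sketch does not reproduce the authors' integrated model.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney pair-counting formula.

    Counts, over all (positive, negative) pairs, how often the positive
    case (label 1) has the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six patients (1 = progressed, 0 = did not).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores))
```

Here 8 of the 9 positive/negative pairs are ranked correctly (only 0.4 < 0.6 is out of order), so the AUC is 8/9, about 0.89. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.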
Affiliation(s)
- Sylvester O. Orimaye
- College of Global Population Health, University of Health Sciences and Pharmacy, St. Louis, MO, United States
- Kelly A. Schmidtke
- College of Arts and Sciences, University of Health Sciences and Pharmacy, St. Louis, MO, United States
2
Post RAJ, Petkovic M, van den Heuvel IL, van den Heuvel ER. Flexible Machine Learning Estimation of Conditional Average Treatment Effects: A Blessing and a Curse. Epidemiology 2024; 35:32-40. [PMID: 37889951 DOI: 10.1097/ede.0000000000001684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2023]
Abstract
Causal inference from observational data requires untestable identification assumptions. If these assumptions hold, machine learning methods can be used to study complex forms of causal effect heterogeneity. Recently, several machine learning methods were developed to estimate the conditional average treatment effect (conditional ATE). If the features at hand cannot explain all heterogeneity, the individual treatment effects can deviate substantially from the conditional ATE. In this work, we demonstrate how the distributions of the individual treatment effect and the conditional ATE can differ when a causal random forest is applied. We extend the causal random forest to estimate the difference in conditional variance between the treated and control groups. If the distribution of the individual treatment effect equals that of the conditional ATE, this estimated difference in variance should be small. If they differ, an additional causal assumption is necessary to quantify the heterogeneity not captured by the distribution of the conditional ATE. The conditional variance of the individual treatment effect can be identified when the individual effect is independent of the outcome under no treatment, given the measured features. Then, in cases where the individual treatment effect and conditional ATE distributions differ, the extended causal random forest can appropriately estimate the variance of the individual treatment effect distribution, whereas the causal random forest fails to do so.
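The identification argument rests on a variance identity. Writing τ = Y(1) − Y(0) for the individual effect, Var(Y(1) | X) − Var(Y(0) | X) = Var(τ | X) + 2 Cov(Y(0), τ | X), so when τ is independent of Y(0) given X, the treated-minus-control variance difference equals Var(τ | X). The toy Python simulation below is our own illustration of that point, not the authors' extended causal random forest: it assumes a randomized binary treatment, a single binary measured feature X (so simple stratum means and variances stand in for a forest), and an unmeasured effect modifier U that X cannot explain.

```python
import random
import statistics


def simulate(n, seed=0):
    """Toy data: X explains only part of the effect heterogeneity.

    U is an unmeasured effect modifier, independent of Y(0) given X,
    so the independence assumption from the abstract holds here.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)            # measured binary feature
        u = rng.gauss(0.0, 1.0)          # unmeasured effect modifier, Var(U) = 1
        y0 = x + rng.gauss(0.0, 1.0)     # outcome under no treatment
        ite = 1.0 + 2.0 * x + u          # individual treatment effect
        a = rng.randint(0, 1)            # randomized treatment
        y = y0 + ite if a == 1 else y0   # observed outcome
        rows.append((x, a, y, ite))
    return rows


def outcomes(rows, x, a):
    """Observed outcomes in the stratum X = x, A = a."""
    return [y for xi, ai, y, _ in rows if xi == x and ai == a]


rows = simulate(200_000)
for x in (0, 1):
    treated, control = outcomes(rows, x, 1), outcomes(rows, x, 0)
    cate = statistics.fmean(treated) - statistics.fmean(control)
    var_diff = statistics.variance(treated) - statistics.variance(control)
    true_var = statistics.variance([ite for xi, _, _, ite in rows if xi == x])
    print(f"x={x}: CATE ~ {cate:.2f}, variance difference ~ {var_diff:.2f}, "
          f"true Var(ITE | X) ~ {true_var:.2f}")
```

Within each stratum the conditional ATE is a single number (1 when X = 0, 3 when X = 1), so its distribution has no spread, while the individual effects still vary with variance 1; the treated-minus-control variance difference recovers that hidden spread.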
Affiliation(s)
- Richard A J Post
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Marko Petkovic
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Isabel L van den Heuvel
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Edwin R van den Heuvel
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Department of Preventive Medicine and Epidemiology, School of Medicine, Boston University, Boston, MA
3
Moodie EEM, Talbot D. On "Reflections on the concept of optimality of single decision point treatment regimes". Biom J 2023; 65:e2300027. [PMID: 37797173 DOI: 10.1002/bimj.202300027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 04/26/2023] [Accepted: 06/22/2023] [Indexed: 10/07/2023]
Abstract
This is a discussion of "Reflections on the concept of optimality of single decision point treatment regimes" by Trung Dung Tran, Ariel Alonso Abad, Geert Verbeke, Geert Molenberghs, and Iven Van Mechelen. The authors offer a thoughtful consideration of optimization targets and the implications of such targets for the resulting optimal treatment rule. However, we contest the assertion that targets of optimization have been overlooked, and we suggest additional considerations that researchers must contemplate as part of a complete framework for learning about optimal treatment regimes.
Affiliation(s)
- Erica E M Moodie
- Department of Epidemiology & Biostatistics, McGill University, Montreal, Quebec, Canada
- Denis Talbot
- Department of Social and Preventive Medicine, Université Laval, Quebec, Canada
4
Benitez A, Petersen ML, van der Laan MJ, Santos N, Butrick E, Walker D, Ghosh R, Otieno P, Waiswa P, Balzer LB. Defining and estimating effects in cluster randomized trials: A methods comparison. Stat Med 2023; 42:3443-3466. [PMID: 37308115 PMCID: PMC10898620 DOI: 10.1002/sim.9813] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/25/2022] [Revised: 04/27/2023] [Accepted: 05/21/2023] [Indexed: 06/14/2023]
Abstract
Across research disciplines, cluster randomized trials (CRTs) are commonly implemented to evaluate interventions delivered to groups of participants, such as communities and clinics. Despite advances in the design and analysis of CRTs, several challenges remain. First, there are many possible ways to specify the causal effect of interest (e.g., at the individual level or at the cluster level). Second, the theoretical and practical performance of common methods for CRT analysis remains poorly understood. Here, we present a general framework to formally define an array of causal effects in terms of summary measures of counterfactual outcomes. Next, we provide a comprehensive overview of CRT estimators, including the t-test, generalized estimating equations (GEE), augmented-GEE, and targeted maximum likelihood estimation (TMLE). Using finite sample simulations, we illustrate the practical performance of these estimators for different causal effects and when, as commonly occurs, there are limited numbers of clusters of different sizes. Finally, our application to data from the Preterm Birth Initiative (PTBi) study demonstrates the real-world impact of varying cluster sizes and of targeting effects at the cluster level or at the individual level. Specifically, the relative effect of the PTBi intervention was 0.81 at the cluster level, corresponding to a 19% reduction in outcome incidence, and 0.66 at the individual level, corresponding to a 34% reduction in outcome risk. Given its flexibility to estimate a variety of user-specified effects and its ability to adaptively adjust for covariates for precision gains while maintaining Type-I error control, we conclude that TMLE is a promising tool for CRT analysis.
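The cluster-level versus individual-level distinction comes down to how participants are weighted: a cluster-level effect averages per-cluster risks so each cluster counts equally, while an individual-level effect pools participants so large clusters dominate. A relative effect RR corresponds to a (1 − RR) × 100% reduction, which is how 0.81 maps to 19% and 0.66 to 34%. The Python sketch below uses hypothetical cluster counts (not PTBi data) and unadjusted ratios of means; it is not TMLE or any other estimator from the paper, only an illustration of why the two targets can differ when cluster sizes vary.

```python
from statistics import fmean

# Hypothetical clusters: (arm, participants, participants with the outcome).
# Arm 1 = intervention, arm 0 = control. Counts are illustrative only.
clusters = [
    (1, 20, 2), (1, 200, 30),
    (0, 20, 4), (0, 200, 30),
]


def cluster_level_risk(arm: int) -> float:
    """Mean of per-cluster risks: every cluster gets equal weight."""
    return fmean(events / size for a, size, events in clusters if a == arm)


def individual_level_risk(arm: int) -> float:
    """Pooled risk: every participant gets equal weight."""
    size = sum(s for a, s, _ in clusters if a == arm)
    events = sum(e for a, _, e in clusters if a == arm)
    return events / size


rr_cluster = cluster_level_risk(1) / cluster_level_risk(0)
rr_individual = individual_level_risk(1) / individual_level_risk(0)
print(f"cluster-level RR = {rr_cluster:.2f} ({1 - rr_cluster:.0%} reduction)")
print(f"individual-level RR = {rr_individual:.2f} ({1 - rr_individual:.0%} reduction)")
```

With these counts the small clusters show a large benefit and the large clusters show none, so the cluster-level ratio (0.125 / 0.175, about 0.71) is well below the individual-level ratio (32 / 34, about 0.94).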
Affiliation(s)
- Maya L. Petersen
- School of Public Health, Biostatistics, University of California Berkeley, Berkeley, California
- Mark J. van der Laan
- School of Public Health, Biostatistics, University of California Berkeley, Berkeley, California
- Nicole Santos
- Institute for Global Health Sciences, University of California San Francisco, San Francisco, California
- Elizabeth Butrick
- Institute for Global Health Sciences, University of California San Francisco, San Francisco, California
- Dilys Walker
- Institute for Global Health Sciences, University of California San Francisco, San Francisco, California
- Rakesh Ghosh
- Institute for Global Health Sciences, University of California San Francisco, San Francisco, California
- Phelgona Otieno
- Center for Clinical Research, Kenya Medical Research Institute, Nairobi, Kenya
- Peter Waiswa
- Centre of Excellence for Maternal, Newborn and Child Health, Makerere University College of Health Sciences, Kampala, Uganda
- Laura B. Balzer
- School of Public Health, Biostatistics, University of California Berkeley, Berkeley, California
5
Al Hajj GS, Pensar J, Sandve GK. DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation. PLoS One 2023; 18:e0284443. [PMID: 37058511 PMCID: PMC10104342 DOI: 10.1371/journal.pone.0284443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/30/2023] [Indexed: 04/15/2023]
Abstract
Data simulation is fundamental for machine learning and causal inference, as it allows exploration of scenarios and assessment of methods in settings with full control of the ground truth. Directed acyclic graphs (DAGs) are well established for encoding the dependence structure over a collection of variables in both inference and simulation settings. However, while modern machine learning is applied to data of an increasingly complex nature, DAG-based simulation frameworks are still confined to settings with relatively simple variable types and functional forms. Here we present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations. A succinct YAML format for defining the simulation model structure promotes transparency, while separate user-provided functions for generating each variable from its parents keep the simulation code modular. We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences. DagSim is available as a Python package on PyPI. Source code and documentation are available at: https://github.com/uio-bmi/dagsim.
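The core idea, generating each node from its parents in topological order using a user-supplied function, can be sketched without any framework. The Python code below is a hypothetical, dependency-free illustration; it does not use or reproduce DagSim's actual API or YAML format, and the names `Node`, `simulate_dag`, `gen_label`, and `gen_sequence` are our own. The toy model mirrors the bio-sequence use case: a binary metadata variable decides whether a fixed motif is implanted in a random DNA string. It needs Python 3.9+ for `graphlib`.

```python
import random
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical node spec: parent names plus a generator function."""
    parents: List[str]
    fn: Callable[..., Any]  # called as fn(rng, *parent_values)


def simulate_dag(model: Dict[str, Node], n: int, seed: int = 0) -> List[Dict[str, Any]]:
    rng = random.Random(seed)
    # graphlib expects {node: predecessors}; static_order() yields parents
    # before children and raises graphlib.CycleError if the graph has a cycle.
    graph = {name: node.parents for name, node in model.items()}
    order = list(TopologicalSorter(graph).static_order())
    samples = []
    for _ in range(n):
        row: Dict[str, Any] = {}
        for name in order:
            node = model[name]
            row[name] = node.fn(rng, *(row[p] for p in node.parents))
        samples.append(row)
    return samples


def gen_label(rng):
    return rng.random() < 0.5  # metadata variable: implant the motif or not


def gen_sequence(rng, label):
    seq = "".join(rng.choice("ACGT") for _ in range(20))
    # When the parent label is True, overwrite positions 8-14 with a fixed motif.
    return seq[:8] + "GATTACA" + seq[15:] if label else seq


model = {
    "label": Node(parents=[], fn=gen_label),
    "sequence": Node(parents=["label"], fn=gen_sequence),
}
for row in simulate_dag(model, n=3):
    print(row)
```

Keeping the graph structure separate from the per-node functions is what makes each generator testable on its own, which is the modularity argument the abstract makes.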
Affiliation(s)
- Johan Pensar
- Department of Mathematics, University of Oslo, Oslo, Norway
- Geir K. Sandve
- Department of Informatics, University of Oslo, Oslo, Norway
6
Leist AK, Klee M, Kim JH, Rehkopf DH, Bordas SPA, Muniz-Terrera G, Wade S. Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences. Science Advances 2022; 8:eabk1942. [PMID: 36260666 PMCID: PMC9581488 DOI: 10.1126/sciadv.abk1942] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/28/2021] [Accepted: 09/01/2022] [Indexed: 05/20/2023]
Abstract
Machine learning (ML) methodology used in the social and health sciences needs to fit the intended research purposes of description, prediction, or causal inference. This paper provides a comprehensive, systematic meta-mapping of research questions in the social and health sciences to appropriate ML approaches by incorporating the necessary requirements for statistical analysis in these disciplines. We map the established classification into description, prediction, counterfactual prediction, and causal structural learning onto common research goals, such as estimating the prevalence of adverse social or health outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes, and we explain common ML performance metrics. Such mapping may help researchers fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences. We hope it will also accelerate the uptake of ML applications to advance both basic and applied social and health sciences research.
Affiliation(s)
- Anja K. Leist
- Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Corresponding author.
- Matthias Klee
- Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Jung Hyun Kim
- Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- David H. Rehkopf
- Department of Epidemiology and Population Health, Stanford University, Palo Alto, CA, USA
- Graciela Muniz-Terrera
- Centre for Dementia Prevention, University of Edinburgh, Edinburgh, UK
- Ohio University, Athens, OH, USA
- Sara Wade
- School of Mathematics, University of Edinburgh, Edinburgh, UK
7
Kalia S, Saarela O, Chen T, O'Neill B, Meaney C, Gronsbell J, Sejdic E, Escobar M, Aliarzadeh B, Moineddin R, Pow C, Sullivan F, Greiver M. Marginal structural models using calibrated weights with SuperLearner: application to type II diabetes cohort. IEEE J Biomed Health Inform 2022; 26:4197-4206. [PMID: 35588417 DOI: 10.1109/jbhi.2022.3175862] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
As different scientific disciplines begin to converge on machine learning for causal inference, we demonstrate the application of machine learning algorithms to longitudinal causal estimation using electronic health records. Our aim is to formulate a marginal structural model for estimating diabetes care provisions under hypothetical (i.e., counterfactual) dynamic treatment regimes that use a combination of drug therapies to manage diabetes: metformin, sulfonylurea, and SGLT-2i. The binary outcome of diabetes care provisions was defined using a composite measure of chronic disease prevention and screening elements [27], including (i) primary care visit, (ii) blood pressure, (iii) weight, (iv) hemoglobin A1c, (v) lipid, (vi) albumin-to-creatinine ratio (ACR), (vii) estimated glomerular filtration rate (eGFR), and (viii) statin medication. We used several statistical learning algorithms to describe causal relationships between the prescription of three common classes of diabetes medications and the quality of diabetes care, using the electronic health records contained in the National Diabetes Repository. In particular, we generated an ensemble of statistical learning algorithms using the SuperLearner framework with the following base learners: (i) least absolute shrinkage and selection operator, (ii) ridge regression, (iii) elastic net, (iv) random forest, (v) gradient boosting machines, and (vi) neural network. Each statistical learning algorithm was fitted to the pseudo-population generated by marginalizing over the time-dependent confounding process. Covariate balance was assessed using the longitudinal (i.e., cumulative-time product) stabilized weights with calibrated restrictions. Our results indicated that the treatment drop-in cohorts (with respect to metformin, sulfonylurea, and SGLT-2i) may have improved diabetes care provisions relative to the treatment-naive (i.e., no treatment) cohort.
As a clinical utility, we hope that this article will facilitate discussions around the prevention of adverse chronic outcomes associated with type II diabetes through the improvement of diabetes care provisions in primary care.
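The "cumulative-time product" stabilized weight multiplies, over visits, the ratio of the probability of the observed treatment given treatment history alone (numerator) to its probability given treatment history and time-varying confounders (denominator). The minimal Python sketch below assumes binary treatment and takes both sets of probabilities as already estimated (for example by an ensemble learner); the function name and inputs are hypothetical, and the paper's calibrated restrictions on the weights are not reproduced.

```python
from math import prod
from typing import Sequence


def stabilized_weight(treatment: Sequence[int],
                      p_num: Sequence[float],
                      p_den: Sequence[float]) -> float:
    """Cumulative-product stabilized inverse probability of treatment weight.

    treatment[t]: observed binary treatment at visit t.
    p_num[t]: P(A_t = 1 | past treatment), the numerator model.
    p_den[t]: P(A_t = 1 | past treatment, time-varying confounders),
              the denominator model.
    """
    if not (len(treatment) == len(p_num) == len(p_den)):
        raise ValueError("inputs must have one entry per time point")
    ratios = []
    for a, pn, pd in zip(treatment, p_num, p_den):
        # Use the probability of the treatment actually received at visit t.
        num = pn if a == 1 else 1.0 - pn
        den = pd if a == 1 else 1.0 - pd
        ratios.append(num / den)
    return prod(ratios)


# One hypothetical patient followed over three visits: untreated, then treated twice.
w = stabilized_weight([0, 1, 1], p_num=[0.3, 0.5, 0.8], p_den=[0.2, 0.25, 0.9])
print(round(w, 3))
```

The per-visit ratios are 0.7/0.8, 0.5/0.25, and 0.8/0.9, giving a weight of 14/9, about 1.56: this patient's treatment at the second visit was less likely than average given their confounders, so they are up-weighted in the pseudo-population.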