Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hu L, Liu B, Li Y. Ranking sociodemographic, health behavior, prevention, and environmental factors in predicting neighborhood cardiovascular health: A Bayesian machine learning approach. Prev Med 2020;141:106240. [PMID: 32860821 PMCID: PMC7704682 DOI: 10.1016/j.ypmed.2020.106240] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/12/2020] [Revised: 07/19/2020] [Accepted: 08/19/2020] [Indexed: 10/23/2022]

Cited by Other Article(s)

Klein H, Washington TA. The Relationship of Anti-Transgender Discrimination, Harassment, and Violence to Binge Drinking among Transgender Adults. Subst Use Misuse 2024;59:583-590. [PMID: 38105183 DOI: 10.1080/10826084.2023.2293731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]

Hu L. A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection. Biom J 2024;66:e2200178. [PMID: 38072661 PMCID: PMC10953775 DOI: 10.1002/bimj.202200178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 07/31/2023] [Accepted: 08/11/2023] [Indexed: 01/30/2024]

Abstract

We recently developed a new method, the random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), to draw causal inferences about the population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of the population average treatment effect. In this work, we describe how riAFT-BART can be used to address two important statistical questions with clustered survival data: estimating treatment effect heterogeneity and variable selection. Leveraging likelihood-based machine learning, we describe how to draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and use the drawn posterior samples to perform an exploratory treatment effect heterogeneity analysis that identifies subpopulations that may experience treatment effects different from the population average effect. There is sparse literature on methods for variable selection among clustered and censored survival data, particularly ones using flexible modeling techniques. We propose a permutation-based approach using the predictor's variable inclusion proportion supplied by the riAFT-BART model for variable selection. To address the missing data issue frequently encountered in health databases, we propose a strategy to combine bootstrap imputation and riAFT-BART for variable selection among incomplete clustered survival data. We conduct an expansive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that our proposed methods perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study of predictors for in-hospital mortality among severe COVID-19 patients and of the heterogeneous treatment effects of three COVID-specific medications. The methods developed in this work are readily available in the R package riAFTBART.
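
The permutation idea mentioned in this abstract can be sketched generically. The block below is a minimal illustration under stated assumptions, not the riAFT-BART implementation: `toy_vip` is a hypothetical stand-in (normalized absolute covariance) for the variable inclusion proportions a fitted model would supply, `make_toy_data` and `permutation_select` are illustrative names, and the response is permuted naively, ignoring clustering and censoring. A predictor is kept when its observed proportion exceeds the (1 - alpha) quantile of its own permutation null distribution.

```python
import random
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def toy_vip(X: Matrix, y: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for model-supplied inclusion proportions:
    absolute covariance of each column with y, normalized to sum to 1."""
    n = len(y)
    y_mean = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        x_mean = sum(col) / n
        scores.append(abs(sum((a - x_mean) * (b - y_mean) for a, b in zip(col, y))))
    total = sum(scores) or 1.0
    return [s / total for s in scores]


def permutation_select(
    X: Matrix,
    y: Sequence[float],
    vip_fn: Callable[[Matrix, Sequence[float]], List[float]] = toy_vip,
    n_perm: int = 100,
    alpha: float = 0.05,
    seed: int = 0,
) -> List[int]:
    """Return indices of predictors whose observed proportion exceeds the
    empirical (1 - alpha) quantile of its permutation null distribution."""
    rng = random.Random(seed)
    observed = vip_fn(X, y)
    null_draws = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # permuting y breaks any X-y association
        null_draws.append(vip_fn(X, y_perm))
    cut = min(n_perm - 1, int((1 - alpha) * n_perm))  # index of the cutoff draw
    selected = []
    for j, obs in enumerate(observed):
        null_j = sorted(draw[j] for draw in null_draws)
        if obs > null_j[cut]:
            selected.append(j)
    return selected


def make_toy_data(n: int = 1000, seed: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Three Gaussian predictors; only column 0 drives the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(permutation_select(X, y))
```

In the toy data only column 0 influences the response, so it is the predictor one would expect the procedure to retain. A real application would replace `toy_vip` with proportions from the fitted model and would need a permutation scheme suited to clustered, censored outcomes.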

Brousmiche D, Lanier C, Cuny D, Frevent C, Genin M, Blanc-Garin C, Amouyel P, Deram A, Occelli F, Meirhaeghe A. How do territorial characteristics affect spatial inequalities in the risk of coronary heart disease? Sci Total Environ 2023;867:161563. [PMID: 36640871 DOI: 10.1016/j.scitotenv.2023.161563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 01/08/2023] [Accepted: 01/08/2023] [Indexed: 06/17/2023]

Abstract

BACKGROUND

Cardiovascular diseases remain the leading cause of death and disabilities worldwide, with coronary heart diseases being the most frequently diagnosed. Their multifactorial etiology involves individual, behavioral and territorial determinants, and thus requires the implementation of multidimensional approaches to assess links between territorial characteristics and the incidence of coronary heart diseases.

CONTEXT AND OBJECTIVES

This study was carried out in a densely populated area located in the north of France with multiple sources of pollutants. The aim of this research was therefore to establish complex territorial profiles that have been characterized by the standardized incidence, thereby identifying the influences of determinants that can be related to a beneficial or a deleterious effect on cardiovascular health.

METHODS

Forty-four variables related to economic, social, health, environment and services dimensions with an established or suspected impact on cardiovascular health were used to describe the multidimensional characteristics involved in cardiovascular health.

RESULTS

Three complex territorial profiles were identified and characterized by the standardized incidence rate (SIR) of coronary heart diseases after adjustment for age and gender. Profile 1 was characterized by an SIR of 0.895 (sd: 0.143) and a higher number of determinants that revealed favorable territorial conditions. Profiles 2 and 3 were characterized by SIRs of 1.225 (sd: 0.242) and 1.119 (sd: 0.273), respectively. Territorial characteristics among these profiles of over-incidence were nevertheless dissimilar. Profile 2 revealed higher deprivation, lower vegetation and lower atmospheric pollution, while profile 3 displayed a rather privileged population with contrasting territorial conditions.

CONCLUSION

This methodology permitted the characterization of the multidimensional determinants involved in cardiovascular health, whether they have a negative or a positive impact, and could provide stakeholders with a diagnostic tool to implement contextualized public health policies to prevent coronary heart diseases.

Affiliation(s)

Delphine Brousmiche: Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Association pour la Prévention de la Pollution Atmosphérique, F-59120 Loos, France.
Caroline Lanier: Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté d'Ingénierie et Management de la Santé (ILIS), F-59000 Lille, France.
Damien Cuny: Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté de Pharmacie de Lille - LSVF, F-59000 Lille, France.
Camille Frevent: Univ. Lille, CHU Lille, ULR 2694 - METRICS: Evaluation des technologies de santé et des pratiques médicales, F-59000 Lille, France.
Michael Genin: Univ. Lille, CHU Lille, ULR 2694 - METRICS: Evaluation des technologies de santé et des pratiques médicales, F-59000 Lille, France.
Carine Blanc-Garin: Univ. Lille, CHU Lille, Institut Pasteur de Lille, Inserm UMR1167 RID-AGE (Risk Factors and Molecular Determinants of Aging-Related Diseases), F-59000 Lille, France.
Philippe Amouyel: Univ. Lille, CHU Lille, Institut Pasteur de Lille, Inserm UMR1167 RID-AGE (Risk Factors and Molecular Determinants of Aging-Related Diseases), F-59000 Lille, France.
Annabelle Deram: Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté d'Ingénierie et Management de la Santé (ILIS), F-59000 Lille, France.
Florent Occelli: Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté d'Ingénierie et Management de la Santé (ILIS), F-59000 Lille, France.
Aline Meirhaeghe: Univ. Lille, CHU Lille, Institut Pasteur de Lille, Inserm UMR1167 RID-AGE (Risk Factors and Molecular Determinants of Aging-Related Diseases), F-59000 Lille, France.

Sun F, Yao J, Du S, Qian F, Appleton AA, Tao C, Xu H, Liu L, Dai Q, Joyce BT, Nannini DR, Hou L, Zhang K. Social Determinants, Cardiovascular Disease, and Health Care Cost: A Nationwide Study in the United States Using Machine Learning. J Am Heart Assoc 2023;12:e027919. [PMID: 36802713 PMCID: PMC10111459 DOI: 10.1161/jaha.122.027919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]

Hu L, Li L. Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series. Int J Environ Res Public Health 2022;19:16080. [PMID: 36498153 PMCID: PMC9736500 DOI: 10.3390/ijerph192316080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 11/22/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]

Hu L, Zou J, Gu C, Ji J, Lopez M, Kale M. A flexible sensitivity analysis approach for unmeasured confounding with multiple treatments and a binary outcome with application to SEER-Medicare lung cancer data. Ann Appl Stat 2022;16:1014-1037. [PMID: 36644682 PMCID: PMC9835106 DOI: 10.1214/21-aoas1530] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023]

Lin JYJ, Hu L, Huang C, Jiayi J, Lawrence S, Govindarajulu U. A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data. BMC Med Res Methodol 2022;22:132. [PMID: 35508974 PMCID: PMC9066834 DOI: 10.1186/s12874-022-01608-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/04/2021] [Accepted: 04/19/2022] [Indexed: 12/17/2022]

Abstract

Background

Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can recover the good performance achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach, however, is computationally expensive, especially on large-scale datasets.

Methods

We propose an inference-based method, called RR-BART, which leverages the likelihood-based Bayesian machine learning technique, Bayesian additive regression trees, and uses Rubin’s rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of MAR data. We conduct a representative simulation study to investigate the practical operating characteristics of RR-BART, and compare it with the bootstrap imputation based methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome among middle-aged women using data from the Study of Women’s Health Across the Nation (SWAN).
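
The pooling step named in this paragraph, Rubin's rule, can be illustrated in isolation. The sketch below assumes one already has, for a single predictor, a variable importance estimate and its variance from each of m imputed datasets; the function name `rubin_pool` and the example numbers are hypothetical, and how RR-BART obtains these inputs or turns the pooled quantities into a selection decision is not reproduced here.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine per-imputation estimates and variances with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)       # average of the m point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical importance estimates and variances from m = 3 imputed datasets.
    pooled, total = rubin_pool([0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004])
    print(pooled, total)
```

The total variance grows with disagreement between imputations, so a predictor whose importance is unstable across imputed datasets is penalized relative to one with consistent estimates.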

Results

The simulation study suggests that even in complex conditions of nonlinearity and nonadditivity with a large percentage of missingness, RR-BART can reasonably recover both the prediction and variable selection performance achievable on the fully observed data. RR-BART provides the best performance that the bootstrap imputation based methods can achieve with the optimal selection threshold value. In addition, RR-BART demonstrates a substantially stronger ability to detect discrete predictors. Furthermore, RR-BART offers substantial computational savings. When implemented on the SWAN data, RR-BART adds to the literature by selecting a set of predictors that had been less commonly identified as risk factors but had substantial biological justifications.

Conclusion

The proposed variable selection method for MAR data, RR-BART, offers both computational efficiency and good operating characteristics and is practical for large-scale healthcare database studies.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-022-01608-7.

Hu L, Joyce Lin JY, Ji J. Variable selection with missing data in both covariates and outcomes: Imputation and machine learning. Stat Methods Med Res 2021;30:2651-2671. [PMID: 34696650 PMCID: PMC11181487 DOI: 10.1177/09622802211046385] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/15/2022]

Liu Z, Lin Z, Cao W, Li R, Liu L, Wu H, Tang K. Identify Key Determinants of Contraceptive Use for Sexually Active Young People: A Hybrid Ensemble of Machine Learning Methods. Children (Basel) 2021;8:968. [PMID: 34828681 PMCID: PMC8622295 DOI: 10.3390/children8110968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/10/2021] [Revised: 10/09/2021] [Accepted: 10/25/2021] [Indexed: 02/05/2023]

Hu L, Lin JY, Sigel K, Kale M. Estimating heterogeneous survival treatment effects of lung cancer screening approaches: A causal machine learning analysis. Ann Epidemiol 2021;62:36-42. [PMID: 34157399 PMCID: PMC8463451 DOI: 10.1016/j.annepidem.2021.06.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/08/2021] [Revised: 05/18/2021] [Accepted: 06/14/2021] [Indexed: 12/20/2022]

Menchaca M, Pagone F, Erdal S. Comparison of positive SARS-CoV-2 incidence rate with environmental and socioeconomic factors in northern Illinois. Heliyon 2021;7:e07806. [PMID: 34414309 PMCID: PMC8364149 DOI: 10.1016/j.heliyon.2021.e07806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Revised: 07/12/2021] [Accepted: 08/12/2021] [Indexed: 11/19/2022]

Hu L, Li L, Ji J, Sanderson M. Identifying and understanding determinants of high healthcare costs for breast cancer: a quantile regression machine learning approach. BMC Health Serv Res 2020;20:1066. [PMID: 33228683 PMCID: PMC7684910 DOI: 10.1186/s12913-020-05936-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/30/2020] [Accepted: 11/18/2020] [Indexed: 12/16/2022]

Identifying and assessing the impact of key neighborhood-level determinants on geographic variation in stroke: a machine learning and multilevel modeling approach. BMC Public Health 2020;20:1666. [PMID: 33160324 PMCID: PMC7648288 DOI: 10.1186/s12889-020-09766-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/17/2020] [Accepted: 10/26/2020] [Indexed: 12/31/2022]

Abstract

Background

Stroke is a chronic cardiovascular disease that puts major stresses on U.S. health and economy.

The prevalence of stroke exhibits a strong geographical pattern at the state-level, where a cluster of southern states with a substantially higher prevalence of stroke has been called the stroke belt of the nation. Despite this recognition, the extent to which key neighborhood characteristics affect stroke prevalence remains to be further clarified.

Methods

We generated a new neighborhood health data set at the census tract level on nearly 27,000 tracts by pooling information from multiple data sources including the CDC’s 500 Cities Project 2017 data release. We employed a two-stage modeling approach to understand how key neighborhood-level risk factors affect the neighborhood-level stroke prevalence in each state of the US. The first stage used a state-of-the-art Bayesian machine learning algorithm to identify key neighborhood-level determinants. The second stage applied a Bayesian multilevel modeling approach to describe how these key determinants explain the variability in stroke prevalence in each state.

Results

Neighborhoods with a larger proportion of older adults and non-Hispanic blacks were associated with a higher prevalence of stroke. Higher median household income was linked to lower stroke prevalence. Ozone was found to be positively associated with stroke prevalence in 10 states, while negatively associated with stroke in five states. There was substantial variation in both the direction and magnitude of the associations between these four key factors and stroke prevalence across the states.

Conclusions

When used in a principled variable selection framework, high-performance machine learning can identify key factors of neighborhood-level prevalence of stroke from wide-ranging information in a data-driven way. The Bayesian multilevel modeling approach provides a detailed view of the impact of key factors across the states. The identified major factors and their effect mechanisms can potentially aid policy makers in developing area-based stroke prevention strategies.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12889-020-09766-3.

Hu L, Li L, Ji J. Machine learning to identify and understand key factors for provider-patient discussions about smoking. Prev Med Rep 2020;20:101238. [PMID: 33224719 PMCID: PMC7666379 DOI: 10.1016/j.pmedr.2020.101238] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/15/2020] [Revised: 10/07/2020] [Accepted: 10/20/2020] [Indexed: 12/15/2022]