1
Klein H, Washington TA. The Relationship of Anti-Transgender Discrimination, Harassment, and Violence to Binge Drinking among Transgender Adults. Subst Use Misuse 2024; 59:583-590. [PMID: 38105183 DOI: 10.1080/10826084.2023.2293731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Purpose: Using a minority stress paradigm, this paper examines the relationship between anti-transgender discrimination, harassment, and violence and binge drinking among transgender adults. Methods: Data from the 2015 U.S. National Transgender Survey were used to examine twenty types of anti-transgender experiences/problems (e.g., harassment at work, problems with police officials, verbal and physical assaults) in a sample of 27,715 transgender Americans aged 18 or older. Binge drinking during the previous month was the dependent variable, and eight control measures were examined in the multivariate analysis. Results: Experiencing any of the twenty types of anti-transgender discrimination, harassment, or violence increased the odds of binge drinking by 48%. Experiencing many such problems increased the odds of binge drinking by 104%. Multivariate analysis showed that anti-transgender discrimination, harassment, and violence remain predictors of binge drinking even when other key measures are taken into account. Younger people, racial minority group members, and persons who were not married or "involved" were at particularly great risk. Conclusions: Consistent with the minority stress paradigm, the more different types of anti-transgender experiences people had, the more likely they were to engage in binge drinking. Targeted intervention needs to help transgender persons avoid anti-transgender discrimination, harassment, and violence to the greatest extent possible, and to develop resiliency skills whenever they are victimized. This is particularly true for transgender persons who are younger, minority, and not "involved" in a relationship.
Affiliation(s)
- Hugh Klein
  - Kensington Research Institute, Silver Spring, Maryland, USA
  - School of Social Work, California State University-Long Beach, Long Beach, California, USA
- Thomas Alex Washington
  - School of Social Work, California State University-Long Beach, Long Beach, California, USA
2
Hu L. A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection. Biom J 2024; 66:e2200178. [PMID: 38072661 PMCID: PMC10953775 DOI: 10.1002/bimj.202200178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 07/31/2023] [Accepted: 08/11/2023] [Indexed: 01/30/2024]
Abstract
We recently developed a new method, the random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), to draw causal inferences about population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of population average treatment effect. In this work, we exposit how riAFT-BART can be used to solve two important statistical questions with clustered survival data: estimating the treatment effect heterogeneity and variable selection. Leveraging the likelihood-based machine learning, we describe a way in which we can draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and use the drawn posterior samples to perform an exploratory treatment effect heterogeneity analysis to identify subpopulations who may experience treatment effects that differ from the population average effects. There is sparse literature on methods for variable selection among clustered and censored survival data, particularly ones using flexible modeling techniques. We propose a permutation-based approach using the predictor's variable inclusion proportion supplied by the riAFT-BART model for variable selection. To address the missing data issue frequently encountered in health databases, we propose a strategy to combine bootstrap imputation and riAFT-BART for variable selection among incomplete clustered survival data. We conduct an expansive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that our proposed methods perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study of predictors for in-hospital mortality among severe COVID-19 patients and estimating the heterogeneous treatment effects of three COVID-specific medications. The methods developed in this work are readily available in the R package riAFTBART.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey 08854
3
Brousmiche D, Lanier C, Cuny D, Frevent C, Genin M, Blanc-Garin C, Amouyel P, Deram A, Occelli F, Meirhaeghe A. How do territorial characteristics affect spatial inequalities in the risk of coronary heart disease? The Science of the Total Environment 2023; 867:161563. [PMID: 36640871 DOI: 10.1016/j.scitotenv.2023.161563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 01/08/2023] [Accepted: 01/08/2023] [Indexed: 06/17/2023]
Abstract
BACKGROUND Cardiovascular diseases remain the leading cause of death and disabilities worldwide, with coronary heart diseases being the most frequently diagnosed. Their multifactorial etiology involves individual, behavioral and territorial determinants, and thus requires the implementation of multidimensional approaches to assess links between territorial characteristics and the incidence of coronary heart diseases. CONTEXT AND OBJECTIVES This study was carried out in a densely populated area located in the north of France with multiple sources of pollutants. The aim of this research was therefore to establish complex territorial profiles that have been characterized by the standardized incidence, thereby identifying the influences of determinants that can be related to a beneficial or a deleterious effect on cardiovascular health. METHODS Forty-four variables related to economic, social, health, environment and services dimensions with an established or suspected impact on cardiovascular health were used to describe the multidimensional characteristics involved in cardiovascular health. RESULTS Three complex territorial profiles have been highlighted and characterized by the standardized incidence rate (SIR) of coronary heart diseases after adjustment for age and gender. Profile 1 was characterized by an SIR of 0.895 (sd: 0.143) and a higher number of determinants that revealed favorable territorial conditions. Profiles 2 and 3 were characterized by SIRs of respectively 1.225 (sd: 0.242) and 1.119 (sd: 0.273). Territorial characteristics among these profiles of over-incidence were nevertheless dissimilar. Profile 2 revealed higher deprivation, lower vegetation and lower atmospheric pollution, while profile 3 displayed a rather privileged population with contrasted territorial conditions. 
CONCLUSION This methodology permitted the characterization of the multidimensional determinants involved in cardiovascular health, whether they have a negative or a positive impact, and could provide stakeholders with a diagnostic tool to implement contextualized public health policies to prevent coronary heart diseases.
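The SIR values quoted in this abstract can be read as observed-to-expected ratios, where a value above 1 indicates over-incidence. The following is a minimal Python sketch of that arithmetic, assuming the usual indirect-standardization definition (the abstract does not spell out its computation). The function name, strata, reference rates, and counts are all hypothetical and are not taken from the study.

```python
def standardized_incidence_ratio(observed_cases, stratum_population, reference_rates):
    """Observed cases divided by the cases expected under reference rates.

    stratum_population and reference_rates are aligned lists, one entry per
    age/sex stratum (hypothetical strata, for illustration only).
    """
    expected = sum(pop * rate for pop, rate in zip(stratum_population, reference_rates))
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed_cases / expected


# Hypothetical territory with three strata.
population = [1000, 2000, 1500]
rates = [0.001, 0.004, 0.010]  # reference incidence per person in each stratum
sir = standardized_incidence_ratio(30, population, rates)
print(round(sir, 3))
```

With these made-up inputs the expected count is 24 cases, so 30 observed cases give a ratio above 1, the same direction as the over-incidence described for profiles 2 and 3.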
Affiliation(s)
- Delphine Brousmiche
  - Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Association pour la Prévention de la Pollution Atmosphérique, F-59120 Loos, France.
- Caroline Lanier
  - Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté d'Ingénierie et Management de la Santé (ILIS), F-59000 Lille, France
- Damien Cuny
  - Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté de Pharmacie de Lille - LSVF, F-59000 Lille, France
- Camille Frevent
  - Univ. Lille, CHU Lille, ULR 2694 - METRICS: Evaluation des technologies de santé et des pratiques médicales, F-59000 Lille, France
- Michael Genin
  - Univ. Lille, CHU Lille, ULR 2694 - METRICS: Evaluation des technologies de santé et des pratiques médicales, F-59000 Lille, France
- Carine Blanc-Garin
  - Univ. Lille, CHU Lille, Institut Pasteur de Lille, Inserm UMR1167 RID-AGE (Risk Factors and Molecular Determinants of Aging-Related Diseases), F-59000 Lille, France
- Philippe Amouyel
  - Univ. Lille, CHU Lille, Institut Pasteur de Lille, Inserm UMR1167 RID-AGE (Risk Factors and Molecular Determinants of Aging-Related Diseases), F-59000 Lille, France
- Annabelle Deram
  - Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté d'Ingénierie et Management de la Santé (ILIS), F-59000 Lille, France
- Florent Occelli
  - Univ. Lille, Univ. Artois, IMT Lille Douai, JUNIA, ULR 4515 - LGCgE, Laboratoire de Génie Civil et Géo-Environnement, F-59000 Lille, France; Univ. Lille, UFR3S-Faculté d'Ingénierie et Management de la Santé (ILIS), F-59000 Lille, France
- Aline Meirhaeghe
  - Univ. Lille, CHU Lille, Institut Pasteur de Lille, Inserm UMR1167 RID-AGE (Risk Factors and Molecular Determinants of Aging-Related Diseases), F-59000 Lille, France
4
Sun F, Yao J, Du S, Qian F, Appleton AA, Tao C, Xu H, Liu L, Dai Q, Joyce BT, Nannini DR, Hou L, Zhang K. Social Determinants, Cardiovascular Disease, and Health Care Cost: A Nationwide Study in the United States Using Machine Learning. J Am Heart Assoc 2023; 12:e027919. [PMID: 36802713 PMCID: PMC10111459 DOI: 10.1161/jaha.122.027919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Background Existing studies on cardiovascular diseases (CVDs) often focus on individual-level behavioral risk factors, but research examining social determinants is limited. This study applies a novel machine learning approach to identify the key predictors of county-level care costs and prevalence of CVDs (including atrial fibrillation, acute myocardial infarction, congestive heart failure, and ischemic heart disease). Methods and Results We applied the extreme gradient boosting machine learning approach to a total of 3137 counties. Data are from the Interactive Atlas of Heart Disease and Stroke and a variety of national data sets. We found that although demographic composition (eg, percentages of Black people and older adults) and risk factors (eg, smoking and physical inactivity) are among the most important predictors for inpatient care costs and CVD prevalence, contextual factors such as social vulnerability and racial and ethnic segregation are particularly important for the total and outpatient care costs. Poverty and income inequality are the major contributors to the total care costs for counties that are in nonmetro areas or have high segregation or social vulnerability levels. Racial and ethnic segregation is particularly important in shaping the total care costs for counties with low poverty rates or social vulnerability level. Demographic composition, education, and social vulnerability are consistently important across different scenarios. Conclusions The findings highlight the differences in predictors for different types of CVD cost outcomes and the importance of social determinants. Interventions directed toward areas that have been economically and socially marginalized may aid in reducing the impact of CVDs.
Affiliation(s)
- Feinuo Sun
  - Global Aging and Community Initiative Mount Saint Vincent University Halifax Nova Scotia Canada
- Jie Yao
  - Department of Epidemiology and Biostatistics, School of Public Health University at Albany, State University of New York Albany NY
- Shichao Du
  - Department of Sociology University at Albany, State University of New York Albany NY
- Feng Qian
  - Department of Health Policy, Management and Behavior, School of Public Health University at Albany, State University of New York Albany NY
- Allison A Appleton
  - Department of Epidemiology and Biostatistics, School of Public Health University at Albany, State University of New York Albany NY
- Cui Tao
  - School of Biomedical Informatics The University of Texas Health Science Center at Houston Houston TX
- Hua Xu
  - School of Biomedical Informatics The University of Texas Health Science Center at Houston Houston TX
- Lei Liu
  - Division of Biostatistics Washington University in St. Louis St. Louis MO
- Qi Dai
  - Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, School of Medicine Vanderbilt University, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center Nashville TN
- Brian T Joyce
  - Department of Preventive Medicine Northwestern University Feinberg School of Medicine Chicago IL
- Drew R Nannini
  - Department of Preventive Medicine Northwestern University Feinberg School of Medicine Chicago IL
- Lifang Hou
  - Department of Preventive Medicine Northwestern University Feinberg School of Medicine Chicago IL
- Kai Zhang
  - Department of Environmental Health Sciences, School of Public Health University at Albany, State University of New York Albany NY
5
Hu L, Li L. Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series. International Journal of Environmental Research and Public Health 2022; 19:16080. [PMID: 36498153 PMCID: PMC9736500 DOI: 10.3390/ijerph192316080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 11/22/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
Tree-based machine learning methods have gained traction in the statistical and data science fields. They have been shown to provide better solutions to various research questions than traditional analysis approaches. To encourage the uptake of tree-based methods in health research, we review the methodological fundamentals of three key tree-based machine learning methods: random forests, extreme gradient boosting and Bayesian additive regression trees. We further conduct a series of case studies to illustrate how these methods can be properly used to solve important health research problems in four domains: variable selection, estimation of causal effects, propensity score weighting and missing data. We exposit that the central idea of using ensemble tree methods for these research questions is accurate prediction via flexible modeling. We applied ensemble tree methods to select important predictors for the presence of postoperative respiratory complications among early-stage lung cancer patients with resectable tumors. We then demonstrated how to use these methods to estimate the causal effects of popular surgical approaches on postoperative respiratory complications among lung cancer patients. Using the same data, we further implemented the methods to accurately estimate the inverse probability weights for a propensity score analysis of the comparative effectiveness of the surgical approaches. Finally, we demonstrated how random forests can be used to impute missing data using the Study of Women's Health Across the Nation data set. To conclude, the tree-based methods are a flexible tool and should be properly used for health investigations.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ 08854, USA
- Lihua Li
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
6
Hu L, Zou J, Gu C, Ji J, Lopez M, Kale M. A flexible sensitivity analysis approach for unmeasured confounding with multiple treatments and a binary outcome with application to SEER-Medicare lung cancer data. Ann Appl Stat 2022; 16:1014-1037. [PMID: 36644682 PMCID: PMC9835106 DOI: 10.1214/21-aoas1530] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023]
Abstract
In the absence of a randomized experiment, a key assumption for drawing causal inference about treatment effects is the ignorable treatment assignment. Violations of the ignorability assumption may lead to biased treatment effect estimates. Sensitivity analysis helps gauge how causal conclusions will be altered in response to the potential magnitude of departure from the ignorability assumption. However, sensitivity analysis approaches for unmeasured confounding in the context of multiple treatments and binary outcomes are scarce. We propose a flexible Monte Carlo sensitivity analysis approach for causal inference in such settings. We first derive the general form of the bias introduced by unmeasured confounding, with emphasis on theoretical properties uniquely relevant to multiple treatments. We then propose methods to encode the impact of unmeasured confounding on potential outcomes and adjust the estimates of causal effects in which the presumed unmeasured confounding is removed. Our proposed methods embed nested multiple imputation within the Bayesian framework, which allows for seamless integration of the uncertainty about the values of the sensitivity parameters and the sampling variability, as well as use of the Bayesian Additive Regression Trees for modeling flexibility. Expansive simulations validate our methods and provide insight into sensitivity analysis with multiple treatments. We use the SEER-Medicare data to demonstrate sensitivity analysis using three treatments for early stage non-small cell lung cancer. The methods developed in this work are readily available in the R package SAMTx.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University
- Jungang Zou
  - Department of Biostatistics, Columbia University
- Jiayi Ji
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai
- Minal Kale
  - Department of Medicine, Icahn School of Medicine at Mount Sinai
7
Lin JYJ, Hu L, Huang C, Jiayi J, Lawrence S, Govindarajulu U. A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data. BMC Med Res Methodol 2022; 22:132. [PMID: 35508974 PMCID: PMC9066834 DOI: 10.1186/s12874-022-01608-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/04/2021] [Accepted: 04/19/2022] [Indexed: 12/17/2022]
Abstract
Background Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can provide good performances achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach however is computationally expensive, especially on large-scale datasets. Methods We propose an inference-based method, called RR-BART, which leverages the likelihood-based Bayesian machine learning technique, Bayesian additive regression trees, and uses Rubin’s rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of MAR data. We conduct a representative simulation study to investigate the practical operating characteristics of RR-BART, and compare it with the bootstrap imputation based methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome among middle-aged women using data from the Study of Women’s Health Across the Nation (SWAN). Results The simulation study suggests that even in complex conditions of nonlinearity and nonadditivity with a large percentage of missingness, RR-BART can reasonably recover both prediction and variable selection performances, achievable on the fully observed data. RR-BART provides the best performance that the bootstrap imputation based methods can achieve with the optimal selection threshold value. In addition, RR-BART demonstrates a substantially stronger ability of detecting discrete predictors. Furthermore, RR-BART offers substantial computational savings. When implemented on the SWAN data, RR-BART adds to the literature by selecting a set of predictors that had been less commonly identified as risk factors but had substantial biological justifications. 
Conclusion The proposed variable selection method for MAR data, RR-BART, offers both computational efficiency and good operating characteristics and is utilitarian in large-scale healthcare database studies. Supplementary Information The online version contains supplementary material available at (10.1186/s12874-022-01608-7).
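The pooling step this abstract refers to, Rubin's rule, combines a point estimate and its variance across multiply imputed datasets. The Python sketch below shows only that standard pooling formula. How RR-BART defines its variable importance measure and turns the pooled result into a selection decision is not described in the abstract, so the function name and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules.

    Returns (pooled_estimate, total_variance), where
    total_variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputed datasets")
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical importance estimates for one predictor from three imputed datasets.
pooled, total = pool_rubin([0.10, 0.12, 0.14], [0.0004, 0.0005, 0.0006])
print(pooled, total)
```

The between-imputation term is what carries the extra uncertainty due to missing data; with a single imputation it cannot be estimated, which is why the sketch requires at least two datasets.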
Affiliation(s)
- Jung-Yi Joyce Lin
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, 10029, USA
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, 683 Hoes Lane West, Piscataway, 08854, USA.
- Chuyue Huang
  - Primary Research Solution LLC., 115 W 18th St, New York, 10011, USA
- Ji Jiayi
  - Department of Biostatistics and Epidemiology, Rutgers University, 683 Hoes Lane West, Piscataway, 08854, USA
- Steven Lawrence
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, 10029, USA
- Usha Govindarajulu
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, 10029, USA
8
Hu L, Joyce Lin JY, Ji J. Variable selection with missing data in both covariates and outcomes: Imputation and machine learning. Stat Methods Med Res 2021; 30:2651-2671. [PMID: 34696650 PMCID: PMC11181487 DOI: 10.1177/09622802211046385] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/15/2022]
Abstract
Variable selection in the presence of both missing covariates and outcomes is an important statistical research topic. Parametric regression methods are susceptible to misspecification, and as a result are sub-optimal for variable selection. Flexible machine learning methods mitigate the reliance on the parametric assumptions, but do not provide a variable importance measure as naturally defined as the covariate effect native to parametric models. We investigate a general variable selection approach when both the covariates and outcomes can be missing at random and have general missing data patterns. This approach exploits the flexibility of machine learning models and bootstrap imputation, which is amenable to nonparametric methods in which the covariate effects are not directly available. We conduct expansive simulations investigating the practical operating characteristics of the proposed variable selection approach, when combined with four tree-based machine learning methods (extreme gradient boosting, random forests, Bayesian additive regression trees, and conditional random forests) and two commonly used parametric methods (lasso and backward stepwise selection). Numeric results suggest that extreme gradient boosting and Bayesian additive regression trees have the overall best variable selection performance with respect to the F1 score and Type I error, while the lasso and backward stepwise selection have subpar performance across various settings. There is no significant difference in the variable selection performance due to imputation methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome with data from the Study of Women's Health Across the Nation.
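To make the evaluation described in this abstract concrete, the Python sketch below aggregates per-dataset selections across bootstrap-imputed datasets and scores the result with the F1 score. The aggregation rule (keep a predictor if it is selected in at least a given fraction of datasets) is an assumption made for illustration, as are the function names, the threshold of 0.5, and the predictor names; the abstract does not state these details.

```python
from collections import Counter


def select_by_threshold(selections, threshold):
    """Keep predictors chosen in at least `threshold` fraction of the datasets."""
    counts = Counter(v for chosen in selections for v in set(chosen))
    n = len(selections)
    return {v for v, c in counts.items() if c / n >= threshold}


def f1_score(selected, truth):
    """Harmonic mean of precision and recall for a selected predictor set."""
    true_positives = len(selected & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical selections from four bootstrap-imputed datasets.
selections = [
    {"age", "bmi"},
    {"age", "bmi", "smoking"},
    {"age"},
    {"age", "bmi"},
]
final = select_by_threshold(selections, threshold=0.5)
f1 = f1_score(final, truth={"age", "bmi", "glucose"})
print(sorted(final), round(f1, 3))
```

In this toy case "smoking" appears in only one of four datasets and is dropped, while the missed true predictor "glucose" lowers recall and therefore the F1 score.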
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, USA
- Jung-Yi Joyce Lin
  - Department of Population Health Science & Policy, Icahn School of Medicine at Mount Sinai, USA
- Jiayi Ji
  - Department of Population Health Science & Policy, Icahn School of Medicine at Mount Sinai, USA
9
Liu Z, Lin Z, Cao W, Li R, Liu L, Wu H, Tang K. Identify Key Determinants of Contraceptive Use for Sexually Active Young People: A Hybrid Ensemble of Machine Learning Methods. Children (Basel, Switzerland) 2021; 8:968. [PMID: 34828681 PMCID: PMC8622295 DOI: 10.3390/children8110968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/10/2021] [Revised: 10/09/2021] [Accepted: 10/25/2021] [Indexed: 02/05/2023]
Abstract
Sexually active young people face an increasing public health burden of unintended pregnancies and sexually transmitted diseases due to improper contraception. However, environmental and social factors related to young people's contraception remain unclear. To identify the key factors, we applied ensemble machine learning methods to the data of 12,280 heterosexual Chinese college students who reported sexual intercourse experience in the National College Student Survey on Sexual and Reproductive Health in 2020 (NCSS-SRH 2020). In the order of variable importance, convenient access to contraceptives, certain attitudes towards sex, sexual health knowledge level, being an only child, and pursuing a bachelor's or master's degree were positively associated with a high frequency of contraceptive use. In contrast, smoking, free access to contraceptives, a specific attitude towards marriage, and negotiation with a sexual partner were negatively associated with a higher frequency of contraceptive use. Our analysis provides insights into young people's contraceptive use under a typically conservative culture of sexuality. Compared to previous studies, we thoroughly investigated internal and external factors that might impact young people's decision on contraception while having sex. Under a conservative culture of sexuality, the effects of the external factors on young people's contraception may outweigh those of the internal factors.
Affiliation(s)
- Zongchao Liu
  - Vanke School of Public Health, Tsinghua University, Zhongguancun North Street, Haidian District, Beijing 100084, China
  - Department of Biostatistics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Zhi Lin
  - Vanke School of Public Health, Tsinghua University, Zhongguancun North Street, Haidian District, Beijing 100084, China
  - School of Public Health, Peking University, Beijing 100083, China
- Wenzhen Cao
  - Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
  - School of Public Health, Shantou University, No. 243 Daxue Road, Shantou 515063, China
- Rui Li
  - Department of Surgery, Washington University School of Medicine, St. Louis, MO 63130, USA
- Lilong Liu
  - Department of Pharmacology, School of Basic Medical Sciences, Wuhan University, Wuhan 430071, China
- Hanbin Wu
  - Vanke School of Public Health, Tsinghua University, Zhongguancun North Street, Haidian District, Beijing 100084, China
- Kun Tang
  - Vanke School of Public Health, Tsinghua University, Zhongguancun North Street, Haidian District, Beijing 100084, China
10
Hu L, Lin JY, Sigel K, Kale M. Estimating heterogeneous survival treatment effects of lung cancer screening approaches: A causal machine learning analysis. Ann Epidemiol 2021; 62:36-42. [PMID: 34157399 PMCID: PMC8463451 DOI: 10.1016/j.annepidem.2021.06.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/08/2021] [Revised: 05/18/2021] [Accepted: 06/14/2021] [Indexed: 12/20/2022]
Abstract
The National Lung Screening Trial (NLST) found that low-dose computed tomography (LDCT) screening provided lung cancer (LC) mortality benefit compared to chest radiography (CXR). Considerable research concerns identifying the differential treatment effects that may exist in certain subpopulations. We shed light on several important issues in existing research and highlight the need for further investigation of the heterogeneous comparative effect of LDCT versus CXR, using more flexible and rigorous statistical approaches. We used a high-performance Bayesian machine learning approach designed for censored survival data, the accelerated failure time Bayesian additive regression trees model (AFT-BART), to flexibly capture the relationships between the failure time and predictors. We then used the counterfactual framework to draw Markov chain Monte Carlo samples of the individual treatment effect for each participant. Using these posterior samples, we explored the possible treatment effect heterogeneity via a stepwise binary tree approach. When re-analyzed with AFT-BART, LDCT did not have a statistically significant LC or overall mortality benefit compared to CXR. Asian and Black NLST participants (particularly those with ≥ 37 pack-years and without emphysema) were shown to have a greater overall mortality benefit from LDCT than the population average. Although inconclusive for LC mortality benefit, Asians, Blacks and Whites with a history of chronic obstructive pulmonary disease showed a small trend towards benefit from LDCT. Causal inference with flexible machine learning modeling can provide valuable knowledge for informing treatment decisions and planning targeted clinical trials emphasizing personalized medicine approaches.
Affiliation(s)
- Liangyuan Hu
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY; Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ.
- Jung-Yi Lin
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY; Icahn School of Medicine at Mount Sinai, Institute for Health Care Delivery Science, New York, NY
- Keith Sigel
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Minal Kale
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
11
Menchaca M, Pagone F, Erdal S. Comparison of positive SARS-CoV-2 incidence rate with environmental and socioeconomic factors in northern Illinois. Heliyon 2021; 7:e07806. [PMID: 34414309 PMCID: PMC8364149 DOI: 10.1016/j.heliyon.2021.e07806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Revised: 07/12/2021] [Accepted: 08/12/2021] [Indexed: 11/19/2022]
Abstract
Early studies showed positive associations of fine particulate matter (PM2.5), coarse particulate matter (PM10), nitrogen dioxide (NO2) and ozone (O3) concentrations with confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cases in the United States. One study showed that a 1 μg/m3 increase in PM2.5 is associated with an 8% increase in the COVID-19 death rate. Chicago and its surrounding suburbs have been labeled hot spots in the United States, and correlation with air pollutant concentrations will help identify the specific communities most at risk. A number of studies have identified demographic variables associated with increased SARS-CoV-2 positivity, and the importance of air quality and socioeconomic factors must be further understood to enable more targeted public health responses. The results of this analysis showed positive relationships between zip code SARS-CoV-2 incidence rate and environmental and demographic environmental justice (EJ) indicators. Evaluation of race and SARS-CoV-2 incidence rate at the zip code level found moderate positive correlations for ethnic minority individuals.
Affiliation(s)
- Martha Menchaca
- School of Medicine, University of Illinois at Chicago, 1740 West Taylor, M/C 931, Chicago, IL 60612, USA
- Frank Pagone
- RHP Risk Management Inc., 8745 W. Higgins Rd., Suite 320, Chicago, IL 60631, USA
- Serap Erdal
- Environmental and Occupational Health Sciences, School of Public Health, University of Illinois at Chicago, 1603 West Taylor Street, M/C 923, Chicago, IL 60612, USA
|
12
|
Hu L, Li L, Ji J, Sanderson M. Identifying and understanding determinants of high healthcare costs for breast cancer: a quantile regression machine learning approach. BMC Health Serv Res 2020; 20:1066. [PMID: 33228683 PMCID: PMC7684910 DOI: 10.1186/s12913-020-05936-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/30/2020] [Accepted: 11/18/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND: To identify and rank the importance of key determinants of high medical expenses among breast cancer patients and to understand the underlying effects of these determinants. METHODS: Data from the Oncology Care Model (OCM) developed by the Center for Medicare & Medicaid Innovation were used. The OCM data provided to Mount Sinai on 2938 breast cancer episodes included both baseline periods and three performance periods between Jan 1, 2012 and Jan 1, 2018. We included 11 variables representing information on treatment, demography and socio-economic status, in addition to episode expenditures. OCM data were collected from participating practices and payers. We applied a principled variable selection algorithm using a flexible tree-based machine learning technique, Quantile Regression Forests. RESULTS: We found that the use of chemotherapy drugs (versus hormonal therapy) and the interval of days without chemotherapy predominantly affected medical expenses among high-cost breast cancer patients. The second-tier major determinants were comorbidities and age. Receipt of surgery or radiation, geographically adjusted relative cost and insurance type were also identified as important high-cost drivers. These factors had disproportionately larger effects on the high-cost patients. CONCLUSIONS: Data-driven machine learning methods provide insights into the underlying web of factors driving up the costs of breast cancer care management. Results from our study may help inform population health management initiatives and allow policymakers to develop tailored interventions to meet the needs of those high-cost patients and to avoid wasting scarce resources.
Affiliation(s)
- Liangyuan Hu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, One Gustave L. Levy Place, Box 1077, New York, NY, 10029, USA.
- Lihua Li
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, One Gustave L. Levy Place, Box 1077, New York, NY, 10029, USA
- Jiayi Ji
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, One Gustave L. Levy Place, Box 1077, New York, NY, 10029, USA
- Mark Sanderson
- Department of Health System Design and Global Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
|
13
|
Identifying and assessing the impact of key neighborhood-level determinants on geographic variation in stroke: a machine learning and multilevel modeling approach. BMC Public Health 2020; 20:1666. [PMID: 33160324 PMCID: PMC7648288 DOI: 10.1186/s12889-020-09766-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/17/2020] [Accepted: 10/26/2020] [Indexed: 12/31/2022]
Abstract
Background: Stroke is a chronic cardiovascular disease that places major stress on U.S. health and the economy. The prevalence of stroke exhibits a strong geographical pattern at the state level, where a cluster of southern states with a substantially higher prevalence of stroke has been called the stroke belt of the nation. Despite this recognition, the extent to which key neighborhood characteristics affect stroke prevalence remains to be clarified. Methods: We generated a new neighborhood health data set at the census tract level on nearly 27,000 tracts by pooling information from multiple data sources, including the CDC's 500 Cities Project 2017 data release. We employed a two-stage modeling approach to understand how key neighborhood-level risk factors affect neighborhood-level stroke prevalence in each state of the US. The first stage used a state-of-the-art Bayesian machine learning algorithm to identify key neighborhood-level determinants. The second stage applied a Bayesian multilevel modeling approach to describe how these key determinants explain the variability in stroke prevalence in each state. Results: Neighborhoods with a larger proportion of older adults and non-Hispanic blacks had a higher prevalence of stroke. Higher median household income was linked to lower stroke prevalence. Ozone was found to be positively associated with stroke prevalence in 10 states and negatively associated with it in five states. There was substantial variation in both the direction and magnitude of the associations of these four key factors with stroke prevalence across the states. Conclusions: When used in a principled variable selection framework, high-performance machine learning can identify key factors of neighborhood-level prevalence of stroke from wide-ranging information in a data-driven way. The Bayesian multilevel modeling approach provides a detailed view of the impact of key factors across the states. The identified major factors and their effect mechanisms can potentially aid policy makers in developing area-based stroke prevention strategies. Supplementary Information: The online version contains supplementary material available at 10.1186/s12889-020-09766-3.
|
14
|
Hu L, Li L, Ji J. Machine learning to identify and understand key factors for provider-patient discussions about smoking. Prev Med Rep 2020; 20:101238. [PMID: 33224719 PMCID: PMC7666379 DOI: 10.1016/j.pmedr.2020.101238] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/15/2020] [Revised: 10/07/2020] [Accepted: 10/20/2020] [Indexed: 12/15/2022]
Abstract
We sought to identify key determinants of the likelihood of provider-patient discussions about smoking and to understand the effects of these determinants. We used data from the 2017 National Health Interview Survey on 3666 self-reported current smokers who talked to a health professional within a year of the time the survey was conducted. We included wide-ranging information on 43 potential covariates across four domains: demographic and socio-economic status, behavior, health status and healthcare utilization. We exploited a principled nonparametric permutation-based approach using Bayesian machine learning to identify and rank important determinants of discussions about smoking between health providers and patients. In order of importance, frequency of doctor office visits, intensity of cigarette use, length of smoking history, chronic obstructive pulmonary disease, emphysema, and marital status were major determinants of disparities in provider-patient discussions about smoking. There was a distinct interaction between intensity of cigarette use and length of smoking history. Our analysis may provide some insights into strategies for promoting discussions on smoking and facilitating smoking cessation. Health care resource usage, smoking intensity and duration, and smoking-related conditions were key drivers. The "usual suspects" (age, gender, race and ethnicity) were less important, and gender, in particular, had little effect.
Affiliation(s)
- Liangyuan Hu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Institute for Healthcare Delivery, Mount Sinai Health System, New York, NY, USA
- Lihua Li
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Institute for Healthcare Delivery, Mount Sinai Health System, New York, NY, USA
- Jiayi Ji
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Institute for Healthcare Delivery, Mount Sinai Health System, New York, NY, USA
|