1
Salvatore M, Kundu R, Shi X, Friese CR, Lee S, Fritsche LG, Mondul AM, Hanauer D, Pearce CL, Mukherjee B. To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice. J Am Med Inform Assoc 2024; 31:1479-1492. [PMID: 38742457 PMCID: PMC11187425 DOI: 10.1093/jamia/ocae098] [Received: 02/14/2024] [Revised: 04/14/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024]
Abstract
OBJECTIVES To develop recommendations regarding the use of weights to reduce selection bias for commonly performed analyses using electronic health record (EHR)-linked biobank data. MATERIALS AND METHODS We mapped diagnosis (ICD code) data to standardized phecodes from 3 EHR-linked biobanks with varying recruitment strategies: All of Us (AOU; n = 244 071), Michigan Genomics Initiative (MGI; n = 81 243), and UK Biobank (UKB; n = 401 167). Using 2019 National Health Interview Survey data, we constructed selection weights for AOU and MGI to make them more representative of the US adult population. We used weights previously developed for UKB to represent the UKB-eligible population. We conducted 4 common analyses comparing unweighted and weighted results. RESULTS For AOU and MGI, estimated phecode prevalences decreased after weighting (weighted-unweighted median phecode prevalence ratio [MPR]: 0.82 and 0.61), while UKB estimates increased (MPR: 1.06). Weighting minimally impacted latent phenome dimensionality estimation. Comparing weighted versus unweighted phenome-wide association studies for colorectal cancer, the strongest associations remained unaltered, with considerable overlap in significant hits. Weighting shifted the estimated log-odds ratio for sex and colorectal cancer so that it aligned more closely with national registry-based estimates. DISCUSSION Weighting had a limited impact on dimensionality estimation and large-scale hypothesis testing but affected prevalence and association estimation. When effect size estimation is of interest, specific signals from untargeted association analyses should be followed up by weighted analysis. CONCLUSION EHR-linked biobanks should report recruitment and selection mechanisms and provide selection weights with defined target populations. Researchers should consider their intended estimands, specify source and target populations, and weight EHR-linked biobank analyses accordingly.
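The prevalence comparison in this abstract can be illustrated with a small sketch. It assumes that the MPR is the median, across phecodes, of each phecode's weighted prevalence divided by its unweighted prevalence; the abstract does not spell out the computation, so this reading, the function names, and the toy data are all hypothetical and are not the authors' code.

```python
from statistics import median


def prevalence(cases, weights=None):
    """Share of participants with a diagnosis; cases are 0/1 indicators.

    With weights=None every participant counts equally (unweighted).
    """
    if weights is None:
        weights = [1.0] * len(cases)
    return sum(w * c for w, c in zip(weights, cases)) / sum(weights)


def median_prevalence_ratio(phenome, weights):
    """Median over phecodes of weighted / unweighted prevalence.

    Assumed reading of "MPR"; phecodes with zero unweighted
    prevalence are skipped to avoid dividing by zero.
    """
    ratios = []
    for cases in phenome.values():
        unweighted = prevalence(cases)
        if unweighted > 0:
            ratios.append(prevalence(cases, weights) / unweighted)
    return median(ratios)


# Toy data: four participants, three made-up phecodes.
phenome = {
    "phecode_A": [1, 1, 0, 0],
    "phecode_B": [0, 0, 1, 0],
    "phecode_C": [1, 1, 1, 0],
}
# Hypothetical selection weights: the first two participants are
# over-represented in the sample, so they are down-weighted.
weights = [0.5, 0.5, 1.5, 1.5]

mpr = median_prevalence_ratio(phenome, weights)
print(f"MPR = {mpr:.3f}")
```

An MPR below 1 in such a sketch means that weighting lowers the typical prevalence estimate, which is the direction reported for AOU and MGI.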
Affiliation(s)
- Maxwell Salvatore
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
- Ritoban Kundu
  - Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
- Xu Shi
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
- Christopher R Friese
  - Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Center for Improving Patient and Population Health, School of Nursing, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Department of Health Management and Policy, University of Michigan, Ann Arbor, MI 48109-2029, United States
- Seunggeun Lee
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Graduate School of Data Science, Seoul National University, Gwanak-gu, Seoul, Republic of Korea
- Lars G Fritsche
  - Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States
- Alison M Mondul
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States
- David Hanauer
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI 48109-2054, United States
- Celeste Leigh Pearce
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States
- Bhramar Mukherjee
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States
2
Razzaghi H, Goodwin Davies A, Boss S, Bunnell HT, Chen Y, Chrischilles EA, Dickinson K, Hanauer D, Huang Y, Ilunga KTS, Katsoufis C, Lehmann H, Lemas DJ, Matthews K, Mendonca EA, Morse K, Ranade D, Rosenman M, Taylor B, Walters K, Denburg MR, Forrest CB, Bailey LC. Systematic data quality assessment of electronic health record data to evaluate study-specific fitness: Report from the PRESERVE research study. PLOS Digital Health 2024; 3:e0000527. [PMID: 38935590 PMCID: PMC11210795 DOI: 10.1371/journal.pdig.0000527] [Received: 11/15/2023] [Accepted: 05/07/2024] [Indexed: 06/29/2024]
Abstract
Study-specific data quality testing is an essential part of minimizing analytic errors, particularly for studies making secondary use of clinical data. We applied a systematic and reproducible approach for study-specific data quality testing to the analysis plan for PRESERVE, a 15-site, EHR-based observational study of chronic kidney disease in children. This approach integrated widely adopted data quality concepts with healthcare-specific evaluation methods. We implemented two rounds of data quality assessment. The first produced high-level evaluation using aggregate results from a distributed query, focused on cohort identification and main analytic requirements. The second focused on extended testing of row-level data centralized for analysis. We systematized reporting and cataloguing of data quality issues, providing institutional teams with prioritized issues for resolution. We tracked improvements and documented anomalous data for consideration during analyses. The checks we developed identified 115 and 157 data quality issues in the two rounds, involving completeness, data model conformance, cross-variable concordance, consistency, and plausibility, extending traditional data quality approaches to address more complex stratification and temporal patterns. Resolution efforts focused on higher priority issues, given finite study resources. In many cases, institutional teams were able to correct data extraction errors or obtain additional data, avoiding exclusion of 2 institutions entirely and resolving 123 other gaps. Other results identified complexities in measures of kidney function, bearing on the study's outcome definition. Where limitations such as these are intrinsic to clinical data, the study team must account for them in conducting analyses. This study rigorously evaluated fitness of data for intended use. The framework is reusable and built on a strong theoretical underpinning. 
Significant data quality issues that would have otherwise delayed analyses or made data unusable were addressed. This study highlights the need for teams combining subject-matter and informatics expertise to address data quality when working with real world data.
Affiliation(s)
- Hanieh Razzaghi
  - Applied Clinical Research Center, Departments of Pediatrics and Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Amy Goodwin Davies
  - Applied Clinical Research Center, Departments of Pediatrics and Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Samuel Boss
  - Applied Clinical Research Center, Departments of Pediatrics and Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- H. Timothy Bunnell
  - Biomedical Research Informatics Center, Nemours Children’s Hospital, Wilmington, Delaware, United States of America
- Yong Chen
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Elizabeth A. Chrischilles
  - Department of Epidemiology, College of Public Health, University of Iowa, Iowa City, Iowa, United States of America
- Kimberley Dickinson
  - Applied Clinical Research Center, Departments of Pediatrics and Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- David Hanauer
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Yungui Huang
  - IT Research and Innovation, Nationwide Children’s Hospital, Columbus, Ohio, United States of America
- K. T. Sandra Ilunga
  - Applied Clinical Research Center, Departments of Pediatrics and Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Chryso Katsoufis
  - Division of Pediatric Nephrology, University of Miami Miller School of Medicine, Miami, Florida, United States of America
- Harold Lehmann
  - Biomedical Informatics & Data Science Section, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
- Dominick J. Lemas
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, United States of America
- Kevin Matthews
  - Analytics Research Center, Children’s Hospital of Colorado, Aurora, Colorado, United States of America
- Eneida A. Mendonca
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Keith Morse
  - Division of Pediatric Hospital Medicine, Stanford University School of Medicine, Stanford, California, United States of America
- Daksha Ranade
  - Biostatistics, Epidemiology, and Analytics in Research (BEAR), Seattle Children’s Hospital, Seattle, Washington, United States of America
- Marc Rosenman
  - Department of Pediatrics, Ann & Robert H. Lurie Children’s Hospital, Chicago, Illinois, United States of America
- Bradley Taylor
  - Clinical and Translational Science Institute, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Kellie Walters
  - Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Michelle R. Denburg
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Division of Nephrology, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Christopher B. Forrest
  - Applied Clinical Research Center, Departments of Pediatrics and Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
  - Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- L. Charles Bailey
  - Applied Clinical Research Center, Departments of Pediatrics and Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
  - Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
3
Garg E, Arguello-Pascualli P, Vishnyakova O, Halevy AR, Yoo S, Brooks JD, Bull SB, Gagnon F, Greenwood CMT, Hung RJ, Lawless JF, Lerner-Ellis J, Dennis JK, Abraham RJS, Garant JM, Thiruvahindrapuram B, Jones SJM, Strug LJ, Paterson AD, Sun L, Elliott LT. Canadian COVID-19 host genetics cohort replicates known severity associations. PLoS Genet 2024; 20:e1011192. [PMID: 38517939 PMCID: PMC10990181 DOI: 10.1371/journal.pgen.1011192] [Received: 06/30/2023] [Revised: 04/03/2024] [Accepted: 02/22/2024] [Indexed: 03/24/2024]
Abstract
The HostSeq initiative recruited 10,059 Canadians infected with SARS-CoV-2 between March 2020 and March 2023, obtained clinical information on their disease experience and whole genome sequenced (WGS) their DNA. We analyzed the WGS data for genetic contributors to severe COVID-19 (considering 3,499 hospitalized cases and 4,975 non-hospitalized after quality control). We investigated the evidence for replication of loci reported by the International Host Genetics Initiative (HGI); analyzed the X chromosome; conducted rare variant gene-based analysis and polygenic risk score testing. Population stratification was adjusted for using meta-analysis across ancestry groups. We replicated two loci identified by the HGI for COVID-19 severity: the LZTFL1/SLC6A20 locus on chromosome 3 and the FOXP4 locus on chromosome 6 (the latter with a variant significant at P < 5E-8). We found novel significant associations with MRAS and WDR89 in gene-based analyses, and constructed a polygenic risk score that explained 1.01% of the variance in severe COVID-19. This study provides independent evidence confirming the robustness of previously identified COVID-19 severity loci by the HGI and identifies novel genes for further investigation.
Affiliation(s)
- Elika Garg
  - Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada
  - Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Paola Arguello-Pascualli
  - BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Olga Vishnyakova
  - Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada
  - Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Anat R. Halevy
  - Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Samantha Yoo
  - Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
- Jennifer D. Brooks
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Shelley B. Bull
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- France Gagnon
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Celia M. T. Greenwood
  - Gerald Bronfman Department of Oncology, Department of Epidemiology, Biostatistics and Occupational Health, Department of Human Genetics, McGill University, Montreal, Quebec, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Rayjean J. Hung
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Jerald F. Lawless
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Jordan Lerner-Ellis
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
  - Mount Sinai Hospital, Toronto, Ontario, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Jessica K. Dennis
  - BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Rohan J. S. Abraham
  - Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Jean-Michel Garant
  - Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Steven J. M. Jones
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
  - Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Lisa J. Strug
  - Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Andrew D. Paterson
  - Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Lei Sun
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Lloyd T. Elliott
  - Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada
4
Salvatore M, Kundu R, Shi X, Friese CR, Lee S, Fritsche LG, Mondul AM, Hanauer D, Pearce CL, Mukherjee B. To weight or not to weight? Studying the effect of selection bias in three large EHR-linked biobanks. medRxiv 2024:2024.02.12.24302710. [PMID: 38405832 PMCID: PMC10888982 DOI: 10.1101/2024.02.12.24302710] [Indexed: 02/27/2024]
Abstract
Objective To explore the role of selection bias adjustment by weighting electronic health record (EHR)-linked biobank data for commonly performed analyses. Materials and methods We mapped diagnosis (ICD code) data to standardized phecodes from three EHR-linked biobanks with varying recruitment strategies: All of Us (AOU; n=244,071), Michigan Genomics Initiative (MGI; n=81,243), and UK Biobank (UKB; n=401,167). Using 2019 National Health Interview Survey data, we constructed selection weights for AOU and MGI to be more representative of the US adult population. We used weights previously developed for UKB to represent the UKB-eligible population. We conducted four common descriptive and analytic tasks comparing unweighted and weighted results. Results For AOU and MGI, estimated phecode prevalences decreased after weighting (weighted-unweighted median phecode prevalence ratio [MPR]: 0.82 and 0.61), while UKB's estimates increased (MPR: 1.06). Weighting minimally impacted latent phenome dimensionality estimation. Comparing weighted versus unweighted PheWAS for colorectal cancer, the strongest associations remained unaltered and there was large overlap in significant hits. Weighting affected the estimated log-odds ratio for sex and colorectal cancer to align more closely with national registry-based estimates. Discussion Weighting had limited impact on dimensionality estimation and large-scale hypothesis testing but impacted prevalence and association estimation more. Results from untargeted association analyses should be followed by weighted analysis when effect size estimation is of interest for specific signals. Conclusion EHR-linked biobanks should report recruitment and selection mechanisms and provide selection weights with defined target populations. Researchers should consider their intended estimands, specify source and target populations, and weight EHR-linked biobank analyses accordingly.
Affiliation(s)
- Maxwell Salvatore
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
  - Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI, USA
- Ritoban Kundu
  - Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI, USA
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Xu Shi
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Christopher R Friese
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI, USA
  - Center for Improving Patient and Population Health, School of Nursing, University of Michigan, Ann Arbor, MI, USA
  - Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA
- Seunggeun Lee
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
- Lars G Fritsche
  - Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI, USA
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI, USA
- Alison M Mondul
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI, USA
- David Hanauer
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
- Celeste Leigh Pearce
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI, USA
- Bhramar Mukherjee
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
  - Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI, USA
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
5
Kim J, Anthopolos R, Zhong J. Bias correction models for electronic health records data in the presence of non-random sampling. Biometrics 2024; 80:ujae014. [PMID: 38488466 PMCID: PMC10941326 DOI: 10.1093/biomtc/ujae014] [Received: 04/14/2023] [Revised: 01/12/2024] [Accepted: 02/20/2024] [Indexed: 03/18/2024]
Abstract
Electronic health records (EHRs) contain rich clinical information for millions of patients and are increasingly used for public health research. However, non-random inclusion of subjects in EHRs can result in selection bias, with factors such as demographics, socioeconomic status, healthcare referral patterns, and underlying health status playing a role. While this issue has been well documented, little work has been done to develop or apply bias-correction methods, often due to the fact that most of these factors are unavailable in EHRs. To address this gap, we propose a series of Heckman type bias correction methods by incorporating social determinants of health selection covariates to model the EHR non-random sampling probability. Through simulations under various settings, we demonstrate the effectiveness of our proposed method in correcting biases in both the association coefficient and the outcome mean. Our method augments the utility of EHRs for public health inferences, as we show by estimating the prevalence of cardiovascular disease and its correlation with risk factors in the New York City network of EHRs.
Affiliation(s)
- Jiyu Kim
  - Department of Population Health, NYU Grossman School of Medicine, New York University, 180 Madison Ave, New York, NY 10016, United States
- Rebecca Anthopolos
  - Department of Population Health, NYU Grossman School of Medicine, New York University, 180 Madison Ave, New York, NY 10016, United States
- Judy Zhong
  - Department of Population Health, NYU Grossman School of Medicine, New York University, 180 Madison Ave, New York, NY 10016, United States
6
Fritsche LG, Nam K, Du J, Kundu R, Salvatore M, Shi X, Lee S, Burgess S, Mukherjee B. Uncovering associations between pre-existing conditions and COVID-19 Severity: A polygenic risk score approach across three large biobanks. PLoS Genet 2023; 19:e1010907. [PMID: 38113267 PMCID: PMC10763941 DOI: 10.1371/journal.pgen.1010907] [Received: 08/09/2023] [Revised: 01/03/2024] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
OBJECTIVE To overcome the limitations associated with the collection and curation of COVID-19 outcome data in biobanks, this study proposes the use of polygenic risk scores (PRS) as reliable proxies of COVID-19 severity across three large biobanks: the Michigan Genomics Initiative (MGI), UK Biobank (UKB), and NIH All of Us. The goal is to identify associations between pre-existing conditions and COVID-19 severity. METHODS Drawing on a sample of more than 500,000 individuals from the three biobanks, we conducted a phenome-wide association study (PheWAS) to identify associations between a PRS for COVID-19 severity, derived from a genome-wide association study on COVID-19 hospitalization, and clinical pre-existing, pre-pandemic phenotypes. We performed cohort-specific PRS PheWAS and a subsequent fixed-effects meta-analysis. RESULTS The current study uncovered 23 pre-existing conditions significantly associated with the COVID-19 severity PRS in cohort-specific analyses, of which 21 were observed in the UKB cohort and two in the MGI cohort. The meta-analysis yielded 27 significant phenotypes predominantly related to obesity, metabolic disorders, and cardiovascular conditions. After adjusting for body mass index, several clinical phenotypes, such as hypercholesterolemia and gastrointestinal disorders, remained associated with an increased risk of hospitalization following COVID-19 infection. CONCLUSION By employing PRS as a proxy for COVID-19 severity, we corroborated known risk factors and identified novel associations between pre-existing clinical phenotypes and COVID-19 severity. Our study highlights the potential value of using PRS when actual outcome data may be limited or inadequate for robust analyses.
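The fixed-effects meta-analysis step mentioned in this abstract can be sketched as follows. The sketch assumes the standard inverse-variance form, which the abstract does not specify; the function name and the per-cohort numbers are made up for illustration and are not taken from the study.

```python
import math


def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance fixed-effects pooling of per-cohort estimates.

    Each cohort is weighted by 1 / SE^2; returns the pooled estimate
    and its standard error. Assumed form, not the authors' code.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical log-odds ratios for one phenotype in three cohorts.
betas = [0.10, 0.20, 0.15]
ses = [0.10, 0.10, 0.05]

pooled, pooled_se = fixed_effects_meta(betas, ses)
z_score = pooled / pooled_se
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, z = {z_score:.2f}")
```

Because the weights depend only on the standard errors, the most precisely estimated cohort dominates the pooled estimate; this is the usual design trade-off of a fixed-effects model, which assumes one shared effect across cohorts.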
Affiliation(s)
- Lars G. Fritsche
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Kisung Nam
  - Graduate School of Data Science, Seoul National University, Seoul, South Korea
- Jiacong Du
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Ritoban Kundu
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Maxwell Salvatore
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Xu Shi
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, South Korea
- Stephen Burgess
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
  - Cardiovascular Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, United States of America
7
Yin J, Zhao M, Yang L. Comment on: Decreased psoas muscle area is a prognosticator for 90-day and 1-year survival in patients undergoing surgical treatment for spinal metastasis. Clin Nutr 2023; 42:2082-2083. [PMID: 37316332 DOI: 10.1016/j.clnu.2023.06.004] [Received: 04/19/2023] [Accepted: 06/01/2023] [Indexed: 06/16/2023]
Affiliation(s)
- Jianqiao Yin
  - Department of Oncology, Shengjing Hospital of China Medical University, Liaoning, 110004, China
- Mu Zhao
  - Department of Orthopedics, Shengjing Hospital of China Medical University, Liaoning, 110004, China
- Liyu Yang
  - Department of Orthopedics, Shengjing Hospital of China Medical University, Liaoning, 110004, China
8
Ng DQ, Jia S, Wisseh C, Cadiz C, Nguyen M, Lee J, McBane S, Nguyen L, Chan A, Hurley-Kim K. Sociodemographic characteristics differ across routine adult vaccine cohorts: An All of Us descriptive study. J Am Pharm Assoc (2003) 2022; 63:582-591.e20. [PMID: 36549934 DOI: 10.1016/j.japh.2022.11.005] [Received: 09/14/2022] [Revised: 11/07/2022] [Accepted: 11/07/2022] [Indexed: 11/15/2022]
Abstract
BACKGROUND The National Institutes of Health All of Us (AoU) Research Program is currently building a database of more than 1 million adult subjects. Using this database, we describe the characteristics of those with documented vaccinations. OBJECTIVES To describe the sociodemographic, health status, and lifestyle factors associated with vaccinations. METHODS This is a retrospective study involving data from the AoU program (R2020Q4R2, N = 315,297). Five vaccine cohorts [influenza, hepatitis B (HBV), pneumococcal <65 years old, pneumococcal ≥65 years old, and human papillomavirus (HPV)] were generated based on vaccination history. The influenza cohort comprised participants with documented influenza vaccinations in electronic health records (EHRs) from September 2017 to May 2018. Other vaccine cohorts comprised participants with ≥1 lifetime record(s) of vaccination documented in the EHR by December 2018. The vaccine cohorts were compared to the overall AoU cohort. Descriptive statistics were generated using EHR- and survey-based sociodemographic, health, and lifestyle information. The SAMBA (0.9.0) R package was utilized to adjust for EHR selection and outcome misclassification biases to infer sources of disparity for pneumococcal vaccinations in older adults. RESULTS Cohort counts were as follows: influenza (n = 15,346), HBV (n = 6323), pneumococcal <65 (n = 15,217), pneumococcal ≥65 (n = 15,100), and HPV (n = 2125). All vaccine cohorts had higher proportions of White and non-Hispanic/Latino participants compared to the overall AoU cohort. The largest differences were found in the pneumococcal ≥65 cohort, with 80.2% White participants compared to 52.9% in the overall study population. Multivariable analysis revealed that racial/ethnic disparities in pneumococcal vaccination among older adults were explained by biological sex, income, health insurance, and education-related variables.
CONCLUSION Racial, ethnic, education, and income characteristics differ across the vaccine cohorts among AoU participants. These findings inform future utilization of large health databases in vaccine epidemiology research and emphasize the need for more targeted interventions that address differences in vaccine uptake.