Reference Citation Analysis

For: Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020;39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]

Cited by Other Article(s)

Zhang G, Beesley LJ, Mukherjee B, Shi X. Patient Recruitment Using Electronic Health Records Under Selection Bias: A Two-Phase Sampling Framework. Ann Appl Stat 2024;18:1858-1878. [PMID: 39149424 PMCID: PMC11323140 DOI: 10.1214/23-aoas1860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]

Nam Y, Kim J, Jung SH, Woerner J, Suh EH, Lee DG, Shivakumar M, Lee ME, Kim D. Harnessing Artificial Intelligence in Multimodal Omics Data Integration: Paving the Path for the Next Frontier in Precision Medicine. Annu Rev Biomed Data Sci 2024;7:225-250. [PMID: 38768397 DOI: 10.1146/annurev-biodatasci-102523-103801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]

Goleva SB, Williams A, Schlueter DJ, Keaton JM, Tran TC, Waxse BJ, Ferrara TM, Cassini T, Mo H, Denny JC. Racial and Ethnic Disparities in Antihypertensive Medication Prescribing Patterns and Effectiveness. Clin Pharmacol Ther 2024. [PMID: 39051523 DOI: 10.1002/cpt.3360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 06/08/2024] [Indexed: 07/27/2024]

Abstract

Variability in drug effectiveness and provider prescribing patterns has been reported across racial and ethnic populations. We sought to evaluate antihypertensive drug effectiveness and prescribing patterns among self-identified Hispanic/Latino (Hispanic), Non-Hispanic Black (Black), and Non-Hispanic White (White) participants enrolled in the NIH All of Us Research Program, a US longitudinal cohort. We employed a self-controlled case study method using electronic health record and survey data from 17,718 White, Hispanic, and Black participants who were diagnosed with essential hypertension (HTN) and prescribed at least one of 19 commonly used antihypertensive medications. Effectiveness was determined by calculating the reduction in systolic blood pressure (SBP) after 28 or more days of drug exposure. Starting SBP and effectiveness for each medication were compared across self-reported Black, Hispanic, and White participants using adjusted linear regressions. Black and Hispanic participants were started on antihypertensive medications at significantly higher SBP than White participants for 13 and 7 of the 19 medications, respectively. More Black participants were prescribed multiple antihypertensive medications (58.46%) than White (52.35%) or Hispanic (49.9%) participants. First-line HTN medications differed by race and ethnicity. Following release of the 2017 American College of Cardiology/American Heart Association High Blood Pressure Guideline, around 64% of Black participants were prescribed a recommended first-line antihypertensive drug, compared with 76% of White and 82% of Hispanic participants. Effect sizes suggested that most antihypertensive drugs were less effective in Hispanic and Black participants than in White participants, with statistical significance reached for 6 of the 19 drugs.
These results indicate that Black and Hispanic populations may benefit from earlier intervention and screening and highlight the potential benefits of personalizing first-line medications.

Venkatesh SS, Ganjgahi H, Palmer DS, Coley K, Linchangco GV, Hui Q, Wilson P, Ho YL, Cho K, Arumäe K, Wittemans LBL, Nellåker C, Vainik U, Sun YV, Holmes C, Lindgren CM, Nicholson G. Characterising the genetic architecture of changes in adiposity during adulthood using electronic health records. Nat Commun 2024;15:5801. [PMID: 38987242 PMCID: PMC11237142 DOI: 10.1038/s41467-024-49998-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 06/25/2024] [Indexed: 07/12/2024]

Affiliation(s)

Samvida S Venkatesh: Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
Habib Ganjgahi: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Department of Statistics, University of Oxford, Oxford, UK.
Duncan S Palmer: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Nuffield Department of Population Health, Medical Sciences Division, University of Oxford, Oxford, UK.
Kayesha Coley: Department of Population Health Sciences, University of Leicester, Leicester, UK.
Gregorio V Linchangco: Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA; Atlanta VA Health Care System, Decatur, GA, USA.
Qin Hui: Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA; Atlanta VA Health Care System, Decatur, GA, USA.
Peter Wilson: Atlanta VA Health Care System, Decatur, GA, USA; Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA.
Yuk-Lam Ho: Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA.
Kelly Cho: Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA; Division of Aging, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Kadri Arumäe: Institute of Psychology, Faculty of Social Sciences, University of Tartu, Tartu, Estonia.
Laura B L Wittemans: Novo Nordisk Research Centre Oxford, Oxford, UK; Nuffield Department of Women's and Reproductive Health, Medical Sciences Division, University of Oxford, Oxford, UK.
Christoffer Nellåker: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Nuffield Department of Women's and Reproductive Health, Medical Sciences Division, University of Oxford, Oxford, UK.
Uku Vainik: Institute of Psychology, Faculty of Social Sciences, University of Tartu, Tartu, Estonia; Estonian Genome Centre, Institute of Genomics, Faculty of Science and Technology, University of Tartu, Tartu, Estonia; Department of Neurology and Neurosurgery, Faculty of Medicine and Health Sciences, McGill University, Montreal, Canada.
Yan V Sun: Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA; Atlanta VA Health Care System, Decatur, GA, USA.
Chris Holmes: Department of Statistics, University of Oxford, Oxford, UK; Nuffield Department of Medicine, Medical Sciences Division, University of Oxford, Oxford, UK; The Alan Turing Institute, London, UK.
Cecilia M Lindgren: Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Nuffield Department of Women's and Reproductive Health, Medical Sciences Division, University of Oxford, Oxford, UK; Broad Institute of Harvard and MIT, Cambridge, MA, USA.
George Nicholson: Department of Statistics, University of Oxford, Oxford, UK.

McCaw ZR, Gao J, Lin X, Gronsbell J. Synthetic surrogates improve power for genome-wide association studies of partially missing phenotypes in population biobanks. Nat Genet 2024;56:1527-1536. [PMID: 38872030 DOI: 10.1038/s41588-024-01793-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Accepted: 05/08/2024] [Indexed: 06/15/2024]

Martínez-Magaña JJ, Hurtado-Soriano J, Rivero-Segura NA, Montalvo-Ortiz JL, Garcia-delaTorre P, Becerril-Rojas K, Gomez-Verjan JC. Towards a Novel Frontier in the Use of Epigenetic Clocks in Epidemiology. Arch Med Res 2024;55:103033. [PMID: 38955096 DOI: 10.1016/j.arcmed.2024.103033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 05/10/2024] [Accepted: 06/17/2024] [Indexed: 07/04/2024]

Salvatore M, Kundu R, Shi X, Friese CR, Lee S, Fritsche LG, Mondul AM, Hanauer D, Pearce CL, Mukherjee B. To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice. J Am Med Inform Assoc 2024;31:1479-1492. [PMID: 38742457 PMCID: PMC11187425 DOI: 10.1093/jamia/ocae098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 04/14/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024]

Abstract

OBJECTIVES

To develop recommendations regarding the use of weights to reduce selection bias for commonly performed analyses using electronic health record (EHR)-linked biobank data.

MATERIALS AND METHODS

We mapped diagnosis (ICD code) data to standardized phecodes from 3 EHR-linked biobanks with varying recruitment strategies: All of Us (AOU; n = 244 071), Michigan Genomics Initiative (MGI; n = 81 243), and UK Biobank (UKB; n = 401 167). Using 2019 National Health Interview Survey data, we constructed selection weights for AOU and MGI to better represent the US adult population. We used weights previously developed for UKB to represent the UKB-eligible population. We conducted 4 common analyses comparing unweighted and weighted results.

RESULTS

For AOU and MGI, estimated phecode prevalences decreased after weighting (weighted-unweighted median phecode prevalence ratio [MPR]: 0.82 and 0.61, respectively), while UKB estimates increased (MPR: 1.06). Weighting minimally impacted latent phenome dimensionality estimation. In weighted versus unweighted phenome-wide association studies of colorectal cancer, the strongest associations remained unaltered, with considerable overlap in significant hits. Weighting shifted the estimated log-odds ratio for the association between sex and colorectal cancer closer to national registry-based estimates.
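The median phecode prevalence ratio (MPR) summarizes, across all phecodes, how far weighting moves prevalence estimates. The minimal Python sketch below assumes that each phecode's ratio is its weighted prevalence divided by its unweighted prevalence and that the MPR is the median of those ratios. The selection weights are taken as given rather than constructed from survey data, and the function names and toy data are hypothetical illustrations, not the authors' code.

```python
from statistics import median


def unweighted_prevalence(flags: list[bool]) -> float:
    """Share of participants who carry the phecode."""
    return sum(flags) / len(flags)


def weighted_prevalence(flags: list[bool], weights: list[float]) -> float:
    """Sum of selection weights among carriers divided by the total weight."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


def median_prevalence_ratio(phecodes: dict[str, list[bool]], weights: list[float]) -> float:
    """Median over phecodes of weighted / unweighted prevalence (assumed definition)."""
    ratios = []
    for flags in phecodes.values():
        base = unweighted_prevalence(flags)
        if base == 0:
            continue  # ratio is undefined when no participant carries the phecode
        ratios.append(weighted_prevalence(flags, weights) / base)
    return median(ratios)


# Hypothetical toy data: 4 participants with given weights, 3 phecodes.
weights = [0.5, 0.5, 2.0, 1.0]
phecodes = {
    "A": [True, True, False, False],   # ratio 0.25 / 0.50 = 0.5
    "B": [False, False, True, False],  # ratio 0.50 / 0.25 = 2.0
    "C": [True, False, False, True],   # ratio 0.375 / 0.50 = 0.75
}
print(median_prevalence_ratio(phecodes, weights))  # 0.75
```

Under this reading, an MPR below 1 (as reported for AOU and MGI) means weighting lowered the typical phecode prevalence, while an MPR above 1 (as for UKB) means it raised it.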

DISCUSSION

Weighting had a limited impact on dimensionality estimation and large-scale hypothesis testing but did affect prevalence and association estimation. When effect size estimation is of interest, specific signals from untargeted association analyses should be followed up with weighted analysis.

CONCLUSION

EHR-linked biobanks should report recruitment and selection mechanisms and provide selection weights with defined target populations. Researchers should consider their intended estimands, specify source and target populations, and weight EHR-linked biobank analyses accordingly.

Affiliation(s)

Maxwell Salvatore: Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States; Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States.
Ritoban Kundu: Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States.
Xu Shi: Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States.
Christopher R Friese: Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States; Center for Improving Patient and Population Health, School of Nursing, University of Michigan, Ann Arbor, MI 48109-2029, United States; Department of Health Management and Policy, University of Michigan, Ann Arbor, MI 48109-2029, United States.
Seunggeun Lee: Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States; Graduate School of Data Science, Seoul National University, Gwanak-gu, Seoul, Republic of Korea.
Lars G Fritsche: Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States; Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States.
Alison M Mondul: Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States; Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States.
David Hanauer: Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI 48109-2054, United States.
Celeste Leigh Pearce: Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States; Rogel Cancer Center, Michigan Medicine, University of Michigan, Ann Arbor, MI 48109-2029, United States.
Bhramar Mukherjee: Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109-2029, United States; Center for Precision Health Data Science, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, United States.

van Alten S, Domingue BW, Faul J, Galama T, Marees AT. Reweighting UK Biobank corrects for pervasive selection bias due to volunteering. Int J Epidemiol 2024;53:dyae054. [PMID: 38715336 PMCID: PMC11076923 DOI: 10.1093/ije/dyae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 04/10/2024] [Indexed: 05/12/2024]

Salvatore M, Kundu R, Shi X, Friese CR, Lee S, Fritsche LG, Mondul AM, Hanauer D, Pearce CL, Mukherjee B. To weight or not to weight? Studying the effect of selection bias in three large EHR-linked biobanks. medRxiv 2024:2024.02.12.24302710. [PMID: 38405832 PMCID: PMC10888982 DOI: 10.1101/2024.02.12.24302710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]

Abstract

Objective

To explore the role of selection bias adjustment by weighting electronic health record (EHR)-linked biobank data for commonly performed analyses.

Materials and methods

We mapped diagnosis (ICD code) data to standardized phecodes from three EHR-linked biobanks with varying recruitment strategies: All of Us (AOU; n=244,071), Michigan Genomics Initiative (MGI; n=81,243), and UK Biobank (UKB; n=401,167). Using 2019 National Health Interview Survey data, we constructed selection weights for AOU and MGI to be more representative of the US adult population. We used weights previously developed for UKB to represent the UKB-eligible population. We conducted four common descriptive and analytic tasks comparing unweighted and weighted results.

Results

For AOU and MGI, estimated phecode prevalences decreased after weighting (weighted-unweighted median phecode prevalence ratio [MPR]: 0.82 and 0.61, respectively), while UKB's estimates increased (MPR: 1.06). Weighting minimally impacted latent phenome dimensionality estimation. In weighted versus unweighted phenome-wide association studies (PheWAS) of colorectal cancer, the strongest associations remained unaltered and there was large overlap in significant hits. Weighting shifted the estimated log-odds ratio for the association between sex and colorectal cancer closer to national registry-based estimates.

Discussion

Weighting had limited impact on dimensionality estimation and large-scale hypothesis testing but a greater impact on prevalence and association estimation. Results from untargeted association analyses should be followed by weighted analysis when effect size estimation is of interest for specific signals.

Conclusion

Jordan DM, Vy HMT, Do R. A deep learning transformer model predicts high rates of undiagnosed rare disease in large electronic health systems. medRxiv 2023:2023.12.21.23300393. [PMID: 38196638 PMCID: PMC10775679 DOI: 10.1101/2023.12.21.23300393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]

Abstract

It is estimated that as many as 1 in 16 people worldwide suffer from rare diseases. Rare disease patients face difficulty obtaining a diagnosis and treatment for their conditions, including long diagnostic odysseys, multiple incorrect diagnoses, and unavailable or prohibitively expensive treatments. As a result, large electronic health record (EHR) systems likely include many participants with undiagnosed rare disease. While this has been shown in detail for specific diseases, such studies are expensive and time-consuming and have been feasible for only a handful of the thousands of known rare diseases. Most of these undiagnosed cases are effectively hidden, with no straightforward way to distinguish them from healthy controls. The ability to identify them at scale would greatly expand our capacity to study and develop drugs for rare diseases, complementing existing tools aimed at increasing the availability of rare disease study cohorts. In this study, we trained a deep learning transformer algorithm, RarePT (Rare-Phenotype Prediction Transformer), to impute undiagnosed rare disease from EHR diagnosis codes in 436,407 participants in the UK Biobank and validated it on an independent cohort of 3,333,560 individuals from the Mount Sinai Health System. We applied our model to 155 rare diagnosis codes with fewer than 250 cases each in the UK Biobank and predicted participants with elevated risk for each diagnosis; the number of participants predicted to be at risk ranged from 85 to 22,000 across diagnoses. These risk predictions are significantly associated with increased mortality for 65% of diagnoses, with disease burden expressed as disability-adjusted life years (DALY) for 73% of diagnoses, and with 72% of available disease-specific diagnostic tests.
The predictions are also highly enriched for known rare diagnoses in patients not included in the training set, with an odds ratio (OR) of 48.0 in cross-validation cohorts of the UK Biobank and an OR of 30.6 in the independent Mount Sinai Health System cohort. Most importantly, RarePT successfully screens for undiagnosed patients in 32 rare diseases with available diagnostic tests in the UK Biobank. Using the trained model to estimate the prevalence of undiagnosed disease in the UK Biobank for these 32 rare phenotypes, we find that at least 50% of patients remain undiagnosed for 20 of the 32 diseases. These estimates provide empirical evidence of a high prevalence of undiagnosed rare disease and demonstrate the potential benefit of using RarePT to screen for undiagnosed rare disease patients in large electronic health systems.

Leviton A, Loddenkemper T. Design, implementation, and inferential issues associated with clinical trials that rely on data in electronic medical records: a narrative review. BMC Med Res Methodol 2023;23:271. [PMID: 37974111 PMCID: PMC10652539 DOI: 10.1186/s12874-023-02102-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 11/08/2023] [Indexed: 11/19/2023]

Sánchez-Valle J, Valencia A. Molecular bases of comorbidities: present and future perspectives. Trends Genet 2023;39:773-786. [PMID: 37482451 DOI: 10.1016/j.tig.2023.06.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023]

Liang J, Li Q, Fu Z, Liu X, Shen P, Sun Y, Zhang J, Lu P, Lin H, Tang X, Gao P. Validation and comparison of cardiovascular risk prediction equations in Chinese patients with Type 2 diabetes. Eur J Prev Cardiol 2023;30:1293-1303. [PMID: 37315163 DOI: 10.1093/eurjpc/zwad198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/16/2023]

Abstract

AIMS

For patients with diabetes, the European guidelines updated the cardiovascular disease (CVD) risk prediction recommendations using diabetes-specific models with age-specific cut-offs, whereas American guidelines still advise models derived from the general population. We aimed to compare the performance of four cardiovascular risk models in diabetes populations.

METHODS AND RESULTS

Patients with diabetes from the CHERRY study, an electronic health records-based cohort study in China, were identified. Five-year CVD risk was calculated using original and recalibrated diabetes-specific models [Action in Diabetes and Vascular disease: PreterAx and diamicroN-MR Controlled Evaluation (ADVANCE) and the Hong Kong cardiovascular risk model (HK)] and general population-based models [Pooled Cohort Equations (PCE) and Prediction for Atherosclerotic cardiovascular disease Risk in China (China-PAR)]. During a median 5.8-year follow-up, 46 558 patients had 2605 CVD events. C-statistics were 0.711 (95% confidence interval: 0.693-0.729) for ADVANCE and 0.701 (0.683-0.719) for HK in men, and 0.742 (0.725-0.759) and 0.732 (0.718-0.747), respectively, in women. C-statistics were lower for the two general population-based models. Recalibrated ADVANCE underestimated risk by 1.2% in men and 16.8% in women, whereas PCE underestimated risk by 41.9% in men and 24.2% in women. With the age-specific cut-offs, the overlap of high-risk patients selected by each pair of models ranged from only 22.6% to 51.2%. With a fixed cut-off of 5%, recalibrated ADVANCE selected a similar number of high-risk men (7400) as with the age-specific cut-offs (7102), whereas the age-specific cut-offs selected fewer high-risk women (2646 vs. 3647 under the fixed cut-off).

CONCLUSION

Diabetes-specific CVD risk prediction models showed better discrimination for patients with diabetes. High-risk patients selected by different models varied significantly. Age-specific cut-offs selected fewer patients at high CVD risk, especially among women.

Stevelink R, Campbell C, Chen S, Abou-Khalil B, Adesoji OM, Afawi Z, Amadori E, Anderson A, Anderson J, Andrade DM, Annesi G, Auce P, Avbersek A, Bahlo M, Baker MD, Balagura G, Balestrini S, Barba C, Barboza K, Bartolomei F, Bast T, Baum L, Baumgartner T, Baykan B, Bebek N, Becker AJ, Becker F, Bennett CA, Berghuis B, Berkovic SF, Beydoun A, Bianchini C, Bisulli F, Blatt I, Bobbili DR, Borggraefe I, Bosselmann C, Braatz V, Bradfield JP, Brockmann K, Brody LC, Buono RJ, Busch RM, Caglayan H, Campbell E, Canafoglia L, Canavati C, Cascino GD, Castellotti B, Catarino CB, Cavalleri GL, Cerrato F, Chassoux F, Cherny SS, Cheung CL, Chinthapalli K, Chou IJ, Chung SK, Churchhouse C, Clark PO, Cole AJ, Compston A, Coppola A, Cosico M, Cossette P, Craig JJ, Cusick C, Daly MJ, Davis LK, de Haan GJ, Delanty N, Depondt C, Derambure P, Devinsky O, Di Vito L, Dlugos DJ, Doccini V, Doherty CP, El-Naggar H, Elger CE, Ellis CA, Eriksson JG, Faucon A, Feng YCA, Ferguson L, Ferraro TN, Ferri L, Feucht M, Fitzgerald M, Fonferko-Shadrach B, Fortunato F, Franceschetti S, Franke A, French JA, Freri E, Gagliardi M, Gambardella A, Geller EB, Giangregorio T, Gjerstad L, Glauser T, Goldberg E, Goldman A, Granata T, Greenberg DA, Guerrini R, Gupta N, Haas KF, Hakonarson H, Hallmann K, Hassanin E, Hegde M, Heinzen EL, Helbig I, Hengsbach C, Heyne HO, Hirose S, Hirsch E, Hjalgrim H, Howrigan DP, Hucks D, Hung PC, Iacomino M, Imbach LL, Inoue Y, Ishii A, Jamnadas-Khoda J, Jehi L, Johnson MR, Kälviäinen R, Kamatani Y, Kanaan M, Kanai M, Kantanen AM, Kara B, Kariuki SM, Kasperavičiūte D, Kasteleijn-Nolst Trenite D, Kato M, Kegele J, Kesim Y, Khoueiry-Zgheib N, King C, Kirsch HE, Klein KM, Kluger G, Knake S, Knowlton RC, Koeleman BPC, Korczyn AD, Koupparis A, Kousiappa I, Krause R, Krenn M, Krestel H, Krey I, Kunz WS, Kurki MI, Kurlemann G, Kuzniecky R, Kwan P, Labate A, Lacey A, Lal D, Landoulsi Z, Lau YL, Lauxmann S, Leech SL, Lehesjoki AE, Lemke JR, Lerche H, Lesca G, Leu C, Lewin N, Lewis-Smith D, Li GHY, Li QS, Licchetta L, Lin KL, Lindhout D, Linnankivi T, Lopes-Cendes I, Lowenstein DH, Lui CHT, Madia F, Magnusson S, Marson AG, May P, McGraw CM, Mei D, Mills JL, Minardi R, Mirza N, Møller RS, Molloy AM, Montomoli M, Mostacci B, Muccioli L, Muhle H, Müller-Schlüter K, Najm IM, Nasreddine W, Neale BM, Neubauer B, Newton CRJC, Nöthen MM, Nothnagel M, Nürnberg P, O’Brien TJ, Okada Y, Ólafsson E, Oliver KL, Özkara C, Palotie A, Pangilinan F, Papacostas SS, Parrini E, Pato CN, Pato MT, Pendziwiat M, Petrovski S, Pickrell WO, Pinsky R, Pippucci T, Poduri A, Pondrelli F, Powell RHW, Privitera M, Rademacher A, Radtke R, Ragona F, Rau S, Rees MI, Regan BM, Reif PS, Rhelms S, Riva A, Rosenow F, Ryvlin P, Saarela A, Sadleir LG, Sander JW, Sander T, Scala M, Scattergood T, Schachter SC, Schankin CJ, Scheffer IE, Schmitz B, Schoch S, Schubert-Bast S, Schulze-Bonhage A, Scudieri P, Sham P, Sheidley BR, Shih JJ, Sills GJ, Sisodiya SM, Smith MC, Smith PE, Sonsma ACM, Speed D, Sperling MR, Stefansson H, Stefansson K, Steinhoff BJ, Stephani U, Stewart WC, Stipa C, Striano P, Stroink H, Strzelczyk A, Surges R, Suzuki T, Tan KM, Taneja RS, Tanteles GA, Taubøll E, Thio LL, Thomas GN, Thomas RH, Timonen O, Tinuper P, Todaro M, Topaloğlu P, Tozzi R, Tsai MH, Tumiene B, Turkdogan D, Unnsteinsdóttir U, Utkus A, Vaidiswaran P, Valton L, van Baalen A, Vetro A, Vining EPG, Visscher F, von Brauchitsch S, von Wrede R, Wagner RG, Weber YG, Weckhuysen S, Weisenberg J, Weller M, Widdess-Walsh P, Wolff M, Wolking S, Wu D, Yamakawa K, Yang W, Yapıcı Z, Yücesan E, Zagaglia S, Zahnert F, Zara F, Zhou W, Zimprich F, Zsurka G, Zulfiqar Ali Q. GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nat Genet 2023;55:1471-1482. [PMID: 37653029 PMCID: PMC10484785 DOI: 10.1038/s41588-023-01485-w] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 06/07/2022] [Accepted: 07/21/2023] [Indexed: 09/02/2023]

Salvatore M, Clark-Boucher D, Fritsche LG, Ortlieb J, Houghtby J, Driscoll A, Caldwell-Larkins B, Smith JA, Brummett CM, Kheterpal S, Lisabeth L, Mukherjee B. Epidemiologic Questionnaire (EPI-Q) - a scalable, app-based health survey linked to electronic health record and genotype data. Epidemiol Health 2023;45:e2023074. [PMID: 37591787 PMCID: PMC10867525 DOI: 10.4178/epih.e2023074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 07/03/2023] [Indexed: 08/19/2023]

Mignogna G, Carey CE, Wedow R, Baya N, Cordioli M, Pirastu N, Bellocco R, Malerbi KF, Nivard MG, Neale BM, Walters RK, Ganna A. Patterns of item nonresponse behaviour to survey questionnaires are systematic and associated with genetic loci. Nat Hum Behav 2023;7:1371-1387. [PMID: 37386106 PMCID: PMC10444625 DOI: 10.1038/s41562-023-01632-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/05/2022] [Accepted: 05/17/2023] [Indexed: 07/01/2023]

Affiliation(s)

Gianmarco Mignogna: Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland; Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Caitlin E Carey: Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
Robbee Wedow: Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Sociology, Purdue University, West Lafayette, IN, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA; AnalytiXIN (Analytics Indiana), Indianapolis, IN, USA; Department of Statistics, Purdue University, West Lafayette, IN, USA.
Nikolas Baya: Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Mattia Cordioli: Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
Nicola Pirastu: Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, Scotland; Fondazione Human Technopole, Viale Rita Levi-Montalcini, Milan, Italy.
Rino Bellocco: Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Kathryn Fiuza Malerbi: Department of Public Health, Purdue University, West Lafayette, IN, USA.
Michel G Nivard: Department of Biological Psychiatry, Faculty of Behavioural and Movement Sciences, Vrije Universiteit, Amsterdam, the Netherlands; Methodology Program, Amsterdam Public Health, Amsterdam, the Netherlands; Amsterdam Neuroscience - Mood, Anxiety, Psychosis, Stress and Sleep, Amsterdam, the Netherlands.
Benjamin M Neale: Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Novo Nordisk Foundation for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Raymond K Walters: Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Andrea Ganna: Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Wang S, Quan L, Ding M, Kang JH, Koenen KC, Kubzansky LD, Branch-Elliman W, Chavarro JE, Roberts AL. Depression, worry, and loneliness are associated with subsequent risk of hospitalization for COVID-19: a prospective study. Psychol Med 2023;53:4022-4031. [PMID: 35586906 PMCID: PMC9924056 DOI: 10.1017/s0033291722000691] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/27/2022]

Abstract

BACKGROUND

Pre-pandemic psychological distress is associated with increased susceptibility to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, but its associations with coronavirus disease 2019 (COVID-19) severity have not been established. The authors examined the associations between distress prior to SARS-CoV-2 infection and the subsequent risk of hospitalization.

METHODS

Between April 2020 (baseline) and April 2021, we followed 54 781 participants from three ongoing cohorts (Nurses' Health Study II [NHSII], Nurses' Health Study 3 [NHS3], and the Growing Up Today Study [GUTS]) who reported no current or prior SARS-CoV-2 infection at baseline. Chronic depression was assessed during 2010-2019. Depression, anxiety, worry about COVID-19, perceived stress, and loneliness were measured at baseline. SARS-CoV-2 infection and hospitalization due to COVID-19 were self-reported. Relative risks (RRs) were calculated by Poisson regression.

RESULTS

During follow-up, 3663 participants reported a positive SARS-CoV-2 test (mean age = 55.0 years, standard deviation = 13.8). Among these participants, chronic depression prior to the pandemic (RR = 1.72; 95% confidence interval (CI) 1.20-2.46), as well as probable depression (RR = 1.81; 95% CI 1.08-3.03), being very worried about COVID-19 (RR = 1.79; 95% CI 1.12-2.86), and loneliness (RR = 1.81; 95% CI 1.02-3.20) reported at baseline, were each associated with subsequent COVID-19 hospitalization, adjusting for demographic factors and healthcare worker status. Anxiety and perceived stress were not associated with hospitalization. Depression, worry about COVID-19, and loneliness were as strongly associated with hospitalization as were high cholesterol and hypertension, which are established risk factors for COVID-19 severity.

CONCLUSIONS

Psychological distress may be a risk factor for hospitalization in patients with SARS-CoV-2 infection. Assessment of psychological distress may identify patients at greater risk of hospitalization. Future work should examine whether addressing distress improves physical health outcomes.
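
The relative risks above were estimated by Poisson regression with adjustment for covariates. The Python below is a minimal sketch of the underlying quantity only. It computes a crude (unadjusted) relative risk and a Wald 95% confidence interval for two groups. The function name `crude_relative_risk` and the counts are hypothetical illustrations, not the study's data or code. With a single binary exposure and no covariates, a log-link Poisson model reproduces this point estimate, although its model-based interval differs.

```python
import math


def crude_relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Return (RR, lower, upper): a crude relative risk with a Wald CI.

    Hypothetical helper for illustration. It is unadjusted, unlike the
    study's Poisson regression, which adjusted for demographic factors
    and healthcare worker status.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Large-sample standard error of log(RR) for cumulative-incidence data
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed
        + 1 / cases_unexposed - 1 / n_unexposed
    )
    # Build the interval on the log scale, then back-transform
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts (not study data): 20 of 400 exposed and
# 30 of 1200 unexposed participants were hospitalized.
rr, lower, upper = crude_relative_risk(20, 400, 30, 1200)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The interval is built on the log scale because log(RR) is closer to normally distributed than RR itself. As a result, the interval is symmetric around log(RR) but not around RR.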

Mezuk B, Kelly K, Bennion E, Concha JB. Leveraging a genetically-informative study design to explore depression as a risk factor for type 2 diabetes: Rationale and participant characteristics of the Mood and Immune Regulation in Twins Study. Frontiers in Clinical Diabetes and Healthcare 2023;4:1026402. [PMID: 37008275 PMCID: PMC10064086 DOI: 10.3389/fcdhc.2023.1026402] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/23/2022] [Accepted: 03/01/2023] [Indexed: 03/19/2023]

Abstract

BACKGROUND

Comorbidity between depression and type 2 diabetes is thought to arise from the joint effects of psychological, behavioral, and biological processes. Studies of monozygotic twins may provide a unique opportunity to clarify how these processes interrelate. This paper describes the rationale, characteristics, and initial findings of a longitudinal co-twin study examining the biopsychosocial mechanisms linking depression and risk of diabetes in mid-life.

METHODS

Participants in the Mood and Immune Regulation in Twins (MIRT) Study were recruited from the Mid-Atlantic Twin Registry. MIRT consisted of 94 individuals without diabetes at baseline, representing 43 twin pairs (41 monozygotic and 2 dizygotic), one set of monozygotic triplets, and 5 individuals whose co-twin did not participate. A broad set of variables was assessed, including psychological factors (e.g., lifetime history of major depression (MD)); social factors (e.g., stress perceptions and experiences); and biological factors, including indicators of metabolic risk (e.g., BMI, blood pressure (BP), HbA1c) and immune functioning (e.g., pro- and anti-inflammatory cytokines), as well as collection of RNA. Participants were re-assessed 6 months later. Intra-class correlation coefficients (ICCs) and descriptive comparisons were used to explore variation in these psychological, social, and biological factors across time and within pairs.

RESULTS

Mean age was 53 years, 68% were female, and 77% identified as white. One-third had a history of MD, and 18 sibling sets were discordant for MD. MD was associated with higher systolic BP (139.1 vs. 132.2 mmHg, p=0.05), diastolic BP (87.2 vs. 80.8 mmHg, p=0.002), and IL-6 (1.47 vs. 0.93 pg/mL, p=0.001). MD was not associated with BMI, HbA1c, or other immune markers. While the biological characteristics of the co-twins were significantly correlated, all within-person ICCs were higher than the within-pair correlations (e.g., HbA1c within-person ICC=0.88 vs. within-pair ICC=0.49; IL-6 within-person ICC=0.64 vs. within-pair ICC=0.54). Among the pairs discordant for MD, depression was not substantially associated with metabolic or immune markers but was positively associated with stress.

CONCLUSIONS

Twin studies have the potential to clarify the biopsychosocial processes linking depression and diabetes, and the recently completed processing of RNA samples from MIRT permits future exploration of gene expression as a potential mechanism.

Getz K, Hubbard RA, Linn KA. Performance of Multiple Imputation Using Modern Machine Learning Methods in Electronic Health Records Data. Epidemiology 2023;34:206-215. [PMID: 36722803 DOI: 10.1097/ede.0000000000001578] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 02/02/2023]

Abstract

BACKGROUND

Missing data are common in studies using data derived from electronic health records (EHRs). Missingness in EHR data is related to healthcare utilization patterns, resulting in complex missingness mechanisms that are potentially missing not at random. Prior research has suggested that machine learning-based multiple imputation methods may outperform traditional methods and may perform well even when data are missing not at random.

METHODS

We used plasmode simulations based on a nationwide EHR-derived de-identified database for patients with metastatic urothelial carcinoma to compare the performance of multiple imputation using chained equations, random forests, and denoising autoencoders in terms of bias and precision of hazard ratio estimates under varying proportions of observations with missing values and missingness mechanisms (missing completely at random, missing at random, and missing not at random).

RESULTS

Multiple imputation by chained equations and random forest methods had low bias and similar standard errors for parameter estimates when data were missing completely at random. When data were missing at random, denoising autoencoders had higher bias than multiple imputation by chained equations and random forests. Contrary to the results of prior studies of denoising autoencoders, all methods exhibited substantial bias when data were missing not at random, with bias increasing in direct proportion to the amount of missing data.

CONCLUSIONS

We found no advantage of denoising autoencoders for multiple imputation in the setting of an epidemiologic study conducted using EHR data. Results suggested that denoising autoencoders may overfit the data, leading to poor confounder control. Using more flexible imputation approaches does not mitigate bias induced by data missing not at random and can produce estimates with spurious precision.
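
The missing-not-at-random result above reflects a general point. When missingness depends on the unobserved values themselves, the observed data are no longer representative. Any method that learns only from observed data inherits that distortion unless extra assumptions are supplied. The Python below is a deliberately simple toy with a single normal variable, no covariates and no imputation step. It is not the authors' plasmode simulation. The function name `observed_mean_bias`, the missingness probabilities and the sample size are assumptions chosen for illustration.

```python
import random
import statistics


def observed_mean_bias(mechanism, n=20000, seed=1):
    """Return (mean of observed y) - (mean of all y) for a toy variable.

    Hypothetical setup: y ~ Normal(0, 1) and about half of the values are
    deleted. Under "MCAR" the deletion ignores y; under "MNAR" positive
    values are much more likely to be missing.
    """
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = []
    for y in ys:
        if mechanism == "MCAR":
            p_missing = 0.5
        elif mechanism == "MNAR":
            # Missingness depends on the very value that goes unrecorded
            p_missing = 0.8 if y > 0 else 0.2
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            observed.append(y)
    return statistics.fmean(observed) - statistics.fmean(ys)


for mechanism in ("MCAR", "MNAR"):
    print(mechanism, round(observed_mean_bias(mechanism), 3))
```

Under MCAR the bias should be close to zero. Under MNAR the observed mean falls well below the full-sample mean: roughly -0.48 in expectation for these probabilities, because positive values are retained only a quarter as often as negative ones.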

Bagheri M, Chung CP, Dickson AL, Van Driest SL, Borinstein SC, Mosley JD. White blood cell ranges and frequency of neutropenia by Duffy genotype status. Blood Adv 2023;7:406-409. [PMID: 35895516 PMCID: PMC9979714 DOI: 10.1182/bloodadvances.2022007680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 07/07/2022] [Accepted: 07/08/2022] [Indexed: 02/02/2023]

Zawistowski M, Fritsche LG, Pandit A, Vanderwerff B, Patil S, Schmidt EM, VandeHaar P, Willer CJ, Brummett CM, Kheterpal S, Zhou X, Boehnke M, Abecasis GR, Zöllner S. The Michigan Genomics Initiative: A biobank linking genotypes and electronic clinical records in Michigan Medicine patients. Cell Genomics 2023;3:100257. [PMID: 36819667 PMCID: PMC9932985 DOI: 10.1016/j.xgen.2023.100257] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 12/15/2021] [Revised: 06/07/2022] [Accepted: 01/05/2023] [Indexed: 02/04/2023]

Affiliation(s)

Matthew Zawistowski Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Lars G. Fritsche Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Anita Pandit Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Brett Vanderwerff Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Snehal Patil Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Ellen M. Schmidt Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Peter VandeHaar Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Cristen J. Willer Department of Internal Medicine, Division of Cardiovascular Medicine, Department of Human Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Chad M. Brummett Department of Anesthesiology, University of Michigan, Ann Arbor, MI 48103, USA
Sachin Kheterpal Department of Anesthesiology, University of Michigan, Ann Arbor, MI 48103, USA
Xiang Zhou Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Michael Boehnke Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
Gonçalo R. Abecasis Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA; Regeneron Genetics Center, Tarrytown, NY 10591, USA
Sebastian Zöllner Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA; Department of Psychiatry, University of Michigan, Ann Arbor, MI 48103, USA

Ri K, Fukasawa T, Yoshida S, Takeuchi M, Kawakami K. Risk of parkinsonism and related movement disorders with gabapentinoids or tramadol: A case-crossover study. Pharmacotherapy 2023;43:136-144. [PMID: 36633384 DOI: 10.1002/phar.2761] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/27/2022] [Revised: 12/17/2022] [Accepted: 12/18/2022] [Indexed: 01/13/2023]

Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023;30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022]

Abstract

OBJECTIVE

Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.

MATERIALS AND METHODS

We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.

RESULTS

Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.

DISCUSSION

Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.

CONCLUSION

Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.

Venkatesh SS, Ganjgahi H, Palmer DS, Coley K, Wittemans LBL, Nellaker C, Holmes C, Lindgren CM, Nicholson G. The genetic architecture of changes in adiposity during adulthood. medRxiv: The Preprint Server for Health Sciences 2023:2023.01.09.23284364. [PMID: 36711652 PMCID: PMC9882550 DOI: 10.1101/2023.01.09.23284364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]

Klinkhammer H, Staerk C, Maj C, Krawitz PM, Mayr A. A statistical boosting framework for polygenic risk scores based on large-scale genotype data. Front Genet 2023;13:1076440. [PMID: 36704342 PMCID: PMC9871367 DOI: 10.3389/fgene.2022.1076440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 12/20/2022] [Indexed: 01/12/2023]

Gu T, Lee PH, Duan R. COMMUTE: Communication-efficient transfer learning for multi-site risk prediction. J Biomed Inform 2023;137:104243. [PMID: 36403757 PMCID: PMC9868117 DOI: 10.1016/j.jbi.2022.104243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Revised: 09/20/2022] [Accepted: 11/06/2022] [Indexed: 11/19/2022]

Abstract

OBJECTIVES

We propose a communication-efficient transfer learning approach (COMMUTE) that effectively incorporates multi-site healthcare data for training a risk prediction model in a target population of interest, accounting for challenges including population heterogeneity and data sharing constraints across sites.

METHODS

We first train population-specific source models locally within each site. Using data from a given target population, COMMUTE learns a calibration term for each source model, which adjusts for potential data heterogeneity through flexible distance-based regularizations. In a centralized setting where multi-site data can be directly pooled, all data are combined to train the target model after calibration. When some sites cannot share individual-level data, COMMUTE requests only the locally trained models from those sites and uses them to generate heterogeneity-adjusted synthetic data for training the target model. We evaluate COMMUTE via extensive simulation studies and an application to multi-site data from the electronic Medical Records and Genomics (eMERGE) Network to predict extreme obesity.

RESULTS

Simulation studies show that, across a broad spectrum of settings, COMMUTE outperforms methods that do not adjust for population heterogeneity and methods trained in a single population. Using eMERGE data, COMMUTE achieves an area under the receiver operating characteristic curve (AUC) of around 0.80, outperforming other benchmark methods, whose AUCs ranged from 0.51 to 0.70.

CONCLUSION

COMMUTE improves risk prediction in a target population with limited samples and safeguards against negative transfer when some source populations are highly different from the target. In a federated setting, it is highly communication-efficient: each site needs to share its model parameter estimates only once, and no iterative communication or higher-order terms are required.

Beesley LJ, Mukherjee B. Case studies in bias reduction and inference for electronic health record data with selection bias and phenotype misclassification. Stat Med 2022;41:5501-5516. [PMID: 36131394 PMCID: PMC9826451 DOI: 10.1002/sim.9579] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/27/2021] [Revised: 08/12/2022] [Accepted: 08/13/2022] [Indexed: 01/11/2023]

Khera AV, Wang M, Chaffin M, Emdin CA, Samani NJ, Schunkert H, Watkins H, McPherson R, Elosua R, Boerwinkle E, Ardissino D, Butterworth AS, Di Angelantonio E, Naheed A, Danesh J, Chowdhury R, Krumholz HM, Sheu WHH, Rich SS, Rotter JI, Chen YDI, Gabriel S, Lander ES, Saleheen D, Kathiresan S. Gene Sequencing Identifies Perturbation in Nitric Oxide Signaling as a Nonlipid Molecular Subtype of Coronary Artery Disease. Circulation: Genomic and Precision Medicine 2022;15:e003598. [PMID: 36215124 PMCID: PMC9771961 DOI: 10.1161/circgen.121.003598] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/19/2021] [Accepted: 06/24/2022] [Indexed: 12/24/2022]

Affiliation(s)

Amit V. Khera Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA; Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Dept of Medicine, Harvard Medical School, Boston, MA; Cardiology Division, Dept of Medicine, Massachusetts General Hospital, Boston, MA
Minxian Wang Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA; CAS Key Laboratory of Genome Sciences & Information, Beijing Inst of Genomics, Chinese Academy of Sciences & China National Ctr for Bioinformation, Beijing, China
Mark Chaffin Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
Connor A. Emdin Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Dept of Medicine, Harvard Medical School, Boston, MA; Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
Nilesh J. Samani Dept of Cardiovascular Sciences, Univ of Leicester, Leicester, UK; NIHR Leicester Biomedical Research Ctr, Glenfield Hospital, Leicester, UK
Heribert Schunkert Dept of Cardiology, German Heart Ctr Munich, Technical Univ of Munich, Munich, Germany; DZHK (German Ctr for Cardiovascular Research), Partner site Munich, Munich Heart Alliance, Munich, Germany
Hugh Watkins Division of Cardiovascular Medicine, Radcliffe Dept of Medicine, Univ of Oxford, Headington, UK; Wellcome Trust Ctr for Human Genetics, Univ of Oxford, Oxford, UK
Ruth McPherson Inst for Cardiogenetics, Univ of Lübeck, Lübeck, Schleswig-Holstein, Germany; German Research Ctr for Cardiovascular Research, Partner Site Hamburg/Lübeck/Kiel & Univ Heart Center Lübeck (J.E.), Berlin, Brandenburg, Germany; Depts of Medicine & Biochemistry, Univ of Ottawa Heart Inst, Ottawa, ON, Canada
Roberto Elosua Cardiovascular Epidemiology & Genetics, Hospital del Mar Research Inst, Barcelona, Spain; CIBER Enfermedades Cardiovasculares, Barcelona, Spain; Facultat de Medicina, Universitat de Vic-Central de Cataluña, Barcelona, Spain
Eric Boerwinkle Ctr for Human Genetics & Dept of Epidemiology, Univ of Texas Health Science Ctr School of Public Health, Houston, TX
Diego Ardissino Cardiology, Azienda Ospedaliero-Universitaria di Parma, Univ of Parma, Parma, Italy; Associazione per lo Studio Della Trombosi in Cardiologia, Pavia, Italy
Adam S. Butterworth British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK; National Inst for Health Research Blood & Transplant Research Unit in Donor Health & Genomics, Univ of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus & Univ of Cambridge, Cambridge, UK
Emanuele Di Angelantonio British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus & Univ of Cambridge, Cambridge, UK; NIHR Blood & Transplant Research Unit in Donor Health & Genomics, Univ of Cambridge, Cambridge, UK; BHF Ctr of Research Excellence, School of Clinical Medicine, Addenbrooke’s Hospital, Univ of Cambridge, Cambridge, UK; Health Data Science Research Ctr, Human Technopole, Milan, Italy
Aliya Naheed Initiative for Noncommunicable Diseases, Health Systems & Population Studies Division, International Ctr for Diarrhoeal Disease Research, Bangladesh, Dhaka, Bangladesh
John Danesh British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK; National Inst for Health Research Blood & Transplant Research Unit in Donor Health & Genomics, Univ of Cambridge, Cambridge, UK; British Heart Foundation Ctr of Research Excellence, Univ of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus & Univ of Cambridge, Cambridge, UK; Dept of Human Genetics, Wellcome Sanger Inst, Hinxton, UK
Rajiv Chowdhury British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK; Centre for Non-Communicable Disease Research, Dhaka, Bangladesh
Harlan M. Krumholz Section of Cardiovascular Medicine, Dept of Medicine, Yale Univ, New Haven, CT; Ctr for Outcomes Research & Evaluation, Yale-New Haven Hospital, New Haven, CT
Wayne H-H Sheu Cardiovascular Research Ctr, Dept of Medicine, National Yang Ming Univ School of Medicine, Taipei, Taiwan
Stephen S. Rich Ctr for Public Health Genomics, Univ of Virginia, Charlottesville, VA
Jerome I. Rotter The Inst for Translational Genomics & Population Sciences, Dept of Pediatrics, The Lundquist Inst for Biomedical Innovation at Harbor-UCLA Medical Ctr, Torrance, CA
Yii-der Ida Chen The Inst for Translational Genomics & Population Sciences, Dept of Pediatrics, The Lundquist Inst for Biomedical Innovation at Harbor-UCLA Medical Ctr, Torrance, CA
Stacey Gabriel Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
Eric S. Lander Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA; Dept of Biology, MIT, Cambridge, MA; Dept of Systems Biology, Harvard Medical School, Boston, MA
Danish Saleheen Dept of Medicine, Columbia Univ, New York, NY; Ctr for Non-Communicable Diseases, Karachi, Sindh, Pakistan
Sekar Kathiresan Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Dept of Medicine, Harvard Medical School, Boston, MA; Cardiology Division, Dept of Medicine, Massachusetts General Hospital, Boston, MA; Verve Therapeutics, Cambridge, MA

Étiévant L, Viallon V. Causal inference under over-simplified longitudinal causal models. Int J Biostat 2022;18:421-437. [PMID: 34727585 DOI: 10.1515/ijb-2020-0081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/02/2020] [Accepted: 10/14/2021] [Indexed: 01/10/2023]

Ma Y, Patil S, Zhou X, Mukherjee B, Fritsche LG. ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. Am J Hum Genet 2022;109:1742-1760. [PMID: 36152628 PMCID: PMC9606385 DOI: 10.1016/j.ajhg.2022.09.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/10/2022] [Accepted: 08/31/2022] [Indexed: 01/25/2023]

Affiliation(s)

Ying Ma Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Snehal Patil Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Xiang Zhou Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Bhramar Mukherjee Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA; Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109, USA
Lars G Fritsche Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA.

Clark-Boucher D, Boss J, Salvatore M, Smith JA, Fritsche LG, Mukherjee B. Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis. PLoS One 2022;17:e0269017. [PMID: 35877617 PMCID: PMC9312965 DOI: 10.1371/journal.pone.0269017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 05/12/2022] [Indexed: 11/19/2022]

He Y, Patel CJ. Shared exposure liability of type 2 diabetes and other chronic conditions in the UK Biobank. Acta Diabetol 2022;59:851-860. [PMID: 35348899 PMCID: PMC9085680 DOI: 10.1007/s00592-022-01864-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/15/2021] [Accepted: 01/31/2022] [Indexed: 11/09/2022]

Abstract

AIMS

To investigate whether the cumulative exposure risks for incident type 2 diabetes (T2D) are shared with other common chronic diseases.

RESEARCH DESIGN AND METHODS

We first establish and report the cross-sectional prevalence, cross-sectional co-prevalence, and incidence of seven T2D-associated chronic diseases [hypertension, atrial fibrillation, coronary artery disease, obesity, chronic obstructive pulmonary disease (COPD), and chronic kidney and liver diseases] in the UK Biobank. We use published weights of genetic variants and exposure variables to derive the T2D polygenic (PGS) and polyexposure (PXS) risk scores and test their associations with incident diseases.

RESULTS

PXS was associated with higher levels of clinical risk factors, including BMI, systolic blood pressure, blood glucose, triglycerides, and HbA1c, in individuals without overt or diagnosed T2D. In addition to predicting incident T2D, PXS and PGS were significantly and positively associated with the incidence of all 7 other chronic diseases. In the bottom deciles of PXS and PGS, 4% and 8% of individuals, respectively, were prediabetic at baseline but had low risks of T2D and other chronic diseases. Compared to the remaining population, individuals in the top deciles of PGS and PXS had particularly high risks of developing chronic diseases. For instance, compared to the remaining population, the hazard ratios for COPD and obesity among individuals in the top T2D PXS decile were 2.82 (95% CI 2.39-3.35, P = 4.00 × 10⁻³³) and 2.54 (95% CI 2.24-2.87, P = 9.86 × 10⁻⁵⁰), respectively. We also found that PXS and PGS were both significantly (P < 0.0001) and positively associated with the total number of incident diseases.

CONCLUSIONS

T2D shares polyexposure risks with other common chronic diseases. Individuals with an elevated genetic and non-genetic risk of T2D also have high risks of cardiovascular, liver, lung, and kidney diseases.

Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. NPJ Digit Med 2022;5:66. [PMID: 35641814 PMCID: PMC9156743 DOI: 10.1038/s41746-022-00611-y] [Citation(s) in RCA: 72] [Impact Index Per Article: 36.0] [Received: 11/16/2021] [Accepted: 04/29/2022] [Indexed: 12/13/2022]

Yang S, Zhou X. PGS-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Brief Bioinform 2022;23:6534383. [PMID: 35193147 DOI: 10.1093/bib/bbac039] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/02/2021] [Revised: 12/29/2021] [Accepted: 01/26/2022] [Indexed: 01/02/2023]

Kawaguchi ES, Li G, Lewinger JP, Gauderman WJ. Two-step hypothesis testing to detect gene-environment interactions in a genome-wide scan with a survival endpoint. Stat Med 2022;41:1644-1657. [PMID: 35075649 PMCID: PMC9007892 DOI: 10.1002/sim.9319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 11/10/2021] [Accepted: 12/26/2021] [Indexed: 01/13/2023]

McGee G, Haneuse S, Coull BA, Weisskopf MG, Rotem RS. On the Nature of Informative Presence Bias in Analyses of Electronic Health Records. Epidemiology 2022;33:105-113. [PMID: 34711733 PMCID: PMC8633193 DOI: 10.1097/ede.0000000000001432] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 01/03/2023]

Spector-Bagdady K, Tang S, Jabbour S, Price WN, Bracic A, Creary MS, Kheterpal S, Brummett CM, Wiens J. Respecting Autonomy And Enabling Diversity: The Effect Of Eligibility And Enrollment On Research Data Demographics. Health Aff (Millwood) 2021;40:1892-1899. [PMID: 34871076 DOI: 10.1377/hlthaff.2021.01197] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/05/2022]

Davitte JM, Stott-Miller M, Ehm MG, Cunnington MC, Reynolds RF. Integration of Real-World Data and Genetics to Support Target Identification and Validation. Clin Pharmacol Ther 2021;111:63-76. [PMID: 34818443 DOI: 10.1002/cpt.2477] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2021] [Revised: 10/06/2021] [Accepted: 10/27/2021] [Indexed: 01/01/2023]

Willers C, Lynch T, Chand V, Islam M, Lassere M, March L. A Versatile, Secure, and Sustainable All-in-One Biobank-Registry Data Solution: The A3BC REDCap Model. Biopreserv Biobank 2021;20:244-259. [PMID: 34807733 DOI: 10.1089/bio.2021.0098] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]

Abstract

Introduction: A key element in the big data revolution is large-scale biobanking and the associated development of high-quality data collections and supporting informatics solutions. As such, in establishing the Australian Arthritis and Autoimmune Biobank Collaborative (A3BC), we sought to establish a low-cost, nation-scale data management system capable of managing a multisite biobank registry with complex longitudinal sample and data requirements. Materials and Methods: We assessed several international commercial and nonprofit software platforms using standardized system requirement criteria and follow-up interviews. Vendor compliance scoring was prioritized to meet our project-critical requirements. Consumer/end-user codesign was integral to refining our system requirements to optimize adoption. The selected software solution was customized to optimize field auto-population between participant timepoints and forms, using modules that are transferable and do not impact core code. Institutional and independent testing was used to ensure data security. Results: We selected the widely used research web application Research Electronic Data Capture (REDCap), which is "free" (under nonprofit license agreement terms), highly configurable, and customizable to a variety of biobank and registry needs, and which can be developed and maintained by biobank users with modest IT skill, time, and cost. We created a secure, comprehensive, participant-centric biobank-registry database that includes: (1) best-practice data security measures (including multisite access login using institutional user credentials), (2) permission-to-contact and dynamic itemized electronic consent, (3) a complete chain of custody from consent to longitudinal biospecimen data collection to publication, (4) complex longitudinal patient-reported surveys, (5) integration of record-level extracted/linked participant data, (6) significant form auto-population for streamlined data capture, and (7) native dashboards for operational visualizations. Conclusion: We recommend the versatile, reusable, and sustainable informatics model we have developed in REDCap for prospective chronic disease biobanks or registry biobanks (of local to national complexity) supporting holistic research into disease prediction, precision medicine, and prevention strategies.

Rush A, Catchpoole DR, Reaiche-Miller G, Gilbert T, Ng W, Watson PH, Byrne JA. What Do Biomedical Researchers Want from Biobanks? Results of an Online Survey. Biopreserv Biobank 2021;20:271-282. [PMID: 34756100 DOI: 10.1089/bio.2021.0084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Abstract

Aims: The purpose of biobanking is to provide biospecimens and associated data to researchers, yet the perspectives of biobank research users have been under-investigated. This study aimed to ascertain biobank research users' needs and opinions about biobanking services. Methods: An online survey was developed that requested information about researcher demographics, localities of biobanks accessed, methods of sourcing biospecimens, and opinions on topics including, but not limited to, application processes, data availability, access fees, and return of research results. There were 27 multiple choice/check box questions, 4 questions with a 10-point Likert scale, and 8 questions with provision for further comment. A web link to the survey was distributed to researchers in late 2019/early 2020 in four Australian states: New South Wales, Victoria, Western Australia, and South Australia. Results: Respondents were generally satisfied with biobank application processes and with the fitness for purpose of the biospecimens/data they received. Nonetheless, most researchers (n = 61/99, 62%) reported creating their own collections owing to gaps in sample availability and a perceived increase in efficiency. Most accessed biobanks (n = 58/74, 78%) were in close proximity (local or intrastate) to the researcher. Most researchers had limited the scope of their research owing to difficulty obtaining biospecimens (n = 55/86, 64%) and/or data (n = 52/85, 60%), with the top three responses for additional types of data required being "more long term follow up data," "more clinical data," and "more linked government data." The top influence on choosing a particular biobank was cost, and the most frequently suggested improvement was reduced direct "cost of obtaining biospecimens." Conclusion: Biobanks that do not meet the needs of their end-users are unlikely to be optimally utilized or sustainable. This survey provides valuable insights to guide biobanks and other stakeholders, for example in developing marketing and client engagement plans that encourage local research users and in discouraging the creation of unnecessary new collections.

Antoniades A, Papaioannou M, Malatras A, Papagregoriou G, Müller H, Holub P, Deltas C, Schizas CN. Integration of Biobanks in National eHealth Ecosystems Facilitating Long-Term Longitudinal Clinical-Omics Studies and Citizens' Engagement in Research Through eHealthBioR. Front Digit Health 2021;3:628646. [PMID: 34713101 PMCID: PMC8521893 DOI: 10.3389/fdgth.2021.628646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2020] [Accepted: 05/11/2021] [Indexed: 11/13/2022]

Abstract

Biobanks have long existed to support research activities, and BBMRI-ERIC was formed as a European research infrastructure to coordinate biobanking, with 20 member countries and one international organization. Although the benefits of biobanks to the research community are well established, the direct benefit to citizens is limited to the generic benefit of promoting future research. Furthermore, the advent of the General Data Protection Regulation (GDPR) raised a series of challenges for scientific research, especially for biobanking-associated activities and longitudinal research studies. Electronic health record (EHR) registries have long existed within healthcare providers. In some countries, even at the national level, these registries record the health of citizens over time for the purposes of healthcare and data portability between providers. The potential of EHRs in research is great and has been demonstrated in many projects that have transformed EHR data into retrospective medical history information on participating subjects directly from their physicians' collected records; many key challenges, however, remain. In this paper, we present a citizen-centric framework called eHealthBioR, which would enable biobanks to link to EHR systems, thus enabling not just retrospective but also lifelong prospective longitudinal studies of participating citizens. It would also ensure strict adherence to legal and ethical requirements, enabling greater control that encourages participation. Citizens would benefit from real and direct control of their data and samples, using technology that empowers them to make informed decisions about providing consent and exercising their rights related to the use of their data, as well as from access to knowledge and data generated from the samples they provided to biobanks. This is expected to motivate patient engagement in future research and even lead to participatory design methodologies with citizen/patient-centric designed studies. The development of platforms based on the eHealthBioR framework would need to overcome significant challenges. However, it would shift the burden of addressing these to experts in the field while providing solutions that, in the long term, lower the monetary and time costs of longitudinal studies, coupled with the option of lifelong monitoring through EHRs.

Coleman JR. The Validity of Brief Phenotyping in Population Biobanks for Psychiatric Genome-Wide Association Studies on the Biobank Scale. Complex Psychiatry 2021;7:11-15. [PMID: 34883499 PMCID: PMC8443942 DOI: 10.1159/000516837] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/11/2021] [Accepted: 04/14/2021] [Indexed: 11/19/2022]

Hubbard RA. Commentary on Professor Austin Bradford Hill's Alfred Watson Memorial Lecture. Stat Med 2021;40:29-31. [PMID: 33368363 DOI: 10.1002/sim.8826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2020] [Accepted: 11/06/2020] [Indexed: 11/08/2022]

Bi W, Lee S. Scalable and Robust Regression Methods for Phenome-Wide Association Analysis on Large-Scale Biobank Data. Front Genet 2021;12:682638. [PMID: 34211504 PMCID: PMC8239389 DOI: 10.3389/fgene.2021.682638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023]

Bi W, Zhou W, Dey R, Mukherjee B, Sampson JN, Lee S. Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. Am J Hum Genet 2021;108:825-839. [PMID: 33836139 DOI: 10.1016/j.ajhg.2021.03.019] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 01/07/2021] [Accepted: 03/22/2021] [Indexed: 12/12/2022]

Salvatore M, Beesley LJ, Fritsche LG, Hanauer D, Shi X, Mondul AM, Pearce CL, Mukherjee B. Phenotype risk scores (PheRS) for pancreatic cancer using time-stamped electronic health record data: Discovery and validation in two large biobanks. J Biomed Inform 2021;113:103652. [PMID: 33279681 PMCID: PMC7855433 DOI: 10.1016/j.jbi.2020.103652] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/21/2020] [Revised: 10/27/2020] [Accepted: 11/30/2020] [Indexed: 12/31/2022]

Abstract

BACKGROUND

Traditional methods for disease risk prediction and assessment, such as diagnostic tests using serum, urine, blood, saliva, or imaging biomarkers, have been important for identifying high-risk individuals for many diseases, leading to early detection and improved survival. For pancreatic cancer, traditional screening methods have been largely unsuccessful in identifying high-risk individuals in advance of disease progression, leading to high mortality and poor survival. Electronic health records (EHR) linked to genetic profiles provide an opportunity to integrate multiple sources of patient information for risk prediction and stratification. We leverage a constellation of temporally associated diagnoses available in the EHR to construct a summary risk score, called a phenotype risk score (PheRS), for identifying individuals at high risk of pancreatic cancer. The proposed PheRS approach incorporates time relative to disease onset into the prediction framework. We combine and contrast the PheRS with a better-known measure of inherited susceptibility, namely, the polygenic risk score (PRS), for prediction of pancreatic cancer.

METHODOLOGY

We first calculated pairwise, unadjusted associations between pancreatic cancer diagnosis and all other diagnoses across the medical phenome; we call these pairwise associations co-occurrences. After accounting for cross-phenotype correlations, we used the multivariable association estimates from a subset of relatively independent diagnoses to create a weighted-sum PheRS. We constructed time-restricted risk scores using data from 38,359 participants in the Michigan Genomics Initiative (MGI) based on the diagnoses contained in the EHR at 0, 1, 2, and 5 years prior to the target pancreatic cancer diagnosis. The predictive performance of the PheRS was assessed in the UK Biobank (UKB). We tested the relative contribution of PheRS when added to a model containing a summary measure of inherited genetic susceptibility (PRS) plus other covariates such as age, sex, smoking status, drinking status, and body mass index (BMI).
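
The time restriction described above can be sketched as a weighted sum of binary diagnosis indicators in which a diagnosis counts only if it was recorded at least a given number of years before the index date. This is a minimal illustration under stated assumptions. The diagnosis labels and weights are hypothetical (the study derives its weights from multivariable estimates on a subset of relatively independent diagnoses). The 365.25-day year and the use of a reference date for controls are also assumptions of the sketch.

```python
from datetime import date, timedelta

def time_restricted_phers(diagnoses, weights, index_date, years_before):
    """Weighted sum of binary diagnosis indicators, counting only diagnoses
    recorded at least `years_before` years before `index_date`.

    diagnoses:   list of (code, date) pairs from one person's EHR
    weights:     dict mapping code -> weight (e.g. multivariable log odds ratios)
    index_date:  target diagnosis date for a case, or a reference date for a control
    """
    cutoff = index_date - timedelta(days=round(365.25 * years_before))
    # Each code counts at most once, however many times it was recorded.
    present = {code for code, recorded in diagnoses if recorded <= cutoff}
    return sum(weights.get(code, 0.0) for code in present)

# Hypothetical diagnosis labels and weights, for illustration only.
weights = {"chronic_pancreatitis": 0.8, "type2_diabetes": 0.5, "abdominal_pain": 0.3}
history = [
    ("type2_diabetes", date(2010, 3, 1)),
    ("chronic_pancreatitis", date(2016, 1, 15)),
    ("abdominal_pain", date(2017, 6, 1)),
]
index = date(2018, 1, 1)
scores = {t: time_restricted_phers(history, weights, index, t) for t in (0, 1, 2, 5)}
```

Moving the threshold from 0 to 5 years drops recent diagnoses, which is why the score in this toy history falls from 1.6 to 0.5. Longer thresholds trade signal for earlier warning.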

RESULTS

Our exploration of co-occurrence patterns identified expected associations while also revealing unexpected relationships that may warrant closer attention. Using the pancreatic cancer PheRS alone at 5 years before the target diagnosis yielded an AUC of 0.60 (95% CI = [0.58, 0.62]) in UKB. A larger predictive model including PheRS, PRS, and the covariates at the 5-year threshold achieved an AUC of 0.74 (95% CI = [0.72, 0.76]) in UKB. We note that PheRS contributes independently in the joint model. Finally, scores at the top percentiles of the PheRS distribution showed promise for risk stratification. Scores in the top 2% were 10.20 (95% CI = [9.34, 12.99]) times more likely to identify cases than those in the bottom 98% in UKB at the 5-year threshold prior to pancreatic cancer diagnosis.
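
The two summary measures above can be sketched as follows. The AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The "times more likely" figure is read here as the case proportion among the top 2% of scores divided by the case proportion among the remaining 98%. That reading is an assumption, not necessarily the authors' exact definition, and the sketch computes no confidence intervals.

```python
import math

def auc_mann_whitney(scores, is_case):
    """AUC = P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, is_case) if y]
    controls = [s for s, y in zip(scores, is_case) if not y]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def top_vs_rest_case_ratio(scores, is_case, top_fraction=0.02):
    """Case proportion in the top `top_fraction` of scores divided by the case
    proportion in everyone else (an assumed reading of 'times more likely')."""
    n = len(scores)
    k = max(1, math.ceil(top_fraction * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top, rest = order[:k], order[k:]
    p_top = sum(is_case[i] for i in top) / len(top)
    p_rest = sum(is_case[i] for i in rest) / len(rest)
    return p_top / p_rest if p_rest > 0 else math.inf

# Toy AUC example: cases score 0.9 and 0.4; controls score 0.5, 0.4 and 0.1.
auc = auc_mann_whitney([0.9, 0.4, 0.5, 0.4, 0.1], [1, 1, 0, 0, 0])  # 4.5 / 6

# Toy enrichment example: 100 people; both top-2% people are cases, 7 of the other 98 are.
labels = [0] * 100
for i in (98, 99, 0, 10, 20, 30, 40, 50, 60):
    labels[i] = 1
ratio = top_vs_rest_case_ratio(list(range(100)), labels)  # 1 / (7/98)
```

The pairwise AUC loop is quadratic in sample size and is shown only for clarity. At biobank scale, a rank-based formula or an established statistics library would be the practical choice.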

CONCLUSIONS

We developed a framework for creating a time-restricted PheRS from EHR data for pancreatic cancer using the rich information content of a medical phenome. In addition to identifying hypothesis-generating associations for future research, this PheRS demonstrates a potentially important contribution in identifying high-risk individuals, even after adjusting for PRS for pancreatic cancer and other traditional epidemiologic covariates. The methods are generalizable to other phenotypic traits.

Beesley LJ, Mukherjee B. Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. Biometrics 2020;78:214-226. [PMID: 33179768 DOI: 10.1111/biom.13400] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/18/2019] [Revised: 10/26/2020] [Accepted: 10/29/2020] [Indexed: 12/27/2022]

Fritsche LG, Patil S, Beesley LJ, VandeHaar P, Salvatore M, Ma Y, Peng RB, Taliun D, Zhou X, Mukherjee B. Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks. Am J Hum Genet 2020;107:815-836. [PMID: 32991828 PMCID: PMC7675001 DOI: 10.1016/j.ajhg.2020.08.025] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 02/23/2020] [Accepted: 08/28/2020] [Indexed: 02/06/2023]

Affiliation(s)

Lars G Fritsche Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA.
Snehal Patil Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Lauren J Beesley Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Peter VandeHaar Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Maxwell Salvatore Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Ying Ma Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Robert B Peng Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Department of Statistics, Northwestern University, Evanston, IL 60208, USA
Daniel Taliun Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Xiang Zhou Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
Bhramar Mukherjee Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109, USA; Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA.

King C, Mulugeta A, Nabi F, Walton R, Zhou A, Hyppönen E. Mendelian randomization case-control PheWAS in UK Biobank shows evidence of causality for smoking intensity in 28 distinct clinical conditions. EClinicalMedicine 2020;26:100488. [PMID: 33089118 PMCID: PMC7564324 DOI: 10.1016/j.eclinm.2020.100488] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/30/2020] [Revised: 07/14/2020] [Accepted: 07/15/2020] [Indexed: 12/30/2022]

Bi W, Fritsche LG, Mukherjee B, Kim S, Lee S. A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank. Am J Hum Genet 2020;107:222-233. [PMID: 32589924 DOI: 10.1016/j.ajhg.2020.06.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 02/07/2020] [Accepted: 06/03/2020] [Indexed: 12/09/2022]