1. Gadd DA, Hillary RF, Kuncheva Z, Mangelis T, Cheng Y, Dissanayake M, Admanit R, Gagnon J, Lin T, Ferber KL, Runz H, Foley CN, Marioni RE, Sun BB. Blood protein assessment of leading incident diseases and mortality in the UK Biobank. Nat Aging 2024; 4:939-948. [PMID: 38987645 PMCID: PMC11257969 DOI: 10.1038/s43587-024-00655-7] [Received: 03/15/2023] [Accepted: 05/22/2024] [Indexed: 07/12/2024]
Abstract
The circulating proteome offers insights into the biological pathways that underlie disease. Here, we test relationships between 1,468 Olink protein levels and the incidence of 23 age-related diseases and mortality in the UK Biobank (n = 47,600). We report 3,209 associations between 963 protein levels and 21 incident outcomes. Next, protein-based scores (ProteinScores) are developed using penalized Cox regression. When applied to test sets, six ProteinScores improve the area under the curve estimates for the 10-year onset of incident outcomes beyond age, sex and a comprehensive set of 24 lifestyle factors, clinically relevant biomarkers and physical measures. Furthermore, the ProteinScore for type 2 diabetes outperforms a polygenic risk score and HbA1c, a clinical marker used to monitor and diagnose type 2 diabetes. The performance of scores using metabolomic and proteomic features is also compared. These data characterize early proteomic contributions to major age-related diseases, demonstrating the value of the plasma proteome for risk stratification.
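How a ProteinScore is applied can be sketched as follows: a penalized Cox model yields a coefficient for each retained protein, an individual's score is the weighted sum of their protein levels, and discrimination is summarized by the AUC. This minimal sketch assumes the coefficients are already estimated (the Cox fitting step is not shown) and that protein levels are standardized; the protein names and weights are hypothetical, not values from the paper.

```python
from itertools import product

# Hypothetical coefficients, e.g. from a penalized Cox fit (NOT the paper's values).
WEIGHTS = {"PROTEIN_A": 0.4, "PROTEIN_B": -0.2}


def protein_score(levels: dict, weights: dict = WEIGHTS) -> float:
    """Linear predictor: sum of coefficient * standardized protein level.
    Proteins absent from `levels` are treated as 0 (i.e., at the mean)."""
    return sum(w * levels.get(name, 0.0) for name, w in weights.items())


def auc(case_scores, control_scores) -> float:
    """AUC as P(case score > control score), counting ties as 1/2."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


cases = [protein_score({"PROTEIN_A": 2.0, "PROTEIN_B": 0.5}),
         protein_score({"PROTEIN_A": 1.0, "PROTEIN_B": 2.0})]
controls = [protein_score({"PROTEIN_A": -1.0}),
            protein_score({"PROTEIN_A": 0.0, "PROTEIN_B": 1.0})]
print(auc(cases, controls))  # 1.0: every case outscores every control
```

Comparing such an AUC against one from a baseline model (age, sex, lifestyle factors) is the kind of comparison the abstract reports; the baseline model itself is not sketched here.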
Affiliation(s)
- Danni A Gadd
  - Optima Partners, Edinburgh, UK
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Robert F Hillary
  - Optima Partners, Edinburgh, UK
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Zhana Kuncheva
  - Optima Partners, Edinburgh, UK
  - Bayes Centre, University of Edinburgh, Edinburgh, UK
- Tasos Mangelis
  - Optima Partners, Edinburgh, UK
  - Bayes Centre, University of Edinburgh, Edinburgh, UK
- Yipeng Cheng
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Manju Dissanayake
  - Optima Partners, Edinburgh, UK
  - Bayes Centre, University of Edinburgh, Edinburgh, UK
- Romi Admanit
  - Biostatistics, Research and Development, Biogen Inc., Cambridge, MA, USA
- Jake Gagnon
  - Biostatistics, Research and Development, Biogen Inc., Cambridge, MA, USA
- Tinchi Lin
  - Biostatistics, Research and Development, Biogen Inc., Cambridge, MA, USA
- Kyle L Ferber
  - Biostatistics, Research and Development, Biogen Inc., Cambridge, MA, USA
- Heiko Runz
  - Translational Sciences, Research and Development, Biogen Inc., Cambridge, MA, USA
- Christopher N Foley
  - Optima Partners, Edinburgh, UK
  - Bayes Centre, University of Edinburgh, Edinburgh, UK
- Riccardo E Marioni
  - Optima Partners, Edinburgh, UK
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Benjamin B Sun
  - Translational Sciences, Research and Development, Biogen Inc., Cambridge, MA, USA
  - Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
2. Hou K, Xu Z, Ding Y, Mandla R, Shi Z, Boulier K, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. Nat Genet 2024; 56:1386-1396. [PMID: 38886587 DOI: 10.1038/s41588-024-01792-w] [Received: 08/03/2023] [Accepted: 05/08/2024] [Indexed: 06/20/2024]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can impact PGS accuracy with similar magnitudes as genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (include the trait with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need to account for context information in PGS-based prediction across diverse populations.
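The core idea, that the width of a 90% prediction interval should depend on context, can be illustrated with a deliberately simplified stand-in that is not CalPred itself. Instead of modeling all contexts jointly, this sketch stratifies by a single context label, estimates the mean and standard deviation of residuals (observed trait minus PGS-based prediction) within each stratum, and builds Gaussian intervals. The function names and the normality assumption are illustrative.

```python
from statistics import NormalDist, fmean, pstdev


def fit_context_intervals(preds, observed, contexts, coverage=0.90):
    """Per-context (residual mean, half-width) for Gaussian prediction intervals.
    Simplified stratified stand-in; CalPred itself models contexts jointly."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for 90% coverage
    residuals = {}
    for pred, obs, ctx in zip(preds, observed, contexts):
        residuals.setdefault(ctx, []).append(obs - pred)
    return {ctx: (fmean(r), z * pstdev(r)) for ctx, r in residuals.items()}


def prediction_interval(pred, context, fitted):
    """Interval for a new individual with PGS-based prediction `pred` in `context`."""
    shift, half_width = fitted[context]
    center = pred + shift
    return center - half_width, center + half_width


# Hypothetical residual pattern: context "B" is three times noisier than "A".
fitted = fit_context_intervals([0.0] * 4, [-1.0, 1.0, -3.0, 3.0], ["A", "A", "B", "B"])
print(prediction_interval(10.0, "A", fitted))
print(prediction_interval(10.0, "B", fitted))
```

A single pooled interval would be too wide for context A and too narrow for context B; stratifying restores approximate coverage in each group, which is the qualitative behavior the abstract motivates.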
Affiliation(s)
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Ziqi Xu
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Yi Ding
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Ravi Mandla
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Zhuozheng Shi
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Kristin Boulier
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Arbel Harpak
  - Department of Population Health, The University of Texas at Austin, Austin, TX, USA
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Institute for Precision Health, University of California Los Angeles, Los Angeles, CA, USA
3. Kervezee L, Dashti HS, Pilz LK, Skarke C, Ruben MD. Using routinely collected clinical data for circadian medicine: A review of opportunities and challenges. PLOS Digit Health 2024; 3:e0000511. [PMID: 38781189 PMCID: PMC11115276 DOI: 10.1371/journal.pdig.0000511] [Indexed: 05/25/2024]
Abstract
A wealth of data is available from electronic health records (EHR) that are collected as part of routine clinical care in hospitals worldwide. These rich, longitudinal data offer an attractive object of study for the field of circadian medicine, which aims to translate knowledge of circadian rhythms to improve patient health. This narrative review aims to discuss opportunities for EHR in studies of circadian medicine, highlight the methodological challenges, and provide recommendations for using these data to advance the field. In the existing literature, we find that data collected in real-world clinical settings have the potential to shed light on key questions in circadian medicine, including how 24-hour rhythms in clinical features are associated with, or even predictive of, health outcomes, whether the effect of medication or other clinical activities depends on time of day, and how circadian rhythms in physiology may influence clinical reference ranges or sampling protocols. However, optimal use of EHR to advance circadian medicine requires careful consideration of the limitations and sources of bias that are inherent to these data sources. In particular, time of day influences almost every interaction between a patient and the healthcare system, creating operational 24-hour patterns in the data that have little or nothing to do with biology. Addressing these challenges could help to expand the evidence base for the use of EHR in the field of circadian medicine.
Affiliation(s)
- Laura Kervezee
  - Group of Circadian Medicine, Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Hassan S. Dashti
  - Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Luísa K. Pilz
  - Department of Anesthesiology and Intensive Care Medicine CCM / CVK, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
  - ECRC Experimental and Clinical Research Center, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Carsten Skarke
  - Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
  - Chronobiology and Sleep Institute (CSI), University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
  - Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Marc D. Ruben
  - Divisions of Pulmonary and Sleep Medicine and Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
4. Zhang J, Zhan J, Jin J, Ma C, Zhao R, O'Connell J, Jiang Y, Koelsch BL, Zhang H, Chatterjee N. An ensemble penalized regression method for multi-ancestry polygenic risk prediction. Nat Commun 2024; 15:3238. [PMID: 38622117 PMCID: PMC11271575 DOI: 10.1038/s41467-024-47357-7] [Received: 03/21/2023] [Accepted: 03/28/2024] [Indexed: 04/17/2024]
Abstract
Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
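The combined L1/L2 (elastic-net) penalty can be illustrated with coordinate descent on individual-level data for a single population. This is only a sketch of the penalty mechanics under stated assumptions: PROSPER itself works from GWAS summary statistics, links penalty parameters across populations, and has its own ensemble step, whereas here the hypothetical `averaged_ensemble` simply averages coefficients over a grid of (lam1, lam2) values.

```python
def soft_threshold(x, lam):
    """Lasso shrinkage operator S(x, lam) = sign(x) * max(|x| - lam, 0)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def elastic_net(X, y, lam1, lam2, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2.
    X is a list of rows (e.g. genotypes), y a list of trait values.
    Assumes no all-zero column when lam2 == 0."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the current b (starts at b = 0)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of x_j with the partial residual excluding x_j's own term.
            rho = sum(row[j] * (r + row[j] * b[j]) for row, r in zip(X, resid)) / n
            new_bj = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - row[j] * delta for row, r in zip(X, resid)]
                b[j] = new_bj
    return b


def averaged_ensemble(X, y, grid):
    """Hypothetical ensemble: average coefficients over (lam1, lam2) pairs.
    A simple stand-in, not PROSPER's combination method."""
    fits = [elastic_net(X, y, lam1, lam2) for lam1, lam2 in grid]
    return [sum(f[j] for f in fits) / len(fits) for j in range(len(X[0]))]


# Toy orthogonal design with y = 2*x1 + 0.5*x2 (no noise).
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.5, -1.5, 1.5, -2.5]
print(elastic_net(X, y, 0.0, 0.0))  # [2.0, 0.5]: no penalty recovers least squares
print(elastic_net(X, y, 1.0, 0.0))  # [1.0, 0.0]: L1 shrinks the weak effect to zero
print(elastic_net(X, y, 0.0, 1.0))  # [1.0, 0.25]: L2 shrinks both proportionally
```

The toy run shows why the two penalties are combined: the L1 term performs variable selection, while the L2 term stabilizes estimates without zeroing them.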
Affiliation(s)
- Jingning Zhang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jin Jin
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Cheng Ma
  - Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ruzhang Zhao
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Nilanjan Chatterjee
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
5. Zhang J, Zhan J, Jin J, Ma C, Zhao R, O'Connell J, Jiang Y, Koelsch BL, Zhang H, Chatterjee N. An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction. bioRxiv 2024:2023.03.15.532652. [PMID: 36993331 PMCID: PMC10055041 DOI: 10.1101/2023.03.15.532652] [Indexed: 04/30/2023]
Abstract
Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
Affiliation(s)
- Jingning Zhang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jin Jin
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Cheng Ma
  - Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ruzhang Zhao
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Nilanjan Chatterjee
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
6. Mikołajczyk-Stecyna J, Zuk E, Seremak-Mrozikiewicz A, Kurzawińska G, Wolski H, Drews K, Chmurzynska A. Genetic risk score for gestational weight gain. Eur J Obstet Gynecol Reprod Biol 2024; 294:20-27. [PMID: 38184896 DOI: 10.1016/j.ejogrb.2023.12.031] [Received: 06/06/2023] [Revised: 12/15/2023] [Accepted: 12/20/2023] [Indexed: 01/09/2024]
Abstract
Gestational weight gain (GWG) has health consequences for both mother and offspring. Genetic factors appear to play a role in the GWG trait. Because a single nucleotide polymorphism (SNP) typically has a small effect size, a genetic risk score (GRS) summarizing risk-associated variation across multiple SNPs can serve as an effective approach to genetic association analysis. The aim of the study was to analyze the association between GRS and GWG. GWG was calculated for a total of 342 healthy Polish women of Caucasian origin, aged 19 to 45 years. The SNPs rs9939609 (FTO), rs6548238 (TMEM18), rs17782313 (MC4R), rs10938397 (GNPDA2), rs10913469 (SEC16B), rs1137101 (LEPR), rs7799039 (LEP), and rs5443 (GNB3) were genotyped using commercial TaqMan SNP assays. A simple GRS was calculated in two ways: GRS1, based on the sum of risk alleles across all eight SNPs, and GRS2, based on the sum of risk alleles of FTO, LEPR, LEP, and GNB3. A positive association between GRS2 and GWG was observed (β = 0.12, p = 0.029). Risk variants of TMEM18 (p = 0.006, OR = 2.6) and GNB3 (p < 0.001, OR = 3.3) were more frequent in women with increased GWG, whereas a risk variant of GNPDA2 (p < 0.001, OR = 2.7) was more frequent in women with adequate GWG, and a risk variant of LEPR (p = 0.011, OR = 3.1) in women with decreased GWG. GRS2 and genetic variants of TMEM18, GNB3, GNPDA2, and LEPR are associated with weight gain during pregnancy.
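Both scores described above are unweighted allele counts. A minimal sketch, assuming each genotype has already been coded as the number of risk alleles carried (0, 1 or 2); the abstract does not say which allele is the risk allele for each SNP, so that coding is taken as given input, and the participant's genotype is hypothetical.

```python
# SNP-to-gene mapping as listed in the abstract.
SNP_TO_GENE = {
    "rs9939609": "FTO",
    "rs6548238": "TMEM18",
    "rs17782313": "MC4R",
    "rs10938397": "GNPDA2",
    "rs10913469": "SEC16B",
    "rs1137101": "LEPR",
    "rs7799039": "LEP",
    "rs5443": "GNB3",
}
GRS2_GENES = {"FTO", "LEPR", "LEP", "GNB3"}


def genetic_risk_score(risk_allele_counts, genes=None):
    """Unweighted GRS: sum of risk-allele counts (0, 1 or 2 per SNP).
    If `genes` is given, only SNPs mapped to those genes are counted."""
    total = 0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2, got {count}")
        if genes is None or SNP_TO_GENE[snp] in genes:
            total += count
    return total


# Hypothetical participant (risk-allele counts, already coded).
person = {"rs9939609": 2, "rs6548238": 1, "rs17782313": 0, "rs10938397": 1,
          "rs10913469": 0, "rs1137101": 1, "rs7799039": 2, "rs5443": 1}
grs1 = genetic_risk_score(person)              # all eight SNPs -> 8
grs2 = genetic_risk_score(person, GRS2_GENES)  # FTO, LEPR, LEP, GNB3 only -> 6
```

An unweighted sum treats every risk allele as contributing equally; a weighted GRS would instead multiply each count by a per-SNP effect size.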
Affiliation(s)
- Joanna Mikołajczyk-Stecyna
  - Department of Human Nutrition and Dietetics, Poznań University of Life Sciences, Wojska Polskiego 31, 60-624 Poznań, Poland
- Ewelina Zuk
  - Department of Human Nutrition and Dietetics, Poznań University of Life Sciences, Wojska Polskiego 31, 60-624 Poznań, Poland
- Agnieszka Seremak-Mrozikiewicz
  - Division of Perinatology and Women's Diseases, Poznań University of Medical Sciences, Polna 33, 60-535 Poznań, Poland
  - Laboratory of Molecular Biology, Division of Perinatology and Women's Diseases, Poznań University of Medical Sciences, Polna 33, 60-535 Poznań, Poland
- Grażyna Kurzawińska
  - Division of Perinatology and Women's Diseases, Poznań University of Medical Sciences, Polna 33, 60-535 Poznań, Poland
  - Laboratory of Molecular Biology, Division of Perinatology and Women's Diseases, Poznań University of Medical Sciences, Polna 33, 60-535 Poznań, Poland
- Hubert Wolski
  - Division of Perinatology and Women's Diseases, Poznań University of Medical Sciences, Polna 33, 60-535 Poznań, Poland
  - Podhale State College of Applied Sciences in Nowy Targ, Kokoszków 71, 34-400 Nowy Targ, Poland
- Krzysztof Drews
  - Division of Perinatology and Women's Diseases, Poznań University of Medical Sciences, Polna 33, 60-535 Poznań, Poland
  - Laboratory of Molecular Biology, Division of Perinatology and Women's Diseases, Poznań University of Medical Sciences, Polna 33, 60-535 Poznań, Poland
- Agata Chmurzynska
  - Department of Human Nutrition and Dietetics, Poznań University of Life Sciences, Wojska Polskiego 31, 60-624 Poznań, Poland
7. Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
OBJECTIVE High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity). MATERIALS AND METHODS ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB). RESULTS ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average. DISCUSSION ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. CONCLUSION When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
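The two-step idea in the Methods (impute missing labels nonparametrically from a small labeled set, then estimate ROC parameters on all records) can be sketched as below. The Gaussian-kernel Nadaraya-Watson smoother, the fixed bandwidth, and the function names are assumptions chosen for illustration; they are not taken from the ssROC paper or its R implementation.

```python
import math


def impute_prob(score, labeled_scores, labels, bandwidth):
    """Nadaraya-Watson estimate of P(label = 1 | PA score) from the labeled subset."""
    weights = [math.exp(-0.5 * ((score - s) / bandwidth) ** 2) for s in labeled_scores]
    total = sum(weights)
    if total == 0.0:  # far from every labeled point: fall back to labeled prevalence
        return sum(labels) / len(labels)
    return sum(w * y for w, y in zip(weights, labels)) / total


def semi_supervised_roc_point(all_scores, labeled_scores, labels, threshold, bandwidth=0.05):
    """(TPR, FPR) at `threshold`, using imputed label probabilities for every record."""
    probs = [impute_prob(s, labeled_scores, labels, bandwidth) for s in all_scores]
    expected_pos = sum(probs)
    expected_neg = len(probs) - expected_pos
    flagged = [(p, s >= threshold) for p, s in zip(probs, all_scores)]
    tpr = sum(p for p, hit in flagged if hit) / expected_pos
    fpr = sum(1.0 - p for p, hit in flagged if hit) / expected_neg
    return tpr, fpr


# Hypothetical example: 6 labeled records, 101 records with PA scores in total.
labeled_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
all_scores = [i / 100 for i in range(101)]
tpr, fpr = semi_supervised_roc_point(all_scores, labeled_scores, labels, threshold=0.5)
```

Every record contributes to the estimate through its imputed probability, which is why this style of estimator can be less variable than one computed on the labeled records alone; sweeping `threshold` traces out a full ROC curve.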
Affiliation(s)
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese
  - Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
8. Yuan G, Zhai Y, Tang J, Zhou X. Selection of HBV key reactivation factors based on maximum information coefficient combined with cosine similarity. Technol Health Care 2024; 32:749-763. [PMID: 37393455 DOI: 10.3233/thc-230161] [Indexed: 07/03/2023]
Abstract
BACKGROUND Hepatitis B virus (HBV) reactivation is the most common complication in patients with primary liver cancer (PLC) after radiotherapy. How to reduce HBV reactivation has been a major topic in research on postoperative radiotherapy for liver cancer. OBJECTIVE To identify the causes of HBV reactivation, a feature selection algorithm (MIC-CS) combining the maximum information coefficient (MIC) with cosine similarity (CS) was proposed to screen risk factors that may affect HBV reactivation. METHOD First, the factors were coded and the MIC was calculated across patients to quantify the association between each factor and HBV reactivation. Second, a cosine similarity algorithm was constructed to calculate the similarity between factors, thereby removing redundant information. Finally, the two measures were combined as weights to rank the potential risk factors, and the key factors leading to HBV reactivation were selected. RESULTS The results indicated that HBV baseline, external boundary, TNM, KPS score, VD, AFP, and Child-Pugh could lead to HBV reactivation after radiotherapy. A classification model built on these factors achieved a highest classification accuracy of 84% and an AUC of 0.71. CONCLUSION In a comparison of multiple feature selection methods, MIC-CS performed significantly better than MIM, CMIM, and mRMR, suggesting broad potential for application.
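The redundancy-removal step can be illustrated with a greedy selector that rewards association with the outcome and penalizes cosine similarity to factors already chosen. This sketch takes the relevance scores as given (for example, precomputed MIC values; computing MIC itself is not shown), and the weighting `alpha` between the two terms is a hypothetical choice, since the abstract does not specify how the two weights are combined. Factor names and values are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two coded factor vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_factors(relevance, vectors, k, alpha=1.0):
    """Greedy selection: relevance (e.g. precomputed MIC with the outcome) minus
    alpha * max |cosine similarity| to the factors already selected."""
    selected = []
    remaining = sorted(relevance)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        def score(name):
            redundancy = max(
                (abs(cosine_similarity(vectors[name], vectors[s])) for s in selected),
                default=0.0,
            )
            return relevance[name] - alpha * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical factors: factor_b duplicates factor_a, factor_c carries distinct information.
relevance = {"factor_a": 0.9, "factor_b": 0.85, "factor_c": 0.5}
vectors = {"factor_a": [1, 0, 1, 0], "factor_b": [1, 0, 1, 0], "factor_c": [0, 1, 0, 1]}
chosen = select_factors(relevance, vectors, k=2)  # ["factor_a", "factor_c"]
```

Although `factor_b` is the second most relevant factor on its own, it is skipped because it adds no information beyond `factor_a`; this is the redundancy the cosine-similarity step is meant to remove.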
Affiliation(s)
- Gaoteng Yuan
  - College of Computer and Information, Hohai University, Nanjing, Jiangsu, China
- Yi Zhai
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, Shandong, China
- Jiansong Tang
  - College of Computer and Information, Hohai University, Nanjing, Jiangsu, China
- Xiaofeng Zhou
  - College of Computer and Information, Hohai University, Nanjing, Jiangsu, China
9. Zhang Y, Xu W, Yang P, Zhang A. Machine learning for the prediction of sepsis-related death: a systematic review and meta-analysis. BMC Med Inform Decis Mak 2023; 23:283. [PMID: 38082381 PMCID: PMC10712076 DOI: 10.1186/s12911-023-02383-1] [Received: 08/01/2023] [Accepted: 11/28/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND AND OBJECTIVES Sepsis is accompanied by a considerably high risk of mortality in the short term, despite the availability of recommended mortality risk assessment tools. However, these risk assessment tools seem to have limited predictive value. With the gradual integration of machine learning into clinical practice, some researchers have attempted to employ machine learning for early mortality risk prediction in sepsis patients. Nevertheless, there is a lack of comprehensive understanding regarding the construction of predictive variables using machine learning and the value of various machine learning methods. Thus, we carried out this systematic review and meta-analysis to explore the predictive value of machine learning for sepsis-related death at different time points. METHODS PubMed, Embase, Cochrane, and Web of Science databases were searched through August 9, 2022. The risk of bias in predictive models was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). We also performed subgroup analysis according to time of death and type of model and summarized current predictive variables used to construct models for sepsis death prediction. RESULTS Fifty original studies were included, covering 104 models. The combined Concordance index (C-index), sensitivity, and specificity of machine learning models were 0.799, 0.81, and 0.80 in the training set, and 0.774, 0.71, and 0.68 in the validation set, respectively. Machine learning outperformed conventional clinical scoring tools and showed excellent C-index, sensitivity, and specificity in different subgroups. Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) are the preferred machine learning models because they showed more favorable accuracy with similar modeling variables. This study found that lactate was the most frequent predictor but is largely overlooked by current clinical scoring tools.
CONCLUSION Machine learning methods demonstrate relatively favorable accuracy in predicting the mortality risk in sepsis patients. Given the limitations in accuracy and applicability of existing prediction scoring systems, there is an opportunity to explore updates based on existing machine learning approaches. Specifically, it is essential to develop or update more suitable mortality risk assessment tools based on the specific contexts of use, such as emergency departments, general wards, and intensive care units.
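The C-index pooled in this meta-analysis measures how often a model ranks patients correctly by risk. For time-to-event outcomes, Harrell's C can be computed as below; this is a minimal sketch that skips pairs with tied event times, and it is not the pooling procedure used in the review. For a binary outcome, the same pairwise definition reduces to the AUC.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index.
    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; it is concordant when risk_scores[i] > risk_scores[j].
    Tied risk scores count 1/2; pairs with tied times are skipped (simplification)."""
    concordant = 0.0
    comparable = 0
    for t_i, e_i, r_i in zip(times, events, risk_scores):
        if not e_i:
            continue  # a censored subject cannot be the earlier member of a pair
        for t_j, r_j in zip(times, risk_scores):
            if t_j > t_i:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: higher risk score -> earlier death gives perfect concordance.
c = harrell_c(times=[1, 2, 3, 4], events=[1, 1, 1, 1], risk_scores=[4, 3, 2, 1])
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the pooled values of 0.799 and 0.774 above should be read.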
Affiliation(s)
- Yan Zhang
  - Department of Critical Care Medicine, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, China
- Weiwei Xu
  - Department of Endocrine and Metabolic Diseases, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, China
- Ping Yang
  - Department of Critical Care Medicine, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, China
- An Zhang
  - Department of Critical Care Medicine, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, China
10. Chapman CR. Ethical, legal, and social implications of genetic risk prediction for multifactorial disease: a narrative review identifying concerns about interpretation and use of polygenic scores. J Community Genet 2023; 14:441-452. [PMID: 36529843 PMCID: PMC10576696 DOI: 10.1007/s12687-022-00625-9] [Received: 09/01/2022] [Accepted: 12/04/2022] [Indexed: 12/23/2022]
Abstract
Advances in genomics have enabled the development of polygenic scores (PGS), sometimes called polygenic risk scores, in the context of multifactorial diseases and disorders such as cancer, cardiovascular disease, and schizophrenia. PGS estimate an individual's genetic predisposition, as compared to other members of a population, for conditions which are influenced by both genetic and environmental factors. There is significant interest in using genetic risk prediction afforded through PGS in public health, clinical care, and research settings, yet many acknowledge the need to thoughtfully consider and address ethical, legal, and social implications (ELSI). To contribute to this effort, this paper reports on a narrative review of the literature, with the aim of identifying and categorizing ELSI relating to genetic risk prediction in the context of multifactorial disease, which have been raised by scholars in the field. Ninety-two articles, spanning from 1977 to 2021, met the inclusion criteria for this study. Identified ELSI included potential benefits, challenges and risks that focused on concerns about interpretation and use, and ethical obligations to maximize benefits, minimize risks, promote justice, and support autonomy. This research will support geneticists, clinicians, genetic counselors, patients, patient advocates, and policymakers in recognizing and addressing ethical concerns associated with PGS; it will also guide future empirical and normative research.
Affiliation(s)
- Carolyn Riley Chapman
  - Department of Population Health (Division of Medical Ethics), NYU Grossman School of Medicine, New York, NY, USA
  - Center for Human Genetics and Genomics, NYU Grossman School of Medicine, Science Building, 435 E. 30th St, 8th Floor, New York, NY, 10016, USA
11. Ren Y, Zhang Y, Zhan J, Sun J, Luo J, Liao W, Cheng X. Machine learning for prediction of delirium in patients with extensive burns after surgery. CNS Neurosci Ther 2023; 29:2986-2997. [PMID: 37122154 PMCID: PMC10493655 DOI: 10.1111/cns.14237] [Received: 01/03/2023] [Revised: 02/23/2023] [Accepted: 04/15/2023] [Indexed: 05/02/2023]
Abstract
AIMS To identify key variables and predict postoperative delirium in patients with extensive burns using machine learning. METHODS Five hundred and eighteen patients with extensive burns who underwent surgery were included and randomly divided into a training set, a validation set, and a testing set. Multivariable logistic regression analysis was used to screen for significant variables. Nine prediction models were constructed in the training and validation sets (80% of the dataset). The testing set (20% of the dataset) was used to further evaluate the models. The area under the receiver operating characteristic curve (AUROC) was used to compare model performance. SHapley Additive exPlanations (SHAP) was used to interpret the best model, which was then externally validated in another large tertiary hospital. RESULTS Seven variables were used to develop the nine prediction models: physical restraint, diabetes, sex, preoperative hemoglobin, acute physiology and chronic health evaluation, time in the burn intensive care unit, and total body surface area. Random Forest (RF) outperformed the other eight models in predictive performance (AUROC: 84.00%). In external validation, RF also performed well (accuracy: 77.12%, sensitivity: 67.74%, specificity: 80.46%). CONCLUSION The first machine learning-based delirium prediction model for patients with extensive burns was successfully developed and validated. Patients at high risk of delirium can be effectively identified, allowing targeted interventions to reduce the incidence of delirium.
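The external-validation figures above (accuracy, sensitivity, specificity) all derive from a confusion matrix, and the 80/20 development/testing split is a random partition of patients. A minimal sketch of both, with hypothetical function names and a fixed seed for reproducibility; it does not reproduce the study's models, data, or exact splitting procedure.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Randomly partition record indices into development and testing sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


dev_idx, test_idx = split_indices(518)  # 414 development, 104 testing records
metrics = confusion_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

Sensitivity and specificity depend on the probability threshold used to turn a model's output into a delirium/no-delirium call, whereas AUROC summarizes performance over all thresholds; that is why the abstract reports them separately.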
Affiliation(s)
- Yujie Ren
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Yu Zhang
- Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Jianhua Zhan
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Junfeng Sun
- Medical Center of Burns and Plastic, Ganzhou People's Hospital, Ganzhou, China
- Jinhua Luo
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Wenqiang Liao
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Xing Cheng
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, China
12
Gao Y, Sharma T, Cui Y. Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective. Annu Rev Biomed Data Sci 2023; 6:153-171. [PMID: 37104653 PMCID: PMC10529864 DOI: 10.1146/annurev-biodatasci-020722-020704] [Indexed: 04/29/2023]
Abstract
Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Teena Sharma
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Yan Cui
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
13
Hou K, Xu Z, Ding Y, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.24.23293056. [PMID: 37546999 PMCID: PMC10402211 DOI: 10.1101/2023.07.24.23293056] [Indexed: 08/08/2023]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields, from agriculture to personalized medicine. We analyze data from two large biobanks in the US (All of Us) and the UK (UK Biobank) and find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, affect PGS accuracy to a similar degree as genetic ancestry. PGSs trained in single- versus multi-ancestry cohorts show similar context-specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well-calibrated (containing the trait value with 90% probability in all contexts), whereas methods that ignore context are mis-calibrated. We show that prediction intervals need to be adjusted for all considered traits, by amounts ranging from 10% for diastolic blood pressure to 80% for waist circumference. The required adjustment depends on the dataset; for example, prediction intervals for years of education need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path toward using PGS as a prediction tool for all individuals regardless of their contexts, while highlighting the importance of comprehensive profiling of context information in study design and data collection.
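To illustrate why context-specific intervals matter, the sketch below uses a deliberately simpler technique than the authors' joint model: a separate empirical 90% quantile of absolute residuals within each context group (similar in spirit to group-conditional conformal intervals). The data are synthetic. The PGS is assumed to already be on the trait scale, and the helper names are hypothetical.

```python
import numpy as np

def context_interval_widths(y, y_hat, context, coverage=0.9):
    """Half-width of a symmetric prediction interval for each context label.

    Uses the empirical `coverage` quantile of absolute residuals within each
    context, so contexts where the PGS is less accurate get wider intervals.
    """
    abs_resid = np.abs(np.asarray(y) - np.asarray(y_hat))
    context = np.asarray(context)
    return {c: float(np.quantile(abs_resid[context == c], coverage))
            for c in np.unique(context)}

def empirical_coverage(y, y_hat, context, widths):
    """Fraction of individuals whose trait lies within y_hat +/- width, per context."""
    y, y_hat, context = (np.asarray(a) for a in (y, y_hat, context))
    return {c: float(np.mean(np.abs(y[context == c] - y_hat[context == c]) <= w))
            for c, w in widths.items()}

rng = np.random.default_rng(0)
n = 5_000
context = rng.choice(["A", "B"], size=n)        # hypothetical context labels
pgs = rng.normal(size=n)                         # PGS, assumed on the trait scale
noise_sd = np.where(context == "A", 1.0, 2.0)    # PGS less accurate in context B
trait = pgs + rng.normal(scale=noise_sd)

widths = context_interval_widths(trait, pgs, context)
pooled = float(np.quantile(np.abs(trait - pgs), 0.9))  # one width ignoring context

per_context = empirical_coverage(trait, pgs, context, widths)
ignoring_context = empirical_coverage(trait, pgs, context, {c: pooled for c in widths})
print(per_context)
print(ignoring_context)
```

The pooled width over-covers context A and under-covers context B. This is the kind of mis-calibration the abstract attributes to methods that ignore context. The per-context widths restore roughly 90% coverage in each group, at least on the data used to fit them.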
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Ziqi Xu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Arbel Harpak
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles
14
Gu Y, Yan C, Wang T, Hu B, Zhu M, Jin G. Construction and evaluation of the functional polygenic risk score for gastric cancer in a prospective cohort of the European population. Chin Med J (Engl) 2023:00029330-990000000-00640. [PMID: 37394533 DOI: 10.1097/cm9.0000000000002716] [Received: 09/16/2022] [Indexed: 07/04/2023]
Abstract
BACKGROUND A polygenic risk score (PRS) derived from 112 single-nucleotide polymorphisms (SNPs) for gastric cancer has been reported in Chinese populations (PRS-112). However, its performance in other populations is unknown. A functional PRS (fPRS) using functional SNPs (fSNPs) may improve the generalizability of the PRS across populations with distinct ethnicities. METHODS We performed functional annotations on SNPs in strong linkage disequilibrium (LD) with the 112 previously reported SNPs to identify fSNPs that affect protein coding or transcriptional regulation. Subsequently, we constructed an fPRS based on the fSNPs using the LDpred2-infinitesimal model and then analyzed the performance of the PRS-112 and fPRS in predicting gastric cancer risk in 457,521 European participants of the UK Biobank cohort. Finally, the performance of the fPRS in combination with lifestyle factors was evaluated in predicting the risk of gastric cancer. RESULTS During 4,582,045 person-years of follow-up with a total of 623 incident gastric cancer cases, we found no significant association between the PRS-112 and gastric cancer risk in the European population (hazard ratio [HR] = 1.00 [95% confidence interval (CI), 0.93-1.09], P = 0.846). We identified 125 fSNPs, including seven deleterious protein-coding SNPs and 118 regulatory non-coding SNPs, and used them to construct the fPRS-125. Our results showed that the fPRS-125 was significantly associated with gastric cancer risk (HR = 1.11 [95% CI, 1.03-1.20], P = 0.009). Compared to participants with a low fPRS-125 (bottom quintile), those with a high fPRS-125 (top quintile) had a higher risk of incident gastric cancer (HR = 1.43 [95% CI, 1.12-1.84], P = 0.005). Moreover, participants with both an unfavorable lifestyle and a high genetic risk had the highest risk of incident gastric cancer (HR = 4.99 [95% CI, 1.55-16.10], P = 0.007) compared to those with both a favorable lifestyle and a low genetic risk.
CONCLUSION These results indicate that the fPRS-125 derived from fSNPs may act as an indicator to measure the genetic risk of gastric cancer in the European population.
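The quintile comparison and the person-year totals above can be made concrete with a small sketch. It assigns scores to quintiles and computes crude incidence rates per 100,000 person-years. `crude_rate_ratio` is a hypothetical helper that ignores covariates and event timing, so it is a rough stand-in for, not a reimplementation of, the Cox hazard ratios reported in the study. The only real numbers used are the cohort totals from the abstract.

```python
import numpy as np

def quintile(scores):
    """Assign each score to a quintile, 0 (lowest) to 4 (highest)."""
    edges = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    return np.digitize(scores, edges, right=True)

def incidence_per_100k(cases, person_years):
    """Crude incidence rate per 100,000 person-years."""
    return 1e5 * np.sum(cases) / np.sum(person_years)

def crude_rate_ratio(scores, cases, person_years):
    """Top- vs bottom-quintile crude incidence rate ratio (NumPy array inputs).

    Unlike a Cox hazard ratio, this ignores covariates and event timing.
    """
    q = quintile(scores)
    top = incidence_per_100k(cases[q == 4], person_years[q == 4])
    bottom = incidence_per_100k(cases[q == 0], person_years[q == 0])
    return top / bottom

# Cohort-level crude rate from the totals quoted in the abstract:
# 623 incident cases over 4,582,045 person-years.
overall = incidence_per_100k([623], [4_582_045])
print(f"{overall:.2f} per 100,000 person-years")
```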
Affiliation(s)
- Yuanliang Gu
- Department of Epidemiology, School of Public Health, Southeast University, Nanjing, Jiangsu 210009, China
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Caiwang Yan
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine and China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Tianpei Wang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Beiping Hu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Meng Zhu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine and China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, Jiangsu 210009, China
- Guangfu Jin
- Department of Epidemiology, School of Public Health, Southeast University, Nanjing, Jiangsu 210009, China
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine and China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, Jiangsu 210009, China
15
Ding Y, Hou K, Xu Z, Pimplaskar A, Petter E, Boulier K, Privé F, Vilhjálmsson BJ, Olde Loohuis LM, Pasaniuc B. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 2023; 618:774-781. [PMID: 37198491 PMCID: PMC10284707 DOI: 10.1038/s41586-023-06079-4] [Received: 09/28/2022] [Accepted: 04/12/2023] [Indexed: 05/19/2023]
Abstract
Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: the Pearson correlation between GD and PGS accuracy, averaged across 84 traits, is -0.95. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries shows similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.
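The decile analysis above can be sketched as follows. The GD definition is an assumption for illustration: Euclidean distance from the centroid of the training set in principal-component space. The data are synthetic, with a PGS effect that is built to weaken with distance. Accuracy is taken as the squared Pearson correlation between PGS and phenotype within each GD decile. Both helper names are hypothetical.

```python
import numpy as np

def genetic_distance(pcs, train_pcs):
    """Assumed GD: Euclidean distance from the training-set PC centroid."""
    centroid = np.asarray(train_pcs).mean(axis=0)
    return np.linalg.norm(np.asarray(pcs) - centroid, axis=1)

def accuracy_by_gd_decile(gd, pgs, phenotype, n_bins=10):
    """Squared Pearson correlation of PGS with phenotype in each GD decile."""
    edges = np.quantile(gd, np.linspace(0, 1, n_bins + 1))
    # Interior edges only, so every individual lands in bin 0..n_bins-1.
    bins = np.digitize(gd, edges[1:-1], right=True)
    return np.array([
        np.corrcoef(pgs[bins == b], phenotype[bins == b])[0, 1] ** 2
        for b in range(n_bins)
    ])

rng = np.random.default_rng(1)
n = 20_000
pcs = rng.normal(size=(n, 2))        # synthetic 2-D ancestry PCs
train_pcs = pcs[:5_000]              # pretend these formed the PGS training set
gd = genetic_distance(pcs, train_pcs)
pgs = rng.normal(size=n)
# Synthetic assumption: the PGS effect on the phenotype weakens with distance.
phenotype = pgs / (1.0 + gd) + rng.normal(scale=0.5, size=n)

acc = accuracy_by_gd_decile(gd, pgs, phenotype)
trend = np.corrcoef(np.arange(10), acc)[0, 1]  # decile index vs. accuracy
print(np.round(acc, 3), round(trend, 3))
```

Binning by decile mirrors the closest-versus-furthest comparison in the abstract. In this toy setup, the negative `trend` reflects the decline that was built into the simulation and says nothing about real traits.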
Affiliation(s)
- Yi Ding
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Ziqi Xu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Aditya Pimplaskar
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Ella Petter
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Kristin Boulier
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Florian Privé
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Bjarni J Vilhjálmsson
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
- Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Institute for Precision Health, UCLA, Los Angeles, CA, USA
16
Drouet DE, Liu S, Crawford DC. Assessment of multi-population polygenic risk scores for lipid traits in African Americans. PeerJ 2023; 11:e14910. [PMID: 37214096 PMCID: PMC10198155 DOI: 10.7717/peerj.14910] [Received: 08/22/2022] [Accepted: 01/25/2023] [Indexed: 05/24/2023]
Abstract
Polygenic risk scores (PRS) based on genome-wide discoveries are promising predictors or classifiers of disease development, severity, and/or progression for common clinical outcomes. A major limitation of most risk scores is the paucity of genome-wide discoveries in diverse populations, prompting an emphasis on generating these needed data for trans-population and population-specific PRS construction. Because diverse genome-wide discoveries are only now being completed, there has been little opportunity to evaluate PRS in diverse populations independently of the discovery efforts. To fill this gap, we leverage summary data from a recent genome-wide discovery study of lipid traits (HDL-C, LDL-C, triglycerides, and total cholesterol) conducted by the Population Architecture using Genomics and Epidemiology (PAGE) Study in diverse populations, including African Americans, Hispanics, Asians, Native Hawaiians, Native Americans, and others. We constructed lipid trait PRS using the genetic variants and weights published by the PAGE Study in an independent African American adult patient population (n = 3,254) with de-identified electronic health records linked to genotypes from the Illumina Metabochip. Using multi-population lipid trait PRS, we assessed levels of association with their respective lipid traits, clinical outcomes (cardiovascular disease and type 2 diabetes), and common clinical labs. While none of the multi-population PRS were strongly associated with the tested trait or outcome, PRS_LDL-C was nominally associated with cardiovascular disease. These data demonstrate the complexity of applying PRS to real-world clinical data even when data from multiple populations are available.
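Building a PRS from "published genetic variants and weights" usually means taking an additive weighted sum of effect-allele dosages, PRS_i = Σ_j w_j g_ij. The sketch below assumes that standard additive form. It uses hypothetical genotypes and weights and skips the allele, strand, and variant-matching steps that a real pipeline needs.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Additive PRS: weighted sum of effect-allele dosages per individual.

    dosages: (n_individuals, n_variants) array of 0/1/2 effect-allele counts
    weights: (n_variants,) per-variant effect sizes from the discovery study
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("one weight is needed per variant")
    return dosages @ weights

# Hypothetical example: three individuals, four variants.
g = np.array([[0, 1, 2, 1],
              [2, 2, 0, 0],
              [1, 0, 1, 2]])
w = np.array([0.10, -0.05, 0.20, 0.03])
print(polygenic_score(g, w))
```

The score is a matrix-vector product, so the score for any subset of variants or individuals can be recomputed cheaply. Weights taken from a multi-population study are simply a different `w` vector applied to the same dosages.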
Affiliation(s)
- Domenica E. Drouet
- Department of Medicine, Case Western Reserve University, Cleveland, OH, United States of America
- Shiying Liu
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
- Dana C. Crawford
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
- Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
- Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, United States of America
17
Mahaux O, Powell G, Haguinet F, Sobczak P, Saini N, Barry A, Mustafa A, Bate A. Identifying Safety Subgroups at Risk: Assessing the Agreement Between Statistical Alerting and Patient Subgroup Risk. Drug Saf 2023; 46:601-614. [PMID: 37131012 PMCID: PMC10153776 DOI: 10.1007/s40264-023-01306-3] [Accepted: 04/03/2023] [Indexed: 05/04/2023]
Abstract
INTRODUCTION Identifying individual characteristics or underlying conditions linked to adverse drug reactions (ADRs) can help optimise the benefit-risk ratio for individuals. A systematic evaluation of statistical methods to identify subgroups potentially at risk using spontaneous ADR report datasets is lacking. OBJECTIVES In this study, we aimed to assess concordance between subgroup disproportionality scores and European Medicines Agency Pharmacovigilance Risk Assessment Committee (PRAC) discussions of potential subgroup risk. METHODS The subgroup disproportionality method described by Sandberg et al., and variants of it, were applied to statistically screen for subgroups at potentially increased risk of ADRs, using data from the US FDA Adverse Event Reporting System (FAERS) cumulative from 2004 to the second quarter of 2021. The reference set used to assess concordance was manually extracted from PRAC minutes from 2015 to 2019. Mentions of subgroups with a potentially differentiated risk that overlapped with the subgroups covered by the Sandberg method were included. RESULTS Twenty-seven PRAC subgroup examples, representing 1719 subgroup drug-event combinations (DECs) in FAERS, were included. Using the Sandberg methodology, 2 of the 27 could be detected (one for age and one for sex). No subgroup examples for pregnancy or underlying condition were detected. With a methodological variant, 14 of 27 examples could be detected. CONCLUSIONS We observed low concordance between subgroup disproportionality scores and PRAC discussions of potential subgroup risk. Subgroup analyses performed better for age and sex, while for covariates not well captured in FAERS, such as underlying condition and pregnancy, additional data sources should be considered.
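Disproportionality screening of spontaneous reports starts from a 2x2 table of report counts for a drug-event combination. The sketch below computes one common measure, the reporting odds ratio (ROR) with a Wald 95% confidence interval, inside a single subgroup. It is a generic illustration and not the Sandberg et al. method evaluated in the study. The counts are hypothetical.

```python
import math

def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Reporting odds ratio with a Wald confidence interval from a 2x2 table.

    a: reports with the drug and the event   b: the drug, other events
    c: other drugs, the event                d: other drugs, other events
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(ROR)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper

# Hypothetical report counts within one subgroup (e.g. one sex or age band).
ror, lo, hi = reporting_odds_ratio(a=20, b=80, c=100, d=1900)
print(f"ROR = {ror:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A subgroup screen would repeat this calculation per subgroup and compare the results with the rest of the database. This explains why covariates that are poorly recorded in FAERS, such as pregnancy or underlying condition, leave too few reports in the table to detect anything.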
Affiliation(s)
- Olivia Mahaux
- Safety Innovation and Analytics, GSK, Wavre, Belgium
- Greg Powell
- Safety Innovation and Analytics, GSK, Durham, NC, USA
- Namrata Saini
- Safety Evaluation and Risk Management, GSK, Bangalore, India
- Allen Barry
- University of North Carolina, Chapel Hill, NC, USA
- Andrew Bate
- Safety Innovation and Analytics, GSK, London, UK
- London School of Hygiene and Tropical Medicine, University of London, London, UK
18
Forrest IS, Petrazzini BO, Duffy Á, Park JK, O'Neal AJ, Jordan DM, Rocheleau G, Nadkarni GN, Cho JH, Blazer AD, Do R. A machine learning model identifies patients in need of autoimmune disease testing using electronic health records. Nat Commun 2023; 14:2385. [PMID: 37169741 PMCID: PMC10130143 DOI: 10.1038/s41467-023-37996-7] [Received: 07/15/2022] [Accepted: 04/05/2023] [Indexed: 05/13/2023]
Abstract
Systemic autoimmune rheumatic diseases (SARDs) can lead to irreversible damage if left untreated, yet these patients often endure long diagnostic journeys before being diagnosed and treated. Machine learning may help overcome the challenges of diagnosing SARDs and inform clinical decision-making. Here, we developed and tested a machine learning model to identify patients who should receive rheumatological evaluation for SARDs using longitudinal electronic health records of 161,584 individuals from two institutions. The model demonstrated high performance for predicting cases of autoantibody-tested individuals in a validation set, an external test set, and an independent cohort with a broader case definition. This approach identified more individuals for autoantibody testing compared with current clinical standards and a greater proportion of autoantibody carriers among those tested. Diagnoses of SARDs and other autoimmune conditions increased with higher model probabilities. The model detected a need for autoantibody testing and rheumatology encounters up to five years before the test date and assessment date, respectively. Altogether, these findings illustrate that the clinical manifestations of a diverse array of autoimmune conditions are detectable in electronic health records using machine learning, which may help systematize and accelerate autoimmune testing.
Affiliation(s)
- Iain S Forrest
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ben O Petrazzini
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Áine Duffy
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Joshua K Park
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Anya J O'Neal
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA
- Daniel M Jordan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ghislain Rocheleau
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Judy H Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ashira D Blazer
- Division of Rheumatology, Hospital for Special Surgery, New York, NY, USA
- Ron Do
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
19
Ryu E, Jenkins GD, Wang Y, Olfson M, Talati A, Lepow L, Coombes BJ, Charney AW, Glicksberg BS, Mann JJ, Weissman MM, Wickramaratne P, Pathak J, Biernacka JM. The importance of social activity to risk of major depression in older adults. Psychol Med 2023; 53:2634-2642. [PMID: 34763736 PMCID: PMC9095757 DOI: 10.1017/s0033291721004566] [Received: 06/14/2021] [Revised: 10/04/2021] [Accepted: 10/20/2021] [Indexed: 11/07/2022]
Abstract
BACKGROUND Several social determinants of health (SDoH) have been associated with the onset of major depressive disorder (MDD). However, prior studies largely focused on individual SDoH and thus less is known about the relative importance (RI) of SDoH variables, especially in older adults. Given that risk factors for MDD may differ across the lifespan, we aimed to identify the SDoH that was most strongly related to newly diagnosed MDD in a cohort of older adults. METHODS We used self-reported health-related survey data from 41 174 older adults (50-89 years, median age = 67 years) who participated in the Mayo Clinic Biobank, and linked ICD codes for MDD in the participants' electronic health records. Participants with a history of clinically documented or self-reported MDD prior to survey completion were excluded from analysis (N = 10 938, 27%). We used Cox proportional hazards models with a gradient boosting machine approach to quantify the RI of 30 pre-selected SDoH variables on the risk of future MDD diagnosis. RESULTS Following biobank enrollment, 2073 older participants were diagnosed with MDD during the follow-up period (median duration = 6.7 years). The most influential SDoH was perceived level of social activity (RI = 0.17). Lower level of social activity was associated with a higher risk of MDD [hazard ratio = 2.27 (95% CI 2.00-2.50) for highest v. lowest level]. CONCLUSION Across a range of SDoH variables, perceived level of social activity is most strongly related to MDD in older adults. Monitoring changes in the level of social activity may help identify older adults at an increased risk of MDD.
Affiliation(s)
- Euijung Ryu
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Gregory D. Jenkins
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Department of AI and Informatics, Mayo Clinic, Rochester, USA
- Mark Olfson
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, USA
- Ardesheer Talati
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, USA
- Lauren Lepow
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Brandon J. Coombes
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Alexander W. Charney
- Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, USA
- Benjamin S. Glicksberg
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, USA
- J. John Mann
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, USA
- Myrna M. Weissman
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, USA
- Priya Wickramaratne
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, USA
- Joanna M. Biernacka
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, USA
20
Zhao Y, Sun L. A stable and adaptive polygenic signal detection method based on repeated sample splitting. CAN J STAT 2023. [DOI: 10.1002/cjs.11768] [Indexed: 04/03/2023]
21
Baker BH, Joo YY, Park J, Cha J, Baccarelli AA, Posner J. Maternal age at birth and child attention-deficit hyperactivity disorder: causal association or familial confounding? J Child Psychol Psychiatry 2023; 64:299-310. [PMID: 36440655 DOI: 10.1111/jcpp.13726] [Accepted: 09/13/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Causal explanations for the association of young motherhood with increased risk for child attention-deficit hyperactivity disorder (ADHD) remain unclear. METHODS The ABCD Study recruited 11,878 youth from 22 sites across the United States between June 1, 2016 and October 15, 2018. This cross-sectional analysis of 8,514 children aged 8-11 years excluded 2,260 twins/triplets, 265 adopted children, and 839 younger siblings. We examined associations of maternal age with ADHD clinical range diagnoses based on the Child Behavior Checklist and with NIH Toolbox Flanker Attention Scores, using mixed logistic and linear regression models, respectively. We conducted confounding and causal mediation analyses using genotype array, demographic, socioeconomic, and prenatal environment data to investigate which genetic and environmental variables may explain the association between young maternal age and child ADHD. RESULTS In crude models, each 10-year increase in maternal age was associated with 32% decreased odds of an ADHD clinical range diagnosis (OR = 0.68; 95% CI [0.59, 0.78]) and a 1.09-point increase in NIH Flanker Attention Scores (β = 1.09; 95% CI [0.76, 1.41]), indicating better child visual selective attention. However, adjustment for confounders weakened these associations. For ADHD clinical range diagnoses, the strongest confounders were family income, caregiver education, and the ADHD polygenic risk score; for NIH Flanker Attention Scores, they were family income, caregiver education, and race/ethnicity. Breastfeeding duration, prenatal alcohol exposure, and prenatal tobacco exposure mediated up to 18%, 6%, and 4% of the association, respectively. CONCLUSIONS Socioeconomic disadvantages were likely the primary explanation for the association of young maternal age with child ADHD, although genetics and modifiable environmental factors also played a role.
Public policies aimed at reducing the burden of ADHD associated with young motherhood should target socioeconomic inequalities and support young pregnant women by advocating for reduced prenatal tobacco exposure and healthy breastfeeding practices after childbirth.
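The mediation percentages in this abstract express how much of the maternal age association is carried through each mediator. The study used formal causal mediation analysis; as a simpler stand-in, the sketch below uses the difference method, (total - direct) / total, which agrees with counterfactual estimates only under restrictive assumptions (for example, linear models with no exposure-mediator interaction). The function name and coefficients are hypothetical and are not taken from the paper.

```python
def proportion_mediated(total_effect: float, direct_effect: float) -> float:
    """Difference-method estimate of the share of a total effect that runs
    through a mediator: (total - direct) / total.

    Both effects must be on the same additive scale (for example, log-odds
    or a linear regression coefficient).
    """
    if total_effect == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    indirect_effect = total_effect - direct_effect
    return indirect_effect / total_effect


# Hypothetical coefficients, not from the study: log-odds change in ADHD per
# 10-year increase in maternal age, without (total) and with (direct) a
# candidate mediator in the model.
total = -0.50
direct = -0.45
print(f"{proportion_mediated(total, direct):.1%}")  # 10.0%
```

Because the estimate is a ratio, it becomes unstable when the total effect is close to zero, which is one reason adjusted analyses usually report it with a confidence interval.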
Affiliation(s)
- Brennan H Baker
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, USA
| | | | - Junghoon Park
- Department of Economics, Seoul National University, Seoul, Korea
| | - Jiook Cha
- Department of Psychology, Seoul National University, Seoul, Korea; Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Korea; AI Institute, Seoul National University, Seoul, Korea
| | - Andrea A Baccarelli
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, USA
| | - Jonathan Posner
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
22
Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023; 30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022]
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
| | | | - Ellen Stephenson
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Karen Tu
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Jessica Gronsbell
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
23
Mao Z, Gray ALH, Thyagarajan B, Bostick RM. Antioxidant enzyme and DNA base repair genetic risk scores' associations with systemic oxidative stress biomarker in pooled cross-sectional studies. FRONTIERS IN AGING 2023; 4:1000166. [PMID: 37152862 PMCID: PMC10161255 DOI: 10.3389/fragi.2023.1000166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 03/28/2023] [Indexed: 05/09/2023]
Abstract
Background: Oxidative stress is hypothesized to contribute to the pathogenesis of several chronic diseases. Numerous dietary and lifestyle factors are associated with oxidative stress; however, little is known about associations of genetic factors, individually or jointly with dietary and lifestyle factors, with oxidative stress in humans. Methods: We genotyped 22 haplotype-tagging single nucleotide polymorphisms (SNPs) in 3 antioxidant enzyme (AE) genes and 79 SNPs in 14 DNA base excision repair (BER) genes to develop oxidative stress-specific AE and BER genetic risk scores (GRS) in two pooled cross-sectional studies (n = 245) of 30-74-year-old, White, cancer- and inflammatory bowel disease-free adults. Of the genotypes, based on their associations with a systemic oxidative stress biomarker, plasma F2-isoprostanes (FiP) concentrations, we selected 4 GSTP1 SNPs for an AE GRS, and 12 SNPs of 5 genes (XRCC1, TDG, PNKP, MUTYH, and FEN1) for a BER GRS. We also calculated a previously-reported, validated, questionnaire-based, oxidative stress biomarker-weighted oxidative balance score (OBS) comprising 17 anti- and pro-oxidant dietary and lifestyle exposures, with higher scores representing a higher predominance of antioxidant exposures. We used general linear regression to assess adjusted mean FiP concentrations across GRS and OBS tertiles, separately and jointly. Results: The adjusted mean FiP concentrations among those in the highest relative to the lowest oxidative stress-specific AE and BER GRS tertiles were, proportionately, 11.8% (p = 0.12) and 21.2% (p = 0.002) higher, respectively. In the joint AE/BER GRS analysis, the highest estimated mean FiP concentration was among those with jointly high AE/BER GRS. Mean FiP concentrations across OBS tertiles were similar across AE and BER GRS strata. 
Conclusion: Our pilot study findings suggest that DNA BER, and possibly AE, genotypes collectively may be associated with systemic oxidative stress in humans, and support further research in larger, general populations.
Affiliation(s)
- Ziling Mao
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States
| | - Abigail L. H. Gray
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States
| | - Bharat Thyagarajan
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, United States
| | - Roberd M. Bostick
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Winship Cancer Institute, Emory University, Atlanta, GA, United States
- *Correspondence: Roberd M. Bostick,
24
Zhang Y, Elgart M, Granot-Hershkovitz E, Wang H, Tarraf W, Ramos AR, Stickel AM, Zeng D, Garcia TP, Testai FD, Wassertheil-Smoller S, Isasi CR, Daviglus ML, Kaplan R, Fornage M, DeCarli C, Redline S, González HM, Sofer T. Genetic associations between sleep traits and cognitive ageing outcomes in the Hispanic Community Health Study/Study of Latinos. EBioMedicine 2023; 87:104393. [PMID: 36493726 PMCID: PMC9732133 DOI: 10.1016/j.ebiom.2022.104393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND Sleep phenotypes have been reported to be associated with cognitive ageing outcomes. However, there is limited research using genetic variants as proxies for sleep traits to study their associations. We estimated associations between Polygenic Risk Scores (PRSs) for sleep duration, insomnia, daytime sleepiness, and obstructive sleep apnoea (OSA) and measures of cognitive ageing in Hispanic/Latino adults. METHODS We used summary statistics from published genome-wide association studies to construct PRSs representing the genetic basis of each sleep trait, then we studied the association of the PRSs of the sleep phenotypes with cognitive outcomes in the Hispanic Community Health Study/Study of Latinos. The primary model adjusted for age, sex, study centre, and measures of genetic ancestry. Associations are highlighted if their p-value <0.05. FINDINGS Higher PRS for insomnia was associated with lower global cognitive function and higher risk of mild cognitive impairment (MCI) (OR = 1.20, 95% CI [1.06, 1.36]). Higher PRS for daytime sleepiness was also associated with increased MCI risk (OR = 1.14, 95% CI [1.02, 1.28]). Sleep duration PRS was associated with reduced MCI risk among short and normal sleepers, while among long sleepers it was associated with reduced global cognitive function and with increased MCI risk (OR = 1.40, 95% CI [1.10, 1.78]). Furthermore, adjustment of analyses for the measured sleep phenotypes and APOE-ε4 allele had minor effects on the PRS associations with the cognitive outcomes. INTERPRETATION Genetic measures underlying insomnia, daytime sleepiness, and sleep duration are associated with MCI risk. Genetic and self-reported sleep duration interact in their effect on MCI. FUNDING Described in Acknowledgments.
Affiliation(s)
- Yuan Zhang
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Respiratory Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Michael Elgart
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Einat Granot-Hershkovitz
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Heming Wang
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Wassim Tarraf
- Institute of Gerontology, Wayne State University, Detroit, MI, USA
| | - Alberto R Ramos
- Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Ariana M Stickel
- Department of Psychology, San Diego State University, San Diego, CA, USA
| | - Donglin Zeng
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
| | - Tanya P Garcia
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
| | - Fernando D Testai
- Department of Neurology and Rehabilitation, University of Illinois College of Medicine at Chicago, Chicago, IL, USA
| | | | - Carmen R Isasi
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Martha L Daviglus
- Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
| | - Robert Kaplan
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA; Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Charles DeCarli
- Department of Neurology, Alzheimer's Disease Center, University of California, Davis, Sacramento, CA, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Hector M González
- Department of Neurosciences and Shiley-Marcos Alzheimer's Disease Center, University of California, San Diego, La Jolla, CA, USA
| | - Tamar Sofer
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
25
Paltta J, Heikkilä HK, Pirilä L, Eklund KK, Huhtakangas J, Isomäki P, Kaipiainen-Seppänen O, Kristiansson K, Havulinna AS, Sokka-Isler T, Palomäki A. The validity of rheumatoid arthritis diagnoses in Finnish biobanks. Scand J Rheumatol 2023; 52:1-9. [PMID: 34643165 DOI: 10.1080/03009742.2021.1967047] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 01/05/2023]
Abstract
OBJECTIVE The aim of this study was to determine the validity of rheumatoid arthritis (RA) diagnoses in patients participating in Finnish biobanks. METHOD We reviewed the electronic medical records of 500 Finnish biobank participants: 125 patients with at least one visit with a diagnosis of seropositive RA, 125 patients with at least one visit with a diagnosis of seronegative RA, and 250 age- and gender-matched controls. The patients were chosen from five different biobank hospitals in Finland. A rheumatologist reviewed the medical records to assess whether each patient's diagnosis was correct. The diagnosis was compared with the diagnostic codes in the Finnish Care Register for Health Care (CRHC) and special reimbursement data of the Social Insurance Institution of Finland. RESULTS The positive predictive value (PPV) of CRHC diagnosis of RA (for seropositive and seronegative RA combined) was 0.82. For patients with a special reimbursement for anti-rheumatic medications for RA, the PPV was 0.89. The PPV was higher in patients with more than one visit. For one, two, five, and 10 visits, the PPV was 0.82, 0.85, 0.89, and 0.90, respectively, and for patients who also had the special reimbursement, the PPV was 0.89, 0.91, 0.93, and 0.94 for one, two, five, and 10 visits, respectively. In patients positive for anti-citrullinated protein antibodies, the PPV was 0.98. CONCLUSION These results demonstrate that the validity of RA diagnoses in Finnish biobanks was good and can be further improved by including data on special reimbursement for medication, number of visits, and serological data.
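Each PPV in this abstract is the share of register-coded RA patients whose diagnosis the reviewing rheumatologist confirmed. A minimal sketch of that calculation follows; the `ChartReview` class and the counts are hypothetical, and applying it separately to subgroups (for example, by number of visits or reimbursement status) would give stratified values of the kind reported.

```python
from dataclasses import dataclass


@dataclass
class ChartReview:
    """Outcome of expert review of register-coded cases (hypothetical helper)."""
    confirmed: int      # coded cases the reviewer judged correct (true positives)
    not_confirmed: int  # coded cases the reviewer rejected (false positives)

    @property
    def ppv(self) -> float:
        """Positive predictive value: TP / (TP + FP)."""
        reviewed = self.confirmed + self.not_confirmed
        if reviewed == 0:
            raise ValueError("no reviewed cases")
        return self.confirmed / reviewed


# Hypothetical counts for illustration only.
one_visit = ChartReview(confirmed=90, not_confirmed=10)
print(one_visit.ppv)  # 0.9
```

PPV needs only the reviewed coded cases; the matched controls would additionally be needed to estimate sensitivity or specificity.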
Affiliation(s)
- J Paltta
- Centre for Rheumatology and Clinical Immunology, Division of Medicine, Turku University Hospital and University of Turku, Turku, Finland
| | - H-K Heikkilä
- Centre for Rheumatic Diseases, Tampere University Hospital, Tampere, Finland
| | - L Pirilä
- Centre for Rheumatology and Clinical Immunology, Division of Medicine, Turku University Hospital and University of Turku, Turku, Finland
| | - K K Eklund
- Department of Rheumatology, Helsinki University Hospital, University of Helsinki and Orton Orthopaedic Hospital, Helsinki, Finland
| | - J Huhtakangas
- Division of Rheumatology, Department of Internal Medicine, Oulu University Hospital and Medical Research Center Oulu, Oulu, Finland
| | - P Isomäki
- Centre for Rheumatic Diseases, Tampere University Hospital, Tampere, Finland; Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| | | | - K Kristiansson
- Department of Public Health Solutions, Finnish Institute for Health and Welfare (THL), Helsinki, Finland
| | - A S Havulinna
- Department of Public Health Solutions, Finnish Institute for Health and Welfare (THL), Helsinki, Finland; Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
| | - T Sokka-Isler
- Department of Medicine, Jyväskylä Central Hospital, Jyväskylä, Finland
| | - A Palomäki
- Centre for Rheumatology and Clinical Immunology, Division of Medicine, Turku University Hospital and University of Turku, Turku, Finland; Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
| | -
- FinnGen members are listed in the Supplementary material
26
Lazareva TE, Barbitoff YA, Changalidis AI, Tkachenko AA, Maksiutenko EM, Nasykhova YA, Glotov AS. Biobanking as a Tool for Genomic Research: From Allele Frequencies to Cross-Ancestry Association Studies. J Pers Med 2022; 12:2040. [PMID: 36556260 PMCID: PMC9783756 DOI: 10.3390/jpm12122040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/11/2022] [Revised: 11/19/2022] [Accepted: 11/28/2022] [Indexed: 12/14/2022]
Abstract
In recent years, great advances have been made in the field of collection, storage, and analysis of biological samples. Large collections of samples, biobanks, have been established in many countries. Biobanks typically collect large amounts of biological samples and associated clinical information; the largest collections include over a million samples. In this review, we summarize the main directions in which biobanks aid medical genetics and genomic research, from providing reference allele frequency information to allowing large-scale cross-ancestry meta-analyses. The largest biobanks greatly vary in the size of the collection, and the amount of available phenotype and genotype data. Nevertheless, all of them are extensively used in genomics, providing a rich resource for genome-wide association analysis, genetic epidemiology, and statistical research into the structure, function, and evolution of the human genome. Recently, multiple research efforts were based on trans-biobank data integration, which increases sample size and allows for the identification of robust genetic associations. We provide prominent examples of such data integration and discuss important caveats which have to be taken into account in trans-biobank research.
Affiliation(s)
- Tatyana E. Lazareva
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
- Department of Genetics and Biotechnology, St. Petersburg State University, 199034 St. Petersburg, Russia
| | - Yury A. Barbitoff
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
- Department of Genetics and Biotechnology, St. Petersburg State University, 199034 St. Petersburg, Russia
| | - Anton I. Changalidis
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
- Faculty of Software Engineering and Computer Systems, ITMO University, 197101 St. Petersburg, Russia
| | - Alexander A. Tkachenko
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
| | - Evgeniia M. Maksiutenko
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
| | - Yulia A. Nasykhova
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
| | - Andrey S. Glotov
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
27
Waksmunski AR, Kinzy TG, Cruz LA, Nealon CL, Halladay CW, Simpson P, Canania RL, Anthony SA, Roncone DP, Sawicki Rogers L, Leber JN, Dougherty JM, Greenberg PB, Sullivan JM, Wu WC, Iyengar SK, Crawford DC, Peachey NS, Cooke Bailey JN. Glaucoma Genetic Risk Scores in the Million Veteran Program. Ophthalmology 2022; 129:1263-1274. [PMID: 35718050 PMCID: PMC9997524 DOI: 10.1016/j.ophtha.2022.06.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/03/2021] [Revised: 06/07/2022] [Accepted: 06/09/2022] [Indexed: 11/22/2022]
Abstract
PURPOSE Primary open-angle glaucoma (POAG) is a degenerative eye disease for which early treatment is critical to mitigate visual impairment and irreversible blindness. POAG-associated loci individually confer incremental risk. Genetic risk score(s) (GRS) could enable POAG risk stratification. Despite significantly higher POAG burden among individuals of African ancestry (AFR), GRS are limited in this population. A recent large-scale, multi-ancestry meta-analysis identified 127 POAG-associated loci and calculated cross-ancestry and ancestry-specific effect estimates, including in European ancestry (EUR) and AFR individuals. We assessed the utility of the 127-variant GRS for POAG risk stratification in EUR and AFR Veterans in the Million Veteran Program (MVP). We also explored the association between GRS and documented invasive glaucoma surgery (IGS). DESIGN Cross-sectional study. PARTICIPANTS MVP Veterans with imputed genetic data, including 5830 POAG cases (445 with IGS documented in the electronic health record) and 64 476 controls. METHODS We tested unweighted and weighted GRS of 127 published risk variants in EUR (3382 cases and 58 811 controls) and AFR (2448 cases and 5665 controls) Veterans in the MVP. Weighted GRS were calculated using effect estimates from the most recently published report of cross-ancestry and ancestry-specific meta-analyses. We also evaluated GRS in POAG cases with documented IGS. MAIN OUTCOME MEASURES Performance of 127-variant GRS in EUR and AFR Veterans for POAG risk stratification and association with documented IGS. RESULTS GRS were significantly associated with POAG (P < 5 × 10⁻⁵) in both groups; a higher proportion of EUR compared with AFR were consistently categorized in the top GRS decile (21.9%-23.6% and 12.9%-14.5%, respectively). Only GRS weighted by ancestry-specific effect estimates were associated with IGS documentation in AFR cases; all GRS types were associated with IGS in EUR cases. 
CONCLUSIONS Varied performance of the GRS for POAG risk stratification and documented IGS association in EUR and AFR Veterans highlights (1) the complex risk architecture of POAG, (2) the importance of diverse representation in genomics studies that inform GRS construction and evaluation, and (3) the necessity of expanding diverse POAG-related genomic data so that GRS can equitably aid in screening individuals at high risk of POAG and who may require more aggressive treatment.
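The unweighted and weighted GRS compared in this abstract differ only in whether each risk allele counts equally or is scaled by its published effect size. A minimal sketch under common conventions follows; the function names, the log-odds weighting, the percentile cutoff method, and the example numbers are assumptions for illustration, not the study's pipeline. Swapping ancestry-specific for cross-ancestry odds ratios would change only the `odds_ratios` argument.

```python
import math
import statistics
from typing import List, Sequence


def unweighted_grs(dosages: Sequence[float]) -> float:
    """Sum of risk-allele dosages (0 to 2 per variant; fractional if imputed)."""
    return float(sum(dosages))


def weighted_grs(dosages: Sequence[float], odds_ratios: Sequence[float]) -> float:
    """Sum over variants of risk-allele dosage times log(odds ratio)."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("dosages and odds_ratios must have the same length")
    return sum(d * math.log(r) for d, r in zip(dosages, odds_ratios))


def top_decile_flags(scores: Sequence[float]) -> List[bool]:
    """Flag individuals scoring at or above the sample's 90th percentile."""
    cutoff = statistics.quantiles(scores, n=10)[-1]  # 9 cut points; last is P90
    return [s >= cutoff for s in scores]


# Hypothetical three-variant example (ORs oriented to the risk allele).
odds_ratios = [1.20, 1.15, 1.10]
person = [2, 1, 0.4]
print(unweighted_grs(person))                       # 3.4
print(round(weighted_grs(person, odds_ratios), 3))  # 0.543
```

With real data the decile cutoff would be computed within each ancestry group, matching how the abstract reports top-decile proportions separately for EUR and AFR Veterans.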
Affiliation(s)
- Andrea R Waksmunski
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio
| | - Tyler G Kinzy
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio; Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio
| | - Lauren A Cruz
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio
| | - Cari L Nealon
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio
| | - Christopher W Halladay
- Center of Innovation in Long Term Services and Supports, Providence VA Medical Center, Providence, Rhode Island
| | - Piana Simpson
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio
| | | | - Scott A Anthony
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio
| | - David P Roncone
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio
| | - Lea Sawicki Rogers
- Ophthalmology Section, VA Western NY Healthcare System, Buffalo, New York
| | - Jenna N Leber
- Ophthalmology Section, VA Western NY Healthcare System, Buffalo, New York
| | | | - Paul B Greenberg
- Ophthalmology Section, Providence VA Medical Center, Providence, Rhode Island; Division of Ophthalmology, Alpert Medical School, Brown University, Providence, Rhode Island
| | - Jack M Sullivan
- Ophthalmology Section, VA Western NY Healthcare System, Buffalo, New York; Research Service, VA Western NY Healthcare System, Buffalo, New York
| | - Wen-Chih Wu
- Cardiology Section, Medical Service, Providence VA Medical Center, Providence, Rhode Island
| | - Sudha K Iyengar
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio; Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio
| | - Dana C Crawford
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio; Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio
| | - Neal S Peachey
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio; Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, Ohio; Department of Ophthalmology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, Ohio
| | - Jessica N Cooke Bailey
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio; Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio.
28
Abraham A, Le B, Kosti I, Straub P, Velez-Edwards DR, Davis LK, Newton JM, Muglia LJ, Rokas A, Bejan CA, Sirota M, Capra JA. Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth. BMC Med 2022; 20:333. [PMID: 36167547 PMCID: PMC9516830 DOI: 10.1186/s12916-022-02522-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/10/2022] [Accepted: 08/10/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy. METHODS Here, we apply machine learning to diverse data from EHRs with 35,282 deliveries to predict singleton preterm birth. RESULTS We find that machine learning models based on billing codes alone can predict preterm birth risk at various gestational ages (e.g., ROC-AUC = 0.75, PR-AUC = 0.40 at 28 weeks of gestation) and outperform comparable models trained using known risk factors (e.g., ROC-AUC = 0.65, PR-AUC = 0.25 at 28 weeks). Examining the patterns learned by the model reveals it stratifies deliveries into interpretable groups, including high-risk preterm birth subtypes enriched for distinct comorbidities. Our machine learning approach also predicts preterm birth subtypes (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. Finally, we demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5978 deliveries) from a different healthcare system. CONCLUSIONS By leveraging rich phenotypic and genetic features derived from EHRs, we suggest that machine learning algorithms have great potential to improve medical care during pregnancy. However, further work is needed before these models can be applied in clinical settings.
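ROC-AUC, as reported in this abstract, can be read as the probability that a randomly chosen preterm delivery receives a higher predicted risk than a randomly chosen term delivery. A minimal pure-Python sketch of that pairwise definition follows; the function name and toy data are hypothetical, and for real use scikit-learn's `roc_auc_score` computes the same quantity efficiently. For comparison, an uninformative model scores 0.5 on ROC-AUC, whereas its expected PR-AUC equals the outcome prevalence, which is why PR-AUC values such as 0.40 are judged relative to how common preterm birth is in the cohort.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Share of (positive, negative) pairs in which the positive case has the
    higher score; ties count as half a pair. O(n_pos * n_neg), fine for demos."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted risks for six deliveries (1 = preterm, 0 = term).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.1, 0.7]
print(round(roc_auc(labels, scores), 3))  # 0.889 (8 of 9 pairs ordered correctly)
```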
Affiliation(s)
- Abin Abraham
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, 37235, USA
- Vanderbilt University Medical Center, Vanderbilt University, Nashville, TN, 37232, USA
| | - Brian Le
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Idit Kosti
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
| | - Peter Straub
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, 37235, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Digna R Velez-Edwards
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, 37235, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Lea K Davis
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, 37235, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Psychiatry and Behavioral Sciences, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - J M Newton
- Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Louis J Muglia
- Burroughs-Wellcome Fund, Research Triangle Park, NC, USA
| | - Antonis Rokas
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, USA
| | - Cosmin A Bejan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Marina Sirota
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
| | - John A Capra
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, 37235, USA.
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
- Department of Biological Sciences, Vanderbilt University, Nashville, USA.
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA.
29
Kuo TT, Jiang X, Tang H, Wang X, Harmanci A, Kim M, Post K, Bu D, Bath T, Kim J, Liu W, Chen H, Ohno-Machado L. The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition. J Am Med Inform Assoc 2022; 29:2182-2190. [PMID: 36164820 PMCID: PMC9667175 DOI: 10.1093/jamia/ocac165] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/18/2022] [Revised: 08/25/2022] [Accepted: 09/13/2022] [Indexed: 01/11/2023]
Abstract
Concerns regarding inappropriate leakage of sensitive personal information as well as unauthorized data use are increasing with the growth of genomic data repositories. Therefore, privacy and security of genomic data have become increasingly important and need to be studied. Because many protection techniques have been proposed, it is important to understand how applicable they are to biomedical research. For this purpose, we have organized a community effort over the past 8 years through the integrating Data for Analysis, Anonymization and SHaring (iDASH) consortium to address this practical challenge. In this article, we summarize our experience from these competitions, report lessons learned from the events in 2020/2021 as examples, and discuss potential future research directions in this emerging field.
Affiliation(s)
- Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA (corresponding author)
| | - Arif Harmanci
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Miran Kim
- Department of Mathematics, Hanyang University, Seoul, Republic of Korea
- Department of Computer Science, Hanyang University, Seoul, Republic of Korea
| | - Kai Post
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
| | - Diyue Bu
- Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
| | - Tyler Bath
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
| | - Jihoon Kim
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
| | - Weijie Liu
- Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
| | - Hongbo Chen
- Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
| | - Lucila Ohno-Machado
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Division of Health Services Research & Development, Veteran Affairs San Diego Healthcare System, San Diego, California, USA
30
|
Johnson R, Ding Y, Venkateswaran V, Bhattacharya A, Boulier K, Chiu A, Knyazev S, Schwarz T, Freund M, Zhan L, Burch KS, Caggiano C, Hill B, Rakocz N, Balliu B, Denny CT, Sul JH, Zaitlen N, Arboleda VA, Halperin E, Sankararaman S, Butte MJ, Lajonchere C, Geschwind DH, Pasaniuc B. Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative. Genome Med 2022; 14:104. [PMID: 36085083 PMCID: PMC9461263 DOI: 10.1186/s13073-022-01106-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/28/2021] [Accepted: 08/03/2022] [Indexed: 01/13/2023]
Abstract
BACKGROUND Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736). METHODS We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome- and phenome-wide scans across a broad set of disease phenotypes. RESULTS We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value = 2.32×10⁻¹⁶, EAA p-value = 6.73×10⁻¹¹).
A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. CONCLUSIONS Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.
Affiliation(s)
- Ruth Johnson
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
| | - Yi Ding
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Vidhya Venkateswaran
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Oral Biology, School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Kristin Boulier
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Medicine, Division of Cardiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Alec Chiu
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Sergey Knyazev
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Tommer Schwarz
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Malika Freund
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Genetics, Stanford School of Medicine, Stanford, CA, 94305, USA
| | - Lingyu Zhan
- Molecular Biology Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Kathryn S Burch
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Christa Caggiano
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Brian Hill
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Nadav Rakocz
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Brunilda Balliu
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Christopher T Denny
- Division of Hematology/Oncology, Department of Pediatrics, Gwynne Hazen Cherry Memorial Laboratories, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Jae Hoon Sul
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Noah Zaitlen
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Valerie A Arboleda
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Eran Halperin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Manish J Butte
- Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Clara Lajonchere
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
31
|
Freda PJ, Kranzler HR, Moore JH. Novel digital approaches to the assessment of problematic opioid use. BioData Min 2022; 15:14. [PMID: 35840990 PMCID: PMC9284824 DOI: 10.1186/s13040-022-00301-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Accepted: 06/30/2022] [Indexed: 11/16/2022]
Abstract
The opioid epidemic continues to contribute to loss of life through overdose and significant social and economic burdens. Many individuals who develop problematic opioid use (POU) do so after being exposed to prescribed opioid analgesics. Therefore, it is important to accurately identify and classify risk factors for POU. In this review, we discuss the etiology of POU and highlight novel approaches to identifying its risk factors. These approaches include the application of polygenic risk scores (PRS) and diverse machine learning (ML) algorithms used in tandem with data from electronic health records (EHR), clinical notes, patient demographics, and digital footprints. The implementation and synergy of these types of data and approaches can greatly assist in reducing the incidence of POU and opioid-related mortality by increasing the knowledge base of patient-related risk factors, which can help to improve prescribing practices for opioid analgesics.
Affiliation(s)
- Philip J Freda
- Cedars-Sinai Medical Center, Department of Computational Biomedicine, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, CA, 90069, USA.
| | - Henry R Kranzler
- University of Pennsylvania, Center for Studies of Addiction, 3535 Market St., Suite 500 and Crescenz VAMC, 3800 Woodland Ave., Philadelphia, PA, 19104, USA
| | - Jason H Moore
- Cedars-Sinai Medical Center, Department of Computational Biomedicine, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, CA, 90069, USA
32
|
Yang A, Rolls ET, Dong G, Du J, Li Y, Feng J, Cheng W, Zhao XM. Longer screen time utilization is associated with the polygenic risk for Attention-deficit/hyperactivity disorder with mediation by brain white matter microstructure. EBioMedicine 2022; 80:104039. [PMID: 35509143 PMCID: PMC9079003 DOI: 10.1016/j.ebiom.2022.104039] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/20/2022] [Revised: 04/06/2022] [Accepted: 04/14/2022] [Indexed: 11/09/2022]
Abstract
BACKGROUND Attention-deficit/hyperactivity disorder (ADHD) has been reported to be associated with longer screen time utilization (STU) at the behavioral level. However, whether there are shared neural links between ADHD symptoms and prolonged STU is not clear and has not been explored in a single large-scale dataset. METHODS Leveraging the genetics, neuroimaging and behavioral data of 11,000+ children aged 9-11 from the Adolescent Brain Cognitive Development cohort, this study investigates the associations between the polygenic risk and trait for ADHD, STU, and white matter microstructure through cross-sectional and longitudinal analyses. FINDINGS Children with higher polygenic risk scores for ADHD tend to have longer STU and more severe ADHD symptoms. Fractional anisotropy (FA) values in several white matter tracts are negatively correlated with both the ADHD polygenic risk score and STU, including the inferior frontal-striatal tract, inferior frontal-occipital fasciculus, superior longitudinal fasciculus and corpus callosum. Most of these tracts are linked to visual-related functions. Longitudinal analyses indicate a directional effect of white matter microstructure on the ADHD scale, and a bi-directional effect between the ADHD scale and STU. Furthermore, reduction of FA in several white matter tracts mediates the association between the ADHD polygenic risk score and STU. INTERPRETATION These findings shed new light on the shared neural overlap between ADHD symptoms and prolonged STU, and provide evidence that the polygenic risk for ADHD is related, via white matter microstructure and the ADHD trait, to STU. FUNDING This study was mainly supported by NSFC and National Key R&D Program of China.
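The mediation finding above (reduced FA mediating the link between the ADHD polygenic risk score and STU) depends on splitting a total effect into a direct part and an indirect part. The Python sketch below shows the classic product-of-coefficients decomposition for linear models. It makes several assumptions and is not the authors' analysis: it uses ordinary least squares with a single mediator, omits covariates, and skips bootstrap intervals. The function names and toy values are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def _center(values: Sequence[float]) -> List[float]:
    """Subtract the mean so the regressions below need no explicit intercept term."""
    mean = fmean(values)
    return [v - mean for v in values]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _slope(x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of y on x (intercept handled by centering)."""
    xc, yc = _center(x), _center(y)
    return _dot(xc, yc) / _dot(xc, xc)


def _two_predictor_ols(x, m, y) -> Tuple[float, float]:
    """OLS coefficients (beta_x, beta_m) for y ~ x + m, from the 2x2 normal equations."""
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    det = sxx * smm - sxm * sxm
    if det == 0:
        raise ValueError("exposure and mediator are perfectly collinear")
    beta_x = (smm * sxy - sxm * smy) / det
    beta_m = (sxx * smy - sxm * sxy) / det
    return beta_x, beta_m


def product_of_coefficients(x, m, y) -> Dict[str, float]:
    """Hypothetical helper: split the total effect of x on y into direct and indirect (via m) parts."""
    a = _slope(x, m)                          # path a: exposure -> mediator
    direct, b = _two_predictor_ols(x, m, y)   # c' (direct) and path b: mediator -> outcome given exposure
    total = _slope(x, y)                      # path c: total effect
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Hypothetical toy data: exposure (e.g., a PRS), mediator (e.g., FA), outcome (e.g., STU).
prs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
fa = [5.2, 4.1, 2.8, 2.2, 0.9, 0.1]
stu = [1.0, 1.5, 3.1, 3.9, 5.5, 6.0]
effects = product_of_coefficients(prs, fa, stu)
print({k: round(v, 3) for k, v in effects.items()})
```

With OLS and a single mediator, the total effect equals the direct effect plus a·b exactly, which makes this decomposition convenient. In practice, the uncertainty of the indirect effect is usually assessed by bootstrapping.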
Affiliation(s)
- Anyi Yang
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Edmund T Rolls
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China; Department of Computer Science, University of Warwick, Coventry, UK; Oxford Centre for Computational Neuroscience, Oxford, UK
| | - Guiying Dong
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China
| | - Jingnan Du
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China
| | - Yuzhu Li
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China
| | - Jianfeng Feng
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China; Department of Computer Science, University of Warwick, Coventry, UK; Fudan ISTBI-ZJNU Algorithm Centre for Brain-inspired Intelligence, Zhejiang Normal University, Jinhua, China; MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China; Zhangjiang Fudan International Innovation Center, Shanghai, China
| | - Wei Cheng
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China; Fudan ISTBI-ZJNU Algorithm Centre for Brain-inspired Intelligence, Zhejiang Normal University, Jinhua, China.
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China; MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China; Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.
33
|
Hasanzad M, Sarhangi N, Naghavi A, Ghavimehr E, Khatami F, Ehsani Chimeh S, Larijani B, Aghaei Meybodi HR. Genomic medicine on the frontier of precision medicine. J Diabetes Metab Disord 2022; 21:853-861. [PMID: 35673457 PMCID: PMC9167337 DOI: 10.1007/s40200-021-00880-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Accepted: 08/11/2021] [Indexed: 10/20/2022]
Abstract
Genomic medicine has created a great deal of hope since the completion of the Human Genome Project (HGP). Genomic medicine promises disease prevention and early diagnosis in the context of precision medicine. Precision medicine, as a scientific discipline, has been introduced as an evolution in medicine. The rapid growth of advanced technologies permits the assessment of biological systems. Studying integrated omics profiles, such as genomic, transcriptomic, proteomic, and other omics information, has led to significant advances in personalized and precision medicine. In the context of precision medicine, pharmacogenomics can play an important role in discriminating responders from non-responders to medications, avoiding toxicity, and achieving the optimum dose. Thus, precision medicine, together with genomic medicine, will transform diagnosis and treatment from conventional evidence-based medicine toward precision-based medicine. In this review, we summarize issues related to genomic medicine and precision medicine.
Affiliation(s)
- Mandana Hasanzad
- Medical Genomics Research Center, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Personalized Medicine Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, No. 10, Jalal-e-Ale-Ahmad Street, Chamran Highway, 1411713119 Tehran, Iran
| | - Negar Sarhangi
- Personalized Medicine Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, No. 10, Jalal-e-Ale-Ahmad Street, Chamran Highway, 1411713119 Tehran, Iran
| | - Anoosh Naghavi
- Cellular and Molecular Research Center, Zahedan University of Medical Sciences, Zahedan, Iran
| | - Ehsan Ghavimehr
- Medical Genomics Research Center, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
| | - Fatemeh Khatami
- Urology Research Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Bagher Larijani
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Hamid Reza Aghaei Meybodi
- Personalized Medicine Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, No. 10, Jalal-e-Ale-Ahmad Street, Chamran Highway, 1411713119 Tehran, Iran
34
|
Mao Z, Gray ALH, Gross MD, Thyagarajan B, Bostick RM. Associations of DNA Base Excision Repair and Antioxidant Enzyme Genetic Risk Scores with Biomarker of Systemic Inflammation. FRONTIERS IN AGING 2022; 3:897907. [PMID: 36338835 PMCID: PMC9632613 DOI: 10.3389/fragi.2022.897907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 04/14/2022] [Indexed: 06/16/2023]
Abstract
Background: Inflammation is implicated in the etiology of various aging-related diseases. Numerous dietary and lifestyle factors contribute to chronic systemic inflammation; genetic variation may too. However, despite biological plausibility, little is known about associations of antioxidant enzyme (AE) and DNA base excision repair (BER) genotypes with human systemic inflammation. Methods: We genotyped 22 single nucleotide polymorphisms (SNPs) in 3 AE genes and 79 SNPs in 14 BER genes to develop inflammation-specific AE and BER genetic risk scores (GRS) in two pooled cross-sectional studies (n = 333) of 30–74-year-old White adults without inflammatory bowel disease, familial adenomatous polyposis, or a history of cancer or colorectal adenoma. Based on their associations with a biomarker of systemic inflammation, circulating high-sensitivity C-reactive protein (hsCRP) concentrations, we selected 2 SNPs in 2 genes (CAT and MnSOD) for an AE GRS and 7 SNPs in 5 genes (MUTYH, SMUG1, TDG, UNG, and XRCC1) for a BER GRS. A higher GRS indicates a higher balance of variant alleles directly associated with hsCRP relative to variant alleles inversely associated with hsCRP. We also calculated previously reported, validated, questionnaire-based dietary (DIS) and lifestyle (LIS) inflammation scores. We used multivariable general linear regression to compare mean hsCRP concentrations across AE and BER GRS categories, individually and jointly with the DIS and LIS. Results: The mean hsCRP concentrations among those in the highest relative to the lowest AE and BER GRS categories were, proportionately, 13.9% (p = 0.30) and 57.4% (p = 0.009) higher. Neither GRS clearly appeared to modify the associations of the DIS or LIS with hsCRP. Conclusion: Our findings suggest that genotypes of DNA BER genes collectively may be associated with systemic inflammation in humans.
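The abstract describes the GRS only in qualitative terms. One simple scoring that fits the description is assumed here for illustration; it is not necessarily the authors' exact weighting. The score adds the variant-allele count (0, 1, or 2) of each SNP whose variant allele is directly associated with hsCRP. It subtracts that count for each inversely associated SNP. The SNP identifiers and directions below are hypothetical placeholders.

```python
from typing import Mapping


def signed_allele_grs(genotypes: Mapping[str, int], directions: Mapping[str, int]) -> int:
    """Unweighted signed genetic risk score (assumed scheme, for illustration).

    genotypes:  variant-allele count (0, 1 or 2) per SNP.
    directions: +1 if the variant allele is directly associated with the biomarker,
                -1 if it is inversely associated.
    """
    score = 0
    for snp, sign in directions.items():
        if sign not in (1, -1):
            raise ValueError(f"direction for {snp} must be +1 or -1, got {sign}")
        count = genotypes[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2, got {count}")
        score += sign * count  # direct alleles raise the score, inverse alleles lower it
    return score


# Hypothetical three-SNP panel: two "direct" SNPs and one "inverse" SNP.
directions = {"snp_a": +1, "snp_b": +1, "snp_c": -1}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 1}
print(signed_allele_grs(person, directions))  # 2 + 1 - 1 = 2
```

Each individual's score would then be assigned to a category before mean hsCRP is compared across categories. The abstract does not say how the category cut points were chosen.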
Affiliation(s)
- Ziling Mao
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States
| | - Abigail L. H. Gray
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States
| | - Myron D. Gross
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, United States
| | - Bharat Thyagarajan
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, United States
| | - Roberd M. Bostick
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Winship Cancer Institute, Emory University, Atlanta, GA, United States
35
|
Auwerx C, Sadler MC, Reymond A, Kutalik Z. From pharmacogenetics to pharmaco-omics: Milestones and future directions. HGG ADVANCES 2022; 3:100100. [PMID: 35373152 PMCID: PMC8971318 DOI: 10.1016/j.xhgg.2022.100100] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/04/2022]
Abstract
The origins of pharmacogenetics date back to the 1950s, when it was established that inter-individual differences in drug response are partially determined by genetic factors. Since then, pharmacogenetics has grown into its own field, motivated by the translation of identified gene-drug interactions into therapeutic applications. Despite numerous challenges ahead, our understanding of the human pharmacogenetic landscape has greatly improved thanks to the integration of tools originating from disciplines as diverse as biochemistry, molecular biology, statistics, and computer sciences. In this review, we discuss past, present, and future developments of pharmacogenetics methodology, focusing on three milestones: how early research established the genetic basis of drug responses, how technological progress made it possible to assess the full extent of pharmacological variants, and how multi-dimensional omics datasets can improve the identification, functional validation, and mechanistic understanding of the interplay between genes and drugs. We outline novel strategies to repurpose and integrate molecular and clinical data originating from biobanks to gain insights analogous to those obtained from randomized controlled trials. Emphasizing the importance of increased diversity, we envision future directions for the field that should pave the way to the clinical implementation of pharmacogenetics.
Affiliation(s)
- Chiara Auwerx
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Center for Primary Care and Public Health, Lausanne, Switzerland
| | - Marie C. Sadler
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Center for Primary Care and Public Health, Lausanne, Switzerland
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Center for Primary Care and Public Health, Lausanne, Switzerland
36
|
Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, Khera AV, Okada Y, Martin AR, Finucane HK, Price AL. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet 2022; 54:450-458. [PMID: 35393596 PMCID: PMC9009299 DOI: 10.1038/s41588-022-01036-9] [Citation(s) in RCA: 108] [Impact Index Per Article: 54.0] [Received: 01/10/2021] [Accepted: 02/25/2022] [Indexed: 01/25/2023]
Abstract
Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
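PolyPred is described as a combination of two predictors. The Python sketch below shows one such combination under stated assumptions; it is not a reproduction of the PolyPred software. It assumes the final score is a linear mix w1·p1 + w2·p2, with non-negative weights fit by least squares against phenotypes in a held-out tuning sample, and with all values mean-centered. The function names, predictor names, and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sse(y, p1, p2, w: Tuple[float, float]) -> float:
    """Residual sum of squares of y against w1*p1 + w2*p2."""
    return sum((yi - w[0] * a - w[1] * b) ** 2 for yi, a, b in zip(y, p1, p2))


def nnls_two_predictors(y, p1, p2) -> Tuple[float, float]:
    """Non-negative least squares for y ~ w1*p1 + w2*p2 (no intercept; inputs assumed centered).

    With two weights, the constrained optimum is the unconstrained OLS fit (if both
    weights are >= 0), a single-predictor fit, or (0, 0). Every feasible candidate is
    evaluated and the one with the smallest residual sum of squares is kept.
    """
    a11, a22, a12 = _dot(p1, p1), _dot(p2, p2), _dot(p1, p2)
    b1, b2 = _dot(p1, y), _dot(p2, y)
    candidates: List[Tuple[float, float]] = [(0.0, 0.0)]
    det = a11 * a22 - a12 * a12
    if det > 0:
        w1 = (a22 * b1 - a12 * b2) / det
        w2 = (a11 * b2 - a12 * b1) / det
        if w1 >= 0 and w2 >= 0:
            candidates.append((w1, w2))
    if a11 > 0:
        candidates.append((max(b1 / a11, 0.0), 0.0))  # best fit using p1 alone
    if a22 > 0:
        candidates.append((0.0, max(b2 / a22, 0.0)))  # best fit using p2 alone
    return min(candidates, key=lambda w: _sse(y, p1, p2, w))


def combine(p1, p2, w: Tuple[float, float]) -> List[float]:
    """Apply the mixing weights to produce the combined score for new individuals."""
    return [w[0] * a + w[1] * b for a, b in zip(p1, p2)]


# Hypothetical, mean-centered tuning-sample values for two predictors and a phenotype.
fine_mapping_pred = [-1.5, -0.5, 0.5, 1.5]
bolt_lmm_pred = [0.5, -1.5, -0.5, 1.5]
phenotype = [-0.9, -0.8, 0.2, 1.5]
weights = nnls_two_predictors(phenotype, fine_mapping_pred, bolt_lmm_pred)
print([round(w, 6) for w in weights])  # [0.7, 0.3]
```

Keeping the weights non-negative stops either predictor from entering the mix with a reversed sign. PolyPred+ mixes three predictors. With more than two weights, a general solver such as `scipy.optimize.nnls` would replace the two-weight enumeration used here.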
Affiliation(s)
- Omer Weissbrod
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA.
| | - Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Huwenbo Shi
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- OMNI Bioinformatics, San Francisco, CA, USA
| | - Steven Gazal
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Wouter J Peyrot
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands
| | - Amit V Khera
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Verve Therapeutics, Cambridge, MA, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Alkes L Price
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
37
|
Association between TAP gene polymorphisms and tuberculosis susceptibility in a Han Chinese population in Guangdong. Mol Genet Genomics 2022; 297:779-790. [PMID: 35325275 PMCID: PMC8943507 DOI: 10.1007/s00438-022-01885-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Accepted: 03/08/2022] [Indexed: 12/02/2022]
Abstract
Tuberculosis (TB) is an important public health problem. Studies have indicated that the transporter associated with antigen processing (TAP) plays a key role in the transport and presentation of antigenic peptides during the immune response to Mycobacterium tuberculosis (M.tb) infection. Given this biological role, a family-based case–control study including 133 TB patients, 107 healthy household contacts (HHC), and 173 healthy controls (HC) was conducted to assess the association between TAP gene polymorphisms and TB susceptibility. Baseline information and blood samples were collected from all subjects. Four SNPs (rs1135216, rs1057141, rs241447, and rs3819721) were genotyped by polymerase chain reaction-restriction fragment length polymorphism (PCR–RFLP). Our results suggested that BMI, residence, bedroom crowding, indoor humidity, fitness activities, smoking history, and TB exposure history were associated with the occurrence of TB (P < 0.05). The TAP1 rs1135216 CT and CC genotypes were significantly associated with increased TB risk (OR = 2.56, 95% CI 1.31–4.99 and OR = 6.73, 95% CI 1.33–34.02, respectively). Carriers of the TAP2 rs3819721 GG genotype also showed an increased risk of TB when TB patients were compared with HHC. Haplotype analysis revealed that the CT haplotype at rs1057141 and rs1135216 (OR = 11.34, 95% CI 1.49–86.56; OR = 7.45, 95% CI 1.43–38.76) and the TA haplotype at rs241447 and rs3819721 (OR = 2.20, 95% CI 1.07–4.56) were associated with a significantly increased risk of TB. Genetic risk score (GRS) analysis of the four loci indicated that TB risk increased with increasing GRS in both the TB vs HHC (Ptrend = 0.010) and TB vs HC (Ptrend = 0.001) comparisons.
In conclusion, our findings suggest that rs1135216 and rs3819721 are associated with TB susceptibility among TB-prone families in the Han Chinese population and that the risk of developing TB increases with the number of risk alleles, which could help identify high-risk groups early and guide preventive measures. Larger cohort studies are needed to validate the role of TAP gene variants in TB susceptibility.
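In its simplest unweighted form, the GRS described above is a count of the risk alleles a person carries across the genotyped loci. The sketch below illustrates that idea only. The abstract does not state which allele was scored as the risk allele at every locus or whether the score was weighted, so the entries in the hypothetical `RISK_ALLELE` table are placeholders: rs1135216 C and rs3819721 G follow the genotype results reported above, and the other two are assumptions.

```python
# Hypothetical risk-allele assignments for illustration only. rs1135216 (C) and
# rs3819721 (G) follow the genotype results in the abstract; the alleles for
# rs1057141 and rs241447 are placeholders, not values reported by the study.
RISK_ALLELE = {
    "rs1135216": "C",
    "rs1057141": "C",
    "rs241447": "T",
    "rs3819721": "G",
}


def genetic_risk_score(genotypes: dict) -> int:
    """Unweighted GRS: number of risk alleles carried across the four loci.

    `genotypes` maps an SNP id to a two-letter genotype such as "CT".
    Loci missing from the dict contribute 0 (a simplifying assumption).
    """
    score = 0
    for snp, risk_allele in RISK_ALLELE.items():
        genotype = genotypes.get(snp, "").upper()
        score += genotype.count(risk_allele)
    return score


if __name__ == "__main__":
    person = {"rs1135216": "CT", "rs1057141": "CC", "rs241447": "TC", "rs3819721": "GG"}
    print(genetic_risk_score(person))  # 1 + 2 + 1 + 2 = 6
```

Individuals can then be grouped by score so that a trend in TB risk across GRS categories can be tested.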
38
The Road Traveled and Journey Ahead for the Genetics and Genomics of Tinnitus. Mol Diagn Ther 2022; 26:129-136. [PMID: 35167110 PMCID: PMC8942952 DOI: 10.1007/s40291-022-00578-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/16/2022] [Indexed: 10/29/2022]
Abstract
The feasibility of unraveling genetic and genomic signatures of disorders affecting the auditory system has increased since the arrival of the post-genomics era roughly 20 years ago. Newly emerging studies have provided initial evidence of heritability and, thus, of a genetic link to severe tinnitus. Tinnitus, the phantom perception of ringing in the ears, is experienced by at least 15% of the adult population and can be extremely disabling. Despite its ubiquity, there is no cure for tinnitus, and modalities offering relief are often of limited success. Because tinnitus is frequently reported in patients with acquired conductive or sensorineural hearing impairment, it has been widely accepted that tinnitus is secondary to, and a symptom of, hearing impairment. However, tinnitus has also been identified in the absence of auditory dysfunction and in young individuals, prompting a debate about its origins. Genetic studies have identified severe tinnitus as a complex disorder arising from gene-environment interactions, refining its classification as a neurological disorder and indicating that, in at least a subset of patients, it is not merely a symptom of another health issue. This current opinion summarizes several recent studies that have challenged a long-accepted dogma and considers how this information could eventually be used to help patients. It is our hope that this knowledge opens translational paths to relief for the many who suffer from the burden of tinnitus daily.
39
Ding Y, Hou K, Burch KS, Lapinska S, Privé F, Vilhjálmsson B, Sankararaman S, Pasaniuc B. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat Genet 2022; 54:30-39. [PMID: 34931067 PMCID: PMC8758557 DOI: 10.1038/s41588-021-00961-5] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 11/28/2020] [Accepted: 09/29/2021] [Indexed: 01/05/2023]
Abstract
Although the cohort-level accuracy of polygenic risk scores (PRSs), which estimate genetic value at the individual level, has been widely assessed, uncertainty in individual PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British' individuals), we observe large variances in individual PRS estimates, which affect the interpretation of PRS-based stratification; averaged across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expected individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
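The top-decile statistic above can be illustrated with a toy simulation. The sketch assumes each individual's PRS posterior is represented by a list of samples, takes the posterior mean as the point estimate, and counts how many top-decile individuals have an equal-tailed 95% credible interval whose lower bound clears the top-decile cutoff. The generator `simulate_posterior` is hypothetical and stands in for samples from a real Bayesian PRS method; it is not the authors' pipeline.

```python
import random
import statistics


def top_decile_certainty(posterior):
    """Fraction of top-decile individuals whose 95% credible interval lies in the top decile.

    `posterior` is a list with one entry per individual, each a list of
    posterior PRS samples. The point estimate is the posterior mean.
    """
    means = [statistics.fmean(samples) for samples in posterior]
    # 90th percentile of the point estimates = lower edge of the top decile.
    cutoff = statistics.quantiles(means, n=10, method="inclusive")[-1]
    top = [i for i, m in enumerate(means) if m >= cutoff]
    contained = 0
    for i in top:
        # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; the first one is the
        # lower bound of an equal-tailed 95% credible interval.
        lower = statistics.quantiles(posterior[i], n=40, method="inclusive")[0]
        if lower >= cutoff:
            contained += 1
    return contained / len(top)


def simulate_posterior(n_people, n_samples, posterior_sd, seed=0):
    """Hypothetical toy data: each person's samples scatter around a random centre."""
    rng = random.Random(seed)
    posterior = []
    for _ in range(n_people):
        centre = rng.gauss(0.0, 1.0)
        posterior.append([rng.gauss(centre, posterior_sd) for _ in range(n_samples)])
    return posterior


if __name__ == "__main__":
    for sd in (0.05, 0.5, 1.0):
        frac = top_decile_certainty(simulate_posterior(1000, 200, sd))
        print(f"posterior sd = {sd}: {frac:.3f} of top-decile individuals certain")
```

Shrinking `posterior_sd` pushes the fraction toward 1, while a posterior spread comparable to the between-person spread pushes it toward 0. This is consistent with the point made above that point estimates alone overstate how confidently individuals can be placed in the top decile.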
Affiliation(s)
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Sandra Lapinska
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Florian Privé
- Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Bjarni Vilhjálmsson
- Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Sriram Sankararaman
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
40
AIM in Medical Informatics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
41
Kloeve-Mogensen K, Rohde PD, Twisttmann S, Nygaard M, Koldby KM, Steffensen R, Dahl CM, Rytter D, Overgaard MT, Forman A, Christiansen L, Nyegaard M. Polygenic Risk Score Prediction for Endometriosis. FRONTIERS IN REPRODUCTIVE HEALTH 2021; 3:793226. [PMID: 36303976 PMCID: PMC9580817 DOI: 10.3389/frph.2021.793226] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/11/2021] [Accepted: 11/09/2021] [Indexed: 12/19/2022]
Abstract
Endometriosis is a major health care challenge because many young women with endometriosis go undetected for an extended period, which may lead to pain sensitization. Clinical tools to better identify candidates for laparoscopy-guided diagnosis are urgently needed. Since endometriosis has a strong genetic component, there is growing interest in using genetics as part of clinical risk assessment. The aim of this work was to investigate the discriminative ability of a polygenic risk score (PRS) for endometriosis using three different cohorts: surgically confirmed cases from the Western Danish endometriosis referral Center (249 cases, 348 controls), cases identified from the Danish Twin Registry (DTR) based on ICD-10 codes from the National Patient Registry (140 cases, 316 controls), and a replication analysis in the UK Biobank (2,967 cases, 256,222 controls). Patients with adenomyosis from the DTR (25 cases) and from the UK Biobank (1,883 cases) were included for comparison. The PRS was derived from 14 genetic variants identified in a published genome-wide association study with more than 17,000 cases. The PRS was associated with endometriosis in surgically confirmed cases [odds ratio (OR) = 1.59, p = 2.57 × 10⁻⁷] and in cases from the DTR biobank (OR = 1.50, p = 0.0001). In the combined Danish cohorts, each standard deviation increase in PRS was associated with endometriosis (OR = 1.57, p = 2.5 × 10⁻¹¹), as well as with the major subtypes of endometriosis: ovarian (OR = 1.72, p = 6.7 × 10⁻⁵), infiltrating (OR = 1.66, p = 2.7 × 10⁻⁹), and peritoneal (OR = 1.51, p = 2.6 × 10⁻³). These findings were replicated in the UK Biobank, which had a much larger sample size (OR = 1.28, p < 2.2 × 10⁻¹⁶). The PRS was not associated with adenomyosis, suggesting that adenomyosis is not driven by the same genetic risk variants as endometriosis.
Our results suggest that the PRS captures an increased risk of all types of endometriosis rather than an increased risk of endometriosis in specific locations. Although its discriminative accuracy is not yet sufficient for stand-alone clinical use, our data demonstrate that genetic risk variants in the form of a simple PRS may add significant new discriminatory value. We suggest that an endometriosis PRS combined with classical clinical risk factors and symptoms could be an important step toward an urgently needed endometriosis risk stratification tool.
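A PRS of this kind is commonly computed as a sum of effect-allele dosages weighted by the log odds ratios reported in a GWAS, then standardized so that associations can be reported per standard deviation. The sketch below assumes that construction; the three variants and their odds ratios in `GWAS_OR` are invented placeholders, not the 14 published variants. Under a logistic model, a per-SD odds ratio compounds multiplicatively, so `odds_multiplier` shows, for example, that a 2-SD difference at OR = 1.57 per SD corresponds to roughly a 2.46-fold difference in odds.

```python
import math
import statistics

# Hypothetical per-variant odds ratios standing in for published GWAS estimates;
# the endometriosis score used 14 variants whose weights are not given here.
GWAS_OR = {"variant_a": 1.20, "variant_b": 1.15, "variant_c": 0.90}


def raw_prs(dosages):
    """Sum of effect-allele dosages (0, 1 or 2) weighted by ln(OR)."""
    return sum(math.log(odds) * dosages.get(v, 0) for v, odds in GWAS_OR.items())


def standardize(scores):
    """Convert raw scores to z-scores so effects can be reported per SD."""
    mu = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]


def odds_multiplier(or_per_sd, delta_sd):
    """Under a logistic model, odds scale by or_per_sd ** delta_sd."""
    return or_per_sd ** delta_sd


if __name__ == "__main__":
    cohort = [
        {"variant_a": 2, "variant_b": 1, "variant_c": 0},
        {"variant_a": 0, "variant_b": 1, "variant_c": 2},
        {"variant_a": 1, "variant_b": 0, "variant_c": 1},
    ]
    z = standardize([raw_prs(d) for d in cohort])
    print([round(v, 2) for v in z])
    print(round(odds_multiplier(1.57, 2), 2))  # 1.57 ** 2 = 2.4649, printed as 2.46
```

In practice the per-SD odds ratio itself would come from a logistic regression of case status on the standardized score, adjusted for any covariates.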
Affiliation(s)
- Kirstine Kloeve-Mogensen
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark
- Palle Duun Rohde
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Simone Twisttmann
- The Danish Twin Registry, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Marianne Nygaard
- The Danish Twin Registry, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Rudi Steffensen
- Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark
- Christian Møller Dahl
- Department of Business and Economics, University of Southern Denmark, Odense, Denmark
- Dorte Rytter
- Research Unit for Epidemiology, Department of Public Health, Aarhus University, Aarhus, Denmark
- Axel Forman
- Department of Gynecology and Obstetrics, Aarhus University Hospital, Skejby, Denmark
- Lene Christiansen
- The Danish Twin Registry, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Mette Nyegaard
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Correspondence: Mette Nyegaard
42
Colbran LL, Johnson MR, Mathieson I, Capra JA. Tracing the Evolution of Human Gene Regulation and Its Association with Shifts in Environment. Genome Biol Evol 2021; 13:evab237. [PMID: 34718543 PMCID: PMC8576593 DOI: 10.1093/gbe/evab237] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 10/16/2021] [Indexed: 12/16/2022]
Abstract
As humans populated the world, they adapted to many varying environmental factors, including climate, diet, and pathogens. Because many of these adaptations were mediated by multiple noncoding variants with small effects on gene regulation, it has been difficult to link genomic signals of selection to specific genes, and to describe the regulatory response to selection. To overcome this challenge, we adapted PrediXcan, a machine learning method for imputing gene regulation from genotype data, to analyze low-coverage ancient human DNA (aDNA). First, we used simulated genomes to benchmark strategies for adapting PrediXcan to increase robustness to incomplete data. Applying the resulting models to 490 ancient Eurasians, we found that genes with the strongest divergent regulation among ancient populations with hunter-gatherer, pastoralist, and agricultural lifestyles are enriched for metabolic and immune functions. Next, we explored the contribution of divergent gene regulation to two traits with strong evidence of recent adaptation: dietary metabolism and skin pigmentation. We found enrichment for divergent regulation among genes proposed to be involved in diet-related local adaptation, and the predicted effects on regulation often suggest explanations for known signals of selection, for example, at FADS1, GPX1, and LEPR. In contrast, skin pigmentation genes show little regulatory change over a 38,000-year time series of 2,999 ancient Europeans, suggesting that adaptation mainly involved large-effect coding variants. This work demonstrates that combining aDNA with present-day genomes is informative about the biological differences among ancient populations, the role of gene regulation in adaptation, and the relationship between genetic diversity and complex traits.
Affiliation(s)
- Laura L Colbran
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, USA
- Maya R Johnson
- School for Science and Math at Vanderbilt, Vanderbilt University, USA
- Department of Computer Science, Bryn Mawr College, Pennsylvania, USA
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, USA
- John A Capra
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, USA
- Department of Biological Sciences, Vanderbilt University, USA
- Department of Biomedical Informatics, Vanderbilt University, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, USA
43
Jonnagaddala J, Chen A, Batongbacal S, Nekkantti C. The OpenDeID corpus for patient de-identification. Sci Rep 2021; 11:19973. [PMID: 34620985 PMCID: PMC8497517 DOI: 10.1038/s41598-021-99554-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/24/2021] [Accepted: 09/28/2021] [Indexed: 11/18/2022]
Abstract
For research purposes, protected health information is often redacted from unstructured electronic health records to preserve patient privacy and confidentiality. The OpenDeID corpus is designed to assist the development of automatic methods for redacting sensitive information from unstructured electronic health records. We retrieved 4,548 unstructured surgical pathology reports from four urban Australian hospitals. The corpus was developed by two annotators under three experimental settings (serial annotation, parallel annotation, and pre-annotation), and the quality of the annotations was evaluated for each setting. Our results suggest that the pre-annotation approach is less reliable in terms of quality than serial annotation but can drastically reduce annotation time. The OpenDeID corpus comprises 2,100 pathology reports from 1,833 cancer patients, with an average of 737.49 tokens and 7.35 annotated protected health information entities per report. The overall inter-annotator agreement and deviation scores are 0.9464 and 0.9726, respectively. Realistic surrogates were also generated to make the corpus suitable for distribution to other researchers.
Affiliation(s)
- Aipeng Chen
- School of Computer Science and Engineering, UNSW Sydney, Sydney, Australia
- Sean Batongbacal
- School of Computer Science and Engineering, UNSW Sydney, Sydney, Australia
44
DiBlasi E, Kang J, Docherty AR. Genetic contributions to suicidal thoughts and behaviors. Psychol Med 2021; 51:2148-2155. [PMID: 34030748 PMCID: PMC8477225 DOI: 10.1017/s0033291721001720] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 10/19/2020] [Revised: 03/28/2021] [Accepted: 04/19/2021] [Indexed: 12/27/2022]
Abstract
Suicidal ideation, suicide attempt (SA) and suicide are significantly heritable phenotypes. However, the extent to which these phenotypes share genetic architecture is unclear. This question is highly relevant to determining key risk factors for suicide and to alleviating the societal burden of suicidal thoughts and behaviors (STBs). To help address the question of heterogeneity, consortium efforts have recently shifted from a focus on suicide within the context of major psychopathology (e.g. major depressive disorder, schizophrenia) to suicide as an independent entity. Recent molecular studies of suicide risk by members of the Psychiatric Genomics Consortium and the International Suicide Genetics Consortium have identified genome-wide significant loci associated with SA and with suicide death, and have examined these phenotypes within and outside the context of major psychopathology. This review summarizes important insights from epidemiological and biometrical research on suicide and discusses key empirical findings from molecular genetic examinations of STBs. Polygenic risk scores for these phenotypes have been associated with case-control status and with other risk phenotypes. In addition, estimated shared genetic covariance with other phenotypes suggests specific medical and psychiatric risks beyond major depressive disorder. Broadly, molecular studies suggest a complexity of suicide etiology that cannot be accounted for by depression alone. The discussion of the state of suicide genetics, a growing field, also covers important ethical and clinical implications of studying the genetic risk of suicide.
Affiliation(s)
- Emily DiBlasi
- Department of Psychiatry & the Center for Genomic Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA
- Huntsman Mental Health Institute, Salt Lake City, UT, USA
- Jooeun Kang
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Anna R. Docherty
- Department of Psychiatry & the Center for Genomic Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA
- Huntsman Mental Health Institute, Salt Lake City, UT, USA
- Virginia Institute for Psychiatric & Behavioral Genetics, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
45
Lin C, Lee YT, Wu FJ, Lin SA, Hsu CJ, Lee CC, Tsai DJ, Fang WH. The Application of Projection Word Embeddings on Medical Records Scoring System. Healthcare (Basel) 2021; 9:healthcare9101298. [PMID: 34682978 PMCID: PMC8544381 DOI: 10.3390/healthcare9101298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 09/24/2021] [Accepted: 09/28/2021] [Indexed: 11/16/2022]
Abstract
Medical records scoring is important in a health care system. Artificial intelligence (AI) with projection word embeddings, which preserve both the vocabulary diversity of open internet databases and the understanding of medical terminology in electronic health records (EHRs), has been validated on disease coding tasks. We hypothesized that an AI-enhanced system could also be applied to score medical records automatically. This study aimed to develop a series of deep learning models (DLMs), validate their performance on a medical records scoring task, and analyze the practical value of the best model. We used admission medical records from the Tri-Service General Hospital from January 2016 to May 2020, which had been scored by visiting staff of different levels from different departments. Scores ranged from 0 to 10. All samples were divided by time into a training set (n = 74,959) and a testing set (n = 152,730), which were used to train and validate the DLMs, respectively. The mean absolute error (MAE) was used to evaluate the performance of each DLM. In the original AI medical record scoring, scores predicted by the BERT architecture were closer to the actual reviewer scores than those predicted by the projection word embedding and LSTM architecture: the original MAE was 0.84 ± 0.27 for the BERT model and 1.00 ± 0.32 for the LSTM model. A linear mixed model can be used to improve model performance, bringing the adjusted predicted scores closer to the original scores. After linear mixed model enhancement, however, the projection word embedding with the LSTM model (0.66 ± 0.39) performed better than BERT (0.70 ± 0.33) (p < 0.001). In addition to comparing different architectures for scoring medical records, this study used a linear mixed model to successfully adjust AI medical record scores, bringing them closer to the actual physicians' scores.
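MAE, the evaluation metric named above, is the average absolute difference between the model's predicted scores and the reviewers' scores. A minimal sketch follows; clipping predictions to the 0 to 10 scoring range is an assumption added for illustration, not a step described in the abstract.

```python
def mean_absolute_error(predicted, actual, low=0.0, high=10.0):
    """MAE between model scores and reviewer scores.

    Predictions are clipped to the [low, high] scoring range first; the
    clipping step is an illustrative assumption, not part of the study.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    clipped = [min(max(p, low), high) for p in predicted]
    return sum(abs(p - a) for p, a in zip(clipped, actual)) / len(actual)


if __name__ == "__main__":
    model_scores = [7.5, 10.4, 6.0]
    reviewer_scores = [8, 10, 7]
    print(mean_absolute_error(model_scores, reviewer_scores))  # (0.5 + 0 + 1) / 3 = 0.5
```

Lower MAE means predictions sit closer to the reviewers' scores on average, which is the sense in which the abstract compares BERT and LSTM models before and after adjustment.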
Affiliation(s)
- Chin Lin
- School of Medicine, National Defense Medical Center, Taipei 114, Taiwan
- School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan
- Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Yung-Tsai Lee
- Division of Cardiovascular Surgery, Cheng Hsin Rehabilitation and Medical Center, Taipei 112, Taiwan
- Feng-Jen Wu
- Department of Informatics, Taoyuan Armed Forces General Hospital, Taoyuan 325, Taiwan
- Shing-An Lin
- Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chia-Jung Hsu
- Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chia-Cheng Lee
- Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Division of Colorectal Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Dung-Jang Tsai
- School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan
- Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Correspondence (D.-J.T.; W.-H.F.): Tel.: +886-2-8792-3100 (ext. #18305) (D.-J.T.); +886-2-8792-3100 (ext. #12322) (W.-H.F.); Fax: +886-2-8792-3147 (D.-J.T. & W.-H.F.)
- Wen-Hui Fang
- Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Department of Family and Community Medicine, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
46
Kim DS, Gloyn AL, Knowles JW. Genetics of Type 2 Diabetes: Opportunities for Precision Medicine: JACC Focus Seminar. J Am Coll Cardiol 2021; 78:496-512. [PMID: 34325839 PMCID: PMC8328195 DOI: 10.1016/j.jacc.2021.03.346] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/11/2021] [Revised: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 12/30/2022]
Abstract
Type 2 diabetes (T2D) is highly prevalent and is a strong contributor to cardiovascular disease. However, there is significant heterogeneity in disease pathogenesis and in the risk of complications. Enormous progress has been made in our ability to catalog genetic variation associated with T2D risk and with variation in disease-relevant quantitative traits. These discoveries could shed light on tractable targets and pathways for safe and effective therapeutic development, but the promise of precision medicine has been slow to be realized. Recent studies have identified subgroups of individuals with differential risk for intermediate phenotypes (eg, lipid levels, fasting insulin, body mass index) that contribute to T2D risk, helping to account for the observed clinical heterogeneity. These "partitioned genetic risk scores" not only have the potential to identify patients at greatest risk of cardiovascular disease and rapid disease progression, but could also aid patient stratification, bridging the gap toward precision medicine for T2D.
Affiliation(s)
- Daniel Seung Kim
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Anna L Gloyn
- Division of Endocrinology, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
- Stanford Diabetes Research Center, Stanford University, Stanford, California, USA
- Joshua W Knowles
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Stanford Diabetes Research Center, Stanford University, Stanford, California, USA
- Stanford Cardiovascular Institute, Stanford University, Stanford, California, USA
47
Liu C, Zeinomar N, Chung WK, Kiryluk K, Gharavi AG, Hripcsak G, Crew KD, Shang N, Khan A, Fasel D, Manolio TA, Jarvik GP, Rowley R, Justice AE, Rahm AK, Fullerton SM, Smoller JW, Larson EB, Crane PK, Dikilitas O, Wiesner GL, Bick AG, Terry MB, Weng C. Generalizability of Polygenic Risk Scores for Breast Cancer Among Women With European, African, and Latinx Ancestry. JAMA Netw Open 2021; 4:e2119084. [PMID: 34347061 PMCID: PMC8339934 DOI: 10.1001/jamanetworkopen.2021.19084] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/23/2022]
Abstract
IMPORTANCE Multiple polygenic risk scores (PRSs) for breast cancer have been developed from large research consortia; however, their generalizability to diverse clinical settings is unknown. OBJECTIVE To examine the performance of previously developed breast cancer PRSs in a clinical setting for women of European, African, and Latinx ancestry. DESIGN, SETTING, AND PARTICIPANTS This cohort study using the Electronic Medical Records and Genomics (eMERGE) network data set included 39 591 women from 9 contributing medical centers in the US that had electronic medical records (EMR) linked to genotype data. Breast cancer cases and controls were identified through a validated EMR phenotyping algorithm. MAIN OUTCOMES AND MEASURES Multivariable logistic regression was used to assess the association between breast cancer risk and 7 previously developed PRSs, adjusting for age, study site, breast cancer family history, and first 3 ancestry informative principal components. RESULTS This study included 39 591 women: 33 594 with European, 3801 with African, and 2196 with Latinx ancestry. The mean (SD) age at breast cancer diagnosis was 60.7 (13.0), 58.8 (12.5), and 60.1 (13.0) years for women with European, African, and Latinx ancestry, respectively. PRSs derived from women with European ancestry were associated with breast cancer risk in women with European ancestry (highest odds ratio [OR] per 1-SD increase, 1.46; 95% CI, 1.41-1.51), women with Latinx ancestry (highest OR, 1.31; 95% CI, 1.09-1.58), and women with African ancestry (OR, 1.19; 95% CI, 1.05-1.35). For women with European ancestry, this association with breast cancer risk was largest in the extremes of the PRS distribution, with ORs ranging from 2.19 (95% CI, 1.84-2.53) to 2.48 (95% CI, 1.89-3.25) for the 3 different PRSs examined for those in the highest 1% of the PRS compared with those in the middle quantile. 
Among women with Latinx and African ancestries at the extremes of the PRS distribution, there were no statistically significant associations. CONCLUSIONS AND RELEVANCE This cohort study found that PRS models derived from women with European ancestry for breast cancer risk generalized well for women with European, Latinx, and African ancestries across different clinical settings, although the effect sizes for women with African ancestry were smaller, likely because of differences in risk allele frequencies and linkage disequilibrium patterns. These results highlight the need to improve representation of diverse population groups, particularly women with African ancestry, in genomic research cohorts.
Affiliation(s)
- Cong Liu
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Nur Zeinomar
- Department of Epidemiology, Columbia University Irving Medical Center, New York, New York
- Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey
- Wendy K. Chung
- Department of Pediatrics, Columbia University Irving Medical Center, New York, New York
- Krzysztof Kiryluk
- Department of Medicine, Columbia University Irving Medical Center, New York, New York
- Ali G. Gharavi
- Department of Medicine, Columbia University Irving Medical Center, New York, New York
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Katherine D. Crew
- Department of Medicine, Columbia University Irving Medical Center, New York, New York
- Ning Shang
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Atlas Khan
- Department of Medicine, Columbia University Irving Medical Center, New York, New York
- David Fasel
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Teri A. Manolio
- National Human Genome Research Institute, Bethesda, Maryland
- Gail P. Jarvik
- Department of Medicine, University of Washington, Seattle
- Robb Rowley
- National Human Genome Research Institute, Bethesda, Maryland
- Ann E. Justice
- Department of Population Health Sciences, Geisinger, Danville, Pennsylvania
- Alanna K. Rahm
- Genomic Medicine Institute, Geisinger, Danville, Pennsylvania
- Jordan W. Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Eric B. Larson
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Paul K. Crane
- Department of Medicine, University of Washington, Seattle
- Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota
- Georgia L. Wiesner
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Alexander G. Bick
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Mary Beth Terry
- Department of Epidemiology, Columbia University Irving Medical Center, New York, New York
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
48
Slunecka JL, van der Zee MD, Beck JJ, Johnson BN, Finnicum CT, Pool R, Hottenga JJ, de Geus EJC, Ehli EA. Implementation and implications for polygenic risk scores in healthcare. Hum Genomics 2021; 15:46. [PMID: 34284826 PMCID: PMC8290135 DOI: 10.1186/s40246-021-00339-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/25/2021] [Accepted: 06/11/2021] [Indexed: 12/15/2022]
Abstract
Increasing amounts of genetic data have led to the development of polygenic risk scores (PRSs) for a variety of diseases. These scores, built from the summary statistics of genome-wide association studies (GWASs), are able to stratify individuals based on their genetic risk of developing various common diseases and could potentially be used to optimize the use of screening and preventative treatments and improve personalized care for patients. Many challenges are yet to be overcome, including PRS validation, healthcare professional and patient education, and healthcare systems integration. Ethical challenges are also present in how this information is used and the current lack of diverse populations with PRSs available. In this review, we discuss the topics above and cover the nature of PRSs, visualization schemes, and how PRSs can be improved. With these tools on the horizon for multiple diseases, scientists, clinicians, health systems, regulatory bodies, and the public should discuss the uses, benefits, and potential risks of PRSs.
Affiliation(s)
- John L Slunecka
- Avera Institute for Human Genetics, Avera McKennan & University Health Center, Sioux Falls, SD, USA
- Matthijs D van der Zee
- Department of Biological Psychology, Netherlands Twin Register, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Jeffrey J Beck
- Avera Institute for Human Genetics, Avera McKennan & University Health Center, Sioux Falls, SD, USA
- Brandon N Johnson
- Avera Institute for Human Genetics, Avera McKennan & University Health Center, Sioux Falls, SD, USA
- Casey T Finnicum
- Avera Institute for Human Genetics, Avera McKennan & University Health Center, Sioux Falls, SD, USA
- René Pool
- Department of Biological Psychology, Netherlands Twin Register, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Jouke-Jan Hottenga
- Department of Biological Psychology, Netherlands Twin Register, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Eco J C de Geus
- Department of Biological Psychology, Netherlands Twin Register, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Erik A Ehli
- Avera Institute for Human Genetics, Avera McKennan & University Health Center, Sioux Falls, SD, USA
49
Muse ED, Chen SF, Torkamani A. Monogenic and Polygenic Models of Coronary Artery Disease. Curr Cardiol Rep 2021; 23:107. [PMID: 34196841 DOI: 10.1007/s11886-021-01540-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 05/17/2021] [Indexed: 12/14/2022]
Abstract
PURPOSE OF THE REVIEW: Coronary artery disease (CAD) is a common disease globally, attributable to the interplay of complex genetic and lifestyle factors. Here, we review how genomic sequencing advances have broadened the fundamental understanding of the monogenic and polygenic contributions to CAD and how these insights can be utilized, in part by creating polygenic risk estimates, for improved disease risk stratification at the individual patient level. RECENT FINDINGS: Initial studies linking premature CAD with rare familial cases of elevated blood lipids highlighted high-risk monogenic contributions, predominantly presenting as familial hypercholesterolemia (FH). More commonly, CAD genetic risk is a function of multiple higher-frequency variants, each imparting a lower magnitude of risk, which can be combined to form polygenic risk scores (PRS) conveying significant risk to individuals at the extremes. However, gaps remain in clinical validation of PRSs, most notably in non-European populations. With improved and more broadly utilized genomic sequencing technologies, the genetic underpinnings of coronary artery disease are being unraveled. As a result, polygenic risk estimation is poised to become a widely used and powerful tool in the clinical setting. While the use of PRSs to augment current clinical risk stratification for optimization of cardiovascular disease risk by lifestyle change or therapeutic targeting is promising, we await adequately powered prospective studies demonstrating the clinical utility of polygenic risk estimation in practice.
Affiliation(s)
- Evan D Muse
- Scripps Research Translational Institute, Scripps Research, 3344 N Torrey Pines Court, Suite 300, La Jolla, CA, 92037, USA
- Division of Cardiovascular Diseases, Scripps Clinic, La Jolla, CA, 92037, USA
- Shang-Fu Chen
- Scripps Research Translational Institute, Scripps Research, 3344 N Torrey Pines Court, Suite 300, La Jolla, CA, 92037, USA
- Ali Torkamani
- Scripps Research Translational Institute, Scripps Research, 3344 N Torrey Pines Court, Suite 300, La Jolla, CA, 92037, USA
50
Hartl D, de Luca V, Kostikova A, Laramie J, Kennedy S, Ferrero E, Siegel R, Fink M, Ahmed S, Millholland J, Schuhmacher A, Hinder M, Piali L, Roth A. Translational precision medicine: an industry perspective. J Transl Med 2021; 19:245. [PMID: 34090480 PMCID: PMC8179706 DOI: 10.1186/s12967-021-02910-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/02/2021] [Accepted: 05/25/2021] [Indexed: 02/08/2023]
Abstract
In the era of precision medicine, digital technologies and artificial intelligence, drug discovery and development face unprecedented opportunities for product and business model innovation, fundamentally changing the traditional approach of how drugs are discovered, developed and marketed. Critical to this transformation is the adoption of new technologies in the drug development process, catalyzing the transition from serendipity-driven to data-driven medicine. This paradigm shift comes with a need for both translation and precision, leading to a modern Translational Precision Medicine approach to drug discovery and development. Key components of Translational Precision Medicine are multi-omics profiling, digital biomarkers, model-based data integration, artificial intelligence, biomarker-guided trial designs and patient-centric companion diagnostics. In this review, we summarize and critically discuss the potential and challenges of Translational Precision Medicine from a cross-industry perspective.
Affiliation(s)
- Dominik Hartl
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Department of Pediatrics I, University of Tübingen, Tübingen, Germany
- Valeria de Luca
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Anna Kostikova
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Jason Laramie
- Novartis Institutes for BioMedical Research, Cambridge, MA, USA
- Scott Kennedy
- Novartis Institutes for BioMedical Research, Cambridge, MA, USA
- Enrico Ferrero
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Richard Siegel
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Martin Fink
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Markus Hinder
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Luca Piali
- Roche Innovation Center Basel, Basel, Switzerland
- Adrian Roth
- Roche Innovation Center Basel, Basel, Switzerland