1
Drouet DE, Liu S, Crawford DC. Assessment of multi-population polygenic risk scores for lipid traits in African Americans. PeerJ 2023; 11:e14910. [PMID: 37214096 PMCID: PMC10198155 DOI: 10.7717/peerj.14910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 01/25/2023] [Indexed: 05/24/2023] Open
Abstract
Polygenic risk scores (PRS) based on genome-wide discoveries are promising predictors or classifiers of disease development, severity, and/or progression for common clinical outcomes. A major limitation of most risk scores is the paucity of genome-wide discoveries in diverse populations, prompting an emphasis to generate these needed data for trans-population and population-specific PRS construction. Given diverse genome-wide discoveries are just now being completed, there has been little opportunity for PRS to be evaluated in diverse populations independent from the discovery efforts. To fill this gap, we leverage here summary data from a recent genome-wide discovery study of lipid traits (HDL-C, LDL-C, triglycerides, and total cholesterol) conducted in diverse populations represented by African Americans, Hispanics, Asians, Native Hawaiians, Native Americans, and others by the Population Architecture using Genomics and Epidemiology (PAGE) Study. We constructed lipid trait PRS using PAGE Study published genetic variants and weights in an independent African American adult patient population linked to de-identified electronic health records and genotypes from the Illumina Metabochip (n = 3,254). Using multi-population lipid trait PRS, we assessed levels of association for their respective lipid traits, clinical outcomes (cardiovascular disease and type 2 diabetes), and common clinical labs. While none of the multi-population PRS were strongly associated with the tested trait or outcome, the LDL-C PRS was nominally associated with cardiovascular disease. These data demonstrate the complexity in applying PRS to real-world clinical data even when data from multiple populations are available.
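The abstract says the scores were built from published variants and weights but does not give the formula. A minimal sketch, assuming the common construction in which a PRS is the weighted sum of effect-allele dosages, might look like the following. The function name, the variant IDs, the weights, and the choice to skip variants missing from the genotype data are all illustrative assumptions, not details taken from the study.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages.

    Only variants present in BOTH inputs contribute; variants that were
    not genotyped are skipped (an assumption made for this sketch).
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared)


# Hypothetical variant IDs and per-allele weights, for illustration only.
weights = {"rsA": 0.5, "rsB": -0.25, "rsC": 0.125}

# Hypothetical effect-allele counts (0, 1, or 2) for one individual.
# "rsC" was not genotyped and "rsD" has no published weight.
patient = {"rsA": 2, "rsB": 1, "rsD": 0}

score = polygenic_risk_score(patient, weights)
print(score)
```

Skipping ungenotyped variants keeps the sketch short; an actual analysis would have to decide how to handle missing variants, for example with proxies or imputation.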
Affiliation(s)
- Domenica E. Drouet
- Department of Medicine, Case Western Reserve University, Cleveland, OH, United States of America
- Shiying Liu
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
- Dana C. Crawford
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
- Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
- Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, United States of America
2
Zhang Z, Yan C, Mesa DA, Sun J, Malin BA. Ensuring electronic medical record simulation through better training, modeling, and evaluation. J Am Med Inform Assoc 2021; 27:99-108. [PMID: 31592533 DOI: 10.1093/jamia/ocz161] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/14/2019] [Revised: 07/29/2019] [Accepted: 08/15/2019] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVE Electronic medical records (EMRs) can support medical research and discovery, but privacy risks limit the sharing of such data on a wide scale. Various approaches have been developed to mitigate risk, including record simulation via generative adversarial networks (GANs). While showing promise in certain application domains, GANs lack a principled approach for EMR data that induces subpar simulation. In this article, we improve EMR simulation through a novel pipeline that (1) enhances the learning model, (2) incorporates evaluation criteria for data utility that informs learning, and (3) refines the training process. MATERIALS AND METHODS We propose a new electronic health record generator using a GAN with a Wasserstein divergence and layer normalization techniques. We designed 2 utility measures to characterize similarity in the structural properties of real and simulated EMRs in the original and latent space, respectively. We applied a filtering strategy to enhance GAN training for low-prevalence clinical concepts. We evaluated the new and existing GANs with utility and privacy measures (membership and disclosure attacks) using billing codes from over 1 million EMRs at Vanderbilt University Medical Center. RESULTS The proposed model outperformed the state-of-the-art approaches with significant improvement in retaining the nature of real records, including prediction performance and structural properties, without sacrificing privacy. Additionally, the filtering strategy achieved higher utility when the EMR training dataset was small. CONCLUSIONS These findings illustrate that EMR simulation through GANs can be substantially improved through more appropriate training, modeling, and evaluation criteria.
Affiliation(s)
- Ziqi Zhang
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
- Chao Yan
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
- Diego A Mesa
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jimeng Sun
- College of Computing, Georgia Institute of Technology, Atlanta, Georgia, USA
- Bradley A Malin
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
3
Martin-Sanchez FJ, Aguiar-Pulido V, Lopez-Campos GH, Peek N, Sacchi L. Secondary Use and Analysis of Big Data Collected for Patient Care. Yearb Med Inform 2017; 26:28-37. [PMID: 28480474 PMCID: PMC6239231 DOI: 10.15265/iy-2017-008] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 01/29/2023] Open
Abstract
Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Through the use of several examples, this article aims to provide a glimpse into the different areas of application, namely clinical research, genomic research, study of environmental factors, and population and health services research. This paper describes some of the informatics methods and Big Data resources developed in this context, such as electronic phenotyping, clinical research networks, biorepositories, screening data banks, and wide association studies. Lastly, some of the potential limitations of these approaches are discussed, focusing on confounding factors and data quality. Methods: A series of literature searches in main bibliographic databases have been conducted in order to assess the extent to which existing patient data has been repurposed for research. This contribution from the IMIA working group on "Data mining and Big Data analytics" focuses on the literature published during the last two years, covering the timeframe since the working group's last survey. Results and Conclusions: Although most of the examples of secondary use of patient data lie in the arena of clinical and health services research, we have started to witness other important applications, particularly in the area of genomic research and the study of health effects of environmental factors. Further research is needed to characterize the economic impact of secondary use across the broad spectrum of translational research.
Affiliation(s)
- F. J. Martin-Sanchez
- Weill Cornell Medicine, Department of Healthcare Policy and Research, Division of Health Informatics, New York, USA
- V. Aguiar-Pulido
- Weill Cornell Medicine, Brain and Mind Research Institute, New York, USA
- G. H. Lopez-Campos
- The University of Melbourne, Health & Biomedical Informatics Centre, Melbourne, Australia
- N. Peek
- MRC Health e-Research Centre, Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- L. Sacchi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
4
Wang Z, Xu K, Zhang X, Wu X, Wang Z. Longitudinal SNP-set association analysis of quantitative phenotypes. Genet Epidemiol 2017; 41:81-93. [PMID: 27859628 PMCID: PMC5154867 DOI: 10.1002/gepi.22016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/15/2016] [Revised: 08/10/2016] [Accepted: 09/19/2016] [Indexed: 02/06/2023]
Abstract
Many genetic epidemiological studies collect repeated measurements over time. This design not only provides a more accurate assessment of disease condition, but allows us to explore the genetic influence on disease development and progression. Thus, it is of great interest to study the longitudinal contribution of genes to disease susceptibility. Most association testing methods for longitudinal phenotypes are developed for single variants, and may have limited power to detect association, especially for variants with low minor allele frequency. We propose Longitudinal SNP-set/sequence kernel association test (LSKAT), a robust, mixed-effects method for association testing of rare and common variants with longitudinal quantitative phenotypes. LSKAT uses several random effects to account for the within-subject correlation in longitudinal data, and allows for adjustment for both static and time-varying covariates. We also present a longitudinal trait burden test (LBT), where we test association between the trait and the burden score in linear mixed models. In simulation studies, we demonstrate that LBT achieves high power when variants are almost all deleterious or all protective, while LSKAT performs well in a wide range of genetic models. By making full use of trait values from repeated measures, LSKAT is more powerful than several tests applied to a single measurement or average over all time points. Moreover, LSKAT is robust to misspecification of the covariance structure. We apply the LSKAT and LBT methods to detect association with longitudinally measured body mass index in the Framingham Heart Study, where we are able to replicate association with a circadian gene NR1D2.
Affiliation(s)
- Zhong Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Baker Institute for Animal Health, Cornell University, Ithaca, New York, United States of America
- Center for Computational Biology, Beijing Forestry University, Beijing, China
- Ke Xu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
- VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
- Xinyu Zhang
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
- VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
- Xiaowei Wu
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
5
Abstract
OBJECTIVES To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2015. METHOD A bibliographic search using a combination of MeSH and free terms search over PubMed on Clinical Research Informatics (CRI) was performed followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. A consensus meeting between the two section editors and the editorial team was finally organized to conclude on the selection of best papers. RESULTS Among the 579 returned papers published in the past year in the various areas of Clinical Research Informatics (CRI) - i) methods supporting clinical research, ii) data sharing and interoperability, iii) re-use of healthcare data for research, iv) patient recruitment and engagement, v) data privacy, security and regulatory issues and vi) policy and perspectives - the full review process selected four best papers. The first selected paper evaluates the capability of the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) to support the representation of case report forms (in both the design stage and with patient level data) during a complete clinical study lifecycle. The second selected paper describes a prototype for secondary use of electronic health records data captured in non-standardized text. The third selected paper presents a privacy preserving electronic health record linkage tool and the last selected paper describes how big data use in US relies on access to health information governed by varying and often misunderstood legal requirements and ethical considerations. CONCLUSIONS A major trend in the 2015 publications is the analysis of observational, "nonexperimental" information and the potential biases and confounding factors hidden in the data that will have to be carefully taken into account to validate new predictive models. 
In addition, researchers have to understand complicated and sometimes contradictory legal requirements and to consider ethical obligations in order to balance privacy and the promotion of discovery.
Affiliation(s)
- C Daniel
- Christel Daniel, MD, PhD, INSERM UMRS 1142 - WIND-DSI, - Assistance Publique - Hôpitaux de Paris, 05 rue Santerre, 75 012 Paris, France, Tel: +33 1 48 04 20 29, E-mail: christel.daniel@aphp.fr
6
Laper SM, Restrepo NA, Crawford DC. The challenges in using electronic health records for pharmacogenomics and precision medicine research. Pac Symp Biocomput 2016; 21:369-80. [PMID: 26776201 PMCID: PMC4720980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Access and utilization of electronic health records with extensive medication lists and genetic profiles is rapidly advancing discoveries in pharmacogenomics. In this study, we analyzed ~116,000 variants on the Illumina Metabochip for response to antihypertensive and lipid lowering medications in African American adults from BioVU, the Vanderbilt University Medical Center's biorepository linked to de-identified electronic health records. Our study population included individuals who were prescribed an antihypertensive or lipid lowering medication, and who had both pre- and post-medication blood pressure or low-density lipoprotein cholesterol (LDL-C) measurements, respectively. Among those with pre- and post-medication systolic and diastolic blood pressure measurements (n=2,268), the average change in systolic and diastolic blood pressure was -0.6 mm Hg and -0.8 mm Hg, respectively. Among those with pre- and post-medication LDL-C measurements (n=1,244), the average change in LDL-C was -26.3 mg/dL. SNPs were tested for an association with change and percent change in blood pressure or blood levels of LDL-C. After adjustment for multiple testing, we did not observe any significant associations, and we were not able to replicate previously reported associations, such as in APOE and LPA, from the literature. The present study illustrates the benefits and challenges with using electronic health records linked to biorepositories for pharmacogenomic studies.
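The two response phenotypes named in the abstract, change and percent change, are not defined there. A minimal sketch, assuming change is the post-medication value minus the pre-medication value (so a decrease is negative, matching the sign of the reported averages) and percent change is taken relative to the pre-medication value, could be written as follows. The function name and the example measurements are hypothetical.

```python
from typing import Tuple


def change_and_percent_change(pre: float, post: float) -> Tuple[float, float]:
    """Return (absolute change, percent change) between two measurements.

    Assumed definitions for this sketch:
      change         = post - pre
      percent change = 100 * (post - pre) / pre
    """
    if pre == 0:
        raise ValueError("pre-medication value must be non-zero")
    change = post - pre
    return change, 100.0 * change / pre


# Hypothetical LDL-C values in mg/dL for one individual.
delta, pct = change_and_percent_change(pre=130.0, post=104.0)
print(delta, pct)
```

Percent change scales the response by the starting level, so two individuals with the same absolute drop but different baselines receive different phenotype values.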
Affiliation(s)
- Sarah M. Laper
- Eastern Virginia Medical School, Norfolk, VA, 23507, USA
- Nicole A. Restrepo
- Center for Human Genetics Research, Vanderbilt University, 519 Light Hall, 2215 Garland Avenue, Nashville, TN 37232, USA
- Dana C. Crawford
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Suite 2527, Cleveland, OH 44106, USA
7
Cao P, Pan H, Xiao T, Zhou T, Guo J, Su Z. Advances in the Study of the Antiatherogenic Function and Novel Therapies for HDL. Int J Mol Sci 2015. [PMID: 26225968 PMCID: PMC4581191 DOI: 10.3390/ijms160817245] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022] Open
Abstract
The hypothesis that raising high-density lipoprotein cholesterol (HDL-C) levels could reduce the risk of cardiovascular disease (CVD) is facing challenges. Ample clinical evidence, including the latest failures of HDL-C-raising drugs, shows no clear association between raised HDL-C and CVD risk. At the genetic level, recent research indicates that steady-state HDL-C concentrations may provide limited information regarding the potential antiatherogenic functions of HDL. Newer strategies may therefore replace therapeutic approaches that simply raise plasma HDL-C levels. There is an urgent need to identify an efficient biomarker that accurately predicts the increased risk of atherosclerosis (AS) in patients and that may be used for exploring newer therapeutic targets. Studies from recent decades show that the composition, structure and function of circulating HDL are closely associated with high cardiovascular risk. A vast amount of data demonstrates that the most important mechanism through which HDL antagonizes AS involves the reverse cholesterol transport (RCT) process. Clinical trials of drugs that specifically target HDL have so far proven disappointing, so a review of HDL therapeutics is needed.
Affiliation(s)
- Peiqiu Cao
- Key Research Center of Liver Regulation for Hyperlipemia SATCM/Class III, Laboratory of Metabolism SATCM, Guangdong TCM Key Laboratory for Metabolic Diseases, Guangdong Pharmaceutical University, Guangzhou 510006, China.
- Haitao Pan
- Key Research Center of Liver Regulation for Hyperlipemia SATCM/Class III, Laboratory of Metabolism SATCM, Guangdong TCM Key Laboratory for Metabolic Diseases, Guangdong Pharmaceutical University, Guangzhou 510006, China.
- Tiancun Xiao
- Inorganic Chemistry Laboratory, University of Oxford, South Parks Road, Oxford OX1 3QR, UK.
- Guangzhou Boxabio Ltd., D-106 Guangzhou International Business Incubator, Guangzhou 510530, China.
- Ting Zhou
- Guangzhou Boxabio Ltd., D-106 Guangzhou International Business Incubator, Guangzhou 510530, China.
- Jiao Guo
- Key Research Center of Liver Regulation for Hyperlipemia SATCM/Class III, Laboratory of Metabolism SATCM, Guangdong TCM Key Laboratory for Metabolic Diseases, Guangdong Pharmaceutical University, Guangzhou 510006, China.
- Zhengquan Su
- Key Research Center of Liver Regulation for Hyperlipemia SATCM/Class III, Laboratory of Metabolism SATCM, Guangdong TCM Key Laboratory for Metabolic Diseases, Guangdong Pharmaceutical University, Guangzhou 510006, China.