1
Boßelmann CM, Hedrich UBS, Lerche H, Pfeifer N. Predicting functional effects of ion channel variants using new phenotypic machine learning methods. PLoS Comput Biol 2023; 19:e1010959. [PMID: 36877742 PMCID: PMC10019634 DOI: 10.1371/journal.pcbi.1010959] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/06/2022] [Revised: 03/16/2023] [Accepted: 02/19/2023] [Indexed: 03/07/2023] Open
Abstract
Missense variants in genes encoding ion channels are associated with a spectrum of severe diseases. Variant effects on biophysical function correlate with clinical features and can be categorized as gain- or loss-of-function. This information enables a timely diagnosis, facilitates precision therapy, and guides prognosis. Functional characterization presents a bottleneck in translational medicine. Machine learning models may be able to rapidly generate supporting evidence by predicting variant functional effects. Here, we describe a multi-task multi-kernel learning framework capable of harmonizing functional results and structural information with clinical phenotypes. This novel approach extends the human phenotype ontology towards kernel-based supervised machine learning. Our gain- or loss-of-function classifier achieves high performance (mean accuracy 0.853 SD 0.016, mean AU-ROC 0.912 SD 0.025), outperforming both conventional baseline and state-of-the-art methods. Performance is robust across different phenotypic similarity measures and largely insensitive to phenotypic noise or sparsity. Localized multi-kernel learning offered biological insight and interpretability by highlighting channels with implicit genotype-phenotype correlations or latent task similarity for downstream analysis.
Affiliation(s)
- Christian Malte Boßelmann
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen, Germany
- Ulrike B. S. Hedrich
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
- Holger Lerche
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
  - * E-mail: (HL); (NP)
- Nico Pfeifer
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tuebingen, Germany
  - * E-mail: (HL); (NP)
2
Webster DE, Tummalacherla M, Higgins M, Wing D, Ashley E, Kelly VE, McConnell MV, Muse ED, Olgin JE, Mangravite LM, Godino J, Kellen MR, Omberg L. Smartphone-Based VO2max Measurement With Heart Snapshot in Clinical and Real-world Settings With a Diverse Population: Validation Study. JMIR Mhealth Uhealth 2021; 9:e26006. [PMID: 34085945 PMCID: PMC8214186 DOI: 10.2196/26006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/24/2020] [Revised: 02/04/2021] [Accepted: 04/12/2021] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Maximal oxygen consumption (VO2max) is one of the most predictive biometrics for cardiovascular health and overall mortality. However, VO2max is rarely measured in large-scale research studies or routine clinical care because of the high cost, participant burden, and requirement for specialized equipment and staff. OBJECTIVE To overcome the limitations of clinical VO2max measurement, we aim to develop a digital VO2max estimation protocol that can be self-administered remotely using only the sensors within a smartphone. We also aim to validate this measure within a broadly representative population across a spectrum of smartphone devices. METHODS Two smartphone-based VO2max estimation protocols were developed: a 12-minute run test (12-MRT) based on distance measured by GPS and a 3-minute step test (3-MST) based on heart rate recovery measured by a camera. In a 101-person cohort, balanced across age deciles and sex, participants completed a gold standard treadmill-based VO2max measurement, two silver standard clinical protocols, and the smartphone-based 12-MRT and 3-MST protocols in the clinic and at home. In a separate 120-participant cohort, the video-based heart rate measurement underlying the 3-MST was measured for accuracy in individuals across the spectrum of skin tones while using 8 different smartphones ranging in cost from US $99 to US $999. RESULTS When compared with gold standard VO2max testing, Lin concordance was ρc=0.66 for 12-MRT and ρc=0.61 for 3-MST. However, in remote settings, the 12-MRT was significantly less concordant with the gold standard (ρc=0.25) compared with the 3-MST (ρc=0.61), although both had high test-retest reliability (12-MRT intraclass correlation coefficient=0.88; 3-MST intraclass correlation coefficient=0.86). On the basis of the finding that 3-MST concordance was generalizable to remote settings whereas 12-MRT was not, the video-based heart rate measure within the 3-MST was selected for further investigation. Across all combinations of the six Fitzpatrick skin tones and 8 smartphones, heart rate measurements achieved a concordance of ρc≥0.81. Performance did not correlate with device cost, with all phones selling for under US $200 achieving ρc>0.92. CONCLUSIONS These findings demonstrate the importance of validating mobile health measures in the real world across a diverse cohort and spectrum of hardware. The 3-MST protocol, termed Heart Snapshot, measured VO2max with similar accuracy to supervised in-clinic tests such as the Tecumseh (ρc=0.94) protocol, while also generalizing to remote and unsupervised measurements. Heart Snapshot measurements demonstrated fidelity across demographic variation in age and sex, across diverse skin pigmentation, and between various iOS and Android phone configurations. This software is freely available, along with all validation data and analysis code.
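The agreement statistic reported in this abstract is Lin's concordance correlation coefficient. As a minimal sketch of how that coefficient is computed, assuming the population-moment (1/n) form of the estimator, the following may help; the function name and sample values are illustrative and are not taken from the study's analysis code.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Population (1/n) variances and covariance.
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference penalises systematic offset,
    # which an ordinary correlation coefficient would ignore.
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical paired values: a constant offset of 1 lowers concordance
# even though the two series are perfectly correlated.
gold = [30.0, 35.0, 40.0]
estimate = [31.0, 36.0, 41.0]
print(lin_ccc(gold, estimate))
```

Unlike a Pearson correlation, this statistic falls below 1 whenever the two methods disagree in level or scale, which is why it suits method-comparison studies of this kind.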
Affiliation(s)
- Michael Higgins
  - Exercise and Physical Activity Resource Center, University of California at San Diego, San Diego, CA, United States
- David Wing
  - Exercise and Physical Activity Resource Center, University of California at San Diego, San Diego, CA, United States
- Euan Ashley
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, United States
- Valerie E Kelly
  - Department of Rehabilitation Medicine, University of Washington, Seattle, WA, United States
- Michael V McConnell
  - Stanford University School of Medicine, Stanford, CA, United States; Google Health, Palo Alto, CA, United States
- Evan D Muse
  - Scripps Research Translational Institute and Scripps Clinic, La Jolla, CA, United States
- Jeffrey E Olgin
  - Division of Cardiology and the Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, United States
- Job Godino
  - Exercise and Physical Activity Resource Center, University of California at San Diego, San Diego, CA, United States; Scripps Research Translational Institute and Scripps Clinic, La Jolla, CA, United States
3
Chen HH, Petty LE, Bush W, Naj AC, Below JE. GWAS and Beyond: Using Omics Approaches to Interpret SNP Associations. CURRENT GENETIC MEDICINE REPORTS 2019; 7:30-40. [PMID: 33312764 PMCID: PMC7731888 DOI: 10.1007/s40142-019-0159-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
PURPOSE OF REVIEW Neurodegenerative diseases, neuropsychiatric disorders, and related traits have highly complex etiologies but are also highly heritable, and identifying the causal genes and biological pathways underlying these traits may advance the development of treatments and preventive strategies. While many genome-wide association studies (GWAS) have successfully identified variants contributing to polygenic neurodegenerative and neuropsychiatric phenotypes including Alzheimer's disease (AD), schizophrenia (SCZ), and bipolar disorder (BPD) amongst others, interpreting the biological roles of significantly associated variants in the genetic architecture of these traits remains a significant challenge. Here we review several 'omics' approaches which attempt to bridge the gap from associated genetic variants to phenotype by helping define the functional roles of GWAS loci in the development of neuropsychiatric disorders and traits. RECENT FINDINGS Several common 'omics' approaches have been applied to examine neuropsychiatric traits, such as nearest-gene mapping, trans-ethnic fine mapping, annotation enrichment analysis, transcriptomic analysis, and pathway analysis, and each of these approaches has strengths and limitations in providing insight into biological mechanisms. One popular emerging method is the examination of tissue-specific genetically regulated gene expression (GReX), which aggregates the genetic variants' effects at the gene level. Furthermore, proteomic, metabolomic, and microbiomic studies and phenome-wide association studies will further enhance our understanding of neuropsychiatric traits. SUMMARY GWAS has been applied to neuropsychiatric traits for a decade, but our understanding of the biological function of identified variants remains limited. Today, technological advancements have created analytical approaches for integrating transcriptomics, metabolomics, proteomics, pharmacology and toxicology as tools for understanding the functional roles of genetic variants. These data, as well as the broader clinical information provided by electronic health records, can provide additional insight and complement genomic analyses.
Affiliation(s)
- Hung-Hsin Chen
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Lauren E. Petty
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- William Bush
  - Institute for Computational Biology, Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Adam C. Naj
  - Department of Biostatistics, Epidemiology, and Informatics; Department of Pathology and Laboratory Medicine; Center for Clinical Epidemiology and Biostatistics; Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jennifer E. Below
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
4
Stanaway IB, Hall TO, Rosenthal EA, Palmer M, Naranbhai V, Knevel R, Namjou-Khales B, Carroll RJ, Kiryluk K, Gordon AS, Linder J, Howell KM, Mapes BM, Lin FTJ, Joo YY, Hayes MG, Gharavi AG, Pendergrass SA, Ritchie MD, de Andrade M, Croteau-Chonka DC, Raychaudhuri S, Weiss ST, Lebo M, Amr SS, Carrell D, Larson EB, Chute CG, Rasmussen-Torvik LJ, Roy-Puckelwartz MJ, Sleiman P, Hakonarson H, Li R, Karlson EW, Peterson JF, Kullo IJ, Chisholm R, Denny JC, Jarvik GP, Crosslin DR. The eMERGE genotype set of 83,717 subjects imputed to ~40 million variants genome wide and association with the herpes zoster medical record phenotype. Genet Epidemiol 2018; 43:63-81. [PMID: 30298529 PMCID: PMC6375696 DOI: 10.1002/gepi.22167] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 04/12/2018] [Revised: 08/10/2018] [Accepted: 08/28/2018] [Indexed: 12/30/2022]
Abstract
The Electronic Medical Records and Genomics (eMERGE) network is a network of medical centers with electronic medical records linked to existing biorepository samples for genomic discovery and genomic medicine research. The network sought to unify the genetic results from 78 Illumina and Affymetrix genotype array batches from 12 contributing medical centers for joint association analysis of 83,717 human participants. In this report, we describe the imputation of eMERGE results and methods to create the unified imputed merged set of genome‐wide variant genotype data. We imputed the data using the Michigan Imputation Server, which provides a missing single‐nucleotide variant genotype imputation service using the minimac3 imputation algorithm with the Haplotype Reference Consortium genotype reference set. We describe the quality control and filtering steps used in the generation of this data set and suggest generalizable quality thresholds for imputation and phenotype association studies. To test the merged imputed genotype set, we replicated a previously reported chromosome 6 HLA‐B herpes zoster (shingles) association and discovered a novel zoster‐associated locus in an epigenetic binding site near the terminus of chromosome 3 (3p29).
Affiliation(s)
- Ian B Stanaway
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Taryn O Hall
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Elisabeth A Rosenthal
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
- Melody Palmer
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
- Vivek Naranbhai
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington; Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Rachel Knevel
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Bahram Namjou-Khales
  - Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Robert J Carroll
  - Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, Tennessee
- Krzysztof Kiryluk
  - Department of Medicine, Columbia University, New York City, New York
- Adam S Gordon
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
- Jodell Linder
  - Vanderbilt Institute for Clinical and Translational Research, School of Medicine, Vanderbilt University, Nashville, Tennessee
- Kayla Marie Howell
  - Vanderbilt Institute for Clinical and Translational Research, School of Medicine, Vanderbilt University, Nashville, Tennessee
- Brandy M Mapes
  - Vanderbilt Institute for Clinical and Translational Research, School of Medicine, Vanderbilt University, Nashville, Tennessee
- Frederick T J Lin
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- M Geoffrey Hayes
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Ali G Gharavi
  - Department of Medicine, Columbia University, New York City, New York
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania
- Soumya Raychaudhuri
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts; Program in Medical and Population Genetics, Broad Institute of Massachusetts Technical Institute and Harvard University, Cambridge, Massachusetts
- Scott T Weiss
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Matt Lebo
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Sami S Amr
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- David Carrell
  - Kaiser Permanente Washington Health Research Institute (Formerly Group Health Cooperative-Seattle), Kaiser Permanente, Seattle, Washington
- Eric B Larson
  - Kaiser Permanente Washington Health Research Institute (Formerly Group Health Cooperative-Seattle), Kaiser Permanente, Seattle, Washington
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland
- Patrick Sleiman
  - Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Rongling Li
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- Elizabeth W Karlson
  - Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania
- Josh F Peterson
  - Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, Tennessee
- Rex Chisholm
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Joshua Charles Denny
  - Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, Tennessee
- Gail P Jarvik
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
-
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- David R Crosslin
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington
5
Fraade-Blanar LA, Hansen RN, Chan KCG, Sears JM, Thompson HJ, Crane PK, Ebel BE. Diagnosed dementia and the risk of motor vehicle crash among older drivers. ACCIDENT; ANALYSIS AND PREVENTION 2018; 113:47-53. [PMID: 29407668 PMCID: PMC5869102 DOI: 10.1016/j.aap.2017.12.021] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/28/2017] [Revised: 11/13/2017] [Accepted: 12/28/2017] [Indexed: 05/28/2023]
Abstract
Older adults are an active and growing segment of drivers in the United States. We compared the risk of motor vehicle crash among older licensed drivers diagnosed with dementia to crash risk among older licensed drivers without diagnosis of dementia. This retrospective cohort study used data from Group Health (GH), a Washington State health maintenance organization. Research participants were members of GH, aged 65-79 during the study who lived in Washington State from 1999-2009. Participant health records were linked with police-reported crash and licensure records. We estimated the risk of crash for older drivers diagnosed with dementia compared to older drivers without diagnosis of dementia using a Cox proportional hazards model with robust standard errors, accounting for recurrent events (crashes). Multivariable models were adjusted for age, sex, history of alcohol abuse or depression, comorbidities, and medications. There were 29,730 eligible individuals with an active driving license. Approximately 6% were diagnosed with dementia before or during the study. The police-reported crash rate was 14.7 per 1000 driver-years. The adjusted hazard ratio of crash among older drivers with diagnosed dementia was 0.56 (95% CI 0.33, 0.95) compared to those without diagnosed dementia. On-road and simulator-based research showed older adults with dementia demonstrated impaired driving skill and capabilities. The observed lower crash risk in our study may result from protective steps to limit driving among older adults diagnosed with dementia. Future research should examine driving risk reduction strategies at the time of dementia diagnosis and their impact on reducing crash risk.
Affiliation(s)
- Laura A Fraade-Blanar
  - Department of Health Services, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Harborview Injury Prevention Research Center, 401 Broadway, Seattle, WA, 98122, USA
- Ryan N Hansen
  - Department of Health Services, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Department of Pharmacy, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Group Health Research Institute, 1730 Minor Ave, Seattle, WA, 98101, USA; Harborview Injury Prevention Research Center, 401 Broadway, Seattle, WA, 98122, USA
- Kwun Chuen G Chan
  - Department of Health Services, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Departments of Biostatistics, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA
- Jeanne M Sears
  - Department of Health Services, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Harborview Injury Prevention Research Center, 401 Broadway, Seattle, WA, 98122, USA; Institute for Work & Health, Institute for Work & Health, Ontario, Canada
- Hilaire J Thompson
  - Harborview Injury Prevention Research Center, 401 Broadway, Seattle, WA, 98122, USA; Department of Biobehavioral Nursing and Health Informatics, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA
- Paul K Crane
  - Department of Health Services, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Department of Medicine, University of Washington, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA
- Beth E Ebel
  - Department of Health Services, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Harborview Injury Prevention Research Center, 401 Broadway, Seattle, WA, 98122, USA; Department of Pediatrics, University of Washington and Seattle Children's Hospital; 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA; Department of Epidemiology, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195-5852, USA
6
Wang L, Damrauer SM, Zhang H, Zhang AX, Xiao R, Moore JH, Chen J. Phenotype validation in electronic health records based genetic association studies. Genet Epidemiol 2017; 41:790-800. [PMID: 29023970 DOI: 10.1002/gepi.22080] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/19/2016] [Revised: 06/30/2017] [Accepted: 08/01/2017] [Indexed: 12/13/2022]
Abstract
The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold standard phenotype data obtained by chart review. In this work, we propose a genotype-stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype-stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.
Affiliation(s)
- Lu Wang
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Scott M Damrauer
  - Division of Vascular Surgery and Endovascular Therapy, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania, United States of America
- Hong Zhang
  - Institute of Biostatistics, Fudan University, Shanghai, P.R. China
- Alan X Zhang
  - Sidwell Friends School, Washington, DC, United States of America
- Rui Xiao
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jason H Moore
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jinbo Chen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
7
Stewart R, Davis K. 'Big data' in mental health research: current status and emerging possibilities. Soc Psychiatry Psychiatr Epidemiol 2016; 51:1055-72. [PMID: 27465245 PMCID: PMC4977335 DOI: 10.1007/s00127-016-1266-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 05/07/2016] [Accepted: 07/08/2016] [Indexed: 01/24/2023]
Abstract
PURPOSE 'Big data' are accumulating in a multitude of domains and offer novel opportunities for research. The role of these resources in mental health investigations remains relatively unexplored, although a number of datasets are in use and supporting a range of projects. We sought to review big data resources and their use in mental health research to characterise applications to date and consider directions for innovation in future. METHODS A narrative review. RESULTS Clear disparities were evident in geographic regions covered and in the disorders and interventions receiving most attention. DISCUSSION We discuss the strengths and weaknesses of the use of different types of data and the challenges of big data in general. Current research output from big data is still predominantly determined by the information and resources available and there is a need to reverse the situation so that big data platforms are more driven by the needs of clinical services and service users.
Affiliation(s)
- Robert Stewart
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, Box 63, De Crespigny Park, London, SE5 8AF, UK
- Katrina Davis
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, Box 63, De Crespigny Park, London, SE5 8AF, UK
8
Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet 2016; 17:129-45. [PMID: 26875678 DOI: 10.1038/nrg.2015.36] [Citation(s) in RCA: 166] [Impact Index Per Article: 20.8] [Indexed: 12/24/2022]
Abstract
Advances in genotyping technology have, over the past decade, enabled the focused search for common genetic variation associated with human diseases and traits. With the recently increased availability of detailed phenotypic data from electronic health records and epidemiological studies, the impact of one or more genetic variants on the phenome is starting to be characterized both in clinical and population-based settings using phenome-wide association studies (PheWAS). These studies reveal a number of challenges that will need to be overcome to unlock the full potential of PheWAS for the characterization of the complex human genome-phenome relationship.
9
Dumitrescu L, Diggins KE, Goodloe R, Crawford DC. TESTING POPULATION-SPECIFIC QUANTITATIVE TRAIT ASSOCIATIONS FOR CLINICAL OUTCOME RELEVANCE IN A BIOREPOSITORY LINKED TO ELECTRONIC HEALTH RECORDS: LPA AND MYOCARDIAL INFARCTION IN AFRICAN AMERICANS. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2016; 21:96-107. [PMID: 26776177 PMCID: PMC4720978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Previous candidate gene and genome-wide association studies have identified common genetic variants in LPA associated with the quantitative trait Lp(a), an emerging risk factor for cardiovascular disease. These associations are population-specific and many have not yet been tested for association with the clinical outcome of interest. To fill this gap in knowledge, we accessed the epidemiologic Third National Health and Nutrition Examination Surveys (NHANES III) and BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified electronic health records (EHRs), including billing codes (ICD-9-CM) and clinical notes, to test population-specific Lp(a)-associated variants for an association with myocardial infarction (MI) among African Americans. We performed electronic phenotyping among African Americans in BioVU ≥40 years of age using billing codes. A total of 93 cases and 522 controls were identified in NHANES III and 265 cases and 363 controls were identified in BioVU. We tested five known Lp(a)-associated genetic variants (rs1367211, rs41271028, rs6907156, rs10945682, and rs1652507) in both NHANES III and BioVU for association with myocardial infarction. We also tested LPA rs3798220 (I4399M), previously associated with increased levels of Lp(a), MI, and coronary artery disease in European Americans, in BioVU. After meta-analysis, tests of association using logistic regression assuming an additive genetic model revealed no significant associations (p<0.05) for any of the five LPA variants previously associated with Lp(a) levels in African Americans. Also, I4399M rs3798220 was not associated with MI in African Americans (odds ratio = 0.51; 95% confidence interval: 0.16 - 1.65; p=0.26) despite strong, replicated associations with MI and coronary artery disease in European American genome-wide association studies. These data highlight the challenges in translating quantitative trait associations to clinical outcomes in diverse populations using large epidemiologic and clinic-based collections as envisioned for the Precision Medicine Initiative.
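The odds ratio, confidence interval, and p-value reported for rs3798220 can be cross-checked against one another. The following is a minimal sketch, assuming a Wald-type 95% interval built on the log-odds scale with a normal approximation; the abstract does not state how the interval was constructed, and the function name is illustrative.

```python
import math


def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Two-sided Wald p-value implied by an odds ratio and its 95% CI."""
    # On the log scale, the interval spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Values as reported in the abstract: OR 0.51, 95% CI 0.16 to 1.65.
p = wald_p_from_ci(0.51, 0.16, 1.65)
print(round(p, 2))
```

Under that assumption the implied p-value comes out at roughly 0.26, which is consistent with the reported figure.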
Affiliation(s)
- Logan Dumitrescu
  - Center for Human Genetics Research, Vanderbilt University, 519 Light Hall, 2215 Garland Avenue, Nashville, TN 37232, USA
- Kirsten E. Diggins
  - Cancer Biology, Vanderbilt University, 742 Preston Research Building, 2220 Pierce Avenue, Nashville, TN 37232, USA
- Robert Goodloe
  - Center for Human Genetics Research, Vanderbilt University, 519 Light Hall, 2215 Garland Avenue, Nashville, TN 37232, USA
- Dana C. Crawford
  - Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Suite 2527, Cleveland, OH 44106, USA
10
Low YS, Gallego B, Shah NH. Comparing high-dimensional confounder control methods for rapid cohort studies from electronic health records. J Comp Eff Res 2015; 5:179-92. [PMID: 26634383 PMCID: PMC4933592 DOI: 10.2217/cer.15.53] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022] Open
Abstract
Aims: Electronic health records (EHR), containing rich clinical histories of large patient populations, can provide evidence for clinical decisions when evidence from trials and literature is absent. To enable such observational studies from EHR in real time, particularly in emergencies, rapid confounder control methods that can handle numerous variables and adjust for biases are imperative. This study compares the performance of 18 automatic confounder control methods. Methods: Methods include propensity scores, direct adjustment by machine learning, similarity matching and resampling in two simulated and one real-world EHR datasets. Results & conclusions: Direct adjustment by lasso regression and ensemble models involving multiple resamples have performance comparable to expert-based propensity scores and thus, may help provide real-time EHR-based evidence for timely clinical decisions.
Affiliation(s)
- Yen Sia Low
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA
- Blanca Gallego
- Center for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Nigam Haresh Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA
|
11
|
The effects of electronic medical record phenotyping details on genetic association studies: HDL-C as a case study. BioData Min 2015; 8:15. [PMID: 25969697 PMCID: PMC4428098 DOI: 10.1186/s13040-015-0048-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/22/2014] [Accepted: 04/28/2015] [Indexed: 02/01/2023] Open
Abstract
Background: Biorepositories linked to de-identified electronic medical records (EMRs) have the potential to complement traditional epidemiologic studies in genotype-phenotype studies of complex human diseases and traits. A major challenge in meeting this potential is the use of EMR-derived data to extract phenotypes and covariates for genetic association studies. Unlike traditional epidemiologic data, EMR-derived data are collected for clinical care and are therefore highly variable across patients. The variability of clinical data, coupled with the challenges associated with searching unstructured clinical notes, requires the development of algorithms to extract phenotypes for analysis. Given the number of possible algorithms that could be developed for any one EMR-derived phenotype, we explored here the impact algorithm decision logic has on genetic association study results for a single quantitative trait, high density lipoprotein cholesterol (HDL-C). Results: We used five different algorithms to extract HDL-C from African American subjects genotyped on the Illumina Metabochip (n = 11,519) as part of Epidemiologic Architecture for Genes Linked to Environment (EAGLE). Tests of association between HDL-C and genetic risk scores for HDL-C associated variants suggest that the genetic effect size does not vary substantially across the five HDL-C definitions. Conclusions: These data collectively suggest that, at least for this quantitative trait, algorithm decision logic and phenotyping details do not appreciably impact genetic association study test statistics.
|
12
|
Sinnott JA, Dai W, Liao KP, Shaw SY, Ananthakrishnan AN, Gainer VS, Karlson EW, Churchill S, Szolovits P, Murphy S, Kohane I, Plenge R, Cai T. Improving the power of genetic association tests with imperfect phenotype derived from electronic medical records. Hum Genet 2014; 133:1369-82. [PMID: 25062868 PMCID: PMC4185241 DOI: 10.1007/s00439-014-1466-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 02/18/2014] [Accepted: 06/29/2014] [Indexed: 01/04/2023]
Abstract
To reduce costs and improve clinical relevance of genetic studies, there has been increasing interest in performing such studies in hospital-based cohorts by linking phenotypes extracted from electronic medical records (EMRs) to genotypes assessed in routinely collected medical samples. A fundamental difficulty in implementing such studies is extracting accurate information about disease outcomes and important clinical covariates from large numbers of EMRs. Recently, numerous algorithms have been developed to infer phenotypes by combining information from multiple structured and unstructured variables extracted from EMRs. Although these algorithms are quite accurate, they typically do not provide perfect classification due to the difficulty in inferring meaning from the text. Some algorithms can produce for each patient a probability that the patient is a disease case. This probability can be thresholded to define case-control status, and this estimated case-control status has been used to replicate known genetic associations in EMR-based studies. However, using the estimated disease status in place of true disease status results in outcome misclassification, which can diminish test power and bias odds ratio estimates. We propose to instead directly model the algorithm-derived probability of being a case. We demonstrate how our approach improves test power and effect estimation in simulation studies, and we describe its performance in a study of rheumatoid arthritis. Our work provides an easily implemented solution to a major practical challenge that arises in the use of EMR data, which can facilitate the use of EMR infrastructure for more powerful, cost-effective, and diverse genetic studies.
Affiliation(s)
- Jennifer A Sinnott
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, 02115, USA
|
13
|
Crawford DC, Crosslin DR, Tromp G, Kullo IJ, Kuivaniemi H, Hayes MG, Denny JC, Bush WS, Haines JL, Roden DM, McCarty CA, Jarvik GP, Ritchie MD. eMERGEing progress in genomics-the first seven years. Front Genet 2014; 5:184. [PMID: 24987407 PMCID: PMC4060012 DOI: 10.3389/fgene.2014.00184] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 03/31/2014] [Accepted: 05/30/2014] [Indexed: 12/15/2022] Open
Abstract
The electronic MEdical Records & GEnomics (eMERGE) network was established in 2007 by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) in part to explore the utility of electronic medical records (EMRs) in genome science. The initial focus was on discovery primarily using the genome-wide association paradigm, but more recently, the network has begun evaluating mechanisms to implement new genomic information coupled to clinical decision support into EMRs. Herein, we describe this evolution including the development of the individual and merged eMERGE genomic datasets, the contribution the network has made toward genomic discovery and human health, and the steps taken toward the next generation genotype-phenotype association studies and clinical implementation.
Affiliation(s)
- Dana C Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA; Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- David R Crosslin
- Medical Genetics, Department of Medicine, School of Medicine, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Gerard Tromp
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Iftikhar J Kullo
- Division of Cardiovascular Diseases and the Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA
- Helena Kuivaniemi
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
- William S Bush
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Jonathan L Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA; Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Dan M Roden
- Department of Medicine, Vanderbilt University, Nashville, TN, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- Gail P Jarvik
- Medical Genetics, Department of Medicine, School of Medicine, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA; Center for Systems Genomics, Pennsylvania State University, University Park, PA, USA
|
14
|
Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, Field JR, Pulley JM, Ramirez AH, Bowton E, Basford MA, Carrell DS, Peissig PL, Kho AN, Pacheco JA, Rasmussen LV, Crosslin DR, Crane PK, Pathak J, Bielinski SJ, Pendergrass SA, Xu H, Hindorff LA, Li R, Manolio TA, Chute CG, Chisholm RL, Larson EB, Jarvik GP, Brilliant MH, McCarty CA, Kullo IJ, Haines JL, Crawford DC, Masys DR, Roden DM. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol 2014; 31:1102-10. [PMID: 24270849 DOI: 10.1038/nbt.2749] [Citation(s) in RCA: 658] [Impact Index Per Article: 65.8] [Received: 07/18/2013] [Accepted: 10/21/2013] [Indexed: 02/06/2023]
Abstract
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
|