1
Li S, Cai T, Duan R. Targeting underrepresented populations in precision medicine: a federated transfer learning approach. Ann Appl Stat 2023; 17:2970-2992. [PMID: 39314265 PMCID: PMC11417462 DOI: 10.1214/23-aoas1747] [Citation(s) in RCA: 0] [Indexed: 09/25/2024]
Abstract
The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research poses a significant barrier to translating precision medicine research into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse populations and multiple healthcare institutions, with a focus on a target population of interest having limited sample sizes. We show that FETA achieves performance comparable to the pooled analysis, where individual-level data are shared across institutions, while requiring only a few rounds of communication among participating sites. Our theoretical analysis and simulation study demonstrate how FETA's estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We apply FETA to multisite data from the electronic Medical Records and Genomics (eMERGE) Network to construct genetic risk prediction models for extreme obesity. Compared to models trained using target data only, source data only, and all data without accounting for population-level differences, FETA shows superior predictive performance. FETA has the potential to improve estimation and prediction accuracy in underrepresented populations and reduce the gap in model performance across populations.
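The sketch below illustrates only the communication idea behind comparing a federated fit with the pooled analysis. It is not FETA itself. For ordinary linear regression, sites that share only the aggregates X^T X and X^T y let a coordinator recover the pooled least-squares estimate in a single round. The second step is a hypothetical stand-in for transfer learning, chosen here for illustration: a ridge-shrunk correction `target_offset` fitted on the small target site alone. FETA's actual estimators, penalties and communication scheme are the ones described in the paper. The simulated coefficients, site sizes and `lam` value are assumptions.

```python
import numpy as np

def site_summaries(X, y):
    # Aggregates a site would transmit; no individual-level rows leave the site.
    return X.T @ X, X.T @ y

def federated_ols(summaries):
    # One communication round: sum the per-site aggregates, then solve the pooled normal equations.
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)

def target_offset(X_t, y_t, beta_shared, lam=10.0):
    # Hypothetical transfer step using target data only:
    # minimise ||y_t - X_t (beta_shared + delta)||^2 + lam * ||delta||^2 over delta.
    p = X_t.shape[1]
    resid = y_t - X_t @ beta_shared
    return np.linalg.solve(X_t.T @ X_t + lam * np.eye(p), X_t.T @ resid)

rng = np.random.default_rng(0)

def simulate(n, beta):
    # Intercept plus two standard-normal covariates, Gaussian noise.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    return X, X @ beta + rng.normal(scale=0.5, size=n)

beta_source = np.array([1.0, -2.0, 0.5])
beta_target = beta_source + np.array([0.0, 0.5, 0.0])  # assumed population-level difference

source_sites = [simulate(300, beta_source), simulate(250, beta_source)]
X_t, y_t = simulate(40, beta_target)                    # small, underrepresented target site
all_sites = source_sites + [(X_t, y_t)]

beta_shared = federated_ols([site_summaries(X, y) for X, y in all_sites])

# Compare with the pooled fit, which would require sharing individual-level data.
X_all = np.vstack([X for X, _ in all_sites])
y_all = np.concatenate([y for _, y in all_sites])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(np.allclose(beta_shared, beta_pooled))  # True: identical up to floating-point error

beta_target_hat = beta_shared + target_offset(X_t, y_t, beta_shared)
print(beta_target_hat)
```

Exact equivalence with the pooled fit holds here because least squares depends on the data only through these sums. Penalized or nonlinear models, such as those in the paper, generally need more than one round.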
Affiliation(s)
- Sai Li
- Institute of Statistics and Big Data, Renmin University of China
- Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Rui Duan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
2
Maple-Grødem J, Ushakova A, Pedersen KF, Tysnes OB, Alves G, Lange J. Identification of diagnostic and prognostic biomarkers of PD using a multiplex proteomics approach. Neurobiol Dis 2023; 186:106281. [PMID: 37673381 DOI: 10.1016/j.nbd.2023.106281] [Citation(s) in RCA: 0] [Received: 06/05/2023] [Revised: 08/29/2023] [Accepted: 09/02/2023] [Indexed: 09/08/2023]
Abstract
Given the complexity of Parkinson's disease (PD), achieving acceptable diagnostic and prognostic accuracy will require a panel of diverse biomarkers. We used proximity extension assays to measure a panel of 92 proteins in the CSF of 120 newly diagnosed PD patients and 45 control subjects without neurological disease. Of the 75 proteins detectable in the CSF of >90% of subjects, regularized regression analysis identified four (β-NGF, CD38, tau and NCAN) as downregulated in newly diagnosed PD patients (age at diagnosis 67.2 ± 9.4 years) compared with controls (age 65.4 ± 10.9 years). Higher tau (β = −0.82 transformed MMSE points/year, 95% CI −1.37 to −0.27, P = 0.005) was also linked to faster cognitive decline over the first ten years after PD diagnosis. These findings provide insights into multiple aspects of PD pathophysiology and may serve as a foundation for identifying new biomarkers and therapeutic targets.
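The abstract says only "regularized regression". Suppose the model is an elastic-net-penalized logistic regression, the penalty this overlapping author group reports in entry 3. With $y_i = 1$ for PD and $x_i$ the vector of the 75 detectable protein levels, the coefficients would solve the problem below. Setting $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge regression; $\lambda$ is usually tuned by cross-validation. This is an illustrative formulation, not the study's reported specification.

$$
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0,\,\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\,(\beta_0 + x_i^\top\beta) - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big] + \lambda\Big[\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Big]
$$

Proteins whose coefficients are shrunk exactly to zero drop out of the model. With PD coded as 1, a negative retained coefficient corresponds to lower levels in PD.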
Affiliation(s)
- Jodi Maple-Grødem
- Centre for Movement Disorders, Centre for Brain Health, Stavanger University Hospital, Stavanger, Norway; Department of Chemistry, Bioscience and Environmental Engineering, University of Stavanger, Stavanger, Norway.
- Anastasia Ushakova
- Section of Biostatistics, Department of Research, Stavanger University Hospital, Stavanger, Norway.
- Kenn Freddy Pedersen
- Centre for Movement Disorders, Centre for Brain Health, Stavanger University Hospital, Stavanger, Norway; Department of Neurology, Stavanger University Hospital, Stavanger, Norway.
- Ole-Bjørn Tysnes
- Department of Neurology, Haukeland University Hospital, Bergen, Norway; Department of Clinical Medicine, University of Bergen, Bergen, Norway.
- Guido Alves
- Centre for Movement Disorders, Centre for Brain Health, Stavanger University Hospital, Stavanger, Norway; Department of Chemistry, Bioscience and Environmental Engineering, University of Stavanger, Stavanger, Norway; Department of Neurology, Stavanger University Hospital, Stavanger, Norway.
- Johannes Lange
- Centre for Movement Disorders, Centre for Brain Health, Stavanger University Hospital, Stavanger, Norway; Department of Chemistry, Bioscience and Environmental Engineering, University of Stavanger, Stavanger, Norway.
3
Pedersen CC, Ushakova A, Skogseth RE, Alves G, Tysnes OB, Aarsland D, Lange J, Maple-Grødem J. Inflammatory Biomarkers in Newly Diagnosed Patients With Parkinson Disease and Related Neurodegenerative Disorders. Neurol Neuroimmunol Neuroinflamm 2023; 10(4):e200132. [PMID: 37258413 DOI: 10.1212/nxi.0000000000200132] [Citation(s) in RCA: 0] [Received: 11/22/2022] [Accepted: 04/19/2023] [Indexed: 06/02/2023]
Abstract
BACKGROUND AND OBJECTIVES Neuroinflammation contributes to Parkinson disease (PD) pathology, and inflammatory biomarkers may aid in PD diagnosis. Proximity extension assay (PEA) technology is a promising method for multiplex analysis of inflammatory markers. Neuroinflammation also plays a role in related neurodegenerative diseases, such as dementia with Lewy bodies (DLB) and Alzheimer disease (AD). The aim of this work was to assess the value of inflammatory biomarkers in newly diagnosed patients with PD and in patients with DLB and AD. METHODS Patients from the Norwegian ParkWest and Dementia Study of Western Norway longitudinal cohorts (PD, n = 120; DLB, n = 15; AD, n = 27) and 44 normal controls were included in this study. A PEA inflammation panel of 92 biomarkers was measured in the CSF. Disease-associated biomarkers were identified using elastic net (EN) analysis. We assessed the discriminatory power of disease-associated biomarkers using receiver operating characteristic (ROC) curve analysis and estimated the optimism-adjusted area under the curve (AUC) using the bootstrapping method. RESULTS EN analysis identified 9 PEA inflammatory biomarkers (ADA, CCL23, CD5, CD8A, CDCP1, FGF-19, IL-18R1, IL-6, and MCP-2) associated with PD. Seven of the 9 biomarkers were included in a diagnostic panel, which was able to discriminate between those with PD and controls (optimism-adjusted AUC 0.82). Our 7-biomarker PD panel was also able to distinguish PD from DLB and from AD. In addition, 4 inflammatory biomarkers were associated with AD and included in a panel, which could distinguish those with AD from controls (optimism-adjusted AUC 0.87). Our 4-biomarker AD panel was also able to distinguish AD from DLB and from PD. DISCUSSION In our exploratory study, we identified a 7-biomarker panel for PD and a 4-biomarker panel for AD. 
Our findings indicate potential inflammation-related biomarker candidates that could contribute toward PD-specific and AD-specific diagnostic panels and that should be explored further in larger, independent cohorts.
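The optimism-adjusted AUC named in the methods is commonly computed with Harrell's bootstrap procedure:

1. Refit the model on each bootstrap resample.
2. Record how much its AUC on that resample exceeds its AUC on the original data.
3. Subtract the average of that gap from the apparent AUC.

The sketch below assumes that procedure. It uses a plain logistic regression on simulated data (164 subjects, roughly the PD-plus-control count, 7 hypothetical markers). The study's elastic-net model, biomarker data and number of resamples are not reproduced.

```python
import numpy as np

def auc(y, score):
    # Mann-Whitney form of the ROC AUC: share of case-control pairs ranked correctly (ties count 1/2).
    y = np.asarray(y)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def fit_logistic(X, y, ridge=1e-2, iters=25):
    # Newton-Raphson for logistic regression; the small ridge term keeps the Hessian invertible.
    Xc = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xc.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xc @ w)))
        grad = Xc.T @ (p - y) + ridge * w
        hess = Xc.T @ (Xc * (p * (1.0 - p))[:, None]) + ridge * np.eye(len(w))
        w = w - np.linalg.solve(hess, grad)
    return w

def linear_predictor(w, X):
    # AUC is rank-based, so the linear predictor ranks subjects exactly as predicted risk does.
    return np.column_stack([np.ones(len(X)), X]) @ w

def optimism_adjusted_auc(X, y, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    apparent = auc(y, linear_predictor(fit_logistic(X, y), X))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():
            continue  # a resample containing only one class has no AUC
        w_b = fit_logistic(Xb, yb)
        # Optimism: AUC on the resample the model was fitted to minus AUC on the original data.
        optimism.append(auc(yb, linear_predictor(w_b, Xb)) - auc(y, linear_predictor(w_b, X)))
    return apparent, apparent - float(np.mean(optimism))

# Hypothetical data: only the first two of seven markers carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(164, 7))
risk = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
y = (rng.random(164) < risk).astype(float)

apparent, adjusted = optimism_adjusted_auc(X, y)
print(f"apparent AUC {apparent:.3f}, optimism-adjusted AUC {adjusted:.3f}")
```

The adjusted value is typically somewhat lower than the apparent one. The gap grows with the number of candidate markers relative to the sample size.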
Affiliation(s)
- Camilla Christina Pedersen, Anastasia Ushakova, Ragnhild Eide Skogseth, Guido Alves, Ole-Bjørn Tysnes, Dag Aarsland, Johannes Lange, Jodi Maple-Grødem
- The Norwegian Centre for Movement Disorders (C.C.P., G.A., J.L., J.M.-G.), Stavanger University Hospital; Department of Chemistry, Bioscience and Environmental Engineering (C.C.P., G.A., J.L., J.M.-G.), University of Stavanger; Section of Biostatistics (A.U.), Department of Research, Stavanger University Hospital; Department of Geriatric Medicine (R.E.S.), Haraldsplass Deaconess Hospital, Bergen; Department of Clinical Medicine (R.E.S., O.-B.T.), University of Bergen; Department of Neurology (G.A.), Stavanger University Hospital; Department of Neurology (O.-B.T.), Haukeland University Hospital, Bergen; Centre for Age-Related Medicine (D.A.), Stavanger University Hospital, Norway; and Department of Old Age Psychiatry (D.A.), Institute of Psychiatry, Psychology, and Neuroscience, King's College London, United Kingdom.
4
Ouhourane M, Yang Y, Benedet AL, Oualkacha K. Group penalized quantile regression. Stat Methods Appl 2022. [DOI: 10.1007/s10260-021-00580-8] [Citation(s) in RCA: 0] [Indexed: 11/30/2022]
5
O’Shea RJ, Tsoka S, Cook GJR, Goh V. Sparse Regression in Cancer Genomics: Comparing Variable Selection and Predictions in Real World Data. Cancer Inform 2021; 20:11769351211056298. [PMID: 34866896 PMCID: PMC8640984 DOI: 10.1177/11769351211056298] [Citation(s) in RCA: 2] [Received: 07/02/2021] [Accepted: 10/09/2021] [Indexed: 12/31/2022]
Abstract
BACKGROUND Evaluation of gene interaction models in cancer genomics is challenging, as the true distribution is uncertain. Previous analyses have benchmarked models using synthetic data or databases of experimentally verified interactions, approaches which are susceptible to misrepresentation and incompleteness, respectively. The objectives of this analysis are to (1) provide a real-world data-driven approach for comparing the performance of genomic model inference algorithms, (2) compare the performance of LASSO, elastic net, best-subset selection, $L_0L_1$ penalisation and $L_0L_2$ penalisation in real genomic data and (3) compare algorithmic preselection according to performance in our benchmark datasets with algorithmic selection by internal cross-validation. METHODS Five large ($n \geq 4000$) genomic datasets were extracted from Gene Expression Omnibus. 'Gold-standard' regression models were trained on subspaces of these datasets ($n \geq 4000$, $p = 500$). Penalised regression models were trained on small samples from these subspaces ($n \in \{25, 75, 150\}$, $p = 500$) and validated against the gold-standard models. Variable selection performance and out-of-sample prediction were assessed. Penalty 'preselection' according to test performance in the other 4 datasets was compared with selection by internal cross-validation error minimisation. RESULTS $L_1L_2$-penalisation achieved the highest cosine similarity between estimated coefficients and those of gold-standard models. $L_0L_2$-penalised models explained the greatest proportion of variance in test responses, though performance was unreliable in low signal:noise conditions. $L_0L_2$ also attained the highest overall median variable selection F1 score. Penalty preselection significantly outperformed selection by internal cross-validation in each of 3 examined metrics. CONCLUSIONS This analysis explores a novel approach for comparing model selection approaches in real genomic data from 5 cancers.
Our benchmarking datasets have been made publicly available for use in future research. Our findings support the use of $L_0L_2$ penalisation for structural selection and $L_1L_2$ penalisation for coefficient recovery in genomic data. Evaluation of learning algorithms according to observed test performance in external genomic datasets yields valuable insights into actual test performance, providing a data-driven complement to internal cross-validation in genomic regression tasks.
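The two headline metrics above compare an estimated coefficient vector with a gold-standard one:

- Cosine similarity measures coefficient recovery.
- A variable selection F1 score treats non-zero coefficients as "selected".

The sketch below assumes these standard definitions, with selection meaning an exactly non-zero coefficient. The paper's own thresholds and aggregation (for example, taking medians across runs) are not reproduced, and the 5-variable vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(beta_hat, beta_gold):
    # Direction agreement between estimated and reference coefficients (1 = same direction).
    return float(beta_hat @ beta_gold / (np.linalg.norm(beta_hat) * np.linalg.norm(beta_gold)))

def selection_f1(beta_hat, beta_gold):
    # Non-zero coefficients count as "selected"; score the selected set against the reference set.
    selected = beta_hat != 0
    relevant = beta_gold != 0
    tp = int(np.sum(selected & relevant))
    if tp == 0:
        return 0.0
    precision = tp / int(selected.sum())
    recall = tp / int(relevant.sum())
    return 2 * precision * recall / (precision + recall)

# Hypothetical 5-variable example.
gold = np.array([1.5, 0.0, -0.8, 0.0, 0.3])
est = np.array([1.2, 0.1, -0.5, 0.0, 0.0])
print(round(cosine_similarity(est, gold), 3))  # 0.977
print(round(selection_f1(est, gold), 3))       # 0.667
```

The example shows why the two metrics can disagree. The estimate points in nearly the right direction (cosine 0.977) while still selecting one spurious variable and missing a small true one (F1 0.667).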
Affiliation(s)
- Robert J O’Shea
- Department of Cancer Imaging, School of Biomedical Engineering and Imaging Sciences, King’s College London, London, UK
- Sophia Tsoka
- Department of Informatics, School of Natural and Mathematical Sciences, King’s College London, London, UK
- Gary JR Cook
- Department of Cancer Imaging, School of Biomedical Engineering and Imaging Sciences, King’s College London, London, UK
- King’s College London & Guy’s and St Thomas’ PET Centre, St Thomas’ Hospital, London, UK
- Vicky Goh
- Department of Cancer Imaging, School of Biomedical Engineering and Imaging Sciences, King’s College London, London, UK
- Department of Radiology, Guy’s and St Thomas’ NHS Foundation Trust, London, UK
6
Bradley JR, Holan SH, Wikle CK. Bayesian Hierarchical Models With Conjugate Full-Conditional Distributions for Dependent Data From the Natural Exponential Family. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1677471] [Citation(s) in RCA: 8] [Indexed: 10/25/2022]
Affiliation(s)
- Scott H. Holan
- Department of Statistics, University of Missouri, Columbia, MO
- U.S. Census Bureau, Washington, DC
7
Naidenov B, Lim A, Willyerd K, Torres NJ, Johnson WL, Hwang HJ, Hoyt P, Gustafson JE, Chen C. Pan-Genomic and Polymorphic Driven Prediction of Antibiotic Resistance in Elizabethkingia. Front Microbiol 2019; 10:1446. [PMID: 31333599 PMCID: PMC6622151 DOI: 10.3389/fmicb.2019.01446] [Citation(s) in RCA: 8] [Received: 02/09/2019] [Accepted: 06/07/2019] [Indexed: 01/21/2023]
Abstract
The Elizabethkingia are a genetically diverse genus of emerging pathogens that exhibit multidrug resistance to a range of common antibiotics. Two representative species, Elizabethkingia bruuniana and E. meningoseptica, were phenotypically tested to determine minimum inhibitory concentrations (MICs) for five antibiotics. Ultra-long-read sequencing with Oxford Nanopore Technologies (ONT) and subsequent de novo assembly produced complete, gapless circular genomes for each strain. Alignment-based annotation with Prokka identified 5,480 features in E. bruuniana and 5,203 features in E. meningoseptica, but none of the identified genes or gene combinations corresponded to the observed phenotypic resistance values. Pan-genomic analysis, performed with an additional 19 Elizabethkingia strains, identified a core-genome size of 2,658,537 bp, 32 uniquely identifiable intrinsic chromosomal antibiotic resistance core-genes and 77 antibiotic resistance pan-genes. Using core-genome SNPs and pan-genes in combination with six machine learning (ML) algorithms, binary classification of clindamycin and vancomycin resistance achieved F1 scores of 0.94 and 0.84, respectively. Performance on the more challenging multiclass problem for fusidic acid, rifampin and ciprofloxacin yielded F1 scores of 0.70, 0.75, and 0.54, respectively. By producing two sets of high-quality biological predictors, pan-genome genes and core-genome SNPs, from long-read sequence data and applying an ensemble of ML techniques, our results demonstrate that accurate phenotypic inference at multiple antimicrobial resistance (AMR) resolutions can be achieved.
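For the multiclass problems, the abstract does not say how per-class F1 scores were combined into one number. The sketch below assumes macro averaging, an unweighted mean of per-class F1. It uses the identity F1 = 2TP / (2TP + FP + FN). The S/I/R labels (susceptible, intermediate, resistant) and predictions are hypothetical.

```python
def macro_f1(y_true, y_pred):
    # Per-class F1 = 2TP / (2TP + FP + FN), then an unweighted mean over classes.
    # Denominators are never zero: every label appears in y_true or y_pred.
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        per_class.append(2 * tp / (2 * tp + fp + fn))
    return sum(per_class) / len(per_class)

# Hypothetical MIC categories: S = susceptible, I = intermediate, R = resistant.
y_true = ["S", "S", "I", "R", "R", "R"]
y_pred = ["S", "I", "I", "R", "R", "S"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```

Macro averaging weights rare resistance categories as heavily as common ones. This is one plausible reason multiclass scores run lower than binary ones, though the paper may have used a different averaging scheme.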
Affiliation(s)
- Bryan Naidenov
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
- Alexander Lim
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
- Karyn Willyerd
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
- Nathanial J. Torres
- Department of Cell Biology, Microbiology and Molecular Biology, University of South Florida, Tampa, FL, United States
- William L. Johnson
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
- Hong Jin Hwang
- 110F Henry Bellmon Research Center, Bioinformatics Graduate Certificate Program and Genomics Core Facility, Oklahoma State University, Stillwater, OK, United States
- Peter Hoyt
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
- 110F Henry Bellmon Research Center, Bioinformatics Graduate Certificate Program and Genomics Core Facility, Oklahoma State University, Stillwater, OK, United States
- John E. Gustafson
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
- Charles Chen
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
8
Taylor KD, Guo X, Zangwill LM, Liebmann JM, Girkin CA, Feldman RM, Dubiner H, Hai Y, Samuels BC, Panarelli JF, Mitchell JP, Al-Aswad LA, Park SC, Tello C, Cotliar J, Bansal R, Sidoti PA, Cioffi GA, Blumberg D, Ritch R, Bell NP, Blieden LS, Davis G, Medeiros FA, Das SK, Divers J, Langefeld CD, Palmer ND, Freedman BI, Bowden DW, Ng MCY, Ida Chen YD, Ayyagari R, Rotter JI, Weinreb RN. Genetic Architecture of Primary Open-Angle Glaucoma in Individuals of African Descent: The African Descent and Glaucoma Evaluation Study III. Ophthalmology 2019; 126:38-48. [PMID: 30352225 PMCID: PMC6309605 DOI: 10.1016/j.ophtha.2018.10.031] [Citation(s) in RCA: 29] [Received: 02/14/2018] [Revised: 10/04/2018] [Accepted: 10/10/2018] [Indexed: 12/12/2022]
Abstract
PURPOSE To identify genetic contributions to glaucoma in African Americans. DESIGN Cross-sectional, case-control study. PARTICIPANTS A total of 1875 primary open-angle glaucoma (POAG) patients and 1709 controls, self-identified as being of African descent (AD), from the African Descent and Glaucoma Evaluation Study (ADAGES) III and Wake Forest School of Medicine. METHODS MegaChip genotypes were imputed to 1000 Genomes data. Association of single nucleotide polymorphisms (SNPs) with POAG and advanced POAG was tested by linear mixed models correcting for relatedness and population stratification. Genetic risk scores were evaluated with receiver operating characteristic curves (ROC-AUCs). MAIN OUTCOME MEASURES Primary open-angle glaucoma defined by visual field loss without other nonocular conditions (n = 1875). Advanced POAG was defined by age-based mean deviation of the visual field (n = 946). RESULTS A total of 18,281,920 SNPs met the imputation quality threshold of r² > 0.7 and minor allele frequency > 0.005. Association of a novel locus, ENO4, was observed for advanced POAG (rs185815146 β, 0.36; standard error, 0.065; P < 3×10⁻⁸). For POAG, an AD signal was observed at the 9p21 European descent (ED) POAG signal (rs79721419; P < 6.5×10⁻⁵) independent of the previously observed 9p21 ED signal (rs2383204; P < 2.3×10⁻⁵) by conditional analyses. An association with POAG in FNDC3B (rs111698934; P < 3.9×10⁻⁵) was observed, not in linkage disequilibrium (LD) with the previously reported ED SNP. Additional previously identified loci associated with POAG in persons of AD were 8q22, AFAP1, and TMCO1. An AUC of 0.62 was observed with an unweighted genetic risk score comprising 11 SNPs in candidate genes. Two additional risk scores were derived using penalized matrix decomposition with cross-validation; risk scores of 50 and 400 SNPs were identified with ROC AUCs of 0.74 and 0.94, respectively.
CONCLUSIONS A novel association with advanced POAG at the ENO4 locus was putatively identified in persons of AD. Beyond this finding, this genome-wide association study in POAG patients of AD contributes to POAG genetics by identifying novel signals in prior loci (9p21) and by advancing the fine mapping of regions owing to shorter average LD (FNDC3B). Although not useful without confirmation and clinical trials, the genetic risk scores demonstrated that considerable AD-specific genetic information remains in these data.
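An unweighted genetic risk score is usually defined as a count of risk alleles across the chosen SNPs, with no effect-size weights. Assuming that definition, it can be computed as below. Genotypes are coded as dosages (0, 1 or 2 copies of a coded allele), and SNPs where the other allele carries the risk are flipped first. The SNPs and allele coding here are hypothetical, not the study's 11 candidate SNPs. The resulting score would then be assessed by ROC AUC, as in the abstract.

```python
import numpy as np

def unweighted_grs(dosages, risk_is_coded_allele):
    # dosages: (n_people, n_snps) counts (0, 1, 2) of the coded allele at each SNP.
    # risk_is_coded_allele: one bool per SNP; False means the other allele is the risk allele.
    d = np.array(dosages, dtype=float)  # copy, so the caller's array is not modified
    flip = ~np.asarray(risk_is_coded_allele, dtype=bool)
    d[:, flip] = 2.0 - d[:, flip]       # re-express every SNP as a risk-allele count
    return d.sum(axis=1)                # unweighted: each risk allele adds 1, regardless of effect size

# Hypothetical 3-SNP example for two people.
dosages = [[0, 1, 2],
           [2, 2, 0]]
print(unweighted_grs(dosages, [True, False, True]))  # [3. 2.]
```

A weighted score would instead multiply each risk-allele count by its estimated log odds ratio before summing.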
Affiliation(s)
- Kent D Taylor
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California
- Xiuqing Guo
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California
- Linda M Zangwill
- Department of Ophthalmology, Hamilton Glaucoma Center, Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Jeffrey M Liebmann
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Harkness Eye Institute, Columbia University Medical Center, New York, New York
- Christopher A Girkin
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, Alabama
- Robert M Feldman
- Ruiz Department of Ophthalmology and Visual Science, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas
- Yang Hai
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California
- Brian C Samuels
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, Alabama
- Joseph F Panarelli
- Einhorn Clinical Research Center, New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- John P Mitchell
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Harkness Eye Institute, Columbia University Medical Center, New York, New York
- Lama A Al-Aswad
- Ruiz Department of Ophthalmology and Visual Science, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas
- Sung Chul Park
- Einhorn Clinical Research Center, New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- Celso Tello
- Einhorn Clinical Research Center, New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- Jeremy Cotliar
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Harkness Eye Institute, Columbia University Medical Center, New York, New York
- Rajendra Bansal
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Harkness Eye Institute, Columbia University Medical Center, New York, New York
- Paul A Sidoti
- Einhorn Clinical Research Center, New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- George A Cioffi
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Harkness Eye Institute, Columbia University Medical Center, New York, New York
- Dana Blumberg
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Harkness Eye Institute, Columbia University Medical Center, New York, New York
- Robert Ritch
- Einhorn Clinical Research Center, New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- Nicholas P Bell
- Ruiz Department of Ophthalmology and Visual Science, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas
- Lauren S Blieden
- Ruiz Department of Ophthalmology and Visual Science, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas
- Garvin Davis
- Ruiz Department of Ophthalmology and Visual Science, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas
- Felipe A Medeiros
- Department of Ophthalmology, Hamilton Glaucoma Center, Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Swapan K Das
- Center for Public Health Genomics, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina
- Jasmin Divers
- Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina
- Carl D Langefeld
- Center for Public Health Genomics, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina
- Nicholette D Palmer
- Center for Public Health Genomics, Wake Forest School of Medicine, Winston-Salem, North Carolina; Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina
- Barry I Freedman
- Center for Public Health Genomics, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina; Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, North Carolina
- Donald W Bowden
- Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina; Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, North Carolina
- Maggie C Y Ng
- Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina; Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina; Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, North Carolina
- Yii-Der Ida Chen
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California
- Radha Ayyagari
- Department of Ophthalmology, Hamilton Glaucoma Center, Shiley Eye Institute, University of California, San Diego, La Jolla, California
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California
- Robert N Weinreb
- Department of Ophthalmology, Hamilton Glaucoma Center, Shiley Eye Institute, University of California, San Diego, La Jolla, California
9
10
Erga AH, Dalen I, Ushakova A, Chung J, Tzoulis C, Tysnes OB, Alves G, Pedersen KF, Maple-Grødem J. Dopaminergic and Opioid Pathways Associated with Impulse Control Disorders in Parkinson's Disease. Front Neurol 2018. [PMID: 29541058 PMCID: PMC5835501 DOI: 10.3389/fneur.2018.00109] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 01/14/2023]
Abstract
Introduction: Impulse control disorders (ICDs) are frequent non-motor symptoms in Parkinson’s disease (PD), with potential negative effects on quality of life and social functioning. ICDs are closely associated with dopaminergic therapy, and genetic polymorphisms in several neurotransmitter pathways may increase the risk of addictive behaviors in PD. However, clinically distinguishing patients at risk of ICDs from those not at risk remains difficult. The aim of this study was to investigate whether genetic polymorphisms across several neurotransmitter pathways were associated with ICD status in patients with PD.
Methods: Whole-exome sequencing data were available for 119 eligible PD patients from the Norwegian ParkWest study. All participants underwent comprehensive neurological, neuropsychiatric, and neuropsychological assessments. ICDs were assessed using the self-report short form of the Questionnaire for Impulsive-Compulsive Disorders in PD. Single-nucleotide polymorphisms (SNPs) from 17 genes were subjected to regression with elastic net penalization to identify candidate variants associated with ICDs. The area under the receiver operating characteristic curve (AUC) was used to evaluate ICD prediction.
Results: Among the 119 patients with PD included in the analysis, 29% met the criteria for ICD and 63% were using dopamine agonists (DAs). Eleven SNPs were associated with ICDs, and the four SNPs with the most robust performance significantly increased ICD predictability (AUC = 0.81, 95% CI 0.73–0.90) compared with clinical data alone (DA use and age; AUC = 0.65, 95% CI 0.59–0.78). The strongest predictive factors were rs5326 in DRD1, which was associated with increased odds of ICDs, and rs702764 in OPRK1, which was associated with decreased odds of ICDs.
Conclusion: Using an advanced statistical approach, we identified SNPs in nine genes, including a novel polymorphism in DRD1, with potential application for the identification of PD patients at risk for ICDs.
Affiliation(s)
- Aleksander H Erga
- The Norwegian Centre for Movement Disorders, Stavanger University Hospital, Stavanger, Norway
- Ingvild Dalen
- Department of Research, Section of Biostatistics, Stavanger University Hospital, Stavanger, Norway
- Anastasia Ushakova
- Department of Research, Section of Biostatistics, Stavanger University Hospital, Stavanger, Norway
- Janete Chung
- The Norwegian Centre for Movement Disorders, Stavanger University Hospital, Stavanger, Norway
- Charalampos Tzoulis
- Department of Neurology, Haukeland University Hospital, Bergen, Norway; Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Ole Bjørn Tysnes
- Department of Neurology, Haukeland University Hospital, Bergen, Norway; Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Guido Alves
- The Norwegian Centre for Movement Disorders, Stavanger University Hospital, Stavanger, Norway; Department of Neurology, Stavanger University Hospital, Stavanger, Norway; Department of Mathematics and Natural Sciences, University of Stavanger, Stavanger, Norway
- Kenn Freddy Pedersen
- The Norwegian Centre for Movement Disorders, Stavanger University Hospital, Stavanger, Norway; Department of Neurology, Stavanger University Hospital, Stavanger, Norway
- Jodi Maple-Grødem
- The Norwegian Centre for Movement Disorders, Stavanger University Hospital, Stavanger, Norway; The Centre for Organelle Research, University of Stavanger, Stavanger, Norway
11
Bull SB, Andrulis IL, Paterson AD. Statistical challenges in high-dimensional molecular and genetic epidemiology. Can J Stat 2017. [DOI: 10.1002/cjs.11342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Shelley B. Bull
- Lunenfeld-Tanenbaum Research Institute; Sinai Health System; Toronto Ontario, Canada M5T 3L9
- Dalla Lana School of Public Health; University of Toronto; Toronto, Ontario Canada M5T 3M7
- Irene L. Andrulis
- Lunenfeld-Tanenbaum Research Institute; Sinai Health System; Toronto Ontario, Canada M5T 3L9
- Department of Molecular Genetics; University of Toronto; Toronto, Ontario Canada M5S 1A8
- Andrew D. Paterson
- Dalla Lana School of Public Health; University of Toronto; Toronto, Ontario Canada M5T 3M7
- Genetics and Genome Biology Program; The Hospital for Sick Children; Toronto, Ontario Canada M5G 0A4
12
Keys KL, Chen GK, Lange K. Iterative hard thresholding for model selection in genome-wide association studies. Genet Epidemiol 2017; 41:756-768. [PMID: 28875524 DOI: 10.1002/gepi.22068] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/14/2017] [Revised: 07/13/2017] [Accepted: 08/02/2017] [Indexed: 11/05/2022]
Abstract
A genome-wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly sampled unrelated individuals and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 penalty (LASSO) or the minimax concave penalty (MCP) is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. Here, we compare LASSO and MCP penalized regression with iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection and comparable in speed to both methods of penalized regression. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. Availability: Source code is freely available at https://github.com/klkeys/IHT.jl.
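The core IHT idea the abstract contrasts with LASSO/MCP is: take a gradient step on the least-squares loss, then keep only the k largest coefficients in magnitude and zero the rest. The following is a minimal pure-Python sketch of that loop under stated assumptions; it is not the authors' Julia implementation (IHT.jl), and the function name `iht`, the conservative fixed step size 1/‖X‖_F² (a lower bound on 1/‖X‖₂²), and the fixed iteration count are illustrative choices.

```python
def iht(X, y, k, iters=200):
    """Sketch of iterative hard thresholding for least squares.

    Approximately minimises 0.5 * ||y - X b||^2 subject to b having at most
    k nonzero entries. X is a list of rows; y is a list of responses.
    Step size and iteration count are illustrative assumptions.
    """
    n, p = len(X), len(X[0])
    # Conservative step: 1 / ||X||_F^2 never exceeds 1 / ||X||_2^2.
    step = 1.0 / sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(iters):
        # Residual r = y - X b
        r = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        # Negative gradient of the loss: X^T r
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        cand = [b[j] + step * g[j] for j in range(p)]
        # Hard threshold: keep the k entries of largest magnitude, zero the rest.
        keep = set(sorted(range(p), key=lambda j: abs(cand[j]), reverse=True)[:k])
        b = [cand[j] if j in keep else 0.0 for j in range(p)]
    return b


# Toy design with orthogonal columns; y = 3 * column 0 - 2 * column 2.
X = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]
y = [1, 1, 5, 5]
b = iht(X, y, k=2)
print([round(v, 6) for v in b])  # → [3.0, 0.0, -2.0, 0.0]
```

Unlike an ℓ1 penalty, the hard-thresholding step does not shrink the retained coefficients, which is one reason IHT can give cleaner model selection; real GWAS use would need standardized genotypes and a data-driven choice of k.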
Affiliation(s)
- Kevin L Keys
- Department of Medicine, University of California, San Francisco, San Francisco, California, United States of America
- Gary K Chen
- Division of Biostatistics, University of Southern California, Los Angeles, California, United States of America
- Kenneth Lange
- Departments of Biomathematics, Human Genetics, and Statistics, University of California, Los Angeles, California, United States of America
13
Reza Soroushmehr SM, Najarian K. Transforming big data into computational models for personalized medicine and health care. Dialogues in Clinical Neuroscience 2017. [PMID: 27757067 PMCID: PMC5067150 DOI: 10.31887/dcns.2016.18.3/ssoroushmehr] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 01/10/2023]
Abstract
Health care systems generate a huge volume of different types of data. Due to the complexity and challenges inherent in studying medical information, it is not yet possible to create a comprehensive model capable of considering all aspects of health care systems. Views differ on the most efficient approaches to using these data. In this paper, we describe the potential role of big data approaches in improving health care systems and review the most common challenges facing the use of health care big data.
Affiliation(s)
- S M Reza Soroushmehr
- Emergency Medicine Department, University of Michigan, Ann Arbor, Michigan, USA; University of Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, Michigan, USA; Department of Computational Medicine and Bio-informatics, University of Michigan, Ann Arbor, Michigan, USA
- Kayvan Najarian
- Emergency Medicine Department, University of Michigan, Ann Arbor, Michigan, USA; University of Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, Michigan, USA; Department of Computational Medicine and Bio-informatics, University of Michigan, Ann Arbor, Michigan, USA
14
Zhou H, Blangero J, Dyer TD, Chan KHK, Lange K, Sobel EM. Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data. Genet Epidemiol 2017; 41:174-186. [PMID: 27943406 PMCID: PMC5340631 DOI: 10.1002/gepi.21988] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/12/2015] [Revised: 05/02/2016] [Accepted: 05/08/2016] [Indexed: 01/14/2023]
Abstract
Since most analysis software for genome-wide association studies (GWAS) currently exploits only unrelated individuals, there is a need for efficient applications that can handle general pedigree data or mixtures of population and pedigree data. Even datasets thought to consist of only unrelated individuals may include cryptic relationships that can lead to false positives if not discovered and controlled for. In addition, family designs possess compelling advantages. They are better equipped to detect rare variants, control for population stratification, and facilitate the study of parent-of-origin effects. Pedigrees selected for extreme trait values often segregate a single gene with strong effect. Finally, many pedigrees are available as an important legacy from the era of linkage analysis. Unfortunately, pedigree likelihoods are notoriously hard to compute. In this paper, we reexamine the computational bottlenecks and implement ultra-fast pedigree-based GWAS analysis. Kinship coefficients can either be based on explicitly provided pedigrees or automatically estimated from dense markers. Our strategy (a) works for random sample data, pedigree data, or a mix of both; (b) entails no loss of power; (c) allows for any number of covariate adjustments, including correction for population stratification; (d) allows for testing SNPs under additive, dominant, and recessive models; and (e) accommodates both univariate and multivariate quantitative traits. On a typical personal computer (six CPU cores at 2.67 GHz), analyzing a univariate HDL (high-density lipoprotein) trait from the San Antonio Family Heart Study (935,392 SNPs on 1,388 individuals in 124 pedigrees) takes less than 2 min and 1.5 GB of memory. Complete multivariate QTL analysis of the three time points of the longitudinal multivariate HDL trait takes less than 5 min and 1.5 GB of memory.
The algorithm is implemented as the Ped-GWAS Analysis (Option 29) in the Mendel statistical genetics package, which is freely available for Macintosh, Linux, and Windows platforms from http://genetics.ucla.edu/software/mendel.
Affiliation(s)
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles, California, United States of America
- John Blangero
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Texas, United States of America
- Thomas D Dyer
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Texas, United States of America
- Kei-Hang K Chan
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
- Department of Epidemiology, University of California, Los Angeles, California, United States of America
- Kenneth Lange
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
- Department of Biomathematics, University of California, Los Angeles, California, United States of America
- Department of Statistics, University of California, Los Angeles, California, United States of America
- Eric M Sobel
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
15
Luu K, Bazin E, Blum MGB. pcadapt: an R package to perform genome scans for selection based on principal component analysis. Mol Ecol Resour 2017; 17:67-77. [PMID: 27601374 DOI: 10.1101/056135] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/31/2016] [Revised: 07/29/2016] [Accepted: 08/01/2016] [Indexed: 05/23/2023]
Abstract
The R package pcadapt performs genome scans to detect genes under selection based on population genomic data. It assumes that candidate markers are outliers with respect to how they are related to population structure. Because population structure is ascertained with principal component analysis, the package is fast and works with large-scale data. It can handle missing data and pooled sequencing data. In contrast to population-based approaches, the package handles admixed individuals and does not require grouping individuals into populations. Since its first release, pcadapt has evolved in terms of both statistical approach and software implementation. We present results obtained with the robust Mahalanobis distance, a new statistic for genome scans available in version 2.0 and later of the package. When hierarchical population structure occurs, the Mahalanobis distance is more powerful than the communality statistic implemented in the first version of the package. Using simulated data, we compare pcadapt with other computer programs for genome scans (BayeScan, hapflk, OutFLANK, sNMF). We find that the proportion of false discoveries is close to the nominal false discovery rate of 10%, with the exception of BayeScan, which generates 40% false discoveries. We also find that the power of BayeScan is severely impacted by the presence of admixed individuals, whereas pcadapt is not. Last, we find that pcadapt and hapflk are the most powerful in scenarios of population divergence and range expansion. Because pcadapt handles next-generation sequencing data, it is a valuable tool for data analysis in molecular ecology.
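To illustrate what a Mahalanobis-distance outlier statistic does, here is a simplified pure-Python sketch for K = 2 principal components. In pcadapt the per-SNP z-scores come from regressing each SNP on the K principal components and the covariance is estimated robustly; this sketch instead takes the z-scores as given (the numbers below are made up) and uses the classical, non-robust sample covariance. The function name `mahalanobis_2d` is hypothetical and this is not the package's implementation.

```python
def mahalanobis_2d(z):
    """Squared Mahalanobis distance of each 2-D point from the sample mean,
    using the classical (non-robust) sample covariance. Simplified sketch."""
    n = len(z)
    mx = sum(p[0] for p in z) / n
    my = sum(p[1] for p in z) / n
    sxx = sum((p[0] - mx) ** 2 for p in z) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in z) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in z) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    out = []
    for px, py in z:
        dx, dy = px - mx, py - my
        out.append(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy)
    return out


# Hypothetical per-SNP z-scores on two PCs; the last SNP is atypical.
zscores = [(0.1, -0.2), (-0.3, 0.4), (0.2, 0.1), (-0.1, -0.3), (0.0, 0.2), (4.0, 3.5)]
d2 = mahalanobis_2d(zscores)
print(max(range(len(d2)), key=d2.__getitem__))  # → 5
```

With so few points, the outlier itself inflates the classical covariance and its distance is only modestly larger than some others; this masking effect is the usual motivation for a robust covariance estimate such as the one the abstract refers to.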
Affiliation(s)
- Keurcien Luu
- Laboratoire TIMC-IMAG, UMR 5525, CNRS, Université Grenoble Alpes, Grenoble, France
- Eric Bazin
- Laboratoire d'Ecologie Alpine UMR 5553, CNRS, Université Grenoble Alpes, Grenoble, France
- Michael G B Blum
- Laboratoire TIMC-IMAG, UMR 5525, CNRS, Université Grenoble Alpes, Grenoble, France
16
Zhou H, Zhou J, Hu T, Sobel EM, Lange K. Genome-wide QTL and eQTL analyses using Mendel. BMC Proc 2016; 10:239-244. [PMID: 27980643 PMCID: PMC5133530 DOI: 10.1186/s12919-016-0037-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
The Pedigree genome-wide association studies (GWAS) option (Option 29) in the current version of the Mendel software is an optimized subroutine for performing large-scale genome-wide quantitative trait locus (QTL) analysis. This analysis (a) works for random sample data, pedigree data, or a mix of both; (b) is highly efficient in both run time and memory requirements; (c) accommodates both univariate and multivariate traits; (d) works for autosomal and X-linked loci; (e) correctly deals with missing data in traits, covariates, and genotypes; (f) allows for covariate adjustment and constraints among parameters; (g) uses either a theoretical or a single nucleotide polymorphism (SNP)–based empirical kinship matrix for additive polygenic effects; (h) allows extra variance components such as dominant polygenic effects and household effects; (i) detects and reports outlier individuals and pedigrees; and (j) allows for robust estimation via the t-distribution. This paper assesses these capabilities on the Genetic Analysis Workshop 19 (GAW19) sequencing data. We analyzed simulated and real phenotypes for both family and random sample data sets. For instance, when jointly testing the 8 longitudinally measured systolic and diastolic blood pressure traits, Mendel takes 78 min on a standard laptop computer to read, quality check, and analyze a data set with 849 individuals and 8.3 million SNPs. Genome-wide expression QTL analysis of 20,643 expression traits on 641 individuals with 8.3 million SNPs takes 30 h using 20 parallel runs on a cluster. Mendel is freely available at http://www.genetics.ucla.edu/software.
Affiliation(s)
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles, CA 90095 USA
- Jin Zhou
- Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, Tucson, AZ 85721-0066 USA
- Tao Hu
- Department of Biostatistics, University of California, Los Angeles, CA 90095 USA; Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695 USA
- Eric M Sobel
- Department of Human Genetics, University of California, Los Angeles, CA 90095 USA
- Kenneth Lange
- Department of Human Genetics, University of California, Los Angeles, CA 90095 USA; Department of Biomathematics, University of California, Los Angeles, CA 90095 USA; Department of Statistics, University of California, Los Angeles, CA 90095 USA
17
Luu K, Bazin E, Blum MGB. pcadapt: an R package to perform genome scans for selection based on principal component analysis. Mol Ecol Resour 2016; 17:67-77. [DOI: 10.1111/1755-0998.12592] [Citation(s) in RCA: 471] [Impact Index Per Article: 58.9] [Received: 05/31/2016] [Revised: 07/29/2016] [Accepted: 08/01/2016] [Indexed: 12/24/2022]
Affiliation(s)
- Keurcien Luu
- Laboratoire TIMC-IMAG; UMR 5525; CNRS; Université Grenoble Alpes; Grenoble France
- Eric Bazin
- Laboratoire d'Ecologie Alpine UMR 5553; CNRS; Université Grenoble Alpes; Grenoble France
- Michael G. B. Blum
- Laboratoire TIMC-IMAG; UMR 5525; CNRS; Université Grenoble Alpes; Grenoble France
18
Reza Soroushmehr SM. Transforming big data into computational models for personalized medicine and health care. Dialogues in Clinical Neuroscience 2016; 18:339-343. [PMID: 27757067 PMCID: PMC5067150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2023]
Abstract
Health care systems generate a huge volume of different types of data. Due to the complexity and challenges inherent in studying medical information, it is not yet possible to create a comprehensive model capable of considering all aspects of health care systems. Views differ on the most efficient approaches to using these data. In this paper, we describe the potential role of big data approaches in improving health care systems and review the most common challenges facing the use of health care big data.
Affiliation(s)
- S. M. Reza Soroushmehr
- Emergency Medicine Department, University of Michigan, Ann Arbor, Michigan, USA; University of Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, Michigan, USA; Department of Computational Medicine and Bio-informatics, University of Michigan, Ann Arbor, Michigan, USA
19
Laurin C, Boomsma D, Lubke G. The use of vector bootstrapping to improve variable selection precision in Lasso models. Stat Appl Genet Mol Biol 2016; 15:305-20. [PMID: 27248122 PMCID: PMC5131926 DOI: 10.1515/sagmb-2015-0043] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/15/2022]
Abstract
The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.
20
Rubanovich AV, Khromov-Borisov NN. Genetic risk assessment of the joint effect of several genes: Critical appraisal. Russ J Genet 2016. [DOI: 10.1134/s1022795416070073] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
21
Assimes TL, Lee IT, Juang JM, Guo X, Wang TD, Kim ET, Lee WJ, Absher D, Chiu YF, Hsu CC, Chuang LM, Quertermous T, Hsiung CA, Rotter JI, Sheu WHH, Chen YDI, Taylor KD. Genetics of Coronary Artery Disease in Taiwan: A Cardiometabochip Study by the Taichi Consortium. PLoS One 2016; 11:e0138014. [PMID: 26982883 PMCID: PMC4794124 DOI: 10.1371/journal.pone.0138014] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/20/2015] [Accepted: 08/24/2015] [Indexed: 01/12/2023]
Abstract
By means of a combination of genome-wide and follow-up studies, recent large-scale association studies of populations of European descent have now identified over 46 loci associated with coronary artery disease (CAD). As part of the TAICHI Consortium, we have collected and genotyped 8556 subjects from Taiwan, comprising 5423 controls and 3133 cases with CAD, for 9087 CAD SNPs using the CardioMetaboChip. We applied penalized logistic regression to identify the top SNPs that jointly contribute to CAD susceptibility in Taiwan. We observed that the 9p21 locus contributes to CAD at the level of genome-wide significance (rs1537372; for the major allele C, the effect estimate is -0.216, standard error 0.033, p value 5.8 × 10⁻¹⁰). In contrast to a previous report, we propose that the 9p21 locus is a single genetic contribution to CAD in Taiwan because 1) the penalized logistic regression and the follow-up conditional analysis suggested that rs1537372 accounts for all of the CAD association in 9p21, and 2) high linkage disequilibrium was observed among all associated SNPs in 9p21. We also observed evidence for the following loci at a false discovery rate >5%: SH2B3, ADAMTS7, PHACTR1, GGCX, HTRA1, COL4A1, and LARP6-LRRC49. We also took advantage of the fact that penalized methods are an efficient approach to searching for gene-by-gene interactions, and observed that two-way interactions between the PHACTR1 and ADAMTS7 loci and between the SH2B3 and COL4A1 loci contribute to CAD risk. Both the similarities and the differences between the significance of these loci and that of loci in studies of populations of European descent underscore that further genetic association studies in additional populations will provide clues to the genetic architecture of CAD across all populations worldwide.
Affiliation(s)
- Themistocles L. Assimes
- Department of Medicine, Stanford University School of Medicine, Stanford, California, United States of America
- I.-T. Lee
- Division of Endocrine and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- Jyh-Ming Juang
- Cardiovascular Center and Division of Cardiology, Department of Internal Medicine, National Taiwan University Hospital, National University College of Medicine, Taipei, Taiwan
- Xiuqing Guo
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, and Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States of America
- Tzung-Dau Wang
- Cardiovascular Center and Division of Cardiology, Department of Internal Medicine, National Taiwan University Hospital, National University College of Medicine, Taipei, Taiwan
- Eric T. Kim
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, and Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States of America
- Wen-Jane Lee
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Devin Absher
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Yen-Feng Chiu
- Institute of Population Health Sciences, Division of Biostatistics and Bioinformatics, National Health Research Institutes, Zhunan Town, Miaoli County, Taiwan
- Chih-Cheng Hsu
- Institute of Population Health Sciences, Division of Biostatistics and Bioinformatics, National Health Research Institutes, Zhunan Town, Miaoli County, Taiwan
- Lee-Ming Chuang
- Division of Endocrine and Metabolism, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Thomas Quertermous
- Department of Medicine, Stanford University School of Medicine, Stanford, California, United States of America
- Chao A. Hsiung
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Jerome I. Rotter
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, and Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States of America
- Wayne H.-H. Sheu
- Division of Endocrine and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- Yii-Der Ida Chen
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, and Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States of America
- Kent D. Taylor
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, and Department of Pediatrics, University of California Los Angeles, Los Angeles, California, United States of America
22
Abstract
Matrix completion discriminant analysis (MCDA) is designed for semi-supervised learning where the rate of missingness is high and predictors vastly outnumber cases. MCDA operates by mapping class labels to the vertices of a regular simplex. With c classes, these vertices are arranged on the surface of the unit sphere in (c - 1)-dimensional Euclidean space. Because all pairs of vertices are equidistant, the classes are treated symmetrically. To assign unlabeled cases to classes, the data are entered into a large matrix (cases along rows and predictors along columns) that is augmented by vertex coordinates stored in the last c - 1 columns. Once the matrix is constructed, its missing entries can be filled in by matrix completion. To carry out matrix completion, one minimizes a sum of squares plus a nuclear norm penalty. The simplest solution invokes an MM algorithm and singular value decomposition. The penalty tuning constant can be chosen by cross-validation on randomly withheld case labels. Once the matrix is completed, an unlabeled case is assigned to the class vertex closest to the point deposited in its last c - 1 columns. A variety of examples drawn from the statistical literature demonstrate that MCDA is competitive on traditional problems and outperforms alternatives on large-scale problems.
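The first and last steps described above (encoding c classes as equidistant unit vectors in c - 1 dimensions, then assigning a case to the nearest vertex) can be sketched in pure Python. The vertex formula below is one standard regular-simplex construction and is an assumption, since the abstract does not say which construction MCDA uses; the function names are hypothetical, and the matrix-completion step itself (nuclear-norm penalized least squares solved with an MM algorithm and SVD) is not shown. For c classes, every vertex has unit length and every pair of vertices is at squared distance 2c/(c - 1).

```python
import math


def simplex_vertices(c):
    """Return c points on the unit sphere in R^(c-1) with equal pairwise distances.

    Construction (assumed, one of several equivalent ones):
      v_1 = (c-1)^(-1/2) * (1, ..., 1)
      v_j = kappa * (1, ..., 1) + eta * e_(j-1),  j = 2..c
    with kappa = -(1 + sqrt(c)) / (c-1)^(3/2) and eta = sqrt(c / (c-1)).
    """
    if c < 2:
        raise ValueError("need at least two classes")
    d = c - 1
    kappa = -(1 + math.sqrt(c)) / d ** 1.5
    eta = math.sqrt(c / d)
    verts = [[1 / math.sqrt(d)] * d]
    for j in range(d):
        v = [kappa] * d
        v[j] += eta
        verts.append(v)
    return verts


def nearest_vertex(point, verts):
    """Index of the vertex closest (Euclidean) to point: the class assignment step."""
    return min(range(len(verts)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, verts[i])))


V = simplex_vertices(3)
# Hypothetical last c - 1 = 2 entries of a completed row for an unlabeled case.
print(nearest_vertex([0.3, -0.9], V))  # → 1
```

Because the vertices are equidistant, no class is privileged by the encoding, which is the symmetry property the abstract emphasizes.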
Affiliation(s)
- Tong Tong Wu
- Associate Professor in the Departments of Biostatistics and Computational Biology, University of Rochester, NY 14642
- Kenneth Lange
- Professor of Biomathematics, Human Genetics, and Statistics at the University of California, Los Angeles, CA 90095
23
Zeng P, Wang T. Detecting the Genomic Signature of Divergent Selection in Presence of Gene Flow. Curr Genomics 2015; 16:203-12. [PMID: 26069460 PMCID: PMC4460224 DOI: 10.2174/1389202916666150313230943] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/16/2014] [Revised: 02/23/2015] [Accepted: 03/09/2015] [Indexed: 11/22/2022]
Abstract
In this paper, the detection of rare-variant associations with continuous phenotypes of interest is investigated via the likelihood-ratio-based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component lies on the boundary of its parameter space. In this situation, the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To avoid deriving the null distribution, we resort to the bootstrap method because it is generally applicable and easy to implement. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are conducted to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test, we propose an effective mixture approximation for the bootstrap null distribution. The GAW17 data are used to illustrate the proposed test.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, and Center of Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
- Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
24
Tiesinga P, Bakker R, Hill S, Bjaalie JG. Feeding the human brain model. Curr Opin Neurobiol 2015; 32:107-14. [DOI: 10.1016/j.conb.2015.02.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/17/2014] [Revised: 02/06/2015] [Accepted: 02/06/2015] [Indexed: 10/23/2022]
25
Duforet-Frebourg N, Bazin E, Blum MGB. Genome scans for detecting footprints of local adaptation using a Bayesian factor model. Mol Biol Evol 2014; 31:2483-95. [PMID: 24899666 PMCID: PMC4137708 DOI: 10.1093/molbev/msu182] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Indexed: 01/03/2023]
Abstract
There is a considerable impetus in population genomics to pinpoint loci involved in local adaptation. A powerful approach to find genomic regions subject to local adaptation is to genotype numerous molecular markers and look for outlier loci. One of the most common approaches for selection scans is based on statistics that measure population differentiation, such as FST. However, approaches related to FST have important caveats: they require grouping individuals into populations, and they assume a particular model of population structure. Here, we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. To identify outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that it can achieve a twofold or greater reduction in false discovery rate compared with the software BayeScan or with an FST approach. We show that our software can handle large data sets by analyzing the single nucleotide polymorphisms of the Human Genome Diversity Project. The Bayesian factor model is implemented in the open-source PCAdapt software.
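The idea of flagging loci that are atypically related to latent population structure can be illustrated with a simpler, frequentist stand-in. The sketch below is explicitly not the Bayesian hierarchical factor model described in the abstract: it assumes that the top principal components of the standardized genotype matrix serve as the latent factors, regresses each locus on them, and ranks loci by the Mahalanobis distance of their per-factor z-scores. The function name `structure_outlier_scores` and its parameter `k` (number of factors) are hypothetical.

```python
import numpy as np


def structure_outlier_scores(g, k=2):
    """Score loci by how atypically they relate to the top-k latent factors.

    g: (n_individuals, n_loci) genotype matrix coded 0/1/2 with no monomorphic loci.
    Returns one Mahalanobis-type score per locus; larger means more outlying.
    """
    n = g.shape[0]
    sd = g.std(axis=0)
    if np.any(sd == 0):
        raise ValueError("remove monomorphic loci first")
    z = (g - g.mean(axis=0)) / sd              # standardize each locus
    # Latent factors: top-k left singular vectors (orthonormal, centered individual scores).
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    f = u[:, :k]
    beta = f.T @ z                             # (k, L) regression coefficients, since f^T f = I
    resid = z - f @ beta
    sigma2 = (resid**2).sum(axis=0) / (n - k - 1)   # centering used one extra degree of freedom
    zscores = beta / np.sqrt(sigma2)           # each coefficient's standard error is sqrt(sigma2)
    # Mahalanobis distance of each locus's z-score vector from the bulk of loci.
    centered = zscores - zscores.mean(axis=1, keepdims=True)
    cov = np.atleast_2d(np.cov(zscores))       # (k, k) covariance across loci
    return np.sum(centered * np.linalg.solve(cov, centered), axis=0)
```

Like the factor model in the abstract, this needs no predefined populations: the factors are estimated from the individuals themselves, and a locus scores high when its association with those factors is far stronger than that of typical loci.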
Affiliation(s)
- Nicolas Duforet-Frebourg
- Laboratoire TIMC-IMAG, UMR 5525, Centre National de la Recherche Scientifique, Université Joseph Fourier, Grenoble, France
- Eric Bazin
- Laboratoire d'Ecologie Alpine, UMR 5553, Centre National de la Recherche Scientifique, Université Joseph Fourier, Grenoble, France
- Michael G B Blum
- Laboratoire TIMC-IMAG, UMR 5525, Centre National de la Recherche Scientifique, Université Joseph Fourier, Grenoble, France
26
Vandenbergh DJ, Schlomer GL. Finding genomic function for genetic associations in nicotine addiction research: the ENCODE project's role in future pharmacogenomic analysis. Pharmacol Biochem Behav 2014; 123:34-44. [PMID: 24486638 PMCID: PMC4117825 DOI: 10.1016/j.pbb.2014.01.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/13/2013] [Revised: 01/17/2014] [Accepted: 01/22/2014] [Indexed: 11/16/2022]
Abstract
Tobacco-related behaviors and the underlying addiction to nicotine are complex tangles of genetic and environmental factors. Efforts to understand the genetic component of these traits have identified sites in the genome (single nucleotide polymorphisms, or SNPs) that might account for some part of the role of genetics in nicotine addiction. Encouragingly, some of these candidate SNPs remain significant in meta-analyses. However, genetic associations cannot be fully assessed, regardless of statistical significance, without an understanding of the functional consequences of the alleles present at these SNPs. The proper experimental test for allelic function can be very difficult to define, representing a roadblock in translating genetic results into treatment to prevent smoking and other nicotine-related behaviors. This roadblock can be navigated in part with a new web-based tool, the Encyclopedia of DNA Elements (ENCODE). ENCODE is a compilation of searchable data on several types of biochemical functions or "marks" across the genome. These data can be queried for the co-localization of a candidate SNP and a biochemical mark. The presence of a SNP within a marked region of DNA enables the generation of better-informed hypotheses to test possible functional roles of alleles at a candidate SNP. Two examples of such co-localizations are presented. The first example reveals ENCODE's ability to relate a candidate SNP's function to a gene very far from the physical location of the SNP. The second example reveals a new potential function of the SNP rs4105144, which has been genetically associated with the number of cigarettes smoked per day. Details for accessing the ENCODE data for this SNP are provided to serve as a tutorial. By serving as a bridge between genetic associations and biochemical function, ENCODE has the power to propel progress in untangling the genetic aspects of nicotine addiction, a major public health concern.
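The co-localization query described in this abstract amounts to asking which marked regions contain a SNP's position. A minimal sketch, assuming the marks have already been exported as BED-style intervals (0-based start, exclusive end) and that SNP positions are supplied 1-based; it does not use any ENCODE API, and the function names `build_index` and `marks_at` as well as the example mark labels and coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(marks):
    """Group (chrom, start, end, label) BED-style intervals by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, label in marks:
        by_chrom[chrom].append((start, end, label))
    index = {}
    for chrom, rows in by_chrom.items():
        rows.sort()
        index[chrom] = ([start for start, _, _ in rows], rows)
    return index


def marks_at(index, chrom, pos1):
    """Labels of all marked regions covering a SNP at 1-based position pos1."""
    if chrom not in index:
        return []
    starts, rows = index[chrom]
    p0 = pos1 - 1                              # 1-based SNP position -> 0-based
    hi = bisect_right(starts, p0)              # rows[:hi] all start at or before the SNP
    # Half-open intervals [start, end) contain p0 when end > p0.
    return [label for start, end, label in rows[:hi] if end > p0]
```

For example, with hypothetical marks `("chr1", 100, 200, "DNase")` and `("chr1", 150, 300, "H3K27ac")`, a SNP at 1-based position 151 on chr1 falls inside both regions, so `marks_at` returns `["DNase", "H3K27ac"]`. Overlapping intervals are handled correctly because every interval starting at or before the SNP is checked.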
Affiliation(s)
- David J Vandenbergh
- Department of Biobehavioral Health, The Pennsylvania State University, 219 Biobehavioral Health Building, University Park, PA 16802, USA; Penn State Institute of the Neurosciences, 101 Life Sciences Building, University Park, PA 16802, USA.
- Gabriel L Schlomer
- Department of Human Development and Family Studies, The Pennsylvania State University, 315 Health and Human Development, East, University Park, PA 16802, USA.