1. Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F, Hu X, Shi F, Xia J. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med 2024; 16:56. [PMID: 38627848 PMCID: PMC11020195 DOI: 10.1186/s13073-024-01330-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
Despite the abundance of genotype-phenotype association studies, the resulting association outcomes often lack robustness and interpretability. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer's disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer's disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings exhibit moderate positive rates, high recall rates, and interpretability in gene-disease association studies.
Affiliation(s)
- Xinzhi Yao
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Sizhuo Ouyang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Yulong Lian
- College of Science, Huazhong Agricultural University, Wuhan, China
- Qianqian Peng
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Xionghui Zhou
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feier Huang
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xuehai Hu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feng Shi
- College of Science, Huazhong Agricultural University, Wuhan, China
- Jingbo Xia
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
2. Li R, Benz L, Duan R, Denny JC, Hakonarson H, Mosley JD, Smoller JW, Wei WQ, Ritchie MD, Moore JH, Chen Y. mixWAS: An efficient distributed algorithm for mixed-outcomes genome-wide association studies. medRxiv 2024:2024.01.09.24301073. [PMID: 38260403 PMCID: PMC10802662 DOI: 10.1101/2024.01.09.24301073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Genome-wide association studies (GWAS) have been instrumental in identifying genetic associations for various diseases and traits. However, uncovering genetic underpinnings among traits beyond univariate phenotype associations remains a challenge. Multi-phenotype associations (MPA), or genetic pleiotropy, offer important insights into shared genes and pathways among traits, enhancing our understanding of genetic architectures of complex diseases. GWAS of biobank-linked electronic health record (EHR) data are increasingly being utilized to identify MPA among various traits and diseases. However, methodologies that can efficiently take advantage of distributed EHRs to detect MPA are still lacking. Here, we introduce mixWAS, a novel algorithm that efficiently and losslessly integrates multiple EHRs via summary statistics, allowing the detection of MPA among mixed phenotypes while accounting for heterogeneities across EHRs. Simulations demonstrate that mixWAS outperforms the widely used MPA detection method, phenome-wide association study (PheWAS), across diverse scenarios. Applying mixWAS to data from seven EHRs in the US, we identified 4,534 MPA among blood lipids, BMI, and circulatory diseases. Validation in independent EHR data from the UK confirmed 97.7% of the associations. mixWAS fundamentally improves the detection of MPA and is available as free, open-source software.
Affiliation(s)
- Ruowang Li
- Department of Computational Biomedicine, Cedars-Sinai Medical Center
- Luke Benz
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Rui Duan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Joshua C Denny
- National Human Genome Research Institute, National Institutes of Health
- Hakon Hakonarson
- Division of Human Genetics, Children's Hospital of Philadelphia
- Center for Applied Genomics, Children's Hospital of Philadelphia
- Department of Pediatrics, University of Pennsylvania, Perelman School of Medicine
- Jonathan D Mosley
- Department of Medicine, Vanderbilt University Medical Center
- Department of Biomedical Informatics, Vanderbilt University Medical Center
- Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center
- Marylyn D Ritchie
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
3. Coral DE, Fernandez-Tajes J, Tsereteli N, Pomares-Millan H, Fitipaldi H, Mutie PM, Atabaki-Pasdar N, Kalamajski S, Poveda A, Miller-Fleming TW, Zhong X, Giordano GN, Pearson ER, Cox NJ, Franks PW. A phenome-wide comparative analysis of genetic discordance between obesity and type 2 diabetes. Nat Metab 2023; 5:237-247. [PMID: 36703017 PMCID: PMC9970876 DOI: 10.1038/s42255-022-00731-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 12/14/2021] [Accepted: 12/20/2022] [Indexed: 01/27/2023]
Abstract
Obesity and type 2 diabetes are causally related, yet there is considerable heterogeneity in the consequences of both conditions and the mechanisms of action are poorly defined. Here we show a genetic-driven approach defining two obesity profiles that convey highly concordant and discordant diabetogenic effects. We annotate and then compare association signals for these profiles across clinical and molecular phenotypic layers. Key differences are identified in a wide range of traits, including cardiovascular mortality, fat distribution, liver metabolism, blood pressure, specific lipid fractions and blood levels of proteins involved in extracellular matrix remodelling. We find marginal differences in abundance of Bacteroidetes and Firmicutes bacteria in the gut. Instrumental analyses reveal prominent causal roles for waist-to-hip ratio, blood pressure and cholesterol content of high-density lipoprotein particles in the development of diabetes in obesity. We prioritize 17 genes from the discordant signature that convey protection against type 2 diabetes in obesity, which may represent logical targets for precision medicine approaches.
Affiliation(s)
- Daniel E Coral
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Juan Fernandez-Tajes
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Neli Tsereteli
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Hugo Pomares-Millan
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Hugo Fitipaldi
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Pascal M Mutie
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Naeimeh Atabaki-Pasdar
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- Sebastian Kalamajski
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Alaitz Poveda
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Tyne W Miller-Fleming
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Xue Zhong
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Giuseppe N Giordano
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Ewan R Pearson
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Population Health and Genomics, University of Dundee, Dundee, UK
- Nancy J Cox
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- Paul W Franks
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Science, Lund University, Skåne University Hospital, Malmö, Sweden
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
4. Large-scale genomic analyses reveal insights into pleiotropy across circulatory system diseases and nervous system disorders. Nat Commun 2022; 13:3428. [PMID: 35701404 PMCID: PMC9198016 DOI: 10.1038/s41467-022-30678-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/30/2020] [Accepted: 05/10/2022] [Indexed: 01/18/2023] [Open Access]
Abstract
Clinical and epidemiological studies have shown that circulatory system diseases and nervous system disorders often co-occur in patients. However, genetic susceptibility factors shared between these disease categories remain largely unknown. Here, we characterized pleiotropy across 107 circulatory system and 40 nervous system traits using an ensemble of methods in the eMERGE Network and UK Biobank. Using a formal test of pleiotropy, five genomic loci demonstrated statistically significant evidence of pleiotropy. We observed region-specific patterns of direction of genetic effects for the two disease categories, suggesting potential antagonistic and synergistic pleiotropy. Our findings provide insights into the relationship between circulatory system diseases and nervous system disorders which can provide context for future prevention and treatment strategies.
5. Abbas T, Chaturvedi G, Prakrithi P, Pathak AK, Kutum R, Dakle P, Narang A, Manchanda V, Patil R, Aggarwal D, Girase B, Srivastava A, Kapoor M, Gupta I, Pandey R, Juvekar S, Dash D, Mukerji M, Prasher B. Whole Exome Sequencing in Healthy Individuals of Extreme Constitution Types Reveals Differential Disease Risk: A Novel Approach towards Predictive Medicine. J Pers Med 2022; 12:jpm12030489. [PMID: 35330488 PMCID: PMC8952204 DOI: 10.3390/jpm12030489] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/15/2022] [Accepted: 02/23/2022] [Indexed: 12/10/2022] [Open Access]
Abstract
Precision medicine aims to move from traditional reactive medicine to a system where risk groups can be identified before the disease occurs. However, phenotypic heterogeneity amongst the diseased and healthy poses a major challenge for identifying markers for risk stratification and early actionable interventions. In Ayurveda, individuals are phenotypically stratified into seven constitution types based on multisystem phenotypes termed “Prakriti”. This stratification enables the prediction of health and disease trajectories and the selection of health interventions. We hypothesize that exome sequencing in healthy individuals of phenotypically homogeneous Prakriti types might enable the identification of functional variations associated with the constitution types. Exomes of 144 healthy Prakriti-stratified individuals and controls from two genetically homogeneous cohorts (north and western India) revealed differential risk for diseases/traits like metabolic disorders, liver diseases, and body and hematological measurements amongst healthy individuals. These SNPs differ significantly from the Indo-European background control as well. Amongst these we highlight novel SNPs rs304447 (IFIT5) and rs941590 (SERPINA10) that could explain differential trajectories for immune response, bleeding or thrombosis. Our method demonstrates the requirement of a relatively smaller sample size for a well-powered study. This study highlights the potential of integrating a unique phenotyping approach for the identification of predictive markers and the at-risk population amongst the healthy.
Affiliation(s)
- Tahseen Abbas
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Informatics and Big Data Unit, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Academy of Scientific and Innovative Research, Ghaziabad 201002, India
- Gaura Chaturvedi
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Academy of Scientific and Innovative Research, Ghaziabad 201002, India
- Genomics and Molecular Medicine, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- P. Prakrithi
- Genomics and Molecular Medicine, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Ankit Kumar Pathak
- Genomics and Molecular Medicine, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Rintu Kutum
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Informatics and Big Data Unit, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Academy of Scientific and Innovative Research, Ghaziabad 201002, India
- Pushkar Dakle
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Ankita Narang
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Informatics and Big Data Unit, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Vijeta Manchanda
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Rutuja Patil
- Vadu Rural Health Program, KEM Hospital Research Centre, Pune 412216, India
- Dhiraj Aggarwal
- Vadu Rural Health Program, KEM Hospital Research Centre, Pune 412216, India
- Bhushan Girase
- Vadu Rural Health Program, KEM Hospital Research Centre, Pune 412216, India
- Ankita Srivastava
- Vadu Rural Health Program, KEM Hospital Research Centre, Pune 412216, India
- Manav Kapoor
- Department of Neuroscience, Icahn School of Medicine at Mt. Sinai, New York, NY 10029, USA
- Ishaan Gupta
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
- Rajesh Pandey
- INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110007, India
- Sanjay Juvekar
- Vadu Rural Health Program, KEM Hospital Research Centre, Pune 412216, India
- Debasis Dash
- Informatics and Big Data Unit, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Academy of Scientific and Innovative Research, Ghaziabad 201002, India
- Corresponding author
- Mitali Mukerji
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Academy of Scientific and Innovative Research, Ghaziabad 201002, India
- Genomics and Molecular Medicine, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, NH 62, Jodhpur 342037, India
- Corresponding author
- Bhavana Prasher
- Centre of Excellence for Applied Development of Ayurveda Prakriti and Genomics, CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics & Integrative Biology, Delhi 110020, India
- Academy of Scientific and Innovative Research, Ghaziabad 201002, India
- Genomics and Molecular Medicine, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110020, India
- Corresponding author
6. Choe EK, Shivakumar M, Verma A, Verma SS, Choi SH, Kim JS, Kim D. Leveraging deep phenotyping from health check-up cohort with 10,000 Korean individuals for phenome-wide association study of 136 traits. Sci Rep 2022; 12:1930. [PMID: 35121771 PMCID: PMC8817039 DOI: 10.1038/s41598-021-04580-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Accepted: 12/17/2021] [Indexed: 11/09/2022] [Open Access]
Abstract
The expanding use of the phenome-wide association study (PheWAS) faces challenges in the context of using International Classification of Diseases billing codes for phenotype definition, imbalanced study population ethnicity, and constrained application of the results in research. We performed a PheWAS utilizing 136 deep phenotypes corroborated by comprehensive health check-ups in a Korean population, along with trans-ethnic comparisons using the UK Biobank and Biobank Japan Project. A meta-analysis of the Korean and Japanese populations was also performed. The PheWAS associated 65 phenotypes with 14,101 significant variants (P < 4.92 × 10⁻¹⁰). Network analysis, visualization of cross-phenotype mapping, and causal inference mapping with Mendelian randomization were conducted. Among phenotype pairs from the genotype-driven cross-phenotype associations, we evaluated penetrance in correlation analysis using a clinical database. We focused on the application of PheWAS in order to make it robust and to aid the derivation of biological meaning post-PheWAS. This comprehensive analysis of PheWAS results based on a health check-up database will provide researchers and clinicians with a panoramic overview of the networks among multiple phenotypes and genetic variants, laying groundwork for the practical application of precision medicine.
Affiliation(s)
- Eun Kyung Choe
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, B304 Richards Building, 3700 Hamilton Walk, Philadelphia, PA, 19104-6116, USA; Department of Surgery, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, 06236, South Korea
- Manu Shivakumar
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, B304 Richards Building, 3700 Hamilton Walk, Philadelphia, PA, 19104-6116, USA
- Anurag Verma
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Shefali Setia Verma
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Seung Ho Choi
- Department of Internal Medicine, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, 06236, South Korea
- Joo Sung Kim
- Department of Internal Medicine, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, 06236, South Korea; Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul, 03080, South Korea
- Dokyoon Kim
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, B304 Richards Building, 3700 Hamilton Walk, Philadelphia, PA, 19104-6116, USA; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
7. HLA-disease association and pleiotropy landscape in over 235,000 Finns. Hum Immunol 2022; 83:391-398. [DOI: 10.1016/j.humimm.2022.02.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/14/2021] [Revised: 01/31/2022] [Accepted: 02/09/2022] [Indexed: 01/10/2023]
8. Maturation and application of phenome-wide association studies. Trends Genet 2022; 38:353-363. [PMID: 34991903 DOI: 10.1016/j.tig.2021.12.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/11/2021] [Revised: 11/12/2021] [Accepted: 12/02/2021] [Indexed: 12/12/2022]
Abstract
In the 10 years since their introduction, phenome-wide association studies (PheWAS) have uncovered novel genotype-phenotype relationships. Along the way, PheWAS have evolved in many aspects as a study design with the expanded availability of large data repositories with genome-wide data linked to detailed phenotypic data. Advancement in methods, including algorithms, software, and publicly available integrated resources, makes it feasible to more fully realize the potential of PheWAS, overcoming the previous computational and analytical limitations. We review here the most recent improvements and notable applications of PheWAS from the second half of that decade. We also note the challenges that remain embedded along the entire PheWAS analytical pipeline that necessitate further development of tools and resources to further advance the understanding of the complex genetic architecture underlying human diseases and traits.
9. Recent innovations and in-depth aspects of post-genome wide association study (Post-GWAS) to understand the genetic basis of complex phenotypes. Heredity (Edinb) 2021; 127:485-497. [PMID: 34689168 PMCID: PMC8626474 DOI: 10.1038/s41437-021-00479-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/02/2020] [Revised: 10/13/2021] [Accepted: 10/13/2021] [Indexed: 12/13/2022] [Open Access]
Abstract
In the past decade, the high throughput and low cost of sequencing/genotyping approaches have led to the accumulation of a large amount of data from genome-wide association studies (GWASs). The first aim of this review is to highlight how post-GWAS analysis can be used to make sense of the obtained associations. Novel directions for integrating GWAS results with other resources, such as somatic mutation, metabolite-transcript, and transcriptomic data, are also discussed; these approaches can help us move beyond each individual data point and provide valuable information about complex trait genetics. In addition, cross-phenotype association tests, when the loci detected by GWASs have significant associations with multiple traits, are reviewed to provide biologically informative results for use in real-time applications. This review also discusses the challenges of identifying interactions between genetic mutations (epistasis) and mutations of loci affecting more than one trait (pleiotropy) as underlying causes of cross-phenotype associations; these challenges can be overcome using post-GWAS analysis. Genetic similarities between phenotypes that can be revealed using post-GWAS analysis are also discussed. In summary, different methodologies of post-GWAS analysis are now available, enhancing the value of information obtained from GWAS results, and facilitating application in both humans and nonhuman species. However, precise methods still need to be developed to overcome challenges in the field and uncover the genetic underpinnings of complex traits.
10. Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, Narita A, Konuma T, Yamamoto K, Akiyama M, Ishigaki K, Suzuki A, Suzuki K, Obara W, Yamaji K, Takahashi K, Asai S, Takahashi Y, Suzuki T, Shinozaki N, Yamaguchi H, Minami S, Murayama S, Yoshimori K, Nagayama S, Obata D, Higashiyama M, Masumoto A, Koretsune Y, Ito K, Terao C, Yamauchi T, Komuro I, Kadowaki T, Tamiya G, Yamamoto M, Nakamura Y, Kubo M, Murakami Y, Yamamoto K, Kamatani Y, Palotie A, Rivas MA, Daly MJ, Matsuda K, Okada Y. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 2021; 53:1415-1424. [PMID: 34594039 DOI: 10.1038/s41588-021-00931-x] [Citation(s) in RCA: 531] [Impact Index Per Article: 177.0] [Received: 10/20/2020] [Accepted: 08/04/2021] [Indexed: 02/08/2023]
Abstract
Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n_total = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
Affiliation(s)
- Saori Sakaue
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan; Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Center for Data Sciences, Harvard Medical School, Boston, MA, USA; Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Masahiro Kanai
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Yosuke Tanigawa
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
- Juha Karjalainen
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Mitja Kurki
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Seizo Koshiba
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Advanced Research Center for Innovations in Next-Generation Medicine (INGEM), Sendai, Japan
- Akira Narita
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Takahiro Konuma
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Kenichi Yamamoto
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan; Department of Pediatrics, Osaka University Graduate School of Medicine, Suita, Japan; Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Masato Akiyama
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Ocular Pathology and Imaging Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Kazuyoshi Ishigaki
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Center for Data Sciences, Harvard Medical School, Boston, MA, USA; Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Ken Suzuki
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Wataru Obara
- Department of Urology, Iwate Medical University, Iwate, Japan
- Ken Yamaji
- Department of Internal Medicine and Rheumatology, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Kazuhisa Takahashi
- Department of Respiratory Medicine, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Satoshi Asai
- Division of Pharmacology, Department of Biomedical Science, Nihon University School of Medicine, Tokyo, Japan; Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Yasuo Takahashi
- Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
| | - Shiro Minami
- Department of Bioregulation, Nippon Medical School, Kawasaki, Japan
| | - Shigeo Murayama
- Tokyo Metropolitan Geriatric Hospital and Institute of Gerontology, Tokyo, Japan
| | - Kozo Yoshimori
- Fukujuji Hospital, Japan Anti-Tuberculosis Association, Tokyo, Japan
| | - Satoshi Nagayama
- The Cancer Institute Hospital of the Japanese Foundation for Cancer Research, Tokyo, Japan
| | - Daisuke Obata
- Center for Clinical Research and Advanced Medicine, Shiga University of Medical Science, Otsu, Japan
| | - Masahiko Higashiyama
- Department of General Thoracic Surgery, Osaka International Cancer Institute, Osaka, Japan
| | - Kaoru Ito
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Toshimasa Yamauchi
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Issei Komuro
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Takashi Kadowaki
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.,Toranomon Hospital, Tokyo, Japan
| | - Gen Tamiya
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,Advanced Research Center for Innovations in Next-Generation Medicine (INGEM), Sendai, Japan.,Graduate School of Medicine, Tohoku University, Sendai, Japan.,Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
| | - Masayuki Yamamoto
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,Advanced Research Center for Innovations in Next-Generation Medicine (INGEM), Sendai, Japan.,Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Yusuke Nakamura
- Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan.,Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
| | - Michiaki Kubo
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yoshinori Murakami
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yoichiro Kamatani
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.,Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Aarno Palotie
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.,Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.,Psychiatric & Neurodevelopmental Genetics Unit, Department of Psychiatry, Analytic and Translational Genetics Unit, Department of Medicine, and the Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
| | - Manuel A Rivas
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
| | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.,Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.,Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA.,Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Koichi Matsuda
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, The University of Tokyo, Tokyo, Japan.
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan.,Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.,Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan.,Psychiatric & Neurodevelopmental Genetics Unit, Department of Psychiatry, Analytic and Translational Genetics Unit, Department of Medicine, and the Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.,Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
| |
|
11
|
Abstract
Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.
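As a rough illustration of the idea in this abstract, the sketch below assigns case/control status from a patient's ICD codes through a phecode lookup. The small lookup table, the patient records, and the two-code case threshold are assumptions made for illustration only; they are not the published phecode map or its rules.

```python
from collections import Counter

# Toy lookup table (illustrative only, not the published phecode map).
ICD_TO_PHECODE = {
    "250.00": "250.2",
    "250.02": "250.2",
    "401.9": "401.1",
}

def phecode_counts(icd_codes):
    """Count how many of a patient's recorded ICD codes map to each phecode."""
    return Counter(ICD_TO_PHECODE[c] for c in icd_codes if c in ICD_TO_PHECODE)

def case_status(icd_codes, phecode, min_count=2):
    """Return 'case' if the phecode appears at least min_count times,
    'control' if it never appears, and 'excluded' otherwise.
    The min_count threshold is an assumption of this sketch."""
    n = phecode_counts(icd_codes)[phecode]
    if n >= min_count:
        return "case"
    return "control" if n == 0 else "excluded"

# Hypothetical patients with their recorded ICD codes.
patients = {
    "p1": ["250.00", "250.02", "401.9"],
    "p2": ["401.9"],
    "p3": ["250.00"],
}
statuses = {pid: case_status(codes, "250.2") for pid, codes in patients.items()}
```

Running the same function once per phecode yields the per-phenotype case/control vectors that a phenome-wide scan would then test.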
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
| |
|
12
|
Vergara C, Valencia A, Thio CL, Goedert JJ, Mangia A, Piazzolla V, Johnson E, Kral AH, O’Brien TR, Mehta SH, Kirk GD, Kim AY, Lauer GM, Chung RT, Cox AL, Peters MG, Khakoo SI, Alric L, Cramp ME, Donfield SM, Edlin BR, Busch MP, Alexander G, Rosen HR, Murphy EL, Wojcik GL, Taub MA, Thomas DL, Duggal P. A Multiancestry Sex-Stratified Genome-Wide Association Study of Spontaneous Clearance of Hepatitis C Virus. J Infect Dis 2021; 223:2090-2098. [PMID: 33119750 PMCID: PMC8205624 DOI: 10.1093/infdis/jiaa677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2020] [Accepted: 10/28/2020] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Spontaneous clearance of acute hepatitis C virus (HCV) infection is more common in women than in men, independent of known risk factors. METHODS To identify sex-specific genetic loci, we studied 4423 HCV-infected individuals (2903 male, 1520 female) of European, African, and Hispanic ancestry. We performed autosomal and X chromosome sex-stratified and combined association analyses in each ancestry group. RESULTS A male-specific region near the adenosine diphosphate-ribosylation factor-like 5B (ARL5B) gene was identified. Individuals with the C allele of rs76398191 were about 30% more likely to have chronic HCV infection than individuals with the T allele (OR, 0.69; P = 1.98 × 10^-7), and this was not seen in females. The ARL5B gene encodes an interferon-stimulated gene that inhibits immune response to double-stranded RNA viruses. We also identified suggestive associations near septin 6 and ribosomal protein L39 genes on the X chromosome. In both sexes, allele G of rs12852885 was associated with a 40% increase in HCV clearance compared with the A allele (OR, 1.4; P = 2.46 × 10^-6). Septin 6 facilitates HCV replication via interaction with the HCV NS5b protein, and ribosomal protein L39 acts as an HCV core interactor. CONCLUSIONS These novel gene associations support differential mechanisms of HCV clearance between the sexes and provide biological targets for treatment or vaccine development.
Affiliation(s)
- Candelaria Vergara
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Ana Valencia
- Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
- Universidad Pontificia Bolivariana, Medellín, Colombia
| | - Chloe L Thio
- Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
| | - James J Goedert
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Alessandra Mangia
- Liver Unit IRCCS “Casa Sollievo della Sofferenza,” San Giovanni Rotondo, Italy
| | - Valeria Piazzolla
- Liver Unit IRCCS “Casa Sollievo della Sofferenza,” San Giovanni Rotondo, Italy
| | - Eric Johnson
- RTI International, Research Triangle Park, North Carolina, USA
| | - Alex H Kral
- RTI International, Research Triangle Park, North Carolina, USA
| | - Thomas R O’Brien
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Shruti H Mehta
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Gregory D Kirk
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland, USA
- Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
| | - Arthur Y Kim
- Division of Infectious Diseases, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Georg M Lauer
- Liver Center and Gastrointestinal Division, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Raymond T Chung
- Liver Center and Gastrointestinal Division, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Andrea L Cox
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Marion G Peters
- Division of Gastroenterology, Department of Medicine, School of Medicine, University of California, San Francisco, California, USA
| | - Salim I Khakoo
- University of Southampton, Southampton General Hospital, Southampton, United Kingdom
| | - Laurent Alric
- Department of Internal Medicine and Digestive Diseases, CHU Rangueil, UMR 152 IRD, Toulouse 3 University, France
| | - Brian R Edlin
- SUNY Downstate College of Medicine, Brooklyn, New York, USA
| | - Michael P Busch
- University of California and Vitalant Research Institute, San Francisco, California, USA
| | - Graeme Alexander
- UCL Institute for Liver and Digestive Health, Royal Free Hospital, Hampstead, London, United Kingdom
| | - Edward L Murphy
- University of California and Vitalant Research Institute, San Francisco, California, USA
| | - Genevieve L Wojcik
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Margaret A Taub
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - David L Thomas
- Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
| | - Priya Duggal
- Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
| |
|
13
|
Salvatore M, Gu T, Mack JA, Prabhu Sankar S, Patil S, Valley TS, Singh K, Nallamothu BK, Kheterpal S, Lisabeth L, Fritsche LG, Mukherjee B. A Phenome-Wide Association Study (PheWAS) of COVID-19 Outcomes by Race Using the Electronic Health Records Data in Michigan Medicine. J Clin Med 2021; 10:jcm10071351. [PMID: 33805886 PMCID: PMC8037108 DOI: 10.3390/jcm10071351] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/10/2021] [Revised: 03/10/2021] [Accepted: 03/17/2021] [Indexed: 12/16/2022] Open
Abstract
Background: We performed a phenome-wide association study to identify pre-existing conditions related to Coronavirus disease 2019 (COVID-19) prognosis across the medical phenome and how they vary by race. Methods: The study is comprised of 53,853 patients who were tested/diagnosed for COVID-19 between 10 March and 2 September 2020 at a large academic medical center. Results: Pre-existing conditions strongly associated with hospitalization were renal failure, pulmonary heart disease, and respiratory failure. Hematopoietic conditions were associated with intensive care unit (ICU) admission/mortality and mental disorders were associated with mortality in non-Hispanic Whites. Circulatory system and genitourinary conditions were associated with ICU admission/mortality in non-Hispanic Blacks. Conclusions: Understanding pre-existing clinical diagnoses related to COVID-19 outcomes informs the need for targeted screening to support specific vulnerable populations to improve disease prevention and healthcare delivery.
Affiliation(s)
- Maxwell Salvatore
- Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Tian Gu
- Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jasmine A. Mack
- Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
| | - Swaraaj Prabhu Sankar
- Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
- Rogel Cancer Center, Michigan Medicine, Ann Arbor, MI 48109, USA
- Data Office for Clinical and Translational Research, University of Michigan, Ann Arbor, MI 48109, USA
| | - Snehal Patil
- Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
| | - Thomas S. Valley
- Division of Pulmonary and Critical Care Medicine, University of Michigan Medicine, Ann Arbor, MI 48109, USA
- Department of Internal Medicine, Michigan Medicine, Ann Arbor, MI 48109, USA
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI 48109, USA
| | - Karandeep Singh
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI 48109, USA
| | - Brahmajee K. Nallamothu
- Department of Internal Medicine, Michigan Medicine, Ann Arbor, MI 48109, USA
- Division of Cardiovascular Medicine, Michigan Medicine, Ann Arbor, MI 48109, USA
| | - Sachin Kheterpal
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Anesthesiology, Michigan Medicine, Ann Arbor, MI 48109, USA
| | - Lynda Lisabeth
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Lars G. Fritsche
- Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
- Rogel Cancer Center, Michigan Medicine, Ann Arbor, MI 48109, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Bhramar Mukherjee
- Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Correspondence: Tel.: +1-(734)-764-6544
| |
|
14
|
A Phenome-Wide Analysis of Healthcare Costs Associated with Inflammatory Bowel Diseases. Dig Dis Sci 2021; 66:760-767. [PMID: 32436120 DOI: 10.1007/s10620-020-06329-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/10/2020] [Accepted: 05/08/2020] [Indexed: 02/07/2023]
Abstract
INTRODUCTION Crohn's disease (CD) and ulcerative colitis (UC) are associated with considerable direct healthcare costs. There have been few comprehensive analyses of all IBD- and non-IBD comorbidities that determine direct costs in this population. METHODS We used data from a validated cohort of patients with inflammatory bowel disease (IBD). Total healthcare costs were estimated as a sum of costs associated with IBD-related hospitalizations and surgery, imaging (CT or MR scans), outpatient visits, endoscopic evaluation, and emergency room (ER) care. All ICD-9 codes were extracted for each patient and clustered into 1804 distinct phecode clusters representing individual phenotypes. A phenome-wide association analysis (PheWAS) was performed using logistic regression to identify predictors of being in the top decile of costs. RESULTS Our cohort comprised 10,721 patients with IBD, among whom 50% had CD. The median age was 46 years. The median total cost per patient was $11,203 (IQR $2396-30,563). The strongest associations with total healthcare costs were intestinal obstruction without mention of hernia (p = 5.93 × 10^-156) and other intestinal obstruction (p = 9.24 × 10^-131). In addition, strong associations were observed for symptoms consistent with severity of IBD including the presence of fluid-electrolyte imbalance (p = 1.90 × 10^-130), hypovolemia (p = 1.65 × 10^-114), abdominal pain (p = 7.29 × 10^-60), and anemia (p = 1.90 × 10^-83). Cardiopulmonary diseases and psychological comorbidity also demonstrated significant associations with total costs, with the latter being more strongly associated with ER visit-related costs. CONCLUSIONS Surrogate markers suggesting possible irreversible bowel damage and active disease demonstrate the greatest influence on IBD-related healthcare costs.
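The scan described in this abstract can be pictured with a much-simplified sketch: flag patients in the top decile of cost, then compute one association measure per phecode. The abstract reports logistic regression; the sketch below swaps in a plain 2x2 odds ratio with a 0.5 continuity correction so it stays dependency-free, and every patient ID, cost, and phecode label in it is hypothetical.

```python
def top_decile_flags(costs):
    """Flag patients whose cost is at or above the 90th-percentile cutoff."""
    ranked = sorted(costs.values())
    cutoff = ranked[int(0.9 * len(ranked))]
    return {pid: cost >= cutoff for pid, cost in costs.items()}

def odds_ratio(high_cost, carriers):
    """Odds ratio of being high-cost for carriers vs non-carriers of a phecode.
    Each cell of the 2x2 table starts at 0.5 (continuity correction)."""
    a = b = c = d = 0.5
    for pid, is_high in high_cost.items():
        exposed = pid in carriers
        if exposed and is_high:
            a += 1
        elif exposed:
            b += 1
        elif is_high:
            c += 1
        else:
            d += 1
    return (a * d) / (b * c)

# Hypothetical cohort of 20 patients with costs of 1000 to 20000.
costs = {f"p{i}": 1000 * i for i in range(1, 21)}
# Hypothetical phecode labels and the patients who carry them.
phecode_carriers = {"obstruction": {"p1", "p19", "p20"}, "other": {"p2", "p3"}}

high_cost = top_decile_flags(costs)
scan = {code: odds_ratio(high_cost, who) for code, who in phecode_carriers.items()}
```

A full analysis would also adjust for covariates and correct for the number of phecodes tested, which this sketch omits.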
|
15
|
Salvatore M, Gu T, Mack JA, Sankar SP, Patil S, Valley TS, Singh K, Nallamothu BK, Kheterpal S, Lisabeth L, Fritsche LG, Mukherjee B. A phenome-wide association study (PheWAS) of COVID-19 outcomes by race using the electronic health records data in Michigan Medicine. medRxiv 2021. [PMID: 32793923 PMCID: PMC7418740 DOI: 10.1101/2020.06.29.20141564] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Background: We perform a phenome-wide scan to identify pre-existing conditions related to COVID-19 susceptibility and prognosis across the medical phenome and how they vary by race. Methods: The study is comprised of 53,853 patients who were tested/positive for COVID-19 between March 10 and September 2, 2020 at a large academic medical center. Results: Pre-existing conditions strongly associated with hospitalization were renal failure, pulmonary heart disease, and respiratory failure. Hematopoietic conditions were associated with ICU admission/mortality and mental disorders were associated with mortality in non-Hispanic Whites. Circulatory system and genitourinary conditions were associated with ICU admission/mortality in non-Hispanic Blacks. Conclusions: Understanding pre-existing clinical diagnoses related to COVID-19 outcomes informs the need for targeted screening to support specific vulnerable populations to improve disease prevention and healthcare delivery.
Affiliation(s)
- Maxwell Salvatore
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States
| | - Tian Gu
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States
| | - Jasmine A Mack
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States
| | - Swaraaj Prabhu Sankar
- Rogel Cancer Center, University of Michigan Medicine, Ann Arbor, MI 48109, United States.,Data Office for Clinical and Translational Research, University of Michigan, Ann Arbor, MI 48109, United States
| | - Snehal Patil
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States.,Precision Health, University of Michigan, Ann Arbor, MI 48109, United States
| | - Thomas S Valley
- Division of Pulmonary and Critical Care Medicine and Department of Internal Medicine, University of Michigan Medicine, Ann Arbor, MI 48109, United States.,Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI 48109, United States
| | - Karandeep Singh
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI 48109, United States.,Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI 48109, United States
| | - Brahmajee K Nallamothu
- Division of Cardiovascular Medicine and Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, United States
| | - Sachin Kheterpal
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI 48109, United States.,Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, MI 48109, United States
| | - Lynda Lisabeth
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States
| | - Lars G Fritsche
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States.,Rogel Cancer Center, University of Michigan Medicine, Ann Arbor, MI 48109, United States.,Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States
| | - Bhramar Mukherjee
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States.,Rogel Cancer Center, University of Michigan Medicine, Ann Arbor, MI 48109, United States.,Precision Health, University of Michigan, Ann Arbor, MI 48109, United States
| |
|
16
|
Denault WRP, Jugessur A. Detecting differentially methylated regions using a fast wavelet-based approach to functional association analysis. BMC Bioinformatics 2021; 22:61. [PMID: 33568045 PMCID: PMC7876806 DOI: 10.1186/s12859-021-03979-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/29/2020] [Accepted: 01/27/2021] [Indexed: 11/10/2022] Open
Abstract
Background: We present here a computational shortcut to improve a powerful wavelet-based method by Shim and Stephens (Ann Appl Stat 9(2):665–686, 2015. 10.1214/14-AOAS776) called WaveQTL that was originally designed to identify DNase I hypersensitivity quantitative trait loci (dsQTL). Results: WaveQTL relies on permutations to evaluate the significance of an association. We applied a recent method by Zhou and Guan (J Am Stat Assoc 113(523):1362–1371, 2017. 10.1080/01621459.2017.1328361) to boost computational speed, which involves calculating the distribution of Bayes factors and estimating the significance of an association by simulations rather than permutations. We called this simulation-based approach “fast functional wavelet” (FFW), and tested it on a publicly available DNA methylation (DNAm) dataset on colorectal cancer. The simulations confirmed a substantial gain in computational speed compared to the permutation-based approach in WaveQTL. Furthermore, we show that FFW controls the type I error satisfactorily and has good power for detecting differentially methylated regions. Conclusions: Our approach has broad utility and can be applied to detect associations between different types of functions and phenotypes. As more and more DNAm datasets are being made available through public repositories, an attractive application of FFW would be to re-analyze these data and identify associations that might have been missed by previous efforts. The full R package for FFW is freely available at GitHub https://github.com/william-denault/ffw.
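For readers unfamiliar with the permutation step this abstract seeks to replace, a generic permutation test can be sketched as follows. This is not WaveQTL or FFW code: the test statistic (absolute difference in group means), the data, and the function names are assumptions chosen only to show why cost scales with the number of permutations.

```python
import random

def mean_diff(values, labels):
    """Absolute difference between the means of the two label groups."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Estimate a p-value by shuffling labels and recomputing the statistic."""
    rng = random.Random(seed)
    observed = mean_diff(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_diff(values, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical measurements for two groups of four samples each.
values = [5.1, 4.9, 5.3, 5.0, 3.0, 3.2, 2.9, 3.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
p = permutation_p_value(values, labels)
```

Because the smallest attainable p-value is 1 / (n_perm + 1), resolving very small p-values requires a correspondingly large number of permutations, which is the cost a simulation-based alternative aims to avoid.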
Affiliation(s)
- William R P Denault
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.,Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.,Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
| | - Astanand Jugessur
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.,Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.,Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
| |
|
17
|
Kember RL, Merikangas AK, Verma SS, Verma A, Judy R, Damrauer SM, Ritchie MD, Rader DJ, Bućan M. Polygenic Risk of Psychiatric Disorders Exhibits Cross-trait Associations in Electronic Health Record Data From European Ancestry Individuals. Biol Psychiatry 2021; 89:236-245. [PMID: 32919613 PMCID: PMC7770066 DOI: 10.1016/j.biopsych.2020.06.026] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/08/2020] [Revised: 06/19/2020] [Accepted: 06/22/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Prediction of disease risk is a key component of precision medicine. Common traits such as psychiatric disorders have a complex polygenic architecture, making the identification of a single risk predictor difficult. Polygenic risk scores (PRSs) denoting the sum of an individual's genetic liability for a disorder are a promising biomarker for psychiatric disorders, but they require evaluation in a clinical setting. METHODS We developed PRSs for 6 psychiatric disorders (schizophrenia, bipolar disorder, major depressive disorder, cross disorder, attention-deficit/hyperactivity disorder, and anorexia nervosa) and 17 nonpsychiatric traits in more than 10,000 individuals from the Penn Medicine Biobank with accompanying electronic health records. We performed phenome-wide association analyses to test their association across disease categories. RESULTS Four of the 6 psychiatric PRSs were associated with their primary phenotypes (odds ratios from 1.2 to 1.6). Cross-trait associations were identified both within the psychiatric domain and across trait domains. PRSs for coronary artery disease and years of education were significantly associated with psychiatric disorders, largely driven by an association with tobacco use disorder. CONCLUSIONS We demonstrated that the genetic architecture of electronic health record-derived psychiatric diagnoses is similar to ascertained research cohorts from large consortia. Psychiatric PRSs are moderately associated with psychiatric diagnoses but are not yet clinically predictive in naïve patients. Cross-trait associations for these PRSs suggest a broader effect of genetic liability beyond traditional diagnostic boundaries. As identification of genetic markers increases, including PRSs alongside other clinical risk factors may enhance prediction of psychiatric disorders and associated conditions in clinical registries.
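The abstract describes a polygenic risk score as a sum of an individual's genetic liability. A minimal sketch of that sum, assuming per-variant effect weights and risk-allele dosages of 0, 1, or 2, is shown below; the variant names, weights, and genotypes are hypothetical and do not come from the study.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages over variants that have a weight."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)

# Hypothetical per-variant effect weights (for example, log odds ratios).
weights = {"var_a": 0.5, "var_b": -0.25, "var_c": 1.0}
# Hypothetical risk-allele dosages (0, 1, or 2 copies) for one individual.
person = {"var_a": 2, "var_b": 1, "var_c": 0}

score = polygenic_score(person, weights)
```

In practice such scores are usually standardized across the cohort before being tested against each phenotype, a step left out here.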
Affiliation(s)
- Rachel L. Kember
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA.,MIRECC, CPL. Michael J. Crescenz VA Medical Center, Philadelphia, PA
| | - Alison K. Merikangas
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA
| | - Shefali S. Verma
- Department of Genetics, University of Pennsylvania, Philadelphia, PA
| | - Anurag Verma
- Department of Genetics, University of Pennsylvania, Philadelphia, PA
| | - Renae Judy
- Department of Surgery, University of Pennsylvania, Philadelphia, PA
| | - Scott M. Damrauer
- Department of Surgery, University of Pennsylvania, Philadelphia, PA.,Department of Surgery, CPL. Michael J. Crescenz VA Medical Center, Philadelphia, PA
| | - Marylyn D. Ritchie
- Department of Genetics, University of Pennsylvania, Philadelphia, PA.,Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Daniel J. Rader
- Department of Genetics, University of Pennsylvania, Philadelphia, PA.,Department of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Maja Bućan
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA.,Department of Genetics, University of Pennsylvania, Philadelphia, PA
| |
|
18
|
Zhang X, Li R, Ritchie MD. Statistical Impact of Sample Size and Imbalance on Multivariate Analysis in silico and A Case Study in the UK Biobank. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:1383-1391. [PMID: 33936514 PMCID: PMC8075427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Large-scale biobank cohorts coupled with electronic health records offer unprecedented opportunities to study genotype-phenotype relationships. Genome-wide association studies uncovered disease-associated loci through univariate methods, with the focus on one trait at a time. With genetic variants being identified for thousands of traits, researchers found that 90% of human genetic loci are associated with more than one trait, highlighting the ubiquity of pleiotropy. Recently, multivariate methods have been proposed to effectively identify pleiotropy. However, the statistical performance in natural biomedical data, which often have unbalanced case-control sample sizes, is largely unknown. In this work, we designed 21 scenarios of real-data informed simulations to thoroughly evaluate the statistical characteristics of univariate and multivariate methods. Our results can serve as a reference guide for the application of multivariate methods. We also investigated potential pleiotropy across type II diabetes, Alzheimer's disease, atherosclerosis of arteries, depression, and atherosclerotic heart disease in the UK Biobank.
Affiliation(s)
- Xinyuan Zhang
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Ruowang Li
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Marylyn D Ritchie
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| |
|
19
|
Dennis JK, Sealock JM, Straub P, Lee YH, Hucks D, Actkins K, Faucon A, Feng YCA, Ge T, Goleva SB, Niarchou M, Singh K, Morley T, Smoller JW, Ruderfer DM, Mosley JD, Chen G, Davis LK. Clinical laboratory test-wide association scan of polygenic scores identifies biomarkers of complex disease. Genome Med 2021; 13:6. [PMID: 33441150 PMCID: PMC7807864 DOI: 10.1186/s13073-020-00820-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 02/25/2020] [Accepted: 12/08/2020] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Clinical laboratory (lab) tests are used in clinical practice to diagnose, treat, and monitor disease conditions. Test results are stored in electronic health records (EHRs), and a growing number of EHRs are linked to patient DNA, offering unprecedented opportunities to query relationships between genetic risk for complex disease and quantitative physiological measurements collected on large populations.
METHODS A total of 3075 quantitative lab tests were extracted from Vanderbilt University Medical Center's (VUMC) EHR system and cleaned for population-level analysis according to our QualityLab protocol. Lab values extracted from BioVU were compared with previous population studies using heritability and genetic correlation analyses. We then tested the hypothesis that polygenic risk scores for biomarkers and complex disease are associated with biomarkers of disease extracted from the EHR. In a proof-of-concept analysis, we focused on lipids and coronary artery disease (CAD). We cleaned lab traits extracted from the EHR, performed lab-wide association scans (LabWAS) of the lipids and CAD polygenic risk scores across 315 heritable lab tests, and then replicated the pipeline and analyses in the Massachusetts General Brigham Biobank (MGBB).
RESULTS Heritability estimates of lipid values (after cleaning with QualityLab) were comparable to previous reports, and polygenic scores for lipids were strongly associated with their referent lipid in a LabWAS. LabWAS of the polygenic score for CAD recapitulated canonical heart disease biomarker profiles, including decreased HDL and increased pre-medication LDL, triglycerides, blood glucose, and glycated hemoglobin (HgbA1C), in European and African descent populations. Notably, many of these associations remained even after adjusting for the presence of cardiovascular disease and were replicated in the MGBB.
CONCLUSIONS Polygenic risk scores can be used to identify biomarkers of complex disease in large-scale EHR-based genomic analyses, providing new avenues for discovery of novel biomarkers and deeper understanding of disease trajectories in pre-symptomatic individuals. We present two methods and associated software, QualityLab and LabWAS, to clean and analyze EHR labs at scale and perform a Lab-Wide Association Scan.
Affiliation(s)
- Jessica K Dennis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
| | - Julia M Sealock
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Peter Straub
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Younga H Lee
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Donald Hucks
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Ky'Era Actkins
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, TN, 37232, USA
| | - Annika Faucon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Yen-Chen Anne Feng
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Tian Ge
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Slavina B Goleva
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Maria Niarchou
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Kritika Singh
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Theodore Morley
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jordan W Smoller
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Douglas M Ruderfer
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jonathan D Mosley
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lea K Davis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University, 511-A Light Hall, 2215 Garland Ave, Nashville, TN, 37232, USA.
| |
|
20
|
Li R, Duan R, Zhang X, Lumley T, Pendergrass S, Bauer C, Hakonarson H, Carrell DS, Smoller JW, Wei WQ, Carroll R, Velez Edwards DR, Wiesner G, Sleiman P, Denny JC, Mosley JD, Ritchie MD, Chen Y, Moore JH. Lossless integration of multiple electronic health records for identifying pleiotropy using summary statistics. Nat Commun 2021; 12:168. [PMID: 33420026 PMCID: PMC7794298 DOI: 10.1038/s41467-020-20211-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/28/2020] [Accepted: 11/13/2020] [Indexed: 11/22/2022] Open
Abstract
Increasingly, clinical phenotypes with matched genetic data from bio-bank linked electronic health records (EHRs) have been used for pleiotropy analyses. Thus far, pleiotropy analysis using individual-level EHR data has been limited to data from one site. However, it is desirable to integrate EHR data from multiple sites to improve the detection power and generalizability of the results. Due to privacy concerns, individual-level patients' data are not easily shared across institutions. As a result, we introduce Sum-Share, a method designed to efficiently integrate EHR and genetic data from multiple sites to perform pleiotropy analysis. Sum-Share requires only summary-level data and one round of communication from each site, yet it produces identical test statistics compared with that of pooled individual-level data. Consequently, Sum-Share can achieve lossless integration of multiple datasets. Using real EHR data from eMERGE, Sum-Share is able to identify 1734 potential pleiotropic SNPs for five cardiovascular diseases.
Affiliation(s)
- Ruowang Li
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Rui Duan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Xinyuan Zhang
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Thomas Lumley
- Department of Statistics, University of Auckland, Auckland, New Zealand
| | - Sarah Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
| | - Christopher Bauer
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Centre, Nashville, TN, USA
| | - Robert Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Centre, Nashville, TN, USA
| | - Digna R Velez Edwards
- Clinical and Translational Hereditary Cancer Program, Division of Genetic Medicine, Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN, USA
| | - Georgia Wiesner
- Clinical and Translational Hereditary Cancer Program, Division of Genetic Medicine, Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN, USA
| | - Patrick Sleiman
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Josh C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Centre, Nashville, TN, USA
| | - Jonathan D Mosley
- Department of Biomedical Informatics, Vanderbilt University Medical Centre, Nashville, TN, USA
| | - Marylyn D Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Yong Chen
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA.
| | - Jason H Moore
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
21
|
Hoh BP, Rahman TA. The indigenous populations as the model by nature to understand human genomic-phenomics interactions. QUANTITATIVE BIOLOGY 2021. [DOI: 10.15302/j-qb-021-0251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
22
|
Artificial intelligence in oncology. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00018-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
23
|
LabWAS: Novel findings and study design recommendations from a meta-analysis of clinical labs in two independent biobanks. PLoS Genet 2020; 16:e1009077. [PMID: 33175840 PMCID: PMC7682892 DOI: 10.1371/journal.pgen.1009077] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/23/2020] [Revised: 11/23/2020] [Accepted: 08/24/2020] [Indexed: 01/10/2023] Open
Abstract
Phenotypes extracted from Electronic Health Records (EHRs) are increasingly prevalent in genetic studies. EHRs contain hundreds of distinct clinical laboratory test results, providing a trove of health data beyond diagnoses. Such lab data is complex and lacks a ubiquitous coding scheme, making it more challenging than diagnosis data. Here we describe the first large-scale cross-health system genome-wide association study (GWAS) of EHR-based quantitative laboratory-derived phenotypes. We meta-analyzed 70 lab traits matched between the BioVU cohort from the Vanderbilt University Health System and the Michigan Genomics Initiative (MGI) cohort from Michigan Medicine. We show high replication of known association for these traits, validating EHR-based measurements as high-quality phenotypes for genetic analysis. Notably, our analysis provides the first replication for 699 previous GWAS associations across 46 different traits. We discovered 31 novel associations at genome-wide significance for 22 distinct traits, including the first reported associations for two lab-based traits. We replicated 22 of these novel associations in an independent tranche of BioVU samples. The summary statistics for all association tests are freely available to benefit other researchers. Finally, we performed mirrored analyses in BioVU and MGI to assess competing analytic practices for EHR lab traits. We find that using the mean of all available lab measurements provides a robust summary value, but alternate summarizations can improve power in certain circumstances. This study provides a proof-of-principle for cross health system GWAS and is a framework for future studies of quantitative EHR lab traits.
|
24
|
Zong N, Sharma DK, Yu Y, Egan JB, Davila JI, Wang C, Jiang G. Developing a FHIR-based Framework for Phenome Wide Association Studies: A Case Study with A Pan-Cancer Cohort. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2020; 2020:750-759. [PMID: 32477698 PMCID: PMC7233075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Phenome Wide Association Studies (PheWAS) enable phenome-wide scans to discover novel associations between genotype and clinical phenotypes by linking available genomic reports and large-scale Electronic Health Records (EHRs). Data heterogeneity across different EHR systems and genetic reports has been a critical challenge that hinders meaningful validation. To address this, we propose an FHIR-based framework to model the PheWAS study in a standard manner. We developed an FHIR-based data model profile to enable the standard representation of data elements from genetic reports and EHR data that are used in the PheWAS study. As a proof of concept, we implemented the proposed method using a cohort of 1,595 pan-cancer patients with genetic reports from Foundation Medicine as well as the corresponding lab tests and diagnoses from Mayo EHRs. A PheWAS study was conducted and 81 significant genotype-phenotype associations were identified, of which 36 significant associations for cancers were validated based on a literature review.
Affiliation(s)
- Nansu Zong
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Deepak K Sharma
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Yue Yu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Jan B Egan
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN
| | - Jaime I Davila
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Chen Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| |
|
25
|
A Comprehensive Genome-Wide and Phenome-Wide Examination of BMI and Obesity in a Northern Nevadan Cohort. G3-GENES GENOMES GENETICS 2020; 10:645-664. [PMID: 31888951 PMCID: PMC7003082 DOI: 10.1534/g3.119.400910] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
The aggregation of Electronic Health Records (EHR) and personalized genetics leads to powerful discoveries relevant to population health. Here we perform genome-wide association studies (GWAS) and accompanying phenome-wide association studies (PheWAS) to validate phenotype-genotype associations of BMI, and to a greater extent, severe Class 2 obesity, using comprehensive diagnostic and clinical data from the EHR database of our cohort. Three GWASs of 500,000 variants on the Illumina platform of 6,645 Healthy Nevada participants identified several published and novel variants that affect BMI and obesity. Each GWAS was followed with two independent PheWASs to examine associations between extensive phenotypes (incidence of diagnoses, condition, or disease), significant SNPs, BMI, and incidence of extreme obesity. The first GWAS examines associations with BMI in a cohort with no type 2 diabetics, focusing exclusively on BMI. The second GWAS examines associations with BMI in a cohort that includes type 2 diabetics. In the second GWAS, type 2 diabetes is a comorbidity, and thus becomes a covariate in the statistical model. The intersection of significant variants of these two studies is surprising. The third GWAS is a case vs. control study, with cases defined as extremely obese (Class 2 or 3 obesity), and controls defined as participants with BMI between 18.5 and 25. This last GWAS identifies strong associations with extreme obesity, including established variants in the FTO and NEGR1 genes, as well as loci not yet linked to obesity. The PheWASs validate published associations between BMI and extreme obesity and incidence of specific diagnoses and conditions, yet also highlight novel links. This study emphasizes the importance of our extensive longitudinal EHR database to validate known associations and identify putative novel links with BMI and obesity.
|
26
|
Chhetri HB, Furches A, Macaya-Sanz D, Walker AR, Kainer D, Jones P, Harman-Ware AE, Tschaplinski TJ, Jacobson D, Tuskan GA, DiFazio SP. Genome-Wide Association Study of Wood Anatomical and Morphological Traits in Populus trichocarpa. FRONTIERS IN PLANT SCIENCE 2020; 11:545748. [PMID: 33013968 PMCID: PMC7509168 DOI: 10.3389/fpls.2020.545748] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/26/2020] [Accepted: 08/21/2020] [Indexed: 05/04/2023]
Abstract
To understand the genetic mechanisms underlying wood anatomical and morphological traits in Populus trichocarpa, we used 869 unrelated genotypes from a common garden in Clatskanie, Oregon that were previously collected from across the distribution range in western North America. Using GEMMA mixed model analysis, we tested for the association of 25 phenotypic traits and nine multitrait combinations with 6.741 million SNPs covering the entire genome. Broad-sense trait heritabilities ranged from 0.117 to 0.477. Most traits were significantly correlated with geoclimatic variables suggesting a role of climate and geography in shaping the variation of this species. Fifty-seven SNPs from single trait GWAS and 11 SNPs from multitrait GWAS passed an FDR threshold of 0.05, leading to the identification of eight and seven nearby candidate genes, respectively. The percentage of phenotypic variance explained (PVE) by the significant SNPs for both single and multitrait GWAS ranged from 0.01% to 6.18%. To further evaluate the potential roles of candidate genes, we used a multi-omic network containing five additional data sets, including leaf and wood metabolite GWAS layers and coexpression and comethylation networks. We also performed a functional enrichment analysis on coexpression nearest neighbors for each gene model identified by the wood anatomical and morphological trait GWAS analyses. Genes affecting cell wall composition and transport related genes were enriched in wood anatomy and stomatal density trait networks. Signaling and metabolism related genes were also common in networks for stomatal density. For leaf morphology traits (leaf dry and wet weight) the networks were significantly enriched for GO terms related to photosynthetic processes as well as cellular homeostasis. 
The identified genes provide further insights into the genetic control of these traits, which are important determinants of the suitability and sustainability of improved genotypes for lignocellulosic biofuel production.
Affiliation(s)
- Hari B. Chhetri
- Department of Biology, West Virginia University, Morgantown, WV, United States
| | - Anna Furches
- Biosciences Division, and The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, United States
| | - David Macaya-Sanz
- Department of Biology, West Virginia University, Morgantown, WV, United States
| | - Alejandro R. Walker
- Department of Oral Biology, College of Dentistry, University of Florida, Gainesville, FL, United States
| | - David Kainer
- Biosciences Division, and The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, United States
| | - Piet Jones
- Biosciences Division, and The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, United States
| | - Anne E. Harman-Ware
- Biosciences Center, and National Bioenergy Center, National Renewable Energy Laboratory, Golden, CO, United States
| | - Timothy J. Tschaplinski
- Biosciences Division, and The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, United States
| | - Daniel Jacobson
- Biosciences Division, and The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, United States
| | - Gerald A. Tuskan
- Biosciences Division, and The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, United States
| | - Stephen P. DiFazio
- Department of Biology, West Virginia University, Morgantown, WV, United States
- *Correspondence: Stephen P. DiFazio,
| |
|
27
|
Pacanowski M, Liu Q. Precision Medicine 2030. Clin Pharmacol Ther 2019; 107:62-64. [DOI: 10.1002/cpt.1675] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/16/2019] [Accepted: 10/03/2019] [Indexed: 11/07/2022]
Affiliation(s)
- Michael Pacanowski
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| | - Qi Liu
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| |
|
28
|
Dershem R, Metpally RPR, Jeffreys K, Krishnamurthy S, Smelser DT, Hershfinkel M, Carey DJ, Robishaw JD, Breitwieser GE. Rare-variant pathogenicity triage and inclusion of synonymous variants improves analysis of disease associations of orphan G protein-coupled receptors. J Biol Chem 2019; 294:18109-18121. [PMID: 31628190 DOI: 10.1074/jbc.ra119.009253] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/07/2019] [Revised: 10/08/2019] [Indexed: 02/02/2023] Open
Abstract
The pace of deorphanization of G protein-coupled receptors (GPCRs) has slowed, and new approaches are required. Small molecule targeting of orphan GPCRs can potentially be of clinical benefit even if the endogenous receptor ligand has not been identified. Many GPCRs lack common variants that lead to reproducible genome-wide disease associations, and rare-variant approaches have emerged as a viable alternative to identify disease associations for such genes. Therefore, our goal was to prioritize orphan GPCRs by determining their associations with human diseases in a large clinical population. We used sequence kernel association tests to assess the disease associations of 85 orphan or understudied GPCRs in an unselected cohort of 51,289 individuals. Using rare loss-of-function variants, missense variants predicted to be pathogenic or likely pathogenic, and a subset of rare synonymous variants that cause large changes in local codon bias as independent data sets, we found strong, phenome-wide disease associations shared by two or more variant categories for 39% of the GPCRs. To validate the bioinformatics and sequence kernel association test analyses, we functionally characterized rare missense and synonymous variants of GPR39, a family A GPCR, revealing altered expression or Zn2+-mediated signaling for members of both variant classes. These results support the utility of rare variant analyses for identifying disease associations for GPCRs that lack impactful common variants. We highlight the importance of rare synonymous variants in human physiology and argue for their routine inclusion in any comprehensive analysis of genomic variants as potential causes of disease.
Affiliation(s)
- Ridge Dershem
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
| | - Raghu P R Metpally
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
| | - Kirk Jeffreys
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
| | - Sarathbabu Krishnamurthy
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
| | - Diane T Smelser
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
| | - Michal Hershfinkel
- Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, 8410501 Israel
| | -
- Regeneron Pharmaceuticals, Inc., Tarrytown, New York 10591
| | - David J Carey
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
| | - Janet D Robishaw
- Schmidt College of Medicine, Florida Atlantic University, Boca Raton, Florida 33431
| | - Gerda E Breitwieser
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822.
| |
|
29
|
Affiliation(s)
- William S Weintraub
- From the MedStar Heart and Vascular Institute, Georgetown University, Washington, DC (W.S.W.)
| | - Akl C Fahed
- Division of Cardiology and Center for Genomic Medicine, Massachusetts General Hospital, Boston (A.C.F.); The Broad Institute of Harvard and MIT, Cambridge, MA (A.C.F.); Harvard Medical School, Boston, MA (A.C.F.)
| | | |
|
30
|
Read RW, Schlauch KA, Elhanan G, Metcalf WJ, Slonim AD, Aweti R, Borkowski R, Grzymski JJ. GWAS and PheWAS of red blood cell components in a Northern Nevadan cohort. PLoS One 2019; 14:e0218078. [PMID: 31194788 PMCID: PMC6564422 DOI: 10.1371/journal.pone.0218078] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/29/2018] [Accepted: 05/21/2019] [Indexed: 01/20/2023] Open
Abstract
In this study, we perform a full genome-wide association study (GWAS) to identify single nucleotide polymorphisms (SNPs) statistically significantly associated with three red blood cell (RBC) components and follow it with two independent PheWASs to examine associations between phenotypic data (case-control status of diagnoses or disease), significant SNPs, and RBC component levels. We first identified associations between the three RBC components, mean platelet volume (MPV), mean corpuscular volume (MCV), and platelet counts (PC), and the genotypes of approximately 500,000 SNPs on the Illumina Infinium DNA Human OmniExpress-24 BeadChip using a single cohort of 4,673 Northern Nevadans. Twenty-one SNPs in five major genomic regions were found to be statistically significantly associated with MPV, two regions with MCV, and one region with PC, with p < 5×10^-8. Twenty-nine SNPs and nine chromosomal regions were identified in 30 previous GWASs, with effect sizes of similar magnitude and direction as found in our cohort. The two strongest associations were SNP rs1354034 with MPV (p = 2.4×10^-13) and rs855791 with MCV (p = 5.2×10^-12). We then examined possible associations between these significant SNPs and incidence of 1,488 phenotype groups mapped from International Classification of Disease version 9 and 10 (ICD9 and ICD10) codes collected in the extensive electronic health record (EHR) database associated with Healthy Nevada Project consented participants. Further leveraging data collected in the EHR, we performed an additional PheWAS to identify associations between continuous RBC component measures and incidence of specific diagnoses. The first PheWAS illuminated whether SNPs associated with RBC components in our cohort were linked with other hematologic phenotypic diagnoses or diagnoses of other nature. Although no SNPs from our GWAS were identified as strongly associated with other phenotypic components, a number of associations were identified with p-values ranging between 1×10^-3 and 1×10^-4 with traits such as respiratory failure, sleep disorders, hypoglycemia, hyperglyceridemia, GERD and IBS. The second PheWAS examined possible phenotypic predictors of abnormal RBC component measures: a number of hematologic phenotypes such as thrombocytopenia, anemias, hemoglobinopathies and pancytopenia were found to be strongly associated with RBC component measures; additional phenotypes such as (morbid) obesity, malaise and fatigue, alcoholism, and cirrhosis were also identified as possible predictors of RBC component measures.
Affiliation(s)
- Robert W. Read
  - Applied Innovation Center, Renown Institute for Health Innovation, Desert Research Institute, Reno, NV, United States of America
- Karen A. Schlauch
  - Applied Innovation Center, Renown Institute for Health Innovation, Desert Research Institute, Reno, NV, United States of America
- Gai Elhanan
  - Applied Innovation Center, Renown Institute for Health Innovation, Desert Research Institute, Reno, NV, United States of America
- William J. Metcalf
  - Applied Innovation Center, Renown Institute for Health Innovation, Desert Research Institute, Reno, NV, United States of America
- Ramsey Aweti
  - 23andMe, Inc., Mountain View, CA, United States of America
- Joseph J. Grzymski
  - Applied Innovation Center, Renown Institute for Health Innovation, Desert Research Institute, Reno, NV, United States of America
  - Renown Health, Reno, NV, United States of America
31
Miller JE, Metpally RP, Person TN, Krishnamurthy S, Dasari VR, Shivakumar M, Lavage DR, Cook AM, Carey DJ, Ritchie MD, Kim D, Gogoi R. Systematic characterization of germline variants from the DiscovEHR study endometrial carcinoma population. BMC Med Genomics 2019; 12:59. [PMID: 31053132 PMCID: PMC6499978 DOI: 10.1186/s12920-019-0504-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/30/2018] [Accepted: 04/15/2019] [Indexed: 02/02/2023]
Abstract
Background: Endometrial cancer (EMCA) is the fifth most common cancer among women in the world. Identification of potentially pathogenic germline variants from individuals with EMCA will help characterize genetic features that underlie the disease and potentially predispose individuals to its pathogenesis. Methods: The Geisinger Health System’s (GHS) DiscovEHR cohort includes exome sequencing on over 50,000 consenting patients, 297 of whom have evidence of an EMCA diagnosis in their electronic health record. Here, rare variants were annotated as potentially pathogenic. Results: Eight genes were identified as having increased burden in the EMCA cohort relative to the non-cancer control cohort. None of the eight genes had an increased burden in the other hormone-related cancer cohort from GHS, suggesting they can help characterize the underlying genetic variation that gives rise to EMCA. Comparing GHS to The Cancer Genome Atlas (TCGA) EMCA germline data identified 34 genes with potentially pathogenic variation and eight unique potentially pathogenic variants that were present in both studies. Thus, similar germline variation among genes can be observed in unique EMCA cohorts and could help prioritize genes to investigate for future work. Conclusion: In summary, this systematic characterization of potentially pathogenic germline variants describes the genetic underpinnings of EMCA through the use of data from a single hospital system.
Affiliation(s)
- Jason E Miller
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Raghu P Metpally
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, 17822, USA
- Thomas N Person
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, 17822, USA
- Manu Shivakumar
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, 17822, USA
- Daniel R Lavage
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, 17822, USA
- Adam M Cook
  - Weis Center for Research, Geisinger Medical Center, Danville, PA, 17822, USA
- David J Carey
  - Weis Center for Research, Geisinger Medical Center, Danville, PA, 17822, USA
- Marylyn D Ritchie
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Dokyoon Kim
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, 17822, USA; Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
- Radhika Gogoi
  - Weis Center for Research, Geisinger Medical Center, Danville, PA, 17822, USA
32
Zhang XA, Yates A, Vasilevsky N, Gourdine JP, Callahan TJ, Carmody LC, Danis D, Joachimiak MP, Ravanmehr V, Pfaff ER, Champion J, Robasky K, Xu H, Fecho K, Walton NA, Zhu RL, Ramsdill J, Mungall CJ, Köhler S, Haendel MA, McDonald CJ, Vreeman DJ, Peden DB, Bennett TD, Feinstein JA, Martin B, Stefanski AL, Hunter LE, Chute CG, Robinson PN. Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery. NPJ Digit Med 2019; 2:32. [PMID: 31119199 PMCID: PMC6527418 DOI: 10.1038/s41746-019-0110-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/26/2018] [Accepted: 04/18/2019] [Indexed: 12/22/2022]
Abstract
Electronic Health Record (EHR) systems typically define laboratory test results using the Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resources (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
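The conversion step described in this abstract (assess a LOINC-coded result, then emit an HPO term) can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the `ANNOTATIONS` table, the `interpret` and `to_hpo` helpers, and the reference range are hypothetical and are not the authors' software, and the LOINC and HPO identifiers are examples that should be checked against current releases of both vocabularies.

```python
from typing import Optional

# Hypothetical annotation table: LOINC code -> {interpretation flag -> HPO term ID}.
# The identifiers are illustrative examples, not a curated annotation set.
ANNOTATIONS = {
    "2345-7": {            # glucose in serum or plasma (illustrative)
        "L": "HP:0001943", # hypoglycemia
        "H": "HP:0003074", # hyperglycemia
    },
}


def interpret(value: float, low: float, high: float) -> str:
    """Classify a numeric result as low (L), normal (N), or high (H)."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


def to_hpo(loinc_code: str, value: float, low: float, high: float) -> Optional[str]:
    """Return the HPO term for an abnormal result, or None.

    Simplification: in-range results and unannotated tests both yield None.
    """
    flag = interpret(value, low, high)
    return ANNOTATIONS.get(loinc_code, {}).get(flag)


# Toy usage with an assumed reference range of 70-99 mg/dL.
high_result = to_hpo("2345-7", 250.0, 70.0, 99.0)
low_result = to_hpo("2345-7", 50.0, 70.0, 99.0)
normal_result = to_hpo("2345-7", 85.0, 70.0, 99.0)
```

Because distinct tests can point to the same HPO term (or to terms under a shared ancestor), a table of this shape is one way results from different assays could be pooled for an association analysis.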
Affiliation(s)
- Amy Yates
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Nicole Vasilevsky
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
- J. P. Gourdine
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
  - Library, Oregon Health and Science University, Portland, OR 97239 USA
- Tiffany J. Callahan
  - Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Leigh C. Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
- Daniel Danis
  - The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
- Marcin P. Joachimiak
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Vida Ravanmehr
  - The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
- Emily R. Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- James Champion
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Kimberly Robasky
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
  - Genetics Department, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
  - School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Hao Xu
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Karamarie Fecho
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Nephi A. Walton
  - Genomic Medicine Institute, Geisinger Health System, Danville, PA 17822 USA
- Richard L. Zhu
  - Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Justin Ramsdill
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Christopher J. Mungall
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Sebastian Köhler
  - Charité Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, 10117 Germany
  - Einstein Center Digital Future, Berlin, 10117 Germany
- Melissa A. Haendel
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
  - Linus Pauling Institute and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR 97331 USA
- Clement J. McDonald
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Daniel J. Vreeman
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202 USA
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, IN 46202 USA
- David B. Peden
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
  - Division of Allergy, Immunology and Rheumatology, Department of Pediatrics, University of North Carolina, Chapel Hill, NC 27599 USA
  - University of North Carolina Center for Environmental Medicine, Asthma and Lung Biology, University of North Carolina, Chapel Hill, NC 27599 USA
- Tellen D. Bennett
  - Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- James A. Feinstein
  - Adult and Child Consortium for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine, Aurora, CO 80045 USA
- Blake Martin
  - Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Adrianne L. Stefanski
  - Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Lawrence E. Hunter
  - Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Christopher G. Chute
  - Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Peter N. Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032 USA
33
James G, Reisberg S, Lepik K, Galwey N, Avillach P, Kolberg L, Mägi R, Esko T, Alexander M, Waterworth D, Loomis AK, Vilo J. An exploratory phenome wide association study linking asthma and liver disease genetic variants to electronic health records from the Estonian Biobank. PLoS One 2019; 14:e0215026. [PMID: 30978214 PMCID: PMC6461350 DOI: 10.1371/journal.pone.0215026] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/06/2018] [Accepted: 03/25/2019] [Indexed: 12/22/2022]
Abstract
The Estonian Biobank, governed by the Institute of Genomics at the University of Tartu (Biobank), has stored genetic material/DNA and continuously collected data since 2002 on a total of 52,274 individuals, representing ~5% of the Estonian adult population, and is still growing. To explore the utility of data available in the Biobank, we conducted a phenome-wide association study (PheWAS) in two areas of interest to healthcare researchers: asthma and liver disease. We used 11 asthma and 13 liver disease-associated single nucleotide polymorphisms (SNPs), identified from published genome-wide association studies, to test our ability to detect established associations. We confirmed 2 asthma and 5 liver disease associated variants at nominal significance and directionally consistent with published results. We found 2 associations that were opposite to what was published before (rs4374383:AA increases risk of NASH/NAFLD, rs11597086 increases ALT level). Three SNP-diagnosis pairs passed the phenome-wide significance threshold: rs9273349 and E06 (thyroiditis, p = 5.50 × 10^-8); rs9273349 and E10 (type-1 diabetes, p = 2.60 × 10^-7); and rs2281135 and K76 (non-alcoholic liver diseases, including NAFLD, p = 4.10 × 10^-7). We have validated our approach and confirmed the quality of the data for these conditions. Importantly, we demonstrate that the extensive amount of genetic and medical information from the Estonian Biobank can be successfully utilized for scientific research.
Affiliation(s)
- Glen James
  - AstraZeneca, Global Medical Affairs, Cambridge, United Kingdom
- Sulev Reisberg
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - STACC, Tartu, Estonia
  - Quretec, Tartu, Estonia
- Kaido Lepik
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Nicholas Galwey
  - GlaxoSmithKline, Research and Development, Stevenage, United Kingdom
- Paul Avillach
  - Department of Biomedical Informatics, Harvard Medical School, Boston, United States of America
  - Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, Netherlands
- Liis Kolberg
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Reedik Mägi
  - Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Tõnu Esko
  - Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Myriam Alexander
  - GlaxoSmithKline, Research and Development, Stevenage, United Kingdom
- Dawn Waterworth
  - GlaxoSmithKline, Genetics, Collegeville, PA, United States of America
- A. Katrina Loomis
  - Pfizer Worldwide Research and Development, Groton, CT, United States of America
- Jaak Vilo
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
34
Vergara C, Thio CL, Johnson E, Kral AH, O'Brien TR, Goedert JJ, Mangia A, Piazzolla V, Mehta SH, Kirk GD, Kim AY, Lauer GM, Chung RT, Cox AL, Peters MG, Khakoo SI, Alric L, Cramp ME, Donfield SM, Edlin BR, Busch MP, Alexander G, Rosen HR, Murphy EL, Latanich R, Wojcik GL, Taub MA, Valencia A, Thomas DL, Duggal P. Multi-Ancestry Genome-Wide Association Study of Spontaneous Clearance of Hepatitis C Virus. Gastroenterology 2019; 156:1496-1507.e7. [PMID: 30593799 PMCID: PMC6788806 DOI: 10.1053/j.gastro.2018.12.014] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 10/24/2018] [Revised: 12/05/2018] [Accepted: 12/19/2018] [Indexed: 02/08/2023]
Abstract
BACKGROUND & AIMS: Spontaneous clearance of hepatitis C virus (HCV) occurs in approximately 30% of infected persons and less often in populations of African ancestry. Variants in major histocompatibility complex (MHC) and in interferon lambda genes are associated with spontaneous HCV clearance, but there have been few studies of these variants in persons of African ancestry. We performed a dense multi-ancestry genome-wide association study of spontaneous clearance of HCV, focusing on individuals of African ancestry. METHODS: We performed genotype analyses of 4423 people from 3 ancestry groups: 2201 persons of African ancestry (445 with HCV clearance and 1756 with HCV persistence), 1739 persons of European ancestry (701 with HCV clearance and 1036 with HCV persistence), and 486 multi-ancestry Hispanic persons (173 with HCV clearance and 313 with HCV persistence). Samples were genotyped using Illumina (San Diego, CA) arrays and statistically imputed to the 1000 Genomes Project. For each ancestry group, the association of single-nucleotide polymorphisms with HCV clearance was tested by log-additive analysis, and then a meta-analysis was performed. RESULTS: In the meta-analysis, significant associations with HCV clearance were confirmed at the interferon lambda gene locus IFNL4-IFNL3 (19q13.2) (P = 5.99 × 10^-50) and the MHC locus 6p21.32 (P = 1.15 × 10^-21). We also associated HCV clearance with polymorphisms in the G-protein-coupled receptor 158 gene (GPR158) at 10p12.1 (P = 1.80 × 10^-7). These 3 loci had independent, additive effects on HCV clearance, and account for 6.8% and 5.9% of the variance of HCV clearance in persons of European and African ancestry, respectively. Persons of African or European ancestry carrying all 6 variants were 24-fold and 11-fold, respectively, more likely to clear HCV infection compared with individuals carrying none or 1 of the clearance-associated variants. CONCLUSIONS: In a meta-analysis of data from 3 studies, we found variants in MHC genes, IFNL4-IFNL3, and GPR158 to increase odds of HCV clearance in patients of European and African ancestry. These findings could increase our understanding of immune response to and clearance of HCV infection.
Affiliation(s)
- Chloe L Thio
  - Johns Hopkins University, School of Medicine, Baltimore, Maryland
- Eric Johnson
  - Research Triangle Institute International, Research Triangle Park, North Carolina; Atlanta, Georgia; San Francisco, California
- Alex H Kral
  - Research Triangle Institute International, Research Triangle Park, North Carolina; Atlanta, Georgia; San Francisco, California
- Thomas R O'Brien
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- James J Goedert
  - Liver Unit Istituto Di Ricovero e Cura a Carattere Scientifico "Casa Sollievo della Sofferenza", San Giovanni Rotondo, Italy
- Alessandra Mangia
  - Liver Unit Istituto Di Ricovero e Cura a Carattere Scientifico "Casa Sollievo della Sofferenza", San Giovanni Rotondo, Italy
- Valeria Piazzolla
  - Liver Unit Istituto Di Ricovero e Cura a Carattere Scientifico "Casa Sollievo della Sofferenza", San Giovanni Rotondo, Italy
- Shruti H Mehta
  - Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland
- Gregory D Kirk
  - Johns Hopkins University, School of Medicine, Baltimore, Maryland; Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland
- Arthur Y Kim
  - Liver Center and Gastrointestinal Division, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
- Georg M Lauer
  - Liver Center and Gastrointestinal Division, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
- Raymond T Chung
  - Liver Center and Gastrointestinal Division, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
- Andrea L Cox
  - Johns Hopkins University, School of Medicine, Baltimore, Maryland
- Marion G Peters
  - Division of Gastroenterology, Department of Medicine, School of Medicine, University of California, San Francisco, California
- Salim I Khakoo
  - University of Southampton, Southampton General Hospital, Southampton, UK
- Laurent Alric
  - Department of Internal Medicine and Digestive Diseases, Centre Hospitalier Universitaire Purpan, UMR 152, Institut de Recherche pour le Développement Toulouse 3 University, France
- Brian R Edlin
  - State University of New York Downstate College of Medicine, Brooklyn, New York
- Michael P Busch
  - University of California and Vitalant Research Institute, San Francisco, California
- Graeme Alexander
  - University College London Institute for Liver and Digestive Health, The Royal Free Hospital, London, UK
- Edward L Murphy
  - University of California and Vitalant Research Institute, San Francisco, California
- Rachel Latanich
  - Johns Hopkins University, School of Medicine, Baltimore, Maryland
- Genevieve L Wojcik
  - Department of Genetics, Stanford University School of Medicine, Stanford, California
- Margaret A Taub
  - Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland
- Ana Valencia
  - Johns Hopkins University, School of Medicine, Baltimore, Maryland; Universidad Pontificia Bolivariana, Medellin, Colombia
- David L Thomas
  - Johns Hopkins University, School of Medicine, Baltimore, Maryland
- Priya Duggal
  - Johns Hopkins University, Bloomberg School of Public Health, Baltimore, Maryland
35
Fox CS. Using Human Genetics to Drive Drug Discovery: A Perspective. Am J Kidney Dis 2019; 74:111-119. [PMID: 30898364 DOI: 10.1053/j.ajkd.2018.12.045] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/11/2018] [Accepted: 12/24/2018] [Indexed: 12/11/2022]
Abstract
The probability of success of developing medicines to treat human disease can be improved by leveraging human genetics. Different types of genetic data and techniques, including genome-wide association, whole-exome sequencing, and whole-genome sequencing, can be used to gain insight into human disease. Layering different types of genetic evidence from Mendelian disease, coding variants, and common variation can bolster support for a genetic target. Human knockouts offer the potential to perform reverse genetic screens in humans to identify physiologically relevant targets. Other components of a good genetic target include protective loss-of-function mutations, some degree of known biology, tractability, and a clean on-target safety profile. In addition to using human genetics to inspire new drug programs, phenome-wide association studies can be used to identify alternative indications or repurposing opportunities. This information can be combined into a 5-step approach for selecting a genetic target for validation, which is presented in detail in this review. Finally, current challenges in leveraging human genetics are highlighted, including the difficulties translating certain types of genetic data, relatively small number of bona fide disease-associated coding rare variants, and current sample sizes of large well-curated biobanks linked to comprehensive genetic information.
36
Dashti HS, Redline S, Saxena R. Polygenic risk score identifies associations between sleep duration and diseases determined from an electronic medical record biobank. Sleep 2019; 42:zsy247. [PMID: 30521049 PMCID: PMC6424085 DOI: 10.1093/sleep/zsy247] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 08/09/2018] [Revised: 11/07/2018] [Accepted: 12/03/2018] [Indexed: 01/01/2023]
Abstract
STUDY OBJECTIVES: We aimed to detect cross-sectional phenotype and polygenic risk score (PRS) associations between sleep duration and prevalent diseases using the Partners Biobank, a hospital-based cohort study linking electronic medical records (EMR) with genetic information. METHODS: Disease prevalence was determined from EMR, and sleep duration was self-reported. A PRS for sleep duration was derived using 78 previously associated SNPs from genome-wide association studies (GWAS) for self-reported sleep duration. We tested for associations between (1) self-reported sleep duration and 22 prevalent diseases (n = 30 251), (2) the PRS and self-reported sleep duration (n = 6903), and (3) the PRS and the 22 prevalent diseases (n = 16 033). For observed PRS-disease associations, we tested causality using two-sample Mendelian randomization (MR). RESULTS: In the age-, sex-, and race-adjusted model, U-shaped associations were observed for sleep duration and asthma, depression, hypertension, insomnia, obesity, obstructive sleep apnea, and type 2 diabetes, where both short and long sleepers had higher odds for these diseases than normal sleepers (p < 2.27 × 10^-3). Next, we confirmed associations between the PRS and longer sleep duration (0.65 ± 0.19 SD minutes per effect allele; p = 7.32 × 10^-4). The PRS collectively explained 1.4% of the phenotypic variance in sleep duration. After adjusting for age, sex, genotyping array, and principal components of ancestry, we observed that the PRS was also associated with congestive heart failure (CHF; p = 0.015), obesity (p = 0.019), hypertension (p = 0.039), restless legs syndrome (RLS; p = 0.041), and insomnia (p = 0.049). Associations were maintained following additional adjustment for obesity status, except for hypertension and insomnia. For all diseases, except RLS, carrying a higher genetic burden of the 78 sleep duration-increasing alleles (i.e. higher sleep duration PRS) was associated with lower odds for prevalent disease. In MR, we estimated causal associations between genetically defined longer sleep duration with decreased risk of CHF (inverse variance weighted [IVW] OR per minute of sleep [95% CI] = 0.978 [0.961-0.996]; p = 0.019) and hypertension (IVW OR [95% CI] = 0.993 [0.986-1.000]; p = 0.049), and increased risk of RLS (IVW OR [95% CI] = 1.018 [1.000-1.036]; p = 0.045). CONCLUSIONS: By validating the PRS for sleep duration and identifying cross-phenotype associations, we lay the groundwork for future investigations on the intersection between sleep, genetics, clinical measures, and diseases using large EMR datasets.
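The two quantities this abstract relies on, a polygenic risk score and an inverse variance weighted (IVW) Mendelian randomization estimate, can be illustrated with a generic Python sketch. This is not the authors' pipeline: the function names and toy numbers are hypothetical, the abstract does not say whether the PRS was weighted, and the IVW form shown is the standard fixed-effect estimator (a weighted regression through the origin of SNP-outcome effects on SNP-exposure effects).

```python
import math


def polygenic_risk_score(dosages, weights=None):
    """Sum of effect-allele dosages (0, 1, or 2 per SNP), optionally weighted."""
    if weights is None:
        weights = [1.0] * len(dosages)  # unweighted allele count
    return sum(d * w for d, w in zip(dosages, weights))


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    beta = sum(bx * by / se^2) / sum(bx^2 / se^2), se = sqrt(1 / sum(bx^2 / se^2)).
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Toy summary statistics (invented for illustration only).
bx = [0.10, 0.20, 0.30]          # SNP effects on the exposure
by = [-0.002, -0.004, -0.006]    # SNP effects on the outcome (log odds)
se = [0.001, 0.002, 0.002]       # standard errors of the outcome effects

beta, beta_se = ivw_estimate(bx, by, se)
odds_ratio = math.exp(beta)      # odds ratio per unit of exposure
```

Exponentiating the IVW slope gives an odds ratio per unit of exposure, which is the form in which the abstract reports its MR results.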
Affiliation(s)
- Hassan S Dashti
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA
- Susan Redline
  - Departments of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Richa Saxena
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA
  - Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA
37
Verma A, Bang L, Miller JE, Zhang Y, Lee MTM, Zhang Y, Byrska-Bishop M, Carey DJ, Ritchie MD, Pendergrass SA, Kim D. Human-Disease Phenotype Map Derived from PheWAS across 38,682 Individuals. Am J Hum Genet 2019; 104:55-64. [PMID: 30598166 PMCID: PMC6323551 DOI: 10.1016/j.ajhg.2018.11.006] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 07/12/2018] [Accepted: 11/12/2018] [Indexed: 12/17/2022]
Abstract
Phenome-wide association studies (PheWASs) have been a useful tool for testing associations between genetic variations and multiple complex traits or diagnoses. Linking PheWAS-based associations between phenotypes and a variant or a genomic region into a network provides a new way to investigate cross-phenotype associations, and it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy. We created a network of associations from one of the largest PheWASs on electronic health record (EHR)-derived phenotypes across 38,682 unrelated samples from the Geisinger's biobank; the samples were genotyped through the DiscovEHR project. We computed associations between 632,574 common variants and 541 diagnosis codes. Using these associations, we constructed a "disease-disease" network (DDN) wherein pairs of diseases were connected on the basis of shared associations with a given genetic variant. The DDN provides a landscape of intra-connections within the same disease classes, as well as inter-connections across disease classes. We identified clusters of diseases with known biological connections, such as autoimmune disorders (type 1 diabetes, rheumatoid arthritis, and multiple sclerosis) and cardiovascular disorders. Previously unreported relationships between multiple diseases were identified on the basis of genetic associations as well. The network approach applied in this study can be used to uncover interactions between diseases as a result of their shared, potentially pleiotropic SNPs. Additionally, this approach might advance clinical research and even clinical practice by accelerating our understanding of disease mechanisms on the basis of similar underlying genetic associations.
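The network construction described in this abstract, where two diseases are linked when they share an association with the same variant, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `build_ddn`, the variant labels, and the toy association list are hypothetical, and the input is assumed to contain only variant-diagnosis pairs that already passed a significance filter.

```python
from collections import defaultdict
from itertools import combinations


def build_ddn(associations):
    """Build a disease-disease network from (variant, disease) pairs.

    Returns a dict mapping each disease pair (alphabetically ordered tuple)
    to the set of variants associated with both diseases.
    """
    diseases_by_variant = defaultdict(set)
    for variant, disease in associations:
        diseases_by_variant[variant].add(disease)

    edges = defaultdict(set)
    for variant, diseases in diseases_by_variant.items():
        # Every pair of diseases sharing this variant gets an edge.
        for a, b in combinations(sorted(diseases), 2):
            edges[(a, b)].add(variant)
    return dict(edges)


# Toy input: invented variant labels and diagnoses.
toy_associations = [
    ("rs_A", "type 1 diabetes"),
    ("rs_A", "rheumatoid arthritis"),
    ("rs_B", "rheumatoid arthritis"),
    ("rs_B", "multiple sclerosis"),
    ("rs_C", "hypertension"),  # associated with one disease only: no edge
]
ddn = build_ddn(toy_associations)
```

Keeping the shared variants on each edge makes it possible to weight edges by the number of shared associations or to inspect which SNPs drive a given connection.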
Affiliation(s)
- Anurag Verma
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Lisa Bang
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
- Jason E Miller
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yanfei Zhang
  - Genomic Medicine Institute, Geisinger, Danville, PA 17821, USA
- Yu Zhang
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Marta Byrska-Bishop
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
- David J Carey
  - Weis Center for Research, Geisinger, Danville, PA 17821, USA
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Sarah A Pendergrass
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
- Dokyoon Kim
  - The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA; Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
|
38
|
Beaulieu-Jones BK, Kohane IS, Beam AL. Learning Contextual Hierarchical Structure of Medical Concepts with Poincairé Embeddings to Clarify Phenotypes. Pacific Symposium on Biocomputing 2019; 24:8-17. [PMID: 30864306 PMCID: PMC6417814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Biomedical association studies are increasingly done using clinical concepts, and in particular diagnostic codes from clinical data repositories, as phenotypes. Clinical concepts can be represented in a meaningful vector space using word embedding models. These embeddings allow for comparison between clinical concepts or for straightforward input to machine learning models. Using traditional approaches, good representations require high dimensionality, making downstream tasks such as visualization more difficult. We applied Poincaré embeddings in a 2-dimensional hyperbolic space to a large-scale administrative claims database and show performance comparable to 100-dimensional embeddings in a Euclidean space. We then examine disease relationships under different disease contexts to better understand potential phenotypes.
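For readers unfamiliar with hyperbolic embeddings, the sketch below computes the standard distance between two points in the Poincaré ball model, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It only illustrates the geometry; the function name `poincare_distance` and the example coordinates are assumptions, and nothing here reproduces the training procedure or data used in the paper.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the open unit ball (Poincaré model).

    `u` and `v` are equal-length sequences of floats with Euclidean norm < 1.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # arcosh argument is >= 1 for any two points inside the ball.
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Hypothetical 2-dimensional coordinates, chosen only to show the geometry.
near_origin = poincare_distance((0.1, 0.0), (-0.1, 0.0))
near_boundary = poincare_distance((0.9, 0.0), (-0.9, 0.0))
print(near_origin, near_boundary)
```

Distances grow rapidly toward the boundary of the ball, which is the property that lets a low-dimensional hyperbolic space accommodate tree-like hierarchies such as diagnostic code systems.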
Affiliation(s)
- Isaac S. Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Andrew L. Beam
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
|
39
|
Farashi S, Kryza T, Clements J, Batra J. Post-GWAS in prostate cancer: from genetic association to biological contribution. Nat Rev Cancer 2019; 19:46-59. [PMID: 30538273 DOI: 10.1038/s41568-018-0087-3] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies (GWAS) have been successful in deciphering the genetic component of predisposition to many human complex diseases including prostate cancer. Germline variants identified by GWAS progressively unravelled the substantial knowledge gap concerning prostate cancer heritability. With the beginning of the post-GWAS era, more and more studies reveal that, in addition to their value as risk markers, germline variants can exert active roles in prostate oncogenesis. Consequently, current research efforts focus on exploring the biological mechanisms underlying specific susceptibility loci known as causal variants by applying novel and precise analytical methods to available GWAS data. Results obtained from these post-GWAS analyses have highlighted the potential of exploiting prostate cancer risk-associated germline variants to identify new gene networks and signalling pathways involved in prostate tumorigenesis. In this Review, we describe the molecular basis of several important prostate cancer-causal variants with an emphasis on using post-GWAS analysis to gain insight into cancer aetiology. In addition to discussing the current status of post-GWAS studies, we also summarize the main molecular mechanisms of potential causal variants at prostate cancer risk loci and explore the major challenges in moving from association to functional studies and their implication in clinical translation.
Affiliation(s)
- Samaneh Farashi
  - Cancer Program, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
  - Australian Prostate Cancer Research Centre - Queensland, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Thomas Kryza
  - Cancer Program, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
  - Australian Prostate Cancer Research Centre - Queensland, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Judith Clements
  - Cancer Program, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
  - Australian Prostate Cancer Research Centre - Queensland, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Jyotsna Batra
  - Cancer Program, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
  - Australian Prostate Cancer Research Centre - Queensland, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
|
40
|
Zhang X, Veturi Y, Verma S, Bone W, Verma A, Lucas A, Hebbring S, Denny JC, Stanaway IB, Jarvik GP, Crosslin D, Larson EB, Rasmussen-Torvik L, Pendergrass SA, Smoller JW, Hakonarson H, Sleiman P, Weng C, Fasel D, Wei WQ, Kullo I, Schaid D, Chung WK, Ritchie MD. Detecting potential pleiotropy across cardiovascular and neurological diseases using univariate, bivariate, and multivariate methods on 43,870 individuals from the eMERGE network. Pacific Symposium on Biocomputing 2019; 24:272-283. [PMID: 30864329 PMCID: PMC6457436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
The link between cardiovascular diseases and neurological disorders has been widely observed in the aging population. Disease prevention and treatment rely on understanding the potential genetic nexus of multiple diseases in these categories. In this study, we were interested in detecting pleiotropy, or the phenomenon in which a genetic variant influences more than one phenotype. Marker-phenotype association approaches can be grouped into univariate, bivariate, and multivariate categories based on the number of phenotypes considered at one time. Here we applied one statistical method per category followed by an eQTL colocalization analysis to identify potential pleiotropic variants that contribute to the link between cardiovascular and neurological diseases. We performed our analyses on ~530,000 common SNPs coupled with 65 electronic health record (EHR)-based phenotypes in 43,870 unrelated European adults from the Electronic Medical Records and Genomics (eMERGE) network. There were 31 variants identified by all three methods that showed significant associations across late-onset cardiac and neurologic diseases. We further investigated functional implications of gene expression on the detected "lead SNPs" via colocalization analysis, providing a deeper understanding of the discovered associations. In summary, we present the framework and landscape for detecting potential pleiotropy using univariate, bivariate, multivariate, and colocalization methods. Further exploration of these potentially pleiotropic genetic variants will work toward understanding disease-causing mechanisms across cardiovascular and neurological diseases and may assist in considering disease prevention as well as drug repositioning in future research.
Affiliation(s)
- Xinyuan Zhang
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA (*Authors contributed equally to this work)
|
41
|
Genomic and Phenomic Research in the 21st Century. Trends Genet 2018; 35:29-41. [PMID: 30342790 DOI: 10.1016/j.tig.2018.09.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/23/2018] [Revised: 09/24/2018] [Accepted: 09/25/2018] [Indexed: 02/06/2023]
Abstract
The field of human genomics has changed dramatically over time. Initial genomic studies were predominantly restricted to rare disorders in small families. Over the past decade, researchers changed course from family-based studies and instead focused on common diseases and traits in populations of unrelated individuals. With further advancements in biobanking, computer science, electronic health record (EHR) data, and more affordable high-throughput genomics, we are experiencing a new paradigm in human genomic research. Rapidly changing technologies and resources now make it possible to study thousands of diseases simultaneously at the genomic level. This review will focus on these advancements as scientists begin to incorporate phenome-wide strategies in human genomic research to understand the etiology of human diseases and develop new drugs to treat them.
|
42
|
Silvestrov P, Maier SJ, Fang M, Cisneros GA. DNArCdb: A database of cancer biomarkers in DNA repair genes that includes variants related to multiple cancer phenotypes. DNA Repair (Amst) 2018; 70:10-17. [PMID: 30098577 PMCID: PMC6151283 DOI: 10.1016/j.dnarep.2018.07.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/06/2018] [Revised: 07/30/2018] [Accepted: 07/30/2018] [Indexed: 02/04/2023]
Abstract
Functioning DNA repair capabilities are vital for organisms to ensure that the biological information is preserved and correctly propagated. Disruptions in DNA repair pathways can result in the accumulation of DNA mutations, which may lead to the onset of complex diseases such as cancer. The discovery and characterization of cancer-related biomarkers may allow early diagnosis and targeted treatment, which could significantly contribute to the survival rates of cancer patients. To this end, we have applied a hypothesis-driven bioinformatics approach to identify biomarkers related to 25 different DNA repair enzymes, in combination with structural analysis of six selected missense mutations of newly discovered SNPs that are associated with cancer phenotypes. Our search on 8 distinct cancer databases uncovered 43 missense SNPs that are statistically significantly associated with at least one phenotype. Moreover, nine of these missense SNPs are statistically significantly associated with two or more cancers. In addition, we have performed classical molecular dynamics to characterize the impact of rs10018786 on POLN, which results in the M310L Pol ν variant, and rs3218784 on POLI, which results in the I236M Pol ι variant. Our results suggest that both of these cancer-associated variants result in noticeable structural and dynamical changes compared with their respective wild-type proteins.
Affiliation(s)
- Pavel Silvestrov
  - Department of Chemistry, University of North Texas, Denton, TX, 76201, United States
- Sarah J Maier
  - Department of Chemistry, University of North Texas, Denton, TX, 76201, United States
- Michelle Fang
  - Department of Chemistry, University of North Texas, Denton, TX, 76201, United States
- G Andrés Cisneros
  - Department of Chemistry, University of North Texas, Denton, TX, 76201, United States
|