1
Stoneman HR, Price A, Gignoux CR, Hendricks AE. CCAFE: Estimating Case and Control Allele Frequencies from GWAS Summary Statistics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.24.619530. [PMID: 39554201 PMCID: PMC11565872 DOI: 10.1101/2024.10.24.619530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Methods involving summary statistics in genetics can be quite powerful but can be limited in utility. For instance, many post-hoc analyses of disease studies require case and control allele frequencies (AFs), which are not always published. We present two frameworks to derive case and control AFs from GWAS summary statistics using the odds ratio, case and control sample sizes, and either the total (case and control aggregated) AF or standard error (SE). In simulations and real data, derivation of case and control AFs using the total AF is highly accurate across all settings (e.g., minor AF, condition prevalence). Conversely, derivations using the SE underestimate common variant AFs (e.g., minor allele frequency > 0.3) in the presence of covariates. We develop an adjustment using gnomAD AFs as a proxy for true AFs, which reduces the bias when using the SE. While estimating case and control AFs using the total AF is preferred due to its high accuracy, estimating from the SE can be used more broadly since the SE can be derived from p-values and beta estimates, which are commonly provided. The methods provided here expand the utility of publicly available genetic summary statistics and promote the reusability of genomic data. The R package CCAFE, with implementations of both methods, is freely available on Bioconductor and GitHub.
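The total-AF route described in this abstract can be illustrated with a short sketch. It is not the CCAFE package's code (CCAFE is an R package and its internals are not shown here); it is a minimal Python derivation from two standard definitions, assuming an unadjusted allelic odds ratio: the pooled AF is the sample-size-weighted mean of the case and control AFs, and OR = [p_case / (1 - p_case)] / [p_control / (1 - p_control)]. The function names `case_control_af_from_total` and `se_from_beta_and_p` are hypothetical. The second helper assumes the p-value comes from a two-sided Wald test, which is one common way an SE can be recovered from a beta and a p-value.

```python
import math
from statistics import NormalDist


def case_control_af_from_total(odds_ratio, n_case, n_control, total_af):
    """Return (case AF, control AF) from an allelic OR, sample sizes, and pooled AF.

    Assumption: the OR is unadjusted and defined on allele counts.
    """
    n_total = n_case + n_control
    # Substituting p_case = OR * p_ctrl / (1 + (OR - 1) * p_ctrl) into
    # n_total * total_af = n_case * p_case + n_control * p_ctrl
    # gives the quadratic a * p_ctrl^2 + b * p_ctrl + c = 0 with:
    a = n_control * (odds_ratio - 1.0)
    b = n_case * odds_ratio + n_control - n_total * total_af * (odds_ratio - 1.0)
    c = -n_total * total_af
    disc = b * b - 4.0 * a * c
    # Root lying in [0, 1], written in a form that stays finite when OR == 1 (a == 0).
    af_control = -2.0 * c / (b + math.sqrt(disc))
    af_case = odds_ratio * af_control / (1.0 + (odds_ratio - 1.0) * af_control)
    return af_case, af_control


def se_from_beta_and_p(beta, p_value):
    """Recover the SE of beta, assuming a two-sided Wald test (z = beta / SE).

    Undefined for p_value == 1 (z would be zero).
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return abs(beta) / z


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the paper.
    af_case, af_control = case_control_af_from_total(1.5, 1000, 3000, 0.218)
    print(f"case AF ~ {af_case:.4f}, control AF ~ {af_control:.4f}")
```

Because the sketch ignores covariate adjustment, it corresponds only to the idealized setting; the abstract notes that the SE-based route in particular is biased when covariates are present.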
Affiliation(s)
- Hayley R Stoneman
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Adelle Price
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Christopher R Gignoux
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Audrey E Hendricks
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
2
Yang H, Wang Y, Zhao Y, Cao L, Chen C, Yu W. Causal effects of genetically determined metabolites and metabolite ratios on esophageal diseases: a two-sample Mendelian randomization study. BMC Gastroenterol 2024; 24:310. [PMID: 39271994 PMCID: PMC11401347 DOI: 10.1186/s12876-024-03411-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Accepted: 09/10/2024] [Indexed: 09/15/2024] Open
Abstract
BACKGROUND Esophageal diseases (ED) are common diseases of the upper digestive tract. Previous studies have proved that metabolic disorders are closely related to the occurrence and development of ED. However, there is a lack of evidence for causal relationships between metabolites and ED, as well as between metabolite ratios representing enzyme activities and ED. Herein, we explored the causality of genetically determined metabolites (GDMs) on ED through a Mendelian randomization (MR) study. METHODS Two-sample Mendelian randomization analysis was used to assess the causal effects of genetically determined metabolites and metabolite ratios on ED. A genome-wide association study (GWAS) encompassing 850 individual metabolites along with 309 metabolite ratios served as the exposures. Meanwhile, the outcomes were defined by 10 types of ED phenotypes, including Congenital Malformations of Esophagus (CME), Esophageal Varices (EV), Esophageal Obstructions (EO), Esophageal Ulcers (EU), Esophageal Perforations (EP), Gastroesophageal Reflux Disease (GERD), Esophagitis, Barrett's Esophagus (BE), Benign Esophageal Tumors (BETs), and Malignant Esophageal Neoplasms (MENs). The standard inverse variance weighted (IVW) method was applied to estimate the causal relationship between exposure and outcome. Sensitivity analyses were carried out using multiple methods, including MR-Egger, Weighted Median, MR-PRESSO, Cochran's Q test, and leave-one-out analysis. P < 0.05 was conventionally considered statistically significant. After applying the Bonferroni correction for multiple testing, a threshold of P < 4.3E-05 (0.05/1159) was regarded as indicative of a statistically significant causal relationship. Furthermore, metabolic pathway analysis was performed using the web-based MetaboAnalyst 6.0 software.
RESULTS The findings revealed that initially, a total of 869 candidate causal association pairs (P_IVW < 0.05) were identified, involving 442 metabolites, 145 metabolite ratios and 10 types of ED. However, upon applying the Bonferroni correction for multiple testing, only 36 pairs remained significant, involving 28 metabolites (predominantly lipids and amino acids), 5 metabolite ratios and 6 types of ED. Sensitivity analyses and reverse MR were performed for these 36 causal association pairs, where the results showed that the pair of EV and 1-(1-enyl-palmitoyl)-2-linoleoyl-GPE (p-16:0/18:2) did not withstand the sensitivity tests, and Hexadecenedioate (C16:1-DC) was found to have a reverse causality with GERD. The final 34 robust causal pairs included 26 metabolites, 5 metabolite ratios and 5 types of ED. The involved 26 metabolites predominantly consisted of methylated nucleotides, glycine derivatives, sex hormones, phospholipids, bile acids, fatty acid dicarboxylic acid derivatives, and N-acetylated amino acids. Furthermore, through metabolic pathway analysis, we uncovered 8 significant pathways that played pivotal roles in five types of ED conditions. CONCLUSIONS This study integrated genomics with metabolomics to assess causal relationships between ED and both metabolites and metabolite ratios, uncovering several key metabolic features in ED pathogenesis. These findings have potential as novel biomarkers for ED and provide insights into the disease's etiology and progression. However, further clinical and experimental validations are necessary.
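The Bonferroni threshold quoted in this abstract is plain arithmetic and can be checked in a few lines. The sketch below only reproduces the stated calculation (0.05 divided by the 850 metabolites plus 309 metabolite ratios); the function name `bonferroni_threshold` is hypothetical, and dividing by the number of exposures rather than by exposure-outcome pairs simply follows what the abstract reports.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


n_metabolites = 850   # individual metabolites, as stated in the abstract
n_ratios = 309        # metabolite ratios, as stated in the abstract
n_exposures = n_metabolites + n_ratios

threshold = bonferroni_threshold(0.05, n_exposures)
print(n_exposures, f"{threshold:.1e}")  # prints: 1159 4.3e-05
```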
Affiliation(s)
- Hanlei Yang
  - Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Yulan Wang
  - Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Yuewei Zhao
  - Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Leiqun Cao
  - Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Changqiang Chen
  - Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
- Wenjun Yu
  - Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, P. R. China
3
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece
4
Pleić N, Babić Leko M, Gunjača I, Zemunik T. Vitamin D and thyroid function: A mendelian randomization study. PLoS One 2024; 19:e0304253. [PMID: 38900813 PMCID: PMC11189194 DOI: 10.1371/journal.pone.0304253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 05/08/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND Numerous organs, including the thyroid gland, depend on vitamin D to function normally. Insufficient levels of serum 25-hydroxyvitamin D [25(OH)D] are seen as a potential factor contributing to the emergence of several thyroid disorders; however, the causal relationship remains unclear. Here we use a Mendelian randomization (MR) approach to investigate the causal effect of serum 25(OH)D concentration on the indicators of thyroid function. METHODS We conducted a two-sample MR analysis utilizing summary data from the most extensive genome-wide association studies (GWAS) of serum 25(OH)D concentration (n = 443,734 and 417,580), thyroid-stimulating hormone (TSH, n = 271,040), free thyroxine (fT4, n = 119,120), free triiodothyronine (fT3, n = 59,061), total triiodothyronine (TT3, n = 15,829), as well as thyroid peroxidase antibody levels and positivity (TPOAb, n = 12,353 and n = 18,297), low TSH (n = 153,241), high TSH (n = 141,549), autoimmune hypothyroidism (n = 287,247) and autoimmune hyperthyroidism (n = 257,552). The primary analysis was conducted using the multiplicative random-effects inverse variance weighted (IVW) method. The weighted mode, weighted median, MR-Egger, MR-PRESSO, and Causal Analysis Using Summary Effect estimates (CAUSE) were used in the sensitivity analysis. RESULTS The IVW, as well as MR-Egger and CAUSE analysis, showed a suggestive causal effect of 25(OH)D concentration on high TSH. Each 1 SD increase in serum 25(OH)D concentration was associated with a 12% decrease in the risk of high TSH (p = 0.02). Additionally, in the MR-Egger and CAUSE analysis, we found a suggestive causal effect of 25(OH)D concentration on autoimmune hypothyroidism. Specifically, each 1 SD increase in serum 25(OH)D concentration was associated with a 16.34% decrease in the risk of autoimmune hypothyroidism (p = 0.02).
CONCLUSIONS Our results support a suggestive causal effect which was negative in direction across all methods used, meaning that higher genetically predicted vitamin D concentration possibly lowers the odds of having high TSH or autoimmune hypothyroidism. Other thyroid parameters were not causally influenced by vitamin D serum concentration.
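For readers unfamiliar with the primary method named in this abstract, the following is a minimal sketch of an inverse variance weighted (IVW) estimate with a multiplicative random-effects standard error, computed from per-variant summary statistics. It is not the authors' pipeline, which is not described here. It uses the common first-order weights (outcome SEs only) and the widespread convention of inflating the fixed-effect SE by sqrt(Q / (k - 1)) only when that factor exceeds 1; both choices are assumptions. The names `ivw_multiplicative_random_effects` and `percent_change_in_odds` are hypothetical, and the input numbers are made up for illustration.

```python
import math


def ivw_multiplicative_random_effects(beta_exposure, beta_outcome, se_outcome):
    """Return (estimate, random-effects SE, Cochran's Q) for an IVW analysis.

    Equivalent to a weighted regression of outcome betas on exposure betas
    through the origin, with weights 1 / se_outcome^2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    estimate = num / den
    se_fixed = math.sqrt(1.0 / den)
    # Cochran's Q: weighted squared residuals around the fitted line.
    q = sum(
        w * (by - estimate * bx) ** 2
        for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    dof = len(beta_exposure) - 1
    # Assumed convention: never shrink the SE below the fixed-effect value.
    phi = max(1.0, q / dof) if dof > 0 else 1.0
    return estimate, se_fixed * math.sqrt(phi), q


def percent_change_in_odds(log_odds_ratio):
    """Convert a log odds ratio per unit exposure into a percent change in odds."""
    return (math.exp(log_odds_ratio) - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up summary statistics for three variants (illustration only).
    bx = [0.10, 0.20, 0.30]
    by = [-0.015, -0.020, -0.045]
    sy = [0.010, 0.012, 0.015]
    est, se, q = ivw_multiplicative_random_effects(bx, by, sy)
    print(f"estimate={est:.3f}, SE={se:.3f}, Q={q:.2f}")
    print(f"percent change in odds: {percent_change_in_odds(est):.1f}%")
```

Under this conversion, an odds ratio of 0.88 corresponds to the kind of "12% decrease" reported above, read as a change in odds per 1 SD increase in the exposure.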
Affiliation(s)
- Nikolina Pleić
  - Department of Medical Biology, University of Split, School of Medicine, Split, Croatia
- Mirjana Babić Leko
  - Department of Medical Biology, University of Split, School of Medicine, Split, Croatia
- Ivana Gunjača
  - Department of Medical Biology, University of Split, School of Medicine, Split, Croatia
- Tatijana Zemunik
  - Department of Medical Biology, University of Split, School of Medicine, Split, Croatia
5
Miao DNR, Ladha F, Lyle SM, Olivier DW, Ahmed S, Drögemöller BI. Current Perspectives on Data Sharing and Open Science in Pharmacogenomics. Clin Pharmacol Ther 2024; 115:408-411. [PMID: 38087986 DOI: 10.1002/cpt.3115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 11/21/2023] [Indexed: 02/17/2024]
Affiliation(s)
- Deanne Nixie R Miao
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Feryal Ladha
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Sarah M Lyle
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Daniel W Olivier
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
  - Department of Physiological Sciences, Stellenbosch University, Stellenbosch, Western Cape, South Africa
- Samah Ahmed
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Britt I Drögemöller
  - Department of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
  - Paul Albrechtsen Research Institute CancerCare Manitoba Research, Winnipeg, Manitoba, Canada
  - Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada
  - Centre on Aging, Winnipeg, Manitoba, Canada
6
Genovese G, Rockweiler NB, Gorman BR, Bigdeli TB, Pato MT, Pato CN, Ichihara K, McCarroll SA. BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies. Bioinformatics 2024; 40:btae038. [PMID: 38261650 PMCID: PMC10832354 DOI: 10.1093/bioinformatics/btae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/07/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024] Open
Abstract
MOTIVATION Many genetics studies report results tied to genomic coordinates of a legacy genome assembly. However, as assemblies are updated and improved, researchers are faced with either realigning raw sequence data using the updated coordinate system or converting legacy datasets to the updated coordinate system to be able to combine results with newer datasets. Currently available tools to perform the conversion of genetic variants have numerous shortcomings, including poor support for indels and multi-allelic variants, that lead to a higher rate of variants being dropped or incorrectly converted. As a result, many researchers continue to work with and publish using legacy genomic coordinates. RESULTS Here we present BCFtools/liftover, a tool to convert genomic coordinates across genome assemblies for variants encoded in the variant call format, with improved support for indels represented by different reference alleles across genome assemblies and full support for multi-allelic variants. It further supports updates to variant annotation fields whenever the reference allele changes across genome assemblies. The tool has the lowest rate of variants being dropped, with an order of magnitude fewer indels dropped or incorrectly converted, and is an order of magnitude faster than other tools typically used for the same task. It is particularly suited for converting variant callsets from large cohorts to novel telomere-to-telomere assemblies as well as summary statistics from genome-wide association studies tied to legacy genome assemblies. AVAILABILITY AND IMPLEMENTATION The tool is written in C and freely available under the MIT open source license as a BCFtools plugin available at http://github.com/freeseek/score.
Affiliation(s)
- Giulio Genovese
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  - Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
- Nicole B Rockweiler
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  - Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
- Bryan R Gorman
  - Center for Data and Computational Sciences, VA Boston HealthCare System, Boston, MA 02130, United States
  - Booz Allen Hamilton Inc, McLean, VA 22102, United States
- Tim B Bigdeli
  - Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY 11203, United States
  - Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY 11203, United States
  - Cooperative Studies Program, VA New York Harbor Healthcare System, Brooklyn, NY 11209, United States
- Michelle T Pato
  - Department of Psychiatry, Robert Wood Johnson Medical School, New Brunswick, NJ 08901, United States
- Carlos N Pato
  - Department of Psychiatry, Robert Wood Johnson Medical School, New Brunswick, NJ 08901, United States
- Kiku Ichihara
  - Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
- Steven A McCarroll
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  - Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, United States