1
|
Exome-wide rare variant analyses of two bone mineral density phenotypes: the challenges of analyzing rare genetic variation. Sci Rep 2018; 8:220. [PMID: 29317680 PMCID: PMC5760616 DOI: 10.1038/s41598-017-18385-9] [Received: 08/02/2017] [Accepted: 12/11/2017]
Abstract
Performance of a recently developed test for association between multivariate phenotypes and sets of genetic variants (MURAT) is demonstrated using measures of bone mineral density (BMD). By combining individual-level whole genome sequenced data from the UK10K study, and imputed genome-wide genetic data on individuals from the Study of Osteoporotic Fractures (SOF) and the Osteoporotic Fractures in Men Study (MrOS), a data set of 8810 individuals was assembled; tests of association were performed between autosomal gene-sets of genetic variants and BMD measured at lumbar spine and femoral neck. Distributions of p-values obtained from analyses of a single BMD phenotype are compared to those from the multivariate tests, across several region definitions and variant weightings. There is evidence of increased power with the multivariate test, although no new loci for BMD were identified. Among 17 genes highlighted either because there were significant p-values in region-based association tests or because they were in well-known BMD genes, 4 windows in 2 genes as well as 6 single SNPs in one of these genes showed association at genome-wide significant thresholds with the multivariate phenotype test but not with the single-phenotype test, Sequence Kernel Association Test (SKAT).
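As a rough illustration of the single-phenotype comparator named above, the sketch below computes a SKAT-style variance-component score statistic for one region. It is not MURAT and not the authors' pipeline. It assumes an intercept-only null model, the commonly used Beta(1, 25) weighting of minor-allele frequency, and hypothetical function names; the p-value step (a mixture of chi-square distributions) is omitted.

```python
def beta_1_25_density(maf):
    """Beta(1, 25) density at maf, which simplifies to 25 * (1 - maf)**24."""
    return 25.0 * (1.0 - maf) ** 24


def skat_style_q(genotypes, phenotype):
    """Weighted sum of squared per-variant score statistics for one region.

    genotypes: list of rows, one per individual, of minor-allele counts (0/1/2).
    phenotype: list of quantitative trait values, one per individual.
    Assumption: intercept-only null model, so residuals are y - mean(y).
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    residuals = [y - mean_y for y in phenotype]
    q = 0.0
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2.0 * n)
        score = sum(g * r for g, r in zip(column, residuals))
        # Rare variants receive larger weights than common ones.
        q += beta_1_25_density(maf) ** 2 * score ** 2
    return q
```

A multivariate test would replace the single residual vector with residuals for each phenotype (here, lumbar spine and femoral neck BMD) and model their correlation; that step is not shown.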
|
2
|
Kao CF, Liu JR, Hung H, Kuo PH. A robust GWSS method to simultaneously detect rare and common variants for complex disease. PLoS One 2015; 10:e0120873. [PMID: 25880329 PMCID: PMC4399906 DOI: 10.1371/journal.pone.0120873] [Received: 09/03/2014] [Accepted: 01/26/2015]
Abstract
The rapid advances in sequencing technologies and the resulting next-generation sequencing data provide the opportunity to detect disease-associated variants at better resolution, in particular for low-frequency variants. Although both common and rare variants might exert independent effects on the risk for the trait of interest, previous methods for detecting association rarely consider them simultaneously. We proposed a class of test statistics, the generalized weighted-sum statistic (GWSS), to detect disease associations in the presence of common and rare variants with a case-control study design. Information on rare variants was aggregated using a weighted-sum method, while the signal directions and strengths of the variants were considered at the same time. Permutations were performed to obtain the empirical p-values of the test statistics. Our simulation showed that, compared to the existing methods, the GWSS method had better performance in most of the scenarios. The GWSS (in particular VDWSS-t) method is particularly robust to opposite association directions, association strength, and varying distributions of minor-allele frequencies. It is therefore promising for detecting disease-associated loci. As an empirical application, we also applied our GWSS method to the Genetic Analysis Workshop 17 data, and the results were consistent with the simulation, suggesting good performance of our method. As re-sequencing studies become more popular for identifying putative disease loci, we recommend the use of this newly developed GWSS to detect associations with both common and rare variants.
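The two generic ingredients mentioned in this abstract, a weighted sum over variants and a permutation-based empirical p-value, can be sketched as follows. This is a plain weighted-sum burden test under stated assumptions, not the GWSS itself: the inverse-standard-deviation weights, the mean-difference statistic, and all function names are illustrative choices. Unlike the GWSS as described, this plain sum does not handle variants with opposite effect directions.

```python
import random


def variant_weights(genotypes):
    """Up-weight rare variants: 1 / sqrt(maf * (1 - maf)), with a smoothed maf."""
    n = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        count = sum(row[j] for row in genotypes)
        maf = (count + 1) / (2 * n + 2)  # smoothing avoids division by zero
        weights.append(1.0 / (maf * (1.0 - maf)) ** 0.5)
    return weights


def mean_difference(scores, labels):
    """Mean burden score in cases (label 1) minus mean in controls (label 0)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(genotypes, labels, n_perm=2000, seed=0):
    """Empirical two-sided p-value from shuffling case-control labels."""
    weights = variant_weights(genotypes)
    scores = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    observed = abs(mean_difference(scores, labels))
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(scores, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

The add-one correction means the smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations bounds the achievable significance level.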
Affiliation(s)
- Chung-Feng Kao
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Jia-Rou Liu
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Public Health, Chang Gung University, Taoyuan, Taiwan
- Hung Hung (corresponding author)
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Research Center for Genes, Environment and Human Health, National Taiwan University, Taipei, Taiwan
- Po-Hsiu Kuo (corresponding author)
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Research Center for Genes, Environment and Human Health, National Taiwan University, Taipei, Taiwan
|
3
|
Li W, Dobbins S, Tomlinson I, Houlston R, Pal DK, Strug LJ. Prioritizing rare variants with conditional likelihood ratios. Hum Hered 2015; 79:5-13. [PMID: 25659987 PMCID: PMC4759929 DOI: 10.1159/000371579] [Received: 09/26/2014] [Accepted: 12/15/2014]
Abstract
BACKGROUND Prioritizing individual rare variants within associated genes or regions often consists of an ad hoc combination of statistical and biological considerations. From the statistical perspective, rare variants are often ranked using Fisher's exact p values, which can lead to different rankings of the same set of variants depending on whether 1- or 2-sided p values are used. RESULTS We propose a likelihood ratio-based measure, maxLRc, for the statistical component of ranking rare variants under a case-control study design that avoids the hypothesis-testing paradigm. We prove analytically that the maxLRc is always well-defined, even when the data has zero cell counts in the 2×2 disease-variant table. Via simulation, we show that the maxLRc outperforms Fisher's exact p values in most practical scenarios considered. Using next-generation sequence data from 27 rolandic epilepsy cases and 200 controls in a region previously shown to be linked to and associated with rolandic epilepsy, we demonstrate that rankings assigned by the maxLRc and exact p values can differ substantially. CONCLUSION The maxLRc provides reliable statistical prioritization of rare variants using only the observed data, avoiding the need to specify parameters associated with hypothesis testing that can result in ranking discrepancies across p value procedures; and it is applicable to common variant prioritization.
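The ranking discrepancy described in the BACKGROUND can be reproduced with a small worked example. The sketch below implements Fisher's exact test directly from the hypergeometric distribution and ranks two variants by 1-sided and by 2-sided p values. The sample sizes (27 cases, 200 controls) come from the abstract; the carrier counts for variants "A" and "B" are hypothetical, and the maxLRc itself is not implemented because the abstract does not define it.

```python
from math import comb


def fisher_p_values(case_carriers, control_carriers, n_cases, n_controls):
    """Return (one_sided, two_sided) Fisher exact p values for a 2x2 table.

    one_sided tests enrichment of carriers among cases; two_sided sums the
    probabilities of all tables no more likely than the observed one.
    """
    n_total = n_cases + n_controls
    k = case_carriers + control_carriers
    lo, hi = max(0, k - n_controls), min(k, n_cases)
    denom = comb(n_total, n_cases)
    pmf = {x: comb(k, x) * comb(n_total - k, n_cases - x) / denom
           for x in range(lo, hi + 1)}
    p_obs = pmf[case_carriers]
    one_sided = sum(p for x, p in pmf.items() if x >= case_carriers)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))
    return min(one_sided, 1.0), min(two_sided, 1.0)


# Hypothetical carrier counts: (carriers among cases, carriers among controls).
variants = {"A": (1, 2), "B": (0, 30)}
p_values = {name: fisher_p_values(a, c, 27, 200)
            for name, (a, c) in variants.items()}

rank_one_sided = sorted(variants, key=lambda name: p_values[name][0])
rank_two_sided = sorted(variants, key=lambda name: p_values[name][1])
```

Variant "B" is depleted in cases, so it ranks last under the 1-sided (enrichment) p value but first under the 2-sided p value, while variant "A" moves in the opposite direction.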
Affiliation(s)
- Weili Li
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ont., Canada
|
4
|
Xu C, Ciampi A, Greenwood CMT. Exploring the potential benefits of stratified false discovery rates for region-based testing of association with rare genetic variation. Front Genet 2014; 5:11. [PMID: 24523729 PMCID: PMC3905218 DOI: 10.3389/fgene.2014.00011] [Received: 08/19/2013] [Accepted: 01/13/2014]
Abstract
When analyzing data from exome or whole-genome sequencing studies, window-based tests (i.e., tests that jointly analyze all genetic data in a small genomic region) are very popular. However, power is known to be quite low for finding associations with phenotypes using these tests, and therefore a variety of analytic strategies may be employed to potentially improve power. Using sequencing data of all of chromosome 3 from an interim release of data on 2432 individuals from the UK10K project, we simulated phenotypes associated with rare genetic variation, and used the results to explore the power of window-based tests. We asked two specific questions: firstly, whether there could be substantial benefits associated with incorporating information from external annotation on the genetic variants, and secondly, whether false discovery rates (FDRs) would be a useful metric for assessing significance. Although, as expected, there are benefits to using additional information (such as annotation) when it is associated with causality, we confirmed the general pattern of low sensitivity and power for window-based tests. For our chosen example, even when power is high to detect some of the associations, many of the regions containing causal variants are not detectable, despite using lax significance thresholds and optimal analytic methods. Furthermore, our estimated FDR values tended to be much smaller than the true FDRs. Long-range correlations between variants, due to linkage disequilibrium, likely explain some of this bias. A more sophisticated approach to using the annotation information may improve power; however, many causal variants of realistic effect sizes may simply be undetectable, at least with this sample size. Perhaps annotation information could assist in distinguishing windows containing causal variants from windows that are merely correlated with causal variants.
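One simple way to picture a stratified FDR analysis of window p-values is to apply an FDR adjustment separately within each stratum. The sketch below uses the Benjamini-Hochberg adjustment, which is an assumption: the abstract does not say which FDR estimator the authors used. The stratum labels (for example, windows with and without supportive annotation) and the function names are likewise hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stratified_bh(p_values, strata):
    """Apply bh_adjust separately within each stratum label."""
    groups = {}
    for index, label in enumerate(strata):
        groups.setdefault(label, []).append(index)
    adjusted = [None] * len(p_values)
    for indices in groups.values():
        for i, value in zip(indices, bh_adjust([p_values[i] for i in indices])):
            adjusted[i] = value
    return adjusted


window_p = [0.01, 0.04, 0.03, 0.5]
window_strata = ["annotated", "annotated", "other", "other"]
pooled = bh_adjust(window_p)
by_stratum = stratified_bh(window_p, window_strata)
```

Stratification changes the multiplicity burden within each group, so a window in a small, signal-rich stratum can receive a smaller adjusted value than it would in the pooled analysis. As the abstract cautions, correlation between windows due to linkage disequilibrium can make such estimates too optimistic.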
Affiliation(s)
- Changjiang Xu
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Antonio Ciampi
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Celia M T Greenwood
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada; Departments of Oncology and Human Genetics, McGill University, Montreal, QC, Canada
|
5
|
Yang T, Deng HW, Niu T. Critical assessment of coalescent simulators in modeling recombination hotspots in genomic sequences. BMC Bioinformatics 2014; 15:3. [PMID: 24387001 PMCID: PMC3890628 DOI: 10.1186/1471-2105-15-3] [Received: 09/09/2013] [Accepted: 12/30/2013]
Abstract
BACKGROUND Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies of DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging. RESULTS We extensively compared the performance of five widely used coalescent simulators (Hudson’s ms, msHOT, MaCS, Simcoal2, and fastsimcoal) to provide a practical guide considering three crucial factors: 1) speed, 2) scalability, and 3) accuracy of recombination hotspot position and intensity. Although ms is a popular standard coalescent simulator, it lacks the ability to simulate sequences with recombination hotspots. An extended program, msHOT, compensates for this deficiency of ms by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but remains limited in simulating long stretches of DNA sequence. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios, but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms that approximate the standard coalescent, are much more efficient whilst keeping salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they are advantageous over the other programs for a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot, and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance based on both real and simulated data. CONCLUSIONS While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating cross-over hotspots.
Affiliation(s)
- Tianhua Niu
  - Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, Suite 2001, New Orleans, LA 70112, USA.
|
6
|
|
7
|
Ellsworth KA, Eckloff BW, Li L, Moon I, Fridley BL, Jenkins GD, Carlson E, Brisbin A, Abo R, Bamlet W, Petersen G, Wieben ED, Wang L. Contribution of FKBP5 genetic variation to gemcitabine treatment and survival in pancreatic adenocarcinoma. PLoS One 2013; 8:e70216. [PMID: 23936393 PMCID: PMC3731355 DOI: 10.1371/journal.pone.0070216] [Received: 02/28/2013] [Accepted: 06/17/2013]
Abstract
PURPOSE FKBP51 (FKBP5) is a negative regulator of Akt. Variability in FKBP5 expression level is a major factor contributing to variation in response to chemotherapeutic agents including gemcitabine, a first-line treatment for pancreatic cancer. Genetic variation in FKBP5 could influence its function and, ultimately, the treatment response of pancreatic cancer. EXPERIMENTAL DESIGN We set out to comprehensively study the role of genetic variation in FKBP5, identified by next-generation DNA resequencing, in the response to gemcitabine treatment of pancreatic cancer, using both tumor and germline DNA samples from 43 pancreatic cancer patients, including 19 paired normal-tumor samples. Next, genotype-phenotype association studies were performed with overall survival as well as with FKBP5 gene expression in tumor, using the same samples in which resequencing had been performed, followed by functional genomics studies. RESULTS In-depth resequencing identified 404 FKBP5 single nucleotide polymorphisms (SNPs) in normal and tumor DNA. SNPs with the strongest associations with survival or FKBP5 expression were subjected to functional genomic study. Electromobility shift assay showed that the rs73748206 "A(T)" SNP altered DNA-protein binding patterns, consistent with significantly increased reporter gene activity, possibly through its increased binding to the glucocorticoid receptor (GR). The effect of rs73748206 was confirmed on the basis of its association with FKBP5 expression by affecting the binding to GR in lymphoblastoid cell lines derived from the same patients whose DNA was used for resequencing. CONCLUSION This comprehensive FKBP5 resequencing study provides insights into the role of genetic variation in variation of gemcitabine response.
Affiliation(s)
- Katarzyna A. Ellsworth
  - Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
- Bruce W. Eckloff
  - Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, United States of America
- Liang Li
  - Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
- Irene Moon
  - Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
- Brooke L. Fridley
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Gregory D. Jenkins
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Erin Carlson
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Abra Brisbin
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Ryan Abo
  - Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
- William Bamlet
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Gloria Petersen
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Eric D. Wieben
  - Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, United States of America
- Liewei Wang
  - Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
|