1
Duangjan C, Arpawong TE, Spatola BN, Curran SP. Hepatic WDR23 proteostasis mediates insulin homeostasis by regulating insulin-degrading enzyme capacity. GeroScience 2024:10.1007/s11357-024-01196-y. [PMID: 38767782 DOI: 10.1007/s11357-024-01196-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 05/08/2024] [Indexed: 05/22/2024] Open
Abstract
Maintaining insulin homeostasis is critical for cellular and organismal metabolism. In the liver, insulin is degraded by the activity of the insulin-degrading enzyme (IDE). Here, we establish a hepatic regulatory axis for IDE through WDR23-proteostasis. Wdr23KO mice have increased IDE expression, reduced circulating insulin, and defective insulin responses. Genetically engineered human cell models lacking WDR23 also increase IDE expression and display dysregulated phosphorylation of insulin signaling cascade proteins, IRS-1, AKT2, MAPK, FoxO, and mTOR, similar to cells treated with insulin, which can be mitigated by chemical inhibition of IDE. Mechanistically, the cytoprotective transcription factor NRF2, a direct target of WDR23-Cul4 proteostasis, mediates the enhanced transcriptional expression of IDE when WDR23 is ablated. Moreover, an analysis of human genetic variation in WDR23 across a large naturally aging human cohort in the US Health and Retirement Study reveals a significant association of WDR23 with altered hemoglobin A1C (HbA1c) levels in older adults, supporting the use of WDR23 as a new molecular determinant of metabolic health in humans.
Affiliation(s)
- Chatrawee Duangjan
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
- Thalida Em Arpawong
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
- Brett N Spatola
  - Dornsife College of Letters, Arts, and Science, University of Southern California, Los Angeles, CA, 90089, USA
- Sean P Curran
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
2
Powell NR, Geck RC, Lai D, Shugg T, Skaar TC, Dunham M. Functional Analysis of G6PD Variants Associated With Low G6PD Activity in the All of Us Research Program. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.12.24305393. [PMID: 38645242 PMCID: PMC11030488 DOI: 10.1101/2024.04.12.24305393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Glucose-6-phosphate dehydrogenase (G6PD) protects red blood cells against oxidative damage through regeneration of NADPH. Individuals with G6PD polymorphisms (variants) that produce an impaired G6PD enzyme are usually asymptomatic, but at risk of hemolytic anemia from oxidative stressors, including certain drugs and foods. Prevention of G6PD deficiency-related hemolytic anemia is achievable through G6PD genetic testing or whole-genome sequencing (WGS) to identify affected individuals who should avoid hemolytic triggers. However, accurately predicting the clinical consequence of G6PD variants is limited by over 800 G6PD variants which remain of uncertain significance. There also remains significant variability in which deficiency-causing variants are included in pharmacogenomic testing arrays across institutions: many panels only include c.202G>A, even though dozens of other variants can also cause G6PD deficiency. Here, we seek to improve G6PD genotype interpretation using data available in the All of Us Research Program and using a yeast functional assay. We confirm that G6PD coding variants are the main contributor to decreased G6PD activity, and that 13% of individuals in the All of Us data with deficiency-causing variants would be missed if only the c.202G>A variant were tested for. We expand clinical interpretation for G6PD variants of uncertain significance; reporting that c.595A>G, known as G6PD Dagua or G6PD Açores, and the newly identified variant c.430C>G, reduce activity sufficiently to lead to G6PD deficiency. We also provide evidence that five missense variants of uncertain significance are unlikely to lead to G6PD deficiency, since they were seen in hemi- or homozygous individuals without a reduction in G6PD activity. We also applied the new WHO guidelines and were able to classify two synonymous variants as WHO class C. 
We anticipate these results will improve the accuracy, and prompt increased use, of G6PD genetic tests through a more complete clinical interpretation of G6PD variants. As the All of Us data increases from 245,000 to 1 million participants, and additional functional assays are carried out, we expect this research to serve as a template to enable complete characterization of G6PD deficiency genotypes. With an increased number of interpreted variants, genetic testing of G6PD will be more informative for preemptively identifying individuals at risk for drug- or food-induced hemolytic anemia.
Affiliation(s)
- Nicholas R Powell
  - Indiana University School of Medicine, Department of Medicine, Division of Clinical Pharmacology, Indianapolis IN
- Renee C Geck
  - University of Washington, Department of Genome Sciences, Seattle WA
- Dongbing Lai
  - Indiana University School of Medicine, Department of Medical and Molecular Genetics, Indianapolis IN
- Tyler Shugg
  - Indiana University School of Medicine, Department of Medicine, Division of Clinical Pharmacology, Indianapolis IN
- Todd C Skaar
  - Indiana University School of Medicine, Department of Medicine, Division of Clinical Pharmacology, Indianapolis IN
- Maitreya Dunham
  - University of Washington, Department of Genome Sciences, Seattle WA
3
Shi Y, Shi W, Wang M, Lee JH, Kang H, Jiang H. Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method. Stat Appl Genet Mol Biol 2023; 22:sagmb-2021-0067. [PMID: 37622330 DOI: 10.1515/sagmb-2021-0067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 06/23/2023] [Indexed: 08/26/2023]
Abstract
Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge in the application of permutation tests in genomic studies is that an enormous number of permutations are often needed to obtain reliable estimates of very small p-values, leading to intensive computational effort. To address this issue, we develop algorithms for the accurate and efficient estimation of small p-values in permutation tests for paired and independent two-group genomic data, and our approaches leverage a novel framework for parameterizing the permutation sample spaces of those two types of data respectively using the Bernoulli and conditional Bernoulli distributions, combined with the cross-entropy method. The performance of our proposed algorithms is demonstrated through the application to two simulated datasets and two real-world gene expression datasets generated by microarray and RNA-Seq technologies and comparisons to existing methods such as crude permutations and SAMC, and the results show that our approaches can achieve orders of magnitude of computational efficiency gains in estimating small p-values. Our approaches offer promising solutions for the improvement of computational efficiencies of existing permutation test procedures and the development of new testing methods using permutations in genomic data analysis.
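To make the computational burden described above concrete, here is a minimal sketch of the crude Monte Carlo permutation test that the abstract uses as a baseline. It is not the cross-entropy method of the paper; the function name, the choice of the absolute mean difference as test statistic, and the add-one correction are assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Crude Monte Carlo permutation p-value for a two-group mean difference.

    Illustrative baseline only (hypothetical helper, not from the paper).
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def abs_mean_diff(values):
        # Split the pooled values back into two groups of the original sizes.
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the samples
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    # Add-one correction: the estimate can never drop below 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

Because the estimate is bounded below by 1 / (n_perm + 1), resolving a p-value near 1e-8 this way needs on the order of 1e8 permutations per test, which is the cost that importance-sampling approaches such as the one in this paper aim to avoid.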
Affiliation(s)
- Yang Shi
  - Division of Biostatistics and Data Science, Department of Population Health Sciences and Department of Neuroscience and Regenerative Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
  - University of New Mexico Comprehensive Cancer Center Biostatistics Shared Resource, University of New Mexico, Albuquerque, NM 87131, USA
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Weiping Shi
  - College of Mathematics, Jilin University, Changchun, 130012, China
- Mengqiao Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Chengdu Medical College, Chengdu, 610500, China
- Ji-Hyun Lee
  - Division of Quantitative Sciences, University of Florida Health Cancer Center and Department of Biostatistics, University of Florida, Gainesville, FL 32610, USA
- Huining Kang
  - University of New Mexico Comprehensive Cancer Center Biostatistics Shared Resource, University of New Mexico, Albuquerque, NM 87131, USA
  - Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
- Hui Jiang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
4
Villa O, Stuhr NL, Yen CA, Crimmins EM, Arpawong TE, Curran SP. Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species. eLife 2022; 11:74308. [PMID: 35470798 PMCID: PMC9106327 DOI: 10.7554/elife.74308] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 04/13/2022] [Indexed: 11/13/2022] Open
Abstract
The influence of genetic variation on the aging process, including the incidence and severity of age-related diseases, is complex. Here, we define the evolutionarily conserved mitochondrial enzyme ALH-6/ALDH4A1 as a predictive biomarker for age-related changes in muscle health by combining Caenorhabditis elegans genetics and a gene-wide association scanning (GeneWAS) from older human participants of the US Health and Retirement Study (HRS). In a screen for mutations that activate oxidative stress responses, specifically in the muscle of C. elegans, we identified 96 independent genetic mutants harboring loss-of-function alleles of alh-6, exclusively. Each of these genetic mutations mapped to the ALH-6 polypeptide and led to the age-dependent loss of muscle health. Intriguingly, genetic variants in ALDH4A1 show associations with age-related muscle-related function in humans. Taken together, our work uncovers mitochondrial alh-6/ALDH4A1 as a critical component to impact normal muscle aging across species and a predictive biomarker for muscle health over the lifespan.
Ageing is inevitable, but what makes one person ‘age well’ and another decline more quickly remains largely unknown. While many aspects of ageing are clearly linked to genetics, the specific genes involved often remain unidentified. Sarcopenia is an age-related condition affecting the muscles. It involves a gradual loss of muscle mass that becomes faster with age, and is associated with loss of mobility, decreased quality of life, and increased risk of death. Around half of all people aged 80 and over suffer from sarcopenia. Several lifestyle factors, especially poor diet and lack of exercise, are associated with the condition, but genetics is also involved: the condition accelerates more quickly in some people than others, and even fit, physically active individuals can be affected.
To study the genetics of conditions like sarcopenia, researchers often use animals like flies or worms, which have short generation times but share genetic similarities with humans. For example, the worm Caenorhabditis elegans has equivalents of several human muscle genes, including the gene alh-6. In worms, alh-6 is important for maintaining energy supply to the muscles, and mutating it not only leads to muscle damage but also to premature ageing. Given this insight, Villa, Stuhr, Yen et al. wanted to determine if variation in the human version of alh-6, ALDH4A1, also contributes to individual differences in muscle ageing and decline in humans. Evaluating variation in this gene required a large amount of genetic data from older adults. These were taken from a continuous study that follows >35,000 older adults. Importantly, the study collects not only information on gene sequences but also measures of muscle health and performance over time for each individual. Analysis of these genetic data revealed specific small variations in the DNA of ALDH4A1, all of which associated with reduced muscle health. Follow-up experiments in worms used genetic engineering techniques to test how variation in the worm alh-6 gene could influence age-related health. The resulting mutant worms developed muscle problems much earlier than their normal counterparts, supporting the role of alh-6/ALDH4A1 in determining muscle health across the lifespan of both worms and humans. These results have identified a key influencer of muscle health during ageing in worms, and emphasize the importance of validating effects of genetic variation among humans during this process. Villa, Stuhr, Yen et al. hope that this study will help researchers find more genetic ‘markers’ of muscle health, and ultimately allow us to predict an individual’s risk of sarcopenia based on their genetic make-up.
Affiliation(s)
- Osvaldo Villa
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
- Nicole L Stuhr
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
  - Dornsife College of Letters, Arts, and Science, Department of Molecular and Computational Biology, University of Southern California, Los Angeles, United States
- Chia-An Yen
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
  - Dornsife College of Letters, Arts, and Science, Department of Molecular and Computational Biology, University of Southern California, Los Angeles, United States
- Eileen M Crimmins
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
- Thalida Em Arpawong
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
- Sean P Curran
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
  - Dornsife College of Letters, Arts, and Science, Department of Molecular and Computational Biology, University of Southern California, Los Angeles, United States
  - Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, United States
5
Asif H, Alliey-Rodriguez N, Keedy S, Tamminga CA, Sweeney JA, Pearlson G, Clementz BA, Keshavan MS, Buckley P, Liu C, Neale B, Gershon ES. GWAS significance thresholds for deep phenotyping studies can depend upon minor allele frequencies and sample size. Mol Psychiatry 2021; 26:2048-2055. [PMID: 32066829 PMCID: PMC7429341 DOI: 10.1038/s41380-020-0670-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2019] [Revised: 01/28/2020] [Accepted: 01/29/2020] [Indexed: 02/01/2023]
Abstract
An important issue affecting genome-wide association studies with deep phenotyping (multiple correlated phenotypes) is determining the suitable family-wise significance threshold. Straightforward family-wise correction (Bonferroni) of p < 0.05 for 4.3 million genotypes and 335 phenotypes would give a threshold of p < 3.46E-11. This would be too conservative because it assumes all tests are independent. The effective number of tests, both phenotypic and genotypic, must be adjusted for the correlations between them. Spectral decomposition of the phenotype matrix and LD-based correction of the number of tested SNPs are currently used to determine an effective number of tests. In this paper, we compare these calculated estimates with permutation-determined family-wise significance thresholds. Permutations are performed by shuffling individual IDs of the genotype vector for this dataset, to preserve correlation of phenotypes. Our results demonstrate that the permutation threshold is influenced by minor allele frequency (MAF) of the SNPs, and by the number of individuals tested. For the more common SNPs (MAF > 0.1), the permutation family-wise threshold was in close agreement with spectral decomposition methods. However, for less common SNPs (0.05 < MAF ≤ 0.1), the permutation threshold calculated over all SNPs was off by orders of magnitude. This applies to the number of individuals studied (here 777) but not to very much larger numbers. Based on these findings, we propose that the threshold to find a particular level of family-wise significance may need to be established using separate permutations of the actual data for several MAF bins.
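The Bonferroni arithmetic quoted above, and the idea of reading a family-wise threshold off permuted data, can be sketched as follows. Both function names are hypothetical, and `empirical_threshold` is a generic minimum-p quantile rule assumed here for illustration; the abstract does not spell out the exact procedure the authors used.

```python
def bonferroni_threshold(alpha, n_genotype_tests, n_phenotype_tests):
    """Per-test threshold when every genotype-phenotype pair counts as one test.

    Passing effective (correlation-adjusted) numbers of tests instead of raw
    counts gives the less conservative thresholds discussed in the abstract.
    """
    return alpha / (n_genotype_tests * n_phenotype_tests)


def empirical_threshold(min_p_values, alpha=0.05):
    """Family-wise threshold from permutations (assumed min-p rule).

    min_p_values holds, for each permuted dataset, the smallest p-value seen
    across all tests. The alpha-quantile of these minima is the threshold that
    a null dataset would beat only about alpha of the time.
    """
    ordered = sorted(min_p_values)
    k = max(int(alpha * len(ordered)), 1)
    return ordered[k - 1]
```

With the rounded inputs given in the abstract, `bonferroni_threshold(0.05, 4.3e6, 335)` evaluates to about 3.47e-11. This agrees with the reported 3.46E-11 to within what rounding the genotype count to 4.3 million would presumably explain. Running `empirical_threshold` separately on the minima collected within each MAF bin would correspond to the per-bin thresholds the authors propose.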
Affiliation(s)
- Huma Asif
  - Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
- Ney Alliey-Rodriguez
  - Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
- Sarah Keedy
  - Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
- Carol A Tamminga
  - Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- John A Sweeney
  - Department of Psychiatry, University of Cincinnati, Cincinnati, OH, USA
- Godfrey Pearlson
  - Departments of Psychiatry & Neuroscience, Yale University, New Haven, CT, USA
- Brett A Clementz
  - Department of Psychology, University of Georgia, Athens, GA, USA
- Chunyu Liu
  - Department of Psychiatry, SUNY Upstate Medical University, Binghamton, NY, USA
- Elliot S Gershon
  - Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
  - Department of Human Genetics, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
6
Liu L, He J, Lu X, Yuan Y, Jiang D, Xiao H, Lin S, Xu L, Chen Y. Association of Myopia and Genetic Variants of TGFB2-AS1 and TGFBR1 in the TGF-β Signaling Pathway: A Longitudinal Study in Chinese School-Aged Children. Front Cell Dev Biol 2021; 9:628182. [PMID: 33996791 PMCID: PMC8115727 DOI: 10.3389/fcell.2021.628182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2020] [Accepted: 04/06/2021] [Indexed: 11/17/2022] Open
Abstract
Background Myopia is a complex multifactorial condition which involves several overlapping signaling pathways mediated by distinct genes. This prospective cohort study evaluated the associations of two genetic variants in the TGF-β signaling pathway with the onset and progression of myopia and ocular biometric parameters in Chinese school-aged children. Methods A total of 556 second grade children were examined and followed up for 3.5 years. Non-cycloplegic refraction and ocular biometric parameters were measured annually. Multivariate regression analysis was used to assess the effect of the TGFBR1 rs10760673 and TGFB2-AS1 rs7550232 variants on the occurrence and progression of myopia. A 10,000 permutations test was used to correct for multiple testing. Functional annotation of single nucleotide polymorphisms (SNPs) was performed using RegulomeDB, HaploReg, and rVarBase. Results A total of 448 children were included in the analysis. After adjustments for gender, age, near work time and outdoor time with 10,000 permutations, the results indicated that the C allele and the AC or CC genotypes of rs7550232 adjacent to TGFB2-AS1 were associated with a significantly increased risk of the onset of myopia in two genetic models (additive: P’ = 0.022; dominant: P’ = 0.025). Additionally, the A allele and the AA or AG genotypes of rs10760673 of TGFBR1 were associated with a significant myopic shift (additive: P’ = 0.008; dominant: P’ = 0.028; recessive: P’ = 0.027). Furthermore, rs10760673 was associated with an increase in axial length (AL) (P’ = 0.013, β = 0.03) and a change in the ratio of AL to the corneal radius of curvature (AL/CRC) (P’ = 0.031, β = 0.003). Analysis using RegulomeDB, HaploReg, and rVarBase indicated that rs7550232 is likely to affect transcription factor binding, any motif, DNase footprint, and DNase peak. 
Conclusion The present study indicated that rs10760673 and rs7550232 may represent susceptibility loci for the progression and onset of myopia, respectively, in school-aged children. Associations of the variants of the TGFBR1 and TGFB2-AS1 genes with myopia may be mediated by the TGF-β signaling pathway; this hypothesis requires validation in functional studies. This trial was registered as ChiCTR1900020584 at www.Chictr.org.cn.
Affiliation(s)
- Linjie Liu
  - School of Optometry and Ophthalmology, Wenzhou Medical University, Wenzhou, China
- Juan He
  - School of Optometry and Ophthalmology, Wenzhou Medical University, Wenzhou, China
- Xiaoyan Lu
  - School of Optometry and Ophthalmology, Wenzhou Medical University, Wenzhou, China
- Yimin Yuan
  - Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Dandan Jiang
  - Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Haishao Xiao
  - School of Optometry and Ophthalmology, Wenzhou Medical University, Wenzhou, China
- Shudan Lin
  - School of Optometry and Ophthalmology, Wenzhou Medical University, Wenzhou, China
- Liangde Xu
  - School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China
- Yanyan Chen
  - Eye Hospital, Wenzhou Medical University, Wenzhou, China
7
Kunert-Graf JM, Sakhanenko NA, Galas DJ. Optimized permutation testing for information theoretic measures of multi-gene interactions. BMC Bioinformatics 2021; 22:180. [PMID: 33827420 PMCID: PMC8028212 DOI: 10.1186/s12859-021-04107-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/01/2020] [Accepted: 03/29/2021] [Indexed: 11/17/2022] Open
Abstract
Background Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large. Results In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based in Information Theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with a large number of samples. Conclusions The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
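For orientation, the naive approach that the abstract describes as the bottleneck can be sketched as below: the count table is rebuilt from the raw samples after every shuffle of the phenotype labels. This is only the baseline, not the authors' count-table transformation, and it is simplified to a single SNP against a phenotype rather than the SNP–SNP-phenotype case; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def count_table(genotypes, phenotypes):
    """Frequency count table over (genotype, phenotype) pairs."""
    return Counter(zip(genotypes, phenotypes))


def mutual_information(table):
    """Mutual information in bits, computed from a count table alone."""
    n = sum(table.values())
    row, col = Counter(), Counter()
    for (g, p), c in table.items():
        row[g] += c
        col[p] += c
    return sum(
        (c / n) * math.log2(c * n / (row[g] * col[p]))
        for (g, p), c in table.items()
    )


def naive_permutation_p(genotypes, phenotypes, n_perm=1000, seed=0):
    """Naive test: re-scan every sample to rebuild the table per permutation."""
    rng = random.Random(seed)
    observed = mutual_information(count_table(genotypes, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # This rebuild costs time proportional to the number of samples.
        if mutual_information(count_table(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `mutual_information` depends only on the table, whose size is set by the number of genotype and phenotype categories, while `count_table` touches every sample. That asymmetry is why avoiding the rebuild would make the cost largely independent of sample size, as the abstract reports.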
Affiliation(s)
- James M Kunert-Graf
  - Pacific Northwest Research Institute, 720 Broadway, Seattle, WA, 98122, USA
- David J Galas
  - Pacific Northwest Research Institute, 720 Broadway, Seattle, WA, 98122, USA
8
Hao Z, Jiang L, Gao J, Ye J, Zhao J, Li S, Yang R. Quick approximation of threshold values for genome-wide association studies. Brief Bioinform 2020; 20:2217-2223. [PMID: 30219836 DOI: 10.1093/bib/bby082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/08/2018] [Revised: 08/10/2018] [Accepted: 08/14/2018] [Indexed: 11/13/2022] Open
Abstract
Standard normal statistics, chi-squared statistics, Student's t statistics and F statistics are used to map quantitative trait nucleotides for both small and large sample sizes. In genome-wide association studies (GWASs) of single-nucleotide polymorphisms (SNPs), the statistical distributions depend on both genetic effects and SNPs but are independent of SNPs under the null hypothesis of no genetic effects. Therefore, hypothesis testing when a nuisance parameter is present only under the alternative was introduced to quickly approximate the critical thresholds of these test statistics for GWASs. When only the statistical probabilities are available for high-throughput SNPs, the approximate critical thresholds can be estimated with chi-squared statistics, formulated by statistical probabilities with a degree of freedom of two. High similarities in the critical thresholds between the accurate and approximate estimations were demonstrated by extensive simulations and real data analysis.
Affiliation(s)
- Zhiyu Hao
  - College of Animal Science and Technology at the Northeast Agricultural University
- Li Jiang
  - Research Centre for Aquatic Biotechnology at the Chinese Academy of Fishery Sciences
- Jin Gao
  - Research Centre for Aquatic Biotechnology at the Chinese Academy of Fishery Sciences
- Jinhua Ye
  - Department of Mathematics at the Heilongjiang Bayi Agricultural University
- Jingli Zhao
  - Research Centre for Aquatic Biotechnology at the Chinese Academy of Fishery Sciences
- Shuling Li
  - College of Life Science at the Northeast Agricultural University. Her research focuses on animal behavior genetics and genomics
- Runqing Yang
  - Research Centre for Aquatic Biotechnology at the Chinese Academy of Fishery Sciences
9
Leem S, Huh I, Park T. Enhanced Permutation Tests via Multiple Pruning. Front Genet 2020; 11:509. [PMID: 32670346 PMCID: PMC7330123 DOI: 10.3389/fgene.2020.00509] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/24/2019] [Accepted: 04/27/2020] [Indexed: 11/25/2022] Open
Abstract
Big multi-omics data in bioinformatics often consists of a huge number of features and relatively small numbers of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they are from genomics, proteomics, metabolomics, etc. Due to these distinct characteristics, standard statistical analyses using parametric-based assumptions may sometimes fail to provide exact asymptotic results. To resolve this issue, permutation tests can be a way to exactly analyze multi-omics data because they are distribution-free and flexible to use. In permutation tests, p-values are evaluated by estimating the locations of test statistics in an empirical null distribution generated by random shuffling. However, the permutation approach can be infeasible when the number of features increases, because more stringent control of type I error is needed for multiple hypothesis testing, and consequently, much larger numbers of permutations are required to reach significance. To address this problem, we propose a well-organized strategy, “ENhanced Permutation tests via multiple Pruning (ENPP).” ENPP prunes the features in every permutation round if they are determined to be non-significant. In other words, if the feature statistics from the permuted datasets exceed the feature statistics from the original dataset, beyond a predetermined threshold, the feature is determined to be non-significant. If so, ENPP removes the feature and iterates the process without the feature in the next permutation round. Our simulation study showed that the ENPP method could remove about 50% of the features at the first permutation round, and, by the 100th permutation round, 98% of the features had been removed and only 7.4% of the computation time with the original unpruned permutation approach had elapsed. 
In addition, we applied this approach to a real data set (Korea Association REsource: KARE) of 327,872 SNPs to find association with a non-normally distributed phenotype (fasting plasma glucose), interpreted the results, and discussed the feasibility and advantages of the approach.
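The pruning idea described above can be sketched as follows. This is one reading of the abstract, not the published ENPP implementation: a feature is dropped once its permuted statistic has matched or exceeded its original statistic more than `max_exceed` times. The function names, the `max_exceed` parameter, and the mean-difference statistic are assumptions for illustration.

```python
import random


def abs_mean_diff(values, labels):
    """Example statistic: absolute difference in means between label 1 and 0."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab != 1]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def pruned_permutation_test(features, labels, statistic,
                            n_rounds=100, max_exceed=5, seed=0):
    """Permutation rounds with pruning of clearly non-significant features.

    features maps a feature name to its list of per-sample values.
    Returns the set of features still active and the exceedance counts.
    """
    rng = random.Random(seed)
    observed = {name: statistic(vals, labels) for name, vals in features.items()}
    exceed = {name: 0 for name in features}
    active = set(features)
    shuffled = list(labels)
    for _ in range(n_rounds):
        if not active:
            break
        rng.shuffle(shuffled)  # one shared label permutation per round
        for name in list(active):
            if statistic(features[name], shuffled) >= observed[name]:
                exceed[name] += 1
                if exceed[name] > max_exceed:
                    # Too many exceedances: skip this feature from now on.
                    active.discard(name)
    return active, exceed
```

Pruned features cost nothing in later rounds, so the per-round work shrinks as the test proceeds. That matches the abstract's observation that most features are removed early and that total computation time falls to a small fraction of the unpruned approach.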
Affiliation(s)
- Sangseob Leem
  - Department of Statistics, Seoul National University, Seoul, South Korea
- Iksoo Huh
  - College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul, South Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, South Korea
10
George AW, Verbyla A, Bowden J. Eagle: multi-locus association mapping on a genome-wide scale made routine. Bioinformatics 2020; 36:1509-1516. [PMID: 31596455 DOI: 10.1093/bioinformatics/btz759] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/07/2019] [Revised: 08/19/2019] [Accepted: 10/02/2019] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION We present Eagle, a new method for multi-locus association mapping. The motivation for developing Eagle was to make multi-locus association mapping 'easy' and the method-of-choice. Eagle's strengths are that it (i) is considerably more powerful than single-locus association mapping, (ii) does not suffer from multiple testing issues, (iii) gives results that are immediately interpretable and (iv) has a computational footprint comparable to single-locus association mapping. RESULTS By conducting a large simulation study, we will show that Eagle finds true and avoids false single-nucleotide polymorphism trait associations better than competing single- and multi-locus methods. We also analyze data from a published mouse study. Eagle found over 50% more validated findings than the state-of-the-art single-locus method. AVAILABILITY AND IMPLEMENTATION Eagle has been implemented as an R package, with a browser-based Graphical User Interface for users less familiar with R. It is freely available via the CRAN website at https://cran.r-project.org. Videos, Quick Start guides, FAQs and Demos are available via the Eagle website http://eagle.r-forge.r-project.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
11
|
Rojano E, Seoane P, Ranea JAG, Perkins JR. Regulatory variants: from detection to predicting impact. Brief Bioinform 2019; 20:1639-1654. [PMID: 29893792 PMCID: PMC6917219 DOI: 10.1093/bib/bby039] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/13/2018] [Revised: 04/18/2018] [Indexed: 02/01/2023] Open
Abstract
Variants within non-coding genomic regions can greatly affect disease. In recent years, increasing focus has been given to these variants, and how they can alter regulatory elements, such as enhancers, transcription factor binding sites and DNA methylation regions. Such variants can be considered regulatory variants. Concurrently, much effort has been put into establishing international consortia to undertake large projects aimed at discovering regulatory elements in different tissues, cell lines and organisms, and probing the effects of genetic variants on regulation by measuring gene expression. Here, we describe methods and techniques for discovering disease-associated non-coding variants using sequencing technologies. We then explain the computational procedures that can be used for annotating these variants using the information from the aforementioned projects, and for predicting their putative effects, including potential pathogenicity, based on rule-based and machine learning approaches. We provide the details of techniques to validate these predictions, by mapping chromatin-chromatin and chromatin-protein interactions, and introduce Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein 9 (CRISPR-Cas9) technology, which has already been used in this field and is likely to have a big impact on its future evolution. We also give examples of regulatory variants associated with multiple complex diseases. This review is aimed at bioinformaticians interested in the characterization of regulatory variants, molecular biologists and geneticists interested in understanding more about the nature and potential role of such variants from a functional point of view, and clinicians who may wish to learn about variants in non-coding genomic regions associated with a given disease and find out what to do next to uncover how they impact the underlying mechanisms.
Affiliation(s)
- Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
| | - Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
| | - Juan A G Ranea
- CIBER de Enfermedades Raras, ISCIII, Madrid, Spain and Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
| | - James R Perkins
- Research laboratory, IBIMA-Regional University Hospital of Malaga, UMA, Malaga 29009, Spain
| |
|
12
|
Zhang M, Wang J, Wang Y, Wu S, Sandford AJ, Luo J, He JQ. Association of the TLR1 variant rs5743557 with susceptibility to tuberculosis. J Thorac Dis 2019; 11:583-594. [PMID: 30963003 DOI: 10.21037/jtd.2019.01.74] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]
Abstract
Background: Toll-like receptor 1 (TLR1) and TLR6 play important roles in the innate immune response against Mycobacterium tuberculosis (M.TB) via interactions with TIR domain-containing adaptor protein (TIRAP) and myeloid differentiation primary response 88 (MYD88). The aim of this study was to investigate the relationship of TLR1, TLR6, MYD88 and TIRAP polymorphisms with susceptibility to latent tuberculosis infection (LTBI) and tuberculosis (TB). Methods: In total, 204 uninfected healthy controls (HC), 201 individuals with LTBI and 209 TB patients were enrolled. Two interferon-γ release assays were used to differentiate individuals with LTBI from uninfected controls. TagSNPs of the four genes were genotyped by the SNPscan™ Kit. The Haploview 4.2 and SHEsis software packages were combined to perform linkage disequilibrium (LD) and haplotype analyses. Multifactor dimensionality reduction (MDR) software was used to investigate gene-gene interaction. The Stata 12.0 software was used to perform meta-analysis of the relationship between rs5743557 and TB susceptibility. Results: The AA genotype of rs5743557 was associated with reduced TB risk (P=0.006) and the AA/GA genotypes of TLR1 rs5743604 were associated with increased TB risk (P=0.017) when the LTBI group was compared with the TB group. The frequency of TLR1 haplotype rs4833095-rs5743604 CG was significantly higher in the LTBI group than in the TB group (P=0.019877). However, only the relationship between rs5743557 and TB susceptibility remained significant after 1000-fold permutation testing (P=0.023). The meta-analysis suggested that rs5743557_A was associated with decreased TB risk in the Chinese adult population (P<0.001, OR 0.80, 95% CI: 0.72-0.88). No significant gene-gene interactions were found. Conclusions: The results of our study suggest that the tagSNP rs5743557 of TLR1 is associated with the risk of TB.
Affiliation(s)
- Miaomiao Zhang
- Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Jing Wang
- Division of Infectious Diseases, People's Hospital of Aba Tibetan Autonomous Prefecture, Aba Autonomous 624000, China
| | - Yu Wang
- Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Shouquan Wu
- Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Andrew J Sandford
- Centre for Heart Lung Innovation, University of British Columbia and St. Paul's Hospital, Vancouver, BC, Canada
| | - Jun Luo
- Division of Infectious Diseases, People's Hospital of Aba Tibetan Autonomous Prefecture, Aba Autonomous 624000, China
| | - Jian-Qing He
- Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
| |
|
13
|
Yu F, Liang K, Zhang Z, Du D, Zhang X, Zhao H, Ui Haq B, Qiu F. Dissecting the genetic architecture of waterlogging stress-related traits uncovers a key waterlogging tolerance gene in maize. Theoretical and Applied Genetics 2018; 131:2299-2310. [PMID: 30062652 DOI: 10.1007/s00122-018-3152-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/16/2018] [Accepted: 07/23/2018] [Indexed: 06/08/2023]
Abstract
A key candidate gene, GRMZM2G110141, which could be used in marker-assisted selection in maize breeding programs, was detected among the 16 genetic loci associated with waterlogging tolerance identified through genome-wide association study. Waterlogging stress seriously affects the growth and development of upland crops such as maize (Zea mays L.). However, the genetic basis of waterlogging tolerance in crop plants is largely unknown. Here, we identified genetic loci for waterlogging tolerance-related traits by conducting a genome-wide association study using maize phenotypes evaluated in the greenhouse under waterlogging stress and normal conditions. A total of 110 trait-single nucleotide polymorphism associations spanning 16 genomic regions were identified; single associations explained 2.88-10.67% of the phenotypic variance. Among the genomic regions identified, 14 co-localized with previously detected waterlogging tolerance-related quantitative trait loci. Furthermore, 33 candidate genes involved in a wide range of stress-response pathways were predicted. We resequenced a key candidate gene (GRMZM2G110141) in 138 randomly selected inbred lines and found that variations in the 5'-UTR and in the mRNA abundance of this gene under waterlogging conditions were significantly associated with leaf injury. Furthermore, we detected favorable alleles of this gene and validated the favorable alleles in two different recombinant inbred line populations. These alleles enhanced waterlogging tolerance in segregating populations, strongly suggesting that GRMZM2G110141 is a key waterlogging tolerance gene. The set of waterlogging tolerance-related genomic regions and associated markers identified here could be valuable for isolating waterlogging tolerance genes and improving this trait in maize.
Affiliation(s)
- Feng Yu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Kun Liang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Zuxin Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Dengxiang Du
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xuehai Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Hailiang Zhao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Basir Ui Haq
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Fazhan Qiu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
| |
|
14
|
Al-Rawi MS, Freitas A, Duarte JV, Cunha JP, Castelo-Branco M. Permutations of functional magnetic resonance imaging classification may not be normally distributed. Stat Methods Med Res 2017; 26:2567-2585. [PMID: 29251253 DOI: 10.1177/0962280215601707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
A fundamental question that often occurs in statistical tests is the normality of distributions. Countless distributions exist in science and life, but one distribution that is obtained via permutations, usually referred to as the permutation distribution, is interesting. Although a permutation distribution should behave in accord with the central limit theorem if both the independence condition and the identical distribution condition are fulfilled, no studies have corroborated this concurrence in functional magnetic resonance imaging data. In this work, we used the Anderson-Darling test to evaluate the accordance level of permutation distributions of classification accuracies to the normality expected under the central limit theorem. A simulation study was carried out using functional magnetic resonance imaging data collected while human subjects responded to visual stimulation paradigms. Two scrambling schemes are evaluated: the first based on permuting both the training and the testing sets and the second on permuting only the testing set. The results showed that, while a normal distribution does not adequately fit permutation distributions most of the time, it tends to be acceptable when the mean classification accuracy averaged over a set of different classifiers is considered. The results also showed that permutation distributions can be probabilistically affected by performing motion correction on functional magnetic resonance imaging data, which may weaken the approximation of permutation distributions to a normal law. Such findings, however, have no relation to univariate/univoxel analysis of functional magnetic resonance imaging data. Overall, the results revealed a strong dependence across the folds of cross-validation and across functional magnetic resonance imaging runs, which may hinder the reliability of cross-validation.
The obtained p-values and confidence intervals showed that different permutation schemes may yield different permutation distributions as well as different levels of accord with the central limit theorem. We also found that different permutation schemes can lead to different permutation distributions, which may lead to different assessments of the statistical significance of classification accuracy.
Affiliation(s)
- Mohammed S Al-Rawi
- 1 The Institute of Nuclear Sciences Applied to Health, University of Coimbra, Coimbra, Portugal.,2 Visual Neuroscience Laboratory, IBILI - Institute for Biomedical Imaging and Life Sciences, Faculty of Medicine, University of Coimbra, Coimbra, Portugal
| | - Adelaide Freitas
- 3 Department of Mathematics, Center for Research & Development in Mathematics and Applications, University of Aveiro, Aveiro, Portugal
| | - João V Duarte
- 2 Visual Neuroscience Laboratory, IBILI - Institute for Biomedical Imaging and Life Sciences, Faculty of Medicine, University of Coimbra, Coimbra, Portugal
| | - Joao P Cunha
- 4 IEETA, University of Aveiro, Aveiro, Portugal.,5 Department of Electrical and Computer Engineering, University of Porto, Porto, Portugal
| | - Miguel Castelo-Branco
- 1 The Institute of Nuclear Sciences Applied to Health, University of Coimbra, Coimbra, Portugal.,2 Visual Neuroscience Laboratory, IBILI - Institute for Biomedical Imaging and Life Sciences, Faculty of Medicine, University of Coimbra, Coimbra, Portugal
| |
|
15
|
Segal BD, Braun T, Elliott MR, Jiang H. Fast approximation of small p-values in permutation tests by partitioning the permutations. Biometrics 2017. [PMID: 29542118 DOI: 10.1111/biom.12731] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
Abstract
Researchers in genetics and other life sciences commonly use permutation tests to evaluate differences between groups. Permutation tests have desirable properties, including exactness if data are exchangeable, and are applicable even when the distribution of the test statistic is analytically intractable. However, permutation tests can be computationally intensive. We propose both an asymptotic approximation and a resampling algorithm for quickly estimating small permutation p-values (e.g., <10⁻⁶) for the difference and ratio of means in two-sample tests. Our methods are based on the distribution of test statistics within and across partitions of the permutations, which we define. In this article, we present our methods and demonstrate their use through simulations and an application to cancer genomic data. Through simulations, we find that our resampling algorithm is more computationally efficient than another leading alternative, particularly for extremely small p-values (e.g., <10⁻³⁰). Through application to cancer genomic data, we find that our methods can successfully identify up- and down-regulated genes. While we focus on the difference and ratio of means, we speculate that our approaches may work in other settings.
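For orientation, the plain Monte Carlo permutation test that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration for the difference of means, not the partitioning method the authors propose; the function name `permutation_p_value` and its parameters are hypothetical.

```python
import random


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Hypothetical helper for illustration only; it does not implement the
    partition-based approximation described in the abstract.
    """
    rng = random.Random(seed)
    n_x, n_y = len(x), len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled sample reassigns group labels at random.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if diff >= observed:
            hits += 1
    # Add-one estimate: the result is never exactly 0, so the smallest
    # reportable value is 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

Because the estimate cannot fall below 1 / (n_perm + 1), resolving a p-value near 10⁻⁶ this way needs on the order of a million shuffles or more, which is the cost that faster approximations aim to avoid.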
Affiliation(s)
- Brian D Segal
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A
| | - Thomas Braun
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A
| | - Michael R Elliott
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A
| | - Hui Jiang
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A
| |
|
16
|
Lakhal-Chaieb L, Oualkacha K, Richards BJ, Greenwood CM. A rare variant association test in family-based designs and non-normal quantitative traits. Stat Med 2015; 35:905-21. [DOI: 10.1002/sim.6750] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/18/2014] [Revised: 09/04/2015] [Accepted: 09/05/2015] [Indexed: 12/13/2022]
Affiliation(s)
- Lajmi Lakhal-Chaieb
- Département de mathématiques et statistique; Université Laval; Québec G1V 0A6 Québec Canada
| | - Karim Oualkacha
- Département de mathématiques; Université de Québec À Montréal; Montreal Québec Canada
| | - Brent J. Richards
- Lady Davis Institute for Medical Research; Jewish General Hospital; Montreal Québec Canada
- Department of Epidemiology, Biostatistics and Occupational Health; McGill University; Montreal Québec Canada
- Department of Twin Research; King's College London; London U.K
| | - Celia M.T. Greenwood
- Lady Davis Institute for Medical Research; Jewish General Hospital; Montreal Québec Canada
- Department of Epidemiology, Biostatistics and Occupational Health; McGill University; Montreal Québec Canada
- Departments of Oncology and Human Genetics; McGill University; Montreal Québec Canada
| |
|
17
|
Lee BY, Lee KN, Lee T, Park JH, Kim SM, Lee HS, Chung DS, Shim HS, Lee HK, Kim H. Bovine Genome-wide Association Study for Genetic Elements to Resist the Infection of Foot-and-mouth Disease in the Field. Asian-Australasian Journal of Animal Sciences 2015; 28:166-70. [PMID: 25557811 PMCID: PMC4283160 DOI: 10.5713/ajas.14.0383] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/21/2014] [Revised: 08/06/2014] [Accepted: 08/21/2014] [Indexed: 12/29/2022]
Abstract
Foot-and-mouth disease (FMD) is a highly contagious disease affecting cloven-hoofed animals that causes severe economic losses and has a devastating effect on international trade in animals and animal products. Since FMD outbreaks have recently occurred in some Asian countries, it is important to understand the relationship between the diverse immunogenomic structures of host animals and immunity to foot-and-mouth disease virus (FMDV). We performed a genome-wide association study based on a high-density bovine single nucleotide polymorphism (SNP) chip to identify FMD-resistant loci in Holstein cattle. Among 624,532 SNPs remaining after quality control, we found that 11 SNPs on 3 chromosomes (chr17, 22, and 15) were significantly associated with the trait at an adjusted p-value <0.05 after the PERMORY test. Most significantly associated SNPs were located on chromosome 17, around the genes Myosin XVIIIB and Seizure related 6 homolog (mouse)-like, which were associated with lung cancer. Based on the known functions of the genes near the significant SNPs, the FMD-resistant animals might have the ability to improve their innate immune response to FMDV infection.
Affiliation(s)
- Bo-Young Lee
- Foot-and-Mouth Disease Division, Animal and Plant Quarantine Agency, Anyang 430-757, Korea
| | - Kwang-Nyeong Lee
- Foot-and-Mouth Disease Division, Animal and Plant Quarantine Agency, Anyang 430-757, Korea
| | - Taeheon Lee
- Foot-and-Mouth Disease Division, Animal and Plant Quarantine Agency, Anyang 430-757, Korea
| | - Jong-Hyeon Park
- Foot-and-Mouth Disease Division, Animal and Plant Quarantine Agency, Anyang 430-757, Korea
| | - Su-Mi Kim
- Foot-and-Mouth Disease Division, Animal and Plant Quarantine Agency, Anyang 430-757, Korea
| | - Hyang-Sim Lee
- Foot-and-Mouth Disease Division, Animal and Plant Quarantine Agency, Anyang 430-757, Korea
| | - Dong-Su Chung
- Gangwon Veterinary Service Laboratory, Chuncheon 220-822, Korea
| | | | - Hak-Kyo Lee
- Genomic Informatics Center, Hankyong National University, Anseong 456-749, Korea
| | - Heebal Kim
- Foot-and-Mouth Disease Division, Animal and Plant Quarantine Agency, Anyang 430-757, Korea
| |
|
18
|
Yang G, Jiang W, Yang Q, Yu W. PBOOST: a GPU-based tool for parallel permutation tests in genome-wide association studies. Bioinformatics 2014; 31:1460-2. [PMID: 25535244 DOI: 10.1093/bioinformatics/btu840] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/08/2014] [Accepted: 12/16/2014] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The importance of testing associations allowing for interactions has been demonstrated by Marchini et al. (2005). A fast method detecting associations allowing for interactions has been proposed by Wan et al. (2010a). The method is based on the likelihood ratio test with the assumption that the statistic follows the χ² distribution. Many single nucleotide polymorphism (SNP) pairs with significant associations allowing for interactions have been detected using their method. However, the assumption of the χ² test requires the expected values in each cell of the contingency table to be at least five. This assumption is violated in some identified SNP pairs. In this case, the likelihood ratio test may no longer be applicable. The permutation test is an ideal approach to checking the P-values calculated in the likelihood ratio test because of its non-parametric nature. The P-values of SNP pairs having significant associations with disease are always extremely small. Thus, we need a huge number of permutations to achieve correspondingly high resolution for the P-values. In order to investigate whether the P-values from likelihood ratio tests are reliable, a fast permutation tool to accomplish a large number of permutations is desirable. RESULTS: We developed a permutation tool named PBOOST. It is GPU-based, with highly reliable P-value estimation. By using simulation data, we found that the P-values from likelihood ratio tests will have a relative error of >100% when 50% of the cells in the contingency table have an expected count of less than five or when there is a zero expected count in any of the contingency table cells. In terms of speed, PBOOST completed 10⁷ permutations for a single SNP pair from the Wellcome Trust Case Control Consortium (WTCCC) genome data (Wellcome Trust Case Control Consortium, 2007) within 1 min on a single Nvidia Tesla M2090 device, while it took 60 min on a single Intel Xeon E5-2650 CPU to finish the same task.
More importantly, when simultaneously testing 256 SNP pairs for 10⁷ permutations, our tool took only 5 min, while the CPU program took 10 h. By permuting on a GPU cluster consisting of 40 nodes, we completed 10¹² permutations for all 280 SNP pairs reported with P-values smaller than 1.6 × 10⁻¹² in the WTCCC datasets in 1 week. AVAILABILITY AND IMPLEMENTATION: The source code and sample data are available at http://bioinformatics.ust.hk/PBOOST.zip. CONTACT: gyang@ust.hk; eeyu@ust.hk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangyuan Yang
- Laboratory of Bioinformatics and Computational Biology, Department of Electronic and Computer Engineering and Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
| | - Wei Jiang
- Laboratory of Bioinformatics and Computational Biology, Department of Electronic and Computer Engineering and Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
| | - Qiang Yang
- Laboratory of Bioinformatics and Computational Biology, Department of Electronic and Computer Engineering and Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
| | - Weichuan Yu
- Laboratory of Bioinformatics and Computational Biology, Department of Electronic and Computer Engineering and Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
| |
|
19
|
Che R, Jack JR, Motsinger-Reif AA, Brown CC. An adaptive permutation approach for genome-wide association study: evaluation and recommendations for use. BioData Min 2014; 7:9. [PMID: 24976866 PMCID: PMC4070098 DOI: 10.1186/1756-0381-7-9] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 09/24/2013] [Accepted: 06/02/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Permutation testing is a robust and popular approach for significance testing in genomic research, which has the broad advantage of estimating significance non-parametrically, thereby safeguarding against inflated type I error rates. However, the computational efficiency remains a challenging issue that limits its wide application, particularly in genome-wide association studies (GWAS). Because of this, adaptive permutation strategies can be employed to make permutation approaches feasible. While these approaches have been used in practice, there is little research into the statistical properties of these approaches, and little guidance into the proper application of such a strategy for accurate p-value estimation at the GWAS level. METHODS: In this work, we advocate an adaptive permutation procedure that is statistically valid as well as computationally feasible in GWAS. We perform extensive simulation experiments to evaluate the robustness of the approach to violations of modeling assumptions and compare the power of the adaptive approach versus standard approaches. We also evaluate the parameter choices in implementing the adaptive permutation approach to provide guidance on proper implementation in real studies. Additionally, we provide an example of the application of adaptive permutation testing on real data. RESULTS: The results provide sufficient evidence that the adaptive test is robust to violations of modeling assumptions. In addition, even when modeling assumptions are correct, the power achieved by adaptive permutation is identical to the parametric approach over a range of significance thresholds and effect sizes under the alternative. A framework for proper implementation of the adaptive procedure is also generated.
CONCLUSIONS: While the adaptive permutation approach presented here is not novel, the current study provides evidence of the validity of the approach, and importantly provides guidance on the proper implementation of such a strategy. Additionally, tools are made available to aid investigators in implementing these approaches.
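The idea behind adaptive permutation can be sketched with a simple sequential stopping rule: stop permuting a test as soon as enough permuted statistics have matched or exceeded the observed one, since the p-value is then clearly not small. The sketch below assumes a stop-after-a-fixed-number-of-exceedances rule, which is one common form of adaptive permutation and may not match the parameterisation evaluated in the paper; `adaptive_permutation_p`, `stop_hits`, and `max_perm` are hypothetical names.

```python
import random


def adaptive_permutation_p(x, y, max_perm=100_000, stop_hits=10, seed=0):
    """Adaptive permutation p-value for a difference in means.

    Assumed rule (illustrative, not taken from the paper): stop once
    `stop_hits` permuted statistics are at least as extreme as the observed
    one; otherwise run all `max_perm` permutations.
    Returns (p_value_estimate, permutations_used).
    """
    rng = random.Random(seed)
    n_x = len(x)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for done in range(1, max_perm + 1):
        rng.shuffle(pooled)
        if stat(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1
            if hits == stop_hits:
                # Early stop: the test is clearly unremarkable.
                return hits / done, done
    # Full run: add-one estimate, bounded below by 1 / (max_perm + 1).
    return (hits + 1) / (max_perm + 1), max_perm
```

Under this rule most null markers stop after a handful of shuffles, so the permutation budget is spent almost entirely on the few markers with small p-values.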
Affiliation(s)
- Ronglin Che
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
| | - John R Jack
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
| | - Alison A Motsinger-Reif
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
| | - Chad C Brown
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
| |
|
20
|
MCPerm: a Monte Carlo permutation method for accurately correcting the multiple testing in a meta-analysis of genetic association studies. PLoS One 2014; 9:e89212. [PMID: 24586601 PMCID: PMC3931718 DOI: 10.1371/journal.pone.0089212] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 09/17/2013] [Accepted: 01/17/2014] [Indexed: 02/01/2023] Open
Abstract
Traditional permutation (TradPerm) tests are usually considered the gold standard for multiple testing corrections. However, they can be difficult to complete for the meta-analyses of genetic association studies based on multiple single nucleotide polymorphism loci as they depend on individual-level genotype and phenotype data to perform random shuffles, which are not easy to obtain. Most meta-analyses have therefore been performed using summary statistics from previously published studies. To carry out a permutation using only genotype counts without changing the size of the TradPerm P-value, we developed a Monte Carlo permutation (MCPerm) method. First, for each study included in the meta-analysis, we used a two-step hypergeometric distribution to generate a random number of genotypes in cases and controls. We then carried out a meta-analysis using these random genotype data. Finally, we obtained the corrected permutation P-value of the meta-analysis by repeating the entire process N times. We used five real datasets and five simulation datasets to evaluate the MCPerm method and our results showed the following: (1) MCPerm requires only the summary statistics of the genotype, without the need for individual-level data; (2) Genotype counts generated by our two-step hypergeometric distributions had the same distributions as genotype counts generated by shuffling; (3) MCPerm had almost exactly the same permutation P-values as TradPerm (r = 0.999; P<2.2e-16); (4) The calculation speed of MCPerm is much faster than that of TradPerm. In summary, MCPerm appears to be a viable alternative to TradPerm, and we have developed it as a freely available R package at CRAN: http://cran.r-project.org/web/packages/MCPerm/index.html.
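The count-generation step can be sketched from genotype totals alone. The reading of "two-step hypergeometric" used here is an assumption: first draw how many AA individuals land among the cases, then draw how many Aa individuals fill the remaining case slots. The function names are hypothetical and this is not the API of the MCPerm R package.

```python
import random


def hypergeom_draw(rng, n_marked, n_total, n_draw):
    """Number of marked items among n_draw items drawn without replacement."""
    return sum(1 for i in rng.sample(range(n_total), n_draw) if i < n_marked)


def random_case_control_counts(genotype_totals, n_cases, rng):
    """Randomly split pooled genotype counts (AA, Aa, aa) into cases/controls.

    Assumed two-step scheme (illustrative reading of the abstract):
      step 1: AA count among cases ~ hypergeometric over all individuals;
      step 2: Aa count among the remaining case slots ~ hypergeometric over
              the non-AA individuals.
    """
    n_hom_ref, n_het, n_hom_alt = genotype_totals
    total = n_hom_ref + n_het + n_hom_alt
    case_hom_ref = hypergeom_draw(rng, n_hom_ref, total, n_cases)
    case_het = hypergeom_draw(
        rng, n_het, total - n_hom_ref, n_cases - case_hom_ref
    )
    case_hom_alt = n_cases - case_hom_ref - case_het
    cases = (case_hom_ref, case_het, case_hom_alt)
    # Controls receive whatever the cases did not take.
    controls = tuple(t - c for t, c in zip(genotype_totals, cases))
    return cases, controls
```

Each call yields one permuted genotype table that preserves the pooled genotype counts and the case/control sizes, as label shuffling would, without needing individual-level data.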
|
21
|
Abstract
Genome-wide association studies (GWAS) are a powerful tool for investigators to examine the human genome to detect genetic risk factors, reveal the genetic architecture of diseases and open up new opportunities for treatment and prevention. However, despite its successes, GWAS have not been able to identify genetic loci that are effective classifiers of disease, limiting their value for genetic testing. This chapter highlights the challenges that lie ahead for GWAS in better identifying disease risk predictors, and how we may address them. In this regard, we review basic concepts regarding GWAS, the technologies used for capturing genetic variation, the missing heritability problem, the need for efficient study design especially for replication efforts, reducing the bias introduced into a dataset, and how to utilize new resources available, such as electronic medical records. We also look to what lies ahead for the field, and the approaches that can be taken to realize the full potential of GWAS.
Affiliation(s)
- Rishika De
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
| | | | | |
|
22
|
Rajabli F, Inan G, Ilk O. Power analysis of C-TDT for small sample size genome-wide association studies by the joint use of case-parent trios and pairs. Computational and Mathematical Methods in Medicine 2013; 2013:235825. [PMID: 23737858 PMCID: PMC3659481 DOI: 10.1155/2013/235825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/02/2013] [Revised: 04/08/2013] [Accepted: 04/13/2013] [Indexed: 11/18/2022]
Abstract
In family-based genetic association studies, it is possible to encounter missing genotype information for one of the parents. This leads to a study consisting of both case-parent trios and case-parent pairs. One of the approaches to this problem is permutation-based combined transmission disequilibrium test statistic. However, it is still unknown how powerful this test statistic is with small sample sizes. In this paper, a simulation study is carried out to estimate the power and false positive rate of this test across different sample sizes for a family-based genome-wide association study. It is observed that a statistical power of over 80% and a reasonable false positive rate estimate can be achieved even with a combination of 50 trios and 30 pairs when 2% of the SNPs are assumed to be associated. Moreover, even smaller samples provide high power when smaller percentages of SNPs are associated with the disease.
Affiliation(s)
- Farid Rajabli
- Department of Electrical and Electronic Engineering, Faculty of Engineering, Turgut Ozal University, 06010 Ankara, Turkey.
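As a rough illustration of the kind of simulation the abstract above describes, the sketch below estimates the false positive rate and power of the classic transmission disequilibrium test (TDT) for complete trios by Monte Carlo. It is not the permutation-based combined statistic (C-TDT) studied in the paper, which also uses case-parent pairs. The function names, the transmission probabilities, and the sample sizes are illustrative assumptions.

```python
import math
import random


def tdt_statistic(b, c):
    """Classic TDT (McNemar-type) chi-square statistic with 1 df.

    b, c: numbers of heterozygous parents who transmitted / did not
    transmit the candidate allele to the affected child.
    """
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def estimate_power(n_het_parents, transmit_prob, alpha=0.05,
                   n_sim=2000, seed=1):
    """Fraction of simulated studies in which the TDT rejects at level alpha.

    transmit_prob = 0.5 means no association, so the returned value is
    then an estimate of the false positive rate (hypothetical setup).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Each heterozygous parent transmits the allele independently.
        b = sum(rng.random() < transmit_prob for _ in range(n_het_parents))
        c = n_het_parents - b
        if chi2_1df_pvalue(tdt_statistic(b, c)) < alpha:
            rejections += 1
    return rejections / n_sim
```

Calling `estimate_power(100, 0.5)` gives an estimate of the false positive rate, and `estimate_power(100, 0.7)` an estimate of power under a strong transmission distortion; both depend on the assumed inputs rather than on any data from the study.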
|
23
|
Sheikh H, Kryski K, Smith H, Hayden E, Singh S. Corticotropin-releasing hormone system polymorphisms are associated with children’s cortisol reactivity. Neuroscience 2013; 229:1-11. [DOI: 10.1016/j.neuroscience.2012.10.056] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 07/12/2012] [Revised: 10/26/2012] [Accepted: 10/29/2012] [Indexed: 11/26/2022]
|
24
|
Abstract
Genome-wide association studies (GWAS) have evolved over the last ten years into a powerful tool for investigating the genetic architecture of human disease. In this work, we review the key concepts underlying GWAS, including the architecture of common diseases, the structure of common human genetic variation, technologies for capturing genetic information, study designs, and the statistical methods used for data analysis. We also look forward to the future beyond GWAS.
|
25
|
Steiß V, Letschert T, Schäfer H, Pahl R. PERMORY-MPI: a program for high-speed parallel permutation testing in genome-wide association studies. Bioinformatics 2012; 28:1168-9. [PMID: 22345620 DOI: 10.1093/bioinformatics/bts086] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] Open
Abstract
PERMORY is software for accelerated permutation testing of genome-wide association studies (GWAS). We have parallelized PERMORY using the Message-Passing Interface, resulting in a nearly linear speedup. Furthermore, we added accelerated analysis of GWAS using quantitative phenotypes, and an accurate estimation of the effective number of independent tests. AVAILABILITY AND IMPLEMENTATION: Free download from http://permory.org.
Affiliation(s)
- Volker Steiß
- Institute of Medical Biometry and Epidemiology, Philipps-Universität Marburg, Marburg, Germany.
|
26
|
Rapid and robust resampling-based multiple-testing correction with application in a genome-wide expression quantitative trait loci study. Genetics 2012; 190:1511-20. [PMID: 22298711 DOI: 10.1534/genetics.111.137737] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022] Open
Abstract
Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. In a typical eQTL study, the huge number of genetic markers and expression traits and their complicated correlations present a challenging multiple-testing correction problem. The resampling-based test using permutation or bootstrap procedures is a standard approach to address the multiple-testing problem in eQTL studies. A brute force application of the resampling-based test to large-scale eQTL data sets is often computationally infeasible. Several computationally efficient methods have been proposed to calculate approximate resampling-based P-values. However, these methods rely on certain assumptions about the correlation structure of the genetic markers, which may not be valid for certain studies. We propose a novel algorithm, rapid and exact multiple testing correction by resampling (REM), to address this challenge. REM calculates the exact resampling-based P-values in a computationally efficient manner. The computational advantage of REM lies in its strategy of pruning the search space by skipping genetic markers whose upper bounds on test statistics are small. REM does not rely on any assumption about the correlation structure of the genetic markers. It can be applied to a variety of resampling-based multiple-testing correction methods including permutation and bootstrap methods. We evaluate REM on three eQTL data sets (yeast, inbred mouse, and human rare variants) and show that it achieves accurate resampling-based P-value estimation with much less computational cost than existing methods. The software is available at http://csbio.unc.edu/eQTL.
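For orientation, the brute-force permutation procedure that the abstract above calls computationally infeasible at scale can be sketched as follows. This is a minimal max-statistic permutation correction, not the REM algorithm itself: it has none of REM's pruning and re-tests every marker in every permutation. The choice of absolute Pearson correlation as the test statistic, and all function names, are assumptions made for illustration.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def max_stat_adjusted_pvalues(genotypes, phenotype, n_perm=200, seed=0):
    """Family-wise adjusted p-values from the permutation max statistic.

    genotypes: list of markers, each a list of per-sample genotype codes.
    phenotype: list of per-sample trait values.
    """
    rng = random.Random(seed)
    observed = [abs_correlation(g, phenotype) for g in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-trait link
        # Brute force: every marker is re-tested in every permutation.
        max_stat = max(abs_correlation(g, shuffled) for g in genotypes)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # The +1 terms count the observed data as one of the permutations.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

The cost grows with the number of markers times the number of permutations, which is the burden that pruning strategies such as the one described in the abstract aim to reduce.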
|
27
|
Li MX, Yeung JMY, Cherny SS, Sham PC. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum Genet 2011; 131:747-56. [PMID: 22143225 PMCID: PMC3325408 DOI: 10.1007/s00439-011-1118-2] [Citation(s) in RCA: 557] [Impact Index Per Article: 42.8] [Received: 08/23/2011] [Accepted: 11/13/2011] [Indexed: 11/25/2022]
Abstract
Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (Me) for the adjustment of multiple testing, but current methods of calculation for Me are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate Me. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the Me, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10^-7 as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ~5 × 10^-8 for current or merged commercial genotyping arrays, ~10^-8 for all common SNPs in the 1000 Genomes Project dataset and ~5 × 10^-8 for the common SNPs only within genes.
Affiliation(s)
- Miao-Xin Li
- Department of Psychiatry, The University of Hong Kong, Pokfulam, Hong Kong
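The link between an effective number of independent tests (Me) and a genome-wide p-value threshold, as discussed in the abstract above, can be sketched with the standard Bonferroni and Šidák formulas. This does not reproduce how GEC estimates Me; the Me values in the loop are hypothetical and are not taken from the paper. As a plain arithmetic check, a threshold of 5 × 10^-8 at a genome-wide error rate of 0.05 corresponds to about one million effective tests under Bonferroni.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold alpha / n_tests."""
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    """Per-test threshold 1 - (1 - alpha)**(1 / n_tests), computed stably."""
    return -math.expm1(math.log1p(-alpha) / n_tests)


# Hypothetical effective numbers of independent tests (not from the paper).
for me in (500_000, 1_000_000, 5_000_000):
    print(me, bonferroni_threshold(0.05, me), sidak_threshold(0.05, me))
```

The two thresholds differ only slightly at this scale, so the choice between them matters far less than the estimate of Me itself.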
|
28
|
Association of milk protein genes with fertilization rate and early embryonic development in Holstein dairy cattle. J Dairy Res 2011; 79:47-52. [DOI: 10.1017/s0022029911000744] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
Abstract
Concomitant with intensive selection for increased milk yield, reproductive performance of dairy cows has declined in the last decades, in part due to an unfavourable genetic relationship between these traits. Given that the six main milk protein genes (i.e. whey proteins and caseins) are directly involved in milk production and hence have been a target of the strong selection aimed at improving milk yield in dairy cattle, we hypothesized that these genes could show selection footprints associated with fertility traits. In this study, we used an in-vitro fertilization (IVF) system to test genetic association between 66 single nucleotide polymorphisms (SNPs) in the four caseins (αS1-casein, αS2-casein, β-casein and κ-casein) and the two whey protein genes (α-lactalbumin and β-lactoglobulin) with fertilization rate and early embryonic development in the Holstein breed. A total of 6893 in-vitro fertilizations were performed and a total of 4661 IVF embryos were produced using oocytes from 399 ovaries and semen samples from 12 bulls. Associations between SNPs and fertility traits were analysed using a mixed linear model with genotype as fixed effect and ovary and bull as random effects. A multiple testing correction approach was used to account for the correlation between SNPs due to linkage disequilibrium. After correction, polymorphisms in the LALBA and LGB genes showed significant associations with fertilization success and blastocyst rate. No significant associations were detected between SNPs located in the casein region and IVF fertility traits. Although the molecular mechanisms underlying the association between whey protein genes and fertility have not yet been characterized, this study provides the first evidence of association between these genes and fertility traits. Furthermore, these results could shed light on the antagonistic relationship that exists between milk yield and fertility in dairy cattle.
|
29
|
Gui H, Li M, Sham PC, Cherny SS. Comparisons of seven algorithms for pathway analysis using the WTCCC Crohn's Disease dataset. BMC Res Notes 2011; 4:386. [PMID: 21981765 PMCID: PMC3199264 DOI: 10.1186/1756-0500-4-386] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 06/10/2011] [Accepted: 10/07/2011] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Though rooted in genomic expression studies, pathway analysis for genome-wide association studies (GWAS) has gained increasing popularity, since it has the potential to discover hidden disease pathogenic mechanisms by combining statistical methods with biological knowledge. Generally, algorithms or programs proposed recently can be categorized by different types of input data, null hypothesis or counts of analysis stages. Due to complexity caused by SNP, gene and pathway relationships, re-sampling strategies like permutation are always utilized to derive an empirical distribution for test statistics for evaluating the significance of candidate pathways. However, evaluation of these algorithms on real GWAS datasets and real biological pathway databases needs to be addressed before we apply them widely with confidence. FINDINGS Two algorithms which use summary statistics from GWAS as input were implemented in KGG, a novel and user-friendly software tool for GWAS pathway analysis. Comparisons of these two algorithms as well as the other five selected algorithms were conducted by analyzing the WTCCC Crohn's Disease dataset utilizing the MsigDB canonical pathways. As a result of using permutation to obtain empirical p-value, most of these methods could control Type I error rate well, although some are conservative. However, the methods varied greatly in terms of power and running time, with the PLINK truncated set-based test being the most powerful and KGG being the fastest. CONCLUSIONS Raw data-based algorithms, such as those implemented in PLINK, are preferable for GWAS pathway analysis as long as computational capacity is available. It may be worthwhile to apply two or more pathway analysis algorithms on the same GWAS dataset, since the methods differ greatly in their outputs and might provide complementary findings for the studied complex disease.
Affiliation(s)
- Hongsheng Gui
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
|
30
|
Genes involved in vasoconstriction and vasodilation system affect salt-sensitive hypertension. PLoS One 2011; 6:e19620. [PMID: 21573014 PMCID: PMC3090407 DOI: 10.1371/journal.pone.0019620] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 12/20/2010] [Accepted: 04/12/2011] [Indexed: 01/11/2023] Open
Abstract
The importance of excess salt intake in the pathogenesis of hypertension is widely recognized. Blood pressure is controlled primarily by salt and water balance because of the infinite gain property of the kidney to rapidly eliminate excess fluid and salt. Up to fifty percent of patients with essential hypertension are salt-sensitive, as manifested by a rise in blood pressure with salt loading. We conducted a two-stage genetic analysis in hypertensive patients very accurately phenotyped for their salt-sensitivity. All newly discovered, never-treated essential hypertensives underwent an acute salt load to monitor the simultaneous changes in blood pressure and renal sodium excretion. The first stage consisted of an association analysis of genotyping data derived from a genome-wide array on 329 subjects. Principal Component Analysis demonstrated that this population was homogeneous. Among the strongest results, we detected a cluster of SNPs located in the first introns of the PRKG1 gene (rs7897633, p = 2.34E-05) associated with variation in diastolic blood pressure after acute salt load. We further focused on two genetic loci, SLC24A3 and SLC8A1 (plasma membrane sodium/calcium exchange proteins, NCKX3 and NCX1, respectively), which have a functional relationship with the previous gene and are associated with variations in systolic blood pressure (the imputed rs3790261, p = 4.55E-06; and rs434082, p = 4.7E-03). In stage 2, we characterized 159 more patients for the SNPs in PRKG1, SLC24A3 and SLC8A1. Combined analysis showed an epistatic interaction of SNPs in SLC24A3 and SLC8A1 on the pressure-natriuresis (p interaction = 1.55E-04, p model = 3.35E-05), supporting their pathophysiological link in cellular calcium homeostasis. In conclusion, these findings point to a clear association between body sodium-blood pressure relations and molecules modulating the contractile state of vascular cells through an increase in cytoplasmic calcium concentration.
|
31
|
Johnson RC, Nelson GW, Troyer JL, Lautenberger JA, Kessing BD, Winkler CA, O'Brien SJ. Accounting for multiple comparisons in a genome-wide association study (GWAS). BMC Genomics 2010; 11:724. [PMID: 21176216 PMCID: PMC3023815 DOI: 10.1186/1471-2164-11-724] [Citation(s) in RCA: 189] [Impact Index Per Article: 13.5] [Received: 08/19/2010] [Accepted: 12/22/2010] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: As we enter an era when testing millions of SNPs in a single gene association study will become the standard, consideration of multiple comparisons is an essential part of determining statistical significance. Bonferroni adjustments can be made but are conservative due to the preponderance of linkage disequilibrium (LD) between genetic markers, and permutation testing is not always a viable option. Three major classes of corrections have been proposed to correct the dependent nature of genetic data in Bonferroni adjustments: permutation testing and related alternatives, principal components analysis (PCA), and analysis of blocks of LD across the genome. We consider seven implementations of these commonly used methods using data from 1514 European American participants genotyped for 700,078 SNPs in a GWAS for AIDS. RESULTS: A Bonferroni correction using the number of LD blocks found by the three algorithms implemented by Haploview resulted in an insufficiently conservative threshold, corresponding to a genome-wide significance level of α = 0.15 to 0.20. We observed a moderate increase in power when using PRESTO, SLIDE, and simpleℳ when compared with traditional Bonferroni methods for population data genotyped on the Affymetrix 6.0 platform in European Americans (α = 0.05 thresholds between 1 × 10^-7 and 7 × 10^-8). CONCLUSIONS: Correcting for the number of LD blocks resulted in an anti-conservative Bonferroni adjustment. SLIDE and simpleℳ are particularly useful when using a statistical test not handled in optimized permutation testing packages, and genome-wide corrected p-values using SLIDE are much easier to interpret for consumers of GWAS studies.
Affiliation(s)
- Randall C Johnson
- Basic Research Program, SAIC-Frederick, Inc. NCI-Frederick, Frederick, MD, USA
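To see why dividing by too small a count of tests yields an anti-conservative threshold, as the abstract above reports for LD-block counts, the sketch below computes the family-wise error rate implied by a given per-test threshold. It assumes the effective tests behave as independent, which is a simplification, and the two counts used are hypothetical numbers chosen only for illustration, not values from the study.

```python
import math


def implied_fwer(per_test_threshold, n_effective_tests):
    """Family-wise error rate 1 - (1 - t)**Me, assuming independent tests."""
    return -math.expm1(n_effective_tests * math.log1p(-per_test_threshold))


alpha = 0.05
n_effective = 700_000  # hypothetical effective number of independent tests
n_blocks = 200_000     # hypothetical LD-block count, smaller than n_effective

# Dividing alpha by the effective number of tests keeps the rate near alpha.
print(implied_fwer(alpha / n_effective, n_effective))
# Dividing by the smaller block count inflates the genome-wide error rate.
print(implied_fwer(alpha / n_blocks, n_effective))
```

With these assumed inputs the second rate is roughly three times the nominal 0.05, which mirrors the direction of the effect described in the abstract without reproducing its analysis.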
|