1. Yang H, Wang X, Zhang Z, Chen F, Cao H, Yan L, Gao X, Dong H, Cui Y. A high-dimensional omnibus test for set-based association analysis. Brief Bioinform 2024; 25:bbae456. [PMID: 39288231; PMCID: PMC11407446; DOI: 10.1093/bib/bbae456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 08/21/2024] [Accepted: 09/03/2024] [Indexed: 09/19/2024] [Open access]
Abstract
Set-based association analysis is a valuable tool in studying the etiology of complex diseases in genome-wide association studies, as it allows for the joint testing of variants in a region or group. Two common types of single nucleotide polymorphism (SNP)-disease functional models are recognized when evaluating the joint function of a set of SNPs: the cumulative weak signal model, in which multiple functional variants with small effects contribute to disease risk, and the dominating strong signal model, in which a few functional variants with large effects contribute to disease risk. However, existing methods have two main limitations that reduce their power. First, they typically consider only one disease-SNP association model, which can result in significant power loss if the model is misspecified. Second, they do not account for the high-dimensional nature of SNPs, leading to low power or high false-positive rates. In this study, we propose a solution to these challenges by using a high-dimensional inference procedure that involves simultaneously fitting many SNPs in a regression model. We also propose an omnibus testing procedure that employs a robust and powerful P-value combination method to enhance the power of SNP-set association. Our results from extensive simulation studies demonstrate that our set-based high-dimensional inference strategy is both flexible and computationally efficient and can substantially improve the power of SNP-set association analysis. Application to a real dataset further demonstrates the utility of the testing strategy.
Affiliation(s)
- Haitao Yang
  - Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
  - Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
  - Hebei Key Laboratory of Forensic Medicine, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Xin Wang
  - Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Zechen Zhang
  - Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
  - Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Fuzhao Chen
  - Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hongyan Cao
  - Department of Health Statistics, Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, School of Public Health; MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, No 56 Xinjian South Rd., Taiyuan, Shanxi 030001, P.R. China
- Lina Yan
  - Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
  - Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Xia Gao
  - Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
  - Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hui Dong
  - Department of Neurology, Second Hospital of Hebei Medical University, 215 West Heping Road, Shijiazhuang, Hebei 050000, P.R. China
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, 619 Red Cedar Rd., East Lansing, MI 48824, United States
2. Nagle MF, Yuan J, Kaur D, Ma C, Peremyslova E, Jiang Y, Goralogia GS, Magnuson A, Li JY, Muchero W, Fuxin L, Strauss SH. Genome-wide association study and network analysis of in vitro transformation in Populus trichocarpa support key roles of diverse phytohormone pathways and cross talk. The New Phytologist 2024. [PMID: 38650352; DOI: 10.1111/nph.19737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 03/06/2024] [Indexed: 04/25/2024]
Abstract
Wide variation in amenability to transformation and regeneration (TR) among many plant species and genotypes presents a challenge to the use of genetic engineering in research and breeding. To help understand the causes of this variation, we performed association mapping and network analysis using a population of 1204 wild trees of Populus trichocarpa (black cottonwood). To enable precise and high-throughput phenotyping of callus and shoot TR, we developed a computer vision system that cross-referenced complementary red, green, and blue (RGB) and fluorescent-hyperspectral images. We performed association mapping using single-marker and combined variant methods, followed by statistical tests for epistasis and integration of published multi-omic datasets to identify likely regulatory hubs. We report 409 candidate genes implicated by associations within 5 kb of coding sequences, and epistasis tests implicated 81 of these candidate genes as regulators of one another. Gene ontology terms related to protein-protein interactions and transcriptional regulation are overrepresented, among others. In addition to auxin and cytokinin pathways long established as critical to TR, our results highlight the importance of stress and wounding pathways. Potential regulatory hubs of signaling within and across these pathways include GROWTH REGULATORY FACTOR 1 (GRF1), PHOSPHATIDYLINOSITOL 4-KINASE β1 (PI-4Kβ1), and OBF-BINDING PROTEIN 1 (OBP1).
Affiliation(s)
- Michael F Nagle
  - Department of Forest Ecosystems & Society, Oregon State University, Corvallis, OR, 97331, USA
- Jialin Yuan
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA
- Damanpreet Kaur
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA
- Cathleen Ma
  - Department of Forest Ecosystems & Society, Oregon State University, Corvallis, OR, 97331, USA
- Ekaterina Peremyslova
  - Department of Forest Ecosystems & Society, Oregon State University, Corvallis, OR, 97331, USA
- Yuan Jiang
  - Statistics Department, Oregon State University, Corvallis, OR, 97331, USA
- Greg S Goralogia
  - Department of Forest Ecosystems & Society, Oregon State University, Corvallis, OR, 97331, USA
- Anna Magnuson
  - Department of Forest Ecosystems & Society, Oregon State University, Corvallis, OR, 97331, USA
- Jia Yi Li
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA
- Wellington Muchero
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
  - Bredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, TN, 37996, USA
- Li Fuxin
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA
- Steven H Strauss
  - Department of Forest Ecosystems & Society, Oregon State University, Corvallis, OR, 97331, USA
3. Nagle MF, Yuan J, Kaur D, Ma C, Peremyslova E, Jiang Y, Niño de Rivera A, Jawdy S, Chen JG, Feng K, Yates TB, Tuskan GA, Muchero W, Fuxin L, Strauss SH. GWAS supported by computer vision identifies large numbers of candidate regulators of in planta regeneration in Populus trichocarpa. G3 (Bethesda, Md.) 2024; 14:jkae026. [PMID: 38325329; PMCID: PMC10989874; DOI: 10.1093/g3journal/jkae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/18/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024]
Abstract
Plant regeneration is an important dimension of plant propagation and a key step in the production of transgenic plants. However, regeneration capacity varies widely among genotypes and species, the molecular basis of which is largely unknown. Association mapping methods such as genome-wide association studies (GWAS) have long demonstrated abilities to help uncover the genetic basis of trait variation in plants; however, the performance of these methods depends on the accuracy and scale of phenotyping. To enable a large-scale GWAS of in planta callus and shoot regeneration in the model tree Populus, we developed a phenomics workflow involving semantic segmentation to quantify regenerating plant tissues over time. We found that the resulting statistics were of highly non-normal distributions, and thus employed transformations or permutations to avoid violating assumptions of linear models used in GWAS. We report over 200 statistically supported quantitative trait loci (QTLs), with genes encompassing or near to top QTLs including regulators of cell adhesion, stress signaling, and hormone signaling pathways, as well as other diverse functions. Our results encourage models of hormonal signaling during plant regeneration to consider keystone roles of stress-related signaling (e.g. involving jasmonates and salicylic acid), in addition to the auxin and cytokinin pathways commonly considered. The putative regulatory genes and biological processes we identified provide new insights into the biological complexity of plant regeneration, and may serve as new reagents for improving regeneration and transformation of recalcitrant genotypes and species.
Affiliation(s)
- Michael F Nagle
  - Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97331, USA
- Jialin Yuan
  - Department of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA
- Damanpreet Kaur
  - Department of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA
- Cathleen Ma
  - Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97331, USA
- Ekaterina Peremyslova
  - Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97331, USA
- Yuan Jiang
  - Statistics Department, Oregon State University, 239 Weniger Hall, Corvallis, OR 97331, USA
- Alexa Niño de Rivera
  - Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97331, USA
- Sara Jawdy
  - Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Jin-Gui Chen
  - Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Bredesen Center for Interdisciplinary Research, University of Tennessee-Knoxville, 310 Ferris Hall 1508 Middle Dr, Knoxville, TN 37996, USA
- Kai Feng
  - Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Timothy B Yates
  - Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Bredesen Center for Interdisciplinary Research, University of Tennessee-Knoxville, 310 Ferris Hall 1508 Middle Dr, Knoxville, TN 37996, USA
- Gerald A Tuskan
  - Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Wellington Muchero
  - Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
  - Bredesen Center for Interdisciplinary Research, University of Tennessee-Knoxville, 310 Ferris Hall 1508 Middle Dr, Knoxville, TN 37996, USA
- Li Fuxin
  - Department of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA
- Steven H Strauss
  - Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97331, USA
4. Nagle MF, Yuan J, Kaur D, Ma C, Peremyslova E, Jiang Y, Zahl B, Niño de Rivera A, Muchero W, Fuxin L, Strauss SH. GWAS identifies candidate genes controlling adventitious rooting in Populus trichocarpa. Horticulture Research 2023; 10:uhad125. [PMID: 37560019; PMCID: PMC10407606; DOI: 10.1093/hr/uhad125] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/29/2022] [Accepted: 06/05/2023] [Indexed: 08/11/2023]
Abstract
Adventitious rooting (AR) is critical to the propagation, breeding, and genetic engineering of trees. The capacity for plants to undergo this process is highly heritable and of a polygenic nature; however, the basis of its genetic variation is largely uncharacterized. To identify genetic regulators of AR, we performed a genome-wide association study (GWAS) using 1148 genotypes of Populus trichocarpa. GWASs are often limited by the abilities of researchers to collect precise phenotype data on a high-throughput scale; to help overcome this limitation, we developed a computer vision system to measure an array of traits related to adventitious root development in poplar, including temporal measures of lateral and basal root length and area. GWAS was performed using multiple methods and significance thresholds to handle non-normal phenotype statistics and to gain statistical power. These analyses yielded a total of 277 unique associations, suggesting that genes that control rooting include regulators of hormone signaling, cell division and structure, reactive oxygen species signaling, and other processes with known roles in root development. Numerous genes with uncharacterized functions and/or cryptic roles were also identified. These candidates provide targets for functional analysis, including physiological and epistatic analyses, to better characterize the complex polygenic regulation of AR.
Affiliation(s)
- Michael F Nagle
  - Department of Forest Ecosystems and Society, Oregon State University, 3180 SW Jefferson Way, Corvallis, OR, 97331, United States
- Jialin Yuan
  - Department of Electrical Engineering and Computer Science, Oregon State University, 110 SW Park Terrace, Corvallis, OR, 97331, United States
- Damanpreet Kaur
  - Department of Electrical Engineering and Computer Science, Oregon State University, 110 SW Park Terrace, Corvallis, OR, 97331, United States
- Cathleen Ma
  - Department of Forest Ecosystems and Society, Oregon State University, 3180 SW Jefferson Way, Corvallis, OR, 97331, United States
- Ekaterina Peremyslova
  - Department of Forest Ecosystems and Society, Oregon State University, 3180 SW Jefferson Way, Corvallis, OR, 97331, United States
- Yuan Jiang
  - Statistics Department, Oregon State University, 103 SW Memorial Place, Corvallis, OR, 97331, United States
- Bahiya Zahl
  - Department of Forest Ecosystems and Society, Oregon State University, 3180 SW Jefferson Way, Corvallis, OR, 97331, United States
- Alexa Niño de Rivera
  - Department of Forest Ecosystems and Society, Oregon State University, 3180 SW Jefferson Way, Corvallis, OR, 97331, United States
- Wellington Muchero
  - Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, United States
  - Center for Bioenergy Innovation, Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, United States
  - Bredesen Center for Interdisciplinary Research, University of Tennessee, 821 Volunteer Blvd., Knoxville, TN, 37996, United States
- Li Fuxin
  - Department of Electrical Engineering and Computer Science, Oregon State University, 110 SW Park Terrace, Corvallis, OR, 97331, United States
- Steven H Strauss
  - Department of Forest Ecosystems and Society, Oregon State University, 3180 SW Jefferson Way, Corvallis, OR, 97331, United States
5. Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399; PMCID: PMC9429742; DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023] [Open access]
Abstract
BACKGROUND: Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been performed to evaluate the effectiveness of these methods. RESULTS: We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power performance under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (the harmonic mean P value method and the aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and real-data applications to common and rare variant association analyses as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all of these methods scale to biobank-scale data, although some may be relatively slow. CONCLUSION: We hope that our findings can offer important guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/.
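The two P value combination rules named in this abstract can be illustrated with a minimal sketch. It is not the MCA package's code: the function names, the equal default weights, and the example P values are assumptions made here for illustration. The Cauchy rule maps each P value to a Cauchy variate and reads the tail probability of their weighted average; the harmonic mean rule returns the raw weighted harmonic mean, which is only approximately a P value without a further calibration step. Inputs are assumed to lie strictly between 0 and 1.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy (ACAT-style) combination of P values in (0, 1)."""
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights: an assumption of this sketch
    total = sum(weights)
    # tan((0.5 - p) * pi) is standard Cauchy when p is uniform on (0, 1).
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues)) / total
    # Upper-tail probability of a standard Cauchy at the weighted average.
    return 0.5 - math.atan(stat) / math.pi


def harmonic_mean_p(pvalues, weights=None):
    """Raw weighted harmonic mean of P values (no calibration applied)."""
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights: an assumption of this sketch
    total = sum(weights)
    return total / sum(w / p for w, p in zip(weights, pvalues))


# Hypothetical sparse signal: one small P value among unremarkable ones.
example = [0.001, 0.4, 0.7, 0.9]
print(cauchy_combination(example), harmonic_mean_p(example))
```

Neither rule uses the correlation between SNPs, which is the sense in which such methods are described as linkage-disequilibrium-free; both are driven mainly by the smallest P values in the set.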
Affiliation(s)
- Zhonghe Shao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuchen Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
6. Duroux D, Climente-González H, Azencott CA, Van Steen K. Interpretable network-guided epistasis detection. Gigascience 2022; 11:giab093. [PMID: 35134928; PMCID: PMC8848319; DOI: 10.1093/gigascience/giab093] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/04/2021] [Revised: 10/12/2021] [Accepted: 12/13/2021] [Indexed: 11/15/2022] [Open access]
Abstract
BACKGROUND: Detecting epistatic interactions at the gene level is essential to understanding the biological mechanisms of complex diseases. Unfortunately, genome-wide interaction association studies involve many statistical challenges that make such detection hard. We propose a multi-step protocol for epistasis detection along the edges of a gene-gene co-function network. Such an approach reduces the number of tests performed and provides interpretable interactions while keeping type I error controlled. Yet, mapping gene interactions into testable single-nucleotide polymorphism (SNP)-interaction hypotheses, as well as computing gene pair association scores from SNP pair ones, is not trivial. RESULTS: Here we compare 3 SNP-gene mappings (positional overlap, expression quantitative trait loci, and proximity in 3D structure) and use the adaptive truncated product method to compute gene pair scores. This method is non-parametric, does not require a known null distribution, and is fast to compute. We apply multiple variants of this protocol to a genome-wide association study dataset on inflammatory bowel disease. Different configurations produced different results, highlighting that various mechanisms are implicated in inflammatory bowel disease, while at the same time, results overlapped with known disease characteristics. Importantly, the proposed pipeline also differs from a conventional approach where no network is used, showing the potential for additional discoveries when prior biological knowledge is incorporated into epistasis detection.
Affiliation(s)
- Diane Duroux
  - BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000 Liège, Belgium, 11 Liège 4000, Belgium
- Héctor Climente-González
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
  - High-Dimensional Statistical Modeling Team, RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan
- Chloé-Agathe Azencott
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Kristel Van Steen
  - BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000 Liège, Belgium, 11 Liège 4000, Belgium
  - BIO3 - Systems Medicine, Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium, 49 3000 Leuven, Belgium
7. Vsevolozhskaya OA, Shi M, Hu F, Zaykin DV. DOT: Gene-set analysis by combining decorrelated association statistics. PLoS Comput Biol 2020; 16:e1007819. [PMID: 32287273; PMCID: PMC7182280; DOI: 10.1371/journal.pcbi.1007819] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/23/2019] [Revised: 04/24/2020] [Accepted: 03/23/2020] [Indexed: 12/12/2022] [Open access]
Abstract
Historically, the majority of statistical association methods have been designed assuming availability of SNP-level information. However, modern genetic and sequencing data present new challenges to access and sharing of genotype-phenotype datasets, including cost of management, difficulties in consolidation of records across research groups, etc. These issues make methods based on SNP-level summary statistics particularly appealing. The most common form of combining statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic approach can be substantially improved by decorrelating scores prior to their addition, resulting in remarkable power gains in situations that are most commonly encountered in practice; namely, under heterogeneity of effect sizes and diversity between pairwise LD. In these situations, the power of the traditional test, based on the added squared scores, quickly reaches a ceiling, as the number of variants increases. Thus, the traditional approach does not benefit from information potentially contained in any additional SNPs, while our decorrelation by orthogonal transformation (DOT) method yields steady gain in power. We present theoretical and computational analyses of both approaches, and reveal causes behind sometimes dramatic difference in their respective powers. We showcase DOT by analyzing breast cancer and cleft lip data, in which our method strengthened levels of previously reported associations and implied the possibility of multiple new alleles that jointly confer disease risk.
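The idea of decorrelating scores before adding their squares can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it whitens the scores with a Cholesky factor of the LD correlation matrix as a stand-in for the orthogonal transformation the abstract refers to, and the function names and the two-SNP example are hypothetical. Any whitening transform gives the same sum of squares, z' Σ⁻¹ z, which under the null (scores normal with mean 0 and known correlation Σ) follows a chi-square distribution with as many degrees of freedom as there are variants.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L * L^T = sigma (sigma positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def decorrelate(z, sigma):
    """Solve L x = z by forward substitution, so x has identity covariance."""
    L = cholesky(sigma)
    x = []
    for i in range(len(z)):
        s = sum(L[i][k] * x[k] for k in range(i))
        x.append((z[i] - s) / L[i][i])
    return x


def decorrelated_sum_of_squares(z, sigma):
    """Sum of squared decorrelated scores, equal to z' sigma^{-1} z."""
    return sum(v * v for v in decorrelate(z, sigma))


# Hypothetical example: two SNP-level scores with LD correlation 0.5.
sigma = [[1.0, 0.5], [0.5, 1.0]]
z = [2.0, 2.0]
print(decorrelated_sum_of_squares(z, sigma))
```

The plain sum of squared scores for the same input would ignore the correlation between the two SNPs; the decorrelated version accounts for it through Σ before the scores are combined.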
Affiliation(s)
- Olga A. Vsevolozhskaya
  - Department of Biostatistics, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
- Min Shi
  - Biostatistics and Computational Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Fengjiao Hu
  - Biostatistics and Computational Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Dmitri V. Zaykin
  - Biostatistics and Computational Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America