1
Du J, Wang C, Wang L, Mao S, Zhu B, Li Z, Fan X. Automatic block-wise genotype-phenotype association detection based on hidden Markov model. BMC Bioinformatics 2023; 24:138. [PMID: 37029361 PMCID: PMC10082540 DOI: 10.1186/s12859-023-05265-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 03/31/2023] [Indexed: 04/09/2023]
Abstract
BACKGROUND For detecting genotype-phenotype associations from case-control single nucleotide polymorphism (SNP) data, one class of methods tests each genomic variant site individually. However, this approach ignores the tendency of associated variant sites to cluster spatially rather than being uniformly distributed along the genome. A more recent class of methods therefore looks for blocks of influential variant sites. Unfortunately, existing methods of this kind either assume prior knowledge of the blocks or rely on ad hoc moving windows. A principled method is needed to automatically detect genomic variant blocks that are associated with the phenotype. RESULTS In this paper, we introduce an automatic block-wise genome-wide association study (GWAS) method based on a hidden Markov model. Using case-control SNP data as input, our method detects the number of blocks associated with the phenotype and their locations. Correspondingly, the minor allele of each variant site is classified as having a negative influence, no influence, or a positive influence on the phenotype. We evaluated our method on datasets simulated from our model and on datasets simulated from a different block model, and compared its performance with that of other methods. These included simple methods based on Fisher's exact test applied site by site, as well as more complex methods built into the recent Zoom-Focus Algorithm. Across all simulations, our method consistently outperformed the comparison methods. CONCLUSIONS Given its demonstrated better performance, we expect our algorithm for detecting influential variant sites to help find more accurate signals across a wide range of case-control GWAS.
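To make block-wise labeling concrete, the following is a minimal sketch that decodes a three-state hidden Markov model (negative, no, or positive influence) with the Viterbi algorithm. It is not the authors' model. The state names, the sticky self-transition probability `stay`, and the toy binomial emission model are all hypothetical. The emission model assumes that the share of minor-allele copies carried by cases is 0.3, 0.5 or 0.7 in a design with equal numbers of cases and controls. The method described above also infers the number of blocks, which this sketch does not attempt.

```python
import math
from typing import List, Sequence

# Hypothetical three-state model (not the authors' parameterization).
STATES = ("negative", "none", "positive")
# Assumed probability that a minor-allele copy at a site is carried by a case,
# under each state, for a design with equal numbers of cases and controls.
CASE_SHARE = (0.3, 0.5, 0.7)


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def viterbi_blocks(case_counts: Sequence[int], total_counts: Sequence[int],
                   stay: float = 0.98) -> List[str]:
    """Most likely state per site; `stay` is an assumed self-transition probability."""
    if not case_counts:
        return []
    n_states = len(STATES)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]

    def emit(t: int, s: int) -> float:
        # Case minor-allele copies ~ Binomial(total copies, case share of state s).
        return log_binom_pmf(case_counts[t], total_counts[t], CASE_SHARE[s])

    # Uniform initial distribution over the three states.
    score = [math.log(1.0 / n_states) + emit(0, s) for s in range(n_states)]
    backpointers: List[List[int]] = []
    for t in range(1, len(case_counts)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at site t.
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + emit(t, s))
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]
```

For example, `viterbi_blocks([10] * 5 + [18, 19, 18] + [10] * 5, [20] * 13)` labels the three middle sites `positive` and the rest `none`. The high self-transition probability makes isolated state switches costly. As a result, neighboring sites are pulled into contiguous blocks instead of being judged one at a time.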
Affiliation(s)
- Jin Du
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Chaojie Wang
  - School of Mathematical Science, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Lijun Wang
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Shanjun Mao
  - College of Finance and Statistics, Hunan University, Changsha, Hunan Province, China
- Bencong Zhu
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Zheng Li
  - Department of Surgery, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Xiaodan Fan
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
2
Jin X, Shi G. Variance-component-based meta-analysis of gene-environment interactions for rare variants. G3-GENES GENOMES GENETICS 2021; 11:6298593. [PMID: 34544119 PMCID: PMC8661424 DOI: 10.1093/g3journal/jkab203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022]
Abstract
Complex diseases are often caused by the interplay between genetic and environmental factors. Existing gene-environment interaction (G × E) tests for rare variants largely focus on detecting gene-based G × E effects in a single study, so their statistical power is limited by that study's sample size. Meta-analysis methods that synthesize summary statistics of G × E effects for rare variants from multiple studies remain limited. Based on variance component models, we propose four meta-analysis methods for testing G × E effects for rare variants: HOM-INT-FIX, HET-INT-FIX, HOM-INT-RAN, and HET-INT-RAN. Our methods allow G × E effects to be homogeneous or heterogeneous across studies and treat the main genetic effect as either fixed or random. Through simulations, we show that the empirical distributions of the four meta-statistics under the null hypothesis agree with their expected theoretical distributions. When the interaction effect is homogeneous across studies, HOM-INT-FIX and HOM-INT-RAN are as powerful as a pooled analysis that applies a single interaction test to individual-level data from all studies. When the interaction effect is heterogeneous across studies, HET-INT-FIX and HET-INT-RAN have higher power than pooled analysis. We further validate our methods by testing 12 candidate gene-age interactions for blood pressure traits using whole-exome sequencing data from UK Biobank.
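Variance-component tests for rare variants are usually built from a weighted sum of squared per-variant score statistics. The following sketch, in the style of SKAT-type meta-analysis, contrasts two ways of combining scores. A homogeneous-effect statistic adds the scores across studies and then squares them. A heterogeneous-effect statistic squares the scores within each study and then adds them. It is an illustration under assumptions, not the HOM-INT or HET-INT statistics themselves. The function names are hypothetical, the residuals are taken as given from a null model assumed to already include the main genetic effects, and the weights are assumed. P-values, which require the null distribution of a weighted sum of chi-square variables, are not computed.

```python
from typing import List, Sequence


def gxe_scores(genotypes: Sequence[Sequence[float]], env: Sequence[float],
               residuals: Sequence[float]) -> List[float]:
    """Per-variant G x E score S_j = sum_i G_ij * E_i * r_i for one study.

    genotypes: one row per individual, one column per rare variant.
    residuals: outcome minus fitted values from a null model that, by
    assumption, already includes covariates, E and the main genetic effects.
    """
    n_variants = len(genotypes[0])
    return [sum(g[j] * e * r for g, e, r in zip(genotypes, env, residuals))
            for j in range(n_variants)]


def q_homogeneous(study_scores: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> float:
    """Shared effects: add each variant's score across studies, then square."""
    pooled = [sum(s[j] for s in study_scores) for j in range(len(weights))]
    return sum((w * s) ** 2 for w, s in zip(weights, pooled))


def q_heterogeneous(study_scores: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> float:
    """Study-specific effects: square each score within its study, then add."""
    return sum((w * s[j]) ** 2
               for s in study_scores
               for j, w in enumerate(weights))
```

With the hypothetical per-study scores `[1, 2]` and `[3, -1]` and unit weights, `q_homogeneous` returns 17 and `q_heterogeneous` returns 15. Opposite-signed scores for the same variant (2 and -1 here) partly cancel in the homogeneous version but not in the heterogeneous one.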
Affiliation(s)
- Xiaoqin Jin
  - State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China
- Gang Shi
  - State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China
3
Sun R, Weng H, Wang MH. W-Test for Genetic Epistasis Testing. Methods Mol Biol 2021; 2212:45-53. [PMID: 33733349 DOI: 10.1007/978-1-0716-0947-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Genetic epistasis has been widely acknowledged as an essential contributor to genetic variation in complex diseases. In this chapter, we introduce a powerful and efficient statistical method, the W-test, for genetic epistasis testing. The wtest R package implements the W-test and provides functions to measure main effects, pairwise interactions, higher-order interactions, and cis-regulation of SNP-CpG pairs in genetic and epigenetic data. It supports flexible stagewise and exhaustive association testing, as well as diagnostic checking of the probability distributions, through a user-friendly interface. The wtest package is available on CRAN at https://CRAN.R-project.org/package=wtest.
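Pairwise epistasis tests commonly work on case-control counts across the nine genotype combinations of two SNPs. The following sketch builds that table. As a stand-in statistic, it computes an ordinary Pearson chi-square over the non-empty cells. This is for illustration only. The W-test uses its own statistic and estimated distribution parameters, which are not reproduced here. The function names are hypothetical and are not part of the wtest package.

```python
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, int]


def genotype_pair_table(snp_a: Sequence[int], snp_b: Sequence[int],
                        is_case: Sequence[bool]) -> Dict[Cell, Tuple[int, int]]:
    """(cases, controls) in each of the 9 genotype combinations.

    Genotypes are coded as 0, 1 or 2 copies of the minor allele.
    """
    counts = {(ga, gb): [0, 0] for ga in range(3) for gb in range(3)}
    for ga, gb, case in zip(snp_a, snp_b, is_case):
        counts[(ga, gb)][0 if case else 1] += 1
    return {cell: (c[0], c[1]) for cell, c in counts.items()}


def pearson_chi2(table: Dict[Cell, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square over the non-empty cells, with its degrees of freedom."""
    cells = [v for v in table.values() if sum(v) > 0]
    n_case = sum(v[0] for v in cells)
    n_ctrl = sum(v[1] for v in cells)
    n = n_case + n_ctrl
    stat = 0.0
    for cases, controls in cells:
        total = cases + controls
        for observed, margin in ((cases, n_case), (controls, n_ctrl)):
            expected = total * margin / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    # A k x 2 table has (k - 1) * (2 - 1) degrees of freedom.
    return stat, len(cells) - 1
```

Sparse cells are common for rare genotype combinations. This is one reason dedicated methods adjust the reference distribution instead of relying on the textbook chi-square approximation.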
Affiliation(s)
- Rui Sun
  - The Chinese University of Hong Kong, Hong Kong, China
- Haoyi Weng
  - The Chinese University of Hong Kong, Hong Kong, China
4
Sun R, Xia X, Chong KC, Zee BCY, Wu WKK, Wang MH. wtest: an integrated R package for genetic epistasis testing. BMC Med Genomics 2019; 12:180. [PMID: 31874630 PMCID: PMC6929460 DOI: 10.1186/s12920-019-0638-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/20/2019] [Accepted: 11/26/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND With the increasing amount of high-throughput genomic sequencing data, there is a growing demand for a robust and flexible tool to perform interaction analysis. Identifying SNP-SNP, SNP-CpG, and higher-order interactions helps explain the genetic etiology of human diseases, yet genome-wide interaction analysis has been very challenging because of the computational burden and the lack of statistical power in most datasets. RESULTS The wtest R package performs association testing for main effects and for pairwise and higher-order interactions in genome-wide association study data, and tests cis-regulation of SNP and CpG sites in genome-wide and epigenome-wide data. The software includes a number of post-test diagnostic and analysis functions and offers an integrated toolset for genetic epistasis testing. CONCLUSIONS wtest is an efficient and powerful statistical tool for integrated genetic epistasis testing. The package is available on CRAN: https://CRAN.R-project.org/package=wtest.
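Stagewise testing eases the computational burden described above by screening SNPs on their main effects before testing interactions. The following sketch lists the pairs that remain after screening and compares their number with exhaustive testing. The screening threshold, SNP names and p-values are hypothetical. The sketch does not call the wtest package, and its stagewise functions and arguments are not reproduced here.

```python
from itertools import combinations
from math import comb
from typing import Dict, List, Tuple


def stagewise_pairs(main_pvalues: Dict[str, float],
                    threshold: float) -> List[Tuple[str, str]]:
    """Stage 1: keep SNPs whose main-effect p-value is below `threshold`.
    Stage 2: return every pair of surviving SNPs for interaction testing."""
    survivors = sorted(snp for snp, p in main_pvalues.items() if p < threshold)
    return list(combinations(survivors, 2))


def n_exhaustive_pairs(n_snps: int) -> int:
    """Number of pairwise tests if every SNP pair is tested."""
    return comb(n_snps, 2)
```

For 500,000 SNPs, exhaustive testing means `n_exhaustive_pairs(500_000)`, or 124,999,750,000 pairs. If 1,000 SNPs survive screening, only 499,500 pairs remain. The trade-off is that main-effect screening can miss interacting SNPs that have little marginal effect.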
Affiliation(s)
- Rui Sun
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics (CCRB), JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
  - Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
- Xiaoxuan Xia
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics (CCRB), JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
  - Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
- Ka Chun Chong
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics (CCRB), JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
  - Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
- Benny Chung-Ying Zee
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics (CCRB), JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
  - Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
- William Ka Kei Wu
  - Institute of Digestive Diseases and Department of Medicine & Therapeutics, State Key Laboratory of Digestive Diseases, LKS Institute of Health Sciences, CUHK Shenzhen Research Institute, Shenzhen, China
  - Department of Anesthesia, the Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
- Maggie Haitian Wang
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics (CCRB), JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
  - Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
5
Wang MH, Weng H, Sun R, Lee J, Wu WKK, Chong KC, Zee BCY. A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests. Bioinformatics 2017; 33:2330-2336. [PMID: 28334355 DOI: 10.1093/bioinformatics/btx130] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/26/2016] [Accepted: 03/09/2017] [Indexed: 01/24/2023]
Abstract
Motivation Increasing amounts of whole-exome and whole-genome sequencing data present the challenge of analysing rare variants with extremely small minor allele frequencies. Various statistical tests have been proposed that increase power for rare variants by conducting the test within a bin, such as a gene or a pathway. However, a gene may contain anywhere from several to thousands of markers, and not all of them are related to the phenotype. Combining functional and non-functional variants in an arbitrary genomic region can reduce testing power. Results We propose a Zoom-Focus algorithm (ZFA) to locate the optimal testing region within a given genomic region. It can be applied as a wrapper around existing rare variant association tests to increase their power. The algorithm consists of two steps. In the first step, Zooming, the given genomic region is partitioned by powers of two, and the best partition is located. In the second step, Focusing, the boundaries of the zoomed region are refined. Simulation studies showed that ZFA substantially increased the statistical power of rare-variant tests, including SKAT, SKAT-O, the burden test and the W-test. When applied to real exome sequencing data on hypertensive disorder, the algorithm identified genetic markers biologically relevant to metabolic disorders that were undetectable by a gene-based method. The proposed algorithm is an efficient and powerful tool for enhancing the power of association studies of whole-exome or whole-genome sequencing data. Availability and Implementation The ZFA software is available at: http://www2.ccrb.cuhk.edu.hk/statgene/software.html. Contact maggiew@cuhk.edu.hk or bzee@cuhk.edu.hk. Supplementary information Supplementary data are available at Bioinformatics online.
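The two ZFA steps can be sketched as follows. This sketch assumes that Zooming splits the region into 2, 4, 8, ... equal-width bins and keeps the best one. It also assumes that Focusing is a greedy search that moves either boundary by one marker while the score improves. The scoring function here is a hypothetical stand-in: a standardized sum of per-marker signals. In practice, the score would come from a rare-variant test such as SKAT or the burden test. The published ZFA software may differ in detail.

```python
import math
from typing import Callable, Tuple

# score(lo, hi) rates the half-open marker window [lo, hi); higher is better.
Score = Callable[[int, int], float]
Window = Tuple[int, int, float]


def zoom(n: int, score: Score) -> Window:
    """Split [0, n) into 2, 4, 8, ... equal bins; keep the best bin seen."""
    best = (0, n, score(0, n))
    bins = 2
    while bins <= n:
        # Integer edges; every bin holds at least one marker because bins <= n.
        edges = [i * n // bins for i in range(bins + 1)]
        for lo, hi in zip(edges, edges[1:]):
            s = score(lo, hi)
            if s > best[2]:
                best = (lo, hi, s)
        bins *= 2
    return best


def focus(lo: int, hi: int, n: int, score: Score) -> Window:
    """Greedily move either boundary by one marker while the score improves."""
    best = score(lo, hi)
    while True:
        moves = [(lo - 1, hi), (lo + 1, hi), (lo, hi - 1), (lo, hi + 1)]
        scored = [(score(a, b), a, b) for a, b in moves if 0 <= a < b <= n]
        if not scored:
            return lo, hi, best
        top, a, b = max(scored)
        if top <= best:
            return lo, hi, best
        best, lo, hi = top, a, b


def zoom_focus(n: int, score: Score) -> Window:
    lo, hi, _ = zoom(n, score)
    return focus(lo, hi, n, score)


if __name__ == "__main__":
    # Toy data: markers 6-9 carry signal; the score is a hypothetical stand-in.
    signal = [0] * 6 + [1] * 4 + [0] * 6

    def toy_score(lo: int, hi: int) -> float:
        return sum(signal[lo:hi]) / math.sqrt(hi - lo)

    print(zoom_focus(len(signal), toy_score))  # prints (6, 10, 2.0)
```

In the toy example, Zooming stops at the two-marker bin [6, 8) because the dyadic grid cannot align with the signal. Focusing then widens the right boundary to recover the full window [6, 10). This division of labor matches the abstract's motivation for the second step.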
Affiliation(s)
- Maggie Haitian Wang
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Haoyi Weng
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Rui Sun
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Jack Lee
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
  - CUHK Shenzhen Research Institute, Shenzhen, China
- William Ka Kei Wu
  - Department of Anaesthesia and Intensive Care, The Chinese University of Hong Kong, Hong Kong SAR
- Ka Chun Chong
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Benny Chung-Ying Zee
  - Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
  - CUHK Shenzhen Research Institute, Shenzhen, China
6
Russo A, Di Gaetano C, Cugliari G, Matullo G. Advances in the Genetics of Hypertension: The Effect of Rare Variants. Int J Mol Sci 2018; 19:E688. [PMID: 29495593 PMCID: PMC5877549 DOI: 10.3390/ijms19030688] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/31/2018] [Revised: 02/19/2018] [Accepted: 02/26/2018] [Indexed: 12/22/2022]
Abstract
Worldwide, hypertension remains a serious health burden, with nine million people dying as a consequence of hypertension-related complications. Essential hypertension is a complex trait underpinned by multifactorial genetic inheritance together with environmental factors. The heritability of blood pressure (BP) is estimated at 30-50%. Considerable effort has gone into finding genetic variants that affect BP levels through Genome-Wide Association Studies (GWAS). This approach relies on the "common disease-common variant" hypothesis and has identified multiple genetic variants that together explain only 2-3% of the genetic variance of hypertension. Part of the missing genetic information could be due to variants too rare to be detected by GWAS. Exome chips and Next-Generation Sequencing have facilitated the discovery of causative variants. Here, we report advances in the detection of novel rare variants, genes, and/or pathways through the most promising approaches, as well as the recent statistical tests that have emerged to handle rare variants. We also discuss the need to further support novel rare variants with replication studies within larger consortia and with deeper functional studies. Such studies would help clarify how new genes might improve patient care and the stratification of responses to antihypertensive treatments.
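To illustrate how a rare-variant test can look in practice, the following sketch implements a CAST-style collapsing (burden) test. Carriers of any rare minor allele in a region are compared between cases and controls with a two-sided Fisher's exact test. CAST is a classic approach chosen here as an assumption. It is not necessarily one of the tests this review covers, and the function names are hypothetical. Applied to one site at a time, the same Fisher's exact test function gives the site-by-site baseline described in entry 1.

```python
from math import comb
from typing import Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum over all tables with the same margins that are no more likely than
    # the observed one; the small tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def cast_test(genotypes: Sequence[Sequence[int]], is_case: Sequence[bool]) -> float:
    """CAST-style collapsing: a person is a carrier if they have at least one
    minor allele at any rare variant in the region."""
    carrier = [any(g > 0 for g in person) for person in genotypes]
    pairs = list(zip(carrier, is_case))
    case_carrier = sum(1 for c, y in pairs if c and y)
    case_non = sum(1 for c, y in pairs if not c and y)
    ctrl_carrier = sum(1 for c, y in pairs if c and not y)
    ctrl_non = sum(1 for c, y in pairs if not c and not y)
    # Rows: cases, controls. Columns: carriers, non-carriers.
    return fisher_exact_two_sided(case_carrier, case_non, ctrl_carrier, ctrl_non)
```

With three carriers among four cases and one carrier among four controls, the test returns 34/70 (about 0.486). Collapsing pools the carriers, which gives power when most rare variants in the region act in the same direction. The pooling loses power when the effects are mixed.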
Affiliation(s)
- Alessia Russo
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy
- Cornelia Di Gaetano
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy
- Giovanni Cugliari
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy
- Giuseppe Matullo
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy