1
Wang K, Alberding SY. Powerful Test of Heterogeneity in Two-Sample Summary-Data Mendelian Randomization. Stat Med 2024; 43:5791-5802. [PMID: 39552275 PMCID: PMC11639658 DOI: 10.1002/sim.10279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 07/30/2024] [Accepted: 10/26/2024] [Indexed: 11/19/2024]
Abstract
BACKGROUND The success of a Mendelian randomization (MR) study critically depends on the validity of the assumptions underlying MR. We focus on detecting heterogeneity (also known as horizontal pleiotropy) in two-sample summary-data MR. A popular approach is to apply Cochran's $Q$ statistic method, developed for meta-analysis. However, Cochran's $Q$ statistic, including its modifications, is known to lack power when its degrees of freedom are large. Furthermore, there is no theoretical justification for the claimed null distribution of the minimum of the modified Cochran's $Q$ statistic with exact weighting ($Q_{\mathrm{min}}$), although it seems to perform well in simulation studies. METHOD The principle of our proposed method is straightforward: if a set of variables are valid instruments, then any linear combination of these variables is still a valid instrument. Specifically, this principle holds when these linear combinations are formed using eigenvectors derived from a variance matrix. Each linear combination follows a known normal distribution from which a $p$ value can be calculated. We use the minimum $p$ value for these eigenvector-based linear combinations as the test statistic. Additionally, we explore a modification of the modified Cochran's $Q$ statistic by replacing the weighting matrix with a truncated singular value decomposition. RESULTS Extensive simulation studies reveal that the proposed methods outperform Cochran's $Q$ statistic, including those with modified weights, and MR-PRESSO, another popular method for detecting heterogeneity, in cases where the number of instruments is not large or the Wald ratios take two values. We also demonstrate these methods using empirical examples. Furthermore, we show that $Q_{\mathrm{min}}$ does not follow, but is dominated by, the claimed null chi-square distribution. The proposed methods are implemented in an R package iGasso.
CONCLUSIONS Dimension reduction techniques are useful for generating powerful tests of heterogeneity in MR.
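For orientation, the baseline that this abstract compares against can be sketched in a few lines. The snippet below computes the classical Cochran's $Q$ statistic with inverse-variance (first-order) weights from per-instrument Wald ratios; it is not the authors' eigenvector-based test and not the iGasso API. The function name `cochran_q` and the example numbers are hypothetical.

```python
def cochran_q(estimates, std_errors):
    """Classical Cochran's Q for per-instrument estimates.

    Returns (Q, degrees_of_freedom, pooled_estimate), where the pooled
    estimate is the inverse-variance weighted mean. Under homogeneity,
    Q is conventionally referred to a chi-square distribution with
    len(estimates) - 1 degrees of freedom.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    # Inverse-variance weights (an assumption; modified weights exist).
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return q, len(estimates) - 1, pooled


# Hypothetical Wald ratios: three instruments agree, one is an outlier.
ratios = [0.10, 0.12, 0.08, 0.45]
ses = [0.05, 0.05, 0.05, 0.05]
q, df, pooled = cochran_q(ratios, ses)
```

A large `q` relative to `df` suggests heterogeneity; the abstract's point is that this comparison loses power as `df` grows.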
Affiliation(s)
- Kai Wang
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
2
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023]
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanisms. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
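To make the collapsing idea concrete, here is a minimal sketch of the simplest class named above, a burden test. It sums (optionally weighted) minor-allele counts across the rare variants in a region to get one score per individual, then compares mean scores between cases and controls with a two-sample z statistic. The choice of z statistic, the function names, and the toy genotypes are assumptions for illustration; the review does not prescribe this particular form.

```python
import math
import statistics


def burden_scores(genotypes, weights=None):
    """Collapse each individual's rare-variant genotypes into one score.

    genotypes: list of rows, one per individual, each a list of
    minor-allele counts (0, 1 or 2) for the variants in the region.
    weights: optional per-variant weights (defaults to 1 for each variant).
    """
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def burden_z(case_genotypes, control_genotypes, weights=None):
    """Two-sample z statistic comparing mean burden in cases vs controls."""
    cases = burden_scores(case_genotypes, weights)
    controls = burden_scores(control_genotypes, weights)
    se = math.sqrt(statistics.variance(cases) / len(cases)
                   + statistics.variance(controls) / len(controls))
    if se == 0:
        raise ValueError("no variation in burden scores")
    return (statistics.mean(cases) - statistics.mean(controls)) / se


# Hypothetical toy data: 4 cases and 4 controls, 3 rare variants each.
cases = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
controls = [[0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
z = burden_z(cases, controls)
```

Because every variant contributes in the same direction, a score like this loses power when protective and deleterious variants are mixed, which is the usual motivation for the variance-component and omnibus classes.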
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
3
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023]
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole exome/genome sequencing association studies; it is able to handle a wide range of association models and provides researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
4
Knutson KA, Pan W. MATS: a novel multi-ancestry transcriptome-wide association study to account for heterogeneity in the effects of cis-regulated gene expression on complex traits. Hum Mol Genet 2023; 32:1237-1251. [PMID: 36179104 PMCID: PMC10077507 DOI: 10.1093/hmg/ddac247] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2022] [Revised: 09/16/2022] [Accepted: 09/28/2022] [Indexed: 01/16/2023]
Abstract
The Transcriptome-Wide Association Study (TWAS) is a widely used approach which integrates gene expression and Genome Wide Association Study (GWAS) data to study the role of cis-regulated gene expression (GEx) in complex traits. However, the genetic architecture of GEx varies across populations, and recent findings point to possible ancestral heterogeneity in the effects of GEx on complex traits, which may be amplified in TWAS by modeling GEx as a function of cis-eQTLs. Here, we present a novel extension to TWAS to account for heterogeneity in the effects of cis-regulated GEx which are correlated with ancestry. Our proposed Multi-Ancestry TwaS (MATS) framework jointly analyzes samples from multiple populations and distinguishes between shared, ancestry-specific and/or subject-specific expression-trait associations. As such, MATS amplifies power to detect shared GEx associations over ancestry-stratified TWAS through increased sample sizes, and facilitates the detection of genes with subgroup-specific associations which may be masked by standard TWAS. Our simulations highlight the improved Type-I error conservation and power of MATS compared with competing approaches. Our real data applications to Alzheimer's disease (AD) case-control genotypes from the Alzheimer's Disease Sequencing Project (ADSP) and continuous phenotypes from the UK Biobank (UKBB) identify a number of unique gene-trait associations which were not discovered through standard and/or ancestry-stratified TWAS. Ultimately, these findings promote MATS as a powerful method for detecting and estimating significant gene expression effects on complex traits within multi-ancestry cohorts and corroborate the mounting evidence for inter-population heterogeneity in gene-trait associations.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
5
Li B, Jin B, Capra JA, Bush WS. Integration of Protein Structure and Population-Scale DNA Sequence Data for Disease Gene Discovery and Variant Interpretation. Annu Rev Biomed Data Sci 2022; 5:141-161. [PMID: 35508071 DOI: 10.1146/annurev-biodatasci-122220-112147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Bian Li
- Department of Biological Sciences and Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA
- Bowen Jin
- Graduate Program in Systems Biology and Bioinformatics, Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- William S Bush
- Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
6
Wu C, Xu G, Shen X, Pan W. A Regularization-Based Adaptive Test for High-Dimensional Generalized Linear Models. J Mach Learn Res 2020; 21:128. [PMID: 32802002 PMCID: PMC7425805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates under high-dimensional nuisance parameter situations. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, called truncated Lasso penalty (TLP), which can maintain correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Via simulations, its superior finite-sample performance is demonstrated over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer's disease. We have also made the R package "aispu", which implements the proposed test, available on GitHub.
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, FL, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, MI, USA
- Xiaotong Shen
- School of Statistics, University of Minnesota, MN, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, MN, USA
7
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog Mol Biol Transl Sci 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and the future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States
8
Bi W, Li Y, Smeltzer MP, Gao G, Zhao S, Kang G. STEPS: an efficient prospective likelihood approach to genetic association analyses of secondary traits in extreme phenotype sequencing. Biostatistics 2020; 21:33-49. [PMID: 30007308 PMCID: PMC8559722 DOI: 10.1093/biostatistics/kxy030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/20/2017] [Revised: 05/16/2018] [Accepted: 06/02/2018] [Indexed: 11/13/2022]
Abstract
It has been well acknowledged that methods for secondary trait (ST) association analyses under a case-control design (ST$_{\text{CC}}$) should carefully consider the sampling process to avoid biased risk estimates. A similar situation also exists in extreme phenotype sequencing (EPS) designs, which select subjects with extreme values of a continuous primary phenotype for sequencing. EPS designs are commonly used in modern epidemiological and clinical studies such as the well-known National Heart, Lung, and Blood Institute Exome Sequencing Project. Although naïve generalized regression or the ST$_{\text{CC}}$ method could be applied, their validity is questionable due to differences in statistical designs. Herein, we propose a general prospective likelihood framework to perform association testing for binary and continuous STs under EPS designs (STEPS), which can also incorporate covariates and interaction terms. We provide a computationally efficient and robust algorithm to obtain the maximum likelihood estimates. We also present two empirical mathematical formulas for power/sample size calculations to facilitate planning of binary/continuous ST association analyses under EPS designs. Extensive simulations and application to a genome-wide association study of benign ethnic neutropenia under an EPS design demonstrate the superiority of STEPS over all its alternatives above.
Affiliation(s)
- Wenjian Bi
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Yun Li
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
- Matthew P Smeltzer
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, USA
- Guimin Gao
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Shengli Zhao
- School of Statistics, Qufu Normal University, Qufu 273165, PR China
- Guolian Kang
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
9
Yang T, Kim J, Wu C, Ma Y, Wei P, Pan W. An adaptive test for meta-analysis of rare variant association studies. Genet Epidemiol 2020; 44:104-116. [PMID: 31830326 PMCID: PMC6980317 DOI: 10.1002/gepi.22273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/15/2019] [Revised: 11/12/2019] [Accepted: 11/25/2019] [Indexed: 01/02/2023]
Abstract
Single genome-wide studies may be underpowered to detect trait-associated rare variants with moderate or weak effect sizes. As a viable alternative, meta-analysis is widely used to increase power by combining different studies. The power of meta-analysis critically depends on the underlying association patterns and heterogeneity levels, which are unknown and vary from locus to locus. However, existing methods mainly focus on one or only a few combinations of the association pattern and heterogeneity level, and thus may lose power in many situations. To address this issue, we propose a general and unified framework by combining a class of tests including and beyond some existing ones, leading to high power across a wide range of scenarios. We demonstrate that the proposed test is more powerful than some existing methods in simulation studies, and then show its performance with the NHLBI Exome-Sequencing Project (ESP) data. One gene (B4GALNT2) was found by our proposed test, but not by others, to be statistically significantly associated with plasma triglyceride. The signal was driven by African-ancestry subjects but it was previously reported to be associated with coronary artery disease among European-ancestry subjects. We implemented our method in an R package aSPUmeta, publicly available at https://github.com/ytzhong/metaRV and will be on CRAN soon.
Affiliation(s)
- Tianzhong Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Junghi Kim
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Yiding Ma
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
10
Chen Z, Wang K. Gene-based sequential burden association test. Stat Med 2019; 38:2353-2363. [PMID: 30706509 DOI: 10.1002/sim.8111] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/13/2018] [Revised: 11/29/2018] [Accepted: 01/10/2019] [Indexed: 11/10/2022]
Abstract
Detecting the association between a set of variants and a phenotype of interest is the first and an important step in genetic and genomic studies. Although this problem has attracted a large amount of attention in the scientific community and several related statistical approaches have been proposed in the literature, powerful and robust statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful and robust association test, which combines information from each individual single-nucleotide polymorphism based on sequential independent burden tests. We compare the proposed approach with some popular tests through a comprehensive simulation study and real data application. Our results show that, in general, the new test is more powerful; the gain in detecting power can be substantial in many situations, compared to other methods.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, Indiana
- Kai Wang
- Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa
11
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
Affiliation(s)
- Nicholas B Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Jun Chen
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
12
Wu Y, Zhang H, Liu X, Shi Z, Li H, Wang Z, Jie X, Huang S, Zhang F, Li J, Zhang K, Gao X. Mutations of ARX and non-syndromic intellectual disability in Chinese population. Genes Genomics 2018; 41:125-131. [PMID: 30255221 DOI: 10.1007/s13258-018-0745-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/05/2018] [Accepted: 09/15/2018] [Indexed: 02/08/2023]
Abstract
Mutations of the Aristaless-related homeobox (ARX) gene have been regarded as the third cause of non-syndromic intellectual disability (NSID), while the boundary between true disease-causing mutations and non-disease-causing variants within this gene remains elusive. To investigate the relationship between ARX mutations and NSID, a panel comprising six reported causal mutations of ARX was screened in 369 sporadic NSID patients and 550 random participants from a Chinese population. Two mutations, c.428_451 dup and p.G286S, may be disease-causing mutations for NSID, while p.Q163R and p.P353L showed great predictive value in female NSID diagnosis, with significant associations (χ² = 19.60, p = 9.54e-6 for p.Q163R; χ² = 25.70, p = 4.00e-07 for p.P353L); carriers of these mutations had a more than fourfold increased risk of NSID. Detection of this panel also indicated significant associations between genetic variants of the ARX gene and NSID (p = 3.73e-4). The present study emphasizes the higher genetic burden of the ARX gene on NSID in the Chinese population; molecular analysis of this gene should be considered for patients presenting with NSID of unknown etiology.
Affiliation(s)
- Yufei Wu
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- Huan Zhang
- The 2nd Affiliated Hospital, Xi'an Jiaotong University, Xi'an, 710004, China
- Xiaofen Liu
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- Zhangyan Shi
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- Hongling Li
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- Zhibin Wang
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- Xiaoyong Jie
- Xi'an Cangning Psychiatric Hospital, Xi'an, 710114, China
- Shaoping Huang
- The 2nd Affiliated Hospital, Xi'an Jiaotong University, Xi'an, 710004, China
- Fuchang Zhang
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- College of Public Management, Institute of Application Psychology, Northwest University, Xi'an, 710127, China
- Junlin Li
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- Kejin Zhang
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- Xiaocai Gao
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Institute of Population and Health, Northwest University, Xi'an, 710069, China
- College of Public Management, Institute of Application Psychology, Northwest University, Xi'an, 710127, China
13
Chen Z, Liu Q, Wang K. A novel gene-set association test based on variance-gamma distribution. Stat Methods Med Res 2018; 28:2868-2875. [PMID: 30056781 DOI: 10.1177/0962280218791205] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Several gene- or set-based association tests have been proposed recently in the literature. Powerful statistical approaches are still highly desirable in this area. In this paper we propose a novel statistical association test, which uses information of the burden component and its complement from the genotypes. This new test statistic has a simple null distribution, which is a special and simplified variance-gamma distribution, and its p-value can be easily calculated. Through a comprehensive simulation study, we show that the new test can control type I error rate and has superior detecting power compared with some popular existing methods. We also apply the new approach to a real data set; the results demonstrate that this test is promising.
Affiliation(s)
- Zhongxue Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN, USA
- Qingzhong Liu
  - Department of Computer Science, Sam Houston State University, Huntsville, Texas 77341, USA
- Kai Wang
  - Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
|
14
|
Chen Z, Liu Q, Wang K. A genetic association test through combining two independent tests. Genomics 2018; 111:1152-1159. [PMID: 30009923 DOI: 10.1016/j.ygeno.2018.07.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/11/2018] [Revised: 06/25/2018] [Accepted: 07/11/2018] [Indexed: 12/21/2022]
Abstract
Gene- and pathway-based variant association tests are important tools in finding genetic variants that are associated with phenotypes of interest. Although some methods have been proposed in the literature, powerful and robust statistical tests are still desirable in this area. In this study, we propose a statistical test based on decomposing the genotype data into orthogonal parts from which powerful and robust independent p-value combination approaches can be utilized. Through a comprehensive simulation study, we compare the proposed test with some existing popular ones. Our simulation results show that the new test has great performance in terms of controlling type I error rate and statistical power. Real data applications are also conducted to illustrate the performance and usefulness of the proposed test.
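The abstract above does not say which p-value combination rule the authors use. As a generic illustration of how independent p-values can be combined, the following minimal sketch implements Fisher's method; the function name `fisher_combine` and the example values are hypothetical and are not taken from the paper.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(log p_i) follows a chi-square
    distribution with 2k degrees of freedom. For even degrees of freedom
    the survival function has the closed form
        exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # Accumulate the finite series term by term to avoid factorial overflow.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Two made-up independent p-values, e.g. from two orthogonal components.
combined = fisher_combine([0.05, 0.05])
```

The combination is valid only when the component tests are independent under the null, which is why an orthogonal decomposition of the data is a natural first step.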
Affiliation(s)
- Zhongxue Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th Street, Bloomington, IN 47405, USA
- Qingzhong Liu
  - Department of Computer Science, Sam Houston State University, 1803 Avenue I, Huntsville, TX 77341, USA
- Kai Wang
  - Department of Biostatistics, College of Public Health, University of Iowa, 145 N. Riverside Drive, Iowa City, IA 52242, USA
|
15
|
Russo A, Di Gaetano C, Cugliari G, Matullo G. Advances in the Genetics of Hypertension: The Effect of Rare Variants. Int J Mol Sci 2018; 19:E688. [PMID: 29495593 PMCID: PMC5877549 DOI: 10.3390/ijms19030688] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/31/2018] [Revised: 02/19/2018] [Accepted: 02/26/2018] [Indexed: 12/22/2022] Open
Abstract
Worldwide, hypertension still represents a serious health burden with nine million people dying as a consequence of hypertension-related complications. Essential hypertension is a complex trait supported by multifactorial genetic inheritance together with environmental factors. The heritability of blood pressure (BP) is estimated to be 30-50%. A great effort was made to find genetic variants affecting BP levels through Genome-Wide Association Studies (GWAS). This approach relies on the "common disease-common variant" hypothesis and led to the identification of multiple genetic variants which explain, in aggregate, only 2-3% of the genetic variance of hypertension. Part of the missing genetic information could be caused by variants too rare to be detected by GWAS. The use of exome chips and Next-Generation Sequencing facilitated the discovery of causative variants. Here, we report the advances in the detection of novel rare variants, genes, and/or pathways through the most promising approaches, and the recent statistical tests that have emerged to handle rare variants. We also discuss the need to further support rare novel variants with replication studies within larger consortia and with deeper functional studies to better understand how new genes might improve patient care and the stratification of the response to antihypertensive treatments.
Affiliation(s)
- Alessia Russo
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Cornelia Di Gaetano
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Giovanni Cugliari
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Giuseppe Matullo
  - Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
  - Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
|
16
|
Chen Z, Lu Y, Lin T, Liu Q, Wang K. Gene-based genetic association test with adaptive optimal weights. Genet Epidemiol 2017; 42:95-103. [DOI: 10.1002/gepi.22098] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/19/2017] [Accepted: 10/22/2017] [Indexed: 12/13/2022]
Affiliation(s)
- Zhongxue Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, Indiana, United States of America
- Yan Lu
  - Department of Mathematics and Statistics, University of New Mexico, Albuquerque, New Mexico, United States of America
- Tong Lin
  - The Key Laboratory of Machine Perception (Ministry of Education), School of EECS, Peking University, Beijing, China
- Qingzhong Liu
  - Department of Computer Science, Sam Houston State University, Huntsville, Texas, United States of America
- Kai Wang
  - Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa, United States of America
|
17
|
RL-SKAT: An Exact and Efficient Score Test for Heritability and Set Tests. Genetics 2017; 207:1275-1283. [PMID: 29025915 DOI: 10.1534/genetics.117.300395] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/07/2017] [Accepted: 09/24/2017] [Indexed: 11/18/2022] Open
Abstract
Testing for the existence of variance components in linear mixed models is a fundamental task in many applicative fields. In statistical genetics, the score test has recently become instrumental in the task of testing an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, with a correction being computationally expensive. This may cause severe inflation or deflation of P-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show it may occur also in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n = 13,950) study, and, in particular, when the individuals in the sample are unrelated. In these cases, the SKAT approximation tends to be highly overconservative and therefore underpowered. To address this limitation, we suggest an efficient method to calculate exact P-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available in http://github.com/cozygene/RL-SKAT.
|
18
|
A Powerful Variant-Set Association Test Based on Chi-Square Distribution. Genetics 2017; 207:903-910. [PMID: 28912342 DOI: 10.1534/genetics.117.300287] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/04/2017] [Accepted: 09/10/2017] [Indexed: 01/19/2023] Open
Abstract
Detecting the association between a set of variants and a given phenotype has attracted a large amount of attention in the scientific community, although it is a difficult task. Recently, several related statistical approaches have been proposed in the literature; powerful statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful test that combines information from each individual single nucleotide polymorphism (SNP) based on principal component analysis without relying on the eigenvalues associated with the principal components. We compare the proposed approach with some popular tests through a simulation study and real data applications. Our results show that, in general, the new test is more powerful than its competitors considered in this study; the gain in detecting power can be substantial in many situations.
|
19
|
A Powerful Framework for Integrating eQTL and GWAS Summary Data. Genetics 2017; 207:893-902. [PMID: 28893853 DOI: 10.1534/genetics.117.300270] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 08/02/2017] [Accepted: 09/05/2017] [Indexed: 01/26/2023] Open
Abstract
Two new gene-based association analysis methods, called PrediXcan and TWAS for GWAS individual-level and summary data, respectively, were recently proposed to integrate GWAS with eQTL data, alleviating two common problems in GWAS by boosting statistical power and facilitating biological interpretation of GWAS discoveries. Based on a novel reformulation of PrediXcan and TWAS, we propose a more powerful gene-based association test to integrate single set or multiple sets of eQTL data with GWAS individual-level data or summary statistics. The proposed test was applied to several GWAS datasets, including two lipid summary association datasets based on [Formula: see text] and [Formula: see text] samples, respectively, and uncovered more known or novel trait-associated genes, showcasing much improved performance of our proposed method. The software implementing the proposed method is freely available as an R package.
|
20
|
A gene-based test of association through an orthogonal decomposition of genotype scores. Hum Genet 2017; 136:1385-1394. [PMID: 28864915 DOI: 10.1007/s00439-017-1839-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/30/2017] [Accepted: 08/26/2017] [Indexed: 10/18/2022]
Abstract
The burden test and the sequence kernel association test (SKAT) are two popular methods for detecting association with rare variants. Treated as two different sources of association information, they are adaptively combined to form an optimal SKAT (SKAT-O) method for optimal power. We show that the burden test is part of rather than independent of the SKAT. We introduce a new test statistic that is the sum of the burden statistic and a statistic asymptotically independent of the burden statistic. The performance of this new test statistic is demonstrated through extensive simulation studies and applications to a Genetic Analysis Workshop 17 data set and the Ocular Hypertension Treatment Study data.
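The claim that the burden test is part of the SKAT can be illustrated with a simple algebraic identity. The sketch below assumes equal variant weights and a vector of per-variant score statistics; it only shows that the sum of squared scores splits into a burden-type term plus a remainder orthogonal to it. The function name and the numbers are hypothetical, and the remainder here is not necessarily the statistic the authors construct.

```python
def split_skat_type_statistic(scores):
    """Split sum(s_j^2) into a burden-type part and an orthogonal remainder.

    With m variants and equal weights:
        sum_j s_j^2 = (sum_j s_j)^2 / m + sum_j (s_j - mean(s))^2
    The first term is the squared projection of the score vector onto the
    all-ones direction (a burden-type statistic up to scaling); the second
    is what is left after removing that direction.
    """
    m = len(scores)
    if m == 0:
        raise ValueError("need at least one score")
    total = sum(s * s for s in scores)
    score_sum = sum(scores)
    burden_part = score_sum ** 2 / m
    mean = score_sum / m
    remainder = sum((s - mean) ** 2 for s in scores)
    return total, burden_part, remainder


# Made-up per-variant score statistics for a single gene.
scores = [1.2, -0.4, 0.9, 2.1, -1.3]
total, burden_part, remainder = split_skat_type_statistic(scores)
assert abs(total - (burden_part + remainder)) < 1e-12
```

When all variants act in the same direction, most of the total sits in the burden part; when effects have mixed signs, the remainder dominates.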
|
21
|
Chen Z, Han S, Wang K. Genetic association test based on principal component analysis. Stat Appl Genet Mol Biol 2017; 16:189-198. [DOI: 10.1515/sagmb-2016-0061] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/14/2022]
Abstract
Many gene- and pathway-based association tests have been proposed in the literature. Among them, the SKAT is widely used, especially for rare variants association studies. In this paper, we investigate the connection between SKAT and a principal component analysis. This investigation leads to a procedure that encompasses SKAT as a special case. Through simulation studies and real data applications, we compare the proposed method with some existing tests.
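One well-known form of the link between SKAT and principal components can be checked numerically. The sketch below makes several simplifying assumptions that are not stated in the abstract: a linear kernel, equal variant weights, no covariates other than an intercept, and simulated data. Under those assumptions, the SKAT-type statistic equals the sum of squared inner products between the phenotype residuals and the genotype principal component scores. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5  # hypothetical sample size and number of variants

G = rng.binomial(2, 0.2, size=(n, m)).astype(float)  # simulated genotype counts
y = rng.normal(size=n)                               # simulated continuous phenotype

Gc = G - G.mean(axis=0)  # column-centered genotypes
r = y - y.mean()         # residuals under an intercept-only null model

# SKAT-type statistic with a linear kernel and equal weights: r' Gc Gc' r.
q_direct = float(r @ Gc @ Gc.T @ r)

# Same quantity through the eigendecomposition of Gc' Gc (PCA of the genotypes).
eigvals, eigvecs = np.linalg.eigh(Gc.T @ Gc)
pc_scores = Gc @ eigvecs  # one column of principal component scores per component
q_pca = float(np.sum((pc_scores.T @ r) ** 2))

assert np.isclose(q_direct, q_pca)
```

Because the squared length of each score column equals its eigenvalue, this statistic implicitly weights each component by its eigenvalue. A PCA-based procedure can choose different component weights, which is one way a broader family could contain the SKAT as a special case.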
|