1
Incorporating information from markers in LD with test locus for detecting imprinting and maternal effects. Eur J Hum Genet 2020; 28:1087-1097. [PMID: 32080366 DOI: 10.1038/s41431-020-0590-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/08/2019] [Revised: 11/26/2019] [Accepted: 02/04/2020] [Indexed: 11/08/2022] Open
Abstract
Numerous statistical methods have been developed to explore genomic imprinting and maternal effects by identifying parent-of-origin patterns in complex human diseases. However, because most of these methods only use available locus-specific genotype data, it is sometimes impossible for them to infer the distribution of parental origin of a variant allele, especially when some genotypes are missing. In this article, we propose a two-step approach, LIMEhap, to improve upon a recent partial likelihood inference method. In the first step, the distribution of the missing genotypes is inferred through the construction of haplotypes by using information from nearby loci. In the second step, a partial likelihood method is applied to the inferred data. To substantiate the validity of the proposed procedures, we simulated data in a genomic region of gene GPX1. The results show that, by borrowing genetic information from nearby loci, the power of the proposed method can be close to that with complete genotype data at the locus of interest. Since the inference on the genotype distribution is made under the assumption of Hardy-Weinberg Equilibrium (HWE), we further studied the robustness of LIMEhap to violation of HWE. Finally, we demonstrate the utility of LIMEhap by applying it to an autism dataset.
2
Wang L, Graubard BI, Li Y. A composite likelihood approach in testing for Hardy Weinberg Equilibrium using family-based genetic survey data. Stat Med 2016; 35:5040-5050. [PMID: 27481259 PMCID: PMC7210008 DOI: 10.1002/sim.7044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2016] [Revised: 06/20/2016] [Accepted: 06/23/2016] [Indexed: 11/09/2022]
Abstract
In population-based household surveys, for example, the National Health and Nutrition Examination Survey, households are often sampled by stratified multistage cluster sampling, and multiple individuals related by blood are often sampled within households. Therefore, genetic data collected from these population-based household surveys, called National Genetic Household Surveys, can be correlated because of two levels of correlation. One level of correlation is caused by the multistage geographical cluster sampling and the other is caused by biological inheritance among participants within the same sampled family. In this paper, we develop an efficient Hardy Weinberg Equilibrium (HWE) test utilizing pairwise composite likelihood methods that incorporate the sample weighting effect induced by the differential selection probabilities in complex sample designs, as well as the two-level clustering (correlation) effects described above. Monte Carlo simulation studies show that the proposed HWE test maintains the nominal levels, and is more powerful than existing methods (Li et al. 2011) under various (non)informative sample designs that depend on genotypes (explicitly or implicitly), family relationships or both, especially when within-household sampling depends on the genotypes. The developed tests are further evaluated using simulated genetic data based on the Hispanic Health and Nutrition Survey. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Lingxiao Wang
- Joint Program in Survey Methodology, University of Maryland, College Park, MD, 20742, U.S.A
- Barry I Graubard
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, 20850, U.S.A
- Yan Li
- Joint Program in Survey Methodology, University of Maryland, College Park, MD, 20742, U.S.A.
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, 20850, U.S.A.
3
Biswas S, Xia S, Lin S. Detecting rare haplotype-environment interaction with logistic Bayesian LASSO. Genet Epidemiol 2013; 38:31-41. [PMID: 24272913 DOI: 10.1002/gepi.21773] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/03/2013] [Revised: 09/13/2013] [Accepted: 10/15/2013] [Indexed: 11/09/2022]
Abstract
Two important contributors to missing heritability are believed to be rare variants and gene-environment interaction (GXE). Thus, detecting GXE where G is a rare haplotype variant (rHTV) is a pressing problem. Haplotype analysis is usually the natural second step to follow up on a genomic region that is implicated to be associated through single nucleotide variants (SNV) analysis. Further, rHTV can tag associated rare SNV and provide greater power to detect them than popular collapsing methods. Recently we proposed Logistic Bayesian LASSO (LBL) for detecting rHTV association with case-control data. LBL shrinks the unassociated (especially common) haplotypes toward zero so that an associated rHTV can be identified with greater power. Here, we incorporate environmental factors and their interactions with haplotypes in LBL. As LBL is based on retrospective likelihood, this extension is not trivial. We model the joint distribution of haplotypes and covariates given the case-control status. We apply the approach (LBL-GXE) to the Michigan, Mayo, AREDS, Pennsylvania Cohort Study on Age-related Macular Degeneration (AMD). LBL-GXE detects interaction of a specific rHTV in CFH gene with smoking. To the best of our knowledge, this is the first time in the AMD literature that an interaction of smoking with a specific (rather than pooled) rHTV has been implicated. We also carry out simulations and find that LBL-GXE has reasonably good powers for detecting interactions with rHTV while keeping the type I error rates well controlled. Thus, we conclude that LBL-GXE is a useful tool for uncovering missing heritability.
Affiliation(s)
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, United States of America
4
Li Y. A comparison of tests for Hardy-Weinberg Equilibrium in national genetic household surveys. BMC Genet 2013; 14:14. [PMID: 23448225 PMCID: PMC3606615 DOI: 10.1186/1471-2156-14-14] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/15/2012] [Accepted: 02/18/2013] [Indexed: 12/21/2022] Open
Abstract
Background: This study is motivated by National Household Surveys that collect genetic data, in which complex samples (e.g. stratified multistage cluster sample), partially from the same family, are selected. In addition to the differential selection probabilities of selecting households and persons within the sampled households, there are two levels of correlations of the collected genetic data in National Genetic Household Surveys (NGHS). The first level of correlation is induced by the hierarchical geographic clustered sampling of households and the second level of correlation is induced by biological inheritances from individuals sampled in the same household. Results: To test for Hardy-Weinberg Equilibrium (HWE) in NGHS, two test statistics, the CCS method [1] and the QS method [2], appear to be the only existing methods that take account of both correlations. In this paper, I evaluate both methods in terms of the test size and power under a variety of complex designs with different weighting schemes and varying magnitudes of the two correlation effects. Both methods are applied to a real data example from the Hispanic Health and Nutrition Examination Survey with simulated genotype data. Conclusions: The QS method maintains the nominal size well and consistently achieves higher power than the CCS method in testing HWE under a variety of sample designs, and therefore is recommended for testing HWE of genetic survey data with complex designs.
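For orientation, the null hypothesis that these survey-adjusted tests address can be illustrated with the textbook Pearson chi-square test for HWE at a diallelic locus. The sketch below is not the CCS, QS, or composite likelihood method; it assumes independent, equally weighted individuals, which is exactly the assumption that complex survey designs violate. The function name `hwe_chi_square` and its arguments are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Textbook Pearson chi-square statistic for HWE at a diallelic locus.

    Assumes a simple random sample of unrelated individuals (no sample
    weights, no household or geographic clustering).
    Returns (statistic, expected_counts).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequency of A estimated by gene counting.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# Genotype counts in exact HWE proportions give a statistic of zero.
stat, expected = hwe_chi_square(25, 50, 25)
print(stat, expected)
```

Under the stated assumptions the statistic is compared with a chi-square distribution with one degree of freedom. With weighted, clustered, or family-based samples this reference distribution is no longer appropriate, which is the gap the methods compared in this entry are designed to address.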
Affiliation(s)
- Yan Li
- Joint Program in Survey Methodology, University of Maryland at College Park, College Park, MD 20742, USA.
5
Lin D, Weinberg CR, Feng R, Hochner H, Chen J. A multi-locus likelihood method for assessing parent-of-origin effects using case-control mother-child pairs. Genet Epidemiol 2012. [PMID: 23184538 DOI: 10.1002/gepi.21700] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Parent-of-origin effects have been pointed out to be one plausible source of the heritability that was unexplained by genome-wide association studies. Here, we consider a case-control mother-child pair design for studying parent-of-origin effects of offspring genes on neonatal/early-life disorders or pregnancy-related conditions. In contrast to the standard case-control design, the case-control mother-child pair design contains valuable parental information and therefore permits powerful assessment of parent-of-origin effects. Suppose the region under study is in Hardy-Weinberg equilibrium, inheritance is Mendelian at the diallelic locus under study, there is random mating in the source population, and the SNP under study is not related to risk for the phenotype under study because of linkage disequilibrium (LD) with other SNPs. Using a maximum likelihood method that simultaneously assesses likely parental sources and estimates effect sizes of the two offspring genotypes, we investigate the extent of power increase for testing parent-of-origin effects through the incorporation of genotype data for adjacent markers that are in LD with the test locus. Our method does not need to assume the outcome is rare because it exploits supplementary information on phenotype prevalence. Analysis with simulated SNP data indicates that incorporating genotype data for adjacent markers greatly helps recover the parent-of-origin information. This recovery can sometimes substantially improve statistical power for detecting parent-of-origin effects. We demonstrate our method by examining parent-of-origin effects of the gene PPARGC1A on low birth weight using data from 636 mother-child pairs in the Jerusalem Perinatal Study.
Affiliation(s)
- Dongyu Lin
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
6
French B, Lumley T, Cappola TP, Mitra N. Non-iterative, regression-based estimation of haplotype associations with censored survival outcomes. Stat Appl Genet Mol Biol 2012; 11:Article 4. [PMID: 22499703 DOI: 10.1515/1544-6115.1764] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
The general availability of reliable and affordable genotyping technology has enabled genetic association studies to move beyond small case-control studies to large prospective studies. For prospective studies, genetic information can be integrated into the analysis via haplotypes, with focus on their association with a censored survival outcome. We develop non-iterative, regression-based methods to estimate associations between common haplotypes and a censored survival outcome in large cohort studies. Our non-iterative methods--weighted estimation and weighted haplotype combination--are both based on the Cox regression model, but differ in how the imputed haplotypes are integrated into the model. Our approaches enable haplotype imputation to be performed once as a simple data-processing step, and thus avoid implementation based on sophisticated algorithms that iterate between haplotype imputation and risk estimation. We show that non-iterative weighted estimation and weighted haplotype combination provide valid tests for genetic associations and reliable estimates of moderate associations between common haplotypes and a censored survival outcome, and are straightforward to implement in standard statistical software. We apply the methods to an analysis of HSPB7-CLCNKA haplotypes and risk of adverse outcomes in a prospective cohort study of outpatients with chronic heart failure.
7
Abstract
Genetic association studies often investigate the effect of haplotypes on an outcome of interest. Haplotypes are not observed directly, and this complicates the inclusion of such effects in survival models. We describe a new estimating equations approach for Cox's regression model to assess haplotype effects for survival data. These estimating equations are simple to implement and avoid the use of the EM algorithm, which may be slow in the context of the semiparametric Cox model with incomplete covariate information. These estimating equations also lead to easily computable, direct estimators of standard errors, and thus overcome some of the difficulty in obtaining variance estimators based on the EM algorithm in this setting. We also develop an easily implemented goodness-of-fit procedure for Cox's regression model including haplotype effects. Finally, we apply the procedures presented in this article to investigate possible haplotype effects of the PAF-receptor on cardiovascular events in patients with coronary artery disease, and compare our results to those based on the EM algorithm.
Affiliation(s)
- Thomas H Scheike
- Department of Biostatistics, University of Copenhagen, Copenhagen K, Denmark.
8
Li Y, Graubard BI. Testing Hardy-Weinberg Equilibrium and Homogeneity of Hardy-Weinberg Disequilibrium using Complex Survey Data. Biometrics 2009; 65:1096-104. [DOI: 10.1111/j.1541-0420.2009.01199.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
9
Sinha S, Gruber SB, Mukherjee B, Rennert G. Inference of the haplotype effect in a matched case-control study using unphased genotype data. Int J Biostat 2008; 4:Article 6. [PMID: 20231916 PMCID: PMC2835450 DOI: 10.2202/1557-4679.1079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Typically, locus-specific genotype data do not contain information regarding the gametic phase of haplotypes, especially when an individual is heterozygous at more than one locus among a large number of linked polymorphic loci. Thus, studying disease-haplotype association using unphased genotype data is essentially a problem of handling a missing covariate in a case-control design. There are several methods for estimating a disease-haplotype association parameter in a matched case-control study. Here we propose a conditional likelihood approach for inference regarding the disease-haplotype association using unphased genotype data arising from a matched case-control study design. The proposed method relies on a logistic disease risk model and a Hardy-Weinberg equilibrium (HWE) among the control population only. We develop an expectation and conditional maximization (ECM) algorithm for jointly estimating the haplotype frequency and the disease-haplotype association parameter(s). We apply the proposed method to analyze the data from the Alpha-Tocopherol, Beta-Carotene Cancer prevention study, and a matched case-control study of breast cancer patients conducted in Israel. The performance of the proposed method is evaluated via simulation studies.
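The phase ambiguity described in this abstract can be made concrete with a small enumeration. The following is a minimal sketch, not the authors' ECM algorithm: it only lists the haplotype pairs compatible with one unphased genotype, which is the set over which a likelihood-based method would have to average. The function name `compatible_haplotype_pairs` and the 0/1/2 minor-allele-count coding are assumptions made for illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of minor-allele counts (0, 1, or 2), one per SNP.
    Each haplotype is returned as a tuple of 0/1 alleles.
    """
    # Heterozygous sites are the only ones whose phase is unknown.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites fix the allele on both haplotypes (0 -> 0, 2 -> 1).
    base = [g // 2 for g in genotype]
    pairs = set()
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, alleles):
            h1[site] = allele
            h2[site] = 1 - allele
        # Sort within the pair so that (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# A double heterozygote is compatible with two different haplotype pairs.
print(compatible_haplotype_pairs((1, 1)))
```

An individual heterozygous at k >= 1 sites has 2^(k-1) compatible pairs under this coding, so the ambiguity grows quickly with the number of linked loci, whereas a genotype with at most one heterozygous site is phased unambiguously.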
Affiliation(s)
- Gad Rennert
- Carmel Medical Center; Technion-Israel Institute of Technology; CHS National Cancer Control Center,
10
Chen J, Rodriguez C. Conditional likelihood methods for haplotype-based association analysis using matched case-control data. Biometrics 2008; 63:1099-107. [PMID: 18078481 DOI: 10.1111/j.1541-0420.2007.00797.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.
Affiliation(s)
- Jinbo Chen
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Boulevard, Rockville, Maryland 20852, USA.