1
Lou XY, Hou TT, Liu SY, Xu HM, Lin F, Tang X, MacLeod SL, Cleves MA, Hobbs CA. Innovative approach to identify multigenomic and environmental interactions associated with birth defects in family-based hybrid designs. Genet Epidemiol 2021; 45:171-189. [PMID: 32996630 PMCID: PMC8495752 DOI: 10.1002/gepi.22363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2020] [Revised: 09/08/2020] [Accepted: 09/11/2020] [Indexed: 11/09/2022]
Abstract
Genes, including those with transgenerational effects, work in concert with behavioral, environmental, and social factors via complex biological networks to determine human health. Understanding complex relationships between causal factors underlying human health is an essential step towards deciphering biological mechanisms. We propose a new analytical framework to investigate the interactions between maternal and offspring genetic variants or their surrogate single nucleotide polymorphisms (SNPs) and environmental factors using a family-based hybrid study design. The proposed approach can analyze diverse genetic and environmental factors and accommodate samples from a variety of family units, including case/control-parental triads and case/control-parental dyads, while minimizing potential bias introduced by population admixture. Comprehensive simulations demonstrated that our innovative approach outperformed the log-linear approach, the best available method for case-control family data. The proposed approach had greater statistical power, estimated the maternal and child genetic effects and the effects of environmental factors without bias, and controlled the Type I error rate in the presence of population stratification. Using our newly developed approach, we analyzed the associations between maternal and fetal SNPs and obstructive and conotruncal heart defects, with adjustment for demographic and lifestyle factors and dietary supplements. Fourteen and 11 fetal SNPs were associated with obstructive and conotruncal heart defects, respectively. Twenty-seven and 17 maternal SNPs were associated with obstructive and conotruncal heart defects, respectively. In addition, maternal body mass index was a significant risk factor for obstructive defects. The proposed approach is a powerful tool for interrogating the etiological mechanism underlying complex traits.
Affiliation(s)
- Xiang-Yang Lou
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, USA
- Ting-Ting Hou
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, USA
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Shou-Ye Liu
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hai-Ming Xu
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Feng Lin
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Xinyu Tang
- The US Food and Drug Administration, Silver Spring, Maryland, USA
- Mario A. Cleves
- Department of Pediatrics, Morsani College of Medicine, Health Informatics Institute, University of South Florida, Tampa, Florida, USA
- Charlotte A. Hobbs
- Rady Children’s Institute for Genomic Medicine, San Diego, California, USA
2
Gjerdevik M, Gjessing HK, Romanowska J, Haaland ØA, Jugessur A, Czajkowski NO, Lie RT. Design efficiency in genetic association studies. Stat Med 2020; 39:1292-1310. [PMID: 31943314 DOI: 10.1002/sim.8476] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/25/2019] [Revised: 12/20/2019] [Accepted: 12/21/2019] [Indexed: 11/07/2022]
Abstract
Selecting the best design for genetic association studies requires careful deliberation; different study designs can be used to scan for different genetic effects, and each design has its own set of strengths and limitations. A variety of family and unrelated control configurations are amenable to genetic association analyses, including the case-control design, case-parent triads, and case-parent triads in combination with unrelated controls or control-parent triads. Ultimately, the goal is to choose the design that achieves the highest statistical power at the lowest cost. For given parameter values and genotyped individuals, designs can be compared directly by computing the power. However, a more informative and general design comparison can be achieved by studying the relative efficiency, defined as the ratio of the variances of two parameter estimators corresponding to two separate designs. Using log-linear modeling, we derive the relative efficiency from the asymptotic variance of the parameter estimators and relate it to the concept of Pitman efficiency. The relative efficiency takes into account the fact that different designs impose different costs relative to the number of genotyped individuals. We show that while optimal efficiency for analyses of regular autosomal effects is achieved using the standard case-control design, the case-parent triad design without unrelated controls is efficient when searching for parent-of-origin effects. Due to the potential loss of efficiency, maternal genes should generally not be adjusted for in an initial genome-wide association study scan of offspring genes but instead checked post hoc. The relative efficiency calculations are implemented in our R package Haplin.
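The relative-efficiency idea can be illustrated with a short Python sketch. It is not the Haplin implementation: it simply assumes that each design's asymptotic variance scales as 1/n in the number of genotyped individuals n, so that n * Var acts as a per-individual variance that can be compared across designs. The function names are hypothetical and the variances in the usage lines are placeholders, not results from the paper.

```python
"""Cost-adjusted relative efficiency of two study designs (illustrative sketch).

Assumption: each design's asymptotic variance scales as 1/n, where n is the
number of genotyped individuals, so n * Var is a per-individual variance.
"""


def unit_variance(asymptotic_var, n_genotyped):
    """Variance rescaled to a single genotyped individual."""
    return asymptotic_var * n_genotyped


def relative_efficiency(var_a, n_a, var_b, n_b):
    """Efficiency of design A relative to design B.

    Values above 1 mean design A estimates the parameter more precisely
    per genotyped individual than design B does.
    """
    return unit_variance(var_b, n_b) / unit_variance(var_a, n_a)


if __name__ == "__main__":
    # Hypothetical: 1,000 triads (3,000 genotyped people) versus
    # 1,500 cases plus 1,500 controls (also 3,000 genotyped people).
    ratio = relative_efficiency(var_a=0.004, n_a=3000, var_b=0.005, n_b=3000)
    print(f"RE(A vs B) = {ratio:.2f}")  # RE(A vs B) = 1.25
```

In practice the variances would come from fitted log-linear models rather than being typed in; the sketch only shows how the ratio is formed once they are known.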
Affiliation(s)
- Miriam Gjerdevik
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway; Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
- Håkon K Gjessing
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway; Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Julia Romanowska
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway; Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Øystein A Haaland
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Astanand Jugessur
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway; Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway; Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Nikolai O Czajkowski
- Department of Psychology, University of Oslo, Oslo, Norway; Division of Mental Health, Norwegian Institute of Public Health, Oslo, Norway
- Rolv T Lie
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway; Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
3
The Gene Variants of Maternal/Fetal Renin-Angiotensin System in Preeclampsia: A Hybrid Case-Parent/Mother-Control Study. Sci Rep 2017; 7:5087. [PMID: 28698595 PMCID: PMC5506018 DOI: 10.1038/s41598-017-05411-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/01/2016] [Accepted: 05/30/2017] [Indexed: 12/17/2022]
Abstract
Preeclampsia (PE) is a common pregnancy-related complication, and polymorphisms in angiotensinogen (AGT), angiotensin-converting enzyme (ACE), and angiotensin II type 1 receptor (AT1R) are believed to contribute to PE development. We implemented a hybrid study to investigate the influence of maternal and fetal ACE I/D, ACE G2350A, AGT M235T, AGT T174M, and AT1R A1166C polymorphisms on PE in Han Chinese women. Polymorphisms were genotyped in 1,488 subjects (256 patients experiencing PE, along with their fetuses and partners, and 360 normotensive controls with their fetuses). Transmission disequilibrium tests revealed that ACE I/D (P = 0.041), ACE G2350A (P = 0.035), and AT1R A1166C (P = 0.018) were associated with maternal PE. The log-linear analyses revealed that mothers whose offspring carried the MM genotype of AGT M235T had a higher risk of PE (OR = 1.54, P = 0.010), whereas mothers whose offspring carried the II genotype of ACE I/D or the GG genotype of ACE G2350A had a reduced risk (OR = 0.58, P = 0.039; OR = 0.47, P = 0.045, respectively). Our findings demonstrate that fetal ACE I/D, ACE G2350A, AGT M235T, and AT1R A1166C polymorphisms may play significant roles in PE development among pregnant Han Chinese women.
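The abstract does not say which TDT variant was used. For reference, the classic single-marker TDT compares how often heterozygous parents transmit versus do not transmit a given allele to the affected child; the Python sketch below computes that McNemar-type statistic, with counts chosen purely for illustration.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic single-marker transmission disequilibrium test (McNemar form).

    transmitted: times heterozygous parents passed the allele of interest to the case.
    untransmitted: times heterozygous parents passed the other allele instead.
    Returns the chi-square statistic (1 df) and its p-value.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parental transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = tdt(60, 40)  # hypothetical transmission counts
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Only heterozygous parents are informative, which is why homozygous parents never enter the counts.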
4
Wang M, Stewart WCL. A Pragmatic Test for Detecting Association between a Dichotomous Trait and the Genotypes of Affected Families, Controls and Independent Cases. Front Genet 2017; 8:49. [PMID: 28536599 PMCID: PMC5422425 DOI: 10.3389/fgene.2017.00049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/05/2016] [Accepted: 04/06/2017] [Indexed: 11/13/2022]
Abstract
The efficient analysis of hybrid designs [e.g., affected families, controls, and (optionally) independent cases] is attractive because it should have increased power to detect associations between genetic variants and disease. However, the computational complexity of such an analysis is not trivial, especially when the data contain pedigrees of arbitrary size and structure. To address this concern, we developed a pragmatic test of association that summarizes all of the available evidence in certain hybrid designs, irrespective of pedigree size or structure. Under the null hypothesis of no association, our proposed test statistic (POPFAM+) is the quadratic form of two correlated tests: a population-based test (e.g., wQLS), and a family-based test (e.g., PDT). We use the parametric bootstrap in conjunction with an estimate of the correlation to compute p-values, and we illustrate the potential for increased power when (1) the heritability of the trait is high; and, (2) the marker-specific association is driven by the over-representation of risk alleles in cases, and by the preferential transmission of risk alleles from heterozygous parents to their affected offspring. Based on simulation, we show that type I error is controlled, and that POPFAM+ is more powerful than wQLS or PDT alone. In a real data application, we used POPFAM+ to analyze 43 genes of a hybrid epilepsy study containing 85 affected families, 80 independent cases, 234 controls, and 118 reference samples from the International HapMap Project. The results of our analysis identified a promising epilepsy candidate gene for follow-up sequencing: malic enzyme 2 (ME2; min p < 0.0084).
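The abstract describes POPFAM+ as a quadratic form of two correlated test statistics, with p-values from a parametric bootstrap. A minimal Python sketch of that idea, assuming the population-based and family-based results are already available as standardized z-scores z1 and z2 with an estimated correlation rho (the function names are hypothetical, and wQLS and PDT themselves are not reimplemented):

```python
import numpy as np


def quadratic_form(z1, z2, rho):
    """T = z' R^{-1} z for two standardized statistics whose correlation is rho."""
    return (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)


def bootstrap_p_value(z1, z2, rho, n_boot=100_000, seed=1):
    """Parametric-bootstrap p-value for the quadratic form under the null.

    Null draws come from a bivariate normal with unit variances and
    correlation rho (requires |rho| < 1).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    draws = rng.multivariate_normal(np.zeros(2), cov, size=n_boot)
    t_null = quadratic_form(draws[:, 0], draws[:, 1], rho)
    t_obs = quadratic_form(z1, z2, rho)
    # The add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (1 + np.count_nonzero(t_null >= t_obs)) / (n_boot + 1)
    return t_obs, p_value


if __name__ == "__main__":
    t, p = bootstrap_p_value(2.0, 2.0, 0.5)  # hypothetical z-scores
    print(f"T = {t:.3f}, bootstrap p = {p:.4f}")
```

When the bivariate-normal approximation holds, T follows a chi-square distribution with 2 degrees of freedom under the null, so exp(-T/2) is a quick sanity check on the bootstrap value (about 0.07 for the example inputs).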
Affiliation(s)
- Meng Wang
- The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
- William C L Stewart
- The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA; Departments of Statistics and Pediatrics, Ohio State University, Columbus, OH, USA
5
Guo CY, Chen YJ, Chen YH. The logistic regression model for gene-environment interactions using both case-parent trios and unrelated case-controls. Ann Hum Genet 2014; 78:299-305. [PMID: 24766627 DOI: 10.1111/ahg.12063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/15/2013] [Accepted: 03/12/2014] [Indexed: 12/01/2022]
Abstract
One of the greatest challenges in genetic studies is the detection of gene-environment interactions, owing to underlying complications and inadequate statistical power. The increased sample size gained by combining case-parent trios with unrelated cases and controls may substantially improve performance. For a dichotomous trait, a two-stage approach was previously proposed to handle gene-environment interaction in mixed study samples. Because the two-stage association analysis relies on likelihood functions, its maximum likelihood estimation may fail to converge in small samples. To avoid such convergence issues, we propose a logistic regression framework based on the combined haplotype relative risk (CHRR) method, which pools the case-parent trios and unrelated subjects in a two-by-two table. An advantage of the logistic regression model is that it readily adjusts for either discrete or continuous covariates. Computer simulations showed that, in larger samples where the two-stage test converges, the two tests performed similarly: the two-stage test is more powerful under the dominant and additive disease models, whereas the extended CHRR is more powerful under the recessive disease model.
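The pooling step behind CHRR-style analyses can be sketched as a single 2 x 2 allele table: alleles transmitted to trio cases are counted with the alleles of unrelated cases, and untransmitted parental alleles are counted with the alleles of unrelated controls. This Python sketch is a simplified illustration that assumes alleles can be counted independently; it omits the covariate adjustment that motivates the paper's logistic regression framework, and all function names and counts are hypothetical.

```python
import math


def pooled_allele_table(trio_transmitted, trio_untransmitted, case_alleles, control_alleles):
    """Pool trio and unrelated-subject allele counts into one 2 x 2 table.

    Each argument is a (risk_allele_count, other_allele_count) pair.
    Transmitted parental alleles are counted with unrelated cases; untransmitted
    parental alleles act as pseudo-control alleles alongside unrelated controls.
    """
    a = trio_transmitted[0] + case_alleles[0]       # case row, risk allele
    b = trio_transmitted[1] + case_alleles[1]       # case row, other allele
    c = trio_untransmitted[0] + control_alleles[0]  # control row, risk allele
    d = trio_untransmitted[1] + control_alleles[1]  # control row, other allele
    return a, b, c, d


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Allelic odds ratio with a Woolf (log-scale) 95% confidence interval."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    table = pooled_allele_table((120, 80), (90, 110), (300, 200), (250, 250))
    print(table, odds_ratio_ci(*table))
```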
Affiliation(s)
- Chao-Yu Guo
- Division of Biostatistics, Institute of Public Health, National Yang Ming University, Taipei, Taiwan; Aging and Health Research Center, National Yang Ming University, Taipei, Taiwan; Biostatistical Consulting Center, National Yang Ming University, Taipei, Taiwan
6
Wen SH, Tsai MY. Haplotype association analysis of combining unrelated case-control and triads with consideration of population stratification. Front Genet 2014; 5:103. [PMID: 24860592 PMCID: PMC4028876 DOI: 10.3389/fgene.2014.00103] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/27/2014] [Accepted: 04/09/2014] [Indexed: 12/27/2022]
Abstract
Combining data collected under different study designs, such as family trios and unrelated case-control samples, gains power and is more cost-effective than analyzing each data set separately. However, population stratification (PS) among the unrelated case-control samples is a potential concern, and analyses that integrate the data should address this confounding effect. In this paper, we develop a simpler method, the haplotype generalized linear model (HGLM), that tests and estimates haplotype effects on disease risk and allows for adjustment for PS when combining data. We propose to combine information across aggregations of haplotype weighted counts estimated separately from the population case-control data and the trio data, and then to perform a GLM analysis. Furthermore, we present an analysis-of-variance framework based on haplotype weighted counts for detecting whether it is appropriate to combine the two data sources, as well as a modified HGLM with clustering methods for addressing PS. We evaluate the statistical properties in terms of accuracy, false positive rate (FPR), and empirical power using simulated data with various disease risks, sample sizes, multi-SNP haplotypes, and the presence of PS. Our simulation results indicate that HGLM performs comparably to likelihood-based haplotype association analysis, particularly when the haplotype effects are moderate, but may perform poorly for long haplotypes with small sample sizes. In the presence of PS, the modified HGLM remains valid, maintains a satisfactory nominal level, and has small bias. Overall, HGLM appears to combine data successfully and is simple to implement in standard statistical software.
Affiliation(s)
- Shu-Hui Wen
- Department of Public Health, College of Medicine, Tzu-Chi University, Hualien, Taiwan
- Miao-Yu Tsai
- Institute of Statistics and Information Science, National Changhua University of Education, Chang-Hua, Taiwan
7
Fan R, Lee A, Lu Z, Liu A, Troendle JF, Mills JL. Association analysis of complex diseases using triads, parent-child dyads and singleton monads. BMC Genet 2013; 14:78. [PMID: 24007308 PMCID: PMC3844511 DOI: 10.1186/1471-2156-14-78] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/09/2013] [Accepted: 08/17/2013] [Indexed: 11/16/2022]
Abstract
Background: Triad families are routinely used to test for association between genetic variants and complex diseases. Triad studies are important and popular because they are robust, being less prone to false positives due to population structure. In practice, one may collect not only complete triads but also incomplete families such as dyads (an affected child with one parent) and singleton monads (an affected child without parents). Because convenient algorithms and software for analyzing the incomplete data are lacking, dyads and monads are usually discarded. This may lead to loss of power and insufficient use of the genetic information in a study.
Results: We develop likelihood-based statistical models and likelihood ratio tests for association between complex diseases and genetic markers that use combinations of full triads, parent-child dyads, and affected singleton monads in a unified analysis. The likelihood is calculated directly, which facilitates the data analysis without imputation and avoids computational complexity. This makes the models easy to implement and the results easy to explain.
Conclusion: Simulation studies show that the proposed models and tests are robust, accurately controlling the type I error rate, and are powerful in empirical power evaluations. The methods are applied to test for association between the transforming growth factor alpha (TGFA) gene and cleft palate in an Irish study.
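To make the triad part of such a likelihood concrete, the Python sketch below fits a per-allele relative risk from complete triads only, conditioning on parental genotypes under a log-additive (multiplicative) model with no maternal effect, and reports a 1-df likelihood ratio test. These modeling choices are assumptions for illustration; the paper's dyad and monad contributions and its exact parameterization are not reproduced. Genotypes are coded as risk-allele counts (0, 1, 2), triads are assumed to be Mendelian-consistent, and all function names are hypothetical.

```python
import math


def child_genotype_probs(g_father, g_mother):
    """Mendelian P(child has 0, 1, 2 risk alleles | parental risk-allele counts)."""
    pf, pm = g_father / 2, g_mother / 2  # chance each parent transmits the risk allele
    return [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm]


def _tilted(probs, beta):
    """Mendelian probabilities reweighted by relative risk exp(beta * g)."""
    return [p * math.exp(beta * g) for g, p in enumerate(probs)]


def log_lik(triads, beta):
    """Conditional log-likelihood of affected children's genotypes given parents."""
    total = 0.0
    for gf, gm, gc in triads:
        probs = child_genotype_probs(gf, gm)
        total += math.log(probs[gc]) + beta * gc - math.log(sum(_tilted(probs, beta)))
    return total


def score(triads, beta):
    """Derivative of log_lik in beta: observed minus expected risk-allele counts."""
    total = 0.0
    for gf, gm, gc in triads:
        w = _tilted(child_genotype_probs(gf, gm), beta)
        total += gc - sum(g * wg for g, wg in enumerate(w)) / sum(w)
    return total


def fit_triads(triads, lo=-10.0, hi=10.0, tol=1e-10):
    """Per-allele relative risk, 1-df likelihood ratio statistic, and p-value.

    The score is non-increasing in beta, so bisection on [lo, hi] finds the MLE;
    if the data push the estimate to a boundary, it ends up at that boundary.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(triads, mid) > 0:
            lo = mid
        else:
            hi = mid
    beta_hat = (lo + hi) / 2
    lrt = 2 * (log_lik(triads, beta_hat) - log_lik(triads, 0.0))
    return math.exp(beta_hat), lrt, math.erfc(math.sqrt(max(lrt, 0.0) / 2))


if __name__ == "__main__":
    # Hypothetical triads of (father, mother, child) risk-allele counts.
    triads = [(1, 1, 2), (1, 1, 2), (1, 1, 1), (1, 1, 1)]
    rr, lrt, p = fit_triads(triads)
    print(f"RR per allele = {rr:.3f}, LRT = {lrt:.3f}, p = {p:.3f}")
```

For the four heterozygous-by-heterozygous example triads the estimate is exactly 3: each child's expected risk-allele count is 2R/(1 + R), and setting 4 * 2R/(1 + R) equal to the observed total of 6 gives R = 3.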
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, 6100 Executive Blvd, MSC 7510, Rockville, MD 20852, USA.
8
Chiu YF, Lee CY, Kao HY, Pan WH, Hsu FC. Analysis of family- and population-based samples using multiple linkage disequilibrium mapping. Ann Hum Genet 2013; 77:251-67. [PMID: 23330688 DOI: 10.1111/ahg.12008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/07/2012] [Accepted: 11/05/2012] [Indexed: 12/31/2022]
Abstract
We report two methods for linkage disequilibrium mapping that incorporate covariates through parametric modeling to use combined case-parent trios and unrelated case and/or control data. The two proposed combined methods were used to map the disease locus for hypertension in the angiotensin-converting enzyme (ACE) gene, incorporating ACE activity. Compared with the estimates from the trio study, the efficiency in estimating the disease locus increased by 351-fold and 100-fold in the hybrid study for the two proposed methods, respectively; compared with the case-control study, it changed by 1.4-fold and 0.4-fold, respectively. Efficiency of disease locus estimates was greatly improved in both simulations and hypertension studies based on the hybrid data, compared with case-parent trio studies only. These newly developed methods preserve the advantages of the previous methods, including flexible modeling and assessment of gene-gene and gene-covariate effects, while providing more power by using all the data combined. The computing program for analysis using the separate and hybrid data sets is freely available on the author's website.
Affiliation(s)
- Yen-Feng Chiu
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Taiwan, ROC.
9
Skare O, Jugessur A, Lie RT, Wilcox AJ, Murray JC, Lunde A, Nguyen TT, Gjessing HK. Application of a novel hybrid study design to explore gene-environment interactions in orofacial clefts. Ann Hum Genet 2012; 76:221-36. [PMID: 22497478 DOI: 10.1111/j.1469-1809.2012.00707.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Orofacial clefts are common birth defects with strong evidence for both genetic and environmental causal factors. Candidate gene studies combined with exposures known to influence the outcome provide a highly targeted approach to detecting GxE interactions. We developed a new statistical approach that combines the case-control and offspring-parent triad designs into a "hybrid design" to search for GxE interactions among 334 autosomal cleft candidate genes and maternal first-trimester exposure to smoking, alcohol, coffee, folic acid supplements, dietary folate and vitamin A. The study population comprised 425 case-parent triads of isolated clefts and 562 control-parent triads derived from a nationwide study of orofacial clefts in Norway (1996-2001). A full maximum-likelihood model was used in combination with a Wald test statistic to screen for statistically significant GxE interaction between strata of exposed and unexposed mothers. In addition, we performed pathway-based analyses on 28 detoxification genes and 21 genes involved in folic acid metabolism. With the possible exception of the interaction between the T-box 4 gene (TBX4) and dietary folate in isolated cleft palate only (CPO), there was little evidence overall of GxE interaction in our data. This study is the largest to date aimed at detecting interactions between orofacial cleft candidate genes and well-established risk exposures.
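A stratified GxE screen of this kind reduces, for each gene, to asking whether the genetic log relative risk differs between exposed and unexposed mothers. The Python sketch below shows a generic Wald test for that difference, assuming the two stratum-specific estimates come from disjoint sets of families and are therefore independent; it is an illustration with hypothetical names and inputs, not the Haplin code used in the study.

```python
import math


def wald_interaction(log_rr_exposed, se_exposed, log_rr_unexposed, se_unexposed):
    """Wald test that a genetic log relative risk differs between exposure strata.

    Assumes the two stratum-specific estimates are independent, so the
    variance of their difference is the sum of their variances.
    Returns the ratio of relative risks, the z-statistic, and a two-sided p-value.
    """
    diff = log_rr_exposed - log_rr_unexposed
    se_diff = math.sqrt(se_exposed ** 2 + se_unexposed ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(diff), z, p_value


if __name__ == "__main__":
    # Hypothetical estimates: RR = 2.0 among exposed, RR = 1.0 among unexposed.
    ratio, z, p = wald_interaction(math.log(2.0), 0.30, 0.0, 0.40)
    print(f"RRR = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```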
Affiliation(s)
- Oivind Skare
- Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway.
10
Bagos PG. On the covariance of two correlated log-odds ratios. Stat Med 2012; 31:1418-31. [PMID: 22302419 DOI: 10.1002/sim.4474] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/13/2010] [Revised: 09/20/2011] [Accepted: 10/31/2011] [Indexed: 01/08/2023]
Abstract
In many applications, two correlated estimates of an effect size need to be considered simultaneously, either to be combined or compared. This requires their covariance, which usually requires access to individual data that may not be available to a researcher performing the analysis. We present a simple and efficient method for calculating the covariance of two correlated log-odds ratios. The method is based on well-known large-sample approximations, can be applied using only data available in published reports, and, more importantly, is very general: it encompasses several previously derived estimates (multiple outcomes, multiple treatments, dose-response models, mutually exclusive outcomes, genetic association studies) as special cases. By placing the previous approaches in a unified framework, the method makes it easy to derive covariance estimates for problems that were previously difficult to handle. We show that the method can be used to derive the covariance of log-odds ratios from matched and unmatched case-control studies that use the same cases, a situation that has previously been addressed only with individual data. Future applications of the method are discussed.
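The abstract does not reproduce Bagos's general formula, so the Python sketch below covers only one simple case under standard large-sample (Woolf) approximations: two unmatched log odds ratios that reuse the same cases but have independent control groups. Because both estimates contain the same log(a/b) term from the cases, their covariance equals that term's variance, 1/a + 1/b. Cell names follow the usual 2 x 2 notation and, like the function names, are assumptions of this example.

```python
import math


def shared_case_log_ors(a, b, c1, d1, c2, d2):
    """Two unmatched log odds ratios that reuse the same case group.

    a, b: exposed / unexposed cases (shared by both comparisons).
    c1, d1: exposed / unexposed controls in control group 1.
    c2, d2: exposed / unexposed controls in control group 2.
    Control groups are assumed independent of each other and of the cases.
    """
    log_or1 = math.log(a * d1 / (b * c1))
    log_or2 = math.log(a * d2 / (b * c2))
    var1 = 1 / a + 1 / b + 1 / c1 + 1 / d1
    var2 = 1 / a + 1 / b + 1 / c2 + 1 / d2
    # Only the shared log(a/b) term is common to both estimates, so its
    # large-sample variance is their covariance.
    cov = 1 / a + 1 / b
    return log_or1, log_or2, var1, var2, cov


def compare(log_or1, log_or2, var1, var2, cov):
    """z-statistic for H0: the two log odds ratios are equal."""
    return (log_or1 - log_or2) / math.sqrt(var1 + var2 - 2 * cov)


if __name__ == "__main__":
    # Hypothetical counts: 100 exposed / 200 unexposed cases, two control series.
    est = shared_case_log_ors(100, 200, 80, 220, 90, 210)
    print("covariance:", est[4])
    print("z for equal log ORs:", compare(*est))
```

Ignoring the covariance here would overstate the variance of the difference, since the shared case cells cancel when the two log odds ratios are subtracted.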
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Central Greece, Papasiopoulou 2-4, Lamia, GR35100, Greece.
11
Myking S, Myhre R, Gjessing HK, Morken NH, Sengpiel V, Williams SM, Ryckman KK, Magnus P, Jacobsson B. Candidate gene analysis of spontaneous preterm delivery: new insights from re-analysis of a case-control study using case-parent triads and control-mother dyads. BMC Med Genet 2011; 12:174. [PMID: 22208904 PMCID: PMC3260094 DOI: 10.1186/1471-2350-12-174] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/11/2011] [Accepted: 12/30/2011] [Indexed: 11/10/2022]
Abstract
Background: Spontaneous preterm delivery (PTD) has a multifactorial etiology, with evidence of a genetic contribution to its pathogenesis. A number of candidate gene case-control studies have been performed on spontaneous PTD, but the results have been inconsistent and do not fully assess how the two genotypes involved in a pregnancy (maternal and fetal) can affect the outcome. To elucidate this point, we re-analyzed data from a previously published case-control candidate gene study, using a case-parent triad design and a hybrid design combining case-parent triads and control-mother dyads. These methods offer a more robust approach to genetic association studies of PTD than traditional case-control designs.
Methods: The study participants were obtained from the Norwegian Mother and Child Cohort Study (MoBa). A total of 196 case triads and 211 control dyads were selected for the analysis. A case-parent triad design as well as a hybrid design was used to analyze 1,326 SNPs from 159 candidate genes. We compared our results with those from a previous case-control study on the same samples. Haplotypes were analyzed using a sliding window of three SNPs, and a pathway analysis was performed to gain biological insight into the pathophysiology of preterm delivery.
Results: The most consistently significant fetal gene across all analyses was COL5A2. The functionally similar COL5A1 was significant when fetal and maternal genotypes were combined. PON1 was significant in single-locus association analyses of fetal genes alone, but was possibly confounded by maternal effects. Focal adhesion (hsa04510), cell communication (hsa01430), and ECM-receptor interaction (hsa04512) were the most consistently significant pathways.
Conclusion: This study suggests a fetal association of COL5A2 and a combined fetal-maternal association of COL5A1 with spontaneous PTD. In addition, the pathway analysis implied interactions of genes affecting cell communication and the extracellular matrix.
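The sliding-window haplotype analysis mentioned in the Methods amounts to forming overlapping runs of three consecutive SNPs. The abstract does not say whether windows were restricted to SNPs within the same gene; the Python sketch below assumes they were, and the function name, gene names, and SNP identifiers are hypothetical.

```python
def sliding_windows(snps_by_gene, width=3):
    """Overlapping windows of `width` consecutive SNPs, built within each gene.

    snps_by_gene: mapping from gene name to its SNP IDs in map order.
    Genes with fewer than `width` SNPs yield no windows.
    """
    if width < 1:
        raise ValueError("width must be positive")
    windows = []
    for gene, snps in snps_by_gene.items():
        for start in range(len(snps) - width + 1):
            windows.append((gene, tuple(snps[start:start + width])))
    return windows


if __name__ == "__main__":
    demo = {"GENE_A": ["snp1", "snp2", "snp3", "snp4"], "GENE_B": ["snp5", "snp6"]}
    for gene, window in sliding_windows(demo):
        print(gene, window)
```

A gene with k SNPs contributes k - 2 three-SNP windows, and each window would then be tested as a haplotype.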
Affiliation(s)
- Solveig Myking
- Department of Genes and Environment, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway.
12
Fardo DW, Druen AR, Liu J, Mirea L, Infante-Rivard C, Breheny P. Exploration and comparison of methods for combining population- and family-based genetic association using the Genetic Analysis Workshop 17 mini-exome. BMC Proc 2011; 5 Suppl 9:S28. [PMID: 22373349 PMCID: PMC3287863 DOI: 10.1186/1753-6561-5-s9-s28] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
We examine the performance of various methods for combining family- and population-based genetic association data. Several approaches have been proposed for situations in which information is collected from both a subset of unrelated subjects and a subset of family members. Analyzing these samples separately is known to be inefficient, and it is important to determine the scenarios for which differing methods perform well. Others have investigated this question; however, no extensive simulations have been conducted, nor have these methods been applied to mini-exome-style data such as that provided by Genetic Analysis Workshop 17. We quantify the empirical power and false-positive rates for three existing methods applied to the Genetic Analysis Workshop 17 mini-exome data and compare relative performance. We use knowledge of the underlying data simulation model to make these assessments.
Affiliation(s)
- David W Fardo
- Department of Biostatistics, University of Kentucky College of Public Health, 121 Washington Avenue, Lexington, KY 40536, USA.
13
Ainsworth HF, Unwin J, Jamison DL, Cordell HJ. Investigation of maternal effects, maternal-fetal interactions and parent-of-origin effects (imprinting), using mothers and their offspring. Genet Epidemiol 2011; 35:19-45. [PMID: 21181895 PMCID: PMC3025173 DOI: 10.1002/gepi.20547] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Indexed: 11/29/2022]
Abstract
Many complex genetic effects, including epigenetic effects, may be expected to operate via mechanisms in the intrauterine environment. A popular design for investigating such effects, including parent-of-origin effects (imprinting), maternal genotype effects, and maternal-fetal genotype interactions, is to collect DNA from affected offspring and their mothers (case/mother duos) and to compare them with an appropriate control sample. An alternative design uses data from cases and both parents (case/parent trios) but does not require controls. In this study, we describe a novel implementation of a multinomial modeling approach that allows the estimation of such genetic effects using either case/mother duos or case/parent trios. We investigate the performance of our approach using computer simulations and explore the sample sizes and data structures required to provide high power for detection of effects and accurate estimation of the relative risks conferred. Through the incorporation of additional assumptions (such as Hardy-Weinberg equilibrium, random mating and known allele frequencies) and/or additional types of control sample (such as unrelated controls, controls and their mothers, or both parents of controls), we show that the (relative risk) parameters of interest are identifiable and well estimated. Nevertheless, parameter interpretation can be complex, as we illustrate by demonstrating the mathematical equivalence between different parameterizations. Our approach scales up easily to allow the analysis of large-scale genome-wide association data, provided both mothers and affected offspring have been genotyped at all variants of interest.
Affiliation(s)
- Holly F Ainsworth
- School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, United Kingdom
14
Crossett A, Kent BP, Klei L, Ringquist S, Trucco M, Roeder K, Devlin B. Using ancestry matching to combine family-based and unrelated samples for genome-wide association studies. Stat Med 2011; 29:2932-45. [PMID: 20862653 DOI: 10.1002/sim.4057] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
We propose a method to analyze family-based samples together with unrelated cases and controls. The method builds on the idea of matched case-control analysis using conditional logistic regression (CLR). For each trio, a case (the proband) and matched pseudo-controls are constructed from the transmitted and untransmitted alleles. Unrelated controls, matched by genetic ancestry, supplement the pseudo-controls; likewise, unrelated cases are paired with genetically matched controls. Within each matched stratum, the case genotype is contrasted with the control and pseudo-control genotypes via CLR, using a method we call matched-CLR (mCLR). Eigenanalysis of numerous SNP genotypes provides a tool for mapping genetic ancestry. The result of such an analysis can be thought of as a multidimensional map, or eigenmap, in which the relative genetic similarities and differences among individuals are encoded. Once the map is constructed, new individuals can be projected onto it based on their genotypes. Successful differentiation of individuals of distinct ancestry depends on having a diverse yet representative sample from which to construct the ancestry map. Once samples are well matched, mCLR yields power comparable to competing methods while ensuring excellent control of the Type I error rate.
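The pseudo-control construction and the CLR comparison within one matched stratum can be sketched in Python for a single SNP. The sketch follows the standard trio pseudo-control idea (the three untransmitted allele combinations serve as matched controls) with a log-additive coding of risk-allele count; the function names are hypothetical, and the ancestry eigenmap used to match unrelated subjects is not shown. In an mCLR-style analysis, ancestry-matched unrelated controls would simply be appended to a stratum's control list.

```python
import math
from itertools import product


def case_and_pseudo_controls(father, mother, from_father, from_mother):
    """Case genotype plus its three pseudo-control genotypes for one trio.

    father, mother: each parent's two alleles, e.g. ("A", "a").
    from_father, from_mother: index (0 or 1) of the allele each parent transmitted.
    Genotypes are returned as sorted allele pairs, so allele order does not matter.
    """
    def genotype(i, j):
        return tuple(sorted((father[i], mother[j])))

    case = genotype(from_father, from_mother)
    pseudo_controls = [genotype(i, j) for i, j in product((0, 1), repeat=2)
                       if (i, j) != (from_father, from_mother)]
    return case, pseudo_controls


def clr_log_lik(beta, strata, risk_allele="A"):
    """Conditional logistic log-likelihood over matched strata.

    Each stratum is (case_genotype, [control_genotypes]); the covariate is the
    count of `risk_allele`, so exp(beta) is a per-allele odds ratio.
    """
    total = 0.0
    for case, controls in strata:
        members = [case, *controls]
        denom = sum(math.exp(beta * g.count(risk_allele)) for g in members)
        total += beta * case.count(risk_allele) - math.log(denom)
    return total


if __name__ == "__main__":
    # Hypothetical heterozygous parents; each transmitted its first allele ("A").
    case, pseudo = case_and_pseudo_controls(("A", "a"), ("A", "a"), 0, 0)
    print(case, pseudo)
    print(clr_log_lik(0.5, [(case, pseudo)]))
```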
Affiliation(s)
- Andrew Crossett
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
15
Yang Y, Remmers EF, Ogunwole CB, Kastner DL, Gregersen PK, Li W. Effective sample size: Quick estimation of the effect of related samples in genetic case-control association analyses. Comput Biol Chem 2011; 35:40-9. [PMID: 21333602 DOI: 10.1016/j.compbiolchem.2010.12.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/23/2010] [Revised: 12/28/2010] [Accepted: 12/29/2010] [Indexed: 01/21/2023]
Abstract
Affected relatives are essential for pedigree linkage analysis; however, they violate the independent-sample assumption in case-control association studies. To avoid correlation between samples, a common practice is to include only one affected sample per pedigree in association analysis. Although several methods exist for handling correlated samples, they are still not widely used, in part because they are not easily implemented or are not widely known. We advocate the effective sample size method as a simple and accessible approach for case-control association analysis with correlated samples. This method modifies the chi-square test statistic, p-value, and 95% confidence interval of the odds ratio by replacing the apparent number of allele or genotype counts with the effective ones in the standard formula, without the need for specialized computer programs. We present a simple formula for calculating effective sample size for many types of relative pairs and relative sets. For allele frequency estimation, the effective sample size method captures the variance inflation exactly. For genotype frequency, simulations showed that effective sample size provides a satisfactory approximation. The interferon-induced helicase gene (IFIH1), previously identified as a type 1 diabetes susceptibility locus, is shown to be significantly associated with rheumatoid arthritis when the effective sample size method is applied. This association is not established if only one affected sib per pedigree is used in the association analysis. Relationships between the effective sample size method and other methods (generalized estimating equations, variance of eigenvalues for correlation matrices, and genomic control) are discussed.
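To make the variance-inflation idea concrete, the sketch below computes an effective number of independent alleles for a set of related individuals at one SNP. It assumes a biallelic marker in Hardy-Weinberg equilibrium, no inbreeding, and known pairwise kinship coefficients φ, so that the correlation between two relatives' allele counts is 2φ. The function name is hypothetical, and the paper's own formulas for relative pairs and relative sets may be parameterized differently.

```python
from typing import Sequence


def effective_allele_count(kinship: Sequence[Sequence[float]]) -> float:
    """Effective number of independent alleles for n related individuals.

    kinship[i][j] is the kinship coefficient phi_ij between individuals
    i and j (i != j); diagonal entries are ignored. Assuming HWE and no
    inbreeding, Var(X_i) = 2p(1-p) and Cov(X_i, X_j) = 4*phi_ij*p(1-p)
    for allele counts X, so p_hat = sum(X) / (2n) has variance
        p(1-p) * S / (2 n^2),  where S = n + sum_{i != j} 2*phi_ij.
    Setting this equal to p(1-p) / N_eff gives N_eff = 2 n^2 / S.
    """
    n = len(kinship)
    s = float(n)  # each individual's allele count is perfectly correlated with itself
    for i in range(n):
        for j in range(n):
            if i != j:
                s += 2.0 * kinship[i][j]
    return 2.0 * n * n / s
```

For a full-sib pair (φ = 0.25) this gives 8/3 ≈ 2.67 effective alleles instead of 4, and for monozygotic twins (φ = 0.5) it gives 2, i.e. the information of a single individual. Rescaling allele counts by N_eff/(2n) before a standard chi-square test is one way to apply such a correction; whether this matches the paper's genotype-level adjustment is not claimed here.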
Affiliation(s)
- Yaning Yang
- Department of Statistics and Finance, University of Science and Technology of China, Anhui, Hefei, China
16
Mirea L, Sun L, Stafford JE, Bull SB. Using evidence for population stratification bias in combined individual- and family-level genetic association analyses of quantitative traits. Genet Epidemiol 2010; 34:502-11. [PMID: 20552647 DOI: 10.1002/gepi.20506] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
Genetic association studies are generally performed either by examining differences in the genotype distribution between individuals or by testing for preferential allele transmission within families. In the absence of population stratification bias (PSB), integrated analyses of individual and family data can increase power to identify susceptibility loci [Abecasis et al., 2000. Am. J. Hum. Genet. 66:279-292; Chen and Lin, 2008. Genet. Epidemiol. 32:520-527; Epstein et al., 2005. Am. J. Hum. Genet. 76:592-608]. In existing methods, the presence of PSB is initially assessed by comparing results from between-individual and within-family analyses, and then combined analyses are performed only if no significant PSB is detected. However, this strategy requires specification of an arbitrary testing level alpha(PSB), typically 5%, to declare PSB significance. As a novel alternative, we propose to directly use the PSB evidence in weights that combine results from between-individual and within-family analyses. The weighted approach generalizes previous methods by using a continuous weighting function that depends only on the observed P-value instead of a binary weight that depends on alpha(PSB). Using simulations, we demonstrate that for quantitative trait analysis, the weighted approach provides a good compromise between type I error control and power to detect association in studies with few genotyped markers and limited information regarding population structure.
Affiliation(s)
- Lucia Mirea
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
17
Polymorphisms in PARP, IL1B, IL4, IL10, C1INH, DEFB1, and DEFA4 in meningococcal disease in three populations. Shock 2010; 34:17-22. [DOI: 10.1097/shk.0b013e3181ce2c7d] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
18
Perreault LPL, Andelfinger GU, Asselin G, Dubé MP. Partitioning of copy-number genotypes in pedigrees. BMC Bioinformatics 2010; 11:226. [PMID: 20438641 PMCID: PMC2874807 DOI: 10.1186/1471-2105-11-226] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/03/2009] [Accepted: 05/03/2010] [Indexed: 01/03/2023]
Abstract
Background: Copy number variations (CNVs) and polymorphisms (CNPs) have only recently gained the genetic community's attention. Conservative estimates have shown that CNVs and CNPs might affect more than 10% of the genome and that they may be at least as important as single nucleotide polymorphisms in assessing human variability. Widely used tools for CNP analysis have been implemented in Birdsuite and PLINK for conducting genetic association studies based on the unpartitioned total number of CNP copies provided by the intensities from Affymetrix's Genome-Wide Human SNP Array. Here, we are interested in partitioning copy number variations and polymorphisms in extended pedigrees for the purpose of linkage analysis on familial data. Results: We have developed CNGen, new software for partitioning copy number polymorphisms using the integrated genotypes from Birdsuite with the Affymetrix platform. Applied to familial trios or extended pedigrees, the algorithm can produce partitioned copy number genotypes with distinct parental alleles. We validated the algorithm using simulations on a complex pedigree structure, with frequencies calculated from a real dataset of 300 genotyped samples from 42 pedigrees segregating a congenital heart defect phenotype. Conclusions: CNGen is the first published software for partitioning copy number genotypes in pedigrees, making possible the use of CNPs and CNVs for linkage analysis. It was implemented with the Python interpreter version 2.5.2 and was successfully tested on current Linux, Windows and Mac OS workstations.
19
Won S, Wilk JB, Mathias RA, O'Donnell CJ, Silverman EK, Barnes K, O'Connor GT, Weiss ST, Lange C. On the analysis of genome-wide association studies in family-based designs: a universal, robust analysis approach and an application to four genome-wide association studies. PLoS Genet 2009; 5:e1000741. [PMID: 19956679 PMCID: PMC2777973 DOI: 10.1371/journal.pgen.1000741] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 05/18/2009] [Accepted: 10/26/2009] [Indexed: 11/19/2022]
Abstract
For genome-wide association studies in family-based designs, we propose a new, universally applicable approach. The new test statistic exploits all available information about the association while, by virtue of its design, maintaining the same robustness against population admixture as traditional family-based approaches that are based exclusively on within-family information. The approach is suitable for the analysis of almost any trait type (e.g., binary, continuous, time-to-onset, multivariate) and combinations of those. We use simulation studies to verify all theoretically derived properties of the approach, estimate its power, and compare it with other standard approaches. We illustrate the practical implications of the new analysis method by an application to a lung-function phenotype, forced expiratory volume in one second (FEV1), in 4 genome-wide association studies. In genome-wide association studies, the multiple testing problem and confounding due to population stratification have been intractable issues. To avoid sensitivity to population stratification, family-based designs have considered only the transmission of genotypes from founders to nonfounders, which leads to a loss of information. Here we propose a novel analysis approach that combines mutually independent FBAT and screening statistics in a robust way. The proposed method is more powerful than any other while preserving the complete robustness of family-based association tests, which achieve only much lower power. Furthermore, the proposed method is virtually as powerful as population-based approaches/designs, even in the absence of population stratification. By construction, the proposed method is robust whenever FBAT is valid, and it achieves optimal efficiency if the linear model for the screening test reasonably explains the observed data in terms of covariance structure and population admixture.
We illustrate the practical relevance of the approach by an application in 4 genome-wide association studies.
Affiliation(s)
- Sungho Won
- Department of Statistics, Chung-Ang University, Seoul, Korea
- Research Center for Data Science, Chung-Ang University, Seoul, Korea
- Jemma B. Wilk
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Rasika A. Mathias
- Genometrics Section, Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, Maryland, United States of America
- Christopher J. O'Donnell
- National Heart, Lung, and Blood Institute and Framingham Heart Study, Bethesda, Maryland, United States of America
- Cardiology Division, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Edwin K. Silverman
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Kathleen Barnes
- Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- George T. O'Connor
- Pulmonary Center, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Scott T. Weiss
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Genomic Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Christoph Lange
- Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Genomic Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
20
Infante-Rivard C, Mirea L, Bull SB. Combining case-control and case-trio data from the same population in genetic association analyses: overview of approaches and illustration with a candidate gene study. Am J Epidemiol 2009; 170:657-64. [PMID: 19635737 DOI: 10.1093/aje/kwp180] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 01/25/2023]
Abstract
In genetic association studies, investigators compare allele or genotype frequencies in unrelated case and control subjects or examine preferential allele transmissions from parents to affected offspring. In many genetic case-control studies, the collection of DNA material extends to relatives such as parents of cases. Thus, case-control and case-parent trio association analyses are possible. Whereas the goal of collecting genetic information from family members in a study initially designed as a case-control study is to enrich the genetic analysis, increase power, or address concern about population structure bias, methods of combining genetic data from unrelated case and control subjects with genetic trio data from the same study population are not well known. A number of hybrid approaches have been developed that utilize such data together. In this paper, the authors describe key features of genetic case-control and case-parent trio studies and review commonly used methods of genetic analysis for case-parent trio designs. In addition, they provide a pragmatic review of statistical methods and available software for existing hybrid approaches that combine various components of case-control and genetic trio data. The application of all methods is illustrated using a candidate gene study of childhood leukemia that included case-control subjects and their parents.
Affiliation(s)
- Claire Infante-Rivard
- Department of Epidemiology, Biostatistics and Occupational Health, Faculty of Medicine, McGill University, 1110 Pine Avenue West, Montreal, Quebec H3A1A3, Canada.
21
Univariate/multivariate genome-wide association scans using data from families and unrelated samples. PLoS One 2009; 4:e6502. [PMID: 19652719 PMCID: PMC2715864 DOI: 10.1371/journal.pone.0006502] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 04/12/2009] [Accepted: 06/30/2009] [Indexed: 11/19/2022]
Abstract
As genome-wide association studies (GWAS) are becoming more popular, two approaches, among others, could be considered in order to improve statistical power for identifying genes contributing subtle to moderate effects to human diseases. The first approach is to increase sample size, which could be achieved by combining both unrelated and familial subjects together. The second approach is to jointly analyze multiple correlated traits. In this study, by extending generalized estimating equations (GEEs), we propose a simple approach for performing univariate or multivariate association tests for the combined data of unrelated subjects and nuclear families. In particular, we correct for population stratification by integrating principal component analysis and transmission disequilibrium test strategies. The proposed method allows for multiple siblings as well as missing parental information. Simulation studies show that the proposed test has improved power compared to two popular methods, EIGENSTRAT and FBAT, by analyzing the combined data, while correcting for population stratification. In addition, joint analysis of bivariate traits has improved power over univariate analysis when pleiotropic effects are present. Application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility and applicability of the proposed method.
22
Vermeulen SH, Shi M, Weinberg CR, Umbach DM. A hybrid design: case-parent triads supplemented by control-mother dyads. Genet Epidemiol 2009; 33:136-44. [PMID: 18759250 DOI: 10.1002/gepi.20365] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
Hybrid designs arose from an effort to combine the benefits of family-based and population-based study designs. A recently proposed hybrid approach augments case-parent triads with population-based control-parent triads, genotyping everyone except the control offspring. Including parents of controls substantially improves statistical efficiency for testing and estimating both offspring and maternal genetic relative risk parameters relative to using case-parent triads alone. Moreover, it allows testing of required assumptions. Nevertheless, control fathers can be hard to recruit, whereas control offspring and their mothers may be readily available. Consequently, we propose an alternative hybrid design where offspring-mother pairs, instead of parents, serve as population-based controls. We compare the power of our proposed method with several competitors and show that it performs well in various scenarios, though it is slightly less powerful than the hybrid design that uses control parents. We describe approaches for checking whether population stratification will bias inferences that use controls and whether the mating-symmetry assumption holds. Surprisingly, if mating symmetry is violated, even though mating-type parameters cannot be directly estimated using control-mother dyads alone, and maternal effects cannot be estimated using case-parent triads alone, combining both sources of data allows estimation of all the parameters. This hybrid design can also be used to study environmental influences on disease risk and gene-by-environment interactions.
Affiliation(s)
- Sita H Vermeulen
- Department of Endocrinology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
23
Cheng KF. Combining unrelated family studies to improve the power of genetic association test. Stat Med 2009; 28:311-25. [PMID: 18991259 DOI: 10.1002/sim.3466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Family-based studies provide powerful inferences regarding associations between genetic variants and risks, but they have limitations. Very often, the availability of parental genotypes poses a problem for family-based designs, especially when the disease of interest has a late age of onset. To improve the efficiency of such studies, a popular approach is to reconstruct the missing genotypes from the genotypes of the offspring and correct the biases resulting from the reconstruction. In this paper, the author shows that two or more unrelated family studies, for the same candidate marker but different diseases, can also be combined to construct a more efficient test for association analysis. The usual case-control study with parental genotypes is a special case of the data discussed here. The author used a simulation study to compare the performance of the new method with other well-known methods. The results showed that the new test has larger power when there is no effect of population stratification between the two study samples. However, if there is an effect of population stratification between the two samples, the new test still maintains the expected type I error rate and has comparable power. Since unrelated family studies not designed for the disease of interest are often readily accessible at minimal cost, the proposed method has practical value. The new approach can also be easily modified to allow for missing parental data.
Affiliation(s)
- K F Cheng
- Biostatistics Center and Department of Public Health, China Medical University, Taichung, Taiwan.
24
Abstract
Studies to detect genetic association with disease can be family-based, often using families with multiple affected members, or population based, as in population-based case-control studies. If data on both study types are available from the same population, it is useful to combine them to improve power to detect genetic associations. Two aspects of the data need to be accommodated, the sampling scheme and potential residual correlations among family members. We propose two approaches for combining data from a case-control study and a family study that collected families with multiple cases. In the first approach, we view a family as the sampling unit and specify the joint likelihood for the family members using a two-level mixed effects model to account for random familial effects and for residual genetic correlations among family members. The ascertainment of the families is accommodated by conditioning on the ascertainment event. The individuals in the case-control study are treated as families of size one, and their unconditional likelihood is combined with the conditional likelihood for the families. This approach yields subject specific maximum likelihood estimates of covariate effects. In the second approach, we view an individual as the sampling unit. The sampling scheme is accommodated using two-phase sampling techniques, marginal covariate effects are estimated, and correlations among family members are accounted for in the variance calculations. The models are compared in simulations. Data from a case-control and a family study from north-eastern Italy on melanoma and a low-risk melanoma-susceptibility gene, MC1R, are used to illustrate the approaches.
Affiliation(s)
- Ruth M Pfeiffer
- National Cancer Institute, Division of Cancer Epidemiology and Genetics, Bethesda, Maryland 20892-7244, USA.
25
Guo CY, Lunetta KL, DeStefano AL, Cupples LA. Combined haplotype relative risk (CHRR): a general and simple genetic association test that combines trios and unrelated case-controls. Genet Epidemiol 2009; 33:54-62. [PMID: 18636528 PMCID: PMC2700841 DOI: 10.1002/gepi.20356] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Abstract
In some genetic association studies, samples contain both parental and unrelated controls. Under such scenarios, instead of analyzing only trios using family-based association tests or only unrelated subjects using a case-control study design, Nagelkerke et al. ([2004] Eur. J. Hum. Genet. 12:964-970) and Epstein et al. ([2005] Am. J. Hum. Genet. 76:592-608) proposed methods that implemented a likelihood ratio test to combine the two different types of data. In this article, we put forward a more powerful and simplified strategy to combine trios with unrelated subjects based on the haplotype relative risk (HRR) (Falk and Rubinstein [1987] Ann. Hum. Genet. 51:227-233). The HRR compares parental marker alleles transmitted to an affected offspring with those not transmitted as a test for association, a strategy similar to a case-control study that compares allele frequencies in diseased cases with those in unrelated controls. We prove that affected offspring can be pooled with diseased cases and that parental controls can be treated as unrelated controls when the trios and unrelated subjects are randomly sampled from the same population. Therefore, unrelated subjects can be incorporated into the HRR intuitively and effortlessly. For trios without complete parental genotypes, we adopted the strategy proposed by Guo et al. ([2005a] BMC Genet. 6:S90; [2005b] Hum. Hered. 59:125-135), which is more feasible than the one proposed by Weinberg ([1999] Am. J. Hum. Genet. 64:1186-1193). In addition, simulation results suggest that the combined haplotype relative risk is more powerful than Epstein et al.'s method regardless of the disease prevalence in a homogeneous population.
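The pooling argument can be illustrated with a single 2×2 allele table. The sketch below assumes complete trios, a biallelic marker, and trios and unrelated subjects drawn from the same homogeneous population: transmitted alleles are pooled with unrelated case alleles, non-transmitted alleles with unrelated control alleles, and a Pearson chi-square statistic (1 df, no continuity correction) is computed. The function names are hypothetical, the handling of incomplete trios is not shown, and the published CHRR statistic may differ in detail.

```python
from typing import Iterable, Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows: case-side / control-side alleles; columns: risk / other allele.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def combined_hrr(trio_transmissions: Iterable[Tuple[int, int]],
                 case_alleles: Tuple[int, int],
                 control_alleles: Tuple[int, int]) -> float:
    """HRR-style statistic pooling trios with unrelated cases and controls.

    trio_transmissions: per trio, (risk alleles transmitted, risk alleles
        not transmitted), each 0..2 because each parent transmits one allele
        and retains one.
    case_alleles / control_alleles: (risk allele count, other allele count)
        among unrelated cases / unrelated controls.
    """
    t_risk = nt_risk = n_trios = 0
    for transmitted, untransmitted in trio_transmissions:
        t_risk += transmitted
        nt_risk += untransmitted
        n_trios += 1
    # Each trio contributes 2 transmitted and 2 non-transmitted alleles.
    a = t_risk + case_alleles[0]                    # risk alleles, case side
    b = 2 * n_trios - t_risk + case_alleles[1]      # other alleles, case side
    c = nt_risk + control_alleles[0]                # risk alleles, control side
    d = 2 * n_trios - nt_risk + control_alleles[1]  # other alleles, control side
    return pearson_chi2_2x2(a, b, c, d)
```

Treating transmitted alleles as "case" alleles and non-transmitted alleles as "control" alleles is exactly the HRR idea; the unrelated subjects simply add to the same two rows.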
Affiliation(s)
- Chao-Yu Guo
- Clinical Research Program, Children's Hospital Boston, Boston, Massachusetts 02115, USA.
26
Hsu L, Starr JR, Zheng Y, Schwartz SM. On combining triads and unrelated subjects data in candidate gene studies: an application to data on testicular cancer. Hum Hered 2008; 67:88-103. [PMID: 19077426 PMCID: PMC2763779 DOI: 10.1159/000179557] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 01/15/2008] [Accepted: 04/16/2008] [Indexed: 11/19/2022]
Abstract
Combining data collected from different sources is a cost-effective and time-efficient approach for enhancing the statistical efficiency in estimating weak-to-modest genetic effects or gene-gene or gene-environment interactions. However, combining data across studies becomes complicated when data are collected under different study designs, such as family-based and unrelated individual-based (e.g., population-based case-control design). In this paper, we describe a general method that permits the joint estimation of effects on disease risk of genes, environmental factors, and gene-gene/gene-environment interactions under a hybrid design that includes cases, parents of cases, and unrelated individuals. We provide both asymptotic theory and statistical inference. Extensive simulation experiments demonstrate that the proposed estimation and inferential methods perform well in realistic settings. We illustrate the method by an application to a study of testicular cancer.
Affiliation(s)
- Li Hsu
- Biostatistics and Biomathematics Program, Fred Hutchinson Cancer Research Center, Seattle, Wash., USA.
27
Chen YH, Lin HW. Simple association analysis combining data from trios/sibships and unrelated controls. Genet Epidemiol 2008; 32:520-7. [DOI: 10.1002/gepi.20325] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/13/2023]
28
Weinberg CR. Less is more, except when less is less: Studying joint effects. Genomics 2008; 93:10-2. [PMID: 18598750 DOI: 10.1016/j.ygeno.2008.06.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 05/02/2008] [Revised: 06/09/2008] [Accepted: 06/09/2008] [Indexed: 11/29/2022]
Abstract
Most diseases are complex in that they are caused by the joint action of multiple factors, both genetic and environmental. Over the past few decades, the mathematical convenience of logistic regression has served to enshrine the multiplicative model, to the point where many epidemiologists believe that departure from additivity on a log scale implies that two factors interact in causing disease. Other terminology in epidemiology, where students are told that inequality of relative risks across levels of a second factor should be seen as "effect modification," reinforces an uncritical acceptance of multiplicative joint effect as the biologically meaningful no-interaction null. Our first task, when studying joint effects, is to understand the limitations of our definitions for "interaction," and recognize that what statisticians mean and what biologists might want to mean by interaction may not coincide. Joint effects are notoriously hard to identify and characterize, even when asking a simple and unsatisfying question, like whether two effects are log-additive. The rule of thumb for such efforts is that a factor-of-four sample size is needed, compared with that needed to demonstrate main effects of either genes or exposures. So strategies have been devised that focus on the most informative individuals, either through risk-based sampling for a cohort, or case-control sampling, extreme phenotype sampling, pooling, two-stage sampling, exposed-only, or case-only designs. These designs gain efficiency, but at a cost of flexibility in models for joint effects. A relatively new approach avoids population controls by genotyping case-parent triads. Because it requires parents, the method works best for diseases with onset early in life. With this design, the role of autosomal genetic variants is assessed by in effect treating the nontransmitted parental alleles as controls for affected offspring. 
Despite advantages for looking at genetic effects, the triad design faces limitations when examining joint effects of genetic and environmental factors. Because population-based controls are not included, main effects for exposures cannot be estimated, and consequently one only has access to inference related to a multiplicative null. We have proposed a hybrid approach that offers the best features of both case-parent and case-control designs. Through genotyping of parents of population-based controls and assuming Mendelian transmission, power is markedly enhanced. One can also estimate main effects for exposures and now flexibly assess models for joint effects.
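The idea of treating nontransmitted parental alleles as controls is most familiar from the allelic transmission/disequilibrium test (TDT). As an illustration of that case-parent baseline (not of the hybrid design itself), the sketch below counts transmissions from heterozygous parents and forms the McNemar-type statistic (b − c)²/(b + c), approximately χ² with 1 df under the null. It assumes genotypes coded as the number of copies (0, 1, 2) of the allele of interest and complete trios; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def tdt_statistic(trios: Iterable[Tuple[int, int, int]]) -> Tuple[int, int, float]:
    """Return (b, c, chi2) for the allelic TDT.

    Each trio is (father, mother, child), coded as copies (0, 1, 2) of the
    allele of interest. b counts transmissions of that allele from
    heterozygous parents; c counts transmissions of the other allele.
    """
    b = c = 0
    for father, mother, child in trios:
        het = 0    # number of heterozygous parents in this trio
        fixed = 0  # copies transmitted by homozygous parents (deterministic)
        for parent in (father, mother):
            if parent == 1:
                het += 1
            else:
                fixed += parent // 2  # genotype 2 transmits one copy, 0 transmits none
        from_het = child - fixed      # copies that must have come from heterozygous parents
        if not 0 <= from_het <= het:
            raise ValueError(f"Mendelian inconsistency in trio {(father, mother, child)}")
        b += from_het
        c += het - from_het
    chi2 = (b - c) ** 2 / (b + c) if b + c > 0 else 0.0
    return b, c, chi2
```

Counting at the trio level avoids having to resolve which of two heterozygous parents transmitted which allele: only the total from heterozygous parents matters for b and c.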
Affiliation(s)
- C R Weinberg
- National Institute of Environmental Health Sciences, MD A3-03, P.O. Box 12233, Research Triangle Park, NC 27709, USA.
29
Tiwari HK, Barnholtz-Sloan J, Wineinger N, Padilla MA, Vaughan LK, Allison DB. Review and evaluation of methods correcting for population stratification with a focus on underlying statistical principles. Hum Hered 2008; 66:67-86. [PMID: 18382087 PMCID: PMC2803696 DOI: 10.1159/000119107] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 01/06/2023]
Abstract
When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these 'parental' populations subsequently come together and begin inter-mating, disequilibrium among linked markers may span a greater genetic distance than it typically does among populations under panmixia [see glossary]. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations, since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to this population stratification. Accordingly, researchers are advised to employ valid statistical tests for linkage disequilibrium mapping allowing conduct of genetic association studies that control for such confounding. Many recent papers have addressed this need. We provide a comprehensive review of advances made in recent years in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) identifying whether the method is based on testing the genotype-phenotype covariance (conditional upon familial information) and/or testing departures of the marginal distribution from the expected genotypic frequencies.
Affiliation(s)
- Hemant K Tiwari
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
30
Nicodemus KK. Catmap: case-control and TDT meta-analysis package. BMC Bioinformatics 2008; 9:130. [PMID: 18307795 PMCID: PMC2291045 DOI: 10.1186/1471-2105-9-130] [Received: 07/30/2007] [Accepted: 02/28/2008]
Abstract
Background: Risk for complex disease is thought to be controlled by multiple genetic risk factors, each with small individual effects. Meta-analyses of several independent studies may be helpful to increase the ability to detect association when effect sizes are modest. Although many software options are available for meta-analysis of genetic case-control data, no currently available software implements the method described by Kazeem and Farrall (2005), which combines data from independent family-based and case-control studies. Results: I introduce the package catmap for the R statistical computing environment, which implements fixed- and random-effects pooled estimates for case-control and transmission disequilibrium methods, allowing for the use of genetic association data across study types. In addition, catmap may be used to create forest and funnel plots and to perform sensitivity analysis and cumulative meta-analysis. catmap is available from the Comprehensive R Archive Network (CRAN). Conclusion: catmap allows researchers to synthesize data to assess evidence for association in studies of genetic polymorphisms, facilitating the use of pooled data analyses, which may increase power to detect moderate genetic associations.
Affiliation(s)
- Kristin K Nicodemus
- Genes, Cognition and Psychosis Program, Clinical Brain Disorders Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, USA.
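The fixed- and random-effects pooling that catmap reports can be sketched in a few lines. This is a minimal illustration under stated assumptions, not catmap's code: it assumes each study has already been reduced to a log odds ratio and its variance (for a TDT study, a common choice is log(b/c) with variance 1/b + 1/c, where b and c count transmissions and non-transmissions of the allele from heterozygous parents), and it uses the inverse-variance fixed-effect estimate with the DerSimonian-Laird between-study variance. The function name `pool_log_odds_ratios` and the example numbers are hypothetical.

```python
import math


def pool_log_odds_ratios(log_ors, variances):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling.

    log_ors: per-study log odds ratios; variances: their sampling variances.
    """
    if len(log_ors) != len(variances) or len(log_ors) < 2:
        raise ValueError("need matching lists with at least two studies")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w

    # Cochran's Q quantifies heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # DerSimonian-Laird between-study variance

    # Random-effects weights add tau^2 to each study's own variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    random_eff = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_eff,
        "random_se": math.sqrt(1.0 / sum(w_re)),
        "tau2": tau2,
        "Q": q,
    }


# Hypothetical input: one case-control study (log OR 0.30, variance 0.02)
# and one TDT study with b = 60 transmissions and c = 40 non-transmissions.
b_tdt, c_tdt = 60, 40
result = pool_log_odds_ratios([0.30, math.log(b_tdt / c_tdt)],
                              [0.02, 1.0 / b_tdt + 1.0 / c_tdt])
print({k: round(v, 4) for k, v in result.items()})
```

When the studies agree exactly, Q falls below its degrees of freedom, tau^2 is truncated to zero, and the random-effects estimate collapses to the fixed-effect one.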
31
Li MX, Jiang L, Song YQ, Sham PC. Power of transmission/disequilibrium tests in admixed populations. Genet Epidemiol 2008; 32:434-44. [PMID: 18278814 DOI: 10.1002/gepi.20316]
Abstract
The power of transmission/disequilibrium tests (TDTs) for detecting disease susceptibility loci is expected to be influenced by population admixture through its impact on the degree of linkage disequilibrium (LD) between the genetic marker and the disease susceptibility locus (DSL). However, few studies have systematically examined this behavior of the TDTs in admixed populations. In the present study, extensive computer simulations were conducted to explore how population admixture affects the power of TDTs. It was found that (1) in newly admixed populations, the LD due to admixture makes no contribution to the power of TDTs, and it is the averaged background LD in the parental populations that determines the power of TDTs; but (2) after random mating between the admixed populations, the LD due to admixture becomes effective in increasing or decreasing the power of the tests, and (3) incomplete random mating can prolong the time for the LD due to admixture to become effective. This study clarifies the potential influence of population admixture on the performance of TDTs.
Affiliation(s)
- Miao-Xin Li
- Department of Biochemistry, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
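For reference, the classical TDT whose power this study examines compares, among heterozygous parents, the number of times a marker allele is transmitted to an affected child (b) with the number of times the other allele is transmitted (c). A minimal sketch of the McNemar-type statistic (b - c)^2 / (b + c), with its 1-df chi-square p-value computed from the standard library's `math.erfc`; the counts in the example are hypothetical.

```python
import math


def tdt(b, c):
    """Transmission/disequilibrium test for a biallelic marker.

    b: transmissions of the tested allele from heterozygous parents to affected children.
    c: transmissions of the other allele from heterozygous parents.
    Returns the McNemar-type chi-square statistic and its 1-df p-value.
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one informative transmission")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = tdt(60, 40)  # hypothetical counts
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Only heterozygous parents enter the statistic, because a homozygous parent transmits the same allele either way.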
32
Zhu X, Li S, Cooper RS, Elston RC. A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet 2008; 82:352-65. [PMID: 18252216 DOI: 10.1016/j.ajhg.2007.10.009] [Received: 06/18/2007] [Revised: 10/05/2007] [Accepted: 10/09/2007]
Abstract
There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample is more powerful to detect genetic effects than a family-based sample that contains the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When family and unrelated samples are available, statistical analyses are often performed in the family and unrelated samples separately, conditioning on parental information for the former, thus resulting in reduced power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, at the same time corrects for population stratification. We apply the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes it unnecessary to perform a conditional analysis of the family data and is more powerful than the separate analyses of unrelated and family samples, or a meta-analysis performed by combining the results of the usual separate analyses. This property is demonstrated in both a variety of simulation models and empirical data. The proposed approach can be equally applied to the analysis of both qualitative and quantitative traits.
33
Joo J, Tian X, Zheng G, Stylianou M, Lin JP, Geller NL. Joint analysis of case-parents trio and unrelated case-control designs in large scale association studies. BMC Proc 2007; 1 Suppl 1:S28. [PMID: 18466525 PMCID: PMC2367524 DOI: 10.1186/1753-6561-1-s1-s28]
Abstract
We present a new method for testing association when data from both case-parents trios and unrelated controls are available. Our method combines test statistics for case-parents trio and unrelated case-control studies by adjusting for the correlation that arises when the same set of cases is used for both tests. We further consider several analytical approaches for two-stage studies on a large number of markers, including methods based on the joint analysis. The performance of the proposed approaches is examined by analyzing the simulated data provided by the Genetic Analysis Workshop 15.
Affiliation(s)
- Jungnam Joo
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, 6701 Rockledge Drive, MSC 7913, Bethesda, MD 20892-7913, USA.
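The general idea of combining two correlated test statistics can be sketched as follows. If Z1 (for example, a trio TDT statistic) and Z2 (a case-control statistic) are each approximately standard normal under the null and have correlation rho because they share cases, then (w1*Z1 + w2*Z2) / sqrt(w1^2 + w2^2 + 2*w1*w2*rho) is again approximately standard normal. The abstract does not state how Joo et al. choose the weights or estimate rho, so in this sketch rho and the weights are simply inputs; `combine_correlated_z` is a hypothetical name.

```python
import math


def combine_correlated_z(z1, z2, rho, w1=1.0, w2=1.0):
    """Weighted sum of two correlated, approximately N(0, 1) statistics, rescaled to unit variance."""
    var = w1 ** 2 + w2 ** 2 + 2.0 * w1 * w2 * rho
    if var <= 0.0:
        raise ValueError("weights and correlation give a non-positive variance")
    z = (w1 * z1 + w2 * z2) / math.sqrt(var)
    # Two-sided normal p-value: P(|Z| > z) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical statistics; the shared cases are assumed to induce a correlation of 0.4.
z, p = combine_correlated_z(2.1, 1.8, rho=0.4)
print(f"combined Z = {z:.3f}, two-sided p = {p:.4g}")
```

Setting rho = 0 when the true correlation is positive shrinks the denominator and so overstates the combined evidence, which is why the adjustment matters when the same cases enter both tests.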
34
Curtin K, Wong J, Allen-Brady K, Camp NJ. PedGenie: meta genetic association testing in mixed family and case-control designs. BMC Bioinformatics 2007; 8:448. [PMID: 18005446 PMCID: PMC2200673 DOI: 10.1186/1471-2105-8-448] [Received: 07/19/2007] [Accepted: 11/15/2007]
Abstract
Background: PedGenie software, introduced in 2006, includes genetic association testing of cases and controls that may be independent or related (nuclear families or extended pedigrees), or mixtures thereof, using Monte Carlo significance testing. Our aim is to demonstrate that PedGenie, a unique and flexible analysis tool freely available in Genie 2.4 software, is significantly enhanced by incorporating meta statistics for detecting genetic association with disease using data across multiple study groups. Methods: Meta statistics (chi-squared tests, odds ratios, and confidence intervals) were calculated using formal Cochran-Mantel-Haenszel techniques. Simulated data from unrelated individuals and individuals in families were used to illustrate the meta tests and to show that their empirically derived p-values and confidence intervals are accurate and precise and, for independent designs, match those provided by standard statistical software. Results: PedGenie yields accurate Monte Carlo p-values for meta-analysis of data across multiple studies, based on validation testing using pedigree, nuclear family, and case-control data simulated under both the null and alternative hypotheses of a genotype-phenotype association. Conclusion: PedGenie allows valid combined analysis of data from mixtures of pedigree-based and case-control resources. The added meta capabilities provide new avenues for association analysis, including pedigree resources from large consortia and multi-center studies.
Affiliation(s)
- Karen Curtin
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84108, USA.
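The Cochran-Mantel-Haenszel quantities mentioned in the Methods can be written down directly for a set of 2x2 tables, one per study group. The sketch below uses the standard asymptotic formulas without continuity correction; PedGenie itself obtains significance by Monte Carlo simulation, which this does not reproduce. The table layout (a, b, c, d) = (case carriers, case non-carriers, control carriers, control non-carriers) and the function name `cmh` are assumptions of this example.

```python
import math


def cmh(tables):
    """Mantel-Haenszel pooled odds ratio and CMH chi-square over 2x2 strata.

    Each table is (a, b, c, d):
      a = case carriers,    b = case non-carriers,
      c = control carriers, d = control non-carriers.
    """
    num = den = 0.0      # Mantel-Haenszel odds ratio numerator and denominator
    obs_minus_exp = 0.0  # sum over strata of (a - E[a])
    var_sum = 0.0        # sum over strata of the hypergeometric Var(a)
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two subjects carries no information
        num += a * d / n
        den += b * c / n
        cases, controls = a + b, c + d
        carriers, non_carriers = a + c, b + d
        obs_minus_exp += a - cases * carriers / n
        var_sum += cases * controls * carriers * non_carriers / (n ** 2 * (n - 1))
    if den == 0 or var_sum == 0:
        raise ValueError("tables are not informative enough for a pooled estimate")
    or_mh = num / den
    chi2 = obs_minus_exp ** 2 / var_sum
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # 1-df chi-square upper tail
    return or_mh, chi2, p_value


# Two hypothetical study groups with identical counts.
print(cmh([(10, 10, 5, 15), (10, 10, 5, 15)]))
```

Stratifying by study group keeps between-study differences in carrier frequency from being mistaken for association, while still pooling the within-study evidence.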
35
Abstract
In the past, to study Mendelian diseases, segregating families have been carefully ascertained for segregation analysis, followed by collecting extended multiplex families for linkage analysis. This would then be followed by association studies, using independent case-control samples and/or additional family data. Recently, for complex diseases, the initial sampling has been for a genome-wide linkage analysis, often using independent sib-pairs or nuclear families, to identify candidate regions for follow-up with association studies, again using case-control samples and/or additional family data. We now have the ability to conduct genome-wide association studies using 100,000-500,000 diallelic genetic markers. For such studies we focus especially on efficient two-stage association sampling designs, which can retain nearly optimal statistical power at about half the genotyping cost. Similarly, beginning an association study by genotyping pooled samples may also be a viable option if the cost of accurately pooling DNA samples outweighs genotyping costs. Finally, we note that the sampling of family data for linkage analysis is not a practice that should be automatically discontinued.
Affiliation(s)
- Robert C Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44106, USA.
36
Allen AS, Satten GA. Inference on haplotype/disease association using parent-affected-child data: the projection conditional on parental haplotypes method. Genet Epidemiol 2007; 31:211-23. [PMID: 17266114 DOI: 10.1002/gepi.20203]
Abstract
We develop a method that allows inference on parameters in log-linear models of the relative risk of disease given an individual's haplotypes, that can be used to analyze case-parent trio data. Our methods are robust to population stratification and can also be used for inference on the effect of interactions between haplotypes and environmental covariates. We compare our results with the family-based association test (FBAT) of Horvath et al. ([2004] Genet. Epidemiol. 26:61-69), and discuss when marginal tests, such as those available in FBAT, can be misleading. Our approach generalizes previous results of Allen et al. ([2005] Biometrika 92:559-571), allowing for missing genotype data and haplotype x environment interactions. Additional computational simplifications are also discussed.
Affiliation(s)
- Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
37
Biedermann S, Nagel E, Munk A, Holzmann H, Steland A. Tests in a case-control design including relatives. Scand Stat Theory Appl 2006. [DOI: 10.1111/j.1467-9469.2006.00500.x]
38
Vionnet N, Tregouët D, Kazeem G, Gut I, Groop PH, Tarnow L, Parving HH, Hadjadj S, Forsblom C, Farrall M, Gauguier D, Cox R, Matsuda F, Heath S, Thévard A, Rousseau R, Cambien F, Marre M, Lathrop M. Analysis of 14 candidate genes for diabetic nephropathy on chromosome 3q in European populations: strongest evidence for association with a variant in the promoter region of the adiponectin gene. Diabetes 2006; 55:3166-74. [PMID: 17065357 DOI: 10.2337/db06-0271]
Abstract
Linkage studies have mapped loci for diabetic nephropathy and associated phenotypes on chromosome 3q. We studied 14 plausible candidate genes in the linkage region because of their potential role in vascular complications. In a large-scale study of patients from Denmark, Finland, and France who have type 1 diabetes, 1,057 case and 1,127 control subjects, as well as 532 trios, were investigated for association with diabetic nephropathy. We analyzed 69 haplotype-tagging single nucleotide polymorphisms and nonsynonymous variants that were identified by sequencing. Polymorphisms in three genes, glucose transporter 2 (SLC2A2), kininogen (KNG1), and adiponectin (ADIPOQ), showed nominal association with diabetic nephropathy in single-point analysis. The T-allele of SLC2A2_16459CT was associated with a decreased risk of diabetic nephropathy (odds ratio 0.79 [95% CI 0.66-0.96], P = 0.016), whereas the T-allele of KNG_7965CT and the A-allele of ADIPOQ_prom2GA were associated with increased risk of nephropathy (1.17 [1.03-1.32], P = 0.016; 1.46 [1.11-1.93], P = 0.006, respectively). Analyses of the transmission disequilibrium test showed similar trends only for ADIPOQ_prom2GA with the overtransmission of the A-allele to patients with diabetic nephropathy (1.52 [0.86-2.66], P = NS) and of the G-allele to patients without diabetic nephropathy (0.50 [0.27-0.92], P = 0.026). The overall significance for this variant (nominal P = 0.011) suggests that ADIPOQ might be involved in the development of diabetic nephropathy.
Affiliation(s)
- Nathalie Vionnet
- INSERM U525, Centre National de Génotypage, 2, Rue Gaston Crémieux, 91006 Evry Cedex, France.
39
Hsieh HJ, Palmer CGS, Harney S, Newton JL, Wordsworth P, Brown MA, Sinsheimer JS. The v-MFG test: investigating maternal, offspring and maternal-fetal genetic incompatibility effects on disease and viability. Genet Epidemiol 2006; 30:333-47. [PMID: 16607625 DOI: 10.1002/gepi.20148]
Abstract
The MFG test is a family-based association test that detects genetic effects contributing to disease in offspring, including offspring allelic effects, maternal allelic effects and MFG incompatibility effects. Like many other family-based association tests, it assumes that the offspring survival and the offspring-parent genotypes are conditionally independent provided the offspring is affected. However, when the putative disease-increasing locus can affect another competing phenotype, for example, offspring viability, the conditional independence assumption fails and these tests could lead to incorrect conclusions regarding the role of the gene in disease. We propose the v-MFG test to adjust for the genetic effects on one phenotype, e.g., viability, when testing the effects of that locus on another phenotype, e.g., disease. Using genotype data from nuclear families containing parents and at least one affected offspring, the v-MFG test models the distribution of family genotypes conditional on offspring phenotypes. It simultaneously estimates genetic effects on two phenotypes, viability and disease. Simulations show that the v-MFG test produces accurate genetic effect estimates on disease as well as on viability under several different scenarios. It generates accurate type-I error rates and provides adequate power with moderate sample sizes to detect genetic effects on disease risk when viability is reduced. We demonstrate the v-MFG test with HLA-DRB1 data from study participants with rheumatoid arthritis (RA) and their parents and show that the v-MFG test successfully detects an MFG incompatibility effect on RA while simultaneously adjusting for a possible viability loss.
Affiliation(s)
- Hsin-Ju Hsieh
- Biostatistics, University of California, Los Angeles, CA, USA
40
Abstract
Host genes, together with viral and environmental factors, determine the susceptibility, severity and course of respiratory syncytial virus infections. The course of infection is influenced by several frequently occurring gene variants that especially appear to influence the innate immune system and the regulation of the T helper (Th) type 1/Th2 cytokine pathways. Naturally occurring polymorphisms in certain genes have been associated with a severe course of respiratory syncytial virus infection. Genetic associations between interleukin (IL)-4, IL-4Rα and IL-10 polymorphisms and respiratory syncytial virus bronchiolitis differ between children younger and older than 6 months, indicating a different pathogenesis in these subsets of patients. Knowledge of host genetic variants adds to our understanding of pathogenesis, and may identify critical steps to which prevention and therapy may be directed.
Affiliation(s)
- Tjeerd G Kimman
- National Institute of Public Health and the Environment, Laboratory for Vaccine-Preventable Diseases, PO Box 1, 3720 BA Bilthoven, The Netherlands
- Riny Janssen
- National Institute of Public Health and the Environment, Laboratory of Toxicology, Pathology and Genetics, PO Box 1, 3720 BA Bilthoven, The Netherlands
- Barbara Hoebee
- National Institute of Public Health and the Environment, Laboratory of Toxicology, Pathology and Genetics, PO Box 1, 3720 BA Bilthoven, The Netherlands
41
Goldstein AM, Dondon MG, Andrieu N. Unconditional analyses can increase efficiency in assessing gene-environment interaction of the case-combined-control design. Int J Epidemiol 2006; 35:1067-73. [PMID: 16556643 PMCID: PMC2080880 DOI: 10.1093/ije/dyl048]
Abstract
BACKGROUND: A design combining both related and unrelated controls, named the case-combined-control design, was recently proposed to increase the power for detecting gene-environment (GxE) interaction. Under a conditional analytic approach, the case-combined-control design appeared to be more efficient and feasible than a classical case-control study for detecting interaction involving rare events. METHODS: We now propose an unconditional analytic strategy to further increase the power for detecting GxE interactions. This strategy allows the estimation of GxE interaction and exposure (E) main effects under certain assumptions (e.g. no correlation in E between siblings and the same exposure frequency in both control groups). Only the genetic (G) main effect cannot be estimated, because its estimate is biased. RESULTS: Using simulations, we show that unconditional logistic regression analysis is often more efficient than conditional analysis for detecting GxE interaction, particularly for a rare gene and strong effects. The unconditional analysis is also at least as efficient as the conditional analysis when the gene is common and the main and joint effects of E and G are small. CONCLUSIONS: Under the required assumptions, the unconditional analysis retains more information than does the conditional analysis, for which only discordant case-control pairs are informative, and therefore yields more precise estimates of the odds ratios.
Affiliation(s)
- Alisa M Goldstein
- Genetic Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD 20892, USA.
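The closing point, that a conditional analysis draws information only from discordant pairs, is easiest to see in the simplest matched design: 1:1 case-control pairs with a binary exposure. There the conditional maximum-likelihood odds ratio is the ratio of the two kinds of discordant pairs, and concordant pairs drop out entirely. This is a textbook illustration, not the case-combined-control analysis of Goldstein et al.; `matched_pair_odds_ratio` and the example data are hypothetical.

```python
import math


def matched_pair_odds_ratio(pairs, z=1.96):
    """Conditional (McNemar-type) odds ratio for 1:1 matched case-control pairs.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    Returns the odds ratio, an approximate Wald interval on the log scale
    (95% for the default z), and how many pairs were discordant (informative)
    versus concordant (ignored by the conditional analysis).
    """
    pairs = list(pairs)
    n10 = sum(1 for case_e, ctrl_e in pairs if case_e and not ctrl_e)
    n01 = sum(1 for case_e, ctrl_e in pairs if ctrl_e and not case_e)
    if n10 == 0 or n01 == 0:
        raise ValueError("need both kinds of discordant pairs for a finite estimate")
    log_or = math.log(n10 / n01)
    se = math.sqrt(1.0 / n10 + 1.0 / n01)
    return {
        "odds_ratio": n10 / n01,
        "ci": (math.exp(log_or - z * se), math.exp(log_or + z * se)),
        "discordant": n10 + n01,
        "concordant": len(pairs) - n10 - n01,
    }


# Hypothetical data: 6 + 2 discordant pairs and 15 concordant pairs.
data = ([(True, False)] * 6 + [(False, True)] * 2
        + [(True, True)] * 10 + [(False, False)] * 5)
print(matched_pair_odds_ratio(data))
```

Adding or removing concordant pairs leaves the estimate and its interval unchanged; that unused information is what the abstract reports the unconditional analysis can retain under its stated assumptions.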
42
Epstein MP, Waldman ID, Satten GA. Improved association analyses of disease subtypes in case-parent triads. Genet Epidemiol 2006; 30:209-19. [PMID: 16496304 DOI: 10.1002/gepi.20138]
Abstract
The sampling of case-parent triads is an appealing strategy for conducting association analyses of complex diseases. In certain situations, one may have interest in using the triads to identify genetic variants that are associated with a specific subtype of disease, perhaps related to a characteristic cluster of symptoms. A straightforward strategy for conducting such a subtype analysis would be to analyze only those triads with the subtype of interest. While such a strategy is valid, we show that triads without the subtype of interest can provide additional genetic information that increases power to detect association with the subtype of interest. We incorporate this additional information using a likelihood-based framework that permits flexible modeling and estimation of allelic effects on disease subtypes and also allows for missing parental data. Using simulated data under a variety of genetic models, we show that our proposed association test consistently outperforms association tests that only analyze triads with the subtype of interest. We also apply our method to a triad study of attention-deficit hyperactivity disorder and identify a genetic variant in the dopamine transporter gene that is associated with a subtype characterized by extreme levels of both inattentive and hyperactive-impulsive symptoms.
Affiliation(s)
- Michael P Epstein
- Department of Human Genetics, Emory University, Atlanta, Georgia 30322, USA.
43
Putter H, Houwing-Duistermaat JJ, Nagelkerke NJD. Combining evidence for association from transmission disequilibrium and case-control studies using single-nucleotide polymorphisms. BMC Genet 2005; 6 Suppl 1:S106. [PMID: 16451562 PMCID: PMC1866833 DOI: 10.1186/1471-2156-6-s1-s106]
Abstract
The aim of the present analysis is to combine evidence for association from the two most commonly used designs in genetic association analysis, the case-control design and the transmission disequilibrium test (TDT) design. The cases here are affected offspring from nuclear families and are used in both the case-control and TDT designs. As a result, inference from these designs is not independent. We applied a simple logistic regression method for combining evidence for association from case-control and TDT designs to single-nucleotide polymorphism data purchased on a region on chromosome 3, replicate 1 of the Aipotu population. Combining the evidence from the case-control and TDT designs yielded a 5–10% reduction in the standard errors of the relative risk estimates. The authors did not know the results before the analyses were conducted.
Affiliation(s)
- Hein Putter
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, University of Leiden, PO Box 9604, 2300 RC, Leiden, The Netherlands
- Jeanine J Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, University of Leiden, PO Box 9604, 2300 RC, Leiden, The Netherlands
- Nico JD Nagelkerke
- Department of Community Medicine, United Arab Emirates University, Al Ain, United Arab Emirates
44
Guo W, Fung WK. Combining the case-control methodology with the small size transmission/disequilibrium test for multiallelic markers. Eur J Hum Genet 2005; 13:1007-12. [PMID: 15957000 DOI: 10.1038/sj.ejhg.5201453]
Abstract
Case-control studies compare marker-allele distributions in affected and unaffected individuals, and significant results may be due to linkage but can also simply reflect population structure. To test for linkage after obtaining a significant case-control finding, within-family analysis can be performed. In a transmission/disequilibrium test (TDT), genotypes of cases are compared to those of their parents to explore whether a specific allele, or marker, at a locus of interest is transmitted to a greater degree than Mendelian inheritance would warrant. For multiallelic markers, several authors have proposed extensions to the TDT. In this article, we propose a TDT that uses the available information from a case-control study to group the alleles of multiallelic markers, thereby increasing the statistical power of a TDT with a small sample size.
Affiliation(s)
- Wei Guo
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China.
45
Weinberg CR, Umbach DM. A hybrid design for studying genetic influences on risk of diseases with onset early in life. Am J Hum Genet 2005; 77:627-36. [PMID: 16175508 PMCID: PMC1275611 DOI: 10.1086/496900] [Received: 04/14/2005] [Accepted: 08/02/2005]
Abstract
Studies of genetic contributions to risk can be family-based, such as the case-parents design, or population-based, such as the case-control design. Both provide powerful inference regarding associations between genetic variants and risks, but both have limitations. The case-control design requires identifying and recruiting appropriate controls, but it has the advantage that nongenetic risk factors like exposures can be assessed. For a condition with an onset early in life, such as a birth defect, one should also genotype the mothers of cases and the mothers of controls to avoid potential confounding due to maternally mediated genetic effects acting on the fetus during gestation. The case-parents approach is less vulnerable than the case-mother/control-mother approach to biases due to population structure and self-selection. The case-parents approach also allows access to epigenetic phenomena like imprinting, but it cannot evaluate the role of nongenetic cofactors like exposures. We propose a hybrid design based on augmenting a set of affected individuals and their parents with a set of unaffected, unrelated individuals and their parents. The affected individuals and their parents are all genotyped, whereas only the parents of unaffected individuals are genotyped, although exposures are ascertained for both affected and unaffected offspring. The proposed hybrid design, through log-linear, likelihood-based analysis, allows estimation of the relative risk parameters, can provide more power than either the case-parents approach or the case-mother/control-mother approach, permits straightforward likelihood-ratio tests for bias due to mating asymmetry or population stratification, and admits valid alternative analyses when mating is asymmetric or when population stratification is detected.
Affiliation(s)
- C R Weinberg
- Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.
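Log-linear analyses of case-parent triads, on which the hybrid design builds, are closely tied to a simple conditional likelihood: given the parents' genotypes, Mendelian transmission fixes the child's genotype distribution, and being affected reweights it by the genotype relative risks R1 and R2 (with R0 = 1). The sketch below evaluates only that offspring-effect piece for a biallelic SNP coded as 0, 1, or 2 copies of the variant allele. It omits maternal effects, exposures, and the control-parent component that the hybrid design adds, so it is not the full model described in the abstract; the function names and example triads are hypothetical.

```python
import math


def transmission_prob(parent_genotype):
    """Probability that a parent carrying 0, 1, or 2 variant alleles transmits the variant."""
    return parent_genotype / 2.0


def mendelian_child_probs(mother, father):
    """Mendelian probabilities of the child carrying 0, 1, or 2 variant alleles."""
    pm, pf = transmission_prob(mother), transmission_prob(father)
    return [(1 - pm) * (1 - pf),
            pm * (1 - pf) + (1 - pm) * pf,
            pm * pf]


def triad_log_likelihood(triads, r1, r2):
    """Conditional log-likelihood of affected children's genotypes given parental genotypes.

    triads: iterable of (mother, father, child) genotypes coded 0/1/2.
    r1, r2: relative risks for one and two variant copies (R0 = 1).
    """
    risks = [1.0, r1, r2]
    loglik = 0.0
    for mother, father, child in triads:
        probs = mendelian_child_probs(mother, father)
        if probs[child] == 0.0:
            raise ValueError(f"Mendelian inconsistency in triad {(mother, father, child)}")
        # Reweight the Mendelian distribution by the relative risks and renormalize.
        denom = sum(p * r for p, r in zip(probs, risks))
        loglik += math.log(probs[child] * risks[child] / denom)
    return loglik


# Hypothetical triads; a log-additive model sets r1 = psi and r2 = psi ** 2.
triads = [(1, 1, 2), (1, 0, 1), (1, 1, 1), (2, 1, 2), (0, 1, 0)]
for psi in (1.0, 1.5, 2.0):
    print(psi, round(triad_log_likelihood(triads, psi, psi ** 2), 4))
```

A triad with two homozygous parents contributes zero to the log-likelihood whatever the relative risks, since the child's genotype is then fixed; only triads with at least one heterozygous parent carry information about offspring effects.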
46
Epstein MP, Veal CD, Trembath RC, Barker JNWN, Li C, Satten GA. Genetic association analysis using data from triads and unrelated subjects. Am J Hum Genet 2005; 76:592-608. [PMID: 15712104 PMCID: PMC1199297 DOI: 10.1086/429225] [Received: 10/26/2004] [Accepted: 01/27/2005]
Abstract
The selection of an appropriate control sample for use in association mapping requires serious deliberation. Unrelated controls are generally easy to collect, but the resulting analyses are susceptible to spurious association arising from population stratification. Parental controls are popular, since triads comprising a case and two parents can be used in analyses that are robust to this stratification. However, parental controls are often expensive and difficult to collect. In some situations, studies may have both parental and unrelated controls available for analysis. For example, a candidate-gene study may analyze triads but may have an additional sample of unrelated controls for examination of background linkage disequilibrium in genomic regions. Also, studies may collect a sample of triads to confirm results initially found using a traditional case-control study. Initial association studies also may collect each type of control, to provide insurance against the weaknesses of the other type. In these situations, resulting samples will consist of some triads, some unrelated controls, and, possibly, some unrelated cases. Rather than analyze the triads and unrelated subjects separately, we present a likelihood-based approach for combining their information in a single combined association analysis. Our approach allows for joint analysis of data from both triad and case-control study designs. Simulations indicate that our proposed approach is more powerful than association tests that are based on each separate sample. Our approach also allows for flexible modeling and estimation of allele effects, as well as for missing parental data. We illustrate the usefulness of our approach using SNP data from a candidate-gene study of psoriasis.
Affiliation(s)
- Michael P Epstein
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.