1
Jenkins D. How do stochastic processes and genetic threshold effects explain incomplete penetrance and inform causal disease mechanisms? Philos Trans R Soc Lond B Biol Sci 2024; 379:20230045. [PMID: 38432317 PMCID: PMC10909503 DOI: 10.1098/rstb.2023.0045] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/04/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Incomplete penetrance is the rule rather than the exception in Mendelian disease. In syndromic monogenic disorders, phenotypic variability can be viewed as the combination of incomplete penetrance for each of multiple independent clinical features. Within genetically identical individuals, such as isogenic model organisms, stochastic variation at molecular and cellular levels is the primary cause of incomplete penetrance according to a genetic threshold model. By defining specific probability distributions of causal biological readouts and genetic liability values, stochasticity and incomplete penetrance provide information about threshold values in biological systems. Ascertainment of threshold values has been achieved by simultaneous scoring of relatively simple phenotypes and quantitation of molecular readouts at the level of single cells. However, this is much more challenging for complex morphological phenotypes using experimental and reductionist approaches alone, where cause and effect are separated temporally and across multiple biological modes and scales. Here I consider how causal inference, which integrates observational data with high confidence causal models, might be used to quantify the relative contribution of different sources of stochastic variation to phenotypic diversity. Collectively, these approaches could inform disease mechanisms, improve predictions of clinical outcomes and prioritize gene therapy targets across modes and scales of gene function. This article is part of a discussion meeting issue 'Causes and consequences of stochastic processes in development and disease'.
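The genetic threshold model described above can be made concrete with a minimal sketch. It assumes, for illustration only, a single normally distributed liability whose mean is fixed by genotype and whose spread comes entirely from stochastic noise. Penetrance is then the tail probability above a threshold, and an observed penetrance can be inverted to locate the threshold relative to the mean liability, in units of the noise SD. The function names are hypothetical and are not taken from the article.

```python
import math
import random
from statistics import NormalDist

def penetrance(mean_liability: float, threshold: float, noise_sd: float) -> float:
    """P(liability > threshold) when liability = mean + N(0, noise_sd^2) noise."""
    z = (threshold - mean_liability) / noise_sd
    # Upper-tail probability of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

def threshold_offset(observed_penetrance: float) -> float:
    """Invert penetrance: (threshold - mean liability) in noise-SD units."""
    return NormalDist().inv_cdf(1.0 - observed_penetrance)

def simulate_isogenic(mean_liability, threshold, noise_sd, n, seed=0):
    """Fraction affected among n genetically identical individuals that differ only by noise."""
    rng = random.Random(seed)
    affected = sum(
        1 for _ in range(n)
        if mean_liability + rng.gauss(0.0, noise_sd) > threshold
    )
    return affected / n

# Mean liability 1 SD below the threshold gives roughly 16% penetrance.
p = penetrance(mean_liability=-1.0, threshold=0.0, noise_sd=1.0)
print(round(p, 3))                    # 0.159
print(round(threshold_offset(p), 3))  # 1.0
print(simulate_isogenic(-1.0, 0.0, 1.0, n=20000))
```

The simulated fraction should lie close to the analytic value, which is the sense in which stochastic variation alone produces incomplete penetrance among isogenic individuals in this toy model.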
Affiliation(s)
- Dagan Jenkins
- Great Ormond Street Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK
2
Lou XY, Hou TT, Liu SY, Xu HM, Lin F, Tang X, MacLeod SL, Cleves MA, Hobbs CA. Innovative approach to identify multigenomic and environmental interactions associated with birth defects in family-based hybrid designs. Genet Epidemiol 2021; 45:171-189. [PMID: 32996630 PMCID: PMC8495752 DOI: 10.1002/gepi.22363] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/27/2020] [Revised: 09/08/2020] [Accepted: 09/11/2020] [Indexed: 11/09/2022]
Abstract
Genes, including those with transgenerational effects, work in concert with behavioral, environmental, and social factors via complex biological networks to determine human health. Understanding the complex relationships between causal factors underlying human health is an essential step towards deciphering biological mechanisms. We propose a new analytical framework to investigate the interactions between maternal and offspring genetic variants or their surrogate single nucleotide polymorphisms (SNPs) and environmental factors using a family-based hybrid study design. The proposed approach can analyze diverse genetic and environmental factors and accommodate samples from a variety of family units, including case/control-parental triads and case/control-parental dyads, while minimizing potential bias introduced by population admixture. Comprehensive simulations demonstrated that our innovative approach outperformed the log-linear approach, the best available method for case-control family data. The proposed approach had greater statistical power and provided unbiased estimates of the maternal and child genetic effects and the effects of environmental factors, while controlling the Type I error rate in the presence of population stratification. Using our newly developed approach, we analyzed the associations between maternal and fetal SNPs and obstructive and conotruncal heart defects, with adjustment for demographic and lifestyle factors and dietary supplements. Fourteen and 11 fetal SNPs were associated with obstructive and conotruncal heart defects, respectively. Twenty-seven and 17 maternal SNPs were associated with obstructive and conotruncal heart defects, respectively. In addition, maternal body mass index was a significant risk factor for obstructive defects. The proposed approach is a powerful tool for interrogating the etiological mechanisms underlying complex traits.
Affiliation(s)
- Xiang-Yang Lou
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, USA
- Ting-Ting Hou
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, USA
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Shou-Ye Liu
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hai-Ming Xu
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Feng Lin
- Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Xinyu Tang
- The US Food and Drug Administration, Silver Spring, Maryland, USA
- Mario A. Cleves
- Department of Pediatrics, Morsani College of Medicine, Health Informatics Institute, University of South Florida, Tampa, Florida, USA
- Charlotte A. Hobbs
- Rady Children’s Institute for Genomic Medicine, San Diego, California, USA
3
Bates S, Sesia M, Sabatti C, Candès E. Causal inference in genetic trio studies. Proc Natl Acad Sci U S A 2020; 117:24117-24126. [PMID: 32948695 PMCID: PMC7533659 DOI: 10.1073/pnas.2007743117] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/23/2020] [Accepted: 08/11/2020] [Indexed: 12/26/2022]
Abstract
We introduce a method to draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.
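The core idea of the digital twin test, comparing observed children with synthetic children drawn from the same parents, can be sketched at a single SNP. This is a deliberate simplification assumed here: the published test works on genomic regions using a recombination model and can use arbitrary multivariate models, whereas this sketch draws each transmitted allele independently by Mendel's law and uses a simple phenotype-weighted allele count as its statistic. The function names are hypothetical.

```python
import random

def synthetic_child(father: int, mother: int, rng: random.Random) -> int:
    """Draw a child genotype (copies of allele A: 0, 1 or 2) by Mendelian transmission."""
    def transmit(parent_genotype: int) -> int:
        # A parent carrying g copies of A transmits A with probability g / 2.
        return 1 if rng.random() < parent_genotype / 2 else 0
    return transmit(father) + transmit(mother)

def digital_twin_pvalue(trios, phenotypes, n_twins=999, seed=0):
    """One-sided Monte Carlo p-value comparing observed children with synthetic 'twins'.

    trios: list of (father, mother, child) genotypes coded 0/1/2.
    phenotypes: one number per child (e.g. 1.0 for affected).
    """
    rng = random.Random(seed)

    def statistic(children):
        # Illustrative statistic: phenotype-weighted allele count.
        return sum(y * g for y, g in zip(phenotypes, children))

    observed = statistic([child for _, _, child in trios])
    at_least_as_extreme = 0
    for _ in range(n_twins):
        twins = [synthetic_child(f, m, rng) for f, m, _ in trios]
        if statistic(twins) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed data as one draw, keeping the p-value valid.
    return (1 + at_least_as_extreme) / (1 + n_twins)

# Twenty affected children of two heterozygous parents, all homozygous for A.
print(digital_twin_pvalue([(1, 1, 2)] * 20, [1.0] * 20))
```

Because the twins are generated only from the parents' genotypes and the transmission model, the reference distribution does not depend on any genotype-phenotype model, which mirrors the abstract's point that the randomness of meiosis supplies the randomization.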
Affiliation(s)
- Stephen Bates
- Department of Statistics, Stanford University, Stanford, CA 94305
- Matteo Sesia
- Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA 90089
- Chiara Sabatti
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
- Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Mathematics, Stanford University, Stanford, CA 94305
4
Klein N, Entwistle A, Rosenberger A, Kneib T, Bickeböller H. Candidate-gene association analysis for a continuous phenotype with a spike at zero using parent-offspring trios. J Appl Stat 2019; 47:2066-2080. [PMID: 35707573 PMCID: PMC9042123 DOI: 10.1080/02664763.2019.1704226] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/11/2018] [Accepted: 12/07/2019] [Indexed: 10/25/2022]
Abstract
In this paper, we propose using the class of generalized additive models for location, scale and shape (GAMLSS) in a test for the association of genetic markers with non-normally distributed phenotypes that have a spike at zero. The resulting statistical test is a generalization of the quantitative transmission disequilibrium test with mating type indicator, which was originally designed for normally distributed quantitative traits and parent-offspring data. As a motivating example, we consider coronary artery calcification (CAC), which can be accurately identified by electron beam tomography. In the investigated regions, individuals either have a continuous measure of the extent of calcium found or are calcium-free. Hence, the resulting distribution is a mixed discrete-continuous distribution with a spike at zero. We carry out parent-offspring simulations motivated by such CAC measurement values in a screening population to study the statistical properties of the proposed test for genetic association. Furthermore, we apply the approach to data from Genetic Analysis Workshop 16, which are based on real genotype and family data from the Framingham Heart Study, and test the association of selected genetic markers with simulated coronary artery calcification.
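A mixed discrete-continuous distribution with a spike at zero can be written as a point mass nu at zero plus a continuous density for positive values. The sketch below uses a zero-adjusted gamma as that density; the abstract does not say which GAMLSS family the authors use, so this choice is an assumption (the R package gamlss.dist offers a zero-adjusted gamma family, ZAGA, with a different parameterization). The names nu, shape and scale are illustrative.

```python
import math

def zero_adjusted_gamma_loglik(y, nu, shape, scale):
    """Log-likelihood of a spike-at-zero model: P(Y = 0) = nu, otherwise Gamma(shape, scale)."""
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie strictly between 0 and 1")
    total = 0.0
    for value in y:
        if value == 0:
            # Point mass at zero (e.g. a calcium-free individual).
            total += math.log(nu)
        else:
            # Continuous part: log(1 - nu) plus the log gamma density at value.
            total += (
                math.log(1.0 - nu)
                + (shape - 1.0) * math.log(value)
                - value / scale
                - math.lgamma(shape)
                - shape * math.log(scale)
            )
    return total

print(zero_adjusted_gamma_loglik([0.0, 1.0, 2.5], nu=0.4, shape=2.0, scale=1.0))
```

In an association test, nu (and the mean of the continuous part) could be linked to genotype, and fits with and without the genotype term compared, for example by a likelihood-ratio test; that linkage is not shown here.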
Affiliation(s)
- Nadja Klein
- Humboldt University of Berlin, Berlin, Germany
- Thomas Kneib
- Georg-August-Universität Göttingen, Göttingen, Germany
5
Fang H, Yang Y, Chen L. Weighted Transmission Disequilibrium Test for Family Trio Association Design. Hum Hered 2019; 83:196-209. [PMID: 30865952 DOI: 10.1159/000494353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2017] [Accepted: 10/09/2018] [Indexed: 11/19/2022]
Abstract
BACKGROUND Family-based designs are among the most popular designs in genetic studies. The transmission disequilibrium test (TDT) for the family trio design is optimal only under the additive trait model and may lose power under other trait models. TDT-type tests are powerful only when the underlying trait model is correctly specified. Usually the true trait model is unknown, so selecting a TDT-type test is problematic. Several methods that are robust against misspecification of the trait model have been proposed. In this paper, we propose a new efficiency-robust procedure for the family trio design, the weighted TDT (WTDT). METHODS We combine the information from the two largest of the three TDT-type test statistics, using weights related to all three TDT-type tests, and take the weighted sum as the test statistic. RESULTS Simulation results demonstrate that the WTDT has power close to that of the optimal TDT-type test based on a single trait model while being much more robust. The WTDT also outperforms other efficiency-robust methods in terms of power. Applications to real and simulated data from Genetic Analysis Workshop 15 (GAW15) illustrate the practical use of the WTDT method. CONCLUSION The WTDT is efficiency robust not only to misspecification of the trait model but also to misspecification of the risk allele.
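The additive-model TDT that the WTDT builds on reduces to a McNemar-type statistic on heterozygous parents: b counts those who transmitted allele A to an affected child and c counts those who transmitted the other allele. The sketch below computes that statistic and its 1-df chi-square p-value. The recessive- and dominant-model TDT-type statistics and the WTDT weights are not specified in the abstract, so they are not reproduced; the function names are hypothetical.

```python
import math

def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """Classical TDT chi-square (1 df): (b - c)^2 / (b + c).

    transmitted:   heterozygous parents who transmitted allele A to an affected child (b)
    untransmitted: heterozygous parents who transmitted the other allele (c)
    """
    if transmitted + untransmitted == 0:
        raise ValueError("no informative heterozygous parents")
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)

def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail p-value of a 1-df chi-square, i.e. P(|Z| >= sqrt(statistic))."""
    return math.erfc(math.sqrt(statistic / 2.0))

stat = tdt_statistic(30, 10)
print(stat)  # 10.0
print(chi2_1df_pvalue(stat))
```

Homozygous parents are excluded because they always transmit the same allele and so carry no information about over-transmission.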
Affiliation(s)
- Hongyan Fang
- School of Mathematical Sciences, Anhui University, Hefei, China
- Yaning Yang
- Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
- Ling Chen
- School of Mathematical Sciences, Anhui University, Hefei, China
6
Hartsfield J, Everett E, Al-Qawasmi R. Genetic Factors in External Apical Root Resorption and Orthodontic Treatment. Crit Rev Oral Biol Med 2004; 15:115-122. [PMID: 15059946 DOI: 10.1177/154411130401500205] [Citation(s) in RCA: 109] [Impact Index Per Article: 12.1] [Indexed: 11/16/2022]
Abstract
External apical root resorption (EARR) is a common sequela of orthodontic treatment, although it may also occur in the absence of orthodontic treatment. The degree and severity of EARR associated with orthodontic treatment are multifactorial, involving host and environmental factors. Genetic factors account for at least 50% of the variation in EARR. Variation in the interleukin 1 beta gene (IL-1B) in orthodontically treated individuals accounts for 15% of the variation in EARR. Historical and contemporary evidence implicates injury to the periodontal ligament (PDL) and supporting structures at the site of root compression following the application of orthodontic force as the earliest event leading to EARR. Decreased IL-1β production in the case of IL-1B (+3953) allele 1 may result in relatively less catabolic bone modeling (resorption) at the cortical bone interface with the PDL, which may result in prolonged stress concentrated in the root of the tooth, triggering a cascade of fatigue-related events leading to root resorption. One mechanism of action for EARR may be mediated through impairment of alveolar resorption, resulting in prolonged stress and strain of the adjacent tooth root due to dynamic functional loads. Future estimation of susceptibility to EARR will likely require the analysis of a suite of genes, root morphology, skeleto-dental values, and the treatment method to be used (essentially, the amount of tooth movement planned for treatment).
Affiliation(s)
- J.K. Hartsfield
- Department of Oral Facial Development, Indiana University School of Dentistry, 1121 West Michigan Street, Indianapolis, IN 46202-5186, USA
7
Statistical equivalent of the classical TDT for quantitative traits and multivariate phenotypes. J Genet 2016; 94:619-28. [PMID: 26690516 DOI: 10.1007/s12041-015-0563-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
Clinical end-point traits are usually governed by quantitative precursors. Hence, there is active research interest in developing statistical methods for association mapping of quantitative traits. Unlike population-based tests for association, family-based tests for transmission disequilibrium are protected against population stratification. In this study, we propose a logistic regression model to test the association for quantitative traits based on a trio design. We show that the method can be viewed as a direct extension of the classical transmission disequilibrium test for binary traits to quantitative traits. We evaluate the performance of our method using extensive simulations and compare it with an existing method, the family-based association test (FBAT). We found that the two methods yield comparable power if all families are considered. However, unlike FBAT, which yields an inflated rate of false positives when noninformative trios in which all three individuals are heterozygous are removed, our method maintains the correct size without compromising too much on power. We show that our method can be easily modified to incorporate multivariate phenotypes. Finally, we applied this method to analyse a quantitative endophenotype associated with alcoholism.
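One plausible reading of a logistic-regression TDT for a quantitative trait, assumed here rather than taken from the paper, treats each heterozygous parent as one observation: the outcome is 1 if the parent transmitted allele A, and the child's trait value is the covariate. Under no association the trait slope b1 is zero, and with no trait term, testing b0 = 0 compares transmitted and untransmitted counts in the spirit of the classical TDT. The sketch fits logit P(transmit A) = b0 + b1 * trait by Newton-Raphson; the function name is hypothetical.

```python
import math

def fit_logistic(x, y, iterations=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # Fisher information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system I * delta = g explicitly.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Each row: child's trait value, and whether the heterozygous parent transmitted A (1) or not (0).
traits = [-2.0, -1.0, 0.0, 1.0, 2.0, -1.0, 1.0]
transmitted_a = [0, 0, 1, 1, 1, 1, 0]
print(fit_logistic(traits, transmitted_a))
```

A Wald or likelihood-ratio test on b1 would then give the association p-value; that step, and the multivariate extension mentioned in the abstract, are omitted.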
8
Huang LO, Infante-Rivard C, Labbe A. Analysis of Case-Parent Trios Using a Loglinear Model with Adjustment for Transmission Ratio Distortion. Front Genet 2016; 7:155. [PMID: 27630667 PMCID: PMC5005337 DOI: 10.3389/fgene.2016.00155] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/06/2016] [Accepted: 08/16/2016] [Indexed: 01/16/2023]
Abstract
Transmission of the two parental alleles to offspring in a ratio that deviates from the Mendelian expectation is termed transmission ratio distortion (TRD); it occurs throughout gametic and embryonic development. TRD has been well studied in animals but remains largely uncharacterized in humans. The Transmission Disequilibrium Test (TDT) was first proposed to test for association and linkage in case-trios (affected offspring and their parents), and adjusting for TRD using control-trios was recommended. However, the TDT does not provide risk parameter estimates for different genetic models. A loglinear model was later proposed to provide child and maternal relative risk (RR) estimates of disease, assuming Mendelian transmission. Our simulation study showed that case-trio RR estimates from this model are biased in the presence of TRD, and that power and Type 1 error are compromised. We propose an extended loglinear model that adjusts for TRD. Under this extended model, RR estimates, power and Type 1 error are correctly restored. We applied this model to an intrauterine growth restriction dataset and obtained results consistent with a previous approach that adjusted for TRD using control-trios. Our findings suggest that adjusting for TRD is needed to avoid spurious results. Documenting TRD in the population is therefore essential for the correct interpretation of genetic association studies.
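Why TRD biases case-trio relative risks can be seen from the conditional genotype distribution underlying case-parent loglinear models: given the parents, an affected child's genotype probability is proportional to its transmission probability times its relative risk (R0 = 1, R1, R2). The sketch below adds a hypothetical TRD parameter that shifts a heterozygous parent's probability of transmitting allele A away from 0.5; using one shared value for both parents is a simplifying assumption, and the authors' extended loglinear model is not reproduced.

```python
def transmission_prob(parent_genotype: int, trd: float = 0.0) -> float:
    """P(parent transmits allele A); heterozygotes use 0.5 + trd (hypothetical TRD parameter)."""
    if parent_genotype == 0:
        return 0.0
    if parent_genotype == 2:
        return 1.0
    return 0.5 + trd

def child_genotype_probs(father: int, mother: int, trd: float = 0.0):
    """[P(child has 0, 1, 2 copies of A)] given the parents' genotypes."""
    pf = transmission_prob(father, trd)
    pm = transmission_prob(mother, trd)
    return [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm]

def case_child_probs(father: int, mother: int, r1: float, r2: float, trd: float = 0.0):
    """Genotype distribution of an affected child: transmission prob times relative risk, renormalized."""
    base = child_genotype_probs(father, mother, trd)
    weighted = [b * r for b, r in zip(base, [1.0, r1, r2])]
    total = sum(weighted)
    return [w / total for w in weighted]

print(case_child_probs(1, 1, r1=1.0, r2=1.0, trd=0.1))
```

With trd = 0.1 and no disease effect (r1 = r2 = 1), an affected child of two heterozygous parents carries two copies of A with probability 0.36 rather than 0.25; a model that assumes Mendelian transmission would attribute that excess to disease risk.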
Affiliation(s)
- Lam O. Huang
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
- Claire Infante-Rivard
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
- Aurélie Labbe
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
- Department of Psychiatry, McGill University, Montréal, QC, Canada
- Douglas Mental Health University Institute, Montréal, QC, Canada
9
Bernal Rubio YL, Gualdrón Duarte JL, Bates RO, Ernst CW, Nonneman D, Rohrer GA, King A, Shackelford SD, Wheeler TL, Cantet RJC, Steibel JP. Meta-analysis of genome-wide association from genomic prediction models. Anim Genet 2015; 47:36-48. [PMID: 26607299 PMCID: PMC4738412 DOI: 10.1111/age.12378] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Accepted: 09/11/2015] [Indexed: 12/21/2022]
Abstract
Genome-wide association (GWA) studies based on GBLUP models are a common practice in animal breeding. However, effect sizes in GWA tests are small, so larger sample sizes are required to increase the power to detect rare variants. Because sample size is difficult to increase in animal populations, one alternative is to implement a meta-analysis (MA) that combines information and results from independent GWA studies. Although this methodology has been widely used in human genetics, its implementation in animal breeding has been limited. Thus, we present methods to implement an MA of GWA, describing the proper approach to compute weights derived from multiple genomic evaluations based on animal-centric GBLUP models. Application to real datasets shows that MA increases the power to detect associations compared with population-level GWA, while allowing population structure and heterogeneity of variance components across populations to be accounted for. Another advantage of MA is that, unlike a joint analysis, it does not require access to genotype data. Scripts implementing this approach are distributed; because they consider the sign as well as the strength of association, they account for heterogeneity in association phase between QTL and SNPs. Thus, MA of GWA is an attractive alternative for summarizing results from multiple genomic studies, avoiding restrictions on genotype data sharing, the definition of fixed effects and different scales of measurement of evaluated traits.
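The abstract does not give its GBLUP-derived weights, so this sketch uses the generic weighted-Z (Stouffer) combination with user-supplied weights instead; weights such as the square root of each study's sample size are an assumption for illustration. Keeping the sign of each study's z-score is what makes effects pointing in opposite directions partly cancel rather than reinforce each other.

```python
import math

def weighted_z_meta(z_scores, weights):
    """Stouffer-type combination sum(w * z) / sqrt(sum(w^2)); the sign of each z is kept."""
    if len(z_scores) != len(weights) or not z_scores:
        raise ValueError("need one weight per study and at least one study")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))

# Two studies of one SNP, weighted by sqrt(sample size) (hypothetical sample sizes).
print(weighted_z_meta([2.1, 1.8], [math.sqrt(400), math.sqrt(900)]))
print(weighted_z_meta([2.0, -2.0], [1.0, 1.0]))  # 0.0
```

Only per-study summary statistics enter the calculation, which is the property the abstract highlights: no genotype data need to be shared.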
Affiliation(s)
- Y L Bernal Rubio
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824-1225, USA
- J L Gualdrón Duarte
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- R O Bates
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- C W Ernst
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- D Nonneman
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- G A Rohrer
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- A King
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- S D Shackelford
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- T L Wheeler
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- R J C Cantet
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824-1225, USA
- Consejo Nacional de Investigaciones Cientificas y Tecnicas - CONICET, Buenos Aires, Argentina
- J P Steibel
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, 48824-1225, USA
10
Kim W. Transmission Disequilibrium Tests Based on Read Counts for Low-Coverage Next-Generation Sequence Data. Hum Hered 2015; 80:36-49. [PMID: 26278553 DOI: 10.1159/000434645] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/06/2015] [Accepted: 05/30/2015] [Indexed: 11/19/2022]
Abstract
This paper introduces new statistical methods for case-parent trio association studies based on the read counts that can be obtained from next-generation sequencing (NGS) experiments. This work focuses on the inclusion of low-coverage data into the case-parent trio design without genotype classification or imputation. Two different approaches are considered: (1) a likelihood-based approach implementing a 15-component parametric mixture model and (2) a model-free approach that applies non-parametric statistical methods to the ratios of the read counts to coverage. Simulation studies are conducted to evaluate the performance of the proposed tests. In addition, the non-centrality parameters of the mixture likelihood-based tests are derived to determine sample sizes and coverage for an NGS experimental design. As an example, the sample sizes needed to maintain specified power in a published adolescent idiopathic scoliosis (AIS) study are presented. The simulation results show that tests using genotypes classified by the maximum Bayesian posterior probability have significantly inflated type I error rates for low-coverage data. Tests using the posterior probabilities instead of the classified genotypes show lower power than the proposed tests. Generally, power for the likelihood-based approach is higher than that for the non-parametric ratio-based approach. For the AIS example, approximately 654 trios with 4× coverage are necessary to maintain 90% power when detecting an association with an odds ratio of 2 at a locus with a minor allele frequency of 0.35 at a significance level of α = 5 × 10⁻⁸. By comparison, approximately 416 trios with 25× coverage are required to maintain the same power with the same settings. The R and C source code to calculate the proposed test statistics, sample sizes and power can be obtained by contacting the author (wkim@cau.ac.kr).
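The abstract uses non-centrality parameters (NCPs) to size trio studies. The per-trio NCP depends on coverage, allele frequency and effect size and is not given in the abstract, so this sketch takes it as an input (ncp_per_trio is a hypothetical parameter). It applies the standard normal approximation for a 1-df test whose NCP grows linearly with the number of trios n: power is approximately Phi(sqrt(n * ncp_per_trio) - z_(1-alpha/2)), solved here for n.

```python
import math
from statistics import NormalDist

def trios_needed(ncp_per_trio: float, alpha: float, power: float) -> int:
    """Approximate trios for a 1-df test: n = (z_(1-alpha/2) + z_power)^2 / ncp_per_trio."""
    if ncp_per_trio <= 0:
        raise ValueError("ncp_per_trio must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 / ncp_per_trio)

# Genome-wide significance and 90% power, with a hypothetical per-trio NCP.
print(trios_needed(ncp_per_trio=0.08, alpha=5e-8, power=0.9))
```

Because the required n scales as 1 / ncp_per_trio, anything that raises the information per trio, such as deeper coverage, lowers the number of trios needed. This is consistent with the abstract's comparison of about 654 trios at 4× against about 416 at 25×.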
Affiliation(s)
- Wonkuk Kim
- Department of Applied Statistics, Chung-Ang University, Seoul, South Korea
11
Hsieh TJ, Chang SH, Tai JJ. A family-based robust multivariate association test using maximum statistic. Ann Hum Genet 2014; 78:117-28. [PMID: 24571230 DOI: 10.1111/ahg.12054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2013] [Accepted: 12/18/2013] [Indexed: 11/29/2022]
Abstract
To characterize the genetic mechanisms of complex diseases, genetic studies usually collect familial data with multiple correlated quantitative traits. To analyze such data, various multivariate tests have been proposed to investigate the association between the underlying disease genes and the multiple traits. Although these multivariate association tests may be more powerful than univariate association tests, they lose power when the genetic models of the putative genes are misspecified. To address this problem, in this paper we develop a family-based robust multivariate association test. We first establish the optimal multivariate score tests for the recessive, additive, and dominant genetic models. Based on these optimal tests, a maximum-type robust multivariate association test is then obtained. Simulations are conducted to compare the power of our method with that of other existing multivariate methods. The results show that the robust multivariate test maintains robust power across all plausible genetic models. We apply the approach to a real data set to demonstrate its applicability. The results suggest that the robust multivariate test is more powerful than the robust univariate test when dealing with multiple quantitative traits.
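A maximum-type robust test takes the largest of the model-specific statistics (recessive, additive, dominant), and its null distribution depends on how correlated those statistics are. The sketch estimates the MAX p-value by Monte Carlo, assuming the three standardized statistics are jointly normal with a known correlation matrix; the correlation values are hypothetical, and the paper's multivariate score tests are not reproduced. Function names are illustrative.

```python
import math
import random

def cholesky3(corr):
    """Lower-triangular Cholesky factor of a 3x3 positive-definite matrix."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L

def max3_pvalue(observed_max, corr, draws=100_000, seed=0):
    """Monte Carlo P(max_k |Z_k| >= observed_max) for correlated standard normals Z."""
    rng = random.Random(seed)
    L = cholesky3(corr)
    hits = 0
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlated draw z = L e has covariance L L^T = corr.
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(3)]
        if max(abs(v) for v in z) >= observed_max:
            hits += 1
    return hits / draws

# Hypothetical correlations among recessive, additive and dominant statistics.
corr = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.8], [0.5, 0.8, 1.0]]
print(max3_pvalue(1.96, corr))
```

The resulting p-value falls between the single-test value (0.05 at 1.96) and the independent-tests value, which is why MAX-type tests lose less power than a Bonferroni correction over the three models.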
Affiliation(s)
- Tsung-Jen Hsieh
- Division of Biostatistics, College of Public Health, National Taiwan University, Taipei, Taiwan
12
Jiang Y, Li N, Zhang H. Identifying Genetic Variants for Addiction via Propensity Score Adjusted Generalized Kendall's Tau. J Am Stat Assoc 2014; 109:905-930. [PMID: 25382885 PMCID: PMC4219655 DOI: 10.1080/01621459.2014.901223] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/01/2012] [Revised: 12/01/2013] [Indexed: 12/18/2022]
Abstract
Identifying replicable genetic variants for addiction has been extremely challenging. Besides the common difficulties with genome-wide association studies (GWAS), environmental factors are known to be critical to addiction, and comorbidity is widely observed. Despite the importance of environmental factors and comorbidity in addiction studies, few GWAS analyses have adequately considered them because of the limitations of existing statistical methods. Although parametric methods have been developed to adjust for covariates in association analysis, difficulties arise when the traits are multivariate because there is no ready-to-use model for them. Recent nonparametric developments include U-statistics that measure the phenotype-genotype association weighted by a similarity score of covariates. However, it is not clear how to optimize the similarity score. Therefore, we propose a semiparametric method to measure the association adjusted by covariates. In our approach, the nonparametric U-statistic is adjusted by parametric estimates of propensity scores using the idea of inverse probability weighting. The new measure is shown to be asymptotically unbiased under our null hypothesis, whereas the previous non-weighted and weighted ones are not. Simulation results show that our test improves power relative to the non-weighted U-statistic and two other weighted U-statistic methods, and it is particularly powerful for detecting gene-environment interactions. Finally, we apply our proposed test to the Study of Addiction: Genetics and Environment (SAGE) to identify genetic variants for addiction. Our analysis identifies novel genetic variants, which warrant further investigation.
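The idea of adjusting a Kendall-type U-statistic by inverse probability weighting can be illustrated with a simplified, univariate sketch: each pair (i, j) contributes the product of the signs of its genotype and phenotype differences, weighted by w_i * w_j, where w_i = 1 / e_i and e_i is an estimated propensity score. Defining e_i as the estimated probability of individual i's observed genotype given covariates is an assumption made here; the paper's generalized (multivariate) tau and its exact weighting are not reproduced, and the function names are hypothetical.

```python
def ipw_weights(propensity_scores):
    """Inverse-probability weights 1 / e_i from estimated propensity scores e_i in (0, 1]."""
    if any(not 0.0 < e <= 1.0 for e in propensity_scores):
        raise ValueError("propensity scores must lie in (0, 1]")
    return [1.0 / e for e in propensity_scores]

def weighted_kendall_tau(genotypes, phenotypes, weights):
    """Kendall-type association with pair weights w_i * w_j (tau-a style; ties contribute 0)."""
    def sign(v):
        return (v > 0) - (v < 0)

    numerator = denominator = 0.0
    n = len(genotypes)
    for i in range(n):
        for j in range(i + 1, n):
            w = weights[i] * weights[j]
            numerator += w * sign(genotypes[i] - genotypes[j]) * sign(phenotypes[i] - phenotypes[j])
            denominator += w
    return numerator / denominator

weights = ipw_weights([0.5, 0.5, 0.25])
print(weighted_kendall_tau([0, 1, 2], [1.0, 3.0, 2.0], weights))
```

Up-weighting individuals whose genotype is unlikely given their covariates rebalances the sample toward one in which genotype is independent of those covariates, which is the intuition behind propensity-score adjustment.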
Affiliation(s)
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, Oregon 97331-4606
- Ni Li
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
13
Zhang Z, Wang JC, Howells W, Lin P, Agrawal A, Edenberg HJ, Tischfield JA, Schuckit MA, Bierut LJ, Goate A, Rice JP. Dosage transmission disequilibrium test (dTDT) for linkage and association detection. PLoS One 2013; 8:e63526. [PMID: 23691058 PMCID: PMC3653954 DOI: 10.1371/journal.pone.0063526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2012] [Accepted: 04/06/2013] [Indexed: 11/26/2022]
Abstract
Both linkage and association studies have been successfully applied to identify disease susceptibility genes with genetic markers such as microsatellites and single nucleotide polymorphisms (SNPs). As one of the traditional family-based approaches, the Transmission/Disequilibrium Test (TDT) measures the over-transmission of an allele from heterozygous parents to the affected offspring in a trio and can be useful for identifying genetic determinants of complex disorders. However, less information is available when complete trio data are lacking. In this study, we developed a novel approach to "infer" the transmission of SNPs by combining linkage and association data: it uses microsatellite markers from families informative for linkage together with SNP markers from offspring who were genotyped for both linkage and a genome-wide association study (GWAS). We generalized the traditional TDT to process these inferred dosage probabilities and call the result the dosage-TDT (dTDT). For evaluation purposes, we developed a simulation procedure to assess its operating characteristics. We applied the dTDT to the simulated data and documented its power under a number of realistic scenarios. Finally, we applied our methods to a family study of alcohol dependence (COGA) and performed individual genotyping on complete families for the top signals. One SNP (rs4903712 on chromosome 14) remained significant after correcting for multiple testing. Methods developed in this study can be adapted to other platforms and will have widespread applicability in genomic research when case-control GWAS data are collected in families with existing linkage data.
Affiliation(s)
- Zhehao Zhang
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Jen-Chyong Wang
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- William Howells
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Peng Lin
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Arpana Agrawal
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Howard J. Edenberg
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Jay A. Tischfield
- LSB 136, Rutgers University, Piscataway, New Jersey, United States of America
- Marc A. Schuckit
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Laura J. Bierut
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Alison Goate
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- John P. Rice
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
14
Li Q, Li Z, Zheng G, Gao G, Yu K. Rank-based robust tests for quantitative-trait genetic association studies. Genet Epidemiol 2013; 37:358-65. [PMID: 23526350 DOI: 10.1002/gepi.21723] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/27/2012] [Revised: 02/18/2013] [Accepted: 02/20/2013] [Indexed: 11/06/2022]
Abstract
Standard linear regression is commonly used for genetic association studies of quantitative traits. This approach may not be appropriate if the trait, on its original or transformed scale, does not follow a normal distribution. A rank-based nonparametric approach that does not rely on any distributional assumptions can be an attractive alternative. Although several nonparametric tests exist in the literature, their performance in the genetic association setting has not been well studied. We evaluate various nonparametric tests for the analysis of quantitative traits and propose a new class of nonparametric tests that perform robustly for traits with various distributions and under different genetic models. We demonstrate the advantage of our proposed methods through a simulation study and real data applications.
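As an illustration of a rank-based alternative to linear regression, the sketch below computes the standard Kruskal-Wallis statistic comparing trait ranks across the three genotype groups. This textbook test is chosen for illustration only; it is not the new class of tests proposed in the paper, tie correction is omitted, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_by_genotype(trait, genotype):
    """Kruskal-Wallis H comparing trait ranks across genotype groups (0/1/2).

    No tie correction is applied. The p-value uses the chi-square
    approximation; exp(-H/2) is the chi-square survival function for
    exactly 2 df, so this sketch requires exactly 3 genotype groups.
    """
    n = len(trait)
    ranks = average_ranks(trait)
    rank_sum = defaultdict(float)
    count = defaultdict(int)
    for r, g in zip(ranks, genotype):
        rank_sum[g] += r
        count[g] += 1
    if len(count) != 3:
        raise ValueError("this sketch only handles 3 genotype groups")
    h = 12.0 / (n * (n + 1)) * sum(rank_sum[g] ** 2 / count[g] for g in count) - 3.0 * (n + 1)
    p = math.exp(-h / 2.0)
    return h, p


trait = [1.2, 0.8, 1.9, 2.5, 2.1, 3.0, 3.8, 4.1, 3.5]  # toy values
genotype = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(kruskal_wallis_by_genotype(trait, genotype))
```

Because only ranks enter the statistic, any monotone transformation of the trait leaves the result unchanged, which is the property that motivates rank-based tests for non-normal traits.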
Affiliation(s)
- Qizhai Li
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
15
Abstract
In this chapter we describe a novel Bayesian approach to designing GWAS with the goal of ensuring robust detection of effects of genomic loci associated with trait variation. The goal of GWAS is to detect loci associated with variation in traits of interest. Finding which of 500,000-1,000,000 loci has a practically significant effect is a difficult statistical problem, like finding a needle in a haystack. We address this problem by designing experiments to detect effects with a given Bayes factor, where the Bayes factor is chosen to be large enough to overcome the low prior odds for genomic associations. Methods are given for various possible data structures, including random population samples, case-control designs, transmission disequilibrium tests, sib-based transmission disequilibrium tests, and other family-based designs, including designs for plants with clonal replication. We also consider the problem of eliciting prior information from experts, which is necessary to quantify prior odds for loci. We advocate a "subjective" Bayesian approach, in which the prior distribution is considered a mathematical representation of our prior knowledge, while also giving generic formulae that allow conservative computations based on low prior information, e.g., equivalent to the information in a single sample point. Examples using R and the R package ldDesign are given throughout.
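The design logic rests on the identity posterior odds = Bayes factor x prior odds. The Python sketch below shows that arithmetic and, as a swapped-in example of a single-locus Bayes factor, Wakefield's approximate Bayes factor under a normal prior on the effect size. The prior odds of about 1e-5 and the effect-size prior are illustrative assumptions, the function names are hypothetical, and none of this reproduces the ldDesign functions, which are written in R.

```python
import math


def required_bayes_factor(prior_odds, target_posterior_odds):
    """Posterior odds = Bayes factor x prior odds, so BF = target / prior."""
    return target_posterior_odds / prior_odds


def approx_bayes_factor_10(beta_hat, se, prior_sd):
    """Wakefield-style approximate Bayes factor for H1 vs H0.

    Under H0, beta_hat ~ N(0, V) with V = se^2; under H1 the true effect is
    N(0, W) with W = prior_sd^2, so beta_hat ~ N(0, V + W). The ratio of the
    two normal densities simplifies to the expression below.
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    return math.sqrt(v / (v + w)) * math.exp(z2 * w / (2.0 * (v + w)))


prior_odds = 1e-5  # assumed: roughly 1 true association per 100,000 loci
print(required_bayes_factor(prior_odds, 20.0))  # BF needed for posterior odds of 20:1
print(approx_bayes_factor_10(beta_hat=0.15, se=0.03, prior_sd=0.2))
```

A design calculation would then choose the sample size (which sets `se`) so that a true effect of a given size is likely to yield a Bayes factor above the required threshold.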
Affiliation(s)
- Roderick D Ball
- Scion (New Zealand Forest Research Institute Limited), Rotorua, New Zealand
16
Teyssèdre S, Elsen JM, Ricard A. Statistical distributions of test statistics used for quantitative trait association mapping in structured populations. Genet Sel Evol 2012; 44:32. [PMID: 23146127 PMCID: PMC3817592 DOI: 10.1186/1297-9686-44-32] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 02/10/2012] [Accepted: 10/31/2012] [Indexed: 11/25/2022]
Abstract
Background Spurious associations between single nucleotide polymorphisms and phenotypes are a major issue in genome-wide association studies and have led to underestimation of the type 1 error rate and overestimation of the number of quantitative trait loci found. Many authors have investigated, by simulation, the influence of population structure on the robustness of methods. This paper aims to further develop the algebraic formalization of power and type 1 error rate for some of the classical statistical methods used: simple regression, two approximate mixed-model methods involving the effect of a single nucleotide polymorphism (SNP) and a random polygenic effect (GRAMMAR and FASTA), and the transmission/disequilibrium test for quantitative traits and nuclear families. Analytical formulae were derived using matrix algebra for the first and second moments of the statistical tests, assuming a true mixed model with a polygenic effect and SNP effects. Results The expectation and variance of the test statistics, and their marginal expectations and variances according to the distribution of genotypes and estimators of variance components, are given as functions of the relationship matrix and of the heritability of the polygenic effect. These formulae were used to compute type 1 error rate and power for any kind of relationship matrix between phenotyped and genotyped individuals and for any level of heritability. For the regression method, the type 1 error rate increased with the variability of relationships and with heritability; it decreased with the GRAMMAR method and was not affected with the FASTA and quantitative transmission/disequilibrium test methods. Conclusions The formulae can easily be used to provide the correct threshold of type 1 error rate and to calculate power when designing experiments or data collection protocols. The results concerning the efficacy of each method agree with simulation results in the literature but are generalized in this work.
The power of the GRAMMAR method was equal to the power of the FASTA method at the same type 1 error rate. The power of the quantitative transmission/disequilibrium test was low. In conclusion, the FASTA method, which is very close to the full mixed model, is recommended in association mapping studies.
Affiliation(s)
- Simon Teyssèdre
- INRA, UR 631 Station d’Amélioration Génétique des Animaux, Castanet-Tolosan F-31326, France
17
Ding X, Wang C, Zhang Q. Pedigree transmission disequilibrium test for quantitative traits in farm animals. CHINESE SCIENCE BULLETIN-CHINESE 2012. [DOI: 10.1007/s11434-012-5218-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
18
Zhu W, Jiang Y, Zhang H. Nonparametric Covariate-Adjusted Association Tests Based on the Generalized Kendall's Tau. J Am Stat Assoc 2012; 107:1-11. [PMID: 22745516 PMCID: PMC3381868 DOI: 10.1080/01621459.2011.643707] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 02/04/2023]
Abstract
Identifying the risk factors for comorbidity is important in psychiatric research. Empirically, studies have shown that testing multiple, correlated traits simultaneously is more powerful than testing a single trait at a time in association analysis. Furthermore, for complex diseases, especially mental illnesses and behavioral disorders, the traits are often recorded on different scales, such as dichotomous, ordinal and quantitative. In the absence of covariates, nonparametric association tests have been developed for multiple complex traits to study comorbidity. However, genetic studies generally contain measurements of covariates that may affect the relationship between the risk factors of major interest (such as genes) and the outcomes. While it is relatively easy to adjust for these covariates in a parametric model for quantitative traits, it is challenging for multiple complex traits with possibly different scales. In this article, we propose a nonparametric test for multiple complex traits that can adjust for covariate effects. The test aims to achieve an optimal scheme of adjustment by using a maximum statistic calculated from multiple adjusted test statistics. We derive the asymptotic null distribution of the maximum test statistic and also propose a resampling approach, both of which can be used to assess the significance of our test. Simulations are conducted to compare the type I error and power of the nonparametric adjusted test with those of the unadjusted test and other existing adjusted tests. The empirical results suggest that our proposed test increases power through adjustment for covariates when environmental effects exist, and is more robust to model misspecification than some existing parametric adjusted tests. We further demonstrate the advantage of our test by analyzing a data set on the genetics of alcoholism.
19
Jamrozik EF, Warrington N, McClenaghan J, Hui J, Musk AW, James A, Beilby JP, Hansen J, de Klerk NH, Palmer LJ. Functional haplotypes in the PTGDR gene fail to associate with asthma in two Australian populations. Respirology 2011; 16:359-66. [PMID: 21199159 DOI: 10.1111/j.1440-1843.2010.01917.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/01/2023]
Abstract
BACKGROUND AND OBJECTIVE Haplotypes in the promoter region of the prostanoid DP receptor (PTGDR) gene have been shown to functionally influence gene transcription and to be associated with asthma in two previous case-control studies in Caucasians. This study tested the association of PTGDR haplotypes with asthma phenotypes in two large Caucasian-Australian populations. These results were incorporated in a meta-analysis with previously published data to determine the overall role of these haplotypes in the risk of asthma. METHODS Three PTGDR promoter-region single nucleotide polymorphisms (SNPs) were genotyped in 368 individuals from the Western Australian Twin Child Health study and 2988 individuals from the Busselton Health Study. Logistic regression and transmission disequilibrium tests were used to assess whether SNP genotypes and three-SNP haplotypes were associated with doctor-diagnosed asthma or intermediate quantitative traits. Longitudinal data from the Busselton Health Study were used to examine whether PTGDR influences changes in lung function over time. Meta-analysis incorporated the findings of this study with those of two previous studies in Caucasian populations. RESULTS Cross-sectional associations between PTGDR haplotypes and asthma phenotypes were non-significant (P > 0.05) in both populations. Longitudinal analyses of PTGDR and lung function were also non-significant. Meta-analysis, however, suggested that haplotype TCT was significantly associated with decreased risk of asthma (OR = 0.76; P = 0.02), while haplotype CCC was not significantly associated with asthma (OR = 1.30; P = 0.07). CONCLUSIONS These results suggest that, despite the non-significant findings in the present study populations, PTGDR promoter haplotypes may account for a small but significant proportion of the risk of asthma in Caucasian populations.
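A meta-analysis like the one in this abstract combines per-study estimates into a pooled OR. One common way to do that is inverse-variance fixed-effect pooling of log odds ratios, sketched below in Python. The abstract does not state which pooling model was used, the function name is hypothetical, and the input numbers in the usage lines are placeholders, not data from these studies.

```python
import math


def fixed_effect_meta(odds_ratios, standard_errors):
    """Inverse-variance fixed-effect pooling of log odds ratios.

    standard_errors are the SEs of log(OR) for each study.
    Returns (pooled OR, 95% CI lower, 95% CI upper, two-sided p).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    log_ors = [math.log(o) for o in odds_ratios]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    z = pooled / se_pooled
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se_pooled),
        math.exp(pooled + 1.96 * se_pooled),
        p,
    )


# Placeholder inputs for illustration only (not the PTGDR study data).
pooled_or, ci_low, ci_high, p_value = fixed_effect_meta([0.80, 0.70, 0.85], [0.15, 0.20, 0.12])
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), p = {p_value:.3g}")
```

Pooling on the log scale is what lets several individually non-significant studies yield a significant combined estimate, because the pooled standard error shrinks as the weights add up.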
Affiliation(s)
- Euzebiusz F Jamrozik
- Centre for Genetic Epidemiology and Biostatistics, University of Western Australia, West Perth, Western Australia, Australia.
20
Van Steen K. Perspectives on genome-wide multi-stage family-based association studies. Stat Med 2011; 30:2201-21. [DOI: 10.1002/sim.4259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/16/2010] [Accepted: 03/07/2011] [Indexed: 01/03/2023]
21
Abstract
Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, the stigma associated with mental illnesses, comorbidity, and complex etiologies, among other factors, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs, including twin studies, family studies, linkage analysis, and, more recently, genome-wide association studies, have been employed to study the genetics of mental illnesses, or of complex diseases in general. In this paper, I present the challenges and methods from a statistical perspective, focusing on genetic association studies.
Affiliation(s)
- Heping Zhang
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034
22
Mirea L, Sun L, Stafford JE, Bull SB. Using evidence for population stratification bias in combined individual- and family-level genetic association analyses of quantitative traits. Genet Epidemiol 2010; 34:502-11. [PMID: 20552647 DOI: 10.1002/gepi.20506] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
Genetic association studies are generally performed either by examining differences in the genotype distribution between individuals or by testing for preferential allele transmission within families. In the absence of population stratification bias (PSB), integrated analyses of individual and family data can increase power to identify susceptibility loci [Abecasis et al., 2000. Am. J. Hum. Genet. 66:279-292; Chen and Lin, 2008. Genet. Epidemiol. 32:520-527; Epstein et al., 2005. Am. J. Hum. Genet. 76:592-608]. In existing methods, the presence of PSB is initially assessed by comparing results from between-individual and within-family analyses, and then combined analyses are performed only if no significant PSB is detected. However, this strategy requires specification of an arbitrary testing level alpha(PSB), typically 5%, to declare PSB significance. As a novel alternative, we propose to directly use the PSB evidence in weights that combine results from between-individual and within-family analyses. The weighted approach generalizes previous methods by using a continuous weighting function that depends only on the observed P-value instead of a binary weight that depends on alpha(PSB). Using simulations, we demonstrate that for quantitative trait analysis, the weighted approach provides a good compromise between type I error control and power to detect association in studies with few genotyped markers and limited information regarding population structure.
Affiliation(s)
- Lucia Mirea
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
23
Erbe M, Ytournel F, Pimentel E, Sharifi A, Simianer H. Power and robustness of three whole genome association mapping approaches in selected populations. J Anim Breed Genet 2010; 128:3-14. [DOI: 10.1111/j.1439-0388.2010.00885.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
24
Sillanpää MJ. Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses. Heredity (Edinb) 2010; 106:511-9. [PMID: 20628415 DOI: 10.1038/hdy.2010.91] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Indexed: 12/17/2022]
Abstract
Population-based genomic association analyses are more powerful than within-family analyses. However, population stratification (unknown or ignored origin of individuals from multiple source populations) and cryptic relatedness (unknown or ignored covariance between individuals because of their relatedness) are confounding factors in population-based genomic association analyses, which inflate the false-positive rate. As a consequence, false association signals may arise in genomic data association analyses for reasons other than true association between the tested genomic factor (marker genotype, gene or protein expression) and the study phenotype. It is therefore important to correct or account for these confounders in population-based genomic data association analyses. The common correction techniques for population stratification and cryptic relatedness are presented here in the phenotype-marker association analysis context, and comments on their suitability for other types of genomic association analyses (for example, phenotype-expression association) are also provided. Even though many of these techniques were originally developed in the context of human genetics, most of them are also applicable to model organisms and breeding populations.
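One widely used correction for residual stratification in population-based analyses is genomic control, which estimates an inflation factor lambda from the median of the 1-df chi-square association statistics and divides every statistic by it. The Python sketch below is illustrative only; the function name is hypothetical, the abstract does not say whether the review presents genomic control in this exact form, and the method handles only uniform inflation, so it is no substitute for mixed-model or principal-component approaches that model structure and cryptic relatedness directly.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364231195724  # median of the chi-square distribution with 1 df


def genomic_control(chi2_stats):
    """Estimate the inflation factor lambda and deflate 1-df association statistics."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # common convention: never inflate statistics when lambda < 1
    return lam, [s / lam for s in chi2_stats]


observed = [0.3, 1.2, 0.6, 4.8, 0.9, 0.5, 2.1]  # toy 1-df statistics
lam, adjusted = genomic_control(observed)
print(f"lambda = {lam:.3f}")
print([round(s, 3) for s in adjusted])
```

The median is used instead of the mean because it is barely affected by the handful of true associations expected in a genome-wide scan.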
Affiliation(s)
- M J Sillanpää
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
25
Zhang H, Liu CT, Wang X. An Association Test for Multiple Traits Based on the Generalized Kendall's Tau. J Am Stat Assoc 2010; 105:473-481. [PMID: 20711441 PMCID: PMC2920220 DOI: 10.1198/jasa.2009.ap08387] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 11/21/2022]
Abstract
In many genetics studies, especially in the investigation of mental illness and behavioral disorders, it is common for researchers to collect multiple phenotypes to characterize the complex disease of interest. It may be advantageous to analyze those phenotypic measurements simultaneously if they share a similar genetic mechanism. In this study, we present a nonparametric approach to studying multiple traits together rather than examining each trait separately. Through simulation we compared the nominal type I error and power of our proposed test to an existing test, i.e., a generalized family-based association test. The empirical results suggest that our proposed approach is superior to the existing test in the analysis of ordinal traits. The advantage is demonstrated on a data set concerning alcohol dependence. In this application, the use of our methods enhanced the signal of the association test.
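The generalized Kendall's tau extends the ordinary rank-concordance measure. As a minimal building block, the Python sketch below computes Kendall's tau-a between a genotype score and a single trait by counting concordant and discordant pairs. The function name is hypothetical, and the multi-trait generalization developed in the paper is not reproduced here.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs.

    Pairs tied in x or y contribute zero, so tau-a is not rescaled for ties.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 items")
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1  # concordant pair
            elif prod < 0:
                s -= 1  # discordant pair
    return s / (n * (n - 1) / 2)


genotype = [0, 1, 2, 1, 0, 2]  # toy additive genotype codes
severity = [1, 2, 3, 3, 1, 2]  # toy ordinal trait
print(kendall_tau_a(genotype, severity))
```

Because it uses only the signs of pairwise differences, the same pair-counting idea applies whether a trait is dichotomous, ordinal or quantitative, which is what makes it a natural basis for combining traits recorded on different scales.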
26
Joo J, Kwak M, Chen Z, Zheng G. Efficiency robust statistics for genetic linkage and association studies under genetic model uncertainty. Stat Med 2010; 29:158-80. [PMID: 19918942 DOI: 10.1002/sim.3759] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 01/09/2023]
Abstract
When testing genetic linkage and association, test statistics that follow a normal or chi-square distribution are often used. These statistics are usually derived under a specific mode of inheritance (genetic model). Common genetic models include, but are not limited to, the recessive, additive, multiplicative, and dominant models. For many diseases, the underlying genetic model is unknown. Instead, a family of scientifically plausible genetic models may be available, which includes the four commonly used models. Hence, the optimal test is not available. A single test statistic that is optimal for one model may suffer a substantial loss of power when the model is misspecified. In this situation, efficiency robust tests are useful. In this tutorial, we first review several commonly used robust statistics, including maximin efficiency robust tests, maximal tests, and constrained likelihood ratio tests, for three common designs in genetic studies: (i) linkage analysis using affected sib-pairs, (ii) association studies using parent-offspring trios, and (iii) case-control association studies (unmatched and matched). Code in the R statistical language for applying these robust statistics to test for linkage and association is presented with examples. We also compare the performance of the various robust tests via simulation studies. Guidelines for applications are also given for each study design. Finally, applications of robust tests to genome-wide association studies and meta-analysis are discussed.
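A maximal test of the kind reviewed in this tutorial can be illustrated with MAX3 for an unmatched case-control design: compute the Cochran-Armitage trend statistic under recessive, additive, and dominant scores and take the largest absolute value. The Python sketch below assumes genotype counts ordered (aa, Aa, AA) with A the putative risk allele, and the function names are hypothetical. It computes only the statistic: MAX3 is not standard normal under the null, so its p-value requires the correlations among the three trend statistics or a permutation procedure, both omitted here. The tutorial's own code is in R.

```python
import math


def catt_z(cases, controls, scores):
    """Cochran-Armitage trend test Z for a 2 x 3 genotype table.

    T = sum_i x_i (r_i S - s_i R), with null variance
    Var(T) = (R S / N) * (N sum_i x_i^2 n_i - (sum_i x_i n_i)^2).
    Raises ZeroDivisionError for a monomorphic table (zero variance).
    """
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    n = [r + s for r, s in zip(cases, controls)]
    t = sum(x * (r * s_tot - s * r_tot) for x, r, s in zip(scores, cases, controls))
    sum_xn = sum(x * ni for x, ni in zip(scores, n))
    sum_x2n = sum(x * x * ni for x, ni in zip(scores, n))
    var_t = (r_tot * s_tot / n_tot) * (n_tot * sum_x2n - sum_xn ** 2)
    return t / math.sqrt(var_t)


# Score vectors for genotypes (aa, Aa, AA) under each genetic model.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def max3(cases, controls):
    """MAX3 statistic: the largest |Z| over the three trend tests."""
    z = {model: catt_z(cases, controls, sc) for model, sc in SCORES.items()}
    return max(abs(v) for v in z.values()), z


stat, per_model = max3(cases=(10, 20, 30), controls=(30, 20, 10))
print(f"MAX3 = {stat:.3f}", per_model)
```

Taking the maximum protects against choosing the wrong score vector, at the price of a more complicated null distribution than any single trend test.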
Affiliation(s)
- Jungnam Joo
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
27
York TP, Vargas-Irwin C, Anderson WH, van den Oord EJCG. Asthma pharmacogenetic study using finite mixture models to handle drug-response heterogeneity. Pharmacogenomics 2009; 10:753-67. [PMID: 19450127 DOI: 10.2217/pgs.09.19] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
AIMS Typically, only a proportion of the patients suffering from common diseases respond to frequently prescribed drugs. Since the presence of drug nonresponders in pharmacogenetic studies can adversely affect statistical power, we propose a method to restrict genetic tests to drug responders only. In this paper, we evaluate drug nonresponse in a clinical trial of the asthma drug montelukast as resulting either from an inactive genetic variant or from the presence of subgroups of patients not responding to the drug. MATERIALS & METHODS We propose finite mixture models in which unobserved (latent) categorical variables represent either a drug-responder or a nonresponder class. Analytical results show that this method can substantially improve power by testing for genetic variants only in the drug-responder class. We also demonstrate how, if appropriate, placebo data can be used to further increase power to detect genetic effects. RESULTS It was estimated that only 25-30% of the subjects responded to montelukast. Genetic-association tests confined to the responder group resulted in a substantial increase in explained genetic variance, between 10.3 and 13.2%, for four markers in the arachidonate 5-lipoxygenase (ALOX5) and cysteinyl leukotriene receptor 1 (CYSLTR1) genes. CONCLUSION The presence of subgroups of patients who do not respond to the drug was an important reason for nonresponse. Additional analyses using finite mixture models in pharmacogenetic studies may provide insight into drug nonresponse and better discrimination between true and false discoveries.
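The latent responder/nonresponder idea can be sketched with a two-component Gaussian mixture fitted by expectation-maximization (EM). The Python code below is a deliberately simplified illustration under stated assumptions: a single continuous response, no covariates, no genetic effect, and no placebo arm, so it only estimates the proportion of a hypothetical responder class and is not the authors' full model. The function names and the simulated data are invented for illustration.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_component(x, iters=200, min_var=1e-6):
    """EM for a two-component 1-D Gaussian mixture.

    Component 2 starts at max(x), so here it is read as the hypothetical
    'responder' class with the larger mean response.
    Returns (weight2, mu1, mu2, var1, var2).
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    mu1, mu2, var1, var2, w2 = min(x), max(x), var, var, 0.5
    for _ in range(iters):
        # E-step: posterior probability that each observation is in component 2.
        resp = []
        for xi in x:
            a = (1.0 - w2) * normal_pdf(xi, mu1, var1)
            b = w2 * normal_pdf(xi, mu2, var2)
            resp.append(b / (a + b) if a + b > 0 else 0.5)
        # M-step: update the weight, means and variances from the responsibilities.
        n2 = sum(resp)
        n1 = n - n2
        w2 = n2 / n
        mu1 = sum((1 - r) * xi for r, xi in zip(resp, x)) / n1
        mu2 = sum(r * xi for r, xi in zip(resp, x)) / n2
        var1 = max(sum((1 - r) * (xi - mu1) ** 2 for r, xi in zip(resp, x)) / n1, min_var)
        var2 = max(sum(r * (xi - mu2) ** 2 for r, xi in zip(resp, x)) / n2, min_var)
    return w2, mu1, mu2, var1, var2


# Simulated toy data (not trial data): 70% nonresponders, 30% responders.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(700)] + [random.gauss(4.0, 1.0) for _ in range(300)]
w_resp, mu_non, mu_resp, _, _ = em_two_component(data)
print(f"estimated responder fraction: {w_resp:.2f}")
```

In a genetic analysis, the per-subject responsibilities from the E-step could then be used to weight or restrict association tests to the likely responders, which is the intuition behind the power gain described above.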
Affiliation(s)
- Timothy P York
- Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, VA 23298-0003, USA.
28
Toosi A, Fernando RL, Dekkers JCM. Genomic selection in admixed and crossbred populations. J Anim Sci 2009; 88:32-46. [PMID: 19749023 DOI: 10.2527/jas.2009-1975] [Citation(s) in RCA: 144] [Impact Index Per Article: 9.0] [Indexed: 11/13/2022]
Abstract
In livestock, genomic selection (GS) has primarily been investigated by simulation of purebred populations. Traits of interest are, however, often measured in crossbred or mixed populations with uncertain breed composition. If such data are used as the training data for GS without accounting for breed composition, estimates of marker effects may be biased due to population stratification and admixture. To investigate this, a genome of 100 cM was simulated with varying marker densities (5 to 40 segregating markers per cM). After 1,000 generations of random mating in a population of effective size 500, 4 lines with effective size 100 were isolated and mated for another 50 generations to create 4 pure breeds. These breeds were used to generate combined, F1, F2, 3- and 4-way crosses, and admixed training data sets of 1,000 individuals with phenotypes for an additive trait controlled by 100 segregating QTL and heritability of 0.30. The validation data set was a sample of 1,000 genotyped individuals from one pure breed. The Bayes-B method was used to simultaneously estimate the effects of all markers for breeding value estimation. With 5 (40) markers per cM, the correlation of true with estimated breeding value of selection candidates (accuracy) was greatest, 0.79 (0.85), when data from the same pure breed were used for training. When the training data set consisted of crossbreds, the accuracy ranged from 0.66 (0.79) to 0.74 (0.83) for the 2 marker densities, respectively. The admixed training data set resulted in nearly the same accuracies as when training was in the breed to which selection candidates belonged. However, accuracy was greatly reduced when genes from the target pure breed were not included in the admixed or crossbred population.
This implies that, with high-density markers, admixed and crossbred populations can be used to develop GS prediction equations for all pure breeds that contributed to the population, without a substantial loss of accuracy compared with training on purebred data, even if breed origin has not been explicitly taken into account. In addition, using GS based on high-density marker data, purebreds can be accurately selected for crossbred performance without the need for pedigree or breed information. Results also showed that haplotype segments with strong linkage disequilibrium are shorter in crossbred and admixed populations than in purebreds, providing opportunities for QTL fine mapping.
Affiliation(s)
- A Toosi
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames 50011, USA
29
Zhang L, Li J, Pei YF, Liu Y, Deng HW. Tests of association for quantitative traits in nuclear families using principal components to correct for population stratification. Ann Hum Genet 2009; 73:601-13. [PMID: 19702646 DOI: 10.1111/j.1469-1809.2009.00539.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 01/17/2023]
Abstract
Traditional transmission disequilibrium test (TDT) based methods for genetic association analyses are robust to population stratification at the cost of a substantial loss of power. We here describe a novel method for family-based association studies that corrects for population stratification with the use of an extension of principal component analysis (PCA). Specifically, we adopt PCA on unrelated parents in each family. We then infer principal components for children from those for their parents through a TDT-like strategy. Two test statistics within the variance-components model are proposed for association tests. Simulation results show that the proposed tests have correct type I error rates regardless of population stratification, and have greatly improved power over two popular TDT-based methods: QTDT and FBAT. The application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility of the proposed method.
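The first step described here, PCA on unrelated parents, can be sketched in pure Python: standardize each SNP column and take the leading eigenvector of X X^T by power iteration. This is a generic EIGENSTRAT-style computation under stated assumptions, with a hypothetical function name; the paper's TDT-like inference of the children's components and its variance-components tests are not reproduced, and a real analysis would use an optimized linear-algebra library.

```python
import math


def top_principal_component(genotypes, iters=500):
    """Leading principal-component scores of individuals from 0/1/2 genotypes.

    Each SNP column is centred and scaled by sqrt(p(1-p)), where p is the
    sample allele frequency; power iteration is then run on K = X X^T.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = []
    for row in genotypes:
        std_row = []
        for j in range(m):
            p = means[j] / 2.0
            sd = math.sqrt(p * (1.0 - p)) if 0.0 < p < 1.0 else 1.0
            std_row.append((row[j] - means[j]) / sd)
        x.append(std_row)
    # n x n matrix K = X X^T between individuals.
    k = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    # Centred columns make the all-ones vector an eigenvector of K with
    # eigenvalue 0, so start from a non-constant vector instead.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    return v


geno = [
    [2, 2, 0],  # toy parent from population 1
    [2, 1, 0],
    [0, 0, 2],  # toy parent from population 2
    [0, 1, 2],
]
print(top_principal_component(geno))
```

The sign of the returned vector is arbitrary; what matters is that individuals from different ancestral groups fall on opposite sides, so the scores can enter an association model as covariates.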
Affiliation(s)
- Lei Zhang
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, People's Republic of China
30
Univariate/multivariate genome-wide association scans using data from families and unrelated samples. PLoS One 2009; 4:e6502. [PMID: 19652719 PMCID: PMC2715864 DOI: 10.1371/journal.pone.0006502] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 04/12/2009] [Accepted: 06/30/2009] [Indexed: 11/19/2022]
Abstract
As genome-wide association studies (GWAS) become more popular, two approaches, among others, could be considered to improve statistical power for identifying genes with subtle to moderate effects on human diseases. The first approach is to increase sample size, which could be achieved by combining unrelated and familial subjects. The second approach is to jointly analyze multiple correlated traits. In this study, by extending generalized estimating equations (GEEs), we propose a simple approach for performing univariate or multivariate association tests on combined data from unrelated subjects and nuclear families. In particular, we correct for population stratification by integrating principal component analysis and transmission disequilibrium test strategies. The proposed method allows for multiple siblings as well as missing parental information. Simulation studies show that, by analyzing the combined data while correcting for population stratification, the proposed test has improved power compared with two popular methods, EIGENSTRAT and FBAT. In addition, joint analysis of bivariate traits has improved power over univariate analysis when pleiotropic effects are present. Application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility and applicability of the proposed method.
31
Huang QY, Shen H, Deng HY, Conway T, Elze L, Davies KM, Recker RR, Deng HW. Linkage and association between CA repeat polymorphism of the TNFR2 gene and obesity phenotypes in two independent Caucasian populations. ACTA ACUST UNITED AC 2009; 33:775-81. [PMID: 16980123 DOI: 10.1016/s0379-4172(06)60110-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/24/2005] [Accepted: 11/23/2005] [Indexed: 11/19/2022]
Abstract
Previously, our group reported suggestive evidence of linkage between 1p36 and body mass index (BMI) (LOD = 2.09). The tumor necrosis factor receptor 2 (TNFR2) gene at 1p36 is an excellent positional and functional candidate gene for obesity. In this study, we investigated linkage and association between the TNFR2 gene and obesity phenotypes in two large independent samples using the quantitative transmission disequilibrium test (QTDT). The first sample comprised 1,836 individuals from 79 multigenerational pedigrees. The second was a randomly ascertained set of 636 individuals from 157 US Caucasian nuclear families. Obesity phenotypes tested included BMI, fat mass, and percentage fat mass (PFM). A significant result (P = 0.0056) was observed for linkage with BMI in the sample of multigenerational pedigrees. Our data support the TNFR2 gene as a quantitative trait locus (QTL) underlying BMI variation in Caucasian populations.
Affiliation(s)
- Qing-Yang Huang
- College of Life Sciences, Central China Normal University, Wuhan 430079, China
32
Abstract
In studies of complex disorders such as nicotine dependence, researchers commonly assess multiple variables related to a disorder as well as other disorders that are potentially correlated with the primary disorder of interest. In this work, we refer to those variables and disorders broadly as multiple traits. The multiple traits may or may not have a common causal genetic variant. Intuitively, it may be more powerful to accommodate multiple traits in genetic analyses, but the analysis of multiple traits is generally more complicated than the analysis of a single trait. Furthermore, it is not well documented how much power we may gain by considering multiple traits. Our aim is to enhance our understanding of this important and practical issue. We considered a variety of correlation structures between the traits and the disease locus. To focus on the effect of accommodating multiple traits, we examined relatively simple genetic models so that we could pinpoint the factors affecting power. We conducted simulation studies to compare the performance of testing multiple traits simultaneously with that of testing a single trait at a time in family-based association studies. Our simulation results demonstrated that testing multiple traits simultaneously performs better than testing each trait individually for almost all models considered. We also found that the power of association tests varies among the underlying models. The advantage of a multiple-trait test is minimized when some traits are influenced by the gene only through other traits, and it is maximized when there are causal relations between the traits and the gene and among the traits themselves, or when there are extraneous traits.
Affiliation(s)
- Wensheng Zhu
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034
33
Wang JY, Tai JJ. Robust quantitative trait association tests in the parent-offspring triad design: conditional likelihood-based approaches. Ann Hum Genet 2009; 73:231-44. [DOI: 10.1111/j.1469-1809.2008.00502.x]
34
Li Z, Zhang H, Zheng G, Gastwirth JL, Gail MH. Excess false positive rate caused by population stratification and disease rate heterogeneity in case–control association studies. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2008.02.021]
35
Benyamin B, Visscher PM, McRae AF. Family-based genome-wide association studies. Pharmacogenomics 2009; 10:181-90. [DOI: 10.2217/14622416.10.2.181]
Abstract
In the last 2 years, the effort to identify genes affecting common diseases and complex traits has been accelerated by genome-wide association studies (GWAS). The availability of existing large collections of linkage data paved the way for family-based GWAS. Although most published GWAS used population-based designs, family-based designs have played an important role, particularly in replication stages. Family-based designs offer advantages in quality control, robustness to population stratification, and the ability to perform genetic analyses that cannot be achieved with a sample of unrelated individuals, such as testing for effects of imprinted genes on phenotypes, testing whether a genetic variant is inherited or de novo, and combined linkage and association analysis.
Affiliation(s)
- Beben Benyamin
- Queensland Statistical Genetics Laboratory, Queensland Institute of Medical Research, 300 Herston Road, Brisbane, QLD 4029, Australia
- Peter M Visscher
- Queensland Statistical Genetics Laboratory, Queensland Institute of Medical Research, 300 Herston Road, Brisbane, QLD 4029, Australia
- Allan F McRae
- Queensland Statistical Genetics Laboratory, Queensland Institute of Medical Research, 300 Herston Road, Brisbane, QLD 4029, Australia
36
Hodge JC, Cuenco KT, Huyck KL, Somasundaram P, Panhuysen CIM, Stewart EA, Morton CC. Uterine leiomyomata and decreased height: a common HMGA2 predisposition allele. Hum Genet 2009; 125:257-63. [PMID: 19132395 DOI: 10.1007/s00439-008-0621-6] [Received: 12/22/2008] [Accepted: 12/25/2008]
Abstract
Uterine leiomyomata (UL) are the most common female pelvic tumors and the primary indication for hysterectomy in the United States. We assessed genetic liability for UL conferred by a known embryonic proliferation modulator, HMGA2, in 248 families ascertained through medical-record-confirmed affected sister pairs. Using a (TC)n repeat in the 5' UTR and 17 SNPs spanning HMGA2, permutation-based association tests identified a significant increase in transmission of a single TC repeat allele (TC227) with UL (allele-specific P = 0.00005; multiple-testing-corrected min-P = 0.0049). The hypothesis that TC227 is a pathogenic variant is supported by a trend toward higher HMGA2 expression in TC227 allele-positive than in non-TC227 UL tissue, as well as by the absence of culpable exonic sequence variants. Three genome-wide SNP studies have also recently suggested that HMGA2 influences human height variation, and our examination of the affected sister-pair families revealed a significant association of TC227 with decreased height (allele-specific P = 0.00033; multiple-testing-corrected min-P = 0.016). Diminished stature and elevated risk of UL have both been correlated with an earlier age at menarche, which may be the biological mechanism for the TC227 effects: in our study population, women with TC227 tended to have an earlier onset of menarche. These results indicate that HMGA2 has a role in two growth-related phenotypes, UL predisposition and height, the former of which may affect future medical management decisions for many women.
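The multiple-testing-corrected min-P values above come from permutation. A minimal sketch of how such a correction is commonly computed follows, assuming the single-step min-P procedure in the style of Westfall and Young; the authors' exact implementation may differ, and the function name `min_p_adjust` and the toy p-values are hypothetical. Each marker's adjusted p-value is the fraction of permutation replicates whose smallest p-value across all markers is at or below that marker's observed p-value.

```python
def min_p_adjust(observed, permuted):
    """Single-step min-P adjustment (illustrative sketch, hypothetical name).

    observed: per-marker p-values from the real data.
    permuted: one list of per-marker p-values per permutation replicate,
              e.g. recomputed after randomly permuting transmitted alleles.
    Returns one adjusted p-value per marker. Adding 1 to numerator and
    denominator counts the observed data as one permutation, so no
    adjusted p-value is exactly zero.
    """
    min_ps = [min(replicate) for replicate in permuted]  # smallest p per replicate
    b = len(min_ps)
    return [(1 + sum(m <= p for m in min_ps)) / (1 + b) for p in observed]


# Hypothetical toy data: 3 markers, 4 permutation replicates.
observed = [0.00005, 0.2, 0.04]
permuted = [
    [0.3, 0.5, 0.01],
    [0.6, 0.02, 0.7],
    [0.9, 0.8, 0.4],
    [0.1, 0.25, 0.03],
]
print(min_p_adjust(observed, permuted))
```

Because the maximum over all markers is used in every replicate, the correlation between nearby markers (such as the repeat and the 17 SNPs) is respected automatically. With only four replicates, the smallest attainable adjusted value is 1/5; real analyses use thousands of permutations.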
Affiliation(s)
- Jennelle C Hodge
- Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women's Hospital, Boston, MA 02115, USA.
37
Applications of linkage disequilibrium and association mapping in maize. Molecular Genetic Approaches to Maize Improvement 2008. [DOI: 10.1007/978-3-540-68922-5_13]
38
Neale BM, Purcell S. The positives, protocols, and perils of genome-wide association. Am J Med Genet B Neuropsychiatr Genet 2008; 147B:1288-94. [PMID: 18500721 DOI: 10.1002/ajmg.b.30747]
Abstract
Genome-wide association aims to comprehensively survey genetic variation for the purposes of disease and trait mapping. We provide a brief history of the development of the genetic technology necessary to realize genome-wide association. We then identify and review the publicly available resources for conducting such work, including molecular technologies, genomic databases, and analytic tools. Following on from the analytic tools, we highlight common analytic considerations, ranging from study design, quality control, and data cleaning to association analysis and replication. We conclude with a look toward future developments such as the analysis of copy number variation and the integration of expression and epigenetic phenomena into genome-wide association.
Affiliation(s)
- Benjamin M Neale
- Social, Genetic, and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, De Crespigny Park, London, UK.
39
A review of family-based tests for linkage disequilibrium between a quantitative trait and a genetic marker. PLoS Genet 2008; 4:e1000180. [PMID: 18818728 PMCID: PMC2528965 DOI: 10.1371/journal.pgen.1000180]
Abstract
Quantitative trait transmission/disequilibrium tests (quantitative TDTs) are commonly used in family-based genetic association studies of quantitative traits. Despite the availability of various quantitative TDTs, some users are not aware of the properties of these tests and the relationships between them. This review aims to outline the broad features of the various quantitative TDT procedures implemented in the frequently used QTDT and FBAT packages. Specifically, we discuss the “Rabinowitz” and the “Monks-Kaplan” procedures, as well as the various “Abecasis” and “Allison” regression-based procedures. We focus on the models assumed in these tests and the relationships between them. Moreover, we discuss what hypotheses are tested by the various quantitative TDTs, which testing procedures are best suited to various forms of data, and whether the regression-based tests overcome population stratification problems. Finally, we comment on power considerations in choosing which test to use. We hope this brief review will shed light on the similarities and differences among the various quantitative TDTs.
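An idea shared by the regression-based quantitative TDTs is to split each offspring's genotype score into a family-level component and a within-family deviation, and to base the association test only on the within-family part, which is driven by Mendelian segregation rather than by between-family differences. The sketch below is a simplified fixed-effects version of that idea, assuming additive genotype scores (0, 1 or 2 copies of the tested allele) and family means computed from offspring only; it is not the QTDT or FBAT implementation, and `within_family_beta` and the example sibships are hypothetical.

```python
import math
from collections import defaultdict


def within_family_beta(records):
    """Within-family regression of trait on genotype (illustrative sketch).

    records: iterable of (family_id, genotype_score, trait) for offspring.
    Both genotype and trait are centred on their family means, so any
    family-level (between-family) differences, including those caused by
    population stratification, drop out of the estimate.
    Returns (beta, standard_error).
    """
    families = defaultdict(list)
    for fam_id, g, y in records:
        families[fam_id].append((g, y))

    deviations = []  # (within-family genotype deviation, trait deviation)
    for members in families.values():
        g_bar = sum(g for g, _ in members) / len(members)
        y_bar = sum(y for _, y in members) / len(members)
        deviations.extend((g - g_bar, y - y_bar) for g, y in members)

    sxx = sum(w * w for w, _ in deviations)
    if sxx == 0:
        raise ValueError("no within-family genotype variation")
    beta = sum(w * d for w, d in deviations) / sxx

    # Residual degrees of freedom: one mean per family plus the slope.
    df = len(deviations) - len(families) - 1
    if df <= 0:
        raise ValueError("not enough offspring to estimate a standard error")
    rss = sum((d - beta * w) ** 2 for w, d in deviations)
    return beta, math.sqrt(rss / df / sxx)


# Hypothetical sibships: family "B" has a much higher trait mean than "A",
# but within each family the trait rises by 1 unit per allele copy.
records = [
    ("A", 0, 1.0), ("A", 1, 2.0), ("A", 2, 3.0),
    ("B", 1, 5.0), ("B", 2, 6.0),
]
print(within_family_beta(records))
```

The within-family slope here is exactly 1, whereas an ordinary pooled regression of trait on genotype over the same five offspring gives a slope of about 1.64, because it also absorbs the difference in family means. Families whose offspring all share one genotype contribute nothing, which mirrors the loss of information from uninformative families in the quantitative TDTs.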
40
Ewens WJ, Spielman RS, Kaplan NL, Gao X, Morris RW, Martin ER. Disease associations and family-based tests. Curr Protoc Hum Genet 2008; Chapter 1:Unit 1.12. [DOI: 10.1002/0471142905.hg0112s58]
Affiliation(s)
- Norman L. Kaplan
- National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina
- Xiaoyi Gao
- Miami Institute for Human Genomics, Miami, Florida
41
Laird NM, Lange C. Family-based methods for linkage and association analysis. Adv Genet 2008; 60:219-52. [PMID: 18358323 DOI: 10.1016/s0065-2660(07)00410-5]
Abstract
Traditional epidemiological study concepts such as case-control or cohort designs can be used in the design of genetic association studies, giving them a prominent role in genetic association analysis. A different class of designs based on related individuals, typically families, uses the concept of Mendelian transmission to achieve design-independent randomization, which permits the testing of linkage and association. Family-based designs require specialized analytic methods but they have distinct advantages: They are robust to confounding and variance inflation, which can arise in standard designs in the presence of population substructure; they test for both linkage and association; and they offer a natural solution to the multiple comparison problem. This chapter focuses on family-based designs. We describe some basic study designs as well as general approaches to analysis for qualitative, quantitative, and complex traits. Finally, we review available software.
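The randomization provided by Mendelian transmission is easiest to see in the classical transmission/disequilibrium test (TDT) for a biallelic marker in case-parent trios, one of the simplest family-based tests (the chapter itself covers more general designs). Under the null hypothesis, a heterozygous parent transmits either allele to an affected child with probability 1/2, so the counts b and c of the two kinds of transmission are compared with a McNemar-type statistic that is approximately chi-square with 1 degree of freedom. The function name and counts below are hypothetical.

```python
import math


def tdt(b, c):
    """Classical TDT for one biallelic marker.

    b: heterozygous parents who transmitted allele A to an affected child.
    c: heterozygous parents who transmitted the other allele.
    Returns (chi-square statistic, p-value); the p-value uses the
    chi-square distribution with 1 df, whose upper tail at x equals
    erfc(sqrt(x / 2)).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 60 transmissions of A versus 40 of the other allele.
stat, p = tdt(60, 40)
print(f"TDT chi-square = {stat:.2f}, p = {p:.4f}")
```

Homozygous parents are uninformative and do not enter b or c. Because only within-family transmissions are compared, allele frequency differences between subpopulations cannot by themselves produce a significant result, which is the robustness to population substructure described above.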
Affiliation(s)
- Nan M Laird
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
42
Li YW, Martin ER, Li YJ. EMK: a novel program for family-based allelic and genotypic association tests on quantitative traits. Ann Hum Genet 2008; 72:388-96. [DOI: 10.1111/j.1469-1809.2008.00432.x]
43
Cupples LA. Family study designs in the age of genome-wide association studies: experience from the Framingham Heart Study. Curr Opin Lipidol 2008; 19:144-50. [PMID: 18388694 DOI: 10.1097/mol.0b013e3282f73746]
Abstract
PURPOSE OF REVIEW: The past year has seen the publication of many genome-wide association studies, most of which are case-control studies. These publications are at the forefront of current research into genetic effects for numerous diseases, including diabetes, heart disease and cancer. Over the past 25 years, the tour de force of genetics research has been family studies using segregation, linkage and association analyses. Are these approaches now passé? Here we discuss the role of family studies in modern genetics research, using results from the Framingham Heart Study as examples. RECENT FINDINGS: Family studies permit both linkage and association analyses. Importantly, family-based association tests that consider transmission of genetic variants within a family provide important information on the genetic etiology of disease traits and avoid the potential for false-positive findings due to population substructure. SUMMARY: Family-based study designs continue to contribute much to the modern era of genome-wide association studies.
Affiliation(s)
- L Adrienne Cupples
- Boston University School of Public Health, Boston, Massachusetts 02118, USA.
44
Tiwari HK, Barnholtz-Sloan J, Wineinger N, Padilla MA, Vaughan LK, Allison DB. Review and evaluation of methods correcting for population stratification with a focus on underlying statistical principles. Hum Hered 2008; 66:67-86. [PMID: 18382087 PMCID: PMC2803696 DOI: 10.1159/000119107]
Abstract
When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these 'parental' populations subsequently come together and begin intermating, disequilibrium among linked markers may span a greater genetic distance than it typically does in populations under panmixia. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations, since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to population stratification. Accordingly, researchers are advised to use valid statistical tests for linkage disequilibrium mapping that allow genetic association studies to control for such confounding. Many recent papers have addressed this need. We provide a comprehensive review of recent advances in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) whether the method tests the genotype-phenotype covariance (conditional on familial information) and/or departures of the marginal distribution from the expected genotypic frequencies.
Affiliation(s)
- Hemant K Tiwari
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
45
Yang Y, Wise CA, Gordon D, Finch SJ. A family-based likelihood ratio test for general pedigree structures that allows for genotyping error and missing data. Hum Hered 2008; 66:99-110. [PMID: 18382089 DOI: 10.1159/000119109]
Abstract
The purpose of this work is the development of a family-based association test that allows for random genotyping errors and missing data and makes use of information on affected and unaffected pedigree members. We derive the conditional likelihood functions of the general nuclear family for the following scenarios: complete parental genotype data and no genotyping errors; only one genotyped parent and no genotyping errors; no parental genotype data and no genotyping errors; and no parental genotype data with genotyping errors. We find maximum likelihood estimates of the marker locus parameters, including the penetrances and population genotype frequencies, under the null hypothesis that all penetrance values are equal and under the alternative hypothesis. We then compute the likelihood ratio test. We perform simulations to assess the adequacy of the central chi-square distribution approximation when the null hypothesis is true. We also perform simulations to compare the power of the TDT and this likelihood-based method. Finally, we apply our method to 23 SNPs genotyped in nuclear families from a recently published study of idiopathic scoliosis (IS). Our simulations suggest that this likelihood ratio test statistic follows a central chi-square distribution with 1 degree of freedom under the null hypothesis, even in the presence of missing data and genotyping errors. The power comparison shows that this likelihood ratio test is more powerful than the original TDT for the simulations considered. For the IS data, the marker rs7843033 shows the most significant evidence of association under our method (p = 0.0003), which is consistent with a previous report in which rs7843033 had the second most significant TDTae p value among a set of 23 SNPs.
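The abstract checks by simulation whether the likelihood ratio statistic follows a central chi-square distribution with 1 degree of freedom under the null. The same kind of check can be sketched for a much simpler likelihood: a binomial model for allele transmissions from heterozygous parents, with H0: transmission probability = 1/2. This toy model is an assumption made for illustration and is not the nuclear-family likelihood with genotyping error derived in the paper; the function names and simulation settings are hypothetical.

```python
import math
import random

CHI2_1DF_CRIT_05 = 3.841458820694124  # 95th percentile of chi-square with 1 df


def binomial_lrt(b, n):
    """Likelihood ratio statistic for H0: transmission probability = 1/2,
    given b transmissions of the tested allele out of n informative ones."""
    c = n - b

    def term(k):  # k * ln(k / (n/2)), using the convention 0 * ln(0) = 0
        return 0.0 if k == 0 else k * math.log(k / (n / 2))

    return 2.0 * (term(b) + term(c))


def empirical_size(n_transmissions=200, reps=4000, seed=1):
    """Fraction of null replicates whose LRT exceeds the 5% chi-square cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Under H0 each transmission is a fair coin flip.
        b = sum(rng.random() < 0.5 for _ in range(n_transmissions))
        if binomial_lrt(b, n_transmissions) > CHI2_1DF_CRIT_05:
            rejections += 1
    return rejections / reps


print(f"empirical type I error at nominal 0.05: {empirical_size():.3f}")
```

An empirical rejection rate close to 0.05 suggests that the chi-square approximation is adequate at this sample size. Because the binomial statistic is discrete, the rate can sit slightly above or below the nominal level, and the discrepancy tends to grow as the number of informative transmissions shrinks.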
Affiliation(s)
- Yang Yang
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
46
Zhu X, Li S, Cooper RS, Elston RC. A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet 2008; 82:352-65. [PMID: 18252216 DOI: 10.1016/j.ajhg.2007.10.009] [Received: 06/18/2007] [Revised: 10/05/2007] [Accepted: 10/09/2007]
Abstract
There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample has more power to detect genetic effects than a family-based sample containing the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When both family and unrelated samples are available, statistical analyses are often performed on the family and unrelated samples separately, conditioning on parental information for the former, which reduces power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, simultaneously corrects for population stratification. We use the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes a conditional analysis of the family data unnecessary and is more powerful than separate analyses of the unrelated and family samples, or than a meta-analysis combining the results of the usual separate analyses. This property is demonstrated in a variety of simulation models and in empirical data. The proposed approach can equally be applied to the analysis of qualitative and quantitative traits.
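Principal-component adjustment for stratification can be sketched in its simplest single-PC form for unrelated individuals, in the spirit of EIGENSTRAT: compute the leading axis of variation of the column-centred marker matrix, then regress that axis out of both the candidate genotype and the trait before testing association. This is not the unified family/case-control procedure of the paper, which also handles related individuals. The simulated two-population data, effect sizes and function names below are hypothetical, and a real analysis would use several PCs and a linear-algebra library rather than pure-Python power iteration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def top_pc(genotypes, iters=50, seed=0):
    """Leading principal-component axis of the column-centred genotype
    matrix (individuals x markers), found by power iteration on X X^T.
    Returns a unit-length vector with one entry per individual."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    xc = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
    for _ in range(iters):
        u = [sum(xc[i][j] * v[i] for i in range(n)) for j in range(m)]  # X^T v
        v = [sum(xc[i][j] * u[j] for j in range(m)) for i in range(n)]  # X (X^T v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def residualize(y, pc):
    """Remove the mean of y and its projection on a centred, unit-length axis."""
    my = sum(y) / len(y)
    yc = [x - my for x in y]
    coef = sum(a * b for a, b in zip(yc, pc))
    return [a - coef * b for a, b in zip(yc, pc)]


def simulate(n_per_pop=100, n_markers=50, seed=42):
    """Two hypothetical subpopulations that differ in allele frequencies and
    in mean trait value; the candidate marker has no effect on the trait."""
    rng = random.Random(seed)

    def genotype(freq):  # number of copies of the allele (0, 1 or 2)
        return int(rng.random() < freq) + int(rng.random() < freq)

    markers, candidate, trait = [], [], []
    for f_marker, f_cand, shift in [(0.1, 0.2, 0.0), (0.9, 0.8, 2.0)]:
        for _ in range(n_per_pop):
            markers.append([genotype(f_marker) for _ in range(n_markers)])
            candidate.append(genotype(f_cand))
            trait.append(shift + rng.gauss(0.0, 1.0))
    return markers, candidate, trait


markers, candidate, trait = simulate()
raw_r = pearson(candidate, trait)
pc1 = top_pc(markers)
adjusted_r = pearson(residualize(candidate, pc1), residualize(trait, pc1))
print(f"raw r = {raw_r:.2f}, PC-adjusted r = {adjusted_r:.2f}")
```

In this simulation the candidate marker has no effect on the trait within either population, so the raw correlation (roughly 0.5 in expectation) is entirely due to stratification, and the PC-adjusted correlation should be close to zero.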
47
Falchi M. Analysis of quantitative trait loci. Methods Mol Biol 2008; 453:297-326. [PMID: 18712311 DOI: 10.1007/978-1-60327-429-6_16]
Abstract
Diseases with complex inheritance are characterized by multiple genetic and environmental factors that often interact to produce clinical symptoms. In addition, etiological heterogeneity (different risk factors causing similar phenotypes) obscures the inheritance pattern among affected relatives and hampers gene-mapping studies. For such diseases, carefully selected quantitative phenotypes that may represent intermediary risk factors for disease development (intermediate phenotypes) are etiologically more homogeneous than the disease per se. Over the last 15 years, quantitative trait locus mapping has become a popular method for understanding the genetic basis of intermediate phenotypes. This chapter provides an introduction to classical and recent strategies for mapping quantitative trait loci in humans.
Affiliation(s)
- Mario Falchi
- Twin Research and Genetic Epidemiology Unit, King's College London School of Medicine, London, United Kingdom
48
Roy-Gagnon MH, Mathias RA, Fallin MD, Jee SH, Broman KW, Wilson AF. An extension of the regression of offspring on mid-parent to test for association and estimate locus-specific heritability: the revised ROMP method. Ann Hum Genet 2007; 72:115-25. [PMID: 18042270 DOI: 10.1111/j.1469-1809.2007.00401.x]
Abstract
The Regression of Offspring on Mid-Parent (ROMP) method is a test of association between a quantitative trait and a candidate locus. ROMP estimates the trait heritability and the heritability attributable to a locus and requires genotyping the offspring only. In this study, the theory underlying ROMP was revised (ROMP(rev)) and extended. Computer simulations were used to determine the type I error and power of the test of association, and the accuracy of the locus-specific heritability estimate. The ROMP(rev) test had good power at the 5% significance level with properly controlled type I error. Locus-specific heritability estimates were, on average, close to simulated values. For non-zero locus-specific heritability, the proposed standard error was downwardly biased, yielding reduced coverage of 95% confidence intervals. A bootstrap approach with proper coverage is suggested as a second step for loci of interest. ROMP(rev) was applied to a study of cardiovascular-related traits to illustrate its use. An association between polymorphisms within the fibrinogen gene cluster and plasma fibrinogen was detected (p < 0.005) that accounted for 29% of the estimated fibrinogen heritability. The ROMP(rev) method provides a computationally fast and simple way of testing for association and obtaining accurate estimates of locus-specific heritability while minimizing the genotyping required.
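The classical quantity behind ROMP is the regression of offspring phenotype on mid-parent phenotype (the mean of the two parents' phenotypes), whose slope estimates narrow-sense heritability under a purely additive model with random mating and no shared environment. The sketch below simulates trios under those assumptions and recovers the heritability from the slope. It does not implement the locus-specific extension or the revised standard error of ROMP(rev), and `simulate_trios` and its parameters are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_trios(n_families, h2, seed=7):
    """Hypothetical trios under a purely additive model with phenotypic
    variance 1: trait = additive genetic value + independent environment."""
    rng = random.Random(seed)
    sd_a = math.sqrt(h2)          # parental additive genetic SD
    sd_e = math.sqrt(1.0 - h2)    # environmental SD
    sd_seg = math.sqrt(h2 / 2.0)  # Mendelian segregation SD in offspring
    mid_parent, offspring = [], []
    for _ in range(n_families):
        a_mother, a_father = rng.gauss(0.0, sd_a), rng.gauss(0.0, sd_a)
        p_mother = a_mother + rng.gauss(0.0, sd_e)
        p_father = a_father + rng.gauss(0.0, sd_e)
        a_child = (a_mother + a_father) / 2.0 + rng.gauss(0.0, sd_seg)
        mid_parent.append((p_mother + p_father) / 2.0)
        offspring.append(a_child + rng.gauss(0.0, sd_e))
    return mid_parent, offspring


mid_parent, offspring = simulate_trios(5000, h2=0.4)
print(f"estimated h2 = {ols_slope(mid_parent, offspring):.2f}")
```

The slope equals heritability because the offspring-mid-parent covariance is half the additive variance, while the mid-parent phenotypic variance is half the total variance. With 5,000 simulated families, the estimate should fall close to the simulated value of 0.4.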
Affiliation(s)
- M-H Roy-Gagnon
- Genometrics Section, Inherited Disease Research Branch, National Human Genome Research Institute, NIH, Baltimore, MD 21224, USA
49
Chen WM, Abecasis GR. Family-based association tests for genomewide association scans. Am J Hum Genet 2007; 81:913-26. [PMID: 17924335 PMCID: PMC2265659 DOI: 10.1086/521580] [Received: 04/12/2007] [Accepted: 07/11/2007]
Abstract
With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans. In addition to observed genotypes, our approach allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited. We estimate missing genotypes probabilistically using the Lander-Green or Elston-Stewart algorithms and combine high-resolution SNP genotypes for a subset of individuals in each pedigree with sparser marker data for the remaining individuals. We show that power is increased whenever phenotype information for ungenotyped individuals is included in analyses and that high-density genotyping of just three carefully selected individuals in a nuclear family can recover >90% of the information available if every individual were genotyped, for a fraction of the cost and experimental effort. To aid in study design, we evaluate the power of strategies that genotype different subsets of individuals in each pedigree and make recommendations about which individuals should be genotyped at a high density. To illustrate our method, we performed genomewide association analysis for 27 gene-expression phenotypes in 3-generation families (Centre d'Etude du Polymorphisme Humain pedigrees), in which genotypes for ~860,000 SNPs in 90 grandparents and parents are complemented by genotypes for ~6,700 SNPs in a total of 168 individuals. 
In addition to increasing the evidence of association at 15 previously identified cis-acting associated alleles, our genotype-inference algorithm allowed us to identify associated alleles at 4 cis-acting loci that were missed when analysis was restricted to individuals with the high-density SNP data. Our genotype-inference algorithm and the proposed association tests are implemented in software that is available for free.
Affiliation(s)
- Wei-Min Chen
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA.
50
Feng T, Zhang S, Sha Q. Two-stage association tests for genome-wide association studies based on family data with arbitrary family structure. Eur J Hum Genet 2007; 15:1169-75. [PMID: 17653107 DOI: 10.1038/sj.ejhg.5201902]
Abstract
Recently, Steen et al. proposed a two-stage approach for genome-wide family-based association studies. In the first stage, a screening test is used to select markers; in the second stage, a family-based association test is performed on the much smaller set of selected markers. The two-stage method can be much more powerful than traditional family-based association tests. In this article, we extend the approach so that it can incorporate parental information and be applied to an arbitrary pedigree structure. We use simulation studies to evaluate the type I error rates and power of the proposed methods. Our results show that the two-stage approach that incorporates founders' phenotypes has correct type I error rates and is much more powerful than the two-stage approach that uses children's phenotypes only. Moreover, by carefully choosing the number of markers retained in the first stage, a two-stage approach can achieve much greater power than the corresponding one-stage approach.
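The two-stage logic can be sketched as follows: rank markers by a stage-one screening statistic, keep the top k, and test only those with the family-based test at a Bonferroni threshold of α/k instead of α/M for all M markers. In the approach of Steen et al., the screening statistic is designed to be independent of the stage-two family-based test under the null hypothesis, which is what keeps the overall type I error controlled; the sketch simply takes both sets of results as given. The function name, marker IDs and values are hypothetical.

```python
def two_stage(screen_scores, fbat_pvalues, k, alpha=0.05):
    """Two-stage screen-then-test (illustrative sketch).

    screen_scores: marker -> stage-one screening statistic (larger = more promising).
    fbat_pvalues:  marker -> stage-two family-based test p-value.
    k:             number of markers carried forward from stage one.
    Returns the retained markers, in screening order, with p <= alpha / k.
    """
    top_k = sorted(screen_scores, key=screen_scores.get, reverse=True)[:k]
    threshold = alpha / k  # Bonferroni over the k retained markers only
    return [m for m in top_k if fbat_pvalues[m] <= threshold]


scores = {"rs1": 9.0, "rs2": 0.5, "rs3": 7.0, "rs4": 3.0}
pvals = {"rs1": 0.004, "rs2": 0.3, "rs3": 0.02, "rs4": 0.2}

print(two_stage(scores, pvals, k=2))  # threshold 0.05 / 2 = 0.025
one_stage = [m for m in pvals if pvals[m] <= 0.05 / len(pvals)]  # threshold 0.0125
print(one_stage)
```

Here rs3 passes the two-stage threshold (0.025) but not the one-stage Bonferroni threshold over all four markers (0.0125). The trade-off is that a true signal ranked below k in stage one is never tested, which is why the number of retained markers has to be chosen carefully.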
Affiliation(s)
- Tao Feng
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA