1
Gordon D, Hoh J, Finch SJ, Levenstien MA, Edington J, Li W, Majewski J, Ott J. Two Approaches for Consolidating Results from Genome Scans of Complex Traits: Selection Methods and Scan Statistics. Genet Epidemiol 2017. [DOI: 10.1002/gepi.2001.21.s1.s403] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
Affiliation(s)
- Derek Gordon
- The Rockefeller University; New York City, New York
- Wentian Li
- The Rockefeller University; New York City, New York
- Jürg Ott
- The Rockefeller University; New York City, New York
2
Wang KS, Mullersman JE, Liu XF. Family-based association analysis of the MAPT gene in Parkinson disease. J Appl Genet 2011; 51:509-14. [PMID: 21063069 DOI: 10.1007/bf03208881] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
The MAPT gene has been shown to be associated with several neurodegenerative disorders, including forms of parkinsonism and Parkinson disease (PD), but the results reveal population differences. We investigated the association of 10 single-nucleotide polymorphisms (SNPs) in the region of MAPT on chromosome 17q21 with PD and age at onset, by using 443 discordant sib pairs in PD from a public dataset (Mayo-Perlegen LEAPS Collaboration). Association with PD was assessed by the FBAT using generalized estimating equations (FBAT-GEE), while the association with age at onset as a quantitative trait was evaluated using the FBAT-logrank statistic. Five SNPs were significantly associated with PD (P < 0.05) in an additive model, and 9 SNPs were associated with PD (P < 0.05) in dominant and recessive models. Interestingly, 8 PD-associated SNPs were also associated with age at onset of PD (P < 0.05) in dominant and recessive models. The SNP most significantly associated with PD and age at onset was rs17649641 (P = 0.015 and 0.021, respectively). Two-SNP haplotypes inferred from rs17563965 and rs17649641 also showed association with PD (P = 0.018) and age at onset (P = 0.026). These results provide further support for the role of MAPT in development of PD.
Affiliation(s)
- K S Wang
- Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, PO Box 70259, Lamb Hall, Johnson City, TN 37614-1700, USA.
3
Moerkerke B, Vansteelandt S, Lange C. A doubly robust test for gene-environment interaction in family-based studies of affected offspring. Biostatistics 2010; 11:213-25. [PMID: 20154305 DOI: 10.1093/biostatistics/kxp061] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022] Open
Abstract
We develop a locally efficient test for (multiplicative) gene-environment interaction in family studies that collect genotypic information and environmental exposures for affected offspring along with genotypic information for their parents or relatives. The proposed test does not require modeling the effects of environmental exposures and is doubly robust in the sense of being valid if either a model for the main genetic effect holds or a model for the expected environmental exposure (given the offspring affection status and parental mating types) but not necessarily both. It extends the FBAT-I to allow for missing parental mating types and families of arbitrary size. Simulation studies and the analysis of an Alzheimer's disease study confirm the adequate performance of the proposed test.
Affiliation(s)
- Beatrijs Moerkerke
- Department of Data Analysis, Ghent University, Henri Dunantlaan 1, Ghent, Belgium.
4
Murphy A, T Weiss S, Lange C. Two-stage testing strategies for genome-wide association studies in family-based designs. Methods Mol Biol 2010; 620:485-496. [PMID: 20652517 DOI: 10.1007/978-1-60761-580-4_17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
The analysis of genome-wide association studies (GWAS) poses statistical hurdles that have to be handled efficiently in order for the study to be successful. The two largest impediments in the analysis phase of the study are the multiple comparisons problem and maintaining robustness against confounding due to population admixture and stratification. For quantitative traits in family-based designs, Van Steen (1) proposed a two-stage testing strategy that can be considered a hybrid approach between family-based and population-based analysis. By including the population-based component into the family-based analysis, the Van Steen algorithm maximizes the statistical power while at the same time maintaining the original robustness of family-based association tests (FBATs) (2-4). The Van Steen approach consists of two statistically independent steps, a screening step and a testing step. For all genotyped single nucleotide polymorphisms (SNPs), the screening step examines the evidence for association at a population-based level. Based on support for a potential genetic association from the screening step, the SNPs are prioritized for testing in the next step, where they are analyzed with an FBAT (3). By exploiting population-based information in the screening step that is not utilized in the family-based association testing step, the two steps are statistically independent. Therefore, the use of the population-based data for the purposes of screening does not bias the FBAT statistic calculated in the testing step. Depending on the trait type and the ascertainment conditions, Van Steen-type testing strategies can achieve statistical power levels that are comparable to those of population-based studies with the same number of probands. In this chapter, we review the original Van Steen algorithm and its numerous extensions, and discuss its advantages and disadvantages.
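The screen-then-test idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `two_stage_select`, the use of a generic "screening score" per SNP, and the top-k rule with a Bonferroni threshold of alpha / k are hypothetical choices made here, not details given in the abstract, and the SNP identifiers are made up.

```python
def two_stage_select(screen_scores, fbat_pvalues, top_k, alpha=0.05):
    """Illustrative screen-then-test selection (hypothetical helper).

    screen_scores: dict mapping SNP id -> screening evidence from the
        population-based step (larger means stronger support).
    fbat_pvalues: dict mapping SNP id -> family-based test p-value.
    top_k: number of top-ranked SNPs carried into the testing step.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Stage 1: rank all SNPs by screening evidence and keep the top k.
    ranked = sorted(screen_scores, key=screen_scores.get, reverse=True)[:top_k]
    # Stage 2: test only the selected SNPs, correcting for k tests
    # (assumed Bonferroni rule) rather than for every genotyped SNP.
    threshold = alpha / top_k
    return [snp for snp in ranked if fbat_pvalues[snp] <= threshold]


# Made-up inputs: rs2 has the smallest family-based p-value but is never
# tested because its screening score does not place it in the top 2.
screen = {"rs1": 0.9, "rs2": 0.2, "rs3": 0.7}
fbat = {"rs1": 0.01, "rs2": 0.001, "rs3": 0.04}
selected = two_stage_select(screen, fbat, top_k=2)
```

The point of the design is that the multiple-testing burden in the second stage depends on k, not on the total number of SNPs, which is only valid when the screening information is independent of the test statistic, as the abstract describes.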
Affiliation(s)
- Amy Murphy
- Channing Laboratory, Center for Genomic Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
5
Li M, Reilly C, Hanson T. Association Tests for a Censored Quantitative Trait and Candidate Genes in Structured Populations with Multilevel Genetic Relatedness. Biometrics 2009; 66:925-33. [DOI: 10.1111/j.1541-0420.2009.01352.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
6
Voineskos D, De Luca V, Macgregor S, Likhodi O, Miller L, Voineskos AN, Kennedy JL. Neuregulin 1 and age of onset in the major psychoses. J Neural Transm (Vienna) 2009; 116:479-86. [PMID: 19184335 DOI: 10.1007/s00702-008-0182-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/28/2008] [Accepted: 12/29/2008] [Indexed: 10/21/2022]
Abstract
Genetic vulnerability to psychiatric illness extends across the major psychiatric illnesses. Neuregulin 1 (NRG1) is a large gene on chromosome 8p that has been identified as a susceptibility factor in bipolar disorder and schizophrenia. In particular, a core at-risk haplotype has received considerable attention for a putative role in the pathophysiology of the major psychoses (schizophrenia and bipolar disorder). This core haplotype can be represented by three markers: 478B14-848, 420M9-1395, and SNP8NRG221533. We genotyped 312 families with bipolar probands and 120 families with schizophrenia probands. The core haplotype was tested for association with age at onset and with three phenotypes: major psychosis, schizophrenia, and bipolar disorder. Neither age of onset (P = 0.893) nor the major psychosis phenotype (P = 0.374) was associated with the core haplotype in the overall sample. Ours was the first study to investigate the NRG1 core haplotype with age of onset of major psychoses, and despite our preliminary negative findings, this area deserves further investigation.
Affiliation(s)
- Daphne Voineskos
- Centre for Addiction and Mental Health, 250 College Street, R-30, Toronto, ON, M5T 1R8, Canada
7
Lasky-Su J, Biederman J, Laird N, Tsuang M, Doyle AE, Smoller JW, Lange C, Faraone SV. Evidence for an Association of the Dopamine D5 Receptor Gene on Age at Onset of Attention Deficit Hyperactivity Disorder. Ann Hum Genet 2007; 71:648-59. [PMID: 17501935 DOI: 10.1111/j.1469-1809.2007.00366.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
The purpose of this study was to determine whether the single nucleotide polymorphisms (SNPs) within candidate genes for attention deficit hyperactivity disorder (ADHD) are associated with the age at onset for ADHD. One hundred and forty-three SNPs were genotyped across five candidate genes (DRD5, SLC6A3, HTR1B, SNAP25, DRD4) for ADHD in 229 families with at least one affected offspring. SNPs with the highest estimated power to detect an association with age at onset were selected for each candidate gene, using a power-based screening procedure that does not compromise the nominal significance level. A time-to-onset analysis for family-based samples was performed on these SNPs to determine if an association exists with age at onset for ADHD. Seven consecutive SNPs surrounding the D5 dopamine receptor gene (DRD5), were associated with the age at onset for ADHD; FDR adjusted q-values ranged from 0.008 to 0.023. This analysis indicates that individuals with the risk genotype develop ADHD earlier than individuals with any other genotype. A haplotype analysis across the 6 significant SNPs that were in linkage disequilibrium with one another, CTCATA, was also found to be significant (p-value = 0.02). We did not observe significant associations with age at onset for the other candidate loci tested. Although definitive conclusions await independent replication, these results suggest that a variant in DRD5 may affect age at onset for ADHD.
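The abstract reports FDR-adjusted q-values but does not say which procedure produced them. As a hedged illustration only, the sketch below computes Benjamini-Hochberg adjusted p-values, one common way of obtaining FDR-adjusted values; the function name `bh_adjust` and the example p-values are hypothetical and are not taken from the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure).

    Returns a list in the same order as the input, where each entry is
    min over ranks j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four hypothetical SNPs.
example = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

A SNP would then be declared significant at FDR level q when its adjusted value is at most q.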
Affiliation(s)
- J Lasky-Su
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
8
Diao G, Lin DY. Semiparametric variance-component models for linkage and association analyses of censored trait data. Genet Epidemiol 2006; 30:570-81. [PMID: 16858699 DOI: 10.1002/gepi.20168] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
Variance-component (VC) models are widely used for linkage and association mapping of quantitative trait loci in general human pedigrees. Traditional VC methods assume that the trait values within a family follow a multivariate normal distribution and are fully observed. These assumptions are violated if the trait data contain censored observations. When the trait pertains to age at onset of disease, censoring is inevitable because of loss to follow-up and limited study duration. Censoring also arises when the trait assay cannot detect values below (or above) certain thresholds. The latent trait values tend to have a complex distribution. Applying traditional VC methods to censored trait data would inflate type I error and reduce power. We present valid and powerful methods for the linkage and association analyses of censored trait data. Our methods are based on a novel class of semiparametric VC models, which allows an arbitrary distribution for the latent trait values. We construct appropriate likelihood for the observed data, which may contain left or right censored observations. The maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We develop stable and efficient numerical algorithms to implement the corresponding inference procedures. Extensive simulation studies demonstrate that the proposed methods outperform the existing ones in practical situations. We provide an application to the age at onset of alcohol dependence data from the Collaborative Study on the Genetics of Alcoholism. A computer program is freely available.
Affiliation(s)
- G Diao
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
9
Mueller PW, Rogus JJ, Cleary PA, Zhao Y, Smiles AM, Steffes MW, Bucksa J, Gibson TB, Cordovado SK, Krolewski AS, Nierras CR, Warram JH. Genetics of Kidneys in Diabetes (GoKinD) study: a genetics collection available for identifying genetic susceptibility factors for diabetic nephropathy in type 1 diabetes. J Am Soc Nephrol 2006; 17:1782-90. [PMID: 16775037 PMCID: PMC2770870 DOI: 10.1681/asn.2005080822] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.9] [Indexed: 12/21/2022] Open
Abstract
The Genetics of Kidneys in Diabetes (GoKinD) study is an initiative that aims to identify genes that are involved in diabetic nephropathy. A large number of individuals with type 1 diabetes were screened to identify two subsets, one with clear-cut kidney disease and another with normal renal status despite long-term diabetes. Those who met additional entry criteria and consented to participate were enrolled. When possible, both parents also were enrolled to form family trios. As of November 2005, GoKinD included 3075 participants who comprise 671 case singletons, 623 control singletons, 272 case trios, and 323 control trios. Interested investigators may request the DNA collection and corresponding clinical data for GoKinD participants using the instructions and application form that are available at http://www.gokind.org/access. Participating scientists will have access to three data sets, each with distinct advantages. The set of 1294 singletons has adequate power to detect a wide range of genetic effects, even those of modest size. The set of case trios, which has adequate power to detect effects of moderate size, is not susceptible to false-positive results because of population substructure. The set of control trios is critical for excluding certain false-positive results that can occur in case trios and may be particularly useful for testing gene-environment interactions. Integration of the evidence from these three components into a single, unified analysis presents a challenge. This overview of the GoKinD study examines in detail the power of each study component and discusses analytic challenges that investigators will face in using this resource.
Affiliation(s)
- Patricia W. Mueller
- Centers for Disease Control and Prevention, Diabetes and Molecular Risk Assessment Laboratory, Atlanta, Georgia
- John J. Rogus
- Research Division, Joslin Diabetes Center, Boston, Massachusetts
- Yuan Zhao
- George Washington University Biostatistics Center, Washington, DC
- Adam M. Smiles
- Research Division, Joslin Diabetes Center, Boston, Massachusetts
- Jean Bucksa
- University of Minnesota, Minneapolis, Minnesota
- Suzanne K. Cordovado
- Centers for Disease Control and Prevention, Diabetes and Molecular Risk Assessment Laboratory, Atlanta, Georgia
- James H. Warram
- Research Division, Joslin Diabetes Center, Boston, Massachusetts
10
Laird NM, Lange C. Family-based designs in the age of large-scale gene-association studies. Nat Rev Genet 2006; 7:385-94. [PMID: 16619052 DOI: 10.1038/nrg1839] [Citation(s) in RCA: 328] [Impact Index Per Article: 18.2] [Indexed: 11/09/2022]
Abstract
Both population-based and family-based designs are commonly used in genetic association studies to locate genes that underlie complex diseases. The simplest version of the family-based design--the transmission disequilibrium test--is well known, but the numerous extensions that broaden its scope and power are less widely appreciated. Family-based designs have unique advantages over population-based designs, as they are robust against population admixture and stratification, allow both linkage and association to be tested for and offer a solution to the problem of model building. Furthermore, the fact that family-based designs contain both within- and between-family information has substantial benefits in terms of multiple-hypothesis testing, especially in the context of whole-genome association studies.
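For readers unfamiliar with the transmission disequilibrium test mentioned above, a minimal sketch of its classic form follows. The formula is not given in the abstract; it is the standard McNemar-type statistic, chi-square = (b - c)^2 / (b + c) with one degree of freedom, where b and c count how often heterozygous parents transmit or do not transmit the allele of interest to affected offspring. The function name `tdt_statistic` and the counts in the example are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classic TDT chi-square (1 df) and its p-value.

    transmitted / untransmitted: counts of heterozygous-parent
    transmissions and non-transmissions of the allele of interest.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt_statistic(60, 40)
```

Only heterozygous parents contribute, which is why the test conditions on parental genotypes and is robust to population stratification, the property the abstract highlights.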
Affiliation(s)
- Nan M Laird
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, USA.
11
Jiang H, Harrington D, Raby BA, Bertram L, Blacker D, Weiss ST, Lange C. Family-based association test for time-to-onset data with time-dependent differences between the hazard functions. Genet Epidemiol 2006; 30:124-32. [PMID: 16374805 DOI: 10.1002/gepi.20132] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
Abstract
In genetic association studies, the differences between the hazard functions for the individual genotypes are often time-dependent. We address non-proportional hazards data by using the weighted logrank approach of Fleming and Harrington [1981; Commun Stat-Theor M 10:763-794]. We introduce a weighted FBAT-Logrank whose weights are based on a non-parametric estimator for the genetic marker distribution function under the alternative hypothesis. We show that the computation of the marker distribution under the alternative does not bias the significance level of any subsequently computed FBAT-statistic. Hence, we use the estimated marker distribution to select the Fleming-Harrington weights so that the power of the weighted FBAT-Logrank test is maximized. In simulation studies and applications to an asthma study, we illustrate the practical relevance of the new methodology. In addition to power increases of 100% over the original FBAT-Logrank test, we also gain insight into the age at which a genotype exerts the greatest influence on disease risk.
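The Fleming-Harrington weight family referred to in this abstract is usually written as w(t) = S(t-)^rho * (1 - S(t-))^gamma, where S(t-) is the pooled Kaplan-Meier survival estimate just before time t. The sketch below computes only these weights on made-up data; it is not the weighted FBAT-Logrank itself, and the function name `fleming_harrington_weights` and the example values are hypothetical.

```python
def fleming_harrington_weights(times, events, rho, gamma):
    """Fleming-Harrington weights at each distinct event time.

    times: observed onset or censoring times.
    events: 1 if onset was observed at that time, 0 if censored.
    Returns a list of (event_time, weight) pairs in time order.
    """
    data = list(zip(times, events))
    event_times = sorted({t for t, e in data if e})
    surv = 1.0  # Kaplan-Meier estimate just before the current time
    weights = []
    for t in event_times:
        at_risk = sum(1 for u, _ in data if u >= t)
        onsets = sum(1 for u, e in data if u == t and e)
        # The weight uses the left-continuous estimate S(t-).
        weights.append((t, surv ** rho * (1.0 - surv) ** gamma))
        # Update the Kaplan-Meier estimate after processing time t.
        surv *= 1.0 - onsets / at_risk
    return weights


# Made-up data: four subjects, the third one censored.
early = fleming_harrington_weights([1, 2, 3, 4], [1, 1, 0, 1], rho=1, gamma=0)
late = fleming_harrington_weights([1, 2, 3, 4], [1, 1, 0, 1], rho=0, gamma=1)
```

With rho = gamma = 0 every weight equals 1 and the ordinary logrank test is recovered; rho > 0 emphasizes early differences between hazards and gamma > 0 emphasizes late ones, which is what makes the choice of weights relevant for power.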
Affiliation(s)
- Hongyu Jiang
- Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA.
12
Kraft P, Thomas DC. Case-sibling gene-association studies for diseases with variable age at onset. Stat Med 2005; 23:3697-712. [PMID: 15534888 DOI: 10.1002/sim.1722] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Studies which compare cases to disease-free siblings are useful for assessing association between a genetic locus and a phenotypic trait, as they eliminate the possibility of confounding by population stratification. Many analytic methods for such family-based studies are based on a binary disease model. However, complex diseases have variable age at onset. Consequently, binary-outcome methods can be inefficient or biased. We review methods for analysing censored age-at-onset data from family studies, including stratified Cox regression and genotype-decomposition regression, an unstratified procedure which regresses age-at-onset on between- and within-family genotype components. We also introduce a retrospective likelihood for censored age-at-onset data, which requires an external estimate of the baseline hazard. Stratified Cox regression does not use controls who have not attained the age of their case sibling(s), potentially leading to a loss of efficiency. Both genotype-decomposition regression and the retrospective likelihood use these younger controls. We assess the performance of these methods via simulation studies. Stratified Cox regression and the retrospective likelihood have appropriate type I error rates in almost all situations studied; genotype-decomposition regression is often anti-conservative. Away from the null, confidence intervals for the relative risk derived from stratified Cox regression are anti-conservative when the disease is rare and case-rich families are sampled. The retrospective likelihood is more efficient than stratified Cox regression and its confidence intervals have correct coverage when the disease is rare or the estimate of the baseline hazard is reasonably accurate. These results suggest that when estimating genotype relative risks is the principal analytic goal, stratified Cox regression is appropriate as long as the disease is common; when the disease is rare, the retrospective likelihood may be more appropriate.
Affiliation(s)
- Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
13
Lange C, Lyon H, DeMeo D, Raby B, Silverman EK, Weiss ST. A new powerful non-parametric two-stage approach for testing multiple phenotypes in family-based association studies. Hum Hered 2004; 56:10-7. [PMID: 14614234 DOI: 10.1159/000073728] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Received: 04/30/2003] [Accepted: 06/03/2003] [Indexed: 11/19/2022] Open
Abstract
We introduce a new powerful nonparametric testing strategy for family-based association studies in which multiple quantitative traits are recorded and the phenotype with the strongest genetic component is not known prior to the analysis. In the first stage, using a population-based test based on the generalized estimating equation approach, we test all recorded phenotypes for association with the marker locus without biasing the nominal significance level of the later family-based analysis. In the second stage the phenotype with the smallest p value is selected and tested by a family-based association test for association with the marker locus. This strategy is robust against population admixture and stratification and does not require any adjustment for multiple testing. We demonstrate the advantages of this testing strategy over standard methodology in a simulation study. The practical importance of our testing strategy is illustrated by applications to the Childhood Asthma Management Program asthma data sets.
Affiliation(s)
- Christoph Lange
- Department of Biostatistics, Harvard School of Public Health, Boston, Mass. 02115, USA.
14
Rogus JJ, Warram JH, Krolewski AS. Genetic studies of late diabetic complications: the overlooked importance of diabetes duration before complication onset. Diabetes 2002; 51:1655-62. [PMID: 12031950 DOI: 10.2337/diabetes.51.6.1655] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
Genes play a role in many processes underlying late diabetic complications, but efforts to identify genetic variants have produced disappointing and contradictory results. Here, we evaluate whether the study designs and analytic methods commonly being used are optimal for finding susceptibility genes for diabetic complications. We do so by generating plausible genetic models and assessing the performance of case-control and family-based trio study designs. What emerges as a key determinant of success is duration of diabetes. This perspective focuses on duration of diabetes before complication onset and its influence on the ability to detect major and minor gene effects. It does not delve into the distinct effect of duration after complication onset, which can enrich case subjects with genotypes conferring survival advantage. We use clinically diagnosed nephropathy in type 1 diabetes to show how ignoring duration can result in considerable power loss in both case-control and family-based trio designs. We further show how, under certain circumstances, disregard for duration information can paradoxically lead to implicating nonrisk alleles as causative. Our results indicate that problems can be minimized by selecting case subjects with short diabetes duration and, to a lesser extent, control subjects with long duration or, perhaps, by adjusting for duration during analysis.
Affiliation(s)
- John J Rogus
- Research Division, Joslin Diabetes Center, Boston, Massachusetts 02215-5397, USA.
15
Abstract
We use likelihood-based score statistics to test for association between a disease and a diallelic polymorphism, based on data from arbitrary types of nuclear families. The Nonfounder statistic extends the transmission disequilibrium test (TDT) to accommodate affected and unaffected offspring, missing parental genotypes, phenotypes more general than qualitative traits, such as censored survival data and quantitative traits, and residual correlation of phenotypes within families. The Founder statistic compares observed or inferred parental genotypes to those expected in the general population. Here the genotypes of affected parents and those with many affected offspring are weighted more heavily than unaffected parents and those with few affected offspring. We illustrate the tests by applying them to data on a polymorphism of the SRD5A2 gene in nuclear families with multiple cases of prostate cancer. We also use simulations to compare the power of these family-based statistics to that of the score statistic based on Cox's partial likelihood for censored survival data, and find that the family-based statistics have considerably more power when there are many untyped parents. The software program FGAP for computing test statistics is available at http://www.stanford.edu/dept/HRP/epidemiology/FGAP.
Affiliation(s)
- Mei-Chiung Shih
- Department of Health Research and Policy, Stanford University, Stanford, California 94305, USA
16
Abstract
In this paper we describe various study designs and analytic techniques for testing the joint hypothesis that a genetic marker is both linked to and associated with a quantitative phenotype. Issues of power and sampling are addressed. The distinction between methods that explicitly examine association and those that infer association by examining the distribution of allelic transmissions from a heterozygous parent is examined. Extensions to multivariate, multiallelic, and multilocus situations are addressed. Recent approaches that combine variance-components-based linkage analyses with joint tests of linkage in the presence of association for disentanglement of the linkage and association and the application of such methods to fine mapping are discussed. Finally, new classes of joint tests of linkage and association that do not require samples of related individuals are described.
Affiliation(s)
- D B Allison
- Department of Biostatistics Section on Statistical Genetics & Clinical Nutrition Research Center, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA