1
|
Pan D, Li Q, Jiang N, Liu A, Yu K. Robust joint analysis allowing for model uncertainty in two-stage genetic association studies. BMC Bioinformatics 2011; 12:9. [PMID: 21211060 PMCID: PMC3027114 DOI: 10.1186/1471-2105-12-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/29/2010] [Accepted: 01/07/2011] [Indexed: 01/10/2023]
Abstract
Background: The cost-efficient two-stage design is often used in genome-wide association studies (GWASs) to search for genetic loci underlying susceptibility to complex diseases. Replication-based analysis, which considers data from each stage separately, often suffers from loss of efficiency. A joint test that combines data from both stages has been proposed and is widely used to improve efficiency. However, existing joint analyses are based on test statistics derived under an assumed genetic model, and thus might not perform robustly when the assumed genetic model is not appropriate. Results: In this paper, we propose joint analyses based on two robust tests, MERT and MAX3, for GWASs under a two-stage design. We developed computationally efficient procedures and formulas for significance level evaluation and power calculation. The performance of the proposed approaches is investigated through extensive simulation studies and a real example. Numerical results show that the joint analysis based on the MAX3 test statistic has the best overall performance. Conclusions: MAX3 joint analysis is the most robust procedure among the considered joint analyses, and we recommend using it in a two-stage genome-wide association study.
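To make the MAX3 statistic named in this abstract concrete, here is a minimal single-sample sketch. It is not the authors' two-stage joint procedure. It assumes the usual definition of MAX3 as the largest absolute Cochran-Armitage trend statistic over recessive, additive, and dominant genotype scores. The function names `trend_z` and `max3` and the genotype counts are hypothetical.

```python
from math import sqrt

# Genotype scores (x0, x1, x2) for 0, 1, 2 copies of the risk allele.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}

def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for one 2x3 genotype table.

    cases and controls are counts per genotype (0, 1, 2 risk alleles).
    Assumes both groups are non-empty and the marker is polymorphic,
    otherwise the variance term is zero.
    """
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    n = [r + s for r, s in zip(cases, controls)]
    # Score-weighted difference between case and control genotype counts.
    t = sum(x * (s_tot * r - r_tot * s)
            for x, r, s in zip(scores, cases, controls))
    # N * sum(x^2 n) - (sum(x n))^2, the spread of scores in the pooled sample.
    spread = (n_tot * sum(x * x * k for x, k in zip(scores, n))
              - sum(x * k for x, k in zip(scores, n)) ** 2)
    return t / sqrt(r_tot * s_tot * spread / n_tot)

def max3(cases, controls):
    """Largest absolute trend statistic over the three genetic models."""
    return max(abs(trend_z(cases, controls, x)) for x in SCORES.values())

# Hypothetical counts: cases enriched for the risk allele.
print(round(max3((10, 20, 30), (30, 20, 10)), 3))  # prints 4.472
```

Because MAX3 is a maximum of three correlated statistics, its null distribution is not standard normal, so a p-value needs the kind of dedicated formulas or simulation the abstract refers to rather than a normal table lookup.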
Affiliation(s)
- Dongdong Pan
- Department of Statistics, Yunnan University, Kunming 650091, PR China
|
2
|
Nguyen TT, Pahl R, Schäfer H. Optimal robust two-stage designs for genome-wide association studies. Ann Hum Genet 2009; 73:638-51. [PMID: 19839987 DOI: 10.1111/j.1469-1809.2009.00544.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
Optimal robust two-stage designs for genome-wide association studies are proposed using the maximum of the recessive, additive and dominant linear trend test statistics. These designs combine cost-saving two-stage genotyping with robustness against misspecification of the genetic model and are much more efficient than designs based on a single model specific test statistic in detecting multiple loci with different modes of inheritance. For given power of 90%, typical cost savings of 34% can be realised by increasing the total sample size by about 13% but genotyping only about half of the sample for the full marker set in the first stage and carrying forward about 0.06% of the markers to the second stage analysis. We also present robust two-stage designs providing optimal allocation of a limited budget for pre-existing samples. If a sample is available which would yield a power of 90% when fully genotyped, genotyping only half of the sample due to a limited budget will typically cause a loss of power of more than 55%. Using an optimal two-stage approach in the same sample under the same budget restrictions will limit the loss of power to less than 10%. In general, the optimal proportion of markers to be followed up in the second stage strongly depends on the cost ratio for chips and individual genotyping, while the design parameters of the optimal designs (total sample size, first stage proportion, first and second stage significance limit) do not much depend on the genetic model assumptions.
Affiliation(s)
- Thuy Trang Nguyen
- Institute of Medical Biometry and Epidemiology, Philipps-University Marburg, Marburg, Germany
|
3
|
Thomas DC, Casey G, Conti DV, Haile RW, Lewinger JP, Stram DO. Methodological Issues in Multistage Genome-wide Association Studies. Stat Sci 2009; 24:414-429. [PMID: 20607129 DOI: 10.1214/09-sts288] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
Abstract
Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of "promising" SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design to obtain comparable levels of overall type I error and power by using about half the sample for stage I and carrying about 0.1% of SNPs forward to the second stage, the optimal design depending primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed ORs of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a "replication" panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly-significant associations has been discovered, a truly independent "exact replication" study is needed in a similar population of the same promising SNPs using similar methods. This can then be followed by (1) "generalizability" studies to assess the full scope of replicated associations across different races, different endpoints, different interactions, etc.; (2) fine-mapping or re-sequencing to try to identify the causal variant; and (3) experimental studies of the biological function of these genes. 
Multistage sampling designs may be more useful at this stage, say for selecting subsets of subjects for deep re-sequencing of regions identified in the GWAS.
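The cost argument in this abstract can be checked with simple arithmetic. The sketch below assumes a simplified cost model in which genotyping cost is proportional to subjects times markers. The function `relative_cost` and the stage II to stage I per-genotype cost ratio of 20 are hypothetical choices for illustration, not figures from the paper. The sketch also ignores any increase in sample size needed to hold power constant.

```python
def relative_cost(sample_fraction, marker_fraction, cost_ratio):
    """Two-stage genotyping cost as a fraction of the one-stage cost.

    sample_fraction: share of subjects genotyped on the full panel in stage I.
    marker_fraction: share of markers carried forward to stage II.
    cost_ratio: per-genotype cost in stage II divided by that in stage I
                (custom genotyping is usually more expensive per genotype).
    """
    stage1 = sample_fraction
    stage2 = (1.0 - sample_fraction) * marker_fraction * cost_ratio
    return stage1 + stage2

# Half the sample in stage I, 0.1% of SNPs carried forward (values from the
# abstract), and an assumed cost ratio of 20.
rel = relative_cost(0.5, 0.001, 20.0)
print(f"relative cost {rel:.2f}, saving {1 - rel:.0%}")  # relative cost 0.51, saving 49%
```

Under these assumptions the stage I term dominates, so the saving stays close to one minus the stage I sample fraction, with the cost ratio setting how many SNPs can be carried forward before the stage II term matters.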
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California
|
4
|
Vasan RS, Glazer NL, Felix JF, Lieb W, Wild PS, Felix SB, Watzinger N, Larson MG, Smith NL, Dehghan A, Grosshennig A, Schillert A, Teumer A, Schmidt R, Kathiresan S, Lumley T, Aulchenko YS, König IR, Zeller T, Homuth G, Struchalin M, Aragam J, Bis JC, Rivadeneira F, Erdmann J, Schnabel RB, Dörr M, Zweiker R, Lind L, Rodeheffer RJ, Greiser KH, Levy D, Haritunians T, Deckers JW, Stritzke J, Lackner KJ, Völker U, Ingelsson E, Kullo I, Haerting J, O'Donnell CJ, Heckbert SR, Stricker BH, Ziegler A, Reffelmann T, Redfield MM, Werdan K, Mitchell GF, Rice K, Arnett DK, Hofman A, Gottdiener JS, Uitterlinden AG, Meitinger T, Blettner M, Friedrich N, Wang TJ, Psaty BM, van Duijn CM, Wichmann HE, Munzel TF, Kroemer HK, Benjamin EJ, Rotter JI, Witteman JC, Schunkert H, Schmidt H, Völzke H, Blankenberg S. Genetic variants associated with cardiac structure and function: a meta-analysis and replication of genome-wide association data. JAMA 2009; 302:168-78. [PMID: 19584346 PMCID: PMC2975567 DOI: 10.1001/jama.2009.978-a] [Citation(s) in RCA: 173] [Impact Index Per Article: 11.5] [Indexed: 01/19/2023]
Abstract
CONTEXT Echocardiographic measures of left ventricular (LV) structure and function are heritable phenotypes of cardiovascular disease. OBJECTIVE To identify common genetic variants associated with cardiac structure and function by conducting a meta-analysis of genome-wide association data in 5 population-based cohort studies (stage 1) with replication (stage 2) in 2 other community-based samples. DESIGN, SETTING, AND PARTICIPANTS Within each of 5 community-based cohorts comprising the EchoGen consortium (stage 1; n = 12 612 individuals of European ancestry; 55% women, aged 26-95 years; examinations between 1978-2008), we estimated the association between approximately 2.5 million single-nucleotide polymorphisms (SNPs; imputed to the HapMap CEU panel) and echocardiographic traits. In stage 2, SNPs significantly associated with traits in stage 1 were tested for association in 2 other cohorts (n = 4094 people of European ancestry). Using a prespecified P value threshold of 5 × 10⁻⁷ to indicate genome-wide significance, we performed an inverse variance-weighted fixed-effects meta-analysis of genome-wide association data from each cohort. MAIN OUTCOME MEASURES Echocardiographic traits: LV mass, internal dimensions, wall thickness, systolic dysfunction, aortic root, and left atrial size. RESULTS In stage 1, 16 genetic loci were associated with 5 echocardiographic traits: 1 each with LV internal dimensions and systolic dysfunction, 3 each with LV mass and wall thickness, and 8 with aortic root size. In stage 2, 5 loci replicated (6q22 locus associated with LV diastolic dimensions, explaining <1% of trait variance; 5q23, 12p12, 12q14, and 17p13 associated with aortic root size, explaining 1%-3% of trait variance). CONCLUSIONS We identified 5 genetic loci harboring common variants that were associated with variation in LV diastolic dimensions and aortic root size, but such findings explained a very small proportion of variance. 
Further studies are required to replicate these findings, identify the causal variants at or near these loci, characterize their functional significance, and determine whether they are related to overt cardiovascular disease.
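The inverse variance-weighted fixed-effects meta-analysis mentioned in this abstract combines per-cohort effect estimates for one SNP. The following is a minimal sketch of the standard formula, not the consortium's actual pipeline. The function name `fixed_effect_meta` and the example effect sizes and standard errors are hypothetical.

```python
from math import erfc, sqrt

def fixed_effect_meta(betas, ses):
    """Inverse variance-weighted fixed-effects pooling for one SNP.

    betas: per-cohort effect estimates.
    ses: matching per-cohort standard errors (all positive).
    Returns (pooled beta, pooled standard error, z score, two-sided p value).
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = erfc(abs(z) / sqrt(2.0))
    return beta, se, z, p

# Hypothetical estimates from five cohorts for a single SNP.
beta, se, z, p = fixed_effect_meta(
    [0.12, 0.08, 0.15, 0.10, 0.09],
    [0.04, 0.05, 0.06, 0.03, 0.05],
)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
print("genome-wide significant:", p < 5e-7)  # threshold quoted in the abstract
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precise studies. The fixed-effects model assumes one shared true effect across cohorts.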
|
5
|
Posch M, Zehetmayer S, Bauer P. Hunting for Significance With the False Discovery Rate. J Am Stat Assoc 2009. [DOI: 10.1198/jasa.2009.0137] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
|
6
|
Tintle N, Gordon D, Van Bruggen D, Finch S. The cost effectiveness of duplicate genotyping for testing genetic association. Ann Hum Genet 2009; 73:370-8. [PMID: 19344449 DOI: 10.1111/j.1469-1809.2009.00516.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
We consider a modification to the traditional genome wide association (GWA) study design: duplicate genotyping. Duplicate genotyping (re-genotyping some of the samples) has long been suggested for quality control reasons; however, it has not been evaluated for its statistical cost-effectiveness. We demonstrate that when genotyping error rates are at least m%, duplicate genotyping provides a cost-effective (more statistical power for the same price) design alternative when relative genotype to phenotype/sample acquisition costs are no more than m%. In addition to cost and error rate, duplicate genotyping is most cost-effective for SNPs with low minor allele frequency. In general, relative genotype to phenotype/sample acquisition costs will be low when following up a limited number of SNPs in the second stage of a two-stage GWA study design, and, thus, duplicate genotyping may be useful in these situations. In cases where many SNPs are being followed up at the second stage, duplicate genotyping only low-quality SNPs with low minor allele frequency may be cost-effective. We also find that in almost all cases where duplicate genotyping is cost-effective, the most cost-effective design strategy involves duplicate genotyping all samples. Free software is provided which evaluates the cost-effectiveness of duplicate genotyping based on user inputs.
Affiliation(s)
- Nathan Tintle
- Hope College, Department of Mathematics, Holland, Michigan 49423, USA.
|
7
|
Scherag A, Hebebrand J, Schäfer H, Müller HH. Flexible designs for genomewide association studies. Biometrics 2009; 65:815-21. [PMID: 19173695 DOI: 10.1111/j.1541-0420.2008.01174.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023]
Abstract
Genomewide association studies attempting to unravel the genetic etiology of complex traits have recently gained attention. Frequently, these studies employ a sequential genotyping strategy: A large panel of markers is examined in a subsample of subjects, and the most promising markers are genotyped in the remaining subjects. In this article, we introduce a novel method for such designs enabling investigators to, for example, modify marker densities and sample proportions while strongly controlling the family-wise type I error rate. Loss of efficiency is avoided by redistributing conditional type I error rates of discarded markers. Our approach can be combined with cost optimal designs and entails a greater flexibility than all previously suggested designs. Among other features, it allows for marker selections based upon biological criteria instead of statistical criteria alone, or the option to modify the sample size at any time during the course of the project. For practical applicability, we develop a new algorithm, subsequently evaluate it by simulations, and illustrate it using a real data set.
Affiliation(s)
- André Scherag
- Institute of Medical Biometry and Epidemiology, Philipps-University, Marburg, Germany
|
8
|
Pahl R, Schäfer H, Müller HH. Optimal multistage designs—a general framework for efficient genome-wide association studies. Biostatistics 2008; 10:297-309. [DOI: 10.1093/biostatistics/kxn036] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
|
9
|
Zehetmayer S, Bauer P, Posch M. Optimized multi-stage designs controlling the false discovery or the family-wise error rate. Stat Med 2008; 27:4145-60. [PMID: 18444249 DOI: 10.1002/sim.3300] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
When a large number of hypotheses are investigated, we propose multi-stage designs where in each interim analysis promising hypotheses are screened, which are investigated in further stages. Given a fixed overall number of observations, this allows one to spend more observations for promising hypotheses than with single-stage designs, where the observations are equally distributed among all considered hypotheses. We propose multi-stage procedures controlling either the family-wise error rate (FWER) or the false discovery rate (FDR) and derive asymptotically optimal stopping boundaries and sample size allocations (across stages) to maximize the power of the procedure. Optimized two-stage designs lead to a considerable increase in power compared with the classical single-stage design. Going from two to three stages additionally leads to a distinctive increase in power. Adding a fourth stage leads to a further improvement, which is, however, less pronounced. Surprisingly, we found only small differences in power between optimized integrated designs, where the data of all stages are used in the final test statistics, and optimized pilot designs where only the data from the final stage are used for testing. However, the integrated design controlling the FDR appeared to be more robust against misspecifications in the planning phase. Additionally, we found that with increasing number of stages the drop in power when controlling the FWER instead of the FDR becomes negligible. Our investigations show that the crucial point is not the choice of the error rate or the type of design, but the sequential nature of the trial where non-promising hypotheses are dropped in the early phases of the experiment.
Affiliation(s)
- Sonja Zehetmayer
- Section of Medical Statistics, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
|
10
|
Gail MH, Pfeiffer RM, Wheeler W, Pee D. Probability that a two-stage genome-wide association study will detect a disease-associated SNP and implications for multistage designs. Ann Hum Genet 2008; 72:812-20. [PMID: 18652601 DOI: 10.1111/j.1469-1809.2008.00467.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Large two-stage genome-wide association studies (GWASs) have been shown to reduce required genotyping with little loss of power, compared to a one-stage design, provided a substantial fraction of cases and controls, π_sample, is included in stage 1. However, a number of recent GWASs have used π_sample < 0.2. Moreover, standard power calculations are not applicable because SNPs are selected in stage 1 by ranking their p-values, rather than comparing each SNP's statistic to a fixed critical value. We define the detection probability (DP) of a two-stage design as the probability that a given disease-associated SNP will have a p-value among the lowest ranks of p-values at stage 1, and, among those SNPs selected at stage 1, at stage 2. For 8000 cases and 8000 controls available for study and for odds ratios per allele in the range 1.1-1.3, we show that DP is substantially reduced for designs with π_sample ≤ 0.25, and that DP cannot be appreciably increased by analyzing the stage 1 and stage 2 data jointly. These results suggest that multistage designs with small first stages (e.g. π_sample ≤ 0.25) should be avoided, and that additional genotyping in earlier studies with small first stages will yield previously unselected disease-associated SNPs.
Affiliation(s)
- M H Gail
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892-7244, US.
|
11
|
Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008; 9:356-69. [PMID: 18398418 DOI: 10.1038/nrg2344] [Citation(s) in RCA: 1870] [Impact Index Per Article: 116.9] [Indexed: 12/13/2022]
Abstract
The past year has witnessed substantial advances in understanding the genetic basis of many common phenotypes of biomedical importance. These advances have been the result of systematic, well-powered, genome-wide surveys exploring the relationships between common sequence variation and disease predisposition. This approach has revealed over 50 disease-susceptibility loci and has provided insights into the allelic architecture of multifactorial traits. At the same time, much has been learned about the successful prosecution of association studies on such a scale. This Review highlights the knowledge gained, defines areas of emerging consensus, and describes the challenges that remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of biomedical traits of interest and to translate the information gathered into improvements in clinical management.
|
12
|
Ziegler A, König IR, Thompson JR. Biostatistical Aspects of Genome-Wide Association Studies. Biom J 2008; 50:8-28. [DOI: 10.1002/bimj.200710398] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Indexed: 01/17/2023]
|