1
Abstract
The estimated effect of a marker allele in the initial study reporting a marker-allele association is often exaggerated relative to the estimated effect in follow-up studies (the "winner's curse" phenomenon). This is a particular concern for genome-wide association studies, where markers typically must pass very stringent significance thresholds to be selected for replication. A related problem is the overestimation of predictive accuracy that occurs when the same data set is used both to select a multilocus risk model from a wide range of possible models and to estimate the accuracy of the final model ("over-fitting"). Even in the absence of these quantitative biases, researchers can overstate the qualitative importance of their findings, for example by focusing on relative risks in a context where sensitivity and specificity may be more appropriate measures. Epidemiologists need to be aware of these potential problems: as authors, to avoid or minimize them, and as readers, to detect them.
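To make the selection mechanism behind the winner's curse concrete, here is a small simulation sketch in Python; it is an illustration, not an analysis from the study described above. It assumes, purely for illustration, that every marker has the same true log odds ratio (`true_beta`), that each estimate is normally distributed around it with a known standard error (`se`), and that a marker is "selected" when its Wald z-statistic exceeds 5.45 in absolute value (roughly a two-sided p < 5 × 10^-8). The function name `winners_curse_demo` and all parameter values are hypothetical.

```python
import random
import statistics


def winners_curse_demo(n_markers=20000, true_beta=0.15, se=0.05,
                       z_threshold=5.45, seed=1):
    """Return (mean of all estimates, mean of selected estimates, n selected).

    Hypothetical setup: every marker has the same true log odds ratio
    `true_beta`, and each estimate is drawn from Normal(true_beta, se).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [rng.gauss(true_beta, se) for _ in range(n_markers)]
    # "Discovery": keep markers whose Wald z-statistic clears the threshold
    selected = [b for b in estimates if abs(b / se) > z_threshold]
    mean_all = statistics.mean(estimates)
    mean_selected = statistics.mean(selected) if selected else float("nan")
    return mean_all, mean_selected, len(selected)


mean_all, mean_selected, n_selected = winners_curse_demo()
print(f"mean over all markers: {mean_all:.3f}")
print(f"mean over {n_selected} selected markers: {mean_selected:.3f}")
```

Because only estimates that happen to land far above their true value clear the threshold, the average estimate among selected markers sits well above `true_beta`, while the average over all markers stays close to it. A follow-up study, which estimates the effect without this selection, would tend to report a smaller value.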
2
Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ. Exploiting gene-environment interaction to detect genetic associations. Hum Hered 2007; 63:111-9. [PMID: 17283440 DOI: 10.1159/000099183]
Abstract
Complex disease by definition results from the interplay of genetic and environmental factors. However, it is currently unclear how gene-environment interaction can best be used to locate complex disease susceptibility loci, particularly in the context of studies where between 1,000 and 1,000,000 markers are scanned for association with disease. We present a joint test of marginal association and gene-environment interaction for case-control data. We compare the power and sample size requirements of this joint test to other analyses: the marginal test of genetic association, the standard test for gene-environment interaction based on logistic regression, and the case-only test for interaction that exploits gene-environment independence. Although for many penetrance models the joint test of genetic marginal effect and interaction is not the most powerful, it is nearly optimal across all penetrance models we considered. In particular, it generally has better power than the marginal test when the genetic effect is restricted to exposed subjects and much better power than the tests of gene-environment interaction when the genetic effect is not restricted to a particular exposure level. This makes the joint test an attractive tool for large-scale association scans where the true gene-environment interaction model is unknown.
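As a rough sketch of how a joint test of marginal association and gene-environment interaction can be computed, assume two logistic models have already been fitted to the same case-control data by some external routine: a null model with exposure only, logit P(D=1) = α + β_E·E, and a full model that adds the genotype main effect and the interaction, α + β_E·E + β_G·G + β_GE·G·E. A likelihood-ratio statistic for H0: β_G = β_GE = 0 is then referred to a chi-square distribution with 2 degrees of freedom. This is one standard way to form such a test and not necessarily the exact procedure of the cited paper; the helper names and the log-likelihood values in the usage line are hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; this sketch handles df = 1 or 2 only."""
    if df == 1:
        # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are supported in this sketch")


def joint_test(loglik_null, loglik_full):
    """2-df likelihood-ratio test of H0: beta_G = beta_GE = 0.

    loglik_null: maximized log-likelihood of logit P(D=1) = a + bE*E
    loglik_full: maximized log-likelihood of
                 logit P(D=1) = a + bE*E + bG*G + bGE*G*E
    """
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df=2)


# Hypothetical log-likelihoods from two fitted logistic models
stat, p = joint_test(loglik_null=-1380.2, loglik_full=-1374.1)
# stat is about 12.2 and p = exp(-6.1), about 0.0022
```

The 1-df branch of `chi2_sf` covers the marginal test (dropping β_G only from a model without interaction) or the standard interaction test (dropping β_GE only), so the three analyses compared in the abstract can be reported on the same footing.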
Affiliation(s)
- Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
3
Iversen ES, Chen S. Population-Calibrated Gene Characterization: Estimating Age at Onset Distributions Associated With Cancer Genes. J Am Stat Assoc 2005; 100:399-409. [PMID: 18418465 DOI: 10.1198/016214505000000196]
Abstract
Phenotypic characterization of rare disease genes poses a significant statistical challenge, but the need to do so is clear. Clinical management of patients carrying a disease gene depends crucially on an accurate characterization of the genetically predisposed disease, including its likelihood of occurrence among mutation carriers, natural history, and response to treatment. We propose a formal yet practical method for controlling for bias due to ignoring ascertainment, defined as the sampling mechanism, when quantifying the association between genotype and disease using data on high-risk families. The approach is more statistically efficient than conditioning on the variables used in sampling. In it, the likelihood is adjusted by a factor that is a function of sampling weights in strata defined by those variables. It requires that these variables and the sampling probabilities in the strata they define either are known or can be estimated. The latter requires a second, population-based dataset. As an example, we derive ascertainment-corrected estimates of penetrance for the breast cancer susceptibility genes BRCA1 and BRCA2. The Bayesian analysis that we use incorporates a modified segregation model and prior data on penetrance derived from the literature. Markov chain Monte Carlo methods are used for inference.
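The abstract describes adjusting the likelihood by a factor built from stratum-specific sampling weights. The sketch below does not reproduce that adjustment; it illustrates a simpler, related technique, an inverse-probability-weighted pseudo-log-likelihood, in which each family's log-likelihood contribution is divided by the estimated probability that families in its stratum were sampled. It assumes, hypothetically, that for each stratum we know the number of sampled families and the number of families in a population-based dataset; all function names, strata, and numbers are illustrative.

```python
def stratum_sampling_probs(sampled_counts, population_counts):
    """Estimate P(sampled | stratum) as (sampled families) / (population families)."""
    return {s: sampled_counts[s] / population_counts[s] for s in sampled_counts}


def weighted_loglik(family_logliks, family_strata, probs):
    """Inverse-probability-weighted pseudo-log-likelihood.

    family_logliks: per-family log-likelihood contributions at fixed parameters
    family_strata:  the sampling stratum of each family
    probs:          estimated sampling probability for each stratum
    """
    return sum(ll / probs[s] for ll, s in zip(family_logliks, family_strata))


# Hypothetical strata: families with strong vs weak family history
probs = stratum_sampling_probs({"high": 80, "low": 20}, {"high": 200, "low": 2000})
total = weighted_loglik([-2.0, -3.0], ["high", "low"], probs)
```

Here the heavily ascertained "high" stratum gets weight 1/0.4 = 2.5 and the rarely sampled "low" stratum gets weight 1/0.01 = 100, so the over-represented high-risk families do not dominate the fit. Maximizing such a weighted criterion over the penetrance parameters is the step a full analysis would add.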
Affiliation(s)
- Edwin S Iversen
- Edwin S. Iversen, Jr. is Research Assistant Professor, Department of Biostatistics and Bioinformatics and Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708. Sining Chen is Postdoctoral Fellow, Oncology Biostatistics, Johns Hopkins University, Baltimore, MD 21205.
4
Kraft P, Thomas DC. Case-sibling gene-association studies for diseases with variable age at onset. Stat Med 2005; 23:3697-712. [PMID: 15534888 DOI: 10.1002/sim.1722]
Abstract
Studies which compare cases to disease-free siblings are useful for assessing association between a genetic locus and a phenotypic trait, as they eliminate the possibility of confounding by population stratification. Many analytic methods for such family-based studies are based on a binary disease model. However, complex diseases have variable age at onset. Consequently, binary-outcome methods can be inefficient or biased. We review methods for analysing censored age-at-onset data from family studies, including stratified Cox regression and genotype-decomposition regression, an unstratified procedure which regresses age-at-onset on between- and within-family genotype components. We also introduce a retrospective likelihood for censored age-at-onset data, which requires an external estimate of the baseline hazard. Stratified Cox regression does not use controls who have not attained the age of their case sibling(s), potentially leading to a loss of efficiency. Both genotype-decomposition regression and the retrospective likelihood use these younger controls. We assess the performance of these methods via simulation studies. Stratified Cox regression and the retrospective likelihood have appropriate type I error rates in almost all situations studied; genotype-decomposition regression is often anti-conservative. Away from the null, confidence intervals for the relative risk derived from stratified Cox regression are anti-conservative when the disease is rare and case-rich families are sampled. The retrospective likelihood is more efficient than stratified Cox regression and its confidence intervals have correct coverage when the disease is rare or the estimate of the baseline hazard is reasonably accurate. These results suggest that when estimating genotype relative risks is the principal analytic goal, stratified Cox regression is appropriate as long as the disease is common; when the disease is rare, the retrospective likelihood may be more appropriate.
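To illustrate why stratified Cox regression discards younger controls, the sketch below computes one sibship's contribution to the stratified Cox partial log-likelihood. It assumes a single affected sibling per family, no tied onset ages, and a log-additive genotype coding (number of risk alleles, so exp(β) is the per-allele hazard ratio). A disease-free sibling enters the case's risk set only if his or her censoring age is at least the case's age at onset. The function names and the example family are hypothetical.

```python
import math


def family_partial_loglik(beta, case_age, case_genotype, siblings):
    """Log partial-likelihood contribution of one sibship with one case.

    siblings: list of (censoring_age, genotype) for disease-free siblings.
    Siblings younger than the case's onset age are not at risk at that age,
    so they are excluded from the risk set and contribute nothing.
    """
    risk_set = [case_genotype] + [g for age, g in siblings if age >= case_age]
    denom = sum(math.exp(beta * g) for g in risk_set)
    return beta * case_genotype - math.log(denom)


def total_partial_loglik(beta, families):
    """Sum over families; each family is its own stratum (own baseline hazard)."""
    return sum(family_partial_loglik(beta, *fam) for fam in families)


# Hypothetical sibship: case diagnosed at 50 carrying one risk allele.
# The sibling censored at 45 has not reached age 50, so it is dropped.
family = (50, 1, [(55, 0), (45, 2), (60, 1)])
value = total_partial_loglik(math.log(2), [family])  # equals log(2 / 5)
```

In the example, only the siblings censored at 55 and 60 form the comparison set, so the information carried by the younger sibling's genotype is lost. The genotype-decomposition regression and retrospective likelihood described in the abstract are designed to make use of such younger controls.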
Affiliation(s)
- Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
5
Zhou X, Iversen ES, Parmigiani G. Classification of Missense Mutations of Disease Genes. J Am Stat Assoc 2005; 100:51-60. [PMID: 18418466 DOI: 10.1198/016214504000001817]
Abstract
Clinical management of individuals found to harbor a mutation at a known disease-susceptibility gene depends on accurate assessment of mutation-specific disease risk. For missense mutations (MMs), that is, mutations that lead to a single amino acid change in the protein coded by the gene, this poses a particularly challenging problem. Because it is not possible to predict the structural and functional changes to the protein product for a given amino acid substitution, and because functional assays are often not available, disease association must be inferred from data on individuals with the mutation. Inference is complicated by small sample sizes and by sampling mechanisms that bias toward individuals at high familial risk of disease. We propose a Bayesian hierarchical model to classify the disease association of MMs given pedigree data collected in the high-risk setting. The model's structure allows simultaneous characterization of multiple MMs. It uses a group of pedigrees identified through probands who tested positive for known disease-associated mutations and a group of test-negative pedigrees, both obtained from the same clinic, to calibrate classification and control for potential ascertainment bias. We apply this model to study MMs of the breast-ovarian cancer susceptibility genes BRCA1 and BRCA2, using data collected at the Duke University Medical Center in Durham, North Carolina.
Affiliation(s)
- Xi Zhou
- Xi Zhou is Instructor, Division of Biostatistics, Department of Public Health, Weill Medical College of Cornell University, New York, NY 10021. Edwin S. Iversen is Assistant Research Professor, Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708. Giovanni Parmigiani is Associate Professor, Departments of Oncology, Biostatistics, and Pathology, The Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, MD 21205.
6
Andrieu N, Goldstein AM. The case-combined-control design was efficient in detecting gene-environment interactions. J Clin Epidemiol 2004; 57:662-71. [PMID: 15358394 DOI: 10.1016/j.jclinepi.2003.11.014] [Accepted: 11/11/2003]
Abstract
OBJECTIVE The interest in studying gene-environment (GxE) interaction is increasing for complex diseases. A design combining both related and unrelated controls (e.g., population-based and siblings) is proposed to increase the power to detect GxE interaction. STUDY DESIGN AND SETTING We used simulations to assess the efficiency of the case-combined-control design relative to a classical case-control study under a variety of assumptions. RESULTS The case-combined-control design appears more efficient and feasible than a classical case-control study for detecting interaction involving rare exposures and/or genetic factors. The number of available sibling controls per case and the frequencies of the risk factors are the most important parameters for determining relative efficiency. Relative efficiencies decrease as the frequency of the gene (G) increases. A positive correlation in exposure (E) between siblings decreases relative efficiency. CONCLUSIONS Although the case-combined-control design may not be efficient for common genes with moderate effects, it appears to be a useful alternative in certain situations where classical approaches remain unrealistic.
Affiliation(s)
- N Andrieu
- Inserm EMI00-06, Tour Evry 2, 523 Place des Terrasses de l'Agora, 91034 Evry Cedex, France.
7
Worrall BB, Brown DL, Brott TG, Brown RD, Silliman SL, Meschia JF. Spouses and unrelated friends of probands as controls for stroke genetics studies. Neuroepidemiology 2003; 22:239-44. [PMID: 12792144 PMCID: PMC2613842 DOI: 10.1159/000070565]
Abstract
To plan a multisite, ischemic stroke genetic study, stroke patients were surveyed about the availability and characteristics of a convenience sample of spouse/friend controls. 65% of all stroke-affected probands reported a living spouse. A more detailed survey was conducted at the University of Virginia, Charlottesville, Va., USA: 51% of stroke patients reported a living, stroke-free spouse who would be willing to serve as a control, and 49% reported having a stroke-free friend who would be willing to serve as a control. Overall, 75% of stroke patients reported at least 1 individual willing to participate as a control. Cases without an identified control were more likely to be non-white (48%) than were cases with a control (13%; p = 0.00004). Cases were older than controls (67.3 vs. 59.2 years; p = 0.000002), and a greater proportion of cases than controls were male (57 vs. 33%; p = 0.0002). Without proper attention to matching, the use of a spouse/friend convenience sample would result in imbalances in basic demographic characteristics.
Affiliation(s)
- Bradford Burke Worrall
- Department of Neurology, University of Virginia, Charlottesville, Va., USA
- Department of Health Evaluation Sciences, University of Virginia, Charlottesville, Va., USA
- Devin L. Brown
- Department of Neurology, University of Virginia, Charlottesville, Va., USA
- Thomas G. Brott
- Department of Neurology, Mayo Clinic Jacksonville, Jacksonville, Fla., USA
- Robert D. Brown
- Department of Neurology, Mayo Clinic Rochester, Rochester, Minn., USA
- Scott L. Silliman
- Department of Neurology, Shands/University of Florida, Jacksonville, Fla., USA
- James F. Meschia
- Department of Neurology, Mayo Clinic Jacksonville, Jacksonville, Fla., USA
8
Rebbeck TR. The contribution of inherited genotype to breast cancer. Breast Cancer Res 2002; 4:85-9. [PMID: 12052249 PMCID: PMC138727 DOI: 10.1186/bcr430] [Received: 01/09/2002] [Revised: 02/13/2002] [Accepted: 02/26/2002]
Abstract
The etiology of breast cancer is complex and is likely to involve the actions of genes at multiple levels along the multistage carcinogenesis process. These inherited genotypes include those that affect the propensity to be exposed to breast carcinogens, and those associated with breast tumorigenesis directly. In addition, inherited genotypes may influence response to breast cancer chemoprevention and treatment. Studies relating inherited genotypes with breast cancer incidence and mortality should consider a broader spectrum of genes and their potential roles in multistage carcinogenesis than have been typically evaluated to date. Understanding the role of inherited genotype at different stages of carcinogenesis could improve our understanding of cancer biology, identify specific exposures or events that correlate with carcinogenesis, or point to relevant biochemical pathways as targets for the development of preventive or therapeutic interventions.
Affiliation(s)
- Timothy R Rebbeck
- Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, and Cancer Center, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6021, USA.
9
Abstract
Pharmacogenetics is the study of how genetic variations affect drug response. These variations can affect a patient's response to cancer drugs, for which there is usually a fine line between a dosage that has a therapeutic effect and one that produces toxicity. Gaining better insight into the genetic elements of both the patient and the tumour that affect drug efficacy will eventually allow for individualized dosage determination and fewer adverse effects.
Affiliation(s)
- M V Relling
- Department of Pharmaceutical Sciences, St Jude Children's Research Hospital, Memphis, Tennessee 38105, USA.
10
Rothman N, Wacholder S, Caporaso NE, Garcia-Closas M, Buetow K, Fraumeni JF. The use of common genetic polymorphisms to enhance the epidemiologic study of environmental carcinogens. Biochim Biophys Acta 2001; 1471:C1-10. [PMID: 11342183 DOI: 10.1016/s0304-419x(00)00021-4]
Abstract
Overwhelming evidence indicates that environmental exposures, broadly defined, are responsible for most cancer. There is reason to believe, however, that relatively common polymorphisms in a wide spectrum of genes may modify the effect of these exposures. We discuss the rationale for using common polymorphisms to enhance our understanding of how environmental exposures cause cancer and comment on epidemiologic strategies to assess these effects, including study design, genetic and statistical analysis, and sample size requirements. Special attention is given to sources of potential bias in population studies of gene-environment interactions, including exposure and genotype misclassification and population stratification (i.e., confounding by ethnicity). Nevertheless, by merging epidemiologic and molecular approaches in the twenty-first century, there will be enormous opportunities for unraveling the environmental determinants of cancer. In particular, studies of genetically susceptible subgroups may enable the detection of low levels of risk due to certain common exposures that have eluded traditional epidemiologic methods. Further, by identifying susceptibility genes and their pathways of action, it may be possible to identify previously unsuspected carcinogens. Finally, by gaining a more comprehensive understanding of environmental and genetic risk factors, there should emerge new clinical and public health strategies aimed at preventing and controlling cancer.
Affiliation(s)
- N Rothman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
11
Abstract
Three characteristics of genetic epidemiology that distinguish it from its parent disciplines are a focus on population-based research, a focus on the joint effects of genes and the environment, and the incorporation of the underlying biology of the disease into its conceptual models. These principles are illustrated by a review of the genetic epidemiology of breast and ovarian cancer. Descriptive and mechanistic models for the joint effects of genes and "environmental" risk factors such as hormones and reproductive events are compared to illustrate the need to understand the biology. The contribution of population-based research to the development of the evidence for the involvement of major genes, the discovery of BRCA1 and BRCA2, and their characterization is reviewed. Interactions of major susceptibility genes, metabolic genes, and hormones are also discussed. I conclude with some suggestions for future directions for the field, the journal, and the Society, including recent bioethics initiatives. I believe that the Society should reach out more to the epidemiology community and that the journal should shift its emphasis from pure methodology to also include more substantive papers that illustrate these principles.
Affiliation(s)
- D C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, California 90089-9011, USA.