1
Doretti M, Genbäck M, Stanghellini E. Mediation analysis with case-control sampling: Identification and estimation in the presence of a binary mediator. Biom J 2024; 66:e2300089. [PMID: 38285401 DOI: 10.1002/bimj.202300089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 10/08/2023] [Accepted: 11/11/2023] [Indexed: 01/30/2024]
Abstract
With reference to a stratified case-control (CC) procedure based on a binary variable of primary interest, we derive the expression of the distortion induced by the sampling design on the parameters of the logistic model of a secondary variable. This is particularly relevant when performing mediation analysis (possibly in a causal framework) with stratified case-control (SCC) data in settings where both the outcome and the mediator are binary. Despite being designed for parametric identification, our strategy is general and can be used also in a nonparametric context. With reference to parametric estimation, we derive the maximum likelihood (ML) estimator and the M-estimator of the joint outcome-mediator parameter vector. We then conduct a simulation study focusing on the main causal mediation quantities (i.e., natural effects) and comparing M- and ML estimation to existing methods, based on weighting. As an illustrative example, we reanalyze a German CC data set in order to investigate whether the effect of reduced immunocompetency on listeriosis onset is mediated by the intake of gastric acid suppressors.
Affiliation(s)
- Marco Doretti
- Department of Statistics, Computer Science, and Applications, University of Florence, Florence, Italy
- Minna Genbäck
- Department of Statistics, USBE, Umeå University, Umeå, Sweden
- Elena Stanghellini
- Department of Statistics, USBE, Umeå University, Umeå, Sweden
- Department of Economics, University of Perugia, Perugia, Italy
2
Krishnan M, Phipps-Green A, Russell EM, Major TJ, Cadzow M, Stamp LK, Dalbeth N, Hindmarsh JH, Qasim M, Watson H, Liu S, Carlson JC, Minster RL, Hawley NL, Naseri T, Reupena MS, Deka R, McGarvey ST, Merriman TR, Murphy R, Weeks DE. Association of rs9939609 in FTO with BMI among Polynesian peoples living in Aotearoa New Zealand and other Pacific nations. J Hum Genet 2023; 68:463-468. [PMID: 36864286 PMCID: PMC10313811 DOI: 10.1038/s10038-023-01141-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 01/30/2023] [Accepted: 02/19/2023] [Indexed: 03/04/2023]
Abstract
The fat mass and obesity associated (FTO) locus consistently associates with higher body mass index (BMI) across diverse ancestral groups. However, previous small studies of people of Polynesian ancestries have failed to replicate the association. In this study, we used Bayesian meta-analysis to test rs9939609, the most replicated FTO variant, for association with BMI with a large sample (n = 6095) of Aotearoa New Zealanders of Polynesian (Māori and Pacific) ancestry and of Samoan people living in the Independent State of Samoa and in American Samoa. We did not observe statistically significant association within each separate Polynesian subgroup. Bayesian meta-analysis of the Aotearoa New Zealand Polynesian and Samoan samples resulted in a posterior mean effect size estimate of +0.21 kg/m², with a 95% credible interval [+0.03 kg/m², +0.39 kg/m²]. While the Bayes Factor (BF) of 0.77 weakly favors the null, the BF = 1.4 Bayesian support interval is [+0.04, +0.20]. These results suggest that rs9939609 in FTO may have a similar effect on mean BMI in people of Polynesian ancestries as previously observed in other ancestral groups.
Affiliation(s)
- Mohanraj Krishnan
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Emily M Russell
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Tanya J Major
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Murray Cadzow
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Lisa K Stamp
- Department of Medicine, University of Otago, Christchurch, New Zealand
- Nicola Dalbeth
- Department of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre, Auckland, New Zealand
- Jennie Harré Hindmarsh
- Ngāti Porou Hauora Charitable Trust, Te Puia Springs, Tairāwhiti East Coast, New Zealand
- Muhammad Qasim
- Ngāti Porou Hauora Charitable Trust, Te Puia Springs, Tairāwhiti East Coast, New Zealand
- Huti Watson
- Ngāti Porou Hauora Charitable Trust, Te Puia Springs, Tairāwhiti East Coast, New Zealand
- Shuwei Liu
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Jenna C Carlson
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Ryan L Minster
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Nicola L Hawley
- Department of Chronic Disease Epidemiology, School of Public Health, Yale University, New Haven, CT, USA
- Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
- International Health Institute, Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Ranjan Deka
- Department of Environmental and Public Health Sciences, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
- Stephen T McGarvey
- International Health Institute, Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Department of Anthropology, Brown University, Providence, RI, USA
- Tony R Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Rinki Murphy
- Department of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre, Auckland, New Zealand
- Daniel E Weeks
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
3
Satten GA, Curtis SW, Solis-Lemus C, Leslie EJ, Epstein MP. Efficient estimation of indirect effects in case-control studies using a unified likelihood framework. Stat Med 2022; 41:2879-2893. [PMID: 35352841 PMCID: PMC9232910 DOI: 10.1002/sim.9390] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/26/2021] [Revised: 03/07/2022] [Accepted: 03/08/2022] [Indexed: 06/01/2024]
Abstract
Mediation models are a set of statistical techniques that investigate the mechanisms that produce an observed relationship between an exposure variable and an outcome variable in order to deduce the extent to which the relationship is influenced by intermediate mediator variables. For a case-control study, the most common mediation analysis strategy employs a counterfactual framework that permits estimation of indirect and direct effects on the odds ratio scale for dichotomous outcomes, assuming either binary or continuous mediators. While this framework has become an important tool for mediation analysis, we demonstrate that we can embed this approach in a unified likelihood framework for mediation analysis in case-control studies that leverages more features of the data (in particular, the relationship between exposure and mediator) to improve efficiency of indirect effect estimates. One important feature of our likelihood approach is that it naturally incorporates cases within the exposure-mediator model to improve efficiency. Our approach does not require knowledge of disease prevalence and can model confounders and exposure-mediator interactions, and is straightforward to implement in standard statistical software. We illustrate our approach using both simulated data and real data from a case-control genetic study of lung cancer.
Affiliation(s)
- Glen A. Satten
- Department of Gynecology and Obstetrics, Emory University, Atlanta, GA
- Claudia Solis-Lemus
- Department of Plant Pathology, Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI
4
Modeling Secondary Phenotypes Conditional on Genotypes in Case–Control Studies. STATS 2022. [DOI: 10.3390/stats5010014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Traditional case–control genetic association studies examine relationships between case–control status and one or more covariates. It is becoming increasingly common to study secondary phenotypes and their association with the original covariates. The Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) project, a study of temporomandibular disorders (TMD), motivates this work. Numerous measures of interest are collected at enrollment, such as the number of comorbid pain conditions from which a participant suffers. Examining the potential genetic basis of these measures is of secondary interest. Assessing these associations is statistically challenging, as participants do not form a random sample from the population of interest. Standard methods may be biased and lack coverage and power. We propose a general method for the analysis of arbitrary phenotypes utilizing inverse probability weighting and bootstrapping for standard error estimation. The method may be applied to the complicated association tests used in next-generation sequencing studies, such as analyses of haplotypes with ambiguous phase. Simulation studies show that our method performs as well as competing methods when they are applicable and yields promising results for outcome types, such as time-to-event, to which other methods may not apply. The method is applied to the OPPERA baseline case–control genetic study.
5
Wang J, Ning J, Shete S. Mediation model with a categorical exposure and a censored mediator with application to a genetic study. PLoS One 2021; 16:e0257628. [PMID: 34637449 PMCID: PMC8509986 DOI: 10.1371/journal.pone.0257628] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/23/2021] [Accepted: 09/06/2021] [Indexed: 12/12/2022] Open
Abstract
Mediation analysis is a statistical method for evaluating the direct and indirect effects of an exposure on an outcome in the presence of a mediator. Mediation models have been widely used to determine direct and indirect contributions of genetic variants in clinical phenotypes. In genetic studies, the additive genetic model is the most commonly used model because it can detect effects from either recessive or dominant models (or any model in between). However, the existing approaches for mediation models cannot be directly applied when the genetic model is additive (e.g. the most commonly used model for SNPs) or categorical (e.g. polymorphic loci), and thus modification to measures of indirect and direct effects is warranted. In this study, we proposed overall measures of indirect, direct, and total effects for a mediation model with a categorical exposure and a censored mediator, which accounts for the frequency of different values of the categorical exposure. The proposed approach provides the overall contribution of the categorical exposure to the outcome variable. We assessed the empirical performance of the proposed overall measures via simulation studies and applied the measures to evaluate the mediating effect of a woman’s age at menopause on the association between genetic variants and type 2 diabetes.
Affiliation(s)
- Jian Wang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Sanjay Shete
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
6
Vázquez-Moreno M, Mejía-Benítez A, Sharma T, Peralta-Romero J, Locia-Morales D, Klünder-Klünder M, Cruz M, Meyre D. Association of AMY1A/AMY2A copy numbers and AMY1/AMY2 serum enzymatic activity with obesity in Mexican children. Pediatr Obes 2020; 15:e12641. [PMID: 32314532 DOI: 10.1111/ijpo.12641] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/26/2019] [Revised: 03/30/2020] [Accepted: 03/31/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND Mexican children are characterized by a high-starch intake diet and high prevalence of obesity. OBJECTIVES To investigate the association of AMY1A/AMY2A copy numbers (CNs) and AMY1/AMY2 serum enzymatic activity with childhood obesity in up to 427 and 337 Mexican cases and controls. METHODS Anthropometric and dietary starch intake data were collected. CN of AMY1A/AMY2A and AMY1/AMY2 serum enzymatic activity were determined using droplet digital PCR (ddPCR) and enzymatic colorimetry, respectively. An individual participant level data meta-analysis of association between AMY1A CNVs and obesity was also performed. RESULTS A positive association between AMY1A/AMY2A CNs and their corresponding AMY1/AMY2 serum enzyme activity was observed in children with normal weight and obesity. The serum enzyme activity of AMY1 and AMY2 was negatively associated with childhood obesity risk, and the association was restricted to kids eating medium/high amount of starch ($P_{\text{interaction}}$ = .004). While no association between AMY1A and AMY2A CNs and childhood obesity was observed in our sample, we confirmed a significant association between AMY1A CN and obesity in a meta-analysis of 3100 Mexican children. CONCLUSIONS Our data suggest that genetically determined salivary and pancreatic amylase activity can increase/decrease the risk of obesity in Mexican children, this effect being blunted by a low-starch diet.
Affiliation(s)
- Miguel Vázquez-Moreno
- Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Unidad de Investigación Médica en Bioquímica, Mexico City, Mexico
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Aurora Mejía-Benítez
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Tanmay Sharma
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Jesús Peralta-Romero
- Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Unidad de Investigación Médica en Bioquímica, Mexico City, Mexico
- Daniel Locia-Morales
- Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Unidad de Investigación Médica en Bioquímica, Mexico City, Mexico
- Miguel Klünder-Klünder
- Departamento de Investigación en Salud Comunitaria, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
| | -
- Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Miguel Cruz
- Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Unidad de Investigación Médica en Bioquímica, Mexico City, Mexico
- David Meyre
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Hamilton, Canada
7
Vázquez-Moreno M, Locia-Morales D, Perez-Herrera A, Gomez-Diaz RA, Gonzalez-Dzib R, Valdez-González AL, Flores-Alfaro E, Corona-Salazar P, Suarez-Sanchez F, Gomez-Zamudio J, Valladares-Salgado A, Wacher-Rodarte N, Cruz M, Meyre D. Causal Association of Haptoglobin With Obesity in Mexican Children: A Mendelian Randomization Study. J Clin Endocrinol Metab 2020; 105:5822684. [PMID: 32309857 DOI: 10.1210/clinem/dgaa213] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/08/2019] [Accepted: 04/16/2020] [Indexed: 12/16/2022]
Abstract
CONTEXT Little is known about the association between haptoglobin level and cardiometabolic traits. A previous genome-wide association study identified rs2000999 in the HP gene as the stronger genetic contributor to serum haptoglobin level in European populations. OBJECTIVE AND DESIGN We investigated the association of HP rs2000999 with serum haptoglobin and childhood and adult obesity in up to 540/697 and 592/691 Mexican cases and controls, respectively. Anthropometric and biochemical data were collected. Serum haptoglobin was measured by an immunoturbidimetry assay. HP rs2000999 was genotyped using the TaqMan technology. Mendelian randomization analysis was performed using the Wald and inverse variance weighting methods. RESULTS Haptoglobin level was positively associated with childhood and adult obesity. HP rs2000999 G allele was positively associated with haptoglobin level in children and adults. HP rs2000999 G allele was positively associated with childhood but not adult obesity. The association between HP rs2000999 and childhood obesity was removed after adjusting for haptoglobin level. In a Mendelian randomization analysis, haptoglobin level genetically predicted by HP rs2000999 showed a significant causal effect on childhood obesity by the Wald and inverse variance weighting methods. CONCLUSION Our data provide evidence for the first time for a causal positive association between serum haptoglobin level and childhood obesity in the Mexican population. Our study contributes to the genetic elucidation of childhood obesity and proposes haptoglobin as an important biomarker and treatment target for obesity.
Affiliation(s)
- Miguel Vázquez-Moreno
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Daniel Locia-Morales
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Laboratorio de Investigación en Epidemiología Clínica y Molecular, Facultad de Ciencias Químico Biológicas, Universidad Autónoma de Guerrero, Chilpancingo, Guerrero, 39090, Mexico
- Aleyda Perez-Herrera
- Consejo Nacional de Ciencia y Tecnología, Instituto Politécnico Nacional-Centro Interdisciplinario de Investigación para el Desarrollo Integral-Regional Unidad Oaxaca, Oaxaca, Mexico
- Rita A Gomez-Diaz
- Unidad de Investigación en Epidemiología Clínica, Hospital de Especialidades Bernardo Sepúlveda, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
- Roxana Gonzalez-Dzib
- Servicio de Prestaciones Médicas del Instituto Mexicano del Seguro Social, Delegación Campeche, Campeche, Mexico
- Adriana L Valdez-González
- Unidad de Investigación en Epidemiología Clínica, Hospital de Especialidades Bernardo Sepúlveda, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
- Eugenia Flores-Alfaro
- Laboratorio de Investigación en Epidemiología Clínica y Molecular, Facultad de Ciencias Químico Biológicas, Universidad Autónoma de Guerrero, Chilpancingo, Guerrero, 39090, Mexico
- Perla Corona-Salazar
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Fernando Suarez-Sanchez
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Jaime Gomez-Zamudio
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Adan Valladares-Salgado
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Niels Wacher-Rodarte
- Unidad de Investigación en Epidemiología Clínica, Hospital de Especialidades Bernardo Sepúlveda, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
- Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI del Instituto Mexicano del Seguro Social, Mexico City, Mexico
- David Meyre
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Hamilton, Canada
8
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
- University of Michigan, Department of Biostatistics
- Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
- Chad Brummett
- University of Michigan, Department of Anesthesiology
- Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
9
Tounkara F, Lefebvre G, Greenwood C, Oualkacha K. A flexible copula-based approach for the analysis of secondary phenotypes in ascertained samples. Stat Med 2020; 39:517-543. [PMID: 31868965 DOI: 10.1002/sim.8416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/06/2018] [Revised: 04/30/2019] [Accepted: 09/04/2019] [Indexed: 12/20/2022]
Abstract
Data collected for a genome-wide association study of a primary phenotype are often used for additional genome-wide association analyses of secondary phenotypes. However, when the primary and secondary traits are dependent, naïve analyses of secondary phenotypes may induce spurious associations in non-randomly ascertained samples. Previously, retrospective likelihood-based methods have been proposed to correct for sampling biases arising in secondary trait association analyses. However, most methods have been introduced to handle studies featuring a case-control design based on a binary primary phenotype. As such, these methods are not directly applicable to more complicated study designs such as multiple-trait studies, where the sampling mechanism also depends on the secondary phenotype, or extreme-trait studies, where individuals with extreme primary phenotype values are selected. To accommodate these more complicated sampling mechanisms, only a few prospective likelihood approaches have been proposed. These approaches assume a normal distribution for the secondary phenotype (or the latent secondary phenotype) and a bivariate normal distribution for the primary-secondary phenotype dependence. In this paper, we propose a unified copula-based approach to appropriately detect genetic variant/secondary phenotype association in the presence of selected samples. The primary phenotype is either binary or continuous, and the secondary phenotype is continuous although not necessarily normal. We use both prospective and retrospective likelihoods to account for the sampling mechanism and use a copula model to allow for potentially different dependence structures between the primary and secondary phenotypes. We demonstrate the effectiveness of our approach through simulation studies and by analyzing data from the Avon Longitudinal Study of Parents and Children cohort.
Affiliation(s)
- Fodé Tounkara
- Lunenfeld-Tenenbaum Research Institute, Toronto, Canada
- Geneviève Lefebvre
- Department of Mathematics, Université du Québec à Montréal, Montreal, Canada
- Celia Greenwood
- Lady Davis Research Institute, Centre for Clinical Epidemiology, Jewish General Hospital, Montreal, Canada
- Gerald Bronfman Department of Oncology, McGill University, Montreal, Canada
- Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
- Karim Oualkacha
- Department of Mathematics, Université du Québec à Montréal, Montreal, Canada
10
Bi W, Li Y, Smeltzer MP, Gao G, Zhao S, Kang G. STEPS: an efficient prospective likelihood approach to genetic association analyses of secondary traits in extreme phenotype sequencing. Biostatistics 2020; 21:33-49. [PMID: 30007308 PMCID: PMC8559722 DOI: 10.1093/biostatistics/kxy030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/20/2017] [Revised: 05/16/2018] [Accepted: 06/02/2018] [Indexed: 11/13/2022] Open
Abstract
It has been well acknowledged that methods for secondary trait (ST) association analyses under a case-control design (ST$_{\text{CC}}$) should carefully consider the sampling process to avoid biased risk estimates. A similar situation also exists in the extreme phenotype sequencing (EPS) designs, which is to select subjects with extreme values of continuous primary phenotype for sequencing. EPS designs are commonly used in modern epidemiological and clinical studies such as the well-known National Heart, Lung, and Blood Institute Exome Sequencing Project. Although naïve generalized regression or ST$_{\text{CC}}$ method could be applied, their validity is questionable due to difference in statistical designs. Herein, we propose a general prospective likelihood framework to perform association testing for binary and continuous STs under EPS designs (STEPS), which can also incorporate covariates and interaction terms. We provide a computationally efficient and robust algorithm to obtain the maximum likelihood estimates. We also present two empirical mathematical formulas for power/sample size calculations to facilitate planning of binary/continuous STs association analyses under EPS designs. Extensive simulations and application to a genome-wide association study of benign ethnic neutropenia under an EPS design demonstrate the superiority of STEPS over all its alternatives above.
Affiliation(s)
- Wenjian Bi
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Yun Li
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
- Matthew P Smeltzer
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, USA
- Guimin Gao
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Shengli Zhao
- School of Statistics, Qufu Normal University, Qufu 273165, PR China
- Guolian Kang
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
11
Lamiquiz-Moneo I, Mateo-Gallego R, Bea AM, Dehesa-García B, Pérez-Calahorra S, Marco-Benedí V, Baila-Rueda L, Laclaustra M, Civeira F, Cenarro A. Genetic predictors of weight loss in overweight and obese subjects. Sci Rep 2019; 9:10770. [PMID: 31341224 PMCID: PMC6656717 DOI: 10.1038/s41598-019-47283-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/21/2018] [Accepted: 07/03/2019] [Indexed: 12/24/2022] Open
Abstract
The aim of our study was to investigate a large cohort of overweight subjects consuming a homogeneous diet to identify the genetic factors associated with weight loss that could be used as predictive markers in weight loss interventions. We retrospectively recruited subjects (N = 788) aged over 18 years with a Body Mass Index (BMI) between 25 and 40 kg/m² who were treated at our lipid unit for at least one year from 2008 to 2016, and we also recruited a control group (168 patients) with normal BMIs. All participants received counselling from a nutritionist that included healthy diet and physical activity recommendations. We genotyped 25 single nucleotide variants (SNVs) in 25 genes that were previously associated with obesity and calculated genetic scores that were derived from 25 SNVs. The risk allele in CADM2 showed a higher frequency in overweight and obese subjects than in controls (p = 0.007). The mean follow-up duration was 5.58 ± 2.68 years. Subjects with lower genetic scores showed greater weight loss during the follow-up period. The genetic score was the variable that best explained the variations in weight from the baseline. The genetic score explained 2.4% of weight change variance at one year and 1.6% of weight change variance at the end of the follow-up period after adjusting for baseline weight, sex, age and years of follow-up.
Affiliation(s)
- Itziar Lamiquiz-Moneo
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| | - Rocío Mateo-Gallego
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain; Universidad de Zaragoza, Zaragoza, Spain
| | - Ana M Bea
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| | - Blanca Dehesa-García
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| | - Sofía Pérez-Calahorra
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| | - Victoria Marco-Benedí
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| | - Lucía Baila-Rueda
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| | - Martín Laclaustra
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| | - Fernando Civeira
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain; Universidad de Zaragoza, Zaragoza, Spain
| | - Ana Cenarro
- Unidad Clínica y de Investigación en Lípidos y Arteriosclerosis, Hospital Universitario Miguel Servet, Instituto de Investigación Sanitaria Aragón (IIS Aragón), CIBERCV, Zaragoza, Spain
| |
|
12
|
Zhang H, Bi W, Cui Y, Chen H, Chen J, Zhao Y, Kang G. Extreme-value sampling design is cost-beneficial only with a valid statistical approach for exposure-secondary outcome association analyses. Stat Methods Med Res 2019; 29:466-480. [PMID: 30945605 DOI: 10.1177/0962280219839093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In epidemiological cohort studies, exposure data are collected in sub-studies based on a primary outcome (PO) of interest, as with the extreme-value sampling design (EVSD), to investigate their correlation. Secondary outcome (SO) data are also readily available, enabling researchers to assess the correlations between the exposure and the SOs. However, when the EVSD is used, the data for SOs are not representative samples of a general population; thus, many commonly used statistical methods, such as the generalized linear model (GLM), are not valid. A prospective likelihood method has been developed to associate SOs with single-nucleotide polymorphisms under an extreme phenotype sequencing design. In this paper, we describe the application of the prospective likelihood method (STEVSD) to exposure-SO association analysis under an EVSD. We undertook extensive simulations to assess the performance of the STEVSD method in associating binary and continuous exposures with SOs, comparing it to the simple GLM method that ignores the EVSD. To demonstrate the cost-benefit of the STEVSD method, we also mimicked the design of two new retrospective studies, as would be done in actual practice, based on the PO of interest, which was the same as the SO in the EVSD study. We then analyzed these data by using the GLM method and compared its power to that of the STEVSD method. We demonstrated the usefulness of the STEVSD method by applying it to a benign ethnic neutropenia dataset. Our results indicate that the STEVSD method can control type I error well, whereas the GLM method cannot do so because it ignores the EVSD, and that the STEVSD method is cost-effective because it has statistical power similar to that of two new retrospective studies that require collecting new exposure data for selected individuals.
Affiliation(s)
- Hang Zhang
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, PR China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, PR China
| | - Wenjian Bi
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
| | - Honglei Chen
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
| | - Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Yanlong Zhao
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, PR China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, PR China
| | - Guolian Kang
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
| |
|
13
|
A review of analysis methods for secondary outcomes in case-control studies. Communications for Statistical Applications and Methods 2019. [DOI: 10.29220/csam.2019.26.2.103] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
|
14
|
Wang J, Ning J, Shete S. Mediation analysis in a case-control study when the mediator is a censored variable. Stat Med 2019; 38:1213-1229. [PMID: 30421436 DOI: 10.1002/sim.8028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/08/2017] [Revised: 09/11/2018] [Accepted: 10/15/2018] [Indexed: 11/10/2022]
Abstract
Mediation analysis is an approach for assessing the direct and indirect effects of an initial variable on an outcome through a mediator. In practice, mediation models can involve a censored mediator (eg, a woman's age at menopause). The current research for mediation analysis with a censored mediator focuses on scenarios where outcomes are continuous. However, the outcomes can be binary (eg, type 2 diabetes). Another challenge when analyzing such a mediation model is to use data from a case-control study, which results in biased estimations for the initial variable-mediator association if a standard approach is directly applied. In this study, we propose an approach (denoted as MAC-CC) to analyze the mediation model with a censored mediator given data from a case-control study, based on the semiparametric accelerated failure time model along with a pseudo-likelihood function. We adapted the measures for assessing the indirect and direct effects using counterfactual definitions. We conducted simulation studies to investigate the performance of MAC-CC and compared it to those of the naïve approach and the complete-case approach. MAC-CC accurately estimates the coefficients of different paths, the indirect effects, and the proportions of the total effects mediated. We applied the proposed and existing approaches to the mediation study of genetic variants, a woman's age at menopause, and type 2 diabetes based on a case-control study of type 2 diabetes. Our results indicate that there is no mediating effect from the age at menopause on the association between the genetic variants and type 2 diabetes.
Affiliation(s)
- Jian Wang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
| | - Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
| | - Sanjay Shete
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas; Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas
| |
|
15
|
Aschard H, Laville V, Tchetgen ET, Knights D, Imhann F, Seksik P, Zaitlen N, Silverberg MS, Cosnes J, Weersma RK, Xavier R, Beaugerie L, Skurnik D, Sokol H. Genetic effects on the commensal microbiota in inflammatory bowel disease patients. PLoS Genet 2019; 15:e1008018. [PMID: 30849075 PMCID: PMC6426259 DOI: 10.1371/journal.pgen.1008018] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 09/12/2018] [Revised: 03/20/2019] [Accepted: 02/13/2019] [Indexed: 12/16/2022] Open
Abstract
Several bacteria in the gut microbiota have been shown to be associated with inflammatory bowel disease (IBD), and dozens of IBD genetic variants have been identified in genome-wide association studies. However, the role of the microbiota in the etiology of IBD in terms of host genetic susceptibility remains unclear. Here, we studied the association between four major genetic variants associated with an increased risk of IBD and bacterial taxa in up to 633 IBD cases. We performed systematic screening for associations, identifying and replicating associations between NOD2 variants and two taxa: the Roseburia genus and the Faecalibacterium prausnitzii species. By exploring the overall association patterns between genes and bacteria, we found that IBD risk alleles were significantly enriched for associations concordant with bacteria-IBD associations. To understand the significance of this pattern in terms of the study design and known effects from the literature, we used counterfactual principles to assess the fitness of a few parsimonious gene-bacteria-IBD causal models. Our analyses showed evidence that the disease risk of these genetic variants was likely to be partially mediated by the microbiome. We confirmed these results in extensive simulation studies and sensitivity analyses using the association between NOD2 and F. prausnitzii as a case study.
Affiliation(s)
- Hugues Aschard
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Institut Pasteur, Paris, France
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- * E-mail: (HA); (DS); (HS)
| | - Vincent Laville
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Institut Pasteur, Paris, France
| | - Eric Tchetgen Tchetgen
- Department of Statistics, The Wharton School at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Dan Knights
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Center for Computational and Integrative Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States of America
- Biotechnology Institute, University of Minnesota, St. Paul, Minnesota, United States of America
| | - Floris Imhann
- Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
| | - Philippe Seksik
- Department of Gastroenterology, Saint Antoine Hospital, Paris, France
| | - Noah Zaitlen
- Department of Medicine, University of California, San Francisco, California, United States of America
| | - Mark S. Silverberg
- Zane Cohen Centre for Digestive Diseases, Mount Sinai Hospital, Toronto, Ontario, Canada
| | - Jacques Cosnes
- Department of Gastroenterology, Saint Antoine Hospital, Paris, France
- Sorbonne Université, Paris, France
| | - Rinse K. Weersma
- Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, Groningen, the Netherlands
| | - Ramnik Xavier
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Center for Computational and Integrative Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States of America
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
| | - Laurent Beaugerie
- Department of Gastroenterology, Saint Antoine Hospital, Paris, France
- Sorbonne Université, Paris, France
| | - David Skurnik
- Division of Infectious Diseases, Harvard Medical School, Boston, Massachusetts, United States of America
- Massachusetts Technology and Analytics, Brookline, Massachusetts, United States of America
- Department of Microbiology, Necker Hospital and University Paris Descartes, Paris, France
- INSERM U1151-Equipe 11, Institut Necker-Enfants Malades, Paris, France
- * E-mail: (HA); (DS); (HS)
| | - Harry Sokol
- Department of Gastroenterology, Saint Antoine Hospital, Paris, France
- Sorbonne Université, Paris, France
- Micalis Institute, AgroParisTech, Jouy-en-Josas, France
- INSERM CRSA UMRS U938, Paris, France
- * E-mail: (HA); (DS); (HS)
| |
|
16
|
Ray D, Basu S. A novel association test for multiple secondary phenotypes from a case-control GWAS. Genet Epidemiol 2017; 41:413-426. [PMID: 28393390 DOI: 10.1002/gepi.22045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/27/2016] [Revised: 12/22/2016] [Accepted: 02/05/2017] [Indexed: 12/13/2022]
Abstract
In the past decade, many genome-wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but also collect extensive data on several risk factors and traits. Recent literature and grant proposals point toward a trend in reusing existing large case-control data for exploring genetic associations of some additional traits (secondary phenotypes, Y) collected during the study. These secondary phenotypes may be correlated, and a proper analysis warrants a multivariate approach. Commonly used multivariate methods are not equipped to properly account for the non-random sampling scheme. Current ad hoc practices include analyses without any adjustment, and analyses with D adjusted as a covariate. Our theoretical and empirical studies suggest that the type I error for testing genetic association of secondary traits can be substantial when X as well as Y are associated with D, even when there is no association between X and Y in the underlying (target) population. Whether using D as a covariate helps maintain type I error depends heavily on the disease mechanism and the underlying causal structure (which is often unknown). To avoid grossly incorrect inference, we propose a proportional odds model adjusted for propensity score (POM-PS). It uses a proportional odds logistic regression of X on Y and adjusts for the estimated conditional probability of being diseased as a covariate. We demonstrate the validity and advantage of POM-PS, and compare it to some existing methods in extensive simulation experiments mimicking plausible scenarios of dependency among Y, X, and D. Finally, we use POM-PS to jointly analyze four adiposity traits using a type 2 diabetes (T2D) case-control sample from the population-based Metabolic Syndrome in Men (METSIM) study. Only POM-PS analysis of the T2D case-control sample seems to provide valid association signals.
Affiliation(s)
- Debashree Ray
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
17
|
Kang G, Bi W, Zhang H, Pounds S, Cheng C, Shete S, Zou F, Zhao Y, Zhang JF, Yue W. A Robust and Powerful Set-Valued Approach to Rare Variant Association Analyses of Secondary Traits in Case-Control Sequencing Studies. Genetics 2017; 205:1049-1062. [PMID: 28040743 PMCID: PMC5340322 DOI: 10.1534/genetics.116.192377] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/06/2016] [Accepted: 12/29/2016] [Indexed: 12/16/2022] Open
Abstract
In many case-control designs of genome-wide association (GWAS) or next generation sequencing (NGS) studies, extensive data on secondary traits that may correlate and share the common genetic variants with the primary disease are available. Investigating these secondary traits can provide critical insights into the disease etiology or pathology, and enhance the GWAS or NGS results. Methods based on logistic regression (LG) were developed for this purpose. However, for the identification of rare variants (RVs), certain inadequacies in the LG models and algorithmic instability can cause severely inflated type I error, and significant loss of power, when the two traits are correlated and the RV is associated with the disease, especially at stringent significance levels. To address this issue, we propose a novel set-valued (SV) method that models a binary trait by dichotomization of an underlying continuous variable, and incorporates this into the genetic association model as a critical component. Extensive simulations and an analysis of seven secondary traits in a GWAS of benign ethnic neutropenia show that the SV method consistently controls type I error well at stringent significance levels, has larger power than the LG-based methods, and is robust to the effect pattern of the genetic variant (risk or protective), rare or common variants, rare or common diseases, and trait distributions. Because of the SV method's striking and profound advantage, we strongly recommend the SV method be employed instead of the LG-based methods for secondary traits analyses in case-control sequencing studies.
Affiliation(s)
- Guolian Kang
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
| | - Wenjian Bi
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
| | - Hang Zhang
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
| | - Stanley Pounds
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
| | - Cheng Cheng
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
| | - Sanjay Shete
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030
| | - Fei Zou
- Department of Biostatistics, The University of North Carolina at Chapel Hill, North Carolina 27599
| | - Yanlong Zhao
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
| | - Ji-Feng Zhang
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
| | - Weihua Yue
- Institute of Mental Health, Key Laboratory of Mental Health, Ministry of Health & National Clinical Research Center for Mental Disorders, Sixth Hospital, Peking University, Beijing 100191, People's Republic of China
| |
|
18
|
Sofer T, Cornelis MC, Kraft P, Tchetgen Tchetgen EJ. Control function assisted IPW estimation with a secondary outcome in case-control studies. Stat Sin 2017. [PMID: 28649172 DOI: 10.5705/ss.202015.0116] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
Case-control studies are designed towards studying associations between risk factors and a single, primary outcome. Information about additional, secondary outcomes is also collected, but association studies targeting such secondary outcomes should account for the case-control sampling scheme, or otherwise results may be biased. Often, one uses inverse probability weighted (IPW) estimators to estimate population effects in such studies. IPW estimators are robust, as they only require correct specification of the mean regression model of the secondary outcome on covariates, and knowledge of the disease prevalence. However, IPW estimators are inefficient relative to estimators that make additional assumptions about the data generating mechanism. We propose a class of estimators for the effect of risk factors on a secondary outcome in case-control studies that combine IPW with an additional modeling assumption: specification of the disease outcome probability model. We incorporate this model via a mean zero control function. We derive the class of all regular and asymptotically linear estimators corresponding to our modeling assumption, when the secondary outcome mean is modeled using either the identity or the log link. We find the efficient estimator in our class of estimators and show that it reduces to standard IPW when the model for the primary disease outcome is unrestricted, and is more efficient than standard IPW when the model is either parametric or semiparametric.
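The standard IPW idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the control function estimator proposed by the authors; it only shows the baseline weighting step under the assumption that the disease prevalence is known. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def ipw_weights(d: Sequence[int], prevalence: float) -> List[float]:
    """Weight each subject by (population share) / (sample share) of its disease group.

    d holds case status (1 = case, 0 = control); prevalence is the assumed
    population probability of being a case.
    """
    n = len(d)
    n_cases = sum(d)
    n_controls = n - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    w_case = prevalence / (n_cases / n)
    w_control = (1.0 - prevalence) / (n_controls / n)
    return [w_case if di == 1 else w_control for di in d]


def weighted_mean(values: Sequence[float], w: Sequence[float]) -> float:
    """Weighted average of values with weights w."""
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Slope of a weighted least squares fit of y on x (with intercept)."""
    x_bar = weighted_mean(x, w)
    y_bar = weighted_mean(y, w)
    num = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return num / den


# Toy case-control sample: 4 cases and 4 controls, assumed prevalence 10%.
d = [1, 1, 1, 1, 0, 0, 0, 0]
y = [3.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0, 1.0]  # secondary outcome
w = ipw_weights(d, prevalence=0.1)

naive_mean = sum(y) / len(y)   # ignores the sampling design
ipw_mean = weighted_mean(y, w)  # approximately 0.1 * 3 + 0.9 * 1 = 1.2
print(naive_mean, ipw_mean)
```

In the toy sample cases make up half of the data but only 10% of the assumed population, so the unweighted mean of the secondary outcome is pulled toward the case value, while the weighted mean recovers the population mixture. The same weights can be passed to `weighted_slope` for a regression of the secondary outcome on a risk factor.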
Affiliation(s)
- Tamar Sofer
- University of Washington and Harvard T.H. Chan School of Public Health
| | | | - Peter Kraft
- University of Washington and Harvard T.H. Chan School of Public Health
| | | |
|
19
|
Zhu W, Yuan Y, Zhang J, Zhou F, Knickmeyer RC, Zhu H. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study. Neuroimage 2016; 146:983-1002. [PMID: 27717770 DOI: 10.1016/j.neuroimage.2016.09.055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/30/2016] [Revised: 08/13/2016] [Accepted: 09/21/2016] [Indexed: 11/17/2022] Open
Abstract
The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore this sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the ADNI data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing them with several advanced statistical methods that appropriately adjust for the case-control sampling scheme.
Affiliation(s)
- Wensheng Zhu
- School of Mathematics & Statistics and KLAS, Northeast Normal University, Changchun 130024, China; Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Ying Yuan
- Takeda Pharmaceuticals U.S.A., Inc., 300 Massachusetts Ave, Cambridge, MA 02139, USA
| | - Jingwen Zhang
- Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Fan Zhou
- Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rebecca C Knickmeyer
- Departments of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Hongtu Zhu
- Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
| |
|
20
|
Yung G, Lin X. Validity of using ad hoc methods to analyze secondary traits in case-control association studies. Genet Epidemiol 2016; 40:732-743. [PMID: 27670932 DOI: 10.1002/gepi.21994] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/11/2015] [Revised: 06/23/2016] [Accepted: 06/26/2016] [Indexed: 11/10/2022]
Abstract
Case-control association studies often collect from their subjects information on secondary phenotypes. Reusing the data and studying the association between genes and secondary phenotypes provide an attractive and cost-effective approach that can lead to discovery of new genetic associations. A number of approaches have been proposed, including simple and computationally efficient ad hoc methods that ignore ascertainment or stratify on case-control status. Justification for these approaches relies on the assumption of no covariates and the correct specification of the primary disease model as a logistic model. Both might not be true in practice, for example, in the presence of population stratification or the primary disease model following a probit model. In this paper, we investigate the validity of ad hoc methods in the presence of covariates and possible disease model misspecification. We show that in taking an ad hoc approach, it may be desirable to include covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype. We also show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a probit model instead of a logistic model. Our results are justified theoretically and via simulations. Applied to real data analysis of genetic associations with cigarette smoking, ad hoc methods collectively identified as highly significant (P < 10⁻⁵) single nucleotide polymorphisms from over 10 genes, genes that were identified in previous studies of smoking cessation.
Affiliation(s)
- Godwin Yung
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| |
|
21
|
Longitudinal relationships between glycemic status and body mass index in a multiethnic study: evidence from observational and genetic epidemiology. Sci Rep 2016; 6:30744. [PMID: 27480816 PMCID: PMC4969745 DOI: 10.1038/srep30744] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/22/2016] [Accepted: 07/07/2016] [Indexed: 12/12/2022] Open
Abstract
We investigated the relationship between glycemic status and BMI and its interaction with obesity single-nucleotide polymorphisms (SNPs) in a multi-ethnic longitudinal cohort at high risk for dysglycemia. We studied 17 394 participants from six ethnicities followed up for 3.3 years. Twenty-three obesity SNPs were genotyped and an unweighted genotype risk score (GRS) was calculated. Glycemic status was defined using an oral glucose tolerance test. Linear regression models were adjusted for age, sex and population stratification. Normal glucose tolerance (NGT) to dysglycemia transition was associated with baseline BMI and BMI change. Impaired fasting glucose/impaired glucose tolerance to type 2 diabetes transition was associated with baseline BMI but not BMI change. No simultaneous significant main genetic effects and interactions between SNPs/GRS and glycemic status or transition on BMI level and BMI change were observed. Our data suggest that the interplay between glycemic status and BMI trajectory may be independent of the effects of obesity genes. This implies that individuals with different glycemic statuses may be combined together in genetic association studies on obesity traits, if appropriate adjustments for glycemic status are performed. Implementation of population-wide weight management programs may be more beneficial towards individuals with NGT than those at a later disease stage.
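An unweighted genotype risk score of the kind mentioned in this abstract is conventionally the sum of risk allele counts across the genotyped SNPs. The sketch below assumes each genotype is coded as 0, 1 or 2 copies of the risk allele; the function name is hypothetical, and because the abstract does not say how missing genotypes were handled, the sketch simply rejects them.

```python
from typing import Sequence


def unweighted_grs(risk_allele_counts: Sequence[int]) -> int:
    """Sum the risk allele counts (0, 1 or 2 per SNP) for one individual."""
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError(f"invalid risk allele count: {count!r}")
    return sum(risk_allele_counts)


# One hypothetical individual genotyped at four SNPs.
score = unweighted_grs([0, 1, 2, 1])
print(score)  # 4
```

With 23 SNPs coded this way, the score ranges from 0 to 46. A weighted score would instead multiply each count by a per-SNP effect size before summing.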
|
22
|
Song X, Ionita-Laza I, Liu M, Reibman J, Wei Y. A General and Robust Framework for Secondary Traits Analysis. Genetics 2016; 202:1329-43. [PMID: 26896329 PMCID: PMC4827729 DOI: 10.1534/genetics.115.181073] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/24/2015] [Accepted: 02/13/2016] [Indexed: 11/18/2022] Open
Abstract
Case-control designs are commonly employed in genetic association studies. In addition to the case-control status, data on secondary traits are often collected. Directly regressing secondary traits on genetic variants from a case-control sample often leads to biased estimation. Several statistical methods have been proposed to address this issue. The inverse probability weighting (IPW) approach and the semiparametric maximum-likelihood (SPML) approach are the most commonly used. A new weighted estimating equation (WEE) approach is proposed to provide unbiased estimation of genetic associations with secondary traits, by combining observed and counterfactual outcomes. Compared to the existing approaches, WEE is more robust against biased sampling and disease model misspecification. We conducted simulations to evaluate the performance of the WEE under various models and sampling schemes. The WEE demonstrated robustness in all scenarios investigated, had appropriate type I error, and was as powerful or more powerful than the IPW and SPML approaches. We applied the WEE to an asthma case-control study to estimate the associations between the thymic stromal lymphopoietin gene and two secondary traits: overweight status and serum IgE level. The WEE identified two SNPs associated with overweight in logistic regression, three SNPs associated with serum IgE levels in linear regression, and an additional four SNPs that were missed in linear regression to be associated with the 75th quantile of IgE in quantile regression. The WEE approach provides a general and robust secondary analysis framework, which complements the existing approaches and should serve as a valuable tool for identifying new associations with secondary traits.
Affiliation(s)
- Xiaoyu Song
  - Heilbrunn Department of Population and Family Health, Columbia University, New York, New York 10032
- Mengling Liu
  - Department of Population Health, New York University School of Medicine, New York, New York 10016
- Joan Reibman
  - Department of Medicine, New York University School of Medicine, New York, New York 10016
- Ying Wei
  - Department of Biostatistics, Columbia University, New York, New York 10032
23
Kim J, Pan W. A cautionary note on using secondary phenotypes in neuroimaging genetic studies. Neuroimage 2015; 121:136-45. [PMID: 26220747 PMCID: PMC4604049 DOI: 10.1016/j.neuroimage.2015.07.058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/10/2015] [Revised: 06/12/2015] [Accepted: 07/20/2015] [Indexed: 11/18/2022]
Abstract
Almost all genome-wide association studies (GWASs), including the Alzheimer's Disease Neuroimaging Initiative (ADNI), are based on the case-control study design, implying that the resulting case-control data are likely a biased, not random, sample of the target population. Although association analysis of the disease (e.g. Alzheimer's disease in the ADNI) can be conducted using a standard logistic regression by ignoring the biased case-control sampling, a standard linear regression analysis on a secondary phenotype (e.g. any neuroimaging phenotype in the ADNI) may in general lead to biased inference, including biased parameter estimates, inflated Type I errors and reduced power for association testing. Despite this well-known result in genetic epidemiology, to our surprise, all the published studies on secondary phenotypes with the ADNI data have ignored this potential problem. Here we aim to answer whether such a standard analysis of a secondary phenotype is valid or problematic with the ADNI data. Through both real data analyses and simulation studies, we found that, strikingly, such an analysis was generally valid (with only small biases or slightly inflated Type I errors) for the ADNI data, though caution must be taken when analyzing other data. We also illustrate applications and possible problems of two methods specifically developed for valid analysis of secondary phenotypes.
Affiliation(s)
- Junghi Kim
  - Division of Biostatistics, University of Minnesota, USA
- Wei Pan
  - Division of Biostatistics, University of Minnesota, USA
24
Yang L, Lu X, Deng J, Zhou Y, Huang D, Qiu F, Yang X, Yang R, Fang W, Ran P, Zhong N, Zhou Y, Fang S, Lu J. Risk factors shared by COPD and lung cancer and mediation effect of COPD: two center case–control studies. Cancer Causes Control 2014; 26:11-24. [DOI: 10.1007/s10552-014-0475-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/28/2014] [Accepted: 10/07/2014] [Indexed: 02/07/2023]
25
Tseng TS, Park JY, Zabaleta J, Moody-Thomas S, Sothern MS, Chen T, Evans DE, Lin HY. Role of nicotine dependence on the relationship between variants in the nicotinic receptor genes and risk of lung adenocarcinoma. PLoS One 2014; 9:e107268. [PMID: 25233467 PMCID: PMC4169410 DOI: 10.1371/journal.pone.0107268] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 01/08/2014] [Accepted: 08/14/2014] [Indexed: 12/27/2022]
Abstract
Several variations in the nicotinic receptor genes have been identified to be associated with both lung cancer risk and smoking in the genome-wide association (GWA) studies. However, the relationships among these three factors (genetic variants, nicotine dependence, and lung cancer) remain unclear. In an attempt to elucidate these relationships, we applied mediation analysis to quantify the impact of nicotine dependence on the association between the nicotinic receptor genetic variants and lung adenocarcinoma risk. We evaluated 23 single nucleotide polymorphisms (SNPs) in the five nicotinic receptor related genes (CHRNB3, CHRNA6, and CHRNA5/A3/B4) previously reported to be associated with lung cancer risk and smoking behavior and 14 SNPs in the four 'control' genes (TERT, CLPTM1L, CYP1A1, and TP53), which were not reported in the smoking GWA studies. A total of 661 lung adenocarcinoma cases and 1,347 controls with a smoking history, obtained from the Environment and Genetics in Lung Cancer Etiology case-control study, were included in the study. Results show that nicotine dependence is a mediator of the association between lung adenocarcinoma and gene variations in the regions of CHRNA5/A3/B4 and accounts for approximately 15% of this relationship. The top two CHRNA3 SNPs associated with the risk for lung adenocarcinoma were rs1051730 and rs12914385 (p-value = 1.9×10⁻¹⁰ and 1.1×10⁻¹⁰, respectively). Also, these two SNPs had significant indirect effects on lung adenocarcinoma risk through nicotine dependence (p = 0.003 and 0.007). Gene variations rs2736100 and rs2853676 in TERT and rs401681 and rs31489 in CLPTM1L had significant direct associations on lung adenocarcinoma without indirect effects through nicotine dependence. Our findings suggest that nicotine dependence plays an important role between genetic variants in the CHRNA5/A3/B4 region, especially CHRNA3, and lung adenocarcinoma. This may provide valuable information for understanding the pathogenesis of lung adenocarcinoma and for conducting personalized smoking cessation interventions.
Affiliation(s)
- Tung-Sung Tseng
  - Behavioral and Community Health Sciences, School of Public Health and Stanley S. Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA, United States of America
- Jong Y. Park
  - Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States of America
- Jovanny Zabaleta
  - Department of Pediatrics and Stanley S. Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA, United States of America
- Sarah Moody-Thomas
  - Behavioral and Community Health Sciences, School of Public Health and Stanley S. Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA, United States of America
- Melinda S. Sothern
  - Behavioral and Community Health Sciences, School of Public Health and Stanley S. Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA, United States of America
- Ted Chen
  - Department of Global Community Health and Behavioral Sciences, Tulane University, New Orleans, LA, United States of America
- David E. Evans
  - Department of Health Outcomes and Behavior, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States of America
- Hui-Yi Lin
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States of America
26
Lutz SM, Hokanson JE, Lange C. An alternative hypothesis testing strategy for secondary phenotype data in case-control genetic association studies. Front Genet 2014; 5:188. [PMID: 25071819 PMCID: PMC4076613 DOI: 10.3389/fgene.2014.00188] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/01/2014] [Accepted: 06/04/2014] [Indexed: 11/13/2022]
Abstract
Motivated by the challenges associated with accounting for the ascertainment when analyzing secondary phenotypes that are correlated with case-control status, Lin and Zeng have proposed a method that properly reflects the case-control sampling (Lin and Zeng, 2009). The Lin and Zeng method has the advantage of accurately estimating effect sizes for secondary phenotypes that are normally distributed or dichotomous. This method can be computationally intensive in practice under the null hypothesis when the likelihood surface that needs to be maximized can be relatively flat. We propose an extension of the Lin and Zeng method for hypothesis testing that uses proportional odds logistic regression to circumvent these computational issues. Through simulation studies, we compare the power and type-1 error rate of our method to standard approaches and Lin and Zeng's approach.
Affiliation(s)
- Sharon M Lutz
  - Department of Biostatistics, University of Colorado, Aurora, CO, USA
- John E Hokanson
  - Department of Epidemiology, University of Colorado, Aurora, CO, USA
- Christoph Lange
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
  - Channing Laboratory, Harvard Medical School, Boston, MA, USA
  - Institute for Genomic Mathematics, University of Bonn, Bonn, Germany
  - German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
27
Yang L, Yang X, Ji W, Deng J, Qiu F, Yang R, Fang W, Zhang L, Huang D, Xie C, Zhang H, Zhong N, Ran P, Zhou Y, Lu J. Effects of a functional variant c.353T>C in snai1 on risk of two contextual diseases. Chronic obstructive pulmonary disease and lung cancer. Am J Respir Crit Care Med 2014; 189:139-48. [PMID: 24354880 DOI: 10.1164/rccm.201307-1355oc] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/16/2022]
Abstract
RATIONALE: Epithelial-mesenchymal transition (EMT) plays a key role in the development of chronic obstructive pulmonary disease (COPD) and lung cancer. OBJECTIVES: There are five major EMT regulatory genes (Snai1, Slug, Zeb1, Zeb2, and Twist1) involved in EMT. We hypothesized that germline variants in these genes may influence the development of both diseases. METHODS: Seven genetic variants were genotyped in two two-stage case-control studies with 2,072 lung cancer cases and 2,077 control subjects, and 1,791 patients with COPD and 1,940 control subjects to show their associations with development of both diseases. MEASUREMENTS AND MAIN RESULTS: An exon variant c.353T>C (p.Val118Ala) of Snai1 harbored decreased risks of lung cancer (CT/CC vs. TT: odds ratio [OR], 0.76; 95% confidence interval [CI], 0.65-0.90) and COPD (CC vs. CT vs. TT: OR, 0.75; 95% CI, 0.63-0.89), and c.353T>C affected lung cancer risk indirectly through COPD (COPD accounted for 6.78% of effect that the variant had on lung cancer). Moreover, c.353T>C was correlated with lung cancer stages in smoking patients (P = 0.013), and those with the c.353C genotypes were less likely to have metastasis at diagnosis than those with the c.353TT genotype (OR, 0.60; 95% CI, 0.41-0.88). The c.353C allele encoding p.118Ala attenuated Snai1's ability to up-regulate mesenchymal biomarkers (i.e., fibronectin and vimentin) expression, and to promote EMT-like changes, including morphologic changes, cell migration, and invasion. However, these effects were not observed for the other variants. CONCLUSIONS: The functional germline variant c.353T>C (p.Val118Ala) of Snai1 confers consistently decreased risks of lung cancer and COPD, and this variant affects lung cancer risk through a mediation effect of COPD.
Affiliation(s)
- Lei Yang
  - 1 The State Key Lab of Respiratory Disease, The Institute for Chemical Carcinogenesis, Guangzhou Institute of Respiratory Diseases, and
28
Tchetgen Tchetgen EJ. A general regression framework for a secondary outcome in case-control studies. Biostatistics 2013; 15:117-28. [PMID: 24152770 DOI: 10.1093/biostatistics/kxt041] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022]
Abstract
Modern case-control studies typically involve the collection of data on a large number of outcomes, often at considerable logistical and monetary expense. These data are of potentially great value to subsequent researchers, who, although not necessarily concerned with the disease that defined the case series in the original study, may want to use the available information for a regression analysis involving a secondary outcome. Because cases and controls are selected with unequal probability, regression analysis involving a secondary outcome generally must acknowledge the sampling design. In this paper, the author presents a new framework for the analysis of secondary outcomes in case-control studies. The approach is based on a careful re-parameterization of the conditional model for the secondary outcome given the case-control outcome and regression covariates, in terms of (a) the population regression of interest of the secondary outcome given covariates and (b) the population regression of the case-control outcome on covariates. The error distribution for the secondary outcome given covariates and case-control status is otherwise unrestricted. For a continuous outcome, the approach sometimes reduces to extending model (a) by including a residual of (b) as a covariate. However, the framework is general in the sense that models (a) and (b) can take any functional form, and the methodology allows for an identity, log or logit link function for model (a).
Affiliation(s)
- Eric J Tchetgen Tchetgen
  - Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
29
Ghosh A, Wright FA, Zou F. Unified Analysis of Secondary Traits in Case-Control Association Studies. J Am Stat Assoc 2013; 108. [PMID: 24409003 DOI: 10.1080/01621459.2013.793121] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
Abstract
It has been repeatedly shown that in case-control association studies, analysis of a secondary trait which ignores the original sampling scheme can produce highly biased risk estimates. Although a number of approaches have been proposed to properly analyze secondary traits, most approaches fail to reproduce the marginal logistic model assumed for the original case-control trait and/or do not allow for interaction between secondary trait and genotype marker on primary disease risk. In addition, the flexible handling of covariates remains challenging. We present a general retrospective likelihood framework to perform association testing for both binary and continuous secondary traits which respects marginal models and incorporates the interaction term. We provide a computational algorithm, based on a reparameterized approximate profile likelihood, for obtaining the maximum likelihood (ML) estimate and its standard error for the genetic effect on secondary trait, in presence of covariates. For completeness we also present an alternative pseudo-likelihood method for handling covariates. We describe extensive simulations to evaluate the performance of the ML estimator in comparison with the pseudo-likelihood and other competing methods.
Affiliation(s)
- Arpita Ghosh
  - Public Health Foundation of India, New Delhi, India
- Fred A Wright
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, USA
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, USA
30
Lutz S, Yip WK, Hokanson J, Laird N, Lange C. A general semi-parametric approach to the analysis of genetic association studies in population-based designs. BMC Genet 2013; 14:13. [PMID: 23448186 PMCID: PMC3648382 DOI: 10.1186/1471-2156-14-13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/15/2012] [Accepted: 02/01/2013] [Indexed: 12/03/2022]
Abstract
Background: For genetic association studies in designs of unrelated individuals, current statistical methodology typically models the phenotype of interest as a function of the genotype and assumes a known statistical model for the phenotype. In the analysis of complex phenotypes, especially in the presence of ascertainment conditions, the specification of such model assumptions is not straight-forward and is error-prone, potentially causing misleading results. Results: In this paper, we propose an alternative approach that treats the genotype as the random variable and conditions upon the phenotype. Thereby, the validity of the approach does not depend on the correctness of assumptions about the phenotypic model. Misspecification of the phenotypic model may lead to reduced statistical power. Theoretical derivations and simulation studies demonstrate both the validity and the advantages of the approach over existing methodology. In the COPDGene study (a GWAS for Chronic Obstructive Pulmonary Disease (COPD)), we apply the approach to a secondary, quantitative phenotype, the Fagerstrom nicotine dependence score, that is correlated with COPD affection status. The software package that implements this method is available. Conclusions: The flexibility of this approach enables the straight-forward application to quantitative phenotypes and binary traits in ascertained and unascertained samples. In addition to its robustness features, our method provides the platform for the construction of complex statistical models for longitudinal data, multivariate data, multi-marker tests, rare-variant analysis, and others.
Affiliation(s)
- Sharon Lutz
  - Department of Biostatistics, University of Colorado Anschutz Medical Campus, Aurora, USA
31
Wang J, Spitz MR, Amos CI, Wu X, Wetter DW, Cinciripini PM, Shete S. Method for evaluating multiple mediators: mediating effects of smoking and COPD on the association between the CHRNA5-A3 variant and lung cancer risk. PLoS One 2012; 7:e47705. [PMID: 23077662 PMCID: PMC3471886 DOI: 10.1371/journal.pone.0047705] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/02/2012] [Accepted: 09/14/2012] [Indexed: 01/18/2023]
Abstract
A mediation model explores the direct and indirect effects between an independent variable and a dependent variable by including other variables (or mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, bias could arise in the estimations of the genetic variant-mediator association because the presence or absence of the mediator in the study samples is not sampled following the principles of case-control study design. In this case, the mediation analysis using data from case-control studies might lead to biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct bias in coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We employed bootstrapping to assess the significance of indirect effects. We conducted simulation studies to investigate the performance of the proposed approach, and showed that it provides more accurate estimates of the indirect effects as well as the percent mediated than standard regressions. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three different pathways. The percent of genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD, and the total genetic variant-lung cancer association explained by the two mediators was 69.1%.
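The path-specific percentages reported in this abstract can be checked against the stated total with a few lines of arithmetic. The snippet below is only a consistency check on the published figures; the dictionary keys are labels chosen here for illustration, not the authors' terminology.

```python
# Percent of the genetic association mediated along each path,
# copied from the figures reported in the abstract.
percent_mediated = {
    "smoking alone": 18.3,
    "COPD alone": 30.2,
    "smoking and COPD": 20.6,
}

# Sum the three path-specific shares; rounding to one decimal place
# matches the precision of the reported values.
total_mediated = round(sum(percent_mediated.values()), 1)

print(f"total percent mediated: {total_mediated}%")
```

The three shares add up to the 69.1% that the abstract gives as the total association explained by the two mediators.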
Affiliation(s)
- Jian Wang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Margaret R. Spitz
  - Department of Molecular and Cellular Biology, Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas, United States of America
- Christopher I. Amos
  - Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Xifeng Wu
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- David W. Wetter
  - Department of Health Disparities Research, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Paul M. Cinciripini
  - Department of Behavioral Science, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Sanjay Shete
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
32
Chen HY, Kittles R, Zhang W. Bias correction to secondary trait analysis with case-control design. Stat Med 2012; 32:1494-508. [PMID: 22987618 DOI: 10.1002/sim.5613] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 09/13/2011] [Accepted: 08/21/2012] [Indexed: 11/08/2022]
Abstract
In genetic association studies with densely typed genetic markers, it is often of substantial interest to examine not only the primary phenotype but also the secondary traits for their association with the genetic markers. For more efficient sample ascertainment of the primary phenotype, a case-control design or its variants, such as the extreme-value sampling design for a quantitative trait, are often adopted. The secondary trait analysis without correcting for the sample ascertainment may yield a biased association estimator. We propose a new method aiming at correcting the potential bias due to the inadequate adjustment of the sample ascertainment. The method yields explicit correction formulas that can be used to both screen the genetic markers and rapidly evaluate the sensitivity of the results to the assumed baseline case-prevalence rate in the population. Simulation studies demonstrate good performance of the proposed approach in comparison with the more computationally intensive approaches, such as the compensator approaches and the maximum prospective likelihood approach. We illustrate the application of the approach by analysis of the genetic association of prostate specific antigen in a case-control study of prostate cancer in the African American population.
Affiliation(s)
- Hua Yun Chen
  - Division of Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago, Chicago, IL 60612 USA
33
Wang J, Shete S. Analysis of secondary phenotype involving the interactive effect of the secondary phenotype and genetic variants on the primary disease. Ann Hum Genet 2012; 76:484-99. [PMID: 22881407 DOI: 10.1111/j.1469-1809.2012.00725.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
A genome-wide association (GWA) study is usually designed as a case-control study, where the presence and absence of the primary disease define the cases and controls, respectively. Using the existing data from GWA studies, investigators are also trying to identify the association between genetic variants and secondary phenotypes, which are defined as traits associated with the primary disease. However, recent studies have shown that bias arises in the estimation of marker-secondary phenotype association using originally collected data. We recently proposed a bias correction approach to accurately estimate the odds ratio (OR) for marker-secondary phenotype association. In this communication, we further investigated whether our bias correction approach is robust for a scenario involving the interactive effect of the secondary phenotype and genetic variants on the primary disease. We found that in such a scenario, our bias correction approach also provides an accurate estimation of OR for marker-secondary phenotype association. We investigated accuracy of our approach using simulation studies and showed that the approach better controlled for type I errors than the existing approaches. We also applied our bias correction approach to the real data analysis of association between an N-acetyltransferase gene, NAT2, and smoking on the basis of colorectal adenoma data.
Affiliation(s)
- Jian Wang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
34
Li H, Gail MH. Efficient adaptively weighted analysis of secondary phenotypes in case-control genome-wide association studies. Hum Hered 2012; 73:159-73. [PMID: 22710642 DOI: 10.1159/000338943] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 10/25/2011] [Accepted: 04/20/2012] [Indexed: 11/19/2022]
Abstract
We propose and compare methods of analysis for detecting associations between genotypes of a single nucleotide polymorphism (SNP) and a dichotomous secondary phenotype (X), when the data arise from a case-control study of a primary dichotomous phenotype (D), which is not rare. We considered both a dichotomous genotype (G) as in recessive or dominant models and an additive genetic model based on the number of minor alleles present. To estimate the log odds ratio β₁ relating X to G in the general population, one needs to understand the conditional distribution [D ∣ X, G] in the general population. For the most general model, [D ∣ X, G], one needs external data on P(D = 1) to estimate β₁. We show that for this 'full model', the maximum likelihood (FM) corresponds to a previously proposed weighted logistic regression (WL) approach if G is dichotomous. For the additive model, WL yields results numerically close, but not identical, to those of the maximum likelihood FM. Efficiency can be gained by assuming that [D ∣ X, G] is a logistic model with no interaction between X and G (the 'reduced model'). However, the resulting maximum likelihood (RM) can be misleading in the presence of interactions. We therefore propose an adaptively weighted approach (AW) that captures the efficiency of RM but is robust to the occasional SNP that might interact with the secondary phenotype to affect the risk of the primary disease. We study the robustness of FM, WL, RM and AW to misspecification of P(D = 1). In principle, one should be able to estimate β₁ without external information on P(D = 1) under the reduced model. However, our simulations show that the resulting inference is unreliable. Therefore, in practice one needs to introduce external information on P(D = 1), even in the absence of interactions between X and G.
Affiliation(s)
- Huilin Li
  - Division of Biostatistics, Department of Population Health, School of Medicine, New York University, New York, NY 10016, USA
35
Wang J, Shete S. Power and type I error results for a bias-correction approach recently shown to provide accurate odds ratios of genetic variants for the secondary phenotypes associated with primary diseases. Genet Epidemiol 2011; 35:739-43. [PMID: 21769937 DOI: 10.1002/gepi.20611] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/05/2011] [Revised: 06/05/2011] [Accepted: 06/12/2011] [Indexed: 11/11/2022]
Abstract
We recently proposed a bias correction approach to evaluate accurate estimation of the odds ratio (OR) of genetic variants associated with a secondary phenotype, in which the secondary phenotype is associated with the primary disease, based on the original case-control data collected for the purpose of studying the primary disease. As reported in this communication, we further investigated the type I error probabilities and powers of the proposed approach, and compared the results to those obtained from logistic regression analysis (with or without adjustment for the primary disease status). We performed a simulation study based on a frequency-matching case-control study with respect to the secondary phenotype of interest. We examined the empirical distribution of the natural logarithm of the corrected OR obtained from the bias correction approach and found it to be normally distributed under the null hypothesis. On the basis of the simulation study results, we found that the logistic regression approaches that adjust or do not adjust for the primary disease status had low power for detecting secondary phenotype associated variants and highly inflated type I error probabilities, whereas our approach was more powerful for identifying the SNP-secondary phenotype associations and had better-controlled type I error probabilities.
Affiliation(s)
- Jian Wang
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA