1
Amorim G, Tao R, Lotspeich S, Shaw PA, Lumley T, Patel RC, Shepherd BE. Three-phase generalized raking and multiple imputation estimators to address error-prone data. Stat Med 2024; 43:379-394. [PMID: 37987515 PMCID: PMC10842111 DOI: 10.1002/sim.9967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 09/23/2023] [Accepted: 11/09/2023] [Indexed: 11/22/2023]
Abstract
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 83 671 women living with HIV, whose data were originally extracted from electronic medical records, of whom 4732 had their charts reviewed, and a subsequent 1210 also had a telephone interview to validate key study variables.
Affiliation(s)
- Gustavo Amorim
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Sarah Lotspeich
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, USA
- Pamela A. Shaw
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Thomas Lumley
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Rena C. Patel
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Bryan E. Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
2
Odutola MK, van Leeuwen MT, Bruinsma F, Turner J, Hertzberg M, Seymour JF, Prince HM, Trotman J, Verner E, Roncolato F, Opat S, Lindeman R, Tiley C, Milliken ST, Underhill CR, Benke G, Giles GG, Vajdic CM. A Population-Based Family Case-Control Study of Sun Exposure and Follicular Lymphoma Risk. Cancer Epidemiol Biomarkers Prev 2024; 33:106-116. [PMID: 37831120 PMCID: PMC10774741 DOI: 10.1158/1055-9965.epi-23-0578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 08/08/2023] [Accepted: 10/11/2023] [Indexed: 10/14/2023] [Open Access]
Abstract
BACKGROUND: Epidemiologic evidence suggests an inverse association between sun exposure and follicular lymphoma risk. METHODS: We conducted an Australian population-based family case-control study based on 666 cases and 459 controls (288 related, 171 unrelated). Participants completed a lifetime residence and work calendar and recalled outdoor hours on weekdays, weekends, and holidays in the warmer and cooler months at ages 10, 20, 30, and 40 years, and clothing types worn in the warmer months. We used a group-based trajectory modeling approach to identify outdoor hour trajectories over time and examined associations with follicular lymphoma risk using logistic regression. RESULTS: We observed an inverse association between follicular lymphoma risk and several measures of high lifetime sun exposure, particularly intermittent exposure (weekends, holidays). Associations included reduced risk with increasing time outdoors on holidays in the warmer months [highest category OR = 0.56; 95% confidence interval (CI), 0.42-0.76; P-trend < 0.01], high outdoor hours on weekends in the warmer months (highest category OR = 0.71; 95% CI, 0.52-0.96), and increasing time outdoors in the warmer and cooler months combined (highest category OR = 0.66; 95% CI, 0.50-0.91; P-trend = 0.01). Risk was reduced for high outdoor hour maintainers in the warmer months across the decade years (OR = 0.71; 95% CI, 0.53-0.96). CONCLUSIONS: High total and intermittent sun exposure, particularly in the warmer months, may be protective against the development of follicular lymphoma. IMPACT: Although sun exposure is not recommended as a cancer control policy, confirming this association may provide insights regarding the future control of this intractable malignancy.
Affiliation(s)
- Michael K. Odutola
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, New South Wales, Australia
- Marina T. van Leeuwen
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, New South Wales, Australia
- Fiona Bruinsma
  - Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
- Jennifer Turner
  - Anatomical Pathology, Douglass Hanly Moir Pathology, Macquarie Park, Sydney, Australia
  - Department of Clinical Medicine, Faculty of Medicine, Health and Human Science, Macquarie University, Sydney, Australia
- Mark Hertzberg
  - Department of Haematology, Prince of Wales Hospital and University of New South Wales, Sydney, New South Wales, Australia
- John F. Seymour
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria, Australia
- H. Miles Prince
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria, Australia
- Judith Trotman
  - Concord Repatriation General Hospital, Concord, New South Wales, Australia
- Emma Verner
  - Concord Repatriation General Hospital, Concord, New South Wales, Australia
- Stephen Opat
  - Clinical Haematology, Monash Health, Clayton, Victoria, Australia
- Robert Lindeman
  - New South Wales Health Pathology, Sydney, New South Wales, Australia
- Craig R. Underhill
  - Border Medical Oncology Research Unit, Albury, New South Wales, Australia
- Geza Benke
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Graham G. Giles
  - Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia
  - Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Victoria, Australia
- Claire M. Vajdic
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, New South Wales, Australia
  - The Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
3
Odutola MK, van Leeuwen MT, Bassett JK, Bruinsma F, Turner J, Seymour JF, Prince HM, Milliken ST, Hertzberg M, Roncolato F, Opat SS, Lindeman R, Tiley C, Trotman J, Verner E, Harvey M, Underhill CR, Benke G, Giles GG, Vajdic CM. Dietary intake of animal-based products and likelihood of follicular lymphoma and survival: A population-based family case-control study. Front Nutr 2023; 9:1048301. [PMID: 36687712 PMCID: PMC9846614 DOI: 10.3389/fnut.2022.1048301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 12/12/2022] [Indexed: 01/06/2023] [Open Access]
Abstract
Background: The association between dietary intake of foods of animal origin and follicular lymphoma (FL) risk and survival is uncertain. In this study, we examined the relationship between dietary intake of dairy foods and fats, meat, fish and seafoods, and the likelihood of FL and survival. Methods: We conducted a population-based family case-control study in Australia between 2011 and 2016 and included 710 cases, 303 siblings and 186 spouse/partner controls. We assessed dietary intake of animal products prior to diagnosis (the year before last) using a structured food frequency questionnaire and followed up cases over a median of 6.9 years using record linkage to national death data. We examined associations with the likelihood of FL using logistic regression and used Cox regression to assess associations with all-cause and FL-specific mortality among cases. Results: We observed an increased likelihood of FL with increasing daily quantity of oily fish consumption in the year before last (highest category OR = 1.96, CI = 1.02-3.77; p-trend = 0.06) among cases and sibling controls, but no associations with spouse/partner controls. We found no association between the likelihood of FL and the consumption of other types of fish or seafood, meats or dairy foods and fats. In FL cases, we found no association between meat or oily fish intake and all-cause or FL-specific mortality. Conclusion: Our study showed suggestive evidence of a positive association between oily fish intake and the likelihood of FL, but findings varied by control type. Further investigation of the potential role of environmental contaminants in oily fish on FL etiology is warranted.
Affiliation(s)
- Michael K. Odutola
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
- Marina T. van Leeuwen
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
- Julie K. Bassett
  - Cancer Epidemiology Division, Cancer Council Victoria, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia
- Fiona Bruinsma
  - Cancer Epidemiology Division, Cancer Council Victoria, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia
- Jennifer Turner
  - Douglass Hanly Moir Pathology, Macquarie Park, NSW, Australia
  - Department of Clinical Medicine, Faculty of Medicine, Health and Human Science, Macquarie University, Sydney, NSW, Australia
- John F. Seymour
  - Royal Melbourne Hospital, Peter MacCallum Cancer Centre, University of Melbourne, Melbourne, VIC, Australia
- Henry Miles Prince
  - Epworth Healthcare and Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia
- Samuel T. Milliken
  - St. Vincent's Hospital, University of New South Wales, Sydney, NSW, Australia
- Mark Hertzberg
  - Department of Haematology, Prince of Wales Hospital, University of New South Wales, Sydney, NSW, Australia
- Fernando Roncolato
  - St. George Hospital, Kogarah, NSW, Australia
  - St. George Clinical School, University of New South Wales, Kogarah, NSW, Australia
- Stephen S. Opat
  - Clinical Haematology, Monash Health and Monash University, Clayton, VIC, Australia
- Robert Lindeman
  - New South Wales Health Pathology, University of New South Wales, Sydney, NSW, Australia
- Campbell Tiley
  - Gosford Hospital, The University of Newcastle, Callaghan, NSW, Australia
- Judith Trotman
  - Concord Repatriation General Hospital, University of Sydney, Concord, NSW, Australia
- Emma Verner
  - Concord Repatriation General Hospital, University of Sydney, Concord, NSW, Australia
- Michael Harvey
  - Liverpool Hospital, Western Sydney University, Liverpool, NSW, Australia
- Craig R. Underhill
  - Border Medical Oncology Research Unit, Rural Medical School, Albury, NSW, Australia
- Geza Benke
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Graham G. Giles
  - Cancer Epidemiology Division, Cancer Council Victoria, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia
  - Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Melbourne, VIC, Australia
- Claire M. Vajdic
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
  - The Kirby Institute, University of New South Wales, Sydney, NSW, Australia
  - Correspondence: Claire M. Vajdic
4
Odutola MK, van Leeuwen MT, Turner J, Bruinsma F, Seymour JF, Prince HM, Milliken ST, Hertzberg M, Trotman J, Opat SS, Lindeman R, Roncolato F, Verner E, Harvey M, Tiley C, Underhill CR, Benke G, Giles GG, Vajdic CM. Associations between early-life growth pattern and body size and follicular lymphoma risk and survival: a family-based case-control study. Cancer Epidemiol 2022; 80:102241. [PMID: 36058036 DOI: 10.1016/j.canep.2022.102241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2022] [Revised: 08/04/2022] [Accepted: 08/21/2022] [Indexed: 11/02/2022]
Abstract
BACKGROUND: The influence of early-life growth pattern and body size on follicular lymphoma (FL) risk and survival is unclear. In this study, we aimed to investigate the association between gestational age, growth during childhood, body size, changes in body shape over time, and FL risk and survival. METHODS: We conducted a population-based family case-control study and included 706 cases and 490 controls. We ascertained gestational age, growth during childhood, body size and body shape using questionnaires and followed up cases (median = 83 months) using record linkage with national death records. We used a group-based trajectory modeling approach to identify body shape trajectories from ages 5-70. We examined associations with FL risk using unconditional logistic regression and used Cox regression to assess the association between body mass index (BMI) and all-cause and FL-specific mortality among cases. RESULTS: We found no association between gestational age or childhood height and FL risk. We observed a modest increase in FL risk with being obese 5 years prior to enrolment (OR = 1.43, 95% CI = 0.99-2.06; BMI ≥30 kg/m²) and per 5-kg/m² increase in BMI 5 years prior to enrolment (OR = 1.14, 95% CI = 0.99-1.31). The excess risk for obesity 5 years prior to enrolment was higher for ever-smokers (OR = 2.00, 95% CI = 1.08-3.69) than never-smokers (OR = 1.14, 95% CI = 0.71-1.84). We found no association between FL risk and BMI at enrolment, BMI for heaviest lifetime weight, the highest categories of adult weight or height, trouser size, body shape at different ages or body shape trajectory. We also observed no association between all-cause or FL-specific mortality and excess adiposity at or prior to enrolment. CONCLUSION: We observed a weak association between elevated BMI and FL risk, and no association with all-cause or FL-specific mortality, consistent with previous studies. Future studies incorporating biomarkers are needed to elucidate possible mechanisms underlying the role of body composition in FL etiology.
Affiliation(s)
- Michael K Odutola
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, New South Wales, Australia.
- Marina T van Leeuwen
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, New South Wales, Australia.
- Jennifer Turner
  - Douglass Hanly Moir Pathology, Macquarie Park and Department of Clinical Medicine, Faculty of Medicine, Health and Human Science, Macquarie University, Sydney, Australia.
- Fiona Bruinsma
  - Cancer Epidemiology Division, Cancer Council Victoria, and Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia.
- John F Seymour
  - Royal Melbourne Hospital, Peter MacCallum Cancer Centre and University of Melbourne, Melbourne, Victoria, Australia.
- H Miles Prince
  - Epworth Healthcare and Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria, Australia.
- Samuel T Milliken
  - St. Vincent's Hospital, Sydney and University of New South Wales, Sydney, New South Wales, Australia.
- Mark Hertzberg
  - Department of Haematology, Prince of Wales Hospital and University of New South Wales, Sydney, New South Wales, Australia.
- Judith Trotman
  - Concord Repatriation General Hospital and University of Sydney, Concord, New South Wales, Australia.
- Stephen S Opat
  - Clinical Haematology, Monash Health and Monash University, Clayton, Australia.
- Robert Lindeman
  - New South Wales Health Pathology and University of New South Wales, Sydney, New South Wales, Australia.
- Fernando Roncolato
  - St. George Hospital, Kogarah and University of New South Wales, Sydney, New South Wales, Australia.
- Emma Verner
  - Concord Repatriation General Hospital and University of Sydney, Concord, New South Wales, Australia.
- Michael Harvey
  - Liverpool Hospital, Liverpool and Western Sydney University, New South Wales, Australia.
- Campbell Tiley
  - Gosford Hospital and The University of Newcastle, New South Wales, Australia.
- Craig R Underhill
  - Rural Medical School and Border Medical Oncology Research Unit, Albury, New South Wales, Australia.
- Geza Benke
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.
- Graham G Giles
  - Cancer Epidemiology Division, Cancer Council Victoria, and Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia; Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Victoria, Australia.
- Claire M Vajdic
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, New South Wales, Australia; The Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia.
5
Odutola MK, van Leeuwen MT, Turner J, Bruinsma F, Seymour JF, Prince HM, Milliken ST, Trotman J, Verner E, Tiley C, Roncolato F, Underhill CR, Opat SS, Harvey M, Hertzberg M, Benke G, Giles GG, Vajdic CM. Associations between Smoking and Alcohol and Follicular Lymphoma Incidence and Survival: A Family-Based Case-Control Study in Australia. Cancers (Basel) 2022; 14:2710. [PMID: 35681690 PMCID: PMC9179256 DOI: 10.3390/cancers14112710] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/11/2022] [Revised: 05/16/2022] [Accepted: 05/27/2022] [Indexed: 12/10/2022] [Open Access]
Abstract
The association between smoking and alcohol consumption and follicular lymphoma (FL) incidence and clinical outcome is uncertain. We conducted a population-based family case-control study (709 cases, 490 controls) in Australia. We assessed lifetime history of smoking and recent alcohol consumption and followed up cases (median = 83 months). We examined associations with FL risk using unconditional logistic regression and with all-cause and FL-specific mortality of cases using Cox regression. FL risk was associated with ever smoking (OR = 1.38, 95% CI = 1.08-1.74), former smoking (OR = 1.36, 95% CI = 1.05-1.77), smoking initiation before age 17 (OR = 1.47, 95% CI = 1.06-2.05), and the highest categories of cigarettes smoked per day (OR = 1.44, 95% CI = 1.04-2.01), smoking duration (OR = 1.53, 95% CI = 1.07-2.18) and pack-years (OR = 1.56, 95% CI = 1.10-2.22). For never smokers, FL risk increased for those exposed indoors to >2 smokers during childhood (OR = 1.84, 95% CI = 1.11-3.04). For cases, current smoking and the highest categories of smoking duration and lifetime cigarette exposure were associated with elevated all-cause mortality. The hazard ratio for current smoking and FL-specific mortality was 2.97 (95% CI = 0.91-9.72). We found no association between recent alcohol consumption and FL risk, all-cause or FL-specific mortality. Our study showed consistent evidence of an association between smoking and increased FL risk and possibly also FL-specific mortality. Strengthening anti-smoking policies and interventions may reduce the population burden of FL.
Affiliation(s)
- Michael K. Odutola
  - Centre for Big Data Research in Health, University of New South Wales, Sydney 2052, Australia
- Marina T. van Leeuwen
  - Centre for Big Data Research in Health, University of New South Wales, Sydney 2052, Australia
- Jennifer Turner
  - Department of Anatomical Pathology, Douglass Hanly Moir Pathology, Macquarie Park 2113, Australia
  - Department of Clinical Medicine, Faculty of Medicine, Health and Human Science, Macquarie University, North Ryde 2109, Australia
- Fiona Bruinsma
  - Cancer Epidemiology Division, Cancer Council Victoria, Melbourne 3004, Australia
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville 3010, Australia
- John F. Seymour
  - Royal Melbourne Hospital, Melbourne 3052, Australia
  - Peter MacCallum Cancer Centre, University of Melbourne, Parkville 3010, Australia
- Henry M. Prince
  - Peter MacCallum Cancer Centre, University of Melbourne, Parkville 3010, Australia
  - Epworth Healthcare, Richmond 3121, Australia
- Samuel T. Milliken
  - St. Vincent’s Hospital, Sydney 2010, Australia
  - University of New South Wales, Sydney 2052, Australia
- Judith Trotman
  - Concord Repatriation General Hospital, Concord 2139, Australia
  - Faculty of Medicine and Health, University of Sydney, Concord 2139, Australia
- Emma Verner
  - Concord Repatriation General Hospital, Concord 2139, Australia
  - Faculty of Medicine and Health, University of Sydney, Concord 2139, Australia
- Campbell Tiley
  - Gosford Hospital, Gosford 2250, Australia
  - School of Medicine and Public Health, The University of Newcastle, Newcastle 2308, Australia
- Fernando Roncolato
  - University of New South Wales, Sydney 2052, Australia
  - St. George Hospital, Kogarah 2217, Australia
- Craig R. Underhill
  - Rural Medical School, Albury 2640, Australia
  - Border Medical Oncology Research Unit, Albury 2640, Australia
- Stephen S. Opat
  - Clinical Haematology, Monash Health and Monash University, Clayton 3168, Australia
- Michael Harvey
  - Liverpool Hospital, Liverpool 2170, Australia
  - Western Sydney University, Sydney 2000, Australia
- Mark Hertzberg
  - University of New South Wales, Sydney 2052, Australia
  - Department of Haematology, Prince of Wales Hospital, Sydney 2031, Australia
- Geza Benke
  - School of Public Health and Preventive Medicine, Monash University, Melbourne 3004, Australia
- Graham G. Giles
  - Cancer Epidemiology Division, Cancer Council Victoria, Melbourne 3004, Australia
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville 3010, Australia
  - Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton 3168, Australia
- Claire M. Vajdic
  - Centre for Big Data Research in Health, University of New South Wales, Sydney 2052, Australia
  - Kirby Institute, University of New South Wales, Sydney 2052, Australia
  - Corresponding author
6
Lee J, Cook RJ. The illness-death model for family studies. Biostatistics 2019; 22:482-503. [PMID: 31742352 DOI: 10.1093/biostatistics/kxz048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/10/2018] [Revised: 10/14/2019] [Accepted: 10/18/2019] [Indexed: 11/12/2022] [Open Access]
Abstract
Family studies involve the selection of affected individuals from a disease registry who provide right-truncated ages of disease onset. Coarsened disease histories are then obtained from consenting family members, either through examining medical records, retrospective reporting, or clinical examination. Methods for dealing with such biased sampling schemes are available for continuous, binary, and failure time responses, but methods for more complex life history processes are less developed. We consider a simple joint model for clustered illness-death processes which we formulate to study covariate effects on the marginal intensity for disease onset and to study the within-family dependence in disease onset times. We construct likelihoods and composite likelihoods for family data obtained from biased sampling schemes. In settings where the disease is rare and data are insufficient to fit the model of interest, we show how auxiliary data can augment the composite likelihood to facilitate estimation. We apply the proposed methods to analyze data from a family study of psoriatic arthritis carried out at the University of Toronto Psoriatic Arthritis Registry.
Affiliation(s)
- Jooyoung Lee
  - Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Richard J Cook
  - Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
7
Namkung J, Won S. Single Marker Family-Based Association Analysis Not Conditional on Parental Information. Methods Mol Biol 2017; 1666:409-439. [PMID: 28980257 DOI: 10.1007/978-1-4939-7274-6_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Family-based association analysis unconditional on parental genotypes models the effects of observed genotypes. This approach has been shown to have greater power than conditional methods. In this chapter, we review popular association analysis methods accounting for familial correlations: the marginal model using generalized estimating equations (GEE), the mixed model with a polygenic random component, and genome-wide association analyses. The marginal approach does not explicitly model familial correlations but uses the information to improve the efficiency of parameter estimates. This model, using GEE, is useful when the correlation structure is not of interest; the correlations are treated as nuisance parameters. In the mixed model, familial correlations are modeled as random effects, e.g., the polygenic inheritance model accounts for correlations originating from shared genomic components within a family. These unconditional methods provide a flexible modeling framework for general pedigree data to accommodate traits with various distributions and many types of covariate effects. Genome-wide association studies usually test more than 10,000 SNPs, and thus traditional statistical methods accounting for the familial correlations often suffer from a computational burden. Multiple approaches that have been recently proposed to avoid this computational issue are reviewed. The single-marker analysis procedures are demonstrated using the R package gee and the ASSOC program in the S.A.G.E. package, including how to prepare input data, conduct the analysis, and interpret the output. ASSOC allows models to include random components of additional familial correlations that may not be sufficiently explained by a polygenic effect and addresses nonnormality of response variables by transformation methods. With its ease of use, ASSOC provides a useful tool for association analysis of large pedigree data.
Affiliation(s)
- Junghyun Namkung
  - Molecular Diagnostics Team, IVD Business Unit, SK Telecom, SK T-tower 65 Eulji-ro, Jung-gu, 04539, Seoul, South Korea
- Sungho Won
  - Department of Public Health Science, Graduate School of Public Health, Seoul National University, Seoul, South Korea
8
Liu Z, Coghill AE, Pfeiffer RM, Hsu WL, Lou PJ, Wang CP, Yu KJ, Niwa S, Brotzman M, Ye W, Chen CJ, Hildesheim A. Birth order and risk of nasopharyngeal carcinoma in multiplex families from Taiwan. Int J Cancer 2016; 139:2467-73. [PMID: 27537611 DOI: 10.1002/ijc.30390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2016] [Accepted: 08/04/2016] [Indexed: 11/06/2022]
Abstract
A small proportion of individuals infected with Epstein-Barr virus (EBV) develop nasopharyngeal carcinoma (NPC). Timing of initial exposure could alter immunological responses to primary EBV infection and explain variation in cancer risk later in life. We measured early life family structure as a proxy for the timing of primary EBV infection to examine whether earlier age at infection alters NPC risk. We utilized data from 480 NPC cases and 1,291 unaffected siblings from Taiwanese NPC multiplex families (≥ 2 family members with NPC, N = 2,921). Information on birth order within the family was derived from questionnaires. We utilized logistic regression models to examine the association between birth order and NPC, accounting for correlations between relatives. Within these high-risk families, older siblings had an elevated risk of NPC. Compared with being a first-born child, the risk (95% CIs) of NPC associated with a birth order of two, three, four and five or more was 1.00 (0.71, 1.40), 0.88 (0.62, 1.24), 0.74 (0.53, 1.05) and 0.60 (0.43, 0.82), respectively (P for trend = 0.002). We observed no associations between NPC risk and the number of younger siblings or cumulative infant-years exposure. These associations were not modified by underlying genetic predisposition or family size. We observed that early life family structure was important for NPC risk in NPC multiplex families, with older siblings having a greater risk of disease. Future studies focusing on more direct measures of the immune response to EBV in early childhood could elucidate the underlying mechanisms.
Affiliation(s)
- Zhiwei Liu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, 17177, Sweden.
| | - Anna E Coghill
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, Maryland
| | - Ruth M Pfeiffer
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, Maryland
| | - Wan-Lun Hsu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
| | - Pei-Jen Lou
- Department of Otolaryngology, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan
| | - Cheng-Ping Wang
- Department of Otolaryngology, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan
| | - Kelly J Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, Maryland
| | | | | | - Weimin Ye
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, 17177, Sweden
| | - Chien-Jen Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan.,Graduate Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
| | - Allan Hildesheim
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, Maryland
|
9
|
Tsonaka R, van der Woude D, Houwing-Duistermaat J. Marginal genetic effects estimation in family and twin studies using random-effects models. Biometrics 2015; 71:1130-8. [PMID: 26148843 DOI: 10.1111/biom.12350] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/01/2014] [Revised: 04/01/2015] [Accepted: 05/01/2015] [Indexed: 11/30/2022]
Abstract
Random-effects models are often used in family-based genetic association studies to properly capture within-family relationships. In such models, the regression parameters have an interpretation conditional on the random effects and measure, e.g., genetic effects for each family. Estimating parameters that can be used to make inferences at the population level is often more relevant than estimating family-specific effects, but it is not straightforward. This is mainly for two reasons. First, the analysis of family data often requires high-dimensional random-effects vectors to properly model the familial relationships, for instance when members with different degrees of relationship are considered, such as trios, a mix of monozygotic and dizygotic twins, etc. Second, a biased sampling design, such as the multiple-cases family design, is often employed to enrich the sample with genetic information. For these reasons, deriving parameters with the desired marginal interpretation can be challenging. In this work we consider marginalized mixed-effects models, discuss the challenges of applying them to ascertained family data, and propose a penalized maximum likelihood methodology that stabilizes parameter estimation by using external information on disease prevalence or heritability. The performance of our methodology is evaluated via simulation and illustrated on data from rheumatoid arthritis patients, where we estimate the marginal effect of HLA-DRB1*13 and shared epitope alleles across three different study designs and combine them using meta-analysis.
Affiliation(s)
- Roula Tsonaka
- Department of Medical Statistics and BioInformatics, Leiden University Medical Center, Post Zone S5-P, PO Box 9600, 2300 RC Leiden, The Netherlands
| | - Diane van der Woude
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
| | - Jeanine Houwing-Duistermaat
- Department of Medical Statistics and BioInformatics, Leiden University Medical Center, Post Zone S5-P, PO Box 9600, 2300 RC Leiden, The Netherlands
|
10
|
Tsai MY. Variable selection in Bayesian generalized linear-mixed models: an illustration using candidate gene case-control association studies. Biom J 2014; 57:234-53. [PMID: 25267186 DOI: 10.1002/bimj.201300259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/05/2013] [Revised: 04/25/2014] [Accepted: 06/21/2014] [Indexed: 11/07/2022]
Abstract
Variable selection in generalized linear mixed models (GLMMs) is a pervasive problem in statistical practice. Many methodologies for determining the best subset of explanatory variables currently exist, differing in model complexity and intended application. In this paper, we develop a "higher posterior probability model with bootstrap" (HPMB) approach to select explanatory variables without fitting all possible GLMMs involving a small or moderate number of explanatory variables. Furthermore, to reduce the computational load, we propose an efficient approach based on Laplace's method and Taylor's expansion to approximate intractable integrals in GLMMs. Simulation studies and an application to HapMap data provide evidence that this selection approach is computationally feasible and reliable for exploring true candidate genes and gene-gene associations, after adjusting for complex structures among clusters.
Affiliation(s)
- Miao-Yu Tsai
- Institute of Statistics and Information Science, National Changhua University of Education, Changhua, 500, Taiwan
|
11
|
Wen SH, Tsai MY. Haplotype association analysis of combining unrelated case-control and triads with consideration of population stratification. Front Genet 2014; 5:103. [PMID: 24860592 PMCID: PMC4028876 DOI: 10.3389/fgene.2014.00103] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/27/2014] [Accepted: 04/09/2014] [Indexed: 12/27/2022] Open
Abstract
Combining data collected under different study designs, such as family trios and unrelated case-control samples, yields more power and is more cost-effective than analyzing each data set separately. However, a potential concern is population stratification (PS) among unrelated case-control samples, and analyses that integrate data should address this confounding effect. In this paper, we develop a simpler method, the haplotype generalized linear model (HGLM), that tests and estimates haplotype effects on disease risk and allows adjustment for PS when combining data. We propose to combine information across aggregations of haplotype weighted counts estimated separately from population case-control data and trio data, and then to perform a GLM analysis. Furthermore, we present an analysis of variance framework based on haplotype weighted counts for detecting whether it is appropriate to combine the two data sources, as well as a modified HGLM with clustering methods for addressing PS. We evaluate the statistical properties in terms of accuracy, false positive rate (FPR), and empirical power using simulated data across various disease risks, sample sizes, and multi-SNP haplotypes, with and without PS. Our simulation results indicate that HGLM performs comparably to likelihood-based haplotype association analysis, particularly when the haplotype effects are moderate, but may not perform well with lengthy haplotypes and small sample sizes. In the presence of PS, the modified HGLM remains valid, with a satisfactory nominal level and small bias. Overall, HGLM appears to be successful in combining data and is simple to implement in standard statistical software.
Affiliation(s)
- Shu-Hui Wen
- Department of Public Health, College of Medicine, Tzu-Chi University Hualien, Taiwan
| | - Miao-Yu Tsai
- Institute of Statistics and Information Science, National Changhua University of Education Chang-Hua, Taiwan
|
12
|
Wang X, Lee S, Zhu X, Redline S, Lin X. GEE-based SNP set association test for continuous and discrete traits in family-based association studies. Genet Epidemiol 2013; 37:778-86. [PMID: 24166731 PMCID: PMC4007511 DOI: 10.1002/gepi.21763] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 05/16/2013] [Revised: 08/17/2013] [Accepted: 09/10/2013] [Indexed: 12/17/2022]
Abstract
Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies of common variants are single-marker based, testing one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEE) based kernel association test, a variance-component-based testing method, to test jointly for the association between a phenotype and multiple variants in an SNP set using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting small-sample-size bias in family studies. The proposed method allows covariates and SNP-SNP interactions to be incorporated easily. Simulation studies show that the proposed method properly controls type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single-marker-based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
Affiliation(s)
- Xuefeng Wang
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
| | - Seunggeun Lee
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
| | - Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA 44106
| | - Susan Redline
- Department of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
| | - Xihong Lin
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
|
13
|
Tsonaka R, De Visser MCH, Houwing-Duistermaat J. Estimation of genetic effects in multiple cases family studies using penalized maximum likelihood methodology. Biostatistics 2012; 14:220-31. [PMID: 22989557 DOI: 10.1093/biostatistics/kxs032] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
Family studies are often used in genetic research to explore associations between genetic markers and various phenotypes. A commonly used design oversamples families enriched with the disease under study for efficient data collection and estimation. For instance, in a multiple cases family study, families are selected based on the number of affected relatives. In such cases, valid inference for the model parameters relies on the proper modeling of both the within family correlations and the outcome-dependent sampling, also known as ascertainment. A flexible modeling approach is the ascertainment-corrected mixed-effects model, but it is known to only be asymptotically identifiable, because in small samples the available data do not provide sufficient information to estimate both the intercept and the genetic variance. To deal with this issue, we propose a penalized maximum likelihood estimation procedure which reliably estimates the model parameters in small family studies by using external population-based information.
Affiliation(s)
- Roula Tsonaka
- Department of Medical Statistics and BioInformatics, Leiden University Medical Center, Post Zone S5-P, PO Box 9600, 2300 RC Leiden, The Netherlands.
|
14
|
Balliu B, Tsonaka R, van der Woude D, Boehringer S, Houwing-Duistermaat JJ. Combining family and twin data in association studies to estimate the noninherited maternal antigens effect. Genet Epidemiol 2012; 36:811-9. [PMID: 22851506 DOI: 10.1002/gepi.21667] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/19/2012] [Revised: 06/06/2012] [Accepted: 06/20/2012] [Indexed: 11/08/2022]
Abstract
It is hypothesized that certain alleles can have a protective effect not only when inherited by the offspring but also as noninherited maternal antigens (NIMA). To estimate the NIMA effect, large samples of families are needed. When large samples are not available, we propose a combined approach to estimate the NIMA effect from ascertained nuclear families and twin pairs. We develop a likelihood-based approach that allows for several ascertainment schemes, to accommodate the outcome-dependent sampling, and includes a family-specific random term, to take into account the correlation between family members. We estimate the parameters using maximum likelihood based on the combined joint likelihood (CJL) approach. Simulations show that the CJL is more efficient for estimating the NIMA odds ratios than a families-only approach. To illustrate our approach, we used data from a family study and a twin study from the United Kingdom on rheumatoid arthritis, and confirmed the protective NIMA effect, with an odds ratio of 0.477 (95% CI 0.264-0.864).
Affiliation(s)
- Brunilda Balliu
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
|
15
|
Bagos PG. On the covariance of two correlated log-odds ratios. Stat Med 2012; 31:1418-31. [PMID: 22302419 DOI: 10.1002/sim.4474] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/13/2010] [Revised: 09/20/2011] [Accepted: 10/31/2011] [Indexed: 01/08/2023]
Abstract
In many applications, two correlated estimates of an effect size need to be considered simultaneously in order to be combined or compared. This requires calculating their covariance, which in turn requires access to individual data that may not be available to the researcher performing the analysis. We present a simple and efficient method for calculating the covariance of two correlated log-odds ratios. The method is based on well-known large-sample approximations, can be applied using only data that are available in published reports and, more importantly, is very general, because it is shown to encompass several previously derived estimates (multiple outcomes, multiple treatments, dose-response models, mutually exclusive outcomes, genetic association studies) as special cases. By encompassing the previous approaches in a unified framework, the method makes it easy to derive covariance estimates for problems in which they were not otherwise easy to obtain. We show that the method can be used to derive the covariance of log-odds ratios from matched and unmatched case-control studies that use the same cases, a situation that has been addressed in the past only using individual data. Future applications of the method are discussed.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Central Greece, Papasiopoulou 2-4, Lamia, GR35100, Greece.
|
16
|
Namkung J. Single marker family-based association analysis not conditional on parental information. Methods Mol Biol 2012; 850:371-397. [PMID: 22307709 DOI: 10.1007/978-1-61779-555-8_20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Family-based association analysis unconditional on parental genotypes models the effects of observed genotypes. This approach has been shown to have greater power than conditional methods. In this chapter, I review two popular association analysis methods accounting for familial correlations: the marginal model using generalized estimating equations (GEE) and the mixed model with a polygenic random component. The marginal approach does not explicitly model familial correlations but uses the information to improve the efficiency of parameter estimates. This model, using GEE, is useful when the correlation structure is not of interest; the correlations are treated as nuisance parameters. In the mixed model, familial correlations are modeled as random effects, e.g., the polygenic inheritance model accounts for correlations originating from shared genomic components within a family. These unconditional methods provide a flexible modeling framework for general pedigree data to accommodate traits with various distributions and many types of covariate effects. The analysis procedures are demonstrated using the ASSOC program in the S.A.G.E. package and the R package gee, including how to prepare input data, conduct the analysis, and interpret the output. ASSOC allows models to include random components for additional familial correlations that may not be sufficiently explained by a polygenic effect, and addresses nonnormality of response variables by transformation methods. With its ease of use, ASSOC provides a useful tool for association analysis of large pedigree data.
Affiliation(s)
- Junghyun Namkung
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.
|
17
|
Fardo DW, Druen AR, Liu J, Mirea L, Infante-Rivard C, Breheny P. Exploration and comparison of methods for combining population- and family-based genetic association using the Genetic Analysis Workshop 17 mini-exome. BMC Proc 2011; 5 Suppl 9:S28. [PMID: 22373349 PMCID: PMC3287863 DOI: 10.1186/1753-6561-5-s9-s28] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022] Open
Abstract
We examine the performance of various methods for combining family- and population-based genetic association data. Several approaches have been proposed for situations in which information is collected from both a subset of unrelated subjects and a subset of family members. Analyzing these samples separately is known to be inefficient, and it is important to determine the scenarios for which differing methods perform well. Others have investigated this question; however, no extensive simulations have been conducted, nor have these methods been applied to mini-exome-style data such as that provided by Genetic Analysis Workshop 17. We quantify the empirical power and false-positive rates for three existing methods applied to the Genetic Analysis Workshop 17 mini-exome data and compare relative performance. We use knowledge of the underlying data simulation model to make these assessments.
Affiliation(s)
- David W Fardo
- Department of Biostatistics, University of Kentucky College of Public Health, 121 Washington Avenue, Lexington, KY 40536, USA.
|
18
|
Chung RH, Schmidt MA, Morris RW, Martin ER. CAPL: a novel association test using case-control and family data and accounting for population stratification. Genet Epidemiol 2011; 34:747-55. [PMID: 20878716 DOI: 10.1002/gepi.20539] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
Abstract
The recent successes of GWAS based on large sample sizes motivate combining independent datasets to obtain larger sample sizes and thereby increase statistical power. Analysis methods that can accommodate different study designs, such as family-based and case-control designs, are of general interest. However, population stratification can cause spurious association in population-based association analyses. For family-based association analysis that infers missing parental genotypes based on the allele frequencies estimated in the entire sample, the parental mating-type probabilities may not be correctly estimated in the presence of population stratification. Therefore, any approach to combining family and case-control data should also properly account for population stratification. Although several methods have been proposed to accommodate family-based and case-control data, all have restrictions. Most of them require sampling from a homogeneous population, which may not be a reasonable assumption for data from a large consortium. One of the methods, FamCC, can account for population stratification and uses nuclear families with an arbitrary number of siblings but requires parental genotype data, which are often unavailable for late-onset diseases. We extended the family-based test, Association in the Presence of Linkage (APL), to combine family and case-control data (CAPL). CAPL can accommodate case-control data and families with multiple affected siblings and missing parents in the presence of population stratification. We used simulations to demonstrate that CAPL is a valid test both in a homogeneous population and in the presence of population stratification. We also showed that CAPL can have more power than other methods that combine family and case-control data.
Affiliation(s)
- Ren-Hua Chung
- Center for Genetic Epidemiology and Statistical Genetics, John P Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida 33101, USA
|
19
|
Zheng Y, Heagerty PJ, Hsu L, Newcomb PA. On combining family-based and population-based case-control data in association studies. Biometrics 2010; 66:1024-33. [PMID: 20163402 PMCID: PMC3038246 DOI: 10.1111/j.1541-0420.2010.01393.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/01/2022]
Abstract
Combining data collected from different sources can potentially enhance statistical efficiency in estimating effects of environmental or genetic factors or gene-environment interactions. However, combining data across studies becomes complicated when data are collected under different study designs, such as family-based and unrelated individual-based case-control designs. In this article, we describe likelihood-based approaches that permit the joint estimation of covariate effects on disease risk under study designs that include cases, relatives of cases, and unrelated individuals. Our methods accommodate familial residual correlation and a variety of ascertainment schemes. Extensive simulation experiments demonstrate that the proposed methods for estimation and inference perform well in realistic settings. The efficiencies of different designs are contrasted in the simulations. We applied the methods to data from the Colorectal Cancer Family Registry.
Affiliation(s)
- Yingye Zheng
- Biostatistics and Biomathematics Program, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.
|