1
Nguyen NH, Shin SJ, Dodd-Eaton EB, Ning J, Wang W. Personalized Risk Prediction for Cancer Survivors: A Bayesian Semi-parametric Recurrent Event Model with Competing Outcomes. bioRxiv 2023:2023.02.28.530537. [PMID: 36909464 PMCID: PMC10002693 DOI: 10.1101/2023.02.28.530537]
Abstract
Multiple primary cancers are increasingly frequent owing to improved survival of cancer patients. Characteristics of the first primary cancer strongly influence the risk of developing subsequent primary cancers. Hence, model-based risk characterization of cancer survivors that captures patient-specific variables is needed for healthcare policy making. We propose a Bayesian semi-parametric framework in which the occurrence processes of the competing cancer types follow independent non-homogeneous Poisson processes, with adjustment for covariates including the type of the first primary cancer and the age at its diagnosis. Applying this framework to a historically collected cohort of families with a highly enriched history of multiple primary tumors and diverse cancer types, we derived a suite of age-to-onset penetrance curves for cancer survivors. These include penetrance estimates for second primary lung cancer, which could inform ongoing cancer screening decisions. Using receiver operating characteristic (ROC) curves, we validated the predictive performance of our models for second primary lung cancer, sarcoma, breast cancer, and all other cancers combined, with areas under the curve (AUCs) of 0.89, 0.91, 0.76, and 0.68, respectively. In conclusion, our framework provides covariate-adjusted quantitative risk assessment for cancer survivors, moving a step closer to personalized health management for this unique population.
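The following minimal Python sketch shows how competing non-homogeneous Poisson processes translate into age-specific risk. For illustration only, it assumes that each cancer type's intensity is constant within pre-specified age bands. The function name `cumulative_incidence`, the bands, and the rates are all hypothetical; none of them are the authors' Bayesian semi-parametric model or their estimates. The sketch uses the fact that, with independent processes, the first event arrives at the summed intensity and is of type k with probability proportional to that type's intensity.

```python
import math

def cumulative_incidence(band_edges, rates, t):
    """Probability that the first new event occurs by age t and is of each type.

    Hypothetical setup: band_edges = [a0, a1, ..., aM] delimit M age bands and
    rates[m][k] is an assumed constant intensity for event type k in band m.
    Returns (cif, surv), where cif[k] = P(first event is type k, by age t)
    and surv = P(no event of any type by age t).
    """
    n_types = len(rates[0])
    cif = [0.0] * n_types
    surv = 1.0  # probability of being event-free at the start of the current band
    for m in range(len(rates)):
        start, end = band_edges[m], band_edges[m + 1]
        if t <= start:
            break
        width = min(end, t) - start  # time spent in this band before age t
        total = sum(rates[m])        # summed intensity of all competing types
        if total > 0:
            p_event = 1.0 - math.exp(-total * width)
            for k, rate in enumerate(rates[m]):
                # Share of first events in this band attributable to type k
                cif[k] += surv * p_event * rate / total
            surv *= 1.0 - p_event
    return cif, surv
```

For example, `cumulative_incidence([0, 10], [[0.01, 0.03]], 10)` gives incidences of about 0.082 and 0.247 and an event-free probability of about 0.670. In a fuller model, the band rates would depend on patient covariates.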
Affiliation(s)
- Nam H Nguyen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX
- Department of Statistics, Rice University, Houston, TX
- Seung Jun Shin
- Department of Statistics, Korea University, Seoul, Korea
- Elissa B Dodd-Eaton
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX
- Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
- Wenyi Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX
2
Gao F, Zeng D, Wang Y. Semiparametric regression analysis of bivariate censored events in a family study of Alzheimer's disease. Biostatistics 2022; 24:32-51. [PMID: 33948627 DOI: 10.1093/biostatistics/kxab014] [Received: 06/19/2020] [Revised: 03/21/2021] [Accepted: 03/25/2021]
Abstract
Assessing disease comorbidity patterns in families represents the first step in gene mapping for diseases and is central to the practice of precision medicine. One way to evaluate the relative contributions of genetic risk factors and environmental determinants to a complex trait (e.g., Alzheimer's disease [AD]) and its comorbidities (e.g., cardiovascular diseases [CVD]) is through familial studies, in which an initial cohort of subjects is recruited, genotyped for specific loci, and interviewed to provide an extensive disease history of family members. Because disease phenotypes in family members are obtained retrospectively, the exact time of disease onset may be unavailable, so that only current status data or interval-censored data are observed. All existing methods for analyzing such family study data assume a single event subject to right-censoring and are therefore not applicable. In this article, we propose a semiparametric regression model for family history data that includes a family-specific random effect and individual random effects to account for the dependence due to shared environmental exposures and unobserved genetic relatedness, respectively. To incorporate multiple events, we jointly model the onset of the primary disease of interest and a secondary disease outcome that is subject to interval-censoring. We propose nonparametric maximum likelihood estimation and develop a stable expectation-maximization (EM) algorithm for computation. We establish the asymptotic properties of the resulting estimators and examine the performance of the proposed methods through simulation studies. Our application to a real-world study reveals that the main contribution to the comorbidity between AD and CVD comes from genetic rather than environmental factors.
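The following Python sketch illustrates nonparametric maximum likelihood under interval censoring. It implements the classic self-consistency (EM) iteration for a single interval-censored event time, in the spirit of Turnbull's estimator. It is a simplification, not the authors' method: it has no covariates, no family-level or individual random effects, and no second event. The function name `interval_npmle` is hypothetical. For brevity, candidate mass is placed at every distinct finite right endpoint plus one point at infinity; this choice is also an assumption. Current status data fit in as intervals (0, c] for subjects already affected at examination age c and (c, ∞) for those not yet affected.

```python
import math

def interval_npmle(intervals, tol=1e-10, max_iter=10_000):
    """Self-consistency (EM) NPMLE of an event-time distribution from
    interval-censored observations (L, R]; R may be math.inf.

    Candidate support points: every distinct finite right endpoint plus
    one point at infinity (event not observed during follow-up).
    """
    points = sorted({r for _, r in intervals if r != math.inf}) + [math.inf]
    # alpha[i][j] is True when candidate point j lies in observation i's interval
    alpha = [[left < s <= right for s in points] for left, right in intervals]
    n, m = len(intervals), len(points)
    p = [1.0 / m] * m  # start from uniform mass
    for _ in range(max_iter):
        new_p = [0.0] * m
        for i in range(n):
            denom = sum(p[j] for j in range(m) if alpha[i][j])
            for j in range(m):
                if alpha[i][j]:
                    # E-step: expected share of observation i's mass at point j
                    new_p[j] += p[j] / denom / n
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return points, p
```

For instance, with `intervals = [(0, 2), (0, 2), (2, math.inf)]` (two subjects affected by age 2 and one unaffected at age 2), the estimate puts mass 2/3 on the point 2 and 1/3 beyond it.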
Affiliation(s)
- Fei Gao
- Division of Vaccine and Infectious Disease, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
- Yuanjia Wang
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
3
Liang B, Wang Y, Zeng D. Semiparametric Transformation Models with Multilevel Random Effects for Correlated Disease Onset in Families. Stat Sin 2019; 29:1851-1871. [PMID: 31579362 PMCID: PMC6774630 DOI: 10.5705/ss.202017.0326]
Abstract
Large cohort studies are commonly launched to study the effects of genetic variants or other risk factors on the age at onset (AAO) of a chronic disorder. In these studies, family history data, including the AAO of disease in family members, are collected to provide additional information and can be used to improve efficiency. Statistical analysis of these data is challenging because of missing genotypes in family members and the heterogeneous dependence attributable to both shared genetic background and shared environmental factors (e.g., lifestyle). In this paper, we propose a class of semiparametric transformation models with multilevel random effects to tackle these challenges. The proposed models include both the proportional hazards model and the proportional odds model as special cases. The multilevel random effects comprise individual-specific random effects, with a kinship correlation structure determined by the family pedigree, and a shared random effect to account for unobserved environmental exposure. We use a nonparametric maximum likelihood approach for inference and propose an expectation-maximization algorithm for computation in the presence of missing genotypes among family members. The resulting estimators are shown to be consistent, asymptotically normal, and semiparametrically efficient. Simulation studies demonstrate that the proposed method performs well in finite samples. Finally, the proposed method is applied to study genetic risks in an Alzheimer's disease study.
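The logarithmic transformation family often used for semiparametric transformation models makes concrete how proportional hazards and proportional odds arise as special cases. The family is G_r(x) = log(1 + r x)/r, with G_0(x) = x, and it gives S(t | Z) = exp{-G_r(Λ(t) exp(β'Z))}. The Python sketch below assumes this family and a single linear predictor. The function names are hypothetical, and the random effects are left out; in such models they would enter the linear predictor. The paper's exact parameterization may differ.

```python
import math

def transformation_G(x, r):
    """Logarithmic transformation family G_r(x) = log(1 + r*x) / r, r >= 0.

    r = 0 (the limiting case) gives G(x) = x: proportional hazards.
    r = 1 gives G(x) = log(1 + x): proportional odds.
    """
    if r == 0:
        return x
    return math.log1p(r * x) / r

def survival(cum_baseline_hazard, linear_predictor, r):
    """S(t | Z) = exp(-G_r(Lambda(t) * exp(beta'Z))) for a given Lambda(t) and beta'Z."""
    return math.exp(-transformation_G(cum_baseline_hazard * math.exp(linear_predictor), r))
```

With r = 1 the survival is 1/(1 + Λ(t) e^{β'Z}), so the odds of disease onset by t equal Λ(t) e^{β'Z} and are proportional across covariate values. With r = 0 the model reduces to exp(-Λ(t) e^{β'Z}), the proportional hazards form.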
Affiliation(s)
- Baosheng Liang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong
- Yuanjia Wang
- Department of Biostatistics, Columbia University, New York, USA
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, USA
4
Qiu Y, Liang B. Robust logistic regression of family data in the presence of missing genotypes. J Appl Stat 2018. [DOI: 10.1080/02664763.2018.1526890]
Affiliation(s)
- Yanping Qiu
- School of Statistics, Renmin University of China, Beijing, People's Republic of China
- MSD R&D (China) Co., Ltd., Beijing, People's Republic of China
- Baosheng Liang
- Department of Natural Sciences in Medicine, Peking University Health Science Center, Beijing, People's Republic of China
5
Hsu L, Gorfine M, Zucker DM. On Estimation of the Hazard Function from Population-based Case-Control Studies. J Am Stat Assoc 2018; 113:560-570. [PMID: 30906082 DOI: 10.1080/01621459.2017.1356315]
Abstract
The population-based case-control study design has been widely used for studying the etiology of chronic diseases. It is well established that the Cox proportional hazards model can be adapted to the case-control study and that hazard ratios can be estimated by (conditional) logistic regression, with time used either to define matched sets or as a covariate (Prentice and Breslow, 1978). However, the baseline hazard function, a critical component in absolute risk assessment, is unidentifiable, because the ratio of cases to controls is controlled by the investigators and does not reflect the true disease incidence rate in the population. In this paper we propose a simple and innovative approach that makes use of routinely collected family history information to estimate the baseline hazard function for any logistic regression model fitted to the risk factor data collected on cases and controls. We establish that the proposed baseline hazard function estimator is consistent and asymptotically normal and show via simulation that it performs well in finite samples. We illustrate the proposed method with a population-based case-control study of prostate cancer, in which the associations of various risk factors are assessed and the family history information is used to estimate the baseline hazard function.
Affiliation(s)
- Li Hsu
- Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center
- Malka Gorfine
- Department of Statistics and Operations Research, Tel Aviv University
6
Liu XR, Pawitan Y, Clements MS. Generalized survival models for correlated time-to-event data. Stat Med 2017; 36:4743-4762. [DOI: 10.1002/sim.7451] [Received: 10/05/2016] [Revised: 07/20/2017] [Accepted: 08/07/2017]
Affiliation(s)
- Xing-Rong Liu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, S-171 77 Stockholm, Sweden
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, S-171 77 Stockholm, Sweden
- Mark S. Clements
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, S-171 77 Stockholm, Sweden
7
Vos JR, Hsu L, Brohet RM, Mourits MJE, de Vries J, Malone KE, Oosterwijk JC, de Bock GH. Bias Correction Methods Explain Much of the Variation Seen in Breast Cancer Risks of BRCA1/2 Mutation Carriers. J Clin Oncol 2015; 33:2553-62. [PMID: 26150446 DOI: 10.1200/jco.2014.59.0463]
Abstract
PURPOSE: Recommendations for treating patients who carry a BRCA1/2 mutation are mainly based on cumulative lifetime risks (CLTRs) of breast cancer determined from retrospective cohorts. These risks vary widely (27% to 88%), and it is important to understand why. We analyzed the effects of methods of risk estimation and bias correction, and of population factors, on CLTRs in this retrospective clinical cohort of BRCA1/2 carriers. PATIENTS AND METHODS: The following methods to estimate the breast cancer risk of BRCA1/2 carriers were identified from the literature: Kaplan-Meier, frailty, and modified segregation analyses, with bias correction consisting of including or excluding index patients combined with including or excluding first-degree relatives (FDRs), or different conditional likelihoods. These were applied to clinical data of BRCA1/2 families derived from our family cancer clinic, for which a simulation was also performed to evaluate the methods. CLTRs and 95% CIs were estimated and compared with the reference CLTRs. RESULTS: CLTRs ranged from 35% to 83% for BRCA1 and 41% to 86% for BRCA2 carriers at age 70 years (width of 95% CIs: 10% to 35% and 13% to 46%, respectively). Relative bias varied from -38% to +16%. Bias correction with inclusion of index patients and untested FDRs gave the smallest bias: +2% (SD, 2%) in BRCA1 and +0.9% (SD, 3.6%) in BRCA2. CONCLUSION: Much of the variation in breast cancer CLTRs in retrospective clinical BRCA1/2 cohorts is due to the bias-correction method, whereas a smaller part is due to population differences. Kaplan-Meier analyses with bias correction that includes index patients and a proportion of untested FDRs provide suitable CLTRs for carriers counseled in the clinic.
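Because the study compares estimation methods for cumulative lifetime risk, a bare Kaplan-Meier computation may help fix ideas: the CLTR at age 70 is 1 - S(70). The Python sketch below uses hypothetical toy data and a hypothetical helper, `km_survival`. It applies none of the bias corrections the study evaluates (handling of index patients, untested first-degree relatives, conditional likelihoods). The naive estimate it produces is therefore exposed to exactly the ascertainment biases discussed above.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) from right-censored ages.

    events[i] is 1 if breast cancer was diagnosed at times[i], 0 if censored.
    Ties: subjects censored at an event age stay in that age's risk set.
    """
    s = 1.0
    event_ages = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_ages:
        at_risk = sum(1 for ti in times if ti >= u)  # still under observation at age u
        cases = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        s *= 1.0 - cases / at_risk
    return s

# Hypothetical ages at diagnosis (event = 1) or at last follow-up (event = 0).
ages = [40, 50, 50, 60, 70, 80]
diagnosed = [1, 1, 0, 1, 0, 1]
cltr_70 = 1.0 - km_survival(ages, diagnosed, 70)  # 1 - 4/9, about 0.556
```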
Affiliation(s)
- Janet R Vos
- University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Li Hsu
- Fred Hutchinson Cancer Research Center, Seattle, WA
- Richard M Brohet
- Spaarne Hospital, Hoofddorp, the Netherlands
- Marian J E Mourits
- University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jakob de Vries
- University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Kathleen E Malone
- Fred Hutchinson Cancer Research Center, Seattle, WA
- Jan C Oosterwijk
- University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Geertruida H de Bock
- University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
8
Genetic modifiers and subtypes in schizophrenia: investigations of age at onset, severity, sex and family history. Schizophr Res 2014; 154:48-53. [PMID: 24581549 PMCID: PMC4422643 DOI: 10.1016/j.schres.2014.01.030] [Received: 07/10/2013] [Revised: 12/15/2013] [Accepted: 01/18/2014]
Abstract
Schizophrenia is a genetically and clinically heterogeneous disorder. Genetic risk factors for the disorder may differ between the sexes, or between multiply affected families and cases with no family history. Additionally, limited data support a genetic basis for variation in onset and severity, but specific loci have not been identified. We performed genome-wide association studies (GWAS) examining genetic influences on age at onset (AAO) and illness severity, as well as specific risk by sex or family history status, using up to 2762 cases and 3187 controls from the International Schizophrenia Consortium (ISC). Subjects with a family history of schizophrenia had a slightly lower average AAO, which was not significant after correction for multiple testing (p = .048), and no differences in illness severity were observed by family history status (p = .51). Consistent with prior reports, we observed earlier AAO (p = .005) and a more severe course of illness (p = .002) in men. Analyses of family-history-positive cases showed the strongest association with KIF5C (p = 1.96 × 10⁻⁸); however, overall genetic risk burden did not differ by family history. Separate association analyses for males and females revealed no significant sex-specific associations. The top GWAS hit for AAO was near the olfactory receptor gene OR2K2 (p = 1.52 × 10⁻⁷). Analyses of illness severity (episodic vs. continuous) implicated variation in ST18 (p = 8.24 × 10⁻⁷). These results confirm recognized demographic relationships but do not support a simplified genetic architecture for schizophrenia subtypes based on these variables.
9
Gorfine M, Hsu L, Parmigiani G. Frailty Models for Familial Risk with Application to Breast Cancer. J Am Stat Assoc 2013; 108:1205-1215. [PMID: 24678132 PMCID: PMC3963469 DOI: 10.1080/01621459.2013.818001]
Abstract
In evaluating familial risk for disease we have two main statistical tasks: assessing the probability of carrying an inherited genetic mutation conferring higher risk, and predicting the absolute risk of developing diseases over time for those individuals whose mutation status is known. Despite substantial progress, much remains unknown about the role of genetic and environmental risk factors, about the sources of variation in risk among families that carry high-risk mutations, and about the sources of familial aggregation beyond major Mendelian effects. These sources of heterogeneity contribute substantial variation in risk across families. In this paper we present simple and efficient methods for accounting for this variation in familial risk assessment. Our methods are based on frailty models. We implemented them in the context of generalizing Mendelian models of cancer risk, and compared our approaches to others that do not consider heterogeneity across families. Our extensive simulation study demonstrates that, when predicting the risk of developing a disease over time conditional on carrier status, accounting for heterogeneity results in a substantial improvement in the area under the receiver operating characteristic (ROC) curve. On the other hand, the improvement for carriership probability estimation is more limited. We illustrate the utility of the proposed approach through the analysis of BRCA1 and BRCA2 mutation carriers in the Washington Ashkenazi Kin-Cohort Study of Breast Cancer.
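A shared frailty turns relatives' histories into individual risk, and the mechanism fits in a few lines. The Python sketch below assumes a gamma frailty W with mean 1 and variance θ, acting multiplicatively on each family member's hazard. Under that assumption, the frailty's distribution given relatives' d observed events and total cumulative hazard H is again gamma, Gamma(1/θ + d, 1/θ + H). A new member's survival is then the gamma Laplace transform evaluated at that member's cumulative hazard. The helper `predicted_survival` and all inputs are hypothetical. Carrier-specific cumulative hazards and the paper's Mendelian carrier-probability component are not modeled here.

```python
def predicted_survival(theta, family_events, family_cum_hazard, cum_hazard_new):
    """Survival of a new family member under an assumed shared gamma frailty.

    Prior: W ~ Gamma(shape=1/theta, rate=1/theta), so E[W] = 1 and Var[W] = theta.
    Given W, every member's hazard is W * lambda(t). Conditioning on relatives'
    `family_events` events and summed cumulative hazard `family_cum_hazard`
    gives W ~ Gamma(1/theta + events, 1/theta + cum_hazard) by conjugacy.
    Returns E[exp(-W * cum_hazard_new) | family data].
    """
    shape = 1.0 / theta + family_events
    rate = 1.0 / theta + family_cum_hazard
    # Laplace transform of a Gamma(shape, rate) variable evaluated at cum_hazard_new
    return (rate / (rate + cum_hazard_new)) ** shape
```

With θ = 1 and a cumulative hazard of 1 for the new member, survival is 0.5 with no family information. It drops to 8/27 (about 0.30) after two events among relatives with total cumulative hazard 1. It rises to 0.75 when relatives with total cumulative hazard 2 remain unaffected.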
Affiliation(s)
- Malka Gorfine
- Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Li Hsu
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
- Giovanni Parmigiani
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute; Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
10
Choi YH. A Frailty-Model-Based Method for Estimating Age-Dependent Penetrance from Family Data. J Biom Biostat 2012; Suppl 4:5488. [PMID: 24294538 PMCID: PMC3841342 DOI: 10.4172/2155-6180.s4-001]
Abstract
Accurate estimates of disease risk (penetrance) associated with inherited gene mutations are critical for the clinical management of individuals at risk, but their estimation raises many statistical challenges, especially in a family-based design. In this paper, we propose a general frailty-model-based approach to accommodate this design, in which the frailty random effect accounts for risk shared among family members that is not explained by the observed risk factors. This approach is of particular interest when the goal is to discover genetic variations other than the major gene and to obtain accurate estimates of penetrance (i.e., estimates not biased by unknown confounding factors). The approach is further extended to accommodate missing genotypes in family members and the non-random ascertainment of families. Simulation results show that the proposed method performs well in realistic settings. Finally, a family-based breast cancer study of the BRCA1 and BRCA2 genes is used to illustrate the method.
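The Python sketch below gives a concrete reading of age-dependent penetrance under a frailty model. It computes the marginal probability of disease by a given age when the hazard is W · h0(t) · exp(β · carrier). Here the baseline cumulative hazard is Weibull, H0(t) = (t/scale)^shape, and W is gamma-distributed with mean 1 and variance θ. The Weibull baseline, the gamma frailty, the function name, and every parameter value are assumptions for illustration. The sketch omits the paper's handling of missing genotypes and non-random ascertainment.

```python
import math

def penetrance(age, carrier, shape, scale, log_hr, theta):
    """Marginal P(disease by `age`) under an assumed gamma-frailty PH model.

    Given frailty W: cumulative hazard = W * (age / scale) ** shape * exp(log_hr * carrier).
    W ~ Gamma with mean 1 and variance theta; theta = 0 means no frailty.
    """
    cum_hazard = (age / scale) ** shape * math.exp(log_hr * carrier)
    if theta == 0:
        return 1.0 - math.exp(-cum_hazard)  # standard proportional hazards penetrance
    # Averaging exp(-W * H) over the gamma frailty: (1 + theta * H) ** (-1 / theta)
    return 1.0 - (1.0 + theta * cum_hazard) ** (-1.0 / theta)
```

Averaging over the frailty flattens the curve. With θ > 0, the marginal penetrance 1 - (1 + θH)^(-1/θ) lies below the frailty-free value 1 - exp(-H) for the same H, so marginal and conditional penetrance differ.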
Affiliation(s)
- Yun-Hee Choi
- Department of Epidemiology and Biostatistics, Western University, London, ON, Canada
11
Lawless JF, Yilmaz YE. Semiparametric estimation in copula models for bivariate sequential survival times. Biom J 2011; 53:779-96. [PMID: 21887793 DOI: 10.1002/bimj.201000131] [Received: 06/22/2010] [Revised: 06/13/2011] [Accepted: 06/29/2011]
Abstract
Sequentially observed survival times are of interest in many studies, but analyzing such data with nonparametric or semiparametric methods is difficult. First, when the duration of follow-up is limited and the times for a given individual are not independent, induced dependent censoring arises for the second and subsequent survival times. Non-identifiability of the marginal survival distributions for second and later times is another issue, since those times are observable only if the preceding survival times for an individual are uncensored. In addition, in some studies a substantial proportion of individuals may never experience the first event. Fully parametric models can deal with these features, but robustness is a concern. We introduce a new approach to address these issues. We model the joint distribution of the successive survival times by using copula functions, and provide semiparametric estimation procedures in which copula parameters are estimated without parametric assumptions on the marginal distributions. This provides more robust estimates and checks on the fit of parametric models. The methodology is applied to a motivating example involving relapse and survival following colon cancer treatment.
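The Clayton family, chosen here only as an example, illustrates the copula construction. The joint survival of the first and second times is written as C(S1(t1), S2(t2)), which separates the marginal distributions from the dependence parameter θ. The Python sketch below uses hypothetical function names. It shows the model structure only, not the authors' semiparametric estimation procedure or their treatment of induced dependent censoring.

```python
def clayton_joint_survival(s1, s2, theta):
    """P(T1 > t1, T2 > t2) = C(S1(t1), S2(t2)) for a Clayton copula, theta > 0.

    s1 and s2 are the marginal survival probabilities S1(t1) and S2(t2) in (0, 1];
    larger theta means stronger positive dependence between the two times.
    """
    return (s1 ** -theta + s2 ** -theta - 1.0) ** (-1.0 / theta)

def clayton_kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula: theta / (theta + 2)."""
    return theta / (theta + 2.0)
```

With θ = 2 (Kendall's τ = 0.5), two margins each at 0.5 give a joint survival of 7^(-1/2), about 0.378. That exceeds the 0.25 implied by independence.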
Affiliation(s)
- Jerald F Lawless
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
12
Graber-Naidich A, Gorfine M, Malone KE, Hsu L. Missing genetic information in case-control family data with general semi-parametric shared frailty model. Lifetime Data Anal 2011; 17:175-194. [PMID: 21153764 PMCID: PMC3174530 DOI: 10.1007/s10985-010-9178-5] [Received: 06/25/2009] [Accepted: 06/15/2010]
Abstract
Case-control family data are now widely used to examine the role of gene-environment interactions in the etiology of complex diseases. In these types of studies, exposure levels are obtained retrospectively and, frequently, information on most risk factors of interest is available on the probands but not on their relatives. In this work we consider correlated failure time data arising from population-based case-control family studies with missing genotypes of relatives. We present a new method for estimating the age-dependent marginalized hazard function. The proposed technique has two major advantages: (1) it is based on the pseudo full likelihood function rather than a pseudo composite likelihood function, which usually suffers from substantial efficiency loss; (2) the cumulative baseline hazard function is estimated using a two-stage estimator instead of an iterative process. We assess the performance of the proposed methodology with simulation studies, and illustrate its utility on a real data example.
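For orientation, the Python sketch below shows the standard one-step Breslow estimator of the cumulative baseline hazard in a Cox model with one covariate, taking the coefficient β as known. It is a generic textbook estimator and a deliberate simplification. The paper's two-stage estimator targets a marginalized hazard in case-control family data with missing relatives' genotypes, which this sketch does not attempt. The function name is hypothetical.

```python
import math

def breslow_cumulative_hazard(times, events, covariates, beta, t):
    """Breslow estimate of the cumulative baseline hazard Lambda0(t).

    times: observed times; events: 1 = event, 0 = censored;
    covariates: one scalar covariate per subject; beta: its log hazard ratio.
    """
    total = 0.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_times:
        d = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        # Sum of relative risks exp(beta * z) over the risk set {j : t_j >= u}
        risk = sum(math.exp(beta * z) for tj, z in zip(times, covariates) if tj >= u)
        total += d / risk
    return total
```

With β = 0 it reduces to the Nelson-Aalen estimator. For example, three uncensored times 1, 2, and 3 give 1/3 + 1/2 + 1, about 1.833, at t = 3.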
Affiliation(s)
- Anna Graber-Naidich
- Faculty of Industrial Engineering and Management, Technion City, Haifa 32000, Israel
- Malka Gorfine
- Faculty of Industrial Engineering and Management, Technion City, Haifa 32000, Israel
- Kathleen E. Malone
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
- Li Hsu
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA