1. Goldstein L, Langholz B. Analysis and asymptotic theory for nested case-control designs under highly stratified proportional hazards models. Lifetime Data Anal 2023; 29:342-371. PMID: 36472759. DOI: 10.1007/s10985-022-09582-4.
Abstract
Nested case-control sampled event time data under a highly stratified proportional hazards model, in which the number of strata increases in proportion to the sample size, are described and analyzed. The data can be characterized as stratified sampling from the event time risk sets, and the analysis approach of Borgan et al. (Ann Stat 23:1749-1778, 1995) is adapted to accommodate both the stratification and case-control sampling from the stratified risk sets. Conditions for the consistency and asymptotic normality of the maximum partial likelihood estimator are provided, and the results are used to compare the efficiency of the stratified analysis to an unstratified analysis when the baseline hazards can be semiparametrically modeled in two special cases. Using the stratified sampling representation of the stratified analysis, the absolute risk estimation methods described by Borgan et al. (1995) for nested case-control data are extended to the stratified model. The methods are illustrated by a year-of-birth stratified analysis of radon exposure and lung cancer mortality in a cohort of uranium miners from the Colorado Plateau.
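The partial likelihood underlying such designs has a conditional-logistic form: each failure contributes the case's relative risk divided by the summed relative risks over the sampled risk set (case plus matched controls). A minimal numpy sketch of the standard unstratified nested case-control contribution (the classical Thomas form, not the stratified estimator developed in this paper):

```python
import numpy as np

def ncc_partial_loglik(beta, case_x, sampled_x):
    """Nested case-control partial log-likelihood (unstratified form):
    each failure contributes log[ exp(beta'x_case) / sum_j exp(beta'x_j) ],
    where j runs over the sampled risk set (case + matched controls).
    case_x: list of case covariate vectors; sampled_x: list of arrays,
    one per failure, whose rows are the sampled risk set (case included).
    """
    ll = 0.0
    for x_case, X_set in zip(case_x, sampled_x):
        eta = X_set @ beta                       # linear predictors, sampled set
        ll += x_case @ beta - np.logaddexp.reduce(eta)
    return ll
```

At beta = 0 each failure's contribution reduces to minus the log of the sampled-set size, which gives a quick sanity check on an implementation.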
Affiliation(s)
- Larry Goldstein
- Department of Mathematics, University of Southern California, Los Angeles, USA
- Bryan Langholz
- Department of Preventive Medicine, University of Southern California, Los Angeles, USA
2. Duan LL, Dunson DB. Bayesian Distance Clustering. J Mach Learn Res 2021; 22:224. PMID: 35782785. PMCID: PMC9245927.
Abstract
Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance- and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data.
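As a toy illustration of the "likelihood of pairwise distances" idea, a candidate partition can be scored by treating within-cluster distances as draws from a simple parametric model. The exponential model below is an illustrative stand-in, not the distance likelihood the authors actually use:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_loglik(D, labels, scale=1.0):
    """Toy within-cluster distance log-likelihood: model each pairwise
    distance inside a cluster as exponential(scale). Tighter clusters
    (smaller within-cluster distances) score higher."""
    ll = 0.0
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        if len(idx) < 2:
            continue  # singleton clusters contribute no pairs
        d = D[np.ix_(idx, idx)][np.triu_indices(len(idx), 1)]
        ll += np.sum(-np.log(scale) - d / scale)
    return ll
```

On two well-separated pairs of points, a labeling that respects the pairs scores strictly higher than one that splits them, which is the qualitative behavior a distance likelihood should deliver.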
Affiliation(s)
- Leo L Duan
- Department of Statistics, University of Florida, Gainesville, FL 32611, USA
- David B Dunson
- Department of Statistical Science, Duke University, Durham, NC 27708, USA
3. Henderson R, Mihaylova R, Oman P. A dual frailty model for lifetime analysis in maritime transportation. Lifetime Data Anal 2019; 25:739-756. PMID: 30783873. PMCID: PMC6776569. DOI: 10.1007/s10985-019-09463-3.
Abstract
We consider changes in ownership of commercial shipping vessels from an event history perspective. Each change in ownership can be influenced by the properties of the vessel itself, its age and history to date, the characteristics of both the seller and the buyer, and time-varying market conditions. Similar factors can affect the process of deciding when to scrap the vessel as no longer being economically viable. We consider a multi-state approach in which states are defined by the owning companies, a sale marks a transition, and scrapping of the vessel corresponds to moving to an absorbing state. We propose a dual frailty model that attempts to capture unexplained heterogeneity in the data, with one frailty term for the seller and one for the buyer. We describe a Markov chain Monte Carlo estimation procedure and verify its accuracy through simulations. We investigate the consequences of mistakenly ignoring frailty in these circumstances, comparing results with and without its inclusion.
Affiliation(s)
- Robin Henderson
- School of Mathematics, Statistics and Physics, Newcastle University, Newcastle, UK
- Paul Oman
- Department of Mathematics, Physics and Electrical Engineering, Northumbria University, Newcastle, UK
4.
Abstract
The varying-coefficient Cox model is flexible and useful for modeling the dynamic changes of regression coefficients in survival analysis. In this paper, we study feature screening for varying-coefficient Cox models in ultrahigh-dimensional covariates. The proposed screening procedure is based on the joint partial likelihood of all predictors, thus different from marginal screening procedures available in the literature. In order to carry out the new procedure, we propose an effective algorithm and establish its ascent property. We further prove that the proposed procedure possesses the sure screening property. That is, with probability tending to 1, the selected variable set includes the actual active predictors. We conducted simulations to evaluate the finite-sample performance of the proposed procedure and compared it with marginal screening procedures. A genomic data set is used for illustration purposes.
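For contrast with the joint partial-likelihood procedure described above, marginal screeners rank features one at a time; a common baseline is the Cox score statistic evaluated at β = 0, sketched below under the simplifying assumptions of untied event times and a constant (non-varying) coefficient:

```python
import numpy as np

def cox_score_screen(X, time, event, top_k):
    """Marginal Cox score screening at beta = 0: for each feature, the
    score is the sum over events of (x_i - mean of x over the risk set).
    Rank features by |score| and keep the top_k. Assumes no tied times.
    This is an illustrative marginal baseline, not the joint procedure
    of the paper."""
    order = np.argsort(-time)              # descending: risk set at t_i is rows 0..i
    Xo, eo = X[order], event[order]
    csum = np.cumsum(Xo, axis=0)
    counts = np.arange(1, len(time) + 1)[:, None]
    risk_means = csum / counts             # running means over each risk set
    scores = ((Xo - risk_means) * eo[:, None]).sum(axis=0)
    return np.argsort(-np.abs(scores))[:top_k]
```

With a single strongly predictive covariate, such a screener should place that feature near the top of the ranking; the joint procedure in the paper is designed for settings where marginal rankings like this one fail.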
Affiliation(s)
- Guangren Yang
- Department of Statistics, School of Economics, Jinan University, Guangzhou 510632, China
- Ling Zhang
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Runze Li
- Department of Statistics and the Methodology Center, The Pennsylvania State University, University Park, PA 16802, USA
- Yuan Huang
- Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA
5. Lin DY, Dai L, Cheng G, Sailer MO. On confidence intervals for the hazard ratio in randomized clinical trials. Biometrics 2016; 72:1098-1102. PMID: 27123760. DOI: 10.1111/biom.12528.
Abstract
The log-rank test is widely used to compare two survival distributions in a randomized clinical trial, while partial likelihood (Cox, 1975) is the method of choice for making inference about the hazard ratio under the Cox (1972) proportional hazards model. The Wald 95% confidence interval of the hazard ratio may include the null value of 1 when the p-value of the log-rank test is less than 0.05. Peto et al. (1977) provided an estimator for the hazard ratio based on the log-rank statistic; the corresponding 95% confidence interval excludes the null value of 1 if and only if the p-value of the log-rank test is less than 0.05. However, Peto's estimator is not consistent, and the corresponding confidence interval does not have correct coverage probability. In this article, we construct the confidence interval by inverting the score test under the (possibly stratified) Cox model, and we modify the variance estimator such that the resulting score test for the null hypothesis of no treatment difference is identical to the log-rank test in the possible presence of ties. Like Peto's method, the proposed confidence interval excludes the null value if and only if the log-rank test is significant. Unlike Peto's method, however, this interval has correct coverage probability. An added benefit of the proposed confidence interval is that it tends to be more accurate and narrower than the Wald confidence interval. We demonstrate the advantages of the proposed method through extensive simulation studies and a colon cancer study.
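Peto's construction makes the significance agreement explicit: with O, E, and V the usual log-rank summands, log HR ≈ (O − E)/V and the 95% CI is exp(((O − E) ± 1.96√V)/V), which excludes 1 exactly when |O − E|/√V > 1.96. A self-contained sketch (this reproduces Peto's interval for illustration; it is not the score-based interval the authors propose):

```python
import numpy as np

def logrank_and_peto_ci(time, event, group, z_crit=1.959964):
    """Log-rank O, E, V (with the hypergeometric tie correction) plus
    Peto's one-step hazard-ratio estimate log HR ~= (O - E)/V and the
    CI exp(((O - E) +/- z*sqrt(V))/V). By construction the interval
    excludes 1 exactly when |O - E|/sqrt(V) > z, i.e. exactly when
    the log-rank test is significant."""
    O = E = V = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        O += d1
        E += d * n1 / n
        if n > 1:
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    loghr = (O - E) / V
    lo = np.exp((O - E - z_crit * np.sqrt(V)) / V)
    hi = np.exp((O - E + z_crit * np.sqrt(V)) / V)
    z = (O - E) / np.sqrt(V)
    return loghr, (lo, hi), z
```

The exclusion/significance agreement holds for any data set, which is why Peto's interval never contradicts the log-rank p-value; the article's point is that, unlike Peto's interval, the score-based interval additionally has correct coverage.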
Affiliation(s)
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A
- Luyan Dai
- Boehringer Ingelheim Investment Co., Ltd., 1601 Nanjing Road West, Shanghai 200040, P.R. China
- Gang Cheng
- Boehringer Ingelheim Investment Co., Ltd., 1601 Nanjing Road West, Shanghai 200040, P.R. China
- Martin Oliver Sailer
- Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach an der Riss, Germany
6. Rivera C, Lumley T. Using the whole cohort in the analysis of countermatched samples. Biometrics 2015; 72:382-391. PMID: 26393818. DOI: 10.1111/biom.12419.
Abstract
We present a technique for using calibrated weights to incorporate whole-cohort information in the analysis of a countermatched sample. Following Samuelsen's approach for matched case-control sampling, we derive expressions for the marginal sampling probabilities, so that the data can be treated as an unequally-sampled case-cohort design. Pseudolikelihood estimating equations are used to find the estimates. The sampling weights can be calibrated, allowing all whole-cohort variables to be used in estimation; in contrast, the partial likelihood analysis makes use only of a single discrete surrogate for exposure. Using a survey-sampling approach rather than a martingale approach simplifies the theory; in particular, the sampling weights need not be a predictable process. Our simulation results show that pseudolikelihood estimation gives lower efficiency than partial likelihood estimation, but that the gain from calibration of weights can more than compensate for this loss. If there is a good surrogate for exposure, countermatched sampling still outperforms case-cohort and two-phase case-control sampling even when calibrated weights are used. Findings are illustrated with data from the National Wilms' Tumour Study and the Welsh nickel refinery workers study.
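Samuelsen's marginal inclusion probabilities, which this approach builds on, can be computed directly: a control at risk at a case's event time is sampled there with probability m/(n(t) − 1), and its overall inclusion probability is one minus the product of the complements across case times. A sketch for simple 1:m nested case-control sampling (the countermatched setting in the paper stratifies these probabilities by the exposure surrogate):

```python
import numpy as np

def samuelsen_weights(time, event, m):
    """Inverse marginal inclusion probabilities for 1:m nested
    case-control sampling (Samuelsen, 1997). Cases are always included
    (weight 1); a control's inclusion probability is
    1 - prod over case times t at which it is at risk of (1 - m/(n(t)-1)).
    Controls that can never be sampled get weight NaN."""
    n = len(time)
    p = np.ones(n)
    case_times = time[event == 1]
    for i in range(n):
        if event[i] == 1:
            continue
        miss = 1.0
        for t in case_times:
            n_risk = (time >= t).sum()
            if time[i] >= t and n_risk > 1:
                miss *= max(0.0, 1 - m / (n_risk - 1))
        p[i] = 1 - miss
    return 1.0 / np.where(p > 0, p, np.nan)
```

These weights turn the sample into an unequal-probability case-cohort-style design, which is the representation the calibration step then refines with whole-cohort variables.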
Affiliation(s)
- C Rivera
- Department of Statistics, University of Auckland, Auckland, NZ
- T Lumley
- Department of Statistics, University of Auckland, Auckland, NZ
7. Zhang F, Khalili A, Lin S. Optimum study design for detecting imprinting and maternal effects based on partial likelihood. Biometrics 2015; 72:95-105. PMID: 26288102. DOI: 10.1111/biom.12380.
Abstract
Despite spectacular advances in molecular genomic technologies over the past two decades, resources available for genomic studies remain limited, especially for family-based studies. Hence, it is important to consider an optimum study design that maximally utilizes limited resources to increase statistical power in family-based studies. A particular question of interest is whether it is more profitable to genotype siblings of probands or to recruit more independent families. Numerous studies have attempted to address this design issue for the simultaneous detection of imprinting and maternal effects, two important epigenetic factors in the study of complex diseases. The question is far from settled, however, mainly because results and recommendations in the literature are based on anecdotal evidence from limited simulation studies rather than on rigorous statistical analysis. In this article, we propose a systematic approach to study various designs based on a partial likelihood formulation. We derive the asymptotic properties and obtain formulas for computing the information contents of the designs under consideration. Our results show that, for a common disease, recruiting additional siblings is beneficial because both affected and unaffected individuals will be included. However, if a disease is rare, then any additional siblings recruited are most likely to be unaffected and thus contribute little additional information; in such cases, additional families are a better choice for a fixed amount of resources. Our work thus offers a practical strategy for investigators to select the optimum study design within a case-control family scheme before data collection.
Affiliation(s)
- Fangyuan Zhang
- Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, Ohio 43210, U.S.A
- Abbas Khalili
- Department of Mathematics and Statistics, McGill University, 805 Sherbrooke Street West, Montreal, Quebec H3A 0B9, Canada
- Shili Lin
- Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, Ohio 43210, U.S.A
8.
Abstract
A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identical data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated owing to the unknown gap times dependence structure. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach.
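A generic moving-block bootstrap, of the kind adapted above for partial-likelihood variance estimation, resamples overlapping blocks of the sequence to preserve serial dependence that an i.i.d. bootstrap would destroy. The version below is an illustrative generic implementation, not the authors' exact scheme:

```python
import numpy as np

def block_bootstrap(x, block_len, n_boot, stat=np.mean, seed=0):
    """Moving-block bootstrap for a stationary sequence: draw blocks of
    length block_len (all overlapping positions allowed) with
    replacement, concatenate them, truncate to the original length, and
    recompute `stat`. Returns the n_boot bootstrap replicates, whose
    spread estimates the sampling variability of `stat` under
    short-range dependence."""
    rng = np.random.default_rng(seed)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, block_len)
    k = int(np.ceil(n / block_len))      # blocks needed to cover length n
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(blocks), size=k)
        sample = np.concatenate(blocks[idx])[:n]
        reps[b] = stat(sample)
    return reps
```

The same resampling idea applies with `stat` replaced by the maximum partial likelihood estimator computed on the resampled gap-time sequence, which is the role blocking plays in the variance estimation described above.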
Affiliation(s)
- Feng-Chang Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A
- Young K Truong
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A
- Jason P Fine
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A