1
Gámiz ML, Mammen E, Martínez-Miranda MD, Nielsen JP. Missing link survival analysis with applications to available pandemic data. Comput Stat Data Anal 2021; 169:107405. [PMID: 34924652 PMCID: PMC8666881 DOI: 10.1016/j.csda.2021.107405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 11/25/2021] [Accepted: 11/27/2021] [Indexed: 11/18/2022]
Abstract
It is shown how to overcome a new missing data problem in survival analysis. Iterative nonparametric techniques are utilized and the missing data information is both estimated and used for further estimation in each iterative step. Theory is developed and a good finite sample performance is illustrated by simulations. The main motivation is an application to French data on the temporal development of the number of hospitalized Covid-19 patients.
Affiliation(s)
- María Luz Gámiz
- Department of Statistics and Operations Research, University of Granada, Spain
- Enno Mammen
- Institute of Applied Mathematics, Heidelberg University, Germany

2
Seaman SR, Presanis A, Jackson C. Estimating a time-to-event distribution from right-truncated data in an epidemic: A review of methods. Stat Methods Med Res 2021; 31:1641-1655. [PMID: 34931911 PMCID: PMC9465556 DOI: 10.1177/09622802211023955] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/24/2023]
Abstract
Time-to-event data are right-truncated if only individuals who have experienced
the event by a certain time can be included in the sample. For example, we may
be interested in estimating the distribution of time from onset of disease
symptoms to death and only have data on individuals who have died. This may be
the case, for example, at the beginning of an epidemic. Right truncation causes
the distribution of times to event in the sample to be biased towards shorter
times compared to the population distribution, and appropriate statistical
methods should be used to account for this bias. This article is a review of
such methods, particularly in the context of an infectious disease epidemic,
like COVID-19. We consider methods for estimating the marginal time-to-event
distribution, and compare their efficiencies. (Non-)identifiability of the
distribution is an important issue with right-truncated data, particularly at
the beginning of an epidemic, and this is discussed in detail. We also review
methods for estimating the effects of covariates on the time to event. An
illustration of the application of many of these methods is provided, using data
on individuals who had died with coronavirus disease by 5 April 2020.
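The bias toward shorter times described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the review. It assumes, purely for illustration, symptom onsets spread uniformly over a 30-day observation window and exponentially distributed onset-to-death delays with a mean of 20 days; the function name `simulate_right_truncation` and all parameter values are hypothetical.

```python
import random


def simulate_right_truncation(n=20000, mean_delay=20.0, window=30.0, seed=0):
    """Compare the population mean delay with the mean in a right-truncated sample.

    Assumptions (illustrative only): onset times are uniform on [0, window],
    delays are exponential with the given mean, and an individual is sampled
    only if the event has happened by the end of the window.
    """
    rng = random.Random(seed)
    all_delays = []
    observed_delays = []
    for _ in range(n):
        onset = rng.uniform(0.0, window)
        delay = rng.expovariate(1.0 / mean_delay)
        all_delays.append(delay)
        # Right truncation: the individual enters the sample only if the
        # event occurred before the analysis date.
        if onset + delay <= window:
            observed_delays.append(delay)
    population_mean = sum(all_delays) / len(all_delays)
    truncated_mean = sum(observed_delays) / len(observed_delays)
    return population_mean, truncated_mean


population_mean, truncated_mean = simulate_right_truncation()
```

Under these assumptions the mean delay in the truncated sample is shorter than the population mean, which is the bias that the reviewed methods are designed to correct.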
Affiliation(s)
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, UK
- Anne Presanis
- MRC Biostatistics Unit, University of Cambridge, UK

3
Pak D, Liu J, Ning J, Gómez G, Shen Y. Analyzing left-truncated and right-censored infectious disease cohort data with interval-censored infection onset. Stat Med 2020; 40:287-298. [PMID: 33086432 DOI: 10.1002/sim.8774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2020] [Revised: 08/18/2020] [Accepted: 09/26/2020] [Indexed: 11/10/2022]
Abstract
In an infectious disease cohort study, individuals who have been infected with a pathogen are often recruited for follow up. The period between infection and the onset of symptomatic disease, referred to as the incubation period, is of interest because of its importance for disease surveillance and control. However, the incubation period is often difficult to ascertain due to the uncertainty associated with asymptomatic infection onset time. An additional complication is that the observed infected subjects are likely to have longer incubation periods due to the prevalent sampling. In this article, we demonstrate how to estimate the distribution of the incubation period with the uncertain infection onset, subject to left-truncation and right-censoring. We employ a family of sufficiently general parametric models, the generalized odds-rate class of regression models, for the underlying incubation period and its correlation with covariates. In simulation studies, we assess the finite sample performance of the model fitting and hazard function estimation. The proposed method is illustrated on data from the HIV/AIDS study on injection drug users admitted to a detoxification program in Badalona, Spain.
Affiliation(s)
- Daewoo Pak
- Department of Information & Statistics, Yonsei University, Wonju, Korea; Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Jun Liu
- Department of Plastic Surgery, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Guadalupe Gómez
- Departament d'Estadística i Investigació Operativa and Barcelona Graduate School of Mathematics BGSMath, Universitat Politécnica de Catalunya, Barcelona, Spain
- Yu Shen
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

4
Vakulenko-Lagun B, Mandel M, Betensky RA. Inverse probability weighting methods for Cox regression with right-truncated data. Biometrics 2019; 76:484-495. [PMID: 31621059 DOI: 10.1111/biom.13162] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/30/2018] [Accepted: 10/07/2019] [Indexed: 11/28/2022]
Abstract
Right-truncated data arise when observations are ascertained retrospectively, and only subjects who experience the event of interest by the time of sampling are selected. Such a selection scheme, without adjustment, leads to biased estimation of covariate effects in the Cox proportional hazards model. The existing methods for fitting the Cox model to right-truncated data, which are based on the maximization of the likelihood or solving estimating equations with respect to both the baseline hazard function and the covariate effects, are numerically challenging. We consider two alternative simple methods based on inverse probability weighting (IPW) estimating equations, which allow consistent estimation of covariate effects under a positivity assumption and avoid estimation of baseline hazards. We discuss problems of identifiability and consistency that arise when positivity does not hold and show that although the partial tests for null effects based on these IPW methods can be used in some settings even in the absence of positivity, they are not valid in general. We propose adjusted estimating equations that incorporate the probability of observation when it is known from external sources, which results in consistent estimation. We compare the methods in simulations and apply them to the analyses of human immunodeficiency virus latency.
Affiliation(s)
- Bella Vakulenko-Lagun
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts; Department of Statistics, University of Haifa, Haifa, Israel
- Micha Mandel
- Department of Statistics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Rebecca A Betensky
- Department of Biostatistics, College of Global Public Health, New York, New York

5
Molecular evolution analysis of the human immunodeficiency virus type 1 envelope in simian/human immunodeficiency virus-infected macaques: implications for challenge dose selection. J Virol 2011; 85:10332-45. [PMID: 21795341 DOI: 10.1128/jvi.05290-11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022]
Abstract
Since the demonstration that almost 80% of human immunodeficiency virus type 1 (HIV-1) infections result from the transmission of a single variant from the donor, biological features similar to those of HIV mucosal transmission have been reported for macaques inoculated with simian immunodeficiency virus (SIV). Here we describe the early diversification events and the impact of challenge doses on viral kinetics and on the number of variants transmitted in macaques infected with the chimeric simian/human immunodeficiency virus SHIV(sf162p4). We show that there is a correlation between the dose administered and the number of variants transmitted and that certain inoculum variants are preferentially transmitted. This could provide insight into the viral determinants of transmission and could aid in vaccine development. Challenge through the mucosal route with high doses results in the transmission of multiple variants in all the animals. Such an unrealistic scenario could underestimate potential intervention measures. We thus propose the use of molecular evolution analysis to aid in the determination of challenge doses that better mimic the transmission dynamics seen in natural HIV-1 infection.

6
Wolkewitz M, Dettenkofer M, Bertz H, Schumacher M, Huebner J. Statistical epidemic modeling with hospital outbreak data. Stat Med 2008; 27:6522-31. [DOI: 10.1002/sim.3419] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]

7
Alioum A, Commenges D, Thiebaut R, Dabis F. A multistate approach for estimating the incidence of human immunodeficiency virus by using data from a prevalent cohort study. J R Stat Soc Ser C Appl Stat 2005. [DOI: 10.1111/j.1467-9876.2005.00514.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]

8
Abstract
Many statisticians have contributed to studies of the HIV epidemic and progression to AIDS. They have developed new statistical methodology, where needed, to address HIV-related issues. The transfer of methods from one area to another often involves a substantial delay. This paper points to methods that were developed in the HIV context and have either already found applications in other areas of medical research or have the potential for such applications, with the hope that this will promote a speedier transfer of the research methods. Among the new tools that HIV studies have placed firmly into the pool of statistical methods for medical research are the methods of back-calculation, methods for the analysis of retrospective ascertainment data and methods of analysis for the combined data from clinical trials and associated longitudinal studies. Notions that have been stimulated substantially are use of surrogate endpoints in clinical trials and screening blood products by the use of pooled serum samples. Research activity in many other areas has been boosted substantially through contributions motivated by HIV/AIDS studies. Noteworthy examples are analyses for doubly-censored lifetime data and methods for assessing vaccines for transmissible diseases.
Affiliation(s)
- N G Becker
- National Centre for Epidemiology and Population Health, Australian National University, Canberra, ACT 0200, Australia.

9
Cooley PC, Myers LE, Hamill DN. A meta-analysis of estimates of the AIDS incubation distribution. Eur J Epidemiol 1996; 12:229-35. [PMID: 8884188 DOI: 10.1007/bf00145410] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
Abstract
Information from 12 studies is combined to estimate the AIDS incubation distribution with greater precision than is possible from a single study. The analysis uses a hierarchy of parametric models based on a four-parameter generalized F distribution. This general model contains four standard two-parameter distributions as special cases: the Weibull, gamma, log-logistic and lognormal distributions. These four special cases subsume three distinct asymptotic hazard behaviors. As time increases beyond the median of approximately 10 years, the hazard can increase to infinity (Weibull), can plateau at some constant level (gamma), or can decrease to zero (log-logistic and lognormal). The Weibull, gamma and log-logistic distributions, which represent the three distinct asymptotic hazard behaviors, all fit the data as well as the generalized F distribution at the 25 percent significance level. Hence, we conclude that incubation data are still too limited to ascertain the specific hazard assumption that should be utilized in studies of the AIDS epidemic. Accordingly, efforts to model the AIDS epidemic (e.g., back-calculation approaches) should allow the incubation distribution to take several forms to adequately represent HIV estimation uncertainty. It is recommended that, at a minimum, the specific Weibull, gamma and log-logistic distributions estimated in this meta-analysis should all be used in modeling the AIDS epidemic, to reflect this uncertainty.
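The three asymptotic hazard behaviors named in this abstract can be checked numerically. The following is a minimal sketch, not the meta-analysis itself: the shape and scale values are hypothetical, the gamma case is restricted to an integer shape (an Erlang distribution) so that the survival function has a closed form, and the Weibull hazard grows without bound only when its shape parameter exceeds 1.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard: (k / s) * (t / s)^(k - 1); unbounded as t grows if k > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def erlang_hazard(t, shape, scale):
    """Gamma hazard for integer shape k; approaches the plateau 1 / scale.

    The common factor exp(-t / scale) in density and survival is cancelled
    to avoid underflow at large t.
    """
    x = t / scale
    density_part = x ** (shape - 1) / (scale * math.factorial(shape - 1))
    survival_part = sum(x ** i / math.factorial(i) for i in range(shape))
    return density_part / survival_part


def loglogistic_hazard(t, shape, scale):
    """Log-logistic hazard: (b / t) * z / (1 + z) with z = (t / s)^b; tends to 0."""
    z = (t / scale) ** shape
    return (shape / t) * z / (1.0 + z)


# Hypothetical parameters: shape 2, scale 10 (time measured in years).
for t in (10.0, 100.0, 1000.0):
    print(t,
          weibull_hazard(t, 2, 10.0),
          erlang_hazard(t, 2, 10.0),
          loglogistic_hazard(t, 2, 10.0))
```

With these values the Weibull hazard keeps rising, the gamma hazard levels off near 1/scale, and the log-logistic hazard falls toward zero, so the three families differ mainly in the tail, where the abstract notes the data are too limited to discriminate.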
Affiliation(s)
- P C Cooley
- Research Triangle Institute, Center for Computer Science, Research Triangle Park, North Carolina, USA

10
Paik MC, Begg MD, el-Sadr W, Gorman J, Stien Z. Difference in clinical implications of CD4 counts among HIV-infected homosexual men and injection drug using men and women. Stat Med 1995; 14:1889-900. [PMID: 8532982 DOI: 10.1002/sim.4780141705] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/31/2023]
Abstract
While the relationship between CD4 counts and clinical symptoms is well established among homosexual men, the same is not true for injection drug using men and women (IDUM and IDUW). In this paper we investigate whether CD4 counts have the same clinical implications for IDUM and IDUW as for homosexual men. We estimated the CD4 counts at which 50 per cent of the HIV-infected but AIDS-free population has AIDS related complex (ARC) based on three biannually measured CD4 counts. The analyses involve interval, right and left censored threshold data. We took the parametric approach, assuming that the threshold values for ARC arise from a family of distributions that includes symmetric, left or right skewed distributions, in which the logistic and extreme value distributions are embedded as special cases. The resulting estimates of median thresholds of CD4 counts for ARC were 249, 424 and 755 for homosexual men, IDUM, and IDUW, respectively. The results were robust with respect to the assumptions on the underlying distribution.
Affiliation(s)
- M C Paik
- Division of Biostatistics, School of Public Health, Columbia University, NY, NY, 10032, USA

11
Book Reviews. J Am Stat Assoc 1995. [DOI: 10.1080/01621459.1995.10476573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]

12

13
Abstract
Techniques for reconstructing plausible HIV incidence curves from AIDS incidence data are called methods of back-projection, or back-calculation. Approaches to back-projection tend to make the simplifying assumption that the quarterly HIV incidences are independent, which is not even approximately true. Here we investigate whether smoothed non-parametric back-projection based on this simplifying assumption gives sensible back-projections and appropriate measures of precision for these reconstructed HIV incidence curves. Simple models for HIV transmission are shown to have much greater variation than the corresponding non-homogeneous Poisson process arising from the independence assumption. Nevertheless, bearing in mind that the objective is to reconstruct the HIV epidemic curve for the current epidemic, it is argued that such back-projection does give sensible HIV curves. This conclusion is supported by a simulation study, which also finds that confidence intervals for the HIV incidences are wider for transmission data than those determined from independent Poisson data.
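The idea behind back-projection can be sketched as a discrete deconvolution: expected AIDS counts are a convolution of HIV incidence with the incubation distribution, and an EM-type update works backward from observed AIDS counts. The code below is a minimal sketch under the independent Poisson assumption that the abstract discusses. It omits the smoothing step of the smoothed non-parametric method, and the function names and numbers are hypothetical rather than taken from the paper.

```python
def expected_aids(hiv_incidence, incubation_pmf):
    """Forward model: mu[t] = sum over i of lambda[i] * f[t - i]."""
    periods = len(hiv_incidence)
    mu = [0.0] * periods
    for i, lam in enumerate(hiv_incidence):
        for d, f in enumerate(incubation_pmf):
            if i + d < periods:
                mu[i + d] += lam * f
    return mu


def em_step(hiv_incidence, aids_counts, incubation_pmf):
    """One unsmoothed EM update of the HIV incidence under a Poisson model."""
    periods = len(hiv_incidence)
    mu = expected_aids(hiv_incidence, incubation_pmf)
    updated = []
    for i, lam in enumerate(hiv_incidence):
        weighted = 0.0  # sum of y[t] * f[t - i] / mu[t] over observable t
        mass = 0.0      # probability of diagnosis within the observation window
        for d, f in enumerate(incubation_pmf):
            t = i + d
            if t < periods:
                mass += f
                if mu[t] > 0:
                    weighted += aids_counts[t] * f / mu[t]
        updated.append(lam * weighted / mass if mass > 0 else lam)
    return updated


# Hypothetical quarterly HIV incidence and incubation probabilities.
true_incidence = [10.0, 20.0, 30.0, 40.0]
incubation_pmf = [0.1, 0.2, 0.3, 0.4]
aids_counts = expected_aids(true_incidence, incubation_pmf)

estimate = [25.0] * 4  # flat starting guess
for _ in range(200):
    estimate = em_step(estimate, aids_counts, incubation_pmf)
```

When the AIDS counts equal their expected values, the true incidence is a fixed point of the update, which gives a quick sanity check on the implementation.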
Affiliation(s)
- N G Becker
- Department of Statistics, La Trobe University, Bundoora VIC, Australia

14
Dunlop DD, Tamhane AC, Chmiel JS, Phair JP. A model-based approach to estimate the AIDS-free time distribution in homosexual men using longitudinal data. J Biopharm Stat 1994; 4:129-46. [PMID: 7951270 DOI: 10.1080/10543409408835078] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
Abstract
A model-based approach is developed to estimate the distribution of time from seroconversion to diagnosis with acquired immunodeficiency syndrome (AIDS) as a function of selected time-dependent covariates. The approach is applied to longitudinal data collected over 4 years of follow-up from 450 men seropositive for the human immunodeficiency virus (90 AIDS cases) and 62 seroconverters (nine AIDS cases) participating in the Chicago part of the Multicenter AIDS Cohort Study. Because of the periodic nature of monitoring, the seroconversion time is interval-censored for seroconverters and left-censored for seroprevalent cohort members; the end-point is right-censored for 413 individuals. Since serological monitoring is not continuous but only at regularly scheduled visit times, a model for the discrete hazard rate (DHR) is proposed that is a generalized linear model that relates the DHR to the covariate history through the complementary log-log link. Classification trees are used for preliminary screening of covariates to identify predictors of AIDS that should be incorporated into the DHR model. The missing seroconversion times for all men are imputed 100 times to obtain 100 completed datasets from which the parameters of the DHR are then estimated using the maximum-likelihood method. The final DHR model includes the following infection progression (marker) variables: CD4%, hemoglobin, p24 antigen, and CD4% x p24 antigen interaction. Using this DHR model, the discrete survival distribution of AIDS-free time is estimated for the given population. The jackknife procedure is used to assess the precision of the estimated survival distribution.
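The link between a complementary log-log model for the discrete hazard and the resulting discrete survival distribution can be written out in a few lines. This is a minimal sketch: the linear predictor values below are hypothetical placeholders, not estimates from the study, and the covariate model, imputation and jackknife steps are left out.

```python
import math


def cloglog_hazard(eta):
    """Inverse complementary log-log link: h = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def discrete_survival(linear_predictors):
    """Survival after each visit interval: S_k = product over j <= k of (1 - h_j)."""
    survival = 1.0
    curve = []
    for eta in linear_predictors:
        survival *= 1.0 - cloglog_hazard(eta)
        curve.append(survival)
    return curve


# Hypothetical linear predictors for four successive visit intervals.
etas = [-3.0, -2.5, -2.0, -2.0]
survival_curve = discrete_survival(etas)
```

Because 1 - h equals exp(-exp(eta)) under this link, the survival probability after k intervals is exp of minus the sum of exp(eta) over those intervals, which is one reason the link is convenient for grouped survival times.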
Affiliation(s)
- D D Dunlop
- Center for Health Services and Policy Research, Northwestern University, Evanston, Illinois 60208

15
Chiarotti F, Palombi M, Schinaia N, Ghirardini A, Bellocco R. Median time from seroconversion to AIDS in Italian HIV-positive haemophiliacs: different parametric estimates. Stat Med 1994; 13:163-75. [PMID: 8122052 DOI: 10.1002/sim.4780130207] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 01/28/2023]
Abstract
The purpose of this study was to estimate the median incubation time between human immunodeficiency virus (HIV) infection and onset of acquired immunodeficiency syndrome (AIDS), using three parametric models and six estimates of seroconversion time. Study subjects were 732 HIV-positive haemophiliacs enrolled in the Italian Registry of patients with congenital coagulation disorders. Seroconversion time was estimated for each subject according to six different criteria, based on three distributions of seroconversion (uniform, uniform on three sub-intervals and truncated Weibull) and two indices synthesizing each distribution (median and median of three random values). The estimated seroconversion times were subsequently used as starting points in the analysis of incubation. This was performed applying Kaplan-Meier non-parametric survival analysis, and fitting to incubation data three probability density functions, representing three different situations with respect to the hazard of developing AIDS following seroconversion (namely Weibull (WE), generalized exponential (GE) and log-logistic (LL)). The cumulative incidence over an 8-year period ranged from 14.9 to 17.8 per cent when applying the Kaplan-Meier method, from 14.1 to 17.2 per cent when using the WE function, from 14.5 to 17.3 per cent when using the GE function and from 14.4 to 17.3 per cent when using the LL function, depending on the estimate of seroconversion time used. Similarly, the median incubation times ranged from 12.6 to 15.0 years with the WE function, from 14.0 to 16.5 years with the GE function, and from 13.4 to 16.1 years with the LL function. The presence of a bound on the increase of the hazard function seems to affect the incubation more strongly than the eventual decrease following the attainment of the maximum risk. This may be due to the decrease in the hazard beginning when most of the seropositive subjects have already developed AIDS.
Affiliation(s)
- F Chiarotti
- Istituto Superiore di Sanità-Laboratorio di Fisiopatologia di Organo e di Sistema, Roma, Italy

16
Abstract
Today, we know a lot about HIV and AIDS, yet too little to be able to stop the pandemic by a vaccination or by medication. Until that day is at hand, the only means of controlling the pandemic is by information: information about the diffusion of the virus, about risk behaviour, about safe sex, and about social responsibility. Educating people must be based on concise clinical experience, an up-to-date picture of the epidemiological situation, and reliable forecasts about the future course of the epidemic. Producing reliable forecasts about the HIV epidemic has proved to be a more complicated task than expected. The proven, reliable, and in most cases very useful epidemiological models have been far from successful when applied to the HIV epidemic. This is mostly due to the lack of reliable data depicting the true HIV seroprevalence. Instead of being handicapped by the problems of inadequate data, geographical modelling of the HIV epidemic is able to rely on its theoretical understanding and good knowledge about the spatial organization and the functioning of societies to make its forecasts. Due to its flexibility of approach, geographical modelling can adjust itself to less accurate data, thus providing an interim forecasting instrument until epidemiological modelling becomes successful. This paper presents the results of an analysis of the HIV epidemic in Finland based on cartographic analysis of municipal data and the use of a simple growth model.
Affiliation(s)
- M Löytönen
- Department of Geography, University of Helsinki, Finland

17

18
Hser YI. Population estimates of intravenous drug users and HIV infection in Los Angeles County. The International Journal of the Addictions 1993; 28:695-709. [PMID: 8349387 DOI: 10.3109/10826089309062167] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023]
Abstract
This study applies multiple-capture models to drug treatment data and the synthetic estimation method to arrestee data to provide estimates of the number of IVDUs in Los Angeles County in 1989. Based on the 5% HIV-prevalence rate currently found in IVDUs in Los Angeles, it is estimated that there could be as many as 9,500 HIV-infected IVDUs. The estimates of IVDUs are generally higher than those obtained by back-calculation methods which often undercount IVDU-related AIDS cases and do not consider deaths for causes other than HIV infection.
Affiliation(s)
- Y I Hser
- Neuropsychiatric Institute, University of California, Los Angeles

19
Abstract
Doubly censored data arise in some cohort studies of the AIDS incubation period because the time of infection may be known only up to an interval defined by two successive screening tests for HIV antibody. A simple analytic approach is to impute the infection time by the mid-point of the interval and then apply standard survival techniques for right censored data. The objective of this paper is to investigate the statistical properties of such a mid-point imputation approach. We investigated the asymptotic bias of the Kaplan-Meier estimate, coverage probabilities of associated confidence intervals, bias in hazard ratio, and the size of the logrank test. We show that the statistical properties of mid-point imputation depend strongly on the underlying distributions of infection times and the incubation periods, and the width of the interval between screening tests. In the absence of treatment, the median incubation period of HIV infection is approximately 10 years, and we conclude that, for this situation, mid-point imputation is a reasonable procedure for interval widths of 2 years or less.
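The mid-point imputation procedure examined in this abstract can be sketched in two steps: replace each interval-censored infection time by the interval mid-point, then apply a standard Kaplan-Meier estimator to the resulting right-censored incubation times. The code below is an illustrative sketch only; the function names and the three example subjects are hypothetical.

```python
def midpoint_incubation(test_intervals, end_times):
    """Incubation time = end time minus the mid-point of the infection interval.

    test_intervals holds (last negative test, first positive test) times;
    end_times holds the AIDS diagnosis time or the censoring time.
    """
    return [end - (lo + hi) / 2.0
            for (lo, hi), end in zip(test_intervals, end_times)]


def kaplan_meier(durations, events):
    """Kaplan-Meier curve as (time, survival) pairs at each distinct event time.

    events[i] is 1 if AIDS was observed and 0 if the subject was right-censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects sharing the same duration.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Three hypothetical subjects, times in years since study start.
intervals = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
end_times = [9.0, 12.0, 10.0]
events = [1, 0, 1]

durations = midpoint_incubation(intervals, end_times)
km_curve = kaplan_meier(durations, events)
```

The screening intervals in this example are 2 years wide, the upper end of the range for which the abstract concludes that mid-point imputation is reasonable.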
Affiliation(s)
- C G Law
- Westat Inc., Rockville, MD 20850

20
Abstract
A three-stage stochastic epidemic model extending the so-called classical epidemic process to one that includes time-dependent transition probabilities is described, and a solution to the appropriate set of forward differential-difference equations is given. When an individual can move from being a susceptible to one infected with the HIV virus to one diagnosed as having AIDS, we can use this general model to describe an AIDS epidemic process. We obtain expressions for the mean and variance of the number of AIDS cases for some special cases. By comparing these with actual data, it is suggested that, for some categories of cases (in particular, children), this model might be a plausible model to describe the underlying mechanism of the AIDS epidemic.
Affiliation(s)
- L Billard
- Department of Statistics, University of Georgia, Athens 30602

21
Farrington CP. Subacute sclerosing panencephalitis in England and Wales: transient effects and risk estimates. Stat Med 1991; 10:1733-44. [PMID: 1792467 DOI: 10.1002/sim.4780101111] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.6] [Indexed: 12/28/2022]
Abstract
Data on cases of subacute sclerosing panencephalitis (SSPE) with onset in 1970-89 show a marked increase in the interval from initial measles attack to onset of SSPE by year of onset. It is shown that this increase is a transient effect resulting from the decline in measles incidence following the introduction of mass vaccination in 1968. The risk of SSPE after measles is estimated to be 4.0 × 10^-5 (18 × 10^-5 after measles under one year of age) and the risk after vaccination to be no greater than 0.14 × 10^-5, thus confirming that vaccination is effective in reducing the incidence of SSPE. The paper makes use of methods developed to estimate the incubation period of AIDS.
Affiliation(s)
- C P Farrington
- PHLS Communicable Disease Surveillance Centre, London, U.K