1
Liu J, Duan Z, Hu X, Zhong J, Yin Y. Detracking Autoencoding Conditional Generative Adversarial Network: Improved Generative Adversarial Network Method for Tabular Missing Value Imputation. ENTROPY (BASEL, SWITZERLAND) 2024; 26:402. [PMID: 38785651 PMCID: PMC11120050 DOI: 10.3390/e26050402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/20/2024] [Accepted: 04/21/2024] [Indexed: 05/25/2024]
Abstract
Due to various reasons, such as limitations in data collection and interruptions in network transmission, gathered data often contain missing values. Existing state-of-the-art generative adversarial imputation methods face three main issues: limited applicability, neglect of latent categorical information that could reflect relationships among samples, and an inability to balance local and global information. We propose a novel generative adversarial model named DTAE-CGAN that incorporates detracking autoencoding and conditional labels to address these issues. This enhances the network's ability to learn inter-sample correlations and makes full use of all data information in incomplete datasets, rather than learning random noise. We conducted experiments on six real datasets of varying sizes, comparing our method with four classic imputation baselines. The results demonstrate that our proposed model consistently exhibited superior imputation accuracy.
Affiliation(s)
- Jingrui Liu
  - College of Computer Science, Chongqing University, Chongqing 400044, China
  - Chongqing University-University of Cincinnati Joint Co-op Institute, Chongqing University, Chongqing 400044, China
- Zixin Duan
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Xinkai Hu
  - College of Computer Science, Chongqing University, Chongqing 400044, China
- Jingxuan Zhong
  - College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing 400044, China
- Yunfei Yin
  - College of Computer Science, Chongqing University, Chongqing 400044, China
2
Chen H, Heitjan DF. Analysis of local sensitivity to nonignorability with missing outcomes and predictors. Biometrics 2022; 78:1342-1352. [PMID: 34297356 DOI: 10.1111/biom.13532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2020] [Revised: 07/03/2021] [Accepted: 07/15/2021] [Indexed: 12/30/2022]
Abstract
The ISNI (index of sensitivity to local nonignorability) method quantifies local sensitivity of parametric inferences to nonignorable missingness in an outcome variable. Here we extend ISNI to the situations where both outcomes and predictors can be missing and where the missingness mechanism can be either parametric or semi-parametric. We define the quantity MinNI (minimum nonignorability) to be an approximation to the norm of the smallest value of the transformed nonignorability that gives a nonnegligible displacement of the estimate of the parameter of interest. We illustrate our method in a complete data set from which we synthetically delete observations according to various patterns. We then apply the method to real-data examples involving the normal linear model and conditional logistic regression.
Affiliation(s)
- Heng Chen
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Daniel F Heitjan
  - Department of Statistical Science, Southern Methodist University, Dallas, Texas, USA
  - Department of Population & Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, USA
3
Xia M, Akakpo RM. A Bayesian approach to simultaneous adjustment of misclassification and missingness in categorical covariates. Stat Methods Med Res 2022; 31:1449-1469. [PMID: 35473473 DOI: 10.1177/09622802221094941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
This study considers concurrent adjustment of misclassification and missingness in categorical covariates in regression models. Under various misclassification and missingness mechanisms, we derive a general mixture regression structure for regression models that can incorporate multiple surrogates of categorical covariates that are subject to misclassification and missingness. In simulation studies, we demonstrate that including observations with missingness and/or multiple surrogates of the covariate helps alleviate the efficiency loss caused by misclassification. In addition, we study the efficacy of misclassification adjustment when the number of categories increases for the covariate of interest. Using data from the Longitudinal Studies of HIV-Associated Lung Infections and Complications, we perform simultaneous adjustment of misclassification and missingness in the self-reported cocaine and heroin use variable when assessing its association with lung density measures.
Affiliation(s)
- Michelle Xia
  - Department of Statistics and Actuarial Science, Northern Illinois University, Dekalb, IL 60115, USA
- Rexford M Akakpo
  - Department of Statistics and Actuarial Science, Northern Illinois University, Dekalb, IL 60115, USA
4
On the Relation between Prediction and Imputation Accuracy under Missing Covariates. ENTROPY 2022; 24:e24030386. [PMID: 35327897 PMCID: PMC8947649 DOI: 10.3390/e24030386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2022] [Revised: 02/23/2022] [Accepted: 02/23/2022] [Indexed: 02/01/2023]
Abstract
Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has realized an increasing trend towards the use of modern Machine-Learning algorithms for imputation. This originates from their capability of showing favorable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine-Learning-based methods for both imputation and prediction are used. We see that even a slight decrease in imputation accuracy can seriously affect the prediction accuracy. In addition, we explore imputation performance when using statistical inference procedures in prediction settings, such as the coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning repository and an extensive simulation study.
5
Shi J, Qin G, Zhu H, Zhu Z. Communication-efficient distributed M-estimation with missing data. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107251] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
6
Diao G, Qin J. New semiparametric regression method with applications in selection‐biased sampling and missing data problems. CAN J STAT 2021. [DOI: 10.1002/cjs.11615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Guoqing Diao
  - Department of Biostatistics and Bioinformatics, George Washington University, Washington, DC, U.S.A.
- Jing Qin
  - National Institute of Allergy and Infectious Diseases, Bethesda, MD, U.S.A.
7
Darba S, Safaei N, Mahboub–Ahari A, Nosratnejad S, Alizadeh G, Ameri H, Yousefi M. Direct and Indirect Costs Associated with Coronary Artery (Heart) Disease in Tabriz, Iran. Risk Manag Healthc Policy 2020; 13:969-978. [PMID: 32801971 PMCID: PMC7406327 DOI: 10.2147/rmhp.s261612] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/19/2020] [Accepted: 07/14/2020] [Indexed: 12/18/2022] Open
Abstract
PURPOSE: Cardiovascular diseases (CVDs) are the major causes of mortality worldwide. This study was conducted to evaluate the direct and indirect costs of coronary artery disease (CAD) in Iran. PATIENTS AND METHODS: This is a prevalence-based cost-of-illness (COI) study that estimates the direct and indirect costs of CAD. The study was conducted over a six-month period from April to September 2017. Patients were recruited from Madani hospital in Tabriz, Iran. A total of 379 patients were investigated from a societal perspective. Direct costs were estimated using the bottom-up costing approach and indirect costs were estimated using the Human Capital (HC) approach. A generalized linear regression model was used to explore the relation between total cost and socio-demographic variables. The total annual mean cost was compared to Gross Domestic Product (GDP) per capita, which was reported in the form of the Purchasing Power Parity (PPP) index. To deal with uncertainty, one-way sensitivity analysis was performed. RESULTS: Total costs per patient in one year were estimated to be IRR 63452290.17 ($PPP 7736.19) (95% confidence interval: 58191511.73-68713068.60), the biggest part of which is related to direct medical costs, at IRR 33884019.53 per year ($PPP 4131.18) (54%). Direct non-medical costs were estimated at IRR 1655936.68 ($PPP 201.89) per patient (2%) and indirect costs were estimated at IRR 27912333.97 per patient ($PPP 3403.11) (44%), of which 62% is related to patients' work absenteeism. CONCLUSION: This study estimates the direct (56%) and indirect (44%) costs associated with CAD. The study explores the essential drivers of the costs and provides the magnitude of the burden in terms of the share of GDP. The outcomes can be used in priority setting, in particular for cost-benefit analysis, and in adopting new policies regarding insurance coverage and equity issues.
Affiliation(s)
- Shahla Darba
  - Department of Health Economics, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Naser Safaei
  - Madani Heart Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Alireza Mahboub–Ahari
  - Department of Health Economics, Iranian Center of Excellence in Health Services Management, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Shirin Nosratnejad
  - Department of Health Economics, Iranian Center of Excellence in Health Services Management, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Gisoo Alizadeh
  - Department of Health Policy and Management, Iranian Center of Excellence in Health Management, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Hosein Ameri
  - Health Policy and Management Research, Department of Health Care Management, School of Public Health, Shahid Sadoughi University of Medical Science, Yazd, Iran
- Mahmood Yousefi
  - Department of Health Economics, Iranian Center of Excellence in Health Services Management, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
8
Rana S. Analysis of longitudinal ordinal data using semi-parametric mixed model under missingness. COMMUN STAT-SIMUL C 2020. [DOI: 10.1080/03610918.2020.1778031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Subrata Rana
  - Department of Statistics, Krishnagar Govt. College, Krishnagar, India
9
Lüdtke O, Robitzsch A, West SG. Analysis of Interactions and Nonlinear Effects with Missing Data: A Factored Regression Modeling Approach Using Maximum Likelihood Estimation. MULTIVARIATE BEHAVIORAL RESEARCH 2020; 55:361-381. [PMID: 31366241 DOI: 10.1080/00273171.2019.1640104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a multivariate normal distribution, which is also the default in many statistical software packages. This distribution will in general be misspecified if predictors with missing data have nonlinear effects (e.g., x²) or are included in interaction terms (e.g., x·z). In the present article, we introduce a factored regression modeling approach for estimating regression models with missing data that is based on maximum likelihood estimation. In this approach, the model likelihood is factorized into a part that is due to the model of interest and a part that is due to the model for the incomplete predictors. In three simulation studies, we showed that the factored regression modeling approach produced valid estimates of interaction and nonlinear effects in regression models with missing values on categorical or continuous predictor variables under a broad range of conditions. We developed the R package mdmb, which facilitates a user-friendly application of the factored regression modeling approach, and present a real-data example that illustrates the flexibility of the software.
Affiliation(s)
- Oliver Lüdtke
  - Leibniz Institute for Science and Mathematics Education
  - Centre for International Student Assessment
- Alexander Robitzsch
  - Leibniz Institute for Science and Mathematics Education
  - Centre for International Student Assessment
10
Ezedike C, Ohazurike E, Emetumah FC, Ajaegbu OO. Health-seeking behavior and waste management practices among women in major urban markets in Owerri, Nigeria. AIMS Public Health 2020; 7:169-187. [PMID: 32258198 PMCID: PMC7109532 DOI: 10.3934/publichealth.2020015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/09/2019] [Accepted: 02/05/2020] [Indexed: 11/18/2022] Open
Abstract
Behavioral patterns on seeking health are pertinent in terms of how waste is managed. However, an informal approach towards waste management has led to poor environmental attitudes and pernicious health consequences for many Nigerians. Despite a plethora of scientific investigation on waste management, there has been a paucity of information on health-seeking behavior and waste management practices among market women, hence the need for this research. The study aimed at assessing the health-seeking behavioral pattern of women traders on waste management in major urban markets in Owerri, Nigeria by identifying the extent of their commitment to sustainable waste management practices, investigating health-seeking behaviors that influence their attitude towards waste management and measuring the prevalence of waste-related diseases among them. Data collection for the study involved a cross-sectional survey of 739 women trading in three major urban markets in Owerri, in line with the study's aim. Results show that motivation to manage waste for disease control was effectively predicted by type of trading item (Omnibus Test: χ² = 13.871, df = 3, p-value = 0.003); Cochran-Armitage tests of trend show that there is no statistically significant linear trend between the proportions of understanding the 3Rs and the rankings for methods of seeking health; understanding the 3Rs was not determined by health-seeking method, as most methods were discordant with motivation to manage waste (4 out of 5 health-seeking methods had negative Goodman & Kruskal's G values); PCA on the prevalence of waste-related diseases had a two-component structure which followed acute and chronic dimensions; vegetable and plastics comprised the highest waste streams, with plastics being the most reused waste type, while government is mainly responsible for waste disposal. The study recommends a knowledge transfer approach in entrenching sustainable waste management practices.
Affiliation(s)
- Cyprian Ezedike
  - Department of Geography & Environmental Management, Imo State University, Owerri, Imo State, Nigeria
- Eudora Ohazurike
  - Department of Political Science, Imo State University, Owerri, Imo State, Nigeria
- Faisal C Emetumah
  - Department of Geography & Environmental Management, Imo State University, Owerri, Imo State, Nigeria
11
Balan RM, Jankovic D. Asymptotic Theory for Longitudinal Data with Missing Responses Adjusted by Inverse Probability Weights. MATHEMATICAL METHODS OF STATISTICS 2019. [DOI: 10.3103/s1066530719020017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
12
Diallo AO, Diop A, Dupuy JF. Estimation in zero-inflated binomial regression with missing covariates. STATISTICS-ABINGDON 2019. [DOI: 10.1080/02331888.2019.1619741] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Affiliation(s)
- Alpha Oumar Diallo
  - LERSTAD, CEA-MITIC, Gaston Berger University, Saint Louis, Senegal
  - Univ Rennes, INSA Rennes, CNRS, IRMAR - UMR 6625, F-35000 Rennes, France
- Aliou Diop
  - LERSTAD, CEA-MITIC, Gaston Berger University, Saint Louis, Senegal
13
Sun B, Liu L, Miao W, Wirth K, Robins J, Tchetgen Tchetgen EJ. Semiparametric Estimation with Data Missing Not at Random Using an Instrumental Variable. Stat Sin 2018; 28:1965-1983. [PMID: 33335381 PMCID: PMC7743916 DOI: 10.5705/ss.202016.0324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Missing data occur frequently in empirical studies in health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables, the missing data mechanism still depends on the unobserved outcome. In such settings, identification is generally not possible without imposing additional assumptions. Identification is sometimes possible, however, if an instrumental variable (IV) is observed for all subjects which satisfies the exclusion restriction that the IV affects the missingness process without directly influencing the outcome. In this paper, we provide necessary and sufficient conditions for nonparametric identification of the full data distribution under MNAR with the aid of an IV. In addition, we give sufficient identification conditions that are more straightforward to verify in practice. For inference, we focus on estimation of a population outcome mean, for which we develop a suite of semiparametric estimators that extend methods previously developed for data missing at random. Specifically, we propose inverse probability weighted estimation, outcome regression-based estimation and doubly robust estimation of the mean of an outcome subject to MNAR. For illustration, the methods are used to account for selection bias induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer characteristics such as gender, age and years of experience as IVs.
Affiliation(s)
- BaoLuo Sun
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Lan Liu
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Wang Miao
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
  - Beijing International Center for Mathematical Research, Peking University
- Kathleen Wirth
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health
  - Departments of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health
- James Robins
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Eric J Tchetgen Tchetgen
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health
14
Tchetgen EJT, Wang L, Sun B. Discrete Choice Models for Nonmonotone Nonignorable Missing Data: Identification and Inference. Stat Sin 2018; 28:2069-2088. [PMID: 33994754 DOI: 10.5705/ss.202016.0325] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Nonmonotone missing data arise routinely in empirical studies of social and health sciences, and when ignored, can induce selection bias and loss of efficiency. In practice, it is common to account for nonresponse under a missing-at-random assumption which although convenient, is rarely appropriate when nonresponse is nonmonotone. Likelihood and Bayesian missing data methodologies often require specification of a parametric model for the full data law, thus a priori ruling out any prospect for semiparametric inference. In this paper, we propose an all-purpose approach which delivers semiparametric inferences when missing data are nonmonotone and not at random. The approach is based on a discrete choice model (DCM) as a means to generate a large class of nonmonotone nonresponse mechanisms that are nonignorable. Sufficient conditions for nonparametric identification are given, and a general framework for fully parametric and semiparametric inference under an arbitrary DCM is proposed. Special consideration is given to the case of logit discrete choice nonresponse model (LDCM) for which we describe generalizations of inverse-probability weighting, pattern-mixture estimation, doubly robust estimation and multiply robust estimation.
Affiliation(s)
- Linbo Wang
  - Department of Biostatistics, Harvard University
- BaoLuo Sun
  - Department of Biostatistics, Harvard University
15
Sun B, Tchetgen Tchetgen EJ. On Inverse Probability Weighting for Nonmonotone Missing at Random Data. J Am Stat Assoc 2017; 113:369-379. [PMID: 30034062 DOI: 10.1080/01621459.2016.1256814] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/16/2022]
Abstract
The development of coherent missing data models to account for nonmonotone missing at random (MAR) data by inverse probability weighting (IPW) remains to date largely unresolved. As a consequence, IPW has essentially been restricted for use only in monotone missing data settings. We propose a class of models for nonmonotone missing data mechanisms that spans the MAR model, while allowing the underlying full data law to remain unrestricted. For parametric specifications within the proposed class, we introduce an unconstrained maximum likelihood estimator for estimating the missing data probabilities which can be easily implemented using existing software. To circumvent potential convergence issues with this procedure, we also introduce a Bayesian constrained approach to estimate the missing data process which is guaranteed to yield inferences that respect all model restrictions. The efficiency of the standard IPW estimator is improved by incorporating information from incomplete cases through an augmented estimating equation which is optimal within a large class of estimating equations. We investigate the finite-sample properties of the proposed estimators in a simulation study and illustrate the new methodology in an application evaluating key correlates of preterm delivery for infants born to HIV infected mothers in Botswana, Africa.
Affiliation(s)
- BaoLuo Sun
  - Department of Biostatistics, Harvard School of Public Health
16
Gupta VK, Grover G. Multiple imputation for gamma outcome variable using generalized linear model. J STAT COMPUT SIM 2017. [DOI: 10.1080/00949655.2017.1300904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
Affiliation(s)
- Vinay K. Gupta
  - Department of Statistics, University of Delhi, Delhi, India
- Gurprit Grover
  - Department of Statistics, University of Delhi, Delhi, India
17
Poleto FZ, Paulino CD, Singer JM, Molenberghs G. Semi-parametric Bayesian analysis of binary responses with a continuous covariate subject to non-random missingness. STAT MODEL 2014. [DOI: 10.1177/1471082x14549290] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
Abstract
Missingness in explanatory variables requires a model for the covariates even if the interest lies only in a model for the outcomes given the covariates. An incorrect specification of the models for the covariates or for the missingness mechanism may lead to biased inferences for the parameters of interest. Previously published articles either use semi-/non-parametric flexible distributions for the covariates and identify the model via a missing at random assumption, or employ parametric distributions for the covariates and allow a more general non-random missingness mechanism. We consider the analysis of binary responses, combining a missing not at random mechanism with a non-parametric model based on a Dirichlet process mixture for the continuous covariates. We illustrate the proposal with simulations and the analysis of a dataset.
Affiliation(s)
- Frederico Z Poleto
  - Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
- Carlos Daniel Paulino
  - Instituto Superior Técnico, Universidade de Lisboa (and CEAUL-FCUL), Av. Rovisco Pais, Lisboa, Portugal
- Julio M Singer
  - Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
- Geert Molenberghs
  - I-BioStat, Universiteit Hasselt, Diepenbeek, Belgium, and Katholieke Universiteit Leuven, Leuven, Belgium
18
Rana S, Roy S, Das K. On analyzing ordinal data when responses and covariates are both missing at random. Stat Methods Med Res 2013; 25:1564-78. [PMID: 23804969 DOI: 10.1177/0962280213492063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
Abstract
On many occasions, particularly in biomedical studies, data are unavailable for some responses and covariates. This leads to biased inference in the analysis when a substantial proportion of responses or a covariate or both are missing. Except in a few situations, methods for missing data have earlier been considered either for missing responses or for missing covariates, but comparatively little attention has been directed to accounting for both missing responses and missing covariates, which is partly attributable to complexity in modeling and computation. This seems to be important, as the precise impact of substantial missing data depends on the association between the two missing data processes as well. The real difficulty arises when the responses are ordinal by nature. We develop a joint model to take into account simultaneously the association between the ordinal response variable and covariates and also that between the missing data indicators. Such a complex model has been analyzed here by using the Markov chain Monte Carlo approach and also by the Monte Carlo relative likelihood approach. Their performance in estimating the model parameters in finite samples has been examined. We illustrate the application of these two methods using data from an orthodontic study. Analysis of such data provides some interesting information on human habit.
Affiliation(s)
- Subrata Rana
  - Department of Statistics, University of Calcutta, Kolkata, India
- Surupa Roy
  - Department of Statistics, St. Xavier's College, Kolkata, India
- Kalyan Das
  - Department of Statistics, University of Calcutta, Kolkata, India
19
Escarela G, Ruiz-de-Chavez J, Castillo-Morales A. Addressing missing covariates for the regression analysis of competing risks: Prognostic modelling for triaging patients diagnosed with prostate cancer. Stat Methods Med Res 2013; 25:1579-95. [PMID: 23804968 DOI: 10.1177/0962280213492406] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Competing risks arise in medical research when subjects are exposed to various types or causes of death. Data from large cohort studies usually exhibit subsets of regressors that are missing for some study subjects. Furthermore, such studies often give rise to censored data. In this article, a carefully formulated likelihood-based technique for the regression analysis of right-censored competing risks data when two of the covariates are discrete and partially missing is developed. The approach envisaged here comprises two models: one describes the covariate effects on both long-term incidence and conditional latencies for each cause of death, whilst the other deals with the observation process by which the covariates are missing. The former is formulated with a well-established mixture model and the latter is characterised by copula-based bivariate probability functions for both the missing covariates and the missing data mechanism. The resulting formulation lends itself to the empirical assessment of non-ignorability by performing sensitivity analyses using models with and without a non-ignorable component. The methods are illustrated on a 20-year follow-up involving a prostate cancer cohort from the National Cancer Institute's Surveillance, Epidemiology, and End Results program.
Affiliation(s)
- Gabriel Escarela
  - Departamento de Matemáticas, Universidad Autónoma Metropolitana - Iztapalapa, Mexico City, Mexico
- Juan Ruiz-de-Chavez
  - Departamento de Matemáticas, Universidad Autónoma Metropolitana - Iztapalapa, Mexico City, Mexico
- Alberto Castillo-Morales
  - Departamento de Matemáticas, Universidad Autónoma Metropolitana - Iztapalapa, Mexico City, Mexico
20
Chen B, Zhou XH. A latent-variable marginal method for multi-level incomplete binary data. Stat Med 2012; 31:3211-22. [PMID: 22733392 PMCID: PMC3631603 DOI: 10.1002/sim.5394] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/16/2011] [Accepted: 03/13/2012] [Indexed: 12/27/2022]
Abstract
Incomplete multi-level data arise commonly in many clinical trials and observational studies. Because of multi-level variations in this type of data, appropriate data analysis should take these variations into account. A random effects model can allow for the multi-level variations by assuming random effects at each level, but the computation is intensive because high-dimensional integrations are often involved in fitting models. Marginal methods such as the inverse probability weighted generalized estimating equations can involve simple estimation computation, but it is hard to specify the working correlation matrix for multi-level data. In this paper, we introduce a latent variable method to deal with incomplete multi-level data when the missing mechanism is missing at random, which fills the gap between the random effects model and marginal models. Latent variable models are built for both the response and missing data processes to incorporate the variations that arise at each level. Simulation studies demonstrate that this method performs well in various situations. We apply the proposed method to an Alzheimer's disease study.
Affiliation(s)
- Baojiang Chen
- Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE 68198, U.S.A.
|
21
|
Abstract
We review the class of inverse probability weighting (IPW) approaches for the analysis of missing data under various missing data patterns and mechanisms. The IPW methods rely on the intuitive idea of creating a pseudo-population of weighted copies of the complete cases to remove selection bias introduced by the missing data. However, different weighting approaches are required depending on the missing data pattern and mechanism. We begin with a uniform missing data pattern (i.e. a scalar missing indicator indicating whether or not the full data is observed) to motivate the approach. We then generalise to more complex settings. Our goal is to provide a conceptual overview of existing IPW approaches and illustrate the connections and differences among these approaches.
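As a rough illustration of the pseudo-population idea in the simplest setting mentioned above (a scalar missing indicator), the following Python sketch weights each complete case by the inverse of its estimated probability of being observed. It is not taken from the reviewed paper: the function name `ipw_mean`, the toy records, and the choice to estimate the observation probability within levels of a single fully observed covariate are all assumptions made for illustration.

```python
from collections import defaultdict


def ipw_mean(records):
    """Estimate E[Y] from (x, y, observed) records; y is None when missing.

    Assumption: missingness depends only on the fully observed covariate x,
    and P(observed | x) is estimated by the observed fraction within each x.
    """
    n_total = defaultdict(int)
    n_obs = defaultdict(int)
    for x, _, observed in records:
        n_total[x] += 1
        n_obs[x] += int(observed)

    num = 0.0
    den = 0.0
    for x, y, observed in records:
        if observed:
            weight = n_total[x] / n_obs[x]  # 1 / estimated P(observed | x)
            num += weight * y
            den += weight
    return num / den  # normalised (Hajek-type) weighted mean


# Hypothetical toy data: y = 1 when x = 0 (always observed),
# y = 3 when x = 1 (observed for only half of the subjects).
records = (
    [(0, 1.0, True)] * 4
    + [(1, 3.0, True)] * 2
    + [(1, None, False)] * 2
)

complete_cases = [y for _, y, observed in records if observed]
cc_mean = sum(complete_cases) / len(complete_cases)  # biased towards x = 0
weighted_mean = ipw_mean(records)                    # reweights x = 1 cases
```

In this toy example the full-data mean is 2; the complete-case mean underestimates it because subjects with `x = 1` are under-represented, while the weighted estimate counts each observed `x = 1` subject twice and recovers it.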
Affiliation(s)
- Lingling Li
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA.
|
22
|
Chen B, Zhou XH. Doubly robust estimates for binary longitudinal data analysis with missing response and missing covariates. Biometrics 2011; 67:830-42. [PMID: 21281272 DOI: 10.1111/j.1541-0420.2010.01541.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/09/2023]
Abstract
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.
Affiliation(s)
- Baojiang Chen
- Department of Biostatistics, College of Public Health, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA.
|
23
|
Hemming K, Hutton JL, Maguire MG, Marson AG. Meta-regression with partial information on summary trial or patient characteristics. Stat Med 2010; 29:1312-24. [PMID: 20087842 DOI: 10.1002/sim.3848] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
We present a model for meta-regression in the presence of missing information on some of the study level covariates, obtaining inferences using Bayesian methods. In practice, when confronted with missing covariate data in a meta-regression, it is common to carry out a complete case or available case analysis. We propose to use the full observed data, modelling the joint density as a factorization of a meta-regression model and a conditional factorization of the density for the covariates. With the inclusion of several covariates, inter-relations between these covariates are modelled. Under this joint likelihood-based approach, it is shown that the lesser assumption of the covariates being Missing At Random is imposed, instead of the more usual Missing Completely At Random (MCAR) assumption. The model is easily programmable in WinBUGS, and we examine, through the analysis of two real data sets, sensitivity and robustness of results to the MCAR assumption.
Affiliation(s)
- K Hemming
- Department of Public Health, Epidemiology and Biostatistics, University of Birmingham, UK.
|
24
|
Xu S, Hu Z. Generalized linear model for interval mapping of quantitative trait loci. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2010; 121:47-63. [PMID: 20180093 PMCID: PMC2871098 DOI: 10.1007/s00122-010-1290-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/02/2009] [Accepted: 02/01/2010] [Indexed: 05/23/2023]
Abstract
We developed a generalized linear model of QTL mapping for discrete traits in line crossing experiments. Parameter estimation was achieved using two different algorithms, a mixture model-based EM (expectation-maximization) algorithm and a GEE (generalized estimating equation) algorithm under a heterogeneous residual variance model. The methods were developed using ordinal data, binary data, binomial data and Poisson data as examples. Applications of the methods to simulated as well as real data are presented. The two different algorithms were compared in the data analyses. In most situations, the two algorithms were indistinguishable, but when large QTL are located in large marker intervals, the mixture model-based EM algorithm can fail to converge to the correct solutions. Both algorithms were coded in C++ and interfaced with SAS as a user-defined SAS procedure called PROC QTL.
Affiliation(s)
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.
|
25
|
Hill KD, LoGiudice D, Lautenschlager NT, Said CM, Dodd KJ, Suttanon P. Effectiveness of balance training exercise in people with mild to moderate severity Alzheimer's disease: protocol for a randomised trial. BMC Geriatr 2009; 9:29. [PMID: 19607686 PMCID: PMC2722658 DOI: 10.1186/1471-2318-9-29] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 05/31/2009] [Accepted: 07/16/2009] [Indexed: 01/23/2023] Open
Abstract
Background Balance dysfunction and falls are common problems in later stages of dementia. Exercise is a well-established intervention to reduce falls in cognitively intact older people, although there is limited randomised trial evidence of outcomes in people with dementia. The primary objective of this study is to evaluate whether a home-based balance exercise programme improves balance performance in people with mild to moderate severity Alzheimer's disease. Methods/design Two hundred and fourteen community dwelling participants with mild to moderate severity Alzheimer's disease will be recruited for the randomised controlled trial. A series of laboratory and clinical measures will be used to evaluate balance and mobility performance at baseline. Participants will then be randomized to receive either a balance training home exercise programme (intervention group) from a physiotherapist, or an education, information and support programme from an occupational therapist (control group). Both groups will have six home visits in the six months following baseline assessment, as well as phone support. All participants will be re-assessed at the completion of the programme (after six months), and again in a further six months to evaluate sustainability of outcomes. The primary outcome measures will be the Limits of Stability (a force platform measure of balance) and the Step Test (a clinical measure of balance). Secondary outcomes include other balance and mobility measures, number of falls and falls risk measures, cognitive and behavioural measures, and carer burden and quality of life measures. Assessors will be blind to group allocation. Longitudinal change in balance performance will be evaluated in a sub-study, in which the first 64 participants of the control group with mild to moderate severity Alzheimer's disease, and 64 age and gender matched healthy participants will be re-assessed on all measures at initial assessment, and then at 6, 12, 18 and 24 months. 
Discussion By introducing a balance programme at an early stage of the dementia pathway, when participants are more likely capable of safe and active participation in balance training, there is potential that balance performance will be improved as dementia progresses, which may reduce the high falls risk at this later stage. If successful, this approach has the potential for widespread application through community based services for people with mild to moderate severity Alzheimer's disease. Trial registration The protocol for this study is registered with the Australian New Zealand Clinical Trials Registry (ACTRN12608000040369).
Affiliation(s)
- Keith D Hill
- Musculoskeletal Research Centre, Faculty of Health Sciences, La Trobe University, Bundoora, Victoria, 3086 Australia.
|
26
|
Horton NJ, Roberts K, Ryan L, Suglia SF, Wright RJ. A maximum likelihood latent variable regression model for multiple informants. Stat Med 2009; 27:4992-5004. [PMID: 18613227 DOI: 10.1002/sim.3324] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
Studies pertaining to childhood psychopathology often incorporate information from multiple sources (or informants). For example, measurement of some factor of particular interest might be collected from parents, teachers as well as the children being studied. We propose a latent variable modeling framework to incorporate multiple informant predictor data. Several related models are presented, and likelihood ratio tests are introduced to formally compare fit. The incorporation of partially observed subjects is addressed under a variety of missing data mechanisms. The methods are motivated by and applied to a study of the association of chronic exposure to violence on asthma in children.
Affiliation(s)
- Nicholas J Horton
- Department of Mathematics and Statistics, Smith College, Northampton, MA 01063, USA.
|
27
|
Chen Q, Ibrahim JG, Chen MH, Senchaudhuri P. Theory and Inference for Regression Models with Missing Responses and Covariates. J MULTIVARIATE ANAL 2008; 99:1302-1331. [PMID: 19169388 DOI: 10.1016/j.jmva.2007.08.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 10/22/2022]
Abstract
In this paper, we carry out an in-depth theoretical investigation for inference with missing response and covariate data for general regression models. We assume that the missing data are Missing at Random (MAR) or Missing Completely at Random (MCAR) throughout. Previous theoretical investigations in the literature have focused only on missing covariates or missing responses, but not both. Here, we consider theoretical properties of the estimates under three different estimation settings: complete case analysis (CC), a complete response analysis (CR) that involves an analysis of those subjects with only completely observed responses, and the all case analysis (AC), which is an analysis based on all of the cases. Under each scenario, we derive general expressions for the likelihood and devise estimation schemes based on the EM algorithm. We carry out a theoretical investigation of the three estimation methods in the normal linear model and analytically characterize the loss of information for each method, as well as derive and compare the asymptotic variances for each method assuming the missing data are MAR or MCAR. In addition, a theoretical investigation of bias for the CC method is also carried out. A simulation study and real dataset are given to illustrate the methodology.
Affiliation(s)
- Qingxia Chen
- Qingxia Chen is Assistant Professor, Department of Biostatistics, Vanderbilt University, Nashville, TN 37232. Joseph G. Ibrahim is Professor, Department of Biostatistics, University of North Carolina, McGavran-Greenberg Hall, Chapel Hill, NC 27599. Ming-Hui Chen is Professor, Department of Statistics, University of Connecticut, 215 Glenbrook Road, U-4120, Storrs, CT 06269-4120. Pralay Senchaudhuri is Director of Cytel Software Corporation, Cambridge, MA 02139.
|
28
|
Morara M, Ryan L, Houseman A, Strauss W. Optimal design for epidemiological studies subject to designed missingness. LIFETIME DATA ANALYSIS 2007; 13:583-605. [PMID: 18080755 DOI: 10.1007/s10985-007-9068-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 10/12/2007] [Accepted: 10/25/2007] [Indexed: 05/25/2023]
Abstract
In large epidemiological studies, budgetary or logistical constraints will typically preclude study investigators from measuring all exposures, covariates and outcomes of interest on all study subjects. We develop a flexible theoretical framework that incorporates a number of familiar designs such as case control and cohort studies, as well as multistage sampling designs. Our framework also allows for designed missingness and includes the option for outcome dependent designs. Our formulation is based on maximum likelihood and generalizes well known results for inference with missing data to the multistage setting. A variety of techniques are applied to streamline the computation of the Hessian matrix for these designs, facilitating the development of an efficient software tool to implement a wide variety of designs.
Affiliation(s)
- Michele Morara
- Battelle Memorial Institute, 505 King Avenue, Columbus, OH 43201, USA
|
29
|
Robust detection and verification of linear relationships to generate metabolic networks using estimates of technical errors. BMC Bioinformatics 2007; 8:162. [PMID: 17517139 PMCID: PMC1894643 DOI: 10.1186/1471-2105-8-162] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/25/2006] [Accepted: 05/21/2007] [Indexed: 11/30/2022] Open
Abstract
Background The size and magnitude of the metabolome, the ratio between individual metabolites and the response of metabolic networks are controlled by multiple cellular factors. A tight control over metabolite ratios will be reflected by a linear relationship of pairs of metabolites due to the flexibility of metabolic pathways. Hence, unbiased detection and validation of linear metabolic variance can be interpreted in terms of biological control. For robust analyses, criteria for rejecting or accepting linearities need to be developed despite technical measurement errors. The entirety of all pairwise linear metabolic relationships then yields insights into the network of cellular regulation. Results Bayes' law was applied for detecting linearities that are validated by explaining the residuals by the degree of technical measurement errors. Test statistics were developed and the algorithm was tested on simulated data using 3–150 samples and 0–100% technical error. Under the null hypothesis of the existence of a linear relationship, type I errors remained below 5% for data sets consisting of more than four samples, whereas the type II error rate rose quickly with increasing technical errors. Conversely, a filter was developed to balance the error rates in the opposite direction. A minimum of 20 biological replicates is recommended if technical errors remain below 20% relative standard deviation and if thresholds for false error rates are acceptable at less than 5%. The algorithm was proven to be robust against outliers, unlike Pearson's correlations. Conclusion The algorithm facilitates finding linear relationships in complex datasets, which is radically different from estimating linearity parameters from given linear relationships. Without the filter, it provides high sensitivity and fair specificity. If the filter is activated, high specificity but only fair sensitivity is yielded. Total error rates are more favorable with deactivated filters, and hence, metabolomic networks should be generated without the filter. In addition, Bayesian likelihoods facilitate the detection of multiple linear dependencies between two variables. This property of the algorithm enables its use as a discovery tool and to generate novel hypotheses of the existence of otherwise hidden biological factors.
|
30
|
Litman HJ, Horton NJ, Hernández B, Laird NM. Incorporating missingness for estimation of marginal regression models with multiple source predictors. Stat Med 2007; 26:1055-68. [PMID: 16755531 PMCID: PMC1808330 DOI: 10.1002/sim.2593] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
Multiple informant data refers to information obtained from different individuals or sources used to measure the same construct; for example, researchers might collect information regarding child psychopathology from the child's teacher and the child's parent. Frequently, studies with multiple informants have incomplete observations; in some cases the missingness of informants is substantial. We introduce a Maximum Likelihood (ML) technique to fit models with multiple informants as predictors that permits missingness in the predictors as well as the response. We provide closed form solutions when possible and analytically compare the ML technique to the existing Generalized Estimating Equations (GEE) approach. We demonstrate that the ML approach can be used to compare the effect of the informants on response without standardizing the data. Simulations incorporating missingness show that ML is more efficient than the existing GEE method. In the presence of MCAR missing data, we find through a simulation study that the ML approach is robust to a relatively extreme departure from the normality assumption. We implement both methods in a study investigating the association between physical activity and obesity with activity measured using multiple informants (children and their mothers).
Affiliation(s)
- Heather J Litman
- New England Research Institutes, 9 Galen St, Watertown, MA 02472, USA.
|
31
|
Horton NJ, Kleinman KP. Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. AM STAT 2007; 61:79-90. [PMID: 17401454 PMCID: PMC1839993 DOI: 10.1198/000313007x172556] [Citation(s) in RCA: 376] [Impact Index Per Article: 22.1] [Indexed: 11/21/2022]
Abstract
Missing data are a recurring problem that can cause bias or lead to inefficient analyses. Development of statistical methods to address missingness has been actively pursued in recent years, including imputation, likelihood and weighting approaches. Each approach is more complicated when there are many patterns of missing values, or when both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available. We review these routines in the context of a motivating example from a large health services research dataset. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible to incorporate partially observed values, and these methods should be utilized in practice.
Affiliation(s)
| | - Ken P. Kleinman
- Department of Ambulatory Care and Prevention, Harvard Medical School and Harvard Pilgrim Health Care, Boston, MA
|
32
|
Siemes C, Visser LE, Coebergh JWW, Splinter TAW, Witteman JCM, Uitterlinden AG, Hofman A, Pols HAP, Stricker BHC. C-reactive protein levels, variation in the C-reactive protein gene, and cancer risk: the Rotterdam Study. J Clin Oncol 2006; 24:5216-22. [PMID: 17114654 DOI: 10.1200/jco.2006.07.1381] [Citation(s) in RCA: 275] [Impact Index Per Article: 15.3] [Indexed: 11/20/2022] Open
Abstract
PURPOSE It remains unclear if inflammation itself may induce cancer, if inflammation is a result of tumor growth, or a combination of both exists. The aim of this study was to examine whether C-reactive protein (CRP) levels and CRP gene variations were associated with an altered risk of colorectal, lung, breast, or prostate cancer. PATIENTS AND METHODS A total of 7,017 participants age ≥ 55 years from the Rotterdam Study were eligible for analyses. Mean follow-up time was 10.2 years. High-sensitivity CRP measurements were performed to identify additional values of 0.2 to 1.0 mg/L compared with standard procedures. Genotypes of the CRP gene were determined with an allelic discrimination assay. RESULTS High levels (> 3 mg/L) of CRP were associated with an increased risk of incident cancer (hazard ratio, 1.4; 95% CI, 1.1 to 1.7) compared with persons with low levels (< 1 mg/L), even after a potential latent period of 5 years was introduced. Although CRP seems to affect several cancer sites, the association was strongest for lung cancer (hazard ratio, 2.8; 95% CI, 1.6 to 4.9). A CRP single nucleotide polymorphism associated with decreased CRP levels was associated with an increased lung cancer risk of 2.6 (95% CI, 1.6 to 4.4) in homozygous carriers. CONCLUSION Baseline CRP levels seem to be a biomarker of chronic inflammation preceding lung cancer, even after subtracting a 5-year latent period. Furthermore, CRP gene variation associated with low CRP blood levels was relatively common in patients with lung cancer. Both chronic inflammation and impaired defense mechanisms resulting in chronic inflammation might explain these results.
Affiliation(s)
- Claire Siemes
- Department of Epidemiology & Biostatistics, Erasmus University Medical Center, Rotterdam, The Netherlands
|
33
|
Staatz CE, Byrne C, Thomson AH. Population pharmacokinetic modelling of gentamicin and vancomycin in patients with unstable renal function following cardiothoracic surgery. Br J Clin Pharmacol 2006; 61:164-76. [PMID: 16433871 PMCID: PMC1885003 DOI: 10.1111/j.1365-2125.2005.02547.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022] Open
Abstract
AIMS To describe the population pharmacokinetics of gentamicin and vancomycin in cardiothoracic surgery patients with unstable renal function. METHODS Data collected during routine care were analyzed using NONMEM. Linear relationships between creatinine clearance (CL(Cr)) and drug clearance (CL) were identified, and two approaches to modelling changing CL(Cr) were examined. The first included baseline (BCOV) and difference from baseline (DCOV) effects and the second allowed the influence of CL(Cr) to vary between individuals. Final model predictive performance was evaluated using independent data. The data sets were then combined and parameters re-estimated. RESULTS Model building was performed using data from 96 (gentamicin) and 102 (vancomycin) patients, aged 17-87 years. CL(Cr) ranged from 9 to 172 ml min⁻¹ and changes varied from -76 to 58 ml min⁻¹ (gentamicin) and -86 to 93 ml min⁻¹ (vancomycin). Inclusion of BCOV and DCOV improved the fit of the gentamicin data but had little effect on that for vancomycin. Inclusion of interindividual variability (IIV) in the influence of CL(Cr) resulted in a poorly characterized model for gentamicin and had no effect on vancomycin modelling. No bias was seen in population compared with individual CL estimates in independent data from 39 (gentamicin) and 37 (vancomycin) patients. Mean (95% CI) differences were 4% (-3, 11%) and 2% (-2, 6%), respectively. Final estimates were: CL(Gent) (l h⁻¹) = 2.81 × (1 + 0.015 × (BCOV(CLCr) - BCOV(CLCr Median)) + 0.0174 × DCOV(CLCr)); CL(Vanc) (l h⁻¹) = 2.97 × (1 + 0.0205 × (CL(Cr) - CL(Cr Median))). IIV in CL was 27% for both drugs. CONCLUSIONS A parameter describing individual changes in CL(Cr) with time improves population pharmacokinetic modelling of gentamicin but not vancomycin in clinically unstable patients.
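To make the final estimates quoted in the abstract above concrete, the following Python sketch evaluates the two clearance equations for a single patient. The coefficients are those stated in the abstract; the function names, the argument names, and in particular the median creatinine clearance of 70 ml/min used in the example are hypothetical, since the abstract does not report the median.

```python
def gentamicin_clearance(bcov_clcr, dcov_clcr, bcov_clcr_median):
    """Typical gentamicin clearance in l/h.

    bcov_clcr: baseline creatinine clearance (ml/min)
    dcov_clcr: change in creatinine clearance from baseline (ml/min)
    bcov_clcr_median: population median baseline value (ml/min); not given
        in the abstract, so the caller must supply it.
    """
    return 2.81 * (
        1
        + 0.015 * (bcov_clcr - bcov_clcr_median)
        + 0.0174 * dcov_clcr
    )


def vancomycin_clearance(clcr, clcr_median):
    """Typical vancomycin clearance in l/h for a creatinine clearance in ml/min."""
    return 2.97 * (1 + 0.0205 * (clcr - clcr_median))


# Hypothetical patient: baseline at an assumed median of 70 ml/min,
# then a fall of 20 ml/min after surgery.
ASSUMED_MEDIAN = 70.0  # illustrative value only, not from the paper
cl_gent = gentamicin_clearance(70.0, -20.0, ASSUMED_MEDIAN)
cl_vanc = vancomycin_clearance(90.0, ASSUMED_MEDIAN)
```

These are population-typical values only; the reported 27% interindividual variability in CL is not represented in this sketch.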
Affiliation(s)
- Christine E Staatz
- Pharmacy Department, Western Infirmary, North Glasgow University Hospitals, NHS, Glasgow G11 6NT, UK
|
34
|
Litman HJ, Horton NJ, Murphy JM, Laird NM. Marginal regression models with a time to event outcome and discrete multiple source predictors. LIFETIME DATA ANALYSIS 2006; 12:249-65. [PMID: 17021951 PMCID: PMC1851698 DOI: 10.1007/s10985-006-9013-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2005] [Accepted: 05/15/2006] [Indexed: 05/12/2023]
Abstract
Information from multiple informants is frequently used to assess psychopathology. We consider marginal regression models with multiple informants as discrete predictors and a time to event outcome. We fit these models to data from the Stirling County Study; specifically, the models predict mortality from self report of psychiatric disorders and also predict mortality from physician report of psychiatric disorders. Previously, Horton et al. found little relationship between self and physician reports of psychopathology, but that the relationship of self report of psychopathology with mortality was similar to that of physician report of psychopathology with mortality. Generalized estimating equations (GEE) have been used to fit marginal models with multiple informant covariates; here we develop a maximum likelihood (ML) approach and show how it relates to the GEE approach. In a simple setting using a saturated model, the ML approach can be constructed to provide estimates that match those found using GEE. We extend the ML technique to consider multiple informant predictors with missingness and compare the method to using inverse probability weighted (IPW) GEE. Our simulation study illustrates that IPW GEE loses little efficiency compared with ML in the presence of monotone missingness. Our example data has non-monotone missingness; in this case, ML offers a modest decrease in variance compared with IPW GEE, particularly for estimating covariates in the marginal models. In more general settings, e.g., categorical predictors and piecewise exponential models, the likelihood parameters from the ML technique do not have the same interpretation as the GEE. Thus, the GEE is recommended to fit marginal models for its flexibility, ease of interpretation and comparable efficiency to ML in the presence of missing data.
Affiliation(s)
- Heather J Litman
- New England Research Institutes, 9 Galen Street, Watertown, MA 02472, USA.
|
35
|
White IR, Thompson SG. Adjusting for partially missing baseline measurements in randomized trials. Stat Med 2005; 24:993-1007. [PMID: 15570623 DOI: 10.1002/sim.1981] [Citation(s) in RCA: 252] [Impact Index Per Article: 13.3] [Indexed: 11/12/2022]
Abstract
Adjustment for baseline variables in a randomized trial can increase power to detect a treatment effect. However, when baseline data are partly missing, analysis of complete cases is inefficient. We consider various possible improvements in the case of normally distributed baseline and outcome variables. Joint modelling of baseline and outcome is the most efficient method. Mean imputation is an excellent alternative, subject to three conditions. Firstly, if baseline and outcome are correlated more than about 0.6 then weighting should be used to allow for the greater information from complete cases. Secondly, imputation should be carried out in a deterministic way, using other baseline variables if possible, but not using randomized arm or outcome. Thirdly, if baselines are not missing completely at random, then a dummy variable for missingness should be included as a covariate (the missing indicator method). The methods are illustrated in a randomized trial in community psychiatry.
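A minimal Python sketch of two of the ingredients named in the abstract above, deterministic mean imputation of a baseline and the missing indicator covariate, is given below. It is an illustration under simplifying assumptions rather than the authors' procedure: the function name `mean_impute_with_indicator` and the toy values are hypothetical, the imputed value is the plain mean of the observed baselines pooled over arms (so neither randomized arm nor outcome is used), and the weighting step for strongly correlated baseline and outcome is not shown.

```python
def mean_impute_with_indicator(baseline):
    """Return (imputed_baseline, missing_indicator) for a list with None gaps.

    Missing baselines are replaced by the mean of the observed baselines,
    and a 0/1 dummy records which values were imputed so it can be entered
    as an extra covariate in the adjusted analysis.
    """
    observed = [value for value in baseline if value is not None]
    if not observed:
        raise ValueError("at least one observed baseline is required")
    mean = sum(observed) / len(observed)
    imputed = [mean if value is None else value for value in baseline]
    indicator = [1 if value is None else 0 for value in baseline]
    return imputed, indicator


# Hypothetical baseline measurements for four trial participants.
baseline = [10.0, None, 14.0, None]
imputed, indicator = mean_impute_with_indicator(baseline)
```

Both returned lists would then enter the outcome regression alongside the treatment arm.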
Affiliation(s)
- Ian R White
- MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, U.K.
|
36
|
Abstract
We present a general regression model that accounts for both linkage and linkage disequilibrium (LD) when analyzing nuclear family data. The method does not require LD to exist in order to evaluate linkage, but if LD does exist, the power to detect linkage can increase due to improved information on linkage phase. The proposed method is general, allowing for a variety of traits (e.g., binary affection status, categorical and quantitative phenotypes), affecteds only analyses, and covariates. Covariates can be useful to assess heterogeneity of linkage and LD, as well as gene-environment interactions. Other advantages of our methods are that: LD parameters are not defined without linkage, so that population stratification cannot bias the analyses; a combined test for linkage and LD can be used to test for linkage; given the existence of linkage, an adjusted LD test useful for fine-mapping can be constructed; covariate effects can be flexibly modeled; and families containing a single child and families containing multiple offspring can be combined for a single analysis (capitalizing on the LD information provided by single-child families and the combined linkage and LD information provided by multiple offspring). The basic features of the regression model are presented, as well as discussions of potential applications and critical statistical issues.
Affiliation(s)
- D J Schaid
- Department of Health Sciences Research, Mayo Clinic/Foundation, Rochester, Minnesota 55905, USA.
|