1
Zhao L, Chen T, Novitsky V, Wang R. Joint penalized spline modeling of multivariate longitudinal data, with application to HIV-1 RNA load levels and CD4 cell counts. Biometrics 2020; 77:1061-1074. [PMID: 32683682 DOI: 10.1111/biom.13339] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/09/2019] [Revised: 06/21/2020] [Accepted: 07/09/2020] [Indexed: 12/01/2022]
Abstract
Motivated by the need to jointly model the longitudinal trajectories of HIV viral load levels and CD4 counts during the primary infection stage, we propose a joint penalized spline modeling approach that can be used to model the repeated measurements from multiple biomarkers of various types (e.g., continuous, binary) simultaneously. This approach allows for flexible trajectories for each marker, accounts for potentially time-varying correlation between markers, and is robust to misspecification of knots. Despite these advantages, the application of multivariate penalized spline models, especially when biomarkers may be of different data types, has been limited, in part because of their seeming complexity of implementation. To overcome this, we describe a procedure that transforms the multivariate setting to the univariate one, and then makes use of the generalized linear mixed effect model representation of a penalized spline model to facilitate its implementation with standard statistical software. We performed simulation studies to evaluate the validity and efficiency of jointly modeling correlated, longitudinally measured biomarkers relative to the univariate modeling approach. We applied this modeling approach to longitudinal HIV-1 RNA load and CD4 count data from Southern African cohorts to estimate features of the joint distributions such as the correlation and the proportion of subjects with high viral load levels and high CD4 cell counts over time.
Affiliation(s)
- Lihui Zhao
- Department of Preventive Medicine, Northwestern University, Chicago, Illinois
- Tom Chen
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts
- Vladimir Novitsky
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Rui Wang
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
2
Biswas J, Das K. A Bayesian quantile regression approach to multivariate semi-continuous longitudinal data. Comput Stat 2020. [DOI: 10.1007/s00180-020-01002-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
3
Ji K, Dubin JA. A semiparametric stochastic mixed effects model for bivariate cyclic longitudinal data. Can J Stat 2020. [DOI: 10.1002/cjs.11543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Kexin Ji
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Joel A. Dubin
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
4
A Selective Overview of Skew-Elliptical and Related Distributions and of Their Applications. Symmetry (Basel) 2020. [DOI: 10.3390/sym12010118] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 11/17/2022]
Abstract
Within the context of flexible parametric families of distributions, much work has been dedicated in recent years to the theme of skew-symmetric distributions, or symmetry-modulated distributions, as we prefer to call them. The present contribution constitutes a review of this area, with special emphasis on multivariate skew-elliptical families, which represent the subset with more immediate impact on applications. After providing background information of the distribution theory aspects, we focus on the aspects more relevant for applied work. The exposition is targeted to non-specialists in this domain, although some general knowledge of probability and multivariate statistics is assumed. Given this aim, the mathematical profile is kept to the minimum required.
5
Kulkarni H, Biswas J, Das K. A joint quantile regression model for multiple longitudinal outcomes. AStA Advances in Statistical Analysis 2019. [DOI: 10.1007/s10182-018-00339-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
6
Kunihama T, Halpern CT, Herring AH. Non‐parametric Bayes models for mixed scale longitudinal surveys. J R Stat Soc Ser C Appl Stat 2019. [DOI: 10.1111/rssc.12348] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/27/2022]
7
Wang J, Luo S. Bayesian multivariate augmented Beta rectangular regression models for patient-reported outcomes and survival data. Stat Methods Med Res 2017; 26:1684-1699. [PMID: 26037528 PMCID: PMC4457342 DOI: 10.1177/0962280215586010] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022]
Abstract
Many longitudinal studies (e.g. observational studies and randomized clinical trials) have collected multiple rating scales at each visit in the form of patient-reported outcomes (PROs) in the closed unit interval [0, 1]. We propose a joint modeling framework to address the issues arising from the following data features: (1) multiple correlated PROs; (2) the presence of the boundary values of zero and one; (3) extreme outliers and heavy tails; (4) PRO-dependent terminal events such as death and dropout. Our modeling framework consists of a multivariate augmented mixed-effects sub-model based on Beta rectangular distributions for the multiple longitudinal outcomes and a Cox model for the terminal events. The simulation studies suggest that in the presence of outliers, heavy tails, and dependent terminal events, our proposed models provide more accurate parameter estimates than the joint model based on Beta distributions. The proposed models are applied to the motivating Long-term Study-1 (LS-1 study, n = 1741) of Parkinson's disease patients.
Affiliation(s)
- Sheng Luo
- Corresponding author: Sheng Luo is Assistant Professor, Department of Biostatistics, The University of Texas Health Science Center at Houston, 1200 Pressler St, Houston, TX 77030, USA (Phone: 713-500-9554)
8
Bao J, Hanson T, McMillan GP, Knight K. Assessment of DPOAE test-retest difference curves via hierarchical Gaussian processes. Biometrics 2016; 73:334-343. [PMID: 27332505 DOI: 10.1111/biom.12550] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/01/2015] [Revised: 03/01/2016] [Accepted: 03/01/2016] [Indexed: 11/25/2022]
Abstract
Distortion product otoacoustic emissions (DPOAE) testing is a promising alternative to behavioral hearing tests and auditory brainstem response testing of pediatric cancer patients. The central goal of this study is to assess whether significant changes in the DPOAE frequency/emissions curve (DP-gram) occur in pediatric patients in a test-retest scenario. This is accomplished through the construction of normal reference charts, or credible regions, that DP-gram differences lie in, as well as contour probabilities that measure how abnormal (or in a certain sense rare) a test-retest difference is. A challenge is that the data were collected over varying frequencies, at different time points from baseline, and on possibly one or both ears. A hierarchical structural equation Gaussian process model is proposed to handle the different sources of correlation in the emissions measurements, wherein both subject-specific random effects and variance components governing the smoothness and variability of each child's Gaussian process are coupled together.
Affiliation(s)
- Junshu Bao
- Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, U.S.A.
- Timothy Hanson
- Department of Statistics, University of South Carolina, Columbia, South Carolina, U.S.A.
- Garnett P McMillan
- National Center for Rehabilitative Auditory Research, VA Rehabilitation Research & Development, Portland, Oregon, U.S.A.
- Kristin Knight
- Oregon Health and Science University, Pediatric Audiology, Portland, Oregon, U.S.A.
9
Li Z, Liu H, Tu W. A generalized semiparametric mixed model for analysis of multivariate health care utilization data. Stat Methods Med Res 2015; 26:2909-2918. [PMID: 26596349 DOI: 10.1177/0962280215615159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Health care utilization is an outcome of interest in health services research. Two frequently studied forms of utilization are counts of emergency department (ED) visits and hospital admissions. These counts collectively convey a sense of disease exacerbation and cost escalation. Different types of event counts from the same patient form a vector of correlated outcomes. Traditional analyses typically model such outcomes one at a time, ignoring the natural correlations between different events, and thus fail to provide a full picture of patient care utilization. In this research, we propose a multivariate semiparametric modeling framework for the analysis of multiple health care events following the exponential family of distributions in a longitudinal setting. Bivariate nonparametric functions are incorporated to assess the concurrent nonlinear influences of independent variables as well as their interaction effects on the outcomes. The smooth functions are estimated using thin plate regression splines. A maximum penalized likelihood method is used for parameter estimation. The performance of the proposed method was evaluated through simulation studies. To illustrate the method, we analyzed data from a clinical trial in which ED visits and hospital admissions were considered as bivariate outcomes.
Affiliation(s)
- Zhuokai Li
- Duke Clinical Research Institute, Durham, NC, USA
- Hai Liu
- Gilead Sciences, Inc., Foster City, CA, USA
- Wanzhu Tu
- Department of Biostatistics, Indiana University Center for Aging Research, Indiana University School of Medicine, Indianapolis, IN, USA
10
Wang J, Luo S. Augmented Beta rectangular regression models: A Bayesian perspective. Biom J 2015; 58:206-221. [PMID: 26289406 DOI: 10.1002/bimj.201400232] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/20/2014] [Revised: 05/04/2015] [Accepted: 05/12/2015] [Indexed: 11/07/2022]
Abstract
Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not flexible enough to accommodate extreme outliers or excessive events in the tail areas, and they do not account for the presence of the boundary values of zero and one because these values are not in the support of the Beta distributions. To address these issues, we propose a mixed effects model using the Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network.
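As an illustration of the construction this abstract describes, the following is a minimal sketch of a zero-and-one augmented Beta rectangular density. It assumes the common mixture form of the Beta rectangular distribution (weight `theta` on a Uniform(0, 1) component and weight `1 - theta` on a Beta component with mean `mu` and precision `phi`); the parameter names and function names are hypothetical, and the exact parameterization used in the paper may differ.

```python
import math


def beta_pdf(y, a, b):
    """Density of a Beta(a, b) distribution at y in the open interval (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))


def beta_rectangular_pdf(y, mu, phi, theta):
    """Assumed mixture form: theta * Uniform(0, 1) + (1 - theta) * Beta(mu*phi, (1-mu)*phi).

    The uniform component adds mass in the tails, which is what makes the
    distribution less sensitive to outliers than a plain Beta.
    """
    return theta + (1 - theta) * beta_pdf(y, mu * phi, (1 - mu) * phi)


def augmented_density(y, mu, phi, theta, p0, p1):
    """Point mass p0 at 0 and p1 at 1; the remaining mass follows the Beta rectangular on (0, 1)."""
    if y == 0.0:
        return p0
    if y == 1.0:
        return p1
    return (1 - p0 - p1) * beta_rectangular_pdf(y, mu, phi, theta)
```

Setting `theta = 0` recovers an ordinary Beta density, and setting `p0 = p1 = 0` removes the boundary point masses, so both simpler models are special cases of this sketch.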
Affiliation(s)
- Jue Wang
- Department of Biostatistics, The University of Texas Health Science Center at Houston, 1200 Pressler St, Houston, TX, 77030, USA
- Sheng Luo
- Department of Biostatistics, The University of Texas Health Science Center at Houston, 1200 Pressler St, Houston, TX, 77030, USA
11
Li Z, Liu H, Tu W. A sexually transmitted infection screening algorithm based on semiparametric regression models. Stat Med 2015; 34:2844-2857. [PMID: 25900920 DOI: 10.1002/sim.6515] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/25/2014] [Revised: 03/16/2015] [Accepted: 04/02/2015] [Indexed: 11/11/2022]
Abstract
Sexually transmitted infections (STIs) with Chlamydia trachomatis, Neisseria gonorrhoeae, and Trichomonas vaginalis are among the most common infectious diseases in the United States, disproportionately affecting young women. Because a significant portion of the infections present no symptoms, infection control relies primarily on disease screening. However, universal STI screening in a large population can be expensive. In this paper, we propose a semiparametric model-based screening algorithm. The model quantifies organism-specific infection risks in individual subjects and accounts for the within-subject interdependence of the infection outcomes of different organisms and the serial correlations among the repeated assessments of the same organism. Bivariate thin-plate regression spline surfaces are incorporated to depict the concurrent influences of age and sexual partners on infection acquisition. Model parameters are estimated by using a penalized likelihood method. For inference, we develop a likelihood-based resampling procedure to compare the bivariate effect surfaces across outcomes. Simulation studies are conducted to evaluate the model fitting performance. A screening algorithm is developed using data collected from an epidemiological study of young women at increased risk of STIs. We present evidence that the three organisms have distinct age and partner effect patterns; for C. trachomatis, the partner effect is more pronounced in younger adolescents. Predictive performance of the proposed screening algorithm is assessed through a receiver operating characteristic analysis. We show that the model-based screening algorithm has excellent accuracy in identifying individuals at increased risk, and thus can be used to assist STI screening in clinical practice.
Affiliation(s)
- Zhuokai Li
- Duke Clinical Research Institute, 2400 Pratt Street, Durham, NC 27705, U.S.A.
- Hai Liu
- Department of Biostatistics, Indiana University Schools of Medicine and Public Health, 410 West 10th Street, Indianapolis, IN 46202, U.S.A.
- Wanzhu Tu
- Department of Biostatistics, Indiana University Schools of Medicine and Public Health, 410 West 10th Street, Indianapolis, IN 46202, U.S.A.
12
Das K, Daniels MJ. A semiparametric approach to simultaneous covariance estimation for bivariate sparse longitudinal data. Biometrics 2014; 70:33-43. [PMID: 24400941 DOI: 10.1111/biom.12133] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/01/2012] [Revised: 10/01/2013] [Accepted: 11/01/2013] [Indexed: 11/30/2022]
Abstract
Estimation of the covariance structure for irregular sparse longitudinal data has been studied by many authors in recent years but typically using fully parametric specifications. In addition, when data are collected from several groups over time, it is known that assuming the same or completely different covariance matrices over groups can lead to loss of efficiency and/or bias. Nonparametric approaches have been proposed for estimating the covariance matrix for regular univariate longitudinal data by sharing information across the groups under study. For the irregular case, with longitudinal measurements that are bivariate or multivariate, modeling becomes more difficult. In this article, to model bivariate sparse longitudinal data from several groups, we propose a flexible covariance structure via a novel matrix stick-breaking process for the residual covariance structure and a Dirichlet process mixture of normals for the random effects. Simulation studies are performed to investigate the effectiveness of the proposed approach over more traditional approaches. We also analyze a subset of Framingham Heart Study data to examine how the blood pressure trajectories and covariance structures differ for the patients from different BMI groups (high, medium, and low) at baseline.
Affiliation(s)
- Kiranmoy Das
- Department of Statistics, Presidency University, Kolkata, 700073, India
13
Yu B, O'Malley AJ, Ghosh P. Linear mixed models for multiple outcomes using extended multivariate skew-t distributions. Statistics and Its Interface 2014; 7:101-111. [PMID: 28435512 PMCID: PMC5397123 DOI: 10.4310/sii.2014.v7.n1.a11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Multivariate outcomes with heavy skewness and thick tails often arise from clustered experiments or longitudinal studies. Linear mixed models with multivariate skew-t (MST) distributions for the random effects and the error terms are a popular tool for robust modeling of such outcomes. However, the usual MST distribution only allows a common degree of freedom for all marginal distributions, which is only appropriate when each marginal has the same amount of tail heaviness. In this paper, we introduce a new class of extended MST distributions, which allow different degrees of freedom and thereby can accommodate heterogeneity in tail-heaviness across outcomes. The extended MST distributions yield a flexible family of models for multivariate outcomes. The hierarchical representation of the MST distribution allows MCMC methods to be easily applied to compute the parameter estimates. The proposed model is applied to data from two biomedical studies: one on bivariate markers of AIDS progression and the other on sexual behavior from a longitudinal study.
Affiliation(s)
- Binbing Yu
- Laboratory of Epidemiology and Population Sciences, National Institute on Aging, Bethesda, MD 20904, U.S.A.
- A. James O'Malley
- Harvard Medical School, Department of Health Care Policy, 180 Longwood Avenue, Boston, MA, 02115, U.S.A.
- Pulak Ghosh
- Department of Quantitative Methods and Information Sciences, Indian Institute of Management, Bangalore Bannerghatta Road, 560076, Bangalore, India
14
Das K, Li R, Sengupta S, Wu R. A Bayesian semiparametric model for bivariate sparse longitudinal data. Stat Med 2013; 32:3899-3910. [PMID: 23553747 PMCID: PMC3740051 DOI: 10.1002/sim.5790] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/07/2011] [Accepted: 02/19/2013] [Indexed: 11/05/2022]
Abstract
Mixed-effects models have recently become popular for analyzing sparse longitudinal data that arise naturally in biological, agricultural and biomedical studies. Traditional approaches assume independent residuals over time and explain the longitudinal dependence by random effects. However, when bivariate or multivariate traits are measured longitudinally, this fundamental assumption is likely to be violated because of intertrait dependence over time. We provide a more general framework where the dependence of the observations from the same subject over time is not assumed to be explained completely by the random effects of the model. We propose a novel, mixed model-based approach and estimate the error-covariance structure nonparametrically under a generalized linear model framework. We use penalized splines to model the general effect of time, and we consider a Dirichlet process mixture of normal prior for the random-effects distribution. We analyze blood pressure data from the Framingham Heart Study where body mass index, gender and time are treated as covariates. We compare our method with traditional methods including parametric modeling of the random effects and independent residual errors over time. We conduct extensive simulation studies to investigate the practical usefulness of the proposed method. The current approach is very helpful in analyzing bivariate irregular longitudinal traits.
Affiliation(s)
- Kiranmoy Das
- Department of Statistics, Temple University, Philadelphia, PA 19122, U.S.A.
15
Liu H, Tu W. A semiparametric regression model for paired longitudinal outcomes with application in childhood blood pressure development. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas567] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]