1
Ng HM, Jiang B, Wong KY. Penalized estimation of a class of single-index varying-coefficient models for integrative genomic analysis. Biom J 2023; 65:e2100139. [PMID: 35837982 DOI: 10.1002/bimj.202100139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 04/15/2022] [Accepted: 05/27/2022] [Indexed: 01/17/2023]
Abstract
Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In the studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoing challenge to model the interaction effects between clinical and genomic variables, due to high dimensionality of the data and heterogeneity across data types. In this paper, we propose an integrative approach that models interaction effects using a single-index varying-coefficient model, where the effects of genomic features can be modified by clinical variables. We propose a penalized approach for separate selection of main and interaction effects. Notably, the proposed methods can be applied to right-censored survival outcomes based on a Cox proportional hazards model. We demonstrate the advantages of the proposed methods through extensive simulation studies and provide applications to a motivating cancer genomic study.
Affiliation(s)
- Hoi Min Ng
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- Binyan Jiang
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- Kin Yau Wong
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
2
Abstract
In longitudinal studies involving laboratory-based outcomes, repeated measurements can be censored due to assay detection limits. Linear mixed-effects (LME) models are a powerful tool to model the relationship between a response variable and covariates in longitudinal studies. However, the linear parametric form of LME models is often too restrictive to characterize the complex relationship between a response variable and covariates. More general and robust modeling tools, such as nonparametric and semiparametric regression models, have become increasingly popular in the last decade. In this article, we use semiparametric mixed models to analyze censored longitudinal data with irregularly observed repeated measures. The proposed model extends the censored linear mixed-effects model and provides more flexible modeling schemes by allowing the time effect to vary nonparametrically over time. We develop an Expectation-Maximization (EM) algorithm for maximum penalized likelihood estimation of model parameters and the nonparametric component. Further, as a byproduct of the EM algorithm, the smoothing parameter is estimated using a modified linear mixed-effects model, which is faster than alternative methods such as the restricted maximum likelihood approach. Finally, the performance of the proposed approaches is evaluated through extensive simulation studies as well as applications to data sets from acquired immune deficiency syndrome studies.
Affiliation(s)
- Thalita B Mattos
- Departamento de Estatística, Universidade Estadual de Campinas, Brazil
- Victor H Lachos
- Department of Statistics, University of Connecticut, USA
3
Chen C, Shen B, Liu A, Wu R, Wang M. A multiple robust propensity score method for longitudinal analysis with intermittent missing data. Biometrics 2020; 77:519-532. [PMID: 32662124 DOI: 10.1111/biom.13330] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/30/2019] [Revised: 04/15/2020] [Accepted: 06/16/2020] [Indexed: 01/05/2023]
Abstract
Longitudinal data are very popular in practice, but they are often missing in either outcomes or time-dependent risk factors, making them highly unbalanced and complex. Missing data may contain various missing patterns or mechanisms, and how to properly handle them for unbiased and valid inference still presents a significant challenge. Here, we propose a novel semiparametric framework for analyzing longitudinal data with both missing responses and covariates that are missing at random and intermittent, a general and widely encountered situation in observational studies. Within this framework, we consider multiple robust estimation procedures based on innovative calibrated propensity scores, which offer additional relaxation of the misspecification of missing data mechanisms and show more satisfactory numerical performance. Also, the corresponding robust information criterion on consistent variable selection for our proposed model is developed based on empirical likelihood-based methods. These advocated methods are evaluated in both theory and extensive simulation studies in a variety of situations, showing competitive properties and advantages compared to the existing approaches. We illustrate the utility of our approach by analyzing the data from the HIV Epidemiology Research Study.
Affiliation(s)
- Chixiang Chen
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania
- Biyi Shen
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania
- Aiyi Liu
- Biostatistics and Bioinformatics Branch, National Institute of Child Health and Human Development, NIH, Bethesda, Maryland
- Rongling Wu
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania
- Ming Wang
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania
4
Abstract
Bayesian additive regression trees (BART) is a flexible prediction model/machine learning approach that has gained widespread popularity in recent years. As BART becomes more mainstream, there is an increased need for a paper that walks readers through the details of BART, from what it is to why it works. This tutorial is aimed at providing such a resource. In addition to explaining the different components of BART using simple examples, we also discuss a framework, the General BART model, which unifies some of the recent BART extensions, including semiparametric models, correlated outcomes, statistical matching problems in surveys, and models with weaker distributional assumptions. By showing how these models fit into a single framework, we hope to demonstrate a simple way of applying BART to research problems that go beyond the original independent continuous or binary outcomes framework.
Affiliation(s)
- Yaoyuan Vincent Tan
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, 683 Hoes Lane West, Piscataway, New Jersey 08854, USA
- Jason Roy
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, 683 Hoes Lane West, Piscataway, New Jersey 08854, USA
5
Abstract
Alzheimer's disease is an incurable and progressive disease. The pathology of Alzheimer's disease usually evolves from cognitively normal, to mild cognitive impairment, to Alzheimer's disease. The aim of this paper is to develop a Bayesian hidden Markov model to characterize disease pathology, identify hidden states corresponding to the diagnosed stages of cognitive decline, and examine the dynamic changes of potential risk factors associated with the cognitive normal-mild cognitive impairment-Alzheimer's disease transition. The hidden Markov model framework consists of two major components. The first is a state-dependent semiparametric regression for delineating the complex associations between clinical outcomes of interest and a set of prognostic biomarkers across neurodegenerative states. The second is a parametric transition model that accounts for potential covariate effects on the cross-state transition. The inter-individual and inter-process differences are taken into account via correlated random effects in both components. Based on the Alzheimer's Disease Neuroimaging Initiative data set, we are able to identify four states of Alzheimer's disease pathology, corresponding to commonly diagnosed stages of cognitive decline (cognitive normal, early mild cognitive impairment, late mild cognitive impairment, and Alzheimer's disease), and to examine the effects of hippocampus, age, gender, and APOE-ε4 on degeneration of cognitive function across the four cognitive states.
Affiliation(s)
- Kai Kang
- Department of Statistics, Chinese University of Hong Kong, Hong Kong, China
- Jingheng Cai
- Department of Statistics, Sun Yat-sen University, Guangzhou, China
- Xinyuan Song
- Department of Statistics, Chinese University of Hong Kong, Hong Kong, China
- Shenzhen Research Institute, Chinese University of Hong Kong, Hong Kong, China
- Hongtu Zhu
- MD Anderson Cancer Center, University of Texas, Houston, USA
6
McLain AC, Frongillo EA, Feng J, Borghi E. Prediction intervals for penalized longitudinal models with multisource summary measures: An application to childhood malnutrition. Stat Med 2019; 38:1002-1012. [PMID: 30430613 DOI: 10.1002/sim.8024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/14/2018] [Revised: 08/15/2018] [Accepted: 10/12/2018] [Indexed: 11/05/2022]
Abstract
In many global health analyses, it is of interest to examine countries' progress using indicators of socio-economic conditions based on national surveys from varying sources. This results in longitudinal data where heteroscedastic summary measures, rather than individual level data, are available. Administration of national surveys can be sporadic, resulting in sparse data measurements for some countries. Furthermore, the trend of the indicators over time is usually nonlinear and varies by country. It is of interest to track the current level of indicators to determine if countries are meeting certain thresholds, such as those indicated in the United Nations Sustainable Development Goals. In addition, estimation of confidence and prediction intervals is vital to determine true changes in prevalence and where data is low in quantity and/or quality. In this article, we use heteroscedastic penalized longitudinal models with survey summary data to estimate yearly prevalence of malnutrition quantities. We develop and compare methods to estimate confidence and prediction intervals using asymptotic and parametric bootstrap techniques. The intervals can incorporate data from multiple sources or other general data-smoothing steps. The methods are applied to African countries in the UNICEF-WHO-The World Bank joint child malnutrition data set. The properties of the intervals are demonstrated through simulation studies and cross-validation of real data.
Affiliation(s)
- Alexander C McLain
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina
- Edward A Frongillo
- Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina
- Juan Feng
- Food and Agriculture Organization, United Nations, Rome, Italy
7
Tran L, Yiannoutsos C, Wools-Kaloustian K, Siika A, van der Laan M, Petersen M. Double Robust Efficient Estimators of Longitudinal Treatment Effects: Comparative Performance in Simulations and a Case Study. Int J Biostat 2019; 15:ijb-2017-0054. [PMID: 30811344 DOI: 10.1515/ijb-2017-0054] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/19/2017] [Accepted: 11/16/2018] [Indexed: 11/15/2022]
Abstract
A number of sophisticated estimators of longitudinal effects have been proposed for estimating the intervention-specific mean outcome. However, there is a relative paucity of research comparing these methods directly to one another. In this study, we compare various approaches to estimating a causal effect in a longitudinal treatment setting using both simulated data and data measured from a human immunodeficiency virus cohort. Six distinct estimators are considered: (i) an iterated conditional expectation representation, (ii) an inverse propensity weighted method, (iii) an augmented inverse propensity weighted method, (iv) a double robust iterated conditional expectation estimator, (v) a modified version of the double robust iterated conditional expectation estimator, and (vi) a targeted minimum loss-based estimator. The details of each estimator and its implementation are presented along with nuisance parameter estimation details, which include potentially pooling the observed data across all subjects regardless of treatment history and using data adaptive machine learning algorithms. Simulations are constructed over six time points, with each time point steadily increasing in positivity violations. Estimation is carried out for both the simulations and applied example using each of the six estimators under both stratified and pooled approaches of nuisance parameter estimation. Simulation results show that double robust estimators remained without meaningful bias as long as at least one of the two nuisance parameters was estimated with a correctly specified model. Under full misspecification, the bias of the double robust estimators remained better than that of the inverse propensity estimator under misspecification, but worse than the iterated conditional expectation estimator. Weighted estimators tended to show better performance than the covariate estimators. As positivity violations increased, the mean squared error and bias of all estimators considered became worse, with covariate-based double robust estimators especially susceptible. Applied analyses showed similar estimates at most time points, with the important exception of the inverse propensity estimator which deviated markedly as positivity violations increased. Given its efficiency, ability to respect the parameter space, and observed performance, we recommend the pooled and weighted targeted minimum loss-based estimator.
Affiliation(s)
- Linh Tran
- Department of Biostatistics, University of California Berkeley, Berkeley, CA, USA
- Constantin Yiannoutsos
- Department of Biostatistics, Indiana University Richard M Fairbanks School of Public Health, Indianapolis, IN, USA
- Kara Wools-Kaloustian
- Infectious Diseases, Howard Hughes Medical Institute - Indiana University School of Medicine, Indianapolis, IN, USA
- Maya Petersen
- University of California at Berkeley, Berkeley, CA, USA
8
Davenport CA, Maity A, Baladandayuthapani V. Functional interaction-based nonlinear models with application to multiplatform genomics data. Stat Med 2018; 37:2715-2733. [PMID: 29737021 DOI: 10.1002/sim.7671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2017] [Revised: 02/05/2018] [Accepted: 03/09/2018] [Indexed: 11/11/2022]
Abstract
Functional regression allows for a scalar response to be dependent on a functional predictor; however, not much work has been done when a scalar exposure that interacts with the functional covariate is introduced. In this paper, we present 2 functional regression models that account for this interaction and propose 2 novel estimation procedures for the parameters in these models. These estimation methods allow for a noisy and/or sparsely observed functional covariate and are easily extended to generalized exponential family responses. We compute standard errors of our estimators, which allows for further statistical inference and hypothesis testing. We compare the performance of the proposed estimators to each other and to one found in the literature via simulation and demonstrate our methods using a real data example.
Affiliation(s)
- Clemontina A Davenport
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, 27705, USA
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, NC, 27695, USA
9
Abstract
Some biomedical studies lead to mixture data. When a discrete covariate defining subgroup membership is missing for some of the subjects in a study, the distribution of the outcome follows a mixture distribution of the subgroup-specific distributions. Taking into account the uncertain distribution of the group membership and the covariates, we model the relation between the disease onset time and the covariates through transformation models in each sub-population, and develop a nonparametric maximum likelihood based estimation implemented through an EM algorithm along with its inference procedure. We further propose methods to identify the covariates that have different effects or common effects in distinct populations, which enables parsimonious modeling and better understanding of the differences across populations. The methods are illustrated through extensive simulation studies and a real data example.
Affiliation(s)
- Qianqian Wang
- University of South Carolina, Penn State University and Columbia University
- Yanyuan Ma
- University of South Carolina, Penn State University and Columbia University
- Yuanjia Wang
- University of South Carolina, Penn State University and Columbia University
10
Tran L, Yiannoutsos CT, Musick BS, Wools-Kaloustian KK, Siika A, Kimaiyo S, van der Laan MJ, Petersen M. Evaluating the Impact of a HIV Low-Risk Express Care Task-Shifting Program: A Case Study of the Targeted Learning Roadmap. Epidemiol Methods 2016; 5:69-91. [PMID: 28736692 PMCID: PMC5520542 DOI: 10.1515/em-2016-0004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/15/2022]
Abstract
In conducting studies on an exposure of interest, a systematic roadmap should be applied for translating causal questions into statistical analyses and interpreting the results. In this paper we describe an application of one such roadmap applied to estimating the joint effect of both time to availability of a nurse-based triage system (low risk express care (LREC)) and individual enrollment in the program among HIV patients in East Africa. Our study population is comprised of 16,513 subjects found eligible for this task-shifting program within 15 clinics in Kenya between 2006 and 2009, with each clinic starting the LREC program between 2007 and 2008. After discretizing follow-up into 90-day time intervals, we targeted the population mean counterfactual outcome (i.e. counterfactual probability of either dying or being lost to follow up) at up to 450 days after initial LREC eligibility under three fixed treatment interventions. These were (i) under no program availability during the entire follow-up, (ii) under immediate program availability at initial eligibility, but non-enrollment during the entire follow-up, and (iii) under immediate program availability and enrollment at initial eligibility. We further estimated the controlled direct effect of immediate program availability compared to no program availability, under a hypothetical intervention to prevent individual enrollment in the program. Targeted minimum loss-based estimation was used to estimate the mean outcome, while Super Learning was implemented to estimate the required nuisance parameters. Analyses were conducted with the ltmle R package; analysis code is available at an online repository as an R package. Results showed that at 450 days, the probability of in-care survival for subjects with immediate availability and enrollment was 0.93 (95% CI: 0.91, 0.95) and 0.87 (95% CI: 0.86, 0.87) for subjects with immediate availability never enrolling. For subjects without LREC availability, it was 0.91 (95% CI: 0.90, 0.92). Immediate program availability without individual enrollment, compared to no program availability, was estimated to slightly albeit significantly decrease survival by 4% (95% CI: 0.03, 0.06, p<0.01). Immediate availability and enrollment resulted in a 7% higher in-care survival compared to immediate availability with non-enrollment after 450 days (95% CI: -0.08, -0.05, p<0.01). The results are consistent with a fairly small impact of both availability and enrollment in the LREC program on in-care survival.
Affiliation(s)
- Linh Tran
- Department of Biostatistics, UC Berkeley, 101 Haviland Hall, Berkeley, CA 94720, USA
- Constantin T. Yiannoutsos
- Department of Biostatistics, Richard M. Fairbanks School of Public Health, Indiana University, Indianapolis, IN, USA
- Beverly S. Musick
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN, USA
- Abraham Siika
- Moi University School of Medicine, Eldoret, Central Kenya
- Mark J. van der Laan
- Department of Biostatistics, School of Public Health, UC Berkeley, 108 Haviland Hall, Berkeley, CA 94720-7360, USA
- Maya Petersen
- Department of Biostatistics, UC Berkeley, 101 Haviland Hall, Berkeley, CA 94720, USA
11
Abstract
We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of [Formula: see text], where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.
Affiliation(s)
- LiPing Zhu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China
- The Key Laboratory of Mathematical Economics (SUFE), Ministry of Education, Shanghai 200433, China
- RunZe Li
- Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802, USA
- HengJian Cui
- School of Mathematical Science, Capital Normal University, Beijing 100037, China
12
Song XY, Lu ZH, Cai JH, Ip EHS. A Bayesian modeling approach for generalized semiparametric structural equation models. Psychometrika 2013; 78:624-47. [PMID: 24092481 PMCID: PMC5129644 DOI: 10.1007/s11336-013-9323-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/03/2011] [Revised: 08/15/2012] [Indexed: 05/15/2023]
Abstract
In behavioral, biomedical, and psychological studies, structural equation models (SEMs) have been widely used for assessing relationships between latent variables. Regression-type structural models based on parametric functions are often used for such purposes. In many applications, however, parametric SEMs are not adequate to capture subtle patterns in the functions over the entire range of the predictor variable. A different but equally important limitation of traditional parametric SEMs is that they are not designed to handle mixed data types: continuous, count, ordered, and unordered categorical. This paper develops a generalized semiparametric SEM that is able to handle mixed data types and to simultaneously model different functional relationships among latent variables. A structural equation of the proposed SEM is formulated using a series of unspecified smooth functions. The Bayesian P-splines approach and Markov chain Monte Carlo methods are developed to estimate the smooth functions and the unknown parameters. Moreover, we examine the relative benefits of semiparametric modeling over parametric modeling using a Bayesian model-comparison statistic, called the complete deviance information criterion (DIC). The performance of the developed methodology is evaluated using a simulation study. To illustrate the method, we used a data set derived from the National Longitudinal Survey of Youth.
Affiliation(s)
- Xin-Yuan Song
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
13
Abstract
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
Affiliation(s)
- Jiawei Wei
- Department of Statistics, 3143 TAMU, Texas A&M University, College Station, Texas 77843, USA
- Raymond J. Carroll
- Department of Statistics, 3143 TAMU, Texas A&M University, College Station, Texas 77843, USA
- Arnab Maity
- Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, North Carolina 27695, USA
14
Kosorok MR. What's So Special About Semiparametric Methods? Sankhya Ser B 2009; 71-A:331-353. [PMID: 20640048 PMCID: PMC2903063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
The number of scientific publications on semiparametric methods per year has been steadily increasing since the early 1980s. This increased interest has happened in spite of the fact that the novelty of semiparametrics for its own sake has run its course, and semiparametric methods are by now considered classical. The underlying reasons for this continued interest include the genuine scientific utility of semiparametric models combined with the breadth and depth of the many theoretical questions that remain to be answered. Empirical process techniques are an essential research tool for many of these questions. Moreover, both semiparametric methods and empirical processes are playing an increasingly valuable role in high dimensional data analysis and in other emerging areas in statistics. The topics are very fruitful and intriguing for new researchers to engage in. Graduate programs in statistics, biostatistics and econometrics can and should include more empirical processes and semiparametrics in their teaching in order to ensure a sufficient supply of suitably qualified researchers.
Affiliation(s)
- Michael R Kosorok
- Department of Biostatistics and Department of Statistics and Operations Research, University of North Carolina at Chapel Hill
15
Maity A, Carroll RJ, Mammen E, Chatterjee N. Testing in semiparametric models with interaction, with applications to gene-environment interactions. J R Stat Soc Series B Stat Methodol 2009; 71:75-96. [PMID: 19838317 PMCID: PMC2762226 DOI: 10.1111/j.1467-9868.2008.00671.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 12/01/2022]
Abstract
Motivated by the problem of testing for genetic effects on complex traits in the presence of gene-environment interaction, we develop score tests in general semiparametric regression problems that involve a Tukey-style 1 degree-of-freedom form of interaction between parametrically and non-parametrically modelled covariates. We find that the score test in this type of model, as recently developed by Chatterjee and co-workers in the fully parametric setting, is biased and requires undersmoothing to be valid in the presence of non-parametric components. Moreover, in the presence of repeated outcomes, the asymptotic distribution of the score test depends on the estimation of functions which are defined as solutions of integral equations, making implementation difficult and computationally taxing. We develop profiled score statistics which are unbiased and asymptotically efficient and can be performed by using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give easy interpretations of the target functions, which in turn allow us to develop estimation procedures that can be easily implemented by using standard computational methods. We present simulation studies to evaluate type I error and power of the method proposed compared with a naive test that does not consider interaction. Finally, we illustrate our methodology by analysing data from a case-control study of colorectal adenoma that was designed to investigate the association between colorectal adenoma and the candidate gene NAT2 in relation to smoking history.
16
Abstract
For semiparametric models, interval estimation and hypothesis testing based on the information matrix for the full model is a challenge because of potentially unlimited dimension. Use of the profile information matrix for a small set of parameters of interest is an appealing alternative. Existing approaches for the estimation of the profile information matrix are either subject to the curse of dimensionality, or are ad-hoc and approximate and can be unstable and numerically inefficient. We propose a numerically stable and efficient algorithm that delivers an exact observed profile information matrix for regression coefficients for the class of Nonlinear Transformation Models [A. Tsodikov (2003) J R Statist Soc Ser B 65:759-774]. The algorithm deals with the curse of dimensionality and requires neither large matrix inverses nor explicit expressions for the profile surface.
Affiliation(s)
- A. Tsodikov
- Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109-2029, USA
- G. Garibotti
- Centro Regional Universitario Bariloche, Universidad Nacional del Comahue, Quintral 1250, 8400 Bariloche, Argentina
17
Erbas B, Bui Q, Huggins R, Harper T, White V. Investigating the relation between placement of Quit antismoking advertisements and number of telephone calls to Quitline: a semiparametric modelling approach. J Epidemiol Community Health 2006; 60:180-2. [PMID: 16415271 PMCID: PMC2566152 DOI: 10.1136/jech.2005.038109] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Accepted: 09/09/2005] [Indexed: 11/03/2022]
Abstract
STUDY OBJECTIVES: Quitline, an antismoking advertising and telephone helpline service, is an effective public health intervention strategy for tobacco control. The objective of this short report is to model the relation between placement of antismoking advertisements and calls to Quitline on a given day. METHODS/DESIGN: Data on daily Quitline antismoking advertisements, television target audience rating points (TARPS), and calls to Quitline Victoria were studied for the period from 1 August 2000 to 31 July 2001. The outcome, calls to Quitline, is a count and thus assumed to follow a Poisson distribution. Generalised partial linear models were used to model the logarithm of mean daily calls as a non-parametric function of time and a linear parametric function of the day of week, number of advertisements, and TARPS. MAIN RESULTS: Peak calls to Quitline Victoria occurred during Monday to Wednesday with around three times as many calls compared with Sunday. Both placement of Quitline advertisements (p<0.001) and an increase in TARPS (p<0.001) on a given day significantly increased the number of calls made to Quitline Victoria. The model adequately captured fluctuations in call volume and diagnostics showed no model inadequacy. CONCLUSIONS: In this short report the emphasis is on modelling the effects of the parametric components (day of week, placement of advertisements, and TARPS) on call volume. The dynamics of the underlying time trend in call volume are captured in a non-parametric component. Future analysis of hourly data would provide additional information to assess different media buying strategies that might increase call volume.
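As a rough sketch of the model this abstract describes, the generalised partial linear model could be written as below. The notation is an assumption introduced here for illustration and is not taken from the article: Y_t is the number of calls on day t, f is the unspecified smooth function of time, D_{tj} are day-of-week indicators (six indicators with one reference day is itself an assumed coding), A_t is the number of advertisements, and R_t is the TARPS value.

```latex
Y_t \sim \mathrm{Poisson}(\mu_t), \qquad
\log \mu_t = f(t) + \sum_{j=1}^{6} \beta_j D_{tj} + \gamma A_t + \delta R_t
```

Under a log link of this form, \exp(\gamma) is the multiplicative change in the expected number of daily calls for one additional advertisement, with the time trend and the other covariates held fixed.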
Affiliation(s)
- Bircan Erbas
- Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, School of Population Health, University of Melbourne, Level 2, 723 Swanston Street, Carlton 3053, Victoria, Australia.