1. Testing the missing at random assumption in generalized linear models in the presence of instrumental variables. Scand Stat Theory Appl 2024; 51:334-354. PMID: 38370508; PMCID: PMC10871667; DOI: 10.1111/sjos.12685.
Abstract
Practical problems with missing data are common, and many methods have been developed to ensure the validity and/or efficiency of statistical procedures. A central focus has been the mechanism governing data missingness, since correctly identifying that mechanism is crucial for conducting proper practical investigations. In this paper, we present a new hypothesis testing approach for deciding between the conventional notions of missing at random and missing not at random in generalized linear models in the presence of instrumental variables. The foundational idea is to develop appropriate discrepancy measures between estimators whose properties differ significantly only when missing at random does not hold. We show that our testing approach achieves an objective, data-driven choice between missing at random and missing not at random. We demonstrate the feasibility, validity, and efficacy of the new test through theoretical analysis, simulation studies, and a real data analysis.
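To fix ideas, a Hausman-type version of such a discrepancy measure (a schematic in our own notation, not necessarily the authors' exact statistic) contrasts an estimator $\hat{\theta}_1$ that is consistent only under missing at random with an estimator $\hat{\theta}_2$ that remains consistent under both mechanisms:

$$
T_n = n\,(\hat{\theta}_1 - \hat{\theta}_2)^{\top}\,\widehat{\Sigma}^{-}\,(\hat{\theta}_1 - \hat{\theta}_2) \ \xrightarrow{d}\ \chi^2_r \quad \text{under } H_0:\ \text{missing at random},
$$

where $\widehat{\Sigma}$ estimates the asymptotic variance of the contrast $\hat{\theta}_1 - \hat{\theta}_2$, $\widehat{\Sigma}^{-}$ is a generalized inverse, and $r$ is its rank; large values of $T_n$ indicate a departure from missing at random.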
2. Investigating the Influence of Fluctuating Humidity and Temperature on Creep Deformation in High-Performance Concrete Beams: A Comparative Study between Natural and Laboratorial Environmental Tests. Materials (Basel) 2024; 17:998. PMID: 38473471; DOI: 10.3390/ma17050998.
Abstract
To investigate the influence of temperature and humidity variations on creep in high-performance concrete (HPC) beams, beam tests were conducted in both natural and laboratory settings. The findings indicate that fluctuations in creep stem primarily from temperature changes, whereas humidity changes have little influence on fluctuations in either basic creep or total creep; the influence of humidity is reflected more strongly in the magnitude of creep. Functions describing the influence of temperature and humidity on the creep behavior of HPC under fluctuating conditions are proposed. The findings were then used to examine creep deformation in engineering applications at four locations. This study complements correction methods for the creep of members under fluctuating temperature and humidity, and its application can provide a basis for calculating the long-term deformation of HPC structures in natural environments.
3. M-quantile regression shrinkage and selection via the Lasso and Elastic Net to assess the effect of meteorology and traffic on air quality. Biom J 2023; 65:e2100355. PMID: 37743255; DOI: 10.1002/bimj.202100355.
Abstract
In this work, we intersect data on size-selected particulate matter (PM) with vehicular traffic counts and a comprehensive set of meteorological covariates to study the effect of traffic on air quality. To this end, we develop an M-quantile regression model with Lasso and Elastic Net penalizations. This allows us (i) to identify the best proxy for vehicular traffic via model selection, (ii) to investigate the relationship between fine PM concentration and the covariates at different M-quantiles of the conditional response distribution, and (iii) to be robust to the presence of outliers. Heterogeneity in the data is accounted for by fitting a B-spline to the effect of the day of the year. Analytic and bootstrap-based variance estimates of the regression coefficients are provided, together with a numerical evaluation of the proposed estimation procedure. Empirical results show that atmospheric stability has the most significant effect on fine PM concentration; this effect changes at different levels of the conditional response distribution and is relatively weaker in the tails. Model selection, in turn, identifies the best proxy for vehicular traffic, whose effect remains essentially the same across levels of the conditional response distribution.
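Schematically (our notation; $\rho$ is a Huber-type loss, $s$ a robust scale estimate, and $q$ the M-quantile level), the penalized fit solves

$$
\hat{\beta}_q = \arg\min_{\beta}\ \sum_{i=1}^n \rho_q\!\left(\frac{y_i - x_i^{\top}\beta}{s}\right) + \lambda\left\{\alpha\|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\right\},
\qquad
\rho_q(u) = 2\,\{q\,\mathbb{1}(u>0) + (1-q)\,\mathbb{1}(u\le 0)\}\,\rho(u),
$$

where $\alpha = 1$ gives the Lasso penalty and $0 < \alpha < 1$ the Elastic Net.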
4. Nonparametric inference of general while-alive estimands for recurrent events. Biometrics 2023; 79:1749-1760. PMID: 35731993; PMCID: PMC9772359; DOI: 10.1111/biom.13709.
Abstract
Measuring the treatment effect on recurrent events such as hospitalization in the presence of death has long challenged statisticians and clinicians alike. Traditional inference on the cumulative frequency unjustly penalizes survivorship, as longer survivors also tend to experience more adverse events. Expanding on a recently suggested "while-alive" event rate, we consider a general class of estimands that adjust for the length of survival without losing causal interpretation. Given a user-specified loss function that allows for arbitrary weighting, we define as estimand the average loss experienced per unit time alive within a target period and use the ratio of this loss rate between treatment groups to measure the effect size. Scaling the loss rate by the width of the corresponding time window gives an alternative, and sometimes more interpretable, way of presenting the data. To make inferences, we construct a nonparametric estimator for the loss rate through the cumulative loss and the restricted mean survival time and derive its influence function in closed form for variance estimation and testing. As simulations and analysis of real data from a heart failure trial both show, the while-alive approach corrects for the false attenuation of the treatment effect caused by patients living longer under treatment, with increased statistical power as a result. The proposed methods are implemented in the R package WA, which is publicly available from the Comprehensive R Archive Network (CRAN).
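In schematic form (our notation), with $L(t)$ the user-specified cumulative loss process, $T$ the survival time, $\tau$ the target period, and $A = a$ the treatment arm, the while-alive loss rate and the associated effect size are

$$
\ell^{(a)}(\tau) = \frac{E\!\left[\int_0^{T\wedge\tau} \mathrm{d}L(t) \mid A = a\right]}{E\!\left[T\wedge\tau \mid A = a\right]},
\qquad
\text{effect size} = \frac{\ell^{(1)}(\tau)}{\ell^{(0)}(\tau)},
$$

so the numerator accrues the weighted loss while alive and the denominator is the restricted mean survival time.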
5. Bayesian and influence function-based empirical likelihoods for inference of sensitivity to the early diseased stage in diagnostic tests. Biom J 2023; 65:e2200021. PMID: 36642803; PMCID: PMC10006346; DOI: 10.1002/bimj.202200021.
Abstract
In practice, a disease process might involve three ordinal diagnostic stages: the normal healthy stage, the early stage of the disease, and the stage of full development of the disease. Early detection is critical for some diseases, since it often offers an optimal time window for therapeutic treatment. In this study, we propose a new influence function-based empirical likelihood method and Bayesian empirical likelihood methods to construct confidence/credible intervals for the sensitivity of a test to patients in the early diseased stage, given a specificity and a sensitivity of the test to patients in the fully diseased stage. Numerical studies are performed to compare the finite-sample performance of the proposed approaches with existing methods. The proposed methods are shown to outperform existing methods in terms of coverage probability. A real dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) is used to illustrate the proposed methods.
6. Estimation of separable direct and indirect effects in continuous time. Biometrics 2023; 79:127-139. PMID: 34506039; DOI: 10.1111/biom.13559.
Abstract
Many research questions involve time-to-event outcomes that can be prevented from occurring due to competing events. In these settings, we must be careful about the causal interpretation of classical statistical estimands. In particular, estimands on the hazard scale, such as ratios of cause-specific or subdistribution hazards, are fundamentally hard to interpret causally. Estimands on the risk scale, such as contrasts of cumulative incidence functions, do have a clear causal interpretation, but they only capture the total effect of the treatment on the event of interest; that is, effects both through and outside of the competing event. To disentangle causal treatment effects on the event of interest and competing events, the separable direct and indirect effects were recently introduced. Here we provide new results on the estimation of direct and indirect separable effects in continuous time. In particular, we derive the nonparametric influence function in continuous time and use it to construct an estimator that has certain robustness properties. We also propose a simple estimator based on semiparametric models for the two cause-specific hazard functions. We describe the asymptotic properties of these estimators and present results from simulation studies, suggesting that the estimators behave satisfactorily in finite samples. Finally, we reanalyze the prostate cancer trial from Stensrud et al. (2020).
7. Group sequential methods for interim monitoring of randomized clinical trials with time-lagged outcome. Stat Med 2022; 41:5517-5536. PMID: 36117235; PMCID: PMC9825950; DOI: 10.1002/sim.9580.
Abstract
The primary analysis in two-arm clinical trials usually involves inference on a scalar treatment effect parameter; for example, depending on the outcome, the difference of treatment-specific means, risk difference, risk ratio, or odds ratio. Most clinical trials are monitored for the possibility of early stopping. Because the outcome on any given subject can ordinarily be ascertained only after some time lag, at the time of an interim analysis the outcome is known for only a subset of the subjects already enrolled and is effectively censored for those who have not been enrolled long enough for it to be observed. Typically, the interim analysis is based only on the data from subjects for whom the outcome has been ascertained. A goal of an interim analysis is to stop the trial as soon as the evidence is strong enough to do so, suggesting that the analysis should ideally make the most efficient use of all available data, including information on censoring as well as other baseline and time-dependent covariates, in a principled way. A general group sequential framework is proposed for clinical trials with a time-lagged outcome. Treatment effect estimators that account for censoring and incorporate covariate information at an interim analysis are derived using semiparametric theory and are demonstrated to lead to stronger evidence for early stopping than standard approaches. The associated test statistics are shown to have the independent increments structure, so that standard software can be used to obtain stopping boundaries.
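For background (standard group-sequential theory rather than anything specific to this paper), the independent increments structure means the interim test statistics follow the canonical joint distribution: with information levels $\mathcal{I}_1 < \cdots < \mathcal{I}_K$ and treatment effect $\delta$,

$$
Z_k \sim N\!\left(\delta\sqrt{\mathcal{I}_k},\, 1\right),
\qquad
\operatorname{Cov}(Z_j, Z_k) = \sqrt{\mathcal{I}_j/\mathcal{I}_k} \quad (j \le k),
$$

which is exactly the structure standard software assumes when computing stopping boundaries.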
8. Optimal sampling for design-based estimators of regression models. Stat Med 2022; 41:1482-1497. PMID: 34989429; PMCID: PMC8918008; DOI: 10.1002/sim.9300.
Abstract
Two-phase designs measure variables of interest on a subcohort of a larger cohort in which the outcome and some covariates are readily available or cheap to collect for all individuals. Given limited resources, it is of interest to find an optimal design that includes the most informative individuals in the final sample. We explore optimal designs and efficiencies for analyses by design-based estimators. Generalized raking is an efficient class of design-based estimators that improves on the inverse-probability weighted (IPW) estimator by adjusting the weights using auxiliary information. We derive a closed-form solution of the optimal design for estimating regression coefficients with generalized raking estimators, and we compare it with the optimal design for analysis via the IPW estimator and with other two-phase designs in measurement-error settings. We consider general two-phase designs in which the outcome variable and the variables of interest can be continuous or discrete. Our results show that the optimal designs for analyses by the two classes of design-based estimators can be very different. The optimal design for analysis via the IPW estimator is optimal for IPW estimation and typically gives near-optimal efficiency for generalized raking estimation, though we show there is potential for improvement in some settings.
9. A robust variable screening procedure for ultra-high dimensional data. Stat Methods Med Res 2021; 30:1816-1832. PMID: 34053339; DOI: 10.1177/09622802211017299.
Abstract
Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems, and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, Sure Independence Screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and in small samples in particular this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. Finally, we illustrate their use in a study on the regulation of lipid metabolism.
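For reference (the standard form, in our notation), the objective minimized by the density power divergence estimator with tuning parameter $\alpha > 0$ and model density $f_\theta$ is

$$
H_n(\theta) = \int f_\theta^{1+\alpha}(y)\,\mathrm{d}y - \left(1 + \frac{1}{\alpha}\right)\frac{1}{n}\sum_{i=1}^n f_\theta^{\alpha}(Y_i),
$$

which recovers the (negative) log-likelihood objective as $\alpha \to 0$ and increasingly downweights outlying observations as $\alpha$ grows; DPD-SIS ranks covariates by marginal fits of this type.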
10. Optimal multiwave sampling for regression modeling in two-phase designs. Stat Med 2020; 39:4912-4921. PMID: 33016376; PMCID: PMC7902311; DOI: 10.1002/sim.8760.
Abstract
Two-phase designs involve measuring extra variables on a subset of a cohort in which some variables are already measured. The goal is to choose the subsample of individuals from the cohort and analyze that subsample efficiently; in particular, it is of interest to obtain an optimal design that gives the most efficient estimates of the regression parameters. In this article, we propose a multiwave sampling design to approximate the optimal design for design-based estimators. Influence functions are used to compute the optimal sampling allocations. We propose using informative priors on the regression parameters to derive the wave-1 sampling probabilities, because prespecified sampling probabilities may be far from optimal and decrease the design efficiency. The posterior distributions of the regression parameters derived from the current wave are then used as priors for the next wave. Generalized raking is used in the final statistical analysis. We show that a two-wave design with reasonable informative priors yields highly efficient estimation of the parameter of interest and comes close to the underlying optimal design.
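Schematically (a standard Neyman-type allocation in our notation, not necessarily the paper's exact formula), if the cohort is stratified and $\sigma_h$ denotes the standard deviation of the influence function for the target parameter within stratum $h$ of size $N_h$, the optimal allocation of $n$ phase-two measurements is

$$
n_h \propto N_h\,\sigma_h, \qquad \sum_h n_h = n,
$$

and the multiwave scheme refines the estimates of $\sigma_h$ wave by wave, using each wave's posterior as the prior for the next.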
11. Bayesian and influence function-based empirical likelihoods for inference of sensitivity in diagnostic tests. Stat Methods Med Res 2020; 29:3457-3491. PMID: 32552342; DOI: 10.1177/0962280220929042.
Abstract
In medical diagnostic studies, a diagnostic test can be evaluated based on its sensitivity at a desired specificity. Existing methods for inference on sensitivity include normal approximation-based approaches and empirical likelihood (EL)-based approaches. These methods generally perform poorly when the specificity is high, and some require choosing smoothing parameters. We propose a new influence function-based empirical likelihood method and Bayesian empirical likelihood methods to overcome these problems. Numerical studies are performed to compare the finite-sample performance of the proposed approaches with existing methods. The proposed methods are shown to perform better in terms of both coverage probability and interval length. A real data set from the Alzheimer's Disease Neuroimaging Initiative (ADNI) is analyzed.
12. Influence function-based empirical likelihood for inference of quantile medical costs with censored data. Stat Methods Med Res 2019; 29:1913-1934. PMID: 31595834; DOI: 10.1177/0962280219880573.
Abstract
In this paper, we propose empirical likelihood methods based on influence function and jackknife techniques to construct confidence intervals for quantile medical costs with censored data. We show that the influence function-based empirical log-likelihood ratio statistic for the quantile medical cost has a standard chi-square distribution as its asymptotic distribution. Simulation studies are conducted to compare coverage probabilities and interval lengths of the proposed empirical likelihood confidence intervals with those of the existing normal approximation-based confidence intervals for quantile medical costs. The proposed methods are observed to have better finite-sample performance than existing methods. The new methods are also illustrated through a real example.
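In outline (our notation), with $\widehat{\psi}_i(q)$ the estimated influence-function-based estimating function for the quantile medical cost $q$, the empirical likelihood ratio is

$$
R(q) = \max\left\{\prod_{i=1}^n n p_i :\ p_i \ge 0,\ \sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i\,\widehat{\psi}_i(q) = 0\right\},
\qquad
-2\log R(q_0) \xrightarrow{d} \chi^2_1,
$$

so a confidence interval collects all $q$ for which $-2\log R(q)$ falls below the $\chi^2_1$ critical value.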
13. Machine learning methods for leveraging baseline covariate information to improve the efficiency of clinical trials. Stat Med 2019; 38:1703-1714. PMID: 30474289; DOI: 10.1002/sim.8054.
Abstract
Clinical trials are widely considered the gold standard for treatment evaluation, and they can be highly expensive in terms of time and money. The efficiency of clinical trials can be improved by incorporating information from baseline covariates that are related to clinical outcomes. This can be done by modifying an unadjusted treatment effect estimator with an augmentation term that involves a function of covariates. The optimal augmentation is well characterized in theory but must be estimated in practice. In this article, we investigate the use of machine learning methods to estimate the optimal augmentation. We consider and compare an indirect approach based on an estimated regression function and a direct approach that aims directly to minimize the asymptotic variance of the treatment effect estimator. Theoretical considerations and simulation results indicate that the direct approach is generally preferable over the indirect approach. The direct approach can be implemented using any existing prediction algorithm that can minimize a weighted sum of squared prediction errors. Many such prediction algorithms are available, and the super learning principle can be used to combine multiple algorithms into a super learner under the direct approach. The resulting direct super learner has a desirable oracle property, is easy to implement, and performs well in realistic settings. The proposed methodology is illustrated with real data from a stroke trial.
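In the two-arm setting the augmentation takes a familiar form (a sketch in our notation, following Zhang-Tsiatis-Davidian-type constructions): with treatment indicator $A_i \in \{0,1\}$, randomization probability $\pi$, and unadjusted estimator $\hat{\Delta}$,

$$
\hat{\Delta}_{\mathrm{aug}} = \hat{\Delta} - \frac{1}{n}\sum_{i=1}^n (A_i - \pi)\,h(X_i),
$$

where the optimal $h$ minimizes the asymptotic variance of $\hat{\Delta}_{\mathrm{aug}}$; the direct approach estimates this $h$ by minimizing a weighted sum of squared prediction errors, which is why any prediction algorithm of that kind, or a super learner combining several, can be plugged in.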
14. Robust Inference after Random Projections via Hellinger Distance for Location-Scale Family. Entropy 2019; 21:e21040348. PMID: 33267062; PMCID: PMC7514831; DOI: 10.3390/e21040348.
Abstract
Big data and streaming data are encountered in a variety of contemporary applications in business and industry. In such cases, it is common to use random projections to reduce the dimension of the data, yielding compressed data. These data, however, possess various anomalies, such as heterogeneity, outliers, and round-off errors, which are hard to detect due to volume and processing challenges. This paper describes a new robust and efficient methodology, based on the Hellinger distance, for analyzing the compressed data. Using large-sample methods and numerical experiments, it is demonstrated that routine use of a robust estimation procedure is feasible. The role of double limits in understanding efficiency and robustness is brought out, which is of independent interest.
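For reference (standard definitions, our notation), the squared Hellinger distance between densities $f$ and $g$, and the minimum Hellinger distance estimator based on a nonparametric density estimate $\hat{g}_n$ of the compressed data, are

$$
\mathrm{HD}^2(f, g) = \frac{1}{2}\int \left(\sqrt{f(y)} - \sqrt{g(y)}\right)^2 \mathrm{d}y,
\qquad
\hat{\theta}_n = \arg\min_{\theta}\ \mathrm{HD}^2\!\left(f_\theta, \hat{g}_n\right),
$$

an estimator that is asymptotically efficient at the model while remaining robust to contamination.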
15. A critical issue of using the variance of the total in the linearization method - in the context of unequal probability sampling. Stat Med 2018; 38:1475-1483. PMID: 30488467; DOI: 10.1002/sim.8053.
Abstract
Publicly available national survey data are useful for evidence-based research to advance our understanding of important questions in the health and biomedical sciences. Appropriate variance estimation is a crucial step in evaluating the strength of evidence in the data analysis. In survey data analysis, the conventional linearization method for estimating the variance of a statistic of interest uses the variance estimator of the total based on linearized variables. We warn that this common practice may have undesirable consequences, such as susceptibility to data shift and severely inflated variance estimates, when unequal weights are incorporated into variance estimation. We propose using the variance estimator of the mean (mean-approach) instead of the variance estimator of the total (total-approach). We show the superiority of the mean-approach through analytical investigations. A real data example (the National Comorbidity Survey Replication) and simulation-based studies strongly support our conclusion.
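To see the data-shift issue concretely (a schematic with-replacement form in our notation, not the paper's exact estimator), let $w_i$ be the survey weights and $z_i$ the linearized variables; the total-approach variance estimator

$$
\hat{v}_{\mathrm{tot}} = \frac{n}{n-1}\sum_{i=1}^n \left(w_i z_i - \frac{1}{n}\sum_{j=1}^n w_j z_j\right)^2
$$

is not invariant to a location shift $z_i \mapsto z_i + c$ when the $w_i$ are unequal, whereas the mean-approach, which replaces $w_i z_i$ by $w_i(z_i - \bar{z}_w)$ with $\bar{z}_w$ the weighted mean, is shift-invariant.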
16. A New Class of Robust Two-Sample Wald-Type Tests. Int J Biostat 2018; 14. PMID: 30024852; DOI: 10.1515/ijb-2017-0023.
Abstract
Parametric hypothesis testing with two independent samples arises frequently in applications in biology, medical sciences, epidemiology, reliability, and many other fields. In this paper, we propose robust Wald-type tests for such two-sample problems using the minimum density power divergence estimators of the underlying parameters. In particular, we consider the simple two-sample hypothesis of full parametric homogeneity as well as general two-sample (composite) hypotheses involving nuisance parameters. The asymptotic and theoretical robustness properties of the proposed Wald-type tests are developed for both the simple and the general composite hypotheses. Some particular cases of testing against one-sided alternatives are discussed, with specific attention to testing the effectiveness of a treatment in clinical trials. The performance of the proposed tests is also illustrated numerically through appropriate real data examples.
17. Robustness Property of Robust-BD Wald-Type Test for Varying-Dimensional General Linear Models. Entropy 2018; 20:e20030168. PMID: 33265259; PMCID: PMC7512684; DOI: 10.3390/e20030168.
Abstract
An important issue for robust inference is to examine the stability of the asymptotic level and power of a test statistic in the presence of contaminated data. Most existing results are derived in finite-dimensional settings with particular choices of loss functions. This paper re-examines the issue by allowing for a diverging number of parameters combined with a broader array of robust error measures, called "robust-BD", for the class of "general linear models". Under regularity conditions, we derive the influence function of the robust-BD parameter estimator and demonstrate that the robust-BD Wald-type test enjoys robustness of validity and efficiency asymptotically. Specifically, the asymptotic level of the test is stable under a small amount of contamination of the null hypothesis, whereas the asymptotic power is large enough under contaminated distributions in a neighborhood of the contiguous alternatives, thus lending support to the utility of the proposed robust-BD Wald-type test.
18.
Abstract
Classification measures play essential roles in the assessment and construction of classifiers, so determining how to prevent these measures from being unduly affected by individual observations has become an important problem. In this paper, we propose several indexes based on the influence function and the concept of local influence to identify influential observations that affect the estimate of the area under the receiver operating characteristic curve (AUC), an important and commonly used measure. Cumulative lift charts are also used to reconcile disagreements among the proposed indexes. Both the AUC-based indexes and the graphical tools rely only on classification scores, and are therefore applicable to any classifier that produces real-valued classification scores. A real data set is used for illustration.
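For reference (the standard Hájek-projection form, in our notation), write $\theta = P(S_1 > S_0)$ for the AUC, $p = P(Y = 1)$, and $F_0$, $F_1$ for the score distributions among controls and cases; the influence function of the empirical AUC at an observation with label $y$ and score $s$ is

$$
\varphi(y, s) = \frac{y}{p}\left\{F_0(s) - \theta\right\} + \frac{1-y}{1-p}\left\{1 - F_1(s) - \theta\right\},
$$

and case-deletion indexes of the kind proposed here can be built from empirical versions of $\varphi$ evaluated at each observation.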
19.
Abstract
We introduce a new method of estimation of parameters in semi-parametric and nonparametric models. The method is based on estimating equations that are U-statistics in the observations. The U-statistics are based on higher order influence functions that extend ordinary linear influence functions of the parameter of interest and represent higher derivatives of this parameter. For parameters for which the representation cannot be perfect, the method leads to a bias-variance trade-off and results in estimators that converge at slower than $\sqrt{n}$-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at $\sqrt{n}$-rate, but we also consider efficient $\sqrt{n}$-estimation using novel nonlinear estimators. The general approach is applied in detail to the example of estimating a mean response when the response is not always observed.
20. Empirical likelihood inference in randomized clinical trials. Stat Methods Med Res 2017; 27:3770-3784. PMID: 28679341; DOI: 10.1177/0962280217711205.
Abstract
In individually randomized controlled trials, in addition to the primary outcome, information is often available on a number of covariates prior to randomization. This information is frequently utilized to adjust for baseline characteristics in order to increase the precision of estimated average treatment effects; such adjustment is usually performed via covariate adjustment in outcome regression models. Although covariate adjustment is widely seen as desirable for making treatment effect estimates more precise and the corresponding hypothesis tests more powerful, there are considerable concerns that objective inference in randomized clinical trials can potentially be compromised. In this paper, we study an empirical likelihood approach to covariate adjustment and propose two unbiased estimating functions that automatically decouple the evaluation of average treatment effects from regression modeling of covariate-outcome relationships. The resulting empirical likelihood estimator of the average treatment effect is as efficient as the existing efficient adjusted estimators when separate treatment-specific working regression models are correctly specified, and it remains at least as efficient for any given treatment-specific working regression models, whether or not they coincide with the true treatment-specific covariate-outcome relationships. We present a simulation study comparing the finite-sample performance of various methods, along with results from the analysis of a data set from an HIV clinical trial. The simulation results indicate that the proposed empirical likelihood approach is more efficient and powerful than its competitors when the working covariate-outcome relationships by treatment status are misspecified.
21. Empirical Likelihood in Nonignorable Covariate-Missing Data Problems. Int J Biostat 2017; 13. PMID: 28441139; DOI: 10.1515/ijb-2016-0053.
Abstract
Missing covariate data occur often in regression analysis and arise frequently in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but others are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which two working models are introduced: a working probability model of missingness and a working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations in which there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations, propose three maximum empirical likelihood estimators of the underlying regression parameters, and compare their efficiencies with those of other existing competitors. We present a simulation study comparing the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
22. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electron J Stat 2015; 9:1583-1607. PMID: 26279737; PMCID: PMC4533123; DOI: 10.1214/15-ejs1035.
Abstract
In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation to assess how the results will generalize to an independent data set. To evaluate the quality of an estimate of cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, cross-validating a predictive model on even a relatively small data set can require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve-based approach to obtaining a variance estimate for cross-validated AUC.
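A minimal sketch of the influence-curve variance estimate for a single AUC (our own illustrative code, not the authors' implementation; for cross-validated AUC one would average fold-specific estimates of this kind):

```python
import numpy as np

def auc_ic_variance(y, scores):
    """Influence-curve-based variance estimate for the empirical AUC.

    y      : 0/1 array of true labels
    scores : real-valued classifier scores
    Returns (auc_hat, variance_of_auc_hat).
    """
    y = np.asarray(y)
    s = np.asarray(scores, dtype=float)
    cases, controls = s[y == 1], s[y == 0]
    n, p_hat = len(y), np.mean(y)

    # Empirical AUC, counting ties as 1/2.
    diff = cases[:, None] - controls[None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))

    # Influence curve: cases use the control CDF at their score,
    # controls use the case survival function at their score.
    F0 = np.array([np.mean(controls < v) + 0.5 * np.mean(controls == v) for v in s])
    S1 = np.array([np.mean(cases > v) + 0.5 * np.mean(cases == v) for v in s])
    ic = np.where(y == 1, (F0 - auc) / p_hat, (S1 - auc) / (1.0 - p_hat))

    # Variance of the estimator is the sample mean of IC^2 over n.
    return auc, np.mean(ic ** 2) / n

# Example: standard error of the AUC on simulated scores.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
scores = y + rng.normal(scale=1.5, size=500)
auc, var = auc_ic_variance(y, scores)
print(f"AUC = {auc:.3f}, SE = {np.sqrt(var):.3f}")
```

Because the influence curve is evaluated once per observation, this costs far less than resampling-based alternatives such as the bootstrap.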
23. Semiparametric estimation of treatment effect with time-lagged response in the presence of informative censoring. Lifetime Data Anal 2011; 17:566-593. PMID: 21706378; PMCID: PMC3217309; DOI: 10.1007/s10985-011-9199-8.
Abstract
In many randomized clinical trials, the primary response variable, for example the survival time, is not observed directly after patients enroll in the study but rather after some period of time (lag time). Such a response is often missing for some patients due to censoring, which occurs when the study ends before the patient's response is observed or when the patient drops out of the study. It is often assumed that censoring occurs at random, referred to as noninformative censoring; in many cases, however, such an assumption may not be reasonable. If the missing data are not analyzed properly, the estimator of, or test for, the treatment effect may be biased. In this paper, we use semiparametric theory to derive a class of consistent and asymptotically normal estimators for the treatment effect parameter that are applicable when the response variable is right censored. Baseline auxiliary covariates and post-treatment auxiliary covariates, which may be time-dependent, are also considered in our semiparametric model. These auxiliary covariates are used to derive estimators that both account for informative censoring and are more efficient than estimators that do not use the auxiliary covariates.
24.
Abstract
The pretest-posttest study is commonplace in numerous applications. Typically, subjects are randomized to two treatments, and response is measured at baseline, prior to intervention with the randomized treatment (pretest), and at a prespecified follow-up time (posttest). Interest focuses on the effect of treatments on the change between mean baseline and follow-up response. Missing posttest response for some subjects is routine, and disregarding missing cases can lead to invalid inference. Despite the popularity of this design, a consensus on an appropriate analysis when no data are missing, let alone one taking into account missing follow-up, does not exist. Under a semiparametric perspective on the pretest-posttest model, in which limited distributional assumptions on pretest or posttest response are made, we show how the theory of Robins, Rotnitzky, and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class. We then describe how the theoretical results translate into practice. The development not only shows how a unified framework for inference in this setting emerges from the Robins, Rotnitzky, and Zhao theory, but also provides a review and demonstration of the key aspects of this theory in a familiar context. The results are also relevant to the problem of comparing two treatment means with adjustment for baseline covariates.