Reference Citation Analysis

For: Minnier J, Tian L, Cai T. A Perturbation Method for Inference on Regularized Regression Estimates. J Am Stat Assoc 2012;106:1371-1382. [PMID: 22844171 DOI: 10.1198/jasa.2011.tm10382] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Indexed: 12/31/2022]

Cited by Other Article(s)

Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024;31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]

Wang L, Wang X, Liao KP, Cai T. Semisupervised transfer learning for evaluation of model classification performance. Biometrics 2024;80:ujae002. [PMID: 38465982 PMCID: PMC10926267 DOI: 10.1093/biomtc/ujae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 12/17/2023] [Accepted: 01/17/2024] [Indexed: 03/12/2024]

Abdurrab I, Mahmood T, Sheikh S, Aijaz S, Kashif M, Memon A, Ali I, Peerwani G, Pathan A, Alkhodre AB, Siddiqui MS. Predicting the Length of Stay of Cardiac Patients Based on Pre-Operative Variables-Bayesian Models vs. Machine Learning Models. Healthcare (Basel) 2024;12:249. [PMID: 38255136 PMCID: PMC10815919 DOI: 10.3390/healthcare12020249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/04/2024] [Accepted: 01/16/2024] [Indexed: 01/24/2024]

Zhang D, Khalili A, Asgharian M. Post-model-selection inference in linear regression models: An integrated review. Stat Surv 2022. [DOI: 10.1214/22-ss135] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]

Ng TL, Newton MA. Random weighting in LASSO regression. Electron J Stat 2022. [DOI: 10.1214/22-ejs2020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]

Zhang HG, Hejblum BP, Weber GM, Palmer NP, Churchill SE, Szolovits P, Murphy SN, Liao KP, Kohane IS, Cai T. ATLAS: an automated association test using probabilistically linked health records with application to genetic studies. J Am Med Inform Assoc 2021;28:2582-2592. [PMID: 34608931 DOI: 10.1093/jamia/ocab187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2021] [Revised: 08/14/2021] [Accepted: 08/22/2021] [Indexed: 11/12/2022]

Abstract

OBJECTIVE

Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases may capture more comprehensive pictures of patient health and enable novel research studies. When no gold standard mappings between patient records are available, researchers may probabilistically link records from separate databases and analyze the linked data. However, previous linked data inference methods are constrained to certain linkage settings and exhibit low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data.

MATERIALS AND METHODS

Missing variables are imputed at various thresholds using a weighted average method that propagates uncertainty from probabilistic linkage. Next, estimated effect sizes are obtained using a generalized linear model. ATLAS then conducts the threshold combination test by optimally combining P values obtained from data imputed at varying thresholds using Fisher's method and perturbation resampling.
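
Fisher's method, as named in the methods paragraph, combines K P values into T = -2 * sum(log p_k), which follows a chi-square distribution with 2K degrees of freedom only when the P values are independent. P values computed from the same data imputed at different thresholds are correlated, which is presumably why ATLAS calibrates the combined statistic by perturbation resampling instead of using the chi-square reference. The Python sketch below illustrates only the combination arithmetic under stated assumptions: the function names are hypothetical, the `null_draws` passed to `resampling_pvalue` are assumed to come from a perturbation step that is not reproduced here, and the "optimal" combination mentioned in the abstract is not implemented.

```python
import math


def fisher_statistic(p_values):
    """Fisher's combined statistic T = -2 * sum(log p_k) over thresholds k."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with an even df.

    Uses the closed form exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def fisher_pvalue_independent(p_values):
    """Textbook Fisher p-value; valid only if the p-values are independent."""
    return chi2_sf_even_df(fisher_statistic(p_values), 2 * len(p_values))


def resampling_pvalue(observed, null_draws):
    """Empirical p-value of an observed statistic against resampled null draws.

    null_draws are assumed to be Fisher statistics recomputed on perturbed
    data generated under the null (hypothetical input, not produced here).
    """
    exceed = sum(1 for t in null_draws if t >= observed)
    return (exceed + 1) / (len(null_draws) + 1)
```

With correlated P values, `fisher_pvalue_independent` tends to overstate significance, so in this sketch it serves only as a reference point; the resampling p-value is the one that respects the dependence across thresholds.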

RESULTS

In simulations, ATLAS controls the type I error rate and has higher power than previous methods. In a real-world genetic association study, meta-analysis of ATLAS-enabled analyses of a linked cohort with analyses of an existing cohort yielded additional significant associations between a rheumatoid arthritis genetic risk score and laboratory biomarkers.

DISCUSSION

Weighted average imputation withstands false matches and increases the contribution of true matches, mitigating bias induced by linkage error. The threshold combination test avoids arbitrarily choosing a threshold to declare a match, thus automating linked data-enabled analyses and preserving power.
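
The abstract does not specify the imputation weights. A minimal sketch, assuming the simplest reading: for one record, keep the candidate linked records whose match probability is at least the threshold, and impute the probability-weighted mean of their values. The candidate data, function names, and weighting rule are all hypothetical and are not the published ATLAS estimator.

```python
def weighted_average_impute(candidates, threshold):
    """Impute a missing value from probabilistically linked candidate records.

    candidates: list of (match_probability, value) pairs for one target record.
    threshold: candidates with match probability below it are discarded.
    Returns the probability-weighted mean of the retained values, or None
    if no candidate reaches the threshold.
    """
    kept = [(p, v) for p, v in candidates if p >= threshold]
    total = sum(p for p, _ in kept)
    if total == 0:
        return None
    return sum(p * v for p, v in kept) / total


def impute_over_thresholds(candidates, thresholds):
    """Repeat the imputation at each threshold, as the ATLAS pipeline varies it."""
    return {t: weighted_average_impute(candidates, t) for t in thresholds}
```

With candidates `[(0.9, 10.0), (0.1, 0.0)]`, a low threshold keeps the unlikely match but gives it only a tenth of the weight, so the imputed value stays close to the high-probability value (9.0); a threshold of 0.5 drops it and returns 10.0.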

CONCLUSION

ATLAS promises to enable novel and powerful research studies using linked data to capitalize on all available data sources.

Zhang X, Fang K, Zhang Q. Multivariate functional generalized additive models. J Stat Comput Simul 2021. [DOI: 10.1080/00949655.2021.1979550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Fang F, Zhao J, Ahmed SE, Qu A. A weak-signal-assisted procedure for variable selection and statistical inference with an informative subsample. Biometrics 2021. [DOI: 10.1111/biom.13346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]

Yu Q, Li Y, Wang Y, Yang Y, Zheng Z. Scalable and efficient inference via CPE. Commun Stat Theory Methods 2021. [DOI: 10.1080/03610926.2021.1936044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Cheng D, Ananthakrishnan AN, Cai T. Robust and efficient semi-supervised estimation of average treatment effects with application to electronic health records data. Biometrics 2021;77:413-423. [PMID: 32413171 PMCID: PMC7758040 DOI: 10.1111/biom.13298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2018] [Revised: 04/30/2020] [Accepted: 05/01/2020] [Indexed: 11/29/2022]

Zheng Z, Liu L, Li Y, Zhao N. High-dimensional statistical inference via DATE. Commun Stat Theory Methods 2021. [DOI: 10.1080/03610926.2021.1909733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Fei Z, Li Y. Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach. J Mach Learn Res 2021;22:58. [PMID: 34531706 PMCID: PMC8442657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]

Zhao J, Chen C. A Nuisance-Free Inference Procedure Accounting for the Unknown Missingness with Application to Electronic Health Records. Entropy (Basel) 2020;22:E1154. [PMID: 33286923 PMCID: PMC7597318 DOI: 10.3390/e22101154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2020] [Revised: 09/27/2020] [Accepted: 10/12/2020] [Indexed: 11/16/2022]

Robust high-dimensional regression for data with anomalous responses. Ann Inst Stat Math 2020. [DOI: 10.1007/s10463-020-00764-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Hong H, Li J. The numerical bootstrap. Ann Stat 2020. [DOI: 10.1214/19-aos1812] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022]

Solution paths for the generalized lasso with applications to spatially varying coefficients regression. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106821] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]

Das D, Gregory K, Lahiri SN. Perturbation bootstrap in adaptive Lasso. Ann Stat 2019. [DOI: 10.1214/18-aos1741] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]

Lin J, Wang D, Zheng Q. Regression analysis and variable selection for two-stage multiple-infection group testing data. Stat Med 2019;38:4519-4533. [PMID: 31297869 DOI: 10.1002/sim.8311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/03/2018] [Revised: 03/03/2019] [Accepted: 06/14/2019] [Indexed: 12/17/2022]

Das D, Lahiri SN. Distributional consistency of the lasso by perturbation bootstrap. Biometrika 2019. [DOI: 10.1093/biomet/asz029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]

Wang J, He X, Xu G. Debiased Inference on Treatment Effect in a High-Dimensional Model. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2018.1558062] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]

Cilluffo G, Sottile G, La Grutta S, Muggeo VM. The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression. Stat Methods Med Res 2019;29:765-777. [PMID: 30991902 DOI: 10.1177/0962280219842890] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022]

Gronsbell J, Minnier J, Yu S, Liao K, Cai T. Automated feature selection of predictors in electronic medical records data. Biometrics 2019;75:268-277. [PMID: 30353541 DOI: 10.1111/biom.12987] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/30/2017] [Accepted: 10/01/2018] [Indexed: 01/29/2023]

Abstract

The use of Electronic Health Records (EHR) for translational research can be challenging due to difficulty in extracting accurate disease phenotype data. Historically, EHR algorithms for annotating phenotypes have been either rule-based or trained with billing codes and gold standard labels curated via labor-intensive medical chart review. These simplistic algorithms tend to have unpredictable portability across institutions and low accuracy for many disease phenotypes due to imprecise billing codes. Recently, more sophisticated machine learning algorithms have been developed to improve the robustness and accuracy of EHR phenotyping algorithms. These algorithms are typically trained via supervised learning, relating gold standard labels to a wide range of candidate features including billing codes, procedure codes, medication prescriptions and relevant clinical concepts extracted from narrative notes via Natural Language Processing (NLP). However, because gold standard labeling is time-intensive, the training set is often too small to build a generalizable algorithm with the large number of candidate features extracted from EHR. To reduce the number of candidate predictors and in turn improve model performance, we present an automated feature selection method based entirely on unlabeled observations. The proposed method generates a comprehensive surrogate for the underlying phenotype with an unsupervised clustering of disease status based on several highly predictive features such as diagnosis codes and mentions of the disease in text fields available in the entire set of EHR data. A sparse regression model is then built with the estimated outcomes and remaining covariates to identify those features most informative of the phenotype of interest. Relying on the results of Li and Duan (1989), we demonstrate that variable selection for the underlying phenotype model can be achieved by fitting the surrogate-based model.
We explore the performance of our methods in numerical simulations and present the results of a prediction model for Rheumatoid Arthritis (RA) built on a large EHR data mart from the Partners Health System consisting of billing codes and NLP terms. Empirical results suggest that our procedure reduces the number of gold-standard labels necessary for phenotyping, thereby harnessing the automated power of EHR data and improving efficiency.
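
One way to picture the surrogate step is a two-cluster split of a single score built from highly predictive features. The sketch below assumes that score is log(1 + diagnosis-code count) + log(1 + NLP mention count) and clusters it with plain one-dimensional 2-means (Lloyd's algorithm). Both choices are illustrative assumptions; the clustering model used in the paper may differ, and the subsequent sparse regression of the surrogate on the remaining covariates is not shown.

```python
import math


def surrogate_labels(icd_counts, nlp_counts, iters=100):
    """Assign a 0/1 surrogate phenotype by 2-means clustering of a 1-D score.

    The score log(1 + ICD count) + log(1 + NLP mention count) is a
    hypothetical stand-in for the "highly predictive features".
    """
    scores = [math.log1p(a) + math.log1p(b) for a, b in zip(icd_counts, nlp_counts)]
    lo, hi = min(scores), max(scores)  # initialize the two centers at the extremes
    labels = [0] * len(scores)
    for _ in range(iters):
        # Assign each patient to the nearer center (1 = likely case).
        labels = [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]
        ones = [s for s, lab in zip(scores, labels) if lab == 1]
        zeros = [s for s, lab in zip(scores, labels) if lab == 0]
        new_lo = sum(zeros) / len(zeros) if zeros else lo
        new_hi = sum(ones) / len(ones) if ones else hi
        if new_lo == lo and new_hi == hi:  # centers stopped moving
            break
        lo, hi = new_lo, new_hi
    return labels
```

Patients with many code and text mentions land in the "case" cluster; in this sketch those labels would then play the role of the estimated outcomes in the sparse regression.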

Das D, Lahiri S. Second order correctness of perturbation bootstrap M-estimator of multiple linear regression parameter. Bernoulli 2019. [DOI: 10.3150/17-bej1001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]

Lee SMS, Wu Y. A bootstrap recipe for post-model-selection inference under linear regression models. Biometrika 2018. [DOI: 10.1093/biomet/asy046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]

Wang L, Van Keilegom I, Maidman A. Wild residual bootstrap inference for penalized quantile regression with heteroscedastic errors. Biometrika 2018. [DOI: 10.1093/biomet/asy037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]

Penalized expectile regression: an alternative to penalized quantile regression. Ann Inst Stat Math 2018. [DOI: 10.1007/s10463-018-0645-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 10/18/2022]

Tuson M, Turlach B, Vickery A, Whyatt D. Reducing Bruzzi's Formula to Remove Instability in the Estimation of Population Attributable Fraction for Health Outcomes. Am J Epidemiol 2018;187:170-179. [PMID: 28595350 DOI: 10.1093/aje/kwx200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2015] [Accepted: 02/27/2017] [Indexed: 11/13/2022]

Gronsbell JL, Cai T. Semi-supervised approaches to efficient evaluation of model prediction performance. J R Stat Soc Series B Stat Methodol 2017. [DOI: 10.1111/rssb.12264] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]

Geva A, Gronsbell JL, Cai T, Cai T, Murphy SN, Lyons JC, Heinz MM, Natter MD, Patibandla N, Bickel J, Mullen MP, Mandl KD. A Computable Phenotype Improves Cohort Ascertainment in a Pediatric Pulmonary Hypertension Registry. J Pediatr 2017;188. [PMID: 28625502 PMCID: PMC5572538 DOI: 10.1016/j.jpeds.2017.05.037] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 01/28/2023]

Affiliation(s)

Alon Geva: Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Division of Critical Care Medicine, Department of Anesthesiology, Perioperative, and Pain Medicine, Boston Children’s Hospital, Boston, MA; Department of Anaesthesia, Harvard Medical School, Boston, MA
Jessica L. Gronsbell: Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA
Tianxi Cai: Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA
Tianrun Cai: Division of Rheumatology, Immunology and Allergy, Brigham and Women’s Hospital, Boston, MA
Shawn N. Murphy: Department of Research Information Services and Computing, Partners Healthcare, Boston, MA; Department of Neurology, Massachusetts General Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Jessica C. Lyons: Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Michelle M. Heinz: Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
Marc D. Natter: Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA
Nandan Patibandla: Information Services Department, Boston Children’s Hospital, Boston, MA
Jonathan Bickel: Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Information Services Department, Boston Children’s Hospital, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA
Mary P. Mullen: Department of Cardiology, Boston Children’s Hospital, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA
Kenneth D. Mandl: Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA
for the PPHNet and NHLBI Pediatric Pulmonary Vascular Disease Outcomes Bioinformatics Clinical Coordinating Center Investigators

Shi P, Qu A. Weak signal identification and inference in penalized model selection. Ann Stat 2017. [DOI: 10.1214/16-aos1482] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]

Marino M, Buxton OM, Li Y. Covariate Selection for Multilevel Models with Missing Data. Stat (Int Stat Inst) 2017;6:31-46. [PMID: 28239457 DOI: 10.1002/sta4.133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]

Johnson BA, Long Q, Huang Y, Chansky K, Redman M. Model selection and inference for censored lifetime medical expenditures. Biometrics 2016;72:731-741. [PMID: 26689300 PMCID: PMC5741192 DOI: 10.1111/biom.12464] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/01/2015] [Revised: 11/01/2015] [Accepted: 11/01/2015] [Indexed: 11/30/2022]

Tibshirani RJ, Taylor J, Lockhart R, Tibshirani R. Exact Post-Selection Inference for Sequential Regression Procedures. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1108848] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Indexed: 10/21/2022]

Laurin C, Boomsma D, Lubke G. The use of vector bootstrapping to improve variable selection precision in Lasso models. Stat Appl Genet Mol Biol 2016;15:305-320. [PMID: 27248122 PMCID: PMC5131926 DOI: 10.1515/sagmb-2015-0043] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/15/2022]

Lin CY, Halabi S. A Simple Method for Deriving the Confidence Regions for the Penalized Cox's Model via the Minimand Perturbation. Commun Stat Theory Methods 2016;46:4791-4808. [PMID: 29326496 DOI: 10.1080/03610926.2015.1085568] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/21/2022]

Lu S, Liu Y, Yin L, Zhang K. Confidence intervals and regions for the lasso by using stochastic variational inequality techniques in optimization. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12184] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]

Mandozzi J, Bühlmann P. Hierarchical Testing in the High-Dimensional Setting With Correlated Variables. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1007209] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]

Sinnott JA, Cai T. Inference for survival prediction under the regularized Cox model. Biostatistics 2016;17:692-707. [PMID: 27107008 DOI: 10.1093/biostatistics/kxw016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/23/2015] [Accepted: 03/23/2016] [Indexed: 12/31/2022]

Abstract

When a moderate number of potential predictors are available and a survival model is fit with regularization to achieve variable selection, providing accurate inference on the predicted survival can be challenging. We investigate inference on the predicted survival estimated after fitting a Cox model under regularization guaranteeing the oracle property. We demonstrate that existing asymptotic formulas for the standard errors of the coefficients tend to underestimate the variability for some coefficients, while typical resampling such as the bootstrap tends to overestimate it; these approaches can both lead to inaccurate variance estimation for predicted survival functions. We propose a two-stage adaptation of a resampling approach that brings the estimated error in line with the truth. In stage 1, we estimate the coefficients in the observed data set and in a collection of resampled data sets, and allow the resampled coefficient estimates to vote on whether each coefficient should be 0. For those coefficients voted as zero, we set both the point and interval estimates to zero. In stage 2, to make inference about coefficients not voted as zero in stage 1, we refit the penalized model in the observed data and in the same resampled data sets with only variables corresponding to those coefficients. We demonstrate that ensemble voting-based point and interval estimators of the coefficients perform well in finite samples, and prove that the point estimator maintains the oracle property. We extend this approach to derive inference procedures for survival functions and demonstrate that our proposed interval estimation procedures substantially outperform estimators based on asymptotic inference or standard bootstrap. We further illustrate our proposed procedures by predicting breast cancer survival in a gene expression study.
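
The stage-1 voting rule can be sketched as follows, assuming a simple majority rule: a coefficient survives only if more than half of the resampled fits estimate it as nonzero. The threshold, function names, and input layout (one coefficient vector per resampled fit) are assumptions, and the stage-2 refit of the penalized Cox model is not shown.

```python
def vote_nonzero(resampled_coefs, min_share=0.5):
    """Return indices of coefficients voted nonzero across resampled fits.

    resampled_coefs: one coefficient vector per resampled data set.
    A coefficient survives only if the share of fits estimating it as
    nonzero exceeds min_share (a majority rule by default; an assumption).
    """
    n_fits = len(resampled_coefs)
    n_coefs = len(resampled_coefs[0])
    keep = []
    for j in range(n_coefs):
        nonzero = sum(1 for beta in resampled_coefs if beta[j] != 0.0)
        if nonzero / n_fits > min_share:
            keep.append(j)
    return keep


def apply_vote(observed_coefs, keep):
    """Set point estimates of voted-out coefficients to exactly zero."""
    kept = set(keep)
    return [b if j in kept else 0.0 for j, b in enumerate(observed_coefs)]
```

For example, if three resampled fits return `[0.5, 0.0, 0.2]`, `[0.4, 0.0, 0.0]`, and `[0.6, 0.1, 0.0]`, only the first coefficient is nonzero in a majority of fits, so `vote_nonzero` keeps index 0 and the other two are fixed at zero before stage 2.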

Agniel D, Liao KP, Cai T. Estimation and testing for multiple regulation of multivariate mixed outcomes. Biometrics 2016;72:1194-1205. [PMID: 26910481 DOI: 10.1111/biom.12495] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/01/2015] [Revised: 11/01/2015] [Accepted: 12/01/2015] [Indexed: 11/27/2022]

Camponovo L. On the validity of the pairs bootstrap for lasso estimators. Biometrika 2015. [DOI: 10.1093/biomet/asv039] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]

Kim HL, Halabi S, Li P, Mayhew G, Simko J, Nixon AB, Small EJ, Rini B, Morris MJ, Taplin ME, George D. A Molecular Model for Predicting Overall Survival in Patients with Metastatic Clear Cell Renal Carcinoma: Results from CALGB 90206 (Alliance). EBioMedicine 2015;2:1814-1820. [PMID: 26870806 PMCID: PMC4740313 DOI: 10.1016/j.ebiom.2015.09.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 08/05/2015] [Revised: 09/06/2015] [Accepted: 09/07/2015] [Indexed: 11/30/2022]

Bühlmann P, van de Geer S. High-dimensional inference in misspecified linear models. Electron J Stat 2015. [DOI: 10.1214/15-ejs1041] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 11/19/2022]

Zhou Q. Monte Carlo Simulation for Lasso-Type Problems by Estimator Augmentation. J Am Stat Assoc 2014. [DOI: 10.1080/01621459.2014.946035] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]

Lockhart R, Taylor J, Tibshirani RJ, Tibshirani R. A Significance Test for the Lasso. Ann Stat 2014;42:413-468. [PMID: 25574062 DOI: 10.1214/13-aos1175] [Citation(s) in RCA: 335] [Impact Index Per Article: 33.5] [Indexed: 11/19/2022]

Abstract

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a $\chi^2_1$ distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than $\chi^2_1$ under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the $\ell_1$ penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1).
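
For the first step of the path, with unit-norm columns of X and the lasso objective 0.5 * ||y - Xb||^2 + λ||b||_1, the covariance statistic reduces to T1 = λ1(λ1 - λ2)/σ², where λ1 and λ2 are the first two knots of the path and σ² is the noise variance, assumed known. With orthonormal columns the knots are simply the two largest values of |X_j^T y|, which is the only case this Python sketch handles; the function names are hypothetical, and a general design would require computing λ2 from the actual lasso path.

```python
import math


def covariance_test_first_step(x_columns, y, sigma):
    """First-step covariance test statistic T1 = lam1 * (lam1 - lam2) / sigma**2.

    Assumes the columns of X are orthonormal, so the first two knots of the
    lasso path (objective 0.5*||y - Xb||^2 + lam*||b||_1) are the two largest
    values of |X_j^T y|. sigma is the noise standard deviation, assumed known.
    """
    if len(x_columns) < 2:
        raise ValueError("need at least two predictors")
    inner = sorted(
        (abs(sum(xij * yi for xij, yi in zip(col, y))) for col in x_columns),
        reverse=True,
    )
    lam1, lam2 = inner[0], inner[1]
    return lam1 * (lam1 - lam2) / sigma ** 2


def exp1_pvalue(t):
    """Asymptotic p-value P(E > t) for E ~ Exp(1), the null limit of T1."""
    return math.exp(-t)
```

With the identity design in three dimensions and y = (3, -1, 2), the knots are 3 and 2, so T1 = 3 * (3 - 2) / σ²; with σ = 1 the asymptotic p-value is exp(-3).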

Bühlmann P, Meier L, van de Geer S. Discussion: “A significance test for the lasso”. Ann Stat 2014. [DOI: 10.1214/13-aos1175a] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]

Bühlmann P. Discussion of Big Bayes Stories and BayesBag. Stat Sci 2014. [DOI: 10.1214/13-sts460] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]

Halabi S, Lin CY, Kelly WK, Fizazi KS, Moul JW, Kaplan EB, Morris MJ, Small EJ. Updated prognostic model for predicting overall survival in first-line chemotherapy for patients with metastatic castration-resistant prostate cancer. J Clin Oncol 2014;32:671-677. [PMID: 24449231 DOI: 10.1200/jco.2013.52.3696] [Citation(s) in RCA: 366] [Impact Index Per Article: 36.6] [Indexed: 11/20/2022]

Halabi S, Lin CY, Small EJ, Armstrong AJ, Kaplan EB, Petrylak D, Sternberg CN, Shen L, Oudard S, de Bono J, Sartor O. Prognostic model predicting metastatic castration-resistant prostate cancer survival in men treated with second-line chemotherapy. J Natl Cancer Inst 2013;105:1729-1737. [PMID: 24136890 DOI: 10.1093/jnci/djt280] [Citation(s) in RCA: 128] [Impact Index Per Article: 11.6] [Indexed: 12/30/2022]

Abstract

BACKGROUND

Several prognostic models for overall survival (OS) have been developed and validated in men with metastatic castration-resistant prostate cancer (mCRPC) who receive first-line chemotherapy. We sought to develop and validate a prognostic model to predict OS in men who had progressed after first-line chemotherapy and were selected to receive second-line chemotherapy.

METHODS

Data from a phase III trial in men with mCRPC who had developed progressive disease after first-line chemotherapy (TROPIC trial) were used. The TROPIC dataset was randomly split into training (n = 507) and testing (n = 248) sets. Another dataset, consisting of 488 men previously treated with docetaxel (SPARC trial), was used for external validation. The adaptive least absolute shrinkage and selection operator (adaptive lasso) selected nine prognostic factors of OS. A prognostic score was computed from the regression coefficients. The model's predictive accuracy on the testing and validation sets was assessed using the time-dependent area under the curve (tAUC).
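
The abstract states that a prognostic score was computed from the regression coefficients and evaluated by tAUC. A minimal sketch, assuming a linear score (sum of coefficient times covariate) in which higher values mean higher risk, and the cumulative/dynamic definition of tAUC at a fixed time t: the share of pairs (a patient who died by t, a patient still alive after t) in which the first has the higher score. It ignores censoring, which a real survival analysis must handle (for example by inverse probability of censoring weighting); the coefficients, data, and function names are hypothetical.

```python
def prognostic_score(coefs, covariates):
    """Linear prognostic score: sum of coefficient * covariate value."""
    return sum(b * x for b, x in zip(coefs, covariates))


def naive_time_dependent_auc(scores, times, t):
    """Cumulative/dynamic AUC at time t, assuming complete follow-up.

    Cases: event by time t (T <= t). Controls: event-free after t (T > t).
    Returns the share of case-control pairs in which the case has the
    higher risk score, counting ties as one half.
    """
    cases = [s for s, ti in zip(scores, times) if ti <= t]
    controls = [s for s, ti in zip(scores, times) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    concordant = 0.0
    for sc in cases:
        for sk in controls:
            if sc > sk:
                concordant += 1.0
            elif sc == sk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

A tAUC of 0.5 means the score ranks cases and controls no better than chance, and 1.0 means every patient who died by t scored higher than every patient still alive.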

RESULTS

The nine prognostic variables in the final model were Eastern Cooperative Oncology Group performance status, time since last docetaxel use, measurable disease, presence of visceral disease, pain, duration of hormonal use, hemoglobin, prostate specific antigen, and alkaline phosphatase. The tAUCs for this model were 0.73 (95% confidence interval [CI] = 0.72 to 0.74) and 0.70 (95% CI = 0.68 to 0.72) for the testing and validation sets, respectively.

CONCLUSIONS

A prognostic model of OS in the postdocetaxel, second-line chemotherapy, mCRPC setting was developed and externally validated. This model incorporates novel prognostic factors and can be used to provide predicted probabilities for individual patients and to select patients to participate in clinical trials on the basis of their prognosis. Prospective validation is needed.

Taylor J. The geometry of least squares in the 21st century. Bernoulli 2013. [DOI: 10.3150/12-bejsp15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]

Chatterjee A, Lahiri SN. Rates of convergence of the Adaptive LASSO estimators to the Oracle distribution and higher order refinements by the bootstrap. Ann Stat 2013. [DOI: 10.1214/13-aos1106] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 11/19/2022]