1
Song Y, Han H, Fu L, Wang T. Penalized weighted smoothed quantile regression for high-dimensional longitudinal data. Stat Med 2024; 43:2007-2042. [PMID: 38634309 DOI: 10.1002/sim.10056] [Received: 10/23/2022] [Revised: 01/30/2024] [Accepted: 02/25/2024] [Indexed: 04/19/2024]
Abstract
Quantile regression, a robust alternative to linear regression, has been widely used in statistical modeling and inference. In this paper, we propose a penalized weighted convolution-type smoothed method for variable selection and robust parameter estimation in quantile regression with high-dimensional longitudinal data. The proposed method replaces the check function of unpenalized quantile regression with a smoothed, twice-differentiable loss function. It can consistently select the important covariates using efficient gradient-based iterative algorithms even when the number of covariates exceeds the sample size. Moreover, the proposed method can limit the influence of outliers in the response variable and/or the covariates. To account for the correlation within each subject and improve the accuracy of parameter estimation, we also establish a two-step weighted estimation method. Furthermore, we prove the oracle properties of the proposed method under some regularity conditions. Finally, the performance of the proposed method is demonstrated through simulation studies and two real examples.
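The following minimal sketch illustrates the convolution-type smoothing described above. It is not the authors' penalized weighted estimator. It assumes a Gaussian kernel with bandwidth $h$ (the paper may use a different kernel). Under that kernel, the smoothed check loss has the closed form $\ell_h(u) = u\{\tau - \Phi(-u/h)\} + h\,\phi(u/h)$. Its first derivative is $\tau - \Phi(-u/h)$ and its second derivative is $\phi(u/h)/h > 0$. The loss is therefore convex and twice differentiable, and plain gradient descent applies. All function names are hypothetical. The example estimates only a single marginal quantile: an intercept-only model with no penalty and no weights.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a N(0, h^2) kernel: E[rho_tau(u + h*Z)], Z ~ N(0, 1)."""
    return u * (tau - norm_cdf(-u / h)) + h * norm_pdf(u / h)


def smoothed_check_grad(u, tau, h):
    """Derivative of smoothed_check_loss with respect to u."""
    return tau - norm_cdf(-u / h)


def smoothed_quantile(y, tau, h=1.0, lr=2.0, n_iter=3000):
    """Minimize mean(smoothed_check_loss(y_i - q)) over q by gradient descent."""
    q = 0.0
    n = len(y)
    for _ in range(n_iter):
        # d/dq of loss(y_i - q) is -grad(y_i - q), averaged over observations
        g = -sum(smoothed_check_grad(yi - q, tau, h) for yi in y) / n
        q -= lr * g
    return q
```

For example, `smoothed_quantile(list(range(100)), 0.9)` should settle near 89.5, where the smoothed first-order condition holds for these evenly spaced data. As $h$ shrinks, $\ell_h$ approaches the check loss itself. With the default $h = 1$, the step size of 2.0 stays below $2h\sqrt{2\pi}$, which is twice the inverse of the gradient's Lipschitz constant.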
Affiliation(s)
- Yanan Song
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Haohui Han
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Liya Fu
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Ting Wang
  - Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China
2
Lee ER, Park S, Lee SK, Hong HG. Quantile forward regression for high-dimensional survival data. Lifetime Data Anal 2023; 29:769-806. [PMID: 37393569 DOI: 10.1007/s10985-023-09603-w] [Received: 10/10/2022] [Accepted: 05/17/2023] [Indexed: 07/04/2023]
Abstract
Despite the urgent need for an effective prediction model tailored to individual interests, existing models have mainly been developed for the mean outcome, targeting the average person. Additionally, the direction and magnitude of covariate effects on the mean outcome may not hold across different quantiles of the outcome distribution. To accommodate the heterogeneous characteristics of covariates and provide a flexible risk model, we propose a quantile forward regression model for high-dimensional survival data. Our method selects variables by maximizing the likelihood of the asymmetric Laplace distribution (ALD) and determines the final model using the extended Bayesian Information Criterion (EBIC). We demonstrate that the proposed method enjoys the sure screening property and selection consistency. We apply it to a national health survey dataset to show the advantages of a quantile-specific prediction model. Finally, we discuss potential extensions of our approach, including the nonlinear model and the globally concerned quantile regression coefficients model.
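The link between the ALD likelihood and quantile regression can be written out as follows. This is the standard identity for uncensored responses; how the authors adapt it to censored survival outcomes is not shown here. With check loss $\rho_\tau$ and scale $\sigma > 0$, the ALD log-likelihood equals, up to terms free of $\beta$, the negative of the check-loss objective. Maximizing it over $\beta$ is therefore equivalent to ordinary quantile regression at level $\tau$. The EBIC line shows the commonly used form for a candidate model $S$ with $|S|$ of $p$ covariates and tuning constant $\gamma \in [0, 1]$. The exact variant used in the paper may differ.

$$
\begin{aligned}
\rho_\tau(u) &= u\,\{\tau - \mathbf{1}(u < 0)\}, \\
f_\tau(u;\sigma) &= \frac{\tau(1-\tau)}{\sigma}\exp\{-\rho_\tau(u)/\sigma\}, \\
\log L(\beta,\sigma) &= n\log\frac{\tau(1-\tau)}{\sigma} - \frac{1}{\sigma}\sum_{i=1}^{n}\rho_\tau\!\left(y_i - x_i^\top\beta\right), \\
\mathrm{EBIC}_\gamma(S) &= -2\log L(\hat\beta_S,\hat\sigma_S) + |S|\log n + 2\gamma\log\binom{p}{|S|}.
\end{aligned}
$$

In a forward procedure of this kind, covariates are added one at a time, each step adding the covariate that most increases $\log L$. The path is then cut at the model with the smallest EBIC; the $\gamma$ term penalizes the huge number of candidate models when $p \gg n$.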
Affiliation(s)
- Eun Ryung Lee
  - Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Seyoung Park
  - Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Sang Kyu Lee
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI, 48823, USA
  - Biostatistics Branch, National Cancer Institute, Bethesda, MD, 20892, USA
- Hyokyoung G Hong
  - Biostatistics Branch, National Cancer Institute, Bethesda, MD, 20892, USA
3
Wei B, Peng L, Guo Y, Manatunga A, Stevens J. Tensor response quantile regression with neuroimaging data. Biometrics 2023; 79:1947-1958. [PMID: 36482808 PMCID: PMC10250564 DOI: 10.1111/biom.13809] [Received: 02/20/2022] [Accepted: 11/25/2022] [Indexed: 12/14/2022]
Abstract
Collecting neuroimaging data in the form of tensors (i.e., multidimensional arrays) has become more common in mental health studies, driven by increasing interest in the associations between neuroimaging phenotypes and clinical disease manifestations. Motivated by a neuroimaging study of post-traumatic stress disorder (PTSD) from the Grady Trauma Project, we study a tensor response quantile regression framework. This framework enables novel analyses that give a detailed view of the potentially heterogeneous association between a neuroimaging phenotype and relevant clinical predictors. We adopt a sensible low-rank structure to represent the association of interest and propose a simple two-step estimation procedure that is easy to implement with existing software. We provide rigorous theoretical justification for this intuitive two-step procedure. Simulation studies demonstrate good performance of the proposed method with sample sizes realistic for neuroimaging studies. We apply the proposed tensor response quantile regression to the motivating PTSD study to investigate the association between fMRI resting-state functional connectivity and PTSD symptom severity. Our results uncover non-homogeneous effects of PTSD symptoms on brain functional connectivity, which cannot be captured by existing tensor response methods.
Affiliation(s)
- Bo Wei
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Limin Peng
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Ying Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Amita Manatunga
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Jennifer Stevens
  - Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, 30322, USA
4
Wang Y, Ghassabian A, Gu B, Afanasyeva Y, Li Y, Trasande L, Liu M. Semiparametric distributed lag quantile regression for modeling time-dependent exposure mixtures. Biometrics 2023; 79:2619-2632. [PMID: 35612351 PMCID: PMC10718172 DOI: 10.1111/biom.13702] [Received: 10/29/2021] [Accepted: 05/18/2022] [Indexed: 11/29/2022]
Abstract
Studying time-dependent exposure mixtures has gained increasing attention in environmental health research. When a scalar outcome is of interest, distributed lag (DL) models have been employed to characterize exposure effects distributed over time on the mean of the final outcome. However, there is a methodological gap in investigating the effects of time-dependent exposure mixtures on different quantiles of the outcome. In this paper, we introduce semiparametric partial-linear single-index (PLSI) DL quantile regression. It can describe the DL effects of time-dependent exposure mixtures on different quantiles of the outcome and identify susceptible periods of exposure. We consider two time-dependent exposure settings, discrete and functional, in which exposures are measured at a small number of time points and on dense time grids, respectively. Spline techniques are used to approximate the nonparametric DL function and the single-index link function, and a profile estimation algorithm is proposed. Through extensive simulations, we demonstrate the performance and value of our proposed models and inference procedures. We further apply the proposed methods to study the effects of maternal exposure to ambient fine particulate matter and nitrogen dioxide on birth weight in the New York University Children's Health and Environment Study (NYU CHES).
Affiliation(s)
- Yuyan Wang
  - Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Akhgar Ghassabian
  - Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
  - Department of Pediatrics, NYU Grossman School of Medicine, New York, New York, USA
  - Department of Environmental Medicine, NYU Grossman School of Medicine, New York, New York, USA
- Bo Gu
  - Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Yelena Afanasyeva
  - Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Yiwei Li
  - Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
- Leonardo Trasande
  - Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
  - Department of Pediatrics, NYU Grossman School of Medicine, New York, New York, USA
  - Department of Environmental Medicine, NYU Grossman School of Medicine, New York, New York, USA
  - NYU Wagner School of Public Service, New York, New York, USA
  - NYU School of Global Public Health, New York, New York, USA
- Mengling Liu
  - Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
  - Department of Environmental Medicine, NYU Grossman School of Medicine, New York, New York, USA
5
Wu P, Dupuis J, Liu CT. Identifying important gene signatures of BMI using network structure-aided nonparametric quantile regression. Stat Med 2023; 42:1625-1639. [PMID: 36822218 PMCID: PMC10133010 DOI: 10.1002/sim.9691] [Received: 11/15/2021] [Revised: 11/21/2022] [Accepted: 02/12/2023] [Indexed: 02/25/2023]
Abstract
We focus on identifying genomic risk factors for higher body mass index (BMI) by incorporating a priori information, such as biological pathways. However, the commonly used methods for incorporating prior information model the mean function of the outcome and rely on assumptions that may not be met. To address these concerns, we propose a method for nonparametric additive quantile regression with network regularization to incorporate the information encoded by known networks. To account for nonlinear associations, we approximate the unknown additive functional effect of each predictor with a B-spline basis expansion. We use the group Lasso penalty to obtain a sparse model. We define the network-constrained penalty as the total $\ell_2$ norm of the differences between the effect functions of any two linked genes in the known network. We further propose an efficient computational procedure to solve the resulting optimization problem. Simulation studies show that our proposed method identifies more truly associated genes and fewer falsely associated genes than alternative approaches. We apply the proposed method to the microarray gene-expression dataset of the Framingham Heart Study and identify several genes associated with the 75th percentile of BMI. In conclusion, our proposed approach efficiently identifies outcome-associated variables in a nonparametric additive quantile regression framework by leveraging known network information.
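The following minimal sketch evaluates a penalized objective of the kind the abstract describes. It assumes each gene's effect is represented by a vector of B-spline coefficients on a shared basis. It also assumes that the $\ell_2$ norm of the difference between two effect functions can be approximated by the Euclidean norm of the difference between their coefficient vectors. That approximation, the tuning names `lam_group` and `lam_net`, and all function names are hypothetical; the paper's exact norms and weighting may differ.

```python
import math


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def l2_norm(v):
    """Euclidean norm of a coefficient vector."""
    return math.sqrt(sum(x * x for x in v))


def group_lasso_penalty(coefs):
    """Sum over genes of the Euclidean norm of each gene's B-spline coefficient group."""
    return sum(l2_norm(g) for g in coefs)


def network_penalty(coefs, edges):
    """Sum over linked gene pairs (j, k) of ||gamma_j - gamma_k||_2."""
    return sum(l2_norm([a - b for a, b in zip(coefs[j], coefs[k])]) for j, k in edges)


def penalized_objective(residuals, tau, coefs, edges, lam_group, lam_net):
    """Mean check loss plus group-lasso and network-constrained penalties."""
    fit = sum(check_loss(r, tau) for r in residuals) / len(residuals)
    return fit + lam_group * group_lasso_penalty(coefs) + lam_net * network_penalty(coefs, edges)
```

As an example, take `coefs = [[3.0, 4.0], [0.0, 0.0], [3.0, 4.0]]` and `edges = [(0, 1), (0, 2)]`. Here the group-lasso term is 10 and the network term is 5. The linked pair (0, 2) contributes nothing because the two genes share identical coefficients, which is how the network penalty pushes connected genes toward similar effect functions.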
Affiliation(s)
- Peitao Wu
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
- Ching-Ti Liu
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
6
Xiong W, Pan H, Wang J, Tian M. An efficient model-free approach to interaction screening for high dimensional data. Stat Med 2023; 42:1583-1605. [PMID: 36857779 DOI: 10.1002/sim.9688] [Received: 06/17/2022] [Revised: 12/02/2022] [Accepted: 02/06/2023] [Indexed: 03/03/2023]
Abstract
An innovative model-free interaction screening procedure, called MCVIS, is proposed for high-dimensional data analysis. Specifically, we adopt the MCV index to quantify the importance of an interaction effect among predictors. Our proposed method is fully nonparametric and can successfully select interactions even when the signals of the parental main effects are weak. The MCVIS procedure has many distinctive features:
(i) it works with discrete, categorical, and continuous covariates;
(ii) it handles both categorical and continuous responses, and can even handle missing responses;
(iii) it is robust to heavy-tailed distributions and thus accommodates the heterogeneity typically caused by high dimensionality;
(iv) it enjoys the sure screening and ranking consistency properties, and therefore achieves dimension reduction without information loss.
Because computational feasibility is a top concern in high-dimensional data analysis, we transform the MCV index into several variants, which makes the MCVIS procedure simple and fast to implement. Extensive numerical experiments and comparisons confirm the effectiveness and wide applicability of the MCVIS procedure. We further illustrate the proposed methodology with empirical studies of two real datasets. Supplementary materials are available online.
Affiliation(s)
- Wei Xiong
  - School of Statistics, University of International Business and Economics, Beijing, China
- Han Pan
  - School of Mathematical Sciences, Peking University, Beijing, China
- Jianrong Wang
  - School of Statistics, University of International Business and Economics, Beijing, China
- Maozai Tian
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
7
Variable selection for nonparametric quantile regression via measurement error model. Stat Pap (Berl) 2022. [DOI: 10.1007/s00362-022-01376-y] [Indexed: 12/15/2022]
8
Zhang X, Xiao X, Wang F, Brasseur G, Chen S, Wang J, Gao M. Observed sensitivities of PM2.5 and O3 extremes to meteorological conditions in China and implications for the future. Environ Int 2022; 168:107428. [PMID: 35985105 DOI: 10.1016/j.envint.2022.107428] [Received: 04/14/2022] [Revised: 06/19/2022] [Accepted: 07/20/2022] [Indexed: 06/15/2023]
Abstract
Frequent extreme air pollution episodes in China, accompanied by high concentrations of fine particulate matter (PM2.5) and ozone (O3), are partly favored by meteorological conditions. However, the relationships between meteorological variables and pollution extremes can be poorly estimated when based solely on mean pollutant levels. In this study, we use quantile regression to investigate the meteorological sensitivities of PM2.5 and O3 extremes, drawing on nationwide observations of air pollutants in China over 2013-2019. The results show that surface winds and humidity are key drivers of high PM2.5 events during both summer and winter, with greater sensitivities at higher percentiles. Higher humidity favors the hygroscopic growth of particles during winter, but it tends to decrease PM2.5 through wet scavenging during summer. Surface temperature plays the dominant role in summer O3 extremes, especially in the VOC-limited regime, followed by surface winds and radiation. The sensitivities of O3 to meteorological conditions remain relatively constant across percentiles. Under the fossil-fueled development pathway (SSP5-8.5) scenario, meteorological conditions are projected to favor winter PM2.5 extremes in the North China Plain (NCP), Yangtze River Delta (YRD), and Sichuan Basin (SCB), mainly due to enhanced surface specific humidity. Summer O3 extremes are likely to occur more frequently in the NCP and YRD, associated with warmer temperatures and stronger solar radiation. In addition, meteorological conditions over a relatively longer period play a more important role in the formation of pollution extremes. These results improve our understanding of the relationships between extreme PM2.5 and O3 pollution and meteorology, and can serve as a valuable reference for model-predicted air pollution extremes.
Affiliation(s)
- Xiaorui Zhang
  - Department of Geography, Hong Kong Baptist University, Hong Kong, China
- Xiang Xiao
  - Department of Geography, Hong Kong Baptist University, Hong Kong, China
- Fan Wang
  - Department of Geography, Hong Kong Baptist University, Hong Kong, China
- Guy Brasseur
  - Atmospheric Chemistry Observation & Modeling Laboratory, National Center for Atmospheric Research, Boulder, CO, USA
- Siyu Chen
  - Key Laboratory for Semi-Arid Climate Change of the Ministry of Education, Lanzhou University, Lanzhou, China
- Jing Wang
  - Tianjin Key Laboratory for Oceanic Meteorology, and Tianjin Institute of Meteorological Science, Tianjin, China
- Meng Gao
  - Department of Geography, Hong Kong Baptist University, Hong Kong, China
  - Hong Kong Branch of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Hong Kong, China
9
Park S, Lee ER, Zhao H. Low-rank regression models for multiple binary responses and their applications to cancer cell-line encyclopedia data. J Am Stat Assoc 2022; 119:202-216. [PMID: 38481466 PMCID: PMC10928550 DOI: 10.1080/01621459.2022.2105704] [Received: 06/29/2021] [Accepted: 07/16/2022] [Indexed: 10/16/2022]
Abstract
In this paper, we study high-dimensional multivariate logistic regression models in which a common set of covariates is used to predict multiple binary outcomes simultaneously. Our work is primarily motivated by biomedical studies with correlated multiple responses, such as the Cancer Cell Line Encyclopedia project. We assume that the underlying regression coefficient matrix is simultaneously low-rank and row-wise sparse. We propose an intuitively appealing selection and estimation framework based on marginal model likelihood, and we develop an efficient computational algorithm for inference. We establish a novel high-dimensional theory for this nonlinear multivariate regression. Our theory is general, allowing for potential correlations between the binary responses. We propose a new type of nuclear norm penalty using the smoothly clipped absolute deviation (SCAD) penalty, filling a gap in the related non-convex penalization literature. We show theoretically that the proposed approach improves estimation accuracy by modeling multiple responses jointly when the underlying coefficient matrix is low-rank and row-wise sparse. In particular, we establish non-asymptotic error bounds, as well as both rank and row support consistency of the proposed method. Moreover, we develop a consistent rule to simultaneously select the rank and row dimension of the coefficient matrix. Furthermore, we extend the proposed methods and theory to a joint Ising model, which accounts for the dependence relationships. In our analysis of both simulated data and the Cancer Cell Line Encyclopedia data, the proposed methods outperform existing methods in predicting responses.
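The SCAD penalty named in the abstract has a standard closed form, usually attributed to Fan and Li, with the conventional shape parameter $a = 3.7$. One natural reading of a SCAD-based nuclear norm is to apply this penalty to each singular value of the coefficient matrix instead of simply summing the singular values. That reading is an assumption here, and the paper's construction may differ. To stay dependency-free, the sketch takes the singular values as input rather than computing an SVD. The function names are hypothetical.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|) with shape parameter a > 2."""
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear part near zero
        return lam * t
    if t <= a * lam:
        # Quadratic transition that tapers the penalty's slope to zero
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Constant beyond a * lam: large values are not shrunk further
    return (a + 1.0) * lam * lam / 2.0


def scad_nuclear_penalty(singular_values, lam, a=3.7):
    """Apply SCAD to each singular value (hypothetical SCAD-type nuclear norm)."""
    return sum(scad_penalty(s, lam, a) for s in singular_values)


def nuclear_norm(singular_values):
    """Ordinary nuclear norm, for comparison: the plain sum of singular values."""
    return sum(abs(s) for s in singular_values)
```

The ordinary nuclear norm grows linearly in every singular value. The SCAD version instead levels off at $(a + 1)\lambda^2/2$ per singular value, so large singular values (the strong low-rank signal) are not shrunk. That is the usual motivation for preferring a non-convex penalty here.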
Affiliation(s)
- Seyoung Park
  - Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Eun Ryung Lee
  - Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, 06511, USA
10
Forward variable selection for ultra-high dimensional quantile regression models. Ann Inst Stat Math 2022. [DOI: 10.1007/s10463-022-00849-z] [Indexed: 11/01/2022]
11
Xiong W, Tian M, Tang M, Pan H. Robust and sparse learning of varying coefficient models with high-dimensional features. J Appl Stat 2022; 50:3312-3336. [PMID: 37969890 PMCID: PMC10637205 DOI: 10.1080/02664763.2022.2109129] [Received: 05/29/2021] [Accepted: 07/28/2022] [Indexed: 10/15/2022]
Abstract
The varying coefficient model (VCM) is used extensively in various scientific fields because of its ability to capture the changing structure of predictors. Classical mean regression analysis is often complicated by skewed, heterogeneous, and heavy-tailed data. To address this, this work employs the idea of model averaging and introduces a novel, comprehensive approach that incorporates quantile-adaptive weights across different quantile levels to further improve both least squares (LS) and quantile regression (QR) methods. The proposed procedure adaptively exploits the heterogeneous and sparse nature of the input data, gains efficiency, and adapts well to extreme events and high-dimensional settings. Building on these properties, we develop several robust methods to reveal the dynamic close-to-truth structure of the VCM and to consistently uncover the zero and nonzero patterns in high-dimensional scientific discoveries. We provide a new iterative algorithm that is proven to be asymptotically consistent and that attains the optimal nonparametric convergence rate under regularity conditions. The proposed procedures are illustrated with extensive simulation examples and several real data analyses, which show their stronger predictive power compared with LS, composite quantile regression (CQR), and QR methods.
Affiliation(s)
- Wei Xiong
  - School of Statistics, University of International Business and Economics, Beijing, People's Republic of China
- Maozai Tian
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, People's Republic of China
- Manlai Tang
  - Department of Mathematics, College of Engineering, Design and Physical Sciences, Brunel University London, London, UK
- Han Pan
  - School of Statistics, University of International Business and Economics, Beijing, People's Republic of China
12
Li M, Wang K, Maity A, Staicu AM. Inference in Functional Linear Quantile Regression. J Multivar Anal 2022; 190:104985. [PMID: 35370319 PMCID: PMC8975129 DOI: 10.1016/j.jmva.2022.104985] [Indexed: 11/28/2022]
Abstract
In this paper, we study statistical inference in functional quantile regression for a scalar response and a functional covariate. Specifically, we consider a functional linear quantile regression model in which the effect of the covariate on the quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the quantile level. The objective is to test whether the regression parameter is constant across several quantile levels of interest. The parameter function is estimated by combining ideas from functional principal component analysis and quantile regression. An adjusted Wald testing procedure is proposed for this hypothesis, and its asymptotic chi-square null distribution is derived. The testing procedure is investigated numerically in simulations involving sparse and noisy functional covariates and in an application to Capital Bikeshare data. The proposed approach is easy to implement, and the R code is published online at https://github.com/xylimeng/fQR-testing.
Affiliation(s)
- Meng Li
  - Department of Statistics, Rice University, Houston, TX
- Arnab Maity
  - Department of Statistics, North Carolina State University, Raleigh, NC
- Ana-Maria Staicu
  - Department of Statistics, North Carolina State University, Raleigh, NC
13
Penalized kernel quantile regression for varying coefficient models. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2021.07.003] [Indexed: 11/17/2022]
14
Li X, Wang L, Wang HJ. Sparse Learning and Structure Identification for Ultrahigh-Dimensional Image-on-Scalar Regression. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2020.1753523] [Indexed: 10/24/2022]
Affiliation(s)
- Xinyi Li
  - Statistical and Applied Mathematical Sciences Institute (SAMSI), Durham, NC
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Li Wang
  - Department of Statistics, Iowa State University, Ames, IA
- Huixia Judy Wang
  - Department of Statistics, George Washington University, Washington, DC
15
Ekin T, Damien P. Analysis of Health Care Billing via Quantile Variable Selection Models. Healthcare (Basel) 2021; 9:1274. [PMID: 34682954 PMCID: PMC8535243 DOI: 10.3390/healthcare9101274] [Received: 08/14/2021] [Revised: 09/20/2021] [Accepted: 09/23/2021] [Indexed: 11/20/2022]
Abstract
Fraudulent billing of health care insurance programs such as Medicare amounts to billions of dollars. The extent of such overpayments remains an issue despite the emerging use of analytical methods for fraud detection. This motivates policy makers to also examine provider billing characteristics and to understand the common factors that drive conservative and/or aggressive billing behavior. Statistical approaches to this problem are confronted by the asymmetric and/or leptokurtic distributions of billing data. This paper is a first attempt at using a quantile regression framework with a variable selection approach for medical billing analysis. The proposed method addresses the varying impacts of (potentially different) variables at different quantiles of the billing aggressiveness distribution. We use the mammography procedure to showcase our analysis and offer recommendations on fraud detection.
Affiliation(s)
- Tahir Ekin
  - McCoy College of Business, Texas State University, San Marcos, TX 78666, USA
- Paul Damien
  - McCombs School of Business, University of Texas at Austin, Austin, TX 78712, USA
16
Jiang F, Zhao Z, Shao X. Modelling the COVID-19 infection trajectory: A piecewise linear quantile trend model. J R Stat Soc Series B Stat Methodol 2021. [DOI: 10.1111/rssb.12453] [Indexed: 11/29/2022]
Affiliation(s)
- Feiyu Jiang
  - Department of Statistics, School of Management, Fudan University, Shanghai, China
- Zifeng Zhao
  - Department of Information Technology, Analytics, and Operations, Mendoza College of Business, University of Notre Dame, Notre Dame, Indiana, USA
- Xiaofeng Shao
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
17
Reconstruction of the evolutionary biogeography reveal the origins and diversification of oysters (Bivalvia: Ostreidae). Mol Phylogenet Evol 2021; 164:107268. [PMID: 34302948 DOI: 10.1016/j.ympev.2021.107268] [Received: 09/15/2020] [Revised: 06/15/2021] [Accepted: 07/19/2021] [Indexed: 11/22/2022]
Abstract
Oysters (Bivalvia: Ostreidae Rafinesque, 1815) live in intertidal and shallow subtidal areas worldwide. Despite their long evolutionary history, abundant fossil record, global distribution, and ecological significance, a systematic time-dependent biogeographical analysis of this family is still lacking. Using combined mitochondrial (COI and 16S rRNA) and nuclear (18S rRNA, 28S rRNA, H3, and ITS2) gene markers for 80% (70/88) of the recognized extant Ostreidae, we reconstructed the global phylogenetic and biogeographical relationships throughout the evolutionary history of oysters. The results provided a holistic view of the origin, migration, and dispersal patterns of Ostreidae. The phylogenetic results and fossil evidence indicated that Ostreidae originated in the circum-Arctic region in the Early Jurassic. The widening of the Atlantic Ocean and changes in the Tethys Ocean further facilitated their subsequent diversification during the Cretaceous and Palaeogene periods. In particular, Crassostrea and Saccostrea exhibited relatively low dispersal abilities, and their major diversifications were consistent with tectonic events. Environmental adaptations and reproductive patterns, therefore, should play key roles in the formation of oyster distribution patterns, rather than the dispersal ability of their planktonic larvae. The diversity dynamics inferred by standard phylogenetic methods are consistent with the fossil record; however, further systematic classification, especially of the fossil genus Ostrea, would enhance our understanding of extant and fossil oysters. The present study of the historical biogeography of oysters provides new insights into the evolution and speciation of oysters. Our findings also provide a foundation for assessing evolutionary patterns and ecological processes in intertidal and inshore life.
18
Park S, Lee ER. Hypothesis testing of varying coefficients for regional quantiles. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107204] [Indexed: 10/22/2022]
19
Frumento P, Bottai M, Fernández-Val I. Parametric Modeling of Quantile Regression Coefficient Functions With Longitudinal Data. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1892702] [Indexed: 10/22/2022]
Affiliation(s)
- Paolo Frumento
- Department of Political Sciences, University of Pisa, Pisa, Italy
| | - Matteo Bottai
- Unit of Biostatistics, Karolinska Institutet, Institute of Environmental Medicine, Stockholm, Sweden
20
Abstract
Applying quantile regression to count data presents logical and practical complications, which are usually solved by artificially smoothing the discrete response variable through jittering. In this paper, we present an alternative approach in which the quantile regression coefficients are modeled by means of (flexible) parametric functions. The proposed method avoids jittering and presents numerous advantages over standard quantile regression in terms of computation, smoothness, efficiency, and ease of interpretation. Estimation is carried out by minimizing a “simultaneous” version of the loss function of ordinary quantile regression. Simulation results show that the described estimators are similar to those obtained with jittering, but are often preferable in terms of bias and efficiency. To exemplify our approach and provide guidelines for model building, we analyze data from the US National Medical Expenditure Survey. All the necessary software is implemented in the existing R package .
21
Variable-Order Equivalent Circuit Modeling and State of Charge Estimation of Lithium-Ion Battery Based on Electrochemical Impedance Spectroscopy. ENERGIES 2021. [DOI: 10.3390/en14030769] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/17/2022]
Abstract
In a battery management system, it is important to estimate the state of charge (SOC) of lithium-ion batteries accurately and efficiently. This generally requires establishing an equivalent circuit model of the battery, whose accuracy and rationality play an important role in accurately estimating the battery state. Traditional single-order equivalent circuit models take into account neither the changes in the impedance spectrum under the action of multiple factors nor the balance between practicality and model complexity, resulting in low accuracy and poor practicability. In this paper, the theory of electrochemical impedance spectroscopy is used to guide and improve the equivalent circuit model. Based on an analysis of how the high- and intermediate-frequency ranges of the impedance spectrum vary with the state of charge and temperature of the battery, a variable-order equivalent model (VOEM) is proposed using the Arrhenius equation and the Bayesian information criterion (BIC), and the state and observation equations of the VOEM are improved with autoregressive (AR) equations. Combined with the unscented Kalman filter (UKF), an online SOC estimation method, named VOEM-AR-UKF, is proposed. The experimental results show that the proposed method has high accuracy and good adaptability.
22
Hu X, Huang J, Liu L, Sun D, Zhao X. Subgroup analysis in the heterogeneous Cox model. Stat Med 2020; 40:739-757. [PMID: 33169428 DOI: 10.1002/sim.8800] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/09/2020] [Revised: 10/02/2020] [Accepted: 10/21/2020] [Indexed: 11/11/2022]
Abstract
In the analysis of censored survival data, to avoid biased inference about treatment effects on the hazard function of the survival time, it is important to consider treatment heterogeneity. Without requiring any prior knowledge of the subgroup structure, we propose a data-driven subgroup analysis procedure for the heterogeneous Cox model by constructing an objective function based on the partial likelihood with a pairwise fusion penalty. The proposed method can determine the number of subgroups, identify the group structure, and estimate the treatment effect simultaneously and automatically. A majorized alternating direction method of multipliers algorithm is then developed to deal with the numerically challenging high-dimensional problems. We also establish the oracle properties and the model selection consistency for the proposed penalized estimator. Our proposed method is evaluated by simulation studies and further illustrated by an analysis of breast cancer data.
Affiliation(s)
- Xiangbin Hu
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
- Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA
- Li Liu
- School of Mathematics and Statistics, Wuhan University, Wuhan, China
- Defeng Sun
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
- Xingqiu Zhao
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
23
Selig K, Shaw P, Ankerst D. Bayesian information criterion approximations to Bayes factors for univariate and multivariate logistic regression models. Int J Biostat 2020; 17:241-266. [PMID: 33119543 DOI: 10.1515/ijb-2020-0045] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/06/2020] [Accepted: 10/08/2020] [Indexed: 11/15/2022]
Abstract
Schwarz's criterion, also known as the Bayesian Information Criterion or BIC, is commonly used for model selection in logistic regression due to its simple, intuitive formula. For tests of nested hypotheses in independent and identically distributed data as well as in Normal linear regression, previous results have motivated use of Schwarz's criterion by its consistent approximation to the Bayes factor (BF), defined as the ratio of posterior to prior model odds. Furthermore, under construction of an intuitive unit-information prior for the parameters of interest to test for inclusion in the nested models, previous results have shown that Schwarz's criterion approximates the BF to higher order in the neighborhood of the simpler nested model. This paper extends these results to univariate and multivariate logistic regression, providing approximations to the BF for arbitrary prior distributions and definitions of the unit-information prior corresponding to Schwarz's approximation. Simulations show accuracies of the approximations for small sample sizes as well as comparisons to conclusions from frequentist testing. We present an application in prostate cancer, the motivating setting for our work, which illustrates the approximation for large data sets in a practical example.
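The basic Schwarz approximation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's refined approximations: it assumes the maximized log-likelihoods of two nested models are already available (the numbers below are hypothetical), computes BIC = k ln(n) - 2 ln(L) for each, and approximates the Bayes factor of the larger model against the nested one by exp((BIC_0 - BIC_1) / 2). The prior on the extra parameters, which is what the paper studies, is not modeled here.

```python
import math


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    """Schwarz's criterion: BIC = k * ln(n) - 2 * ln(L_hat)."""
    return num_params * math.log(num_obs) - 2.0 * log_likelihood


def approx_bayes_factor(bic_simple: float, bic_complex: float) -> float:
    """Schwarz approximation to BF_10 (larger model vs the nested simpler one).

    BF_10 is approximated by exp((BIC_0 - BIC_1) / 2). No prior is specified
    here; the abstract notes the approximation is sharper under a
    unit-information prior near the simpler model.
    """
    return math.exp((bic_simple - bic_complex) / 2.0)


# Hypothetical maximized log-likelihoods of two nested logistic regressions
# fitted to the same n = 200 observations (made-up numbers for illustration).
n = 200
bic_0 = bic(log_likelihood=-120.0, num_params=2, num_obs=n)  # intercept + 1 covariate
bic_1 = bic(log_likelihood=-115.0, num_params=3, num_obs=n)  # adds a second covariate
bf_10 = approx_bayes_factor(bic_0, bic_1)
print(f"BIC_0={bic_0:.2f}  BIC_1={bic_1:.2f}  approx BF_10={bf_10:.2f}")
```

A BF_10 above 1 favors the larger model; because each extra parameter costs ln(n) in BIC, the approximation automatically penalizes complexity more heavily as the sample grows.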
Affiliation(s)
- Katharina Selig
- Department of Mathematics, Technical University of Munich, Munchen, Germany
- Pamela Shaw
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Donna Ankerst
- Department of Mathematics, Technical University of Munich, Munchen, Germany
24
Ciuperca G. Adaptive elastic-net selection in a quantile model with diverging number of variable groups. STATISTICS-ABINGDON 2020. [DOI: 10.1080/02331888.2020.1830402] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Affiliation(s)
- Gabriela Ciuperca
- Institut Camille Jordan, UMR 5208, Université Claude Bernard Lyon 1, Villeurbanne Cedex, France
25
Liu H, Ma J, Peng C. Shrinkage estimation for identification of linear components in composite quantile additive models. COMMUN STAT-SIMUL C 2020. [DOI: 10.1080/03610918.2018.1524905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Huilan Liu
- Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang, P. R. China
- School of Mathematics and Statistics, Guizhou University, Guiyang, P. R. China
- Junjie Ma
- School of Mathematics and Statistics, Guizhou University, Guiyang, P. R. China
- Changgen Peng
- Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang, P. R. China
26
Forward variable selection for sparse ultra-high-dimensional generalized varying coefficient models. JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE 2020. [DOI: 10.1007/s42081-020-00090-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
27
Wang F, Lin L, Liu L, Wang K. Estimation and clustering for partially heterogeneous single index model. Stat Pap (Berl) 2020. [DOI: 10.1007/s00362-020-01203-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
28
Bai Y, Tian M, Tang ML, Lee WY. Variable selection for ultra-high dimensional quantile regression with missing data and measurement error. Stat Methods Med Res 2020; 30:129-150. [PMID: 32746735 DOI: 10.1177/0962280220941533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In this paper, we consider variable selection for the ultra-high dimensional quantile regression model with missing data and measurement errors in the covariates. Specifically, we correct the bias in the loss function caused by measurement error by applying the orthogonal quantile regression approach and remove the bias caused by missing data using inverse probability weighting. A nonconvex Atan penalized estimation method is proposed for simultaneous variable selection and estimation. With a proper choice of the regularization parameter and under some relaxed conditions, we show that the proposed estimator enjoys the oracle properties. The choice of smoothing parameters is also discussed. The performance of the proposed variable selection procedure is assessed by Monte Carlo simulation studies. We further demonstrate the proposed procedure with a breast cancer data set.
Affiliation(s)
- Yongxin Bai
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
- Maozai Tian
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China; School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi, China; School of Statistics, Lanzhou University of Finance and Economics, Lanzhou, China
- Man-Lai Tang
- Department of Mathematics, Statistics and Insurance, The Hang Seng University of Hong Kong, Siu Lek Yuen, China; Big Data Intelligence Centre, The Hang Seng University of Hong Kong, Siu Lek Yuen, China
- Wing-Yan Lee
- Department of Mathematics, Statistics and Insurance, The Hang Seng University of Hong Kong, Siu Lek Yuen, China
29
Abstract
Quantile regression is widely used to estimate conditional quantiles of an outcome variable of interest given covariates. This method can estimate one quantile at a time without imposing any constraints on the quantile process other than the linear combination of covariates and parameters specified by the regression model. While this is a flexible modeling tool, it generally yields erratic estimates of conditional quantiles and regression coefficients. Recently, parametric models for the regression coefficients have been proposed that can help balance bias and sampling variability. So far, however, only models that are linear in the parameters and covariates have been explored. This paper presents the general case of nonlinear parametric quantile models. These can be nonlinear with respect to the parameters, the covariates, or both. Some important features and asymptotic properties of the proposed estimator are described, and its finite-sample behavior is assessed in a simulation study. Nonlinear parametric quantile models are applied to estimate extreme quantiles of longitudinal measures of respiratory mechanics in asthmatic children from an epidemiological study and to evaluate a dose-response relationship in a toxicological laboratory experiment.
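As background for the "one quantile at a time" estimation this abstract contrasts with, here is a minimal sketch of the standard building block, not the nonlinear parametric quantile models proposed in the paper. It uses the usual check function rho_tau(u) = u(tau - 1{u < 0}); minimizing its sum over a single constant recovers an ordinary tau-th sample quantile, and replacing the constant with a linear predictor x'beta gives linear quantile regression. The function names are illustrative only, not from any package.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile-regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_check_loss(y: Sequence[float], q: float, tau: float) -> float:
    """Sum of check losses of the residuals y_i - q."""
    return sum(check_loss(yi - q, tau) for yi in y)


def quantile_via_check_loss(y: Sequence[float], tau: float) -> float:
    """Minimize sum_i rho_tau(y_i - q) over a constant q.

    The objective is convex and piecewise linear with kinks at the data
    points, so some observation is always a minimizer; scanning all of them
    is O(n^2) but keeps the sketch dependency-free.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return min(y, key=lambda q: total_check_loss(y, q, tau))


y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
for tau in (0.25, 0.5, 0.9):
    # Each tau is fitted separately, i.e. one quantile at a time.
    print(tau, quantile_via_check_loss(y, tau))
```

Because every tau is solved independently, nothing ties neighboring estimates together; that lack of structure is what produces the erratic estimates the abstract mentions and what parametric coefficient models aim to smooth out.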
Affiliation(s)
- Matteo Bottai
- Division of Biostatistics, Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Giovanna Cilluffo
- Institute for Biomedical Research and Innovation (IRIB), National Research Council (CNR), Palermo, Italy
30
Liu Z, Xiong Z. Non-marginal feature screening for additive hazard model with ultrahigh-dimensional covariates. COMMUN STAT-THEOR M 2020. [DOI: 10.1080/03610926.2020.1770288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Zili Liu
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Zikang Xiong
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
31
McKinnon KA, Poppick A. Estimating Changes in the Observed Relationship Between Humidity and Temperature Using Noncrossing Quantile Smoothing Splines. JOURNAL OF AGRICULTURAL, BIOLOGICAL AND ENVIRONMENTAL STATISTICS 2020. [DOI: 10.1007/s13253-020-00393-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/24/2022]
32
Jiang F, Cheng Q, Yin G, Shen H. Functional Censored Quantile Regression. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2019.1602047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Fei Jiang
- Department of Statistics and Actuarial Science, University of Hong Kong, Pokfulam, Hong Kong
- Qing Cheng
- Center for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Guosheng Yin
- Patrick S C Poon Professor in Statistics and Actuarial Science, Department of Statistics and Actuarial Science, University of Hong Kong, Pokfulam, Hong Kong
- Haipeng Shen
- Innovation and Information Management, University of Hong Kong, Pokfulam, Hong Kong
33
Honda T, Ing CK, Wu WY. Adaptively weighted group Lasso for semiparametric quantile regression models. BERNOULLI 2019. [DOI: 10.3150/18-bej1091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
34
Ma S, Huang J, Zhang Z, Liu M. Exploration of Heterogeneous Treatment Effects via Concave Fusion. Int J Biostat 2019; 16:ijb-2018-0026. [PMID: 31541601 DOI: 10.1515/ijb-2018-0026] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/01/2018] [Accepted: 07/26/2019] [Indexed: 11/15/2022]
Abstract
Understanding treatment heterogeneity is essential to the development of precision medicine, which seeks to tailor medical treatments to subgroups of patients with similar characteristics. One of the challenges of achieving this goal is that we usually do not have a priori knowledge of the grouping information of patients with respect to treatment effect. To address this problem, we consider a heterogeneous regression model which allows the coefficients for treatment variables to be subject-dependent with unknown grouping information. We develop a concave fusion penalized method for estimating the grouping structure and the subgroup-specific treatment effects, and derive an alternating direction method of multipliers algorithm for its implementation. We also study the theoretical properties of the proposed method and show that under suitable conditions there exists a local minimizer that equals the oracle least squares estimator based on a priori knowledge of the true grouping information with high probability. This provides theoretical support for making statistical inference about the subgroup-specific treatment effects using the proposed method. The proposed method is evaluated in simulation studies and illustrated with real data from an AIDS Clinical Trials Group Study.
Affiliation(s)
- Shujie Ma
- Department of Statistics, University of California at Riverside, Riverside, California 92521, USA
- Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, USA
- Zhiwei Zhang
- Department of Statistics, University of California at Riverside, Riverside, California 92521, USA
- Mingming Liu
- Department of Statistics, University of California at Riverside, Riverside, California, USA
35
Regularized quantile regression for ultrahigh-dimensional data with nonignorable missing responses. METRIKA 2019. [DOI: 10.1007/s00184-019-00744-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
36
Screening and selection for quantile regression using an alternative measure of variable importance. J MULTIVARIATE ANAL 2019. [DOI: 10.1016/j.jmva.2019.04.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]
37
Sparse model identification and learning for ultra-high-dimensional additive partially linear models. J MULTIVARIATE ANAL 2019. [DOI: 10.1016/j.jmva.2019.02.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
38
Li X, Wang L, Nettleton D. Additive partially linear models for ultra-high-dimensional regression. Stat (Int Stat Inst) 2019. [DOI: 10.1002/sta4.223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Affiliation(s)
- Xinyi Li
- SAMSI/Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Li Wang
- Department of Statistics, Iowa State University, Ames, Iowa
- Dan Nettleton
- Department of Statistics, Iowa State University, Ames, Iowa
39
Wang J, Li J, Li Y, Wong WK. A model-based multithreshold method for subgroup identification. Stat Med 2019; 38:2605-2631. [PMID: 30887552 DOI: 10.1002/sim.8136] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/09/2018] [Revised: 01/29/2019] [Accepted: 02/11/2019] [Indexed: 11/07/2022]
Abstract
The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on a single predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as latent factors, principal components, and weighted sums of predictors. Such a subgrouping method may lead to a more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular form of the thresholding variable, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare the performance of our proposed and existing methods and apply them to analyze data sets from a Scleroderma trial and a breast cancer study.
Affiliation(s)
- Jingli Wang
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
- Jialiang Li
- Department of Statistics and Applied Probability, National University of Singapore, Singapore; Duke University-NUS Graduate Medical School, Singapore; Singapore Eye Research Institute, Singapore
- Yaguang Li
- University of Science and Technology of China, Hefei, China
- Weng Kee Wong
- Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, California
40
Sottile G, Frumento P, Chiodi M, Bottai M. A penalized approach to covariate selection through quantile regression coefficient models. STAT MODEL 2019. [DOI: 10.1177/1471082x19825523] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
The coefficients of a quantile regression model are one-to-one functions of the order of the quantile. In standard quantile regression (QR), different quantiles are estimated one at a time. Another possibility is to model the coefficient functions parametrically, an approach that is referred to as quantile regression coefficients modeling (QRCM). Compared with standard QR, the QRCM approach facilitates estimation, inference and interpretation of the results, and generates more efficient estimators. We designed a penalized method that can address the selection of covariates in this particular modelling framework. Unlike standard penalized quantile regression estimators, in which model selection is quantile-specific, our approach permits using information on all quantiles simultaneously. We describe the estimator, provide simulation results and analyse the data that motivated the present article. The proposed approach is implemented in the qrcmNP package in R.
Affiliation(s)
- Gianluca Sottile
- Institute of Environmental Medicine, Unit of Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Paolo Frumento
- Department of Economics, Business and Statistics, University of Palermo, Palermo, Italy
- Marcello Chiodi
- Department of Economics, Business and Statistics, University of Palermo, Palermo, Italy
- Matteo Bottai
- Institute of Environmental Medicine, Unit of Biostatistics, Karolinska Institutet, Stockholm, Sweden
41
Lee ER, Cho J, Yu K. A systematic review on model selection in high-dimensional regression. J Korean Stat Soc 2019. [DOI: 10.1016/j.jkss.2018.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
42
Pan J, Zhang S, Zhou Y. Variable screening for ultrahigh dimensional censored quantile regression. J STAT COMPUT SIM 2018. [DOI: 10.1080/00949655.2018.1554068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Affiliation(s)
- Jing Pan
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shucong Zhang
- School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China
- Yong Zhou
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE, and Institute of Statistics and Interdisciplinary Sciences and School of Statistics, East China Normal University, Shanghai, China
43
Wang L, Van Keilegom I, Maidman A. Wild residual bootstrap inference for penalized quantile regression with heteroscedastic errors. Biometrika 2018. [DOI: 10.1093/biomet/asy037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Affiliation(s)
- Lan Wang
- School of Statistics, University of Minnesota, 224 Church Street South East, Minneapolis, Minnesota, USA
- Ingrid Van Keilegom
- Research Centre for Operations Research and Business Statistics, KU Leuven, Naamsestraat 69, Leuven, Belgium
- Adam Maidman
- School of Statistics, University of Minnesota, 224 Church Street South East, Minneapolis, Minnesota, USA
44
Ahn KW, Kim S. Variable selection with group structure in competing risks quantile regression. Stat Med 2018; 37:1577-1586. [PMID: 29468710 DOI: 10.1002/sim.7619] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/20/2016] [Revised: 12/05/2017] [Accepted: 01/03/2018] [Indexed: 11/11/2022]
Abstract
We study the group bridge and the adaptive group bridge penalties for competing risks quantile regression with group variables. While the group bridge consistently identifies nonzero group variables, the adaptive group bridge consistently selects variables not only at group level but also at within-group level. We allow the number of covariates to diverge as the sample size increases. The oracle property for both methods is also studied. The performance of the group bridge and the adaptive group bridge is compared in simulation and in a real data analysis. The simulation study shows that the adaptive group bridge selects nonzero within-group variables more consistently than the group bridge. A bone marrow transplant study is provided as an example.
Affiliation(s)
- Kwang Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
45
Tarca AL, Romero R, Gudicha DW, Erez O, Hernandez-Andrade E, Yeo L, Bhatti G, Pacora P, Maymon E, Hassan SS. A new customized fetal growth standard for African American women: the PRB/NICHD Detroit study. Am J Obstet Gynecol 2018; 218:S679-S691.e4. [PMID: 29422207 DOI: 10.1016/j.ajog.2017.12.229] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/11/2017] [Revised: 12/21/2017] [Accepted: 12/22/2017] [Indexed: 01/08/2023]
Abstract
BACKGROUND The assessment of fetal growth disorders requires a standard. Current nomograms for the assessment of fetal growth in African American women either have been derived from neonatal (rather than fetal) biometry data or have not been customized for maternal ethnicity, weight, height, and parity and for fetal sex. OBJECTIVE We sought to (1) develop a new customized fetal growth standard for African American mothers; and (2) compare such a standard with 3 existing standards for the classification of fetuses as small (SGA) or large (LGA) for gestational age. STUDY DESIGN A retrospective cohort study included 4183 women (4001 African American and 182 Caucasian) from the Detroit metropolitan area who underwent ultrasound examinations between 14 and 40 weeks of gestation (the median number of scans per pregnancy was 5, interquartile range 3-7) and for whom relevant covariate data were available. Longitudinal quantile regression was used to build models defining the "normal" estimated fetal weight (EFW) centiles for gestational age in African American women, adjusted for maternal height, weight, and parity and fetal sex, and excluding pathologic factors with a significant effect on fetal weight. The resulting Perinatology Research Branch/Eunice Kennedy Shriver National Institute of Child Health and Human Development (hereinafter, PRB/NICHD) growth standard was compared with 3 other existing standards (the customized gestation-related optimal weight [GROW] standard; the Eunice Kennedy Shriver National Institute of Child Health and Human Development [hereinafter, NICHD] African American standard; and the multinational World Health Organization [WHO] standard), with each standard used to screen fetuses for SGA (<10th centile) or LGA (>90th centile) based on the last available ultrasound examination for each pregnancy.
RESULTS First, the mean birthweight at 40 weeks was 133 g higher for neonates born to Caucasian than to African American mothers and 150 g higher for male than female neonates; maternal weight, height, and parity had a positive effect on birthweight. Second, analysis of longitudinal EFW revealed the following features of fetal growth: (1) all weight centiles were about 2% higher for male than for female fetuses; (2) maternal height had a positive effect on EFW, with larger fetuses being affected more (2% increase in the 95th centile of weight for each 10-cm increase in height); and (3) maternal weight and parity had a positive effect on EFW that increased with gestation and varied among the weight centiles. Third, the screen-positive rate for SGA was 7.2% for the NICHD African American standard, 12.3% for the GROW standard, 13% for the WHO standard customized by fetal sex, and 14.4% for the PRB/NICHD customized standard. For all standards, the screen-positive rate for SGA was at least 2-fold higher among fetuses delivered preterm than at term. Fourth, the screen-positive rate for LGA was 8.7% for the GROW standard, 9.2% for the PRB/NICHD customized standard, 10.8% for the WHO standard customized by fetal sex, and 12.3% for the NICHD African American standard. Finally, the highest overall agreement among standards was between the GROW and PRB/NICHD customized standards (Cohen's interrater agreement, kappa = 0.85). CONCLUSION We developed a novel customized PRB/NICHD fetal growth standard from fetal data in an African American population without assuming proportionality of the effects of covariates, and without assuming that these effects are equal on all centiles of weight; we also provide an easy-to-use centile calculator. This standard classified more fetuses as being at risk for SGA compared to existing standards, especially among fetuses delivered preterm, but classified about the same number of LGA. 
The comparison among the 4 growth standards also revealed that the most important factor determining agreement among standards is whether they account for the same factors known to affect fetal growth.
Affiliation(s)
- Adi L Tarca
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI; Department of Computer Science, Wayne State University College of Engineering, Detroit, MI
| | - Roberto Romero
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI.
| | - Dereje W Gudicha
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI
| | - Offer Erez
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
- Edgar Hernandez-Andrade
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
- Lami Yeo
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
- Gaurav Bhatti
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI
- Percy Pacora
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
- Eli Maymon
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
- Sonia S Hassan
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Institutes of Health/US Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI; Department of Physiology, Wayne State University School of Medicine, Detroit, MI
46
Maidman A, Wang L. New semiparametric method for predicting high-cost patients. Biometrics 2017; 74:1104-1111. [PMID: 29228454 DOI: 10.1111/biom.12834] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/01/2017] [Revised: 08/01/2017] [Accepted: 10/01/2017] [Indexed: 01/23/2023]
Abstract
Motivated by the Medical Expenditure Panel Survey containing data from individuals' medical providers and employers across the United States, we propose a new semiparametric procedure for predicting whether a patient will incur high medical expenditure. Problems of the same nature arise in many other important applications where one would like to predict if a future response occurs at the upper (or lower) tail of the response distribution. The common practice is to artificially dichotomize the response variable and then apply an existing classification method such as binomial regression or a classification tree. We propose a new semiparametric prediction rule to classify whether a future response occurs at the upper tail of the response distribution. The new method can be considered a semiparametric estimator of the Bayes rule for classification and enjoys some nice features. It does not require an artificially dichotomized response and better uses the information contained in the data. It does not require any parametric distributional assumptions and tends to be more robust. It incorporates nonlinear covariate effects and can be adapted to construct a prediction interval and hence provides more information about the future response. We provide an R package plaqr to implement the proposed procedure and demonstrate its performance in Monte Carlo simulations. We illustrate the application of the new method on a subset of the Medical Expenditure Panel Survey data.
Affiliation(s)
- Adam Maidman
- School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
- Lan Wang
- School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
47
Yu L, Lin N, Wang L. A Parallel Algorithm for Large-Scale Nonconvex Penalized Quantile Regression. J Comput Graph Stat 2017. [DOI: 10.1080/10618600.2017.1328366] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 10/19/2022]
Affiliation(s)
- Liqun Yu
- Department of Mathematics, Washington University in St. Louis, St. Louis, MO
- Nan Lin
- Department of Mathematics, Washington University in St. Louis, St. Louis, MO
- Lan Wang
- School of Statistics, University of Minnesota, Minneapolis, MN
48
Wang Y, Zhu L. Variable selection and parameter estimation via WLAD–SCAD with a diverging number of parameters. J Korean Stat Soc 2017. [DOI: 10.1016/j.jkss.2016.12.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
49
Liu H, Yang H. Penalized composite quantile estimation for censored regression model with a diverging number of parameters. Commun Stat Theory Methods 2017. [DOI: 10.1080/03610926.2015.1130840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Huilan Liu
- College of Mathematics and Statistics, Chongqing University, Chongqing, P. R. China
- School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou, P. R. China
- Hu Yang
- College of Mathematics and Statistics, Chongqing University, Chongqing, P. R. China
50
Abstract
In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that contribute to the conditional quantiles but are marginally uncorrelated or only weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we propose using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented.
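The screening step ranks candidate predictors by how strongly they relate to the conditional tau-quantile of the response. The abstract's method uses quantile partial correlation, which adjusts each predictor for the others; the Python sketch below shows only the simpler marginal quantile correlation, qcor_tau(Y, X) = E[psi_tau(Y - Q_tau(Y)) (X - E X)] / sqrt((tau - tau^2) Var X) with psi_tau(w) = tau - I(w < 0), and keeps the d predictors with the largest absolute value. It is an illustrative assumption, not the authors' procedure: the function names, the screening size d, and the simulated data are hypothetical, and the EBIC subset-selection step is not shown.

```python
import math
import random
from typing import List, Sequence


def sample_quantile(values: Sequence[float], tau: float) -> float:
    """Empirical tau-quantile: the smallest value whose ECDF is >= tau."""
    s = sorted(values)
    idx = max(math.ceil(tau * len(s)) - 1, 0)
    return s[idx]


def quantile_correlation(y: Sequence[float], x: Sequence[float], tau: float) -> float:
    """Sample marginal quantile correlation between response y and predictor x."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie in (0, 1)")
    if len(y) != len(x) or len(y) == 0:
        raise ValueError("y and x must be non-empty and of equal length")
    n = len(y)
    q = sample_quantile(y, tau)
    x_bar = sum(x) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    if var_x == 0:
        return 0.0  # a constant predictor carries no information
    # psi_tau(w) = tau - 1{w < 0}, evaluated at the residuals y_i - q.
    qcov = sum((tau - (1.0 if yi < q else 0.0)) * (xi - x_bar)
               for yi, xi in zip(y, x)) / n
    return qcov / math.sqrt((tau - tau**2) * var_x)


def screen(y: Sequence[float], columns: Sequence[Sequence[float]],
           tau: float, d: int) -> List[int]:
    """Indices of the d columns with the largest |quantile correlation| (hypothetical helper)."""
    scores = [abs(quantile_correlation(y, col, tau)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:d]


# Simulated example: only column 0 drives the response; the other 9 are noise.
rng = random.Random(0)
n, p = 200, 10
cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * cols[0][i] + rng.gauss(0, 0.5) for i in range(n)]
print(screen(y, cols, tau=0.5, d=3))
```

Because the marginal statistic looks at one predictor at a time, it can miss a variable that matters only after conditioning on others; that limitation is what the partial version described in the abstract is meant to address.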
Affiliation(s)
- Shujie Ma
- Assistant Professor, Department of Statistics, University of California-Riverside, Riverside, CA 92521
- Runze Li
- Verne M. Willaman Professor, Department of Statistics, the Pennsylvania State University, University Park, PA 16802
- Chih-Ling Tsai
- Distinguished Professor and Robert W. Glock Endowed Chair in Management, Graduate School of Management, University of California at Davis, Davis, CA 95616