1
Guo Z, Chen M, Fan Y, Song Y. A general adaptive ridge regression method for generalized linear models: an iterative re-weighting approach. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2022.2028841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Zijun Guo: College of Science, University of Shanghai for Science and Technology, Shanghai, China
- Mengxing Chen: College of Science, University of Shanghai for Science and Technology, Shanghai, China
- Yali Fan: College of Science, University of Shanghai for Science and Technology, Shanghai, China
- Yan Song: Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
2
Aydın D, Ahmed SE, Yılmaz E. Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator. ENTROPY 2021; 23:e23121586. [PMID: 34945891 PMCID: PMC8699840 DOI: 10.3390/e23121586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2021] [Revised: 11/20/2021] [Accepted: 11/22/2021] [Indexed: 11/16/2022]
Abstract
This paper focuses on the adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Typically, there are two main problems that need to be solved in such a case: dealing with censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic data approach based on the Kaplan-Meier estimator. In practice, although the synthetic data technique is one of the most widely used solutions for right-censored observations, the transformed data's structure is distorted, especially for heavily censored datasets, due to the nature of the approach. In this paper, we introduced a modified semiparametric estimator based on the A-spline approach to overcome data irregularity with minimum information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator was used as a benchmark method to gauge the success of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real data sample were carried out to evaluate the performance of the proposed estimator and to make a practical comparison.
Affiliation(s)
- Dursun Aydın: Department of Statistics, Faculty of Science, Mugla Sitki Kocman University, Kotekli 48000, Turkey
- Syed Ejaz Ahmed: Department of Mathematics and Statistics, Faculty of Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON L2S 3A1, Canada
- Ersin Yılmaz: Department of Statistics, Faculty of Science, Mugla Sitki Kocman University, Kotekli 48000, Turkey
3
Bouaziz O, Lauridsen E, Nuel G. Regression modelling of interval censored data based on the adaptive ridge procedure. J Appl Stat 2021; 49:3319-3343. [DOI: 10.1080/02664763.2021.1944996] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Affiliation(s)
- Eva Lauridsen: Ressource Center for Rare Oral Diseases, Copenhagen University Hospital, Copenhagen, Denmark
4
Goepp V, Thalabard JC, Nuel G, Bouaziz O. Regularized bidimensional estimation of the hazard rate. Int J Biostat 2021; 18:263-277. [PMID: 33768761 DOI: 10.1515/ijb-2019-0003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2019] [Accepted: 02/26/2021] [Indexed: 11/15/2022]
Abstract
In epidemiological or demographic studies, with variable age at onset, a typical quantity of interest is the incidence of a disease (for example the cancer incidence). In these studies, the individuals are usually highly heterogeneous in terms of dates of birth (the cohort) and with respect to the calendar time (the period) and appropriate estimation methods are needed. In this article a new estimation method is presented which extends classical age-period-cohort analysis by allowing interactions between age, period and cohort effects. We introduce a bidimensional regularized estimate of the hazard rate where a penalty is introduced on the likelihood of the model. This penalty can be designed either to smooth the hazard rate or to enforce consecutive values of the hazard to be equal, leading to a parsimonious representation of the hazard rate. In the latter case, we make use of an iterative penalized likelihood scheme to approximate the L0 norm, which makes the computation tractable. The method is evaluated on simulated data and applied on breast cancer survival data from the SEER program.
Affiliation(s)
- Vivien Goepp: MAP5, CNRS UMR 8145, 45, rue des Saints-Pères, 75006, Paris, France; MINES ParisTech, CBIO-Centre for Computational Biology, PSL Research University, 75006, Paris, France; Institut Curie, PSL Research University, 75005, Paris, France; Inserm, U900, Paris, France
- Grégory Nuel: LPSM, CNRS UMR 8001, 4, Place Jussieu, 75005, Paris, France
- Olivier Bouaziz: MAP5, CNRS UMR 8145, 45, rue des Saints-Pères, 75006, Paris, France
5
Abstract
This paper aims to solve the problem of fitting a nonparametric regression function with right-censored data. In general, issues of censorship in the response variable are solved by synthetic data transformation based on the Kaplan–Meier estimator in the literature. In the context of synthetic data, there have been different studies on the estimation of right-censored nonparametric regression models based on smoothing splines, regression splines, kernel smoothing, local polynomials, and so on. It should be emphasized that synthetic data transformation manipulates the observations because it assigns zero values to censored data points and increases the size of the observations. Thus, an irregularly distributed dataset is obtained. We claim that adaptive spline (A-spline) regression has the potential to deal with this irregular dataset more easily than the smoothing techniques mentioned here, due to the freedom to determine the degree of the spline, as well as the number and location of the knots. The theoretical properties of A-splines with synthetic data are detailed in this paper. Additionally, we support our claim with numerical studies, including a simulation study and a real-world data example.
6
Wit EC, Augugliaro L, Pazira H, González J, Abegaz F. Sparse relative risk regression models. Biostatistics 2020; 21:e131-e147. [PMID: 30380025 PMCID: PMC7868056 DOI: 10.1093/biostatistics/kxy060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/13/2017] [Revised: 09/20/2018] [Accepted: 09/24/2018] [Indexed: 11/15/2022] Open
Abstract
Clinical studies where patients are routinely screened for many genomic features are becoming more routine. In principle, this holds the promise of being able to find genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, not rarely outstripping the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios. These methods typically induce sparsity by means of a coincidental match of the geometry of the convex likelihood and a (near) non-convex regularizer. The disadvantages of such methods are that they are typically non-invariant to scale changes of the covariates, they struggle with highly correlated covariates, and they have a practical problem of determining the amount of regularization. In this article, we propose an extension of the differential geometric least angle regression method for sparse inference in relative risk regression models. A software implementation of our method is available on github (https://github.com/LuigiAugugliaro/dgcox).
Affiliation(s)
- Ernst C Wit: Institute of Computational Science, USI, Via Buffi 13, Lugano, Switzerland
- Luigi Augugliaro: Department of Economics, Business and Statistics, University of Palermo, Building 13, Viale delle Scienze, Palermo, Italy
- Hassan Pazira: Bernoulli Institute, University of Groningen, Nijenborg 9, AG Groningen, The Netherlands
- Javier González: Amazon Research Cambridge, Poseidon House, Castle Park, Cambridge, UK
- Fentaw Abegaz: Bernoulli Institute, University of Groningen, Nijenborg 9, AG Groningen, The Netherlands; Department of Pediatrics and Systems Biology Centre for Energy Metabolism and Ageing, University of Groningen, University Medical Center Groningen, AD Groningen, The Netherlands
7
Wirsik N, Otto-Sobotka F, Pigeot I. Modeling physical activity data using L0-penalized expectile regression. Biom J 2019; 61:1371-1384. [PMID: 31172553 DOI: 10.1002/bimj.201800007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/07/2018] [Revised: 12/26/2018] [Accepted: 01/09/2019] [Indexed: 11/11/2022]
Abstract
In recent years accelerometers have become widely used to objectively assess physical activity. Usually intensity ranges are assigned to the measured accelerometer counts by simple cut points, disregarding the underlying activity pattern. Under the assumption that physical activity can be seen as a distinct sequence of distinguishable activities, the use of hidden Markov models (HMM) has been proposed to improve the modeling of accelerometer data. As further improvement we propose to use expectile regression utilizing a Whittaker smoother with an L0-penalty to better capture the intensity levels underlying the observed counts. Different expectile asymmetries beyond the mean allow the distinction of monotonous and more variable activities as expectiles effectively model the complete distribution of the counts. This new approach is investigated in a simulation study, where we simulated 1,000 days of accelerometer data with 1 and 5 s epochs, based on collected labeled data to resemble real-life data as closely as possible. The expectile regression is compared to HMMs and the commonly used cut point method with regard to misclassification rate, number of identified bouts and identified levels as well as the proportion of the estimate being in the range of ±10% of the true activity level. In summary, expectile regression utilizing a Whittaker smoother with an L0-penalty outperforms HMMs and the cut point method and is hence a promising approach to model accelerometer data.
Affiliation(s)
- Norman Wirsik: Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Fabian Otto-Sobotka: School of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
- Iris Pigeot: Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany; Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
8
Wit EC. Big data and biostatistics: The death of the asymptotic Valhalla. Stat Probab Lett 2018. [DOI: 10.1016/j.spl.2018.02.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
9
Fasola S, Muggeo VMR, Küchenhoff H. A heuristic, iterative algorithm for change-point detection in abrupt change models. Comput Stat 2017. [DOI: 10.1007/s00180-017-0740-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
10
Frommlet F, Nuel G. An Adaptive Ridge Procedure for L0 Regularization. PLoS One 2016; 11:e0148620. [PMID: 26849123 PMCID: PMC4743917 DOI: 10.1371/journal.pone.0148620] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 05/18/2015] [Accepted: 01/21/2016] [Indexed: 11/18/2022] Open
Abstract
Penalized selection criteria like AIC or BIC are among the most popular methods for variable selection. Their theoretical properties have been studied intensively and are well understood, but making use of them in case of high-dimensional data is difficult due to the non-convex optimization problem induced by L0 penalties. In this paper we introduce an adaptive ridge procedure (AR), where iteratively weighted ridge problems are solved whose weights are updated in such a way that the procedure converges towards selection with L0 penalties. After introducing AR its specific shrinkage properties are studied in the particular case of orthogonal linear regression. Based on extensive simulations for the non-orthogonal case as well as for Poisson regression the performance of AR is studied and compared with SCAD and adaptive LASSO. Furthermore an efficient implementation of AR in the context of least-squares segmentation is presented. The paper ends with an illustrative example of applying AR to analyze GWAS data.
Affiliation(s)
- Florian Frommlet: Department of Medical Statistics (CEMSIIS), Medical University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria
- Grégory Nuel: National Institute for Mathematical Sciences (INSMI), CNRS, Stochastics and Biology Group (PSB), LPMA UMR CNRS 7599, Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France
11
Briollais L, Durrieu G. Application of quantile regression to recent genetic and -omic studies. Hum Genet 2014; 133:951-66. [DOI: 10.1007/s00439-014-1440-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 09/16/2013] [Accepted: 03/10/2014] [Indexed: 12/01/2022]
12
Smith ADAC. Quadratic Programming and Penalized Regression. COMMUN STAT-THEOR M 2013. [DOI: 10.1080/03610926.2012.732177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]