1
|
Liu Y, Luo S. Feature selection in ultrahigh-dimensional additive models with heterogenous frequency component functions. J Stat Plan Inference 2023. [DOI: 10.1016/j.jspi.2023.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
|
2
|
He S, He K, Huang JZ. Improved Estimation of High-dimensional Additive Models Using Subspace Learning. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2034638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Shiyuan He
  - Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, China
- Kejun He
  - Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, China
- Jianhua Z. Huang
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
|
3
|
Phillips MA, Arnold KR, Vue Z, Beasley HK, Garza-Lopez E, Marshall AG, Morton DJ, McReynolds MR, Barter TT, Hinton A. Combining Metabolomics and Experimental Evolution Reveals Key Mechanisms Underlying Longevity Differences in Laboratory Evolved Drosophila melanogaster Populations. Int J Mol Sci 2022; 23:1067. [PMID: 35162994 PMCID: PMC8835531 DOI: 10.3390/ijms23031067] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/10/2021] [Revised: 01/07/2022] [Accepted: 01/11/2022] [Indexed: 12/22/2022]
Abstract
Experimental evolution with Drosophila melanogaster has been used extensively for decades to study aging and longevity. In recent years, the addition of DNA and RNA sequencing to this framework has allowed researchers to leverage the statistical power inherent to experimental evolution to study the genetic basis of longevity itself. Here, we incorporated metabolomic data into this framework to generate even deeper insights into the physiological and genetic mechanisms underlying longevity differences in three groups of experimentally evolved D. melanogaster populations with different aging and longevity patterns. Our metabolomic analysis found that aging alters mitochondrial metabolism through increased consumption of NAD+ and increased usage of the TCA cycle. Combining our genomic and metabolomic data produced a list of biologically relevant candidate genes. Among these candidates, we found significant enrichment for genes and pathways associated with neurological development and function, and carbohydrate metabolism. While we do not explicitly find enrichment for canonical aging genes, neurological dysregulation and carbohydrate metabolism are both known to be associated with accelerated aging and reduced longevity. Taken together, our results provide plausible genetic mechanisms for what might be driving longevity differences in this experimental system. More broadly, our findings demonstrate the value of combining multiple types of omic data with experimental evolution when attempting to dissect mechanisms underlying complex and highly polygenic traits such as aging.
Affiliation(s)
- Mark A. Phillips
  - Department of Integrative Biology, Oregon State University, Corvallis, OR 97331, USA
- Kenneth R. Arnold
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
- Zer Vue
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, USA
- Heather K. Beasley
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, USA
  - Department of Biochemistry, Cancer Biology, Neuroscience, and Pharmacology, Meharry Medical College, Nashville, TN 37208, USA
- Edgar Garza-Lopez
  - Hinton and Garza-Lopez Family Consulting Company, Iowa City, IA 52246, USA
- Andrea G. Marshall
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, USA
- Derrick J. Morton
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Melanie R. McReynolds
  - Department of Biochemistry and Molecular Biology, Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Thomas T. Barter
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
- Antentor Hinton
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, USA
  - Hinton and Garza-Lopez Family Consulting Company, Iowa City, IA 52246, USA
|
4
|
Rudin C, Chen C, Chen Z, Huang H, Semenova L, Zhong C. Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistics Surveys 2022. [DOI: 10.1214/21-ss133] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Indexed: 11/19/2022]
|
5
|
Haris A, Simon N, Shojaie A. Generalized Sparse Additive Models. Journal of Machine Learning Research : JMLR 2022; 23:70. [PMID: 37873545 PMCID: PMC10593424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing cross-validation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework.
Affiliation(s)
- Asad Haris
  - Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, 2020 - 2207 Main Mall, Vancouver, BC, Canada V6T 1Z4
- Noah Simon
  - Department of Biostatistics, University of Washington, Seattle, WA 98195-7232, USA
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, WA 98195-7232, USA
|
6
|
Yang T, Tan Z. Hierarchical Total Variations and Doubly Penalized ANOVA Modeling for Multivariate Nonparametric Regression. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1923513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Zhiqiang Tan
  - Department of Statistics, Rutgers University, Piscataway, NJ
|
7
|
Tay JK, Tibshirani R. Reluctant Generalised Additive Modelling. Int Stat Rev 2020; 88:S205-S224. [PMID: 36062079 PMCID: PMC9435322 DOI: 10.1111/insr.12429] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/05/2020] [Accepted: 10/25/2020] [Indexed: 09/04/2024]
Abstract
Sparse generalised additive models (GAMs) are an extension of sparse generalised linear models that allow a model's prediction to vary non-linearly with an input variable. This enables the data analyst to build more accurate models, especially when the linearity assumption is known to be a poor approximation of reality. Motivated by reluctant interaction modelling, we propose a multi-stage algorithm, called reluctant generalised additive modelling (RGAM), that can fit sparse GAMs at scale. It is guided by the principle that, if all else is equal, one should prefer a linear feature over a non-linear feature. Unlike existing methods for sparse GAMs, RGAM can be extended easily to binary, count and survival data. We demonstrate the method's effectiveness on real and simulated examples.
Affiliation(s)
- J Kenneth Tay
  - Department of Statistics, Stanford University, Stanford, California, USA
- Robert Tibshirani
  - Department of Statistics, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
|
8
|
Yuan P, You X, Chen H, Peng Q, Zhao Y, Xu Z, Jing XY, He Z. Group sparse additive machine with average top-k loss. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
|
9
|
Madrid Padilla OH, Sharpnack J, Chen Y, Witten DM. Adaptive nonparametric regression with the K-nearest neighbour fused lasso. Biometrika 2020; 107:293-310. [PMID: 32454528 DOI: 10.1093/biomet/asz071] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/30/2018] [Indexed: 11/12/2022]
Abstract
The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the $K$-nearest-neighbours fused lasso, involves computing the $K$-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the $K$-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an $\epsilon$-graph rather than a $K$-nearest-neighbours graph and contrast it with the $K$-nearest-neighbours fused lasso.
Affiliation(s)
- James Sharpnack
  - Department of Statistics, University of California, One Shields Avenue, Davis, California, U.S.A.
- Yanzhen Chen
  - Department of Information Systems, Business Statistics and Operations Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Daniela M Witten
  - Department of Statistics, University of Washington, Seattle, Washington, U.S.A.
|
10
|
|
11
|
Tan Z, Zhang CH. Doubly penalized estimation in additive regression with high-dimensional data. Ann Stat 2019. [DOI: 10.1214/18-aos1757] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
|
12
|
Haris A, Shojaie A, Simon N. Nonparametric regression with adaptive truncation via a convex hierarchical penalty. Biometrika 2019; 106:87-107. [PMID: 31427821 DOI: 10.1093/biomet/asy056] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/17/2017] [Indexed: 11/13/2022]
Abstract
We consider the problem of nonparametric regression with a potentially large number of covariates. We propose a convex, penalized estimation framework that is particularly well suited to high-dimensional sparse additive models and combines the appealing features of finite basis representation and smoothing penalties. In the case of additive models, a finite basis representation provides a parsimonious representation for fitted functions but is not adaptive when component functions possess different levels of complexity. In contrast, a smoothing spline-type penalty on the component functions is adaptive but does not provide a parsimonious representation. Our proposal simultaneously achieves parsimony and adaptivity in a computationally efficient way. We demonstrate these properties through empirical studies and show that our estimator converges at the minimax rate for functions within a hierarchical class. We further establish minimax rates for a large class of sparse additive models. We also develop an efficient algorithm that scales similarly to the lasso with the number of covariates and sample size.
Affiliation(s)
- Asad Haris
  - Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington, USA
- Ali Shojaie
  - Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington, USA
- Noah Simon
  - Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington, USA
|
13
|
Drosophila transcriptomics with and without ageing. Biogerontology 2019; 20:699-710. [PMID: 31317291 DOI: 10.1007/s10522-019-09823-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/19/2019] [Accepted: 07/10/2019] [Indexed: 01/21/2023]
Abstract
The genomic basis of ageing remains unknown despite being a topic of study for many years. Here, we present data from 20 experimentally evolved laboratory populations of Drosophila melanogaster that have undergone two different life-history selection regimes. One set of ten populations demonstrates early ageing whereas the other set of ten populations shows postponed ageing. Additionally, both types of populations consist of five long-standing populations and five recently derived populations. Our primary goal was to determine which genes exhibit changes in expression levels by comparing the female transcriptome of the two population sets at two different time points. Using three different sets of increasingly restrictive criteria, we found that 2.1-15.7% (82-629 genes) of the expressed genes are associated with differential ageing between population sets. Conversely, a comparison of recently derived populations to long-standing populations reveals little to no transcriptome differentiation, suggesting that the recent selection regime has had a larger impact on the transcriptome than its more distant evolutionary history. In addition, we found very little evidence for significant enrichment for functional attributes regardless of the set of criteria used. Relative to previous ageing studies, we find little overlap with other lists of ageing-related genes. The disparity between our results and previously published results is likely due to the high replication used in this study coupled with our use of highly differentiated populations. Our results reinforce the notion that the use of genomic, transcriptomic, and phenotypic data to uncover the genetic basis of a complex trait like ageing can benefit from experimental designs that use highly replicated, experimentally evolved populations.
|
14
|
Mueller LD, Phillips MA, Barter TT, Greenspan ZS, Rose MR. Genome-Wide Mapping of Gene-Phenotype Relationships in Experimentally Evolved Populations. Mol Biol Evol 2019; 35:2085-2095. [PMID: 29860403 DOI: 10.1093/molbev/msy113] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Abstract
Model organisms subjected to sustained experimental evolution often show levels of phenotypic differentiation that dramatically exceed the phenotypic differences observed in natural populations. Genome-wide sequencing of pooled populations then offers the opportunity to make inferences about the genes that are the cause of these phenotypic differences. We tested, through computer simulations, the efficacy of a statistical learning technique called the "fused lasso additive model" (FLAM). We focused on the ability of FLAM to distinguish between genes which are differentiated and directly affect a phenotype from differentiated genes which have no effect on the phenotype. FLAM can separate these two classes of genes even with relatively small samples (10 populations, in total). The efficacy of FLAM is improved with increased number of populations, reduced environmental phenotypic variation, and increased within-treatment among-replicate variation. FLAM was applied to SNP variation measured in both twenty-population and thirty-population studies of Drosophila subjected to selection for age-at-reproduction, to illustrate the application of the method.
Affiliation(s)
- Laurence D Mueller
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA
- Mark A Phillips
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA
- Thomas T Barter
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA
- Zachary S Greenspan
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA
- Michael R Rose
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA
|
15
|
Chen X, Lin Q, Sen B. On Degrees of Freedom of Projection Estimators With Applications to Multivariate Nonparametric Regression. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2018.1537917] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Affiliation(s)
- Xi Chen
  - Stern School of Business, New York University, New York, NY
- Qihang Lin
  - Tippie College of Business, University of Iowa, Iowa City, IA
|
16
|
Petersen A, Witten D. Data-adaptive additive modeling. Stat Med 2019; 38:583-600. [PMID: 30010200 PMCID: PMC6335202 DOI: 10.1002/sim.7859] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2017] [Revised: 05/18/2018] [Accepted: 06/05/2018] [Indexed: 11/10/2022]
Abstract
In this paper, we consider fitting a flexible and interpretable additive regression model in a data-rich setting. We wish to avoid pre-specifying the functional form of the conditional association between each covariate and the response, while still retaining interpretability of the fitted functions. A number of recent proposals in the literature for nonparametric additive modeling are data adaptive, in the sense that they can adjust the level of flexibility in the functional fits to the data at hand. For instance, the sparse additive model makes it possible to adaptively determine which features should be included in the fitted model, the sparse partially linear additive model allows each feature in the fitted model to take either a linear or a nonlinear functional form, and the recent fused lasso additive model and additive trend filtering proposals allow the knots in each nonlinear function fit to be selected from the data. In this paper, we combine the strengths of each of these recent proposals into a single proposal that uses the data to determine which features to include in the model, whether to model each feature linearly or nonlinearly, and what form to use for the nonlinear functions. We establish connections between our approach and recent proposals from the literature, and we demonstrate its strengths in a simulation study.
Affiliation(s)
- Ashley Petersen
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Daniela Witten
  - Departments of Biostatistics and Statistics, University of Washington, Seattle, WA, USA
|
17
|
[Received: January 2018] [Revised: February 2019]
Abstract
As datasets continue to increase in size, there is growing interest in methods for prediction that are both flexible and interpretable. A flurry of recent work on this topic has focused on additive modeling in the regression setting, and in particular, on the use of data-adaptive nonlinear functions that can be used to flexibly model each covariate's effect, conditional on the other features in the model. In this article, we extend this recent line of work to the survival setting. We develop an additive Cox proportional hazards model, in which each additive function is obtained by trend filtering, so that the fitted functions are piecewise polynomial with adaptively chosen knots. An efficient proximal gradient descent algorithm is used to fit the model. We demonstrate its performance in simulations and in application to a primary biliary cirrhosis dataset, as well as a dataset consisting of time to publication for clinical trials in the biomedical literature. Supplementary materials for this article are available online.
Affiliation(s)
- Jiacheng Wu
  - Department of Biostatistics, University of Washington, Seattle, WA
- Daniela Witten
  - Department of Biostatistics, University of Washington, Seattle, WA
  - Department of Statistics, University of Washington, Seattle, WA
|
18
|
Tan KM, Wang Z, Liu H, Zhang T. Sparse generalized eigenvalue problem: optimal statistical rates via truncated Rayleigh flow. J R Stat Soc Series B Stat Methodol 2018. [DOI: 10.1111/rssb.12291] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Affiliation(s)
- Han Liu
  - Northwestern University; Evanston USA
- Tong Zhang
  - Tencent Technology Shenzhen; People's Republic of China
|
19
|
Yang T, Tan Z. Backfitting algorithms for total-variation and empirical-norm penalized additive modelling with high-dimensional data. Stat (Int Stat Inst) 2018. [DOI: 10.1002/sta4.198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Affiliation(s)
- Ting Yang
  - Department of Statistics; Rutgers University; Piscataway 08854 NJ USA
- Zhiqiang Tan
  - Department of Statistics; Rutgers University; Piscataway 08854 NJ USA
|
20
|
Affiliation(s)
- Thomas Nagler
  - Lehrstuhl für Mathematische Statistik, Technische Universität München, Boltzmannstraße 3, 85748 Garching b. München, Germany
|
21
|
Boyd N, Hastie T, Boyd S, Recht B, Jordan MI. Saturating Splines and Feature Selection. Journal of Machine Learning Research : JMLR 2018; 18:197. [PMID: 31007630 PMCID: PMC6474379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
We extend the adaptive regression spline model by incorporating saturation, the natural requirement that a function extend as a constant outside a certain range. We fit saturating splines to data via a convex optimization problem over a space of measures, which we solve using an efficient algorithm based on the conditional gradient method. Unlike many existing approaches, our algorithm solves the original infinite-dimensional (for splines of degree at least two) optimization problem without pre-specified knot locations. We then adapt our algorithm to fit generalized additive models with saturating splines as coordinate functions and show that the saturation requirement allows our model to simultaneously perform feature selection and nonlinear function fitting. Finally, we briefly sketch how the method can be extended to higher order splines and to different requirements on the extension outside the data range.
Affiliation(s)
- Nicholas Boyd
  - Department of Statistics, University of California, Berkeley, CA 94720-1776, USA
- Trevor Hastie
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Stephen Boyd
  - Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
- Benjamin Recht
  - Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720-1776, USA
- Michael I Jordan
  - Division of Computer Science and Department of Statistics, University of California, Berkeley, CA 94720-1776, USA
|
22
|
Segal BD, Elliott MR, Braun T, Jiang H. P-splines with an $\ell_{1}$ penalty for repeated measures. Electron J Stat 2018. [DOI: 10.1214/18-ejs1487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
23
|
|
24
|
Petersen A, Simon N, Witten D. Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research : JMLR 2016; 17:94. [PMID: 27635120 PMCID: PMC5021451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.
Affiliation(s)
- Ashley Petersen
  - Department of Biostatistics, University of Washington, Seattle, WA 98195
- Noah Simon
  - Department of Biostatistics, University of Washington, Seattle, WA 98195
- Daniela Witten
  - Departments of Biostatistics and Statistics, University of Washington, Seattle, WA 98195
|