201
|
Howe LD, Smith AD, Macdonald-Wallis C, Anderson EL, Galobardes B, Lawlor DA, Ben-Shlomo Y, Hardy R, Cooper R, Tilling K, Fraser A. Relationship between mediation analysis and the structured life course approach. Int J Epidemiol 2016; 45:1280-1294. [PMID: 27681097 PMCID: PMC5841634 DOI: 10.1093/ije/dyw254] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Accepted: 08/10/2016] [Indexed: 11/12/2022] Open
Abstract
Many questions in life course epidemiology involve mediation and/or interaction because of the long latency period between exposures and outcomes. In this paper, we explore how mediation analysis (based on counterfactual theory and implemented using conventional regression approaches) links with a structured approach to selecting life course hypotheses. Using theory and simulated data, we show how the alternative life course hypotheses assessed in the structured life course approach correspond to different combinations of mediation and interaction parameters. For example, an early life critical period model corresponds to a direct effect of the early life exposure, but no indirect effect via the mediator and no interaction between the early life exposure and the mediator. We also compare these methods using an illustrative real-data example using data on parental occupational social class (early life exposure), own adult occupational social class (mediator) and physical capability (outcome).
Affiliation(s)
- Laura D Howe
- MRC Integrative Epidemiology Unit; School of Social and Community Medicine, University of Bristol, Bristol, UK
- Andrew D Smith
- MRC Integrative Epidemiology Unit; School of Social and Community Medicine, University of Bristol, Bristol, UK
- Corrie Macdonald-Wallis
- MRC Integrative Epidemiology Unit; School of Social and Community Medicine, University of Bristol, Bristol, UK
- Emma L Anderson
- MRC Integrative Epidemiology Unit; School of Social and Community Medicine, University of Bristol, Bristol, UK
- Bruna Galobardes
- School of Social and Community Medicine, University of Bristol, Bristol, UK
- Debbie A Lawlor
- MRC Integrative Epidemiology Unit; School of Social and Community Medicine, University of Bristol, Bristol, UK
- Yoav Ben-Shlomo
- School of Social and Community Medicine, University of Bristol, Bristol, UK
- Rebecca Hardy
- MRC Unit for Lifelong Health and Ageing at UCL, University College London, London, UK
- Rachel Cooper
- MRC Unit for Lifelong Health and Ageing at UCL, University College London, London, UK
- Kate Tilling
- MRC Integrative Epidemiology Unit; School of Social and Community Medicine, University of Bristol, Bristol, UK
- Abigail Fraser
- MRC Integrative Epidemiology Unit; School of Social and Community Medicine, University of Bristol, Bristol, UK
|
202
|
Janson L, Barber RF, Candès E. EigenPrism: inference for high dimensional signal-to-noise ratios. J R Stat Soc Series B Stat Methodol 2016; 79:1037-1065. [PMID: 29104447 DOI: 10.1111/rssb.12203] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/30/2022]
Abstract
Consider the following three important problems in statistical inference, namely, constructing confidence intervals for (1) the error of a high-dimensional (p > n) regression estimator, (2) the linear regression noise level, and (3) the genetic signal-to-noise ratio of a continuous-valued trait (related to the heritability). All three problems turn out to be closely related to the little-studied problem of performing inference on the ℓ2-norm of the signal in high-dimensional linear regression. We derive a novel procedure for this, which is asymptotically correct when the covariates are multivariate Gaussian and produces valid confidence intervals in finite samples as well. The procedure, called EigenPrism, is computationally fast and makes no assumptions on coefficient sparsity or knowledge of the noise level. We investigate the width of the EigenPrism confidence intervals, including a comparison with a Bayesian setting in which our interval is just 5% wider than the Bayes credible interval. We are then able to unify the three aforementioned problems by showing that the EigenPrism procedure with only minor modifications is able to make important contributions to all three. We also investigate the robustness of coverage and find that the method applies in practice and in finite samples much more widely than just the case of multivariate Gaussian covariates. Finally, we apply EigenPrism to a genetic dataset to estimate the genetic signal-to-noise ratio for a number of continuous phenotypes.
|
203
|
|
204
|
Predicting clinical outcome from reward circuitry function and white matter structure in behaviorally and emotionally dysregulated youth. Mol Psychiatry 2016; 21:1194-201. [PMID: 26903272 PMCID: PMC4993633 DOI: 10.1038/mp.2016.5] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 07/30/2015] [Revised: 10/09/2015] [Accepted: 12/02/2015] [Indexed: 12/20/2022]
Abstract
Behavioral and emotional dysregulation in childhood may be understood as prodromal to adult psychopathology. Additionally, there is a critical need to identify biomarkers reflecting underlying neuropathological processes that predict clinical/behavioral outcomes in youth. We aimed to identify such biomarkers in youth with behavioral and emotional dysregulation in the Longitudinal Assessment of Manic Symptoms (LAMS) study. We examined neuroimaging measures of function and white matter in the whole brain using 80 youth aged 14.0 (s.d.=2.0) from three clinical sites. Linear regression using the LASSO (Least Absolute Shrinkage and Selection Operator) method for variable selection was used to predict severity of future behavioral and emotional dysregulation, measured by the Parent General Behavior Inventory-10 Item Mania Scale (PGBI-10M), at a mean of 14.2 months follow-up after neuroimaging assessment. Neuroimaging measures, together with near-scan PGBI-10M, a score of manic behaviors, depressive behaviors and sex, explained 28% of the variance in follow-up PGBI-10M. Neuroimaging measures alone, after accounting for other identified predictors, explained ~1/3 of the explained variance in follow-up PGBI-10M. Specifically, greater bilateral cingulum length predicted lower PGBI-10M at follow-up. Greater functional connectivity in parietal-subcortical reward circuitry predicted greater PGBI-10M at follow-up. For the first time, data suggest that multimodal neuroimaging measures of underlying neuropathologic processes account for over a third of the explained variance in clinical outcome in a large sample of behaviorally and emotionally dysregulated youth. This may be an important first step toward identifying neurobiological measures with the potential to act as novel targets for early detection and future therapeutic interventions.
|
205
|
Abstract
A model is presented for the supervised learning problem where the observations come from a fixed number of pre-specified groups, and the regression coefficients may vary sparsely between groups. The model spans the continuum between individual models for each group and one model for all groups. The resulting algorithm is designed with a high dimensional framework in mind. The approach is applied to a sentiment analysis dataset to show its efficacy and interpretability. One particularly useful application is for finding sub-populations in a randomized trial for which an intervention (treatment) is beneficial, often called the uplift problem. Some new concepts are introduced that are useful for uplift analysis. The value is demonstrated in an application to a real world credit card promotion dataset. In this example, although sending the promotion has a very small average effect, by targeting a particular subgroup with the promotion one can obtain a 15% increase in the proportion of people who purchase the new credit card.
Affiliation(s)
- Samuel M. Gross
- Nuna, 650 Townsend St, San Francisco, CA
- Department of Statistics, Stanford University, Stanford, CA
|
206
|
Erdem C, Nagle AM, Casa AJ, Litzenburger BC, Wang YF, Taylor DL, Lee AV, Lezon TR. Proteomic Screening and Lasso Regression Reveal Differential Signaling in Insulin and Insulin-like Growth Factor I (IGF1) Pathways. Mol Cell Proteomics 2016; 15:3045-57. [PMID: 27364358 PMCID: PMC5013316 DOI: 10.1074/mcp.m115.057729] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/28/2015] [Revised: 06/23/2016] [Indexed: 01/22/2023] Open
Abstract
Insulin and insulin-like growth factor I (IGF1) influence cancer risk and progression through poorly understood mechanisms. To better understand the roles of insulin and IGF1 signaling in breast cancer, we combined proteomic screening with computational network inference to uncover differences in IGF1 and insulin induced signaling. Using reverse phase protein array, we measured the levels of 134 proteins in 21 breast cancer cell lines stimulated with IGF1 or insulin for up to 48 h. We then constructed directed protein expression networks using three separate methods: (i) lasso regression, (ii) conventional matrix inversion, and (iii) entropy maximization. These networks, named here as the time translation models, were analyzed and the inferred interactions were ranked by differential magnitude to identify pathway differences. The two top candidates, chosen for experimental validation, were shown to regulate IGF1/insulin induced phosphorylation events. First, acetyl-CoA carboxylase (ACC) knock-down was shown to increase the level of mitogen-activated protein kinase (MAPK) phosphorylation. Second, stable knock-down of E-Cadherin increased the phospho-Akt protein levels. Both of the knock-down perturbations incurred phosphorylation responses stronger in IGF1 stimulated cells compared with insulin. Overall, the time-translation modeling coupled to wet-lab experiments has proven to be powerful in inferring differential interactions downstream of IGF1 and insulin signaling, in vitro.
Affiliation(s)
- Cemal Erdem
- Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania; University of Pittsburgh Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
- Alison M Nagle
- Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania; Women's Cancer Research Center, University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- Angelo J Casa
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Beate C Litzenburger
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Yu-Fen Wang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- D Lansing Taylor
- Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania; University of Pittsburgh Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
- Adrian V Lee
- Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania; Women's Cancer Research Center, University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Timothy R Lezon
- Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania; University of Pittsburgh Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
|
207
|
Tibshirani RJ, Taylor J, Lockhart R, Tibshirani R. Exact Post-Selection Inference for Sequential Regression Procedures. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1108848] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Indexed: 10/21/2022]
|
208
|
Lipkovich I, Dmitrienko A, B R. Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials. Stat Med 2016; 36:136-196. [PMID: 27488683 DOI: 10.1002/sim.7064] [Citation(s) in RCA: 160] [Impact Index Per Article: 20.0] [Received: 08/04/2015] [Revised: 06/23/2016] [Accepted: 07/05/2016] [Indexed: 02/05/2023]
Abstract
It is well known that both the direction and magnitude of the treatment effect in clinical trials are often affected by baseline patient characteristics (generally referred to as biomarkers). Characterization of treatment effect heterogeneity plays a central role in the field of personalized medicine and facilitates the development of tailored therapies. This tutorial focuses on a general class of problems arising in data-driven subgroup analysis, namely, identification of biomarkers with strong predictive properties and patient subgroups with desirable characteristics such as improved benefit and/or safety. Limitations of ad-hoc approaches to biomarker exploration and subgroup identification in clinical trials are discussed, and the ad-hoc approaches are contrasted with principled approaches to exploratory subgroup analysis based on recent advances in machine learning and data mining. A general framework for evaluating predictive biomarkers and identification of associated subgroups is introduced. The tutorial provides a review of a broad class of statistical methods used in subgroup discovery, including global outcome modeling methods, global treatment effect modeling methods, optimal treatment regimes, and local modeling methods. Commonly used subgroup identification methods are illustrated using two case studies based on clinical trials with binary and survival endpoints. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Ralph B
- Boston University, Boston, MA, U.S.A.
|
209
|
Rubanovich AV, Khromov-Borisov NN. Genetic risk assessment of the joint effect of several genes: Critical appraisal. RUSS J GENET+ 2016. [DOI: 10.1134/s1022795416070073] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
|
210
|
Abram SV, Helwig NE, Moodie CA, DeYoung CG, MacDonald AW, Waller NG. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data. Front Neurosci 2016; 10:344. [PMID: 27516732 PMCID: PMC4964314 DOI: 10.3389/fnins.2016.00344] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 04/20/2016] [Accepted: 07/08/2016] [Indexed: 11/13/2022] Open
Abstract
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks.
Affiliation(s)
- Samantha V Abram
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Nathaniel E Helwig
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA; School of Statistics, University of Minnesota, Minneapolis, MN, USA
- Craig A Moodie
- Department of Psychology, Stanford University, Stanford, CA, USA
- Colin G DeYoung
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Angus W MacDonald
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA; Department of Psychiatry, University of Minnesota, Minneapolis, MN, USA
- Niels G Waller
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
|
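The bootstrap quantile idea described in the abstract of entry 210 can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' annotated R code: the function names `lasso_cd` and `bootstrap_quantile_select`, the fixed penalty `lam`, the choice of the lasso as the penalized estimator, and the simulated data are all assumptions made here. A predictor is kept when the percentile interval of its bootstrap coefficients excludes zero.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso minimising (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]        # add back predictor j's current fit
            rho = X[:, j] @ resid / n         # covariance of predictor j with the partial residual
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= X[:, j] * beta[j]
    return beta

def bootstrap_quantile_select(X, y, lam, n_boot=100, alpha=0.05, seed=0):
    """Refit the lasso on bootstrap resamples; keep predictors whose
    (alpha/2, 1 - alpha/2) percentile interval excludes zero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    coefs = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows with replacement
        Xb = X[idx] - X[idx].mean(axis=0)     # centre so that no intercept is needed
        yb = y[idx] - y[idx].mean()
        coefs[b] = lasso_cd(Xb, yb, lam)
    lo, hi = np.quantile(coefs, [alpha / 2, 1 - alpha / 2], axis=0)
    selected = (lo > 0) | (hi < 0)
    return selected, lo, hi

# Simulated data (an assumption for illustration): two correlated true predictors, six nulls.
rng = np.random.default_rng(1)
n, p = 120, 8
X = rng.normal(size=(n, p))
X[:, 1] += 0.8 * X[:, 0]
beta_true = np.array([2.0, -1.5] + [0.0] * (p - 2))
y = X @ beta_true + rng.normal(size=n)

selected, lo, hi = bootstrap_quantile_select(X, y, lam=0.1)
print(selected)
```

In practice the penalty would normally be tuned (for example by cross-validation) rather than fixed, and the same wrapper could be placed around a ridge or elastic net fit.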
211
|
Gueuning T, Claeskens G. Confidence intervals for high-dimensional partially linear single-index models. J MULTIVARIATE ANAL 2016. [DOI: 10.1016/j.jmva.2016.03.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
|
212
|
Smith AD, Hardy R, Heron J, Joinson CJ, Lawlor DA, Macdonald-Wallis C, Tilling K. A structured approach to hypotheses involving continuous exposures over the life course. Int J Epidemiol 2016; 45:1271-1279. [PMID: 27371628 PMCID: PMC5841633 DOI: 10.1093/ije/dyw164] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Accepted: 04/25/2016] [Indexed: 11/22/2022] Open
Abstract
Background: Epidemiologists are often interested in examining different hypotheses for how exposures measured repeatedly over the life course relate to later-life outcomes. A structured approach for selecting the hypotheses most supported by theory and observed data has been developed for binary exposures. The aim of this paper is to extend this to include continuous exposures and allow for confounding and missing data. Methods: We studied two examples, the association between: (i) maternal weight during pregnancy and birthweight; and (ii) stressful family events throughout childhood and depression in adolescence. In each example we considered several plausible hypotheses including accumulation, critical periods, sensitive periods, change and effect modification. We used least angle regression to select the hypothesis that explained the most variation in the outcome, demonstrating appropriate methods for adjusting for confounders and dealing with missing data. Results: The structured approach identified a combination of sensitive periods: pre-pregnancy weight, and gestational weight gain 0-20 weeks and 20-40 weeks, as the best explanation for variation in birthweight after adjusting for maternal height. A sensitive period hypothesis best explained variation in adolescent depression, with the association strengthening with the proximity of stressful family events. For each example, these models have theoretical support at least as strong as any competing hypothesis. Conclusions: We have extended the structured approach to incorporate continuous exposures, confounding and missing data. This approach can be used in either an exploratory or a confirmatory setting. The interpretation, plausibility and consistency with causal assumptions should all be considered when proposing and choosing life course hypotheses.
Affiliation(s)
- Andrew Dac Smith
- School of Social and Community Medicine; MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Rebecca Hardy
- MRC Unit for Lifelong Health and Ageing, University College London, London, UK
- Jon Heron
- School of Social and Community Medicine
- Debbie A Lawlor
- School of Social and Community Medicine; MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Corrie Macdonald-Wallis
- School of Social and Community Medicine; MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Kate Tilling
- School of Social and Community Medicine; MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
|
213
|
Franke B, Plante J, Roscher R, Lee EA, Smyth C, Hatefi A, Chen F, Gil E, Schwing A, Selvitella A, Hoffman MM, Grosse R, Hendricks D, Reid N. Statistical Inference, Learning and Models in Big Data. Int Stat Rev 2016. [DOI: 10.1111/insr.12176] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 01/17/2023]
Affiliation(s)
- Fuqi Chen
- Western University, London, Ontario, Canada
- Einat Gil
- University of Toronto, Toronto, Ontario, Canada
- Nancy Reid
- University of Toronto, Toronto, Ontario, Canada
|
214
|
Chen Q, Nian H, Zhu Y, Talbot HK, Griffin MR, Harrell FE. Too many covariates and too few cases? - a comparative study. Stat Med 2016; 35:4546-4558. [PMID: 27357163 DOI: 10.1002/sim.7021] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 09/05/2014] [Revised: 05/10/2016] [Accepted: 05/31/2016] [Indexed: 11/10/2022]
Abstract
Prior research indicates that 10-15 cases or controls, whichever is fewer, are required per parameter to reliably estimate regression coefficients in multivariable logistic regression models. This condition may be difficult to meet even in a well-designed study when the number of potential confounders is large, the outcome is rare, and/or interactions are of interest. Various propensity score approaches have been implemented when the exposure is binary. Recent work on shrinkage approaches like lasso was motivated by the critical need to develop methods for the p >> n situation, where p is the number of parameters and n is the sample size. Those methods, however, have been less frequently used when p≈n, and in this situation, there is no guidance on choosing among regular logistic regression models, propensity score methods, and shrinkage approaches. To fill this gap, we conducted extensive simulations mimicking our motivating clinical data, estimating vaccine effectiveness for preventing influenza hospitalizations in the 2011-2012 influenza season. Ridge regression and penalized logistic regression models that penalize all but the coefficient of the exposure may be considered in these types of studies. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Qingxia Chen
- Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.; Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.
- Hui Nian
- Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.
- Yuwei Zhu
- Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.
- H Keipp Talbot
- Department of Medicine, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.
- Marie R Griffin
- Department of Medicine, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.; Department of Health Policy, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.; Medicine Mid-South Geriatric Research Education and Clinical Center and Clinical Research Center of Excellence, VA TN Valley Health Care System, Nashville, TN 37232, U.S.A.
- Frank E Harrell
- Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN 37232, U.S.A.
|
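The closing recommendation in the abstract of entry 214, a penalized logistic regression that penalizes every coefficient except that of the exposure, can be illustrated with a small sketch. The code below is an assumption-laden Python illustration, not the study's simulation code: the function name `ridge_logistic`, the per-coefficient `penalty` mask, the deliberately extreme `lam`, and the simulated data are all hypothetical. It fits a ridge-penalized logistic model by Newton-Raphson and compares sparing the exposure with penalizing it.

```python
import numpy as np

def ridge_logistic(X, y, penalty, lam, n_iter=50, tol=1e-8):
    """Newton-Raphson for logistic regression minimising
    -loglik(beta) + (lam / 2) * sum_j penalty[j] * beta[j]**2.
    Setting penalty[j] = 0 leaves coefficient j unpenalised."""
    n, p = X.shape
    beta = np.zeros(p)
    D = lam * np.diag(penalty)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))            # fitted probabilities
        grad = X.T @ (y - mu) - D @ beta                  # gradient of penalised log-likelihood
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None]) + D # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (an assumption for illustration): binary exposure plus ten covariates.
rng = np.random.default_rng(0)
n, k = 600, 10
exposure = rng.integers(0, 2, size=n).astype(float)
covars = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), exposure, covars])       # columns: intercept, exposure, covariates
logit = -0.5 + 1.5 * exposure + covars @ np.full(k, 0.3)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

mask_spare = np.array([0.0, 0.0] + [1.0] * k)  # intercept and exposure unpenalised
mask_all = np.array([0.0] + [1.0] * (k + 1))   # only the intercept unpenalised

b_spare = ridge_logistic(X, y, mask_spare, lam=1e4)
b_all = ridge_logistic(X, y, mask_all, lam=1e4)
print("exposure coefficient, exposure spared:   ", b_spare[1])
print("exposure coefficient, exposure penalised:", b_all[1])
```

The very large `lam` is used only to make the contrast visible: when the exposure is spared its coefficient is not shrunk toward zero, whereas penalizing it along with the covariates shrinks it heavily. A realistic analysis would tune the penalty instead.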
215
|
|
216
|
Lee JD, Sun DL, Sun Y, Taylor JE. Exact post-selection inference, with application to the lasso. Ann Stat 2016. [DOI: 10.1214/15-aos1371] [Citation(s) in RCA: 282] [Impact Index Per Article: 35.3] [Indexed: 11/19/2022]
|
217
|
Lafzi A, Kazan H. Inferring RBP-Mediated Regulation in Lung Squamous Cell Carcinoma. PLoS One 2016; 11:e0155354. [PMID: 27186987 PMCID: PMC4871487 DOI: 10.1371/journal.pone.0155354] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/13/2016] [Accepted: 04/27/2016] [Indexed: 12/11/2022] Open
Abstract
RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation of mRNAs. Dysregulations in RBP-mediated mechanisms have been found to be associated with many steps of cancer initiation and progression. Despite this, previous studies of gene expression in cancer have ignored the effect of RBPs. To this end, we developed a lasso regression model that predicts gene expression in cancer by incorporating RBP-mediated regulation as well as the effects of other well-studied factors such as copy-number variation, DNA methylation, TFs and miRNAs. As a case study, we applied our model to lung squamous cell carcinoma (LUSC) data, as we found that several RBPs are differentially expressed in LUSC. Including RBP-mediated regulatory effects in addition to the other features significantly increased the Spearman rank correlation between predicted and measured expression of held-out genes. Using a feature selection procedure that accounts for the adaptive search employed by lasso regularization, we identified the candidate regulators in LUSC. Remarkably, several of these candidate regulators are RBPs. Furthermore, the majority of the candidate regulators have been previously found to be associated with lung cancer. To investigate the mechanisms that are controlled by these regulators, we predicted their target gene sets based on our model. We validated the target gene sets by comparing against experimentally verified targets. Our results suggest that future studies of gene expression in cancer must consider the effect of RBP-mediated regulation.
Affiliation(s)
- Atefeh Lafzi
- Department of Health Informatics, Middle East Technical University, Ankara, Turkey
- Hilal Kazan
- Department of Computer Engineering, Antalya International University, Antalya, Turkey
|
218
|
Lu S, Liu Y, Yin L, Zhang K. Confidence intervals and regions for the lasso by using stochastic variational inequality techniques in optimization. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12184] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Affiliation(s)
- Shu Lu
- University of North Carolina at Chapel Hill, USA
- Yufeng Liu
- University of North Carolina at Chapel Hill, USA
- Liang Yin
- University of North Carolina at Chapel Hill, USA
- Kai Zhang
- University of North Carolina at Chapel Hill, USA
|
219
|
Abstract
Epidemiologists are often interested in examining the effect on a later-life outcome of an exposure measured repeatedly over the life course. When different hypotheses for this effect are proposed by competing theories, it is important to identify those most supported by observed data as a first step toward estimating causal associations. One method is to compare goodness-of-fit of hypothesized models with a saturated model, but it is unclear how to judge the “best” out of two hypothesized models that both pass criteria for a good fit. We developed a new method using the least absolute shrinkage and selection operator to identify which of a small set of hypothesized models explains most of the observed outcome variation. We analyzed a cohort study with repeated measures of socioeconomic position (exposure) through childhood, early- and mid-adulthood, and body mass index (outcome) measured in mid-adulthood. We confirmed previous findings regarding support or lack of support for the following hypotheses: accumulation (number of times exposed), three critical periods (only exposure in childhood, early- or mid-adulthood), and social mobility (transition from low to high socioeconomic position). Simulations showed that our least absolute shrinkage and selection operator approach identified the most suitable hypothesized model with high probability in moderately sized samples, but with lower probability for hypotheses involving change in exposure or highly correlated exposures. Identifying a single, simple hypothesis that represents the specified knowledge of the life course association allows more precise definition of the causal effect of interest.
|
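As a rough illustration of the idea in the abstract of entry 219, each life course hypothesis can be encoded as one predictor built from the repeated binary exposure, and the lasso asked which predictor it admits first. The sketch below is not the published method and covers only this first step; it relies on the standard fact that, with standardized predictors, the first variable to enter a lasso path is the one with the largest absolute correlation with the outcome. The function name `first_lasso_entry`, the hypothesis encodings, and the simulated data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Binary exposure (for example, low socioeconomic position) at three life stages:
# column 0 = childhood, column 1 = early adulthood, column 2 = mid-adulthood.
x = rng.integers(0, 2, size=(n, 3)).astype(float)

# One predictor per hypothesis (encodings are illustrative assumptions).
hypotheses = {
    "accumulation": x.sum(axis=1),               # number of times exposed
    "critical_childhood": x[:, 0],
    "critical_early_adulthood": x[:, 1],
    "critical_mid_adulthood": x[:, 2],
    "mobility_low_to_high": x[:, 0] * (1 - x[:, 2]),  # exposed in childhood, not in mid-adulthood
}

# Simulate an outcome that truly follows the accumulation hypothesis.
y = 1.0 * hypotheses["accumulation"] + rng.normal(size=n)

def first_lasso_entry(hyp, y):
    """Return the hypothesis whose standardised predictor enters the lasso path first,
    i.e. the one with the largest absolute correlation with the centred outcome."""
    names = list(hyp)
    Z = np.column_stack([hyp[name] for name in names])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)     # standardise each predictor
    yc = y - y.mean()
    score = np.abs(Z.T @ yc) / len(y)            # the penalty value at which each variable would enter alone
    return names[int(np.argmax(score))], dict(zip(names, score))

best, scores = first_lasso_entry(hypotheses, y)
print(best)
```

A fuller analysis would continue along the path and assess how much outcome variation the selected hypothesis explains relative to the alternatives, which this sketch does not attempt.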
220
|
Tan A, Huang J. Bayesian inference for high-dimensional linear regression under mnet priors. CAN J STAT 2016. [DOI: 10.1002/cjs.11283] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Affiliation(s)
- Aixin Tan
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, U.S.A.
- Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, U.S.A.
|
221
|
Taylor JE, Loftus JR, Tibshirani RJ. Inference in adaptive regression via the Kac–Rice formula. Ann Stat 2016. [DOI: 10.1214/15-aos1386] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]
|
222
|
Lenters V, Portengen L, Rignell-Hydbom A, Jönsson BA, Lindh CH, Piersma AH, Toft G, Bonde JP, Heederik D, Rylander L, Vermeulen R. Prenatal Phthalate, Perfluoroalkyl Acid, and Organochlorine Exposures and Term Birth Weight in Three Birth Cohorts: Multi-Pollutant Models Based on Elastic Net Regression. ENVIRONMENTAL HEALTH PERSPECTIVES 2016; 124:365-72. [PMID: 26115335 PMCID: PMC4786980 DOI: 10.1289/ehp.1408933] [Citation(s) in RCA: 159] [Impact Index Per Article: 19.9] [Received: 07/07/2014] [Accepted: 06/23/2015] [Indexed: 05/18/2023]
Abstract
BACKGROUND: Some legacy and emerging environmental contaminants are suspected risk factors for intrauterine growth restriction. However, the evidence is equivocal, in part due to difficulties in disentangling the effects of mixtures. OBJECTIVES: We assessed associations between multiple correlated biomarkers of environmental exposure and birth weight. METHODS: We evaluated a cohort of 1,250 term (≥ 37 weeks gestation) singleton infants, born to 513 mothers from Greenland, 180 from Poland, and 557 from Ukraine, who were recruited during antenatal care visits in 2002-2004. Secondary metabolites of diethylhexyl and diisononyl phthalates (DEHP, DiNP), eight perfluoroalkyl acids, and organochlorines (PCB-153 and p,p´-DDE) were quantifiable in 72-100% of maternal serum samples. We assessed associations between exposures and term birth weight, adjusting for co-exposures and covariates, including prepregnancy body mass index. To identify independent associations, we applied the elastic net penalty to linear regression models. RESULTS: Two phthalate metabolites (MEHHP, MOiNP), perfluorooctanoic acid (PFOA), and p,p´-DDE were most consistently predictive of term birth weight based on elastic net penalty regression. In an adjusted, unpenalized regression model of the four exposures, 2-SD increases in natural log-transformed MEHHP, PFOA, and p,p´-DDE were associated with lower birth weight: -87 g (95% CI: -137, -340 per 1.70 ng/mL), -43 g (95% CI: -108, 23 per 1.18 ng/mL), and -135 g (95% CI: -192, -78 per 1.82 ng/g lipid), respectively; and MOiNP was associated with higher birth weight (46 g; 95% CI: -5, 97 per 2.22 ng/mL). CONCLUSIONS: This study suggests that several of the environmental contaminants, belonging to three chemical classes, may be independently associated with impaired fetal growth. These results warrant follow-up in other cohorts.
Affiliation(s)
- Virissa Lenters
- Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Address correspondence to V. Lenters, Institute for Risk Assessment Sciences, Utrecht University, Yalelaan 2, 3584CM Utrecht, the Netherlands. Telephone: 31-30-253-9527.
| | - Lützen Portengen
- Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
| | - Anna Rignell-Hydbom
- Division of Occupational and Environmental Medicine, Lund University, Lund, Sweden
| | - Bo A.G. Jönsson
- Division of Occupational and Environmental Medicine, Lund University, Lund, Sweden
| | - Christian H. Lindh
- Division of Occupational and Environmental Medicine, Lund University, Lund, Sweden
| | - Aldert H. Piersma
- Laboratory for Health Protection Research, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
| | - Gunnar Toft
- Danish Ramazzini Center, Department of Occupational Medicine, Aarhus University Hospital, Aarhus, Denmark
| | - Jens Peter Bonde
- Department of Occupational and Environmental Medicine, Copenhagen University Hospital, Bispebjerg, Copenhagen, Denmark
| | - Dick Heederik
- Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
| | - Lars Rylander
- Division of Occupational and Environmental Medicine, Lund University, Lund, Sweden
| | - Roel Vermeulen
- Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
|
223
|
Su X, Wijayasinghe CS, Fan J, Zhang Y. Sparse estimation of Cox proportional hazards models via approximated information criteria. Biometrics 2016; 72:751-9. [PMID: 26873398 DOI: 10.1111/biom.12484] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/01/2015] [Revised: 12/01/2015] [Accepted: 12/01/2015] [Indexed: 01/10/2023]
Abstract
We propose a new sparse estimation method for Cox (1972) proportional hazards models by optimizing an approximated information criterion. The main idea involves approximation of the ℓ0 norm with a continuous or smooth unit dent function. The proposed method bridges the best subset selection and regularization by borrowing strength from both. It mimics the best subset selection using a penalized likelihood approach yet with no need of a tuning parameter. We further reformulate the problem with a reparameterization step so that it reduces to one unconstrained nonconvex yet smooth programming problem, which can be solved efficiently as in computing the maximum partial likelihood estimator (MPLE). Furthermore, the reparameterization tactic yields an additional advantage in terms of circumventing postselection inference. The oracle property of the proposed method is established. Both simulated experiments and empirical examples are provided for assessment and illustration.
Affiliation(s)
- Xiaogang Su
- Department of Mathematical Sciences, University of Texas, El Paso, Texas, U.S.A.
| | | | - Juanjuan Fan
- Department of Mathematics and Statistics, San Diego State University, San Diego, California, U.S.A.
| | - Ying Zhang
- Department of Biostatistics, Indiana University Fairbanks School of Public Health and School of Medicine, Indianapolis, Indiana, U.S.A.; Department of Statistics, Shanghai Jiao Tong University, Shanghai, China
|
224
|
Sonographic markers of ovarian morphology, but not hirsutism indices, predict serum total testosterone in women with regular menstrual cycles. Fertil Steril 2016; 105:1322-1329.e1. [PMID: 26794423 DOI: 10.1016/j.fertnstert.2015.12.136] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/05/2015] [Revised: 12/24/2015] [Accepted: 12/28/2015] [Indexed: 11/23/2022]
Abstract
OBJECTIVE To determine whether sonographic markers of ovarian morphology or male pattern hair growth scores predict androgen levels in women with regular or irregular menstrual cycles. DESIGN Cross-sectional observational study. SETTING Clinical research unit. PATIENT(S) Seventy-six women of reproductive age (18-39 years) were evaluated for male-pattern hair growth (using a modified Ferriman-Gallwey scoring system), ovarian morphology (by transvaginal ultrasonography), and total serum testosterone (T) (by liquid chromatography tandem mass spectrometry). INTERVENTION(S) Not applicable. MAIN OUTCOME MEASURE(S) Regional and total modified Ferriman-Gallwey scores, number of follicles per follicle size category, follicle number per ovary, ovarian volume, ovarian area, stromal to ovarian area ratio, stromal echogenicity index, total testosterone (total T), and menstrual cycle length. RESULT(S) Neither regional nor total modified Ferriman-Gallwey scores correlated with total T concentrations in women with regular or irregular menstrual cycles, as judged by the Least Absolute Shrinkage and Selection Operator technique. By contrast, a sonographic marker (follicle number per ovary 6-9 mm) significantly predicted total T concentrations in women with regular menstrual cycles but not in women with irregular menstrual cycles. CONCLUSION(S) Sonographic markers of ovarian morphology, but not hirsutism scores, predicted total T levels. However, the predictive value of ovarian morphology for total T differed by menstrual cycle status. That sonographic markers did not predict androgen levels in a diverse cohort of women with cycle irregularity suggests the potential for distinct variations in ovarian morphology for androgenic and nonandrogenic types of cycle irregularity. Overall, our findings support that an assessment of ovarian morphology may be helpful in reflecting total T levels.
|
225
|
McKeague IW, Qian M. An adaptive resampling test for detecting the presence of significant predictors. J Am Stat Assoc 2016; 110:1422-1433. [PMID: 27073292 DOI: 10.1080/01621459.2015.1095099] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 10/22/2022]
Abstract
This paper investigates marginal screening for detecting the presence of significant predictors in high-dimensional regression. Screening large numbers of predictors is a challenging problem due to the non-standard limiting behavior of post-model-selected estimators. There is a common misconception that the oracle property for such estimators is a panacea, but the oracle property only holds away from the null hypothesis of interest in marginal screening. To address this difficulty, we propose an adaptive resampling test (ART). Our approach provides an alternative to the popular (yet conservative) Bonferroni method of controlling familywise error rates. ART is adaptive in the sense that thresholding is used to decide whether the centered percentile bootstrap applies, and otherwise adapts to the non-standard asymptotics in the tightest way possible. The performance of the approach is evaluated using a simulation study and applied to gene expression data and HIV drug resistance data.
Affiliation(s)
| | - Min Qian
- Department of Biostatistics, Columbia University.
|
226
|
Guo B, Chen SX. Tests for high dimensional generalized linear models. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12152] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022]
Affiliation(s)
- Bin Guo
- Sichuan University Chengdu People's Republic of China
| | - Song Xi Chen
- Peking University Beijing People's Republic of China
- Iowa State University Ames USA
|
227
|
Mitra R, Zhang CH. The benefit of group sparsity in group inference with de-biased scaled group Lasso. Electron J Stat 2016. [DOI: 10.1214/16-ejs1120] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
|
228
|
Jiang X, Hu X. Microbiome Data Mining for Microbial Interactions and Relationships. Big Data Analytics 2016. [DOI: 10.1007/978-81-322-3628-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
|
229
|
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. Translational Bioinformatics 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
|
230
|
|
231
|
Heiskanen MA, Leskinen T, Eskelinen JJ, Heinonen IHA, Löyttyniemi E, Virtanen K, Pärkkä JP, Hannukainen JC, Kalliokoski KK. Different Predictors of Right and Left Ventricular Metabolism in Healthy Middle-Aged Men. Front Physiol 2015; 6:389. [PMID: 26733882 PMCID: PMC4685066 DOI: 10.3389/fphys.2015.00389] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/22/2015] [Accepted: 11/30/2015] [Indexed: 11/13/2022] Open
Abstract
Dysfunction of the right ventricle (RV) plays a crucial role in the outcome of various cardiovascular diseases. Previous studies on RV metabolism are sparse although evidence implies it may differ from left ventricular (LV) metabolism. Therefore, the aims of this study were (1) to determine predictors of RV glucose uptake (GU) and free fatty acid uptake (FFAU) and (2) to compare them to predictors of LV metabolism in healthy middle-aged men. Altogether 28 healthy, sedentary, middle-aged (40-55 years) men were studied. Insulin-stimulated GU and fasting FFAU were measured by positron emission tomography and RV and LV structural and functional parameters by cardiac magnetic resonance. Several parameters related to whole-body health were also measured. Predictors of RV and LV metabolism were determined by pairwise correlation analysis, lasso regression models, and variable clustering using heatmap. RVGU was most strongly predicted by age and moderately by RV ejection fraction (EF). The strongest determinants of RVFFAU were exercise capacity (peak oxygen uptake), resting heart rate, LVEF, and whole-body insulin-stimulated glucose uptake rate. When considering LV metabolism, age and RVEF were associated also with LVGU. In addition, LVGU was strongly, and negatively, influenced by whole-body insulin-stimulated glucose uptake rate. LVFFAU was predicted only by LVEF. This study shows that while RV and LV metabolism have shared characteristics, they also have unique properties. Age of the subject should be taken into account when measuring myocardial glucose utilization. Ejection fraction is related to myocardial metabolism, and even so that RVEF may be more closely related to GU of both ventricles and LVEF to FFAU of both ventricles, a finding supporting the ventricular interdependence. However, only RV fatty acid utilization associates with exercise capacity so that better physical fitness in a relatively sedentary population is related with decreased RV fat metabolism. 
To conclude, this study highlights the need for further study designed specifically on less-known RV, as the results on LV metabolism and physiology may not be directly applicable to the RV.
Affiliation(s)
| | | | | | - Ilkka H A Heinonen
- Turku PET Centre, University of Turku, Turku, Finland; School of Sport Science, Exercise and Health, University of Western Australia, Crawley, WA, Australia
|
232
|
Jiao B, Liu X, Zhou L, Wang MH, Zhou Y, Xiao T, Zhang W, Sun R, Waye MMY, Tang B, Shen L. Polygenic Analysis of Late-Onset Alzheimer's Disease from Mainland China. PLoS One 2015; 10:e0144898. [PMID: 26680604 PMCID: PMC4683047 DOI: 10.1371/journal.pone.0144898] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 10/19/2015] [Accepted: 11/24/2015] [Indexed: 01/14/2023] Open
Abstract
Recently, a number of single nucleotide polymorphisms (SNPs) were identified to be associated with late-onset Alzheimer disease (LOAD) through genome-wide association study data. Identification of SNP-SNP interaction played an important role in better understanding genetic basis of LOAD. In this study, fifty-eight SNPs were screened in a cohort of 229 LOAD cases and 318 controls from mainland China, and their interaction was evaluated by a series of analysis methods. Seven risk SNPs and six protective SNPs were identified to be associated with LOAD. Risk SNPs included rs9331888 (CLU), rs6691117 (CR1), rs4938933 (MS4A), rs9349407 (CD2AP), rs1160985 (TOMM40), rs4945261 (GAB2) and rs5984894 (PCDH11X); Protective SNPs consisted of rs744373 (BIN1), rs1562990 (MS4A), rs597668 (EXOC3L2), rs9271192 (HLA-DRB5/DRB1), rs157581 and rs11556505 (TOMM40). Among positive SNPs presented above, we found the interaction between rs4938933 (risk) and rs1562990 (protective) in MS4A weakened their each effect for LOAD; for three significant SNPs in TOMM40, their cumulative interaction induced the two protective SNPs effects lost and made the risk SNP effect aggravate for LOAD. Finally, we found rs6656401-rs3865444 (CR1-CD33) pairs were significantly associated with decreasing LOAD risk, while rs28834970-rs6656401 (PTK2B-CR1), and rs28834970-rs6656401 (PTK2B-CD33) were associated with increasing LOAD risk. In a word, our study indicates that SNP-SNP interaction existed in the same gene or cross different genes, which could weaken or aggravate their initial single effects for LOAD.
Affiliation(s)
- Bin Jiao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Xiaoyan Liu
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Lin Zhou
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Maggie Haitian Wang
- Division of Biostatistics, School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, Hong Kong Special Administrative Region
| | - Yafang Zhou
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Tingting Xiao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Weiwei Zhang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Rui Sun
- Division of Biostatistics, School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, Hong Kong Special Administrative Region
| | - Mary Miu Yee Waye
- School of Biomedical Sciences, the Chinese University of Hong Kong, Shatin, Hong Kong Special Administrative Region
| | - Beisha Tang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- State Key Laboratory of Medical Genetics, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Lu Shen
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- State Key Laboratory of Medical Genetics, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
|
233
|
Comparative transcriptomics reveals similarities and differences between astrocytoma grades. BMC Cancer 2015; 15:952. [PMID: 26673168 PMCID: PMC4682229 DOI: 10.1186/s12885-015-1939-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 07/17/2015] [Accepted: 11/01/2015] [Indexed: 11/23/2022] Open
Abstract
Background Astrocytomas are the most common primary brain tumors distinguished into four histological grades. Molecular analyses of individual astrocytoma grades have revealed detailed insights into genetic, transcriptomic and epigenetic alterations. This provides an excellent basis to identify similarities and differences between astrocytoma grades. Methods We utilized public omics data of all four astrocytoma grades focusing on pilocytic astrocytomas (PA I), diffuse astrocytomas (AS II), anaplastic astrocytomas (AS III) and glioblastomas (GBM IV) to identify similarities and differences using well-established bioinformatics and systems biology approaches. We further validated the expression and localization of Ang2 involved in angiogenesis using immunohistochemistry. Results Our analyses show similarities and differences between astrocytoma grades at the level of individual genes, signaling pathways and regulatory networks. We identified many differentially expressed genes that were either exclusively observed in a specific astrocytoma grade or commonly affected in specific subsets of astrocytoma grades in comparison to normal brain. Further, the number of differentially expressed genes generally increased with the astrocytoma grade with one major exception. The cytokine receptor pathway showed nearly the same number of differentially expressed genes in PA I and GBM IV and was further characterized by a significant overlap of commonly altered genes and an exclusive enrichment of overexpressed cancer genes in GBM IV. Additional analyses revealed a strong exclusive overexpression of CX3CL1 (fractalkine) and its receptor CX3CR1 in PA I possibly contributing to the absence of invasive growth. We further found that PA I was significantly associated with the mesenchymal subtype typically observed for very aggressive GBM IV. 
Expression of endothelial and mesenchymal markers (ANGPT2, CHI3L1) indicated a stronger contribution of the micro-environment to the manifestation of the mesenchymal subtype than the tumor biology itself. We further inferred a transcriptional regulatory network associated with specific expression differences distinguishing PA I from AS II, AS III and GBM IV. Major central transcriptional regulators were involved in brain development, cell cycle control, proliferation, apoptosis, chromatin remodeling or DNA methylation. Many of these regulators showed directly underlying DNA methylation changes in PA I or gene copy number mutations in AS II, AS III and GBM IV. Conclusions This computational study characterizes similarities and differences between all four astrocytoma grades confirming known and revealing novel insights into astrocytoma biology. Our findings represent a valuable resource for future computational and experimental studies. Electronic supplementary material The online version of this article (doi:10.1186/s12885-015-1939-9) contains supplementary material, which is available to authorized users.
|
234
|
Yang E, Ravikumar P, Allen GI, Liu Z. Graphical Models via Univariate Exponential Family Distributions. Journal of Machine Learning Research: JMLR 2015; 16:3813-3847. [PMID: 27570498 PMCID: PMC4998206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Undirected graphical models, or Markov networks, are a popular class of statistical models, used in a wide variety of applications. Popular instances of this class include Gaussian graphical models and Ising models. In many settings, however, it might not be clear which subclass of graphical models to use, particularly for non-Gaussian and non-categorical data. In this paper, we consider a general sub-class of graphical models where the node-wise conditional distributions arise from exponential families. This allows us to derive multivariate graphical model distributions from univariate exponential family distributions, such as the Poisson, negative binomial, and exponential distributions. Our key contributions include a class of M-estimators to fit these graphical model distributions; and rigorous statistical analysis showing that these M-estimators recover the true graphical model structure exactly, with high probability. We provide examples of genomic and proteomic networks learned via instances of our class of graphical models derived from Poisson and exponential distributions.
Affiliation(s)
- Eunho Yang
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Pradeep Ravikumar
- Department of Computer Science, University of Texas, Austin, Austin, TX 78712, USA
| | | | - Zhandong Liu
- Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, TX 77030, USA
|
235
|
Nadel J, Athanasiadou R, Lemetre C, Wijetunga NA, Ó Broin P, Sato H, Zhang Z, Jeddeloh J, Montagna C, Golden A, Seoighe C, Greally JM. RNA:DNA hybrids in the human genome have distinctive nucleotide characteristics, chromatin composition, and transcriptional relationships. Epigenetics Chromatin 2015; 8:46. [PMID: 26579211 PMCID: PMC4647656 DOI: 10.1186/s13072-015-0040-6] [Citation(s) in RCA: 115] [Impact Index Per Article: 12.8] [Received: 07/22/2015] [Accepted: 10/29/2015] [Indexed: 01/01/2023] Open
Abstract
Background RNA:DNA hybrids represent a non-canonical nucleic acid structure that has been associated with a range of human diseases and potential transcriptional regulatory functions. Mapping of RNA:DNA hybrids in human cells reveals them to have a number of characteristics that give insights into their functions. Results We find RNA:DNA hybrids to occupy millions of base pairs in the human genome. A directional sequencing approach shows the RNA component of the RNA:DNA hybrid to be purine-rich, indicating a thermodynamic contribution to their in vivo stability. The RNA:DNA hybrids are enriched at loci with decreased DNA methylation and increased DNase hypersensitivity, and within larger domains with characteristics of heterochromatin formation, indicating potential transcriptional regulatory properties. Mass spectrometry studies of chromatin at RNA:DNA hybrids show the presence of the ILF2 and ILF3 transcription factors, supporting a model of certain transcription factors binding preferentially to the RNA:DNA conformation. Conclusions Overall, there is little to indicate a dependence for RNA:DNA hybrids forming co-transcriptionally, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures in trans. The results of the study indicate heterogeneous functions of these genomic elements and new insights into their formation and stability in vivo. Electronic supplementary material The online version of this article (doi:10.1186/s13072-015-0040-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Julie Nadel
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Rodoniki Athanasiadou
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA ; Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Place, New York, NY 10003 USA
| | - Christophe Lemetre
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA ; Integrated Genomics Operation, Memorial Sloan-Kettering Cancer Center, New York, NY 10065 USA
| | - N Ari Wijetunga
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Pilib Ó Broin
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Hanae Sato
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Zhengdong Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | | | - Cristina Montagna
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Aaron Golden
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Cathal Seoighe
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
| | - John M Greally
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA ; Department of Genetics, Center for Epigenomics and Division of Computational Genetics, Albert Einstein College of Medicine, 1301 Morris Park Avenue, Bronx, NY 10461 USA
|
236
|
Abstract
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
|
237
|
Hinne M, Janssen RJ, Heskes T, van Gerven MA. Bayesian Estimation of Conditional Independence Graphs Improves Functional Connectivity Estimates. PLoS Comput Biol 2015; 11:e1004534. [PMID: 26540089 PMCID: PMC4634993 DOI: 10.1371/journal.pcbi.1004534] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/21/2015] [Accepted: 09/03/2015] [Indexed: 01/18/2023] Open
Abstract
Functional connectivity concerns the correlated activity between neuronal populations in spatially segregated regions of the brain, which may be studied using functional magnetic resonance imaging (fMRI). This coupled activity is conveniently expressed using covariance, but this measure fails to distinguish between direct and indirect effects. A popular alternative that addresses this issue is partial correlation, which regresses out the signal of potentially confounding variables, resulting in a measure that reveals only direct connections. Importantly, provided the data are normally distributed, if two variables are conditionally independent given all other variables, their respective partial correlation is zero. In this paper, we propose a probabilistic generative model that allows us to estimate functional connectivity in terms of both partial correlations and a graph representing conditional independencies. Simulation results show that this methodology is able to outperform the graphical LASSO, which is the de facto standard for estimating partial correlations. Furthermore, we apply the model to estimate functional connectivity for twenty subjects using resting-state fMRI data. Results show that our model provides a richer representation of functional connectivity as compared to considering partial correlations alone. Finally, we demonstrate how our approach can be extended in several ways, for instance to achieve data fusion by informing the conditional independence graph with data from probabilistic tractography. As our Bayesian formulation of functional connectivity provides access to the posterior distribution instead of only to point estimates, we are able to quantify the uncertainty associated with our results. This reveals that while we are able to infer a clear backbone of connectivity in our empirical results, the data are not accurately described by simply looking at the mode of the distribution over connectivity. 
The implication of this is that deterministic alternatives may misjudge connectivity results by drawing conclusions from noisy and limited data. Significant neuroscientific effort is devoted to elucidating functional connectivity between spatially segregated brain regions. This requires that we are able to quantify the degree of dependence between the signals of different areas. Yet how this must be accomplished—using which measures, each with their own limitations and interpretations—is far from a trivial task. One frequently advocated metric for functional connectivity is partial correlation, which is related to conditional independence: if two regions are independent, conditioned on all other regions, then their partial correlation is zero, assuming Gaussian data. Here, we use a probabilistic generative model to describe the relationship between functional connectivity and conditional independence. We apply this Bayesian approach to reveal functional connectivity between subcortical areas, and in addition we propose different variants of the generative model for connectivity. In the first, we address how a Bayesian formulation of connectivity allows for integration with other imaging modalities, resulting in data fusion. Secondly, we show how prior constraints can be incorporated in our estimates of connectivity.
Affiliation(s)
- Max Hinne
- Radboud University, Institute for Computing and Information Sciences, Nijmegen, the Netherlands
- Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
| | - Ronald J. Janssen
- Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
| | - Tom Heskes
- Radboud University, Institute for Computing and Information Sciences, Nijmegen, the Netherlands
| | - Marcel A.J. van Gerven
- Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
|
238
|
Dezeure R, Bühlmann P, Meier L, Meinshausen N. High-Dimensional Inference: Confidence Intervals, $p$-Values and R-Software hdi. Stat Sci 2015. [DOI: 10.1214/15-sts527] [Citation(s) in RCA: 104] [Impact Index Per Article: 11.6] [Indexed: 11/19/2022]
|
239
|
|
240
|
|
241
|
Fisher CK, Mehta P. Bayesian Feature Selection with Strongly Regularizing Priors Maps to the Ising Model. Neural Comput 2015; 27:2411-22. [PMID: 26378876 DOI: 10.1162/neco_a_00780] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/04/2022]
Abstract
Identifying small subsets of features that are relevant for prediction and classification tasks is a central problem in machine learning and statistics. The feature selection task is especially important, and computationally difficult, for modern data sets where the number of features can be comparable to or even exceed the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors--priors that have a large effect on the posterior probability even in the infinite data limit. We derive explicit expressions for feature selection for generalized linear models, a large class of statistical techniques that includes linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic regression-based classifier trained to distinguish between the letters B and D in the notMNIST data set.
Affiliation(s)
- Charles K Fisher
  - Department of Physics, Boston University, Boston, MA 02215, U.S.A.
- Pankaj Mehta
  - Department of Physics, Boston University, Boston, MA 02215, U.S.A.
|
242
|
McLennan SR, Engel-Glatter S, Meyer AH, Schwappach DLB, Scheidegger DH, Elger BS. The impact of medical errors on Swiss anaesthesiologists: a cross-sectional survey. Acta Anaesthesiol Scand 2015; 59:990-8. [PMID: 25952281 DOI: 10.1111/aas.12517] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/18/2015] [Accepted: 02/26/2015] [Indexed: 11/28/2022]
Abstract
BACKGROUND: Clinicians involved in medical errors can experience significant distress. This study aims to examine (1) how medical errors impact anaesthesiologists in key work and life domains; (2) anaesthesiologists' attitudes regarding support after errors; and (3) which anaesthesiologists are most affected by errors. METHODS: This study is a mailed cross-sectional survey completed by 281 of the 542 clinically active anaesthesiologists (52% response rate) working at Switzerland's five university hospitals between July 2012 and April 2013. RESULTS: Respondents reported that errors had negatively affected anxiety about future errors (51%), confidence in their ability as a doctor (45%), ability to sleep (36%), job satisfaction (32%), and professional reputation (9%). Respondents' lives were more likely to be affected as error severity increased. Ninety per cent of respondents disagreed that hospitals adequately support them in coping with the stress associated with medical errors. Nearly all of the respondents (92%) reported being interested in psychological counselling after a serious error, but many identified barriers to seeking counselling. However, there were significant differences between departments regarding error-related stress levels and attitudes about error-related support. Respondents were more likely to experience certain distress if they were female, older, had previously been involved in a serious error, and were dissatisfied with their last error disclosure. CONCLUSION: Medical errors, even minor errors and near misses, can have a serious effect on clinicians. Health-care organisations need to do more to support clinicians in coping with the stress associated with medical errors.
Affiliation(s)
- S. R. McLennan
  - Institute for Biomedical Ethics; Universität Basel; Basel Switzerland
  - Centre for Health Policy, School of Population and Global Health; University of Melbourne; Melbourne Victoria Australia
- S. Engel-Glatter
  - Institute for Biomedical Ethics; Universität Basel; Basel Switzerland
- A. H. Meyer
  - Department of Psychology; Division of Clinical Psychology and Epidemiology; Universität Basel; Basel Switzerland
- D. L. B. Schwappach
  - Swiss Patient Safety Foundation; Zurich Switzerland
  - Institute of Social and Preventive Medicine; Universität Bern; Bern Switzerland
- D. H. Scheidegger
  - Prof. emer. Anesthesia and Intensive Care; Universität Basel; Basel Switzerland
- B. S. Elger
  - Institute for Biomedical Ethics; Universität Basel; Basel Switzerland
|
243
|
Cronin RM, VanHouten JP, Siew ED, Eden SK, Fihn SD, Nielson CD, Peterson JF, Baker CR, Ikizler TA, Speroff T, Matheny ME. National Veterans Health Administration inpatient risk stratification models for hospital-acquired acute kidney injury. J Am Med Inform Assoc 2015; 22:1054-71. [PMID: 26104740 PMCID: PMC5009929 DOI: 10.1093/jamia/ocv051] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 11/02/2014] [Revised: 03/12/2015] [Accepted: 04/20/2015] [Indexed: 02/04/2023] Open
Abstract
OBJECTIVE: Hospital-acquired acute kidney injury (HA-AKI) is a potentially preventable cause of morbidity and mortality. Identifying high-risk patients prior to the onset of kidney injury is a key step towards AKI prevention. MATERIALS AND METHODS: A national retrospective cohort of 1,620,898 patient hospitalizations from 116 Veterans Affairs hospitals was assembled from electronic health record (EHR) data collected from 2003 to 2012. HA-AKI was defined at stage 1+, stage 2+, and dialysis. EHR-based predictors were identified through logistic regression, least absolute shrinkage and selection operator (lasso) regression, and random forests, and pair-wise comparisons between each were made. Calibration and discrimination metrics were calculated using 50 bootstrap iterations. In the final models, we report odds ratios, 95% confidence intervals, and importance rankings for predictor variables to evaluate their significance. RESULTS: The area under the receiver operating characteristic curve (AUC) for the different model outcomes ranged from 0.746 to 0.758 in stage 1+, 0.714 to 0.720 in stage 2+, and 0.823 to 0.825 in dialysis. Logistic regression had the best AUC in stage 1+ and dialysis. Random forests had the best AUC in stage 2+ but the least favorable calibration plots. Multiple risk factors were significant in our models, including some nonsteroidal anti-inflammatory drugs, blood pressure medications, antibiotics, and intravenous fluids given during the first 48 h of admission. CONCLUSIONS: This study demonstrated that, although all the models tested had good discrimination, performance characteristics varied between methods, and the random forests models did not calibrate as well as the lasso or logistic regression models. In addition, novel modifiable risk factors were explored and found to be significant.
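The abstract above reports AUC values computed over 50 bootstrap iterations. As a rough illustration of what such a calculation can look like, the following is a minimal sketch, not the authors' pipeline: the function names, the toy labels and scores, and the choice to skip resamples containing only one class are all assumptions made for this example.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    Labels are assumed to be 0 (negative) or 1 (positive).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((ps > ns) + 0.5 * (ps == ns) for ps in pos for ns in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=50, seed=0):
    """Return n_boot AUC estimates from resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        # AUC is undefined unless the resample contains both classes.
        if 0 < sum(boot_labels) < n:
            estimates.append(auc(boot_labels, [scores[i] for i in idx]))
    return estimates


# Hypothetical toy data, for illustration only.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
estimates = bootstrap_auc(labels, scores, n_boot=50, seed=0)
mean_auc = sum(estimates) / len(estimates)
```

The spread of `estimates` gives a simple picture of how stable the discrimination estimate is; calibration would need a separate summary.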
Affiliation(s)
- Robert M Cronin
  - Geriatric Research Education Clinical Center, Tennessee Valley Health System, Veterans Health Administration, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Division of General Internal Medicine and Public Health, Vanderbilt University School of Medicine, Nashville, TN, USA
- Jacob P VanHouten
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Edward D Siew
  - Division of Nephrology, Vanderbilt University School of Medicine, Nashville, TN, USA
- Svetlana K Eden
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Stephan D Fihn
  - Office of Analytics and Business Intelligence, VA Central Office, Veterans Health Administration, Seattle, WA, USA
  - Division of General Internal Medicine, University of Washington, Seattle, WA, USA
- Christopher D Nielson
  - Office of Analytics and Business Intelligence, VA Central Office, Veterans Health Administration, Seattle, WA, USA
  - Division of Pulmonary Medicine and Critical Care, University of Nevada, Reno, NV, USA
- Josh F Peterson
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Clifton R Baker
  - Office of Analytics and Business Intelligence, VA Central Office, Veterans Health Administration, Seattle, WA, USA
- T Alp Ikizler
  - Division of Nephrology, Vanderbilt University School of Medicine, Nashville, TN, USA
- Theodore Speroff
  - Geriatric Research Education Clinical Center, Tennessee Valley Health System, Veterans Health Administration, Nashville, TN, USA
  - Division of General Internal Medicine and Public Health, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Michael E Matheny
  - Geriatric Research Education Clinical Center, Tennessee Valley Health System, Veterans Health Administration, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Division of General Internal Medicine and Public Health, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
|
244
|
Bogdan M, van den Berg E, Sabatti C, Su W, Candès EJ. SLOPE - Adaptive Variable Selection via Convex Optimization. Ann Appl Stat 2015; 9:1103-1140. [PMID: 26709357 DOI: 10.1214/15-aoas842] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Indexed: 11/19/2022]
Abstract
We introduce a new estimator for the vector of coefficients β in the linear model y = Xβ + z, where X has dimensions n × p with p possibly larger than n. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to $\min_{b \in \mathbb{R}^p} \tfrac{1}{2}\|y - Xb\|_{\ell_2}^2 + \lambda_1 |b|_{(1)} + \lambda_2 |b|_{(2)} + \dots + \lambda_p |b|_{(p)}$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$ and $|b|_{(1)} \ge |b|_{(2)} \ge \dots \ge |b|_{(p)}$ are the decreasing absolute values of the entries of b. This is a convex program and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical ℓ1 procedures such as the Lasso. Here, the regularizer is a sorted ℓ1 norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] procedure (BH), which compares more significant p-values with more stringent thresholds. One notable choice of the sequence $\{\lambda_i\}$ is given by the BH critical values $\lambda_{\mathrm{BH}}(i) = z(1 - i \cdot q/(2p))$, where q ∈ (0, 1) and z(α) is the α quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with $\lambda_{\mathrm{BH}}$ provably controls FDR at level q. Moreover, it also appears to have appreciable inferential properties under more general designs X while having substantial power, as demonstrated in a series of experiments running on both simulated and real data.
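To make the sorted ℓ1 penalty described above concrete, here is a minimal sketch that evaluates it. It assumes the usual statement of SLOPE, with objective ½‖y − Xb‖² + Σ λ_i |b|_(i) and the BH-style sequence λ_i = z(1 − i·q/(2p)). The function names are illustrative, and the code only evaluates the objective; it is not a solver.

```python
from statistics import NormalDist


def bh_lambdas(p, q):
    """BH-style critical values lambda_i = z(1 - i*q/(2p)) for i = 1..p.

    z(alpha) is the alpha quantile of the standard normal distribution.
    """
    norm = NormalDist()
    return [norm.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(b, lambdas):
    """Sum of lambda_i * |b|_(i), with |b|_(1) >= ... >= |b|_(p)."""
    abs_sorted = sorted((abs(x) for x in b), reverse=True)
    return sum(lam * a for lam, a in zip(lambdas, abs_sorted))


def slope_objective(y, X, b, lambdas):
    """0.5 * ||y - Xb||^2 plus the sorted L1 penalty (evaluation only)."""
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, b)) for yi, row in zip(y, X)
    ]
    return 0.5 * sum(r * r for r in residuals) + sorted_l1_norm(b, lambdas)


# The largest coefficient in absolute value is paired with the largest lambda.
lambdas = bh_lambdas(p=5, q=0.1)
penalty = sorted_l1_norm([0.0, -3.0, 1.0, 0.0, 2.0], lambdas)
```

Because the λ sequence is decreasing, the penalty depends on the ranks of the coefficients rather than on their positions, which is the link to the BH procedure drawn in the abstract.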
Affiliation(s)
- Małgorzata Bogdan
  - Department of Mathematics, Wrocław University of Technology, 50-370 Wrocław, Poland
- Ewout van den Berg
  - Human Language Technologies, IBM T.J. Watson Research Center, Yorktown Heights, New York 10598, USA
- Chiara Sabatti
  - Department of Health Research and Policy, Division of Biostatistics, Stanford University, HRP Redwood Building, Stanford, California 94305, USA
- Weijie Su
  - Department of Statistics, Stanford University, 390 Serra Mall, Sequoia Hall, Stanford, California 94305, USA
- Emmanuel J Candès
  - Department of Statistics, Stanford University, 390 Serra Mall, Sequoia Hall, Stanford, California 94305, USA
|
245
|
Sabourin JA, Valdar W, Nobel AB. A permutation approach for selecting the penalty parameter in penalized model selection. Biometrics 2015; 71:1185-94. [PMID: 26243050 DOI: 10.1111/biom.12359] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 08/01/2014] [Revised: 04/01/2015] [Accepted: 05/01/2015] [Indexed: 11/27/2022]
Abstract
We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO-penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), scaled sparse linear regression, and a selection method based on recently developed testing procedures for the LASSO.
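One way to picture a permutation-based choice of the LASSO penalty is sketched below. The abstract does not spell out the recipe, so the details here are assumptions of this example rather than the authors' exact procedure: the response is permuted to break any association with the predictors, the smallest penalty that yields an empty model is computed for each permutation, and the median of those values is taken. The objective scaling (1/(2n))‖y − ȳ − Xb‖² + λ‖b‖₁ and all function names are likewise illustrative.

```python
import random
from statistics import median


def lambda_null(X, y):
    """Smallest lasso penalty that selects no variables.

    For the objective (1/(2n)) * ||y - mean(y) - Xb||^2 + lam * ||b||_1,
    this is max_j |x_j^T (y - mean(y))| / n.
    """
    n = len(y)
    y_bar = sum(y) / n
    centered = [yi - y_bar for yi in y]
    p = len(X[0])
    return max(
        abs(sum(X[i][j] * centered[i] for i in range(n))) / n for j in range(p)
    )


def permutation_lambda(X, y, n_perm=100, seed=0):
    """Median over permutations of the empty-model penalty (assumed summary)."""
    rng = random.Random(seed)
    y_perm = list(y)
    values = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between X and y
        values.append(lambda_null(X, y_perm))
    return median(values)


# Hypothetical simulated data: only the first predictor carries signal.
rng = random.Random(1)
n, p = 50, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
lam_perm = permutation_lambda(X, y, n_perm=200, seed=0)
```

The resulting `lam_perm` is a penalty level at which pure-noise predictors would typically just fail to enter the model; a real signal should need a larger penalty than this to be excluded.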
Affiliation(s)
- Jeremy A Sabourin
  - Department of Genetics, University of North Carolina at Chapel Hill, North Carolina, U.S.A.
  - Genometrics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, Maryland, U.S.A.
- William Valdar
  - Department of Genetics, University of North Carolina at Chapel Hill, North Carolina, U.S.A.
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, North Carolina, U.S.A.
- Andrew B Nobel
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, North Carolina, U.S.A.
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, North Carolina, U.S.A.
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, U.S.A.
|
246
|
G'Sell MG, Wager S, Chouldechova A, Tibshirani R. Sequential selection procedures and false discovery rate control. J R Stat Soc Series B Stat Methodol 2015. [DOI: 10.1111/rssb.12122] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Indexed: 11/27/2022]
|
247
|
Teipel SJ, Kurth J, Krause B, Grothe MJ. The relative importance of imaging markers for the prediction of Alzheimer's disease dementia in mild cognitive impairment - Beyond classical regression. Neuroimage Clin 2015. [PMID: 26199870 PMCID: PMC4506984 DOI: 10.1016/j.nicl.2015.05.006] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 12/16/2022]
Abstract
Selecting a set of relevant markers to predict conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD) has become a challenging task given the wealth of regional pathologic information that can be extracted from multimodal imaging data. Here, we used regularized regression approaches with an elastic net penalty for best subset selection of multiregional information from AV45-PET, FDG-PET and volumetric MRI data to predict conversion from MCI to AD. The study sample consisted of 127 MCI subjects from ADNI-2 who had a clinical follow-up between 6 and 31 months. Additional analyses assessed the effect of partial volume correction on predictive performance of AV45- and FDG-PET data. Predictor variables were highly collinear within and across imaging modalities. Penalized Cox regression yielded more parsimonious prediction models compared to unpenalized Cox regression. Within single modalities, time to conversion was best predicted by increased AV45-PET signal in posterior medial and lateral cortical regions, decreased FDG-PET signal in medial temporal and temporobasal regions, and reduced gray matter volume in medial, basal, and lateral temporal regions. Logistic regression models reached up to 72% cross-validated accuracy for prediction of conversion status, which was comparable to cross-validated accuracy of non-linear support vector machine classification. Regularized regression outperformed unpenalized stepwise regression when number of parameters approached or exceeded the number of training cases. Partial volume correction had a negative effect on the predictive performance of AV45-PET, but slightly improved the predictive value of FDG-PET data. Penalized regression yielded more parsimonious models than unpenalized stepwise regression for the integration of multiregional and multimodal imaging information. The advantage of penalized regression was particularly strong with a high number of collinear predictors. 
- Use of regularized Cox and logistic regression for dementia prediction
- Regularized regression deals with a high number of highly collinear predictors.
- Regularized regression yields a parsimonious and plausible prediction model.
- Prediction accuracy of regularized regression is superior to machine learning.
- Partial volume correction of PET data modulates prediction accuracy.
Affiliation(s)
- Stefan J Teipel
  - German Center for Neurodegenerative Diseases (DZNE), Rostock, Germany
  - Department of Psychosomatic Medicine, University Medicine Rostock, Rostock, Germany
- Jens Kurth
  - Department of Nuclear Medicine, University Medicine Rostock, Rostock, Germany
- Bernd Krause
  - Department of Nuclear Medicine, University Medicine Rostock, Rostock, Germany
- Michel J Grothe
  - German Center for Neurodegenerative Diseases (DZNE), Rostock, Germany
|
248
|
Cross-validation and hypothesis testing in neuroimaging: An irenic comment on the exchange between Friston and Lindquist et al. Neuroimage 2015; 116:248-54. [PMID: 25918034 DOI: 10.1016/j.neuroimage.2015.04.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/24/2014] [Revised: 03/26/2015] [Accepted: 04/16/2015] [Indexed: 12/28/2022] Open
Abstract
The "ten ironic rules for statistical reviewers" presented by Friston (2012) prompted a rebuttal by Lindquist et al. (2013), which was followed by a rejoinder by Friston (2013). A key issue left unresolved in this discussion is the use of cross-validation to test the significance of predictive analyses. This note discusses the role that cross-validation-based and related hypothesis tests have come to play in modern data analyses, in neuroimaging and other fields. It is shown that such tests need not be suboptimal and can fill otherwise-unmet inferential needs.
|
249
|
Wu Y, Cook RJ. Penalized regression for interval-censored times of disease progression: Selection of HLA markers in psoriatic arthritis. Biometrics 2015; 71:782-91. [DOI: 10.1111/biom.12302] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/01/2013] [Revised: 12/01/2014] [Accepted: 02/01/2015] [Indexed: 10/23/2022]
Affiliation(s)
- Ying Wu
  - Department of Statistics and Actuarial Science; University of Waterloo; Waterloo, Ontario, Canada N2L 3G1
- Richard J. Cook
  - Department of Statistics and Actuarial Science; University of Waterloo; Waterloo, Ontario, Canada N2L 3G1
|
250
|
|