1
|
Miao G, Yu L, Yang J, Bennett DA, Zhao J, Wu SS. Learning from vertically distributed data across multiple sites: An efficient privacy-preserving algorithm for Cox proportional hazards model with variable selection. J Biomed Inform 2024; 149:104581. [PMID: 38142903 PMCID: PMC10996392 DOI: 10.1016/j.jbi.2023.104581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2023] [Revised: 08/24/2023] [Accepted: 12/19/2023] [Indexed: 12/26/2023]
Abstract
OBJECTIVE To develop a lossless distributed algorithm for regularized Cox proportional hazards model with variable selection to support federated learning for vertically distributed data. METHODS We propose a novel distributed algorithm for fitting regularized Cox proportional hazards model when data sharing among different data providers is restricted. Based on cyclical coordinate descent, the proposed algorithm computes intermediary statistics by each site and then exchanges them to update the model parameters in other sites without accessing individual patient-level data. We evaluate the performance of the proposed algorithm with (1) a simulation study and (2) a real-world data analysis predicting the risk of Alzheimer's dementia from the Religious Orders Study and Rush Memory and Aging Project (ROSMAP). Moreover, we compared the performance of our method with existing privacy-preserving models. RESULTS Our algorithm achieves privacy-preserving variable selection for time-to-event data in the vertically distributed setting, without degradation of accuracy compared with a centralized approach. Simulation demonstrates that our algorithm is highly efficient in analyzing high-dimensional datasets. Real-world data analysis reveals that our distributed Cox model yields higher accuracy in predicting the risk of Alzheimer's dementia than the conventional Cox model built by each data provider without data sharing. Moreover, our algorithm is computationally more efficient compared with existing privacy-preserving Cox models with or without regularization term. CONCLUSION The proposed algorithm is lossless, privacy-preserving and highly efficient to fit regularized Cox model for vertically distributed data. It provides a suitable and convenient approach for modeling time-to-event data in a distributed manner.
Collapse
Affiliation(s)
- Guanhong Miao
- Department of Epidemiology, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA; Center for Genetic Epidemiology and Bioinformatics, University of Florida, Gainesville, FL, USA; Department of Biostatistics, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
| | - Lei Yu
- Rush Alzheimer's Disease Center & Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
| | - Jingyun Yang
- Rush Alzheimer's Disease Center & Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center & Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
| | - Jinying Zhao
- Department of Epidemiology, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA; Center for Genetic Epidemiology and Bioinformatics, University of Florida, Gainesville, FL, USA
| | - Samuel S Wu
- Department of Biostatistics, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA.
| |
Collapse
|
2
|
Bon JJ, Bretherton A, Buchhorn K, Cramb S, Drovandi C, Hassan C, Jenner AL, Mayfield HJ, McGree JM, Mengersen K, Price A, Salomone R, Santos-Fernandez E, Vercelloni J, Wang X. Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2023; 381:20220156. [PMID: 36970822 PMCID: PMC10041356 DOI: 10.1098/rsta.2022.0156] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 06/18/2023]
Abstract
Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Collapse
Affiliation(s)
- Joshua J. Bon
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Adam Bretherton
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Katie Buchhorn
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Susanna Cramb
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Christopher Drovandi
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Conor Hassan
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Adrianne L. Jenner
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Helen J. Mayfield
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Public Health, The University of Queensland, Saint Lucia, Queensland, Australia
| | - James M. McGree
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Kerrie Mengersen
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Aiden Price
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Robert Salomone
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Computer Science, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Edgar Santos-Fernandez
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Julie Vercelloni
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Xiaoyu Wang
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| |
Collapse
|
3
|
Hof JP, Vermeulen SH, Coolen ACC, Galesloot TE. Fast and accurate recurrent event analysis for genome-wide association studies. Genet Epidemiol 2023. [PMID: 37060326 DOI: 10.1002/gepi.22525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Revised: 03/05/2023] [Accepted: 03/28/2023] [Indexed: 04/16/2023]
Abstract
Many diseases recur after recovery, for example, recurrences in cancer and infections. However, research is often focused on analysing only time-to-first recurrence, thereby ignoring any subsequent recurrences that may occur after the first. Statistical models for the analysis of recurrent events are available, of which the extended Cox proportional hazards frailty model is the current state-of-the-art. However, this model is too statistically complex for computationally efficient application in high-dimensional data sets, including genome-wide association studies (GWAS). Here, we develop an application for fast and accurate recurrent event analysis in GWAS, called SPARE (SaddlePoint Approximation for Recurrent Event analysis). In SPARE, every DNA variant is tested for association with recurrence risk using a modified score statistic. A saddlepoint approximation is implemented to achieve statistical accuracy. SPARE controls the Type I error, and its statistical power is similar to existing recurrent event models, yet SPARE is significantly faster. An application of SPARE in a recurrent event GWAS on bladder cancer for 6.2 million DNA variants in 1,443 individuals required less than 15 min, whereas existing recurrent event methods would require several weeks.
Collapse
Affiliation(s)
- Jasper P Hof
- Department for Health Evidence, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Sita H Vermeulen
- Department for Health Evidence, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Anthony C C Coolen
- Department of Biophysics, Donders Institute, Radboud University, Nijmegen, The Netherlands
| | - Tessel E Galesloot
- Department for Health Evidence, Radboud University Medical Center, Nijmegen, The Netherlands
| |
Collapse
|