1
Maity S, Dutta D, Terhorst J, Sun Y, Banerjee M. A linear adjustment-based approach to posterior drift in transfer learning. Biometrika 2024; 111:31-50. [PMID: 38948430 PMCID: PMC11212525 DOI: 10.1093/biomet/asad029] [Received: 05/02/2022] [Indexed: 07/02/2024]
Abstract
We present new models and methods for the posterior drift problem where the regression function in the target domain is modelled as a linear adjustment, on an appropriate scale, of that in the source domain, and study the theoretical properties of our proposed estimators in the binary classification problem. The core idea of our model inherits the simplicity and the usefulness of generalized linear models and accelerated failure time models from the classical statistics literature. Our approach is shown to be flexible and applicable in a variety of statistical settings, and can be adopted for transfer learning problems in various domains including epidemiology, genetics and biomedicine. As concrete applications, we illustrate the power of our approach (i) through mortality prediction for British Asians by borrowing strength from similar data from the larger pool of British Caucasians, using the UK Biobank data, and (ii) in overcoming a spurious correlation present in the source domain of the Waterbirds dataset.
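One way to read the "linear adjustment, on an appropriate scale" in the binary classification case is sketched below. The logit scale, the symbols $\eta_P$ and $\eta_Q$ for the source and target regression functions, and the adjustment parameters $(\alpha, \beta)$ are assumptions made for illustration; the abstract does not name the scale or the notation.

```latex
\operatorname{logit}\,\eta_Q(x) \;=\; \operatorname{logit}\,\eta_P(x) + \alpha + \beta^\top x,
\qquad
\eta_D(x) = \Pr\nolimits_D(Y = 1 \mid X = x), \quad D \in \{P, Q\}.
```

Under this reading, only the low-dimensional pair $(\alpha, \beta)$ would need to be learned from the smaller target sample, while $\eta_P$ is estimated from the larger source sample.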
Affiliation(s)
- Subha Maity
  - Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, U.S.A.
- Diptavo Dutta
  - Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology & Genetics, National Cancer Institute, 9609 Medical Center Drive, Bethesda, Maryland 20892, U.S.A.
- Moulinath Banerjee
  - Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, U.S.A.
2
Bon JJ, Bretherton A, Buchhorn K, Cramb S, Drovandi C, Hassan C, Jenner AL, Mayfield HJ, McGree JM, Mengersen K, Price A, Salomone R, Santos-Fernandez E, Vercelloni J, Wang X. Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2023; 381:20220156. [PMID: 36970822 PMCID: PMC10041356 DOI: 10.1098/rsta.2022.0156] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 06/18/2023]
Abstract
Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Joshua J. Bon
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Adam Bretherton
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Katie Buchhorn
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Susanna Cramb
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
- Christopher Drovandi
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Conor Hassan
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Adrianne L. Jenner
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Helen J. Mayfield
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Public Health, The University of Queensland, Saint Lucia, Queensland, Australia
- James M. McGree
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Kerrie Mengersen
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Aiden Price
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Robert Salomone
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Computer Science, Queensland University of Technology, Brisbane, Queensland, Australia
- Edgar Santos-Fernandez
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Julie Vercelloni
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Xiaoyu Wang
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
5
Tian Y, Feng Y. Transfer Learning under High-dimensional Generalized Linear Models. J Am Stat Assoc 2022; 118:2684-2697. [PMID: 38562655 PMCID: PMC10982637 DOI: 10.1080/01621459.2022.2071278] [Received: 05/29/2021] [Accepted: 04/20/2022] [Indexed: 10/18/2022]
Abstract
In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm on GLM, and derive its ℓ1/ℓ2-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and source are sufficiently close to each other, these bounds could be improved over those of the classical penalized estimator using only target data under mild conditions. When we don't know which sources to transfer, an algorithm-free transferable source detection approach is introduced to detect informative sources. The detection consistency is proved under the high-dimensional GLM transfer learning setting. We also propose an algorithm to construct confidence intervals of each coefficient component, and the corresponding theories are provided. Extensive simulations and a real-data experiment verify the effectiveness of our algorithms. We implement the proposed GLM transfer learning algorithms in a new R package glmtrans, which is available on CRAN.
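To make the idea of "borrowing information from useful source data" concrete, the sketch below shows a generic pool-then-correct scheme for an ℓ1-penalised logistic GLM: fit on pooled source and target rows, then fit a sparse correction on the target rows alone. The two-step structure, every function name, the penalty levels and the simulated data are assumptions chosen for illustration; none of them is taken from the abstract or from the glmtrans package.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    # Proximal operator of t * |v| (the lasso shrinkage step).
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic(X, y, lam, offset=None, steps=300, lr=0.5):
    """Proximal-gradient fit of an L1-penalised logistic regression.

    `offset` is a fixed per-row term added to the linear predictor.
    """
    n, p = len(X), len(X[0])
    offset = offset or [0.0] * n
    coef = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi, oi in zip(X, y, offset):
            resid = sigmoid(oi + sum(c * x for c, x in zip(coef, xi))) - yi
            for j in range(p):
                grad[j] += resid * xi[j] / n
        coef = [soft_threshold(c - lr * g, lr * lam) for c, g in zip(coef, grad)]
    return coef


def transfer_fit(X_src, y_src, X_tgt, y_tgt, lam_pool=0.01, lam_corr=0.05):
    # Step 1 (assumed): pool source and target rows for a rough shared estimate.
    w = l1_logistic(X_src + X_tgt, y_src + y_tgt, lam_pool)
    # Step 2 (assumed): on target rows only, fit a sparse correction with w fixed.
    offset = [sum(c * x for c, x in zip(w, xi)) for xi in X_tgt]
    delta = l1_logistic(X_tgt, y_tgt, lam_corr, offset=offset)
    return [wj + dj for wj, dj in zip(w, delta)]


def simulate(n, beta, rng):
    # Draw Gaussian covariates and Bernoulli responses from a logistic model.
    X = [[rng.gauss(0, 1) for _ in beta] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(beta, xi))) else 0
         for xi in X]
    return X, y


rng = random.Random(0)
beta_src = [1.0, -1.0, 0.0, 0.0, 0.0]
beta_tgt = [1.0, -1.0, 0.5, 0.0, 0.0]  # differs from the source in one coordinate
X_src, y_src = simulate(400, beta_src, rng)
X_tgt, y_tgt = simulate(60, beta_tgt, rng)
beta_hat = transfer_fit(X_src, y_src, X_tgt, y_tgt)
print([round(b, 2) for b in beta_hat])
```

The correction step is penalised more heavily than the pooled step so that the target estimate departs from the pooled one only where the small target sample gives clear evidence of a difference; that choice is also part of the illustration rather than a recommendation from the paper.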
Affiliation(s)
- Ye Tian
  - Department of Statistics, Columbia University
- Yang Feng
  - Department of Biostatistics, School of Global Public Health, New York University