Wang X, Zhang HG, Xiong X, Hong C, Weber GM, Brat GA, Bonzel CL, Luo Y, Duan R, Palmer NP, Hutch MR, Gutiérrez-Sacristán A, Bellazzi R, Chiovato L, Cho K, Dagliati A, Estiri H, García-Barrio N, Griffier R, Hanauer DA, Ho YL, Holmes JH, Keller MS, Klann MEng JG, L'Yi S, Lozano-Zahonero S, Maidlow SE, Makoudjou A, Malovini A, Moal B, Moore JH, Morris M, Mowery DL, Murphy SN, Neuraz A, Yuan Ngiam K, Omenn GS, Patel LP, Pedrera-Jiménez M, Prunotto A, Jebathilagam Samayamuthu M, Sanz Vidorreta FJ, Schriver ER, Schubert P, Serrano-Balazote P, South AM, Tan ALM, Tan BWL, Tibollo V, Tippmann P, Visweswaran S, Xia Z, Yuan W, Zöller D, Kohane IS, Avillach P, Guo Z, Cai T. SurvMaximin: Robust federated approach to transporting survival risk prediction models. J Biomed Inform 2022; 134:104176. [PMID: 36007785 PMCID: PMC9707637 DOI: 10.1016/j.jbi.2022.104176] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Revised: 07/18/2022] [Accepted: 08/15/2022] [Indexed: 10/15/2022]
Abstract
OBJECTIVE: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information.

MATERIALS AND METHODS: For each center from which we want to borrow information to improve prediction performance for the target population, a penalized Cox model is fitted to estimate that center's feature coefficients. Using the estimated feature coefficients and the covariance matrix of the target population, we then obtain the SurvMaximin estimate of the feature coefficients for the target population. The target population can be an entire cohort comprising all centers, corresponding to federated learning, or a single center, corresponding to transfer learning.

RESULTS: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy than the estimator that uses only the information of the target site and than other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and in estimated feature coefficients between centers, which yields significantly improved estimates for target sites with fewer observations.

CONCLUSIONS: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin requires only a one-time exchange of summary information from participating centers, and the estimated regression vectors can be very heterogeneous across centers. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.
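The aggregation step described under MATERIALS AND METHODS can be illustrated with a minimal sketch. It is not taken from the paper. It assumes one common form of maximin aggregation: given per-center coefficient vectors (the columns of `B`) and a target covariance matrix `Sigma`, choose simplex weights `w` that minimize `w' B' Sigma B w` plus an optional ridge term, and return `B @ w`. The function names, the `ridge` parameter, and the projected-gradient solver are all assumptions made for illustration; the per-center penalized Cox fits are assumed to have been done beforehand, and the paper's exact objective may differ.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def maximin_aggregate(B, Sigma, ridge=0.0, n_iter=5000):
    """Hypothetical helper: aggregate per-center coefficients.

    B     : (p, L) array, column l holds center l's estimated coefficients
    Sigma : (p, p) covariance matrix of the target population's features
    ridge : assumed optional stabilizing penalty on the weights
    """
    L = B.shape[1]
    Gamma = B.T @ Sigma @ B + ridge * np.eye(L)
    # Gradient of w' Gamma w is 2 Gamma w, so 1 / (2 * lambda_max) is a safe step.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Gamma)[-1] + 1e-12)
    w = np.full(L, 1.0 / L)                   # start from uniform weights
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2.0 * (Gamma @ w))
    return B @ w, w
```

Only `B` and `Sigma` enter the computation, so under these assumptions each center would share a coefficient vector once and no patient-level data or target outcomes are needed.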
Affiliation(s)
- Xuan Wang
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Harrison G Zhang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Xin Xiong
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Chuan Hong
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Griffin M Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gabriel A Brat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Rui Duan
  - Department of Biostatistics, Harvard University, Boston, MA, USA
- Nathan P Palmer
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Meghan R Hutch
  - Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Luca Chiovato
  - Unit of Internal Medicine and Endocrinology, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
- Kelly Cho
  - Population Health and Data Science, VA Boston Healthcare System, Boston, MA, USA; Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Arianna Dagliati
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Hossein Estiri
  - Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Romain Griffier
  - IAM unit, Bordeaux University Hospital, Bordeaux, France; INSERM Bordeaux Population Health ERIAS TEAM, ERIAS - Inserm U1219 BPH, Bordeaux, France
- David A Hanauer
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, USA
- Yuk-Lam Ho
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- John H Holmes
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Mark S Keller
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sehi L'Yi
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sara Lozano-Zahonero
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Sarah E Maidlow
  - Michigan Institute for Clinical and Health Research (MICHR) Informatics, University of Michigan, Ann Arbor, MI, USA
- Adeline Makoudjou
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Alberto Malovini
  - Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
- Bertrand Moal
  - IAM unit, Bordeaux University Hospital, Bordeaux, France
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Danielle L Mowery
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Shawn N Murphy
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Antoine Neuraz
  - Department of Biomedical Informatics, Hôpital Necker-Enfants Malades, Assistance Publique Hôpitaux de Paris (APHP), University of Paris, Paris, France
- Kee Yuan Ngiam
  - Department of Biomedical Informatics, WiSDM, National University Health Systems, Singapore
- Gilbert S Omenn
  - Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and Public Health, University of Michigan, Ann Arbor, MI, USA
- Lav P Patel
  - Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center
- Andrea Prunotto
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Emily R Schriver
  - Data Analytics Center, University of Pennsylvania Health System, Philadelphia, PA, USA
- Petra Schubert
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Andrew M South
  - Department of Pediatrics-Section of Nephrology, Brenner Children's, Wake Forest School of Medicine, Winston Salem, NC, USA
- Amelia L M Tan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Byorn W L Tan
  - Department of Medicine, National University Hospital, Singapore
- Valentina Tibollo
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Patric Tippmann
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Zongqi Xia
  - Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
- William Yuan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Daniela Zöller
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Paul Avillach
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Zijian Guo
  - Department of Statistics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Tianxi Cai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA