3
van Nee MM, Wessels LFA, van de Wiel MA. Flexible co-data learning for high-dimensional prediction. Stat Med 2021; 40:5910-5925. [PMID: 34438466 PMCID: PMC9292202 DOI: 10.1002/sim.9162] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/09/2020] [Revised: 05/18/2021] [Accepted: 07/29/2021] [Indexed: 02/06/2023]
Abstract
Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data is high-dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve predictions. Such complementary data, or co-data, provide information on the covariates, such as genomic location or P-values from external studies. We use multiple and various co-data to define possibly overlapping or hierarchically structured groups of covariates. These are then used to estimate adaptive multi-group ridge penalties for generalized linear and Cox models. Available group adaptive methods primarily target settings with few groups, and are therefore likely to overfit when groups are non-informative, correlated or numerous; they also do not account for known structure at the group level. To handle these issues, our method combines empirical Bayes estimation of the hyperparameters with an extra level of flexible shrinkage. This renders a uniquely flexible framework as any type of shrinkage can be used on the group level. We describe various types of co-data and propose suitable forms of hypershrinkage. The method is very versatile, as it allows for integration and weighting of multiple co-data sets, inclusion of unpenalized covariates and posterior variable selection. For three cancer genomics applications we demonstrate improvements compared to other models in terms of performance, variable selection stability and validation.
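To make "multi-group ridge penalties" concrete, here is a minimal Python sketch, assuming a linear model and penalties that are already known: every covariate in co-data group g shares one penalty λ_g, which enters the ridge solution (X^T X + Λ)^{-1} X^T y on the diagonal of Λ. The empirical Bayes estimation of the λ_g and the hypershrinkage described in the abstract are not shown, and this is not the authors' implementation; `group_ridge`, the group labels and the toy data are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def group_ridge(X: Sequence[Sequence[float]], y: Sequence[float],
                groups: Sequence[int], lambdas: Sequence[float]) -> List[float]:
    """Ridge estimate (X^T X + diag(lambda_{g(j)}))^{-1} X^T y.

    groups[j] is the (hypothetical) co-data group of covariate j and
    lambdas[g] is the penalty shared by every covariate in group g.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for j in range(p):
        xtx[j][j] += lambdas[groups[j]]  # group-specific penalty on the diagonal
    return solve(xtx, xty)


# Toy data generated without noise from beta = (2, -1, 3).
X = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y = [5, -1, 1, 3, 2]
groups = [0, 0, 1]  # covariates 1-2 in group 0, covariate 3 in group 1

print(group_ridge(X, y, groups, [0.0, 0.0]))  # unpenalized: approx. [2, -1, 3]
print(group_ridge(X, y, groups, [0.1, 1e6]))  # group 1 coefficient shrunk to ~0
```

Setting a very large penalty for one group drives its coefficients towards zero while leaving the other group nearly untouched; a group-adaptive method would learn such penalties from the data and co-data rather than fixing them by hand.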
Affiliation(s)
- Mirrelijn M van Nee
- Epidemiology & Data Science, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Lodewyk F A Wessels
- Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Computational Cancer Biology, Oncode Institute, Amsterdam, The Netherlands
- Intelligent Systems, Delft University of Technology, Delft, The Netherlands
- Mark A van de Wiel
- Epidemiology & Data Science, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
6
Culos A, Tsai AS, Stanley N, Becker M, Ghaemi MS, McIlwain DR, Fallahzadeh R, Tanada A, Nassar H, Espinosa C, Xenochristou M, Ganio E, Peterson L, Han X, Stelzer IA, Ando K, Gaudilliere D, Phongpreecha T, Marić I, Chang AL, Shaw GM, Stevenson DK, Bendall S, Davis KL, Fantl W, Nolan GP, Hastie T, Tibshirani R, Angst MS, Gaudilliere B, Aghaeepour N. Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions. Nat Mach Intell 2020; 2:619-628. [PMID: 33294774 PMCID: PMC7720904 DOI: 10.1038/s42256-020-00232-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 01/13/2020] [Accepted: 08/26/2020] [Indexed: 12/17/2022]
Abstract
The dense network of interconnected cellular signalling responses that are quantifiable in peripheral immune cells provides a wealth of actionable immunological insights. Although high-throughput single-cell profiling techniques, including polychromatic flow and mass cytometry, have matured to a point that enables detailed immune profiling of patients in numerous clinical settings, the limited cohort size and high dimensionality of data increase the possibility of false-positive discoveries and model overfitting. We introduce a generalizable machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge directly into the predictive models. Importantly, the algorithm maintains the exploratory nature of the high-dimensional dataset, allowing for the inclusion of immune features with strong predictive capabilities even if not consistent with prior knowledge. In three independent studies our method demonstrates improved predictions for clinically relevant outcomes from mass cytometry data generated from whole blood, as well as a large simulated dataset. The iEN is available under an open-source licence.
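The abstract describes folding prior immunological knowledge into an Elastic-Net model while still letting strongly predictive features in. One generic way to do that, sketched below in Python as an assumption rather than the authors' exact iEN formulation, is a per-feature penalty weight: features consistent with prior knowledge get weight 1 and the rest a larger, hypothetical weight `phi`, so they are penalized more but can still enter the model if their signal is strong enough. `prior_weights` and `weighted_elastic_net` are illustrative names, and the solver is plain cyclic coordinate descent without an intercept.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Lasso soft-thresholding operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prior_weights(consistent: Sequence[bool], phi: float) -> List[float]:
    """Hypothetical mapping: weight 1 for prior-consistent features, phi (> 1) otherwise."""
    return [1.0 if c else phi for c in consistent]


def weighted_elastic_net(X, y, lam, alpha, weights, n_iter=500):
    """Cyclic coordinate descent for the objective
    (1/2n)||y - X b||^2 + lam * sum_j w_j * (alpha*|b_j| + (1 - alpha)/2 * b_j^2).

    No intercept; columns are assumed already centred/scaled if needed.
    A weight of 0 leaves that feature unpenalized.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = [float(v) for v in y]  # y - X beta, with beta = 0
    cols = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            v = sum(c * c for c in col) / n
            # correlation of feature j with the partial residual (feature j added back)
            z = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            pen = lam * weights[j]
            new = soft_threshold(z, pen * alpha) / (v + pen * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta


# Toy usage with hypothetical prior annotations: feature 3 is "not consistent".
X = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y = [5, -1, 1, 3, 2]
w = prior_weights([True, True, False], phi=4.0)
print(weighted_elastic_net(X, y, lam=0.5, alpha=0.5, weights=w))
```

Scaling the penalty per feature keeps the model exploratory: a prior-inconsistent feature is shrunk harder, but a large enough association with the outcome can still overcome its threshold.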
Affiliation(s)
- Anthony Culos
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- These authors contributed equally: Anthony Culos, Amy S. Tsai
- Amy S Tsai
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- These authors contributed equally: Anthony Culos, Amy S. Tsai
- Natalie Stanley
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Martin Becker
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Mohammad S Ghaemi
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Digital Technologies Research Centre, National Research Council Canada, Toronto, Ontario, Canada
- David R McIlwain
- Department of Microbiology and Immunology, Baxter Laboratory in Stem Cell Biology, Stanford University School of Medicine, Stanford, CA, USA
- Ramin Fallahzadeh
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Athena Tanada
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Huda Nassar
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Camilo Espinosa
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Maria Xenochristou
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Edward Ganio
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Laura Peterson
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Xiaoyuan Han
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Ina A Stelzer
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Kazuo Ando
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Dyani Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Thanaphong Phongpreecha
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Ivana Marić
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Alan L Chang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Gary M Shaw
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- David K Stevenson
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Sean Bendall
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Kara L Davis
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Wendy Fantl
- Department of Microbiology and Immunology, Baxter Laboratory in Stem Cell Biology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Urology, Stanford University School of Medicine, Stanford, CA, USA
- Garry P Nolan
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Trevor Hastie
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- Robert Tibshirani
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- Martin S Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- These authors jointly supervised this work: Martin S. Angst, Brice Gaudilliere, Nima Aghaeepour
- Brice Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- These authors jointly supervised this work: Martin S. Angst, Brice Gaudilliere, Nima Aghaeepour
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- These authors jointly supervised this work: Martin S. Angst, Brice Gaudilliere, Nima Aghaeepour