Reference Citation Analysis

For: Mirkes EM, Bac J, Fouché A, Stasenko SV, Zinovyev A, Gorban AN. Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data. Entropy (Basel) 2022;25:33. [PMID: 36673174 PMCID: PMC9858254 DOI: 10.3390/e25010033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/26/2022] [Revised: 12/18/2022] [Accepted: 12/21/2022] [Indexed: 06/17/2023]

For:	Mirkes EM, Bac J, Fouché A, Stasenko SV, Zinovyev A, Gorban AN. Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data. Entropy (Basel) 2022;25:33. [PMID: 36673174 PMCID: PMC9858254 DOI: 10.3390/e25010033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 12/18/2022] [Accepted: 12/21/2022] [Indexed: 06/17/2023]

Number

Cited by Other Article(s)

Eissa T, Huber M, Obermayer-Pietsch B, Linkohr B, Peters A, Fleischmann F, Žigman M. CODI: Enhancing machine learning-based molecular profiling through contextual out-of-distribution integration. PNAS Nexus 2024;3:pgae449. [PMID: 39440022 PMCID: PMC11495219 DOI: 10.1093/pnasnexus/pgae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 09/07/2024] [Indexed: 10/25/2024]

Kagerbauer SM, Ulm B, Podtschaske AH, Andonov DI, Blobner M, Jungwirth B, Graessner M. Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic. BMC Med Inform Decis Mak 2024;24:34. [PMID: 38308256 PMCID: PMC10837894 DOI: 10.1186/s12911-024-02428-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/29/2023] [Accepted: 01/16/2024] [Indexed: 02/04/2024]

Abstract

BACKGROUND

Concept drift and covariate shift degrade the performance of machine learning (ML) models. The objective of our study was to characterize the sudden data drift caused by the COVID pandemic. Furthermore, we investigated whether certain methods applied during model training can prevent model degradation caused by data drift.

METHODS

We trained different ML models with the H2O AutoML method on a dataset comprising 102,666 cases of surgical patients collected in the years 2014-2019 to predict postoperative mortality using preoperatively available data. The models applied were a Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning and Stacked Ensembles comprising all base models. Further, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: (1) we assigned lower weights to older data, (2) we used only the most recent data for model training and (3) we performed a z-transformation of the numerical input parameters. Afterwards, we tested model performance on a pre-pandemic and an in-pandemic dataset not used in the training process, and analysed common features.
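The abstract names the three training modifications but does not specify them. The following is a minimal Python sketch under stated assumptions, not the authors' code: older cases are down-weighted with exponential decay controlled by a hypothetical `half_life`, the recency restriction keeps a hypothetical `window` of calendar years, and the z-transformation parameters are estimated on the training data only and then reused on test data. All function names are illustrative. In H2O AutoML, per-row weights like these would typically be passed as a column named in the `weights_column` argument of `H2OAutoML.train`.

```python
from statistics import mean, pstdev


def recency_weights(years, half_life=2.0):
    """Exponential decay weights (hypothetical scheme): a case `half_life`
    years older than the newest case gets half its weight."""
    newest = max(years)
    return [0.5 ** ((newest - y) / half_life) for y in years]


def most_recent(rows, years, window=2):
    """Keep only rows from the last `window` calendar years."""
    cutoff = max(years) - window + 1
    return [row for row, y in zip(rows, years) if y >= cutoff]


def fit_z(values):
    """Estimate z-transformation parameters on the training data only."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # constant feature: avoid division by zero
    return mu, sigma


def apply_z(values, params):
    """Standardize values with parameters fitted on the training data."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Toy example with made-up values (not data from the study).
years = [2014, 2016, 2018, 2019]
print(recency_weights(years, half_life=1.0))  # [0.03125, 0.125, 0.5, 1.0]
print(most_recent(["a", "b", "c", "d"], years, window=2))  # ['c', 'd']
params = fit_z([2, 4, 4, 4, 5, 5, 7, 9])
print(apply_z([5, 9], params))  # [0.0, 2.0]
```

Fitting the z-transformation on training data and reusing those parameters on later data mirrors the deployment setting: a model in production cannot know in advance the in-pandemic means and variances, so shifted inputs remain shifted after standardization.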

RESULTS

The resulting models showed excellent areas under the receiver-operating characteristic curve and acceptable precision-recall curves when tested on a dataset from January-March 2020, but degraded significantly when tested on a dataset collected during the first wave of the COVID pandemic from April-May 2020. When comparing the probability distributions of the input parameters, significant differences between pre-pandemic and in-pandemic data were found. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of our applied modifications prevented a loss of performance, although they produced very different models that used a wide variety of parameters.
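The abstract does not state which test was used to compare the input distributions. As one possible approach (an assumption, not necessarily the authors' method), the sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical CDFs, for each numerical feature and flags features whose statistic exceeds a hypothetical `threshold`. A formal significance test would use p-values instead, for example from `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Empirical CDFs are step functions, so the maximum gap occurs at one
    # of the observed values; checking every pooled value is sufficient.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def shifted_features(pre, during, threshold=0.1):
    """Names of features whose KS statistic between the pre-pandemic and
    in-pandemic samples exceeds a hypothetical threshold."""
    return sorted(
        name for name in pre
        if ks_statistic(pre[name], during[name]) > threshold
    )


# Toy example with made-up values (not data from the study).
pre = {"age": [60, 65, 70], "bmi": [25, 26, 27]}
during = {"age": [60, 65, 70], "bmi": [30, 31, 32]}
print(shifted_features(pre, during))  # ['bmi']
```

Such a per-feature check detects covariate shift in the inputs even when, as reported here, the outcome rate itself stays roughly constant.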

CONCLUSIONS

Our results show that none of the easy-to-implement training measures we tested can prevent model deterioration in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.

Fouché A, Chadoutaud L, Delattre O, Zinovyev A. Transmorph: a unifying computational framework for modular single-cell RNA-seq data integration. NAR Genom Bioinform 2023;5:lqad069. [PMID: 37448589 PMCID: PMC10336778 DOI: 10.1093/nargab/lqad069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 06/02/2023] [Accepted: 07/10/2023] [Indexed: 07/15/2023]

Fouché A, Zinovyev A. Omics data integration in computational biology viewed through the prism of machine learning paradigms. Front Bioinform 2023;3:1191961. [PMID: 37600970 PMCID: PMC10436311 DOI: 10.3389/fbinf.2023.1191961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 07/26/2023] [Indexed: 08/22/2023]