1
Spytek M, Krzyziński M, Langbein SH, Baniecki H, Wright MN, Biecek P. survex: an R package for explaining machine learning survival models. Bioinformatics 2023; 39:btad723. [PMID: 38039146 PMCID: PMC11025379 DOI: 10.1093/bioinformatics/btad723] [Received: 09/13/2023] [Revised: 11/10/2023] [Accepted: 11/29/2023] [Indexed: 12/03/2023]
Abstract
SUMMARY: Due to their flexibility and superior performance, machine learning models frequently complement and outperform traditional statistical survival models. However, their widespread adoption is hindered by a lack of user-friendly tools to explain their internal operations and prediction rationales. To tackle this issue, we introduce the survex R package, which provides a cohesive framework for explaining any survival model by applying explainable artificial intelligence techniques. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases. Thus, transparency and responsibility may be promoted in sensitive areas, such as biomedical research and healthcare applications.
AVAILABILITY AND IMPLEMENTATION: survex is available under the GPL3 public license at https://github.com/modeloriented/survex and on CRAN with documentation available at https://modeloriented.github.io/survex.
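As a rough illustration of the "variable importances" mentioned in the abstract, the sketch below computes a permutation importance for a survival model, scored with Harrell's concordance index. It is plain Python written for this listing and is not the survex API (survex is an R package). The toy data, the `predict_risk` function, and every other name are assumptions made for the example.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's recorded time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def permutation_importance(predict_risk, rows, times, events, n_repeats=20, seed=0):
    """Mean drop in C-index after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events, [predict_risk(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen column replaced.
            permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
            score = concordance_index(times, events, [predict_risk(r) for r in permuted])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy cohort: columns are (age, unrelated noise feature).
rows = [(50, 3), (60, 1), (70, 4), (80, 1), (90, 5)]
times = [10, 8, 6, 4, 2]      # follow-up times, invented for the example
events = [1, 1, 1, 1, 1]      # 1 = event observed, 0 = censored


def predict_risk(row):
    # Stand-in "model": risk grows with age and ignores the noise column.
    return 0.1 * row[0]


baseline, importances = permutation_importance(predict_risk, rows, times, events)
print("baseline C-index:", baseline)
print("importance per column (age, noise):", importances)
```

Because the stand-in model ignores the second column, shuffling it cannot change the C-index, while shuffling age degrades the ranking. Explanations for survival models are often time-dependent, which this single-number sketch does not attempt to capture.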
Affiliation(s)
- Mikołaj Spytek
  - MI2.AI, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Mateusz Krzyziński
  - MI2.AI, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Sophie Hanna Langbein
  - Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
  - Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
- Hubert Baniecki
  - MI2.AI, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - MI2.AI, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Marvin N Wright
  - Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
  - Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
  - Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Przemysław Biecek
  - MI2.AI, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - MI2.AI, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
2
Abstract
The growing need for in-depth analysis of predictive models leads to a series of new methods for explaining their local and global properties. Which of these methods is the best? It turns out that this is an ill-posed question. One cannot sufficiently explain a black-box machine learning model using a single method that gives only one perspective. Isolated explanations are prone to misunderstanding, leading to wrong or simplistic reasoning. This problem is known as the Rashomon effect and refers to diverse, even contradictory, interpretations of the same phenomenon. Surprisingly, most methods developed for explainable and responsible machine learning focus on a single aspect of the model behavior. In contrast, we showcase the problem of explainability as an interactive and sequential analysis of a model. This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. It is implemented in a widely used human-centered open-source software framework that adopts interactivity, customizability and automation as its main traits. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model may increase the accuracy and confidence of human decision making.
Affiliation(s)
- Hubert Baniecki
  - Warsaw University of Technology, Warsaw, Poland
  - University of Warsaw, Warsaw, Poland
- Przemyslaw Biecek
  - Warsaw University of Technology, Warsaw, Poland
  - University of Warsaw, Warsaw, Poland
4
Pfeifer B, Baniecki H, Saranti A, Biecek P, Holzinger A. Multi-omics disease module detection with an explainable Greedy Decision Forest. Sci Rep 2022; 12:16857. [PMID: 36207536 PMCID: PMC9546860 DOI: 10.1038/s41598-022-21417-8] [Received: 05/18/2022] [Accepted: 09/27/2022] [Indexed: 11/09/2022]
Abstract
Machine learning methods can detect complex relationships between variables, but usually do not exploit domain knowledge. This is a limitation because in many scientific disciplines, such as systems biology, domain knowledge is available in the form of graphs or networks, and its use can improve model performance. We need network-based algorithms that are versatile and applicable in many research areas. In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest (GDF) with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Systems biology is a good example of a field in which statistical data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important to reach the future goal of precision medicine, where the complexity of patients is modeled on a system level to best tailor medical decisions, health practices and therapies to the individual patient. Our proposed explainable approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer.
Affiliation(s)
- Bastian Pfeifer
  - Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria
- Hubert Baniecki
  - MI2DataLab, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Anna Saranti
  - Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria
  - Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna, Austria
- Przemyslaw Biecek
  - MI2DataLab, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Andreas Holzinger
  - Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria
  - Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna, Austria
  - Alberta Machine Intelligence Institute, Alberta, Canada
5
Gierczyński R, Czerw A, Juszczyk G, Charkiewicz R, Nikliński J, Majewski P, Reszeć J, Piątyszek P, Baniecki H, Biecek P, Henry BM. Quantitative analysis of RT-PCR test results for SARS-CoV-2 diagnostics across Poland during COVID-19 pandemic: Comparison between early stage and major pandemic waves in 2020 and 2021 with reference to SARS-CoV-2 variants. Adv Med Sci 2022; 67:386-392. [PMID: 36191361 PMCID: PMC9468313 DOI: 10.1016/j.advms.2022.09.002] [Received: 07/19/2022] [Revised: 08/15/2022] [Accepted: 09/07/2022] [Indexed: 11/01/2022]
6
Sulewska A, Niklinski J, Charkiewicz R, Karabowicz P, Biecek P, Baniecki H, Kowalczuk O, Kozlowski M, Modzelewska P, Majewski P, Tryniszewska E, Reszec J, Dzieciol-Anikiej Z, Piwkowski C, Gryczka R, Ramlau R. A Signature of 14 Long Non-Coding RNAs (lncRNAs) as a Step towards Precision Diagnosis for NSCLC. Cancers (Basel) 2022; 14:cancers14020439. [PMID: 35053601 PMCID: PMC8773641 DOI: 10.3390/cancers14020439] [Received: 01/04/2022] [Accepted: 01/11/2022] [Indexed: 02/04/2023]
Abstract
LncRNAs have arisen as new players in the world of non-coding RNA. Disrupted expression of these molecules can be tightly linked to the onset, promotion and progression of cancer. The present study estimated the usefulness of 14 lncRNAs (HAGLR, ADAMTS9-AS2, LINC00261, MCM3AP-AS1, TP53TG1, C14orf132, LINC00968, LINC00312, TP73-AS1, LOC344887, LINC00673, SOX2-OT, AFAP1-AS1, LOC730101) for early detection of non-small-cell lung cancer (NSCLC). The total RNA was isolated from paired fresh-frozen cancerous and noncancerous lung tissue from 92 NSCLC patients diagnosed with either adenocarcinoma (LUAD) or lung squamous cell carcinoma (LUSC). The expression level of lncRNAs was evaluated by a quantitative real-time PCR (qPCR). Based on Ct and delta Ct values, logistic regression and gradient boosting decision tree classifiers were built. The latter is a novel, advanced machine learning algorithm with great potential in medical science. The established predictive models showed that a set of 14 lncRNAs accurately discriminates cancerous from noncancerous lung tissues (AUC value of 0.98 ± 0.01) and NSCLC subtypes (AUC value of 0.84 ± 0.09), although the expression of a few molecules was statistically insignificant (SOX2-OT, AFAP1-AS1 and LOC730101 for tumor vs. normal tissue; and TP53TG1, C14orf132, LINC00968 and LOC730101 for LUAD vs. LUSC). However, for subtype discrimination, the simplified logistic regression model based on the four variables (delta Ct AFAP1-AS1, Ct SOX2-OT, Ct LINC00261, and delta Ct LINC00673) had even stronger diagnostic potential than the original one (AUC value of 0.88 ± 0.07). Our results demonstrate that the 14 lncRNA signature can be an auxiliary tool to endorse and complement the histological diagnosis of non-small-cell lung cancer.
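The two quantities this abstract relies on, delta Ct and AUC, can be sketched as follows. This is a minimal Python illustration, not the study's pipeline: it assumes the common convention that delta Ct is the target Ct minus the Ct of a reference gene (the abstract does not name the reference), and every Ct value below is invented for the example.

```python
def delta_ct(ct_target, ct_reference):
    """Delta Ct under the assumed convention: target Ct minus reference Ct."""
    return ct_target - ct_reference


def auc(scores_positive, scores_negative):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Invented (target Ct, reference Ct) pairs for one hypothetical lncRNA.
tumor_ct = [(28.0, 20.0), (27.5, 20.5), (26.0, 20.0)]
normal_ct = [(25.0, 20.0), (26.5, 20.0), (24.0, 20.0)]

tumor_dct = [delta_ct(t, r) for t, r in tumor_ct]
normal_dct = [delta_ct(t, r) for t, r in normal_ct]

# Here delta Ct itself is used as the score; a fitted classifier's
# predicted probability would take its place in a real analysis.
auc_value = auc(tumor_dct, normal_dct)
print("tumor delta Ct:", tumor_dct)
print("normal delta Ct:", normal_dct)
print("AUC:", auc_value)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported values of 0.98, 0.84 and 0.88 should be read.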
Affiliation(s)
- Anetta Sulewska (corresponding author)
  - Department of Clinical Molecular Biology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Jacek Niklinski
  - Department of Clinical Molecular Biology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Radoslaw Charkiewicz
  - Department of Clinical Molecular Biology, Medical University of Bialystok, 15-269 Bialystok, Poland
  - Center of Experimental Medicine, Medical University of Bialystok, 15-369 Bialystok, Poland
- Piotr Karabowicz
  - Biobank, Medical University of Bialystok, 15-269 Bialystok, Poland
- Przemyslaw Biecek
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
- Hubert Baniecki
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
- Oksana Kowalczuk
  - Department of Clinical Molecular Biology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Miroslaw Kozlowski
  - Department of Thoracic Surgery, Medical University of Bialystok, 15-269 Bialystok, Poland
- Patrycja Modzelewska
  - Biobank, Medical University of Bialystok, 15-269 Bialystok, Poland
- Piotr Majewski
  - Department of Microbiological Diagnostics and Infectious Immunology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Elzbieta Tryniszewska
  - Department of Microbiological Diagnostics and Infectious Immunology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Joanna Reszec
  - Biobank, Medical University of Bialystok, 15-269 Bialystok, Poland
  - Department of Medical Pathomorphology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Zofia Dzieciol-Anikiej
  - Biobank, Medical University of Bialystok, 15-269 Bialystok, Poland
  - Department of Rehabilitation, Medical University of Bialystok, 15-089 Bialystok, Poland
- Cezary Piwkowski
  - Department of Thoracic Surgery, Poznan University of Medical Sciences, 60-569 Poznan, Poland
- Robert Gryczka
  - Department of Oncology, Poznan University of Medical Sciences, 60-569 Poznan, Poland
- Rodryg Ramlau
  - Department of Oncology, Poznan University of Medical Sciences, 60-569 Poznan, Poland