Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Roder J, Maguire L, Georgantas R, Roder H. Explaining multivariate molecular diagnostic tests via Shapley values. BMC Med Inform Decis Mak 2021;21:211. [PMID: 34238309 PMCID: PMC8265031 DOI: 10.1186/s12911-021-01569-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/29/2021] [Accepted: 06/29/2021] [Indexed: 11/17/2022] Open

Cited by Other Article(s)

Banerjee A, Sharma A, Kamble P, Garg P. Prediction of Mycobacterium tuberculosis cell wall permeability using machine learning methods. Mol Divers 2024:10.1007/s11030-024-10952-3. [PMID: 39133353 DOI: 10.1007/s11030-024-10952-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 07/26/2024] [Indexed: 08/13/2024]

Zhou Q, Wang L, Craft J, Weber J, Passick M, Ngai N, Khalique OK, Goldfarb JW, Barasch E, Cao JJ. A machine learning-derived risk score to predict left ventricular diastolic dysfunction from clinical cardiovascular magnetic resonance imaging. Front Cardiovasc Med 2024;11:1382418. [PMID: 38903970 PMCID: PMC11187483 DOI: 10.3389/fcvm.2024.1382418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 05/27/2024] [Indexed: 06/22/2024] Open

Bifarin O, Fernández FM. Automated Machine Learning and Explainable AI (AutoML-XAI) for Metabolomics: Improving Cancer Diagnostics. J Am Soc Mass Spectrom 2024;35:1089-1100. [PMID: 38690775 PMCID: PMC11157651 DOI: 10.1021/jasms.3c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 02/08/2024] [Accepted: 04/23/2024] [Indexed: 05/03/2024]

Abstract

Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for nonexperts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested our approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, surpassed standalone ML algorithms like SVM and k-Nearest Neighbors in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers. The effectiveness of Auto-sklearn is highlighted by its AUC scores of 0.97 for RCC and 0.85 for OC, obtained from the unseen test sets. Importantly, on most of the metrics considered, Auto-sklearn demonstrated a better classification performance, leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. 
In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.

Bilodeau B, Jaques N, Koh PW, Kim B. Impossibility theorems for feature attribution. Proc Natl Acad Sci U S A 2024;121:e2304406120. [PMID: 38181057 PMCID: PMC10786278 DOI: 10.1073/pnas.2304406120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 10/11/2023] [Indexed: 01/07/2024] Open

Bifarin OO, Fernández FM. Automated machine learning and explainable AI (AutoML-XAI) for metabolomics: improving cancer diagnostics. bioRxiv 2023:2023.10.26.564244. [PMID: 37961534 PMCID: PMC10634896 DOI: 10.1101/2023.10.26.564244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]

Abstract

Motivation

Results

We tested our approach on two datasets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using auto-sklearn, surpassed standalone ML algorithms such as SVM and random forest in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers (Non-OC). Auto-sklearn employed a mix of algorithms and ensemble techniques, yielding a superior performance (AUC of 0.97 for RCC and 0.85 for OC). Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.

Availability

https://github.com/obifarin/automl-xai-metabolomics.

Ferrão LFV, Dhakal R, Dias R, Tieman D, Whitaker V, Gore MA, Messina C, Resende MFR. Machine learning applications to improve flavor and nutritional content of horticultural crops through breeding and genetics. Curr Opin Biotechnol 2023;83:102968. [PMID: 37515935 DOI: 10.1016/j.copbio.2023.102968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/31/2023]

Nayebi A, Tipirneni S, Reddy CK, Foreman B, Subbian V. WindowSHAP: An efficient framework for explaining time-series classifiers based on Shapley values. J Biomed Inform 2023;144:104438. [PMID: 37414368 PMCID: PMC10552726 DOI: 10.1016/j.jbi.2023.104438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 06/29/2023] [Accepted: 07/03/2023] [Indexed: 07/08/2023]

Abstract

Unpacking and comprehending how black-box machine learning algorithms (such as deep learning models) make decisions has been a persistent challenge for researchers and end-users. Explaining time-series predictive models is useful for clinical applications with high stakes to understand the behavior of prediction models, e.g., to determine how different variables and time points influence the clinical outcome. However, existing approaches to explain such models are frequently unique to architectures and data where the features do not have a time-varying component. In this paper, we introduce WindowSHAP, a model-agnostic framework for explaining time-series classifiers using Shapley values. We intend for WindowSHAP to mitigate the computational complexity of calculating Shapley values for long time-series data as well as improve the quality of explanations. WindowSHAP is based on partitioning a sequence into time windows. Under this framework, we present three distinct algorithms of Stationary, Sliding and Dynamic WindowSHAP, each evaluated against baseline approaches, KernelSHAP and TimeSHAP, using perturbation and sequence analyses metrics. We applied our framework to clinical time-series data from both a specialized clinical domain (Traumatic Brain Injury - TBI) as well as a broad clinical domain (critical care medicine). The experimental results demonstrate that, based on the two quantitative metrics, our framework is superior at explaining clinical time-series classifiers, while also reducing the complexity of computations. We show that for time-series data with 120 time steps (hours), merging 10 adjacent time points can reduce the CPU time of WindowSHAP by 80 % compared to KernelSHAP. We also show that our Dynamic WindowSHAP algorithm focuses more on the most important time steps and provides more understandable explanations. 
As a result, WindowSHAP not only accelerates the calculation of Shapley values for time-series data, but also delivers more understandable explanations with higher quality.

Droit A, Pelletier S, Leclerq M, Roux-Dalvai F, de Geus M, Leslie S, Wang W, Lam T, Nairn A, Arnold S, Carlyle B, Precioso F. Enhancing Classification of liquid chromatography mass spectrometry data with Batch Effect Removal Neural Networks (BERNN). Res Sq 2023:rs.3.rs-3112514. [PMID: 37461653 PMCID: PMC10350225 DOI: 10.21203/rs.3.rs-3112514/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023]

Lisboa P, Saralajew S, Vellido A, Fernández-Domenech R, Villmann T. The Coming of Age of Interpretable and Explainable Machine Learning Models. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]

Lee CL, Liu WJ, Tsai SF. Development and Validation of an Insulin Resistance Model for a Population with Chronic Kidney Disease Using a Machine Learning Approach. Nutrients 2022;14:nu14142832. [PMID: 35889789 PMCID: PMC9319821 DOI: 10.3390/nu14142832] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/13/2022] [Revised: 07/06/2022] [Accepted: 07/06/2022] [Indexed: 02/01/2023] Open

Abstract

Background: Chronic kidney disease (CKD) is a complex syndrome without a definitive treatment. For these patients, insulin resistance (IR) is associated with worse renal and patient outcomes. Until now, no predictive model using machine learning (ML) has been reported on IR in CKD patients. Methods: The CKD population studied was based on results from the National Health and Nutrition Examination Survey (NHANES) of the USA from 1999 to 2012. The homeostasis model assessment of IR (HOMA-IR) was used to assess insulin resistance. We began the model building process via the ML algorithm (random forest (RF), eXtreme Gradient Boosting (XGBoost), logistic regression algorithms, and deep neural learning (DNN)). We compared different receiver operating characteristic (ROC) curves from different algorithms. Finally, we used SHAP values (SHapley Additive exPlanations) to explain how the different ML models worked. Results: In this study population, 71,916 participants were enrolled. Finally, we analyzed 1,229 of these participants. Their data were segregated into the IR group (HOMA-IR > 3, n = 572) or non-IR group (HOMA-IR ≤ 3, n = 657). In the validation group, RF had a higher accuracy (0.77), specificity (0.81), PPV (0.77), and NPV (0.77). In the test group, XGBoost had a higher AUC of ROC (0.78). In addition, XGBoost also had a higher accuracy (0.7) and NPV (0.71). RF had a higher accuracy (0.7), specificity (0.78), and PPV (0.7). In the RF algorithm, the body mass index had a much larger impact on IR (0.1654), followed by triglyceride (0.0117), the daily calorie intake (0.0602), blood HDL value (0.0587), and age (0.0446). As for the SHAP value, in the RF algorithm, almost all features were well separated to show a positive or negative association with IR. Conclusion: This was the first study using ML to predict IR in patients with CKD. Our results showed that the RF algorithm had the best AUC of ROC and the best SHAP value differentiation. This was also the first study that included both macronutrients and micronutrients. We concluded that ML algorithms, particularly RF, can help determine risk factors and predict IR in patients with CKD.

Solary E, Abou-Zeid N, Calvo F. Ageing and cancer: a research gap to fill. Mol Oncol 2022;16:3220-3237. [PMID: 35503718 PMCID: PMC9490141 DOI: 10.1002/1878-0261.13222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 04/01/2022] [Accepted: 05/02/2022] [Indexed: 12/03/2022] Open

Rich P, Mitchell RB, Schaefer E, Walker PR, Dubay JW, Boyd J, Oubre D, Page R, Khalil M, Sinha S, Boniol S, Halawani H, Santos ES, Brenner W, Orsini JM, Pauli E, Goldberg J, Veatch A, Haut M, Ghabach B, Bidyasar S, Quejada M, Khan W, Huang K, Traylor L, Akerley W. Real-world performance of blood-based proteomic profiling in first-line immunotherapy treatment in advanced stage non-small cell lung cancer. J Immunother Cancer 2021;9:jitc-2021-002989. [PMID: 34706885 PMCID: PMC8552188 DOI: 10.1136/jitc-2021-002989] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 09/07/2021] [Indexed: 12/02/2022] Open

Abstract

Purpose

Immune checkpoint inhibition (ICI) therapy has improved patient outcomes in advanced non-small cell lung cancer (NSCLC), but better biomarkers are needed. A clinically validated, blood-based proteomic test, or host immune classifier (HIC), was assessed for its ability to predict ICI therapy outcomes in this real-world, prospectively designed, observational study.

Materials and methods

The prospectively designed, observational registry study INSIGHT (Clinical Effectiveness Assessment of VeriStrat® Testing and Validation of Immunotherapy Tests in NSCLC Subjects) (NCT03289780) includes 35 US sites having enrolled over 3570 NSCLC patients at any stage and line of therapy. After enrolment and prior to therapy initiation, all patients are tested and designated HIC-Hot (HIC-H) or HIC-Cold (HIC-C). A prespecified interim analysis was performed after 1-year follow-up with the first 2000 enrolled patients. We report the overall survival (OS) of patients with advanced stage (IIIB and IV) NSCLC treated in the first-line (ICI-containing therapies n=284; all first-line therapies n=877), by treatment type and in HIC-defined subgroups.

Results

OS for HIC-H patients was longer than OS for HIC-C patients across treatment regimens, including ICI. For patients treated with all ICI regimens, median OS was not reached (95% CI 15.4 to undefined months) for HIC-H (n=196) vs 5.0 months (95% CI 2.9 to 6.4) for HIC-C patients (n=88); HR=0.38 (95% CI 0.27 to 0.53), p<0.0001. For ICI monotherapy, OS was 16.8 vs 2.8 months (HR=0.36 (95% CI 0.22 to 0.58), p<0.0001) and for ICI with chemotherapy OS was unreached vs 6.4 months (HR=0.41 (95% CI 0.26 to 0.67), p=0.0003). HIC results were independent of programmed death ligand 1 (PD-L1). In a subgroup with PD-L1 ≥50% and performance status 0–1, HIC stratified survival significantly for ICI monotherapy but not ICI with chemotherapy.

Conclusion

Blood-based HIC proteomic testing provides clinically meaningful information for immunotherapy treatment decision in NSCLC independent of PD-L1. The data suggest that HIC-C patients should not be treated with ICI alone regardless of their PD-L1 expression.

Affiliation(s)

Patricia Rich: Lung Cancer, Piedmont Physicians Group, Atlanta, Georgia, USA
R Brian Mitchell: Virginia Cancer Institute, Richmond, Virginia, USA
Eric Schaefer: Highlands Oncology Group, Fayetteville, Arkansas, USA
Paul R Walker: Leo W Jenkins Cancer Center, Brody School of Medicine at East Carolina University, Greenville, North Carolina, USA
John W Dubay: Lewis and Faye Manderson Cancer Center at DCH Regional Medical Center, Tuscaloosa, Alabama, USA
Jason Boyd: Southeastern Medical Oncology Center, Goldsboro, North Carolina, USA
David Oubre: Pontchartrain Cancer Center, Covington, Louisiana, USA
Ray Page: The Center for Cancer and Blood Disorders, Fort Worth, Texas, USA
Mazen Khalil: St. Bernards Hospital, Inc, Jonesboro, Arkansas, USA
Suman Sinha: Christus Saint Michael Health System, Texarkana, Texas, USA
Scott Boniol: Christus Cancer Treatment Center, Shreveport, Louisiana, USA
Hafez Halawani: St. Frances Cabrini Hospital Cancer Center, Alexandria, Louisiana, USA
Edgardo S Santos: Florida Precision Oncology, Division of Genesis Care, Aventura, Florida, USA
Warren Brenner: Lynn Clinical Research Institute, Boca Raton, Florida, USA
James M Orsini: Essex Oncology Group, Belleville, New Jersey, USA
Emily Pauli: Clearview Cancer Institute, Huntsville, Alabama, USA
Jonathan Goldberg: Clinical Research Alliance, Caremount Medical, Mount Kisco, New York, USA
Andrea Veatch: Northwest Medical Specialties, Puyallup, Washington, USA
Mitchell Haut: Hematology and Oncology Associates, Inc, Canton, Ohio, USA
Bassam Ghabach: John Peter Smith Hospital, Fort Worth, Texas, USA
Savita Bidyasar: Pearlman Cancer Center, Valdosta, Georgia, USA
Maria Quejada: Edward-Elmhurst Health, Naperville, Illinois, USA
Waseemullah Khan: Lake City Cancer Care, Lake City, Florida, USA
Kan Huang: Phelps County Regional Medical Center, Rolla, Missouri, USA
Linda Traylor: Biodesix Inc, Boulder, Colorado, USA
Wallace Akerley: Huntsman Cancer Institute Cancer Hospital, Salt Lake City, Utah, USA
