Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Download

Total Articles

35
(from Reference Citation Analysis)

Article PDFs (8)

Cited by > 0 (21)

Searched Name

Model interpretability

Ranked By

Results Analysis

Year Published Analysis
Article Type Analysis
Publication Title Analysis
Category Analysis

Results Analysis

Indexed Articles

Year Published

Show more Refine

Article Type

Show more Refine

Article Statistics

Refine

MESH Headings

Show more Refine

First Author

Show more Refine

First Author Affiliations

Show more Refine

Authors

Show more Refine

Publication Titles

Show more Refine

Grant Agencies

Show more Refine

Countries/Regions

Show more Refine

Affiliations

Show more Refine

Corresponding Author Affiliations

Show more Refine

Category

Show more Refine

Number

Citation Analysis

Teng X, Han K, Jin W, Ma L, Wei L, Min D, Chen L, Du Y. Development and validation of an early diagnosis model for bone metastasis in non-small cell lung cancer based on serological characteristics of the bone metastasis mechanism. EClinicalMedicine 2024;72:102617. [PMID: 38707910 PMCID: PMC11066529 DOI: 10.1016/j.eclinm.2024.102617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/04/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open

Abstract

Background

Bone metastasis significantly impact the prognosis of non-small cell lung cancer (NSCLC) patients, reducing their quality of life and shortening their survival. Currently, there are no effective tools for the diagnosis and risk assessment of early bone metastasis in NSCLC patients. This study employed machine learning to analyze serum indicators that are closely associated with bone metastasis, aiming to construct a model for the timely detection and prognostic evaluation of bone metastasis in NSCLC patients.

Methods

The derivation cohort consisted of 664 individuals with stage IV NSCLC, diagnosed between 2015 and 2018. The variables considered in this study included age, sex, and 18 specific serum indicators that have been linked to the occurrence of bone metastasis in NSCLC. Variable selection used multivariate logistic regression analysis and Lasso regression analysis. Six machine learning methods were utilized to develop a bone metastasis diagnostic model, assessed with Area Under the Curve (AUC), Decision Curve Analysis (DCA), sensitivity, specificity, and validation cohorts. External validation used 113 NSCLC patients from the Medical Alliance (2019-2020). Furthermore, a prospective validation study was conducted on a cohort of 316 patients (2019-2020) who were devoid of bone metastasis, and followed-up for at least two years to assess the predictive capabilities of this model. The model's prognostic value was evaluated using Kaplan-Meier survival curves.

Findings

Through variable selection, 11 serum indictors were identified as independent predictive factors for NSCLC bone metastasis. Six machine learning models were developed using age, sex, and these serum indicators. A random forest (RF) model demonstrated strong performance during the training and internal validation cohorts, achieving an AUC of 0.98 (95% CI 0.95-0.99) for internal validation. External validation further confirmed the RF model's effectiveness, yielding an AUC of 0.97 (95% CI 0.94-0.99). The calibration curves demonstrated a high level of concordance between the anticipated risk and the observed risk of the RF model. Prospective validation revealed that the RF model could predict the occurrence of bone metastasis approximately 10.27 ± 3.58 months in advance, according to the results of the SPECT. An online computing platform (https://bonemetastasis.shinyapps.io/shiny_cls_1model/) for this RF model is publicly available and free-to-use by doctors and patients.

Interpretation

This study innovatively employs age, gender, and 11 serological markers closely related to the mechanism of bone metastasis to construct an RF model, providing a reliable tool for the early screening and prognostic assessment of bone metastasis in NSCLC patients. However, as an exploratory study, the findings require further validation through large-scale, multicenter prospective studies.

Funding

This work is supported by the National Natural Science Foundation of China (NO.81974315); Shanghai Municipal Science and Technology Commission Medical Innovation Research Project (NO.20Y11903300); Shanghai Municipal Health Commission Health Industry Clinical Research Youth Program (NO.20204Y034).

Collapse

Chen Y, Du Z, Ren X, Pan C, Zhu Y, Li Z, Meng T, Yao X. mRNA-CLA: An interpretable deep learning approach for predicting mRNA subcellular localization. Methods 2024;227:17-26. [PMID: 38705502 DOI: 10.1016/j.ymeth.2024.04.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2023] [Revised: 03/30/2024] [Accepted: 04/28/2024] [Indexed: 05/07/2024] Open

Tian Y, Yang X, Chen N, Li C, Yang W. Data-driven interpretable analysis for polysaccharide yield prediction. Environ Sci Ecotechnol 2024;19:100321. [PMID: 38021368 PMCID: PMC10661693 DOI: 10.1016/j.ese.2023.100321] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/09/2023] [Revised: 09/17/2023] [Accepted: 09/17/2023] [Indexed: 12/01/2023]

Kang T, Ding W, Chen P. CRESPR: Modular sparsification of DNNs to improve pruning performance and model interpretability. Neural Netw 2024;172:106067. [PMID: 38199151 DOI: 10.1016/j.neunet.2023.12.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2023] [Revised: 11/27/2023] [Accepted: 12/12/2023] [Indexed: 01/12/2024]

Zhai S, Chen K, Yang L, Li Z, Yu T, Chen L, Zhu H. Applying machine learning to anaerobic fermentation of waste sludge using two targeted modeling strategies. Sci Total Environ 2024;916:170232. [PMID: 38278257 DOI: 10.1016/j.scitotenv.2024.170232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Revised: 01/13/2024] [Accepted: 01/15/2024] [Indexed: 01/28/2024]

Bhati A, Gour N, Khanna P, Ojha A, Werghi N. An interpretable dual attention network for diabetic retinopathy grading: IDANet. Artif Intell Med 2024;149:102782. [PMID: 38462283 DOI: 10.1016/j.artmed.2024.102782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Revised: 01/05/2024] [Accepted: 01/15/2024] [Indexed: 03/12/2024]

Wang T, Li YY. Predictive modeling based on artificial neural networks for membrane fouling in a large pilot-scale anaerobic membrane bioreactor for treating real municipal wastewater. Sci Total Environ 2024;912:169164. [PMID: 38081428 DOI: 10.1016/j.scitotenv.2023.169164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Revised: 11/25/2023] [Accepted: 12/05/2023] [Indexed: 12/17/2023]

Fan L, Gong X, Zheng C, Li J. Data pyramid structure for optimizing EUS-based GISTs diagnosis in multi-center analysis with missing label. Comput Biol Med 2024;169:107897. [PMID: 38171262 DOI: 10.1016/j.compbiomed.2023.107897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 12/04/2023] [Accepted: 12/23/2023] [Indexed: 01/05/2024]

Hou Z, Leng J, Yu J, Xia Z, Wu LY. PathExpSurv: pathway expansion for explainable survival analysis and disease gene discovery. BMC Bioinformatics 2023;24:434. [PMID: 37968615 PMCID: PMC10648621 DOI: 10.1186/s12859-023-05535-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Accepted: 10/16/2023] [Indexed: 11/17/2023] Open

Miao Z, Zhao M, Zhang X, Ming D. LMDA-Net:A lightweight multi-dimensional attention network for general EEG-based brain-computer interfaces and interpretability. Neuroimage 2023;276:120209. [PMID: 37269957 DOI: 10.1016/j.neuroimage.2023.120209] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Revised: 05/05/2023] [Accepted: 05/30/2023] [Indexed: 06/05/2023] Open

Abstract

Electroencephalography (EEG)-based brain-computer interfaces (BCIs) pose a challenge for decoding due to their low spatial resolution and signal-to-noise ratio. Typically, EEG-based recognition of activities and states involves the use of prior neuroscience knowledge to generate quantitative EEG features, which may limit BCI performance. Although neural network-based methods can effectively extract features, they often encounter issues such as poor generalization across datasets, high predicting volatility, and low model interpretability. To address these limitations, we propose a novel lightweight multi-dimensional attention network, called LMDA-Net. By incorporating two novel attention modules designed specifically for EEG signals, the channel attention module and the depth attention module, LMDA-Net is able to effectively integrate features from multiple dimensions, resulting in improved classification performance across various BCI tasks. LMDA-Net was evaluated on four high-impact public datasets, including motor imagery (MI) and P300-Speller, and was compared with other representative models. The experimental results demonstrate that LMDA-Net outperforms other representative methods in terms of classification accuracy and predicting volatility, achieving the highest accuracy in all datasets within 300 training epochs. Ablation experiments further confirm the effectiveness of the channel attention module and the depth attention module. To facilitate an in-depth understanding of the features extracted by LMDA-Net, we propose class-specific neural network feature interpretability algorithms that are suitable for evoked responses and endogenous activities. By mapping the output of the specific layer of LMDA-Net to the time or spatial domain through class activation maps, the resulting feature visualizations can provide interpretable analysis and establish connections with EEG time-spatial analysis in neuroscience. In summary, LMDA-Net shows great potential as a general decoding model for various EEG tasks.

Collapse

Bao X, Sun J, Yi M, Qiu J, Chen X, Shuai SC, Zhao Q. MPFFPSDC: A multi-pooling feature fusion model for predicting synergistic drug combinations. Methods 2023:S1046-2023(23)00098-1. [PMID: 37321525 DOI: 10.1016/j.ymeth.2023.06.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Revised: 06/11/2023] [Accepted: 06/12/2023] [Indexed: 06/17/2023] Open

Zhang Z, Lin J, Chen Z. Predicting the effect of silver nanoparticles on soil enzyme activity using the machine learning method: type, size, dose and exposure time. J Hazard Mater 2023;457:131789. [PMID: 37301072 DOI: 10.1016/j.jhazmat.2023.131789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/27/2023] [Revised: 06/03/2023] [Accepted: 06/04/2023] [Indexed: 06/12/2023]

Lu X, Du J, Zheng L, Wang G, Li X, Sun L, Huang X. Feature fusion improves performance and interpretability of machine learning models in identifying soil pollution of potentially contaminated sites. Ecotoxicol Environ Saf 2023;259:115052. [PMID: 37224784 DOI: 10.1016/j.ecoenv.2023.115052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2022] [Revised: 05/17/2023] [Accepted: 05/19/2023] [Indexed: 05/26/2023]

Abstract

Owing to the rapid development of big data technology, use of machine learning methods to identify soil pollution of potentially contaminated sites (PCS) at regional scales and in different industries has become a research hot spot. However, due to the difficulty in obtaining key indexes of site pollution sources and pathways, current methods have problems such as low accuracy of model predictions and insufficient scientific basis. In this study, we collected the environmental data of 199 PCS in 6 typical industries involving heavy metal and organic pollution. Then, 21 indexes based on basic information, potential for pollution from product and raw material, pollution control level, and migration capacity of soil pollutants were used to established the soil pollution identification index system. We fused the original indexes into the new feature subset with 11 indexes through the method of consolidation calculation. The new feature subset was then used to train machine learning models of random forest (RF), support vector machine (SVM), and multilayer perceptron (MLP), and tested to determine whether it improved the accuracy and precision of soil pollination identification models. The results of correlation analysis showed that the four new indexes created by feature fusion have the correlation with soil pollution is similar to the original indexes. The accuracies and precisions of three machine learning models trained on the new feature subset were 67.4%- 72.9% and 72.0%- 74.7%, which were 2.1%- 2.5% and 0.3%- 5.7% higher than these of the models trained on original indexes, respectively. When the PCS were divided into typical heavy metal and organic pollution sites according to the enterprise industries, the accuracy of the model trained on the two datasets for identifying soil heavy metal and organic pollution were significantly improve to approximately 80%. Owing to the imbalance in positive and negative samples in the prediction of soil organic pollution, the precisions of soil organic pollution identification models were 58%- 72.5%, which were significantly lower than their accuracies. According to the factors analysis based on the model interpretability of SHAP, most of the indexes of basic information, potential for pollution from product and raw material, and pollution control level had different degrees of impact on soil pollution. However, the indexes of migration capacity of soil pollutants had the least effect in the classification task of soil pollution identification of PCS. Among the indexes, traces of soil pollution, industrial utilization years/start-up time, pollution control risk scores and enterprise scale having the greatest effects on soil pollution with the mean SHAP values of 0.17-0.36, which reflected their contribution rate on soil pollution and could help to optimize the current index scoring of the technical regulation for identifying site soil pollution. This study provides a new technical method to identify soil pollution based on big data and machine learning methods, in addition to providing a reference and scientific basis for environmental management and soil pollution control of PCS.

Collapse

Majdandzic A, Rajesh C, Koo PK. Correcting gradient-based interpretations of deep neural networks for genomics. Genome Biol 2023;24:109. [PMID: 37161475 PMCID: PMC10169356 DOI: 10.1186/s13059-023-02956-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2022] [Accepted: 04/28/2023] [Indexed: 05/11/2023] Open

Lee NK, Tang Z, Toneyan S, Koo PK. EvoAug: improving generalization and interpretability of genomic deep neural networks with evolution-inspired data augmentations. Genome Biol 2023;24:105. [PMID: 37143118 PMCID: PMC10161416 DOI: 10.1186/s13059-023-02941-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Accepted: 04/17/2023] [Indexed: 05/06/2023] Open

Liu M, Huang Y, Hu J, He J, Xiao X. Algal community structure prediction by machine learning. Environ Sci Ecotechnol 2023;14:100233. [PMID: 36793396 PMCID: PMC9923192 DOI: 10.1016/j.ese.2022.100233] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/16/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 06/18/2023]

Koo PK, Ploenzke M, Anand P, Paul S, Majdandzic A. ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks. Methods Mol Biol 2023;2586:197-215. [PMID: 36705906 DOI: 10.1007/978-1-0716-2768-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Chen Q, Li R, Lin C, Lai C, Chen D, Qu H, Huang Y, Lu W, Tang Y, Li L. Transferability and interpretability of the sepsis prediction models in the intensive care unit. BMC Med Inform Decis Mak 2022;22:343. [PMID: 36581881 PMCID: PMC9798724 DOI: 10.1186/s12911-022-02090-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Accepted: 12/16/2022] [Indexed: 12/31/2022] Open

Abstract

BACKGROUND

We aimed to develop an early warning system for real-time sepsis prediction in the ICU by machine learning methods, with tools for interpretative analysis of the predictions. In particular, we focus on the deployment of the system in a target medical center with small historical samples.

METHODS

Light Gradient Boosting Machine (LightGBM) and multilayer perceptron (MLP) were trained on Medical Information Mart for Intensive Care (MIMIC-III) dataset and then finetuned on the private Historical Database of local Ruijin Hospital (HDRJH) using transfer learning technique. The Shapley Additive Explanations (SHAP) analysis was employed to characterize the feature importance in the prediction inference. Ultimately, the performance of the sepsis prediction system was further evaluated in the real-world study in the ICU of the target Ruijin Hospital.

RESULTS

The datasets comprised 6891 patients from MIMIC-III, 453 from HDRJH, and 67 from Ruijin real-world data. The area under the receiver operating characteristic curves (AUCs) for LightGBM and MLP models derived from MIMIC-III were 0.98 - 0.98 and 0.95 - 0.96 respectively on MIMIC-III dataset, and, in comparison, 0.82 - 0.86 and 0.84 - 0.87 respectively on HDRJH, from 1 to 5 h preceding. After transfer learning and ensemble learning, the AUCs of the final ensemble model were enhanced to 0.94 - 0.94 on HDRJH and to 0.86 - 0.9 in the real-world study in the ICU of the target Ruijin Hospital. In addition, the SHAP analysis illustrated the importance of age, antibiotics, net balance, and ventilation for sepsis prediction, making the model interpretable.

CONCLUSIONS

Our machine learning model allows accurate real-time prediction of sepsis within 5-h preceding. Transfer learning can effectively improve the feasibility to deploy the prediction model in the target cohort, and ameliorate the model performance for external validation. SHAP analysis indicates that the role of antibiotic usage and fluid management needs further investigation. We argue that our system and methodology have the potential to improve ICU management by helping medical practitioners identify at-sepsis-risk patients and prepare for timely diagnosis and intervention.

TRIAL REGISTRATION

NCT05088850 (retrospectively registered).

Collapse

Zhang X, Gavaldà R, Baixeries J. Interpretable prediction of mortality in liver transplant recipients based on machine learning. Comput Biol Med 2022;151:106188. [PMID: 36306583 DOI: 10.1016/j.compbiomed.2022.106188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2022] [Revised: 09/24/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]

Abstract

BACKGROUND

Accurate prediction of the mortality of post-liver transplantation is an important but challenging task. It relates to optimizing organ allocation and estimating the risk of possible dysfunction. Existing risk scoring models, such as the Balance of Risk (BAR) score and the Survival Outcomes Following Liver Transplantation (SOFT) score, do not predict the mortality of post-liver transplantation with sufficient accuracy. In this study, we evaluate the performance of machine learning models and establish an explainable machine learning model for predicting mortality in liver transplant recipients.

METHOD

The optimal feature set for the prediction of the mortality was selected by a wrapper method based on binary particle swarm optimization (BPSO). With the selected optimal feature set, seven machine learning models were applied to predict mortality over different time windows. The best-performing model was used to predict mortality through a comprehensive comparison and evaluation. An interpretable approach based on machine learning and SHapley Additive exPlanations (SHAP) is used to explicitly explain the model's decision and make new discoveries.

RESULTS

With regard to predictive power, our results demonstrated that the feature set selected by BPSO outperformed both the feature set in the existing risk score model (BAR score, SOFT score) and the feature set processed by principal component analysis (PCA). The best-performing model, extreme gradient boosting (XGBoost), was found to improve the Area Under a Curve (AUC) values for mortality prediction by 6.7%, 11.6%, and 17.4% at 3 months, 3 years, and 10 years, respectively, compared to the SOFT score. The main predictors of mortality and their impact were discussed for different age groups and different follow-up periods.

CONCLUSIONS

Our analysis demonstrates that XGBoost can be an ideal method to assess the mortality risk in liver transplantation. In combination with the SHAP approach, the proposed framework provides a more intuitive and comprehensive interpretation of the predictive model, thereby allowing the clinician to better understand the decision-making process of the model and the impact of factors associated with mortality risk in liver transplantation.

Collapse

Zhao Y, Shao J, Asmann YW. Assessment and Optimization of Explainable Machine Learning Models Applied to Transcriptomic Data. Genomics Proteomics Bioinformatics 2022;20:899-911. [PMID: 35931322 PMCID: PMC10025763 DOI: 10.1016/j.gpb.2022.07.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/10/2021] [Revised: 06/05/2022] [Accepted: 07/25/2022] [Indexed: 01/12/2023]

Zhang Y, Zhang X, Razbek J, Li D, Xia W, Bao L, Mao H, Daken M, Cao M. Opening the black box: interpretable machine learning for predictor finding of metabolic syndrome. BMC Endocr Disord 2022;22:214. [PMID: 36028865 PMCID: PMC9419421 DOI: 10.1186/s12902-022-01121-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/22/2022] [Accepted: 07/31/2022] [Indexed: 11/10/2022] Open

Abstract

OBJECTIVE

The internal workings ofmachine learning algorithms are complex and considered as low-interpretation "black box" models, making it difficult for domain experts to understand and trust these complex models. The study uses metabolic syndrome (MetS) as the entry point to analyze and evaluate the application value of model interpretability methods in dealing with difficult interpretation of predictive models.

METHODS

The study collects data from a chain of health examination institution in Urumqi from 2017 ~ 2019, and performs 39,134 remaining data after preprocessing such as deletion and filling. RFE is used for feature selection to reduce redundancy; MetS risk prediction models (logistic, random forest, XGBoost) are built based on a feature subset, and accuracy, sensitivity, specificity, Youden index, and AUROC value are used to evaluate the model classification performance; post-hoc model-agnostic interpretation methods (variable importance, LIME) are used to interpret the results of the predictive model.

RESULTS

Eighteen physical examination indicators are screened out by RFE, which can effectively solve the problem of physical examination data redundancy. Random forest and XGBoost models have higher accuracy, sensitivity, specificity, Youden index, and AUROC values compared with logistic regression. XGBoost models have higher sensitivity, Youden index, and AUROC values compared with random forest. The study uses variable importance, LIME and PDP for global and local interpretation of the optimal MetS risk prediction model (XGBoost), and different interpretation methods have different insights into the interpretation of model results, which are more flexible in model selection and can visualize the process and reasons for the model to make decisions. The interpretable risk prediction model in this study can help to identify risk factors associated with MetS, and the results showed that in addition to the traditional risk factors such as overweight and obesity, hyperglycemia, hypertension, and dyslipidemia, MetS was also associated with other factors, including age, creatinine, uric acid, and alkaline phosphatase.

CONCLUSION

The model interpretability methods are applied to the black box model, which can not only realize the flexibility of model application, but also make up for the uninterpretable defects of the model. Model interpretability methods can be used as a novel means of identifying variables that are more likely to be good predictors.

Collapse

Coupet M, Urruty T, Leelanupab T, Naudin M, Bourdon P, Maloigne CF, Guillevin R. A multi-sequences MRI deep framework study applied to glioma classfication. Multimed Tools Appl 2022;81:13563-13591. [PMID: 35250358 PMCID: PMC8882719 DOI: 10.1007/s11042-022-12316-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 09/02/2021] [Accepted: 01/17/2022] [Indexed: 06/14/2023]

Gao F, Shen Y, Brett Sallach J, Li H, Zhang W, Li Y, Liu C. Predicting crop root concentration factors of organic contaminants with machine learning models. J Hazard Mater 2022;424:127437. [PMID: 34678561 DOI: 10.1016/j.jhazmat.2021.127437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/22/2021] [Revised: 09/15/2021] [Accepted: 10/03/2021] [Indexed: 06/13/2023]

Susnjak T, Ramaswami GS, Mathrani A. Learning analytics dashboard: a tool for providing actionable insights to learners. Int J Educ Technol High Educ 2022;19:12. [PMID: 35194560 PMCID: PMC8853217 DOI: 10.1186/s41239-021-00313-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/23/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]

Abbas A, O'Byrne C, Fu DJ, Moraes G, Balaskas K, Struyven R, Beqiri S, Wagner SK, Korot E, Keane PA. Evaluating an automated machine learning model that predicts visual acuity outcomes in patients with neovascular age-related macular degeneration. Graefes Arch Clin Exp Ophthalmol 2022. [PMID: 35122132 DOI: 10.1007/s00417-021-05544-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Revised: 11/10/2021] [Accepted: 12/27/2021] [Indexed: 01/01/2023] Open

Abstract

Purpose

Neovascular age-related macular degeneration (nAMD) is a major global cause of blindness. Whilst anti-vascular endothelial growth factor (anti-VEGF) treatment is effective, response varies considerably between individuals. Thus, patients face substantial uncertainty regarding their future ability to perform daily tasks. In this study, we evaluate the performance of an automated machine learning (AutoML) model which predicts visual acuity (VA) outcomes in patients receiving treatment for nAMD, in comparison to a manually coded model built using the same dataset. Furthermore, we evaluate model performance across ethnic groups and analyse how the models reach their predictions.

Methods

Binary classification models were trained to predict whether patients’ VA would be ‘Above’ or ‘Below’ a score of 70 one year after initiating treatment, measured using the Early Treatment Diabetic Retinopathy Study (ETDRS) chart. The AutoML model was built using the Google Cloud Platform, whilst the bespoke model was trained using an XGBoost framework. Models were compared and analysed using the What-if Tool (WIT), a novel model-agnostic interpretability tool.

Results

Our study included 1631 eyes from patients attending Moorfields Eye Hospital. The AutoML model (area under the curve [AUC], 0.849) achieved a highly similar performance to the XGBoost model (AUC, 0.847). Using the WIT, we found that the models over-predicted negative outcomes in Asian patients and performed worse in those with an ethnic category of Other. Baseline VA, age and ethnicity were the most important determinants of model predictions. Partial dependence plot analysis revealed a sigmoidal relationship between baseline VA and the probability of an outcome of ‘Above’.

Conclusion

We have described and validated an AutoML-WIT pipeline which enables clinicians with minimal coding skills to match the performance of a state-of-the-art algorithm and obtain explainable predictions.

Supplementary Information

The online version contains supplementary material available at 10.1007/s00417-021-05544-y.

Collapse

Romeo L, Frontoni E. A Unified Hierarchical XGBoost model for classifying priorities for COVID-19 vaccination campaign. Pattern Recognit 2022;121:108197. [PMID: 34312570 PMCID: PMC8295058 DOI: 10.1016/j.patcog.2021.108197] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Revised: 06/21/2021] [Accepted: 07/20/2021] [Indexed: 05/03/2023]

Pucci F, Schwersensky M, Rooman M. Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021;72:161-8. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]

Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods Mol Biol 2021;2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Ferré Q, Chèneby J, Puthier D, Capponi C, Ballester B. Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders. BMC Bioinformatics 2021;22:460. [PMID: 34563116 PMCID: PMC8467021 DOI: 10.1186/s12859-021-04359-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2020] [Revised: 06/04/2021] [Accepted: 08/09/2021] [Indexed: 11/13/2022] Open

Tideman LEM, Migas LG, Djambazova KV, Patterson NH, Caprioli RM, Spraggins JM, Van de Plas R. Automated biomarker candidate discovery in imaging mass spectrometry data through spatially localized Shapley additive explanations. Anal Chim Acta 2021;1177:338522. [PMID: 34482894 PMCID: PMC10124144 DOI: 10.1016/j.aca.2021.338522] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Revised: 04/04/2021] [Accepted: 04/11/2021] [Indexed: 01/09/2023]

Abstract

The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high-dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios, and that quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species' biomarker potential, our workflow delivers spatially localized explanations of the classification model's decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting in which regions of the tissue sample each feature provides discriminative information and in which regions it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative. Our automated approach to estimating a molecular species' potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows for the rapid screening of thousands of molecular species and the obtention of a broader biomarker candidate shortlist than would be possible through targeted manual assessment. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.

Collapse

Walakira A, Rozman D, Režen T, Mraz M, Moškon M. Guided extraction of genome-scale metabolic models for the integration and analysis of omics data. Comput Struct Biotechnol J 2021;19:3521-3530. [PMID: 34194675 PMCID: PMC8225705 DOI: 10.1016/j.csbj.2021.06.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Revised: 06/04/2021] [Accepted: 06/04/2021] [Indexed: 02/05/2023] Open

Lewis A, Stoyanovich J. Teaching Responsible Data Science: Charting New Pedagogical Territory. Int J Artif Intell Educ 2021;32:783-807. [PMID: 33880114 PMCID: PMC8049623 DOI: 10.1007/s40593-021-00241-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/10/2021] [Indexed: 11/29/2022]

Sushil M, Šuster S, Luyckx K, Daelemans W. Patient representation learning and interpretable evaluation using clinical notes. J Biomed Inform 2018;84:103-13. [PMID: 29966746 DOI: 10.1016/j.jbi.2018.06.016] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2018] [Revised: 06/07/2018] [Accepted: 06/28/2018] [Indexed: 11/22/2022]

Rios A, Kavuluru R. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores. J Biomed Inform 2017;75S:S85-S93. [PMID: 28506904 PMCID: PMC5682241 DOI: 10.1016/j.jbi.2017.05.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2017] [Revised: 04/04/2017] [Accepted: 05/10/2017] [Indexed: 10/19/2022]

Abstract

BACKGROUND

The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task.

OBJECTIVE

Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes.

METHODS

We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions.

RESULTS

Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision.

CONCLUSION

In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches.

Collapse

Jovanovic M, Radovanovic S, Vukicevic M, Van Poucke S, Delibasic B. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression. Artif Intell Med 2016;72:12-21. [PMID: 27664505 DOI: 10.1016/j.artmed.2016.07.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2015] [Revised: 07/23/2016] [Accepted: 07/25/2016] [Indexed: 11/18/2022]

Abstract

OBJECTIVES

Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions.

MATERIALS AND METHODS

The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy.

RESULTS

The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have similar performances reaching AUC values 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectfully. However, information loss of Lasso models is 0.35 bits higher compared to Tree-Lasso model.

CONCLUSIONS

We propose a method for building predictive models applicable for the detection of readmission risk based on Electronic Health records. Integration of domain knowledge (in the form of ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso Logistic Regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in general pediatric population in California, as well as several important subpopulations, and the interpretations of models comply with existing medical understanding of pediatric readmission. Finally, quantitative assessment of the interpretability of the models is given, that is beyond simple counts of selected low-level features.

Collapse