1. Integrating Explainable Machine Learning in Clinical Decision Support Systems: Study Involving a Modified Design Thinking Approach. JMIR Form Res 2024; 8:e50475. PMID: 38625728; PMCID: PMC11061789; DOI: 10.2196/50475.
Abstract
BACKGROUND Though there has been considerable effort to implement machine learning (ML) methods for health care, clinical implementation has lagged. Incorporating explainable machine learning (XML) methods through the development of a decision support tool using a design thinking approach is expected to lead to greater uptake of such tools. OBJECTIVE This work aimed to explore how constant engagement of clinician end users can address the lack of adoption of ML tools in clinical contexts due to their lack of transparency, and to address challenges related to presenting explainability in a decision support interface. METHODS We used a design thinking approach augmented with additional theoretical frameworks to provide more robust approaches to different phases of design. In particular, in the problem definition phase, we incorporated the nonadoption, abandonment, scale-up, spread, and sustainability of technology in health care (NASSS) framework to assess these aspects in a health care network. This process helped focus development on a prognostic tool that predicted the likelihood of admission to an intensive care ward based on disease severity in chest x-ray images. In the ideate, prototype, and test phases, we incorporated a metric framework to assess physician trust in artificial intelligence (AI) tools. This allowed us to compare physicians' assessments of the domain representation, actionability, and consistency of the tool. RESULTS Physicians found the design of the prototype elegant and felt that the tool displayed a domain-appropriate representation of the data. They appreciated the simplified explainability overlay, which displayed only the most predictive patches that cumulatively explained 90% of the final admission risk score. Finally, in terms of consistency, physicians unanimously appreciated the capacity to compare multiple x-ray images in the same view. They also appreciated the ability to toggle the explainability overlay on and off, which made it easier for them to assess how consistently the tool identified elements of the x-ray image they felt would contribute to overall disease severity. CONCLUSIONS The adopted approach is situated in an evolving space concerned with incorporating XML or AI technologies into health care software. We addressed the alignment of AI as it relates to clinician trust, describing an approach to wireframing and prototyping that incorporates a theoretical framework for trust in the design process itself. Moreover, we propose that alignment of AI depends on the integration of end users throughout the larger design process. Our work shows the importance and value of engaging end users prior to tool development. We believe that the described approach is a unique and valuable contribution that outlines a direction for ML experts, user experience designers, and clinician end users on how to collaborate in creating trustworthy and usable XML-based clinical decision support tools.
2. Automated machine learning-based model for the prediction of pedicle screw loosening after degenerative lumbar fusion surgery. Biosci Trends 2024; 18:83-93. PMID: 38417874; DOI: 10.5582/bst.2023.01327.
Abstract
The adequacy of screw anchorage is a critical factor in achieving successful spinal fusion. This study aimed to use machine learning algorithms to identify critical variables and predict pedicle screw loosening after degenerative lumbar fusion surgery. A total of 552 patients who underwent primary transpedicular lumbar fixation for lumbar degenerative disease were included. Patient clinical characteristics, intraoperative variables, and radiographic parameters were collected, and the data were divided into a training set (80% of participants) and a test set (20% of participants) to construct eight machine learning models. The LASSO method identified key features associated with pedicle screw loosening. The XGBoost model exhibited the best performance, with an AUC of 0.884 (95% CI: 0.825-0.944) in the test set, along with the lowest Brier score. Ten crucial variables were selected: age, a diagnosis of degenerative scoliosis, number of fused levels, fixation to S1, HU value, preoperative PT, preoperative PI-LL, postoperative LL, postoperative PT, and postoperative PI-LL. In the prospective cohort, the XGBoost model demonstrated substantial performance, with an accuracy of 83.32%. This study identified crucial variables associated with pedicle screw loosening after degenerative lumbar fusion surgery and successfully developed a machine learning model to predict it. The findings may provide valuable information for clinical decision-making.
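The abstract above selects its model by AUC and Brier score. As a rough, self-contained illustration of what those two numbers measure (not the authors' code; the labels and probabilities below are invented):

```python
from itertools import product

def auc(y_true, y_prob):
    """ROC AUC via its Mann-Whitney interpretation: the probability that a
    random positive is scored above a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Brier score: mean squared gap between predicted probability and the
    0/1 outcome (lower is better; rewards calibration, not just ranking)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

y = [1, 0, 1, 0, 1, 0]               # invented labels (1 = loosening)
p = [0.9, 0.2, 0.7, 0.4, 0.6, 0.3]   # invented predicted probabilities
print(auc(y, p))   # 1.0: every positive is ranked above every negative
print(brier(y, p))
```

AUC rewards ranking patients correctly regardless of calibration, while the Brier score penalizes miscalibrated probabilities, which is presumably why the study reports both.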
3. Explainable machine learning in outcome prediction of high-grade aneurysmal subarachnoid hemorrhage. Aging (Albany NY) 2024; 16:4654-4669. PMID: 38431285; DOI: 10.18632/aging.205621.
Abstract
OBJECTIVE Accurate prognostic prediction in patients with high-grade aneurysmal subarachnoid hemorrhage (aSAH) is essential for personalized treatment. In this study, we developed an interpretable prognostic machine learning model for high-grade aSAH patients using SHapley Additive exPlanations (SHAP). METHODS A prospective registry cohort of high-grade aSAH patients was collected at a single-center hospital. The endpoint of our study was the 12-month follow-up outcome. The dataset was divided into training and validation sets in a 7:3 ratio. Machine learning algorithms, including logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were employed to develop a prognostic prediction model for high-grade aSAH. The optimal model was selected for SHAP analysis. RESULTS Among the 421 patients, 204 (48.5%) exhibited poor prognosis. With an AUC of 0.867 (95% CI: 0.806-0.929), the RF model demonstrated superior performance compared to LR (AUC = 0.850, 95% CI: 0.783-0.918), SVM (AUC = 0.862, 95% CI: 0.799-0.926), and XGBoost (AUC = 0.850, 95% CI: 0.783-0.917). The primary prognostic features identified through SHAP analysis, namely higher World Federation of Neurosurgical Societies (WFNS) grade, higher modified Fisher score (mFS), and advanced age, were associated with unfavorable 12-month outcomes, while treatment of the aSAH by coiling embolization drove the prediction toward a favorable prognosis. Additionally, SHAP force plots visualized individual prognosis predictions. CONCLUSIONS This study demonstrated the potential of machine learning techniques for prognostic prediction in high-grade aSAH patients. The features identified through SHAP analysis enhance model interpretability and provide guidance for clinical decision-making.
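SHAP values are the Shapley values of cooperative game theory applied to a model's features. A brute-force sketch of the underlying idea, feasible only for a handful of features and purely illustrative (the additive toy "risk model" below is invented, not the paper's RF):

```python
from itertools import permutations
from math import factorial

def shapley_values(predict, features):
    """Exact Shapley values: average each feature's marginal contribution
    to the prediction over every possible ordering of the features."""
    names = list(features)
    phi = {f: 0.0 for f in names}
    for order in permutations(names):
        present = {}
        prev = predict(present)
        for f in order:
            present[f] = features[f]
            cur = predict(present)
            phi[f] += cur - prev
            prev = cur
    n_orders = factorial(len(names))
    return {f: v / n_orders for f, v in phi.items()}

# Invented additive "risk model"; for additive models the Shapley value
# of each feature equals its own term exactly.
def risk(subset):
    return (2.0 * subset.get("wfns", 0)
            + 1.0 * subset.get("mfs", 0)
            + 0.5 * subset.get("age", 0))

print(shapley_values(risk, {"wfns": 1, "mfs": 1, "age": 1}))
# {'wfns': 2.0, 'mfs': 1.0, 'age': 0.5}
```

The SHAP library replaces this factorial-time enumeration with fast exact algorithms for tree ensembles, which is what makes per-patient force plots like those in the study practical.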
4. Clarifying Cognitive Control Deficits in Psychosis via Drift Diffusion Modeling and Attractor Dynamics. Schizophr Bull 2024: sbae014. PMID: 38408151; DOI: 10.1093/schbul/sbae014.
Abstract
BACKGROUND AND HYPOTHESIS Cognitive control deficits are prominent in individuals with psychotic psychopathology. Studies providing evidence for deficits in proactive control generally examine average performance rather than variation across trials for individuals, potentially obscuring detection of essential contributors to cognitive control. Here, we leverage intertrial variability through drift-diffusion models (DDMs), aiming to identify key contributors to cognitive control deficits in psychosis. STUDY DESIGN People with psychosis (PwP; N = 122), their first-degree biological relatives (N = 78), and controls (N = 50) each completed 120 trials of the dot pattern expectancy (DPX) cognitive control task. We fit full hierarchical DDMs to response and reaction time (RT) data for individual trials and then used classification models to compare the DDM parameters with conventional measures of proactive and reactive control. STUDY RESULTS PwP demonstrated slower drift rates on proactive control trials, suggesting less efficient use of cue information. Both PwP and relatives showed protracted nondecision times on infrequent trial sequences, suggesting slowed perceptual processing. Classification analyses indicated that DDM parameters differentiated between the groups better than conventional measures and identified drift rates during proactive control, nondecision time during reactive control, and cue bias as most important. DDM parameters were associated with real-world functioning and schizotypal traits. CONCLUSIONS Modeling of trial-level data revealed that slow evidence accumulation and longer preparatory periods are the strongest contributors to cognitive control deficits in psychotic psychopathology. This pattern of atypical responding during the DPX is consistent with shallow basins in attractor dynamic models that reflect difficulties in maintaining state representations, possibly mediated by excess neural excitation or poor connectivity.
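A drift-diffusion trial can be simulated in a few lines. The sketch below is a generic, illustrative DDM (parameters chosen arbitrarily, not fitted to the DPX data) showing the three quantities the study estimates: drift rate (evidence accumulation speed), boundary (caution), and nondecision time:

```python
import random

def simulate_ddm(drift, boundary=1.0, ndt=0.3, dt=0.001, noise=1.0, rng=None):
    """One drift-diffusion trial: evidence random-walks from 0 until it
    crosses +boundary (response 1) or -boundary (response 0); the RT adds
    a fixed nondecision time covering perception and motor output."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    step_sd = noise * dt ** 0.5
    while abs(x) < boundary:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (1 if x > 0 else 0), ndt + t

rng = random.Random(7)
trials = [simulate_ddm(drift=1.5, rng=rng) for _ in range(200)]
accuracy = sum(resp for resp, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(accuracy, mean_rt)  # positive drift: accuracy well above 0.5; RT > ndt
```

Fitting inverts this generative process: hierarchical DDMs, as used in the study, recover drift, boundary, and nondecision time per subject from the joint distribution of responses and RTs.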
5. A stroke prediction framework using explainable ensemble learning. Comput Methods Biomech Biomed Engin 2024: 1-20. PMID: 38384147; DOI: 10.1080/10255842.2024.2316877.
Abstract
A stroke occurs when blood flow to a particular area of the brain is abruptly cut off, causing brain cells to die. Early recognition of stroke symptoms is essential to prevent strokes and promote a healthy lifestyle. FAST tests (looking for abnormalities in the face, arms, and speech) have limitations in reliability and accuracy for diagnosing strokes. This research employs machine learning (ML) techniques to develop and assess multiple ML models to establish a robust stroke risk prediction framework. It uses a stacking-based ensemble method to select the best three ML models and combine their collective intelligence. An empirical evaluation on a publicly available stroke prediction dataset demonstrates the superior performance of the proposed stacking-based ensemble model, with only one misclassification. The experimental results reveal that the proposed stacking model surpasses other state-of-the-art research, achieving accuracy, precision, and F1-score of 99.99%, recall of 100%, and receiver operating characteristic (ROC) AUC, Matthews correlation coefficient (MCC), and Kappa scores of 1.0. Furthermore, Shapley Additive Explanations (SHAP) are employed to analyze the predictions of the black-box ML models. The findings highlight that age, BMI, and glucose level are the most significant risk factors for stroke prediction. These findings contribute to the development of more efficient techniques for stroke prediction, potentially saving many lives.
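The abstract reports MCC and Kappa scores of 1.0. As an illustrative sketch of how these two chance-corrected agreement metrics are computed from a binary confusion matrix (the example labels are invented):

```python
def confusion(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def mcc(y_true, y_pred):
    """Matthews correlation coefficient: +1 perfect, 0 chance, -1 inverted."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0

def kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
print(mcc(y_true, y_pred), kappa(y_true, y_pred))  # both 1/3 here
```

Unlike raw accuracy, both metrics stay near zero for a classifier that merely guesses the majority class, which is why they are favoured on imbalanced medical datasets such as stroke registries.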
6. Value of multi-center 18F-FDG PET/CT radiomics in predicting EGFR mutation status in lung adenocarcinoma. Med Phys 2024. PMID: 38285641; DOI: 10.1002/mp.16947.
Abstract
BACKGROUND Accurate, noninvasive, and reliable assessment of epidermal growth factor receptor (EGFR) mutation status and EGFR molecular subtypes is essential for treatment plan selection and individualized therapy in lung adenocarcinoma (LUAD). Radiomics models based on 18F-FDG PET/CT have great potential for identifying EGFR mutation status and EGFR subtypes in patients with LUAD. Validation on multi-center data, model visualization, and interpretation are critically important for the management, application, and trust of machine learning predictive models. However, few EGFR-related studies have involved model visualization, interpretation, or multi-center trials. PURPOSE To develop explainable optimal predictive models based on handcrafted radiomics features (HRFs) extracted from multi-center 18F-FDG PET/CT to predict EGFR mutation status and molecular subtypes in LUAD. METHODS Baseline 18F-FDG PET/CT images of 383 LUAD patients from three hospitals and one public data set were collected. A total of 1808 HRFs were extracted from the primary tumor regions using Pyradiomics. Predictive models were built from cross-combinations of seven feature selection methods and seven machine learning algorithms. Yellowbrick and explainable artificial intelligence techniques were used for model visualization and interpretation. Receiver operating characteristic curves, classification reports, and confusion matrices were used for model performance evaluation. The clinical applicability of the optimal models was assessed by decision curve analysis. RESULTS The STACK feature selection method combined with the light gradient boosting machine (LGBM) reached optimal performance in identifying EGFR mutation status (area under the curve [AUC] = 0.81 in the internal test cohort; AUC = 0.62 in the external test cohort). The random forest feature selection method combined with LGBM reached optimal performance in predicting EGFR mutation molecular subtypes (AUC = 0.89 in the internal test cohort; AUC = 0.61 in the external test cohort). CONCLUSIONS Explainable machine learning models combining radiomics features extracted from multi-center/scanner 18F-FDG PET/CT show potential to identify EGFR mutation status and subtypes in LUAD, which might be helpful for the treatment of LUAD.
7. Satellite-Based Global Sea Surface Oxygen Mapping and Interpretation with Spatiotemporal Machine Learning. Environ Sci Technol 2024; 58:498-509. PMID: 38103020; DOI: 10.1021/acs.est.3c08833.
Abstract
The assessment of dissolved oxygen (DO) concentration at the sea surface is essential for comprehending the global ocean oxygen cycle and associated environmental and biochemical processes, as the surface serves as the primary site for photosynthesis and sea-air exchange. However, limited comprehensive measurements and imprecise numerical simulations have impeded the study of global sea surface DO and its relationship with environmental challenges. This paper presents a novel spatiotemporal information embedding machine-learning framework that provides explanatory insights into the underlying driving mechanisms. By integrating extensive in situ data and high-resolution satellite data, the proposed framework successfully generated high-resolution (0.25° × 0.25°) estimates of DO concentration with exceptional accuracy (R2 = 0.95, RMSE = 11.95 μmol/kg, test n = 2805) for near-global sea surface areas from 2010 to 2018, with an uncertainty estimated at ±13.02 μmol/kg. The resulting sea surface DO data set exhibits precise spatial distribution and reveals compelling correlations with prominent marine phenomena and environmental stressors. Leveraging its interpretability, our model further revealed the key influences of marine factors on surface DO and their implications for environmental issues. The presented machine-learning framework offers an improved DO data set with higher resolution, facilitating the exploration of oceanic DO variability, deoxygenation phenomena, and their potential consequences for the environment.
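The abstract does not specify how the "spatiotemporal information embedding" is constructed. One common way to feed location and season to a learner, shown purely as an assumed illustration rather than the paper's method, is to encode the cyclic coordinates as sine/cosine pairs so that periodic neighbours stay close in feature space:

```python
import math

def spatiotemporal_embedding(lat, lon, day_of_year):
    """Encode location and season for a learner: longitude and day-of-year
    are cyclic, so each maps to a (sin, cos) pair; latitude is bounded,
    so plain scaling suffices. Hypothetical helper, not the paper's code."""
    lon_rad = math.radians(lon)
    day_ang = 2 * math.pi * (day_of_year - 1) / 365.0
    return [
        lat / 90.0,
        math.sin(lon_rad), math.cos(lon_rad),
        math.sin(day_ang), math.cos(day_ang),
    ]

# Dec 31 / Jan 1 and lon 179° / -179° land next to each other,
# unlike their raw coordinate values.
a = spatiotemporal_embedding(10.0, 179.0, 365)
b = spatiotemporal_embedding(10.0, -179.0, 1)
dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
print(dist)  # small: the two points are true neighbours in space and season
```

Without such an encoding, a model sees December 31 and January 1 (or the two sides of the antimeridian) as maximally distant, which distorts any spatiotemporal interpolation.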
8. Recognizing protected and anthropogenic patterns in landscapes using interpretable machine learning and satellite imagery. Front Artif Intell 2023; 6:1278118. PMID: 38106982; PMCID: PMC10725256; DOI: 10.3389/frai.2023.1278118.
Abstract
The accurate and comprehensive mapping of land cover has become a central task in modern environmental research, with increasing emphasis on machine learning approaches. However, a clear technical definition of a land cover class is a prerequisite for learning and applying a machine learning model. Naturalness and human influence constitute one of the more challenging classes, yet mapping them is important due to their critical role in biodiversity conservation, habitat assessment, and climate change monitoring. We present an interpretable machine learning approach to map patterns related to territorially protected and anthropogenic areas, as proxies of naturalness and human influence, using satellite imagery. To achieve this, we train a weakly supervised convolutional neural network and subsequently apply attribution methods such as Grad-CAM and occlusion sensitivity mapping. We propose a novel network architecture that consists of an image-to-image network and a shallow, task-specific head. Both sub-networks are connected by an intermediate layer that captures high-level features in full resolution, allowing for detailed analysis with a wide range of attribution methods. We further analyze how intermediate layer activations relate to their attributions across the training dataset to establish a consistent relationship. This makes attributions consistent across different scenes and allows for a large-scale analysis of remote sensing data. The results highlight that our approach is a promising way to observe and assess naturalness and territorial protection.
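Occlusion sensitivity mapping, one of the attribution methods named above, can be sketched generically: mask each region of the input, re-score, and attribute the score drop to that region. The toy "model" below is invented purely for illustration and stands in for the CNN:

```python
def occlusion_map(image, score_fn, patch=1, baseline=0.0):
    """Occlusion sensitivity: replace each patch with a baseline value,
    re-run the model, and record how much the score drops. Large drops
    mark regions the prediction depends on."""
    h, w = len(image), len(image[0])
    ref = score_fn(image)
    heat = [[0.0] * w for _ in range(h)]
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = [row[:] for row in image]
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    masked[di][dj] = baseline
            drop = ref - score_fn(masked)
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    heat[di][dj] = drop
    return heat

# Invented toy "model" that only reads the centre pixel.
img = [[0.0, 0.0, 0.0],
       [0.0, 5.0, 0.0],
       [0.0, 0.0, 0.0]]
heat = occlusion_map(img, score_fn=lambda im: im[1][1])
print(heat)  # 5.0 at the centre, 0.0 everywhere else
```

Unlike gradient-based methods such as Grad-CAM, occlusion needs no access to the network's internals, only repeated forward passes, which makes it a useful cross-check on learned attributions.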
9. Interpretable machine learning-based predictive modeling of patient outcomes following cardiac surgery. J Thorac Cardiovasc Surg 2023: S0022-5223(23)01110-8. PMID: 38040328; DOI: 10.1016/j.jtcvs.2023.11.034.
Abstract
BACKGROUND The clinical applicability of machine learning predictions of patient outcomes following cardiac surgery remains unclear. We applied machine learning to predict patient outcomes associated with high morbidity and mortality after cardiac surgery and identified the importance of variables to the derived model's performance. METHODS We applied machine learning to the Society of Thoracic Surgeons Adult Cardiac Surgery Database to predict postoperative hemorrhage requiring reoperation, venous thromboembolism (VTE), and stroke. We used permutation feature importance to identify variables important to model performance and a misclassification analysis to study the limitations of the model. RESULTS The study dataset included 662,772 subjects who underwent cardiac surgery between 2015 and 2017 and 240 variables. Hemorrhage requiring reoperation, VTE, and stroke occurred in 2.9%, 1.2%, and 2.0% of subjects, respectively. The model performed remarkably well at predicting all 3 complications (area under the receiver operating characteristic curve, 0.92-0.97). Preoperative and intraoperative variables were not important to model performance; instead, performance for the prediction of all 3 outcomes was driven primarily by several postoperative variables, including known risk factors for the complications, such as mechanical ventilation and new onset of postoperative arrhythmias. Many of the postoperative variables important to model performance also increased the risk of subject misclassification, indicating internal validity. CONCLUSIONS A machine learning model accurately and reliably predicts patient outcomes following cardiac surgery. Postoperative, as opposed to preoperative or intraoperative, variables are important to model performance. Interventions targeting this period, including minimizing the duration of mechanical ventilation and early treatment of new-onset postoperative arrhythmias, may help lower the risk of these complications.
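Permutation feature importance, as used in the study, scores a feature by how much shuffling its column degrades model performance. A minimal generic sketch (the toy model, data, and accuracy score are invented for illustration):

```python
import random

def permutation_importance(score_fn, X, y, n_repeats=20, rng=None):
    """Shuffle one column at a time and average how much the score on
    (X, y) drops relative to the unshuffled baseline."""
    rng = rng or random.Random(0)
    base = score_fn(X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(base - score_fn(Xp, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Invented toy "model": thresholds feature 0 and ignores feature 1 entirely.
def accuracy(X, y):
    return sum(1 for row, t in zip(X, y) if (row[0] > 0.5) == bool(t)) / len(y)

X = [[0, 9], [1, 9], [0, 9], [1, 9], [0, 9], [1, 9]]
y = [0, 1, 0, 1, 0, 1]
imp = permutation_importance(accuracy, X, y)
print(imp)  # feature 0 important; feature 1 exactly 0
```

Because the technique only needs repeated scoring, it applies to any fitted model, which is what lets a study of this scale rank 240 registry variables without retraining.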
10. Exploration of Solid Solutions and the Strengthening of Aluminum Substrates by Alloying Atoms: Machine Learning Accelerated Density Functional Theory Calculations. Materials (Basel) 2023; 16:6757. PMID: 37895739; PMCID: PMC10608410; DOI: 10.3390/ma16206757.
Abstract
In this paper, we studied the effects of a series of alloying atoms on the stability and micromechanical properties of aluminum alloy using a machine learning accelerated first-principles approach. In our preliminary work, high-throughput first-principles calculations were performed, and the solution energies and theoretical stresses of atomically doped aluminum substrates were extracted as basic data. Comparing five different algorithms, we found that the CatBoost model had the lowest RMSE (0.24) and lowest MAPE (6.34), and it was used as the final prediction model for the solid solution strengthening of the aluminum matrix by the elements. Calculations show that alloying atoms such as K, Na, Y, and Tl are difficult to dissolve in the aluminum matrix, whereas atoms such as Sc, Cu, B, Zr, Ni, Ti, Nb, V, Cr, Mn, Mo, and W exert a strengthening influence. These theoretical studies of solid solutions and the strengthening effect of various alloying atoms in an aluminum matrix can offer guidance for the subsequent selection of suitable alloy elements, and they unveil fundamental aspects of the solution strengthening effect, contributing to the expedited development of new aluminum alloys.
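Model selection by RMSE and MAPE, as used to pick CatBoost above, reduces to two one-line metrics. A generic sketch with invented target values and predictions:

```python
def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors quadratically."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5

def mape(y_true, y_pred):
    """Mean absolute percentage error: scale-free, but undefined at t = 0."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

y = [1.0, 2.0, 4.0]            # invented "true" solution energies
pred_a = [1.1, 1.9, 4.2]       # invented predictions from model A
pred_b = [1.5, 2.5, 3.0]       # invented predictions from model B
for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, rmse(y, pred), mape(y, pred))  # A wins on both metrics
```

Reporting both is common practice because RMSE is dominated by the largest absolute errors while MAPE weights errors relative to each target's magnitude.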
11. Explainable machine learning for diffraction patterns. J Appl Crystallogr 2023; 56:1494-1504. PMID: 37791364; PMCID: PMC10543671; DOI: 10.1107/s1600576723007446.
Abstract
Serial crystallography experiments at X-ray free-electron laser facilities produce massive amounts of data but only a fraction of these data are useful for downstream analysis. Thus, it is essential to differentiate between acceptable and unacceptable data, generally known as 'hit' and 'miss', respectively. Image classification methods from artificial intelligence, or more specifically convolutional neural networks (CNNs), classify the data into hit and miss categories in order to achieve data reduction. The quantitative performance established in previous work indicates that CNNs successfully classify serial crystallography data into desired categories [Ke, Brewster, Yu, Ushizima, Yang & Sauter (2018). J. Synchrotron Rad. 25, 655-670], but no qualitative evidence on the internal workings of these networks has been provided. For example, there are no visualization methods that highlight the features contributing to a specific prediction while classifying data in serial crystallography experiments. Therefore, existing deep learning methods, including CNNs classifying serial crystallography data, are like a 'black box'. To this end, presented here is a qualitative study to unpack the internal workings of CNNs with the aim of visualizing information in the fundamental blocks of a standard network with serial crystallography data. The region(s) or part(s) of an image that mostly contribute to a hit or miss prediction are visualized.
12. A Fast and Minimal System to Identify Depression Using Smartphones: Explainable Machine Learning-Based Approach. JMIR Form Res 2023; 7:e28848. PMID: 37561568; PMCID: PMC10450542; DOI: 10.2196/28848.
Abstract
BACKGROUND Existing robust, pervasive device-based systems developed in recent years to detect depression require data collected over a long period and may not be effective in cases where early detection is crucial. Additionally, due to the requirement of running systems in the background for prolonged periods, existing systems can be resource inefficient. As a result, these systems can be infeasible in low-resource settings. OBJECTIVE Our main objective was to develop a minimalistic system to identify depression using data retrieved in the fastest possible time. Another objective was to explain the machine learning (ML) models that were best for identifying depression. METHODS We developed a fast tool that retrieves the past 7 days' app usage data in 1 second (mean 0.31, SD 1.10 seconds). A total of 100 students from Bangladesh participated in our study, and our tool collected their app usage data and responses to the Patient Health Questionnaire-9. To identify depressed and nondepressed students, we developed a diverse set of ML models: linear, tree-based, and neural network-based models. We selected important features using the stable approach, along with 3 main types of feature selection (FS) approaches: filter, wrapper, and embedded methods. We developed and validated the models using the nested cross-validation method. Additionally, we explained the best ML models through the Shapley additive explanations (SHAP) method. RESULTS Leveraging only the app usage data retrieved in 1 second, our light gradient boosting machine model used the important features selected by the stable FS approach and correctly identified 82.4% (n=42) of depressed students (precision=75%, F1-score=78.5%). Moreover, after comprehensive exploration, we presented a parsimonious stacking model where around 5 features selected by the all-relevant FS approach Boruta were used in each iteration of validation and showed a maximum precision of 77.4% (balanced accuracy=77.9%). 
Feature importance analysis suggested app usage behavioral markers containing diurnal usage patterns as being more important than aggregated data-based markers. In addition, a SHAP analysis of our best models presented behavioral markers that were related to depression. For instance, students who were not depressed spent more time on education apps on weekdays, whereas those who were depressed used a higher number of photo and video apps and also had a higher deviation in using photo and video apps over the morning, afternoon, evening, and night time periods of the weekend. CONCLUSIONS Due to our system's fast and minimalistic nature, it may make a worthwhile contribution to identifying depression in underdeveloped and developing regions. In addition, our detailed discussion about the implication of our findings can facilitate the development of less resource-intensive systems to better understand students who are depressed and take steps for intervention.
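Nested cross-validation, as used above to develop and validate the models, keeps each outer test fold out of all model selection done on the inner folds. An index-only sketch (contiguous folds for clarity; the study's actual splitting strategy may differ):

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nested_cv(n, outer_k, inner_k):
    """Yield (train, test, inner_splits); inner folds are carved only from
    the outer training set, so test data never influences model selection."""
    for test in kfold_indices(n, outer_k):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        inner = []
        for val_pos in kfold_indices(len(train), inner_k):
            val = [train[p] for p in val_pos]
            val_set = set(val)
            fit = [i for i in train if i not in val_set]
            inner.append((fit, val))
        yield train, test, inner

for train, test, inner in nested_cv(n=12, outer_k=3, inner_k=2):
    print(len(train), len(test), [len(val) for _, val in inner])  # 8 4 [4, 4]
```

Hyperparameters and feature subsets are tuned on the inner (fit, val) pairs only; the outer test fold then gives an unbiased performance estimate, which matters greatly with only 100 participants.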
13. Too much information is no information: how machine learning and feature selection could help in understanding the motor control of pointing. Front Big Data 2023; 6:921355. PMID: 37546547; PMCID: PMC10399757; DOI: 10.3389/fdata.2023.921355.
Abstract
The aim of this study was to develop the use of machine learning techniques as a means of multivariate analysis in studies of motor control. These studies generate a huge amount of data, the analysis of which continues to be largely univariate. We propose the use of machine learning classification and feature selection as a means of uncovering feature combinations that are altered between conditions. High-dimensional electromyogram (EMG) vectors were generated as several arm and trunk muscles were recorded while subjects pointed at various angles above and below the gravity-neutral horizontal plane. We used Linear Discriminant Analysis (LDA) to carry out binary classifications between the EMG vectors for pointing at a particular angle vs. pointing in the gravity-neutral direction. Classification success provided a composite index of muscular adjustments for various task constraints, in this case pointing angles. To find the combination of features that were significantly altered between task conditions, we conducted a post-classification feature selection, i.e., we investigated which combination of features had allowed for the classification. Feature selection was done by comparing the representations of each category created by LDA for the classification, in other words, by computing the difference between the representations of each class. We propose that this approach will help with comparing high-dimensional EMG patterns in two ways: (i) by quantifying the effects of the entire pattern rather than using single, arbitrarily defined variables; and (ii) by identifying the parts of the patterns that convey the most information regarding the investigated effects.
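Fisher's LDA direction for a two-class problem points along the class-mean difference scaled by within-class variance, so comparing the class representations amounts to inspecting this weight vector. A minimal sketch under a simplifying diagonal-covariance assumption (the toy EMG-like data are invented; the study's full LDA uses the complete covariance matrix):

```python
def lda_direction(X0, X1, eps=1e-6):
    """Fisher discriminant direction under a diagonal-covariance
    assumption: w_j = (mean1_j - mean0_j) / pooled variance_j."""
    d = len(X0[0])

    def mean(X):
        return [sum(row[j] for row in X) / len(X) for j in range(d)]

    def var(X, m):
        return [sum((row[j] - m[j]) ** 2 for row in X) / (len(X) - 1)
                for j in range(d)]

    m0, m1 = mean(X0), mean(X1)
    pooled = [(a + b) / 2 + eps for a, b in zip(var(X0, m0), var(X1, m1))]
    return [(b - a) / s for a, b, s in zip(m0, m1, pooled)]

# Invented toy "EMG" data: feature 0 separates the classes, feature 1 is noise.
X0 = [[0.0, 5.1], [0.2, 4.9], [0.1, 5.0]]
X1 = [[1.0, 5.0], [1.2, 5.1], [1.1, 4.9]]
w = lda_direction(X0, X1)
print(w)  # the weight on feature 0 dwarfs the weight on feature 1
```

Large-magnitude entries of w mark the muscles (features) whose activity changes most reliably between conditions, which is the essence of the post-classification feature selection described above.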
14. Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification. Healthcare (Basel) 2023; 11:2000. PMID: 37510441; PMCID: PMC10379972; DOI: 10.3390/healthcare11142000.
Abstract
Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, there is an ongoing debate regarding the significance of these factors. Machine learning (ML) models and the Shapley Additive Explanation (SHAP) methodology can rank risk factors and provide explanatory model results. This study used ML algorithms with SHAP to analyze the risk factors between two different age groups and evaluate the impact of each factor in predicting positive mammography. The ML model was built using data from the risk factor questionnaires of women participating in a breast cancer screening program from 2017 to 2021. Three ML models, least absolute shrinkage and selection operator (lasso) logistic regression, extreme gradient boosting (XGBoost), and random forest (RF), were applied. RF generated the best performance. The SHAP values were then applied to the RF model for further analysis. The model identified age at menarche, education level, parity, breast self-examination, and BMI as the top five significant risk factors affecting mammography outcomes. Ranking the risk factors by age group showed that reproductive lifespan contributed more in the younger group, whereas BMI contributed more in the older group. The use of SHAP frameworks allows us to understand the relationships between risk factors and generate individualized risk factor rankings. This study provides avenues for further research and individualized medicine.
|
15
|
Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence. Diagnostics (Basel) 2023; 13:diagnostics13081506. [PMID: 37189606 DOI: 10.3390/diagnostics13081506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2023] [Revised: 04/13/2023] [Accepted: 04/15/2023] [Indexed: 05/17/2023] Open
Abstract
Polycystic ovary syndrome (PCOS) has been classified as a severe health problem common among women globally. Early detection and treatment of PCOS reduce the possibility of long-term complications, such as an increased risk of developing type 2 diabetes and gestational diabetes. Therefore, effective and early PCOS diagnosis will help healthcare systems reduce the disease's problems and complications. Machine learning (ML) and ensemble learning have recently shown promising results in medical diagnostics. The main goal of our research is to provide model explanations to ensure efficiency, effectiveness, and trust in the developed model through local and global explanations. Feature selection methods were combined with different types of ML models (logistic regression (LR), random forest (RF), decision tree (DT), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), XGBoost, and AdaBoost) to obtain the optimal feature subset and the best-performing model. Stacking ML models that combine the best base ML models with a meta-learner are proposed to improve performance. Bayesian optimization is used to tune the ML models, and combining SMOTE (Synthetic Minority Oversampling Technique) with ENN (Edited Nearest Neighbour) addresses the class imbalance. Experiments were conducted on a benchmark PCOS dataset with two train-test splits, 70:30 and 80:20. The results showed that the stacking ML model with RFE feature selection recorded the highest accuracy, at 100%, compared with the other models.
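The SMOTE half of the SMOTE+ENN resampling step generates synthetic minority samples by interpolating between a sample and one of its nearest neighbours. A minimal stdlib sketch (the `smote_like` helper and the toy points are hypothetical, and the ENN cleaning pass is omitted):

```python
import random

def smote_like(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority samples by interpolating between a
    sample and one of its k nearest neighbours (the core SMOTE idea).
    """
    rng = random.Random(seed)

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random point on the segment between x and nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority)
# every synthetic point lies on a segment between two real minority samples
```

ENN would then delete samples (synthetic or original) whose neighbourhood majority disagrees with their label, cleaning the class boundary that SMOTE can blur.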
|
16
|
A prediction model for asthma exacerbations after stopping asthma biologics. Ann Allergy Asthma Immunol 2023; 130:305-311. [PMID: 36509405 PMCID: PMC9992017 DOI: 10.1016/j.anai.2022.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND Little is known regarding the prediction of the risks of asthma exacerbation after stopping asthma biologics. OBJECTIVE To develop and validate a predictive model for the risk of asthma exacerbations after stopping asthma biologics using machine learning models. METHODS We identified 3057 people with asthma who stopped asthma biologics in the OptumLabs Database Warehouse and considered a wide range of demographic and clinical risk factors to predict subsequent outcomes. The primary outcome used to assess success after stopping was having no exacerbations in the 6 months after stopping the biologic. Elastic-net logistic regression (GLMnet), random forest, and gradient boosting machine models were used with 10-fold cross-validation within a development (80%) cohort and validation cohort (20%). RESULTS The mean age of the total cohort was 47.1 (SD, 17.1) years, 1859 (60.8%) were women, 2261 (74.0%) were White, and 1475 (48.3%) were in the Southern region of the United States. The elastic-net logistic regression model yielded an area under the curve (AUC) of 0.75 (95% confidence interval [CI], 0.71-0.78) in the development and an AUC of 0.72 in the validation cohort. The random forest model yielded an AUC of 0.75 (95% CI, 0.68-0.79) in the development cohort and an AUC of 0.72 in the validation cohort. The gradient boosting machine model yielded an AUC of 0.76 (95% CI, 0.72-0.80) in the development cohort and an AUC of 0.74 in the validation cohort. CONCLUSION Outcomes after stopping asthma biologics can be predicted with moderate accuracy using machine learning methods.
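The AUCs reported above can be computed directly as a rank statistic, the probability that a randomly chosen positive case scores above a randomly chosen negative one, without tracing the ROC curve. A small illustrative sketch on hypothetical scores:

```python
def auroc(scores, labels):
    """Area under the ROC curve computed as the normalized
    Mann-Whitney U statistic: the probability that a randomly chosen
    positive outranks a randomly chosen negative (ties count 1/2).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation gives 1.0; overlapping scores give less.
auc = auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])  # → 1.0
```

This quadratic version is fine for a sketch; production code sorts once and uses rank sums for O(n log n) cost.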
|
17
|
Improving Intensive Care Unit Early Readmission Prediction Using Optimized and Explainable Machine Learning. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:3455. [PMID: 36834150 PMCID: PMC9960143 DOI: 10.3390/ijerph20043455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Revised: 02/10/2023] [Accepted: 02/14/2023] [Indexed: 06/18/2023]
Abstract
It is of great interest to develop and introduce new techniques to automatically and efficiently analyze the enormous amount of data generated in today's hospitals, using state-of-the-art artificial intelligence methods. Patients readmitted to the ICU during the same hospital stay have a higher risk of mortality and morbidity, a longer length of stay, and increased cost. A methodology that predicts ICU readmission could therefore improve patient care. The objective of this work is to explore and evaluate the potential improvement of existing models for predicting early ICU patient readmission by using optimized artificial intelligence algorithms and explainability techniques. In this work, XGBoost is used as the predictor model, combined with Bayesian techniques to optimize it. The resulting model predicted early ICU readmission with an AUROC of 0.92 ± 0.03, improving on the state of the art reported in the consulted literature (whose AUROCs range between 0.66 and 0.78). Moreover, we explain the internal functioning of the model using Shapley Additive Explanation-based techniques, allowing us to understand the model's internal behavior and to obtain useful information, such as patient-specific explanations, the thresholds at which a feature begins to be critical for a certain group of patients, and the feature importance ranking.
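The hyperparameter optimization step has a simple skeleton: propose parameters, evaluate, keep the best. Random search is shown below for brevity; Bayesian optimization, as used in the paper, keeps the same loop but proposes candidates from a surrogate model of the objective instead of at random. The objective, search space, and parameter names are hypothetical:

```python
import random

def tune(objective, space, n_trials=30, seed=1):
    """Hyperparameter search loop. Random search is the simplest
    instance; Bayesian optimization replaces the uniform proposal
    with a surrogate-model-guided one but shares this structure.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective peaking at learning_rate=0.1, max_depth≈6 (hypothetical);
# in practice this would be cross-validated AUROC of an XGBoost model.
objective = lambda p: -((p["learning_rate"] - 0.1) ** 2 + (p["max_depth"] - 6.0) ** 2)
space = {"learning_rate": (0.01, 0.3), "max_depth": (2.0, 10.0)}
params, score = tune(objective, space)
```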
|
18
|
Editorial: Interpretable and explainable machine learning models in oncology. Front Oncol 2023; 13:1184428. [PMID: 37035194 PMCID: PMC10075249 DOI: 10.3389/fonc.2023.1184428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2023] [Accepted: 03/17/2023] [Indexed: 04/11/2023] Open
|
19
|
Effective Prediction and Important Counseling Experience for Perceived Helpfulness of Social Question and Answering-Based Online Counseling: An Explainable Machine Learning Model. Front Public Health 2022; 10:817570. [PMID: 36620293 PMCID: PMC9815621 DOI: 10.3389/fpubh.2022.817570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Accepted: 05/09/2022] [Indexed: 12/24/2022] Open
Abstract
Social question-answering-based online counseling (SQA-OC) offers easy access for people seeking professional mental health information and services, and has become a crucial pre-consultation and application stage on the path toward online counseling. However, little effort has been made to efficiently evaluate and explain counselors' service quality in this asynchronous online question-and-answer (QA) format. This study applied the notion of perceived helpfulness as the public's perception of counselors' service quality in SQA-OC, used computational linguistic and explainable machine learning (XML) methods suited for large-scale QA discourse analysis to build a predictive model, and explored how various sources and types of linguistic cues [i.e., Linguistic Inquiry and Word Count (LIWC), topic consistency, linguistic style similarity, and emotional similarity] contributed to perceived helpfulness. Results show that linguistic cues from counselees, from counselors, and from the synchrony between them are important predictors; that linguistic cues and XML can effectively predict and explain the perceived helpfulness of SQA-OC; and that they can support operational decision-making for counselors. Five helpful counseling experiences, comprising the linguistic styles of "talkative", "empathy", "thoughtful", "concise with distance", and "friendliness and confident", were identified in the SQA-OC. The paper proposes a method to evaluate the perceived helpfulness of SQA-OC services automatically, effectively, and explainably, shedding light on the understanding of SQA-OC service outcomes and the design of better mechanisms for SQA-OC systems.
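Linguistic style similarity of the LIWC flavor reduces to comparing category-frequency vectors of two texts, commonly with cosine similarity. A stdlib sketch; the `style_similarity` helper and the two-category mini-lexicon are hypothetical stand-ins for the full LIWC dictionary:

```python
from math import sqrt

def style_similarity(text_a, text_b, categories):
    """Cosine similarity between category-frequency vectors, a simple
    stand-in for LIWC-style linguistic style matching. `categories`
    maps a category name to a set of words (hypothetical mini-lexicon).
    """
    def vector(text):
        words = text.lower().split()
        return [sum(1 for w in words if w in categories[c]) / max(len(words), 1)
                for c in sorted(categories)]

    a, b = vector(text_a), vector(text_b)
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

lexicon = {
    "affect": {"worried", "glad", "afraid", "hope"},
    "social": {"you", "we", "friend", "together"},
}
sim = style_similarity("i feel worried and afraid", "we hope you feel glad", lexicon)
# both texts use affect words at the same rate; only one uses social words
```

Emotional similarity works the same way with sentiment scores in place of category frequencies; synchrony is the same comparison applied across counselee and counselor turns.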
|
20
|
XML-CIMT: Explainable Machine Learning (XML) Model for Predicting Chemical-Induced Mitochondrial Toxicity. Int J Mol Sci 2022; 23:ijms232415655. [PMID: 36555297 PMCID: PMC9779353 DOI: 10.3390/ijms232415655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2022] [Revised: 12/06/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022] Open
Abstract
Organ toxicity caused by chemicals is a serious problem in the development and use of chemicals such as medications, insecticides, chemical products, and cosmetics. In recent decades, the initiation and development of chemical-induced organ damage have been related to mitochondrial dysfunction, among several adverse effects. Recently, many drugs, for example troglitazone, have been removed from the marketplace because of significant mitochondrial toxicity. As a result, there is an urgent need to develop in silico models that can reliably anticipate chemical-induced mitochondrial toxicity. In this paper, we propose an explainable machine-learning model to classify mitochondrially toxic and non-toxic compounds. After several experiments, the Mordred feature descriptor was shortlisted for use after feature selection. The selected features, used with the CatBoost learning algorithm, achieved a prediction accuracy of 85% in 10-fold cross-validation and 87.1% in independent testing. The proposed model demonstrates improved prediction accuracy compared with the existing state-of-the-art method in the literature. The proposed tree-based ensemble model, along with the global model explanation, will aid pharmaceutical chemists in better understanding the prediction of mitochondrial toxicity.
|
21
|
Advances in Computational Polypharmacology. Mol Inform 2022; 41:e2200190. [PMID: 36002382 PMCID: PMC10078381 DOI: 10.1002/minf.202200190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2022] [Accepted: 08/24/2022] [Indexed: 12/13/2022]
Abstract
In drug discovery, polypharmacology encompasses the use of small molecules with defined multi-target activity and in vivo effects resulting from multi-target engagement. Multi-target compounds are often efficacious in the treatment of complex diseases involving target and pathway networks, but might also elicit unwanted side effects. Computational approaches such as target prediction or multi-target ligand design have been used to support polypharmacological drug discovery. In addition to efforts directed at the identification or design of new multi-target compounds, other computational investigations have aimed to differentiate such compounds from potential false-positives or explore the molecular basis of multi-target activities. Herein, a concise overview of the field is provided and recent advances in computational polypharmacology through machine learning are discussed.
|
22
|
New onset delirium prediction using machine learning and long short-term memory (LSTM) in electronic health record. J Am Med Inform Assoc 2022; 30:120-131. [PMID: 36303456 PMCID: PMC9748586 DOI: 10.1093/jamia/ocac210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 10/09/2022] [Accepted: 10/17/2022] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVE To develop and test an accurate deep learning model for predicting new onset delirium in hospitalized adult patients. METHODS Using electronic health record (EHR) data extracted from a large academic medical center, we developed a model combining long short-term memory (LSTM) and machine learning to predict new onset delirium and compared its performance with machine-learning-only models (logistic regression, random forest, support vector machine, neural network, and LightGBM). The labels of models were confusion assessment method (CAM) assessments. We evaluated models on a hold-out dataset. We calculated Shapley additive explanations (SHAP) measures to gauge the feature impact on the model. RESULTS A total of 331 489 CAM assessments with 896 features from 34 035 patients were included. The LightGBM model achieved the best performance (AUC 0.927 [0.924, 0.929] and F1 0.626 [0.618, 0.634]) among the machine learning models. When combined with the LSTM model, the final model's performance improved significantly (P = .001) with AUC 0.952 [0.950, 0.955] and F1 0.759 [0.755, 0.765]. The precision value of the combined model improved from 0.497 to 0.751 with a fixed recall of 0.8. Using the mean absolute SHAP values, we identified the top 20 features, including age, heart rate, Richmond Agitation-Sedation Scale score, Morse fall risk score, pulse, respiratory rate, and level of care. CONCLUSION Leveraging LSTM to capture temporal trends and combining it with the LightGBM model can significantly improve the prediction of new onset delirium, providing an algorithmic basis for the subsequent development of clinical decision support tools for proactive delirium interventions.
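The "precision at a fixed recall of 0.8" comparison above corresponds to sweeping the decision threshold down the score ranking until enough true positives are captured, then reading off the precision. A sketch on hypothetical scores:

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the most conservative threshold achieving at least
    the target recall: walk down the score ranking, lowering the
    threshold until enough positives are captured, then report
    precision at that point.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    tp = fp = 0
    for score, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        if tp / total_pos >= target_recall:
            return tp / (tp + fp)
    return 0.0

scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
labels = [1, 1, 0, 1, 0, 1]
p = precision_at_recall(scores, labels, 0.75)  # capture 3 of 4 positives
```

Fixing recall and comparing precision, as the study does, is a fairer comparison than raw accuracy for rare outcomes such as new-onset delirium.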
|
23
|
Explainable Machine-Learning-Based Characterization of Abnormal Cortical Activities for Working Memory of Restless Legs Syndrome Patients. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22207792. [PMID: 36298144 PMCID: PMC9608870 DOI: 10.3390/s22207792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Revised: 10/07/2022] [Accepted: 10/11/2022] [Indexed: 05/31/2023]
Abstract
Restless legs syndrome (RLS) is a sensorimotor disorder characterized by a strong urge to move the legs and an unpleasant sensation in the legs, and is known to be accompanied by prefrontal dysfunction. Here, we aimed to clarify the neural mechanism of working memory deficits associated with RLS using machine-learning-based analysis of single-trial neural activities. A convolutional neural network classifier was developed to discriminate the cortical activities of RLS patients and normal controls. Layer-wise relevance propagation was applied to the trained classifier to determine the critical nodes in the input layer for the output decision, i.e., the time/location of cortical activities discriminating RLS patients and normal controls during working memory tasks. Our method provided high classification accuracy (~94%) from single-trial event-related potentials, which are known to suffer from high inter-trial/inter-subject variation and a low signal-to-noise ratio, after strict separation of training/test/validation data according to leave-one-subject-out cross-validation. The identified critical areas overlapped with the cortical substrates of working memory, and the neural activities in these areas were correlated with several significant clinical scores of RLS.
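Layer-wise relevance propagation redistributes a network's output relevance backward, layer by layer, in proportion to each input's contribution to the next layer's pre-activations. A one-layer sketch of the epsilon rule; the layer shape and numbers are hypothetical, and a real network chains this through every layer:

```python
def lrp_linear(x, W, b, relevance_out, eps=1e-9):
    """Epsilon-rule LRP for one linear layer: redistribute each output
    unit's relevance to the inputs in proportion to each input's
    contribution z_ij = x_i * W[i][j] to that unit's pre-activation.
    """
    n_in, n_out = len(x), len(W[0])
    z = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        denom = z[j] + (eps if z[j] >= 0 else -eps)  # eps stabilizes small z
        for i in range(n_in):
            relevance_in[i] += x[i] * W[i][j] / denom * relevance_out[j]
    return relevance_in

# Two inputs contributing equally (1*1.0 and 2*0.5) to one output:
x = [1.0, 2.0]
W = [[1.0], [0.5]]
b = [0.0]
R = lrp_linear(x, W, b, relevance_out=[2.0])
# relevance is conserved (up to bias/eps terms) and split 1:1 here
```

Chained from the classifier's output back to the input layer, the surviving relevance marks the time points and electrodes driving the RLS-vs-control decision.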
|
24
|
Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration. Int J Mol Sci 2022; 23:ijms231911262. [PMID: 36232566 PMCID: PMC9569663 DOI: 10.3390/ijms231911262] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/17/2022] Open
Abstract
In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p > 0.75) and high similarity (Tanimoto coefficient > 0.6) to the known SRC inhibitors. 
By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models.
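Two of the quantitative ingredients above are easy to sketch: the Tanimoto coefficient used for the similarity cutoff, and the linear interpolation between latent vectors that produces smooth structural transitions. The fingerprints and latent vectors below are hypothetical toy values:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two molecular
    fingerprints represented as sets of on-bit indices.
    """
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def interpolate_latent(z_a, z_b, steps):
    """Linear walk between two latent vectors; decoding each point
    yields the smooth molecular transitions described above."""
    return [[a + t / (steps - 1) * (b - a) for a, b in zip(z_a, z_b)]
            for t in range(steps)]

known_src = {1, 4, 7, 9}     # hypothetical on-bits of a known SRC inhibitor
candidate = {1, 4, 7, 12}    # hypothetical generated molecule
sim = tanimoto(known_src, candidate)           # 3 shared / 5 total = 0.6
path = interpolate_latent([0.0, 0.0], [1.0, 2.0], steps=3)
```

A similarity of 0.6 is exactly the Tanimoto cutoff the study uses when counting generated molecules "highly similar" to known SRC inhibitors.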
|
25
|
Explainable machine learning for medicinal chemistry: exploring multi-target compounds. Future Med Chem 2022; 14:1171-1173. [PMID: 35775386 DOI: 10.4155/fmc-2022-0122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
|
26
|
Cluster activation mapping with application to computed tomography scans of the lung. J Med Imaging (Bellingham) 2022; 9:026001. [PMID: 35274026 PMCID: PMC8902064 DOI: 10.1117/1.jmi.9.2.026001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Accepted: 02/17/2022] [Indexed: 11/14/2022] Open
Abstract
Purpose: An open question in deep clustering is how to explain what in the image is driving the cluster assignments. This is especially important for applications in medical imaging, where the derived cluster assignments may inform decision-making or create new disease subtypes. We develop cluster activation mapping (CLAM), a methodology for creating localization maps that highlight the image regions important for cluster assignment. Approach: Our approach uses a linear combination of the activation channels from the last layer of the encoder within a pretrained autoencoder. The activation channels are weighted by a channelwise confidence measure, which is a modification of score-CAM. Results: Our approach performs well in medical imaging-based simulation experiments in which the image clusters differ by size, location, and intensity of abnormalities. Under simulation, the cluster assignments were predicted with 100% accuracy when the number of clusters was set at the true value. In addition, applied to computed tomography scans from a sarcoidosis population, CLAM identified two subtypes of sarcoidosis based purely on CT scan presentation, which were significantly associated with pulmonary function tests and visual assessment scores, such as ground-glass, fibrosis, and honeycombing. Conclusions: CLAM is a transparent methodology for identifying explainable groupings of medical imaging data. As deep learning networks are often criticized and not trusted due to their lack of interpretability, our contribution of CLAM to deep clustering architectures is critical to our understanding of cluster assignments, which can ultimately lead to new subtypes of diseases.
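The "linear combination of activation channels weighted by a confidence measure" at the heart of CLAM and score-CAM is a short computation. A toy sketch on 2x2 channels; the channels and confidence weights below are hypothetical:

```python
def cluster_activation_map(channels, confidences):
    """Weighted linear combination of activation channels, the core of
    CAM-style localization: each channel (a 2D map) is scaled by a
    channelwise confidence weight and the scaled maps are summed.
    """
    h, w = len(channels[0]), len(channels[0][0])
    heatmap = [[0.0] * w for _ in range(h)]
    for ch, conf in zip(channels, confidences):
        for r in range(h):
            for c in range(w):
                heatmap[r][c] += conf * ch[r][c]
    # ReLU, as in score-CAM, keeps only positively contributing regions
    return [[max(v, 0.0) for v in row] for row in heatmap]

channels = [
    [[1.0, 0.0], [0.0, 0.0]],   # channel firing top-left
    [[0.0, 0.0], [0.0, 1.0]],   # channel firing bottom-right
]
heat = cluster_activation_map(channels, confidences=[0.9, 0.1])
# the high-confidence channel dominates the localization map
```

In CLAM the confidences come from how much masking each channel's activation changes the cluster assignment, so the heatmap highlights regions responsible for the cluster, not a class label.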
|
27
|
Explainable Machine Learning Model for Predicting First-Time Acute Exacerbation in Patients with Chronic Obstructive Pulmonary Disease. J Pers Med 2022; 12:jpm12020228. [PMID: 35207716 PMCID: PMC8879653 DOI: 10.3390/jpm12020228] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2022] [Revised: 02/02/2022] [Accepted: 02/03/2022] [Indexed: 12/15/2022] Open
Abstract
Background: The study developed accurate explainable machine learning (ML) models for predicting first-time acute exacerbation of chronic obstructive pulmonary disease (COPD, AECOPD) at an individual level. Methods: We conducted a retrospective case–control study. A total of 606 patients with COPD were screened for eligibility using registry data from the COPD Pay-for-Performance Program (COPD P4P program) database at Changhua Christian Hospital between January 2017 and December 2019. Recursive feature elimination technology was used to select the optimal subset of features for predicting the occurrence of AECOPD. We developed four ML models to predict first-time AECOPD, and the highest-performing model was applied. Finally, an explainable approach based on ML and the SHapley Additive exPlanations (SHAP) and a local explanation method were used to evaluate the risk of AECOPD and to generate individual explanations of the model’s decisions. Results: The gradient boosting machine (GBM) and support vector machine (SVM) models exhibited superior discrimination ability (area under curve [AUC] = 0.833 [95% confidence interval (CI) 0.745–0.921] and AUC = 0.836 [95% CI 0.757–0.915], respectively). The decision curve analysis indicated that the GBM model exhibited a higher net benefit in distinguishing patients at high risk for AECOPD when the threshold probability was <0.55. The COPD Assessment Test (CAT) and the symptom of wheezing were the two most important features and exhibited the highest SHAP values, followed by monocyte count and white blood cell (WBC) count, coughing, red blood cell (RBC) count, breathing rate, oral long-acting bronchodilator use, chronic pulmonary disease (CPD), systolic blood pressure (SBP), and others. Higher CAT score; monocyte, WBC, and RBC counts; BMI; diastolic blood pressure (DBP); neutrophil-to-lymphocyte ratio; and eosinophil and lymphocyte counts were associated with AECOPD. 
The presence of symptoms (wheezing, dyspnea, coughing), chronic disease (CPD, congestive heart failure [CHF], sleep disorders, and pneumonia), and use of COPD medications (triple-therapy long-acting bronchodilators, short-acting bronchodilators, oral long-acting bronchodilators, and antibiotics) were also positively associated with AECOPD. A high breathing rate, heart rate, or systolic blood pressure and methylxanthine use were negatively correlated with AECOPD. Conclusions: The ML model was able to accurately assess the risk of AECOPD. The ML model combined with SHAP and the local explanation method were able to provide interpretable and visual explanations of individualized risk predictions, which may assist clinical physicians in understanding the effects of key features in the model and the model’s decision-making process.
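The recursive feature elimination step used to pick the predictor subset has a compact skeleton: score the surviving features, drop the weakest, repeat. A sketch with a hypothetical importance oracle (the relevance numbers below are made up for illustration; in practice they would come from refitting a model at each round):

```python
def recursive_feature_elimination(score_features, n_keep, features):
    """RFE sketch: repeatedly score the surviving features and drop
    the least important one until n_keep remain. `score_features` is
    any callable returning per-feature importances for the current
    subset (e.g. refit-model coefficients or impurity importances).
    """
    surviving = list(features)
    while len(surviving) > n_keep:
        importances = score_features(surviving)
        weakest = min(range(len(surviving)), key=lambda i: importances[i])
        surviving.pop(weakest)
    return surviving

# Hypothetical fixed relevances for a few features named in the study.
relevance = {"CAT": 0.9, "wheezing": 0.8, "WBC": 0.5, "SBP": 0.2, "noise": 0.05}
oracle = lambda subset: [relevance[f] for f in subset]
kept = recursive_feature_elimination(oracle, 2, list(relevance))
# → ["CAT", "wheezing"]
```

Real RFE refits the model after every elimination because importances shift as correlated features drop out; the fixed oracle here only illustrates the control flow.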
|
28
|
Using Explainable Machine Learning to Improve Intensive Care Unit Alarm Systems. SENSORS 2021; 21:s21217125. [PMID: 34770432 PMCID: PMC8587076 DOI: 10.3390/s21217125] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/24/2021] [Revised: 10/25/2021] [Accepted: 10/25/2021] [Indexed: 12/27/2022]
Abstract
Due to the continuous monitoring of critical patients, Intensive Care Units (ICUs) generate large amounts of data, which are difficult for healthcare personnel to analyze manually, especially in overloaded situations such as those present during the COVID-19 pandemic. Therefore, the automatic analysis of these data has many practical applications in patient monitoring, including the optimization of alarm systems for alerting healthcare personnel. In this paper, explainable machine learning techniques are used for this purpose, and a methodology based on age stratification, boosting classifiers, and Shapley Additive Explanations (SHAP) is proposed. The methodology is evaluated using MIMIC-III, an ICU patient research database. The results show that the proposed model can predict mortality within the ICU with AUROC values of 0.961, 0.936, 0.898, and 0.883 for age groups 18–45, 45–65, 65–85 and 85+, respectively. By using SHAP, the features with the highest impact on predicted mortality for different age groups, and the threshold from which the value of a clinical feature begins to have a negative impact on the patient's health, can be identified. This allows ICU alarms to be improved by identifying the most important variables to be sensed and the threshold values at which the health personnel must be warned.
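One simple way to read a clinical threshold off SHAP output is to sort a feature's values and find where its per-patient SHAP contribution flips sign from protective to harmful. The `critical_threshold` helper and the lactate/SHAP numbers below are hypothetical illustrations of that idea, not the paper's exact procedure:

```python
def critical_threshold(values, shap_values):
    """Estimate the feature value at which its SHAP contribution flips
    from protective (negative) to harmful (positive): sort the
    (value, SHAP) pairs by feature value and return the midpoint of
    the first sign change.
    """
    pairs = sorted(zip(values, shap_values))
    for (v0, s0), (v1, s1) in zip(pairs, pairs[1:]):
        if s0 < 0 <= s1:
            return (v0 + v1) / 2
    return None

# Hypothetical lactate readings and their SHAP contributions to
# predicted mortality risk for one age group.
lactate = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
shap = [-0.3, -0.2, -0.05, 0.1, 0.25, 0.4]
threshold = critical_threshold(lactate, shap)  # → 2.25
```

A threshold found this way is exactly the kind of value an alarm system could use as its warning limit for that variable.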
|
29
|
Facing the Challenges of Developing Fair Risk Scoring Models. Front Artif Intell 2021; 4:681915. [PMID: 34723172 PMCID: PMC8552888 DOI: 10.3389/frai.2021.681915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Accepted: 08/02/2021] [Indexed: 11/13/2022] Open
Abstract
Algorithmic scoring methods have been widely used in the finance industry for several decades to prevent risk and to automate and optimize decisions. Regulatory requirements, such as those of the Basel Committee on Banking Supervision (BCBS) and the EU data protection regulations, have led to increasing interest and research activity in understanding black-box machine learning models by means of explainable machine learning. Even though this is a step in the right direction, such methods cannot guarantee fair scoring, as machine learning models are not necessarily unbiased and may discriminate against certain subpopulations, such as a particular race, gender, or sexual orientation, even if the variable itself is not used for modeling. This is also true for white-box methods like logistic regression. In this study, a framework is presented that allows models to be analyzed and developed with regard to fairness. The proposed methodology is based on techniques of causal inference, and some of its methods can be linked to methods from explainable machine learning. A definition of counterfactual fairness is given, together with an algorithm that results in a fair scoring model. The concepts are illustrated by means of a transparent simulation and a popular real-world example, the German Credit data, using traditional scorecard models based on logistic regression and weight-of-evidence variable pre-transformation. In contrast to previous studies in the field, our study presents and uses a corrected version of the data. With the help of the simulation, the trade-off between fairness and predictive accuracy is analyzed. The results indicate that it is possible to remove unfairness without a strong performance decrease, provided that the correlation of the discriminative attributes with the other predictor variables in the model is not too strong.
In addition, the challenge in explaining the resulting scoring model and the associated fairness implications to users is discussed.
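The "discrimination through correlated variables" problem can be illustrated with the simplest possible countermeasure: regress each predictor on the protected attribute and keep only the residual, breaking the indirect path a score could exploit. This is a simplified stand-in for the paper's causal-inference framework, and the `residualize` helper and toy data are hypothetical:

```python
def residualize(feature, protected):
    """Remove the linear influence of a protected attribute from a
    feature via ordinary least squares, keeping only the residual.
    One elementary way to break an indirect discrimination path; the
    causal machinery in the study is considerably more general.
    """
    n = len(feature)
    mean_f = sum(feature) / n
    mean_p = sum(protected) / n
    cov = sum((f - mean_f) * (p - mean_p) for f, p in zip(feature, protected))
    var = sum((p - mean_p) ** 2 for p in protected)
    slope = cov / var if var else 0.0
    return [f - mean_f - slope * (p - mean_p) for f, p in zip(feature, protected)]

# A feature that perfectly tracks the protected attribute (2 + 3*p)
# is reduced to all-zero residuals: no protected signal remains.
protected = [0.0, 0.0, 1.0, 1.0]
feature = [2.0, 2.0, 5.0, 5.0]
residual = residualize(feature, protected)
```

The trade-off noted in the abstract follows directly: the more of a feature's predictive signal travels through the protected attribute, the more accuracy this removal costs.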
|
30
|
Deep Learning Model to Predict Serious Infection Among Children With Central Venous Lines. Front Pediatr 2021; 9:726870. [PMID: 34604142 PMCID: PMC8480258 DOI: 10.3389/fped.2021.726870] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Accepted: 08/06/2021] [Indexed: 12/23/2022] Open
Abstract
Objective: Predict the onset of presumed serious infection, defined as a positive blood culture drawn and a new antibiotic course of at least 4 days (PSI*), among pediatric patients with Central Venous Lines (CVLs). Design: Retrospective cohort study. Setting: Single academic children's hospital. Patients: All hospital encounters from January 2013 to December 2018, excluding those without a CVL or with a length of stay shorter than 24 h. Measurements and Main Results: Clinical features, including demographics, laboratory results, vital signs, characteristics of the CVLs, and medications used, were extracted retrospectively from electronic medical records. Data were aggregated across all hospitals within a single pediatric health system and used to train a deep learning model to predict the occurrence of PSI* during the next 48 h of hospitalization. The proposed model's predictions were compared with prediction of PSI* by a marker of illness severity (PELOD-2). The baseline prevalence of line infections was 0.34% over all segmented 48-h time windows. Events were identified among cases by onset time; all data from admission until onset were used for cases, and for controls we used all data from admission until discharge. The benchmarks were aggregated over all 48-h time windows [N=748,380, associated with 27,137 patient encounters]. The model achieved an area under the receiver operating characteristic curve of 0.993 (95% CI = [0.990, 0.996]), and the enriched positive predictive value (PPV) was 23 times greater than the base prevalence. Conversely, prediction by PELOD-2 achieved a lower PPV of 1.5% [0.9%, 2.1%], which was 5 times the baseline prevalence. Conclusion: A deep learning model that employs common clinical features from the electronic health record can help predict the onset of CLABSI in hospitalized children with central venous lines 48 hours before the time of specimen collection.
|
31
|
Machine learning predicts per-vessel early coronary revascularization after fast myocardial perfusion SPECT: results from multicentre REFINE SPECT registry. Eur Heart J Cardiovasc Imaging 2021; 21:549-559. [PMID: 31317178 DOI: 10.1093/ehjci/jez177] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/31/2018] [Indexed: 01/17/2023] Open
Abstract
AIMS To optimize per-vessel prediction of early coronary revascularization (ECR) within 90 days after fast single-photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI) using machine learning (ML) and to introduce a method for patient-specific explanation of ML results in a clinical setting. METHODS AND RESULTS A total of 1980 patients with suspected coronary artery disease (CAD) who underwent stress/rest 99mTc-sestamibi/tetrofosmin MPI with new-generation SPECT scanners were included. All patients had invasive coronary angiography within 6 months after SPECT MPI. ML utilized 18 clinical, 9 stress test, and 28 imaging variables to predict per-vessel and per-patient ECR with 10-fold cross-validation. The area under the receiver operating characteristic curve (AUC) of ML was compared with standard quantitative analysis [total perfusion deficit (TPD)] and expert interpretation. ECR was performed in 958 patients (48%). Per-vessel, the AUC of ECR prediction by ML (AUC 0.79, 95% confidence interval (CI) [0.77, 0.80]) was higher than by regional stress TPD (0.71, [0.70, 0.73]), combined-view stress TPD (AUC 0.71, 95% CI [0.69, 0.72]), or ischaemic TPD (AUC 0.72, 95% CI [0.71, 0.74]), all P < 0.001. Per-patient, the AUC of ECR prediction by ML (AUC 0.81, 95% CI [0.79, 0.83]) was higher than that of stress TPD, combined-view TPD, and ischaemic TPD, all P < 0.001. ML also outperformed nuclear cardiologists' expert interpretation of MPI for the prediction of early revascularization. A method to explain the ML prediction for an individual patient was also developed. CONCLUSION In patients with suspected CAD, the prediction of ECR by ML outperformed automatic MPI quantitation by TPDs (per-vessel and per-patient) and nuclear cardiologists' expert interpretation (per-patient).
|
32
|
A Novel Coupled Reaction-Diffusion System for Explainable Gene Expression Profiling. SENSORS (BASEL, SWITZERLAND) 2021; 21:2190. [PMID: 33801002 PMCID: PMC8003942 DOI: 10.3390/s21062190] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/20/2021] [Revised: 03/06/2021] [Accepted: 03/08/2021] [Indexed: 12/20/2022]
Abstract
Machine learning (ML)-based algorithms are playing an important role in cancer diagnosis and are increasingly being used to aid clinical decision-making. However, these commonly operate as 'black boxes', and it is unclear how their decisions are derived. Recently, techniques have been applied to help us understand how specific ML models work and to explain the rationale for their outputs. This study aims to determine why a given type of cancer has a certain phenotypic characteristic. Cancer results in cellular dysregulation, and a thorough consideration of cancer regulators is required. This would increase our understanding of the nature of the disease and help discover more effective diagnostic, prognostic, and treatment methods for a variety of cancer types and stages. Our study proposes a novel explainable analysis of potential biomarkers denoting tumorigenesis in non-small cell lung cancer. A number of these biomarkers are known to appear following various treatment pathways. An enhanced analysis is enabled through a novel mathematical formulation for the regulators of mRNA, the regulators of ncRNA, and the coupled mRNA-ncRNA regulators. Temporal gene expression profiles are approximated in a two-dimensional spatial domain for the transition states before converging to the stationary state, using a system composed of coupled reaction-diffusion partial differential equations. Simulation experiments demonstrate that the proposed mathematical gene-expression profile represents a best fit for the population abundance of these oncogenes. In the future, our proposed solution can lead to the development of alternative interpretable approaches, through the application of ML models to discover unknown dynamics in gene regulatory systems.
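The general form of a coupled reaction-diffusion system like the one this abstract describes can be sketched with an explicit finite-difference simulation of two interacting species on a 2-D grid. The reaction terms, diffusion constants, and coupling below are illustrative assumptions standing in for the paper's mRNA/ncRNA regulator equations, not the authors' formulation.

```python
import numpy as np

def laplacian(a):
    # Five-point stencil with periodic boundaries (unit grid spacing)
    return (np.roll(a, 1, axis=0) + np.roll(a, -1, axis=0) +
            np.roll(a, 1, axis=1) + np.roll(a, -1, axis=1) - 4 * a)

n, dt, steps = 32, 0.05, 200
Du, Dv, k = 0.2, 0.1, 1.0  # diffusion and coupling constants (illustrative)
rng = np.random.RandomState(0)
u = 0.1 * rng.rand(n, n)   # mRNA-like species abundance
v = 0.1 * rng.rand(n, n)   # ncRNA-like species abundance

for _ in range(steps):
    # Coupled reactions: logistic growth of u, activation of v by u, decay of v
    du = Du * laplacian(u) + u * (1.0 - u) - k * u * v
    dv = Dv * laplacian(v) + k * u * v - 0.5 * v
    u = u + dt * du
    v = v + dt * dv
```

The time step satisfies the explicit-scheme stability condition (dt·D < 0.25 for unit spacing), so the profiles evolve smoothly toward a stationary state, mirroring the "transition states before converging to the stationary state" described above.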
|
33
|
Prediction of Long-Term Stroke Recurrence Using Machine Learning Models. J Clin Med 2021; 10:jcm10061286. [PMID: 33804724 PMCID: PMC8003970 DOI: 10.3390/jcm10061286] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2021] [Revised: 03/15/2021] [Accepted: 03/16/2021] [Indexed: 01/01/2023] Open
Abstract
Background: The long-term risk of recurrent ischemic stroke, estimated to be between 17% and 30%, cannot be reliably assessed at an individual level. Our goal was to study whether machine learning models can be trained to predict stroke recurrence, to identify key clinical variables, and to assess whether performance metrics can be optimized. Methods: We used patient-level data from electronic health records, six interpretable algorithms (Logistic Regression, Extreme Gradient Boosting, Gradient Boosting Machine, Random Forest, Support Vector Machine, Decision Tree), four feature selection strategies, five prediction windows, and two sampling strategies to develop 288 models for up to 5-year stroke recurrence prediction. We further identified important clinical features and different optimization strategies. Results: We included 2091 ischemic stroke patients. The area under the receiver operating characteristic curve (AUROC) was stable across prediction windows of 1, 2, 3, 4, and 5 years, with the highest score for the 1-year (0.79) and the lowest for the 5-year prediction window (0.69). A total of 21 (7%) models reached an AUROC above 0.73, while 110 (38%) models reached an AUROC greater than 0.7. Among the 53 features analyzed, age, body mass index, and laboratory-based features (such as high-density lipoprotein, hemoglobin A1c, and creatinine) had the highest overall importance scores. The balance between specificity and sensitivity improved through the sampling strategies. Conclusion: All six selected algorithms could be trained to predict long-term stroke recurrence, and laboratory-based variables were highly associated with stroke recurrence. The latter could be targeted for personalized interventions. Model performance metrics could be optimized, and models can be implemented in the same healthcare system as intelligent decision support for targeted intervention.
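The 288-model design above is a grid over algorithms, feature-selection strategies, prediction windows, and sampling strategies. A minimal sketch of such a grid (fewer axes, synthetic data, sklearn estimators standing in for the study's algorithm suite) might look like this; all names and sizes are illustrative assumptions:

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

algorithms = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
selectors = {
    "top10": SelectKBest(f_classif, k=10),
    "top25": SelectKBest(f_classif, k=25),
}
windows = [1, 3, 5]  # prediction windows in years; each would carry its own label

results = {}
for (alg_name, alg), (sel_name, sel), w in product(
        algorithms.items(), selectors.items(), windows):
    # Synthetic stand-in for the study's 53 EHR-derived features; in the real
    # study, y would be recurrence within the given window
    X, y = make_classification(n_samples=400, n_features=53,
                               n_informative=8, random_state=w)
    pipe = make_pipeline(StandardScaler(), sel, alg)
    proba = cross_val_predict(pipe, X, y, cv=5, method="predict_proba")[:, 1]
    results[(alg_name, sel_name, w)] = roc_auc_score(y, proba)
```

Each cell of the grid yields one cross-validated AUROC, which is how figures such as "21 (7%) models reached an AUROC above 0.73" arise.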
|
34
|
Analysis of Gut Microbiome Using Explainable Machine Learning Predicts Risk of Diarrhea Associated With Tyrosine Kinase Inhibitor Neratinib: A Pilot Study. Front Oncol 2021; 11:604584. [PMID: 33796451 PMCID: PMC8008168 DOI: 10.3389/fonc.2021.604584] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2020] [Accepted: 01/22/2021] [Indexed: 01/22/2023] Open
Abstract
Neratinib has great efficacy in treating HER2+ breast cancer but is associated with significant gastrointestinal toxicity. The objective of this pilot study was to understand the association between the gut microbiome and neratinib-induced diarrhea. Twenty-five patients (age ≥ 60) were enrolled in a phase II trial evaluating the safety and tolerability of neratinib in older adults with HER2+ breast cancer (NCT02673398). Fifty stool samples were collected from 11 patients at baseline and during treatment. 16S rRNA analysis was performed and relative abundance data were generated. Shannon's diversity index was calculated to examine gut microbiome dysbiosis. An explainable tree-based approach was used to classify patients who might experience neratinib-related diarrhea (grade ≥ 1) based on pre-treatment baseline microbial relative abundance data. The hold-out areas under the receiver operating characteristic curve and the precision-recall curve of the model were 0.88 and 0.95, respectively. Model explanations showed that patients with a larger relative abundance of Ruminiclostridium 9 and Bacteroides sp. HPS0048 may have a reduced risk of neratinib-related diarrhea, a finding confirmed by a Kruskal-Wallis test (p ≤ 0.05, uncorrected). Our machine learning model identified microbiota associated with a reduced risk of neratinib-induced diarrhea; the results of this pilot study will be further verified in a larger study. CLINICAL TRIAL REGISTRATION ClinicalTrials.gov, identifier NCT02673398.
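Two of the analysis steps mentioned above, Shannon diversity from relative-abundance profiles and a Kruskal-Wallis comparison of a taxon's abundance between outcome groups, can be sketched in a few lines. All data below are simulated placeholders, not the study's 16S rRNA data.

```python
import numpy as np
from scipy.stats import kruskal

def shannon_diversity(rel_abundance):
    """Shannon index H = -sum(p * ln p) over the non-zero relative abundances."""
    p = np.asarray(rel_abundance, dtype=float)
    p = p[p > 0]
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.RandomState(0)
# A simulated 30-taxon relative-abundance profile for one stool sample
profile = rng.dirichlet(np.ones(30))
H = shannon_diversity(profile)

# Simulated relative abundances of a single taxon in two outcome groups
no_diarrhea = rng.beta(5, 2, size=6)  # hypothetical higher-abundance group
diarrhea = rng.beta(2, 5, size=5)
stat, p_value = kruskal(no_diarrhea, diarrhea)
```

With two groups, the Kruskal-Wallis test is a rank-based nonparametric comparison, appropriate for small pilot samples like the 11 patients here.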
|
35
|
Interpretable Machine Learning Models for Three-Way Classification of Cognitive Workload Levels for Eye-Tracking Features. Brain Sci 2021; 11:brainsci11020210. [PMID: 33572232 PMCID: PMC7914927 DOI: 10.3390/brainsci11020210] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Revised: 01/12/2021] [Accepted: 02/03/2021] [Indexed: 11/16/2022] Open
Abstract
The paper is focussed on the assessment of cognitive workload level using selected machine learning models. In the study, eye-tracking data were gathered from 29 healthy volunteers during examination with three versions of a computerised digit symbol substitution test (DSST). Understanding cognitive workload is of great importance in analysing human mental fatigue and the performance of intellectual tasks; it is also essential in the context of explaining brain cognitive processes. Eight three-class classification machine learning models were constructed and analysed. Furthermore, interpretable machine learning techniques were applied to obtain measures of feature importance and of each feature's contribution to brain cognitive functions. These measures made it possible to improve the quality of classification while lowering the number of applied features to six or eight, depending on the model. Moreover, the applied method of explainable machine learning provided valuable insights into understanding the processes accompanying various levels of cognitive workload. The main classification performance metrics, such as F1, recall, precision, accuracy, and the area under the receiver operating characteristic curve (ROC AUC), were used to assess the quality of classification quantitatively. The best result obtained on the complete feature set was as high as 0.95 (F1); however, feature importance interpretation allowed increasing the result up to 0.97 with only seven of 20 features applied.
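The importance-guided feature reduction described above, rank features by an interpretability measure, retrain on the top few, and compare F1, can be sketched as follows. This uses sklearn's permutation importance on synthetic three-class data as a stand-in; the data, model, and choice of seven retained features are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 20 eye-tracking features, three workload levels
X, y = make_classification(n_samples=600, n_features=20, n_informative=7,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Full-feature model and its macro-F1
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
f1_full = f1_score(y_te, clf.predict(X_te), average="macro")

# Rank features by permutation importance, keep the top 7 of 20
imp = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
top7 = np.argsort(imp.importances_mean)[::-1][:7]

clf_small = RandomForestClassifier(n_estimators=100, random_state=0)
clf_small.fit(X_tr[:, top7], y_tr)
f1_small = f1_score(y_te, clf_small.predict(X_te[:, top7]), average="macro")
```

Whether the reduced model matches or beats the full one (as it did in the paper, 0.97 vs. 0.95) depends on how much of the signal the discarded features carried.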
|
36
|
Early Detection of Septic Shock Onset Using Interpretable Machine Learners. J Clin Med 2021; 10:jcm10020301. [PMID: 33467539 PMCID: PMC7830968 DOI: 10.3390/jcm10020301] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2020] [Revised: 12/31/2020] [Accepted: 01/12/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Developing a decision support system based on advances in machine learning is one area for strategic innovation in healthcare. Predicting a patient's progression to septic shock is an active field of translational research. The goal of this study was to develop a working model of a clinical decision support system for predicting septic shock in an acute care setting for up to 6 h from the time of admission in an integrated healthcare setting. METHOD Clinical data from the Electronic Health Record (EHR), at the encounter level, were used to build a predictive model for progression from sepsis to septic shock up to 6 h from the time of admission; that is, T = 1, 3, and 6 h from admission. Eight different machine learning algorithms (Random Forest, XGBoost, C5.0, Decision Trees, Boosted Logistic Regression, Support Vector Machine, Logistic Regression, Regularized Logistic Regression, and Bayes Generalized Linear Model) were used for model development. Two adaptive sampling strategies were used to address the class imbalance. Data from two sources (clinical and billing codes) were used to define the case definition (septic shock) using the Centers for Medicare & Medicaid Services (CMS) sepsis criteria. Model assessment was performed using the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. Model predictions for each feature window (1, 3, and 6 h from admission) were consolidated. RESULTS Retrospective data from April 2005 to September 2018 were extracted from the EHR, insurance claims, billing, and laboratory systems to create a dataset for septic shock detection. The clinical criteria and billing information were used to label patients into two classes, septic shock patients and sepsis patients, at three different time points from admission, creating two case-control cohorts. Data from 45,425 unique in-patient visits were used to build 96 prediction models comparing the clinical-based definition versus billing-based information as the gold standard. Of the 24 consolidated models (based on eight machine learning algorithms and three feature windows), four reached an AUROC greater than 0.9, and all reached an AUROC of at least 0.8820. The best model, based on Random Forest, reached an AUROC of 0.9483 with a sensitivity of 83.9% and a specificity of 88.1%. The sepsis detection window at 6 h outperformed the 1- and 3-h windows. The sepsis definition based on clinical variables showed improved performance compared with the definition based on billing information alone. CONCLUSION This study corroborated that machine learning models can be developed to predict septic shock using clinical and administrative data. However, models using clinical information to define septic shock outperformed models developed from administrative data alone. Intelligent decision support tools can be developed and integrated into the EHR to improve clinical outcomes and facilitate the optimization of resources in real time.
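The class-imbalance handling mentioned above can be sketched with the simplest sampling strategy: randomly oversampling the minority (septic-shock) class to parity before training, which typically trades specificity for sensitivity. The data, model, and sampling choice below are illustrative assumptions, not the study's adaptive sampling pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def sens_spec(y_true, y_pred):
    # Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic imbalanced cohort: ~5% positive (shock-like) class
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline model trained on the raw imbalance
base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
sens_b, spec_b = sens_spec(y_te, base.predict(X_te))

# Random oversampling: resample minority rows with replacement to parity
rng = np.random.RandomState(0)
pos = np.where(y_tr == 1)[0]
extra = rng.choice(pos, size=np.sum(y_tr == 0) - len(pos), replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

over = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
sens_o, spec_o = sens_spec(y_te, over.predict(X_te))
```

Comparing (sens_b, spec_b) with (sens_o, spec_o) shows the sensitivity/specificity trade-off that sampling strategies are used to tune.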
|
37
|
Explainable Machine Learning Approach as a Tool to Understand Factors Used to Select the Refractive Surgery Technique on the Expert Level. Transl Vis Sci Technol 2020; 9:8. [PMID: 32704414 PMCID: PMC7346876 DOI: 10.1167/tvst.9.2.8] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2019] [Accepted: 11/18/2019] [Indexed: 12/23/2022] Open
Abstract
Purpose Recently, laser refractive surgery options, including laser epithelial keratomileusis, laser in situ keratomileusis, and small incision lenticule extraction, have successfully improved patients' quality of life. An evidence-based recommendation for the optimal surgery technique is valuable for increasing patient satisfaction. We developed an interpretable multiclass machine learning model that selects the laser surgery option at the expert level. Methods A multiclass XGBoost model was constructed to classify patients into four categories: laser epithelial keratomileusis, laser in situ keratomileusis, small incision lenticule extraction, and contraindication groups. The analysis included 18,480 subjects who intended to undergo refractive surgery at the B&VIIT Eye Center. Training (n = 10,561) and internal validation (n = 2640) were performed using subjects who visited between 2016 and 2017. The model was trained on the clinical decisions of highly experienced experts and on ophthalmic measurements. External validation (n = 5279) was conducted using subjects who visited in 2018. The SHapley Additive exPlanations (SHAP) technique was adopted to explain the output of the XGBoost model. Results The multiclass XGBoost model exhibited an accuracy of 81.0% and 78.9% when tested on the internal and external validation datasets, respectively. The SHAP explanations for the results were consistent with the prior knowledge of ophthalmologists. Explanations from the one-versus-one and one-versus-rest XGBoost classifiers helped users easily understand the multicategory classification results. Conclusions This study suggests an expert-level multiclass machine learning model for selecting the refractive surgery technique for patients. It also provides clinical insight into a multiclass problem based on an explainable artificial intelligence technique.
Translational Relevance Explainable machine learning shows promise for increasing the practical use of artificial intelligence in ophthalmic clinics.
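The SHAP technique used in this study attributes a model's output to its input features via Shapley values from cooperative game theory. As a self-contained, library-free illustration of the underlying idea (not the shap package or the authors' code), the snippet below computes exact Shapley values for a hypothetical three-feature model, where a coalition's "value" is the model output with the remaining features replaced by a baseline.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Toy linear risk score; coefficients are purely illustrative
    return 2.0 * x[0] + 1.0 * x[1] - 0.5 * x[2]

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) relative to model(baseline)."""
    n = len(x)

    def value(S):
        # Features in coalition S take their real values; the rest, baseline
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi

phi = shapley_values(model, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model each attribution reduces to the coefficient times the feature's deviation from baseline, and the attributions always sum to model(x) − model(baseline), the "local accuracy" property that makes SHAP outputs directly comparable to clinicians' intuitions about per-feature contributions.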
|