1
Klein L, Ziegler S, Laufer F, Debus C, Götz M, Maier-Hein K, Paetzold UW, Isensee F, Jäger PF. Discovering Process Dynamics for Scalable Perovskite Solar Cell Manufacturing with Explainable AI. Adv Mater 2024; 36:e2307160. [PMID: 37904613 DOI: 10.1002/adma.202307160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/27/2023] [Indexed: 11/01/2023]
Abstract
Large-area processing of perovskite semiconductor thin-films is complex and evokes unexplained variance in quality, posing a major hurdle for the commercialization of perovskite photovoltaics. Advances in scalable fabrication processes are currently limited to gradual and arbitrary trial-and-error procedures. While the in situ acquisition of photoluminescence (PL) videos has the potential to reveal important variations in the thin-film formation process, the high dimensionality of the data quickly surpasses the limits of human analysis. In response, this study leverages deep learning (DL) and explainable artificial intelligence (XAI) to discover relationships between sensor information acquired during the perovskite thin-film formation process and the resulting solar cell performance indicators, while rendering these relationships humanly understandable. The study further shows how gained insights can be distilled into actionable recommendations for perovskite thin-film processing, advancing toward industrial-scale solar cell manufacturing. This study demonstrates that XAI methods will play a critical role in accelerating energy materials science.
Affiliation(s)
- Lukas Klein
  - Interactive Machine Learning Group, German Cancer Research Center, 69120, Heidelberg, Germany
  - Institute for Machine Learning, ETH Zürich, Zürich, 8092, Switzerland
  - Helmholtz Imaging, German Cancer Research Center, 69120, Heidelberg, Germany
- Sebastian Ziegler
  - Helmholtz Imaging, German Cancer Research Center, 69120, Heidelberg, Germany
  - Division of Medical Image Computing, German Cancer Research Center, 69120, Heidelberg, Germany
- Felix Laufer
  - Light Technology Institute, Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
- Charlotte Debus
  - Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
  - Helmholtz AI, Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Markus Götz
  - Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
  - Helmholtz AI, Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Klaus Maier-Hein
  - Helmholtz Imaging, German Cancer Research Center, 69120, Heidelberg, Germany
  - Division of Medical Image Computing, German Cancer Research Center, 69120, Heidelberg, Germany
- Ulrich W Paetzold
  - Light Technology Institute, Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Fabian Isensee
  - Helmholtz Imaging, German Cancer Research Center, 69120, Heidelberg, Germany
  - Division of Medical Image Computing, German Cancer Research Center, 69120, Heidelberg, Germany
- Paul F Jäger
  - Interactive Machine Learning Group, German Cancer Research Center, 69120, Heidelberg, Germany
  - Helmholtz Imaging, German Cancer Research Center, 69120, Heidelberg, Germany
2
Nahiduzzaman M, Chowdhury MEH, Salam A, Nahid E, Ahmed F, Al-Emadi N, Ayari MA, Khandakar A, Haider J. Explainable deep learning model for automatic mulberry leaf disease classification. Front Plant Sci 2023; 14:1175515. [PMID: 37794930 PMCID: PMC10546311 DOI: 10.3389/fpls.2023.1175515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 08/28/2023] [Indexed: 10/06/2023]
Abstract
Mulberry leaves feed Bombyx mori silkworms to generate silk thread. Diseases that affect mulberry leaves have reduced crop and silk yields in sericulture, which produces 90% of the world's raw silk. Manual leaf disease identification is tedious and error-prone. Computer vision can categorize leaf diseases early and overcome the challenges of manual identification. No mulberry leaf deep learning (DL) models have been reported. Therefore, in this study, images of leaves with two diseases, leaf rust and leaf spot, together with disease-free leaves, were collected from two regions of Bangladesh. Sericulture experts annotated the leaf images. The images were pre-processed, and 6,000 synthetic images were generated using typical image augmentation methods from the original 764 training images. An additional 218 and 109 images were used for testing and validation, respectively. In addition, a unique lightweight parallel depth-wise separable CNN model, PDS-CNN, was developed by applying depth-wise separable convolutional layers to reduce parameters, layers, and size while boosting classification performance. Finally, the explainable capability of PDS-CNN is obtained through the use of SHapley Additive exPlanations (SHAP), evaluated by a sericulture specialist. The proposed PDS-CNN outperforms well-known deep transfer learning models, achieving an optimistic accuracy of 95.05 ± 2.86% for three-class classifications and 96.06 ± 3.01% for binary classifications with only 0.53 million parameters, 8 layers, and a size of 6.3 megabytes. Furthermore, when compared with other well-known transfer models, the proposed model identified mulberry leaf diseases with higher accuracy, fewer parameters, fewer layers, and lower overall size. The visually expressive SHAP explanation images validate the model's findings, which align with the predictions made by the sericulture specialist. Based on these findings, it is possible to conclude that the explainable AI (XAI)-based PDS-CNN can provide sericulture specialists with an effective tool for accurately categorizing mulberry leaves.
Affiliation(s)
- Md. Nahiduzzaman
  - Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
  - Department of Electrical Engineering, Qatar University, Doha, Qatar
- Abdus Salam
  - Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
- Emama Nahid
  - Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
- Faruque Ahmed
  - Bangladesh Sericulture Research and Training Institute, Rajshahi, Bangladesh
- Nasser Al-Emadi
  - Department of Electrical Engineering, Qatar University, Doha, Qatar
- Mohamed Arselene Ayari
  - Department of Civil and Environmental Engineering, Qatar University, Doha, Qatar
  - Technology Innovation and Engineering Education Unit, Qatar University, Doha, Qatar
- Amith Khandakar
  - Department of Electrical Engineering, Qatar University, Doha, Qatar
- Julfikar Haider
  - Department of Engineering, Manchester Metropolitan University, Manchester, United Kingdom
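The mulberry leaf abstract above credits depth-wise separable convolutions for the small parameter count of PDS-CNN. The following is a minimal sketch of the standard parameter-count comparison behind that idea. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are illustrative assumptions and are not taken from the paper; bias terms are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard 2D convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
ratio = sep / std  # equals 1/c_out + 1/kernel**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is the usual explanation for why such layers shrink a model.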
3
Zhang H, Ogasawara K. Grad-CAM-Based Explainable Artificial Intelligence Related to Medical Text Processing. Bioengineering (Basel) 2023; 10:1070. [PMID: 37760173 PMCID: PMC10525184 DOI: 10.3390/bioengineering10091070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 08/28/2023] [Accepted: 09/06/2023] [Indexed: 09/29/2023]
Abstract
The opacity of deep learning makes its application challenging in the medical field. Therefore, there is a need to enable explainable artificial intelligence (XAI) in the medical field to ensure that models and their results can be explained in a manner that humans can understand. This study transfers a high-accuracy computer vision model to medical text tasks and uses the explanatory visualization method known as gradient-weighted class activation mapping (Grad-CAM) to generate heat maps so that the basis for decision-making can be provided intuitively or via the model. The system comprises four modules: pre-processing, word embedding, classifier, and visualization. We used Word2Vec and BERT to compare word embeddings and used ResNet and one-dimensional convolutional neural networks (1D CNNs) to compare classifiers. Finally, a Bi-LSTM was used to perform text classification for direct comparison. With 25 epochs, the model that used pre-trained ResNet on the formalized text presented the best performance (recall of 90.9%, precision of 91.1%, and a weighted F1 score of 90.2%). This study uses ResNet to process medical texts through Grad-CAM-based explainable artificial intelligence and obtains high classification accuracy; at the same time, through Grad-CAM visualization, it intuitively shows the words to which the model pays attention when making predictions.
Affiliation(s)
- Katsuhiko Ogasawara
  - Graduate School of Health Science, Hokkaido University, N12-W5, Kitaku, Sapporo 060-0812, Japan
4
Kim KH, Lee BJ, Koo HW. Effect of Cilostazol on Delayed Cerebral Infarction in Aneurysmal Subarachnoid Hemorrhage Using Explainable Predictive Modeling. Bioengineering (Basel) 2023; 10:797. [PMID: 37508824 PMCID: PMC10376257 DOI: 10.3390/bioengineering10070797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 06/26/2023] [Accepted: 07/02/2023] [Indexed: 07/30/2023]
Abstract
Studies that interpret delayed cerebral infarction (DCI), a complication of subarachnoid hemorrhage (SAH), and identify correlated factors are very limited. This study aimed to investigate the effect of cilostazol on ACV and DCI after coil embolization for ruptured aneurysms (n = 432). A multivariate analysis was performed, and explainable artificial intelligence approaches were used to analyze the contribution of cilostazol as a risk factor for the development of ACV and DCI with respect to global and local interpretation. The rates of ACV (13.5% vs. 29.3%; p = 0.003) and DCI (7.9% vs. 20.7%; p = 0.006) were significantly lower in the cilonimo group than in the nimo group. In a multivariate logistic regression, the odds ratios for DCI for the cilonimo group, female sex, and aneurysm size were 0.556 (95% confidence interval (CI), 0.351-0.879; p = 0.012), 3.713 (95% CI, 1.683-8.191; p = 0.001), and 1.106 (95% CI, 1.008-1.214; p = 0.034), respectively. The risk of a DCI occurrence was significantly increased with an aneurysm size greater than 10 mm (max 80%). The mean AUC of the XGBoost and logistic regression models was 0.94 ± 0.03 and 0.95 ± 0.04, respectively. Cilostazol treatment combined with nimodipine could decrease the prevalence of ACV (13.5%) and DCI (7.9%) in patients with aneurysmal SAH (aSAH).
Affiliation(s)
- Kwang Hyeon Kim
  - Department of Neurosurgery, College of Medicine, Inje University Ilsan Paik Hospital, Goyang 10380, Republic of Korea
- Byung-Jou Lee
  - Department of Neurosurgery, College of Medicine, Inje University Ilsan Paik Hospital, Goyang 10380, Republic of Korea
- Hae-Won Koo
  - Department of Neurosurgery, College of Medicine, Inje University Ilsan Paik Hospital, Goyang 10380, Republic of Korea
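The cilostazol abstract above reports odds ratios with 95% confidence intervals from a multivariate logistic regression. As a rough consistency check, the sketch below recovers the log-odds coefficient and its standard error from each reported triple and rebuilds the interval. It assumes a Wald interval on the log-odds scale with z = 1.96; the abstract does not state how the intervals were computed, so this is an assumption rather than the authors' method.

```python
import math


def log_odds_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover beta = ln(OR) and its standard error from a reported interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def odds_ratio_ci(beta, se, z=1.96):
    """Wald interval: exponentiate beta and beta +/- z * se."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Values copied from the abstract: (odds ratio, CI lower, CI upper).
reported = {
    "cilonimo group": (0.556, 0.351, 0.879),
    "female sex": (3.713, 1.683, 8.191),
    "aneurysm size": (1.106, 1.008, 1.214),
}

rebuilt = {}
for name, (or_value, low, high) in reported.items():
    beta, se = log_odds_and_se(or_value, low, high)
    rebuilt[name] = odds_ratio_ci(beta, se)
    _, low2, high2 = rebuilt[name]
    print(f"{name}: beta={beta:.3f}, se={se:.3f}, CI=({low2:.3f}, {high2:.3f})")
```

If the rebuilt bounds land close to the reported ones, the point estimate sits near the geometric midpoint of its interval, which is what a Wald interval implies.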
5
GhoshRoy D, Alvi PA, Santosh KC. Unboxing Industry-Standard AI Models for Male Fertility Prediction with SHAP. Healthcare (Basel) 2023; 11:929. [PMID: 37046855 PMCID: PMC10094449 DOI: 10.3390/healthcare11070929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 03/21/2023] [Accepted: 03/21/2023] [Indexed: 04/14/2023]
Abstract
Infertility is a social stigma for individuals, and male factors cause approximately 30% of infertility. Despite this, male infertility is underrecognized and underrepresented as a disease. According to the World Health Organization (WHO), changes in lifestyle and environmental factors are the prime reasons for the declining rate of male fertility. Artificial intelligence (AI)/machine learning (ML) models have become an effective solution for early fertility detection. Seven industry-standard ML models are used to detect male fertility: support vector machine, random forest (RF), decision tree, logistic regression, naïve Bayes, AdaBoost, and multi-layer perceptron. Shapley additive explanations (SHAP) are vital tools that examine each feature's impact on each model's decision making. On these, we perform a comprehensive comparative study to identify good and poor classification models. Among the above-mentioned models, the RF model achieves an optimal accuracy and area under curve (AUC) of 90.47% and 99.98%, respectively, under five-fold cross-validation (CV) with the balanced dataset. Furthermore, we provide the SHAP explanations of existing models that attain good and poor performance. The findings of this study show that decision making (based on ML models) with SHAP provides thorough explanations for detecting male fertility, as well as a reference for clinicians for further treatment planning.
Affiliation(s)
- Debasmita GhoshRoy
  - School of Automation, Banasthali Vidyapith, Tonk 304022, Rajasthan, India
  - Applied AI Research Lab, Vermillion, SD 57069, USA
- Parvez Ahmad Alvi
  - Department of Physics, Banasthali Vidyapith, Tonk 304022, Rajasthan, India
- KC Santosh
  - Applied AI Research Lab, Vermillion, SD 57069, USA
  - Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA
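Several abstracts in this list, including the male fertility study above, rely on SHAP, which attributes a prediction to features using Shapley values. The sketch below computes exact Shapley values for a tiny toy value function by averaging marginal contributions over all feature orderings. The feature names and the toy scoring rule are hypothetical, and this brute-force enumeration is not what SHAP libraries run in practice; it only illustrates the quantity they estimate.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Average each feature's marginal contribution over every ordering."""
    totals = {f: 0.0 for f in features}
    n_orderings = 0
    for order in permutations(features):
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value_fn(with_f) - value_fn(coalition)
            coalition = with_f
        n_orderings += 1
    return {f: total / n_orderings for f, total in totals.items()}


def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    score = 0.0
    if "age" in coalition:
        score += 2.0
    if "smoking" in coalition:
        score += 1.0
    if "age" in coalition and "smoking" in coalition:
        score += 1.0  # interaction term, shared between the two features
    return score


phi = shapley_values(toy_value, ["age", "smoking", "sitting_hours"])
print(phi)
```

The attributions sum to the difference between the full-coalition value and the empty-coalition value, and a feature that never changes the output receives zero.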
6
Kim M, Kim D, Kim G. Examining the Relationship between Land Use/Land Cover (LULC) and Land Surface Temperature (LST) Using Explainable Artificial Intelligence (XAI) Models: A Case Study of Seoul, South Korea. Int J Environ Res Public Health 2022; 19:15926. [PMID: 36498000 PMCID: PMC9740204 DOI: 10.3390/ijerph192315926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 11/25/2022] [Accepted: 11/28/2022] [Indexed: 06/17/2023]
Abstract
Understanding the relationship between land use/land cover (LULC) and land surface temperature (LST) has long been an area of interest in urban and environmental studies. To examine this, existing studies have utilized both white-box and black-box approaches, including regression, decision tree, and artificial intelligence models. To overcome the limitations of previous models, this study adopted the explainable artificial intelligence (XAI) approach in examining the relationships between LULC and LST. By integrating XGBoost and SHAP, we developed an LST prediction model for Seoul and estimated the LST reduction effects after specific LULC changes. Results showed that the prediction accuracy of LST was maximized when landscape, topographic, and LULC features within a 150 m buffer radius were adopted as independent variables. Specifically, the presence of surrounding built-up and vegetation areas was found to be the most influential factor in explaining LST. In this study, a decrease in LST of approximately 1.5 °C was predicted after LULC changes from expressway to green areas. The findings of our study can be utilized for assessing and monitoring the thermal environmental impact of urban planning and projects. This study can also contribute to determining the priorities of different policy measures for improving the thermal environment.
Affiliation(s)
- Minjun Kim
  - Department of Environmental Planning, Korea Environment Institute, Sejong 30147, Republic of Korea
- Dongbeom Kim
  - Technical Research Institute NEGGA Co., Ltd., Seoul 07220, Republic of Korea
- Geunhan Kim
  - Department of Environmental Planning, Korea Environment Institute, Sejong 30147, Republic of Korea
7
Khan IU, Aslam N, AlShedayed R, AlFrayan D, AlEssa R, AlShuail NA, Al Safwan A. A Proactive Attack Detection for Heating, Ventilation, and Air Conditioning (HVAC) System Using Explainable Extreme Gradient Boosting Model (XGBoost). Sensors (Basel) 2022; 22:9235. [PMID: 36501938 PMCID: PMC9740645 DOI: 10.3390/s22239235] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2022] [Revised: 11/07/2022] [Accepted: 11/07/2022] [Indexed: 06/17/2023]
Abstract
The advent of Industry 4.0 has changed daily life enormously. There is a growing trend towards the Internet of Things (IoT), which has made life easier on the one hand and improved services on the other. However, it is also vulnerable to cyber security attacks. Therefore, there is a need for intelligent and reliable security systems that can proactively analyze the data generated by these devices and detect cybersecurity attacks. This study proposed a proactive interpretable prediction model using ML and explainable artificial intelligence (XAI) to detect different types of security attacks using the log data generated by heating, ventilation, and air conditioning (HVAC) systems. Several ML algorithms were used, such as Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), Ada Boost (AB), Light Gradient Boosting (LGBM), Extreme Gradient Boosting (XGBoost), and CatBoost (CB). Furthermore, feature selection was performed using the stepwise forward feature selection (FFS) technique. To alleviate the data imbalance, SMOTE and Tomek links were used; SMOTE achieved the best results with the selected features. Empirical experiments were conducted, and the results showed that the XGBoost classifier produced the best result, with an Area Under the Curve (AUC) of 0.9999, accuracy (ACC) of 0.9998, recall of 0.9996, precision of 1.000, and F1 score of 0.9998. Additionally, XAI was applied to the best performing model to add interpretability to the black-box model. Local and global explanations were generated using LIME and SHAP. The results of the proposed study have confirmed the effectiveness of ML for predicting cyber security attacks on IoT devices and Industry 4.0.
Affiliation(s)
- Irfan Ullah Khan
  - SAUDI ARAMCO Cybersecurity Chair, Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Nida Aslam
  - SAUDI ARAMCO Cybersecurity Chair, Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Rana AlShedayed
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Dina AlFrayan
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Rand AlEssa
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Noura A. AlShuail
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
- Alhawra Al Safwan
  - Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
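The HVAC abstract above uses SMOTE to reduce class imbalance. The core idea of SMOTE is to create a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that single step in plain Python; the toy points, the choice of k = 2, and the helper name `smote_sample` are illustrative assumptions, and this is not the pipeline used in the study.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point between a random minority point and a near neighbour."""
    base = rng.choice(minority)
    # k nearest minority neighbours of the chosen point (excluding the point itself)
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


# Hypothetical minority-class points: the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
print(synthetic)
```

Because each synthetic point lies on the segment between two existing minority points, the new samples stay inside the region already occupied by the minority class.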
8
Nakanishi A, Fukunishi H, Matsumoto R, Eguchi F. Development of a Prediction Method of Cell Density in Autotrophic/Heterotrophic Microorganism Mixtures by Machine Learning Using Absorbance Spectrum Data. BioTech (Basel) 2022; 11:46. [PMID: 36278558 DOI: 10.3390/biotech11040046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Revised: 10/09/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022]
Abstract
Microflora is actively used to produce value-added materials in industry, and the density of each cell type should be controlled for stable microflora use. In this study, a simple system for evaluating cell density was constructed with artificial intelligence (AI) using the absorbance spectra data of microflora. To set up the system, a machine learning-based prediction system for cell density was constructed using spectra data from a mixture of Saccharomyces cerevisiae and Chlamydomonas reinhardtii as the features. When cell density was predicted by extremely randomized trees, the coefficient of determination (R²) was 0.8495 when the cell density of S. cerevisiae was shifted and that of C. reinhardtii was fixed; conversely, when the cell density of S. cerevisiae was fixed and that of C. reinhardtii was shifted, the R² was 0.9232. To explain the prediction system, an extremely randomized trees regressor, a decision tree-based ensemble learning method, was used as the machine learning algorithm, and Shapley additive explanations (SHAP) were used as the explainable AI (XAI) method to interpret the features contributing to the prediction results. The SHAP analyses indicated that not only the optical density but also the absorbance of the Soret and Q bands derived from the chloroplasts of C. reinhardtii could contribute to the prediction as features. The simple cell density evaluation system could have an industrial impact.
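The cell density abstract above reports model quality as a coefficient of determination (R²). As a reminder of what that number measures, here is a minimal sketch of the usual definition, one minus the ratio of residual to total sum of squares. The measured and predicted values are made-up toy numbers, not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured and predicted cell densities (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.2f}")
```

A value of 1 means the predictions match the measurements exactly, while a value near 0 means the model does no better than predicting the mean.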
9
Huynh TMT, Ni CF, Su YS, Nguyen VCN, Lee IH, Lin CP, Nguyen HH. Predicting Heavy Metal Concentrations in Shallow Aquifer Systems Based on Low-Cost Physiochemical Parameters Using Machine Learning Techniques. Int J Environ Res Public Health 2022; 19:12180. [PMID: 36231480 PMCID: PMC9566676 DOI: 10.3390/ijerph191912180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2022] [Revised: 09/20/2022] [Accepted: 09/20/2022] [Indexed: 05/07/2023]
Abstract
Monitoring ex-situ water parameters, namely heavy metals, requires time and laboratory work for water sampling and analytical processes, which can delay the response to ongoing pollution events. Previous studies have successfully applied fast modeling techniques such as artificial intelligence algorithms to predict heavy metals. However, neither low-cost feature predictability nor explainability assessments have been considered in the modeling process. This study proposes a reliable and explainable framework to find an effective model and feature set to predict heavy metals in groundwater. The integrated assessment framework has four steps: model selection uncertainty, feature selection uncertainty, predictive uncertainty, and model interpretability. The results show that Random Forest is the most suitable model, and quick-measure parameters can be used as predictors for arsenic (As), iron (Fe), and manganese (Mn). Although the model performance is promising, it likely produces significant uncertainties. The findings also demonstrate that As is related to nutrients and spatial distribution, while Fe and Mn are affected by spatial distribution and salinity. Some limitations and suggestions are also discussed to improve the prediction accuracy and interpretability.
Affiliation(s)
- Thi-Minh-Trang Huynh
  - Graduate Institute of Applied Geology, National Central University, Taoyuan 32001, Taiwan
- Chuen-Fa Ni
  - Graduate Institute of Applied Geology, National Central University, Taoyuan 32001, Taiwan
  - Center for Environmental Studies, National Central University, Taoyuan 32001, Taiwan
  - Correspondence: (C.-F.N.); (Y.-S.S.)
- Yu-Sheng Su
  - Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung 202301, Taiwan
  - Correspondence: (C.-F.N.); (Y.-S.S.)
- Vo-Chau-Ngan Nguyen
  - College of Environment and Natural Resources, Can Tho University, Can Tho 94000, Vietnam
- I-Hsien Lee
  - Graduate Institute of Applied Geology, National Central University, Taoyuan 32001, Taiwan
  - Center for Environmental Studies, National Central University, Taoyuan 32001, Taiwan
- Chi-Ping Lin
  - Graduate Institute of Applied Geology, National Central University, Taoyuan 32001, Taiwan
  - Center for Environmental Studies, National Central University, Taoyuan 32001, Taiwan
- Hoang-Hiep Nguyen
  - Graduate Institute of Applied Geology, National Central University, Taoyuan 32001, Taiwan
10
Lin R, Wichadakul D. Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification. Front Genet 2022; 13:876721. [PMID: 35685437 PMCID: PMC9173695 DOI: 10.3389/fgene.2022.876721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022]
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in many biological processes and are implicated in several diseases. With next-generation sequencing technologies, a substantial number of unannotated transcripts have been discovered. Classifying unannotated transcripts using biological experiments is more time-consuming and expensive than using computational approaches. Several tools are available for identifying lncRNAs. These tools, however, do not explain which of their features contributed to the prediction results. Here, we present Xlnc1DCNN, a tool for distinguishing lncRNAs from protein-coding transcripts (PCTs) using a one-dimensional convolutional neural network with prediction explanations. The evaluation results on the human test set showed that Xlnc1DCNN outperformed other state-of-the-art tools in terms of accuracy and F1-score. The explanation results revealed that lncRNA transcripts were mainly identified as sequences with no conserved regions, short patterns with unknown functions, or only regions of transmembrane helices, while protein-coding transcripts were mostly classified by conserved protein domains or families. The explanation results also pointed to probably inconsistent annotations among the public databases, namely lncRNA transcripts that contain protein domains, protein families, or intrinsically disordered regions (IDRs). Xlnc1DCNN is freely available at https://github.com/cucpbioinfo/Xlnc1DCNN.
Affiliation(s)
- Rattaphon Lin
  - Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Pathumwan, Thailand
- Duangdao Wichadakul
  - Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Pathumwan, Thailand
  - Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Pathumwan, Thailand
11
Abstract
In credit risk estimation, the most important element is obtaining a probability of default as close as possible to the effective risk. This effort quickly prompted new, powerful algorithms, such as Gradient Boosting or ensemble methods, that reach a far higher accuracy but at the cost of losing intelligibility. These models are usually referred to as "black-boxes", implying that you know the inputs and the output, but there is little way to understand what is going on under the hood. In response, several different Explainable AI models have flourished in recent years, with the aim of letting the user see why the black-box gave a certain output. In this context, we evaluate two very popular eXplainable AI (XAI) models in their ability to discriminate observations into groups, through the application of both unsupervised and predictive modeling to the weights these XAI models assign to features locally. The evaluation is carried out on real Small and Medium Enterprises data, obtained from official Italian repositories, and may form the basis for the employment of such XAI models for post-processing feature extraction.
Affiliation(s)
- Alex Gramegna
  - Department of Economics and Management, University of Pavia, Pavia, Italy
- Paolo Giudici
  - Department of Economics and Management, University of Pavia, Pavia, Italy
12
Chakraborty D, Ivan C, Amero P, Khan M, Rodriguez-Aguayo C, Başağaoğlu H, Lopez-Berestein G. Explainable Artificial Intelligence Reveals Novel Insight into Tumor Microenvironment Conditions Linked with Better Prognosis in Patients with Breast Cancer. Cancers (Basel) 2021; 13:3450. [PMID: 34298668 PMCID: PMC8303703 DOI: 10.3390/cancers13143450] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/04/2021] [Revised: 07/06/2021] [Accepted: 07/06/2021] [Indexed: 12/29/2022]
Abstract
We investigated the data-driven relationship between immune cell composition in the tumor microenvironment (TME) and the ≥5-year survival rates of breast cancer patients using explainable artificial intelligence (XAI) models. We acquired TCGA breast invasive carcinoma data from cBioPortal and retrieved immune cell composition estimates from bulk RNA sequencing data from TIMER2.0 based on the EPIC, CIBERSORT, TIMER, and xCell computational methods. Novel insights derived from our XAI model showed that B cells, CD8+ T cells, M0 macrophages, and NK T cells are the most critical TME features for enhanced prognosis of breast cancer patients. Our XAI model also revealed the inflection points of these critical TME features, above or below which ≥5-year survival rates improve. Subsequently, we ascertained the conditional probabilities of ≥5-year survival under specific conditions inferred from the inflection points. In particular, the XAI models revealed that a B cell fraction (relative to all cells in a sample) exceeding 0.025, an M0 macrophage fraction (relative to the total immune cell content) below 0.05, and NK T cell and CD8+ T cell fractions (based on cancer type-specific arbitrary units) above 0.075 and 0.25, respectively, in the TME could enhance ≥5-year survival in breast cancer patients. The findings could lead to accurate clinical predictions and enhanced immunotherapies, and to the design of innovative strategies to reprogram the breast TME.
Affiliation(s)
- Debaditya Chakraborty
  - Department of Construction Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Cristina Ivan
  - Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNA, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Paola Amero
  - Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Maliha Khan
  - Department of Lymphoma and Myeloma, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Cristian Rodriguez-Aguayo
  - Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNA, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Gabriel Lopez-Berestein
  - Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNA, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA