1
Frndak S, Queirolo EI, Mañay N, Yu G, Ahmed Z, Barg G, Colder C, Kordas K. Predicting blood lead in Uruguayan children: Individual- vs neighborhood-level ensemble learners. PLOS Glob Public Health 2024; 4:e0003607. [PMID: 39231183 PMCID: PMC11373808 DOI: 10.1371/journal.pgph.0003607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 07/23/2024] [Indexed: 09/06/2024]
Abstract
Predicting childhood blood lead levels (BLLs) has had mixed success, and it is unclear whether individual- or neighborhood-level variables are more predictive. We implemented an ensemble machine learning (ML) approach to identify the most relevant predictors of BLL ≥2 μg/dL in urban children. A cross-sectional sample of 603 children (~7 years of age), recruited between 2009 and 2019 in Montevideo, Uruguay, participated in the study. Seventy-seven individual-level and 32 neighborhood-level variables were used to predict BLL ≥2 μg/dL. Three ensemble learners were created: one with individual-level predictors (Ensemble-I), one with neighborhood-level predictors (Ensemble-N), and one with both (Ensemble-All). Each ensemble learner comprised four base classifiers, and the data were split into 50% training, 25% validation, and 25% test sets. Predictive performance of the three ensemble models was compared on the test set using the area under the receiver operating characteristic (ROC) curve (AUC), precision, sensitivity, and specificity. Ensemble-I (AUC: 0.75, precision: 0.56, sensitivity: 0.79, specificity: 0.65) performed similarly to Ensemble-All (AUC: 0.75, precision: 0.63, sensitivity: 0.79, specificity: 0.69). Ensemble-N (AUC: 0.51, precision: 0.0, sensitivity: 0.0, specificity: 0.50) severely underperformed. Year of enrollment was the most important predictor in Ensemble-I and Ensemble-All, followed by household water Pb. Three neighborhood-level variables were among the top 10 predictors in Ensemble-All: density of bus routes, dwellings with a stream or other water source, and distance to the nearest river. The individual-level-only model performed best, although precision improved when both neighborhood- and individual-level variables were included. Future predictive models of lead exposure should consider proximal predictors (i.e., household characteristics).
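The test-set comparison above rests on four standard binary-classification metrics. As a minimal sketch (not the study's code), the Python below computes them from true labels and predicted probabilities, assuming 1 marks BLL ≥2 μg/dL and a hypothetical 0.5 decision threshold; `summarize_binary_classifier` and the toy data are illustrative. AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, which equals the area under the ROC curve.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num: int, den: int) -> float:
    # Convention used here: report 0.0 when the denominator is zero
    # (e.g. precision when no case is predicted positive).
    return num / den if den else 0.0


def summarize_binary_classifier(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Test-set AUC, precision, sensitivity and specificity for a 0/1 outcome."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    fn = pairs.count((1, 0))
    return {
        "auc": roc_auc(y_true, scores),      # threshold-free ranking metric
        "precision": _ratio(tp, tp + fp),    # TP / (TP + FP)
        "sensitivity": _ratio(tp, tp + fn),  # TP / (TP + FN), a.k.a. recall
        "specificity": _ratio(tn, tn + fp),  # TN / (TN + FP)
    }


if __name__ == "__main__":
    # Toy data: 1 = BLL >= 2 ug/dL (hypothetical labels and scores).
    y = [1, 1, 0, 0, 0, 1]
    p = [0.9, 0.4, 0.6, 0.2, 0.1, 0.7]
    print(summarize_binary_classifier(y, p))
    # AUC = 8/9 (about 0.889); precision, sensitivity and specificity are each 2/3
```

Zero-denominator handling is a convention choice: with no predicted positives, precision is undefined and this sketch reports 0.0. scikit-learn's `precision_score` exposes a `zero_division` parameter for the same decision.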
Affiliation(s)
- Seth Frndak
  - Department of Epidemiology and Environmental Health, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
- Elena I. Queirolo
  - Department of Neuroscience and Learning, Catholic University of Uruguay, Montevideo, Uruguay
- Nelly Mañay
  - Faculty of Chemistry, University of the Republic of Uruguay (UDELAR), Montevideo, Uruguay
- Guan Yu
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Zia Ahmed
  - Research and Education in eNergy, Environment and Water (RENEW) Institute, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
- Gabriel Barg
  - Department of Neuroscience and Learning, Catholic University of Uruguay, Montevideo, Uruguay
- Craig Colder
  - Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
- Katarzyna Kordas
  - Department of Epidemiology and Environmental Health, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
2
Kothari S, Sharma S, Shejwal S, Kazi A, D'Silva M, Karthikeyan M. An explainable AI-assisted web application in cancer drug value prediction. MethodsX 2024; 12:102696. [PMID: 38633421 PMCID: PMC11022087 DOI: 10.1016/j.mex.2024.102696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
In recent years, interest in adopting Explainable Artificial Intelligence (XAI) for healthcare has grown. The proposed system includes:
- An XAI model for cancer drug value prediction. The model produces results that are easy to understand and explain, which is critical for medical decision-making, and it also makes accurate predictions.
- A model that outperformed existing models owing to extensive training and evaluation on a large dataset of cancer drug chemical compounds.
- Insights into the causation and correlation between the dependent and independent actors in the chemical composition of the cancer cell.
Although the model is evaluated on lung cancer data, the proposed architecture is cancer-agnostic and may be scaled out to other cancer cell data with similar properties. By combining XAI with a large dataset, the work presents a viable route to customizing treatments and improving patient outcomes in oncology. This research aims to create a framework in which a user can upload a test case and receive predictions with explanations, all in a portable PDF report.
Affiliation(s)
- Sonali Kothari
  - Symbiosis Institute of Technology – Pune Campus, Symbiosis International (Deemed University), Pune, India
- Shivanandana Sharma
  - Symbiosis Institute of Technology – Pune Campus, Symbiosis International (Deemed University), Pune, India
- Sanskruti Shejwal
  - Symbiosis Institute of Technology – Pune Campus, Symbiosis International (Deemed University), Pune, India
- Aqsa Kazi
  - Symbiosis Institute of Technology – Pune Campus, Symbiosis International (Deemed University), Pune, India
- Michela D'Silva
  - Symbiosis Institute of Technology – Pune Campus, Symbiosis International (Deemed University), Pune, India
- M. Karthikeyan
  - Senior Principal Scientist, Chemical Engineering and Process Development, NCL-CSIR, Pune, India
3
Kırboğa KK, Rudrapal M. Feature Engineering-Assisted Drug Repurposing on Disease-Drug Transcriptome Profiles in Gastric Cancer. Assay Drug Dev Technol 2024; 22:181-191. [PMID: 38572922 DOI: 10.1089/adt.2023.141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024]
Abstract
Gastric cancer is one of the most common and deadly cancers worldwide. Developing new biomarkers and drugs to diagnose and treat it requires identifying the differences between the transcriptome profiles of patients with gastric cancer and healthy individuals, identifying the critical genes associated with these differences, and making potential drug predictions based on these genes. In this study, using two gene expression datasets related to gastric cancer (GSE19826 and GSE79973), 200 genes prepared for machine learning were selected and their expression levels analyzed. The 100 best genes for the model were chosen using the permutation feature importance method, and central genes shown to be associated with gastric cancer, such as SCARB1, ETV3, SPATA17, FAM167A-AS1, and MTBP, were identified. Then, drug repurposing with the Connectivity Map CLUE Query tools identified potential drugs, such as Forskolin, Gestrinone, Cediranib, Apicidine, and Everolimus, which showed a strong negative correlation with the expression levels of the selected genes. This study provides a method for developing new approaches to diagnosing and treating gastric cancer by comparing the transcriptome profiles of patients with gastric cancer with those of healthy individuals and performing a feature engineering-assisted drug repurposing analysis based on cancer data.
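Permutation feature importance, used above to keep the 100 best of the 200 genes, scores each feature by how much a fitted model's performance drops when that feature's values are randomly shuffled. A minimal stdlib sketch, assuming accuracy as the performance score and a toy `predict` function standing in for the study's trained model (the abstract specifies neither); all names are illustrative:

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            # Build new rows so the caller's X is left untouched.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k features with the largest mean importance."""
    return sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy "expression" matrix: feature 0 determines the label, feature 1 is noise.
    X = [[float(i % 2), (i * 37 % 100) / 100] for i in range(100)]
    y = [i % 2 for i in range(100)]

    def predict(rows: List[Row]) -> List[int]:
        # Hypothetical stand-in for a fitted classifier; it only looks at feature 0.
        return [1 if r[0] > 0.5 else 0 for r in rows]

    imp = permutation_importance(predict, X, y)
    print(imp, rank_features(imp, k=1))  # feature 1's importance is exactly 0.0
```

Because shuffling preserves a feature's marginal distribution, a feature the model ignores scores exactly zero, while informative features show a positive drop. Under these assumptions, keeping the top 100 of 200 genes would correspond to `rank_features(imp, k=100)`.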
Affiliation(s)
- Kevser Kübra Kırboğa
  - Bioengineering Department, Faculty of Engineering, Bilecik Seyh Edebali University, Bilecik, Türkiye
- Mithun Rudrapal
  - Department of Pharmaceutical Sciences, School of Biotechnology and Pharmaceutical Sciences, Vignan's Foundation for Science, Technology & Research (Deemed to be University), Guntur, India
4
Romano D, Novielli P, Diacono D, Cilli R, Pantaleo E, Amoroso N, Bellantuono L, Monaco A, Bellotti R, Tangaro S. Insights from Explainable Artificial Intelligence of Pollution and Socioeconomic Influences for Respiratory Cancer Mortality in Italy. J Pers Med 2024; 14:430. [PMID: 38673057 PMCID: PMC11051343 DOI: 10.3390/jpm14040430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
Respiratory malignancies, including cancers of the lungs, trachea, and bronchi, pose a significant and evolving public health challenge. Because air pollution is a major contributor to the onset of these diseases, identifying the most harmful agents is essential for designing policies that reduce exposure. This study uses explainable artificial intelligence (XAI) methods, leveraging remote sensing data, to identify the main factors influencing the prediction of standardized mortality ratios (SMRs) for respiratory cancer across Italian provinces, using both environmental and socioeconomic data. We compare thirteen machine learning algorithms to find the most accurate model for classifying Italian provinces as above or below the national average SMR for respiratory cancer. Using XAI techniques, we then identify the factors most important for predicting the two SMR classes. The analysis highlights the environmental and socioeconomic factors relevant to mortality in this disease category, offering a roadmap for prioritizing interventions to mitigate risk factors.
Affiliation(s)
- Donato Romano
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
- Pierfrancesco Novielli
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
- Domenico Diacono
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
- Roberto Cilli
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Ester Pantaleo
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Nicola Amoroso
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento di Farmacia Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Loredana Bellantuono
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento di Biomedicina Traslazionale e Neuroscienze, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Alfonso Monaco
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Roberto Bellotti
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Sabina Tangaro
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
5
Frndak S, Yan F, Edelson M, Immergluck LC, Kordas K, Idris MY, Dickinson-Copeland CM. Predicting Low-Level Childhood Lead Exposure in Metro Atlanta Using Ensemble Machine Learning of High-Resolution Raster Cells. Int J Environ Res Public Health 2023; 20:4477. [PMID: 36901487 PMCID: PMC10002062 DOI: 10.3390/ijerph20054477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 02/24/2023] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
Low-level lead exposure in children is a major public health issue. Higher-resolution spatial targeting would significantly improve county- and state-wide lead exposure prevention policies and programs, which generally intervene across large geographic areas. We use stack-ensemble machine learning, including an elastic net generalized linear model, a gradient-boosted machine, and a deep neural network, to predict the number of children with venous blood lead levels (BLLs) of ≥2 to <5 µg/dL and ≥5 µg/dL in ~1 km² raster cells in the metro Atlanta region, using a sample of 92,792 children ≤5 years old screened between 2010 and 2018. Permutation-based predictor importance and partial dependence plots were used for interpretation. Maps of predicted vs. observed values were generated to compare model performance. Air-based toxic release facility density (from the EPA Toxic Release Inventory), the percentage of the population below the poverty threshold, crime, and road network density were positively associated with the number of children with low-level lead exposure, whereas the percentage of the white population was inversely associated. While predictions generally matched observed values, cells with high counts of lead exposure were underestimated. High-resolution geographic prediction of lead-exposed children using ensemble machine learning is a promising approach to enhance lead prevention efforts.
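The modelling target described above is a count of children per ~1 km² raster cell in two BLL ranges (≥2 to <5 µg/dL and ≥5 µg/dL). A minimal sketch of building such targets from geocoded screening records, assuming coordinates in a projected, metre-based reference system and a square grid aligned to the origin; the record layout and function names are hypothetical, and production work would more likely use GIS rasterization tools:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CellKey = Tuple[int, int]


def bll_category(bll: float) -> str:
    """Bin a venous BLL (ug/dL) into the outcome ranges named in the abstract."""
    if bll >= 5:
        return "ge5"
    if bll >= 2:
        return "2to5"  # >= 2 and < 5
    return "lt2"


def counts_per_cell(
    children: Iterable[Tuple[float, float, float]],
    cell_size_m: float = 1000.0,
) -> Dict[CellKey, Dict[str, int]]:
    """Count children per square grid cell and BLL category.

    Each record is (x_m, y_m, bll); x and y are assumed to be projected coordinates
    in metres, so a 1000 m cell is roughly 1 km^2.
    """
    cells: Dict[CellKey, Dict[str, int]] = defaultdict(
        lambda: {"lt2": 0, "2to5": 0, "ge5": 0}
    )
    for x, y, bll in children:
        # Floor division assigns each point to the cell containing it (also for negatives).
        key = (int(x // cell_size_m), int(y // cell_size_m))
        cells[key][bll_category(bll)] += 1
    return dict(cells)


if __name__ == "__main__":
    records = [(10, 20, 1.0), (999, 5, 3.5), (1001, 5, 6.0), (1500, 900, 2.0)]
    print(counts_per_cell(records))
```

Each cell's "2to5" and "ge5" counts would then serve as the two prediction targets, with cell-level predictors (for example, poverty percentage or road density) attached by the same cell key.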
Affiliation(s)
- Seth Frndak
  - Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY 14260, USA
- Fengxia Yan
  - Department of Community Health and Preventive Medicine, Morehouse School of Medicine, Atlanta, GA 30310, USA
- Mike Edelson
  - Geographic Information Systems, InterDev, Roswell, GA 30076, USA
- Lilly Cheng Immergluck
  - Department of Microbiology, Biochemistry, and Immunology, Morehouse School of Medicine, Atlanta, GA 30310, USA
- Katarzyna Kordas
  - Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY 14260, USA
- Muhammed Y. Idris
  - Department of Medicine, Morehouse School of Medicine, Atlanta, GA 30310, USA
6
Ladbury C, Zarinshenas R, Semwal H, Tam A, Vaidehi N, Rodin AS, Liu A, Glaser S, Salgia R, Amini A. Utilization of model-agnostic explainable artificial intelligence frameworks in oncology: a narrative review. Transl Cancer Res 2022; 11:3853-3868. [PMID: 36388027 PMCID: PMC9641128 DOI: 10.21037/tcr-22-1626] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/10/2022] [Accepted: 09/07/2022] [Indexed: 11/25/2022]
Abstract
Background and Objective: Machine learning (ML) models are increasingly being developed in oncology research for clinical use. However, while more complex models may improve predictive or prognostic power, a hurdle to their adoption is their limited interpretability: their inner workings can be perceived as a "black box". Explainable artificial intelligence (XAI) frameworks, including Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), are novel, model-agnostic approaches that aim to provide insight into the inner workings of the "black box" by producing quantitative visualizations of how model predictions are calculated. In doing so, XAI can transform complicated ML models into easily understandable charts and interpretable sets of rules, which can give providers an intuitive understanding of the knowledge generated and thus facilitate the deployment of such models in routine clinical workflows.
Methods: We performed a comprehensive, non-systematic review of the latest literature to define use cases of model-agnostic XAI frameworks in oncologic research. The database examined was PubMed/MEDLINE. The last search was run on May 1, 2022.
Key Content and Findings: In this review, we identified several fields in oncology research where ML models and XAI were used to improve interpretability, including prognostication, diagnosis, radiomics, pathology, treatment selection, radiation treatment workflows, and epidemiology. Within these fields, XAI facilitates determination of feature importance in the overall model, visualization of relationships and/or interactions, evaluation of how individual predictions are produced, feature selection, identification of prognostic and/or predictive thresholds, and assessment of overall confidence in the models, among other benefits. These examples provide a basis for future work, which can facilitate adoption in the clinic when the complexity of such modeling would otherwise be prohibitive.
Conclusions: Model-agnostic XAI frameworks offer an intuitive and effective means of describing oncology ML models, with applications including prognostication and determination of optimal treatment regimens. Using such frameworks presents an opportunity to improve understanding of ML models, which is a critical step toward their adoption in the clinic.
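SHAP, one of the two frameworks named above, attributes a prediction to features using Shapley values from cooperative game theory. To make the underlying idea concrete, the sketch below computes exact Shapley values by brute-force enumeration, assuming that "absent" features are replaced by a single baseline vector (one of several possible conventions). It is not the `shap` package's API; that library's estimators, such as KernelSHAP and TreeSHAP, exist precisely to avoid this exponential enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take their baseline value.

    Enumerates every coalition (2^n model calls per feature), so it is only practical
    for a handful of features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phis: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalitions of the other features have 0..n-1 members
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis


if __name__ == "__main__":
    def model(z: Sequence[float]) -> float:
        # Hypothetical linear "risk score" used only for illustration.
        return 2 * z[0] - 3 * z[1] + 0.5 * z[2]

    print(shapley_values(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # approximately [2.0, -6.0, 1.5]
```

The values satisfy the efficiency property: they sum to f(x) minus f(baseline), which is why SHAP waterfall plots start at the baseline prediction and end at the model's output.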
Affiliation(s)
- Colton Ladbury
  - Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA, USA
- Reza Zarinshenas
  - Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA, USA
- Hemal Semwal
  - Departments of Bioengineering and Integrated Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Andrew Tam
  - Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA, USA
- Nagarajan Vaidehi
  - Department of Computational and Quantitative Medicine, City of Hope National Medical Center, Duarte, CA, USA
- Andrei S Rodin
  - Department of Computational and Quantitative Medicine, City of Hope National Medical Center, Duarte, CA, USA
- An Liu
  - Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA, USA
- Scott Glaser
  - Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA, USA
- Ravi Salgia
  - Department of Medical Oncology, City of Hope National Medical Center, Duarte, CA, USA
- Arya Amini
  - Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA, USA