1
Pan W, Gong S, Ke H, Li X, Chen D, Huang C, Song D. Development of an automated photolysis rates prediction system based on machine learning. J Environ Sci (China) 2025; 151:211-224. [PMID: 39481934 DOI: 10.1016/j.jes.2024.03.051] [Received: 09/21/2023] [Revised: 03/27/2024] [Accepted: 03/27/2024] [Indexed: 11/03/2024]
Abstract
Based on observed meteorological elements, photolysis rates (J-values), and pollutant concentrations, an automated machine-learning system for predicting J-values (J-ML) has been developed to reproduce and predict the J-values of O1D, NO2, HONO, H2O2, HCHO, and NO3, which are crucial for predicting the atmospheric oxidation capacity (AOC) and the concentrations of secondary pollutants such as ozone (O3) and secondary organic aerosols (SOA). The J-ML can self-select the optimal "Model + Hyperparameters" combination without human intervention. The evaluation showed that the J-ML reproduced the J-values well: most correlation coefficients (R) exceeded 0.93 and the accuracy (P) values ranged from 0.68 to 0.83, compared with the J-values from observations and from the tropospheric ultraviolet and visible (TUV) radiation model in Beijing, Chengdu, Guangzhou, and Shanghai, China. The hourly predictions also performed well, with R from 0.78 to 0.81 for the next 3 days and from 0.69 to 0.71 for the next 7 days. Compared with O3 concentrations simulated using J-values from the TUV model, an emission-driven observation-based model (e-OBM) using J-values from the J-ML showed a 4%-12% increase in R and a 4%-30% decrease in ME, indicating that the J-ML could serve as an excellent supplement to traditional numerical models. The feature importance analysis concluded that the key influential parameter for all J-values was the surface solar downward radiation, and that the other dominant factors were 2-m mean temperature, O3, total cloud cover, boundary layer height, relative humidity, and surface pressure.
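As a rough illustration of the correlation metric quoted in this abstract, the sketch below computes a Pearson correlation between a predicted and a reference series. Treating R as the Pearson coefficient is an assumption, since the abstract does not define it; the function name `pearson_r` and the sample numbers are hypothetical and are not taken from the study.

```python
import math

def pearson_r(predicted, reference):
    """Pearson correlation coefficient between two equal-length series."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_r)

# Hypothetical values for illustration only, not data from the study.
predicted = [1.0, 2.0, 3.0, 4.0]
reference = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(predicted, reference))
```

A value near 1 means the predicted series rises and falls with the reference series; it says nothing about bias, which is why the abstract reports a separate error measure alongside R.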
Affiliation(s)
- Weijun Pan
  - State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Sunling Gong
  - State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China; National Observation and Research Station of Coastal Ecological Environments in Macao, Macao Environmental Research Institute, Macau University of Science and Technology, Macao 999078, China
- Huabing Ke
  - State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Xin Li
  - College of Environmental Sciences and Engineering, Peking University, Beijing 100871, China
- Duohong Chen
  - State Environmental Key Laboratory of Regional Air Quality Monitoring, Guangdong Ecological Environmental Monitoring Center, Guangzhou 510308, China
- Cheng Huang
  - State Environmental Protection Key Laboratory of Formation and Prevention of the Urban Air Complex, Shanghai Academy of Environmental Sciences, Shanghai 200233, China
- Danlin Song
  - Chengdu Academy of Environmental Sciences, Chengdu 610072, China
2
Yin WX, Lv JQ, Liu S, Chen JJ, Wei J, Ding C, Yuan Y, Bao HX, Wang HC, Wang AJ. Microbial-guided prediction of methane and sulfide production in sewers: Integrating mechanistic models with machine learning. Bioresource Technology 2025; 415:131640. [PMID: 39414164 DOI: 10.1016/j.biortech.2024.131640] [Received: 08/28/2024] [Revised: 10/02/2024] [Accepted: 10/13/2024] [Indexed: 10/18/2024]
Abstract
Accurate modeling of methane (CH4) and sulfide (H2S) production in sewer systems has been constrained by insufficient consideration of microbial processes under dynamic environmental conditions. This study introduces a microbial-guided machine learning (ML) framework (Micro-ML), which integrates microbial process representations from mechanistic models (microbial information) with ML models. Results indicate that the Micro-ML model improved predictions of CH4 and H2S production, with the microbial information providing additional input for model optimization. The feature importance of the microbial information showed comparable weightings of 58.12 % and 55.16 %, respectively, but its relative significance in influencing Micro-ML model performance varied considerably. The application of Micro-ML showed great potential for reducing CH4 and H2S production (decreases of ∼ 80 % and 90 %). The integrated model not only improves the accuracy of CH4 and H2S predictions but also offers a valuable tool for developing effective management strategies for sewer systems.
Affiliation(s)
- Wan-Xin Yin
  - College of the Environment, Liaoning University, Shenyang 110036, PR China; State Key Laboratory of Urban Water Resource and Environment, School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, PR China
- Jia-Qiang Lv
  - State Key Laboratory of Urban Water Resource and Environment, School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, PR China
- Shuai Liu
  - State Key Laboratory of Urban Water Resource and Environment, School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, PR China
- Jia-Ji Chen
  - CAS Key Laboratory of Environmental Biotechnology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Jun Wei
  - PowerChina Huadong Engineering Corporation Limited, Hangzhou 311122, PR China
- Cheng Ding
  - School of Environmental Science and Engineering, Yancheng Institute of Technology, Yancheng 224051, PR China
- Ye Yuan
  - School of Environmental Science and Engineering, Yancheng Institute of Technology, Yancheng 224051, PR China
- Hong-Xu Bao
  - College of the Environment, Liaoning University, Shenyang 110036, PR China
- Hong-Cheng Wang
  - State Key Laboratory of Urban Water Resource and Environment, School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, PR China
- Ai-Jie Wang
  - State Key Laboratory of Urban Water Resource and Environment, School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, PR China
3
Choi JY, Park S, Shim JS, Park HJ, Kuh SU, Jeong Y, Park MG, Noh TI, Yoon SG, Park YM, Lee SJ, Kim H, Kang SH, Lee KH. Explainable artificial intelligence-driven prostate cancer screening using exosomal multi-marker based dual-gate FET biosensor. Biosens Bioelectron 2025; 267:116773. [PMID: 39277920 DOI: 10.1016/j.bios.2024.116773] [Received: 04/19/2024] [Revised: 08/14/2024] [Accepted: 09/09/2024] [Indexed: 09/17/2024]
Abstract
The Prostate Imaging Reporting and Data System (PI-RADS) score, a reporting system for prostate MRI cases, has become a standard prostate cancer (PCa) screening method owing to its exceptional diagnostic performance. However, PI-RADS 3 lesions remain an unmet medical need because PI-RADS provides a diagnostic accuracy of only 30-40% at most, accompanied by a high false-positive rate. Here, we propose an explainable artificial intelligence (XAI) based PCa screening system integrating a highly sensitive dual-gate field-effect transistor (DGFET) based multi-marker biosensor for identifying ambiguous lesions. This system produces interpretable results by analyzing sensing patterns of three urinary exosomal biomarkers, enabling evidence-based prediction by clinicians. In our results, the XAI-based PCa screening system showed high accuracy, with an AUC of 0.93 on 102 blinded samples, using a non-invasive method. Remarkably, the PCa diagnostic accuracy for patients with PI-RADS 3 was more than twice that of conventional PI-RADS scoring. Our system also provided a reasonable explanation of its decisions, indicating that the TMEM256 biomarker is the leading factor for screening those with PI-RADS 3. Our study implies that XAI can facilitate informed decisions, guided by insights into the significance of visualized multi-biomarkers and clinical factors. The XAI-based sensor system can assist healthcare professionals in providing practical and evidence-based PCa diagnoses.
Affiliation(s)
- Jae Yi Choi
  - Center for Advanced Biomolecular Recognition, Biomedical Research Division, Korea Institute of Science and Technology, Seoul, 02792, Republic of Korea; Department of Medical Device Engineering and Management, College of Medicine, Yonsei University, Seoul, 06229, Republic of Korea
- Sungwook Park
  - Center for Advanced Biomolecular Recognition, Biomedical Research Division, Korea Institute of Science and Technology, Seoul, 02792, Republic of Korea
- Ji Sung Shim
  - Department of Urology, College of Medicine, Korea University, Seoul, 02841, Republic of Korea
- Hyung Joon Park
  - Center for Advanced Biomolecular Recognition, Biomedical Research Division, Korea Institute of Science and Technology, Seoul, 02792, Republic of Korea; KU-KIST Graduate School of Converging Science and Technology, Korea University, Seoul, 02481, Republic of Korea
- Sung Uk Kuh
  - Department of Medical Device Engineering and Management, College of Medicine, Yonsei University, Seoul, 06229, Republic of Korea; Department of Neurosurgery, College of Medicine, Yonsei University, Seoul, 03722, Republic of Korea
- Youngdo Jeong
  - Center for Advanced Biomolecular Recognition, Biomedical Research Division, Korea Institute of Science and Technology, Seoul, 02792, Republic of Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul, 04763, Republic of Korea
- Min Gu Park
  - Department of Urology, College of Medicine, Korea University, Seoul, 02841, Republic of Korea
- Tae Il Noh
  - Department of Urology, College of Medicine, Korea University, Seoul, 02841, Republic of Korea
- Sung Goo Yoon
  - Department of Urology, College of Medicine, Korea University, Seoul, 02841, Republic of Korea
- Yoo Min Park
  - Center for Nano Bio Development, National Nanofab Center (NNFC), Daejeon, 34141, Republic of Korea
- Seok Jae Lee
  - Center for Nano Bio Development, National Nanofab Center (NNFC), Daejeon, 34141, Republic of Korea
- Hojun Kim
  - Center for Advanced Biomolecular Recognition, Biomedical Research Division, Korea Institute of Science and Technology, Seoul, 02792, Republic of Korea; Division of Bio-Medical Science and Technology, KIST School, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Seok Ho Kang
  - Department of Urology, College of Medicine, Korea University, Seoul, 02841, Republic of Korea
- Kwan Hyi Lee
  - Center for Advanced Biomolecular Recognition, Biomedical Research Division, Korea Institute of Science and Technology, Seoul, 02792, Republic of Korea; KU-KIST Graduate School of Converging Science and Technology, Korea University, Seoul, 02481, Republic of Korea; Division of Bio-Medical Science and Technology, KIST School, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
4
Wang H, Li T, Wang G, Peng Y, Zhang Q, Wang X, Ren Y, Liu R, Yan S, Meng Q, Wang Y, Wang Q. Significant spatiotemporal changes in atmospheric particulate mercury pollution in China: Insights from meta-analysis and machine-learning. The Science of the Total Environment 2024; 955:177184. [PMID: 39454773 DOI: 10.1016/j.scitotenv.2024.177184] [Received: 09/20/2024] [Revised: 10/19/2024] [Accepted: 10/21/2024] [Indexed: 10/28/2024]
Abstract
PM2.5 bound mercury (PBM2.5) in the atmosphere is a major component of total mercury, which is a pollutant of global concern and a potent neurotoxicant when converted to methylmercury. Despite its importance, comprehensive macroanalyses of PBM2.5 on large scales are still lacking. To explore the driving factors, spatiotemporal pollution distribution, and associated health risks, we compiled a comprehensive dataset consisting of PBM2.5 concentrations and spatiotemporal information across China from 2000 to 2023 that was collected from the published scientific literature with valid data. By incorporating corresponding multidimensional predicting variables, the best-fitted random forest model was applied to predict PBM2.5 concentrations with a high spatial resolution of 0.25° × 0.25°, and the health risk assessment model was used for subsequent health risk assessment. Our results indicated that population density and PM2.5 emissions from power generation were the main contributors to PBM2.5 concentrations. In 2020, the pollution was primarily concentrated in northern, central, and eastern China, with the highest annual average concentration of 815.91 pg/m3 in Shanghai. Beijing experienced the most significant seasonal increase, with PBM2.5 concentrations rising by 146.92 % from summer to winter. Nationally, the annual average PBM2.5 pollution decreased extensively and markedly from 2015 to 2020. The non-carcinogenic risk of PBM2.5 alone was negligible in 2020, with HQ values generally <0.02 in winter. This study may provide an important assessment of the effectiveness of China's measures against mercury pollution and offer valuable insights for future prevention and control of PBM2.5 pollution.
Affiliation(s)
- Haolin Wang
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Tianshuai Li
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Guoqiang Wang
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Yanbo Peng
  - Key Laboratory of Land and Sea Ecological Governance and Systematic Regulation, Shandong Academy for Environmental Planning, Jinan 250101, China
- Qingzhu Zhang
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Xinfeng Wang
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Yuchao Ren
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Ruobing Liu
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Shuwan Yan
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Qingpeng Meng
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Yujia Wang
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
- Qiao Wang
  - Academician Workstation for Big Data Research in Ecology and Environment, Environmental Research Institute, Shandong University, Qingdao 266237, China
5
Peng M, Yang Z, Liu Z, Han W, Wang Q, Liu F, Zhou Y, Ma H, Bai J, Cheng H. Heavy metals in roadside soil along an expressway connecting two megacities in China: Accumulation characteristics, sources and influencing factors. The Science of the Total Environment 2024; 955:177095. [PMID: 39461525 DOI: 10.1016/j.scitotenv.2024.177095] [Received: 07/05/2024] [Revised: 09/16/2024] [Accepted: 10/18/2024] [Indexed: 10/29/2024]
Abstract
Transportation is widely recognized as a significant contributor to heavy metal (HM) pollution in roadside soils. A better understanding of HM pollution in soils near expressways is crucial, particularly given the rapid expansion of expressway transportation in China in recent years. In this study, 329 roadside topsoil samples were collected along the Beijing-Tianjin Expressway, which connects two megacities in China. Chemical analysis showed that HM concentrations in the soil samples were generally below national limits. The mean pollution index (Pi) values for As, Cr, Cu, Ni, Pb, and Zn ranged from 0.94 to 1.01, while Cd and Hg exhibited slightly higher mean Pi values of 1.19 and 1.13, respectively. The Nemerow integrated pollution index values for all samples ranged from 0.71 to 4.97, with a mean of 1.26. This suggests a slight enrichment of HM above natural background levels, especially for Cd and Hg. Source apportionment using positive matrix factorization revealed that natural sources contributed the most to soil HMs (64.51 %), followed by agricultural sources (19.15 %), traffic sources (9.77 %), and industrial sources (6.57 %). The Shapley additive explanation analysis, based on the random forest model, identified soil organic carbon, deep soil HM content, altitude, total soil K2O, urbanization composite impact index, and total soil P as primary influencing factors. This indicates that the impact of transportation on roadside soils along the Beijing-Tianjin Expressway is currently relatively limited. The prominent influence of soil properties and altitude underscored the importance of "transport" and "receptor" in the soil HMs accumulation process at the local scale. These findings provide critical data and a scientific basis for decision-makers to develop policies for expressway design and roadside soil protection.
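The two indices named in this abstract can be sketched in a few lines. The code below uses the commonly cited definitions: a single-factor index Pi as measured concentration divided by a reference value, and the Nemerow integrated index as the root mean square of the mean and maximum Pi. Whether the authors used exactly these forms, and which reference values they chose, is not stated in the abstract, so treat this as an assumption; the function names and sample numbers are hypothetical.

```python
import math

def pollution_index(concentration, reference):
    """Single-factor pollution index Pi = measured concentration / reference value."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return concentration / reference

def nemerow_index(single_indices):
    """Nemerow integrated index: sqrt((mean(Pi)^2 + max(Pi)^2) / 2)."""
    if not single_indices:
        raise ValueError("at least one single-factor index is required")
    mean_pi = sum(single_indices) / len(single_indices)
    max_pi = max(single_indices)
    return math.sqrt((mean_pi ** 2 + max_pi ** 2) / 2)

# Hypothetical single-factor indices for one soil sample (not study data).
sample_pi = [0.9, 1.0, 1.2, 1.1]
print(nemerow_index(sample_pi))
```

Because the maximum Pi enters the formula directly, one strongly enriched element raises the integrated index even when the other elements sit near their reference values.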
Affiliation(s)
- Min Peng
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Zheng Yang
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Zijia Liu
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Wei Han
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Qiaolin Wang
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Fei Liu
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Yalong Zhou
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Honghong Ma
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Jinfeng Bai
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
- Hangxin Cheng
  - Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
6
Yeghaian M, Bodalal Z, Tareco Bucho T, Kurilova I, Blank C, Smit E, van der Heijden M, Nguyen-Kim T, van den Broek D, Beets-Tan R, Trebeschi S. Integrated noninvasive diagnostics for prediction of survival in immunotherapy. Immuno-Oncology Technology 2024; 24:100723. [PMID: 39185322 PMCID: PMC11342748 DOI: 10.1016/j.iotech.2024.100723] [Indexed: 08/27/2024]
Abstract
Background: Integrating complementary diagnostic data sources promises enhanced robustness in the predictive performance of artificial intelligence (AI) models, a crucial requirement for future clinical validation/implementation. In this study, we investigate the potential value of integrating data from noninvasive diagnostic modalities, including chest computed tomography (CT) imaging, routine laboratory blood tests, and clinical parameters, to retrospectively predict 1-year survival in a cohort of patients with advanced non-small-cell lung cancer, melanoma, and urothelial cancer treated with immunotherapy.
Patients and methods: The study included 475 patients, of whom 444 had longitudinal CT scans and 475 had longitudinal laboratory data. An ensemble of AI models was trained on data from each diagnostic modality, and subsequently, a model-agnostic integration approach was adopted for combining the prediction probabilities of each modality and producing an integrated decision.
Results: Integrating different diagnostic data demonstrated a modest increase in predictive performance. The highest area under the curve (AUC) was achieved by CT and laboratory data integration (AUC of 0.83, 95% confidence interval 0.81-0.85, P < 0.001), whereas the performance of individual models trained on laboratory and CT data independently yielded AUCs of 0.81 and 0.73, respectively.
Conclusions: In our retrospective cohort, integrating different noninvasive data modalities improved performance.
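One simple way to combine per-modality prediction probabilities is to average whatever modalities are available for a patient. The sketch below shows that idea only; the abstract does not say which integration rule the authors used, so the unweighted mean, the function name `fuse_probabilities`, and the example values are all assumptions made for illustration.

```python
def fuse_probabilities(per_modality):
    """Average the predicted probabilities of the modalities available for one patient.

    `per_modality` maps a modality name to a probability in [0, 1],
    or to None when that modality is missing for the patient.
    """
    available = [p for p in per_modality.values() if p is not None]
    if not available:
        raise ValueError("no modality produced a prediction")
    for p in available:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return sum(available) / len(available)

# Hypothetical patients: one with both modalities, one without a CT scan.
print(fuse_probabilities({"ct": 0.6, "lab": 0.8}))
print(fuse_probabilities({"ct": None, "lab": 0.8}))
```

Working on output probabilities rather than on model internals is what makes such a rule model-agnostic: each modality can use a different model type, and a patient lacking one modality can still receive an integrated score.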
Affiliation(s)
- M. Yeghaian
  - Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
  - GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Z. Bodalal
  - Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
  - GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- T.M. Tareco Bucho
  - Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
  - GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- I. Kurilova
  - Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- C.U. Blank
  - Department of Medical Oncology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- E.F. Smit
  - Pulmonology Department, Leiden University Medical Center, Leiden, The Netherlands
- M.S. van der Heijden
  - Department of Medical Oncology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
  - Department of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- T.D.L. Nguyen-Kim
  - Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
  - GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
  - Institute of Diagnostic and Interventional Radiology, University Hospital of Zurich, Zurich, Switzerland
- D. van den Broek
  - Department of Laboratory Medicine, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- R.G.H. Beets-Tan
  - Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
  - GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
  - Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
- S. Trebeschi
  - Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
  - GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
7
Sourkatti H, Pajula J, Keski-Kuha T, Koivisto J, Hilvo M, Lähteenmäki J. Predictive modeling for identification of older adults with high utilization of health and social services. Scand J Prim Health Care 2024; 42:609-616. [PMID: 38958358 PMCID: PMC11552250 DOI: 10.1080/02813432.2024.2372297] [Received: 02/06/2024] [Accepted: 06/19/2024] [Indexed: 07/04/2024] Open
Abstract
AIM: Machine learning techniques have demonstrated success in predictive modeling across various clinical cases. However, few studies have considered predicting the use of multisectoral health and social services among older adults. This research aims to use machine learning models to detect groups at high risk of excessive health and social services utilization at an early stage, facilitating the implementation of preventive interventions.
METHODS: We used pseudonymized data covering a four-year period and including information on a total of 33,374 senior citizens from Southern Finland. The endpoint was defined based on the occurrence of unplanned healthcare visits and the total number of different services used. Input features included individuals' basic demographics, health status, and past usage of healthcare resources. Logistic regression and eXtreme Gradient Boosting (XGBoost) methods were used for binary classification, with the dataset split into 70% training and 30% testing sets.
RESULTS: Subgroup-based results mirrored trends observed in the full cohort, with age and certain health issues, e.g. mental health, emerging as positive predictors of high service utilization. Conversely, hospital stay and urban residence were associated with decreased risk. The models achieved a classification performance (AUC) of 0.61 for the full cohort and between 0.55 and 0.62 for the subgroups.
CONCLUSIONS: Predictive models offer potential for predicting future high service utilization in the older adult population. Achieving high classification performance remains challenging due to diverse contributing factors. We anticipate that classification performance could be increased by including features based on additional data categories such as socio-economic data.
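To make the reported AUC values easier to read, the sketch below computes the area under the ROC curve from its pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. This is a generic illustration rather than the authors' evaluation code; the function name `roc_auc` and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical test-set labels (1 = high utilization) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

Under this definition a model that scores cases at random has an expected AUC of 0.5, so values of 0.55 to 0.62 indicate only modest separation between high and low utilizers, in line with the conclusion that high classification performance remains challenging.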
Affiliation(s)
- Heba Sourkatti
  - VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- Juha Pajula
  - VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- Teemu Keski-Kuha
  - Finnish Institute of Health and Welfare (THL), Helsinki, Finland
- Juha Koivisto
  - Finnish Institute of Health and Welfare (THL), Helsinki, Finland
- Mika Hilvo
  - VTT Technical Research Centre of Finland Ltd, Espoo, Finland
8
Lin D, Liu J, Ke C, Chen H, Li J, Xie Y, Ma J, Lv X, Feng Y. Radiomics Analysis of Quantitative Maps from Synthetic MRI for Predicting Grades and Molecular Subtypes of Diffuse Gliomas. Clin Neuroradiol 2024; 34:817-826. [PMID: 38858272 DOI: 10.1007/s00062-024-01421-3] [Received: 10/17/2023] [Accepted: 05/03/2024] [Indexed: 06/12/2024]
Abstract
PURPOSE: To investigate the feasibility of using radiomics analysis of quantitative maps from synthetic MRI to preoperatively predict diffuse glioma grades, isocitrate dehydrogenase (IDH) subtypes, and 1p/19q codeletion status.
METHODS: Data from 124 patients with diffuse glioma were used for analysis (n = 87 for training, n = 37 for testing). Quantitative T1, T2, and proton density (PD) maps were obtained using synthetic MRI. Enhancing tumour (ET), non-enhancing tumour and necrosis (NET), and peritumoral edema (PE) regions were segmented, followed by manual fine-tuning. Features were extracted using PyRadiomics and then selected using Levene/T, BorutaShap, and maximum relevance minimum redundancy algorithms. A support vector machine was adopted for classification. Receiver operating characteristic curve analysis and integrated discrimination improvement analysis were used to compare the performance of different radiomics models.
RESULTS: Radiomics models constructed using features from multiple tumour subregions (ET + NET + PE) in the combined maps (T1 + T2 + PD) achieved the highest AUC in all three prediction tasks: 0.92 for differentiating lower-grade from high-grade diffuse gliomas, 0.95 for predicting IDH mutation status, and 0.86 for predicting 1p/19q codeletion status. Compared with models constructed on the individual T1, T2, and PD maps, the discriminant ability of radiomics models constructed on the combined maps increased by 11, 17, and 10%, respectively, in predicting glioma grades, by 35, 52, and 19% in predicting IDH mutation status, and by 16, 15, and 14% in predicting 1p/19q codeletion status (p < 0.05).
CONCLUSION: Radiomics analysis of quantitative maps from synthetic MRI provides a new quantitative imaging tool for the preoperative prediction of grades and molecular subtypes in diffuse gliomas.
Affiliation(s)
- Danlin Lin
  - Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Jiehong Liu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Chao Ke
  - Department of Neurosurgery, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Haolin Chen
  - Department of Radiation Oncology, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Jing Li
  - Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Yuanyao Xie
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Jianhua Ma
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Medical Image Processing & Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, China
- Xiaofei Lv
  - Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Yanqiu Feng
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Medical Image Processing & Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, China
  - Guangdong-Hong Kong-Macao Greater Bay Area Centre for Brain Science and Brain-Inspired Intelligence & Key Laboratory of Mental Health of the Ministry of Education, Guangzhou, China
  - Department of Radiology, The First People's Hospital of Shunde, Southern Medical University, Foshan, China
9
|
Florentino BR, Parmezan Bonidia R, Sanches NH, da Rocha UN, de Carvalho AC. BioPrediction-RPI: Democratizing the prediction of interaction between non-coding RNA and protein with end-to-end machine learning. Comput Struct Biotechnol J 2024; 23:2267-2276. [PMID: 38827228 PMCID: PMC11140557 DOI: 10.1016/j.csbj.2024.05.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 05/16/2024] [Accepted: 05/16/2024] [Indexed: 06/04/2024] Open
Abstract
Machine Learning (ML) algorithms have been important tools for the extraction of useful knowledge from biological sequences, particularly in healthcare, agriculture, and the environment. However, the categorical and unstructured nature of these sequences usually requires additional feature engineering steps before an ML algorithm can be efficiently applied. The addition of these steps to the ML algorithm creates a processing pipeline, known as end-to-end ML. Despite the excellent results obtained by applying end-to-end ML to biotechnology problems, the performance obtained depends on the expertise of the user in the components of the pipeline. In this work, we propose an end-to-end ML-based framework called BioPrediction-RPI, which can identify implicit interactions between sequences, such as pairs of non-coding RNA and proteins, without the need for specialized expertise in end-to-end ML. This framework applies feature engineering to represent each sequence by structural and topological features. These features are divided into feature groups and used to train partial models, whose partial decisions are combined into a final decision, which provides insights to the user through an interpretability report. In our experiments, the developed framework was competitive when compared with various expert-created models. We assessed BioPrediction-RPI on 12 datasets, where it presented equal or better performance than all tools in 40% to 100% of cases, depending on the experiment. Finally, BioPrediction-RPI can fine-tune models based on new data and perform at the same level as ML experts, democratizing end-to-end ML and increasing its access to those working in biological sciences.
Affiliation(s)
- Bruno Rafael Florentino
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
- Robson Parmezan Bonidia
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
  - Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, 86300-000, Paraná, Brazil
- Natan Henrique Sanches
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
- Ulisses N. da Rocha
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
- André C.P.L.F. de Carvalho
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
10
Wu X, Wang Z, Zheng L, Yang Y, Shi W, Wang J, Liu D, Zhang Y. Construction and verification of a machine learning-based prediction model of deep vein thrombosis formation after spinal surgery. Int J Med Inform 2024; 192:105609. [PMID: 39260049 DOI: 10.1016/j.ijmedinf.2024.105609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 08/17/2024] [Accepted: 08/26/2024] [Indexed: 09/13/2024]
Abstract
BACKGROUND Deep vein thrombosis (DVT) is a common postoperative complication with high morbidity and mortality rates. However, the safety and effectiveness of using prophylactic anticoagulants for preventing DVT after spinal surgery remain controversial. Hence, it is crucial to predict in advance whether DVT will occur following spinal surgery. The present study aimed to establish a machine learning (ML)-based prediction model of DVT formation following spinal surgery. METHODS We reviewed the medical records of patients who underwent elective spinal surgery at the Third Affiliated Hospital of Zunyi Medical University (TAHZMU) from January 2020 to December 2022. We ultimately selected the clinical data of 500 patients who met the criteria for elective spinal surgery. The Boruta-SHAP algorithm was used for feature selection, and the SMOTE algorithm was used for data balancing. The related risk factors for DVT after spinal surgery were screened and analyzed. Five ML algorithm models were established. The data of 150 patients treated at the Affiliated Hospital of Zunyi Medical University (AHZMU) from July 2023 to October 2023 were used for external verification of the model. The area under the curve (AUC), geometric mean (G-mean), sensitivity, accuracy, specificity, and F1 score were used to evaluate the performance of the models. RESULTS The results revealed that activated partial thromboplastin time (APTT), age, body mass index (BMI), preoperative serum creatinine (Crea), anesthesia time, rocuronium dose, and propofol dose were the seven important characteristic variables for predicting DVT after spinal surgery. Among the five ML models established in this study, the random forest classifier (RF) showed superior performance to the other models in the internal validation set. CONCLUSION Seven preoperative and intraoperative variables were included in our study to develop an ML-based predictive model for DVT formation following spinal surgery, and this model can be used to assist in clinical evaluation and decision-making.
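The threshold-based metrics named in the abstract above (sensitivity, specificity, accuracy, F1 score and G-mean) all follow from the four cells of a binary confusion matrix. The sketch below is only an illustration of those standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    at least one positive prediction was made).
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)  # geometric mean of the two recalls
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because G-mean multiplies the recall of both classes, it drops sharply when a model ignores the minority class, which is one reason it is often reported alongside accuracy on imbalanced data.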
Affiliation(s)
- Xingyan Wu
  - Department of Anesthesiology, Second Affiliated Hospital of Zunyi Medical University, Guizhou Province, China.
- Zhao Wang
  - Department of Anesthesiology, Second Affiliated Hospital of Zunyi Medical University, Guizhou Province, China
- Leilei Zheng
  - Department of Anesthesiology, Second Affiliated Hospital of Zunyi Medical University, Guizhou Province, China
- Yihui Yang
  - Department of Anesthesiology, Third Affiliated Hospital of Zunyi Medical University, Guizhou Province, China
- Wenyan Shi
  - Department of Anesthesiology, Second Affiliated Hospital of Zunyi Medical University, Guizhou Province, China
- Jing Wang
  - Department of Anesthesiology, Second Affiliated Hospital of Zunyi Medical University, Guizhou Province, China
- Dexing Liu
  - Affiliated Hospital of Zunyi Medical University, Guizhou Province, China
- Yi Zhang
  - Department of Anesthesiology, Second Affiliated Hospital of Zunyi Medical University, Guizhou Province, China.
11
Muhammad D, Bendechache M. Unveiling the black box: A systematic review of Explainable Artificial Intelligence in medical image analysis. Comput Struct Biotechnol J 2024; 24:542-560. [PMID: 39252818 PMCID: PMC11382209 DOI: 10.1016/j.csbj.2024.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 09/11/2024] Open
Abstract
This systematic literature review examines state-of-the-art Explainable Artificial Intelligence (XAI) methods applied to medical image analysis, discussing current challenges and future research directions, and exploring evaluation metrics used to assess XAI approaches. With the growing efficiency of Machine Learning (ML) and Deep Learning (DL) in medical applications, there's a critical need for adoption in healthcare. However, their "black-box" nature, where decisions are made without clear explanations, hinders acceptance in clinical settings where decisions have significant medicolegal consequences. Our review highlights the advanced XAI methods, identifying how they address the need for transparency and trust in ML/DL decisions. We also outline the challenges faced by these methods and propose future research directions to improve XAI in healthcare. This paper aims to bridge the gap between cutting-edge computational techniques and their practical application in healthcare, nurturing a more transparent, trustworthy, and effective use of AI in medical settings. The insights guide both research and industry, promoting innovation and standardisation in XAI implementation in healthcare.
Affiliation(s)
- Dost Muhammad
  - ADAPT Research Centre, School of Computer Science, University of Galway, Galway, Ireland
- Malika Bendechache
  - ADAPT Research Centre, School of Computer Science, University of Galway, Galway, Ireland
12
Le EPV, Wong MYZ, Rundo L, Tarkin JM, Evans NR, Weir-McCall JR, Chowdhury MM, Coughlin PA, Pavey H, Zaccagna F, Wall C, Sriranjan R, Corovic A, Huang Y, Warburton EA, Sala E, Roberts M, Schönlieb CB, Rudd JHF. Using machine learning to predict carotid artery symptoms from CT angiography: A radiomics and deep learning approach. Eur J Radiol Open 2024; 13:100594. [PMID: 39280120 PMCID: PMC11402422 DOI: 10.1016/j.ejro.2024.100594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 07/20/2024] [Accepted: 08/04/2024] [Indexed: 09/18/2024] Open
Abstract
Purpose To assess radiomics and deep learning (DL) methods in identifying symptomatic Carotid Artery Disease (CAD) from carotid CT angiography (CTA) images. We further compare the performance of these novel methods to the conventional calcium score. Methods Carotid CT angiography (CTA) images from symptomatic patients (ischaemic stroke/transient ischaemic attack within the last 3 months) and asymptomatic patients were analysed. Carotid arteries were classified into culprit, non-culprit and asymptomatic. The calcium score was assessed using the Agatston method. 93 radiomic features were extracted from regions-of-interest drawn on 14 consecutive CTA slices. For DL, convolutional neural networks (CNNs) with and without transfer learning were trained directly on CTA slices. Predictive performance was assessed over 5-fold cross validated AUC scores. SHAP and GRAD-CAM algorithms were used for explainability. Results 132 carotid arteries were analysed (41 culprit, 41 non-culprit, and 50 asymptomatic). For asymptomatic vs symptomatic arteries, radiomics attained a mean AUC of 0.96 (± 0.02), followed by DL 0.86 (± 0.06) and then calcium 0.79 (± 0.08). For culprit vs non-culprit arteries, radiomics achieved a mean AUC of 0.75 (± 0.09), followed by DL 0.67 (± 0.10) and then calcium 0.60 (± 0.02). For multi-class classification, the mean AUCs were 0.95 (± 0.07), 0.79 (± 0.05), and 0.71 (± 0.07) for radiomics, DL and calcium, respectively. Explainability revealed consistent patterns in the most important radiomic features. Conclusions Our study highlights the potential of novel image analysis techniques in extracting quantitative information beyond calcification in the identification of CAD. Though further work is required, the transition of these novel techniques into clinical practice may eventually facilitate better stroke risk stratification.
Affiliation(s)
- Mark Y Z Wong
  - Department of Medicine, University of Cambridge, United Kingdom
- Leonardo Rundo
  - Department of Radiology, University of Cambridge, United Kingdom
  - Cancer Research UK Cambridge Centre, University of Cambridge, United Kingdom
  - Department of Information and Electrical Engineering and Applied Mathematics (DIEM), University of Salerno, Italy
- Jason M Tarkin
  - Department of Medicine, University of Cambridge, United Kingdom
- Nicholas R Evans
  - Department of Clinical Neurosciences, University of Cambridge, United Kingdom
- Jonathan R Weir-McCall
  - Department of Radiology, University of Cambridge, United Kingdom
  - Department of Radiology, Royal Papworth Hospital, Cambridge, UK
- Mohammed M Chowdhury
  - Division of Vascular Surgery, Department of Surgery, University of Cambridge, United Kingdom
- Holly Pavey
  - Division of Experimental Medicine and Immunotherapeutics, University of Cambridge, United Kingdom
- Fulvio Zaccagna
  - Department of Radiology, University of Cambridge, United Kingdom
  - Department of Imaging, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Investigative Medicine Division, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom
- Chris Wall
  - Department of Medicine, University of Cambridge, United Kingdom
- Andrej Corovic
  - Department of Medicine, University of Cambridge, United Kingdom
- Yuan Huang
  - Department of Medicine, University of Cambridge, United Kingdom
  - Department of Radiology, University of Cambridge, United Kingdom
  - EPSRC Centre for Mathematical Imaging in Healthcare, University of Cambridge, United Kingdom
- Evis Sala
  - Dipartimento di Scienze Radiologiche ed Ematologiche, Università Cattolica del Sacro Cuore, Rome, Italy
  - Dipartimento Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Michael Roberts
  - Department of Medicine, University of Cambridge, United Kingdom
  - EPSRC Centre for Mathematical Imaging in Healthcare, University of Cambridge, United Kingdom
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, United Kingdom
- James H F Rudd
  - Department of Medicine, University of Cambridge, United Kingdom
  - EPSRC Centre for Mathematical Imaging in Healthcare, University of Cambridge, United Kingdom
13
Wang WK, Jeong H, Hershkovich L, Cho P, Singh K, Lederer L, Roghanizad AR, Shandhi MMH, Kibbe W, Dunn J. Tree-based classification model for Long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative. JAMIA Open 2024; 7:ooae111. [PMID: 39524607 PMCID: PMC11547948 DOI: 10.1093/jamiaopen/ooae111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 10/02/2024] [Accepted: 10/07/2024] [Indexed: 11/16/2024] Open
Abstract
Objectives We propose and validate a domain knowledge-driven classification model for diagnosing post-acute sequelae of SARS-CoV-2 infection (PASC), also known as Long COVID, using Electronic Health Records (EHRs) data. Materials and Methods We developed a robust model that incorporates features strongly indicative of PASC or associated with the severity of COVID-19 symptoms as identified in our literature review. The XGBoost tree-based architecture was chosen for its ability to handle class-imbalanced data and its potential for high interpretability. Using the training data provided by the Long COVID Computation Challenge (L3C), which was a sample of the National COVID Cohort Collaborative (N3C), our models were fine-tuned and calibrated to optimize the Area Under the Receiver Operating Characteristic curve (AUROC) and the F1 score, following best practices for the class-imbalanced N3C data. Results Our age-stratified classification model demonstrated strong performance with an average 5-fold cross-validated AUROC of 0.844 and F1 score of 0.539 across the young adult, mid-aged, and older-aged populations in the training data. In an independent testing dataset, which was made available after the challenge was over, we achieved an overall AUROC score of 0.814 and F1 score of 0.545. Discussion The results demonstrated the utility of knowledge-driven feature engineering on sparse EHR data and of demographic stratification in model development to diagnose a complex and heterogeneously presenting condition like PASC. The model's architecture, mirroring natural clinician decision-making processes, contributed to its robustness and interpretability, which are crucial for clinical translatability. Further, the model's generalizability was evaluated on new cross-sectional data provided in the later stages of the L3C challenge. Conclusion The study proposed and validated the effectiveness of age-stratified, tree-based classification models to diagnose PASC. Our approach highlights the potential of machine learning in addressing the diagnostic challenges posed by the heterogeneity of Long-COVID symptoms.
Affiliation(s)
- Will Ke Wang
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Hayoung Jeong
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Leeor Hershkovich
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Peter Cho
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Karnika Singh
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Lauren Lederer
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Ali R Roghanizad
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Md Mobashir Hasan Shandhi
  - School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, AZ 85281, United States
  - Biodesign Institute Center for Bioelectronics and Biosensors, Arizona State University, Tempe, AZ 85281, United States
- Warren Kibbe
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27708, United States
- Jessilyn Dunn
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27708, United States
14
Du C, Pei J, Feng Z. Unraveling the complex interactions between ozone pollution and agricultural productivity in China's main winter wheat region using an interpretable machine learning framework. Sci Total Environ 2024; 954:176293. [PMID: 39284447 DOI: 10.1016/j.scitotenv.2024.176293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 09/04/2024] [Accepted: 09/13/2024] [Indexed: 09/20/2024]
Abstract
Surface ozone has become a significant atmospheric pollutant in China, exerting a profound impact on crop production and posing a serious threat to food security. Previous studies have extensively explored the physiological mechanisms of ozone damage to plants. However, the effects of ozone interactions with other environmental factors, such as climate change, on agricultural productivity at the regional scale, particularly under natural conditions, remain insufficiently understood. In this study, we employed an interpretable machine learning framework, specifically the eXtreme Gradient Boosting (XGBoost) algorithm enhanced by SHapley Additive exPlanations (SHAP), to investigate the influence of ozone and its interactions with environmental factors on crop production in China's primary winter wheat region. Additionally, a structural equation model was developed to elucidate the mechanisms driving these interactions. Our findings demonstrate that ozone pollution exerts a significant negative effect on winter wheat productivity (r = -0.47, P < 0.001), with productivity losses escalating from -12.28% to -22.09% as ozone levels increase. Notably, the impact of ozone is spatially heterogeneous, with western Shandong province identified as a hotspot for ozone-induced damage. Furthermore, our results confirm the complexity of the relationship between ozone pollution and agricultural productivity, which is influenced by multiple interacting environmental factors. Specifically, we found that severe ozone pollution, when combined with high aerosol concentrations or elevated temperatures, significantly exacerbates crop productivity losses, although drought conditions can partially mitigate these adverse effects. Our study highlights the importance of incorporating the interactive effects of air pollution and climate change into future crop models. The comprehensive framework developed in this study, which integrates statistical modeling with explainable machine learning, provides a valuable methodological reference for quantitatively assessing the impact of air pollution on crop productivity at a regional scale.
Affiliation(s)
- Chenxi Du
  - School of Geospatial Engineering and Science, Sun Yat-sen University, Zhuhai 519082, China
- Jie Pei
  - School of Geospatial Engineering and Science, Sun Yat-sen University, Zhuhai 519082, China; Key Laboratory of Natural Resources Monitoring in Tropical and Subtropical Area of South China, Ministry of Natural Resources, Zhuhai 519082, China.
- Zhaozhong Feng
  - Key Laboratory of Ecosystem Carbon Source and Sink, China Meteorological Administration (ECSS-CMA), School of Ecology and Applied Meteorology, Nanjing University of Information Science & Technology, Nanjing 210044, China
15
Han S, Liu L. GP-HTNLoc: A graph prototype head-tail network-based model for multi-label subcellular localization prediction of ncRNAs. Comput Struct Biotechnol J 2024; 23:2034-2048. [PMID: 38765609 PMCID: PMC11101938 DOI: 10.1016/j.csbj.2024.04.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/17/2024] [Accepted: 04/18/2024] [Indexed: 05/22/2024] Open
Abstract
Numerous research results have demonstrated that understanding the subcellular localization of non-coding RNAs (ncRNAs) is pivotal in elucidating their roles and regulatory mechanisms in cells. Despite the existence of over ten computational models dedicated to predicting the subcellular localization of ncRNAs, a majority of these models are designed solely for single-label prediction. In reality, ncRNAs often exhibit localization across multiple subcellular compartments. Furthermore, the existing multi-label localization prediction models are insufficient in addressing the challenges posed by the scarcity of training samples and class imbalance in ncRNA datasets. To address these limitations, this study proposes a novel multi-label localization prediction model for ncRNAs, named GP-HTNLoc. To mitigate class imbalance, GP-HTNLoc adopts separate training approaches for head and tail location labels. Additionally, GP-HTNLoc introduces a pioneering graph prototype module to enhance its performance in small-sample, multi-label scenarios. The experimental results based on 10-fold cross-validation on benchmark datasets demonstrate that GP-HTNLoc achieves competitive predictive performance. The average results from 10 rounds of testing on an independent dataset show that GP-HTNLoc outperforms the best existing models on the human lncRNA, human snoRNA, and human miRNA subsets, with average precision improvements of 31.5%, 14.2%, and 5.6%, respectively, reaching 0.685, 0.632, and 0.704. A user-friendly online GP-HTNLoc server is accessible at https://56s8y85390.goho.co.
Affiliation(s)
- Shuangkai Han
  - School of Information, Yunnan Normal University, Kunming, China
  - Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education of Yunnan Province, China
- Lin Liu
  - School of Information, Yunnan Normal University, Kunming, China
  - Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education of Yunnan Province, China
16
Hu X, Zhu M, Feng Z, Stanković L. Manifold-based Shapley explanations for high dimensional correlated features. Neural Netw 2024; 180:106634. [PMID: 39191125 DOI: 10.1016/j.neunet.2024.106634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 07/16/2024] [Accepted: 08/13/2024] [Indexed: 08/29/2024]
Abstract
Explainable artificial intelligence (XAI) holds significant importance in enhancing the reliability and transparency of network decision-making. SHapley Additive exPlanations (SHAP) is a game-theoretic approach for network interpretation, attributing confidence to input features to measure their importance. However, SHAP often relies on a flawed assumption that the model's features are independent, leading to incorrect results when dealing with correlated features. In this paper, we introduce a novel manifold-based Shapley explanation method, termed Latent SHAP. Latent SHAP transforms high-dimensional data into low-dimensional manifolds to capture correlations among features. We compute Shapley values on the data manifold and devise three distinct gradient-based mapping methods to transfer them back to the high-dimensional space. Our primary objectives include: (1) correcting misinterpretations by SHAP in certain samples; (2) addressing the challenge of feature correlations in high-dimensional data interpretation; and (3) reducing algorithmic complexity through Manifold SHAP for application in complex network interpretations. Code is available at https://github.com/Teriri1999/Latent-SHAP.
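For orientation, the baseline Shapley attribution that the abstract above builds on can be computed exactly for a small model. The sketch below is not the Latent SHAP method; it is a plain exact-Shapley illustration in which features outside a coalition are replaced by fixed baseline values, which is the kind of feature-independence assumption the abstract criticises. The names `shapley_values` and `toy_model` and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input x.

    Features outside a coalition are set to their baseline value, i.e. each
    feature is perturbed independently of the others (an assumption that can
    mislead when features are correlated).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the additive feature receives its full contribution, while the interaction term is split evenly between the two interacting features, and the attributions sum to the difference between the prediction at `x` and at the baseline. The enumeration is exponential in the number of features, so it is only practical for very small inputs.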
Affiliation(s)
- Xuran Hu
  - School of Electronic Engineering, Xidian University, Xi'an, China; Kunshan Innovation Institute of Xidian University, School of Electronic Engineering, Xidian University, Xi'an, China
- Mingzhe Zhu
  - School of Electronic Engineering, Xidian University, Xi'an, China; Kunshan Innovation Institute of Xidian University, School of Electronic Engineering, Xidian University, Xi'an, China.
- Zhenpeng Feng
  - School of Electronic Engineering, Xidian University, Xi'an, China
- Ljubiša Stanković
  - Faculty of Electrical Engineering, University of Montenegro, Podgorica, Montenegro
17
Esteban-Medina M, de la Oliva Roque VM, Herráiz-Gil S, Peña-Chilet M, Dopazo J, Loucera C. drexml: A command line tool and Python package for drug repurposing. Comput Struct Biotechnol J 2024; 23:1129-1143. [PMID: 38510973 PMCID: PMC10950807 DOI: 10.1016/j.csbj.2024.02.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/27/2024] [Accepted: 02/27/2024] [Indexed: 03/22/2024] Open
Abstract
We introduce drexml, a command line tool and Python package for rational data-driven drug repurposing. The package employs machine learning and mechanistic signal transduction modeling to identify drug targets capable of regulating a particular disease. In addition, it employs explainability tools to contextualize potential drug targets within the functional landscape of the disease. The methodology is validated in Fanconi Anemia and Familial Melanoma, two distinct rare diseases where there is a pressing need for solutions. In the Fanconi Anemia case, the model successfully predicts previously validated repurposed drugs, while in the Familial Melanoma case, it identifies a promising set of drugs for further investigation.
Affiliation(s)
- Marina Esteban-Medina
  - Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocío, Seville, Spain
- Víctor Manuel de la Oliva Roque
  - Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocío, Seville, Spain
- Sara Herráiz-Gil
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER-ISCIII), U714, Madrid, Spain
  - Departamento de Bioingeniería, Universidad Carlos III de Madrid (UC3M), Madrid, Spain
  - Regenerative Medicine and Tissue Engineering Group, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital (IIS-FJD), Madrid, Spain
  - Epithelial Biomedicine Division, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT), Madrid, Spain
- María Peña-Chilet
  - Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Platform of Big Data, AI and Biostatistics, Health Research Institute La Fe (IISLAFE), Valencia, Spain
- Joaquín Dopazo
  - Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocío, Seville, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER-ISCIII), U715, Seville, Spain
  - FPS/ELIXIR-es, Hospital Virgen del Rocío, Seville, Spain
- Carlos Loucera
  - Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocío, Seville, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER-ISCIII), U715, Seville, Spain
18
Huang J, Wang X, Xia R, Yang D, Liu J, Lv Q, Yu X, Meng J, Chen K, Song B, Wang Y. Domain-knowledge enabled ensemble learning of 5-formylcytosine (f5C) modification sites. Comput Struct Biotechnol J 2024; 23:3175-3185. [PMID: 39253057 PMCID: PMC11381828 DOI: 10.1016/j.csbj.2024.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 09/11/2024] Open
Abstract
5-formylcytidine (f5C) is a unique post-transcriptional RNA modification found in mRNA and tRNA at the wobble site, playing a crucial role in mitochondrial protein synthesis and potentially contributing to the regulation of translation. Recent studies have unveiled that the f5C modifications may drive mitochondrial mRNA translation to power cancer metastasis. Accurate identification of f5C sites is essential for further unraveling their molecular functions and regulatory mechanisms, but there are currently no computational methods available for predicting their locations. In this study, we introduce an innovative ensemble approach, successfully enabling the computational recognition of Saccharomyces cerevisiae f5C. We conducted a comprehensive model selection process that involved multiple basic machine learning and deep learning algorithms such as recurrent neural networks, convolutional neural networks and Transformer-based models. Initially trained only on sequence information, these individual models achieved an AUROC ranging from 0.7104 to 0.7492. Through the integration of 32 novel domain-derived genomic features, the performance of individual models has significantly improved to an AUROC between 0.7309 and 0.8076. To further enhance accuracy and robustness, we then constructed the ensembles of these individual models with different combinations. The best performance attained by our ensemble models reached an AUROC of 0.8391. Shapley additive explanations were conducted to explain the significant contributions of genomic features, providing insights into the putative distribution of f5C across various topological regions and potentially paving the way for revealing their functional relevance within distinct genomic contexts. A freely accessible web server that allows real-time analysis of user-uploaded sites can be accessed at: www.rnamd.org/Resf5C-Pred.
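As a rough illustration of the metric and the ensembling idea mentioned in this abstract, the sketch below computes AUROC from its pairwise-ranking definition and compares two individual score lists with their simple average. The scores, the labels, and the choice of plain score averaging are assumptions made for the example; the abstract does not state how the authors combined their models.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_scores(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Combine several models by averaging their per-site scores (assumed scheme)."""
    n_models = len(score_sets)
    return [sum(col) / n_models for col in zip(*score_sets)]


# Hypothetical labels (1 = modified site) and scores from two base models.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.2]
model_b = [0.6, 0.7, 0.3, 0.65]
ensemble = average_scores([model_a, model_b])

auc_a = auroc(labels, model_a)
auc_b = auroc(labels, model_b)
auc_ensemble = auroc(labels, ensemble)
```

In this toy case each base model misranks one positive/negative pair, while the averaged scores rank every pair correctly, which is the kind of gain an ensemble can offer when base models make different errors.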
Affiliation(s)
- Jiaming Huang
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Department of Biological Sciences, School of Science, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
| | - Xuan Wang
- Department of Biological Sciences, School of Science, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
| | - Rong Xia
- Department of Biological Sciences, School of Science, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
| | - Dongqing Yang
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Jian Liu
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Qi Lv
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Xiaoxuan Yu
- Department of Pharmacology, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Jia Meng
- Department of Biological Sciences, School of Science, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L7 8TX, United Kingdom
| | - Kunqi Chen
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
| | - Bowen Song
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Yue Wang
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
| |
|
19
|
Li Y, Xiong X, Liu X, Xu M, Yang B, Li X, Li Y, Lin B, Xu B. Predicting BRCA mutation and stratifying targeted therapy response using multimodal learning: a multicenter study. Ann Med 2024; 56:2399759. [PMID: 39258876 PMCID: PMC11391871 DOI: 10.1080/07853890.2024.2399759] [Received: 03/14/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 09/12/2024] Open
Abstract
BACKGROUND The status of BRCA1/2 genes plays a crucial role in the treatment decision-making process for multiple cancer types. However, due to high costs and limited resources, the demand for BRCA1/2 genetic testing among patients is currently unmet. Notably, not all patients with BRCA1/2 mutations achieve favorable outcomes with poly (ADP-ribose) polymerase inhibitors (PARPi), indicating the necessity for risk stratification. In this study, we aimed to develop and validate a multimodal model for predicting BRCA1/2 gene status and prognosis with PARPi treatment. METHODS We included 1695 slides from 1417 patients with ovarian, breast, prostate, and pancreatic cancers across three independent cohorts. Using a self-attention mechanism, we constructed a multi-instance attention model (MIAM) to detect BRCA1/2 gene status from hematoxylin and eosin (H&E) pathological images. We further combined tissue features from the MIAM model, cell features, and clinical factors (the MIAM-C model) to predict BRCA1/2 mutations and progression-free survival (PFS) with PARPi therapy. Model performance was evaluated using area under the curve (AUC) and Kaplan-Meier analysis. Morphological features contributing to MIAM-C were analyzed for interpretability. RESULTS Across the four cancer types, MIAM-C outperformed the deep learning-based MIAM in identifying the BRCA1/2 genotype. Interpretability analysis revealed that high-attention regions included high-grade tumors and lymphocytic infiltration, which correlated with BRCA1/2 mutations. Notably, high lymphocyte ratios appeared characteristic of BRCA1/2 mutations. Furthermore, MIAM-C predicted PARPi therapy response (log-rank p < 0.05) and served as an independent prognostic factor for patients with BRCA1/2-mutant ovarian cancer (p < 0.05, hazard ratio: 0.4, 95% confidence interval: 0.16-0.99). CONCLUSIONS The MIAM-C model accurately detected BRCA1/2 gene status and effectively stratified prognosis for patients with BRCA1/2 mutations.
Affiliation(s)
- Yi Li
- School of Medicine, Chongqing University, Chongqing, China
- Chongqing Key Laboratory for Intelligent Oncology in Breast Cancer, Chongqing University Cancer Hospital, Chongqing, China
| | - Xiaomin Xiong
- School of Medicine, Chongqing University, Chongqing, China
- Chongqing Key Laboratory for Intelligent Oncology in Breast Cancer, Chongqing University Cancer Hospital, Chongqing, China
| | - Xiaohua Liu
- Bioengineering College of Chongqing University, Chongqing, China
| | - Mengke Xu
- Chongqing Key Laboratory for Intelligent Oncology in Breast Cancer, Chongqing University Cancer Hospital, Chongqing, China
| | - Boping Yang
- Department of General Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing Health Center for Women and Children, Chongqing, China
| | - Xiaoju Li
- Department of Pathology, Chongqing University Cancer Hospital and School of Medicine, Chongqing University, Chongqing, China
| | - Yu Li
- Department of Pathology, Chongqing University Cancer Hospital and School of Medicine, Chongqing University, Chongqing, China
| | - Bo Lin
- Chongqing Key Laboratory for Intelligent Oncology in Breast Cancer, Chongqing University Cancer Hospital, Chongqing, China
| | - Bo Xu
- School of Medicine, Chongqing University, Chongqing, China
- Chongqing Key Laboratory for Intelligent Oncology in Breast Cancer, Chongqing University Cancer Hospital, Chongqing, China
| |
|
20
|
Li J, Zhu M, Yan L. Predictive models of sepsis-associated acute kidney injury based on machine learning: a scoping review. Ren Fail 2024; 46:2380748. [PMID: 39082758 PMCID: PMC11293267 DOI: 10.1080/0886022x.2024.2380748] [Received: 03/30/2024] [Revised: 06/27/2024] [Accepted: 07/11/2024] [Indexed: 08/03/2024] Open
Abstract
BACKGROUND With the development of artificial intelligence, the application of machine learning to develop predictive models for sepsis-associated acute kidney injury has made potential breakthroughs in early identification, grading, diagnosis, and prognosis determination. METHODS Here, we conducted a systematic search of the PubMed, Cochrane Library, Embase (Ovid), Web of Science, and Scopus databases on April 28, 2023, and screened relevant literature. Then, we comprehensively extracted relevant data related to machine learning algorithms, predictors, and predicted objectives. We subsequently performed a critical evaluation of research quality, data aggregation, and analyses. RESULTS We screened 25 studies on predictive models for sepsis-associated acute kidney injury from a total of originally identified 2898 studies. The most commonly used machine learning algorithm is traditional logistic regression, followed by eXtreme gradient boosting. We categorized these predictive models into early identification models (60%), prognostic prediction models (32%), and subtype identification models (8%) according to their predictive purpose. The five most commonly used predictors were serum creatinine levels, lactate levels, age, blood urea nitrogen concentration, and diabetes mellitus. In addition, a single data source, insufficient assessment of clinical utility, lack of model bias assessment, and hyperparameter adjustment may be the main reasons for the low quality of the current research. CONCLUSIONS However, studies on the nondeath prognostic outcomes, the long-term clinical outcomes, and the subtype identification models are insufficient. Additionally, the poor quality of the research and the insufficient practicality of the model are problems that need to be addressed urgently.
Affiliation(s)
- Jie Li
- Department of Critical Care Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Department of Emergency, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Manli Zhu
- Department of Critical Care Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Department of Emergency, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Li Yan
- Department of Critical Care Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Department of Emergency, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| |
|
21
|
Qi D, Liu T. VotePLMs-AFP: Identification of antifreeze proteins using transformer-embedding features and ensemble learning. Biochim Biophys Acta Gen Subj 2024; 1868:130721. [PMID: 39426757 DOI: 10.1016/j.bbagen.2024.130721] [Received: 07/26/2024] [Revised: 09/24/2024] [Accepted: 10/11/2024] [Indexed: 10/21/2024]
Abstract
Antifreeze proteins (AFPs) are a unique class of biomolecules capable of protecting other proteins, cell membranes, and cellular structures within organisms from damage caused by freezing conditions. Given the significance of AFPs in various domains such as biotechnology, agriculture, and medicine, several machine learning methods have been developed to identify AFPs. However, due to the complexity and diversity of AFPs, the predictive performance of existing methods is limited. Therefore, there is an urgent need to develop an efficient and rapid computational method for accurately predicting AFPs. In this study, we proposed a novel predictor based on transformer-embedding features and ensemble learning for the identification of AFPs, termed VotePLMs-AFP. Firstly, three types of feature descriptors were extracted from pre-trained protein language models (PLMs) during the feature extraction process. Subsequently, we analyzed six combinations generated by these three embeddings to explore the optimal feature set, which was input into the soft voting-based ensemble learning classifier for the identification of AFPs. Finally, we evaluated the model on the two benchmark datasets. The experimental results show that our model achieves high prediction accuracy in 10-fold cross-validation (CV) and independent set testing, outperforming existing state-of-the-art methods. Therefore, our model could serve as an effective tool for predicting AFPs.
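The soft voting step named in this abstract can be sketched in a few lines: each base classifier outputs a positive-class probability, the probabilities are averaged, and the mean is thresholded. The three probability lists and the 0.5 threshold below are hypothetical; the abstract does not specify the base classifiers or any weighting, so this is an unweighted sketch of the general technique rather than the authors' implementation.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average per-model positive-class probabilities and threshold the mean."""
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    labels = []
    for i in range(n_samples):
        mean_p = sum(model[i] for model in prob_sets) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


# Hypothetical AFP probabilities from three base classifiers for three proteins.
model_a = [0.9, 0.4, 0.2]
model_b = [0.8, 0.7, 0.1]
model_c = [0.6, 0.3, 0.4]
predictions = soft_vote([model_a, model_b, model_c])
```

Unlike hard (majority) voting, soft voting lets a confident model outweigh two uncertain ones, because the decision is based on averaged probabilities rather than on counted labels.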
Affiliation(s)
- Dawei Qi
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| | - Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
| |
|
22
|
Wu P, Guo Q, Zhao Y, Bian M, Wang G, Wu W, Shao J, Wang Q, Duan X, Zhang JJ. Construction of a minute ventilation model to address inter-individual inhaled dose variability within identical exposure scenarios using wearable devices. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 954:176415. [PMID: 39312972 DOI: 10.1016/j.scitotenv.2024.176415] [Received: 07/15/2024] [Revised: 09/18/2024] [Accepted: 09/18/2024] [Indexed: 09/25/2024]
Abstract
Inhaled dose, which is determined by pollutant concentration and minute ventilation (VE), is crucial for accurately assessing exposure to air pollution. However, VE predictive models and their application to assessing the health effects of air pollution are still lacking. In this study, we developed VE predictive models using machine learning techniques, utilizing data obtained from eighty participants who underwent a laboratory cardiopulmonary exercise test (CPET). VE predictive models were developed using generalized additive model (GAM), random forest model (RF) and extreme gradient boosting (XGBoost) and analyzed for explanation of input variables. The cross-validated random forest model exhibited outstanding performance with an R2 of 0.986 and an MAE of 1.816 L/min. The median difference between the measured VE and the predicted VE was 0.18 L/min, and the median difference between the black carbon (BC) inhaled dose based on predicted VE and measured VE was 0.02 ng. Employing explainable machine learning, the results showed that metabolic equivalent (METs), heart rate, and body weight are the three most important variables, emphasizing the significance of incorporating METs variables when constructing VE models. Through multiple linear regression models and an adjusted stratified analysis model, a significant adverse association of BC concentration and inhaled dose with diastolic blood pressure (DBP) was observed only in females. The disparity in the effect of BC inhaled dose compared to BC concentration on DBP reached up to 115%. This study is the first to explore the ability of different machine learning algorithms to construct VE prediction models and directly apply the models to assess health effects of an example pollutant. This study contributes to the accurate assessment of air pollution exposure leveraging wearable devices, an approach useful for environmental epidemiology studies.
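To make the relationship between concentration, minute ventilation, and inhaled dose concrete, the sketch below uses the common formulation dose = concentration × VE × exposure time. This formula, the function name, and the example numbers are assumptions for illustration; the abstract states only that dose depends on concentration and VE, not the exact expression the authors used.

```python
def inhaled_dose_ng(conc_ng_per_m3: float, ve_l_per_min: float, minutes: float) -> float:
    """Inhaled dose in ng, assuming dose = concentration * ventilation * time."""
    ve_m3_per_min = ve_l_per_min / 1000.0  # 1 m^3 = 1000 L
    return conc_ng_per_m3 * ve_m3_per_min * minutes


# Two hypothetical people in the same exposure scenario (same BC concentration,
# same duration) but with different minute ventilation.
bc_concentration = 2000.0  # ng/m^3, illustrative value
duration = 30.0            # minutes

dose_resting = inhaled_dose_ng(bc_concentration, ve_l_per_min=10.0, minutes=duration)
dose_active = inhaled_dose_ng(bc_concentration, ve_l_per_min=25.0, minutes=duration)
```

With identical ambient concentration, the more active person inhales 2.5 times the dose, which is the inter-individual variability that a concentration-only exposure metric cannot capture.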
Affiliation(s)
- Pengpeng Wu
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
| | - Qian Guo
- China North Artificial Intelligence & Innovation Research Institute, Beijing 100072, China; Collective Intelligence & Collaboration Laboratory, Beijing 100072, China
| | - Yuchen Zhao
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
| | - Mengyao Bian
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
| | - Gang Wang
- Department of Otolaryngology-Head and Neck Surgery, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, China
| | - Wei Wu
- Department of Otolaryngology-Head and Neck Surgery, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, China
| | - Jing Shao
- National Institute of Sports Medicine, General Administration of Sport of China, Beijing 100029, China
| | - Qirong Wang
- National Institute of Sports Medicine, General Administration of Sport of China, Beijing 100029, China
| | - Xiaoli Duan
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China.
| | - Junfeng Jim Zhang
- Nicholas School of the Environment and Global Health Institute, Duke University, Durham, NC 27708, USA; Duke Kunshan University, Kunshan 215316, China
| |
|
23
|
Chen B, Zhen L, Wang L, Zhong H, Lin C, Yang L, Xu W, Huang RJ. Revisiting the impact of temperature on ground-level ozone: A causal inference approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 953:176062. [PMID: 39244056 DOI: 10.1016/j.scitotenv.2024.176062] [Received: 06/20/2024] [Revised: 08/13/2024] [Accepted: 09/03/2024] [Indexed: 09/09/2024]
Abstract
It has been widely acknowledged that high temperatures and heatwaves promote ozone concentration, worsening the ambient air quality. However, temperature can impact ozone via multiple pathways, and quantifying each path is challenging due to environmental confounders. In this study, we frame the problem as a treatment-outcome issue and utilize a machine learning-aided causal inference technique to disentangle the impact of temperature on ozone formation. Our approach reveals that failing to account for the covariations of solar radiation and other meteorological factors leads to an overestimation of the O3-temperature response. Through process evaluation, we find that temperature influences local ozone formation mainly by accelerating chemical reactions and enhancing precursor production and changing boundary layer heights. The O3 response to temperature via enhancing soil NOx and changing relative humidity and wind field is however observable. A better appreciation of O3-temperature response is critical for improving air quality regulation in the warming future.
Affiliation(s)
- Baihua Chen
- Center for Excellence in Regional Atmospheric Environment, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, China
| | - Ling Zhen
- Center for Excellence in Regional Atmospheric Environment, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, China; University of Chinese Academy of Sciences, Beijing, China
| | - Lin Wang
- Center for Excellence in Regional Atmospheric Environment, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, China
| | - Haobin Zhong
- Center for Excellence in Regional Atmospheric Environment, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, China
| | - Chunshui Lin
- State Key Laboratory of Loess and Quaternary Geology (SKLLQG), Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China.
| | - Lin Yang
- Center for Excellence in Regional Atmospheric Environment, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, China; School of Environmental Science and Technology, University of Nottingham Ningbo, Ningbo, China
| | - Wei Xu
- Center for Excellence in Regional Atmospheric Environment, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, China; State Key Laboratory of Loess and Quaternary Geology (SKLLQG), Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China.
| | - Ru-Jin Huang
- State Key Laboratory of Loess and Quaternary Geology (SKLLQG), Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China
| |
|
24
|
Li Y, Huang T, Lee HF, Heo Y, Ho KF, Yim SHL. Integrating Doppler LiDAR and machine learning into land-use regression model for assessing contribution of vertical atmospheric processes to urban PM2.5 pollution. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 952:175632. [PMID: 39168320 DOI: 10.1016/j.scitotenv.2024.175632] [Received: 04/16/2024] [Revised: 08/06/2024] [Accepted: 08/17/2024] [Indexed: 08/23/2024]
Abstract
Air pollution has been recognized as a global issue because of its adverse effects on the environment and health. While vertical atmospheric processes substantially affect urban air pollution, traditional epidemiological research using land-use regression (LUR) modeling has usually focused on ground-level attributes without considering upper-level atmospheric conditions. This study aimed to integrate Doppler LiDAR and machine learning techniques into LUR models (LURF-LiDAR) to comprehensively evaluate urban air pollution in Hong Kong, and to assess complex interactions between vertical atmospheric processes and urban air pollution from long-term (i.e., annual) and short-term (i.e., two air pollution episodes) views in 2021. The results demonstrated significant improvements in model performance, achieving CV R2 values of 0.81 (95% CI: 0.75-0.86) for the long-term PM2.5 prediction model and 0.90 (95% CI: 0.87-0.91) for the short-term models. Approximately 69% of ground-level air pollution arose from the mixing of ground- and lower-level (105 m-225 m) particles, while 21% was associated with upper-level (825 m-945 m) atmospheric processes. The identified transboundary air pollution (TAP) layer was located at ~900 m above the ground. Episode one (E1: 7 Jan-22 Jan) was induced by the accumulation of local emissions under stable atmospheric conditions, whereas Episode two (E2: 13 Dec-24 Dec) was regulated by TAP under unstable and turbulent conditions. Our improved air quality prediction model is accurate and comprehensive with high interpretability for supporting urban planning and air quality policies.
Affiliation(s)
- Yue Li
- Department of Geography and Resource Management, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
| | - Tao Huang
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 639798, Singapore; Earth Observatory of Singapore, Nanyang Technological University, Singapore 639798, Singapore
| | - Harry Fung Lee
- Department of Geography and Resource Management, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
| | - Yeonsook Heo
- School of Civil, Environmental and Architectural Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
| | - Kin-Fai Ho
- The Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
| | - Steve H L Yim
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 639798, Singapore; Earth Observatory of Singapore, Nanyang Technological University, Singapore 639798, Singapore; Asian School of the Environment, Nanyang Technological University, Singapore 639798, Singapore.
| |
|
25
|
Xu Z, Ding Y, Han SC, Zhang C. Predicting the performance of lithium adsorption and recovery from unconventional water sources with machine learning. WATER RESEARCH 2024; 266:122374. [PMID: 39260198 DOI: 10.1016/j.watres.2024.122374] [Received: 06/11/2024] [Revised: 08/25/2024] [Accepted: 09/01/2024] [Indexed: 09/13/2024]
Abstract
Selective lithium (Li) recovery from unconventional water sources (UWS) (e.g., shale gas waters, geothermal brines, and rejected seawater desalination brines) using inorganic lithium-ion sieve (LIS) materials can address Li supply shortages and distribution issues. However, the development of high-performance LIS materials and the optimization of recovery-related operating parameters are hampered by the variety of production methods, intricate procedures, and experimental expenses. Machine learning (ML) techniques offer potential solutions for enhancing LIS material development. We collected literature data on Li adsorption, categorizing 16 parameters into adsorbent parameters, operating parameters, and solution components. Three tree-based algorithms-Random Forest (RF), Gradient Boosting Decision Trees (GBDT), and Extreme Gradient Boosting (XGBoost)-were used to evaluate the impact of these parameters on lithium adsorption. The grouped random splitting method limited data leakage and mitigated overfitting. XGBoost demonstrated the best performance, with an R² of 0.98 and a root-mean-squared error (RMSE) of 1.72. The SHAP values highlighted that operating parameters were the most influential, followed by adsorbent parameters and coexisting ion concentrations. Therefore, focusing on optimizing operating parameters or making targeted improvements on LIS based on operating conditions will enhance LIS performances in UWS. These insights are crucial for optimizing Li adsorption processes and designing effective inorganic LIS materials.
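The grouped random splitting mentioned in this abstract can be illustrated as follows: rows are assigned to train or test by group, so that no group contributes to both sides. The grouping key (a hypothetical source-study identifier), the 25% test fraction, and the function name are assumptions for the example; the abstract does not say what the authors grouped on.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def grouped_split(
    groups: Sequence[Hashable], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices so that every group lands entirely in train or in test."""
    unique = sorted(set(groups), key=str)  # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Hypothetical literature dataset: each row is tagged with the study it came from.
study_ids = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s4"]
train_idx, test_idx = grouped_split(study_ids)
```

Splitting by group rather than by row matters for literature-derived data because measurements from one study tend to be highly correlated; a row-level split would let near-duplicates of test rows leak into training and inflate the reported scores.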
Affiliation(s)
- Ziyang Xu
- CAS Key Laboratory of Urban Pollutant Conversion, Department of Environmental Science and Engineering, University of Science and Technology of China, Hefei, 230026, China.
| | - Yihao Ding
- School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC 3010, Australia.
| | - Soyeon Caren Han
- School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC 3010, Australia.
| | - Changyong Zhang
- CAS Key Laboratory of Urban Pollutant Conversion, Department of Environmental Science and Engineering, University of Science and Technology of China, Hefei, 230026, China.
| |
|
26
|
Zhang K, Wang N. Machine learning modeling of thermally assisted biodrying process for municipal sludge. WASTE MANAGEMENT (NEW YORK, N.Y.) 2024; 188:95-106. [PMID: 39128323 DOI: 10.1016/j.wasman.2024.07.032] [Received: 06/01/2024] [Revised: 07/12/2024] [Accepted: 07/29/2024] [Indexed: 08/13/2024]
Abstract
Preparation of activated carbons is an important way to utilize municipal sludge (MS) resources, while drying is a pretreatment method for making activated carbons from MS. In this study, machine learning techniques were used to develop moisture ratio (MR) and composting temperature (CT) prediction models for the thermally assisted biodrying process of MS. First, six machine learning (ML) models were used to construct the MR and CT prediction models, respectively. Then the hyperparameters of the ML models were optimized using the Bayesian optimization algorithm, and the prediction performances of these models after optimization were compared. Finally, the effect of each input feature on the model was also evaluated using SHapley Additive exPlanations (SHAP) analysis and Partial Dependence Plots (PDPs) analysis. The results showed that Gaussian process regression (GPR) was the best model for predicting MR and CT, with R2 of 0.9967 and 0.9958, respectively, and root mean square errors (RMSE) of 0.0059 and 0.354 ℃. In addition, graphical user interface software was developed to facilitate the use of the GPR model for predicting MR and CT by researchers and engineers. This study contributes to the rapid prediction, improvement, and optimization of MR and CT during thermally assisted biodrying of MS, and also provides valuable guidance for the dynamic regulation of the drying process.
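For readers comparing the R2 and RMSE figures quoted in this abstract, the sketch below shows how the two metrics are conventionally computed from observed and predicted values. The moisture-ratio numbers are invented for illustration and are unrelated to the study's data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted moisture ratios (dimensionless).
observed = [1.0, 0.8, 0.6, 0.4]
predicted = [0.98, 0.82, 0.59, 0.41]

mr_rmse = rmse(observed, predicted)
mr_r2 = r_squared(observed, predicted)
```

RMSE carries the units of the target (dimensionless for MR, degrees Celsius for CT), whereas R2 is unitless, so RMSE values for the two targets are not directly comparable with each other.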
Affiliation(s)
- Kaiqiang Zhang
- College of Mechanical Engineering, Qinghai University, Xining, Qinghai 810016, China
| | - Ningfung Wang
- College of Chemical Engineering, Qinghai University, Xining, Qinghai 810016, China; Key Laboratory of Salt Lake Chemical Materials Qinghai Province, Xining, Qinghai 810016, China.
| |
|
27
|
Tayara A, Shang C, Zhao J, Xiang Y. Machine learning models for predicting the rejection of organic pollutants by forward osmosis and reverse osmosis membranes and unveiling the rejection mechanisms. WATER RESEARCH 2024; 266:122363. [PMID: 39244867 DOI: 10.1016/j.watres.2024.122363] [Received: 03/05/2024] [Revised: 08/16/2024] [Accepted: 08/29/2024] [Indexed: 09/10/2024]
Abstract
While forward osmosis (FO) and reverse osmosis (RO) processes have been proven effective in rejecting organic pollutants, the rejection rate is highly dependent on compound and membrane characteristics, as well as operating conditions. This study aims to establish machine learning (ML) models for predicting the rejection of organic pollutants by FO and RO and providing insights into the underlying rejection mechanisms. Among the 14 ML models established, the random forest model (R2 = 0.85) and extreme gradient boosting model (R2 = 0.92) emerged as the best-performing models for FO and RO, respectively. Shapley additive explanations (SHAP) analysis identified the length of the compound, water flux, and hydrophobicity as the top three variables contributing to the FO model. For RO, in addition to the length of the compound and operating pressure, advanced variables including four molecular descriptors (e.g., ATSC2m and Balaban J) and three fingerprints (e.g., C=C double bond and carbonyl group) significantly contributed to the prediction. Besides, the associations between these highly ranked variables and their SHAP values shed light on the rejection mechanisms, such as size exclusion, adsorption, hydrophobic interaction, and electrostatic interaction, and illustrate the role of the operating parameters, such as the FO permeate water flux and RO operating pressure, in the rejection process. These findings provide interpretable predictive models for the removal of organic pollutants and advance the mechanistic understanding of the rejection mechanisms in the FO and RO processes.
Affiliation(s)
- Adel Tayara
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 000, Hong Kong Special Administrative Region of China
| | - Chii Shang
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 000, Hong Kong Special Administrative Region of China; Hong Kong Branch of Chinese National Engineering Research Center for Control & Treatment of Heavy Metal Pollution, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 000, Hong Kong Special Administrative Region of China
| | - Jing Zhao
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 000, Hong Kong Special Administrative Region of China
| | - Yingying Xiang
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 000, Hong Kong Special Administrative Region of China.
|
28
|
Feng S, Liu G, Shan T, Li K, Lai S. Predicting green technology innovation in the construction field from a technology convergence perspective: A two-stage predictive approach based on interpretable machine learning. Journal of Environmental Management 2024; 372:123203. [PMID: 39549448 DOI: 10.1016/j.jenvman.2024.123203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 09/29/2024] [Accepted: 11/01/2024] [Indexed: 11/18/2024]
Abstract
The construction industry, as a major global energy consumer and carbon emitter, plays a crucial role in achieving global sustainability. A key strategy for the green transformation of this industry, without compromising development, involves fostering green technology innovation. Nevertheless, existing studies exhibit a notable gap in identifying and evaluating potential green technology innovation opportunities within the construction field, leading to a scarcity of decision-making information for governments and innovation entities during the research and development stage. Recognizing this, our study proposes a two-stage technology opportunity prediction approach based on interpretable machine learning from the perspective of technology convergence. Unlike previous methods, it not only predicts the probability that a technology opportunity will occur but also forecasts the technical impact of convergence opportunities. By analysing 600,442 patent documents in the green and construction fields, we identify 305 high-potential technology convergence opportunities. Our results reveal that technologies such as carbon capture and storage, pollution alarms, solar energy, forestry techniques, wind energy, energy-saving methods, and waste materials for water treatment have significant potential for convergence with construction technologies. Additionally, we analyse the influencing factors behind these convergence innovations, finding that technical similarity and proximity play crucial roles. These findings provide robust decision support for governments and industry stakeholders in formulating scientifically grounded green technology innovation strategies, thereby accelerating the green transformation of the construction industry and contributing to the goal of sustainable development.
Affiliation(s)
- Shuai Feng
- School of Management Science and Real Estate, Chongqing University, No.174, Shazheng Street, Shapingba District, Chongqing, 400044, PR China
| | - Guiwen Liu
- School of Management Science and Real Estate, Chongqing University, No.174, Shazheng Street, Shapingba District, Chongqing, 400044, PR China
| | - Tianlong Shan
- School of Management Science and Real Estate, Chongqing University, No.174, Shazheng Street, Shapingba District, Chongqing, 400044, PR China
| | - Kaijian Li
- School of Management Science and Real Estate, Chongqing University, No.174, Shazheng Street, Shapingba District, Chongqing, 400044, PR China.
| | - Sha Lai
- School of Management Science and Real Estate, Chongqing University, No.174, Shazheng Street, Shapingba District, Chongqing, 400044, PR China
|
29
|
Shi C, Zhuang N, Li Y, Xiong J, Zhang Y, Ding C, Liu H. Identifying factors influencing reservoir eutrophication using interpretable machine learning combined with shoreline morphology and landscape hydrological features: A case study of Danjiangkou Reservoir, China. The Science of the Total Environment 2024; 951:175450. [PMID: 39134270 DOI: 10.1016/j.scitotenv.2024.175450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/31/2024] [Accepted: 08/09/2024] [Indexed: 08/17/2024]
Abstract
Reservoir nearshore areas are influenced by both terrestrial and aquatic ecosystems, making them sensitive to water quality changes. The analysis of basin landscape hydrological features provides limited insight into the spatial heterogeneity of eutrophication in these areas. The complex characteristics of shoreline morphology and their impact on eutrophication are often overlooked. To comprehensively analyze how shoreline morphology and landscape hydrological features relate to eutrophication, this study uses Danjiangkou Reservoir as a case study. Using Landsat 8 OLI remote sensing data from 2013 to 2022, combined with a semi-analytical approach, the study obtained the spatial distribution of the Trophic State Index (TSI) during flood discharge periods (FDPs) and water storage periods (WSPs). Using Extreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP), the study explained the relationships between landscape composition, landscape configuration, hydrological topography, shoreline morphology, and TSI, identified key factors at different spatial scales, and validated their reliability. The results showed that: (1) There is significant spatial heterogeneity in the TSI distribution of Danjiangkou Reservoir. Eutrophication is pronounced in the shoreline and bay areas, with a tendency to extend inward only during the WSPs. (2) The importance values of landscape composition, landscape configuration, hydrological topography, and shoreline morphology to TSI variations during the FDPs are 25.12 %, 29.6 %, 23.09 %, and 22.19 %, respectively. Besides shoreline distance, the Landscape Shape Index (LSI) and Hypsometric Integral (HI) are the two most significant environmental variables overall during the FDPs. Forest and grassland areas become the most influential factors during the WSPs. The influence of landscape patterns and hydrological topography on TSI varies at different spatial scales. At the 200 m riparian buffer zone, the increase in cropland and impervious areas significantly elevates eutrophication levels. (3) Morphological complexity shows a noticeable threshold effect on TSI, with complex shoreline morphology increasing the risk of eutrophication.
Affiliation(s)
- Chenyi Shi
- Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Regional Development and Environmental Response, Hubei University, Wuhan 430062, China
| | - Nana Zhuang
- Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China
| | - Yiheng Li
- Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China
| | - Jing Xiong
- Ecological Environment Monitoring Center Station of Hubei Province, Wuhan 430071, China
| | - Yuan Zhang
- Ecological Environment Monitoring Center Station of Hubei Province, Wuhan 430071, China
| | - Conghui Ding
- Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China
| | - Hai Liu
- Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Regional Development and Environmental Response, Hubei University, Wuhan 430062, China.
|
30
|
Zhang J, Dong L, Huang H, Hua P. Elucidating and forecasting the organochlorine pesticides in suspended particulate matter by a two-stage decomposition based interpretable deep learning approach. Water Research 2024; 266:122315. [PMID: 39217646 DOI: 10.1016/j.watres.2024.122315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 07/01/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Accurately predicting the concentration of organochlorine pesticides (OCPs) presents a challenge due to their complex sources and environmental behaviors. In this study, we introduced a novel model that combined three distinct techniques: Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), Variational Mode Decomposition (VMD), and a Long Short-Term Memory (LSTM) deep learning network. The objective was to characterize the variation in OCP concentrations with high precision. Results show that the hybrid two-stage decomposition coupled models achieved an average symmetric mean absolute percentage error (SMAPE) of 23.24 % in the empirical analysis of typical surface water. They exhibited higher predictive power than the individual benchmark models, which yielded an average SMAPE of 40.88 %, and the single decomposition coupled models, which yielded an average SMAPE of 29.80 %. The proposed CEEMDAN-VMD-LSTM model, with an average SMAPE of 13.55 %, consistently outperformed the other models, which yielded an average SMAPE of 33.53 %. A comparative analysis with shallow neural network methods demonstrated the advantages of the LSTM algorithm when coupled with secondary decomposition techniques for processing time series datasets. Furthermore, the interpretable analysis derived by the SHAP approach revealed that precipitation, followed by total phosphorus, had strong effects on the predicted concentration of OCPs in the given water. The data presented herein show the effectiveness of decomposition technique-based deep learning algorithms in capturing the dynamic characteristics of pollutants in surface water.
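For readers unfamiliar with the error measure quoted in this abstract, the following is a minimal sketch of one common SMAPE definition, the mean of 2|y - ŷ| / (|y| + |ŷ|) expressed in percent. The abstract does not state which SMAPE variant the authors used, so the formula, the function name `smape`, and the handling of zero denominators are all assumptions made for illustration.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumed variant: mean of 2*|a - p| / (|a| + |p|). Other variants exist
    (for example, without the factor 2, giving a 0 to 100 range).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Assumed convention: when both values are zero, the term counts as zero error.
        if denom != 0:
            total += 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)
```

Under this variant, a perfect forecast scores 0 % and a forecast of zero for a nonzero observation scores 200 %, so the percentages reported in the abstract are only comparable with other work that uses the same definition.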
Affiliation(s)
- Jin Zhang
- The National Key Laboratory of Water Disaster Prevention, Yangtze Institute for Conservation and Development, Hohai University, 210098, Nanjing, China
| | - Liang Dong
- School of Environment and Energy, South China University of Technology, 510006, Guangzhou, China
| | - Hai Huang
- Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, 510006 Guangzhou, China; School of Environment, South China Normal University, University Town, 510006 Guangzhou, China
| | - Pei Hua
- Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, 510006 Guangzhou, China; School of Environment, South China Normal University, University Town, 510006 Guangzhou, China.
|
31
|
Zhou M, Li Y. Spatial patterns and mechanism of the impact of soil salinity on potentially toxic elements in coastal areas. The Science of the Total Environment 2024; 951:175802. [PMID: 39197776 DOI: 10.1016/j.scitotenv.2024.175802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 08/18/2024] [Accepted: 08/24/2024] [Indexed: 09/01/2024]
Abstract
Soil salinization and heavy metal pollution in the Yellow River Delta region have elicited increasing concern. Therefore, revealing the underlying mechanism of the impact of soil salinity on potentially toxic elements (PTEs) is crucial for environmental protection and the rational utilization of resources in this area. In this study, we employed CatBoost-SHAP and multiscale geographically weighted regression (MGWR) models to comprehensively investigate the spatial effects of soil electrical conductivity (EC1:5) on PTEs. Additionally, we employed a space-for-time substitution strategy to investigate how increasing soil salinity, represented by EC1:5, K+, Na+, Ca2+, and Mg2+, affects the bioavailability of PTEs over time. The primary findings are as follows: (1) For most PTEs, the influence of soil EC1:5 on the bioavailable forms of these elements surpassed its impact on their total concentrations. (2) The results of the MGWR model indicated that exchangeable Ca (aCa) in the soils of the eastern coastal areas markedly increased the bioavailable Cd (aCd), bioavailable Cu (aCu), and bioavailable Zn (aZn). (3) When the soil EC1:5 ranged between 2 and 6 dS/m, exchangeable Na (aNa) primarily competed for the adsorption sites of bioavailable Pb (aPb). However, as the soil EC1:5 increased to 6-10 dS/m, exchangeable Mg (aMg) and aCa became the primary competing ions, with aMg playing a more significant role than aCa. These findings provide valuable theoretical insights and practical guidance for saline-alkali soil improvement and PTE pollution control in the Yellow River Delta region, thereby providing a foundation for sustainable environmental management and resource utilization.
Affiliation(s)
- Mengge Zhou
- Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yonghua Li
- Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China.
|
32
|
Liu X, Xie Z, Zhang Y, Huang J, Kuang L, Li X, Li H, Zou Y, Xiang T, Yin N, Zhou X, Yu J. Machine learning for predicting in-hospital mortality in elderly patients with heart failure combined with hypertension: a multicenter retrospective study. Cardiovasc Diabetol 2024; 23:407. [PMID: 39548495 DOI: 10.1186/s12933-024-02503-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Accepted: 11/04/2024] [Indexed: 11/18/2024] Open
Abstract
BACKGROUND Heart failure combined with hypertension is a major contributor to in-hospital mortality in elderly patients (≥ 65 years). However, there are very few models to predict in-hospital mortality in such elderly patients. We aimed to develop and test an individualized machine learning model to assess risk factors and predict in-hospital mortality in these patients. METHODS From January 2012 to December 2021, this study collected data on elderly patients with heart failure and hypertension from the Chongqing Medical University Medical Data Platform. The least absolute shrinkage and selection operator was used to identify key clinical variables. The optimal predictive model was chosen among eight machine learning algorithms on the basis of the area under the curve. SHapley Additive exPlanations and Local Interpretable Model-agnostic Explanations were employed to interpret the outcome of the predictive model. RESULTS This study ultimately comprised 4647 elderly individuals with hypertension and heart failure. The Random Forest model was chosen because it had the highest area under the curve, 0.850 (95% CI 0.789-0.897), with an accuracy of 0.738, recall of 0.837, specificity of 0.734, and Brier score of 0.178. According to the SHapley Additive exPlanations results, the factors most strongly associated with in-hospital mortality in elderly patients with heart failure and hypertension were urea, length of stay, neutrophils, albumin, and high-density lipoprotein cholesterol. CONCLUSIONS This study developed eight machine learning models to predict in-hospital mortality in elderly patients with hypertension as well as heart failure. Compared to other algorithms, the Random Forest model performed significantly better. Our study successfully predicted in-hospital mortality and identified the factors most associated with in-hospital mortality.
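As a reading aid for the performance figures in this abstract, the sketch below shows how accuracy, recall, specificity, and the Brier score are conventionally computed for a binary classifier. It is an illustration only and not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy inputs are assumptions, and the abstract does not report which threshold the study used.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, recall, specificity, and Brier score for binary outcomes.

    y_true: sequence of 0/1 labels (1 = event, e.g. in-hospital death).
    y_prob: predicted probabilities of the event.
    threshold: assumed cut-off for turning a probability into a class.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    n = len(y_true)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / n,
        # Recall (sensitivity): share of actual events that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of non-events correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Brier score: mean squared error of the probabilities (lower is better).
        "brier": sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n,
    }
```

Accuracy, recall, and specificity depend on the chosen threshold, whereas the Brier score is computed from the probabilities directly, which is why it is often read as a complement to the threshold-based figures.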
Affiliation(s)
- Xiaozhu Liu
- Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Emergency and Critical Care Medical Center, Beijing Shijitan Hospital, Capital Medical University, Beijing, China
| | - Zulong Xie
- Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Yang Zhang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
| | - Jian Huang
- Department of Diagnostic Ultrasound, Sir Run Run Shaw Hospital, Zhejiang University College of Medicine, Hangzhou, China
| | - Lirong Kuang
- Department of Ophthalmology, Wuhan Wuchang Hospital (Wuchang Hospital Affiliated to Wuhan University of Science and Technology), Wuhan, China
| | - Xiujuan Li
- Department of Radiology, The Affiliated Taian City Central Hospital of Qingdao University, Taian, China
| | - Huan Li
- Chongqing College of Electronic Engineering, Chongqing, China
| | - Yuxin Zou
- The Second Clinical College, Chongqing Medical University, Chongqing, China
| | - Tianyu Xiang
- Information Center, The University-Town Hospital of Chongqing Medical University, Chongqing, China
| | - Niying Yin
- Department of blood transfusion, Suqian First Hospital, Suqian, China.
| | - Xiaoqian Zhou
- Department of Cardiovascular, The Third Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
| | - Jie Yu
- Department of Radiology, The Affiliated Taian City Central Hospital of Qingdao University, Taian, China.
|
33
|
Yang S, Ma Y, Gao J, Wang X, Weng F, Zhang Y, Xu Y. Exploring the response and prediction of phytoplankton to environmental factors in eutrophic marine areas using interpretable machine learning methods. The Science of the Total Environment 2024; 951:175600. [PMID: 39159687 DOI: 10.1016/j.scitotenv.2024.175600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Revised: 08/10/2024] [Accepted: 08/15/2024] [Indexed: 08/21/2024]
Abstract
Coastal marine areas are frequently affected by human activities and face ecological and environmental threats, such as algal blooms and climate change. The community structure of phytoplankton, the primary producers in marine ecosystems, is highly sensitive to environmental factors, such as temperature, salinity, and nutrients. However, traditional methods for exploring the relationship between phytoplankton communities and environmental factors in eutrophic marine areas are limited by various factors. Therefore, this study employed interpretable machine learning models, integrating high-dimensional data analysis and complex system modeling, to quantitatively and thoroughly analyze the dynamic relationship between phytoplankton communities and environmental variables in high-frequency samples collected over 53 weeks from eutrophic marine areas. The cell abundance of phytoplankton exhibited a distinct "two-peak pattern" variation. Interpretable machine learning model analysis revealed the dynamic contributions of different environmental factors during changes in the phytoplankton community structure. The results showed that temperature was a key environmental factor that affected phytoplankton growth during peak periods. In addition, the contribution of salinity increased during the second peak in phytoplankton abundance, highlighting its central role in the ecological dynamics of this phase. During green tide outbreaks, particularly in Area 01, the contributions of factors such as temperature and salinity increased, whereas those of phosphates and silicates decreased, indicating that green tide outbreaks substantially altered the nutritional dynamics of the ecosystem. Furthermore, different phytoplankton species, such as Skeletonema costatum, Thalassiosira spp., and Nitzschia spp., exhibited varying responses to environmental factors. Hence, the predictions made using random forest and generalized additive models for phytoplankton cell abundance in two marine areas revealed complex nonlinear relationships between environmental factors, such as temperature and salinity, and phytoplankton abundance.
Affiliation(s)
- Shimin Yang
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
| | - Yuanting Ma
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
| | - Jie Gao
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Xiajie Wang
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Futian Weng
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China; Data Mining Research Center, Xiamen University, Xiamen 361005, China
| | - Yan Zhang
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Yan Xu
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
|
34
|
Simonetti I, Lubello C, Cappietti L. On the use of hydrodynamic modelling and random forest classifiers for the prediction of hypoxia in coastal lagoons. The Science of the Total Environment 2024; 951:175424. [PMID: 39142405 DOI: 10.1016/j.scitotenv.2024.175424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 07/19/2024] [Accepted: 08/07/2024] [Indexed: 08/16/2024]
Abstract
Hypoxia is one of the fundamental threats to water quality globally, particularly for partially enclosed basins with limited water renewal, such as coastal lagoons. This work proposes the combined use of a machine learning technique, field observations, and data derived from a hydrodynamic and heat exchange numerical model to predict, and forecast up to 10 days in advance, the occurrence of hypoxia in a eutrophic coastal lagoon. The random forest machine learning algorithm is used, training and validating a set of models to classify dissolved oxygen levels in the lagoon. The Orbetello lagoon, in the central Mediterranean Sea (Italy), has provided a test case for assessing the reliability of the proposed methodology. Results proved that the methodology is effective in providing a reliable short-term evaluation of DO levels, with a high resolution in both time and space throughout an entire lagoon. An overall classification accuracy of up to 91 % was found in the models, with a score for identifying the occurrence of severe hypoxia - i.e. hourly DO levels lower than 2 mg/l - of 86 %. The use of predictors extracted from a numerical hydrodynamic model allows us to overcome the intrinsic limitation of machine learning modelling approaches which rely on input data from relatively few, local field measurements, i.e. the inability to capture the spatial heterogeneity of DO distributions, unless several measuring points are available. The methodological approach is proposed for application to similar eutrophic environments.
Affiliation(s)
- Irene Simonetti
- Dept. of Civil and Environmental Engineering, University of Florence, Italy.
| | - Claudio Lubello
- Dept. of Civil and Environmental Engineering, University of Florence, Italy
| | - Lorenzo Cappietti
- Dept. of Civil and Environmental Engineering, University of Florence, Italy
|
35
|
Chen C, Quan J, Chen X, Yang T, Yu C, Ye S, Yang Y, Wu X, Jiang D, Weng Y. Explore key genes of Crohn's disease based on glycerophospholipid metabolism: A comprehensive analysis Utilizing Mendelian Randomization, Multi-Omics integration, Machine Learning, and SHAP methodology. Int Immunopharmacol 2024; 141:112905. [PMID: 39173401 DOI: 10.1016/j.intimp.2024.112905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Revised: 07/25/2024] [Accepted: 08/05/2024] [Indexed: 08/24/2024]
Abstract
BACKGROUND AND AIMS Crohn's disease (CD) is a chronic, complex inflammatory condition with increasing incidence and prevalence worldwide. However, the causes of CD remain incompletely understood. We identified CD-related metabolites, inflammatory factors, and key genes by Mendelian randomization (MR), multi-omics integration, machine learning (ML), and SHAP. METHODS We first performed a mediation MR analysis on 1400 serum metabolites, 91 inflammatory factors, and CD. We found that certain phospholipids are causally related to CD. In the scRNA-seq data, monocytes were categorized into high and low metabolism groups based on their glycerophospholipid metabolism scores. The differentially expressed genes of these two groups of cells were extracted, and transcription factor prediction, cell communication analysis, and GSEA were performed. After further screening of differentially expressed genes (FDR<0.05, log2FC>1), least absolute shrinkage and selection operator (LASSO) regression was performed to obtain hub genes. Models for hub genes were built using the CatBoost, XGBoost, and NGBoost methods. Further, we used the SHAP method to interpret the models and obtain the gene with the highest contribution to each model. Finally, qRT-PCR was used to verify the expression of these genes in the peripheral blood mononuclear cells (PBMC) of CD patients and healthy subjects. RESULT MR results showed 1-palmitoyl-2-stearoyl-gpc (16:0/18:0) levels, 1-stearoyl-2-arachidonoyl-GPI (18:0/20:4) levels, 1-arachidonoyl-gpc (20:4n6) levels, 1-palmitoyl-2-arachidonoyl-gpc (16:0/20:4n6) levels, and 1-arachidonoyl-GPE (20:4n6) levels were significantly associated with CD risk reduction (FDR<0.05), with CXCL9 acting as a mediator between these phospholipids and CD. The analysis identified 19 hub genes, with CatBoost, XGBoost, and NGBoost achieving AUCs of 0.91, 0.88, and 0.85, respectively. The SHAP methodology obtained the three genes with the highest model contribution: G0S2, S100A8, and PLAUR. The qRT-PCR results showed that the expression levels of S100A8 (p = 0.0003), G0S2 (p < 0.0001), and PLAUR (p = 0.0141) in the PBMC of CD patients were higher than in healthy subjects. CONCLUSION MR findings suggest that certain phospholipids may lower CD risk. G0S2, S100A8, and PLAUR may be potential pathogenic genes in CD. These phospholipids and genes could serve as novel diagnostic and therapeutic targets for CD.
Affiliation(s)
- Changan Chen
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Juanhua Quan
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Xintian Chen
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Tingmei Yang
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Caiyuan Yu
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Shicai Ye
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Yuping Yang
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Xiu Wu
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China
| | - Danxian Jiang
- Department of Medical Oncology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China.
| | - Yijie Weng
- Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, PR China.
|
36
|
Yang Y, Li C, Yang L, Zhu H, Xie Z, Falandysz J, Weber R, Qin L, Liu G. Linking industrial emissions and dietary exposure to human burdens of polychlorinated naphthalenes. The Science of the Total Environment 2024; 951:175733. [PMID: 39181249 DOI: 10.1016/j.scitotenv.2024.175733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Revised: 08/21/2024] [Accepted: 08/21/2024] [Indexed: 08/27/2024]
Abstract
The relationships between toxic pollutant emissions during industrial processes, dietary intakes of toxic pollutants, and adverse health burdens have not yet been quantitatively clarified. Polychlorinated naphthalenes (PCNs) are typical industrial pollutants that are carcinogenic and of increasing concern. In this study, we established an interpretable machine learning model for quantifying the contributions of industrial emissions and dietary intakes of PCNs to health effects. We used the SHapley Additive exPlanations model to achieve individualized interpretability, enabling us to evaluate the specific contributions of individual feature values towards PCN concentration levels. A strong relationship between PCN dietary intake and body burden was found using a robust large-scale PCN diet survey database for China containing the results of the analyses of 17,280 dietary samples and 4480 breast milk samples. Industrial emissions and dietary intake contributed 12 % and 52 %, respectively, of the PCN burden in breast milk. The model quantified the contributions of food consumption and industrial emissions to PCN exposure, which will be useful for performing accurate health risk assessments and developing PCN reduction strategies.
Affiliation(s)
- Yujue Yang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, PR China
| | - Cui Li
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, PR China
| | - Lili Yang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, PR China.
| | - Hao Zhu
- Tulane University, 205 Richardson, New Orleans, LA 70118, USA
| | - Zhiyong Xie
- Institute of Coastal Environmental Chemistry, Helmholtz-Zentrum Hereon, Geesthacht 21502, Germany
| | - Jerzy Falandysz
- Department of Toxicology, Medical University of Lodz, Muszyńskiego 1, 90-15 Łódź, Poland
| | - Roland Weber
- POPs Environmental Consulting, Schwäbisch Gmünd 73527, Germany
| | - Linjun Qin
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, PR China
| | - Guorui Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, PR China; School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, PR China
|
37
|
Moon J, Maqsood M, So D, Baik SW, Rho S, Nam Y. Advancing ensemble learning techniques for residential building electricity consumption forecasting: Insight from explainable artificial intelligence. PLoS One 2024; 19:e0307654. [PMID: 39541326 PMCID: PMC11563398 DOI: 10.1371/journal.pone.0307654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 07/09/2024] [Indexed: 11/16/2024] Open
Abstract
Accurate electricity consumption forecasting in residential buildings has a direct impact on energy efficiency and cost management, making it a critical component of sustainable energy practices. Decision tree-based ensemble learning techniques are particularly effective for this task due to their ability to process complex datasets with high accuracy. Furthermore, incorporating explainable artificial intelligence into these predictions provides clarity and interpretability, allowing energy managers and homeowners to make informed decisions that optimize usage and reduce costs. This study comparatively analyzes decision tree-ensemble learning techniques augmented with explainable artificial intelligence for transparency and interpretability in residential building energy consumption forecasting. This approach employs the University Residential Complex and Appliances Energy Prediction datasets, data preprocessing, and decision-tree bagging and boosting methods. The superior model is evaluated using the Shapley additive explanations method within the explainable artificial intelligence framework, explaining the influence of input variables and decision-making processes. The analysis reveals the significant influence of the temperature-humidity index and wind chill temperature on short-term load forecasting, transcending traditional parameters, such as temperature, humidity, and wind speed. The complete study and source code have been made available on our GitHub repository at https://github.com/sodayeong for the purpose of enhancing precision and interpretability in energy system management, thereby promoting transparency and enabling replication.
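The Shapley additive explanations method mentioned above builds on Shapley values from cooperative game theory. The following is a minimal sketch of an exact Shapley computation, assuming a hypothetical three-feature value function (`toy_value`) invented for illustration; it is not the SHAP library and not the authors' code, and the feature indices and payoffs are made up.

```python
from itertools import permutations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values: average each feature's marginal contribution
    over every possible ordering of the features."""
    totals = [0.0] * n_features
    for order in permutations(range(n_features)):
        coalition = frozenset()
        for feature in order:
            with_feature = coalition | {feature}
            totals[feature] += value(with_feature) - value(coalition)
            coalition = with_feature
    return [total / factorial(n_features) for total in totals]

def toy_value(coalition):
    """Hypothetical payoff: features 0 and 1 act additively,
    and features 0 and 2 contribute an extra 3.0 only when both are present."""
    payoff = 0.0
    if 0 in coalition:
        payoff += 2.0
    if 1 in coalition:
        payoff += 1.0
    if 0 in coalition and 2 in coalition:
        payoff += 3.0
    return payoff

phi = shapley_values(toy_value, 3)
```

The interaction payoff is split equally between features 0 and 2, and the three values sum to the payoff of the full coalition. Exact enumeration grows factorially with the number of features, which is why practical tools rely on approximations.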
Affiliation(s)
- Jihoon Moon
- Department of AI and Big Data, Soonchunhyang University, Asan, Republic of Korea
- Department of ICT Convergence, Soonchunhyang University, Asan, Republic of Korea
| | - Muazzam Maqsood
- Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Pakistan
| | - Dayeong So
- Department of ICT Convergence, Soonchunhyang University, Asan, Republic of Korea
| | | | - Seungmin Rho
- Department of Industrial Security, Chung-Ang University, Seoul, Republic of Korea
| | - Yunyoung Nam
- Department of ICT Convergence, Soonchunhyang University, Asan, Republic of Korea
- Department of Computer Science and Engineering, Soonchunhyang University, Asan, Republic of Korea
| |
|
38
|
Proks M, Salehin N, Brickman JM. Deep learning-based models for preimplantation mouse and human embryos based on single-cell RNA sequencing. Nat Methods 2024:10.1038/s41592-024-02511-3. [PMID: 39543284 DOI: 10.1038/s41592-024-02511-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2024] [Accepted: 10/15/2024] [Indexed: 11/17/2024]
Abstract
The rapid growth of single-cell transcriptomic technology has produced an increasing number of datasets for both embryonic development and in vitro pluripotent stem cell-derived models. This avalanche of data surrounding pluripotency and the process of lineage specification has meant it has become increasingly difficult to define specific cell types or states in vivo, and compare these with in vitro differentiation. Here we utilize a set of deep learning tools to integrate and classify multiple datasets. This allows the definition of both mouse and human embryo cell types, lineages and states, thereby maximizing the information one can garner from these precious experimental resources. Our approaches are built on recent initiatives for large-scale human organ atlases, but here we focus on material that is difficult to obtain and process, spanning early mouse and human development. Using publicly available data for these stages, we test different deep learning approaches and develop a model to classify cell types in an unbiased fashion at the same time as defining the set of genes used by the model to identify lineages, cell types and states. We used our models trained on in vivo development to classify pluripotent stem cell models for both mouse and human development, showcasing the importance of this resource as a dynamic reference for early embryogenesis.
Affiliation(s)
- Martin Proks
- The Novo Nordisk Foundation Center for Stem Cell Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Nazmus Salehin
- The Novo Nordisk Foundation Center for Stem Cell Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Joshua M Brickman
- The Novo Nordisk Foundation Center for Stem Cell Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
| |
|
39
|
Chen Y, Zhao G, Yoon S, Habibi P, Hong CS, Li S, Moultos OA, Dey P, Vlugt TJH, Chung YG. Computational Exploration of Adsorption-Based Hydrogen Storage in Mg-Alkoxide Functionalized Covalent-Organic Frameworks (COFs): Force-Field and Machine Learning Models. ACS APPLIED MATERIALS & INTERFACES 2024; 16:61995-62009. [PMID: 39475372 DOI: 10.1021/acsami.4c11953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2024]
Abstract
Hydrogen is a clean-burning fuel that can be converted to other forms of energy without generating any greenhouse gases. Currently, hydrogen is stored either by compression to high pressure (>700 bar) or cryogenic cooling to liquid form (<23 K). Therefore, it is essential to develop safe, reliable, and energy-efficient storage technology that can store hydrogen at lower pressures and temperatures. In this work, we systematically designed 2902 Mg-alkoxide-functionalized covalent-organic frameworks (COFs) and performed high-throughput (HT) computational screening for hydrogen storage applications at 111, 231, and 296 K. To accurately model the interaction between Mg-alkoxide sites and molecular hydrogen, we performed MP2 calculations to compute the hydrogen binding energy for different types of functionalized models, and the data were subsequently used to fit modified-Morse force field (FF) parameters. Using the developed FF models, we conducted HT grand canonical Monte Carlo (GCMC) simulations to compute hydrogen uptakes for both original and functionalized COFs. The generated data were subsequently used to evaluate the materials' gravimetric and volumetric storage performance at various temperatures (111, 231, and 296 K). Finally, we developed machine learning (ML) models to predict the hydrogen storage performance of functionalized structures based on the features of the original structures. The developed model showed excellent performance with a mean absolute error (MAE) of 0.061 wt % and 0.456 g/L for predicting the gravimetric and volumetric deliverable capacities, respectively, enabling a quick evaluation of structures in a hypothetical COF database. The screening results demonstrated that the Mg-alkoxide functionalization yields greater improvements in volumetric H2 storage capacities for COFs with smaller pores compared to those with larger (mesoporous) pores.
Affiliation(s)
- Yu Chen
- School of Chemical Engineering, Pusan National University, Busan 46241, Republic of Korea
| | - Guobin Zhao
- School of Chemical Engineering, Pusan National University, Busan 46241, Republic of Korea
| | - Sunghyun Yoon
- School of Chemical Engineering, Pusan National University, Busan 46241, Republic of Korea
| | - Parsa Habibi
- Engineering Thermodynamics, Process & Energy Department, Faculty of Mechanical Engineering, Delft University of Technology, Leeghwaterstraat 39, 2628 CB Delft, The Netherlands
| | - Chang Seop Hong
- Department of Chemistry, Korea University, Seoul 02841, Republic of Korea
| | - Song Li
- Department of New Energy and Science Engineering, School of Energy and Power Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Othonas A Moultos
- Engineering Thermodynamics, Process & Energy Department, Faculty of Mechanical Engineering, Delft University of Technology, Leeghwaterstraat 39, 2628 CB Delft, The Netherlands
| | - Poulumi Dey
- Materials Science and Engineering Department, Faculty of Mechanical Engineering, Delft University of Technology, Merkelweg 2, 2628 CD Delft, The Netherlands
| | - Thijs J H Vlugt
- Engineering Thermodynamics, Process & Energy Department, Faculty of Mechanical Engineering, Delft University of Technology, Leeghwaterstraat 39, 2628 CB Delft, The Netherlands
| | - Yongchul G Chung
- School of Chemical Engineering, Pusan National University, Busan 46241, Republic of Korea
| |
|
40
|
Shetty S, Hamer PD, Stebel K, Kylling A, Hassani A, Berntsen TK, Schneider P. Daily high-resolution surface PM 2.5 estimation over Europe by ML-based downscaling of the CAMS regional forecast. ENVIRONMENTAL RESEARCH 2024:120363. [PMID: 39547565 DOI: 10.1016/j.envres.2024.120363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/02/2024] [Revised: 10/31/2024] [Accepted: 11/12/2024] [Indexed: 11/17/2024]
Abstract
Fine particulate matter (PM2.5) is a key air quality indicator due to its adverse health impacts. Accurate PM2.5 assessment requires high-resolution (e.g., at least 1 km) daily data, yet current methods face challenges in balancing accuracy, coverage, and resolution. Chemical transport models such as those from the Copernicus Atmosphere Monitoring Service (CAMS) offer continuous data but their relatively coarse resolution can introduce uncertainties. Here we present a synergistic Machine Learning (ML)-based approach called S-MESH (Satellite and ML-based Estimation of Surface air quality at High resolution) for estimating daily surface PM2.5 over Europe at 1 km spatial resolution and demonstrate its performance for the years 2021 and 2022. The approach enhances and downscales the CAMS regional ensemble 24 h PM2.5 forecast by training a stacked XGBoost model against station observations, effectively integrating satellite-derived data and modeled meteorological variables. Overall, against station observations, S-MESH (mean absolute error (MAE) of 3.54 μg/m3) shows higher accuracy than the CAMS forecast (MAE of 4.18 μg/m3) and is approaching the accuracy of the CAMS regional interim reanalysis (MAE of 3.21 μg/m3), while exhibiting a significantly reduced mean bias (MB of -0.3 μg/m3 vs. -1.5 μg/m3 for the reanalysis). At the same time, S-MESH requires substantially fewer computational resources and less processing time. At concentrations >20 μg/m3, S-MESH outperforms the reanalysis (MB of -7.3 μg/m3 and -10.3 μg/m3 respectively), and reliably captures high pollution events in both space and time. In the eastern study area, where the reanalysis often underestimates, S-MESH better captures high levels of PM2.5 mostly from residential heating. S-MESH effectively tracks day-to-day variability, with a temporal relative absolute error of 5% (reanalysis 10%). Exhibiting good performance at high pollution events coupled with its high spatial resolution and rapid estimation speed, S-MESH can be highly relevant for air quality assessments where both resolution and timeliness are critical.
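The two headline metrics in this abstract, mean absolute error (MAE) and mean bias (MB), can be sketched as follows. This is a minimal illustration using the standard definitions; the sample values are hypothetical and are not taken from the study.

```python
from statistics import fmean

def mean_absolute_error(predicted, observed):
    """Average magnitude of the errors, ignoring their sign."""
    return fmean(abs(p - o) for p, o in zip(predicted, observed))

def mean_bias(predicted, observed):
    """Average signed error; a negative value indicates underestimation."""
    return fmean(p - o for p, o in zip(predicted, observed))

# Hypothetical daily PM2.5 values in ug/m3 (illustrative only).
observed = [10.0, 25.0, 40.0, 8.0]
predicted = [9.0, 21.0, 33.0, 10.0]

mae = mean_absolute_error(predicted, observed)
mb = mean_bias(predicted, observed)
```

MAE stays positive even when over- and underestimates cancel out, whereas MB reveals the direction of a systematic error, which is why the abstract reports both.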
Affiliation(s)
- Shobitha Shetty
- NILU, Kjeller, Norway; Department of Geosciences, University of Oslo, Oslo, Norway.
| | | | | | | | | | | | | |
|
41
|
Kou M, Ma H, Wang X, Heianza Y, Qi L. Plasma proteomics-based brain aging signature and incident dementia risk. GeroScience 2024:10.1007/s11357-024-01407-6. [PMID: 39532828 DOI: 10.1007/s11357-024-01407-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2024] [Accepted: 10/18/2024] [Indexed: 11/16/2024] Open
Abstract
Investigating brain-enriched proteins with machine learning methods may enable a brain-specific understanding of brain aging and provide insights into the molecular mechanisms and pathological pathways of dementia. The study aims to analyze associations of a brain-specific plasma proteomic aging signature with risks of incident dementia. In 45,429 UK Biobank participants who were dementia-free at baseline, we generated a brain-specific biological age using 63 brain-enriched plasma proteins with machine learning methods. The brain age gap was estimated, and Cox proportional hazards models were used to study the association with incident all-cause dementia, Alzheimer's disease (AD), and vascular dementia. Per-unit increment in the brain age gap z-score was associated with significantly higher risks of all-cause dementia (hazard ratio [95% confidence interval], 1.67 [1.56-1.79], P < 0.001), AD (1.85 [1.66-2.08], P < 0.001), and vascular dementia (1.86 [1.55-2.24], P < 0.001), respectively. Notably, 2.1% of the study population exhibited extreme old brain aging defined as brain age gap z-score > 2, correlating with over threefold increased risks of all-cause dementia and vascular dementia (3.42 [2.25-5.20], P < 0.001, and 3.41 [1.05-11.13], P = 0.042, respectively), and fourfold increased risk of AD (4.45 [2.32-8.54], P < 0.001). The associations were stronger among participants with healthier lifestyle factors (all P-interaction < 0.05). These findings were corroborated by magnetic resonance imaging assessments showing that a higher brain age gap aligns with the global pathophysiology of dementia, including global and regional atrophy in gray matter, and white matter lesions (P < 0.001). The brain-specific proteomic age gap is a powerful biomarker of brain aging, indicative of dementia risk and neurodegeneration.
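The z-score thresholding described above can be sketched as follows. The abstract does not state how the gap was computed, so this sketch assumes the common definition (predicted brain age minus chronological age) and standardizes with the population standard deviation; the ages are hypothetical.

```python
from statistics import fmean, pstdev

def brain_age_gap_z_scores(predicted_ages, chronological_ages):
    """Standardize the brain age gap, assumed here to be predicted minus chronological age."""
    gaps = [p - c for p, c in zip(predicted_ages, chronological_ages)]
    mean_gap = fmean(gaps)
    sd_gap = pstdev(gaps)
    return [(gap - mean_gap) / sd_gap for gap in gaps]

# Hypothetical participants: the last one has a predicted age 10 years above the actual age.
chronological = [55, 58, 60, 62, 64, 66, 68, 70, 72, 61]
predicted = [55, 58, 60, 62, 64, 66, 68, 70, 72, 71]

z_scores = brain_age_gap_z_scores(predicted, chronological)

# Indices of participants above the "extreme old brain aging" cutoff (z-score > 2).
extreme = [i for i, z in enumerate(z_scores) if z > 2]
```

Because the hazard ratios are reported per unit of this z-score, a one-unit change corresponds to one standard deviation of the gap in the study population.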
Affiliation(s)
- Minghao Kou
- Department of Epidemiology, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
| | - Hao Ma
- Department of Epidemiology, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
| | - Xuan Wang
- Department of Epidemiology, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
| | - Yoriko Heianza
- Department of Epidemiology, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Lu Qi
- Department of Epidemiology, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA.
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
42
|
Hesse J, Boldini D, Sieber SA. Machine Learning-Driven Data Valuation for Optimizing High-Throughput Screening Pipelines. J Chem Inf Model 2024; 64:8142-8152. [PMID: 39440790 PMCID: PMC11558681 DOI: 10.1021/acs.jcim.4c01547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2024] [Revised: 10/14/2024] [Accepted: 10/15/2024] [Indexed: 10/25/2024]
Abstract
In the rapidly evolving field of drug discovery, high-throughput screening (HTS) is essential for identifying bioactive compounds. This study introduces a novel application of data valuation, a concept for evaluating the importance of data points based on their impact, to enhance drug discovery pipelines. Our approach improves active learning for compound library screening, robustly identifies true and false positives in HTS data, and identifies important inactive samples in an imbalanced HTS training, all while accounting for computational efficiency. We demonstrate that importance-based methods enable more effective batch screening, reducing the need for extensive HTS. Machine learning models accurately differentiate true biological activity from assay artifacts, streamlining the drug discovery process. Additionally, importance undersampling aids in HTS data set balancing, improving machine learning performance without omitting crucial inactive samples. These advancements could significantly enhance the efficiency and accuracy of drug development.
Affiliation(s)
- Joshua Hesse
- Technical University of Munich, TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), 85748 Garching bei München, Germany
| | - Davide Boldini
- Technical University of Munich, TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), 85748 Garching bei München, Germany
| | - Stephan A. Sieber
- Technical University of Munich, TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), 85748 Garching bei München, Germany
| |
|
43
|
Zhu Y, Wang F, Ning P, Zhu Y, Zhang L, Li K, Liu B, Ren H, Xu Z, Pang A, Yang X. Multimodal neuroimaging-based prediction of Parkinson's disease with mild cognitive impairment using machine learning technique. NPJ Parkinsons Dis 2024; 10:218. [PMID: 39528560 PMCID: PMC11555067 DOI: 10.1038/s41531-024-00828-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Accepted: 10/23/2024] [Indexed: 11/16/2024] Open
Abstract
This study aimed to identify potential markers that can predict Parkinson's disease with mild cognitive impairment (PDMCI). We retrospectively collected general demographic data, clinically relevant scales, plasma samples, and neuroimaging data (T1-weighted magnetic resonance imaging [MRI] data as well as resting-state functional MRI [Rs-fMRI] data) from 173 individuals. Subsequently, based on the aforementioned multimodal indices, a support vector machine was employed to investigate the machine learning (ML) classification of PD patients with normal cognition (PDNC) and PDMCI. The performance of 29 classifiers was assessed based on various combinations of indicators. Results demonstrated that the optimal classifier in the validation set combined clinical, Rs-fMRI, and neurofilament light chain indicators, exhibiting a mean accuracy of 0.762, a mean area under the curve of 0.840, a mean sensitivity of 0.745, and a mean specificity of 0.783. The ML algorithm based on multimodal data demonstrated enhanced discriminative ability between PDNC and PDMCI patients.
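The accuracy, sensitivity, and specificity reported above all derive from a binary confusion matrix. The sketch below uses the standard definitions with hypothetical labels (1 standing for PDMCI, 0 for PDNC); it is not the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels for ten patients (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = binary_metrics(y_true, y_pred)
```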
Affiliation(s)
- Yongyun Zhu
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Fang Wang
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Pingping Ning
- Department of Geriatric Neurology, Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, China
| | - Yangfan Zhu
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Lingfeng Zhang
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Kelu Li
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Bin Liu
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Hui Ren
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Zhong Xu
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Ailan Pang
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China.
| | - Xinglong Yang
- Department of Neurology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China.
| |
|
44
|
Sharma RK, Jena MK, Minhas H, Pathak B. Machine-Learning-Assisted Screening of Nanocluster Electrocatalysts: Mapping and Reshaping the Activity Volcano for the Oxygen Reduction Reaction. ACS APPLIED MATERIALS & INTERFACES 2024. [PMID: 39527073 DOI: 10.1021/acsami.4c14076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
In computational heterogeneous catalysis, Sabatier's principle-based activity volcano plots provide an intuitive guide to catalyst design but impose a fundamental constraint on the maximum achievable catalytic performance. Recently, subnano clusters have emerged as an exciting platform, offering high noble metal utilization and superior performance for various reactions compared to extended surfaces, reflecting a complex structure-activity relationship in the non-scalable regime. However, understanding their non-monotonic catalytic activity, attributed to the large configurational space and their fluxional identity, poses a formidable challenge. Here, we present a machine learning (ML) framework that captures the non-monotonic trends in oxygen reduction reaction (ORR) activity at the subnanometer scale, attributed to their dynamic fluxional characteristics. We demonstrate a size-dependent shifting and reshaping of the ORR activity volcano, with Au replacing Pt at the peak. Leveraging only non-ab initio geometric and electronic properties, our trained ML model accurately captures the site-specific adsorption energies of intermediates in the subnanometer regime. To account for the inconsistent trend in activity, we analyzed the correlation between electronic and geometric properties. Our findings reveal that the d-filling and coupling matrix of the neighboring metal atom influence intermediate adsorption in the local chemical environment more strongly than the d-band center does. Following this analysis, we utilized ML to map the catalyst distribution in the activity volcano and identified the five best sub-nano electrocatalysts, demonstrating overpotential values lower than or comparable to the Pt(111) surface for the ORR. This study provides intuitive guidelines for the rational design of highly efficient electrocatalysts for fuel cell applications while modifying the activity volcano plots for electrocatalysts in the subnanometer regime.
Affiliation(s)
- Rahul Kumar Sharma
- Department of Chemistry, Indian Institute of Technology Indore, Indore 453552, India
| | - Milan Kumar Jena
- Department of Chemistry, Indian Institute of Technology Indore, Indore 453552, India
| | - Harpriya Minhas
- Department of Chemistry, Indian Institute of Technology Indore, Indore 453552, India
| | - Biswarup Pathak
- Department of Chemistry, Indian Institute of Technology Indore, Indore 453552, India
| |
|
45
|
Nong X, Lai C, Chen L, Wei J. A novel coupling interpretable machine learning framework for water quality prediction and environmental effect understanding in different flow discharge regulations of hydro-projects. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 950:175281. [PMID: 39117235 DOI: 10.1016/j.scitotenv.2024.175281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2024] [Revised: 08/01/2024] [Accepted: 08/02/2024] [Indexed: 08/10/2024]
Abstract
Machine learning models (MLMs) have been increasingly used to forecast water pollution. However, their "black box" character, which obscures the underlying mechanisms, still limits the applicability of MLMs for water quality management in hydro-projects under complex and frequent artificial regulation. This study proposes an interpretable machine learning framework for water quality prediction that couples a hydrodynamic (flow discharge) scenario-based Random Forest (RF) model with multiple model-agnostic techniques and quantifies global, local, and joint interpretations (i.e., partial dependence, individual conditional expectation, and accumulated local effects) of environmental factor implications. The framework was applied and verified by predicting the permanganate index (CODMn) under different flow discharge regulation scenarios in the Middle Route of the South-to-North Water Diversion Project of China (MRSNWDPC). Data matrices from a total of 4664 sampling cases, including water quality, meteorological, and hydrological indicators from eight national stations along the main canal of the MRSNWDPC, were collected from May 2019 to December 2020. The results showed that the RF models were effective in forecasting CODMn in all flow discharge scenarios, with a mean square error, coefficient of determination, and mean absolute error of 0.006-0.026, 0.481-0.792, and 0.069-0.104, respectively, in the testing dataset. A global interpretation indicated that dissolved oxygen, flow discharge, and surface pressure are the three most important variables for CODMn. Local and joint interpretations indicated that the RF-based prediction model provides a basic understanding of the physical mechanisms of environmental systems. The proposed framework can effectively learn the fundamental environmental implications of water quality variations and provide reliable prediction performance, highlighting the importance of model interpretability for trustworthy machine learning applications in water management projects. This study provides scientific references for applying advanced data-driven MLMs to water quality forecasting and a reliable methodological framework for water quality management and similar hydro-projects.
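Two of the model-agnostic techniques named in this abstract, individual conditional expectation (ICE) and partial dependence, can be sketched in a few lines. The sketch assumes a hypothetical stand-in regressor (`toy_model`) with two made-up inputs; it is not the study's Random Forest, and the coefficients and sample rows are illustrative only.

```python
from statistics import fmean

def toy_model(row):
    """Hypothetical stand-in for a fitted regressor (not the study's RF model)."""
    dissolved_oxygen, flow_discharge = row
    return 3.0 - 0.1 * dissolved_oxygen + 0.02 * flow_discharge

def ice_curves(model, rows, feature_index, grid):
    """One curve per sample: predictions as the chosen feature sweeps the grid
    while the sample's other features stay fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature_index] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves

def partial_dependence(model, rows, feature_index, grid):
    """Pointwise average of the ICE curves."""
    curves = ice_curves(model, rows, feature_index, grid)
    return [fmean(curve[i] for curve in curves) for i in range(len(grid))]

# Hypothetical samples: (dissolved oxygen, flow discharge).
rows = [(8.0, 100.0), (9.0, 200.0), (10.0, 300.0)]
grid = [8.0, 10.0]

pd_values = partial_dependence(toy_model, rows, 0, grid)
```

ICE curves give the local (per-sample) view, while their average gives the global partial dependence. Accumulated local effects address correlated features and are not covered by this sketch.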
Affiliation(s)
- Xizhi Nong
- College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China; State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China; Centre for Urban Sustainability and Resilience, Department of Civil, Environmental and Geomatic Engineering, University College London, London WC1E 6BT, UK; School of Computing and Engineering, University of West London, London W5 5RF, UK
| | - Cheng Lai
- College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China
| | - Lihua Chen
- College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China.
| | - Jiahua Wei
- State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China
| |
|
46
|
Victor A, Geremias Dos Santos H, Silva GFS, Barcellos Filho F, de Fátima Cobre A, Luzia LA, Rondó PHC, Chiavegatto Filho ADP. Predictive modeling of gestational weight gain: a machine learning multiclass classification study. BMC Pregnancy Childbirth 2024; 24:733. [PMID: 39516752 PMCID: PMC11549867 DOI: 10.1186/s12884-024-06952-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024] Open
Abstract
BACKGROUND Gestational weight gain (GWG) is a critical factor influencing maternal and fetal health. Excessive or insufficient GWG can lead to various complications, including gestational diabetes, hypertension, cesarean delivery, low birth weight, and preterm birth. This study aims to develop and evaluate machine learning models to predict GWG categories: below, within, or above recommended guidelines. METHODS We analyzed data from the Araraquara Cohort, Brazil, which comprised 1557 pregnant women with a gestational age of 19 weeks or less. Predictors included socioeconomic, demographic, lifestyle, morbidity, and anthropometric factors. Five machine learning algorithms (Random Forest, LightGBM, AdaBoost, CatBoost, and XGBoost) were employed for model development. The models were trained and evaluated using a multiclass classification approach. Model performance was assessed using metrics such as area under the ROC curve (AUC-ROC), F1 score, and Matthews correlation coefficient (MCC). RESULTS The outcomes were categorized as follows: GWG within recommendations (28.7%), GWG below (32.5%), and GWG above recommendations (38.7%). XGBoost was the best overall model, achieving an AUC-ROC of 0.79 for GWG within, 0.76 for GWG below, and 0.65 for GWG above. LightGBM also performed well, with an AUC-ROC of 0.79 for predicting GWG within recommendations, 0.76 for GWG below, and 0.624 for GWG above. The most important predictors of GWG were pre-gestational BMI, maternal age, glycemic profile, hemoglobin levels, and arm circumference. CONCLUSION Machine learning models can effectively predict GWG categories, offering a valuable tool for early identification of at-risk pregnancies. This approach can enhance personalized prenatal care and interventions to promote optimal pregnancy outcomes.
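The abstract reports a separate AUC-ROC for each GWG category, which suggests a one-vs-rest evaluation; that reading is an assumption, as the abstract does not spell out the scheme. The sketch below computes a one-vs-rest AUC per class from predicted class probabilities using the pairwise-ranking definition of AUC. The labels and probabilities are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive outranks a random negative
    (ties count as half a win)."""
    positives = [s for s, flag in zip(scores, is_positive) if flag]
    negatives = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def one_vs_rest_auc(labels, class_scores):
    """One AUC per class, treating that class as positive and all others as negative."""
    return {
        cls: binary_auc(scores, [label == cls for label in labels])
        for cls, scores in class_scores.items()
    }

# Hypothetical true categories and predicted probabilities for six pregnancies.
labels = ["below", "below", "within", "within", "above", "above"]
class_scores = {
    "below":  [0.7, 0.4, 0.2, 0.5, 0.1, 0.3],
    "within": [0.2, 0.3, 0.6, 0.3, 0.3, 0.2],
    "above":  [0.1, 0.3, 0.2, 0.2, 0.6, 0.5],
}

auc = one_vs_rest_auc(labels, class_scores)
```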
Affiliation(s)
- Audêncio Victor
- School of Public Health, University of São Paulo (USP), Avenida Doutor Arnaldo, 715, São Paulo, 01246904, São Paulo, Brazil.
| | | | - Gabriel Ferreira Santos Silva
- School of Public Health, University of São Paulo (USP), Avenida Doutor Arnaldo, 715, São Paulo, 01246904, São Paulo, Brazil
| | - Fabiano Barcellos Filho
- School of Public Health, University of São Paulo (USP), Avenida Doutor Arnaldo, 715, São Paulo, 01246904, São Paulo, Brazil
| | | | - Liania A Luzia
- School of Public Health, University of São Paulo (USP), Avenida Doutor Arnaldo, 715, São Paulo, 01246904, São Paulo, Brazil
| | - Patrícia H C Rondó
- School of Public Health, University of São Paulo (USP), Avenida Doutor Arnaldo, 715, São Paulo, 01246904, São Paulo, Brazil
| | | |
|
47
|
Newberry JA, Gimenez MA, Gunturkun F, Villa E, Maldonado M, Gonzalez D, Garcia G, Espinosa PR, Hedlin H, Kaysen D. Mental health care-seeking and barriers: a cross-sectional study of an urban Latinx community. BMC Public Health 2024; 24:3091. [PMID: 39516848 PMCID: PMC11545330 DOI: 10.1186/s12889-024-20533-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2024] [Accepted: 10/28/2024] [Indexed: 11/16/2024] Open
Abstract
BACKGROUND The Latinx community faces a growing number of mental health challenges and disparities in care. While the contributing factors are complex, some barriers to connecting with mental health support and accessing care can likely be addressed. METHODS To investigate barriers to connecting with mental health care, we conducted a cross-sectional survey of mental health service use and barriers in an urban, primarily Hispanic/Latinx community, using a modified random walk approach for door-to-door data collection with a two-cluster sampling frame. The survey included questions on socio-demographic characteristics, mental health status, desire and attempts to seek care, and the Barriers to Access to Care Evaluation. Shapley additive explanations (SHAP) identified impactful barriers and demographic characteristics. Our primary outcome was the number of respondents who saw a professional in the past 12 months and the key determinants that enabled their successful connection. Secondary outcomes were people with poor mental health who had wanted or tried to seek any source of mental health support. RESULTS Of the 1004 respondents enrolled, 70.5% were foreign born and 63.4% were women. In the past 12 months, 23.8% of respondents wanted to connect with mental health care, 15.5% tried to connect, and only 11.7% successfully connected to mental health services. The two most cited barriers also had the highest SHAP values: concerns about treatments available (65%) and financial costs (62.7%). Additional barriers with high SHAP values were being seen as weak and having no one to help them find care. Of the demographic characteristics, age had the highest SHAP values. CONCLUSION In a community with a high density of Latinx immigrants, just under half of respondents wanting mental health care successfully connected. Perceived informational, financial, and stigma-related barriers affected the likelihood of connecting with mental health care. These factors should be considered when designing programs and interventions to improve mental health care access and services in the Latinx community.
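As a quick check on the "just under half" statement in the conclusion, the reported percentages can be compared directly. This is a minimal sketch that assumes all three percentages share the same denominator (the 1004 enrolled respondents); the variable and function names are illustrative and do not come from the study.

```python
# Percentages reported in the abstract, assumed to be shares of all respondents.
wanted_pct = 23.8      # wanted to connect with mental health care
tried_pct = 15.5       # tried to connect
connected_pct = 11.7   # successfully connected


def share(part_pct, whole_pct):
    """Return part as a fraction of whole, both given as percentages of the same sample."""
    return part_pct / whole_pct


connected_of_wanted = share(connected_pct, wanted_pct)
connected_of_tried = share(connected_pct, tried_pct)

print(f"Connected among those who wanted care: {connected_of_wanted:.1%}")
print(f"Connected among those who tried:       {connected_of_tried:.1%}")
```

Under that assumption, roughly 49% of those who wanted care connected, which matches the wording of the conclusion.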
Affiliation(s)
- Jennifer A Newberry
  - Department of Emergency Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Michelle A Gimenez
  - Department of Emergency Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Fatma Gunturkun
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Erica Villa
  - Next Door Solutions to Domestic Violence, San Jose, CA, USA
- Gabriel Garcia
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Haley Hedlin
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Debra Kaysen
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
|
48
|
Ma P, Ma H, Liu R, Wen H, Li H, Huang Y, Li Y, Xiong L, Xie L, Wang Q. Prediction of vancomycin plasma concentration in elderly patients based on multi-algorithm mining combined with population pharmacokinetics. Sci Rep 2024; 14:27165. [PMID: 39511378 PMCID: PMC11544216 DOI: 10.1038/s41598-024-78558-1] [Received: 05/28/2024] [Accepted: 10/31/2024] [Indexed: 11/15/2024] Open
Abstract
The pharmacokinetics of vancomycin exhibit significant inter-individual variability, particularly among elderly patients. This study aims to develop a predictive model that integrates machine learning with population pharmacokinetics (popPK) to facilitate personalized medication management for this demographic. A retrospective analysis was conducted incorporating 33 features, including popPK parameters such as clearance and volume of distribution. A combination of multiple algorithms and Shapley Additive Explanations was used for feature selection to identify the most influential factors affecting drug concentrations. Each algorithm performed better with popPK parameters than without them. Our final ensemble model, composed of support vector regression, light gradient boosting machine, and categorical boosting in a 6:3:1 ratio, included 16 optimized features. This model demonstrated superior predictive accuracy compared to models utilizing all features, with testing group metrics including an R² of 0.656, mean absolute error of 3.458, mean square error of 28.103, absolute accuracy within ±5 mg/L of 81.82%, and relative accuracy within ±30% of 76.62%. This study presents a rapid and cost-effective predictive model for estimating vancomycin plasma concentrations in elderly patients. The model offers a valuable tool for clinicians to accurately determine effective plasma concentration ranges and tailor individualized dosing regimens, thereby enhancing therapeutic outcomes and safety.
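The 6:3:1 ensemble and the two accuracy metrics can be illustrated with a small sketch. It assumes the ratio acts as weights on each model's predictions, and that absolute and relative accuracy are the fractions of predictions whose error falls within ±5 mg/L and within ±30% of the observed value; the abstract does not spell out either definition. The prediction values below are made-up toy numbers, not study data.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model prediction lists using weights normalized to sum to 1."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample)) / total
        for sample in zip(*predictions)
    ]


def absolute_accuracy(y_true, y_pred, tol=5.0):
    """Fraction of predictions within +/- tol (mg/L) of the observed value (assumed definition)."""
    hits = sum(abs(p - t) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def relative_accuracy(y_true, y_pred, tol=0.30):
    """Fraction of predictions within +/- tol (as a ratio) of the observed value (assumed definition)."""
    hits = sum(abs(p - t) / t <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy observed concentrations and per-model predictions (mg/L), purely illustrative.
observed = [10.0, 20.0, 15.0, 30.0]
svr_pred = [12.0, 18.0, 15.0, 20.0]
lgbm_pred = [11.0, 22.0, 14.0, 28.0]
catboost_pred = [9.0, 21.0, 16.0, 40.0]

ensemble = weighted_ensemble([svr_pred, lgbm_pred, catboost_pred], weights=[6, 3, 1])
abs_acc = absolute_accuracy(observed, ensemble)
rel_acc = relative_accuracy(observed, ensemble)
print(ensemble, abs_acc, rel_acc)
```

In this toy case the last sample misses the ±5 mg/L band while still falling inside the ±30% band, which shows why the two metrics can disagree at higher concentrations.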
Affiliation(s)
- Pan Ma
  - Department of Pharmacy, Southwest Hospital, The First Affiliated Hospital of Army Medical University, Gaotanyan Street 30, Chongqing, 400038, China
- Huan Ma
  - Department of Pharmacy, Southwest Hospital, The First Affiliated Hospital of Army Medical University, Gaotanyan Street 30, Chongqing, 400038, China
- Ruixiang Liu
  - Department of Pharmacy, Southwest Hospital, The First Affiliated Hospital of Army Medical University, Gaotanyan Street 30, Chongqing, 400038, China
- Haini Wen
  - Department of Pharmacy, Uppsala University, Husargatan 3, 751 37, Uppsala, Sweden
- Haisheng Li
  - Institute of Burn Research, The First Affiliated Hospital of Army Medical University, Chongqing, 400038, China
- Yifan Huang
  - Medical Big Data and Artificial Intelligence Center, The First Affiliated Hospital of Army Medical University, Chongqing, 400038, China
- Ying Li
  - Medical Big Data and Artificial Intelligence Center, The First Affiliated Hospital of Army Medical University, Chongqing, 400038, China
- Lirong Xiong
  - Department of Pharmacy, Southwest Hospital, The First Affiliated Hospital of Army Medical University, Gaotanyan Street 30, Chongqing, 400038, China
- Linli Xie
  - Department of Pharmacy, Southwest Hospital, The First Affiliated Hospital of Army Medical University, Gaotanyan Street 30, Chongqing, 400038, China
- Qian Wang
  - Department of Pharmacy, Southwest Hospital, The First Affiliated Hospital of Army Medical University, Gaotanyan Street 30, Chongqing, 400038, China
|
49
|
Liu J, Duan X, Duan M, Jiang Y, Mao W, Wang L, Liu G. Development and external validation of an interpretable machine learning model for the prediction of intubation in the intensive care unit. Sci Rep 2024; 14:27174. [PMID: 39511328 PMCID: PMC11544239 DOI: 10.1038/s41598-024-77798-5] [Received: 03/30/2024] [Accepted: 10/25/2024] [Indexed: 11/15/2024] Open
Abstract
Given the limited capacity to accurately determine the necessity for intubation in intensive care unit (ICU) settings, this study aimed to develop and externally validate an interpretable machine learning (ML) model capable of predicting the need for intubation among ICU patients. Seven widely used ML algorithms were employed to construct the prediction models. Adult patients from the Medical Information Mart for Intensive Care IV database who stayed in the ICU for longer than 24 h were included in the development and internal validation. The model was subsequently externally validated using the eICU-CRD database. In addition, the SHapley Additive exPlanations (SHAP) method was employed to interpret the influence of individual parameters on the predictions made by the model. A total of 11,988 patients were included in the final cohort for this study. The CatBoost model demonstrated the best performance (AUC: 0.881). In the external validation set, the efficacy of our model was also confirmed (AUC: 0.750), which suggests robust generalization capabilities. The Glasgow Coma Scale (GCS), body mass index (BMI), arterial partial pressure of oxygen (PaO2), respiratory rate (RR) and length of stay (LOS) before ICU admission were the five features with the greatest impact in the CatBoost model. We developed an externally validated CatBoost model that accurately predicts the need for intubation in ICU patients within 24 to 96 h of admission, facilitating clinical decision-making and potentially improving patient outcomes. The prediction model utilizes readily obtainable monitoring parameters and integrates the SHAP method to enhance interpretability, providing clinicians with clear insights into the factors influencing predictions.
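For readers unfamiliar with the quantity that SHAP estimates, the sketch below computes exact Shapley values for a toy three-feature value function by enumerating all feature subsets. The feature names and payoff numbers are hypothetical, and this is not the authors' pipeline; the SHAP library relies on faster model-specific or sampling-based algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values: weighted average marginal contribution over all subsets."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Hypothetical payoff: two main effects plus one interaction; 'rr' has no effect."""
    score = 0.0
    if "gcs" in subset:
        score += 3.0
    if "pao2" in subset:
        score += 1.0
    if "gcs" in subset and "pao2" in subset:
        score += 2.0  # interaction term, split evenly between the two features
    return score


features = ["gcs", "pao2", "rr"]
phi = shapley_values(features, toy_value)
print(phi)
```

The attributions sum to the difference between the payoff with all features and the payoff with none, which is the efficiency property that makes Shapley values additive explanations.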
Affiliation(s)
- Jianyuan Liu
  - Emergency Medicine Clinical Research Center, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
- Xiangjie Duan
  - Department of Infectious Diseases, Department of Emergency Medicine, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Minjie Duan
  - Center for Artificial Intelligence in Medicine, Chinese PLA General Hospital, Beijing, China
- Yu Jiang
  - Department of Respiratory and Critical Care Medicine, University-Town Hospital of Chongqing Medical University, Chongqing, China
- Wei Mao
  - Department of Emergency and Critical Care Medicine, University-Town Hospital of Chongqing Medical University, Chongqing, China
- Lilin Wang
  - Department of Emergency and Critical Care Medicine, University-Town Hospital of Chongqing Medical University, Chongqing, China
- Gang Liu
  - Department of Emergency and Critical Care Medicine, University-Town Hospital of Chongqing Medical University, Chongqing, China
|
50
|
Wang Y, Huang RJ, Zhong H, Wang T, Yang L, Yuan W, Xu W, An Z. Predictions of the Optical Properties of Brown Carbon Aerosol by Machine Learning with Typical Chromophores. Environmental Science & Technology 2024. [PMID: 39510842 DOI: 10.1021/acs.est.4c09031] [Indexed: 11/15/2024]
Abstract
The linkages between brown carbon (BrC) optical properties and chemical composition remain inadequately understood, with quantified chromophores explaining less than 25% of ambient aerosol light absorption. This study characterized 38 typical chromophores in aerosols collected in Xi'an, with light absorption contributions to BrC ranging from 1.6 ± 0.3 to 5.8 ± 2.6% at 365 nm. Based on these quantified chromophores, an interpretable machine learning model and the Shapley Additive Explanation (SHAP) method were employed to explore the relationships between BrC optical properties and chemical composition. The model attained high accuracy with Pearson correlation coefficients (r) exceeding 0.93 for the absorption coefficient (Absλ) and surpassing 0.57 for mass absorption efficiency (MAEλ) of BrC. It explains more than 80% of the variance in Absλ and over 50% in MAEλ, significantly improving the understanding of BrC light absorption. Polycyclic aromatic hydrocarbons (PAHs) and oxygenated PAHs (OPAHs) with four and five rings exhibit significant positive effects on Absλ, suggesting that similar unidentified chromophores may also notably impact BrC optical characteristics. The model based on chromophore mass concentrations further simplifies studying BrC optical characteristics. This study advances understanding of the relationship between BrC composition and optical properties and guides the investigation of unrecognized chromophores.
Affiliation(s)
- Ying Wang
  - Interdisciplinary Research Center of Earth Science Frontier, State Key Laboratory of Earth Surface Processes and Resource Ecology, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
  - State Key Laboratory of Loess Science, Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China
- Ru-Jin Huang
  - State Key Laboratory of Loess Science, Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China
  - Institute of Global Environmental Change, Xi'an Jiaotong University, Xi'an 710049, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Haobin Zhong
  - School of Advanced Materials Engineering, Jiaxing Nanhu University, Jiaxing 314001, China
- Ting Wang
  - State Key Laboratory of Loess Science, Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China
- Lu Yang
  - State Key Laboratory of Loess Science, Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Yuan
  - State Key Laboratory of Loess Science, Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China
- Wei Xu
  - Center for Excellence in Regional Atmospheric Environment, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China
- Zhisheng An
  - Interdisciplinary Research Center of Earth Science Frontier, State Key Laboratory of Earth Surface Processes and Resource Ecology, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
  - State Key Laboratory of Loess Science, Center for Excellence in Quaternary Science and Global Change, Institute of Earth Environment, Chinese Academy of Sciences, Xi'an 710061, China
|