1
Fan Y, Sun N, Lv S, Jiang H, Zhang Z, Wang J, Xie Y, Yue X, Hu B, Ju B, Yu P. Prediction of developmental toxic effects of fine particulate matter (PM 2.5) water-soluble components via machine learning through observation of PM 2.5 from diverse urban areas. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 946:174027. [PMID: 38906297 DOI: 10.1016/j.scitotenv.2024.174027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/09/2024] [Accepted: 06/13/2024] [Indexed: 06/23/2024]
Abstract
The global health implications of fine particulate matter (PM2.5) underscore the imperative need for research into its toxicity and chemical composition. In this study, zebrafish embryos exposed to the water-soluble components of PM2.5 from two cities (Harbin and Hangzhou) with differences in air quality, underwent microscopic examination to identify primary target organs. The Harbin PM2.5 induced dose-dependent organ malformation in zebrafish, indicating a higher level of toxicity than that of the Hangzhou sample. Harbin PM2.5 led to severe deformities such as pericardial edema and a high mortality rate, while the Hangzhou sample exhibited hepatotoxicity, causing delayed yolk sac absorption. The experimental determination of PM2.5 constituents was followed by the application of four algorithms for predictive toxicological assessment. The random forest algorithm correctly predicted each of the effect classes and showed the best performance, suggesting that zebrafish malformation rates were strongly correlated with water-soluble components of PM2.5. Feature selection identified the water-soluble ions F- and Cl- and metallic elements Al, K, Mn, and Be as potential key components affecting zebrafish development. This study provides new insights into the developmental toxicity of PM2.5 and offers a new approach for predicting and exploring the health effects of PM2.5.
Affiliation(s)
- Yang Fan
  - Department of Medical Oncology of the Second Affiliated Hospital, Department of Toxicology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Nannan Sun
  - Hangzhou SanOmics AI Co., Ltd, Hangzhou 311103, China
- Shenchong Lv
  - Department of Medical Oncology of the Second Affiliated Hospital, Department of Toxicology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Hui Jiang
  - Department of Medical Oncology of the Second Affiliated Hospital, Department of Toxicology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Ziqing Zhang
  - Department of Medical Oncology of the Second Affiliated Hospital, Department of Toxicology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Junjie Wang
  - Department of Medical Oncology of the Second Affiliated Hospital, Department of Toxicology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Yiyi Xie
  - Department of Medical Oncology of the Second Affiliated Hospital, Department of Toxicology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Xiaomin Yue
  - Department of Biophysics, Zhejiang University School of Medicine, Hangzhou 310058, China; Department of Neurology of the Fourth Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Baolan Hu
  - College of Environmental Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Bin Ju
  - Hangzhou SanOmics AI Co., Ltd, Hangzhou 311103, China
- Peilin Yu
  - Department of Medical Oncology of the Second Affiliated Hospital, Department of Toxicology, Zhejiang University School of Medicine, Hangzhou 310058, China
2
Gregorich M, Simpson SL, Heinze G. Flexible parametrization of graph-theoretical features from individual-specific networks for prediction. Stat Med 2024; 43:2592-2606. [PMID: 38664934 DOI: 10.1002/sim.10091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 03/15/2024] [Accepted: 04/15/2024] [Indexed: 05/24/2024]
Abstract
Statistical techniques are needed to analyze data structures with complex dependencies such that clinically useful information can be extracted. Individual-specific networks, which capture dependencies in complex biological systems, are often summarized by graph-theoretical features. These features, which lend themselves to outcome modeling, can be subject to high variability due to arbitrary decisions in network inference and noise. Correlation-based adjacency matrices often need to be sparsified before meaningful graph-theoretical features can be extracted, requiring the data analysts to determine an optimal threshold. To address this issue, we propose to incorporate a flexible weighting function over the full range of possible thresholds to capture the variability of graph-theoretical features over the threshold domain. The potential of this approach, which extends concepts from functional data analysis to a graph-theoretical setting, is explored in a plasmode simulation study using real functional magnetic resonance imaging (fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) Preprocessed initiative. The simulations show that our modeling approach yields accurate estimates of the functional form of the weight function, improves inference efficiency, and achieves a comparable or reduced root mean square prediction error compared to competitor modeling approaches. This assertion holds true in settings where both complex functional forms underlie the outcome-generating process and a universal threshold value is employed. We demonstrate the practical utility of our approach by using resting-state fMRI data to predict biological age in children. Our study establishes the flexible modeling approach as a statistically principled, serious competitor to ad-hoc methods with superior performance.
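As a rough illustration of the thresholding idea described in this abstract (not the authors' implementation), the sketch below sparsifies a small correlation matrix over a grid of thresholds, computes one graph-theoretical feature at each threshold, and integrates the resulting curve against a weight function. The function names, the use of edge density as the feature, and the fixed weight function are all assumptions made for illustration; the abstract describes estimating the weight function flexibly from data.

```python
def edge_density(corr, threshold):
    """Fraction of node pairs whose |correlation| is at least `threshold`."""
    n = len(corr)
    kept = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(corr[i][j]) >= threshold
    )
    return kept / (n * (n - 1) / 2)


def weighted_feature(corr, thresholds, weight):
    """Trapezoidal integral of weight(t) * edge_density(corr, t) over the grid."""
    values = [weight(t) * edge_density(corr, t) for t in thresholds]
    return sum(
        (thresholds[k + 1] - thresholds[k]) * (values[k] + values[k + 1]) / 2
        for k in range(len(thresholds) - 1)
    )


# Hypothetical 3-node correlation matrix for one individual.
corr = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.8],
    [0.5, 0.8, 1.0],
]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

# A constant weight treats every threshold equally (an assumption for the demo).
summary = weighted_feature(corr, thresholds, lambda t: 1.0)
```

With a constant weight the summary is simply the area under the density curve; a data-driven weight function would instead emphasise the thresholds that are most informative for the outcome.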
Affiliation(s)
- Mariella Gregorich
  - Medical University of Vienna, Center for Medical Data Science, Institute of Clinical Biometrics, Vienna, Austria
- Sean L Simpson
  - Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Georg Heinze
  - Medical University of Vienna, Center for Medical Data Science, Institute of Clinical Biometrics, Vienna, Austria
3
Dunias ZS, Van Calster B, Timmerman D, Boulesteix AL, van Smeden M. A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study. Stat Med 2024; 43:1119-1134. [PMID: 38189632 DOI: 10.1002/sim.9932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 09/10/2023] [Accepted: 09/21/2023] [Indexed: 01/09/2024]
Abstract
Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.
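To make the two selection rules contrasted in this abstract concrete, here is a minimal sketch, assuming per-fold cross-validation errors are already available for each candidate penalty. The function name `select_penalty` and the example numbers are hypothetical. The minimum-error rule picks the penalty with the lowest mean CV error; the one-standard-error (1SE) rule picks the strongest penalty whose mean error is within one standard error of that minimum.

```python
from math import sqrt
from statistics import mean, stdev


def select_penalty(penalties, fold_errors):
    """Return (min-error penalty, 1SE penalty).

    fold_errors[i] holds the per-fold CV errors obtained with penalties[i].
    """
    means = [mean(errs) for errs in fold_errors]
    std_errs = [stdev(errs) / sqrt(len(errs)) for errs in fold_errors]

    # Standard rule: the penalty with the lowest mean CV error.
    i_min = min(range(len(penalties)), key=lambda i: means[i])

    # 1SE rule: the largest penalty whose mean error does not exceed
    # the minimum mean error plus one standard error of that minimum.
    cutoff = means[i_min] + std_errs[i_min]
    eligible = [i for i in range(len(penalties)) if means[i] <= cutoff]
    i_1se = max(eligible, key=lambda i: penalties[i])

    return penalties[i_min], penalties[i_1se]


# Hypothetical 5-fold CV errors for four candidate penalties.
penalties = [0.01, 0.1, 1.0, 10.0]
fold_errors = [
    [0.30, 0.32, 0.31, 0.29, 0.33],
    [0.25, 0.27, 0.26, 0.24, 0.28],
    [0.26, 0.28, 0.265, 0.245, 0.28],
    [0.40, 0.42, 0.41, 0.39, 0.43],
]
best_min, best_1se = select_penalty(penalties, fold_errors)
```

In this made-up example the 1SE rule selects a penalty ten times larger than the minimum-error rule, that is, a more heavily shrunken model. The abstract reports that this kind of extra shrinkage often led to severe miscalibration in low-dimensional settings.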
Affiliation(s)
- Zoë S Dunias
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Dirk Timmerman
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Munich, Germany
  - Munich Center for Machine Learning (MCML), LMU Munich, Munich, Germany
- Maarten van Smeden
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
4
Patwary AL, Haque AM, Mahdinia I, Khattak AJ. Investigating transportation safety in disadvantaged communities by integrating crash and Environmental Justice data. ACCIDENT; ANALYSIS AND PREVENTION 2024; 194:107366. [PMID: 37924566 DOI: 10.1016/j.aap.2023.107366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 10/03/2023] [Accepted: 10/24/2023] [Indexed: 11/06/2023]
Abstract
Recent efforts to identify disadvantaged communities (DACs) on a census tract level have evoked possibilities of attaining transportation justice and vision zero goals in these areas. To identify DACs, the United States Department of Transportation (USDOT) has developed six comprehensive indicators: economy, environment, equity, health, resilience, and transportation access. The indicators are used to explore the associations between DACs (in 71,728 census tracts) and five years of fatal crashes, providing a comprehensive understanding of safety risks. Specifically, using data on DACs and linking it with census and crash data, this study aims to understand the complex connections between safety (captured through fatal crashes) and disadvantages that communities confront due to a convergence of multiple challenges and burdens using Zero-Hurdle Negative Binomial models. The results reveal that health, resilience, and transportation-disadvantaged tracts are associated with more fatal crashes. The study also found the presence of a higher percentage of the population with bachelor's degrees and increased use of public transportation are correlated with fewer fatal crashes. Also, a higher fatal crash rate is observed in disadvantaged census tracts where a high proportion of the Hawaiian or other Pacific Islander, and American Indian or Alaska Native populations live. This implies that targeted interventions can be explored further in tracts that show high correlations with fatal crashes. The findings contribute to traffic safety by highlighting the risks in DACs, which can help design and implement traffic safety interventions. The insights gained from this study can inform decision-making and help to guide the development of more equitable traffic safety programs in disadvantaged communities.
Affiliation(s)
- A Latif Patwary
  - Department of Civil and Environmental Engineering, University of Tennessee Knoxville, Knoxville, TN 37996, USA
- Antora Mohsena Haque
  - Department of Civil and Environmental Engineering, University of Tennessee Knoxville, Knoxville, TN 37996, USA
- Iman Mahdinia
  - Safe Transportation Research & Education Center, The University of California Berkeley, CA 94704, USA
- Asad J Khattak
  - Department of Civil and Environmental Engineering, University of Tennessee Knoxville, Knoxville, TN 37996, USA
5
Jin H, Ranasinghe KG, Prabhu P, Dale C, Gao Y, Kudo K, Vossel K, Raj A, Nagarajan SS, Jiang F. Dynamic functional connectivity MEG features of Alzheimer's disease. Neuroimage 2023; 281:120358. [PMID: 37699440 PMCID: PMC10865998 DOI: 10.1016/j.neuroimage.2023.120358] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Revised: 08/14/2023] [Accepted: 08/31/2023] [Indexed: 09/14/2023] [Open Access]
Abstract
Dynamic resting state functional connectivity (RSFC) characterizes time-varying fluctuations of functional brain network activity. While many studies have investigated static functional connectivity, it has been unclear whether features of dynamic functional connectivity are associated with neurodegenerative diseases. Popular sliding-window and clustering methods for extracting dynamic RSFC have various limitations that prevent extracting reliable features to address this question. Here, we use a novel and robust time-varying dynamic network (TVDN) approach to extract the dynamic RSFC features from high resolution magnetoencephalography (MEG) data of participants with Alzheimer's disease (AD) and matched controls. The TVDN algorithm automatically and adaptively learns the low-dimensional spatiotemporal manifold of dynamic RSFC and detects dynamic state transitions in data. We show that amongst all the functional features we investigated, the dynamic manifold features are the most predictive of AD. These include: the temporal complexity of the brain network, given by the number of state transitions and their dwell times, and the spatial complexity of the brain network, given by the number of eigenmodes. These dynamic features have higher sensitivity and specificity in distinguishing AD from healthy subjects than the existing benchmarks do. Intriguingly, we found that AD patients generally have higher spatial complexity but lower temporal complexity compared with healthy controls. We also show that graph theoretic metrics of dynamic component of TVDN are significantly different in AD versus controls, while static graph metrics are not statistically different. These results indicate that dynamic RSFC features are impacted in neurodegenerative disease like Alzheimer's disease, and may be crucial to understanding the pathophysiological trajectory of these diseases.
Affiliation(s)
- Huaqing Jin
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
- Kamalini G Ranasinghe
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA; Memory and Aging Center, University of California San Francisco, San Francisco, CA, USA
- Pooja Prabhu
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
- Corby Dale
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
- Yijing Gao
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
- Kiwamu Kudo
  - Medical Imaging Business Center, Ricoh Company, Ltd., Kanazawa, 920-0177, Japan
- Keith Vossel
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Ashish Raj
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
- Srikantan S Nagarajan
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
- Fei Jiang
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
6
Tiulpin A, Saarakkala S, Mathiessen A, Hammer HB, Furnes O, Nordsletten L, Englund M, Magnusson K. Predicting total knee arthroplasty from ultrasonography using machine learning. OSTEOARTHRITIS AND CARTILAGE OPEN 2022; 4:100319. [DOI: 10.1016/j.ocarto.2022.100319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 10/15/2022] [Accepted: 10/30/2022] [Indexed: 11/09/2022] [Open Access]
7
Ghorashi SM, Fazeli A, Hedayat B, Mokhtari H, Jalali A, Ahmadi P, Chalian H, Bragazzi NL, Shirani S, Omidi N. Comparison of conventional scoring systems to machine learning models for the prediction of major adverse cardiovascular events in patients undergoing coronary computed tomography angiography. Front Cardiovasc Med 2022; 9:994483. [PMID: 36386332 PMCID: PMC9643500 DOI: 10.3389/fcvm.2022.994483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2022] [Accepted: 10/05/2022] [Indexed: 08/04/2023] [Open Access]
Abstract
BACKGROUND: The study aims to compare the prognostic performance of conventional scoring systems with that of machine learning (ML) models based on coronary computed tomography angiography (CCTA) in discriminating between patients with and without major adverse cardiovascular events (MACEs), and to find the most important contributing factor of MACE.

MATERIALS AND METHODS: From November to December 2019, 500 of 1586 CCTA scans were included and analyzed; six conventional scores were calculated for each participant, and seven ML models were designed. Our study endpoints were all-cause mortality, non-fatal myocardial infarction, late coronary revascularization, and hospitalization for unstable angina or heart failure. Score performance was assessed by area under the curve (AUC) analysis.

RESULTS: Of 500 patients (mean age: 60 ± 10; 53.8% male) referred for CCTA, 416 met the inclusion criteria. Forty-six patients with early (<90 days) cardiac evaluation (because the reason for the assessment, deterioration of the symptoms vs. the CCTA result, could not be clarified) and 38 patients with missed follow-up were not included in the final analysis. Forty-six patients (11.0%) developed MACE within 20.5 ± 7.9 months of follow-up. Compared to conventional scores, ML models showed better performance, with the exception of one model, eXtreme Gradient Boosting, which performed worse than the conventional scoring systems (AUC: 0.824, 95% confidence interval (CI): 0.701-0.947). Among the ML models, random forest, ensemble with generalized linear, and ensemble with naive Bayes showed the highest prognostic performance (AUC: 0.92, 95% CI: 0.85-0.99; AUC: 0.90, 95% CI: 0.81-0.98; and AUC: 0.89, 95% CI: 0.82-0.97, respectively). Coronary artery calcium score (CACS) had the highest correlation with MACE.

CONCLUSION: Compared to the conventional scoring system, ML models using CCTA scans show improved prognostic prediction for MACE. Anatomical features were more important than clinical characteristics.
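Since the comparison in this abstract rests on the area under the ROC curve, a small sketch of how that quantity can be computed may help. It uses the rank-based (Mann-Whitney) definition: the AUC is the proportion of event/non-event pairs in which the event case received the higher score, with ties counted as one half. The function name and the toy data are hypothetical, and this is not the authors' analysis code.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    # Count concordant pairs; a tied pair contributes one half.
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted risks and observed outcomes (1 = event).
risk = [0.9, 0.8, 0.3, 0.2]
event = [1, 0, 1, 0]
example_auc = auc(risk, event)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported values (for example 0.92 for random forest) should be read.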
Affiliation(s)
- Amir Fazeli
  - Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
- Behnam Hedayat
  - Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
- Hamid Mokhtari
  - Biomedical Engineering and Physics Department, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Arash Jalali
  - Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
- Pooria Ahmadi
  - Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
- Hamid Chalian
  - Division of Cardiothoracic Imaging, Department of Radiology, University of Washington, Seattle, WA, United States
- Nicola Luigi Bragazzi
  - Laboratory for Industrial and Applied Mathematics (LIAM), Department of Mathematics and Statistics, York University, Toronto, ON, Canada
- Shapour Shirani
  - Department of Cardiovascular Imaging, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
- Negar Omidi
  - Department of Cardiovascular Imaging, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
8
Gregorich M, Melograna F, Sunqvist M, Michiels S, Van Steen K, Heinze G. Individual-specific networks for prediction modelling – A scoping review of methods. BMC Med Res Methodol 2022; 22:62. [PMID: 35249534 PMCID: PMC8898441 DOI: 10.1186/s12874-022-01544-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Accepted: 02/11/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative forms of covariate representations for the features derived from these modern data modalities should be considered that can utilize their intrinsic interconnection. The connectivity information between these features can be represented as an individual-specific network defined by a set of nodes and edges, the strength of which can vary from individual to individual. Global or local graph-theoretical features describing the network may constitute potential prognostic biomarkers instead of or in addition to traditional covariates and may replace the often unsuccessful search for individual biomarkers in a high-dimensional predictor space.

Methods: We conducted a scoping review to identify, collate and critically appraise the state-of-art in the use of individual-specific networks for prediction modelling in medicine and applied health research, published during 2000–2020 in the electronic databases PubMed, Scopus and Embase.

Results: Our scoping review revealed the main application areas namely neurology and pathopsychology, followed by cancer research, cardiology and pathology (N = 148). Network construction was mainly based on Pearson correlation coefficients of repeated measurements, but also alternative approaches (e.g. partial correlation, visibility graphs) were found. For covariates measured only once per individual, network construction was mostly based on quantifying an individual’s contribution to the overall group-level structure. Despite the multitude of identified methodological approaches for individual-specific network inference, the number of studies that were intended to enable the prediction of clinical outcomes for future individuals was quite limited, and most of the models served as proof of concept that network characteristics can in principle be useful for prediction.

Conclusion: The current body of research clearly demonstrates the value of individual-specific network analysis for prediction modelling, but it has not yet been considered as a general tool outside the current areas of application. More methodological research is still needed on well-founded strategies for network inference, especially on adequate network sparsification and outcome-guided graph-theoretical feature extraction and selection, and on how networks can be exploited efficiently for prediction modelling.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-022-01544-6.