1
Sandip Vora D, Manoj Bhandari S, Sundar D. DNA shape features improve prediction of CRISPR/Cas9 activity. Methods 2024:S1046-2023(24)00102-6. [PMID: 38641083 DOI: 10.1016/j.ymeth.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/27/2024] [Accepted: 04/10/2024] [Indexed: 04/21/2024] Open
Abstract
The CRISPR/Cas9 genome editing technology has transformed basic and translational research in biology and medicine. However, progress is hindered by off-target effects and limited knowledge of the mechanism of the Cas9 protein. Machine learning models have been proposed for the prediction of Cas9 activity at unintended sites, yet feature engineering plays a major role in the outcome of the predictors. This study evaluates the improvement in the performance of such predictors when epigenetic and DNA shape feature groups are added to the conventionally used sequence-based Cas9 target and off-target datasets. The approach used neural networks trained on a diverse range of parameters, allowing a systematic assessment of the performance increase across three designed datasets: (i) sequence only, (ii) sequence and epigenetic features, and (iii) sequence, epigenetic and DNA shape features. The addition of DNA shape information significantly improved predictive performance, as evaluated by the Akaike and Bayesian information criteria. The evaluation of individual feature importance by permutation and LIME-based methods also indicates that model outcomes are influenced not only by sequence features such as mismatches and nucleotide composition but also by base-pairing parameters such as opening and stretch, which are indicative of distortion in the DNA-RNA hybrid in the presence of mismatches.
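For reference, the two information criteria named in the abstract have the standard definitions below. The abstract does not state how the likelihood was obtained for the neural networks, so the Gaussian-error form in the last line is an assumption added here for illustration; the symbols $k$, $n$, $\hat{L}$ and $\mathrm{RSS}$ are notation introduced for this note only.

```latex
\begin{align}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L} \\
-2\ln\hat{L} &= n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + \text{const}
  \quad \text{(Gaussian errors, assumed)}
\end{align}
```

Here $k$ is the number of fitted parameters, $n$ the number of observations, $\hat{L}$ the maximized likelihood and $\mathrm{RSS}$ the residual sum of squares. Lower values are better. Because adding a feature group raises $k$, a lower AIC or BIC for the richer dataset means the gain in fit outweighed the complexity penalty.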
Affiliation(s)
- Dhvani Sandip Vora
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India.
- Sakshi Manoj Bhandari
  - Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India.
- Durai Sundar
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India; School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India.
2
Wang R, Zhang KH, Wang Y, Wu CC, Bao LJ, Zeng EY. Use of machine learning to identify key factors regulating volatilization of semi-volatile organic chemicals from soil to air. Sci Total Environ 2024; 920:170769. [PMID: 38342447 DOI: 10.1016/j.scitotenv.2024.170769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/30/2024] [Accepted: 02/04/2024] [Indexed: 02/13/2024]
Abstract
Volatilization from soil to air is a key process driving the distribution and fate of semi-volatile organic contaminants. However, quantifying this process and the key environmental factors that govern it remains difficult. To address this issue, the volatilization fluxes of polybrominated diphenyl ethers (PBDEs) and organophosphate esters (OPEs) from soil were determined in 16 batch experiments arranged in an orthogonal design over six variables (chemical property, soil concentration, air velocity, ambient temperature, soil porosity, and soil moisture) and analyzed with machine learning methods. The results showed that gradient-boosting regression tree models satisfactorily predicted the volatilization fluxes of PBDEs (r² = 0.82 ± 0.07) and OPEs (r² = 0.62 ± 0.13). Permutation importance analysis showed that the partitioning potential of chemicals between soil and air was the most important factor regulating the volatilization of the target compounds from soil. Temperature and soil porosity played a secondary role in controlling the migration of PBDEs and OPEs, respectively, owing to the higher volatilization enthalpies of PBDEs than of OPEs and the dominant adsorption of OPEs on mineral surfaces. The effect of soil moisture was negative for the volatilization fluxes of PBDEs and positive for those of OPEs. These results suggest that the soil-air diffusive transport of PBDEs and OPEs responds differently to the high temperatures and rainstorms induced by climate change.
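Permutation importance, the ranking method named in this abstract, can be sketched in a few lines. The version below is a minimal, dependency-free illustration and not the authors' pipeline: `toy_model`, the synthetic data and the choice of r² as the score are all assumptions made for this note. The idea is to shuffle one input column at a time and record how much the model's score drops.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in r2 after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical stand-in for a fitted regressor; it ignores row[2].
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
print(scores)
```

In this toy setup the first feature dominates, the second matters a little, and the third scores exactly zero because the model never reads it. With a real fitted model the scores should be computed on held-out data, and strongly correlated inputs can share or mask importance.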
Affiliation(s)
- Rong Wang
  - Guangdong Key Laboratory of Environmental Pollution and Health, Jinan University, Guangzhou 511443, China
- Kai-Hui Zhang
  - Guangdong Key Laboratory of Environmental Pollution and Health, Jinan University, Guangzhou 511443, China
- Yu Wang
  - Guangdong Key Laboratory of Environmental Pollution and Health, Jinan University, Guangzhou 511443, China
- Chen-Chou Wu
  - Guangdong Key Laboratory of Environmental Pollution and Health, Jinan University, Guangzhou 511443, China
- Lian-Jun Bao
  - Guangdong Key Laboratory of Environmental Pollution and Health, Jinan University, Guangzhou 511443, China.
- Eddy Y Zeng
  - Guangdong Key Laboratory of Environmental Pollution and Health, Jinan University, Guangzhou 511443, China
3
Xiong M, Li X, Zhang C, Shen S. Effects of weather and air pollution on outpatient visits for insect-and-mite-caused dermatitis: an empirical and predictive analysis. BMC Public Health 2024; 24:633. [PMID: 38419007 DOI: 10.1186/s12889-024-18067-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 02/11/2024] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND: Dermatitis caused by insects and mites, diagnosed as papular urticaria or scabies, is a common skin disease. However, there is still a lack of studies about the effects of weather and air pollution on outpatient visits for this disease. This study aims to explore the impacts of meteorological and environmental factors on daily visits of dermatitis outpatients.
METHODS: Analyses are conducted on a total of 43,101 outpatient visiting records during the years 2015-2020 from the largest dermatology specialist hospital in Guangzhou, China. Hierarchical cluster models based on Pearson correlation between risk factors are utilized to select regression variables. Linear regression models are fitted to identify the statistically significant associations between the risk factors and daily visits, taking into account the short-term effects of temperatures. Permutation importance is adopted to evaluate the predictive ability of these factors.
RESULTS: Short-term temperatures have positive associations with daily visits and exhibit strong predictive abilities. In terms of total outpatients, the one-day lagged temperature not only has a significant impact on daily visits, but also has the highest median value of permutation importance. This conclusion is robust across most subgroups except for subgroups of summer and scabies, wherein the three-day lagged temperature has a negative effect. By contrast, air pollution has insignificant associations with daily visits and exhibits weak predictive abilities. Moreover, weekdays, holidays and trends have significant impacts on daily visits, but with weak predictive abilities.
CONCLUSIONS: Our study suggests that short-term temperatures have positive associations with daily visits and exhibit strong predictive abilities. Nevertheless, air pollution has insignificant associations with daily visits and exhibits weak predictive abilities. The results of this study provide a reference for local authorities to formulate intervention measures and establish an environment-based disease early warning system.
Affiliation(s)
- Minghua Xiong
  - Business School, Foshan University, Foshan, 528000, China
  - Research Centre for Innovation & Economic Transformation, Research Institute of Social Sciences in Guangdong Province, Guangzhou, 510000, China
- Xiaoping Li
  - Business School, Sichuan University, Chengdu, 610065, China
- Chao Zhang
  - School of Business, Southern University of Science and Technology, Shenzhen, 518055, China
- Shuqun Shen
  - Dermatology Hospital, Southern Medical University, Guangzhou, 510515, China.
4
Xu C, Li H, Yang J, Peng Y, Cai H, Zhou J, Gu W, Chen L. Interpretable prediction of 3-year all-cause mortality in patients with chronic heart failure based on machine learning. BMC Med Inform Decis Mak 2023; 23:267. [PMID: 37985996 PMCID: PMC10662001 DOI: 10.1186/s12911-023-02371-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 11/08/2023] [Indexed: 11/22/2023] Open
Abstract
BACKGROUND: The goal of this study was to assess the effectiveness of machine learning models and create an interpretable machine learning model that adequately explained 3-year all-cause mortality in patients with chronic heart failure.
METHODS: The data in this paper were selected from patients with chronic heart failure and cardiac function class III-IV who were hospitalized at the First Affiliated Hospital of Kunming Medical University from 2017 to 2019. The dataset was explored using six different machine learning models: logistic regression, naive Bayes, random forest classifier, extreme gradient boost, K-nearest neighbor, and decision tree. Finally, interpretable methods based on machine learning, such as SHAP values, permutation importance, and partial dependence plots, were used to estimate the 3-year all-cause mortality risk and produce individual interpretations of the model's conclusions.
RESULTS: In this paper, random forest was identified as the optimal algorithm for this dataset. We also incorporated relevant interpretable machine learning tools and techniques to improve disease prognosis, including permutation importance, PDP plots and SHAP values for analysis. From this study, we can see that the number of hospitalizations, age, glomerular filtration rate, BNP, NYHA cardiac function classification, lymphocyte absolute value, serum albumin, hemoglobin, total cholesterol, pulmonary artery systolic pressure and so on were important for providing an optimal risk assessment and were important predictive factors of chronic heart failure.
CONCLUSION: The machine learning-based cardiovascular risk models could be used to accurately assess and stratify the 3-year risk of all-cause mortality among CHF patients. Machine learning in combination with permutation importance, PDP plots, and SHAP values could offer a clear explanation of individual risk prediction and give doctors an intuitive knowledge of the functions of important model components.
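The partial dependence plots (PDP) mentioned in this abstract rest on a simple computation, sketched here under stated assumptions: `toy_risk` and the three-row dataset are hypothetical stand-ins, not the study's random forest or its patient records. For each grid value, the chosen feature is overwritten in every row and the predictions are averaged, which traces how the average prediction changes with that feature.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = [
            predict(row[:feature] + [value] + row[feature + 1:])
            for row in X
        ]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_risk(row):
    # Hypothetical score: quadratic in feature 0, linear in feature 1.
    return row[0] ** 2 + row[1]


X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
curve = partial_dependence(toy_risk, X, feature=0, grid=[0.0, 1.0, 2.0])
print(curve)  # [3.0, 4.0, 7.0]
```

The curve is an average over the observed values of the other features, so it can mislead when the plotted feature is strongly correlated with them.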
Affiliation(s)
- Chenggong Xu
  - The First Affiliated Hospital of Kunming Medical University, Kunming, China
- Hongxia Li
  - The First Affiliated Hospital of Kunming Medical University, Kunming, China
- Jianping Yang
  - College of Big Data, Yunnan Agricultural University, Kunming, China
- Yunzhu Peng
  - The First Affiliated Hospital of Kunming Medical University, Kunming, China
- Hongyan Cai
  - The First Affiliated Hospital of Kunming Medical University, Kunming, China
- Jing Zhou
  - The First Affiliated Hospital of Kunming Medical University, Kunming, China
- Wenyi Gu
  - The First Affiliated Hospital of Kunming Medical University, Kunming, China
- Lixing Chen
  - The First Affiliated Hospital of Kunming Medical University, Kunming, China.
5
Li T, Zhang Q, Peng Y, Guan X, Li L, Mu J, Wang X, Yin X, Wang Q. Contributions of various driving factors to air pollution events: Interpretability analysis from Machine learning perspective. Environ Int 2023; 173:107861. [PMID: 36898175 DOI: 10.1016/j.envint.2023.107861] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/31/2022] [Revised: 02/09/2023] [Accepted: 03/01/2023] [Indexed: 06/18/2023]
Abstract
Air quality in China has improved substantially; however, fine particulate matter (PM2.5) still remains at a high level in many areas. PM2.5 pollution is a complex process that is attributed to gaseous precursors, chemical factors, and meteorological factors. Quantifying the contribution of each variable to air pollution can facilitate the formulation of effective policies to precisely eliminate air pollution. In this study, we first used a decision plot to map out the decision process of the Random Forest (RF) model for a single hourly data set and constructed a framework for analyzing the causes of air pollution using multiple interpretable methods. Permutation importance was used to qualitatively analyze the effect of each variable on PM2.5 concentrations. The sensitivity of secondary inorganic aerosols (SIA), namely SO₄²⁻, NO₃⁻ and NH₄⁺, to PM2.5 was verified by partial dependence plot (PDP). Shapley Additive Explanation (Shapley) was used to quantify the contribution of the drivers behind ten air pollution events. The RF model can accurately predict PM2.5 concentrations, with a determination coefficient (R²) of 0.94 and a root mean square error (RMSE) and mean absolute error (MAE) of 9.4 μg/m³ and 5.7 μg/m³, respectively. This study revealed that the order of sensitivity of SIA to PM2.5 was NH₄⁺ > NO₃⁻ > SO₄²⁻. Fossil fuel and biomass combustion may be contributing factors to air pollution events in Zibo in the autumn-winter of 2021. NH₄⁺ contributed 19.9-65.4 μg/m³ among the ten air pollution events (APs). K, NO₃⁻, EC and OC were the other main drivers, contributing 8.7 ± 2.7 μg/m³, 6.8 ± 7.5 μg/m³, 3.6 ± 5.8 μg/m³ and 2.5 ± 2.0 μg/m³, respectively. Lower temperature and higher humidity were vital factors that promoted the formation of NO₃⁻. Our study may provide a methodological framework for precise air pollution management.
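Per-driver contributions of the kind reported here are Shapley values: each feature's share of the gap between a prediction and a baseline. The sketch below computes them exactly by enumerating feature subsets for a toy function. It is an illustration only, since the study applied a SHAP-style explainer to its Random Forest, and `toy_model`, `instance` and `baseline` are hypothetical names and numbers introduced here. Absent features are replaced by the baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset of indices)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [p for p in range(n_features) if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    # Hypothetical predictor with one additive term and one interaction.
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]  # the observation being explained
baseline = [0.0, 0.0, 0.0]  # reference values for "absent" features


def value(subset):
    mixed = [instance[i] if i in subset else baseline[i] for i in range(3)]
    return toy_model(mixed)


phi = shapley_values(value, 3)
print(phi)  # approximately [2.0, 3.0, 3.0]
```

The contributions sum to the difference between the prediction for the instance (8.0) and for the baseline (0.0), and the interaction term is split evenly between the two features involved. Exact enumeration costs on the order of 2^n model calls, which is why practical tools rely on model-specific or sampling-based approximations.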
Affiliation(s)
- Tianshuai Li
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Qingzhu Zhang
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Yanbo Peng
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China; Shandong Academy for Environmental Planning, Jinan 250101, PR China.
- Xu Guan
  - Shandong Academy for Environmental Planning, Jinan 250101, PR China
- Lei Li
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Jiangshan Mu
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Xinfeng Wang
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Xianwei Yin
  - Zibo Ecological Environment Monitoring Center of Shandong Province, Zibo 255040, PR China
- Qiao Wang
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
6
Forssten MP, Ioannidis I, Mohammad Ismail A, Bass GA, Borg T, Cao Y, Mohseni S. Dementia is a surrogate for frailty in hip fracture mortality prediction. Eur J Trauma Emerg Surg 2022. [PMID: 35355091 DOI: 10.1007/s00068-022-01960-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2021] [Accepted: 03/13/2022] [Indexed: 12/12/2022]
Abstract
Purpose: Among hip fracture patients both dementia and frailty are particularly prevalent. The aim of the current study was to determine if dementia functions as a surrogate for frailty, or if it confers additional information as a comorbidity when predicting postoperative mortality after a hip fracture.
Methods: All adult patients who suffered a traumatic hip fracture in Sweden between January 1, 2008 and December 31, 2017 were considered for inclusion. Pathological fractures, non-operatively treated fractures, reoperations, and patients missing data were excluded. Logistic regression (LR) models were fitted, one including and one excluding measurements of frailty, with postoperative mortality as the response variable. The primary outcome of interest was 30-day postoperative mortality. The relative importance for all variables was determined using the permutation importance. New LR models were constructed using the top ten most important variables. The area under the receiver-operating characteristic curve (AUC) was used to compare the predictive ability of these models.
Results: 121,305 patients were included in the study. Initially, dementia was among the top ten most important variables for predicting 30-day mortality. When measurements of frailty were included, dementia was replaced in relative importance by the ability to walk alone outdoors and institutionalization. There was no significant difference in the predictive ability of the models fitted using the top ten most important variables when comparing those that included [AUC for 30-day mortality (95% CI): 0.82 (0.81–0.82)] and excluded [AUC for 30-day mortality (95% CI): 0.81 (0.80–0.81)] measurements of frailty.
Conclusion: Dementia functions as a surrogate for frailty when predicting mortality up to one year after hip fracture surgery. The presence of dementia in a patient without frailty does not appreciably contribute to the prediction of postoperative mortality.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00068-022-01960-9.
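The AUC used to compare the two logistic regression models has a direct interpretation: it is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. A minimal sketch of that pairwise definition follows; the four labels and scores are made-up illustration values, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]           # 1 = event (hypothetical)
scores = [0.1, 0.4, 0.35, 0.8]  # predicted risks (hypothetical)
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

The double loop scales with the number of positive-negative pairs, so for a cohort of this size a rank-based formulation or an established library routine would be the practical choice.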