1. Xiao F, Ding X, Shi Y, Wang D, Wang Y, Cui C, Zhu T, Chen K, Xiang P, Luo X. Application of ensemble learning for predicting GABA A receptor agonists. Comput Biol Med 2024; 169:107958. [PMID: 38194778 DOI: 10.1016/j.compbiomed.2024.107958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 12/29/2023] [Accepted: 01/01/2024] [Indexed: 01/11/2024]
Abstract
BACKGROUND Over the past few decades, agonists binding to the benzodiazepine site of the GABAA receptor have been successfully developed as clinical drugs. Different modulators (agonist, antagonist, and reverse agonist) bound to the benzodiazepine site exhibit different or even opposite pharmacological effects; however, their structures are so similar that it is difficult to distinguish them based solely on the molecular skeleton. This study aims to develop classification models for predicting agonists. METHODS A total of 306 agonists and non-agonists were collected from the literature. Six machine learning algorithms (RF, XGBoost, AdaBoost, GBoost, SVM, and ANN) were employed for model development. Six descriptor types (1D/2D descriptors, ECFP4, 2D-Pharmacophore, MACCS, PubChem, and Estate fingerprints) were used to characterize chemical structures. Model interpretability was explored with the SHAP method. RESULTS The best model demonstrated an AUC value of 0.905 and an MCC value of 0.808 for the test set. The PubMac-based model (PubMac-GB) achieved the best AUC value of 0.935 for the test set. The SHAP analysis emphasized that MaccsFP62, ECFP_624, ECFP_724, and PubchemFP213 were the crucial molecular features. Applicability domain analysis was also performed to determine reliable prediction boundaries for the model. The PubMac-GB model was applied to virtual screening for potential GABAA agonists, and the top 100 compounds were reported. CONCLUSION Overall, our ensemble learning-based model (PubMac-GB) achieved comparable performance and would be helpful in effectively identifying agonists of GABAA receptors.
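The abstract reports test-set performance as AUC and MCC. As a reminder of what those two numbers measure, here is a minimal sketch of their standard definitions in plain Python. It is not the authors' code; the function names `mcc` and `roc_auc` and the toy labels and scores are illustrative assumptions.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 1 = agonist, 0 = non-agonist.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(roc_auc(labels, scores))
print(mcc(tp=5, tn=5, fp=0, fn=0))
```

MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unbalanced. AUC depends only on the ranking of the scores, not on a decision threshold.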
Affiliation(s)
- Fu Xiao
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xiaoyu Ding
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yan Shi
  - Academy of Forensic Science, Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, 200063, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yitian Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Chen Cui
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Tingfei Zhu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaixian Chen
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ping Xiang
  - Academy of Forensic Science, Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, 200063, China
- Xiaomin Luo
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
2. Hao H, Li P, Jiao W, Ge D, Hu C, Li J, Lv Y, Chen W. Ensemble learning-based applied research on heavy metals prediction in a soil-rice system. The Science of the Total Environment 2023; 898:165456. [PMID: 37451444 DOI: 10.1016/j.scitotenv.2023.165456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 07/06/2023] [Accepted: 07/08/2023] [Indexed: 07/18/2023]
Abstract
Accurate prediction of heavy metal accumulation in soil ecosystems is crucial for maintaining healthy soil environments and ensuring high-quality agricultural products, and it is also a challenging scientific task. In this study, we constructed a dataset containing 490 sets of multidimensional environmental covariate data and proposed prediction models for heavy metal concentrations (HMC) in a soil-rice system, EL-HMC (including RF-HMC and GBM-HMC), based on Random Forest (RF) and Gradient Boosting Machine (GBM) ensemble learning (EL) techniques. To evaluate the effectiveness of each model, multiple linear and Bayesian regressions were selected as benchmark models (BM), and mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²) were selected as evaluation indicators. In addition, sensitivity and spatial autocorrelation (SAC) analyses were used to examine the robustness of the models. The results showed that the R² values of RF-HMC and GBM-HMC for modeling available cadmium (Cd) concentrations in soil were 0.654 and 0.690, respectively, with an average increase of 48.0% compared to the BMs. The R² values of RF-HMC and GBM-HMC for predicting Cd, lead (Pb), chromium (Cr), and mercury (Hg) concentrations in rice ranged from 0.618 to 0.824 and 0.645 to 0.850, respectively, with an average increase of 58.2% compared with the BMs. The corresponding MAEs and RMSEs of RF-HMC and GBM-HMC were low. Sensitivity analysis of the input features and the SAC of the prediction bias showed that the EL-HMC models have excellent robustness. Therefore, the EL technology-based prediction models for HMCs proposed herein are practical and feasible, demonstrating better accuracy and stability than the traditional models.
This study verifies the application potential of EL technology in pollution ecology and provides a new perspective and solution for sustainable management and precise prevention of heavy metal pollution in farmland soil at the regional scale.
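The three evaluation indicators named in this abstract (MAE, RMSE, and R²) can be written out in a few lines. The following is a minimal sketch of the standard definitions in plain Python, not the authors' implementation; the function name `regression_metrics` and the toy values are illustrative assumptions.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                    # undefined if y_true is constant
    return mae, rmse, r2

# Toy data (hypothetical concentrations, arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
mae, rmse, r2 = regression_metrics(observed, predicted)
print(mae, rmse, r2)
```

MAE and RMSE are in the units of the measured concentration, with RMSE penalizing large errors more heavily. R² is unitless and compares the model against always predicting the observed mean.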
Affiliation(s)
- Huijuan Hao
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Panpan Li
  - Information Centre, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, PR China
- Wentao Jiao
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Dabing Ge
  - College of Resources and Environment, Hunan Agricultural University, Changsha 410128, PR China
- Chengwei Hu
  - Information Centre, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, PR China
- Jing Li
  - Department of Oncology, Huludao Central Hospital, Huludao 125001, PR China
- Yuntao Lv
  - Risk assessment Laboratory for Environmental Factors of Agro-product Quality Safety, Ministry of Agriculture and villages, Changsha 410005, PR China
- Wanming Chen
  - Risk assessment Laboratory for Environmental Factors of Agro-product Quality Safety, Ministry of Agriculture and villages, Changsha 410005, PR China
3. Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/03/2023]
Abstract
Decades of nanotoxicology research have generated extensive and diverse data sets. However, data is not equal to information. The question is how to extract critical information buried in vast data streams. Here we show that artificial intelligence (AI) and molecular simulation play key roles in transforming nanotoxicity data into critical information, i.e., constructing the quantitative nanostructure (physicochemical properties)-toxicity relationships, and elucidating the toxicity-related molecular mechanisms. For AI and molecular simulation to realize their full impacts in this mission, several obstacles must be overcome. These include the paucity of high-quality nanomaterials (NMs) and standardized nanotoxicity data, the lack of model-friendly databases, the scarcity of specific and universal nanodescriptors, and the inability to simulate NMs at realistic spatial and temporal scales. This review provides a comprehensive and representative, but not exhaustive, summary of the current capability gaps and tools required to fill these formidable gaps. Specifically, we discuss the applications of AI and molecular simulation, which can address the large-scale data challenge for nanotoxicology research. The need for model-friendly nanotoxicity databases, powerful nanodescriptors, new modeling approaches, molecular mechanism analysis, and design of the next-generation NMs are also critically discussed. Finally, we provide a perspective on future trends and challenges.
Affiliation(s)
- Xiliang Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Tongtao Yue
  - Key Laboratory of Marine Environment and Ecology, Ministry of Education, Institute of Coastal Environmental Pollution Control, Ocean University of China, Qingdao 266100, China
- David A Winkler
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
  - School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K.
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Yongguang Yin
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Hao Zhu
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Bing Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China