1
Xiao Z, Zhu M, Chen J, You Z. Integrated Transfer Learning and Multitask Learning Strategies to Construct Graph Neural Network Models for Predicting Bioaccumulation Parameters of Chemicals. Environmental Science & Technology 2024; 58:15650-15660. [PMID: 39051472 DOI: 10.1021/acs.est.4c02421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Accurate prediction of parameters related to the environmental exposure of chemicals is crucial for the sound management of chemicals. However, the lack of large data sets for training models may result in poor prediction accuracy and robustness. Herein, integrated transfer learning (TL) and multitask learning (MTL) was proposed for constructing a graph neural network (GNN) model (abbreviated as TL-MTL-GNN model) using n-octanol/water partition coefficients as a source domain. The TL-MTL-GNN model was trained to predict three bioaccumulation parameters based on enlarged data sets that cover 2496 compounds with at least one bioaccumulation parameter. Results show that the TL-MTL-GNN model outperformed single-task GNN models with and without the TL, as well as conventional machine learning models trained with molecular descriptors or fingerprints. Applicability domains were characterized by a state-of-the-art structure-activity landscape-based (abbreviated as ADSAL) methodology. The TL-MTL-GNN model coupled with the optimal ADSAL was employed to predict bioaccumulation parameters for around 60,000 chemicals, with more than 13,000 compounds identified as bioaccumulative chemicals. The high predictive accuracy and robustness of the TL-MTL-GNN model demonstrate the feasibility of integrating the TL and MTL strategy in modeling small-sized data sets. The strategy holds significant potential for addressing small data challenges in modeling environmental chemicals.
Affiliation(s)
- Zijun Xiao
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Minghua Zhu
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
  - Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, College of Environment, Hohai University, Nanjing 210098, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zecang You
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
2
Yang H, Yang ZJ, Huang TX, Pan L, Wei XM, Hu YF, Yuan YQ, Wang LL, Ding JJ. Accurate Density Prediction of Sesquiterpenoid HEDFs and the Multiproperty Computing Server SesquiterPre. ACS Omega 2024; 9:26213-26221. [PMID: 38911735 PMCID: PMC11191094 DOI: 10.1021/acsomega.4c01898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 05/24/2024] [Accepted: 05/24/2024] [Indexed: 06/25/2024]
Abstract
Accurate and rapid evaluation of density is crucial for evaluating the packing and combustion characteristics of high-energy-density fuels (HEDFs). This parameter is pivotal in the selection of high-performance HEDFs. Our study leveraged a polycyclic compound density data set and quantum chemical (QC) descriptors to establish a correlation with the target properties using the XGBoost algorithm. We utilized a recursive feature elimination method to simplify the model and developed a concise and interpretable density prediction model incorporating only six QC descriptors. The model demonstrated robust performance, achieving coefficients of determination (R²) of 0.967 and 0.971 for internal and external test sets, respectively, and root-mean-square errors (RMSE) of 0.031 and 0.027 g/cm³, respectively. Compared to the other two mainstream methods, the marginal discrepancy between the predicted and actual molecular densities underscores the model's superior predictive ability and greater usefulness for energy density calculation. Furthermore, we developed a web server (SesquiterPre, https://sespre.cmdrg.com/#/) that can simultaneously calculate the density, enthalpy of combustion, and energy density of sesquiterpenoid HEDFs, which greatly facilitates its use by researchers and is of great significance for accelerating the design and screening of novel sesquiterpenoid HEDFs.
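For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how a coefficient of determination and an RMSE are conventionally computed from observed and predicted values. The function names `rmse` and `r_squared` are illustrative choices, not taken from the paper, and the authors' own evaluation code may differ.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

Both functions take plain Python sequences of numbers, so they can be applied to any held-out test set without extra dependencies.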
Affiliation(s)
- Hang Yang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
  - School of Physics and Electronics Engineering, Sichuan University of Science & Engineering, Zigong 643000, China
- Zhi-Jiang Yang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
- Teng-Xin Huang
  - School of Physics and Electronics Engineering, Sichuan University of Science & Engineering, Zigong 643000, China
- Li Pan
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
- Xin-Miao Wei
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
- Yan-Fei Hu
  - Department of Applied Physics, Chengdu University of Technology, Chengdu 610059, China
- Yu-Quan Yuan
  - School of Physics and Electronics Engineering, Sichuan University of Science & Engineering, Zigong 643000, China
- Liang-Liang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
- Jun-Jie Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
3
Leniak A, Pietruś W, Kurczab R. From NMR to AI: Designing a Novel Chemical Representation to Enhance Machine Learning Predictions of Physicochemical Properties. J Chem Inf Model 2024; 64:3302-3321. [PMID: 38529877 DOI: 10.1021/acs.jcim.3c02039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
A novel approach to the utilization of nuclear magnetic resonance (NMR) spectroscopy data in the prediction of logD through machine learning algorithms is shown. In the analysis, a data set of 754 chemical compounds, organized into 30 clusters, was evaluated using advanced machine learning models, such as Support Vector Regression (SVR), Gradient Boosting, and AdaBoost, and comprehensive validation and testing methods were employed, including 10-fold cross-validation, bootstrapping, and leave-one-out. The study revealed the superior performance of the Bucket Integration method for dimensionality reduction, consistently yielding the lowest root mean square error (RMSE) across all data sets and normalization schemes. The SVR prediction models demonstrated remarkable computational efficiency and low cost, with the best RMSE value reaching 0.66. Our best model outperformed existing tools like JChem Suite's logD Predictor (0.91) and CplogD (1.27), and a comparison with traditional molecular representations yielded a comparable RMSE (0.50), emphasizing the robustness of our NMR data integration. The widespread availability of NMR data in pharmaceutical and industrial research presents an untapped resource for predictive modeling, highlighting the need for accessible methodologies like ours that complement the analytical toolbox beyond conventional 2D approaches. Our approach, designed to leverage the rich spatial data from NMR spectroscopy, provides additional insights and enriches drug discovery and computational chemistry with a freely accessible tool.
Affiliation(s)
- Arkadiusz Leniak
  - Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland
- Wojciech Pietruś
  - Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Kraków, Poland
- Rafał Kurczab
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Kraków, Poland
4
Long TZ, Jiang DJ, Shi SH, Deng YC, Wang WX, Cao DS. Enhancing Multi-species Liver Microsomal Stability Prediction through Artificial Intelligence. J Chem Inf Model 2024; 64:3222-3236. [PMID: 38498003 DOI: 10.1021/acs.jcim.4c00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, through the construction of consensus models based on these individual models, we have demonstrated their superior predictive performance in comparison with the existing models of the same type. To explore the similarities and differences in the properties of liver microsomal stability among multispecies molecules, we conducted preliminary interpretative explorations using the Shapley additive explanations (SHAP) and atom heatmap approaches for the models and misclassified molecules. Additionally, we further investigated representative structural modifications and substructures that decrease the liver microsomal stability in different species using the matched molecule pair analysis (MMPA) method and substructure extraction techniques. The established prediction models, along with insightful interpretation information regarding liver microsomal stability, will significantly contribute to enhancing the efficiency of exploring practical drugs for development.
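The Matthews correlation coefficient (MCC) reported above is a standard summary of a binary confusion matrix. As a minimal sketch, assuming the usual definition from true/false positive and negative counts, it can be computed as follows. The function name `matthews_corrcoef_from_counts` is illustrative, and returning 0.0 when the denominator vanishes is a common convention adopted here as an assumption, not something stated in the abstract.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC for a binary classifier from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: a degenerate confusion matrix
        # (an empty row or column) is scored as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, this score stays near zero for a classifier that always predicts the majority class, which is one reason it is often preferred for imbalanced data sets.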
Affiliation(s)
- Teng-Zhi Long
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- De-Jun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Shao-Hua Shi
  - Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
- You-Chao Deng
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Wen-Xuan Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
5
Yi JC, Yang ZY, Zhao WT, Yang ZJ, Zhang XC, Wu CK, Lu AP, Cao DS. ChemMORT: an automatic ADMET optimization platform using deep learning and multi-objective particle swarm optimization. Brief Bioinform 2024; 25:bbae008. [PMID: 38385872 PMCID: PMC10883642 DOI: 10.1093/bib/bbae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/17/2023] [Accepted: 01/02/2024] [Indexed: 02/23/2024] [Open Access]
Abstract
Drug discovery and development constitute a laborious and costly undertaking. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. As a multiple-parameter objective, the optimization of the ADMET properties is extremely challenging owing to the vast chemical space and limited human expert knowledge. In this study, a freely available platform called Chemical Molecular Optimization, Representation and Translation (ChemMORT) is developed for the optimization of multiple ADMET endpoints without the loss of potency (https://cadd.nscc-tj.cn/deploy/chemmort/). ChemMORT contains three modules: Simplified Molecular Input Line Entry System (SMILES) Encoder, Descriptor Decoder and Molecular Optimizer. The SMILES Encoder can generate the molecular representation with a 512-dimensional vector, and the Descriptor Decoder is able to translate the above representation to the corresponding molecular structure with high accuracy. Based on reversible molecular representation and particle swarm optimization strategy, the Molecular Optimizer can be used to effectively optimize undesirable ADMET properties without the loss of bioactivity, which essentially accomplishes the design of inverse QSAR. The constrained multi-objective optimization of the poly(ADP-ribose) polymerase-1 inhibitor is provided as the case to explore the utility of ChemMORT.
Affiliation(s)
- Jia-Cai Yi
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, P. R. China
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Wen-Tao Zhao
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, P. R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Xiao-Chen Zhang
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, P. R. China
- Cheng-Kun Wu
  - State Key Laboratory of High-Performance Computing, Changsha 410073, Hunan, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
6
Tran TTV, Tayara H, Chong KT. Recent Studies of Artificial Intelligence on In Silico Drug Absorption. J Chem Inf Model 2023; 63:6198-6211. [PMID: 37819031 DOI: 10.1021/acs.jcim.3c00960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
Absorption is an important area of research in pharmacochemistry and drug development, because the drug has to be absorbed before any drug effects can occur. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors affecting absorption. Many drugs in development fail because of poor absorption. The research and continuous efforts of researchers in recent years have brought many successes and promises in drug absorption property prediction, especially in silico, which helps to reduce the time and cost significantly for screening undesirable drug candidates. In this report, we explicitly provide an overview of recent in silico studies on predicting absorption properties, especially from 2019 to the present, using artificial intelligence. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also proposed the challenges and development directions of absorption prediction in the future. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
  - Vietnam National University, Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
  - Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
7
Patterson-Gardner C, Pavelich GM, Cannon AT, Menke AJ, Simanek EE. Adaptation of Empirical Methods to Predict the LogD of Triazine Macrocycles. ACS Med Chem Lett 2023; 14:1378-1382. [PMID: 37849549 PMCID: PMC10577694 DOI: 10.1021/acsmedchemlett.3c00290] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/03/2023] [Accepted: 09/01/2023] [Indexed: 10/19/2023] [Open Access]
Abstract
Octanol/water partition coefficients guide drug design, but algorithms do not always accurately predict these values. For cationic triazine macrocycles that adopt a conserved folded shape in solution, common algorithms fall short. Here, the logD values for 12 macrocycles differing in amino acid choice were predicted and then measured experimentally. On average, AlogP, XlogP, and ChemAxon predictions deviate by 0.9, 2.8, and 3.9 log units, with XlogP overestimating lipophilicity and AlogP and ChemAxon underestimating lipophilicity. Importantly, however, a linear relationship (R² > 0.98) exists between the values predicted by AlogP and the experimentally determined logD values, thus enabling more accurate predictions.
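The linear relationship described above suggests a simple calibration step: fit a straight line from predicted values to measured values, then apply it to new predictions. The sketch below uses ordinary least squares as an assumed fitting method. The function names and the numbers in the usage lines are hypothetical placeholders, not data or coefficients from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(predicted, slope, intercept):
    """Map a raw prediction onto the experimental scale."""
    return slope * predicted + intercept


# Hypothetical placeholder values, NOT measurements from the paper.
alogp_predictions = [1.0, 2.0, 3.0, 4.0]
measured_logd = [1.8, 2.9, 4.1, 5.0]
slope, intercept = fit_line(alogp_predictions, measured_logd)
corrected = calibrate(2.5, slope, intercept)
```

A correction of this kind is only as reliable as the compounds it was fitted on, so it should be applied to molecules from the same structural family.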
Affiliation(s)
- Casey J. Patterson-Gardner
  - Department of Chemistry & Biochemistry, Texas Christian University, Fort Worth, Texas 76129, United States
- Gretchen M. Pavelich
  - Department of Chemistry & Biochemistry, Texas Christian University, Fort Worth, Texas 76129, United States
- April T. Cannon
  - Department of Chemistry & Biochemistry, Texas Christian University, Fort Worth, Texas 76129, United States
- Alexander J. Menke
  - Department of Chemistry & Biochemistry, Texas Christian University, Fort Worth, Texas 76129, United States
- Eric E. Simanek
  - Department of Chemistry & Biochemistry, Texas Christian University, Fort Worth, Texas 76129, United States
8
Wang Y, Xiong J, Xiao F, Zhang W, Cheng K, Rao J, Niu B, Tong X, Qu N, Zhang R, Wang D, Chen K, Li X, Zheng M. LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP. J Cheminform 2023; 15:76. [PMID: 37670374 PMCID: PMC10478446 DOI: 10.1186/s13321-023-00754-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 08/25/2023] [Indexed: 09/07/2023] [Open Access]
Abstract
Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and design. However, the limited availability of data for logD modeling poses a significant challenge to achieving satisfactory generalization capability. To address this challenge, we have developed a novel logD7.4 prediction model called RTlogD, which leverages knowledge from multiple sources. RTlogD combines pre-training on a chromatographic retention time (RT) dataset since the RT is influenced by lipophilicity. Additionally, microscopic pKa values are incorporated as atomic features, providing valuable insights into ionizable sites and ionization capacity. Furthermore, logP is integrated as an auxiliary task within a multitask learning framework. We conducted ablation studies and presented a detailed analysis, showcasing the effectiveness and interpretability of RT, pKa, and logP in the RTlogD model. Notably, our RTlogD model demonstrated superior performance compared to commonly used algorithms and prediction tools. These results underscore the potential of the RTlogD model to improve the accuracy and generalization of logD prediction in drug discovery and design. In summary, the RTlogD model addresses the challenge of limited data availability in logD modeling by leveraging knowledge from RT, microscopic pKa, and logP. Incorporating these factors enhances the predictive capabilities of our model, and it holds promise for real-world applications in drug discovery and design scenarios.
Affiliation(s)
- Yitian Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jiacheng Xiong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Fu Xiao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Wei Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaiyang Cheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Jingxin Rao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Buying Niu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiaochu Tong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ning Qu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Runze Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
9
Biswas S, Chung Y, Ramirez J, Wu H, Green WH. Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning. J Chem Inf Model 2023; 63:4574-4588. [PMID: 37487557 DOI: 10.1021/acs.jcim.3c00546] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 07/26/2023]
Abstract
Knowledge of critical properties, such as critical temperature, pressure, density, as well as acentric factor, is essential to calculate thermo-physical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits where it shows state-of-the-art accuracies. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.
Affiliation(s)
- Sayandeep Biswas
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Josephine Ramirez
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Haoyang Wu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
10
Chen S, Wulamu A, Zou Q, Zheng H, Wen L, Guo X, Chen H, Zhang T, Zhang Y. MD-GNN: A mechanism-data-driven graph neural network for molecular properties prediction and new material discovery. J Mol Graph Model 2023; 123:108506. [PMID: 37182505 DOI: 10.1016/j.jmgm.2023.108506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2022] [Revised: 04/12/2023] [Accepted: 04/30/2023] [Indexed: 05/16/2023]
Abstract
Molecular property prediction and new material discovery are significant for the pharmaceutical, food, and chemical industries, among other fields. The popular methods are theoretical mechanism calculation and machine learning. The results of theoretical mechanism calculations deviate from experimental data. Machine learning provides a promising solution; however, the process lacks interpretability, and its reliability and generalization depend on the training data. In this paper, a mechanism correction model is combined with a graph neural network (GNN) model, based on the fusion of graph embeddings and descriptor vectors, and proposed as the backbone network for molecular property prediction and new material discovery. The molecular structure is input to the graph neural network, and the abstracted features are fused with numerical features for training. The experimental data and computed data are used to construct the labels, and the theoretical computation (mechanism-driven model) is then fused with the output of the GNN (data-driven model) to form a fused model that modulates the output for molecular property prediction. Experiments on public data show that the Mechanism-Data-Driven Graph Neural Network (MD-GNN) can effectively make the predicted results more accurate. Nineteen molecules constructed in different ways were designed for potential drug discovery, and predictions from the proposed MD-GNN model identified 9 candidates.
Affiliation(s)
- Saian Chen
  - Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Aziguli Wulamu
  - Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Qiping Zou
  - Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region, Hechi, 546300, Guangxi, China
- Han Zheng
  - Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region, Hechi, 546300, Guangxi, China
- Li Wen
  - Department of Business Administration, School of Business, City University of Macau (City U), Macao, 999078, China
- Xi Guo
  - Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Han Chen
  - Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Taohong Zhang
  - Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Ying Zhang
  - QingGong College, North China University of Science and Technology, TangShan, Hebei, 064000, China
11
|
Long TZ, Shi SH, Liu S, Lu AP, Liu ZQ, Li M, Hou TJ, Cao DS. Structural Analysis and Prediction of Hematotoxicity Using Deep Learning Approaches. J Chem Inf Model 2023; 63:111-125. [PMID: 36472475 DOI: 10.1021/acs.jcim.2c01088] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/12/2022]
Abstract
Hematotoxicity has become a serious but overlooked toxicity in drug discovery. However, only a few in silico models have been reported for the prediction of hematotoxicity. In this study, we constructed a high-quality dataset comprising 759 hematotoxic compounds and 1623 nonhematotoxic compounds and then established a series of classification models based on a combination of seven machine learning (ML) algorithms and nine molecular representations. The results based on two data partitioning strategies and applicability domain (AD) analysis illustrate that the best prediction model, based on Attentive FP, yielded a balanced accuracy (BA) of 72.6% and an area under the receiver operating characteristic curve (AUC) value of 76.8% for the validation set, and a BA of 69.2% and an AUC of 75.9% for the test set. In addition, compared with existing filtering rules and models, our model achieved the highest BA value of 67.5% for the external validation set. Additionally, the Shapley additive explanation (SHAP) and atom heatmap approaches were utilized to discover the important features and structural fragments related to hematotoxicity, which could offer helpful tips to detect undesired positive substances. Furthermore, matched molecular pair analysis (MMPA) and a representative substructure derivation technique were employed to further characterize and investigate the transformation principles and distinctive structural features of hematotoxic chemicals. We believe that the novel graph-based deep learning algorithms and insightful interpretation presented in this study can be used as a trustworthy and effective tool to assess hematotoxicity in the development of new drugs.
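Note on the metric: balanced accuracy (BA) is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy (here, 759 positives versus 1623 negatives). The following is a minimal sketch of that definition only; the function name and the toy labels are illustrative assumptions and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TPR) and specificity (TNR) for binary 0/1 labels.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy labels: 3 toxic (1) and 5 non-toxic (0) compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(f"BA = {balanced_accuracy(y_true, y_pred):.3f}")
```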
Affiliation(s)
- Teng-Zhi Long
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Shao-Hua Shi
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China; Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, 0000, P. R. China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Ai-Ping Lu
  - Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, 0000, P. R. China
- Zhao-Qian Liu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, P. R. China
- Ting-Jun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China; Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, 0000, P. R. China; Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
12
Wu J, Wang J, Wu Z, Zhang S, Deng Y, Kang Y, Cao D, Hsieh CY, Hou T. ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction. J Chem Inf Model 2022; 62:5975-5987. [PMID: 36417544 DOI: 10.1021/acs.jcim.2c01290] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
Abstract
Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. The accurate prediction of these properties remains to be solved due to data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore the ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex end point (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network for each subproblem. Subsequently, we implement transfer learning to extract knowledge from related tasks, thus alleviating the dilemma of limited data. Additionally, we substitute the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvement of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch's t-test) for small training data, implying its high robustness and generalizability. The interpretability analysis proves that the atom contributions learned by ALipSol are more reasonable compared with the vanilla Attentive FP, and the substitution effects in benzene derivatives agreed well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.
Affiliation(s)
- Jialu Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P. R. China
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P. R. China
- Shengyu Zhang
  - Tencent Quantum Laboratory, Tencent, Shenzhen, 518057, Guangdong, P. R. China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P. R. China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, P. R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
13
Aliagas I, Gobbi A, Lee ML, Sellers BD. Comparison of logP and logD correction models trained with public and proprietary data sets. J Comput Aided Mol Des 2022; 36:253-262. [PMID: 35359246 DOI: 10.1007/s10822-022-00450-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 03/15/2022] [Indexed: 10/18/2022]
Abstract
In drug discovery, partition and distribution coefficients, logP and logD for octanol/water, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. There are a variety of established methods, mostly fragment- or atom-based, to calculate logP, while logD prediction generally relies on calculated logP and pKa for the estimation of neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations that generally lead to systematic errors for chemically related molecules, while pKa estimation is generally more difficult due to the interplay of electronic, inductive, and conjugation effects for ionizable moieties. We propose an integrated machine learning QSAR modeling approach to predict logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions over commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
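Note on the logP/pKa route to logD: for a compound with a single ionizable group, the textbook relation (assuming only the neutral species partitions into octanol) is logD = logP − log10(1 + 10^(pH − pKa)) for an acid and logD = logP − log10(1 + 10^(pKa − pH)) for a base. The sketch below illustrates that baseline relation only; it is not the authors' correction model, and the function name, parameters, and example values are illustrative assumptions.

```python
import math


def logd_monoprotic(logp, pka, ph=7.4, acid=True):
    """Estimate logD at a given pH for a compound with one ionizable group.

    Assumes only the neutral form partitions into octanol, so the ionized
    fraction simply lowers the apparent lipophilicity.
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical values: an acid (pKa 4.4) and a base (pKa 9.4), both with logP 3.0.
print(f"acid: logD7.4 = {logd_monoprotic(3.0, 4.4, acid=True):.2f}")
print(f"base: logD7.4 = {logd_monoprotic(3.0, 9.4, acid=False):.2f}")
```

When pH equals pKa, half of the compound is ionized and logD sits log10(2) ≈ 0.30 below logP, which is a quick sanity check for any implementation.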
Affiliation(s)
- Ignacio Aliagas
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Alberto Gobbi
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Man-Ling Lee
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Benjamin D Sellers
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
14
Pan L, He Q, Wu Y, Zhang N, Cai H, Yang B, Wang Y, Li Y, Wu X. Synthesis, radiolabeling, and evaluation of a potent β-site APP cleaving enzyme (BACE1) inhibitor for PET imaging of BACE1 in vivo. Bioorg Med Chem Lett 2022; 59:128543. [PMID: 35031452 DOI: 10.1016/j.bmcl.2022.128543] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2021] [Revised: 12/24/2021] [Accepted: 01/09/2022] [Indexed: 02/05/2023]
Abstract
The β-site APP-cleaving enzyme 1 (BACE1) plays important roles in the proteolytic processing of amyloid precursor protein, and can be regarded as an important target for the diagnosis and treatment of AD. This study aimed to report the synthesis and evaluation of an 18F-labeled 2-amino-3,4-dihydroquinazoline analog as a potential BACE1 radioligand. A fluoropropyl side chain was introduced to the phenyl of this 3,4-dihydroquinazoline scaffold to generate the radioligand. Our preliminary data indicated that although the 2-amino-3,4-dihydroquinazoline scaffold possessed favorable in-vitro properties as a PET ligand, its poor brain uptake hindered the in-vivo imaging of BACE1. Further investigation would be required to optimize the scaffold for the development of a blood-brain-barrier-permeable BACE1-targeted PET ligand.
Affiliation(s)
- Lili Pan
  - Department of Nuclear Medicine, Laboratory of Clinical Nuclear Medicine, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China
- Qian He
  - Department of Emergency, West China Hospital, Sichuan University, Chengdu 610000, Sichuan, China
- Yi Wu
  - Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases, Ministry of Education, Gannan Medical University, Ganzhou 341000, China
- Ni Zhang
  - Department of Psychiatry, West China Hospital of Sichuan University, Chengdu 610041, China
- Huawei Cai
  - Department of Nuclear Medicine, Laboratory of Clinical Nuclear Medicine, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China
- Bo Yang
  - Department of Pharmacy, Panzhihua Central Hospital, Panzhihua, Sichuan, 617067, China
- Yuxi Wang
  - Department of Respiratory and Critical Care Medicine, West China Medical School/West China Hospital, Sichuan University, Chengdu 610000, China
- Yunchun Li
  - Department of Nuclear Medicine, Laboratory of Clinical Nuclear Medicine, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China; Department of Nuclear Medicine, The Second People's Hospital of Yibin, Yibin 644000, Sichuan, China
- Xiaoai Wu
  - Department of Nuclear Medicine, Laboratory of Clinical Nuclear Medicine, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China
15
Machine learning & deep learning in data-driven decision making of drug discovery and challenges in high-quality data acquisition in the pharmaceutical industry. Future Med Chem 2021; 14:245-270. [PMID: 34939433 DOI: 10.4155/fmc-2021-0243] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/10/2023]
Abstract
Predicting novel small-molecule bioactivities for target deconvolution and hit-to-lead optimization in drug discovery research requires molecular representation. Previous reports have demonstrated that machine learning (ML) and deep learning (DL) have substantial implications in virtual screening, peptide synthesis, drug ADMET screening, and biomarker discovery. Given high-quality data acquisition, these strategies can increase positive outcomes in the drug discovery process without raising false-positive rates, and can do so cost-effectively and in minimal time. This review discusses recent updates in AI tools as cheminformatics applications in medicinal chemistry for data-driven decision making in drug discovery, and the challenges of high-quality data acquisition in the pharmaceutical industry, while improving small-molecule bioactivities and properties.
16
Yang ZY, Fu L, Lu AP, Liu S, Hou TJ, Cao DS. Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion. J Cheminform 2021; 13:86. [PMID: 34774096 PMCID: PMC8590336 DOI: 10.1186/s13321-021-00564-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/08/2021] [Accepted: 10/30/2021] [Indexed: 12/01/2022]
Abstract
In the process of drug discovery, the optimization of lead compounds has always been a challenge for pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool to efficiently extract and summarize the relationship between structural transformation and property change, is suitable for local structural optimization tasks. In particular, the integration of MMPA with QSAR modeling can further strengthen the utility of MMPA in molecular optimization navigation. In this study, a new semi-automated procedure based on KNIME was developed to support MMPA on both large- and small-scale datasets, including molecular preparation, QSAR model construction, applicability domain evaluation, and MMP calculation and application. Two examples covering regression and classification tasks were provided to give a better understanding of the importance of MMPA; they also demonstrate the reliability and utility of this MMPA-by-QSAR pipeline.
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China; Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China; Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, People's Republic of China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China; Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China; Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
17
Venkatraman V. FP-ADMET: a compendium of fingerprint-based ADMET prediction models. J Cheminform 2021; 13:75. [PMID: 34583740 PMCID: PMC8479898 DOI: 10.1186/s13321-021-00557-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 04/07/2021] [Accepted: 09/20/2021] [Indexed: 12/11/2022]
Abstract
Motivation: The absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs play a key role in determining which of the potential candidates are to be prioritized. In silico approaches based on machine learning methods are becoming increasingly popular but are nonetheless limited by the availability of data. With a view to making both data and models available to the scientific community, we have developed FPADMET, a repository of molecular fingerprint-based predictive models for ADMET properties. In this article, we have examined the efficacy of fingerprint-based machine learning models for a large number of ADMET-related properties. The predictive ability of a set of 20 different binary fingerprints (based on substructure keys, atom pairs, local path environments, as well as custom fingerprints such as all-shortest paths) for over 50 ADMET and ADMET-related endpoints has been evaluated as part of the study. We find that for a majority of the properties, fingerprint-based random forest models yield comparable or better performance compared with traditional 2D/3D molecular descriptors. Availability: The models are made available as part of open access software that can be downloaded from GitLab (https://gitlab.com/vishsoft/fpadmet).
Affiliation(s)
- Vishwesh Venkatraman
  - Norwegian University of Science and Technology, Realfagbygget, Gløshaugen, Høgskoleringen, 7491, Trondheim, Norway
18
Naveja JJ, Vogt M. Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications. Molecules 2021; 26:5291. [PMID: 34500724 PMCID: PMC8433811 DOI: 10.3390/molecules26175291] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/06/2021] [Revised: 08/27/2021] [Accepted: 08/28/2021] [Indexed: 01/21/2023]
Abstract
Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
Affiliation(s)
- José J. Naveja
  - Instituto de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5-6, 53115 Bonn, Germany
19
Wang F, Diao X, Chang S, Xu L. Recent Progress of Deep Learning in Drug Discovery. Curr Pharm Des 2021; 27:2088-2096. [PMID: 33511933 DOI: 10.2174/1381612827666210129123231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/22/2020] [Accepted: 11/11/2020] [Indexed: 11/22/2022]
Abstract
Deep learning, an emerging field of artificial intelligence based on neural networks in machine learning, has been applied in various fields and is highly valued. Herein, we mainly review several mainstream architectures in deep learning, including deep neural networks, convolutional neural networks and recurrent neural networks in the field of drug discovery. The applications of these architectures in molecular de novo design, property prediction, biomedical imaging and synthetic planning have also been explored. Apart from that, we further discuss the future direction of the deep learning approaches and the main challenges we need to address.
Affiliation(s)
- Feng Wang
  - College of Information Science and Engineering, Huaide College of Changzhou University, Taizhou 214500, China
- XiaoMin Diao
  - College of Information Science and Engineering, Huaide College of Changzhou University, Taizhou 214500, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
20
Wang L, Ding J, Shi P, Fu L, Pan L, Tian J, Cao D, Jiang H, Ding X. Ensemble machine learning to evaluate the in vivo acute oral toxicity and in vitro human acetylcholinesterase inhibitory activity of organophosphates. Arch Toxicol 2021; 95:2443-2457. [PMID: 33934188 DOI: 10.1007/s00204-021-03056-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/08/2021] [Accepted: 04/21/2021] [Indexed: 12/13/2022]
Abstract
Organophosphates (OPs) are hazardous chemicals widely used in industry and agriculture. Distribution of their residues in nature causes serious risks to humans, animals, and plants. To reduce hazards from OPs, quantitative structure-activity relationship (QSAR) models for predicting their acute oral toxicity in rats and mice and inhibition constants concerning human acetylcholinesterase were developed according to the bioactivity data of 456 unique OPs. Based on robust, two-dimensional molecular descriptors and quantum chemical descriptors, which accurately reflect OP electronic structures and reactivities, the influences of eight machine-learning algorithms on the prediction performance of the QSAR models were explored, and consensus QSAR models were constructed. Several strict model validation indices and the results of applicability domain evaluations show that the established consensus QSAR models exhibit good robustness, practical prediction abilities, and wide application scopes. Poor correlation was observed between acute oral toxicity at the mammalian level and the inhibition constants at the molecular level, indicating that the acute toxicity of OPs cannot be evaluated from experimental enzyme inhibitory activity data alone; their toxicokinetic characteristics must also be considered. The constructed QSAR models described herein provide rapid, theoretical assessment of the bioactivity of unstudied or unknown OPs, as well as guidance for making decisions regarding their regulation.
Affiliation(s)
- Liangliang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Junjie Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Peichang Shi
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
- Li Pan
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Jiahao Tian
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China; Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China
- Hui Jiang
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Xiaoqin Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
21
Abstract
Quantitative Structure–Activity Relationship (QSAR) modeling aims to correlate molecular structure properties with the corresponding bioactivity. Chance correlations and multicollinearity are two major problems often encountered when generating QSAR models. Feature selection can significantly improve the accuracy and interpretability of QSAR by removing redundant or irrelevant molecular descriptors. The artificial bee colony (ABC) algorithm, which mimics the foraging behavior of a honey bee colony, was originally proposed for continuous optimization problems. It has been applied to feature selection for classification but seldom for regression analysis and prediction. In this paper, a binary ABC algorithm is used to select features (molecular descriptors) in QSAR. Furthermore, we propose an improved ABC-based algorithm for feature selection in QSAR, namely ABC-PLS-1. Crossover and mutation operators are introduced into the employed bee and onlooker bee phases to modify several dimensions of each solution, which not only avoids converting continuous values into discrete values but also reduces the computational resources required. In addition, a novel greedy selection strategy, which selects the feature subsets with higher accuracy and fewer features, helps the algorithm converge quickly. Three QSAR datasets are used to evaluate the proposed algorithm. Experimental results show that ABC-PLS-1 outperforms PSO-PLS, WS-PSO-PLS, and BFDE-PLS in accuracy, root mean square error, and the number of selected features. Moreover, we study whether to implement the scout bee phase when tackling regression problems and conclude that the scout bee phase is redundant for feature selection in low-dimensional and medium-dimensional regression problems.
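Note on the greedy selection step: the abstract states only that subsets with higher accuracy and fewer features are preferred. One plausible reading, sketched below, keeps a candidate subset when it is strictly more accurate, or equally accurate with fewer selected features. The tie-breaking rule, the function name, and the example values are assumptions for illustration and may differ from the published ABC-PLS-1 algorithm.

```python
def greedy_select(current, candidate):
    """Choose between two solutions, each given as (accuracy, feature_mask).

    feature_mask is a list of 0/1 flags, one per molecular descriptor.
    Assumed rule: higher accuracy wins; on a tie, fewer selected features wins.
    """
    cur_acc, cur_mask = current
    cand_acc, cand_mask = candidate
    if cand_acc > cur_acc:
        return candidate
    if cand_acc == cur_acc and sum(cand_mask) < sum(cur_mask):
        return candidate
    return current


# Hypothetical solutions: same accuracy, but the candidate uses fewer descriptors.
current = (0.82, [1, 1, 0, 1, 1])
candidate = (0.82, [1, 0, 0, 1, 0])
print(greedy_select(current, candidate))
```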
22
Xie L, Xu L, Kong R, Chang S, Xu X. Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning. Front Pharmacol 2021; 11:606668. [PMID: 33488387 PMCID: PMC7819282 DOI: 10.3389/fphar.2020.606668] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/21/2020] [Accepted: 11/23/2020] [Indexed: 12/27/2022]
Abstract
Accurate prediction of the physical properties and bioactivity of drug molecules in deep learning depends on how molecules are represented. Many types of molecular descriptors have been developed for quantitative structure-activity/property relationships (QSPR). However, each molecular descriptor is optimized for a specific application with encoding preference. Considering that standalone featurization methods may only cover parts of the information of the chemical molecules, we proposed to build the conjoint fingerprint by combining two supplementary fingerprints. The impact of the conjoint fingerprint and each standalone fingerprint on predictive performance was systematically evaluated in predicting the logarithm of the partition coefficient (logP) and protein-ligand binding affinity by using machine learning/deep learning (ML/DL) methods, including random forest (RF), support vector regression (SVR), extreme gradient boosting (XGBoost), long short-term memory network (LSTM), and deep neural network (DNN). The results demonstrated that the conjoint fingerprint yielded improved predictive performance, even outperforming the consensus model using two standalone fingerprints for four out of five examined methods. Given that the conjoint fingerprint scheme shows easy extensibility and high applicability, we expect that the proposed conjoint scheme would create new opportunities for continuously improving the predictive performance of deep learning by harnessing the complementarity of various types of fingerprints.
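Note on the conjoint fingerprint: the abstract says two fingerprints are combined but does not state how. The simplest reading is concatenation of the two bit vectors into one longer feature vector, sketched below. Concatenation, the function name, and the toy bit vectors are assumptions for illustration, not details confirmed by the abstract.

```python
def conjoint_fingerprint(fp_a, fp_b):
    """Concatenate two binary fingerprints into a single feature vector."""
    return list(fp_a) + list(fp_b)


# Hypothetical toy fingerprints for one molecule: a 4-bit substructure-key
# fingerprint and a 6-bit circular fingerprint (real ones are far longer).
key_fp = [1, 0, 1, 1]
circular_fp = [0, 1, 0, 0, 1, 0]

conjoint = conjoint_fingerprint(key_fp, circular_fp)
print(len(conjoint), conjoint)
```

The combined vector can then be passed to any of the listed learners (RF, SVR, XGBoost, LSTM, DNN) in place of a single fingerprint.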
Affiliation(s)
- Liangxu Xie
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China; Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Ren Kong
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
23
Fu L, Yang ZY, Yang ZJ, Yin MZ, Lu AP, Chen X, Liu S, Hou TJ, Cao DS. QSAR-assisted-MMPA to expand chemical transformation space for lead optimization. Brief Bioinform 2021; 22:6071857. [PMID: 33418563 DOI: 10.1093/bib/bbaa374] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/07/2020] [Revised: 10/25/2020] [Accepted: 11/25/2020] [Indexed: 11/13/2022]
Abstract
Matched molecular pairs analysis (MMPA) has become a powerful tool for automatically and systematically identifying medicinal chemistry transformations from compound/property datasets. However, accurate determination of matched molecular pair (MMP) transformations largely depends on the size and quality of existing experimental data. Lack of high-quality experimental data heavily hampers the extraction of more effective medicinal chemistry knowledge. Here, we developed a new strategy called quantitative structure-activity relationship (QSAR)-assisted-MMPA to expand the number of chemical transformations and took the logD7.4 property endpoint as an example to demonstrate the reliability of the new method. A reliable logD7.4 consensus prediction model was first established, and its applicability domain was strictly assessed. By applying this model to screen two chemical databases, we obtained more high-quality logD7.4 data by defining a strict applicability domain threshold. Then, MMPA was performed on the predicted data and experimental data to derive more chemical rules. To validate the reliability of the chemical rules, we compared the magnitude and directionality of the property changes of the predicted rules with those of the measured rules. Then, we compared the novel chemical rules generated by our proposed approach with the published chemical rules, and found that the magnitude and directionality of the property changes were consistent, indicating that the proposed QSAR-assisted-MMPA approach has the potential to enrich the collection of rule types or even identify completely novel rules. Finally, we found that the number of MMP rules derived from the experimental data could be amplified by the predicted data, which helps in analyzing medicinal chemistry rules in a local chemical environment. In summary, the proposed QSAR-assisted-MMPA approach could be regarded as a very promising strategy to expand the chemical transformation space for lead optimization, especially when there are not enough experimental data to support MMPA.
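The rule-validation step described above (comparing the magnitude and directionality of property changes between predicted and measured rules) might look roughly like the sketch below. It assumes each matched pair has already been reduced to a transformation label and a property change; the labels and numbers are invented toy values, not logD7.4 results from the paper, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_rules(pairs):
    """Group (transformation, delta) pairs and return the mean delta per transformation."""
    grouped = defaultdict(list)
    for transformation, delta in pairs:
        grouped[transformation].append(delta)
    return {t: mean(deltas) for t, deltas in grouped.items()}


def same_direction(a, b):
    """True when both mean changes have the same sign."""
    return (a > 0) == (b > 0)


# Toy data: property changes observed for matched pairs (illustrative only).
measured = [("H>>F", 0.2), ("H>>F", 0.4), ("H>>OH", -0.9), ("H>>OH", -1.1)]
predicted = [("H>>F", 0.25), ("H>>F", 0.35), ("H>>OH", -0.8), ("H>>OH", -1.0)]

measured_rules = summarize_rules(measured)
predicted_rules = summarize_rules(predicted)

for t in measured_rules:
    m, p = measured_rules[t], predicted_rules[t]
    print(t, round(m, 2), round(p, 2), same_direction(m, p))
```

A real workflow would first discard predictions that fall outside the applicability domain threshold before deriving the predicted rules.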
Affiliation(s)
- Li Fu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Ming-Zhu Yin
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
- Xiang Chen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
|
24
|
Wang LL, Ding JJ, Pan L, Fu L, Tian JH, Cao DS, Jiang H, Ding XQ. Quantitative structure-toxicity relationship model for acute toxicity of organophosphates via multiple administration routes in rats and mice. JOURNAL OF HAZARDOUS MATERIALS 2021; 401:123724. [PMID: 33113726 DOI: 10.1016/j.jhazmat.2020.123724] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/03/2020] [Revised: 07/29/2020] [Accepted: 08/13/2020] [Indexed: 06/11/2023]
Abstract
Organophosphates (OPs) are highly toxic compounds, with widespread application in agricultural and chemical industries, whose introduction into the environment poses serious hazards to humans and ecological systems. To assess and ultimately mitigate these hazards, this study predicted the acute toxicity of OPs according to their chemical structure and administration route. The acute toxicity data of 161 OPs in two species via six different administration routes were manually collected and used to develop a series of quantitative structure-toxicity relationship (QSTR) models with robust and practical predictive abilities. The random forest algorithm was used to develop the models, employing both quantum chemical and two-dimensional descriptors according to OECD guidelines. Correlation results and feature similarities indicated that whereas acute toxicity data from rats and mice via the same administration route were combinable for modeling, data from different routes were not. Six QSTR models for each route in a single species and two QSTR models for a single route in the two species were constructed, achieving practical predictive performance. Despite significant variances in their datasets, the prediction models could predict the acute toxicity of novel or unknown OPs, realize rapid assessment, and provide guidance for regulatory decisions to reduce the hazards of OPs.
Affiliation(s)
- Liang-Liang Wang
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Jun-Jie Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Li Pan
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, PR China
- Jia-Hao Tian
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, PR China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, PR China
- Hui Jiang
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Xiao-Qin Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
|
25
|
Wang Y, Chen X. QSPR model for Caco-2 cell permeability prediction using a combination of HQPSO and dual-RBF neural network. RSC Adv 2020; 10:42938-42952. [PMID: 35514900 PMCID: PMC9058322 DOI: 10.1039/d0ra08209k] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/25/2020] [Accepted: 11/06/2020] [Indexed: 12/23/2022]
Abstract
The Caco-2 cell model is widely used to evaluate the in vitro human intestinal permeability of drugs due to its morphological and functional similarity to human enterocytes. Although it is safe and relatively economical, it is time-consuming. A rapid and accurate quantitative structure-property relationship (QSPR) model of Caco-2 permeability would help improve the efficiency of oral drug development. The aim of our study is to explore the predictive ability of the QSPR model, to study the permeation mechanism, and to develop a potential permeability prediction model for Caco-2 cells. In our study, a relatively large data set was collected, and abnormal data were eliminated using Monte Carlo regression and the hybrid quantum particle swarm optimization (HQPSO) algorithm. Then, the remaining 1827 compounds were used to establish QSPR models. To generate multiple chemically diverse training and test sets, we used a combination of principal component analysis (PCA) and self-organizing mapping (SOM) neural networks to split the modeling data set characterized by PaDEL-descriptors. After preliminary selection of descriptors by the mean decrease impurity (MDI) method, the HQPSO algorithm was used to select the key descriptors. Six different methods, namely, multivariate linear regression (MLR), support vector machine regression (SVR), XGBoost, radial basis function (RBF) neural networks, dual-SVR and dual-RBF, were employed to develop QSPR models. The best dual-RBF model achieved R² = 0.91 and R²_cv5 = 0.77 for the training set, and R²_T = 0.77 for the test set. A series of validation methods were used to assess the robustness and predictive ability of the dual-RBF model under OECD principles. A new applicability domain (AD) definition method based on the descriptor importance-weighted and distance-based (IWD) method was proposed, and the outliers were analyzed carefully. Combined with the importance of the descriptors used in the dual-RBF model, we concluded that the "H E-state" and hydrogen bonds are important factors affecting the permeability of drugs passing through the Caco-2 cell. Compared with previously reported studies, our method shows some advantages in data size, transparency of the modeling process, and prediction accuracy, and is a promising tool for virtual screening in the early stage of drug development.
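An importance-weighted, distance-based applicability domain can be sketched as follows. The abstract does not give the IWD formula, so everything here is an assumption: the distance is a weighted Euclidean distance to the training centroid, the weights are hypothetical descriptor importances, and the threshold is simply the largest training-set distance.

```python
import math


def centroid(rows):
    """Column-wise mean of the training descriptor matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def weighted_distance(x, center, weights):
    """Euclidean distance with each squared difference scaled by its descriptor weight."""
    return math.sqrt(sum(w * (xi - ci) ** 2 for xi, ci, w in zip(x, center, weights)))


# Hypothetical scaled descriptors for four training compounds (two descriptors each).
train = [[0.0, 1.0], [2.0, 1.0], [1.0, 0.0], [1.0, 2.0]]
importance = [0.75, 0.25]  # assumed descriptor importances, summing to 1

center = centroid(train)
# Assumed cutoff: the largest weighted distance seen in the training set.
threshold = max(weighted_distance(row, center, importance) for row in train)


def in_domain(x):
    """A query compound is inside the AD if it lies within the training-set cutoff."""
    return weighted_distance(x, center, importance) <= threshold


print(in_domain([1.0, 2.5]))  # deviates only in the low-importance descriptor
print(in_domain([3.0, 1.0]))  # deviates in the high-importance descriptor
```

The two queries show the intended effect of the weighting: a deviation along an unimportant descriptor is tolerated more readily than a similar deviation along an important one.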
Affiliation(s)
- Yukun Wang
  - School of Chemical Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China
  - School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
- Xuebo Chen
  - School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
|
26
|
Wu F, Zhou Y, Li L, Shen X, Chen G, Wang X, Liang X, Tan M, Huang Z. Computational Approaches in Preclinical Studies on Drug Discovery and Development. Front Chem 2020; 8:726. [PMID: 33062633 PMCID: PMC7517894 DOI: 10.3389/fchem.2020.00726] [Citation(s) in RCA: 119] [Impact Index Per Article: 23.8] [Received: 03/29/2020] [Accepted: 07/14/2020] [Indexed: 12/11/2022]
Abstract
Because undesirable pharmacokinetics and toxicity are significant reasons for the failure of drug development in the costly late stage, it has been widely recognized that drug ADMET properties should be considered as early as possible to reduce failure rates in the clinical phase of drug discovery. Concurrently, drug recalls have become increasingly common in recent years, prompting pharmaceutical companies to increase attention toward the safety evaluation of preclinical drugs. In vitro and in vivo drug evaluation techniques are currently more mature in preclinical applications, but these technologies are costly. In recent years, with the rapid development of computer science, in silico technology has been widely used to evaluate the relevant properties of drugs in the preclinical stage and has produced many software programs and in silico models, further promoting the study of ADMET in vitro. In this review, we first introduce the two ADMET prediction categories (molecular modeling and data modeling). Then, we perform a systematic classification and description of the databases and software commonly used for ADMET prediction. We focus on some widely studied ADMT properties as well as PBPK simulation, and we list some applications that are related to the prediction categories and web tools. Finally, we discuss challenges and limitations in the preclinical area and propose some suggestions and prospects for the future.
Affiliation(s)
- Fengxu Wu
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China
- Yuquan Zhou
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - The Second School of Clinical Medicine, Guangdong Medical University, Dongguan, China
- Langhui Li
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan, China
- Xianhuan Shen
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan, China
- Ganying Chen
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - The Second School of Clinical Medicine, Guangdong Medical University, Dongguan, China
- Xiaoqing Wang
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan, China
- Xianyang Liang
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - The Second School of Clinical Medicine, Guangdong Medical University, Dongguan, China
- Mengyuan Tan
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan, China
- Zunnan Huang
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Research Platform Service Management Center, Dongguan, China
  - Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan, China
  - Marine Biomedical Research Institute of Guangdong Zhanjiang, Zhanjiang, China
|
27
|
Shen J, Nicolaou CA. Molecular property prediction: recent trends in the era of artificial intelligence. DRUG DISCOVERY TODAY. TECHNOLOGIES 2020; 32-33:29-36. [PMID: 33386091 DOI: 10.1016/j.ddtec.2020.05.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 12/11/2019] [Revised: 03/10/2020] [Accepted: 04/06/2020] [Indexed: 12/18/2022]
Abstract
Artificial intelligence (AI) has become a powerful tool in many fields, including drug discovery. Among various AI applications, molecular property prediction can have a more significant immediate impact on the drug discovery process, since most algorithms and methods use predicted properties to evaluate, select, and generate molecules. Herein, we provide a brief review of state-of-the-art molecular property prediction methodologies and discuss recently reported examples. We highlight key techniques that have been applied to molecular property prediction, such as learned representation, multi-task learning, transfer learning, and federated learning. We also point out some critical but less discussed issues, such as data set quality, benchmarks, model performance evaluation, and prediction confidence quantification.
Affiliation(s)
- Jie Shen
  - Advanced Analytics and Data Sciences, Eli Lilly and Company, Indianapolis, IN 46285, United States
- Christos A Nicolaou
  - Discovery Chemistry Research & Technologies, Eli Lilly and Company, Indianapolis, IN 46285, United States
|
28
|
Wang Y, Chen X. A joint optimization QSAR model of fathead minnow acute toxicity based on a radial basis function neural network and its consensus modeling. RSC Adv 2020; 10:21292-21308. [PMID: 35518745 PMCID: PMC9054390 DOI: 10.1039/d0ra02701d] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/24/2020] [Accepted: 05/24/2020] [Indexed: 01/07/2023]
Abstract
Acute toxicity to the fathead minnow (Pimephales promelas) is an important indicator for evaluating the hazards and risks of compounds in aquatic environments. The aim of our study is to explore the predictive power of a quantitative structure-activity relationship (QSAR) model based on a radial basis function (RBF) neural network with a joint optimization method, to study the acute toxicity mechanism, and to develop a potential acute toxicity prediction model for fathead minnow. To ensure the symmetry and fairness of the data splitting and to generate multiple chemically diverse training and validation sets, we used a self-organizing mapping (SOM) neural network to split the modeling dataset (containing 955 compounds) characterized by PaDEL-descriptors. After preliminary selection of descriptors via the mean decrease impurity method, a hybrid quantum particle swarm optimization (HQPSO) algorithm was used to jointly optimize the parameters of the RBF network and select the key descriptors. We established 20 RBF-based QSAR models, and the statistical results showed that the 10-fold cross-validation results (R²_cv10) and the adjusted coefficients of determination (R²_adj) were all greater than 0.7 and 0.8, respectively. The Q²_ext of these models was between 0.6480 and 0.7317, and the R²_ext was between 0.6563 and 0.7318. Combined with the frequency and importance of the descriptors used in the RBF-based models, and the correlation between the descriptors and acute toxicity, we concluded that the water distribution coefficient, molar refractivity, and first ionization potential are important factors affecting acute toxicity to fathead minnow. A consensus QSAR model built from the RBF-based models was established; this model showed good performance with R² = 0.9118, R²_cv10 = 0.7632, and Q²_ext = 0.7430. A frequency weighted and distance (FWD)-based applicability domain (AD) definition method was proposed, and the outliers were analyzed carefully. Compared with previous studies, the method proposed in this paper has clear advantages, and its robustness and external predictive power are also better than those of an XGBoost-based model. It is an effective QSAR modeling method.
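Consensus modeling and external validation of the kind reported above can be illustrated with a short sketch. It assumes the consensus prediction is an unweighted mean over the individual models and that Q²_ext uses the common form with the training-set mean in the denominator; the abstract specifies neither, and all numbers are toy values.

```python
def consensus(predictions):
    """Average per-compound predictions across models (unweighted mean, an assumption)."""
    return [sum(values) / len(values) for values in zip(*predictions)]


def q2_ext(y_true, y_pred, y_train_mean):
    """External Q^2: 1 - PRESS / sum of squared deviations from the training-set mean."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - y_train_mean) ** 2 for t in y_true)
    return 1.0 - press / total


# Toy predictions from two hypothetical models on four external compounds.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [2.0, 2.0, 4.0, 4.0]
y_ext = [1.5, 2.5, 3.5, 4.5]

y_pred = consensus([model_a, model_b])
score = q2_ext(y_ext, y_pred, y_train_mean=3.0)
print(y_pred, round(score, 3))
```

Averaging cancels the opposite-signed errors of the two toy models on the first and third compounds, which is the usual motivation for a consensus over individual models.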
Affiliation(s)
- Yukun Wang
  - School of Chemical Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China
  - School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
- Xuebo Chen
  - School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
|
29
|
Awale M, Riniker S, Kramer C. Matched Molecular Series Analysis for ADME Property Prediction. J Chem Inf Model 2020; 60:2903-2914. [PMID: 32369360 DOI: 10.1021/acs.jcim.0c00269] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022]
Abstract
Generation and prioritization of new molecules are the most central parts of the drug design process. Matched molecular series analysis (MMSA) has recently been proposed as a formal approach that captures both of these key elements of design. In order to better understand the power of MMSA and its specific limitations, we here evaluate its performance as an ADME property prediction tool. We use four large and diverse in-house data sets: logD, microsomal clearance, CYP2C9 inhibition, and CYP3A4 inhibition. MMSA follows the concept of parallel structure-activity relationship (SAR), where if two identical substituent series on different scaffolds show similarity in their property profiles, SAR from one series can be transferred to the other series. We test four different similarity metrics to identify pairs of molecular series where information can be transferred. We find that the best prediction performance is achieved by a combination of centered root-mean-square deviation (cRMSD) and a network score approach previously published by Keefer et al. However, cRMSD alone strikes the best balance between accuracy and the number of predictions that can be made. We identify statistical metrics that allow estimating when MMSA predictions will work, similar to the well-known applicability domain concept in machine learning. MMSA achieves a prediction accuracy that is comparable to a standard machine-learning model and matched molecular pair analysis. In contrast to machine learning, however, it is very easy to understand where MMSA predictions are coming from. Finally, to prospectively test the power of MMSA, we retested compounds that were strong outliers in the initial predictions and show how the MMSA model can help to identify erroneous data points.
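A centered RMSD between two property profiles can be sketched as below. This is a reading of the metric's name (subtract each series' own mean, then take the RMSD), not a formula quoted from the paper, and the property values are invented for illustration.

```python
import math


def crmsd(series_a, series_b):
    """RMSD between two equally long property profiles after centering each on its own mean."""
    n = len(series_a)
    mean_a = sum(series_a) / n
    mean_b = sum(series_b) / n
    squared = [((a - mean_a) - (b - mean_b)) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / n)


# Toy property values for the same three substituents on different scaffolds.
scaffold_1 = [1.0, 2.0, 3.0]
scaffold_2 = [3.0, 4.0, 5.0]  # same trend, constant offset
scaffold_3 = [3.0, 2.0, 1.0]  # opposite trend

print(crmsd(scaffold_1, scaffold_2))
print(crmsd(scaffold_1, scaffold_3))
```

Centering removes the constant offset between scaffolds, so two series with parallel SAR score zero even when their absolute property values differ.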
Affiliation(s)
- Mahendra Awale
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Sereina Riniker
  - Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Christian Kramer
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
|
30
|
Yang ZY, Dong J, Yang ZJ, Lu AP, Hou TJ, Cao DS. Structural Analysis and Identification of False Positive Hits in Luciferase-Based Assays. J Chem Inf Model 2020; 60:2031-2043. [PMID: 32202787 DOI: 10.1021/acs.jcim.9b01188] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 02/07/2023]
Abstract
Luciferase-based bioluminescence detection techniques are highly favored in high-throughput screening (HTS), in which the firefly luciferase (FLuc) is the most commonly used variant. However, FLuc inhibitors can interfere with the activity of luciferase, which may result in false positive signals in HTS assays. In order to reduce unnecessary costs in time and money, an in silico prediction model for FLuc inhibitors is highly desirable. In this study, we built an extensive data set consisting of 20 888 FLuc inhibitors and 198 608 noninhibitors, and then developed a group of classification models based on the combination of three machine learning (ML) algorithms and four types of molecular representations. The best prediction model, based on XGBoost with ECFP4 and MOE2d descriptors, yielded a balanced accuracy (BA) of 0.878 and an area under the receiver operating characteristic curve (AUC) value of 0.958 for the validation set, and a BA of 0.886 and an AUC of 0.947 for the test set. Three external validation sets, including set 1 (3231 FLuc inhibitors and 69 783 noninhibitors), set 2 (695 FLuc inhibitors and 75 913 noninhibitors), and set 3 (1138 FLuc inhibitors and 8155 noninhibitors), were used to verify the predictive ability of our models. The BA values for the three external validation sets given by the best model are 0.864, 0.845, and 0.791, respectively. In addition, the important features or structural fragments related to FLuc inhibitors were recognized by the Shapley additive explanations (SHAP) method along with their influences on predictions, which may provide valuable clues for detecting undesirable luciferase inhibitors. Based on the important and explanatory features, 16 rules were proposed for detecting FLuc inhibitors, which can achieve a correction rate of 70% for FLuc inhibitors. Furthermore, a comparison with existing prediction rules and models for FLuc inhibitors used in virtual screening verified the high reliability of the models and rules proposed in this study. We also used the model to screen three curated chemical databases, and almost 10% of the molecules in the evaluated databases were predicted as inhibitors, highlighting the potential risk of false positives in luciferase-based assays. Finally, a public web server called ChemFLuc was developed (http://admet.scbdd.com/chemfluc/index/), and it offers a freely available service to predict potential FLuc inhibitors.
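Balanced accuracy is reported here because the data set is heavily imbalanced (roughly ten noninhibitors per inhibitor). The sketch below uses the standard definition, the mean of sensitivity and specificity; the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts for an imbalanced set: 100 inhibitors, 1000 noninhibitors.
tp, fn, tn, fp = 80, 20, 900, 100

plain_accuracy = (tp + tn) / (tp + fn + tn + fp)
ba = balanced_accuracy(tp, fn, tn, fp)
print(round(plain_accuracy, 3), round(ba, 3))
```

Plain accuracy is dominated by the majority class, while balanced accuracy weights the two classes equally, which makes it the more informative figure for screening sets like these.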
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
- Jie Dong
  - Central South University of Forestry and Technology, Changsha, 410004, P.R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P.R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
|