1
Wojtyło PA, Łapińska N, Bellagamba L, Camaioni E, Mendyk A, Giovagnoli S. Initial Development of Automated Machine Learning-Assisted Prediction Tools for Aryl Hydrocarbon Receptor Activators. Pharmaceutics 2024; 16:1456. [PMID: 39598579 PMCID: PMC11597659 DOI: 10.3390/pharmaceutics16111456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 11/02/2024] [Accepted: 11/12/2024] [Indexed: 11/29/2024]
Abstract
Background: The aryl hydrocarbon receptor (AhR) plays a crucial role in immune and metabolic processes. The large molecular diversity of ligands capable of activating AhR makes it impossible to determine the structural features useful for the design of new potent modulators. Thus, in the field of drug discovery, the intricate nature of AhR activation necessitates the development of novel tools to address related challenges. Methods: In this study, quantitative structure-activity relationship (QSAR) classification and regression models were developed with the objective of identifying the most effective method for predicting AhR activity. The initial dataset, obtained by combining the ChEMBL and WIPO databases, contained 978 molecules with EC50 values. The predictive models were developed using the automated machine learning platform mljar according to a 10-fold cross-validation (10-CV) testing procedure. Results: The classification model demonstrated an accuracy value of 0.760 and an F1 value of 0.789 for the test set. For the regression model, the root-mean-squared error (RMSE) was 5444 and the coefficient of determination (R²) was 0.208. The Shapley Additive Explanations (SHAP) method was then employed for a deeper comprehension of the impact of the variables on the model's predictions. As a practical application for scientific purposes, the best-performing classification model was then used to develop an AhR web application. This application is accessible online and has been implemented in Streamlit. Conclusions: The findings may serve as a foundation for further research into the development of a QSAR model, which could enhance comprehension of the influence of ligand structure on the modulation of AhR activity.
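As a rough illustration of the evaluation procedure this abstract describes, the Python sketch below splits a data set into 10 folds and computes accuracy and F1 for binary labels. It is not the authors' mljar pipeline; the function names (`k_fold_indices`, `accuracy`, `f1_score`), the shuffling seed, and the 0/1 label encoding are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slicing spreads the samples over the folds as evenly as possible.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

With 978 molecules, each test fold in this sketch holds 97 or 98 compounds, and every compound appears in exactly one test fold.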
Affiliation(s)
- Paulina Anna Wojtyło
  - Department of Pharmaceutical Sciences, University of Perugia, via del Liceo 1, 06123 Perugia, Italy
- Natalia Łapińska
  - Department of Pharmaceutical Technology and Biopharmaceutics, Jagiellonian University Medical College, 30-688 Kraków, Poland
- Lucia Bellagamba
  - Department of Pharmaceutical Sciences, University of Perugia, via del Liceo 1, 06123 Perugia, Italy
- Emidio Camaioni
  - Department of Pharmaceutical Sciences, University of Perugia, via del Liceo 1, 06123 Perugia, Italy
- Aleksander Mendyk
  - Department of Pharmaceutical Technology and Biopharmaceutics, Jagiellonian University Medical College, 30-688 Kraków, Poland
- Stefano Giovagnoli
  - Department of Pharmaceutical Sciences, University of Perugia, via del Liceo 1, 06123 Perugia, Italy
2
Abou Hajal A, Bryce RA, Amor BB, Atatreh N, Ghattas MA. Boosting the Accuracy and Chemical Space Coverage of the Detection of Small Colloidal Aggregating Molecules Using the BAD Molecule Filter. J Chem Inf Model 2024; 64:4991-5005. [PMID: 38920403 DOI: 10.1021/acs.jcim.4c00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]
Abstract
The ability to conduct effective high throughput screening (HTS) campaigns in drug discovery is often hampered by the detection of false positives in these assays due to small colloidally aggregating molecules (SCAMs). SCAMs can produce artifactual hits in HTS by nonspecific inhibition of the protein target. In this work, we present a new computational prediction tool for detecting SCAMs based on their 2D chemical structure. The tool, called the boosted aggregation detection (BAD) molecule filter, employs decision tree ensemble methods, namely, the CatBoost classifier and the light gradient-boosting machine, to significantly improve the detection of SCAMs. In developing the filter, we explore three approaches, each tailored for specific drug discovery needs: models trained on individual data sets, a consensus approach using these models, and a merged data set approach. The individual data set method emerged as most effective, achieving 93% sensitivity and 90% specificity, outperforming existing state-of-the-art models by 20% and 5%, respectively. The consensus models offer broader chemical space coverage, exceeding 90% for all testing sets. This feature is particularly important for early-stage medicinal chemistry projects and provides information on the applicability domain. Meanwhile, the merged data set models demonstrated robust performance, with a notable sensitivity of 79% in the comprehensive 10-fold cross-validation test set. A SHAP analysis of model features indicates the importance of hydrophobicity and molecular complexity as primary factors influencing the aggregation propensity. The BAD molecule filter is publicly accessible at https://molmodlab-aau.com/Tools.html. This filter provides a new, more robust tool for aggregate prediction in the early stages of drug discovery to optimize hit rates and reduce associated testing and validation overheads.
Affiliation(s)
- Abdallah Abou Hajal
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Richard A Bryce
  - Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
- Boulbaba Ben Amor
  - Core42, Inception/G42, Abu Dhabi 2282, United Arab Emirates
  - IMT Nord Europe, Villeneuve d'Ascq 59650, France
- Noor Atatreh
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Mohammad A Ghattas
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
3
Choi JI, Song WS, Koh DH, Kim EY. In Silico and In Vitro multiple analysis approach for screening naturally derived ligands for red seabream aryl hydrocarbon receptor. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 275:116262. [PMID: 38569320 DOI: 10.1016/j.ecoenv.2024.116262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 03/20/2024] [Accepted: 03/23/2024] [Indexed: 04/05/2024]
Abstract
The aryl hydrocarbon receptor (AHR) is a key ligand-dependent transcription factor that mediates the toxic effects of compounds such as dioxin. Recently, natural ligands of AHR, including flavonoids, have been attracting physiological and toxicological attention as they have been reported to regulate major biological functions such as inflammation and anti-cancer activity by reducing the toxic effects of dioxin. Additionally, it is known that natural AHR ligands can accumulate in wildlife tissues, such as those of fish. However, studies in fish have investigated only a few ligands in experimental fish species, and the AHR response of marine fish to natural AHR ligands of various other structures has not been thoroughly investigated. To explore various natural AHR ligands in marine fish, which make up the majority of fish species, it is necessary to develop new screening methods that consider the specificity of marine fish. In this study, we investigated the response to natural ligands by constructing in vitro and in silico experimental systems using red seabream as a model species. We attempted to develop a new predictive model to screen potential ligands that can induce transcriptional activation of red seabream AHR1 and AHR2 (rsAHR1 and rsAHR2). This was achieved through multiple analyses using in silico/in vitro data and Tox21 big data. First, we constructed an in vitro reporter gene assay of rsAHR1 and rsAHR2 and measured the response to 10 representative natural AHR ligands in COS-7 cells. The results showed that FICZ, Genistein, Daidzein, I3C, DIM, Quercetin and Baicalin induced the transcriptional activity of rsAHR1 and rsAHR2, while Resveratrol and Retinol did not induce the transcriptional activity of rsAHR isoforms. Comparing the EC50 values of the respective compounds in rsAHR1 and rsAHR2, FICZ, Genistein, and Daidzein exhibited similar isoform responses, but I3C, Baicalin, DIM and Quercetin showed isoform-specific responses.
These results suggest that natural AHR ligands have specific profiling and transcriptional activity for each rsAHR isoform. For the in silico analysis, we constructed homology models of the ligand binding domains (LBDs) of rsAHR1 and rsAHR2 and calculated the docking energies (U_dock values) of natural ligands with measured in vitro transcriptional activity and of dioxins reported in previous studies. The results showed a significant correlation (R² = 0.74 for rsAHR1, R² = 0.83 for rsAHR2) between docking energy and transcriptional activity (EC50) value, suggesting that the homology models of rsAHR1 and rsAHR2 can be utilized to predict the potential transactivation of ligands. To broaden the applicability of the homology model to diverse compound structures and validate the correlation with transcriptional activity, we conducted additional analyses utilizing Tox21 big data. We calculated the docking energy values in both rsAHR1 and rsAHR2 for 1860 chemicals that were tested for transcriptional activation against human AHR in the Tox21 data. A comparison of the U_dock energy values of 775 active compounds and 1085 inactive compounds showed a significant difference (p < 0.001) between the two groups, suggesting that the U_dock value can be applied to distinguish the activation of compounds. Furthermore, we observed a significant correlation (R² = 0.45) between the AC50 values in the Tox21 database and the U_dock values of the human AHR model. In conclusion, we calculated equations to translate the results of an in silico prediction model for ligand screening of rsAHR1 and rsAHR2 transactivation. This ligand screening model can be a powerful tool to quantitatively estimate AHR transactivation of major marine agents to which red seabream may be exposed.
The study introduces a new screening approach for potential natural AHR ligands in marine fish, based on homology model-docking energy values of rsAHR1 and rsAHR2, with implications for future agonist development and applications bridging in silico and in vitro data.
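The correlation between docking energy and EC50 reported in this abstract can be pictured with an ordinary least-squares fit. The sketch below is only an illustration: the abstract does not give the fitted equations or say whether EC50 was log-transformed, so the log10 transform, the function name `linear_fit_r2`, and all numeric values are hypothetical.

```python
import math


def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept, plus its R²."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical docking energies and EC50 values (molar); not data from the study.
u_dock = [-60.0, -55.0, -50.0, -45.0, -40.0]
ec50_molar = [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
log_ec50 = [math.log10(v) for v in ec50_molar]

slope, intercept, r2 = linear_fit_r2(u_dock, log_ec50)


def predict_log_ec50(u):
    """Translate a docking energy into a predicted log10(EC50) using the fit."""
    return slope * u + intercept
```

Once such a line is fitted on ligands with measured activity, a docking energy computed for an untested compound can be translated into an estimated activity, which is the kind of use the abstract describes for screening.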
Affiliation(s)
- Jong-In Choi
  - Department of Biology, Kyung Hee University, Seoul, Republic of Korea
- Woo-Seon Song
  - Department of Biomedical and Pharmaceutical Sciences, Kyung Hee University, Seoul, Republic of Korea
- Dong-Hee Koh
  - Department of Biomedical and Pharmaceutical Sciences, Kyung Hee University, Seoul, Republic of Korea
- Eun-Young Kim
  - Department of Biology, Kyung Hee University, Seoul, Republic of Korea
  - Department of Biomedical and Pharmaceutical Sciences, Kyung Hee University, Seoul, Republic of Korea
4
Kashyap K, Mahapatra PP, Ahmed S, Buyukbingol E, Siddiqi MI. Identification of Potential Aldose Reductase Inhibitors Using Convolutional Neural Network-Based in Silico Screening. J Chem Inf Model 2023; 63:6261-6282. [PMID: 37788831 DOI: 10.1021/acs.jcim.3c00547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Aldose reductase (ALR2) is a notable enzyme of the polyol pathway responsible for aggravating diabetic neuropathy complications. The first step begins when it catalyzes the reduction of glucose to sorbitol with NADPH as a coenzyme. Elevated concentrations of sorbitol damage the tissues, leading to complications like neuropathy. Although considerable effort has been directed toward discovering potent inhibitors, this remains an elusive task. To this end, we present a 3D convolutional neural network (3D-CNN) based ALR2 inhibitor classification technique that uses snapshot images captured from 3D chemical structures at multiple rotations as input data. The CNN-based architecture was trained on the 360 sets of image data along each axis, and each of the resulting models was then used for prediction on the Maybridge library. Subjecting the retrieved hits to molecular docking led to the identification of the top 10 molecules with high binding affinity. The hits displayed a better blood-brain barrier (BBB) penetration score (90% with scores above four) than standard inhibitors (38%), reflecting the superior BBB penetrating efficiency of the hits. Following molecular docking, biological evaluation highlighted five compounds as promising ALR2 inhibitors that can be considered likely prospects for further structural optimization with medicinal chemistry efforts to improve their inhibition efficacy and consolidate them as new ALR2 antagonists in the future. In addition, the study also demonstrated the usefulness of scaffold analysis of the molecules as a method for investigating the significance of structurally diverse compounds in data-driven studies. For reproducibility and accessibility purposes, all of the source codes used in our study are publicly available.
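The idea of capturing snapshots of a 3D structure at multiple rotations can be sketched as below. This is an assumption-laden illustration rather than the authors' code: `rotate` and `snapshot_views` are hypothetical names, the 90-degree default step is arbitrary, and the rendering of each rotated structure to an image is omitted.

```python
import math


def rotate(point, axis, degrees):
    """Rotate a 3D point about the x, y, or z axis by the given angle."""
    x, y, z = point
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    if axis == "x":
        return (x, y * c - z * s, y * s + z * c)
    if axis == "y":
        return (x * c + z * s, y, -x * s + z * c)
    if axis == "z":
        return (x * c - y * s, x * s + y * c, z)
    raise ValueError(f"unknown axis: {axis!r}")


def snapshot_views(coords, step_degrees=90):
    """Return rotated copies of the atom coordinates, keyed by (axis, angle)."""
    views = {}
    for axis in "xyz":
        for angle in range(0, 360, step_degrees):
            views[(axis, angle)] = [rotate(p, axis, angle) for p in coords]
    return views
```

A smaller step produces more views per axis (a 1-degree step gives 360 per axis in this sketch); each view would then be drawn as an image and passed to the network.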
Affiliation(s)
- Kushagra Kashyap
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Pinaki Prasad Mahapatra
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
- Shakil Ahmed
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
- Erdem Buyukbingol
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Ankara University, 06100 Ankara, Turkey
- Mohammad Imran Siddiqi
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
5
Mamada H, Takahashi M, Ogino M, Nomura Y, Uesawa Y. Predictive Models Based on Molecular Images and Molecular Descriptors for Drug Screening. ACS OMEGA 2023; 8:37186-37195. [PMID: 37841172 PMCID: PMC10568689 DOI: 10.1021/acsomega.3c04073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 08/30/2023] [Indexed: 10/17/2023]
Abstract
Various toxicity and pharmacokinetic evaluations as screening experiments are needed at the drug discovery stage. Currently, to reduce the use of animal experiments and developmental expenses, the development of high-performance predictive models based on quantitative structure-activity relationship analysis is desired. From these evaluation targets, we selected 50% lethal dose (LD50), blood-brain barrier penetration (BBBP), and the clearance (CL) pathway for this investigation and constructed predictive models for each target using 636-11,886 compounds. First, we constructed predictive models using the DeepSnap-deep learning (DL) method and images of compounds as features. The calculated area under the curve (AUC) and balanced accuracy (BAC) were, respectively, 0.887 and 0.818 for LD50, 0.893 and 0.824 for BBBP, and 0.883 and 0.763 for the CL pathway. Next, molecular descriptors (MDs) of compounds were calculated using Molecular Operating Environment, alvaDesc, and ADMET Predictor to construct predictive models using the MD-based method. Using these MDs, we constructed predictive models using DataRobot. The calculated AUC and BAC were, respectively, 0.931 and 0.805 for LD50, 0.919 and 0.849 for BBBP, and 0.900 and 0.807 for the CL pathway. In this investigation, we constructed predictive models combining the DeepSnap-DL and MD-based methods. In ensemble models using the mean predictive probability of the DeepSnap-DL and MD-based methods, the calculated AUC and BAC were, respectively, 0.942 and 0.842 for LD50, 0.936 and 0.853 for BBBP, and 0.908 and 0.832 for the CL pathway, with improved predictive performance observed for all variables compared with either single method alone. 
Moreover, in consensus models that adopted only compounds for which the results of the two methods agreed, the calculated BAC for LD50, BBBP, and the CL pathway were 0.916, 0.918, and 0.847, respectively, indicating higher predictive performance than the ensemble models for all three variables. The predictive models combining the DeepSnap-DL and MD-based methods displayed high predictive performance for LD50, BBBP, and the CL pathway. Therefore, the application of this approach to prediction targets in various drug discovery screenings is expected to accelerate drug discovery.
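The two ways of combining models mentioned in this abstract, an ensemble that averages predicted probabilities and a consensus that keeps only compounds on which both methods agree, can be sketched as follows. The function names, the 0.5 threshold, and the 0/1 label encoding are assumptions for illustration, and the code is not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0 or 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def ensemble_predict(prob_a, prob_b, threshold=0.5):
    """Label each compound from the mean of the two predicted probabilities."""
    return [int((a + b) / 2 >= threshold) for a, b in zip(prob_a, prob_b)]


def consensus_predict(prob_a, prob_b, threshold=0.5):
    """Return (index, label) only for compounds where both models agree."""
    kept = []
    for i, (a, b) in enumerate(zip(prob_a, prob_b)):
        label_a, label_b = int(a >= threshold), int(b >= threshold)
        if label_a == label_b:
            kept.append((i, label_a))
    return kept
```

The consensus approach trades coverage for accuracy: compounds on which the two models disagree receive no prediction, so its balanced accuracy is computed on a subset of the test set.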
Affiliation(s)
- Hideaki Mamada
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Mari Takahashi
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Mizuki Ogino
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose, Tokyo 204-8588, Japan
6
Ensemble Learning, Deep Learning-Based and Molecular Descriptor-Based Quantitative Structure-Activity Relationships. Molecules 2023; 28:2410. [PMID: 36903654 PMCID: PMC10005768 DOI: 10.3390/molecules28052410] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/06/2023] [Revised: 02/28/2023] [Accepted: 03/01/2023] [Indexed: 03/09/2023]
Abstract
A deep learning-based quantitative structure-activity relationship analysis, namely the molecular image-based DeepSNAP-deep learning method, can successfully and automatically capture the spatial and temporal features in an image generated from a three-dimensional (3D) structure of a chemical compound. It allows building high-performance prediction models without extracting and selecting features because of its powerful feature discrimination capability. Deep learning (DL) is based on a neural network with multiple intermediate layers that makes it possible to solve highly complex problems and improve the prediction accuracy by increasing the number of hidden layers. However, DL models are too complex when it comes to understanding the derivation of predictions. Instead, molecular descriptor-based machine learning has clear features owing to the selection and analysis of features. However, molecular descriptor-based machine learning has some limitations in terms of prediction performance, calculation cost, feature selection, etc., while the DeepSNAP-deep learning method outperforms molecular descriptor-based machine learning due to the utilization of 3D structure information and the advanced computer processing power of DL.
7
Yoda T, Tochitani T, Usui T, Kouchi M, Inada H, Hosaka T, Kanno Y, Miyawaki I, Yoshinari K. Involvement of the CYP1A1 inhibition-mediated activation of aryl hydrocarbon receptor in drug-induced hepatotoxicity. J Toxicol Sci 2022; 47:359-373. [PMID: 36047110 DOI: 10.2131/jts.47.359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/02/2022]
Abstract
Hepatotoxicity is one of the most common toxicities observed in non-clinical safety studies of drug candidates, and it is important to understand the hepatotoxicity mechanism to assess the risk of drug-induced liver injury in humans. In this study, we investigated the mechanism of hepatotoxicity caused by 2-[2-Methyl-1-(oxan-4-yl)-1H-benzimidazol-5-yl]-1,3-benzoxazole (DSP-0640), a drug candidate that showed hepatotoxicity characterized by centrilobular hypertrophy and vacuolation of hepatocytes in a 4-week oral repeated-dose toxicity study in male rats. In the liver of rats treated with DSP-0640, the expression of aryl hydrocarbon receptor (AHR) target genes, including Cyp1a1, was upregulated. In in vitro reporter assays, however, DSP-0640 showed only minimal AHR-activating potency. Therefore, we investigated the possibility that DSP-0640 indirectly activated AHR by inhibiting the CYP1 enzyme-dependent clearance of endogenous AHR agonists. In in vitro assays, DSP-0640 showed inhibitory effects on both rat and human CYP1A1 and enhanced rat and human AHR-mediated reporter gene expression induced by 6-formylindolo[3,2-b]carbazole, a well-known endogenous AHR agonist. The possible involvement of CYP1A1 inhibition in AHR activation was also demonstrated with other hepatotoxic compounds tacrine and albendazole. These results suggest that CYP1A1 inhibition-mediated AHR activation is involved in the hepatotoxicity caused by DSP-0640 and that DSP-0640 might induce hepatotoxicity in humans as well. We propose that CYP1A1 inhibition-mediated AHR activation is a novel mechanism for drug-induced hepatotoxicity.
Affiliation(s)
- Tomomi Yoda
  - Preclinical Research Unit, Sumitomo Pharma Co., Ltd.
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka
- Toru Usui
  - Preclinical Research Unit, Sumitomo Pharma Co., Ltd.
- Mami Kouchi
  - Preclinical Research Unit, Sumitomo Pharma Co., Ltd.
- Takuomi Hosaka
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka
- Yuichiro Kanno
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka
- Kouichi Yoshinari
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka
8
Mamada H, Nomura Y, Uesawa Y. Novel QSAR Approach for a Regression Model of Clearance That Combines DeepSnap-Deep Learning and Conventional Machine Learning. ACS OMEGA 2022; 7:17055-17062. [PMID: 35647436 PMCID: PMC9134387 DOI: 10.1021/acsomega.2c00261] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/13/2022] [Accepted: 04/29/2022] [Indexed: 05/03/2023]
Abstract
The toxicity, absorption, distribution, metabolism, and excretion properties of some targets are difficult to predict by quantitative structure-activity relationship analysis. Therefore, there is a need for a new prediction method that performs well for these targets. The aim of this study was to develop a new regression model of rat clearance (CL). We constructed a regression model using 1545 in-house compounds for which we had rat CL data. Molecular descriptors were calculated using Molecular Operating Environment, alvaDesc, and ADMET Predictor software. The classification model of DeepSnap and Deep Learning (DeepSnap-DL) with images of the three-dimensional chemical structures of compounds as features was constructed, and the prediction probabilities for each compound were calculated. For molecular descriptor-based methods that use molecular descriptors and conventional machine learning algorithms selected by DataRobot, the correlation coefficient (R²) and root mean square error (RMSE) were 0.625-0.669 and 0.295-0.318, respectively. We combined molecular descriptors and prediction probability of DeepSnap-DL as features and developed a novel regression method we called the combination model. In the combination model with these two types of features and conventional algorithms selected by DataRobot, R² and RMSE were 0.710-0.769 and 0.247-0.278, respectively. This finding shows that the combination model performed better than molecular descriptor-based methods. Our combination model will contribute to the design of more rational compounds for drug discovery. This method may be applicable not only to rat CL but also to other pharmacokinetic and pharmacological activity and toxicity parameters; therefore, applying it to other parameters may help to accelerate drug discovery.
Affiliation(s)
- Hideaki Mamada
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose, Tokyo 204-8588, Japan
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose, Tokyo 204-8588, Japan
  - Phone: +81-42-495-8983. Fax: +81-42-495-8983
9
Matsuzaka Y, Totoki S, Handa K, Shiota T, Kurosaki K, Uesawa Y. Prediction Models for Agonists and Antagonists of Molecular Initiation Events for Toxicity Pathways Using an Improved Deep-Learning-Based Quantitative Structure-Activity Relationship System. Int J Mol Sci 2021; 22:10821. [PMID: 34639159 PMCID: PMC8509615 DOI: 10.3390/ijms221910821] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/29/2021] [Revised: 09/29/2021] [Accepted: 10/04/2021] [Indexed: 12/11/2022]
Abstract
In silico approaches have been studied intensively to assess the toxicological risk of various chemical compounds as alternatives to traditional in vivo animal tests. Among these approaches, quantitative structure-activity relationship (QSAR) analysis has the advantage that it can construct models to predict the biological properties of chemicals based on structural information. Previously, we reported a deep learning (DL) algorithm-based QSAR approach called DeepSnap-DL for high-performance prediction modeling of the agonist and antagonist activity of key molecules in molecular initiating events in toxicological pathways using optimized hyperparameters. In the present study, we propose an improved DeepSnap-DL approach to achieve high throughput in the DeepSnap-DL system, which consists of the preparation of three-dimensional molecular structures of chemical compounds, the generation of snapshot images from the three-dimensional chemical structures, DL, and statistical calculations. Using this improved system, we constructed 59 prediction models for the agonist and antagonist activity of key molecules in the Tox21 10K library. The results indicate that modeling of the agonist and antagonist activity with high prediction performance and high throughput can be achieved by optimizing suitable parameters in the improved DeepSnap-DL system.
Affiliation(s)
- Yasunari Matsuzaka
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Kiyose, Tokyo 204-8588, Japan
  - Center for Gene and Cell Therapy, Division of Molecular and Medical Genetics, The Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
- Shin Totoki
  - Fujitsu Limited, Kawasaki-shi, Kanagawa 211-8588, Japan
- Kentaro Handa
  - Fujitsu Limited, Kawasaki-shi, Kanagawa 211-8588, Japan
- Tetsuyoshi Shiota
  - Fujitsu Limited, Kawasaki-shi, Kanagawa 211-8588, Japan
- Kota Kurosaki
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Kiyose, Tokyo 204-8588, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Kiyose, Tokyo 204-8588, Japan
10
Mamada H, Nomura Y, Uesawa Y. Prediction Model of Clearance by a Novel Quantitative Structure-Activity Relationship Approach, Combination DeepSnap-Deep Learning and Conventional Machine Learning. ACS OMEGA 2021; 6:23570-23577. [PMID: 34549154 PMCID: PMC8444299 DOI: 10.1021/acsomega.1c03689] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/13/2021] [Accepted: 08/23/2021] [Indexed: 05/19/2023]
Abstract
Some targets predicted by machine learning (ML) in drug discovery remain a challenge because of poor prediction. In this study, a new prediction model was developed and rat clearance (CL) was selected as a target because it is difficult to predict. A classification model was constructed using 1545 in-house compounds with rat CL data. The molecular descriptors calculated by Molecular Operating Environment (MOE), alvaDesc, and ADMET Predictor software were used to construct the prediction model. In conventional ML using 100 descriptors and random forest selected by DataRobot, the area under the curve (AUC) and accuracy (ACC) were 0.883 and 0.825, respectively. Conversely, the prediction model using DeepSnap and Deep Learning (DeepSnap-DL) with compound features as images had AUC and ACC of 0.905 and 0.832, respectively. We combined the two models (conventional ML and DeepSnap-DL) to develop a novel prediction model. Using the ensemble model with the mean of the predicted probabilities from each model improved the evaluation metrics (AUC = 0.943 and ACC = 0.874). In addition, a consensus model using the results of the agreement between classifications had an increased ACC (0.959). These combination models with a high level of predictive performance can be applied to rat CL as well as other pharmacokinetic parameters, pharmacological activity, and toxicity prediction. Therefore, these models will aid in the design of more rational compounds for the development of drugs.
Affiliation(s)
- Hideaki Mamada
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-8588, Japan
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-8588, Japan
  - Tel.: +81-42-495-8983. Fax: +81-42-495-8983
11
|
Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. JOURNAL OF BIG DATA 2020; 7:94. [PMID: 33169094 PMCID: PMC7610170 DOI: 10.1186/s40537-020-00369-8] [Citation(s) in RCA: 181] [Impact Index Per Article: 45.3] [Received: 08/05/2020] [Accepted: 10/19/2020] [Indexed: 05/25/2023]
Abstract
Gradient Boosted Decision Trees (GBDT's) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDT's in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as studies where CatBoost does not outshine other techniques, since we can learn lessons from both types of scenarios. Furthermore, as a Decision Tree based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost's effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.
Affiliation(s)
- John T. Hancock
- Florida Atlantic University, 777 Glades Road, Boca Raton, FL USA
|
12
|
Wu G, Zhou S, Wang Y, Lv W, Wang S, Wang T, Li X. A prediction model of outcome of SARS-CoV-2 pneumonia based on laboratory findings. Sci Rep 2020; 10:14042. [PMID: 32820210 PMCID: PMC7441177 DOI: 10.1038/s41598-020-71114-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/26/2020] [Accepted: 08/10/2020] [Indexed: 01/08/2023] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in thousands of deaths in the world. Information about prediction model of prognosis of SARS-CoV-2 infection is scarce. We used machine learning for processing laboratory findings of 110 patients with SARS-CoV-2 pneumonia (including 51 non-survivors and 59 discharged patients). The maximum relevance minimum redundancy (mRMR) algorithm and the least absolute shrinkage and selection operator logistic regression model were used for selection of laboratory features. Seven laboratory features selected in the model were: prothrombin activity, urea, white blood cell, interleukin-2 receptor, indirect bilirubin, myoglobin, and fibrinogen degradation products. The signature constructed using the seven features had 98% [93%, 100%] sensitivity and 91% [84%, 99%] specificity in predicting outcome of SARS-CoV-2 pneumonia. Thus it is feasible to establish an accurate prediction model of outcome of SARS-CoV-2 pneumonia based on laboratory findings.
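For readers less familiar with the two metrics quoted in this abstract, the sketch below shows how sensitivity and specificity follow from binary predictions. The labels are hypothetical toy values, not the study's data, and the choice of which outcome counts as the positive class (here coded as 1) is an assumption, since the abstract does not state it.

```python
from typing import List, Tuple


def sensitivity_specificity(truth: List[int], pred: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for 0/1 labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the true labels")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: 4 positive cases and 6 negative cases.
TOY_TRUTH = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
TOY_PRED = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
SENS, SPEC = sensitivity_specificity(TOY_TRUTH, TOY_PRED)
```

Sensitivity is the fraction of true positive cases that the signature flags, and specificity is the fraction of true negative cases that it clears.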
Affiliation(s)
- Gang Wu
- Department of Radiology, Tongji Hospital of Tongji Medical College of Huazhong University of Science and Technology, Wuhan, China
- Shuchang Zhou
- Department of Radiology, Tongji Hospital of Tongji Medical College of Huazhong University of Science and Technology, Wuhan, China
- Yujin Wang
- Department of Radiology, Tongji Hospital of Tongji Medical College of Huazhong University of Science and Technology, Wuhan, China
- Shili Wang
- Computational Biology, Carnegie Mellon University, Pittsburgh, USA
- Ting Wang
- Department of Medical Ultrasound, Tongji Hospital of Tongji Medical College of Huazhong University of Science and Technology, Wuhan, China
- Xiaoming Li
- Department of Radiology, Tongji Hospital of Tongji Medical College of Huazhong University of Science and Technology, Wuhan, China
|
13
|
Matsuzaka Y, Uesawa Y. Molecular Image-Based Prediction Models of Nuclear Receptor Agonists and Antagonists Using the DeepSnap-Deep Learning Approach with the Tox21 10K Library. Molecules 2020; 25:molecules25122764. [PMID: 32549344 PMCID: PMC7356846 DOI: 10.3390/molecules25122764] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/07/2020] [Revised: 06/06/2020] [Accepted: 06/12/2020] [Indexed: 02/07/2023] Open
Abstract
The interaction of nuclear receptors (NRs) with chemical compounds can cause dysregulation of endocrine signaling pathways, leading to adverse health outcomes due to the disruption of natural hormones. Thus, identifying possible ligands of NRs is a crucial task for understanding the adverse outcome pathway (AOP) for human toxicity as well as the development of novel drugs. However, the experimental assessment of novel ligands remains expensive and time-consuming. Therefore, an in silico approach with a wide range of applications instead of experimental examination is highly desirable. The recently developed novel molecular image-based deep learning (DL) method, DeepSnap-DL, can produce multiple snapshots from three-dimensional (3D) chemical structures and has achieved high performance in the prediction of chemicals for toxicological evaluation. In this study, we used DeepSnap-DL to construct prediction models of 35 agonist and antagonist allosteric modulators of NRs for chemicals derived from the Tox21 10K library. We demonstrate the high performance of DeepSnap-DL in constructing prediction models. These findings may aid in interpreting the key molecular events of toxicity and support the development of new fields of machine learning to identify environmental chemicals with the potential to interact with NR signaling pathways.
|