1
Kim D, Jeong J, Choi J. Identification of Optimal Machine Learning Algorithms and Molecular Fingerprints for Explainable Toxicity Prediction Models Using ToxCast/Tox21 Bioassay Data. ACS Omega 2024; 9:37934-37941. [PMID: 39281924 PMCID: PMC11391437 DOI: 10.1021/acsomega.4c04474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/18/2024]
Abstract
Recent studies have primarily focused on introducing novel frameworks to enhance the predictive power of toxicity prediction models by refining molecular representation methods and algorithms. However, these methods are inherently complex and are often difficult to understand and explain, which creates barriers to their regulatory adoption and validation. Therefore, it is necessary to select the optimal model, considering not only model performance but also interpretability. This study aimed to identify the optimal combination of molecular fingerprints (pattern-based versus algorithm-based) and machine learning algorithms (simple versus complex) for developing explainable toxicity prediction models through a comprehensive investigation of the ToxCast/Tox21 bioassay data set. For 1092 ToxCast/Tox21 assays, five molecular fingerprints (MACCS, Morgan, RDKit, Layered, and Patterned) and six algorithms (MLP, GBT, Random Forest, kNN, Logistic Regression, and Naïve Bayes) were used to train the models. Results showed that 35 models achieved acceptable performance (F1 score or accuracy of 0.8 or higher). Among the combinations, either MACCS or Morgan, paired with Random Forest, demonstrated robust performance compared with other molecular fingerprints and algorithms. MACCS and Random Forest are valuable, even when prioritizing interpretability. Consequently, MACCS-Random Forest combination models based on four assays, targeting G protein-coupled receptors and kinases, were identified; they can be used to discern specific structural features or patterns in chemical compounds, offering explainable insights into toxicity-related chemical structures. This study indicates the importance of not disregarding simple models when assessing both predictivity and interpretability within the context of chemical feature-based Tox21 data analysis.
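The acceptance criterion quoted in this abstract (F1 score or accuracy of 0.8 or higher) can be illustrated with a minimal sketch. The function names and the confusion-matrix counts below are hypothetical and are not taken from the paper; the sketch only assumes the "or" reading of the criterion as stated in the abstract.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def is_acceptable(tp, fp, tn, fn, threshold=0.8):
    """Hypothetical helper: passes if either metric reaches the threshold."""
    return f1_score(tp, fp, fn) >= threshold or accuracy(tp, fp, tn, fn) >= threshold


# Illustrative counts for an imbalanced assay (few actives, many inactives).
print(f1_score(tp=1, fp=5, fn=4))             # low F1 on the active class
print(accuracy(tp=1, fp=5, tn=90, fn=4))      # high accuracy driven by inactives
print(is_acceptable(tp=1, fp=5, tn=90, fn=4))
```

With imbalanced assay data, accuracy alone can reach the threshold even when few actives are recovered, which is one reason to inspect both metrics side by side.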
Affiliation(s)
- Donghyeon Kim
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jaeseong Jeong
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jinhee Choi
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
2
Li Y, Liu B, Deng J, Guo Y, Du H. Image-based molecular representation learning for drug development: a survey. Brief Bioinform 2024; 25:bbae294. [PMID: 38920347 PMCID: PMC11200195 DOI: 10.1093/bib/bbae294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/19/2024] [Accepted: 06/08/2024] [Indexed: 06/27/2024] Open
Abstract
Artificial intelligence (AI)-powered drug development has received remarkable attention in recent years. It addresses the limitations of traditional experimental methods, which are costly and time-consuming. While many surveys have attempted to summarize related research, they focus only on general AI or on specific aspects such as natural language processing and graph neural networks. Considering the rapid advances in computer vision, using the molecular image to enable AI appears to be a more intuitive and effective approach, since each chemical substance has a unique visual representation. In this paper, we provide the first survey on image-based molecular representation for drug development. The survey proposes a taxonomy based on the learning paradigms in computer vision and reviews a large number of corresponding papers, highlighting the contributions of molecular visual representation in drug development. In addition, we discuss the applications, limitations, and future directions in the field. We hope this survey offers valuable insight into the use of image-based molecular representation learning in the context of drug development.
Affiliation(s)
- Yue Li
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Bingyan Liu
  - School of Computer Science, Beijing University of Posts and Telecommunications, No.10 Xituchen Street, 100876, Beijing, China
- Jinyan Deng
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Yi Guo
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Hongbo Du
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
  - Institute of Liver Disease, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
3
Uesawa Y. Efficiency of pharmaceutical toxicity prediction in computational toxicology. Toxicol Res 2024; 40:1-9. [PMID: 38223665 PMCID: PMC10786748 DOI: 10.1007/s43188-023-00215-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 09/25/2023] [Accepted: 10/11/2023] [Indexed: 01/16/2024] Open
Abstract
The adverse effects and toxicity of chemical substances pose substantial challenges in drug discovery and environmental science. Their management, most especially in the early development stage, is crucial in preventing costly failures in clinical trials. Predictive methodologies, such as computational toxicology, offer an effective means of managing risks, particularly for new compounds with insufficient post-marketing surveillance and those lacking information on adverse effects. Computational approaches have become increasingly important in environmental science, in which the sheer number and diversity of chemicals present similar challenges to toxicity control. Traditional animal-based evaluation methods are resource intensive, time consuming, and ethically problematic, making them unsuitable for use in assessing the vast compound range. It is an urgent task for the academic community to minimize the risks associated with drug discovery and environmental exposure. This study focuses on systems used to predict toxicity from chemical structure information and outlines the prediction accuracy and systems developed in Japan.
Affiliation(s)
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose, Tokyo 204-8588, Japan
4
Li H, Zhang R, Min Y, Ma D, Zhao D, Zeng J. A knowledge-guided pre-training framework for improving molecular representation learning. Nat Commun 2023; 14:7568. [PMID: 37989998 PMCID: PMC10663446 DOI: 10.1038/s41467-023-43214-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 11/03/2023] [Indexed: 11/23/2023] Open
Abstract
Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph Transformer (KPGT), a self-supervised learning framework to alleviate the aforementioned issues and provide generalizable and robust molecular representations. The KPGT framework integrates a graph transformer specifically designed for molecular graphs and a knowledge-guided pre-training strategy, to fully capture both structural and semantic knowledge of molecules. Through extensive computational tests on 63 datasets, KPGT exhibits superior performance in predicting molecular properties across various domains. Moreover, the practical applicability of KPGT in drug discovery has been validated by identifying potential inhibitors of two antitumor targets: hematopoietic progenitor kinase 1 (HPK1) and fibroblast growth factor receptor 1 (FGFR1). Overall, KPGT can provide a powerful and useful tool for advancing the artificial intelligence (AI)-aided drug discovery process.
Affiliation(s)
- Han Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Ruotian Zhang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Yaosen Min
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Dacheng Ma
  - Research Center for Biological Computation, Zhejiang Province, Zhejiang Laboratory, 311100, Hangzhou, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
  - School of Engineering, Westlake University, Zhejiang Province, 310030, Hangzhou, China
5
Kashyap K, Mahapatra PP, Ahmed S, Buyukbingol E, Siddiqi MI. Identification of Potential Aldose Reductase Inhibitors Using Convolutional Neural Network-Based in Silico Screening. J Chem Inf Model 2023; 63:6261-6282. [PMID: 37788831 DOI: 10.1021/acs.jcim.3c00547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Aldose reductase (ALR2) is a notable enzyme of the polyol pathway responsible for aggravating diabetic neuropathy complications. The first step begins when it catalyzes the reduction of glucose to sorbitol with NADPH as a coenzyme. Elevated concentrations of sorbitol damage the tissues, leading to complications like neuropathy. Though considerable effort has been pushed toward the successful discovery of potent inhibitors, its discovery still remains an elusive task. To this end, we present a 3D convolutional neural network (3D-CNN) based ALR2 inhibitor classification technique by dealing with snapshots of images captured from 3D chemical structures with multiple rotations as input data. The CNN-based architecture was trained on the 360 sets of image data along each axis and further prediction on the Maybridge library by each of the models. Subjecting the retrieved hits to molecular docking leads to the identification of the top 10 molecules with high binding affinity. The hits displayed a better blood-brain barrier penetration (BBB) score (90% with more than four scores) as compared to standard inhibitors (38%), reflecting the superior BBB penetrating efficiency of the hits. Followed by molecular docking, the biological evaluation spotlighted five compounds as promising ALR2 inhibitors and can be considered as a likely prospect for further structural optimization with medicinal chemistry efforts to improve their inhibition efficacy and consolidate them as new ALR2 antagonists in the future. In addition, the study also demonstrated the usefulness of scaffold analysis of the molecules as a method for investigating the significance of structurally diverse compounds in data-driven studies. For reproducibility and accessibility purposes, all of the source codes used in our study are publicly available.
Affiliation(s)
- Kushagra Kashyap
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Pinaki Prasad Mahapatra
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
- Shakil Ahmed
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
- Erdem Buyukbingol
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Ankara University, 06100 Ankara, Turkey
- Mohammad Imran Siddiqi
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
6
Mamada H, Takahashi M, Ogino M, Nomura Y, Uesawa Y. Predictive Models Based on Molecular Images and Molecular Descriptors for Drug Screening. ACS Omega 2023; 8:37186-37195. [PMID: 37841172 PMCID: PMC10568689 DOI: 10.1021/acsomega.3c04073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 08/30/2023] [Indexed: 10/17/2023]
Abstract
Various toxicity and pharmacokinetic evaluations as screening experiments are needed at the drug discovery stage. Currently, to reduce the use of animal experiments and developmental expenses, the development of high-performance predictive models based on quantitative structure-activity relationship analysis is desired. From these evaluation targets, we selected 50% lethal dose (LD50), blood-brain barrier penetration (BBBP), and the clearance (CL) pathway for this investigation and constructed predictive models for each target using 636-11,886 compounds. First, we constructed predictive models using the DeepSnap-deep learning (DL) method and images of compounds as features. The calculated area under the curve (AUC) and balanced accuracy (BAC) were, respectively, 0.887 and 0.818 for LD50, 0.893 and 0.824 for BBBP, and 0.883 and 0.763 for the CL pathway. Next, molecular descriptors (MDs) of compounds were calculated using Molecular Operating Environment, alvaDesc, and ADMET Predictor to construct predictive models using the MD-based method. Using these MDs, we constructed predictive models using DataRobot. The calculated AUC and BAC were, respectively, 0.931 and 0.805 for LD50, 0.919 and 0.849 for BBBP, and 0.900 and 0.807 for the CL pathway. In this investigation, we constructed predictive models combining the DeepSnap-DL and MD-based methods. In ensemble models using the mean predictive probability of the DeepSnap-DL and MD-based methods, the calculated AUC and BAC were, respectively, 0.942 and 0.842 for LD50, 0.936 and 0.853 for BBBP, and 0.908 and 0.832 for the CL pathway, with improved predictive performance observed for all variables compared with either single method alone. 
Moreover, in consensus models that adopted only compounds for which the results of the two methods agreed, the calculated BAC for LD50, BBBP, and the CL pathway were 0.916, 0.918, and 0.847, respectively, indicating higher predictive performance than the ensemble models for all three variables. The predictive models combining the DeepSnap-DL and MD-based methods displayed high predictive performance for LD50, BBBP, and the CL pathway. Therefore, the application of this approach to prediction targets in various drug discovery screenings is expected to accelerate drug discovery.
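The two combination strategies described in this abstract, an ensemble that averages the predicted probabilities of the two methods and a consensus model that keeps only compounds on which the two methods agree, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the 0.5 classification cutoff, and the example probabilities are hypothetical and do not come from the paper.

```python
def ensemble_mean(p_image, p_descriptor):
    """Average the per-compound probabilities from the two models."""
    return [(a + b) / 2 for a, b in zip(p_image, p_descriptor)]


def consensus(p_image, p_descriptor, cutoff=0.5):
    """Return (index, label) only for compounds where both models agree.

    The cutoff of 0.5 is an assumption made for this sketch.
    """
    kept = []
    for i, (a, b) in enumerate(zip(p_image, p_descriptor)):
        label_a, label_b = int(a >= cutoff), int(b >= cutoff)
        if label_a == label_b:
            kept.append((i, label_a))
    return kept


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0 or 1)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return (tp / pos + tn / neg) / 2


# Hypothetical probabilities from an image-based and a descriptor-based model.
p_img = [0.9, 0.4, 0.2, 0.7]
p_md = [0.8, 0.7, 0.1, 0.2]
y = [1, 1, 0, 0]

ens_labels = [int(p >= 0.5) for p in ensemble_mean(p_img, p_md)]
print(balanced_accuracy(y, ens_labels))
print(consensus(p_img, p_md))
```

A consensus model of this kind scores only the retained compounds, so its metrics describe a smaller set than the ensemble's; the fraction of compounds kept is worth reporting next to the balanced accuracy.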
Affiliation(s)
- Hideaki Mamada
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Mari Takahashi
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Mizuki Ogino
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1 Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose, Tokyo 204-8588, Japan
7
Miao Y, Ma H, Huang J. Recent Advances in Toxicity Prediction: Applications of Deep Graph Learning. Chem Res Toxicol 2023; 36:1206-1226. [PMID: 37562046 DOI: 10.1021/acs.chemrestox.2c00384] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/12/2023]
Abstract
The development of new drugs is time-consuming and expensive, and as such, accurately predicting the potential toxicity of a drug candidate is crucial in ensuring its safety and efficacy. Recently, deep graph learning has become prevalent in this field due to its computational power and cost efficiency. Many novel deep graph learning methods aid toxicity prediction and further prompt drug development. This review aims to connect fundamental knowledge with burgeoning deep graph learning methods. We first summarize the essential components of deep graph learning models for toxicity prediction, including molecular descriptors, molecular representations, evaluation metrics, validation methods, and data sets. Furthermore, based on various graph-related representations of molecules, we introduce several representative studies and methods for toxicity prediction from the perspective of GNN architectures and graph pretrained models. Compared to other types of models, deep graph models not only advance in higher accuracy and efficiency but also provide more intuitive insights, which is significant in the development of model interpretation and generalization ability. The graph pretrained models are emerging as they can extract prominent features from large-scale unlabeled molecular graph data and improve the performance of downstream toxicity prediction tasks. We hope this survey can serve as a handbook for individuals interested in exploring deep graph learning for toxicity prediction.
Affiliation(s)
- Yuwei Miao
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Hehuan Ma
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Junzhou Huang
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
8
Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023; 24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] Open
Abstract
In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to harness the vast potential of machine learning (ML) techniques for predictive molecular design and analysis. This review delves into the fundamental aspects of chemoinformatics, elucidating the intricate nature of chemical data and the crucial role of molecular descriptors in unveiling the underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, in conjunction with structure-activity relationships (SARs), are pivotal in unlocking the pathway to small-molecule drug discovery. The technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed. Various ML algorithms, such as regression analysis and support vector machines, are showcased for their ability to predict and explain the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. With these cutting-edge technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
Affiliation(s)
- Sarfaraz K Niazi
  - College of Pharmacy, University of Illinois, Chicago, IL 61820, USA
- Zamara Mariam
  - School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Islamabad 24090, Pakistan
9
Ensemble Learning, Deep Learning-Based and Molecular Descriptor-Based Quantitative Structure-Activity Relationships. Molecules 2023; 28:2410. [PMID: 36903654 PMCID: PMC10005768 DOI: 10.3390/molecules28052410] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/06/2023] [Revised: 02/28/2023] [Accepted: 03/01/2023] [Indexed: 03/09/2023] Open
Abstract
A deep learning-based quantitative structure-activity relationship analysis, namely the molecular image-based DeepSNAP-deep learning method, can successfully and automatically capture the spatial and temporal features in an image generated from a three-dimensional (3D) structure of a chemical compound. It allows building high-performance prediction models without extracting and selecting features because of its powerful feature discrimination capability. Deep learning (DL) is based on a neural network with multiple intermediate layers that makes it possible to solve highly complex problems and improve the prediction accuracy by increasing the number of hidden layers. However, DL models are too complex when it comes to understanding the derivation of predictions. Instead, molecular descriptor-based machine learning has clear features owing to the selection and analysis of features. However, molecular descriptor-based machine learning has some limitations in terms of prediction performance, calculation cost, feature selection, etc., while the DeepSNAP-deep learning method outperforms molecular descriptor-based machine learning due to the utilization of 3D structure information and the advanced computer processing power of DL.
10
Jeong J, Choi J. Artificial Intelligence-Based Toxicity Prediction of Environmental Chemicals: Future Directions for Chemical Management Applications. Environ Sci Technol 2022; 56:7532-7543. [PMID: 35666838 DOI: 10.1021/acs.est.1c07413] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Indexed: 06/15/2023]
Abstract
Recently, research on the development of artificial intelligence (AI)-based computational toxicology models that predict toxicity without the use of animal testing has emerged because of the rapid development of computer technology. Various computational toxicology techniques that predict toxicity based on the structure of chemical substances are gaining attention, including the quantitative structure-activity relationship. To understand the recent development of these models, we analyzed the databases, molecular descriptors, fingerprints, and algorithms considered in recent studies. Based on a selection of 96 papers published since 2014, we found that AI models have been developed to predict approximately 30 different toxicity end points using more than 20 toxicity databases. For model development, molecular access system and extended-connectivity fingerprints are the most commonly used molecular descriptors. The most used algorithm among the machine learning techniques is the random forest, while the most used algorithm among the deep learning techniques is a deep neural network. The use of AI technology in the development of toxicity prediction models is a new concept that will aid in achieving a scientific accord and meet regulatory applications. The comprehensive overview provided in this study will provide a useful guide for the further development and application of toxicity prediction models.
Affiliation(s)
- Jaeseong Jeong
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
- Jinhee Choi
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
11
Mamada H, Nomura Y, Uesawa Y. Novel QSAR Approach for a Regression Model of Clearance That Combines DeepSnap-Deep Learning and Conventional Machine Learning. ACS Omega 2022; 7:17055-17062. [PMID: 35647436 PMCID: PMC9134387 DOI: 10.1021/acsomega.2c00261] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/13/2022] [Accepted: 04/29/2022] [Indexed: 05/03/2023]
Abstract
The toxicity, absorption, distribution, metabolism, and excretion properties of some targets are difficult to predict by quantitative structure-activity relationship analysis. Therefore, there is a need for a new prediction method that performs well for these targets. The aim of this study was to develop a new regression model of rat clearance (CL). We constructed a regression model using 1545 in-house compounds for which we had rat CL data. Molecular descriptors were calculated using Molecular Operating Environment, alvaDesc, and ADMET Predictor software. The classification model of DeepSnap and Deep Learning (DeepSnap-DL) with images of the three-dimensional chemical structures of compounds as features was constructed, and the prediction probabilities for each compound were calculated. For molecular descriptor-based methods that use molecular descriptors and conventional machine learning algorithms selected by DataRobot, the correlation coefficient (R²) and root mean square error (RMSE) were 0.625-0.669 and 0.295-0.318, respectively. We combined molecular descriptors and prediction probability of DeepSnap-DL as features and developed a novel regression method we called the combination model. In the combination model with these two types of features and conventional algorithms selected by DataRobot, R² and RMSE were 0.710-0.769 and 0.247-0.278, respectively. This finding shows that the combination model performed better than molecular descriptor-based methods. Our combination model will contribute to the design of more rational compounds for drug discovery. This method may be applicable not only to rat CL but also to other pharmacokinetic and pharmacological activity and toxicity parameters; therefore, applying it to other parameters may help to accelerate drug discovery.
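The two regression metrics reported in this abstract can be computed as in the sketch below. It assumes R² is the coefficient of determination, 1 - SS_res/SS_tot; the abstract does not say whether that definition or a squared Pearson correlation was used, and the function names and sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Both metrics depend on the scale of the target; the abstract does not state the units or any transformation applied to clearance, so the RMSE values quoted there should be read with that in mind.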
Affiliation(s)
- Hideaki Mamada
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose, Tokyo 204-8588, Japan
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose, Tokyo 204-8588, Japan
  - Phone: +81-42-495-8983. Fax: +81-42-495-8983
12
Matsuzaka Y, Uesawa Y. A Deep Learning-Based Quantitative Structure-Activity Relationship System Construct Prediction Model of Agonist and Antagonist with High Performance. Int J Mol Sci 2022; 23:2141. [PMID: 35216254 PMCID: PMC8877122 DOI: 10.3390/ijms23042141] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/26/2022] [Revised: 02/12/2022] [Accepted: 02/14/2022] [Indexed: 01/27/2023] Open
Abstract
Molecular design and evaluation for drug development and chemical safety assessment have been advanced by quantitative structure–activity relationship (QSAR) analysis using artificial intelligence techniques, such as deep learning (DL). Previously, we reported high-performance prediction models of molecular initiation events (MIEs) of adverse toxicological outcomes using a DL-based QSAR method called DeepSnap-DL. As a novel QSAR analytical system, this method can extract feature values from images generated from a three-dimensional (3D) chemical structure. However, there is room to reduce the time this system requires. Therefore, in this study, we constructed an improved DeepSnap-DL system by combining the processes of generating an image from a 3D chemical structure, DL using the image as input data, and statistical calculation of prediction performance. Consequently, we found that the three prediction models of agonists or antagonists of MIEs achieved high prediction performance when the parameters of DeepSnap, such as the angle used in depicting the image of a 3D chemical structure, the data split, and the DL hyperparameters, were optimized. The improved DeepSnap-DL system will be a powerful tool for computer-aided molecular design as a novel QSAR system.
Affiliation(s)
- Yasunari Matsuzaka
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Kiyose 204-8588, Japan
  - Center for Gene and Cell Therapy, Division of Molecular and Medical Genetics, The Institute of Medical Science, University of Tokyo, Minato City 108-8639, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Kiyose 204-8588, Japan
  - Correspondence: Tel.: +81-42-495-8983
13
Mamada H, Nomura Y, Uesawa Y. Prediction Model of Clearance by a Novel Quantitative Structure-Activity Relationship Approach, Combination DeepSnap-Deep Learning and Conventional Machine Learning. ACS Omega 2021; 6:23570-23577. [PMID: 34549154 PMCID: PMC8444299 DOI: 10.1021/acsomega.1c03689] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/13/2021] [Accepted: 08/23/2021] [Indexed: 05/19/2023]
Abstract
Some targets predicted by machine learning (ML) in drug discovery remain a challenge because of poor prediction. In this study, a new prediction model was developed and rat clearance (CL) was selected as a target because it is difficult to predict. A classification model was constructed using 1545 in-house compounds with rat CL data. The molecular descriptors calculated by Molecular Operating Environment (MOE), alvaDesc, and ADMET Predictor software were used to construct the prediction model. In conventional ML using 100 descriptors and random forest selected by DataRobot, the area under the curve (AUC) and accuracy (ACC) were 0.883 and 0.825, respectively. Conversely, the prediction model using DeepSnap and Deep Learning (DeepSnap-DL) with compound features as images had AUC and ACC of 0.905 and 0.832, respectively. We combined the two models (conventional ML and DeepSnap-DL) to develop a novel prediction model. Using the ensemble model with the mean of the predicted probabilities from each model improved the evaluation metrics (AUC = 0.943 and ACC = 0.874). In addition, a consensus model using the results of the agreement between classifications had an increased ACC (0.959). These combination models with a high level of predictive performance can be applied to rat CL as well as other pharmacokinetic parameters, pharmacological activity, and toxicity prediction. Therefore, these models will aid in the design of more rational compounds for the development of drugs.
Affiliation(s)
- Hideaki Mamada
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-858, Japan
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-858, Japan
  - Tel.: +81-42-495-8983. Fax: +81-42-495-8983
|
14
|
Kulichenko M, Smith JS, Nebgen B, Li YW, Fedik N, Boldyrev AI, Lubbers N, Barros K, Tretiak S. The Rise of Neural Networks for Materials and Chemical Dynamics. J Phys Chem Lett 2021; 12:6227-6243. [PMID: 34196559 DOI: 10.1021/acs.jpclett.1c01357] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electronic structure calculations, are particularly attractive due to their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained to a large data set of density functional theory calculations can be significantly improved by retraining to a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.
Affiliation(s)
- Maksim Kulichenko
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
- Justin S Smith
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Benjamin Nebgen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ying Wai Li
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nikita Fedik
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
- Alexander I Boldyrev
  - Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
- Nicholas Lubbers
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Kipton Barros
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sergei Tretiak
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
|
15
|
Alves VM, Auerbach SS, Kleinstreuer N, Rooney JP, Muratov EN, Rusyn I, Tropsha A, Schmitt C. Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing. Altern Lab Anim 2021; 49:73-82. [PMID: 34233495 PMCID: PMC8609471 DOI: 10.1177/02611929211029635] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]
Abstract
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7-24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.
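The duplicate problem described in the abstract above can be made concrete with a small sketch: if the same structure appears in both the training and the test set, the measured correct classification rate (CCR) partly reflects memorisation rather than prediction. The code below is an illustration under stated assumptions, not the curation workflow used in the paper: structures are keyed by a plain string identifier (a canonical identifier is assumed to have been computed upstream), structures with conflicting labels are simply dropped, and CCR is taken as the mean of sensitivity and specificity. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, int]  # (structure identifier, binary label)


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep one record per structure; drop structures whose labels conflict.

    Dropping conflicting structures is an assumed policy for this sketch.
    """
    labels: Dict[str, Set[int]] = {}
    for key, label in records:
        labels.setdefault(key, set()).add(label)
    return [(key, next(iter(seen))) for key, seen in labels.items() if len(seen) == 1]


def ccr(truth: List[int], pred: List[int]) -> float:
    """Correct classification rate: mean of sensitivity and specificity.

    Assumes both classes are present in `truth`.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy records (made up): one exact duplicate and one conflicting pair.
raw = [("CCO", 0), ("CCO", 0), ("c1ccccc1", 1), ("CC(=O)O", 0), ("CC(=O)O", 1)]
curated = deduplicate(raw)

# CCR on a toy prediction vector: one of two actives is missed.
example_ccr = ccr([1, 1, 0, 0], [1, 0, 0, 0])
```

Deduplicating before the train/test split, rather than after, is the design point: once a structure occurs only once, it cannot sit on both sides of the split.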
Affiliation(s)
- Vinicius M. Alves
  - Office of Data Science, Division of the National Toxicology Program (DNTP), National Institute of Environmental Health Sciences (NIEHS), Durham, NC, USA
- Scott S. Auerbach
  - Toxinformatics Group, Predictive Toxicology Branch, DNTP, NIEHS, Durham, NC, USA
- Nicole Kleinstreuer
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Scientific Director's Office, DNTP, NIEHS, Durham, NC, USA
- John P. Rooney
  - Integrated Laboratory Systems, LLC, Morrisville, NC, USA
- Eugene N. Muratov
  - Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, NC, USA
  - Department of Pharmaceutical Sciences, Federal University of Paraiba, Joao Pessoa, Paraiba, Brazil
- Ivan Rusyn
  - Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, USA
- Alexander Tropsha
  - Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, NC, USA
- Charles Schmitt
  - Office of Data Science, Division of the National Toxicology Program (DNTP), National Institute of Environmental Health Sciences (NIEHS), Durham, NC, USA
|
16
|
Matsumura K. Skin sensitizer classification using dual-input machine learning model. CHEM-BIO INFORMATICS JOURNAL 2020. [DOI: 10.1273/cbij.20.54] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
|
17
|
Uesawa Y. [AI-based QSAR Modeling for Prediction of Active Compounds in MIE/AOP]. YAKUGAKU ZASSHI 2020; 140:499-505. [PMID: 32238631 DOI: 10.1248/yakushi.19-00190-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Toxicity testing is critical to the development of new drugs and chemicals. Clinical studies, experimental animal models, and in vitro studies are performed to evaluate the safety of a new drug. The limitations of these methods include the extensive time required for toxicity testing, ethical problems, and the high cost of experimentation. Computational methods are therefore considered useful for estimating chemical toxicity. In silico toxicity prediction is a form of toxicity assessment that uses computational methods to predict and simulate the toxicity of chemicals, with the aim of contributing to the effective development of new drugs and chemical designs. Here, quantitative structure-activity relationship (QSAR) models are used to predict toxicities based on chemical structural parameters. Because toxicities are complicated physiological phenomena, similar toxic effects may arise through different pathways. In addition, since many drugs with unknown mechanisms of action are available, artificial intelligence (AI), which uses sophisticated algorithms, is increasingly used to predict toxicities. Recently, QSAR models have been applied to determine complex relations between chemical structures and toxicities. However, the accuracy of QSAR for toxicity prediction remains an important issue. International competitions funded by public institutions can address this issue. Two important toxicity challenges were organized in the past decade; this article presents issues in toxicity prediction based on these challenges.
Affiliation(s)
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University
|
18
|
Matsuzaka Y, Uesawa Y. Molecular Image-Based Prediction Models of Nuclear Receptor Agonists and Antagonists Using the DeepSnap-Deep Learning Approach with the Tox21 10K Library. Molecules 2020; 25:molecules25122764. [PMID: 32549344 PMCID: PMC7356846 DOI: 10.3390/molecules25122764] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/07/2020] [Revised: 06/06/2020] [Accepted: 06/12/2020] [Indexed: 02/07/2023] Open
Abstract
The interaction of nuclear receptors (NRs) with chemical compounds can cause dysregulation of endocrine signaling pathways, leading to adverse health outcomes due to the disruption of natural hormones. Thus, identifying possible ligands of NRs is a crucial task for understanding the adverse outcome pathway (AOP) for human toxicity as well as the development of novel drugs. However, the experimental assessment of novel ligands remains expensive and time-consuming. Therefore, an in silico approach with a wide range of applications instead of experimental examination is highly desirable. The recently developed novel molecular image-based deep learning (DL) method, DeepSnap-DL, can produce multiple snapshots from three-dimensional (3D) chemical structures and has achieved high performance in the prediction of chemicals for toxicological evaluation. In this study, we used DeepSnap-DL to construct prediction models of 35 agonist and antagonist allosteric modulators of NRs for chemicals derived from the Tox21 10K library. We demonstrate the high performance of DeepSnap-DL in constructing prediction models. These findings may aid in interpreting the key molecular events of toxicity and support the development of new fields of machine learning to identify environmental chemicals with the potential to interact with NR signaling pathways.
|
19
|
Matsuzaka Y, Hosaka T, Ogaito A, Yoshinari K, Uesawa Y. Prediction Model of Aryl Hydrocarbon Receptor Activation by a Novel QSAR Approach, DeepSnap-Deep Learning. Molecules 2020; 25:molecules25061317. [PMID: 32183141 PMCID: PMC7144728 DOI: 10.3390/molecules25061317] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/17/2020] [Revised: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 12/31/2022] Open
Abstract
The aryl hydrocarbon receptor (AhR) is a ligand-dependent transcription factor that senses environmental exogenous and endogenous ligands or xenobiotic chemicals. In particular, exposure of the liver to environmental metabolism-disrupting chemicals contributes to the development and propagation of steatosis and hepatotoxicity. However, the mechanisms for AhR-induced hepatotoxicity and tumor propagation in the liver remain to be revealed, due to the wide variety of AhR ligands. Recently, quantitative structure–activity relationship (QSAR) analysis using deep neural network (DNN) has shown superior performance for the prediction of chemical compounds. Therefore, this study proposes a novel QSAR analysis using deep learning (DL), called the DeepSnap–DL method, to construct prediction models of chemical activation of AhR. Compared with conventional machine learning (ML) techniques, such as the random forest, XGBoost, LightGBM, and CatBoost, the proposed method achieves high-performance prediction of AhR activation. Thus, the DeepSnap–DL method may be considered a useful tool for achieving high-throughput in silico evaluation of AhR-induced hepatotoxicity.
Affiliation(s)
- Yasunari Matsuzaka
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 204-8588 Tokyo, Japan
- Takuomi Hosaka
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Anna Ogaito
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Kouichi Yoshinari
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 204-8588 Tokyo, Japan
  - Corresponding author
|
20
|
Li X, Sanderson AR, Allen SS, Lahr RH. Tap water fingerprinting using a convolutional neural network built from images of the coffee-ring effect. Analyst 2020; 145:1511-1523. [PMID: 31934695 DOI: 10.1039/c9an01624d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
A low-cost tap water fingerprinting technique was evaluated using the coffee-ring effect, a phenomenon by which tap water droplets leave distinguishable "fingerprint" residue patterns after the water evaporates. Tap waters from communities across southern Michigan, dried on aluminum and photographed with a cell phone camera and a 30× loupe, produced unique and reproducible images. A convolutional neural network (CNN) model was trained using the images from the Michigan tap waters, and despite the small size of the image dataset, the model assigned images into groups with similar water chemistry with 80% accuracy. Synthetic solutions containing only the majority species measured in Detroit, Lansing, and Michigan State University tap waters did not display the same residue patterns as collected waters; thus, the lower concentration species also influence the tap water "fingerprint". Residue pattern images from salt mixtures with an array of sodium, calcium, magnesium, chloride, bicarbonate, and sulfate concentrations were analyzed by measuring features observed in the photographs as well as by applying principal component analysis (PCA) to the image files and particle measurements. These analyses together highlighted differences in the residue patterns associated with the water chemistry in the sample. The results of these experiments suggest that the unique and reproducible residue patterns of tap water samples, which can be imaged with a cell phone camera and a loupe, contain a wealth of information about the overall composition of the tap water, and thus, the phenomenon should be further explored for potential use in low-cost tap water fingerprinting.
Affiliation(s)
- Xiaoyan Li
  - Department of Civil & Environmental Engineering, Michigan State University, 1449 Engineering Research Ct., East Lansing, MI 48824, USA
|
21
|
Matsuzaka Y, Uesawa Y. DeepSnap-Deep Learning Approach Predicts Progesterone Receptor Antagonist Activity With High Performance. Front Bioeng Biotechnol 2020; 7:485. [PMID: 32039185 PMCID: PMC6987043 DOI: 10.3389/fbioe.2019.00485] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/01/2019] [Accepted: 12/30/2019] [Indexed: 12/16/2022] Open
Abstract
The progesterone receptor (PR) is an important therapeutic target for many malignancies and endocrine disorders due to its role in controlling ovulation and pregnancy via the reproductive cycle. Therefore, the modulation of PR activity using its agonists and antagonists is receiving increasing interest as a novel treatment strategy. However, clinical trials using PR modulators have not yet yielded conclusive evidence. Recently, increasing evidence from several fields shows that chemical compounds, including agonists and antagonists, can be classified using recent improvements in deep learning (DL) with deep neural networks. We therefore recently proposed a novel DL-based quantitative structure-activity relationship (QSAR) strategy using transfer learning to build prediction models for agonists and antagonists. In this study, we constructed prediction models of PR antagonists by employing this approach, referred to as the DeepSnap-DL method, which uses images captured from multiple angles of three-dimensional (3D) chemical structures as input data for DL classification. The DeepSnap-DL method achieved high-performance prediction of PR antagonists after optimization of some parameters and adjustment of the images generated from the 3D structures. Furthermore, a comparison with conventional machine learning (ML) methods indicated that the DeepSnap-DL method outperformed them. Therefore, models built with DeepSnap-DL would be a powerful tool not only for the QSAR field, in predicting physiological and agonist/antagonist activities, toxicity, and molecular binding, but also for identifying biological or pathological phenomena.
Affiliation(s)
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo, Japan
|
22
|
Cova TFGG, Pais AACC. Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns. Front Chem 2019; 7:809. [PMID: 32039134 PMCID: PMC6988795 DOI: 10.3389/fchem.2019.00809] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 07/16/2019] [Accepted: 11/11/2019] [Indexed: 12/14/2022] Open
Abstract
Computational Chemistry is currently a synergistic assembly between ab initio calculations, simulation, machine learning (ML) and optimization strategies for describing, solving and predicting chemical data and related phenomena. These include accelerated literature searches, analysis and prediction of physical and quantum chemical properties, transition states, chemical structures, chemical reactions, and also new catalysts and drug candidates. The generalization of scalability to larger chemical problems, rather than specialization, is now the main principle for transforming chemical tasks in multiple fronts, for which systematic and cost-effective solutions have benefited from ML approaches, including those based on deep learning (e.g. quantum chemistry, molecular screening, synthetic route design, catalysis, drug discovery). The latter class of ML algorithms is capable of combining raw input into layers of intermediate features, enabling bench-to-bytes designs with the potential to transform several chemical domains. In this review, the most exciting developments concerning the use of ML in a range of different chemical scenarios are described. A range of different chemical problems and respective rationalization, that have hitherto been inaccessible due to the lack of suitable analysis tools, is thus detailed, evidencing the breadth of potential applications of these emerging multidimensional approaches. Focus is given to the models, algorithms and methods proposed to facilitate research on compound design and synthesis, materials design, prediction of binding, molecular activity, and soft matter behavior. 
The information produced by pairing Chemistry and ML, through data-driven analyses, neural network predictions and monitoring of chemical systems, allows (i) prompting the ability to understand the complexity of chemical data, (ii) streamlining and designing experiments, (iii) discovering new molecular targets and materials, and also (iv) planning or rethinking forthcoming chemical challenges. In fact, optimization engulfs all these tasks directly.
Affiliation(s)
- Tânia F. G. G. Cova
  - Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
- Alberto A. C. C. Pais
  - Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
|
23
|
Matsuzaka Y, Uesawa Y. Prediction Model with High-Performance Constitutive Androstane Receptor (CAR) Using DeepSnap-Deep Learning Approach from the Tox21 10K Compound Library. Int J Mol Sci 2019; 20:ijms20194855. [PMID: 31574921 PMCID: PMC6801383 DOI: 10.3390/ijms20194855] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/14/2019] [Revised: 09/23/2019] [Accepted: 09/27/2019] [Indexed: 12/30/2022] Open
Abstract
The constitutive androstane receptor (CAR) plays pivotal roles in drug-induced liver injury through the transcriptional regulation of drug-metabolizing enzymes and transporters. Thus, identifying regulatory factors for CAR activation is important for understanding its mechanisms. Numerous previous studies on CAR activation and its toxicity focused on in vivo or in vitro analyses, which are expensive, time-consuming, and require many animals. We developed a computational model that predicts agonists for the CAR using the Toxicology in the 21st Century 10k library. Additionally, we evaluated the prediction performance of a novel deep learning (DL)-based quantitative structure-activity relationship analysis called the DeepSnap-DL approach, which generates omnidirectional snapshots portraying the three-dimensional (3D) structures of chemical compounds. The CAR prediction model, in which DeepSnap-DL was applied to chemical structures generated and optimized by the 3D structure generator tool CORINA, demonstrated better performance than existing methods using molecular descriptors. These results indicate that preparing suitable 3D chemical structures as input data may be important for achieving high performance in prediction models using the DeepSnap-DL approach and for enabling the identification of modulators of the CAR.
Affiliation(s)
- Yasunari Matsuzaka
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo 204-8588, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo 204-8588, Japan
|