51
|
Nascimben M, Rimondini L. Molecular Toxicity Virtual Screening Applying a Quantized Computational SNN-Based Framework. Molecules 2023; 28:molecules28031342. [PMID: 36771009 PMCID: PMC9919191 DOI: 10.3390/molecules28031342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 01/27/2023] [Accepted: 01/29/2023] [Indexed: 02/04/2023] Open
Abstract
Spiking neural networks are biologically inspired machine learning algorithms attracting researchers' attention for their applicability to alternative energy-efficient hardware other than traditional computers. In the current work, spiking neural networks have been tested in a quantitative structure-activity analysis targeting the toxicity of molecules. Multiple public-domain databases of compounds have been evaluated with spiking neural networks, achieving accuracies compatible with high-quality frameworks presented in the previous literature. The numerical experiments also included an analysis of hyperparameters and tested the spiking neural networks on molecular fingerprints of different lengths. Proposing alternatives to traditional software and hardware for time- and resource-consuming tasks, such as those found in chemoinformatics, may open the door to new research and improvements in the field.
Affiliation(s)
- Mauro Nascimben
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
- Enginsoft SpA, 35129 Padua, Italy
- Correspondence:
| | - Lia Rimondini
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
| |
|
52
|
XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J Cheminform 2023; 15:2. [PMID: 36609340 PMCID: PMC9817292 DOI: 10.1186/s13321-022-00673-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/30/2022] [Accepted: 12/17/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Explainable artificial intelligence (XAI) methods have shown increasing applicability in chemistry. In this context, visualization techniques can highlight regions of a molecule to reveal their influence over a predicted property. For this purpose, some XAI techniques calculate attribution scores associated with tokens of SMILES strings or with atoms of a molecule. While an association of a score with an atom can be directly visually represented on a molecule diagram, scores computed for SMILES non-atom tokens cannot. For instance, a substring [N+] contains 3 non-atom tokens, i.e., [, +, and ], and their attributions, depending on the model, do not necessarily reveal an influence of the nitrogen atom over the predicted property; for that reason, it is not possible to represent the scores on a molecule diagram. Moreover, SMILES notation is complex, foregrounding the need for techniques to facilitate the analysis of explanations associated with its tokens. RESULTS: We propose XSMILES, an interactive visualization technique, to explore explainable artificial intelligence attribution scores and support the interpretation of SMILES. Users can input any type of score attributed to atom and non-atom tokens and visualize them on top of a 2D molecule diagram coordinated with a bar chart that represents a SMILES string. We demonstrate how attributions calculated for SMILES strings can be evaluated and better interpreted through interactivity with two use cases. CONCLUSIONS: Data scientists can use XSMILES to understand their models' behavior and compare multiple modeling approaches. The tool provides a set of parameters to adapt the visualization to users' needs and it can be integrated into different platforms. We believe XSMILES can support data scientists to develop, improve, and communicate their models by making it easier to identify patterns and compare attributions through interactive exploratory visualization.
|
53
|
Zheng X, Tomiura Y, Hayashi K. Investigation of the structure-odor relationship using a Transformer model. J Cheminform 2022; 14:88. [PMID: 36581889 PMCID: PMC9798546 DOI: 10.1186/s13321-022-00671-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 12/14/2022] [Indexed: 12/30/2022] Open
Abstract
The relationships between molecular structures and their properties are subtle and complex, and the properties of odor are no exception. Molecules with similar structures, such as a molecule and its optical isomer, may have completely different odors, whereas molecules with completely distinct structures may have similar odors. Many works have attempted to explain the molecular structure-odor relationship from chemical and data-driven perspectives. The Transformer model is widely used in natural language processing and computer vision, and the attention mechanism included in the Transformer model can identify relationships between inputs and outputs. In this paper, we describe the construction of a Transformer model for predicting molecular properties and interpreting the prediction results. The SMILES data of 100,000 molecules are collected and used to predict the existence of molecular substructures, and our proposed model achieves an F1 value of 0.98. The attention matrix is visualized to investigate the substructure annotation performance of the attention mechanism, and we find that certain atoms in the target substructures are accurately annotated. Finally, we collect 4462 molecules and their odor descriptors and use the proposed model to infer 98 odor descriptors, obtaining an average F1 value of 0.33. For the 19 odor descriptors that achieved F1 values greater than 0.45, we also attempt to summarize the relationship between the molecular substructures and odor quality through the attention matrix.
Affiliation(s)
- Xiaofan Zheng
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan
| | - Yoichi Tomiura
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan
| | - Kenshi Hayashi
- Graduate School of Information Science and Electrical Engineering, Department of Electronics, Kyushu University, Fukuoka, Japan
| |
|
54
|
Makarov D, Fadeeva Y, Safonova E, Shmukler L. Predictive modeling of antibacterial activity of ionic liquids by machine learning methods. Comput Biol Chem 2022; 101:107775. [DOI: 10.1016/j.compbiolchem.2022.107775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 09/24/2022] [Accepted: 10/03/2022] [Indexed: 11/03/2022]
|
55
|
Muzychka LV, Verves EV, Yaremchuk IO, Zinchenko AM, Shishkina SV, Semenyuta IV, Hodyna DM, Metelytsia LO, Kovalishyn V, Smolii OB. Synthesis, QSAR modeling, and molecular docking of novel fused 7-deazaxanthine derivatives as adenosine A2A receptor antagonists. Chem Biol Drug Des 2022; 100:1025-1032. [PMID: 34651417 DOI: 10.1111/cbdd.13975] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/25/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 01/25/2023]
Abstract
Predictive QSAR models for the search for new adenosine A2A receptor antagonists were developed using the OCHEM platform. The regression models achieved coefficients of determination q2 = 0.65-0.71 in cross-validation and on an independent test set. The inhibition activities of novel fused 7-deazaxanthine compounds were predicted by the developed QSAR models. A preparative method for the synthesis of pyrimido[5',4':4,5]pyrrolo[1,2-a][1,4]diazepine derivatives was developed, and 11 new adenosine A2A receptor antagonists were obtained. Preliminary investigations into the toxicity of fused 7-deazaxanthine compounds toward the invertebrate cladoceran D. magna, a model organism commonly used to assess toxicity, are also described.
Affiliation(s)
- Liubov V Muzychka
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| | - Evgenii V Verves
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine; Enamine Ltd, Kyiv, Ukraine
| | - Iryna O Yaremchuk
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| | - Anna M Zinchenko
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| | - Svitlana V Shishkina
- Department of X-ray Diffraction Studies and Quantum Chemistry, STC "Institute for Single Crystals", NAS of Ukraine, Kharkiv, Ukraine
| | - Ivan V Semenyuta
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| | - Diana M Hodyna
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| | - Larysa O Metelytsia
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| | - Vasyl Kovalishyn
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| | - Oleg B Smolii
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
| |
|
56
|
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly reduces the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles published between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
- Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
| | - Enas Elgeldawi
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Heba Aboul Ella
- Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
| | | | - Mamdouh M. Gomaa
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Aboul Ella Hassanien
- Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
| |
|
57
|
|
58
|
Gill ML. The rise of the machines in chemistry. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2022; 60:1044-1051. [PMID: 35976263 DOI: 10.1002/mrc.5304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Revised: 08/07/2022] [Accepted: 08/09/2022] [Indexed: 06/15/2023]
Abstract
The use of artificial intelligence and, more specifically, deep learning methods in chemistry is becoming increasingly common. Applications in informatics fields, such as cheminformatics and proteomics, structural biology, and spectroscopy, including NMR, are on the rise. Recent developments in model architectures, such as graph convolutional neural networks and transformers, have been enabled by advancements in computational hardware and software. However, model architectures with more predictive power often require larger amounts of training data, which can be challenging to acquire, but this requirement can be mitigated through techniques like pretraining and fine-tuning. In spite of these successes, challenges remain, such as normalization and scaling of data, availability of experimentally acquired data, and model explainability.
|
59
|
Chandler M, Jain S, Halman J, Hong E, Dobrovolskaia MA, Zakharov AV, Afonin KA. Artificial Immune Cell, AI-cell, a New Tool to Predict Interferon Production by Peripheral Blood Monocytes in Response to Nucleic Acid Nanoparticles. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2022; 18:e2204941. [PMID: 36216772 PMCID: PMC9671856 DOI: 10.1002/smll.202204941] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/11/2022] [Revised: 09/15/2022] [Indexed: 06/16/2023]
Abstract
Nucleic acid nanoparticles, or NANPs, rationally designed to communicate with the human immune system, can offer innovative therapeutic strategies to overcome the limitations of traditional nucleic acid therapies. Each set of NANPs is unique in its architectural parameters and physicochemical properties, which together with the type of delivery vehicle determine the kind and the magnitude of the immune response. Currently, there are no predictive tools that would reliably guide the design of NANPs to the desired immunological outcome, a step crucial for the success of personalized therapies. Through a systematic approach investigating physicochemical and immunological profiles of a comprehensive panel of various NANPs, the research team develops and experimentally validates a computational model based on the transformer architecture able to predict the immune activities of NANPs. It is anticipated that the freely accessible computational tool, called an "artificial immune cell," or AI-cell, will aid in addressing the current critical public health challenges related to safety criteria of nucleic acid therapies in a timely manner and promote the development of novel biomedical tools.
Affiliation(s)
- Morgan Chandler
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Sankalp Jain
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
| | - Justin Halman
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Enping Hong
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Marina A. Dobrovolskaia
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
| | - Kirill A. Afonin
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
|
60
|
Deep learning methods for molecular representation and property prediction. Drug Discov Today 2022; 27:103373. [PMID: 36167282 DOI: 10.1016/j.drudis.2022.103373] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/07/2022] [Revised: 08/22/2022] [Accepted: 09/21/2022] [Indexed: 01/11/2023]
Abstract
With advances in artificial intelligence (AI) methods, computer-aided drug design (CADD) has developed rapidly in recent years. Effective molecular representation and accurate property prediction are crucial tasks in CADD workflows. In this review, we summarize contemporary applications of deep learning (DL) methods for molecular representation and property prediction. We categorize DL methods according to the format of molecular data (1D, 2D, and 3D). In addition, we discuss some common DL models, such as ensemble learning and transfer learning, and analyze the interpretability methods for these models. We also highlight the challenges and opportunities of DL methods for molecular representation and property prediction.
|
61
|
Comparative Prediction of Gas Chromatographic Retention Indices for GC/MS Identification of Chemicals Related to Chemical Weapons Convention by Incremental and Machine Learning Methods. SEPARATIONS 2022. [DOI: 10.3390/separations9100265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
During on-site verification activities conducted by the Technical Secretariat of the Organization for the Prohibition of Chemical Weapons, identification by gas chromatography retention indices (RI) data, in addition to mass spectrometry data, increases the reliability of factual findings. However, reference RIs do not cover all the possible chemical structures. That is why it is important to have models to predict RIs. Applicable only to narrow data sets of chemicals with a fixed scaffold (G- and V-series gases, for example), the non-learning incremental method demonstrated predictive median absolute and percentage errors of 2–4 units and 0.1–0.2%; these are comparable with the experimental bias in RI measurements in the same laboratory with the same GC conditions. It outperforms two reported machine learning methods, which show median absolute and percentage errors of 11–52 units and 0.5–2.8%. However, for the whole Chemical Weapons Convention (CWC) data set of chemicals, where a fixed scaffold is absent, the incremental method is not applicable; here machine learning methods are essential, and they achieved median absolute and percentage errors of 29–33 units and 0.5–2.2%, depending on the machine learning method. In addition, we have developed a homology tree approach as a convenient method for the visualization of the CWC chemical space. We conclude that non-learning incremental methods may be more accurate than state-of-the-art machine learning techniques in particular cases, such as predicting the RIs of homologues and isomers of chemicals related to the CWC.
|
62
|
Li S, Sultonov F, Tursunboev J, Park JH, Yun S, Kang JM. Ghostformer: A GhostNet-Based Two-Stage Transformer for Small Object Detection. SENSORS (BASEL, SWITZERLAND) 2022; 22:6939. [PMID: 36146288 PMCID: PMC9503248 DOI: 10.3390/s22186939] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/04/2022] [Revised: 09/03/2022] [Accepted: 09/10/2022] [Indexed: 06/16/2023]
Abstract
In this paper, we propose a novel two-stage transformer with GhostNet, which improves performance on the small object detection task. Specifically, based on the original Deformable Transformers for End-to-End Object Detection (deformable DETR), we chose GhostNet as the backbone to extract features, since it is better suited to efficient feature extraction. Furthermore, at the target detection stage, we selected the 300 best bounding box results as regional proposals, which were subsequently set as primary object queries of the decoder layer. Finally, in the decoder layer, we optimized and modified the queries to increase the target accuracy. To validate the performance of the proposed model, we adopted the widely used COCO 2017 dataset. Extensive experiments demonstrated that the proposed scheme yielded a higher average precision (AP) score in detecting small objects than the existing deformable DETR model.
Affiliation(s)
- Sijia Li
- Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Korea
| | - Furkat Sultonov
- Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Korea
| | - Jamshid Tursunboev
- Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Korea
| | - Jun-Hyun Park
- Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Korea
| | - Sangseok Yun
- Department of Information and Communications Engineering, Pukyong National University, Busan 48513, Korea
| | - Jae-Mo Kang
- Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Korea
| |
|
63
|
Tetko IV, Klambauer G, Clevert DA, Shah I, Benfenati E. Artificial Intelligence Meets Toxicology. Chem Res Toxicol 2022; 35:1289-1290. [PMID: 35965447 PMCID: PMC10578182 DOI: 10.1021/acs.chemrestox.2c00196] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/29/2022]
Affiliation(s)
- Igor V Tetko
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), 85764 Neuherberg, Germany
- BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
| | - Günter Klambauer
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
| | | | - Imran Shah
- Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
| | - Emilio Benfenati
- Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156 Milano, Italy
| |
|
64
|
Lim S, Lee S, Piao Y, Choi M, Bang D, Gu J, Kim S. On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach. Comput Struct Biotechnol J 2022; 20:4288-4304. [PMID: 36051875 PMCID: PMC9399946 DOI: 10.1016/j.csbj.2022.07.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/22/2022] Open
Abstract
A large number of chemical compounds are available in databases such as PubChem and ZINC. However, currently known compounds, though numerous, represent only a fraction of all possible compounds, the set known as chemical space. Many of the compounds in the databases are annotated with properties and assay data that can be used for drug discovery efforts. For this goal, a number of machine learning algorithms have been developed, and recent deep learning technologies can be effectively used to navigate chemical space, especially for unknown chemical compounds, in terms of drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting annotated properties and assay data in chemical compound databases. We first compile the kinds of tasks that machine learning methods are used to accomplish. Then, we survey deep learning technologies to show their modeling power and current applications for accomplishing drug-related tasks. Next, we survey deep learning techniques that address the insufficiency of annotated data for more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug-related tasks; thus, we survey what kinds of information, such as assay and gene expression data, can be used to improve the prediction power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into computational analysis of chemical information.
Affiliation(s)
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Sangseon Lee
- Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Yinhua Piao
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - MinGyu Choi
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Jeonghyeon Gu
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- MOGAM Institute for Biomedical Research, Yong-in 16924, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| |
|
65
|
Kong Y, Zhao X, Liu R, Yang Z, Yin H, Zhao B, Wang J, Qin B, Yan A. Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation. J Cheminform 2022; 14:52. [PMID: 35927691 PMCID: PMC9351086 DOI: 10.1186/s13321-022-00634-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 07/16/2022] [Indexed: 11/10/2022] Open
Abstract
Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task-related representations, which completely removes the need for rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrates pharmacophore information hierarchically into the message-passing neural network (MPNN) architecture, specifically by means of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbs not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to the MPNN architecture can generally help GNN models improve their predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.
Affiliation(s)
- Yue Kong
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China; Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Xiaoman Zhao
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Ruizi Liu
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Zhenwu Yang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Hongyan Yin
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China; Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bowen Zhao
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Jinling Wang
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bingjie Qin
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.
| |
|
66
|
Tan T, Cheng H, Chen G, Song Z, Qi Z. Prediction of infinite‐dilution activity coefficients with neural collaborative filtering. AIChE J 2022. [DOI: 10.1002/aic.17789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Tian Tan
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai, China
- Hongye Cheng
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai, China
- Guzhong Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai, China
- Zhen Song
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai, China
- Zhiwen Qi
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai, China
|
67
|
Theoretical and Experimental Studies of Phosphonium Ionic Liquids as Potential Antibacterials of MDR Acinetobacter baumannii. Antibiotics (Basel) 2022; 11:antibiotics11040491. [PMID: 35453241 PMCID: PMC9025513 DOI: 10.3390/antibiotics11040491] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/18/2022] [Revised: 03/30/2022] [Accepted: 04/02/2022] [Indexed: 11/27/2022] Open
Abstract
A previously developed model to predict antibacterial activity of ionic liquids against a resistant A. baumannii strain was used to assess the activity of phosphonium ionic liquids. Their antioxidant potential was additionally evaluated with newly developed models, which were based on public data. The accuracy of the models was rigorously evaluated using cross-validation as well as test set prediction. Six alkyl triphenylphosphonium and alkyl tributylphosphonium bromides with C8, C10, and C12 alkyl chain lengths were synthesized and tested in vitro. Experimental studies confirmed their activity against A. baumannii and showed pronounced antioxidant properties. These results suggest that phosphonium ionic liquids could be promising lead structures against A. baumannii.
|
68
|
Humer C, Heberle H, Montanari F, Wolf T, Huber F, Henderson R, Heinrich J, Streit M. ChemInformatics Model Explorer (CIME): exploratory analysis of chemical model explanations. J Cheminform 2022; 14:21. [PMID: 35379315 PMCID: PMC8981840 DOI: 10.1186/s13321-022-00600-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/07/2021] [Accepted: 03/12/2022] [Indexed: 11/10/2022] Open
Abstract
The introduction of machine learning to small molecule research, an inherently multidisciplinary field in which chemists and data scientists combine their expertise and collaborate, has been vital to making screening processes more efficient. In recent years, numerous models that predict pharmacokinetic properties or bioactivity have been published, and these are used on a daily basis by chemists to make decisions and prioritize ideas. The emerging field of explainable artificial intelligence is opening up new possibilities for understanding the reasoning that underlies a model. In small molecule research, this means relating contributions of substructures of compounds to their predicted properties, which in turn also allows the areas of the compounds that have the greatest influence on the outcome to be identified. However, there is no interactive visualization tool that facilitates such interdisciplinary collaborations towards interpretability of machine learning models for small molecules. To fill this gap, we present CIME (ChemInformatics Model Explorer), an interactive web-based system that allows users to inspect chemical data sets, visualize model explanations, compare interpretability techniques, and explore subgroups of compounds. The tool is model-agnostic and can be run on a server or a workstation.
Affiliation(s)
- Henry Heberle
- Division Crop Science, Bayer AG, 40789, Monheim am Rhein, DE, Germany
- Thomas Wolf
- Division Crop Science, Bayer AG, 65926, Frankfurt, DE, Germany
- Florian Huber
- Division Crop Science, Bayer AG, 65926, Frankfurt, DE, Germany
- Ryan Henderson
- Digital Technologies, Bayer AG, 13353, Berlin, DE, Germany
- Julian Heinrich
- Division Crop Science, Bayer AG, 40789, Monheim am Rhein, DE, Germany
- Marc Streit
- Johannes Kepler University Linz, Linz, Austria
|
69
|
Baskin I, Epshtein A, Ein-Eli Y. Benchmarking machine learning methods for modeling physical properties of ionic liquids. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.118616] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/21/2023]
|
70
|
Dey V, Machiraju R, Ning X. Improving Compound Activity Classification via Deep Transfer and Representation Learning. ACS Omega 2022; 7:9465-9483. [PMID: 35350358 PMCID: PMC8945064 DOI: 10.1021/acsomega.1c06805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 02/23/2022] [Indexed: 06/14/2023]
Abstract
Recent advances in molecular machine learning, especially deep neural networks such as graph neural networks (GNNs), for predicting structure-activity relationships (SAR) have shown tremendous potential in computer-aided drug discovery. However, the applicability of such deep neural networks is limited by the requirement of large amounts of training data. In order to cope with limited training data for a target task, transfer learning for SAR modeling has been recently adopted to leverage information from data of related tasks. In this work, in contrast to the popular parameter-based transfer learning such as pretraining, we develop novel deep transfer learning methods TAc and TAc-fc to leverage source domain data and transfer useful information to the target domain. TAc learns to generate effective molecular features that can generalize well from one domain to another and increase the classification performance in the target domain. Additionally, TAc-fc extends TAc by incorporating novel components to selectively learn feature-wise and compound-wise transferability. We used the bioassay screening data from PubChem and identified 120 pairs of bioassays such that the active compounds in each pair are more similar to each other compared to their inactive compounds. Overall, TAc achieves the best performance with an average ROC-AUC of 0.801; it significantly improves the ROC-AUC of 83% of target tasks with an average task-wise performance improvement of 7.102%, compared to the best baseline dmpna. Our experiments clearly demonstrate that TAc achieves significant improvement over all baselines across a large number of target tasks. Furthermore, although TAc-fc achieves slightly worse ROC-AUC on average compared to TAc (0.798 vs 0.801), TAc-fc still achieves the best performance on more tasks in terms of PR-AUC and F1 compared to other methods. 
In summary, TAc-fc is also found to be a strong model with competitive or even better performance than TAc on a notable number of target tasks.
Affiliation(s)
- Vishal Dey
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Raghu Machiraju
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
- Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
- Xia Ning
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
- Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
|
71
|
Harada Y, Hatakeyama M, Maeda S, Gao Q, Koizumi K, Sakamoto Y, Ono Y, Nakamura S. Molecular Design Learned from the Natural Product Porphyra-334: Molecular Generation via Chemical Variational Autoencoder versus Database Mining via Similarity Search, A Comparative Study. ACS Omega 2022; 7:8581-8590. [PMID: 35309498 PMCID: PMC8928499 DOI: 10.1021/acsomega.1c06453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/16/2021] [Accepted: 02/18/2022] [Indexed: 06/14/2023]
Abstract
A comparative study is presented. The method via chemical variational autoencoder (VAE) and the method via similarity search are compared, focusing on their generation ability for new functional molecular design. Focusing on the natural porphyra-334 as a model molecule, we generated three groups: molecules of mycosporine-like amino acids (MAAs) as seeds (G_SEEDS), molecules generated via chemical VAE (G_VAE), and molecules gathered via similarity search (G_SIM). The numbers of molecules that satisfy the condition for the light absorption ability of porphyra-334 in G_SEEDS, G_VAE, and G_SIM are 52, 138, and 6, respectively. The method via chemical VAE shows a promising potential for future molecular design. By using quantum chemistry wave function properties for chemical VAE, we find new molecules that are comparable to porphyra-334, including some with unexpected geometries. At the end, we show a group of molecules found with this method.
Affiliation(s)
- Yuki Harada
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Makoto Hatakeyama
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Sanyo-Onoda City University, 1-1-1 Daigakudori, Sanyo-Onoda, Yamaguchi 756-0884, Japan
- Shuichi Maeda
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Qi Gao
- Mitsubishi Chemical Corporation Science & Innovation Center, 1000 Kamoshida-cho, Yokohama, Kanagawa 227-8502, Japan
- Kenichi Koizumi
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Yuki Sakamoto
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Yuuki Ono
- Mitsubishi Chemical Corporation Science & Innovation Center, 1000 Kamoshida-cho, Yokohama, Kanagawa 227-8502, Japan
- Shinichiro Nakamura
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
|
72
|
EMBER-Embedding Multiple Molecular Fingerprints for Virtual Screening. Int J Mol Sci 2022; 23:ijms23042156. [PMID: 35216273 PMCID: PMC8877815 DOI: 10.3390/ijms23042156] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/19/2022] [Revised: 02/07/2022] [Accepted: 02/08/2022] [Indexed: 02/01/2023] Open
Abstract
In recent years, the debate in the field of applications of Deep Learning to Virtual Screening has focused on the use of neural embeddings with respect to classical descriptors in order to encode both structural and physical properties of ligands and/or targets. The attention on embeddings with the increasing use of Graph Neural Networks aimed at overcoming molecular fingerprints that are short range embeddings for atomic neighborhoods. Here, we present EMBER, a novel molecular embedding made by seven molecular fingerprints arranged as different “spectra” to describe the same molecule, and we prove its effectiveness by using deep convolutional architecture that assesses ligands’ bioactivity on a data set containing twenty protein kinases with similar binding sites to CDK1. The data set itself is presented, and the architecture is explained in detail along with its training procedure. We report experimental results and an explainability analysis to assess the contribution of each fingerprint to different targets.
|
73
|
Ksenofontov AA, Lukanov MM, Bocharov PS, Berezin MB, Tetko IV. Deep neural network model for highly accurate prediction of BODIPYs absorption. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2022; 267:120577. [PMID: 34776377 DOI: 10.1016/j.saa.2021.120577] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/11/2021] [Revised: 10/12/2021] [Accepted: 10/31/2021] [Indexed: 06/13/2023]
Abstract
The possibility of accurately predicting the absorption maximum wavelength of BODIPYs was investigated. We found that previously reported models had a low accuracy (40-57 nm) in predicting BODIPYs due to the limited dataset sizes and/or number of BODIPYs (a few hundred). New models developed in this study were based on data of 6000-plus fluorescent dyes (including 4000-plus BODIPYs) and the deep neural network architecture. The high prediction accuracy (five-fold cross-validation root mean squared error (RMSE) of 18.4 nm) was obtained using a consensus model, which was more accurate than individual models. This model provided excellent accuracy (RMSE of 8 nm) for molecules previously synthesized in our laboratory as well as for prospective validation of three new BODIPYs. We found that solvent properties did not significantly influence the model accuracy since only a few BODIPYs exhibited solvatochromism. The analysis of large prediction errors suggested that compounds able to have intermolecular interactions with solvent or salts were likely to be incorrectly predicted. The consensus model is freely available at https://ochem.eu/article/134921 and can help other researchers accelerate the design of new dyes with desired properties.
Affiliation(s)
- Alexander A Ksenofontov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
- Michail M Lukanov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Pavel S Bocharov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Michail B Berezin
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Igor V Tetko
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
|
74
|
Irwin R, Dimitriadis S, He J, Bjerrum EJ. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac3ffb] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/31/2022] Open
Abstract
Transformer models coupled with a simplified molecular line entry system (SMILES) have recently proven to be a powerful combination for solving challenges in cheminformatics. These models, however, are often developed specifically for a single application and can be very resource-intensive to train. In this work we present the Chemformer model—a Transformer-based model which can be quickly applied to both sequence-to-sequence and discriminative cheminformatics tasks. Additionally, we show that self-supervised pre-training can improve performance and significantly speed up convergence on downstream tasks. On direct synthesis and retrosynthesis prediction benchmark datasets we publish state-of-the-art results for top-1 accuracy. We also improve on existing approaches for a molecular optimisation task and show that Chemformer can optimise on multiple discriminative tasks simultaneously. Models, datasets and code will be made available after publication.
|
75
|
Shi Y, Hua Y, Wang B, Zhang R, Li X. In Silico Prediction and Insights Into the Structural Basis of Drug Induced Nephrotoxicity. Front Pharmacol 2022; 12:793332. [PMID: 35082675 PMCID: PMC8785686 DOI: 10.3389/fphar.2021.793332] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/12/2021] [Accepted: 11/23/2021] [Indexed: 12/11/2022] Open
Abstract
Drug induced nephrotoxicity is a major clinical challenge, and it is always associated with higher costs for the pharmaceutical industry due to detection during the late stages of drug development. To improve health outcomes for patients, it is desirable to distinguish nephrotoxic structures at an early stage of drug development. In this study, we focused on in silico prediction and insights into the structural basis of drug induced nephrotoxicity, based on reliable data on human nephrotoxicity. We collected 565 diverse chemical structures, including 287 nephrotoxic drugs on humans in the real world, and 278 non-nephrotoxic approved drugs. Several different machine learning and deep learning algorithms were employed for in silico model building. Then, a consensus model was developed based on the three best individual models (RFR_QNPR, XGBOOST_QNPR, and CNF). The consensus model performed much better than individual models on internal validation and it achieved a prediction accuracy of 86.24% on external validation. The analysis of molecular property differences between nephrotoxic and non-nephrotoxic structures indicated that several key molecular properties differ significantly, including molecular weight (MW), molecular polar surface area (MPSA), AlogP, number of hydrogen bond acceptors (nHBA), molecular solubility (LogS), the number of rotatable bonds (nRotB), and the number of aromatic rings (nAR). These molecular properties may play an important part in the identification of nephrotoxic chemicals. Finally, 87 structural alerts for chemical nephrotoxicity were mined with f-score and positive rate analysis of substructures from Klekota-Roth fingerprint (KRFP). These structural alerts can identify nephrotoxic drug structures in the data set well. The in silico models and the structural alerts can be freely accessed via https://ochem.eu/article/140251 and http://www.sapredictor.cn, respectively. We hope the results will provide useful tools for early nephrotoxicity estimation in drug development.
Affiliation(s)
- Yinping Shi
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- Yuqing Hua
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; School of Pharmacy, Shandong First Medical University, Tai'an, China
- Baobao Wang
- Department of Nephrology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- Ruiqiu Zhang
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; School of Pharmacy, Shandong First Medical University, Tai'an, China
- Xiao Li
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
|
76
|
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, Tetko IV. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins. Int J Mol Sci 2022. [DOI: 10.3390/ijms23031201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2023] Open
|
77
|
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, Tetko IV. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins. Int J Mol Sci 2022; 23:ijms23031201. [PMID: 35163123 PMCID: PMC8835262 DOI: 10.3390/ijms23031201] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/30/2021] [Accepted: 01/19/2022] [Indexed: 02/05/2023] Open
Abstract
The development of new functional materials based on porphyrins requires fast and accurate prediction of their spectral properties. The available models in the literature for absorption wavelength and extinction coefficient of the Soret band have low accuracy for this class of compounds. We collected spectral data for porphyrins to extend the literature set and compared the performance of global and local models for their modelling using different machine learning methods. Interestingly, extension of the public database contributed models with lower accuracies compared to the models that we built using porphyrins only. The latter model calculated an acceptable RMSE = 2.61 for prediction of the absorption band of 335 porphyrins synthesized in our laboratory, but had a low accuracy (RMSE = 0.52) for extinction coefficient. Developing models using only compounds from our laboratory significantly decreased errors for these compounds (RMSE = 0.5 and 0.042 for absorption band and extinction coefficient, respectively), but limited their applicability to these homologous series. When developing models, one should clearly keep in mind their potential use and select a strategy that could contribute the most accurate predictions for the target application. The models and data are publicly available.
Affiliation(s)
- Aleksey I. Rusanov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia
- Olga A. Dmitrieva
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia
- Nugzar Zh. Mamardashvili
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia
- Igor V. Tetko
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia
- Helmholtz Munich, Institute of Structural Biology, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), D-85764 Neuherberg, Germany
- BIGCHEM GmbH, D-85716 Unterschleißheim, Germany
- Correspondence: Tel.: +49-89-3187-3575
|
78
|
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, Tetko IV. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins. Int J Mol Sci 2022. [DOI: 10.3390/ijms23031201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023] Open
|
79
|
Jiménez-Luna J, Skalic M, Weskamp N. Benchmarking Molecular Feature Attribution Methods with Activity Cliffs. J Chem Inf Model 2022; 62:274-283. [PMID: 35019265 DOI: 10.1021/acs.jcim.1c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the provided inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for making a decision of which compounds to synthesize or prioritize. The consistency of the highlighted moieties alongside expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. We here present an approach that is based on maximum common substructure algorithms applied to experimentally-determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, graph-neural-network alternatives. The provided benchmark data are fully open sourced, which we hope will facilitate the testing of newly developed molecular feature attribution techniques.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093 Zurich, Switzerland; Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Miha Skalic
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Nils Weskamp
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
|
80
|
Zhang XC, Wu CK, Yi JC, Zeng XX, Yang CQ, Lu AP, Hou TJ, Cao DS. Pushing the Boundaries of Molecular Property Prediction for Drug Discovery with Multitask Learning BERT Enhanced by SMILES Enumeration. Research 2022. [DOI: 10.34133/research.0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Accurate prediction of pharmacological properties of small molecules is becoming increasingly important in drug discovery. Traditional feature-engineering approaches heavily rely on handcrafted descriptors and/or fingerprints, which need extensive human expert knowledge. With the rapid progress of artificial intelligence technology, data-driven deep learning methods have shown unparalleled advantages over feature-engineering-based methods. However, existing deep learning methods usually suffer from the scarcity of labeled data and the inability to share information between different tasks when applied to predicting molecular properties, thus resulting in poor generalization capability. Here, we proposed a novel multitask learning BERT (Bidirectional Encoder Representations from Transformers) framework, named MTL-BERT, which leverages large-scale pre-training, multitask learning, and SMILES (simplified molecular input line entry specification) enumeration to alleviate the data scarcity problem. MTL-BERT first exploits a large amount of unlabeled data through self-supervised pretraining to mine the rich contextual information in SMILES strings and then fine-tunes the pretrained model for multiple downstream tasks simultaneously by leveraging their shared information. Meanwhile, SMILES enumeration is used as a data enhancement strategy during the pretraining, fine-tuning, and test phases to substantially increase data diversity and help to learn the key relevant patterns from complex SMILES strings. The experimental results showed that the pretrained MTL-BERT model with little additional fine-tuning can achieve much better performance than the state-of-the-art methods on most of the 60 practical molecular datasets. Additionally, the MTL-BERT model leverages attention mechanisms to focus on SMILES character features essential to target properties for model interpretability.
Affiliation(s)
- Xiao-Chen Zhang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Shangqiu Normal University, School of Information Technology, Shangqiu 476000, Henan, P. R. China
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
| | - Cheng-Kun Wu
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
| | - Jia-Cai Yi
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
| | - Xiang-Xiang Zeng
- Department of Computer Science, Hunan University, Changsha 410082, Hunan, P. R. China
| | - Can-Qun Yang
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
| | - Ting-Jun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
|
81
|
Abstract
Quantitative structure-activity relationship (QSAR) models are routinely applied computational tools in the drug discovery process. QSAR models are regression or classification models that predict the biological activities of molecules based on the features derived from their molecular structures. These models are usually used to prioritize a list of candidate molecules for future laboratory experiments and to help chemists gain better insights into how structural changes affect a molecule's biological activities. Developing accurate and interpretable QSAR models is therefore of the utmost importance in the drug discovery process. Deep neural networks, which are powerful supervised learning algorithms, have shown great promise for addressing regression and classification problems in various research fields, including the pharmaceutical industry. In this chapter, we briefly review the applications of deep neural networks in QSAR modeling and describe commonly used techniques to improve model performance.
|
82
|
Shin HK. Topological Distance-Based Electron Interaction Tensor to Apply a Convolutional Neural Network on Drug-like Compounds. ACS Omega 2021; 6:35757-35768. [PMID: 34984306 PMCID: PMC8717557 DOI: 10.1021/acsomega.1c05693] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2021] [Accepted: 12/08/2021] [Indexed: 05/15/2023]
Abstract
Deep learning (DL) models in quantitative structure-activity relationship studies feed the molecular structure directly to the network, without human-designed descriptors, by representing a molecule as a graph or a string (e.g., SMILES code). However, these two representations oversimplify real molecules when it comes to reflecting the chemical properties of molecular structures. Given that the choice of molecular representation determines the architecture of the DL model to apply, a novel molecular representation can open the way to applying diverse DL networks developed and used in other fields. A topological distance-based electron interaction (TDEi) tensor has been developed in this study, inspired by the quantum mechanical model of the molecule, which defines a molecule in terms of electrons and protons. In the TDEi tensor, the atomic orbital (AO) of each atom is represented by an electron configuration (EC) vector, which is a bit string based on the presence or absence of electrons in each AO, with spin indicated by positive and negative signs. Interactions between EC vectors were calculated based on the topological distance between atoms in a molecule. Because a molecular structure was translated into a 3D array, CNN models (modified VGGNet) were applied to the TDEi tensor to predict four physicochemical properties of drug-like compound datasets: MP (275,131), Lipop (4193), Esol (1127), and Freesolv (639). The models achieved good prediction accuracy. PCA showed that the correlation between the extracted features and the target endpoint became stronger as features were extracted from deeper layers.
Affiliation(s)
- Hyun Kil Shin
- Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Human and Environmental Toxicology, University of Science and Technology, Daejeon 34113, Republic of Korea
|
83
|
Chen G, Song Z, Qi Z. Transformer-convolutional neural network for surface charge density profile prediction: Enabling high-throughput solvent screening with COSMO-SAC. Chem Eng Sci 2021. [DOI: 10.1016/j.ces.2021.117002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]
|
84
|
Makarov D, Fadeeva Y, Shmukler L, Tetko I. Beware of proper validation of models for ionic Liquids! J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.117722] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/03/2023]
|
85
|
Li Y, Xu Y, Yu Y. CRNNTL: Convolutional Recurrent Neural Network and Transfer Learning for QSAR Modeling in Organic Drug and Material Discovery. Molecules 2021; 26:molecules26237257. [PMID: 34885843 PMCID: PMC8658888 DOI: 10.3390/molecules26237257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2021] [Revised: 11/25/2021] [Accepted: 11/26/2021] [Indexed: 11/16/2022] Open
Abstract
Molecular latent representations, derived from autoencoders (AEs), have been widely used for drug or material discovery over the past couple of years. In particular, a variety of machine learning methods based on latent representations have shown excellent performance on quantitative structure–activity relationship (QSAR) modeling. However, the sequence features of these representations have not been considered in most cases. In addition, data scarcity is still the main obstacle for deep learning strategies, especially for bioactivity datasets. In this study, we propose the convolutional recurrent neural network and transfer learning (CRNNTL) method, inspired by applications in polyphonic sound detection and electrocardiogram classification. Our model takes advantage of both convolutional and recurrent neural networks for feature extraction, as well as of a data augmentation method. According to QSAR modeling on 27 datasets, CRNNTL can outperform or compete with state-of-the-art methods on both drug and material properties. In addition, the performance on one isomer-based dataset indicates that its excellent results stem from an improved ability in global feature extraction while the ability in local feature extraction is maintained. Then, the transfer learning results show that CRNNTL can overcome data scarcity when relevant source datasets are chosen. Finally, the high versatility of our model is shown by using different latent representations as inputs from other types of AEs.
Affiliation(s)
- Yaqin Li
- West China Tianfu Hospital, Sichuan University, Chengdu 610041, China
- Correspondence: (Y.L.); (Y.Y.)
| | - Yongjin Xu
- Department of Chemistry and Molecular Biology, University of Gothenburg, Kemivägen 10, 41296 Gothenburg, Sweden
| | - Yi Yu
- Department of Chemistry and Molecular Biology, University of Gothenburg, Kemivägen 10, 41296 Gothenburg, Sweden
- Correspondence: (Y.L.); (Y.Y.)
|
86
|
Kovalishyn V, Zyabrev V, Kachaeva M, Ziabrev K, Keith K, Harden E, Hartline C, James SH, Brovarets V. Design of new imidazole derivatives with anti-HCMV activity: QSAR modeling, synthesis and biological testing. J Comput Aided Mol Des 2021; 35:1177-1187. [PMID: 34766232 DOI: 10.1007/s10822-021-00428-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/14/2021] [Accepted: 10/26/2021] [Indexed: 11/29/2022]
Abstract
The problem of designing new antiviral drugs against Human Cytomegalovirus (HCMV) was addressed using the Online Chemical Modeling Environment (OCHEM). Data on compound antiviral activity to human organisms were collected from the literature and uploaded to the OCHEM database. The predictive ability of the regression models was tested through cross-validation, giving a coefficient of determination q2 = 0.71-0.76. The validation of the models using an external test set showed that the models can be used to predict the activity of newly designed compounds with reasonable accuracy within the applicability domain (q2 = 0.70-0.74). The models were applied to screen a virtual chemical library of imidazole derivatives, which was designed to have antiviral activity. The six most promising compounds were identified and synthesized, and their antiviral activities against HCMV were evaluated in vitro. However, only two of them showed some activity against the HCMV AD169 strain.
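As an illustration of the q2 statistic reported in this abstract, the sketch below uses one common definition, q2 = 1 - PRESS / SS_total, computed from observed values and predictions made for held-out compounds. The abstract does not state the exact formula used in OCHEM, so this definition is an assumption, and the function name `q2` and the numbers are hypothetical.

```python
def q2(observed, predicted_cv):
    """Cross-validated coefficient of determination: 1 - PRESS / SS_total.

    Assumes `predicted_cv` holds predictions made while each compound
    was left out of training (for example, from k-fold cross-validation).
    """
    if len(observed) != len(predicted_cv) or len(observed) < 2:
        raise ValueError("need two equally sized sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # PRESS: sum of squared errors of the held-out predictions.
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted_cv))
    # Total sum of squares around the mean of the observed values.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values are constant; q2 is undefined")
    return 1.0 - press / ss_total


# Hypothetical activities and held-out predictions, for illustration only.
observed = [5.1, 6.3, 4.8, 7.0, 5.9]
predicted_cv = [5.4, 6.0, 5.1, 6.6, 5.8]
print(round(q2(observed, predicted_cv), 3))
```

A value of 1 means perfect held-out predictions, and a value of 0 means the model does no better than always predicting the mean activity.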
Affiliation(s)
- Vasyl Kovalishyn
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Str, Kyiv, 02094, Ukraine.
| | - Volodymyr Zyabrev
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Str, Kyiv, 02094, Ukraine
| | - Maryna Kachaeva
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Str, Kyiv, 02094, Ukraine
| | - Kostiantyn Ziabrev
- Institute of Organic Chemistry, National Academy of Sciences, 5, Murmanska Str, Kyiv, 02660, Ukraine
- Click Chemistry Tools, East Gelging Dr, Scottsdale, AZ, 834185260, USA
| | - Kathy Keith
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
| | - Emma Harden
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
| | - Caroll Hartline
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
| | - Scott H James
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
| | - Volodymyr Brovarets
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Str, Kyiv, 02094, Ukraine
|
87
|
Wu CK, Zhang XC, Yang ZJ, Lu AP, Hou TJ, Cao DS. Learning to SMILES: BAN-based strategies to improve latent representation learning from molecules. Brief Bioinform 2021; 22:6356874. [PMID: 34427296 DOI: 10.1093/bib/bbab327] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/15/2021] [Revised: 07/15/2021] [Accepted: 07/25/2021] [Indexed: 01/22/2023] Open
Abstract
Computational methods have become indispensable tools to accelerate the drug discovery process and alleviate the excessive dependence on time-consuming and labor-intensive experiments. Traditional feature-engineering approaches heavily rely on expert knowledge to devise useful features, which could be costly and sometimes biased. The emerging deep learning (DL) methods deliver a data-driven way to automatically learn expressive representations from complex raw data. Inspired by this, researchers have attempted to apply various deep neural network models to simplified molecular input line entry specification (SMILES) strings, which contain all the composition and structure information of molecules. However, current models usually suffer from the scarcity of labeled data. This results in a low generalization ability of SMILES-based DL models, which prevents them from competing with the state-of-the-art computational methods. In this study, we utilized the BiLSTM (bidirectional long short-term memory) attention network (BAN), in which we employed a novel multi-step attention mechanism to facilitate the extraction of key features from the SMILES strings. Meanwhile, SMILES enumeration was utilized as a data augmentation method in the training phase to substantially increase the amount of labeled data and raise the probability of mining more patterns from complex SMILES. We again took advantage of SMILES enumeration in the prediction phase to rectify model prediction bias and provide a more accurate prediction. Combined with the BAN model, our strategies can greatly improve the performance of latent features learned from SMILES strings. In 11 canonical absorption, distribution, metabolism, excretion and toxicity-related tasks, our method outperformed the state-of-the-art approaches.
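The use of SMILES enumeration in the prediction phase can be sketched as follows: several equivalent SMILES strings of the same molecule are scored and the scores are combined. Averaging is an assumption made here for illustration (the abstract does not say how predictions are combined), and `predict_with_enumeration` and `toy_model` are hypothetical names, with `toy_model` standing in for a trained network.

```python
from statistics import mean


def predict_with_enumeration(smiles_variants, predict_one):
    """Average a model's predictions over equivalent SMILES strings of one molecule."""
    if not smiles_variants:
        raise ValueError("at least one SMILES string is required")
    return mean(predict_one(s) for s in smiles_variants)


# Three valid SMILES strings for ethanol, written by hand for illustration.
# In practice a cheminformatics toolkit would generate such variants.
ethanol_variants = ["CCO", "OCC", "C(O)C"]


def toy_model(smiles):
    """Hypothetical stand-in for a trained model whose output depends on atom order."""
    return 1.0 if smiles.startswith("C") else 0.0


# A single string gives 1.0 or 0.0 depending on how the molecule was written;
# the averaged prediction no longer depends on that arbitrary choice.
print(predict_with_enumeration(ethanol_variants, toy_model))
```

The toy model is deliberately sensitive to how the string is written, which is the kind of bias that combining predictions over enumerated strings is meant to reduce.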
Affiliation(s)
- Cheng-Kun Wu
- State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, China
| | - Xiao-Chen Zhang
- The College of Computer, National University of Defense Technology, China
| | - Zhi-Jiang Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Hunan, China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong
| | - Ting-Jun Hou
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
|
88
|
Ghosh D, Koch U, Hadian K, Sattler M, Tetko IV. Highly Accurate Filters to Flag Frequent Hitters in AlphaScreen Assays by Suggesting their Mechanism. Mol Inform 2021; 41:e2100151. [PMID: 34676998 DOI: 10.1002/minf.202100151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/05/2021] [Accepted: 09/29/2021] [Indexed: 11/06/2022]
Abstract
AlphaScreen is one of the most widely used assay technologies in drug discovery due to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters makes the measured HTS data difficult to interpret. Although filters do exist to identify frequent hitters for AlphaScreen, they are frequently based on privileged scaffolds. The development of such filters is time consuming and requires deep domain knowledge. Recently, machine learning and artificial intelligence methods have emerged as important tools to advance drug discovery and chemoinformatics, including their application to the identification of frequent hitters in screening assays. However, the relative performance and complementarity of machine learning and scaffold-based techniques have not yet been comprehensively compared. In this study, we compared filters based on privileged scaffolds with filters built using machine learning. Our results demonstrate that machine-learning methods provide more accurate filters for the identification of frequent hitters in AlphaScreen assays than scaffold-based methods and can be easily redeveloped once new data are measured. We present highly accurate models to identify frequent hitters in AlphaScreen assays.
Affiliation(s)
- Dipan Ghosh
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
| | - Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
| | - Kamyar Hadian
- Assay Development and Screening Platform, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
| | - Michael Sattler
- Bavarian NMR Center, Department Chemie, Technische Universität München, Ernst-Otto-Fischerstraße 2, D-85747, Garching, Germany
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street 1, 153045, Ivanovo, Russia
- BIGCHEM GmbH, Valerystr. 49, D-85716, Unterschleißheim, Germany
|
89
|
Web-Based Quantitative Structure-Activity Relationship Resources Facilitate Effective Drug Discovery. Top Curr Chem (Cham) 2021; 379:37. [PMID: 34554348 DOI: 10.1007/s41061-021-00349-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 08/17/2021] [Indexed: 12/28/2022]
Abstract
Traditional drug discovery effectively contributes to the treatment of many diseases but is limited by high costs and long cycles. Quantitative structure-activity relationship (QSAR) methods were introduced to evaluate the activity of compounds virtually, which saves the significant cost of determining the activities of the compounds experimentally. Over the past two decades, many web tools for QSAR modeling with various features have been developed to facilitate the usage of QSAR methods. These web tools significantly reduce the difficulty of using QSAR and indirectly promote drug discovery. However, there are few comprehensive summaries of these QSAR tools, and researchers may have difficulty determining which tool to use. Hence, we systematically surveyed the mainstream web tools for QSAR modeling. This work may guide researchers in choosing appropriate web tools for developing QSAR models, and may also help develop more bioinformatics tools based on these existing resources. For nonprofessionals, we also hope to make more people aware of QSAR methods and expand their use.
|
90
|
Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021; 22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023] Open
Abstract
Drug discovery based on artificial intelligence has been in the spotlight recently as it significantly reduces the time and cost required for developing novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous deep-learning-based methodologies are emerging at all steps of drug development processes. In particular, pharmaceutical chemists have faced significant issues with regard to selecting and designing potential drugs for a target of interest to enter preclinical testing. The two major challenges are prediction of interactions between drugs and druggable targets and generation of novel molecular structures suitable for a target of interest. Therefore, we reviewed recent deep-learning applications in drug-target interaction (DTI) prediction and de novo drug design. In addition, we introduce a comprehensive summary of a variety of drug and protein representations, DL models, and commonly used benchmark datasets or tools for model training and testing. Finally, we present the remaining challenges for the promising future of DL-based DTI prediction and de novo drug design.
Affiliation(s)
- Jintae Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Sera Park
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Dongbo Min
- Computer Vision Lab, Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Korea
| | - Wankyu Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
- System Pharmacology Lab, Department of Life Sciences, Ewha Womans University, Seoul 03760, Korea
|
91
|
Koras K, Kizling E, Juraeva D, Staub E, Szczurek E. Interpretable deep recommender system model for prediction of kinase inhibitor efficacy across cancer cell lines. Sci Rep 2021; 11:15993. [PMID: 34362938 PMCID: PMC8346627 DOI: 10.1038/s41598-021-94564-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/22/2021] [Accepted: 07/06/2021] [Indexed: 01/02/2023] Open
Abstract
Computational models for drug sensitivity prediction have the potential to significantly improve personalized cancer medicine. Drug sensitivity assays, combined with profiling of cancer cell lines and drugs become increasingly available for training such models. Multiple methods were proposed for predicting drug sensitivity from cancer cell line features, some in a multi-task fashion. So far, no such model leveraged drug inhibition profiles. Importantly, multi-task models require a tailored approach to model interpretability. In this work, we develop DEERS, a neural network recommender system for kinase inhibitor sensitivity prediction. The model utilizes molecular features of the cancer cell lines and kinase inhibition profiles of the drugs. DEERS incorporates two autoencoders to project cell line and drug features into 10-dimensional hidden representations and a feed-forward neural network to combine them into response prediction. We propose a novel interpretability approach, which in addition to the set of modeled features considers also the genes and processes outside of this set. Our approach outperforms simpler matrix factorization models, achieving R [Formula: see text] 0.82 correlation between true and predicted response for the unseen cell lines. The interpretability analysis identifies 67 biological processes that drive the cell line sensitivity to particular compounds. Detailed case studies are shown for PHA-793887, XMD14-99 and Dabrafenib.
Affiliation(s)
- Krzysztof Koras
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Ewa Kizling
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Dilafruz Juraeva
- Oncology Bioinformatics, Translational Medicine, Merck Healthcare KGaA, Darmstadt, Germany
| | - Eike Staub
- Oncology Bioinformatics, Translational Medicine, Merck Healthcare KGaA, Darmstadt, Germany
| | - Ewa Szczurek
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland.
|
92
|
EGFRisopred: a machine learning-based classification model for identifying isoform-specific inhibitors against EGFR and HER2. Mol Divers 2021; 26:1531-1543. [PMID: 34345964 DOI: 10.1007/s11030-021-10284-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2021] [Accepted: 07/21/2021] [Indexed: 10/20/2022]
Abstract
The EGFR kinase pathway is one of the most frequently activated signaling pathways in human cancers. EGFR and HER2 are the two significant members of this pathway, which are attractive drug targets of clinical relevance in lung and breast cancer. Therefore, identifying EGFR- and HER2-specific inhibitors is one of the important challenges in cancer drug discovery. To address this issue, a dataset of 519 compounds having inhibitory activity against both isoforms, i.e., EGFR and HER2, was collected from the literature, and a knowledge-based computational classification model was developed for predicting the specificity of a molecule for an isoform (EGFR/HER2) with precision. A total of seventy-two classification models using nine fingerprint types, four classifiers (IBK, NB, SMO and RF) and two different datasets (EGFR and HER2 isoform specific) were developed. It was observed that the models developed using random forest and IBK performed better for EGFR- and HER2-specific datasets, respectively. Scaffold and functional group analysis led to the identification of prevalent cores and fragments in each of the datasets. The accuracy of the selected best performing models was also evaluated using the decoy dataset. We have also developed an application, EGFRisopred, which integrates the best performing models and permits the user to predict the specificity of a compound as an EGFR-/HER2-specific anticancer agent. It is expected that the tool's availability as a free utility will allow researchers to identify new inhibitors against these targets important in cancer.
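The count of seventy-two models follows from the full combination of nine fingerprint types, four classifiers and two datasets (9 × 4 × 2 = 72). A minimal sketch of that grid is shown below. The classifier abbreviations come from the abstract, while the fingerprint names are placeholders, since the abstract does not list them.

```python
from itertools import product

# Placeholder names: the abstract gives the number of fingerprint types, not their names.
fingerprints = [f"fingerprint_{i}" for i in range(1, 10)]
# Classifier abbreviations as given in the abstract.
classifiers = ["IBK", "NB", "SMO", "RF"]
# The two isoform-specific datasets.
datasets = ["EGFR-specific", "HER2-specific"]

# One model per (fingerprint, classifier, dataset) combination.
model_grid = list(product(fingerprints, classifiers, datasets))
print(len(model_grid))  # 72
```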
|
93
|
Gupta R, Mittal A, Agrawal V, Gupta S, Gupta K, Jain RR, Garg P, Mohanty SK, Sogani R, Chhabra HS, Gautam V, Mishra T, Sengupta D, Ahuja G. OdoriFy: A conglomerate of artificial intelligence-driven prediction engines for olfactory decoding. J Biol Chem 2021; 297:100956. [PMID: 34265305 PMCID: PMC8342790 DOI: 10.1016/j.jbc.2021.100956] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/22/2021] [Revised: 06/24/2021] [Accepted: 07/09/2021] [Indexed: 12/01/2022] Open
Abstract
The molecular mechanisms of olfaction, or the sense of smell, are relatively underexplored compared with other sensory systems, primarily because of its underlying molecular complexity and the limited availability of dedicated predictive computational tools. Odorant receptors (ORs) allow the detection and discrimination of a myriad of odorant molecules and therefore mediate the first step of the olfactory signaling cascade. To date, odorant (or agonist) information for the majority of these receptors is still unknown, limiting our understanding of their functional relevance in odor-induced behavioral responses. In this study, we introduce OdoriFy, a Web server featuring powerful deep neural network-based prediction engines. OdoriFy enables (1) identification of odorant molecules for wildtype or mutant human ORs (Odor Finder); (2) classification of user-provided chemicals as odorants/nonodorants (Odorant Predictor); (3) identification of responsive ORs for a query odorant (OR Finder); and (4) interaction validation using Odorant-OR Pair Analysis. In addition, OdoriFy provides the rationale behind every prediction it makes by leveraging explainable artificial intelligence. This module highlights the basis of the prediction of odorants/nonodorants at atomic resolution and for the ORs at amino acid levels. A key distinguishing feature of OdoriFy is that it is built on a comprehensive repertoire of manually curated information of human ORs with their known agonists and nonagonists, making it a highly interactive and resource-enriched Web server. Moreover, comparative analysis of OdoriFy predictions with an alternative structure-based ligand interaction method revealed comparable results. OdoriFy is available freely as a web service at https://odorify.ahujalab.iiitd.edu.in/olfy/.
Affiliation(s)
- Ria Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Vishesh Agrawal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Sushant Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Krishan Gupta
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Rishi Raj Jain
- Department of Computer Science and Design, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Prakriti Garg
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Riya Sogani
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Harshit Singh Chhabra
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Vishakha Gautam
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
| | - Tripti Mishra
- Pathfinder Research and Training Foundation, Greater Noida, Uttar Pradesh, India
| | - Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India; Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India; Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, New Delhi, India; Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India.
|
94
|
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 280] [Impact Index Per Article: 93.3] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug design and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost pose hurdles and challenges that affect drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also pose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physicochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for the rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development.
Artificial intelligence is referred to as a superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of the drug design and development process, from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive compounds, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physicochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
|
95
|
Vaz JM, Balaji S. Convolutional neural networks (CNNs): concepts and applications in pharmacogenomics. Mol Divers 2021; 25:1569-1584. [PMID: 34031788 PMCID: PMC8342355 DOI: 10.1007/s11030-021-10225-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/06/2021] [Accepted: 04/21/2021] [Indexed: 12/17/2022]
Abstract
Convolutional neural networks (CNNs) have been used to extract information from various datasets of different dimensions. This approach has led to accurate interpretations in several subfields of biological research, like pharmacogenomics, addressing issues previously faced by other computational methods. With the rising attention for personalized and precision medicine, scientists and clinicians have now turned to artificial intelligence systems to provide them with solutions for therapeutics development. CNNs have already provided valuable insights into biological data transformation. Due to the rise of interest in precision and personalized medicine, in this review, we have provided a brief overview of the possibilities of implementing CNNs as an effective tool for analyzing one-dimensional biological data, such as nucleotide and protein sequences, as well as small molecular data, e.g., simplified molecular-input line-entry specification, InChI, binary fingerprints, etc., to categorize the models based on their objective and also highlight various challenges. The review is organized into specific research domains that participate in pharmacogenomics for a more comprehensive understanding. Furthermore, the future intentions of deep learning are outlined.
Affiliation(s)
- Joel Markus Vaz
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- S Balaji
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
|
96
|
Gallego V, Naveiro R, Roca C, Ríos Insua D, Campillo NE. AI in drug development: a multidisciplinary perspective. Mol Divers 2021; 25:1461-1479. [PMID: 34251580 PMCID: PMC8342381 DOI: 10.1007/s11030-021-10266-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/21/2021] [Accepted: 06/29/2021] [Indexed: 01/09/2023]
Abstract
The introduction of a new drug to the commercial market follows a complex and long process that typically spans over several years and entails large monetary costs due to a high attrition rate. Because of this, there is an urgent need to improve this process using innovative technologies such as artificial intelligence (AI). Different AI tools are being applied to support all four steps of the drug development process (basic research for drug discovery; pre-clinical phase; clinical phase; and postmarketing). Some of the main tasks where AI has proven useful include identifying molecular targets, searching for hit and lead compounds, synthesising drug-like compounds and predicting ADME-Tox. This review, on the one hand, brings in a mathematical vision of some of the key AI methods used in drug development closer to medicinal chemists and, on the other hand, brings the drug development process and the use of different models closer to mathematicians. Emphasis is placed on two aspects not mentioned in similar surveys, namely, Bayesian approaches and their applications to molecular modelling and the eventual final use of the methods to actually support decisions. Promoting a perfect synergy.
Affiliation(s)
- Víctor Gallego
  - Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Roi Naveiro
  - Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Carlos Roca
  - AItenea Biotech S.L. Parque Científico de Madrid, Faraday, 7, 28049, Madrid, Spain
- David Ríos Insua
  - ICMAT-CSIC and Dept. of Statistics and OR, U. Compl. Madrid, Madrid, Spain
- Nuria E Campillo
  - CIB-Margarita Salas (CSIC), Ramiro de Maeztu, 9, 28040, Madrid, Spain
|
97
|
Lim H, Jung Y. MLSolvA: solvation free energy prediction from pairwise atomistic interactions by machine learning. J Cheminform 2021; 13:56. [PMID: 34332634 PMCID: PMC8325294 DOI: 10.1186/s13321-021-00533-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/04/2021] [Accepted: 07/15/2021] [Indexed: 01/04/2023] Open
Abstract
Recent advances in machine learning technologies and their applications have led to the development of diverse structure-property relationship models for crucial chemical properties. The solvation free energy is one of them. Here, we introduce a novel ML-based solvation model, which calculates the solvation energy from pairwise atomistic interactions. The novelty of the proposed model consists of a simple architecture: two encoding functions extract atomic feature vectors from the given chemical structure, while the inner product between the two atomistic feature vectors calculates their interactions. The results of 6239 experimental measurements achieve outstanding performance and transferability for enlarging training data owing to its solvent-non-specific nature. An analysis of the interaction map shows that our model has significant potential for producing group contributions on the solvation energy, which indicates that the model provides not only predictions of target properties but also more detailed physicochemical insights.
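The architecture described in this abstract (two encoders producing atomic feature vectors, with inner products giving pairwise interactions) can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup tables stand in for learned encoding functions, the feature values are made up, and summing the interaction map to obtain a total is an assumption made here for illustration.

```python
# Toy sketch of a pairwise-interaction solvation model.
# Assumptions (not from the paper): lookup-table "encoders", 2-D features,
# and a total energy taken as the plain sum of the interaction map.

def encode(atoms, table):
    """Map each atom symbol to a feature vector (stand-in for a learned encoder)."""
    return [table[a] for a in atoms]

def interaction_map(solvent_feats, solute_feats):
    """Inner product between every solvent-atom and solute-atom feature vector."""
    return [[sum(p * q for p, q in zip(u, v)) for v in solute_feats]
            for u in solvent_feats]

def total_energy(imap):
    """Sum all pairwise terms; each entry can be read as one pair's contribution."""
    return sum(sum(row) for row in imap)

# Hypothetical feature tables, chosen only to make the example concrete.
SOLVENT_TABLE = {"O": [1.0, 0.0], "H": [0.0, 0.5]}
SOLUTE_TABLE = {"C": [0.2, -0.4], "O": [-1.0, 0.0]}

solvent = encode(["O", "H", "H"], SOLVENT_TABLE)  # water-like solvent
solute = encode(["C", "O"], SOLUTE_TABLE)         # heavy atoms of a small solute
imap = interaction_map(solvent, solute)
energy = total_energy(imap)
print(imap)
print(energy)
```

Because the total is a sum over atom pairs, summing one column of the map gives the share attributable to a single solute atom, which is the kind of group-contribution reading the abstract mentions.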
Affiliation(s)
- Hyuntae Lim
  - Department of Chemistry, Seoul National University, Seoul, 08826, South Korea
- YounJoon Jung
  - Department of Chemistry, Seoul National University, Seoul, 08826, South Korea
|
98
|
Krasnov L, Khokhlov I, Fedorov MV, Sosnin S. Transformer-based artificial neural networks for the conversion between chemical notations. Sci Rep 2021; 11:14798. [PMID: 34285269 PMCID: PMC8292511 DOI: 10.1038/s41598-021-94082-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/10/2021] [Accepted: 07/06/2021] [Indexed: 11/08/2022] Open
Abstract
We developed a Transformer-based artificial neural approach to translate between SMILES and IUPAC chemical notations: Struct2IUPAC and IUPAC2Struct. The overall performance level of our model is comparable to the rule-based solutions. We proved that the accuracy and speed of computations as well as the robustness of the model allow to use it in production. Our showcase demonstrates that a neural-based solution can facilitate rapid development keeping the required level of accuracy. We believe that our findings will inspire other developers to reduce development costs by replacing complex rule-based solutions with neural-based ones.
Affiliation(s)
- Lev Krasnov
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
  - Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
  - Department of Chemistry, Lomonosov Moscow State University, GSP-1, 1-3 Leninskiye Gory, Moscow, 119991, Russia
- Ivan Khokhlov
  - Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Maxim V Fedorov
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
  - Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Sergey Sosnin
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
  - Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
|
99
|
Tinkov OV, Grigorev VY, Grigoreva LD. Prediction of an Organic Compound’s Biotransformation Time: A Study Using Avermectins. MOSCOW UNIVERSITY CHEMISTRY BULLETIN 2021. [PMCID: PMC8382113 DOI: 10.3103/s0027131421040088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The current spread of the SARS-CoV-2 coronavirus is a challenge for the entire world. Ivermectin is a promising agent, which could be used to combat the SARS-CoV-2 coronavirus. It represents a complex of semisynthetic derivatives of natural avermectins that have been taken advantage of for a long time in medicine and agriculture as antiparasitic drugs. However, the experimental ecotoxicology assessment data for individual avermectins are still scarce. In relation to this, the aim of this study is to develop a mathematical model that would allow reliably predicting the biotransformation ability of natural and semisynthetic avermectins and identifying the structural fragments of avermectin molecules that have the largest impact on this biological activity. The base for the model construction was a structurally heterogeneous set including organic compounds with experimentally determined biotransformation half-life periods (KmHL). Using the OCHEM web platform (https://ochem.eu) with the implemented PyDescriptor plugin for the descriptor calculation and Random Forest and Transformer-CNN algorithms, a satisfactory ($R_{\text{test}}^{2}$ = 0.81) Quantitative Relationship Structure—Activity (QSAR) model was developed. The subsequent calculations have shown that natural avermectins undergo on average faster biotransformation in fish than the semisynthetic ones. In addition, structural fragments that increase and decrease the biotransformation rate are identified.
|
100
|
Tinkov OV, Grigorev VY, Grigoreva LD. QSAR analysis of the acute toxicity of avermectins towards Tetrahymena pyriformis. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2021; 32:541-571. [PMID: 34157880 DOI: 10.1080/1062936x.2021.1932583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
Avermectins have been effectively used in medicine, veterinary medicine, and agriculture as antiparasitic agents for many years. However, there are still no reliable data on the main ecotoxicological characteristics of most individual avermectins. Although many QSAR models have been proposed to describe the acute toxicity of organic compounds towards Tetrahymena pyriformis (T. pyriformis), avermectins are outside the applicability domain of these models. The influence of the molecular structures of various organic compounds on the acute toxicity towards T. pyriformis was studied using the OCHEM web platform (https://ochem.eu). A data set of 1792 toxicants was used to create models. The QSAR (Quantitative Structure-Activity Relationship) models were developed using the molecular descriptors Dragon, ISIDA, CDK, PyDescriptor, alvaDesc, and SIRMS and machine learning methods, such as Least Squares Support Vector Machine and Transformer Convolutional Neural Network. The HYBOT descriptors and Random Forest were used for a comparative QSAR investigation. Since the best predictive ability was demonstrated by the Transformer Convolutional Neural Network model, it was used to predict the toxicity of individual avermectins towards T. pyriformis. During a structural interpretation of the developed QSAR model, we determined the significant molecular transformations that increase and decrease the acute toxicity of organic compounds.
Affiliation(s)
- O V Tinkov
  - Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Shevchenko Transnistria State University, Tiraspol, Moldova
  - Department of Computer Science, Military Institute of the Ministry of Defense, Tiraspol, Moldova
- V Y Grigorev
  - Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russia
- L D Grigoreva
  - Department of Fundamental Physicochemical Engineering, Moscow State University, Moscow, Russia
|