Reference Citation Analysis

For: Li X, Fourches D. SMILES Pair Encoding: A Data-Driven Substructure Tokenization Algorithm for Deep Learning. J Chem Inf Model 2021;61:1560-1569. [PMID: 33715361 DOI: 10.1021/acs.jcim.0c01127] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 11/29/2022]

Cited by Other Article(s)

Han Y, Xu X, Hsieh CY, Ding K, Xu H, Xu R, Hou T, Zhang Q, Chen H. Retrosynthesis prediction with an iterative string editing model. Nat Commun 2024;15:6404. [PMID: 39080274 PMCID: PMC11289138 DOI: 10.1038/s41467-024-50617-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 07/09/2024] [Indexed: 08/02/2024]

Yuan Y, Tang X, Li H, Lang X, Li C, Song Y, Sun S, Yang Y, Zhou Z. KLSD: a kinase database focused on ligand similarity and diversity. Front Pharmacol 2024;15:1400136. [PMID: 38957398 PMCID: PMC11217335 DOI: 10.3389/fphar.2024.1400136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/28/2024] [Indexed: 07/04/2024]

Abstract

Because of the similarity and diversity among kinases, small molecule kinase inhibitors (SMKIs) often display multi-target effects or selectivity, both of which correlate strongly with the efficacy and safety of these inhibitors. However, well-known databases are few and have limited data mining capabilities, and databases focusing on the pharmacological similarity and diversity of SMKIs are particularly scarce, so researchers find it difficult to access relevant information quickly. The KLIFS database is representative of specialized application databases in the field, focusing on kinase structure and co-crystallised kinase-ligand interactions, whereas the KLSD database presented in this paper emphasizes the analysis of SMKIs across all reported kinase targets. To address the lack of professional application databases in kinase research and to provide centralized, standardized, reliable and efficient data resources for kinase researchers, this paper proposes a research program based on the ChEMBL database that focuses on comparing the activities of kinase ligands. The scheme extracts kinase data, standardizes and normalizes them, and then performs kinase target difference analysis to make kinase activity threshold judgements. It then constructs a specialized and personalized kinase database platform as an extensible web application built on the SpringBoot architecture with separated front end and back end, which handles the storage, retrieval and analysis of the data and ultimately supports data visualization and interaction. This study aims to develop a kinase database platform to collect, organize, and provide standardized data related to kinases. By offering essential resources and tools, it supports kinase research and drug development, thereby advancing scientific research and innovation in kinase-related fields. It is freely accessible at: http://ai.njucm.edu.cn:8080.
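
The abstract describes extracting ChEMBL activity records, standardizing them, and then judging activity against a threshold. A minimal sketch of that kind of step, assuming hypothetical `(compound, kinase, activity type, value, unit)` records: values are converted to nanomolar, transformed to -log10(molar) (the same scale ChEMBL uses for its pChEMBL values), and a compound counts as active on a kinase when its best value reaches a hypothetical cut-off of 6.0 (1 µM). The record layout, unit table and threshold are assumptions for illustration, not KLSD's actual pipeline.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical unit table: multiply a value by its factor to get nanomolar
NM_FACTORS = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "µM": 1e3, "mM": 1e6}

# Hypothetical record layout: (compound, kinase, activity type, value, unit)
Record = Tuple[str, str, str, float, str]


def to_nanomolar(value: float, unit: str) -> float:
    """Normalize a concentration to nM; unknown units are rejected."""
    if unit not in NM_FACTORS:
        raise ValueError(f"unsupported unit: {unit}")
    return value * NM_FACTORS[unit]


def p_activity(value_nm: float) -> float:
    """-log10 of the molar concentration, i.e. 9 - log10(nM)."""
    if value_nm <= 0:
        raise ValueError("activity must be positive")
    return 9.0 - math.log10(value_nm)


def active_kinases(records: Iterable[Record], threshold: float = 6.0) -> Dict[str, Set[str]]:
    """Map each compound to the kinases on which its best pActivity >= threshold."""
    best: Dict[Tuple[str, str], float] = {}
    for compound, kinase, _activity_type, value, unit in records:
        p = p_activity(to_nanomolar(value, unit))
        key = (compound, kinase)
        best[key] = max(p, best.get(key, float("-inf")))
    hits: Dict[str, Set[str]] = defaultdict(set)
    for (compound, kinase), p in best.items():
        if p >= threshold:
            hits[compound].add(kinase)
    return dict(hits)


if __name__ == "__main__":
    records = [
        ("cpd1", "EGFR", "IC50", 50.0, "nM"),
        ("cpd1", "ABL1", "IC50", 5.0, "uM"),
        ("cpd2", "ABL1", "Ki", 0.2, "uM"),
    ]
    print(active_kinases(records))
```

Taking the best (maximum) value per compound-kinase pair is one simple way to collapse replicate measurements; averaging the -log10 values is an equally common alternative.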

Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024;45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]

Abstract

Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized; a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to the creative and empirical nature of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules be synthesized through fewer empirical attempts rather than from a larger library to realize an active candidate. On this front, predictive endeavors using machine learning (ML) models built on available data are especially timely. Prediction of molecular properties and reaction outcomes remains one of the burgeoning applications of ML in chemical science. Among the several methods of encoding molecular samples for ML models, those that employ language-like representations are steadily gaining popularity. Such representations also make it possible to adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP, focusing on molecular property and reaction outcome predictions. From relatively simple recurrent neural networks (RNNs) to complex models such as transformers, different network architectures have been leveraged for tasks such as de novo drug design, catalyst generation, and forward and retrosynthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time- and cost-effective manner. While we present an optimistic outlook for CLMs, attention is also paid to the persisting challenges in the reaction domain, which could be addressed by advanced algorithms tailored to chemical language and by increased availability of high-quality datasets.
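
The language-like representations mentioned above usually start from a SMILES string split into tokens. A minimal sketch, assuming Python's `re` module and the atom-level SMILES regular expression widely used in transformer reaction-prediction work, followed by a toy byte-pair-style merge loop in the spirit of SMILES Pair Encoding (the article this page tracks). It is illustrative only and not the SPE reference implementation; `atom_tokenize`, `learn_merges` and `encode` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Atom-level SMILES pattern: bracket atoms, two-letter halogens, aromatic
# atoms, bonds, branches, and ring-closure digits (including %NN).
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)

Pair = Tuple[str, str]


def atom_tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens, losslessly."""
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def merge_pair(tokens: Sequence[str], pair: Pair) -> List[str]:
    """Fuse every left-to-right occurrence of `pair` into one token."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(corpus: Sequence[str], num_merges: int) -> List[Pair]:
    """Repeatedly merge the most frequent adjacent token pair in the corpus."""
    seqs = [atom_tokenize(s) for s in corpus]
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = Counter(p for seq in seqs for p in zip(seq, seq[1:]))
        if not counts:
            break
        best = counts.most_common(1)[0][0]
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def encode(smiles: str, merges: Sequence[Pair]) -> List[str]:
    """Tokenize at atom level, then apply learned merges in order."""
    tokens = atom_tokenize(smiles)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


if __name__ == "__main__":
    print(atom_tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
    merges = learn_merges(["CCO", "CCN", "CCC", "c1ccccc1O"], num_merges=3)
    print(merges, encode("CCCO", merges))
```

The merge loop fuses the most frequent adjacent pair at each step, so frequently co-occurring atoms can become single vocabulary items; the rejoin check in `atom_tokenize` keeps the atom-level step lossless.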

Kotlyarov R, Papachristos K, Wood GPF, Goodman JM. Leveraging Language Model Multitasking To Predict C-H Borylation Selectivity. J Chem Inf Model 2024;64:4286-4297. [PMID: 38708520 PMCID: PMC11134489 DOI: 10.1021/acs.jcim.4c00137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/05/2024] [Accepted: 04/23/2024] [Indexed: 05/07/2024]

Qiu X, Wang H, Tan X, Fang Z. G-K BertDTA: A graph representation learning and semantic embedding-based framework for drug-target affinity prediction. Comput Biol Med 2024;173:108376. [PMID: 38552281 DOI: 10.1016/j.compbiomed.2024.108376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/21/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]

Abstract

Developing new drugs is costly, time-consuming, and risky. Drug-target affinity (DTA), indicating the binding capability between drugs and target proteins, is a crucial indicator for drug development. Accurately predicting the interaction strength of new drug-target pairs by analyzing previous experiments aids in screening potential drug molecules, repurposing them, and developing safe and effective medicines. Existing computational models for DTA prediction rely on strings or single-graph neural networks and do not consider protein structure and molecular semantic information, which limits their accuracy. Our experiments demonstrate that string-based methods may overlook protein conformations, causing a high root mean square error (RMSE) of 3.584 in affinity due to a lack of spatial context. Single graph networks also underperform on topology features, with a 6% lower confidence interval (CI) for activity classification. The absence of semantic information also limits generalization across diverse compounds, resulting in an 18% increase in RMSE and a 5% increase in misclassifications in a quantification study, restricting potential drug discovery. To address these limitations, we propose G-K BertDTA, a novel framework for accurate DTA prediction that incorporates protein features, molecular semantic features, and molecular structural information. In the proposed model, we represent drugs as graphs and employ a graph isomorphism network (GIN) to learn molecular topological information. To extract protein structural features, we use a DenseNet architecture. A knowledge-based BERT semantic model is incorporated to obtain rich pre-trained semantic embeddings, thereby enhancing the feature information. We extensively evaluated the proposed approach on the publicly available benchmark datasets KIBA and Davis, and the experimental results show that our method consistently outperforms previous state-of-the-art approaches.
Code is available at https://github.com/AmbitYuki/G-K-BertDTA.
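
The abstract reports model quality as RMSE and "CI". In DTA work on the KIBA and Davis benchmarks, CI conventionally denotes the concordance index: the fraction of pairs with different measured affinities whose predicted affinities are ordered the same way. A minimal, quadratic-time sketch of both metrics with hypothetical example values; it is not the G-K BertDTA evaluation code.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs (y_true[i] > y_true[j]) that the
    predictions order the same way; tied predictions count as half."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no pair with distinct measured affinities")
    return concordant / comparable


if __name__ == "__main__":
    measured = [5.0, 6.2, 7.1, 8.4]   # hypothetical pKd values
    predicted = [5.4, 6.0, 7.5, 7.9]  # hypothetical model outputs
    print(f"RMSE = {rmse(measured, predicted):.3f}")
    print(f"CI   = {concordance_index(measured, predicted):.3f}")
```

RMSE penalizes absolute affinity errors, while the concordance index only checks ranking, which is why DTA papers usually report both.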

Meewan I, Panmanee J, Petchyam N, Lertvilai P. HBCVTr: an end-to-end transformer with a deep neural network hybrid model for anti-HBV and HCV activity predictor from SMILES. Sci Rep 2024;14:9262. [PMID: 38649402 PMCID: PMC11035669 DOI: 10.1038/s41598-024-59933-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 04/16/2024] [Indexed: 04/25/2024]

Lin J, He Y, Ru C, Long W, Li M, Wen Z. Advancing Adverse Drug Reaction Prediction with Deep Chemical Language Model for Drug Safety Evaluation. Int J Mol Sci 2024;25:4516. [PMID: 38674100 PMCID: PMC11050562 DOI: 10.3390/ijms25084516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 04/28/2024]

Chen C, Huang Z, Zou X, Li S, Zhang D, Wang SL. Prediction of molecular-specific mutagenic alerts and related mechanisms of chemicals by a convolutional neural network (CNN) model based on SMILES split. Sci Total Environ 2024;917:170435. [PMID: 38286298 DOI: 10.1016/j.scitotenv.2024.170435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 01/20/2024] [Accepted: 01/23/2024] [Indexed: 01/31/2024]

An L, Chen B, Zhang Y, Li H, Huang R, Li F, Tang Y. Compound Similarity Network as a Novel Data Mining Strategy for High-Throughput Investigation of Degradation Pathways of Organic Pollutants in Industrial Wastewater Treatment. Anal Chem 2024;96:3951-3959. [PMID: 38377587 DOI: 10.1021/acs.analchem.3c05983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]

Abstract

Identification of degradation products and pathways is crucial for investigating emerging pollutants and evaluating wastewater treatment methods. Nontargeted analysis is a powerful tool to comprehensively investigate the degradation pathways of organic pollutants in real-world wastewater samples, but it often generates large data sets, making it difficult to locate the specific information of interest. Herein, to efficiently establish the linkages among compounds in the same degradation pathways, we introduce a compound similarity network (CSN) as a novel data mining strategy for LC-MS-based nontargeted analysis of complex wastewater samples. Unlike molecular networks, which cluster compounds based on MS/MS spectral similarity, our CSN strategy harnesses molecular fingerprints to establish linkages among compounds and is thus spectra-independent. The effectiveness of CSN was demonstrated by nontargeted identification of degradation pathways and products of organic pollutants in leather industrial wastewater that underwent laboratory-scale activated carbon adsorption (ACD) and ozonation treatments. Using CSN to interpret nontargeted data, we tentatively annotated 4324 compounds in the untreated leather industrial wastewater, 3246 after ACD, and 3777 after ACD/ozonation. We located 145 potential degradation pathways of organic pollutants in the ACD/ozonation process using CSN and validated 7 pathways with 15 chemical standards. CSN also revealed 5 clusters of emerging pollutants, from which 3 compounds were selected for an in vitro cytotoxicity study to evaluate their potential biohazards as new pollutants. As CSN offers an efficient way to connect large numbers of compounds and to find multiple degradation pathways in a high-throughput manner, we anticipate that it will find wide application in nontargeted analysis of diverse environmental samples.
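
The key idea of CSN is that edges come from fingerprint similarity rather than from MS/MS spectra. A minimal sketch, assuming each compound's fingerprint is already available as a set of on-bit indices (in practice these might come from a cheminformatics toolkit such as RDKit), Tanimoto similarity as the edge score, and a hypothetical cut-off of 0.6; the fingerprint type and threshold used by the authors are not stated here. Clusters are taken as connected components of the resulting graph.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]  # indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_network(fps: Dict[str, Fingerprint], threshold: float) -> Dict[str, Set[str]]:
    """Undirected adjacency: link compounds whose similarity >= threshold."""
    adjacency: Dict[str, Set[str]] = {name: set() for name in fps}
    for (n1, f1), (n2, f2) in combinations(fps.items(), 2):
        if tanimoto(f1, f2) >= threshold:
            adjacency[n1].add(n2)
            adjacency[n2].add(n1)
    return adjacency


def clusters(adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components, found with an iterative depth-first search."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    # Hypothetical fingerprints for a parent pollutant and candidate products
    fps = {
        "parent": frozenset({1, 2, 3, 4, 5}),
        "product_a": frozenset({1, 2, 3, 4, 6}),
        "product_b": frozenset({1, 2, 3, 4, 5, 7}),
        "unrelated": frozenset({10, 11, 12}),
    }
    network = build_network(fps, threshold=0.6)
    for component in clusters(network):
        print(sorted(component))
```

Because the products link to the parent even when they are not similar enough to each other, connected components can gather a whole degradation pathway into one cluster.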

Temizer AB, Uludoğan G, Özçelik R, Koulani T, Ozkirimli E, Ulgen KO, Karali N, Özgür A. Exploring data-driven chemical SMILES tokenization approaches to identify key protein-ligand binding moieties. Mol Inform 2024;43:e202300249. [PMID: 38196065 DOI: 10.1002/minf.202300249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 11/13/2023] [Accepted: 01/06/2024] [Indexed: 01/11/2024]

Jinsong S, Qifeng J, Xing C, Hao Y, Wang L. Molecular fragmentation as a crucial step in the AI-based drug development pathway. Commun Chem 2024;7:20. [PMID: 38302655 PMCID: PMC10834946 DOI: 10.1038/s42004-024-01109-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 01/19/2024] [Indexed: 02/03/2024]

Zhu J, Che C, Jiang H, Xu J, Yin J, Zhong Z. SSF-DDI: a deep learning method utilizing drug sequence and substructure features for drug-drug interaction prediction. BMC Bioinformatics 2024;25:39. [PMID: 38262923 PMCID: PMC10810255 DOI: 10.1186/s12859-024-05654-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 01/12/2024] [Indexed: 01/25/2024]

Wei L, Fu N, Song Y, Wang Q, Hu J. Probabilistic generative transformer language models for generative design of molecules. J Cheminform 2023;15:88. [PMID: 37749655 PMCID: PMC10518939 DOI: 10.1186/s13321-023-00759-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 09/10/2023] [Indexed: 09/27/2023]

Ucak UV, Ashyrmamatov I, Lee J. Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization. J Cheminform 2023;15:55. [PMID: 37248531 PMCID: PMC10228139 DOI: 10.1186/s13321-023-00725-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/16/2023] [Accepted: 05/14/2023] [Indexed: 05/31/2023]

Guha R, Velegol D. Harnessing Shannon entropy-based descriptors in machine learning models to enhance the prediction accuracy of molecular properties. J Cheminform 2023;15:54. [PMID: 37211605 DOI: 10.1186/s13321-023-00712-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/07/2022] [Accepted: 03/18/2023] [Indexed: 05/23/2023]

Kang JK, Lee D, Muambo KE, Choi JW, Oh JE. Development of an embedded molecular structure-based model for prediction of micropollutant treatability in a drinking water treatment plant by machine learning from three years monitoring data. Water Res 2023;239:120037. [PMID: 37182312 DOI: 10.1016/j.watres.2023.120037] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Revised: 04/25/2023] [Accepted: 05/01/2023] [Indexed: 05/16/2023]

Jaume-Santero F, Bornet A, Valery A, Naderi N, Vicente Alvarez D, Proios D, Yazdani A, Bournez C, Fessard T, Teodoro D. Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios. J Chem Inf Model 2023;63:1914-1924. [PMID: 36952584 PMCID: PMC10091402 DOI: 10.1021/acs.jcim.2c01407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]

Tysinger EP, Rai BK, Sinitskiy AV. Can We Quickly Learn to "Translate" Bioactive Molecules with Transformer Models? J Chem Inf Model 2023;63:1734-1744. [PMID: 36914216 DOI: 10.1021/acs.jcim.2c01618] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/16/2023]

Jiang J, Zhang R, Ma J, Liu Y, Yang E, Du S, Zhao Z, Yuan Y. TranGRU: focusing on both the local and global information of molecules for molecular property prediction. Appl Intell 2022;53:15246-15260. [PMID: 36405344 PMCID: PMC9662124 DOI: 10.1007/s10489-022-04280-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Accepted: 10/17/2022] [Indexed: 11/16/2022]

Discovering design principles of collagen molecular stability using a genetic algorithm, deep learning, and experimental validation. Proc Natl Acad Sci U S A 2022;119:e2209524119. [PMID: 36161946 DOI: 10.1073/pnas.2209524119] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022]

Jiang J, Zhang R, Zhao Z, Ma J, Liu Y, Yuan Y, Niu B. MultiGran-SMILES: multi-granularity SMILES learning for molecular property prediction. Bioinformatics 2022;38:4573-4580. [PMID: 35961025 DOI: 10.1093/bioinformatics/btac550] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/24/2022] [Revised: 07/07/2022] [Accepted: 08/10/2022] [Indexed: 11/14/2022]

Deep learning methods for molecular representation and property prediction. Drug Discov Today 2022;27:103373. [PMID: 36167282 DOI: 10.1016/j.drudis.2022.103373] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/07/2022] [Revised: 08/22/2022] [Accepted: 09/21/2022] [Indexed: 01/11/2023]

Gao Y, Chen S, Tong J, Fu X. Topology-enhanced molecular graph representation for anti-breast cancer drug selection. BMC Bioinformatics 2022;23:382. [PMID: 36123643 PMCID: PMC9484163 DOI: 10.1186/s12859-022-04913-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/28/2022] [Accepted: 08/24/2022] [Indexed: 12/24/2022]

Abstract

Background

Breast cancer is currently one of the cancers with the highest mortality rates in the world. Biological research on anti-breast cancer drugs focuses on the activity of estrogen receptor alpha (ERα), the pharmacokinetic properties and the safety of the compounds, which, however, is an expensive and time-consuming process. Developments in deep learning offer the potential to make candidate drug selection against breast cancer more efficient.

Methods

In this paper, we propose an Anti-Breast Cancer Drug selection method utilizing Gated Graph Neural Networks (ABCD-GGNN) to topologically enhance the molecular representation of candidate drugs. By constructing atom-level graphs through atomic descriptors for each distinct compound, ABCD-GGNN can topologically learn both the implicit structure and substructure characteristics of a candidate drug and then integrate this representation with explicit discrete molecular descriptors to generate a molecule-level representation. As a result, the ABCD-GGNN representation can inductively predict the ERα activity, the pharmacokinetic properties and the safety of each candidate drug. Finally, we design a ranking operator whose inputs are the predicted properties so as to statistically select the appropriate drugs against breast cancer.

Results

Extensive experiments conducted on our collected anti-breast cancer candidate drug dataset demonstrate that our proposed method outperforms all the other representative methods in predicting ERα activity and the pharmacokinetic properties and safety of the compounds. Extended result analysis demonstrates the efficiency and biological rationality of the operator we designed to calculate the candidate drug ranking from the predicted properties.

Conclusion

In this paper, we propose the ABCD-GGNN representation method to efficiently integrate the topological structure and substructure features of the molecules with the discrete molecular descriptors. With a ranking operator applied, the predicted properties efficiently facilitate the candidate drug selection against breast cancer.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04913-6.

Uludoğan G, Ozkirimli E, Ulgen KO, Karalı N, Özgür A. Exploiting pretrained biochemical language models for targeted drug design. Bioinformatics 2022;38:ii155-ii161. [PMID: 36124801 DOI: 10.1093/bioinformatics/btac482] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]

Abstract

MOTIVATION

The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein-ligand pairs. On the other hand, large amounts of unlabelled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm start strategies: (i) a one-stage strategy where the initialized model is trained on targeted molecule generation and (ii) a two-stage strategy containing a pre-finetuning on molecular generation followed by target-specific training. We also compare two decoding strategies to generate compounds: beam search and sampling.

RESULTS

The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality.

AVAILABILITY AND IMPLEMENTATION

The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials (i.e., data, models, and outputs) are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
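
The abstract compares beam search with sampling as decoding strategies. The difference can be sketched with a hypothetical toy next-token model standing in for the trained encoder-decoder (`toy_next_probs` and its probabilities are invented for illustration): beam search keeps the `beam_width` highest-scoring partial sequences by cumulative log-probability, while sampling draws each token at random from the model's distribution. This is a generic illustration of the two techniques, not the authors' decoding code.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Seq = Tuple[str, ...]
NextProbs = Callable[[Seq], Dict[str, float]]


def beam_search(next_probs: NextProbs, beam_width: int, max_len: int) -> List[Tuple[Seq, float]]:
    """Return up to beam_width (sequence, log-prob) pairs, best first."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Seq, float]] = []
        for tokens, score in beams:
            for tok, p in next_probs(tokens).items():
                if p <= 0.0:
                    continue
                extended = (tokens + (tok,), score + math.log(p))
                # Sequences that emit EOS are complete; others stay open.
                if tok == EOS:
                    finished.append(extended)
                else:
                    candidates.append(extended)
        # Keep only the highest-scoring open prefixes.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # prefixes still open when max_len is reached
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished[:beam_width]


def sample(next_probs: NextProbs, max_len: int, rng: random.Random) -> Seq:
    """Draw one sequence token by token from the model's distribution."""
    tokens: Seq = ()
    for _ in range(max_len):
        choices, weights = zip(*next_probs(tokens).items())
        tok = rng.choices(choices, weights=weights, k=1)[0]
        tokens += (tok,)
        if tok == EOS:
            break
    return tokens


def toy_next_probs(prefix: Seq) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder over SMILES tokens."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    return {"C": 0.7, "O": 0.2, EOS: 0.1}


if __name__ == "__main__":
    for seq, logp in beam_search(toy_next_probs, beam_width=2, max_len=5):
        print("".join(t for t in seq if t != EOS), round(math.exp(logp), 3))
    print(sample(toy_next_probs, max_len=5, rng=random.Random(0)))
```

Beam search is deterministic and favors high-likelihood outputs, whereas sampling trades some likelihood for diversity, which is one plausible reason the two strategies can score differently on benchmark and docking metrics.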

Tang Q, Nie F, Zhao Q, Chen W. A merged molecular representation deep learning method for blood-brain barrier permeability prediction. Brief Bioinform 2022;23:6674486. [PMID: 36002937 DOI: 10.1093/bib/bbac357] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/04/2022] [Revised: 07/27/2022] [Accepted: 07/30/2022] [Indexed: 12/30/2022]

Zeng Y, Chen X, Peng D, Zhang L, Huang H. Multi-scaled self-attention for drug-target interaction prediction based on multi-granularity representation. BMC Bioinformatics 2022;23:314. [PMID: 35922768 PMCID: PMC9347097 DOI: 10.1186/s12859-022-04857-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 07/22/2022] [Indexed: 11/21/2022]

Sreenivasan AP, Harrison PJ, Schaal W, Matuszewski DJ, Kultima K, Spjuth O. Predicting protein network topology clusters from chemical structure using deep learning. J Cheminform 2022;14:47. [PMID: 35841114 PMCID: PMC9284831 DOI: 10.1186/s13321-022-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 06/06/2022] [Indexed: 11/10/2022]

InflamNat: web-based database and predictor of anti-inflammatory natural products. J Cheminform 2022;14:30. [PMID: 35659771 PMCID: PMC9167499 DOI: 10.1186/s13321-022-00608-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 05/09/2022] [Indexed: 11/30/2022]

Godinez WJ, Ma EJ, Chao AT, Pei L, Skewes-Cox P, Canham SM, Jenkins JL, Young JM, Martin EJ, Guiguemde WA. Design of potent antimalarials with generative chemistry. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00448-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]

ColGen: An end-to-end deep learning model to predict thermal stability of de novo collagen sequences. J Mech Behav Biomed Mater 2021;125:104921. [PMID: 34758444 DOI: 10.1016/j.jmbbm.2021.104921] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/10/2021] [Accepted: 10/21/2021] [Indexed: 11/22/2022]

An X, Chen X, Yi D, Li H, Guan Y. Representation of molecules for drug response prediction. Brief Bioinform 2021;23:6375515. [PMID: 34571534 DOI: 10.1093/bib/bbab393] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/08/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/18/2022]