1
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy
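The attention mechanism at the centre of the review above can be illustrated with scaled dot-product attention, the standard building block of transformers. This is a minimal pure-Python sketch and not code from the review; the function names and the toy two-dimensional vectors are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V.

    queries, keys: lists of d_k-dimensional vectors
    values:        list of vectors, one per key
    """
    d_k = len(keys[0])
    dim = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention over three hypothetical token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors. Inspecting such weights is the usual starting point for attention-based interpretation.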
2
Omidian H. Synergizing blockchain and artificial intelligence to enhance healthcare. Drug Discov Today 2024; 29:104111. [PMID: 39034026 DOI: 10.1016/j.drudis.2024.104111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 07/09/2024] [Accepted: 07/16/2024] [Indexed: 07/23/2024]
Abstract
This perspective paper explores the synergistic potential of blockchain and artificial intelligence (AI) in transforming healthcare. It begins with an overview of blockchain's role in healthcare data management, security, the pharmaceutical supply chain, clinical trials, and health insurance. The discussion then shifts to the impact of AI on healthcare, followed by an examination of integrated AI-blockchain platforms and their benefits. Technical challenges, limitations, and solutions related to these technologies are scrutinized. The paper addresses regulatory compliance and ethical considerations, and proposes future directions for their implementation. It concludes with research and implementation guidelines, offering a roadmap for harnessing blockchain and AI to enhance healthcare outcomes.
Affiliation(s)
- Hossein Omidian
  - Barry & Judy Silverman College of Pharmacy, Nova Southeastern University, Fort Lauderdale, FL 33328, USA
3
Manen-Freixa L, Antolin AA. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery. Expert Opin Drug Discov 2024:1-27. [PMID: 39004919 DOI: 10.1080/17460441.2024.2376643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024]
Abstract
INTRODUCTION: Small molecules often bind to multiple targets, a behavior termed polypharmacology. Anticipating polypharmacology is essential for drug discovery since unknown off-targets can modulate safety and efficacy, profoundly affecting drug discovery success. Unfortunately, experimental methods to assess selectivity present significant limitations and drugs still fail in the clinic due to unanticipated off-targets. Computational methods are a cost-effective, complementary approach to predict polypharmacology. AREAS COVERED: This review aims to provide a comprehensive overview of the state of polypharmacology prediction and discuss its strengths and limitations, covering both classical cheminformatics methods and bioinformatic approaches. The authors review available data sources, paying close attention to their different coverage. The authors then discuss major algorithms grouped by the types of data that they exploit using selected examples. EXPERT OPINION: Polypharmacology prediction has made impressive progress over the last decades and contributed to identifying many off-targets. However, data incompleteness currently limits the ability of most approaches to predict selectivity comprehensively. Moreover, limited agreement on model assessment makes it difficult to identify the best algorithms, which at present show modest performance in prospective real-world applications. Despite these limitations, the exponential increase of multidisciplinary Big Data and AI holds much potential to improve polypharmacology prediction and de-risk drug discovery.
Affiliation(s)
- Leticia Manen-Freixa
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Albert A Antolin
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
  - Center for Cancer Drug Discovery, The Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
4
Schuh M, Boldini D, Sieber SA. Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery. J Chem Inf Model 2024; 64:4640-4650. [PMID: 38836773 PMCID: PMC11200265 DOI: 10.1021/acs.jcim.4c00765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 05/23/2024] [Accepted: 05/23/2024] [Indexed: 06/06/2024]
Abstract
The precise prediction of molecular properties can greatly accelerate the development of new drugs. However, in silico molecular property prediction approaches have been limited so far to assays for which large amounts of data are available. In this study, we develop a new computational approach leveraging both the textual description of the assay of interest and the chemical structure of target compounds. By combining these two sources of information via self-supervised learning, our tool can provide accurate predictions for assays where no measurements are available. Remarkably, our approach achieves state-of-the-art performance on the FS-Mol benchmark for zero-shot prediction, outperforming a wide variety of deep learning approaches. Additionally, we demonstrate how our tool can be used for tailoring screening libraries for the assay of interest, showing promising performance in a retrospective case study on a high-throughput screening campaign. By accelerating the early identification of active molecules in drug discovery and development, this method has the potential to streamline the identification of novel therapeutics.
Affiliation(s)
- Maximilian G. Schuh
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Davide Boldini
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Stephan A. Sieber
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
5
Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024; 64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]
Abstract
By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and utilize this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing endeavors in adapting these techniques to the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitates a comprehensive summarization of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data is diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.
Affiliation(s)
- Kha-Dinh Luong
  - Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
- Ambuj Singh
  - Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
6
Zhang R, Yuan R, Tian B. PointGAT: A Quantum Chemical Property Prediction Model Integrating Graph Attention and 3D Geometry. J Chem Theory Comput 2024; 20:4115-4128. [PMID: 38727259 DOI: 10.1021/acs.jctc.3c01420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Predicting quantum chemical properties is a fundamental challenge for computational chemistry. While the development of graph neural networks has advanced molecular representation learning and property prediction, their performance could be further enhanced by incorporating three-dimensional (3D) structural geometry into two-dimensional (2D) molecular graph representation. In this study, we introduce the PointGAT model for quantum molecular property prediction, which integrates 3D molecular coordinates with graph-attention modeling. Comparison with other current models in molecular prediction tasks showed that PointGAT could provide higher predictive accuracy in various benchmark data sets from MoleculeNet, including ESOL, FreeSolv, Lipop, HIV, and 6 out of 12 tasks of the QM9 data set. To further examine PointGAT prediction of quantum mechanical (QM) energies, we constructed a C10 data set comprising 11,841 charged and chiral carbocation intermediates with QM energies calculated at the DM21/6-31G*//B3LYP/6-31G* levels. Notably, PointGAT achieved an R² value of 0.950 and an MAE of 1.616 kcal/mol, outperforming even the best-performing graph neural network model with a reduction of 0.216 kcal/mol in MAE and an improvement of 0.050 in R². Additional ablation studies indicated that incorporating molecular geometry into the model resulted in markedly higher predictive accuracy, reducing the MAE value from 1.802 to 1.616 kcal/mol. Moreover, visualization of PointGAT atomic attention weights suggested its predictions were interpretable. Findings in this study support the application of PointGAT as a powerful and versatile tool for quantum chemical property prediction that can facilitate high-accuracy modeling for fundamental exploration of chemical space as well as drug design and molecular engineering.
Affiliation(s)
- Rong Zhang
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Rongqing Yuan
  - Department of Chemistry, Tsinghua University, Beijing 100084, China
- Boxue Tian
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
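The PointGAT abstract above reports model quality as a mean absolute error (MAE) and a coefficient of determination (R²). As a reminder of how these two metrics are defined, here is a small sketch using their standard formulas. The numbers are made-up toy values and have nothing to do with the paper's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference energies and predictions (arbitrary units).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0 + 0.5 + 0) / 4 = 0.25
r2 = r_squared(y_true, y_pred)             # 1 - 0.5 / 5.0 = 0.9
```

MAE is expressed in the units of the target (kcal/mol in the abstract), whereas R² is dimensionless, which is why the two are usually reported together.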
7
Yao S, Song J, Jia L, Cheng L, Zhong Z, Song M, Feng Z. Fast and effective molecular property prediction with transferability map. Commun Chem 2024; 7:85. [PMID: 38632308 PMCID: PMC11024153 DOI: 10.1038/s42004-024-01169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 04/05/2024] [Indexed: 04/19/2024] Open
Abstract
Effective transfer learning for molecular property prediction has shown considerable strength in addressing insufficient labeled molecules. Many existing methods either disregard the quantitative relationship between source and target properties, risking negative transfer, or require intensive training on target tasks. To quantify transferability concerning task-relatedness, we propose Principal Gradient-based Measurement (PGM) for transferring molecular property prediction ability. First, we design an optimization-free scheme to calculate a principal gradient for approximating the direction of model optimization on a molecular property prediction dataset. We have analyzed the close connection between the principal gradient and model optimization through mathematical proof. PGM measures the transferability as the distance between the principal gradient obtained from the source dataset and that derived from the target dataset. Then, we perform PGM on various molecular property prediction datasets to build a quantitative transferability map for source dataset selection. Finally, we evaluate PGM on multiple combinations of transfer learning tasks across 12 benchmark molecular property prediction datasets and demonstrate that it can serve as fast and effective guidance to improve the performance of a target task. This work contributes to more efficient discovery of drugs, materials, and catalysts by offering a task-relatedness quantification prior to transfer learning and understanding the relationship between chemical properties.
Affiliation(s)
- Shaolun Yao
  - Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, Zhejiang University, 310027, Hangzhou, China
  - College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
  - Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Jie Song
  - Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
  - School of Software Technology, Zhejiang University, 315048, Ningbo, China
- Lingxiang Jia
  - College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Lechao Cheng
  - School of Computer Science and Information Engineering, Hefei University of Technology, 230009, Hefei, China
- Zipeng Zhong
  - College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Mingli Song
  - College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
  - Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Zunlei Feng
  - Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
  - School of Software Technology, Zhejiang University, 315048, Ningbo, China
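The central idea in the abstract above, scoring transferability as a distance between gradients obtained on a source and a target dataset, can be illustrated with a deliberately simplified toy. The sketch below is not the authors' PGM procedure. It assumes a linear model, a mean-squared-error gradient taken at a shared initial point, and cosine distance, none of which are specified in the abstract. All names and data are hypothetical.

```python
import math


def mse_gradient(weights, features, targets):
    """Gradient of mean squared error for a linear model y = w . x."""
    n = len(features)
    grad = [0.0] * len(weights)
    for x, y in zip(features, targets):
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * error * xj / n
    return grad


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Assumption: every dataset is probed from the same initial weights.
init = [0.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

target_task = [2.0, 1.0, 3.0]          # hypothetical target property
related_source = [2.2, 0.9, 3.1]       # similar trend to the target
unrelated_source = [-2.0, -1.0, -3.0]  # opposite trend

g_target = mse_gradient(init, features, target_task)
d_related = cosine_distance(mse_gradient(init, features, related_source), g_target)
d_unrelated = cosine_distance(mse_gradient(init, features, unrelated_source), g_target)
```

In this toy, the related source yields a gradient that points almost the same way as the target's, so its distance is close to 0, while the unrelated source yields the opposite direction and a distance of 2. A ranking of this kind is what a transferability map would use to pick a source dataset before any fine-tuning.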
8
Guo W, Dong Y, Hao GF. Transfer learning empowers accurate pharmacokinetics prediction of small samples. Drug Discov Today 2024; 29:103946. [PMID: 38460571 DOI: 10.1016/j.drudis.2024.103946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 02/22/2024] [Accepted: 03/05/2024] [Indexed: 03/11/2024]
Abstract
Accurate assessment of pharmacokinetic (PK) properties is crucial for selecting optimal candidates and avoiding downstream failures. Transfer learning is an innovative machine learning approach enabling high-throughput prediction with limited data. Recently, transfer learning methods showed promise in predicting ADME/PK parameters. Given the prolific growth of research on transfer learning for PK prediction, a comprehensive review of its advantages and challenges is imperative. This study explores the fundamentals, classifications, toolkits and applications of various transfer learning techniques for PK prediction, demonstrating their utility through three practical case studies. This work will serve as a reference for drug design researchers.
Affiliation(s)
- Wenbo Guo
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China
- Yawen Dong
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Ge-Fei Hao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China
9
Zhang R, Wu C, Yang Q, Liu C, Wang Y, Li K, Huang L, Zhou F. MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning. Bioinformatics 2024; 40:btae118. [PMID: 38426310 PMCID: PMC10984949 DOI: 10.1093/bioinformatics/btae118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 02/04/2024] [Accepted: 02/27/2024] [Indexed: 03/02/2024] Open
Abstract
MOTIVATION: Predicting molecular properties is a pivotal task in various scientific domains, including drug discovery, material science, and computational chemistry. This problem is often hindered by the lack of annotated data and imbalanced class distributions, which pose significant challenges in developing accurate and robust predictive models. RESULTS: This study tackles these issues by employing pretrained molecular models within a few-shot learning framework. A novel dynamic contrastive loss function is utilized to further improve model performance under class imbalance. The proposed MolFeSCue framework not only facilitates rapid generalization from minimal samples, but also employs a contrastive loss function to extract meaningful molecular representations from imbalanced datasets. Extensive evaluations and comparisons of MolFeSCue and state-of-the-art algorithms have been conducted on multiple benchmark datasets, and the experimental data demonstrate our algorithm's effectiveness in molecular representations and its broad applicability across various pretrained models. Our findings underscore MolFeSCue's potential to accelerate advancements in drug discovery. AVAILABILITY AND IMPLEMENTATION: We have made all the source code utilized in this study publicly accessible at http://www.healthinformaticslab.org/supp/ or on GitHub at https://github.com/zhangruochi/MolFeSCue. The code (MolFeSCue-v1-00) is also available as the supplementary file of this paper.
Affiliation(s)
- Ruochi Zhang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Chao Wu
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Qian Yang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Chang Liu
  - Beijing Life Science Academy, Beijing 102209, China
- Yan Wang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Kewei Li
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Lan Huang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Fengfeng Zhou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
  - School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou 550025, China
10
Jiang T, Wang Z, Yu W, Wang J, Yu S, Bao X, Wei B, Xuan Q. Mix-Key: graph mixup with key structures for molecular property prediction. Brief Bioinform 2024; 25:bbae165. [PMID: 38706318 PMCID: PMC11070654 DOI: 10.1093/bib/bbae165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/21/2024] [Accepted: 04/04/2024] [Indexed: 05/07/2024] Open
Abstract
Molecular property prediction faces the challenge of limited labeled data as it necessitates a series of specialized experiments to annotate target molecules. Data augmentation techniques can effectively address the issue of data scarcity. In recent years, Mixup has achieved significant success in traditional domains such as image processing. However, its application in molecular property prediction is relatively limited due to the irregular, non-Euclidean nature of graphs and the fact that minor variations in molecular structures can lead to alterations in their properties. To address these challenges, we propose a novel data augmentation method called Mix-Key tailored for molecular property prediction. Mix-Key aims to capture crucial features of molecular graphs, focusing separately on the molecular scaffolds and functional groups. By generating isomers that are relatively invariant to the scaffolds or functional groups, we effectively preserve the core information of molecules. Additionally, to capture interactive information between the scaffolds and functional groups while ensuring correlation between the original and augmented graphs, we introduce molecular fingerprint similarity and node similarity. Through these steps, Mix-Key determines the mixup ratio between the original graph and two isomers, thus generating more informative augmented molecular graphs. We extensively validate our approach on molecular datasets of different scales with several Graph Neural Network architectures. The results demonstrate that Mix-Key consistently outperforms other data augmentation methods in enhancing molecular property prediction on several datasets.
Affiliation(s)
- Tianyi Jiang
  - Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China
  - Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China
- Zeyu Wang
  - Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China
  - Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China
- Wenchao Yu
  - College of Pharmaceutical Science & Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, 310014, Hangzhou, China
- Jinhuan Wang
  - Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China
  - Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China
- Shanqing Yu
  - Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China
  - Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China
- Xiaoze Bao
  - College of Pharmaceutical Science & Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, 310014, Hangzhou, China
- Bin Wei
  - College of Pharmaceutical Science & Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, 310014, Hangzhou, China
- Qi Xuan
  - Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China
  - Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China
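For context on the Mix-Key abstract above, the original Mixup augmentation it builds on simply interpolates two training examples and their labels with a mixing ratio λ. The sketch below shows only that vanilla operation on plain feature vectors. It is not Mix-Key, which per the abstract works on molecular graphs and derives the ratio from fingerprint and node similarity. The function name and toy values are assumptions.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Vanilla Mixup: convex combination of two examples and their labels.

    lam is the mixing ratio in [0, 1]; in the original Mixup formulation it
    is usually drawn from a Beta distribution, but it is fixed here so the
    example is deterministic.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix


# Two hypothetical molecule feature vectors with binary activity labels.
x_mix, y_mix = mixup([1.0, 0.0, 2.0], 1.0, [0.0, 1.0, 0.0], 0.0, lam=0.75)
```

With `lam=0.75` the augmented example is three quarters of the first input and one quarter of the second. Blending features in this way does not carry over directly to irregular graph structures, which is the difficulty the abstract sets out to address.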
11
Boldini D, Ballabio D, Consonni V, Todeschini R, Grisoni F, Sieber SA. Effectiveness of molecular fingerprints for exploring the chemical space of natural products. J Cheminform 2024; 16:35. [PMID: 38528548 DOI: 10.1186/s13321-024-00830-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/17/2024] [Indexed: 03/27/2024] Open
Abstract
Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, they have different structural motifs than typical drug-like compounds, e.g., a wider range of molecular weight, multiple stereocenters and higher fraction of sp3-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
Affiliation(s)
- Davide Boldini
  - TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany
- Davide Ballabio
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Viviana Consonni
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Roberto Todeschini
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Francesca Grisoni
  - Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, Netherlands
- Stephan A Sieber
  - TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany
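The abstract above compares fingerprints partly through pairwise similarity. A common way to compare two binary fingerprints is the Tanimoto (Jaccard) coefficient; the abstract does not say which metric the study uses, so treat that choice as an assumption here. The sketch represents each fingerprint as a set of "on" bit indices, with made-up bits that are not taken from the paper or its package.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints match.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical on-bit indices for three molecules.
fp_1 = {1, 4, 7, 9}
fp_2 = {1, 4, 8}
fp_3 = {2, 3}

sim_12 = tanimoto(fp_1, fp_2)  # 2 shared bits out of 5 distinct bits = 0.4
sim_13 = tanimoto(fp_1, fp_3)  # no shared bits = 0.0
```

Because each fingerprinting algorithm switches on different bits for the same molecule, the same pair of compounds can look similar under one encoding and dissimilar under another, which is the effect the abstract reports.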
12
Han J, Kwon Y, Choi YS, Kang S. Improving chemical reaction yield prediction using pre-trained graph neural networks. J Cheminform 2024; 16:25. [PMID: 38429787 PMCID: PMC10905905 DOI: 10.1186/s13321-024-00818-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Accepted: 02/19/2024] [Indexed: 03/03/2024] Open
Abstract
Graph neural networks (GNNs) have proven to be effective in the prediction of chemical reaction yields. However, their performance tends to deteriorate when they are trained using an insufficient training dataset in terms of quantity or diversity. A promising solution to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we investigate the effectiveness of GNN pre-training in chemical reaction yield prediction. We present a novel GNN pre-training method for performance improvement. Given a molecular database consisting of a large number of molecules, we calculate molecular descriptors for each molecule and reduce the dimensionality of these descriptors by applying principal component analysis. We define a pretext task by assigning a vector of principal component scores as the pseudo-label to each molecule in the database. A GNN is then pre-trained to perform the pretext task of predicting the pseudo-label for the input molecule. For chemical reaction yield prediction, a prediction model is initialized using the pre-trained GNN and then fine-tuned with the training dataset containing chemical reactions and their yields. We demonstrate the effectiveness of the proposed method through experimental evaluation on benchmark datasets.
Affiliation(s)
- Jongmin Han
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon, Republic of Korea
- Youngchun Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Youn-Suk Choi
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Seokho Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon, Republic of Korea
13
Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common representation of molecules, but it ignores their structural information, while molecular graph representation is weak at expressing 3D structure. In this article, we try to exploit the advantages of different types of representation simultaneously for molecular property prediction. Thus, we propose a fusion model named DLF-MFF, which integrates multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, the 2D molecular graph, the 3D molecular graph and the molecular image. Then, in order to learn molecular features individually, we use four essential deep learning frameworks, which correspond to the four distinct molecular representations. The final molecular representation is created by integrating the four feature vectors and feeding them into a prediction layer to predict the molecular property. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets covering multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on the 6 benchmark datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors from 2500 drugs. We predict the probability of each drug being a 3CL protease inhibitor and also calculate the binding affinity scores between each drug and the 3CL protease. The results show that DLF-MFF produces better performance in the identification of anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and provide valuable insights into drug repurposing for COVID-19.
Affiliation(s)
- Mei Ma
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
| | - Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
| |
|
14
|
Song Z, Chen J, Cheng J, Chen G, Qi Z. Computer-Aided Molecular Design of Ionic Liquids as Advanced Process Media: A Review from Fundamentals to Applications. Chem Rev 2024; 124:248-317. [PMID: 38108629 DOI: 10.1021/acs.chemrev.3c00223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The unique physicochemical properties, flexible structural tunability, and giant chemical space of ionic liquids (ILs) provide them with a great opportunity to match different target properties and work as advanced process media. The crux of the matter is how to efficiently and reliably tailor suitable ILs toward a specific application. In this regard, the computer-aided molecular design (CAMD) approach has been widely adapted to cover this family of high-profile chemicals, that is, to perform computer-aided IL design (CAILD). This review discusses the past developments that have contributed to the state of the art of CAILD and provides a perspective on how future work could accelerate the practical application of ILs. In the broad context of CAILD, key aspects related to the forward structure-property modeling and reverse molecular design of ILs are reviewed. For the forward task, diverse IL molecular representations and modeling algorithms are introduced, together with representative models of the physical and thermodynamic properties of ILs, among others. For the reverse task, representative works formulating different molecular design scenarios are summarized. Beyond the substantial progress made, some future perspectives to move CAILD a step forward are finally provided.
Affiliation(s)
- Zhen Song
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Jiahui Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Jie Cheng
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guzhong Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Zhiwen Qi
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| |
|
15
|
Hao Y, Chen X, Fei A, Jia Q, Chen Y, Shao J, Pandiyan S, Wang L. SG-ATT: A Sequence Graph Cross-Attention Representation Architecture for Molecular Property Prediction. Molecules 2024; 29:492. [PMID: 38276570 PMCID: PMC10819071 DOI: 10.3390/molecules29020492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/06/2024] [Accepted: 01/14/2024] [Indexed: 01/27/2024] Open
Abstract
Existing formats based on the simplified molecular input line entry system (SMILES) encoding and molecular graph structure are designed to encode the complete semantic and structural information of molecules. However, the physicochemical properties of molecules are complex, and a single encoding of molecular features from SMILES sequences or molecular graph structures cannot adequately represent molecular information. To address this problem, this study proposes a sequence graph cross-attention (SG-ATT) representation architecture for a molecular property prediction model, which uses domain knowledge to enhance molecular graph feature encoding and combines it with the features of molecular SMILES sequences. The SG-ATT fuses the two kinds of molecular features so that the model input contains both molecular structure information and semantic information. The SG-ATT was tested on nine molecular property prediction tasks. Among them, the largest performance improvement was 4.5% on the BACE dataset, and the average performance improvement was 1.83% on the full dataset. Additionally, specific model interpretability studies were conducted to showcase the performance of the SG-ATT model on different datasets. In-depth analysis was provided through case studies of in vitro validation. Finally, network tools for molecular property prediction were developed for the use of researchers.
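As a rough illustration of what cross-attention between a graph branch and a sequence branch can look like, the sketch below applies plain scaled dot-product attention, with atom embeddings as queries and SMILES token embeddings as keys and values. The direction of attention, the embedding sizes, and all names and numbers are assumptions made for illustration; the abstract does not specify how SG-ATT arranges these details.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query row attends over all key rows
    # and returns a weighted average of the value rows.
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[i] * values[i][j] for i in range(len(values)))
                        for j in range(len(values[0]))])
    return outputs, weights

# Hypothetical toy embeddings: 3 atoms (graph branch), 4 SMILES tokens
# (sequence branch), both already projected to dimension 2.
atom_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token_feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.2, 0.8]]

fused, attn = cross_attention(atom_feats, token_feats, token_feats)
```

Each row of `fused` is an atom-level feature enriched with sequence information, and each row of `attn` shows how strongly that atom attends to each token, which is the kind of quantity an interpretability study could inspect.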
Affiliation(s)
- Yajie Hao
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
| | - Xing Chen
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
| | - Ailu Fei
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
| | - Qifeng Jia
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
| | - Yu Chen
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
| | - Jinsong Shao
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
| | - Sanjeevi Pandiyan
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
| | - Li Wang
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Research Center for Intelligent Information Technology, Nantong University, Nantong 226001, China
| |
|
16
|
Xu F, Yang Z, Wang L, Meng D, Long J. MESPool: Molecular Edge Shrinkage Pooling for hierarchical molecular representation learning and property prediction. Brief Bioinform 2023; 25:bbad423. [PMID: 38048081 PMCID: PMC10753536 DOI: 10.1093/bib/bbad423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 09/18/2023] [Accepted: 10/29/2023] [Indexed: 12/05/2023] Open
Abstract
Identifying task-relevant structures is important for molecular property prediction. In a graph neural network (GNN), graph pooling can group nodes and hierarchically represent the molecular graph. However, previous pooling methods either drop node information or lose the connectivity of the original graph; therefore, it is difficult to identify continuous substructures. Importantly, they lack interpretability on molecular graphs. To this end, we propose a novel Molecular Edge Shrinkage Pooling (MESPool) method, which is based on edges (or chemical bonds). MESPool preserves crucial edges and shrinks others inside the functional groups, and is able to search for key structures without breaking the original connections. We compared MESPool with various well-known pooling methods on different benchmarks and showed that MESPool outperforms the previous methods. Furthermore, we explained the rationality of MESPool on some datasets, including a COVID-19 drug dataset.
Affiliation(s)
- Fanding Xu
- School of Life Science and Technology, Xi’an Jiaotong University, 710049 Shaanxi, China
| | - Zhiwei Yang
- School of Physics, Xi’an Jiaotong University, 710049 Shaanxi, China
| | - Lizhuo Wang
- School of Life Science and Technology, Xi’an Jiaotong University, 710049 Shaanxi, China
| | - Deyu Meng
- Research Institute for Mathematics and Mathematical Technology, Xi’an Jiaotong University, 710049 Shaanxi, China
- School of Mathematics and Statistics, Henan University, 475004 Henan, China
| | - Jiangang Long
- School of Life Science and Technology, Xi’an Jiaotong University, 710049 Shaanxi, China
| |
|
17
|
Yu L, He X, Fang X, Liu L, Liu J. Deep Learning with Geometry-Enhanced Molecular Representation for Augmentation of Large-Scale Docking-Based Virtual Screening. J Chem Inf Model 2023; 63:6501-6514. [PMID: 37882338 DOI: 10.1021/acs.jcim.3c01371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2023]
Abstract
Structure-based virtual screening has been a crucial tool in drug discovery for decades. However, as the chemical space expands, the existing structure-based virtual screening techniques based on molecular docking and scoring struggle to handle billion-entry ultralarge libraries due to the high computational cost. To address this challenge, researchers have turned to machine learning techniques to enhance structure-based virtual screening and explore the vast chemical space efficiently. In those cases, compounds are usually treated as sequential strings or two-dimensional topology graphs, which limits the ability to incorporate three-dimensional structural information for downstream tasks. We herein propose a novel deep learning protocol, GEM-Screen, which utilizes the geometry-enhanced molecular representation of the compounds docked to a specific target and is trained on docking scores of a small fraction of a library through an active learning strategy to approximate the docking outcome for entries not used in training. This protocol is applied to virtual screening campaigns against the AmpC and D4 targets, demonstrating that GEM-Screen enriches more than 90% of the hit scaffolds for AmpC in the top 4% of model predictions and more than 80% of the hit scaffolds for D4 in the same top-ranked fraction of the library. GEM-Screen can be used in conjunction with traditional docking programs so that only the top-ranked compounds are docked, avoiding exhaustive docking of the whole library and thus allowing top-scoring compounds to be discovered from billion-entry libraries in a rapid yet accurate fashion.
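The enrichment figures quoted above are statements of the form "what share of known hits falls inside the top-ranked fraction of model predictions". A minimal sketch of that calculation follows. It is a simplification under stated assumptions: it counts individual compounds rather than scaffolds, it assumes that lower predicted scores are better (as is usual for docking scores), and the function name and toy data are hypothetical.

```python
def hit_recovery(predicted_scores, is_hit, top_fraction=0.04):
    # Fraction of all known hits found in the best-scoring top_fraction
    # of the library, ranked by predicted score (lower = better, assumed).
    n = len(predicted_scores)
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(range(n), key=lambda i: predicted_scores[i])
    total_hits = sum(1 for h in is_hit if h)
    if total_hits == 0:
        return 0.0
    found = sum(1 for i in ranked[:n_top] if is_hit[i])
    return found / total_hits

# Toy library (assumption): 100 compounds with predicted scores 0..99,
# of which four are labeled as hits.
toy_scores = [float(i) for i in range(100)]
toy_hits = [i in (0, 1, 2, 50) for i in range(100)]

recovered = hit_recovery(toy_scores, toy_hits, top_fraction=0.04)
```

Here the top 4% is the four best-scoring compounds, three of which are hits, so three of the four hits are recovered. A scaffold-level version would first map each compound to its scaffold and count distinct hit scaffolds instead.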
Affiliation(s)
- Lan Yu
- School of Science, China Pharmaceutical University, Nanjing 210009, China
| | - Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200062, China
| | - Xiaomin Fang
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen 518063, China
| | - Lihang Liu
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen 518063, China
| | - Jinfeng Liu
- School of Science, China Pharmaceutical University, Nanjing 210009, China
- School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 210009, China
| |
|
18
|
Wu K, Karapetyan E, Schloss J, Vadgama J, Wu Y. Advancements in small molecule drug design: A structural perspective. Drug Discov Today 2023; 28:103730. [PMID: 37536390 PMCID: PMC10543554 DOI: 10.1016/j.drudis.2023.103730] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/23/2023] [Revised: 07/19/2023] [Accepted: 07/27/2023] [Indexed: 08/05/2023]
Abstract
In this review, we outline recent advancements in small molecule drug design from a structural perspective. We compare protein structure prediction methods and explore the role of the ligand binding pocket in structure-based drug design. We examine various structural features used to optimize drug candidates, including functional groups, stereochemistry, and molecular weight. Computational tools such as molecular docking and virtual screening are discussed for predicting and optimizing drug candidate structures. We present examples of drug candidates designed based on their molecular structure and discuss future directions in the field. By effectively integrating structural information with other valuable data sources, we can improve the drug discovery process, leading to the identification of novel therapeutics with improved efficacy, specificity, and safety profiles.
Affiliation(s)
- Ke Wu
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA
| | - Eduard Karapetyan
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA
| | - John Schloss
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA; School of Pharmacy, American University of Health Sciences, Signal Hill, CA 90755, USA
| | - Jaydutt Vadgama
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA; School of Pharmacy, American University of Health Sciences, Signal Hill, CA 90755, USA.
| | - Yong Wu
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA.
| |
|
19
|
Wu J, Su Y, Yang A, Ren J, Xiang Y. An improved multi-modal representation-learning model based on fusion networks for property prediction in drug discovery. Comput Biol Med 2023; 165:107452. [PMID: 37690287 DOI: 10.1016/j.compbiomed.2023.107452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 08/12/2023] [Accepted: 09/04/2023] [Indexed: 09/12/2023]
Abstract
Accurate characterization of molecular representations plays an important role in deep learning (DL)-based property prediction for drug discovery. However, most previous studies considered only one type of molecular representation, making it difficult to capture the full molecular feature information. In this study, a novel DL framework called multi-modal molecular representation learning fusion network (MMRLFN) is developed, which can simultaneously learn and integrate drug molecular features from molecular graphs and SMILES sequences. The developed MMRLFN method is composed of three complementary deep neural networks that learn various features from different molecular representations, such as molecular topology, local chemical background information, and substructures at varying scales. Eight public datasets involving various molecular properties used in drug discovery were employed to train and evaluate the developed MMRLFN. The obtained models showed better performance than existing models based on mono-modal molecular representations. Additionally, a thorough analysis of the noise resistance and interpretability of the MMRLFN has been carried out. The generalization ability and effectiveness of the MMRLFN have been verified by case studies as well. Overall, the MMRLFN can accurately predict molecular properties and provide potentially valuable information from large datasets, thereby maximizing the possibility of successful drug discovery.
Affiliation(s)
- Jinzhou Wu
- School of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing, 401331, China
| | - Yang Su
- School of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing, 401331, China.
| | - Ao Yang
- School of Safety Engineering (School of Emergency Management), Chongqing University of Science and Technology, Chongqing, 401331, China
| | - Jingzheng Ren
- Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, China
| | - Yi Xiang
- School of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing, 401331, China
| |
|
20
|
Han S, Fu H, Wu Y, Zhao G, Song Z, Huang F, Zhang Z, Liu S, Zhang W. HimGNN: a novel hierarchical molecular graph representation learning framework for property prediction. Brief Bioinform 2023; 24:bbad305. [PMID: 37594313 DOI: 10.1093/bib/bbad305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 07/18/2023] [Accepted: 08/04/2023] [Indexed: 08/19/2023] Open
Abstract
Accurate prediction of molecular properties is an important topic in drug discovery. Recent works have developed various representation schemes for molecular structures to capture different chemical information in molecules. The atom and motif can be viewed as hierarchical molecular structures that are widely used for learning molecular representations to predict chemical properties. Previous works have attempted to exploit both atom and motif to address the problem of information loss in single representation learning for various tasks. To further fuse such hierarchical information, the correspondence between learned chemical features from different molecular structures should be considered. Herein, we propose a novel framework for molecular property prediction, called hierarchical molecular graph neural networks (HimGNN). HimGNN learns hierarchical topology representations by applying graph neural networks on atom- and motif-based graphs. In order to boost the representational power of the motif feature, we design a Transformer-based local augmentation module to enrich motif features by introducing heterogeneous atom information in motif representation learning. Besides, we focus on the molecular hierarchical relationship and propose a simple yet effective rescaling module, called contextual self-rescaling, that adaptively recalibrates molecular representations by explicitly modelling interdependencies between atom and motif features. Extensive computational experiments demonstrate that HimGNN can achieve promising performances over state-of-the-art baselines on both classification and regression tasks in molecular property prediction.
Affiliation(s)
- Shen Han
- College of Informatics, Huazhong Agricultural University, People's Republic of China
| | - Haitao Fu
- College of Informatics, Huazhong Agricultural University, People's Republic of China
| | - Yuyang Wu
- College of Plant Science and Technology, Huazhong Agricultural University, People's Republic of China
| | - Ganglan Zhao
- College of Informatics, Huazhong Agricultural University, People's Republic of China
| | - Zhenyu Song
- College of Informatics, Huazhong Agricultural University, People's Republic of China
| | - Feng Huang
- College of Informatics, Huazhong Agricultural University, People's Republic of China
| | - Zhongfei Zhang
- Computer Science Department, Binghamton University, Binghamton, NY, USA
| | - Shichao Liu
- College of Informatics, Huazhong Agricultural University, People's Republic of China and Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Animal Farming Technology, Ministry of Agriculture, Huazhong Agricultural University
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, People's Republic of China and Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Animal Farming Technology, Ministry of Agriculture, Huazhong Agricultural University
| |
|
21
|
Cremer J, Medrano Sandonas L, Tkatchenko A, Clevert DA, De Fabritiis G. Equivariant Graph Neural Networks for Toxicity Prediction. Chem Res Toxicol 2023; 36. [PMID: 37690056 PMCID: PMC10583285 DOI: 10.1021/acs.chemrestox.3c00032] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Indexed: 09/12/2023]
Abstract
Predictive modeling of toxicity is a crucial step in the drug discovery pipeline. It can help filter out molecules with a high probability of failing in the early stages of de novo drug design. Thus, several machine learning (ML) models have been developed to predict the toxicity of molecules by combining classical ML techniques or deep neural networks with well-known molecular representations such as fingerprints or 2D graphs. But the more natural, accurate representation of molecules is expected to be defined in physical 3D space, as in ab initio methods. Recent studies successfully used equivariant graph neural networks (EGNNs) for representation learning based on 3D structures to predict quantum-mechanical properties of molecules. Inspired by this, we investigated the performance of EGNNs to construct reliable ML models for toxicity prediction. We used the equivariant transformer (ET) model in TorchMD-NET for this. Eleven toxicity data sets taken from MoleculeNet, TDCommons, and ToxBenchmark have been considered to evaluate the capability of ET for toxicity prediction. Our results show that ET adequately learns 3D representations of molecules that can successfully correlate with toxicity activity, achieving good accuracies on most data sets comparable to state-of-the-art models. We also test a physicochemical property, namely, the total energy of a molecule, to inform the toxicity prediction with a physical prior. However, our work suggests that these two properties cannot be related. We also provide an attention weight analysis to help understand the toxicity prediction in 3D space and thus increase the explainability of the ML model. In summary, our findings offer promising insights into the use of 3D geometry information via EGNNs and provide a straightforward way to integrate molecular conformers into ML-based pipelines for predicting and investigating toxicity in physical space. We expect that in the future, especially for larger, more diverse data sets, EGNNs will be an essential tool in this domain.
Affiliation(s)
- Julian Cremer
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Machine Learning Research, Pfizer Worldwide Research Development and Medical, Linkstr. 10, 10785 Berlin, Germany
| | - Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Djork-Arné Clevert
- Machine Learning Research, Pfizer Worldwide Research Development and Medical, Linkstr. 10, 10785 Berlin, Germany
| | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- ICREA, Passeig Lluis Companys 23, 08010 Barcelona, Spain
| |
|
22
|
Wang Y, Zhang R, Zhang S, Guo L, Zhou Q, Zhao B, Mo X, Yang Q, Huang Y, Li K, Fan Y, Huang L, Zhou F. OCMR: A comprehensive framework for optical chemical molecular recognition. Comput Biol Med 2023; 163:107187. [PMID: 37393787 DOI: 10.1016/j.compbiomed.2023.107187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/10/2023] [Accepted: 06/19/2023] [Indexed: 07/04/2023]
Abstract
Artificial intelligence (AI) has achieved significant progress in the field of drug discovery. AI-based tools have been used in all aspects of drug discovery, including chemical structure recognition. We propose a chemical structure recognition framework, Optical Chemical Molecular Recognition (OCMR), to improve the data extraction capability in practical scenarios compared with the rule-based and end-to-end deep learning models. The proposed OCMR framework enhances the recognition performances via the integration of local information in the topology of molecular graphs. OCMR handles complex tasks like non-canonical drawing and atomic group abbreviation and substantially improves the current state-of-the-art results on multiple public benchmark datasets and one internally curated dataset.
Affiliation(s)
- Yan Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Ruochi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China
| | - Shengde Zhang
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Liming Guo
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Qiong Zhou
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130012, China
| | - Bowen Zhao
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Xiaotong Mo
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Qian Yang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yajuan Huang
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Kewei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| | - Yusi Fan
- College of Software, Jilin University, Changchun, Jilin, 130012, China
| | - Lan Huang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Fengfeng Zhou
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| |
|
23
|
Wu Y, Ni X, Wang Z, Feng W. Enhancing drug property prediction with dual-channel transfer learning based on molecular fragment. BMC Bioinformatics 2023; 24:293. [PMID: 37479969 PMCID: PMC10360281 DOI: 10.1186/s12859-023-05413-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023] Open
Abstract
BACKGROUND Accurate prediction of molecular properties is important in contemporary drug discovery and medical research. Recent advances in AI-driven molecular property prediction have shown promising results. Because annotation through in vitro and in vivo experiments is costly, the transfer learning paradigm has been gaining momentum as a way to extract general self-supervised information that facilitates neural network learning. However, prior pretraining strategies have overlooked the necessity of explicitly incorporating domain knowledge, especially molecular fragments, into model design, resulting in the under-exploration of the molecular semantic space. RESULTS We propose an effective model with FRagment-based dual-channEL pretraining (FREL). Equipped with molecular fragments, FREL employs a masked autoencoder and contrastive learning to learn intra- and inter-molecule agreement, respectively. We further conduct extensive experiments on ten public datasets to demonstrate its superiority over state-of-the-art models. Further investigations and interpretations reveal the underlying relationship between molecular representations and molecular properties. CONCLUSIONS Our proposed model FREL achieves state-of-the-art performance on the benchmark datasets, emphasizing the importance of incorporating molecular fragments into model design. The expressiveness of learned molecular representations is also investigated by visualization and correlation analysis. Case studies indicate that the learned molecular representations better capture the drug property variation and fragment semantics.
Affiliation(s)
- Yue Wu
- College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Xinran Ni
- College of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Zhihao Wang
- College of Intelligence and Information Engineering, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Weike Feng
- College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China.
| |
|
24
|
Toniato A, Unsleber JP, Vaucher AC, Weymuth T, Probst D, Laino T, Reiher M. Quantum chemical data generation as fill-in for reliability enhancement of machine-learning reaction and retrosynthesis planning. Digital Discovery 2023; 2:663-673. [PMID: 37312681 PMCID: PMC10259370 DOI: 10.1039/d3dd00006k] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/21/2023] [Accepted: 03/09/2023] [Indexed: 06/15/2023]
Abstract
Data-driven synthesis planning has seen remarkable successes in recent years by virtue of modern approaches of artificial intelligence that efficiently exploit vast databases with experimental data on chemical reactions. However, this success story is intimately connected to the availability of existing experimental data. It may well occur in retrosynthetic and synthesis design tasks that predictions in individual steps of a reaction cascade are affected by large uncertainties. In such cases, it will, in general, not be easily possible to provide missing data from autonomously conducted experiments on demand. However, first-principles calculations can, in principle, provide missing data to enhance the confidence of an individual prediction or for model retraining. Here, we demonstrate the feasibility of such an ansatz and examine resource requirements for conducting autonomous first-principles calculations on demand.
Affiliation(s)
- Alessandra Toniato
- Laboratory of Physical Chemistry, ETH Zurich Vladimir-Prelog-Weg 2 8093 Zurich Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), ETH Zurich Vladimir-Prelog-Weg 1-5/10 8093 Zurich Switzerland
- IBM Research Europe 8803 Rüschlikon Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), IBM Research 8803 Rüschlikon Switzerland
| | - Jan P Unsleber
- Laboratory of Physical Chemistry, ETH Zurich Vladimir-Prelog-Weg 2 8093 Zurich Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), ETH Zurich Vladimir-Prelog-Weg 1-5/10 8093 Zurich Switzerland
| | - Alain C Vaucher
- IBM Research Europe 8803 Rüschlikon Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), IBM Research 8803 Rüschlikon Switzerland
| | - Thomas Weymuth
- Laboratory of Physical Chemistry, ETH Zurich Vladimir-Prelog-Weg 2 8093 Zurich Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), ETH Zurich Vladimir-Prelog-Weg 1-5/10 8093 Zurich Switzerland
| | - Daniel Probst
- IBM Research Europe 8803 Rüschlikon Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), IBM Research 8803 Rüschlikon Switzerland
| | - Teodoro Laino
- IBM Research Europe 8803 Rüschlikon Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), IBM Research 8803 Rüschlikon Switzerland
| | - Markus Reiher
- Laboratory of Physical Chemistry, ETH Zurich Vladimir-Prelog-Weg 2 8093 Zurich Switzerland
- National Center for Competence in Research-Catalysis (NCCR Catalysis), ETH Zurich Vladimir-Prelog-Weg 1-5/10 8093 Zurich Switzerland
| |
|
25
|
Ren GP, Wu KJ, He Y. Enhancing Molecular Representations Via Graph Transformation Layers. J Chem Inf Model 2023; 63:2679-2688. [PMID: 37104828 DOI: 10.1021/acs.jcim.3c00059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
Molecular representation learning is an essential component of many molecule-oriented tasks, such as molecular property prediction and molecule generation. In recent years, graph neural networks (GNNs) have shown great promise in this area, representing a molecule as a graph composed of nodes and edges. A growing number of studies show that coarse-grained or multiview molecular graphs are important for molecular representation learning. Most of these models, however, are too complex and lack flexibility in learning different granular information for different tasks. Here, we propose a flexible and simple graph transformation layer (i.e., LineEvo), a plug-and-use module for GNNs, which enables molecular representation learning from multiple perspectives. The LineEvo layer transforms fine-grained molecular graphs into coarse-grained ones based on the line graph transformation strategy. Specifically, it treats the edges as nodes and generates the new connected edges, atom features, and atom positions. By stacking LineEvo layers, GNNs can learn multilevel information, from the atom level to the triple-atom level and coarser levels. Experimental results show that the LineEvo layers can improve the performance of traditional GNNs on molecular property prediction benchmarks by 7% on average. Additionally, we show that the LineEvo layers can help GNNs have more expressive power than the Weisfeiler-Lehman graph isomorphism test.
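The line graph transformation mentioned in this abstract can be illustrated with a minimal sketch. It is not the LineEvo implementation: it only shows the generic step of turning each bond into a node and connecting two such nodes when the bonds share an atom. The function name `line_graph` and the toy bond lists are assumptions made for illustration, and the generation of new atom features and positions described in the abstract is omitted.

```python
from itertools import combinations


def line_graph(edges):
    """Return (nodes, new_edges) of the line graph of an undirected graph.

    Each original edge (bond) becomes a node. Two of these nodes are
    connected when the original edges share an endpoint (atom).
    """
    # Normalise each bond to a sorted tuple so (a, b) and (b, a) coincide.
    nodes = sorted({tuple(sorted(e)) for e in edges})
    new_edges = [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if set(nodes[i]) & set(nodes[j])  # the two bonds share an atom
    ]
    return nodes, new_edges


# Heavy-atom skeleton of propane, C0-C1-C2: two bonds sharing atom 1.
propane_nodes, propane_edges = line_graph([(0, 1), (1, 2)])

# Heavy-atom skeleton of isobutane: three bonds all meeting at atom 1.
isobutane_nodes, isobutane_edges = line_graph([(0, 1), (1, 2), (1, 3)])
```

Applying `line_graph` again to the result (using `new_edges` as the bond list) gives a coarser graph still, which is the sense in which stacked layers move from the atom level to larger groupings.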
Affiliation(s)
- Gao-Peng Ren
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou 324000, China
| | - Ke-Jun Wu
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou 324000, China
- School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, U.K.
| | - Yuchen He
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
| |
|
26
|
Song Y, Chen J, Wang W, Chen G, Ma Z. Double-head transformer neural network for molecular property prediction. J Cheminform 2023; 15:27. [PMID: 36823530 PMCID: PMC9951429 DOI: 10.1186/s13321-023-00700-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 02/16/2023] [Indexed: 02/25/2023] Open
Abstract
Existing molecular property prediction methods based on deep learning ignore the generalization ability of the nonlinear representation of molecular features and the reasonable assignment of weights of molecular features, making it difficult to further improve the accuracy of molecular property prediction. To solve the above problems, an end-to-end double-head transformer neural network (DHTNN) is proposed in this paper for high-precision molecular property prediction. For the data distribution characteristics of the molecular dataset, DHTNN specially designs a new activation function, beaf, which can greatly improve the generalization ability of the nonlinear representation of molecular features. A residual network is introduced in the molecular encoding part to solve the gradient explosion problem and ensure that the model can converge quickly. The transformer based on double-head attention is used to extract molecular intrinsic detail features, and the weights are reasonably assigned for predicting molecular properties with high accuracy. Our model, which was tested on the MoleculeNet [1] benchmark dataset, showed significant performance improvements over other state-of-the-art methods.
Affiliation(s)
- Yuanbing Song
- College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
| | - Jinghua Chen
- College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
| | - Wenju Wang
- College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China.
| | - Gang Chen
- College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
| | - Zhichong Ma
- College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China
| |
|
27
|
Predicting Potent Compounds Using a Conditional Variational Autoencoder Based upon a New Structure-Potency Fingerprint. Biomolecules 2023; 13:biom13020393. [PMID: 36830761 PMCID: PMC9953226 DOI: 10.3390/biom13020393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 02/07/2023] [Accepted: 02/16/2023] [Indexed: 02/22/2023] Open
Abstract
Prediction of the potency of bioactive compounds generally relies on linear or nonlinear quantitative structure-activity relationship (QSAR) models. Nonlinear models are generated using machine learning methods. We introduce a novel approach for potency prediction that depends on a newly designed molecular fingerprint (FP) representation. This structure-potency fingerprint (SPFP) combines different modules accounting for the structural features of active compounds and their potency values in a single bit string, hence unifying structure and potency representation. This encoding enables the derivation of a conditional variational autoencoder (CVAE) using SPFPs of training compounds and the application of the model to predict the SPFP potency module of test compounds using only their structure module as input. The SPFP-CVAE approach correctly predicts the potency values of compounds belonging to different activity classes with an accuracy comparable to support vector regression (SVR), representing the state-of-the-art in the field. In addition, highly potent compounds are predicted with accuracy very similar to that of SVR and deep neural networks.
|
28
|
Ren GP, Yin YJ, Wu KJ, He Y. Force field-inspired molecular representation learning for property prediction. J Cheminform 2023; 15:17. [PMID: 36747267 PMCID: PMC9901163 DOI: 10.1186/s13321-023-00691-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023] Open
Abstract
Molecular representation learning is a crucial task to accelerate drug discovery and materials design. Graph neural networks (GNNs) have emerged as a promising approach to tackle this task. However, most of them do not fully consider the intramolecular interactions, i.e. bond stretching, angle bending, torsion, and nonbonded interactions, which are critical for determining molecular property. Recently, a growing number of 3D-aware GNNs have been proposed to cope with the issue, while these models usually need large datasets and accurate spatial information. In this work, we aim to design a GNN which is less dependent on the quantity and quality of datasets. To this end, we propose a force field-inspired neural network (FFiNet), which can include all the interactions by incorporating the functional form of the potential energy of molecules. Experiments show that FFiNet achieves state-of-the-art performance on various molecular property datasets including both small molecules and large protein-ligand complexes, even on those datasets which are relatively small and without accurate spatial information. Moreover, the visualization for FFiNet indicates that it automatically learns the relationship between property and structure, which can promote an in-depth understanding of molecular structure.
Affiliation(s)
- Gao-Peng Ren
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou, 324000, China
| | - Yi-Jian Yin
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou, 324000, China
| | - Ke-Jun Wu
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou, 324000, China
- School of Chemical and Process Engineering, University of Leeds, Leeds, LS2 9JT, UK.
| | - Yuchen He
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, China.
| |
|
29
|
Zhu W, Zhang Y, Zhao D, Xu J, Wang L. HiGNN: A Hierarchical Informative Graph Neural Network for Molecular Property Prediction Equipped with Feature-Wise Attention. J Chem Inf Model 2023; 63:43-55. [PMID: 36519623 DOI: 10.1021/acs.jcim.2c01099] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/23/2022]
Abstract
Elucidating and accurately predicting the druggability and bioactivities of molecules plays a pivotal role in drug design and discovery and remains an open challenge. Recently, graph neural networks (GNNs) have made remarkable advancements in graph-based molecular property prediction. However, current graph-based deep learning methods neglect the hierarchical information of molecules and the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural network (termed HiGNN) framework for predicting molecular property by utilizing a corepresentation learning of molecular graphs and chemically synthesizable breaking of retrosynthetically interesting chemical substructure (BRICS) fragments. Furthermore, a plug-and-play feature-wise attention block is first designed in HiGNN architecture to adaptively recalibrate atomic features after the message passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark data sets. In addition, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of the HiGNN model at the subgraph level, indicating that HiGNN as a powerful deep learning tool can help chemists and pharmacists identify the key components of molecules for designing better molecules with desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.
Affiliation(s)
- Weimin Zhu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Yi Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Jianrong Xu
- Department of Pharmacology and Chemical Biology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Academy of Integrative Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
| | - Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| |
|
30
|
TransG-net: transformer and graph neural network based multi-modal data fusion network for molecular properties prediction. Appl Intell 2022. [DOI: 10.1007/s10489-022-04351-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
|
31
|
Using Artificial Intelligence for Drug Discovery: A Bibliometric Study and Future Research Agenda. Pharmaceuticals (Basel) 2022; 15:ph15121492. [PMID: 36558943 PMCID: PMC9785219 DOI: 10.3390/ph15121492] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Revised: 11/23/2022] [Accepted: 11/27/2022] [Indexed: 12/03/2022] Open
Abstract
Drug discovery is usually a rule-based process that is carefully carried out by pharmacists. However, a new trend is emerging in research and practice where artificial intelligence is being used for drug discovery to increase efficiency or to develop new drugs for previously untreatable diseases. Nevertheless, so far, no study takes a holistic view of AI-based drug discovery research. Given the importance and potential of AI for drug discovery, this lack of research is surprising. This study aimed to close this research gap by conducting a bibliometric analysis to identify all relevant studies and to analyze interrelationships among algorithms, institutions, countries, and funding sponsors. For this purpose, a sample of 3884 articles was examined bibliometrically, including studies from 1991 to 2022. We utilized various qualitative and quantitative methods, such as performance analysis, science mapping, and thematic analysis. Based on these findings, we furthermore developed a research agenda that aims to serve as a foundation for future researchers.
|
32
|
Kong Y, Zhao X, Liu R, Yang Z, Yin H, Zhao B, Wang J, Qin B, Yan A. Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation. J Cheminform 2022; 14:52. [PMID: 35927691 PMCID: PMC9351086 DOI: 10.1186/s13321-022-00634-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 07/16/2022] [Indexed: 11/10/2022] Open
Abstract
Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task related representations, which completely gets rid of the rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of the GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrated pharmacophore information hierarchically into message-passing neural network (MPNN) architecture, specifically, in the way of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbed not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to MPNN architecture can generally help GNN models improve the predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.
Affiliation(s)
- Yue Kong
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Xiaoman Zhao
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Ruizi Liu
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Zhenwu Yang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Hongyan Yin
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bowen Zhao
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Jinling Wang
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bingjie Qin
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.
| |
|
33
|
Tan T, Cheng H, Chen G, Song Z, Qi Z. Prediction of infinite-dilution activity coefficients with neural collaborative filtering. AIChE J 2022. [DOI: 10.1002/aic.17789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Tian Tan
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering East China University of Science and Technology Shanghai China
| | - Hongye Cheng
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering East China University of Science and Technology Shanghai China
| | - Guzhong Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering East China University of Science and Technology Shanghai China
| | - Zhen Song
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering East China University of Science and Technology Shanghai China
| | - Zhiwen Qi
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering East China University of Science and Technology Shanghai China
| |
|
34
|
Kim JH, Kim H, Kim WY. Effect of molecular representation on deep learning performance for prediction of molecular electronic properties. Bull Korean Chem Soc 2022. [DOI: 10.1002/bkcs.12516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Jun Hyeong Kim
- Department of Chemistry Korea Advanced Institute of Science and Technology Daejeon South Korea
| | - Hyeonsu Kim
- Department of Chemistry Korea Advanced Institute of Science and Technology Daejeon South Korea
| | - Woo Youn Kim
- Department of Chemistry Korea Advanced Institute of Science and Technology Daejeon South Korea
- KI for Artificial Intelligence Korea Advanced Institute of Science and Technology Daejeon South Korea
| |
|
35
|
Oliveira AF, Da Silva JLF, Quiles MG. Molecular Property Prediction and Molecular Design Using a Supervised Grammar Variational Autoencoder. J Chem Inf Model 2022; 62:817-828. [PMID: 35174705 DOI: 10.1021/acs.jcim.1c01573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Some of the most common applications of machine learning (ML) algorithms dealing with small molecules fall within two distinct domains, namely, the prediction of molecular properties and the design of novel molecules with some desirable property. Here we unite these applications under a single molecular representation and ML algorithm by modifying the grammar variational autoencoder (GVAE) model with the incorporation of property information into its training procedure, thus creating a supervised GVAE (SGVAE). Results indicate that the biased latent space generated by this approach can successfully be used to predict the molecular properties of the input molecules, produce novel and unique molecules with some desired property, and also estimate the properties of randomly sampled molecules. We illustrate these possibilities by sampling novel molecules from the latent space with specific values of the lowest unoccupied molecular orbital (LUMO) energy after training the model using the QM9 data set. Furthermore, the trained model is also used to predict the properties of a hold-out set, and the resulting mean absolute error (MAE) shows values close to chemical accuracy for the dipole moment and atomization energies, even outperforming ML models designed exclusively to predict molecular properties using SMILES as the molecular representation. Therefore, these results show that the proposed approach is a viable way to provide generative ML models with molecular property information such that the generation of novel molecules is likely to achieve better results, with the benefit that these new molecules can also have their molecular properties accurately predicted.
Affiliation(s)
- André F Oliveira
- Associate Laboratory for Computing and Applied Mathematics, National Institute for Space Research, P.O. Box 515, 12227-010, São José dos Campos, SP, Brazil
| | - Juarez L F Da Silva
- São Carlos Institute of Chemistry, University of São Paulo, P.O. Box 780, 13560-970, São Carlos, SP, Brazil
| | - Marcos G Quiles
- Institute of Science and Technology, Federal University of São Paulo, 12247-014, São José dos Campos, SP, Brazil
| |
|
36
|
Deng D, Lei Z, Hong X, Zhang R, Zhou F. Describe Molecules by a Heterogeneous Graph Neural Network with Transformer-like Attention for Supervised Property Predictions. ACS Omega 2022; 7:3713-3721. [PMID: 35128279 PMCID: PMC8811943 DOI: 10.1021/acsomega.1c06389] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/13/2021] [Accepted: 01/10/2022] [Indexed: 06/14/2023]
Abstract
Machine learning and deep learning have facilitated various successful studies of molecular property predictions. The rapid development of natural language processing and graph neural network (GNN) further pushed the state-of-the-art prediction performance of molecular property to a new level. A geometric graph could describe a molecular structure with atoms as the nodes and bonds as the edges. Therefore, a graph neural network may be trained to better represent a molecular structure. The existing GNNs assumed homogeneous types of atoms and bonds, which may miss important information between different types of atoms or bonds. This study represented a molecule using a heterogeneous graph neural network (MolHGT), in which there were different types of nodes and different types of edges. A transformer reading function of virtual nodes was proposed to aggregate all the nodes, and a molecule graph may be represented from the hidden states of the virtual nodes. This proof-of-principle study demonstrated that the proposed MolHGT network improved the existing studies of molecular property predictions. The source code and the training/validation/test splitting details are available at https://github.com/zhangruochi/Mol-HGT.
Affiliation(s)
- Daiguo Deng
- Fermion Technology Co., Limited, Guangzhou, Guangdong 510000, P. R. China
| | - Zengrong Lei
- Fermion Technology Co., Limited, Guangzhou, Guangdong 510000, P. R. China
| | - Xiaobin Hong
- Fermion Technology Co., Limited, Guangzhou, Guangdong 510000, P. R. China
| | - Ruochi Zhang
- Fermion Technology Co., Limited, Guangzhou, Guangdong 510000, P. R. China
- School of Artificial Intelligence, Jilin University, Changchun 130012, P. R. China
| | - Fengfeng Zhou
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P. R. China
| |
|
37
|
Zhang XC, Wu CK, Yi JC, Zeng XX, Yang CQ, Lu AP, Hou TJ, Cao DS. Pushing the Boundaries of Molecular Property Prediction for Drug Discovery with Multitask Learning BERT Enhanced by SMILES Enumeration. Research 2022. [DOI: 10.34133/research.0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Accurate prediction of pharmacological properties of small molecules is becoming increasingly important in drug discovery. Traditional feature-engineering approaches rely heavily on handcrafted descriptors and/or fingerprints, which require extensive human expert knowledge. With the rapid progress of artificial intelligence technology, data-driven deep learning methods have shown unparalleled advantages over feature-engineering-based methods. However, existing deep learning methods usually suffer from the scarcity of labeled data and the inability to share information between different tasks when applied to predicting molecular properties, resulting in poor generalization capability. Here, we propose a novel multitask learning BERT (Bidirectional Encoder Representations from Transformers) framework, named MTL-BERT, which leverages large-scale pretraining, multitask learning, and SMILES (simplified molecular input line entry specification) enumeration to alleviate the data scarcity problem. MTL-BERT first exploits a large amount of unlabeled data through self-supervised pretraining to mine the rich contextual information in SMILES strings and then fine-tunes the pretrained model for multiple downstream tasks simultaneously by leveraging their shared information. Meanwhile, SMILES enumeration is used as a data augmentation strategy during the pretraining, fine-tuning, and test phases to substantially increase data diversity and help to learn the key relevant patterns from complex SMILES strings. The experimental results show that the pretrained MTL-BERT model, with little additional fine-tuning, can achieve much better performance than the state-of-the-art methods on most of the 60 practical molecular datasets. Additionally, the MTL-BERT model leverages attention mechanisms to focus on the SMILES character features essential to target properties, which aids model interpretability.
Affiliation(s)
- Xiao-Chen Zhang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Shangqiu Normal University, School of Information Technology, Shangqiu 476000, Henan, P. R. China
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Cheng-Kun Wu
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Jia-Cai Yi
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Xiang-Xiang Zeng
- Department of Computer Science, Hunan University, Changsha 410082, Hunan, P. R. China
- Can-Qun Yang
- College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
- Ting-Jun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
|
38
|
Akbar R, Robert PA, Weber CR, Widrich M, Frank R, Pavlović M, Scheffer L, Chernigovskaya M, Snapkov I, Slabodkin A, Mehta BB, Miho E, Lund-Johansen F, Andersen JT, Hochreiter S, Hobæk Haff I, Klambauer G, Sandve GK, Greiff V. In silico proof of principle of machine learning-based antibody design at unconstrained scale. MAbs 2022; 14:2031482. [PMID: 35377271 PMCID: PMC8986205 DOI: 10.1080/19420862.2022.2031482] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Indexed: 12/15/2022] Open
Abstract
Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data, can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.
Affiliation(s)
- Rahmad Akbar
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Philippe A Robert
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Cédric R Weber
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Michael Widrich
- Ellis Unit Linz and Lit Ai Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
- Robert Frank
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Maria Chernigovskaya
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Igor Snapkov
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Andrei Slabodkin
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Brij Bhushan Mehta
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Enkelejda Miho
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz, Switzerland
- Fridtjof Lund-Johansen
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
- Jan Terje Andersen
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway; Institute of Clinical Medicine, Department of Pharmacology, University of Oslo, Oslo, Norway
- Sepp Hochreiter
- Ellis Unit Linz and Lit Ai Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria; Institute of Advanced Research in Artificial Intelligence (IARAI), Austria
- Günter Klambauer
- Ellis Unit Linz and Lit Ai Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
- Victor Greiff
- Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway
|
39
|
Wang Q, Zhou Y. FedSPL: federated self-paced learning for privacy-preserving disease diagnosis. Brief Bioinform 2021; 23:6454650. [PMID: 34874995 DOI: 10.1093/bib/bbab498] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/29/2021] [Revised: 10/28/2021] [Accepted: 10/30/2021] [Indexed: 12/18/2022] Open
Abstract
The growing availability of data in medical fields could help improve the performance of machine learning methods. However, with healthcare data, using multi-institutional datasets is challenging due to privacy and security concerns. Therefore, privacy-preserving machine learning methods are required. Thus, we use a federated learning model to train a shared global model, in which the central server does not contain private data and all clients keep the sensitive data in their own institutions. The scattered training data are connected to improve model performance while preserving data privacy. However, in the federated training procedure, data errors or noise can reduce learning performance. Therefore, we introduce self-paced learning, which can effectively select high-confidence samples and drop highly noisy samples to improve the performance of the trained model and reduce the risk of data privacy leakage. We propose federated self-paced learning (FedSPL), which combines the advantages of federated learning and self-paced learning. The proposed FedSPL model was evaluated on gene expression data distributed across different institutions where privacy concerns must be considered. The results demonstrate that the proposed FedSPL model is secure, i.e. it does not expose the original record to other parties, and the computational overhead during training is acceptable. Compared with learning methods based on the local data of all parties, the proposed model can significantly improve the predicted F1-score by approximately 4.3%. We believe that the proposed method has the potential to benefit clinicians in gene selection and disease prognosis.
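The two ingredients named in this abstract can be illustrated with a minimal Python sketch. It is not the FedSPL implementation: the hard loss threshold, the function names `self_paced_weights` and `federated_average`, and the size-weighted averaging rule (in the style of FedAvg) are all assumptions made for illustration, since the abstract does not specify how samples are selected or how client models are aggregated.

```python
def self_paced_weights(losses, threshold):
    """Hard self-paced selection (assumed form): keep a sample (weight 1)
    only if its current loss is below the threshold, otherwise drop it (weight 0)."""
    return [1 if loss < threshold else 0 for loss in losses]


def federated_average(client_params, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg-style assumption).

    Only parameters and sample counts are passed in; raw records never
    leave the clients in this sketch.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[k] * n for params, n in zip(client_params, client_sizes)) / total
        for k in range(dim)
    ]


# Hypothetical per-sample losses on one client: the noisy sample (loss 2.0) is dropped.
weights = self_paced_weights([0.1, 2.0, 0.5], threshold=1.0)

# Hypothetical parameters from two clients holding 1 and 3 selected samples.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In self-paced schemes the threshold is usually raised over training rounds so that harder samples are admitted gradually; that schedule is left out here.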
Affiliation(s)
- Qingyong Wang
- Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, China
- Yun Zhou
- Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, China
|
40
|
Lexa KW, Belyk KM, Henle J, Xiang B, Sheridan RP, Denmark SE, Ruck RT, Sherer EC. Application of Machine Learning and Reaction Optimization for the Iterative Improvement of Enantioselectivity of Cinchona-Derived Phase Transfer Catalysts. Org Process Res Dev 2021. [DOI: 10.1021/acs.oprd.1c00155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Affiliation(s)
- Katrina W. Lexa
- Department of Computational and Structural Chemistry, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Kevin M. Belyk
- Department of Process Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Jeremy Henle
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Bangping Xiang
- Department of Process Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Robert P. Sheridan
- Department of Computational and Structural Chemistry, MRL, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
- Scott E. Denmark
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Rebecca T. Ruck
- Department of Process Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Edward C. Sherer
- Department of Computational and Structural Chemistry, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
|
41
|
Yoshimori A. Prediction of Molecular Properties Using Molecular Topographic Map. Molecules 2021; 26:4475. [PMID: 34361624 PMCID: PMC8348331 DOI: 10.3390/molecules26154475] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/30/2021] [Revised: 07/21/2021] [Accepted: 07/21/2021] [Indexed: 12/18/2022] Open
Abstract
Prediction of molecular properties plays a critical role in rational drug design. In this study, the Molecular Topographic Map (MTM) is proposed, which is a two-dimensional (2D) map that can be used to represent a molecule. An MTM is generated from the atomic feature set of a molecule using generative topographic mapping and is then used as input data for analyzing structure-property/activity relationships. In the visualization and classification of 20 amino acids, differences among the amino acids can be confirmed visually and are revealed by hierarchical clustering with a similarity matrix of their MTMs. The prediction of molecular properties was performed on the basis of convolutional neural networks using MTMs as input data. The performance of the predictive models using MTM was found to be equal to or better than that using Morgan fingerprint or MACCS keys. Furthermore, data augmentation of MTMs using mixup improved the prediction performance. Since molecules converted to MTMs can be treated like 2D images, they can be easily used with existing neural networks for image recognition and related technologies. MTM can be effectively utilized to predict molecular properties of small molecules to aid drug discovery research.
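The mixup augmentation mentioned in this abstract can be sketched in a few lines of Python. This is the standard mixup rule (a convex combination of two inputs and their labels with a Beta-distributed coefficient), not the paper's code: representing an MTM as a nested list, the function name `mixup`, and the value `alpha=0.2` are assumptions made for illustration.

```python
import random


def mixup(map_a, label_a, map_b, label_b, alpha=0.2, rng=random):
    """Blend two 2D maps and their labels with one coefficient lam ~ Beta(alpha, alpha).

    map_a and map_b are equally shaped nested lists standing in for MTMs (assumed format).
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_map = [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_map, mixed_label, lam


# Hypothetical 1x2 maps with property values 1.0 and 0.0.
example_map, example_label, example_lam = mixup(
    [[1.0, 0.0]], 1.0, [[0.0, 1.0]], 0.0, rng=random.Random(0)
)
```

Because the same coefficient is applied to the input and the label, the augmented pair stays consistent for regression targets as well as for one-hot class labels.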
Affiliation(s)
- Atsushi Yoshimori
- Institute for Theoretical Medicine, Inc., 26-1, Muraoka-Higashi 2-chome, Fujisawa 251-0012, Japan
|
42
|
Kalliokoski T. Machine Learning Boosted Docking (HASTEN): An Open-source Tool To Accelerate Structure-based Virtual Screening Campaigns. Mol Inform 2021; 40:e2100089. [PMID: 34060239 DOI: 10.1002/minf.202100089] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/31/2021] [Accepted: 05/12/2021] [Indexed: 11/08/2022]
Abstract
The software macHine leArning booSTEd dockiNg (HASTEN) was developed to accelerate structure-based virtual screening using machine learning models. It has been validated using datasets both from the literature (12 datasets, each containing three million molecules docked with FRED) and from in-house sources (one dataset of four million compounds docked with Glide). On the literature data, HASTEN showed reasonable performance, with a mean recall of 0.78 for the top 1% of scoring molecules after docking 10% of the dataset, whereas an excellent recall of 0.95 was achieved on the in-house data. The program can be used with any docking and machine learning methodology, and is freely available from https://github.com/TuomoKalliokoski/HASTEN.
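The recall figure quoted in this abstract can be made concrete with a short Python sketch of the metric alone. This is not HASTEN's code or workflow: the function name `recall_of_top_fraction`, the single-pass selection by predicted score, and the convention that lower docking scores are better are assumptions made for illustration.

```python
def recall_of_top_fraction(true_scores, predicted_scores,
                           docked_fraction=0.10, top_fraction=0.01):
    """Share of the truly best-scoring molecules that fall in the ML-selected subset.

    Lower scores are assumed to be better for both true and predicted values.
    """
    n = len(true_scores)
    n_top = max(1, int(n * top_fraction))
    n_docked = max(1, int(n * docked_fraction))
    # True hits: the n_top molecules with the best real docking scores.
    true_top = set(sorted(range(n), key=lambda i: true_scores[i])[:n_top])
    # Molecules that would be sent to docking: the n_docked best predicted scores.
    selected = set(sorted(range(n), key=lambda i: predicted_scores[i])[:n_docked])
    return len(true_top & selected) / n_top


# Hypothetical library of 1000 molecules with a perfect predictor.
scores = [float(i) for i in range(1000)]
perfect_recall = recall_of_top_fraction(scores, scores)
```

A recall of 0.78 under this definition would mean that 78% of the true top 1% were found while docking only a tenth of the library.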
|
43
|
Bender A, Cortes-Ciriano I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discov Today 2021; 26:1040-1052. [PMID: 33508423 PMCID: PMC8132984 DOI: 10.1016/j.drudis.2020.11.037] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 11/04/2020] [Revised: 11/07/2020] [Accepted: 11/30/2020] [Indexed: 12/11/2022]
Abstract
'Artificial Intelligence' (AI) has recently had a profound impact on areas such as image and speech recognition, and this progress has already translated into practical applications. However, in the drug discovery field, such advances remain scarce, and one of the reasons is intrinsic to the data used. In this review, we discuss aspects of, and differences in, data from different domains, namely the image, speech, chemical, and biological domains, the amounts of data available, and how relevant they are to drug discovery. Improvements in the future are needed with respect to our understanding of biological systems, and the subsequent generation of practically relevant data in sufficient quantities, to truly advance the field of AI in drug discovery, to enable the discovery of novel chemistry, with novel modes of action, which shows desirable efficacy and safety in the clinic.
Affiliation(s)
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK; Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK.
- Isidro Cortes-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK.
|