1
Gao J, Shen Z, Lu Y, Shen L, Zhou B, Xu D, Dai H, Xu L, Che J, Dong X. KnoMol: A Knowledge-Enhanced Graph Transformer for Molecular Property Prediction. J Chem Inf Model 2024. [PMID: 39323109 DOI: 10.1021/acs.jcim.4c01092] [Indexed: 09/27/2024]
Abstract
Molecular property prediction (MPP) techniques are pivotal in reducing drug development costs by preemptively predicting bioactivity and ADMET properties. Despite the application of numerous deep learning approaches, enhancing the representational capacity of these models remains a significant challenge. This paper presents a novel knowledge-based Transformer framework, KnoMol, designed to improve the understanding of molecular structures. KnoMol integrates expert chemical knowledge into the Transformer, emulating the analytical methods of medicinal chemists. Additionally, the multiperspective attention mechanism provides a more precise way to represent ring systems. In the evaluation experiments, KnoMol achieved state-of-the-art performance on both MoleculeNet and small-scale data sets, surpassing existing models in terms of accuracy and generalization. Further research indicated that the incorporation of knowledge significantly reduces KnoMol's reliance on data volumes, offering a solution to the challenge of data scarcity. Moreover, KnoMol identified several new inhibitors of HER2 in a case study, demonstrating its value in real-world applications. Overall, this research not only provides a powerful tool for MPP but also serves as a successful precedent for embedding knowledge into Transformers, with positive implications for computer-aided drug discovery and the development of MPP algorithms.
Affiliation(s)
- Jian Gao
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Center for AI and Intelligent Medicine, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, China
- Zheyuan Shen
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yan Lu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
- Liteng Shen
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Binbin Zhou
  - Department of Computer Science and Computing, Zhejiang University City College, Hangzhou 310015, China
- Donghang Xu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
- Haibin Dai
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Jinxin Che
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaowu Dong
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
2
Jiang X, Tan L, Zou Q. DGCL: dual-graph neural networks contrastive learning for molecular property prediction. Brief Bioinform 2024; 25:bbae474. [PMID: 39331017 PMCID: PMC11428321 DOI: 10.1093/bib/bbae474] [Received: 05/17/2024] [Revised: 08/16/2024] [Accepted: 09/13/2024] [Indexed: 09/28/2024] [Open access]
Abstract
In this paper, we propose DGCL, a dual-graph neural network (GNN)-based contrastive learning (CL) method integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method contains two stages. In the first, pretraining stage, we use two different GNNs as encoders to construct the CL task, rather than generating augmented graphs as in earlier approaches. Specifically, DGCL aggregates and enhances features of the same molecule with the Graph Isomorphism Network and the Graph Attention Network; representations extracted from the same molecule serve as positive samples, and all others are treated as negative samples. In the downstream training stage, features extracted from the two pretrained graph networks and the carefully selected MFPs are concatenated to predict molecular properties. Our experiments show that DGCL enhances the performance of existing GNNs, matching or surpassing state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance on classification tasks by 3.73% and improves the performance on the regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint. The code and datasets of DGCL will be made publicly available.
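The pairing of positives and negatives described in this abstract can be illustrated with a small sketch. The abstract does not state which contrastive objective DGCL uses, so the InfoNCE-style loss below is an assumption, and the function names, temperature, and toy embeddings are hypothetical. The sketch only shows that the loss is lower when the two encoders' outputs for the same molecule line up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over a batch.

    view_a[i] and view_b[i] are assumed to embed the same molecule
    (positive pair); view_b[j] with j != i acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)

# Hypothetical embeddings standing in for the outputs of two encoders.
gin_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gat_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Rotating the second view breaks the molecule-to-molecule pairing.
gat_shuffled = [gat_aligned[1], gat_aligned[2], gat_aligned[0]]

aligned_loss = info_nce(gin_embeddings, gat_aligned)
shuffled_loss = info_nce(gin_embeddings, gat_shuffled)
print(aligned_loss < shuffled_loss)
```

In a real pipeline the two views would come from trained graph encoders and the loss would be minimized by gradient descent; this sketch covers only the loss computation.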
Affiliation(s)
- Xiuyu Jiang
  - School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
- Liqin Tan
  - School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
- Qingsong Zou
  - School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
3
Zhang Y, Shen C, Xia K. Multi-Cover Persistence (MCP)-based machine learning for polymer property prediction. Brief Bioinform 2024; 25:bbae465. [PMID: 39323091 PMCID: PMC11424509 DOI: 10.1093/bib/bbae465] [Received: 04/29/2024] [Revised: 08/07/2024] [Accepted: 09/05/2024] [Indexed: 09/27/2024] [Open access]
Abstract
Accurate and efficient prediction of polymer properties is crucial for polymer design. Recently, data-driven artificial intelligence (AI) models have demonstrated great promise in polymer property analysis. Despite this progress, a pivotal challenge for all AI-driven models remains the effective representation of molecules. Here we introduce Multi-Cover Persistence (MCP)-based molecular representation and featurization for the first time. Our MCP-based polymer descriptors are combined with machine learning models, in particular Gradient Boosting Tree (GBT) models, for polymer property prediction. Unlike previous molecular representations, polymer molecular structure and interactions are represented as MCP, which uses Delaunay slices at different dimensions and Rhomboid tiling to characterize the complicated geometric and topological information within the data. Statistical features from the generated persistent barcodes are used as polymer descriptors and combined with the GBT model. Our model has been extensively validated on polymer benchmark datasets. Our models can outperform traditional fingerprint-based models and achieve accuracy similar to that of geometric deep learning models. In particular, our model tends to be more effective on large-sized monomer structures, demonstrating the great potential of MCP in characterizing more complicated polymer data. This work underscores the potential of MCP in polymer informatics, presenting a novel perspective on molecular representation and its application in polymer science.
Affiliation(s)
- Yipeng Zhang
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Cong Shen
  - Department of Mathematics, National University of Singapore, Singapore 119076, Singapore
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
4
Ren JN, Chen Q, Ye HYX, Cao C, Guo YM, Yang JR, Wang H, Khan MZI, Chen JZ. FGTN: Fragment-based graph transformer network for predicting reproductive toxicity. Arch Toxicol 2024:10.1007/s00204-024-03866-4. [PMID: 39292235 DOI: 10.1007/s00204-024-03866-4] [Received: 05/10/2024] [Accepted: 09/10/2024] [Indexed: 09/19/2024]
Abstract
Reproductive toxicity is one of the important issues in chemical safety. Traditional laboratory testing methods are costly and time-consuming and raise ethical issues. Only a few in silico models have been reported to predict human reproductive toxicity, and none of them make full use of the topological information of compounds. In addition, most existing atom-based graph neural network methods focus on attributing model predictions to individual nodes or edges rather than chemically meaningful fragments or substructures. In the current study, we develop a novel fragment-based graph transformer network (FGTN) approach to generate a QSAR model of human reproductive toxicity by considering the internal topological structure information of compounds. In the FGTN model, the compound is represented by a graph architecture that uses fragments as nodes and bonds linking two fragments as edges. A super molecule-level node is further proposed to connect all fragment nodes by undirected edges, obtaining global molecular features from fragment embeddings. The FGTN model achieved an accuracy (ACC) of 0.861 and an area under the receiver operating characteristic curve (AUC) value of 0.914 on nonredundant blind tests, outperforming traditional fingerprint-based machine learning models and an atom-based GCN model. The FGTN model can attribute toxic predictions to fragments, generating specific structural alerts for the positive compound. Moreover, FGTN may also have the capability to distinguish various chemical isomers. We believe that FGTN can be used as a reliable and effective tool for human reproductive toxicity prediction, contributing to the advancement of chemical safety assessment.
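The graph construction described in this abstract (fragments as nodes, inter-fragment bonds as edges, plus a super node linked to every fragment) can be sketched as a plain adjacency map. This is only an illustration under stated assumptions: the function name, the node indexing, and the example fragmentation are hypothetical, and the abstract does not say how FGTN fragments a molecule.

```python
def build_fragment_graph(fragments, fragment_bonds):
    """Return an undirected adjacency map over fragment nodes plus one super node.

    fragments: list of fragment labels (node i is fragments[i]).
    fragment_bonds: list of (i, j) index pairs for bonds linking two fragments.
    """
    super_node = len(fragments)  # the super node takes the index after the last fragment
    adjacency = {node: set() for node in range(len(fragments) + 1)}
    for a, b in fragment_bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Connect the super node to every fragment node with an undirected edge.
    for node in range(len(fragments)):
        adjacency[node].add(super_node)
        adjacency[super_node].add(node)
    return adjacency, super_node

# Hypothetical fragmentation into three fragments; the SMILES strings are labels only.
fragments = ["c1ccccc1", "C(=O)N", "CC"]
fragment_bonds = [(0, 1), (1, 2)]
adjacency, super_node = build_fragment_graph(fragments, fragment_bonds)
print(sorted(adjacency[super_node]))
```

Because the super node neighbors every fragment, any message-passing or attention layer run over this graph can pool fragment information into one molecule-level node.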
Affiliation(s)
- Jia-Nan Ren
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Qiang Chen
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Hong-Yu-Xiang Ye
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Cheng Cao
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
  - Polytechnic Institute, Zhejiang University, 269 Shixiang Rd., Hangzhou, 310015, Zhejiang, China
- Ya-Min Guo
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Jin-Rong Yang
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
  - Polytechnic Institute, Zhejiang University, 269 Shixiang Rd., Hangzhou, 310015, Zhejiang, China
- Hao Wang
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Muhammad Zafar Irshad Khan
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Jian-Zhong Chen
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
5
Zeng Z, Yin B, Wang S, Liu J, Yang C, Yao H, Sun X, Sun M, Xie G, Liu Z. ChatMol: interactive molecular discovery with natural language. Bioinformatics 2024; 40:btae534. [PMID: 39222004 DOI: 10.1093/bioinformatics/btae534] [Received: 02/09/2024] [Revised: 08/24/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024]
Abstract
MOTIVATION: Natural language is poised to become a key medium for human-machine interactions in the era of large language models. In the field of biochemistry, tasks such as property prediction and molecule mining are critically important yet technically challenging. Bridging molecular expressions in natural language and chemical language can significantly enhance the interpretability and ease of these tasks. Moreover, it can integrate chemical knowledge from various sources, leading to a deeper understanding of molecules.
RESULTS: Recognizing these advantages, we introduce the concept of conversational molecular design, a novel task that uses natural language to describe and edit target molecules. To better accomplish this task, we develop ChatMol, a knowledgeable and versatile generative pretrained model. This model is enhanced by incorporating experimental property information, molecular spatial knowledge, and the associations between natural and chemical languages. Several typical solutions, including large language models (e.g. ChatGPT), are evaluated, demonstrating the difficulty of conversational molecular design and the effectiveness of our knowledge enhancement approach. Case observations and analysis offer insights and directions for further exploration of natural-language interaction in molecular discovery.
AVAILABILITY AND IMPLEMENTATION: Code and data are provided at https://github.com/Ellenzzn/ChatMol/tree/main.
Affiliation(s)
- Zheni Zeng
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Bangchen Yin
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Jiarui Liu
  - PingAn Technology, Beijing 100027, China
- Cheng Yang
  - School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Maosong Sun
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Zhiyuan Liu
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
6
Xu M, Xiao X, Chen Y, Zhou X, Parisi L, Ma R. 3D physiologically-informed deep learning for drug discovery of a novel vascular endothelial growth factor receptor-2 (VEGFR2). Heliyon 2024; 10:e35769. [PMID: 39220924 PMCID: PMC11365333 DOI: 10.1016/j.heliyon.2024.e35769] [Received: 06/11/2024] [Revised: 08/01/2024] [Accepted: 08/02/2024] [Indexed: 09/04/2024] [Open access]
Abstract
Angiogenesis is an essential process in tumorigenesis, tumor invasion, and metastasis, and is an intriguing pathway for drug discovery. Targeting vascular endothelial growth factor receptor 2 (VEGFR2) to inhibit tumor angiogenic pathways has been widely explored and adopted in clinical practice. However, most drugs, such as the Food and Drug Administration-approved drug axitinib (ATC code: L01EK01), have considerable side effects and limited tolerability. Therefore, there is an urgent need for the development of novel VEGFR2 inhibitors. In this study, we propose a novel strategy to design potential candidates targeting VEGFR2 using three-dimensional (3D) deep learning and structural modeling methods. A geometric-enhanced molecular representation learning method (GEM) model employing a graph neural network (GNN) as its underlying predictive algorithm was used to predict the activity of the candidates. In the structural modeling method, flexible docking was performed to screen data with high affinity and explore the mechanism of the inhibitors. Small-molecule compounds with consistently improved properties were identified based on the intersection of the scores obtained from both methods. Candidates identified using the GEM-GNN model were selected for in silico modeling using molecular dynamics simulations to further validate their efficacy. The GEM-GNN model enabled the identification of candidate compounds with potentially more favorable properties than the existing drug, axitinib, while achieving higher efficacy.
Affiliation(s)
- Mengyang Xu
  - Faculty of Biology, Shenzhen MSU-BIT University, Shenzhen, 518172, Guangdong, China
- Xiaoyue Xiao
  - Faculty of Biology, Shenzhen MSU-BIT University, Shenzhen, 518172, Guangdong, China
- Yinglu Chen
  - Faculty of Biology, Shenzhen MSU-BIT University, Shenzhen, 518172, Guangdong, China
- Xiaoyan Zhou
  - Faculty of Biology, Shenzhen MSU-BIT University, Shenzhen, 518172, Guangdong, China
- Luca Parisi
  - Department of Computer Science, Tutorantis, Edinburgh, EH2 4AN, Scotland, United Kingdom
- Renfei Ma
  - Faculty of Biology, Shenzhen MSU-BIT University, Shenzhen, 518172, Guangdong, China
7
Wang L, Wang S, Yang H, Li S, Wang X, Zhou Y, Tian S, Liu L, Bai F. Conformational Space Profiling Enhances Generic Molecular Representation for AI-Powered Ligand-Based Drug Discovery. Adv Sci (Weinh) 2024:e2403998. [PMID: 39206753 DOI: 10.1002/advs.202403998] [Received: 04/16/2024] [Revised: 06/25/2024] [Indexed: 09/04/2024]
Abstract
The molecular representation model is a neural network that converts molecular representations (SMILES, Graph) into feature vectors, and is an essential module applied across a wide range of artificial intelligence-driven drug discovery scenarios. However, current molecular representation models rarely consider the three-dimensional conformational space of molecules, losing sight of the dynamic nature of small molecules as well as the essence of molecular conformational space that covers the heterogeneity of molecular properties, such as the multi-target mechanism of action, recognition of different biomolecules, and dynamics in cytoplasm and membrane. In this study, a new model named GeminiMol is proposed to incorporate conformational space profiles into molecular representation learning, extracting features that capture the complicated interplay between the molecular structure and the conformational space. Although GeminiMol is pre-trained on a relatively small-scale molecular dataset (39,290 molecules), it shows balanced and superior performance not only on 67 molecular property predictions but also on 73 cellular activity predictions and 171 zero-shot tasks (including virtual screening and target identification). By capturing the molecular conformational space profile, the strategy paves the way for rapid exploration of chemical space and facilitates changing paradigms for drug design.
Affiliation(s)
- Lin Wang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Shihang Wang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Hao Yang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Shiwei Li
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Xinyu Wang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Yongqi Zhou
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Siyuan Tian
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Lu Liu
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China
- Fang Bai
  - Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, Information Science and Technology, Shanghai Tech University, Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
8
Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK
9
Aksamit N, Tchagang A, Li Y, Ombuki-Berman B. Hybrid fragment-SMILES tokenization for ADMET prediction in drug discovery. BMC Bioinformatics 2024; 25:255. [PMID: 39090573 PMCID: PMC11295479 DOI: 10.1186/s12859-024-05861-z] [Received: 04/16/2024] [Accepted: 07/10/2024] [Indexed: 08/04/2024] [Open access]
Abstract
BACKGROUND: Drug discovery and development is the extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches provide an opportunity to improve each step of the drug discovery and development process, in which the first question we face is how a molecule can be informatively represented such that the in-silico solutions are optimized.
RESULTS: This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, utilizing a Transformer-based model. We investigate the efficacy of hybrid tokenization in improving the performance of ADMET prediction tasks. Our approach leverages MTL-BERT, an encoder-only Transformer model that achieves state-of-the-art ADMET predictions, and contrasts the standard SMILES tokenization with our hybrid method across a spectrum of fragment library cutoffs.
CONCLUSION: The findings reveal that while an excess of fragments can impede performance, using hybrid tokenization with high-frequency fragments enhances results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features within the training of Transformer models for ADMET property prediction.
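The idea of mixing fragment-level and character-level tokens can be illustrated with a toy tokenizer. This is a minimal sketch under loud assumptions, not the authors' method: it matches fragments as literal substrings of the SMILES string (a real fragment library would come from chemistry-aware fragmentation), and the fragment vocabulary, regular expression, and function name are all hypothetical.

```python
import re

# Character-level fallback: bracket atoms, two-letter halogens, two-digit ring closures,
# otherwise a single character. This is a simplified SMILES token pattern.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

def hybrid_tokenize(smiles, fragment_vocab):
    """Greedy longest-match against fragment_vocab, falling back to atom-level tokens."""
    fragments = sorted(fragment_vocab, key=len, reverse=True)  # try longest fragments first
    tokens = []
    position = 0
    while position < len(smiles):
        for fragment in fragments:
            if smiles.startswith(fragment, position):
                tokens.append(fragment)
                position += len(fragment)
                break
        else:
            # No fragment matched here, so emit one character-level token.
            match = ATOM_TOKEN.match(smiles, position)
            tokens.append(match.group())
            position = match.end()
    return tokens

# Hypothetical "high-frequency" fragment vocabulary.
fragment_vocab = {"c1ccccc1", "C(=O)O"}
tokens = hybrid_tokenize("CC(=O)Oc1ccccc1C(=O)O", fragment_vocab)
print(tokens)  # ['C', 'C(=O)O', 'c1ccccc1', 'C(=O)O']
```

Passing an empty vocabulary reduces the function to plain character-level SMILES tokenization, which mirrors the baseline that the abstract contrasts the hybrid scheme against.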
Affiliation(s)
- Nicholas Aksamit
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
- Alain Tchagang
  - Digital Technologies Research Centre, National Research Council Canada, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
- Yifeng Li
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
  - Department of Biological Sciences, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
- Beatrice Ombuki-Berman
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
10
Hou L, Xiang H, Zeng X, Cao D, Zeng L, Song B. Attribute-guided prototype network for few-shot molecular property prediction. Brief Bioinform 2024; 25:bbae394. [PMID: 39133096 PMCID: PMC11318080 DOI: 10.1093/bib/bbae394] [Received: 05/16/2024] [Revised: 07/08/2024] [Accepted: 07/27/2024] [Indexed: 08/13/2024] [Open access]
Abstract
Molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario, which aims to predict unseen properties from only a few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces a molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.
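For readers unfamiliar with prototype networks, the generic classification step they build on can be sketched as follows. This shows only the standard prototypical-network idea (class prototypes as mean support embeddings, nearest-prototype prediction); it does not reproduce APN's attribute extractor or attention module, and the embeddings, labels, and function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def classify_by_prototype(support, query):
    """Assign query to the label whose prototype (mean support embedding) is nearest.

    support maps each label to the few labeled embeddings available for it.
    """
    prototypes = {label: mean_vector(vectors) for label, vectors in support.items()}
    return min(prototypes, key=lambda label: squared_distance(prototypes[label], query))

# Hypothetical two-shot support set with 2-D molecule embeddings.
support = {
    "active": [[0.9, 0.8], [1.1, 1.0]],
    "inactive": [[-1.0, -0.9], [-0.8, -1.1]],
}
prediction = classify_by_prototype(support, [0.7, 0.6])
print(prediction)  # active
```

With only a handful of labeled molecules per property, averaging the support embeddings gives a stable class representative, which is why this step suits the few-shot setting the abstract describes.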
Affiliation(s)
- Linlin Hou
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Hongxin Xiang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410083, China
- Li Zeng
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Bosheng Song
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
11
Wang J, Yang Z, Chen C, Yao G, Wan X, Bao S, Ding J, Wang L, Jiang H. MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction. Brief Bioinform 2024; 25:bbae387. [PMID: 39129365 PMCID: PMC11317537 DOI: 10.1093/bib/bbae387] [Received: 04/20/2024] [Revised: 06/24/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] [Open access]
Abstract
Enzymatic reaction kinetics are central in analyzing enzymatic reaction mechanisms and target-enzyme optimization, and thus in biomanufacturing and other industries. The enzyme turnover number (kcat) and Michaelis constant (Km), key kinetic parameters for measuring enzyme catalytic efficiency, are crucial for analyzing enzymatic reaction mechanisms and the directed evolution of target enzymes. Experimental determination of kcat and Km is costly in terms of time, labor, and expense. To consider the intrinsic connection between kcat and Km and further improve the prediction performance, we propose a universal pretrained multitask deep learning model, MPEK, to predict these parameters simultaneously while considering pH, temperature, and organismal information. When tested on the same kcat and Km test datasets, MPEK demonstrated superior prediction performance over previous models. Specifically, MPEK achieved a Pearson coefficient of 0.808 for predicting kcat, an improvement of ca. 14.6% and 7.6% over the DLKcat and UniKP models, respectively, and a Pearson coefficient of 0.777 for predicting Km, an improvement of ca. 34.9% and 53.3% over the Kroll_model and UniKP models, respectively. More importantly, MPEK was able to reveal enzyme promiscuity and was sensitive to slight changes in the mutant enzyme sequence. In addition, three case studies showed that MPEK has the potential to assist enzyme mining and directed evolution. To facilitate in silico evaluation of enzyme catalytic efficiency, we have established a web server implementing this model, which can be accessed at http://mathtc.nscc-tj.cn/mpek.
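The two quantities this abstract relies on, the Pearson coefficient used as the evaluation metric and the catalytic efficiency kcat/Km, are easy to compute by hand. The sketch below uses made-up measured and predicted values purely for illustration; the function names are hypothetical and none of the numbers come from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km (for kcat in 1/s and Km in M, the result is in 1/(M*s))."""
    return kcat / km

# Made-up measured and predicted values for four enzyme-substrate pairs.
measured = [0.5, 1.2, 2.0, 2.9]
predicted = [0.7, 1.0, 2.2, 2.6]
r = pearson(measured, predicted)
print(round(r, 3))
print(catalytic_efficiency(10.0, 0.5))  # 20.0
```

A coefficient near 1 means predictions rise and fall with the measurements, which is the sense in which the reported values of 0.808 and 0.777 summarize agreement on the test sets.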
Collapse
Affiliation(s)
- Jingjing Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Zhijiang Yang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Chang Chen
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Ge Yao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Xiukun Wan
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Shaoheng Bao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
|
12
|
An H, Liu X, Cai W, Shao X. AttenGpKa: A Universal Predictor of Solvation Acidity Using Graph Neural Network and Molecular Topology. J Chem Inf Model 2024; 64:5480-5491. [PMID: 38982757 DOI: 10.1021/acs.jcim.4c00449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]
Abstract
Rapid and accurate calculation of the acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat solutes and solvents equally from the perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained on 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learned electronic and solvent effects. Free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
Affiliation(s)
- Hongle An
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
|
13
|
Liu H, Hu B, Chen P, Wang X, Wang H, Wang S, Wang J, Lin B, Cheng M. Docking Score ML: Target-Specific Machine Learning Models Improving Docking-Based Virtual Screening in 155 Targets. J Chem Inf Model 2024; 64:5413-5426. [PMID: 38958413 DOI: 10.1021/acs.jcim.4c00072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2024]
Abstract
In drug discovery, molecular docking methods face challenges in accurately predicting energy. Scoring functions used in molecular docking often fail to simulate complex protein-ligand interactions fully and accurately, leading to biases and inaccuracies in virtual screening and target predictions. We introduce "Docking Score ML", developed from an analysis of over 200,000 docked complexes from 155 known targets for cancer treatments. The scoring functions used are founded on bioactivity data sourced from ChEMBL and have been fine-tuned using both supervised machine learning and deep learning techniques. We validated our approach extensively through validation of the selectivity mechanism, tests on the DUDE, DUD-AD, and LIT-PCBA data sets, and a multitarget analysis of drugs such as sunitinib. To enhance prediction accuracy, feature fusion techniques were explored. By merging the capabilities of the Graph Convolutional Network (GCN) with multiple docking functions, our results indicated a clear superiority of our methodologies over conventional approaches. These advantages demonstrate that Docking Score ML is an efficient and accurate tool for virtual screening and reverse docking.
Affiliation(s)
- Haihan Liu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Baichun Hu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Peiying Chen
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Xiao Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Hanxun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Shizun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Jian Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Bin Lin
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
| | - Maosheng Cheng
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
|
14
|
Yang ZX, Xie XT, Kang PL, Wang ZX, Shang C, Liu ZP. Many-Body Function Corrected Neural Network with Atomic Attention (MBNN-att) for Molecular Property Prediction. J Chem Theory Comput 2024. [PMID: 39034686 DOI: 10.1021/acs.jctc.4c00660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
Recent years have seen a surge of machine learning (ML) in chemistry for predicting chemical properties, but a low-cost, general-purpose, and high-performance model that can run on central processing unit (CPU) devices remains unavailable. To this end, we introduce an atomic attention mechanism into the many-body function corrected neural network (MBNN), yielding the MBNN-att ML model, to predict both the extensive and intensive properties of molecules and materials. MBNN-att uses explicit function descriptors as the inputs for the atom-based feed-forward neural network (NN). The output of the NN is designed to be a vector to implement the multihead self-attention mechanism. This vector is split into two parts: the atomic attention weight part and the many-body-function part. The final property is obtained by summing the products of each atomic attention weight and the corresponding many-body function. We show that MBNN-att performs well on all QM9 properties, with errors on all properties below chemical accuracy, and, in particular, achieves the top performance for the energy-related extensive properties. By systematically comparing with other explicit-function-type descriptor ML models and the graph representation ML models, we demonstrate that the many-body-function framework and atomic attention mechanism are key ingredients for the high performance and the good transferability of MBNN-att in molecular property prediction.
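The readout described in this abstract (split each atomic output vector in two, then sum the products of attention weights and many-body functions) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the layout of the vector (weights in the first half, many-body functions in the second), and the absence of any normalization of the weights are hypothetical, since the abstract does not specify them.

```python
def mbnn_att_readout(atom_outputs, n_heads):
    """Sum over atoms and heads of (attention weight * many-body function).

    atom_outputs: one list per atom, each of length 2 * n_heads.
    Assumed layout: first half = attention weights, second half = many-body functions.
    """
    total = 0.0
    for vec in atom_outputs:
        if len(vec) != 2 * n_heads:
            raise ValueError("each atomic vector must have length 2 * n_heads")
        weights, functions = vec[:n_heads], vec[n_heads:]
        total += sum(w * f for w, f in zip(weights, functions))
    return total


# Two atoms, two heads, made-up numbers:
# atom 1 contributes 0.5*2.0 + 0.5*4.0, atom 2 contributes 1.0*3.0 + 0.0*9.0.
prediction = mbnn_att_readout(
    [[0.5, 0.5, 2.0, 4.0], [1.0, 0.0, 3.0, 9.0]], n_heads=2
)
```

Because the result is a plain sum over atoms, it grows with system size, which suits extensive properties; how the model handles intensive properties is not described in the abstract and is not modeled here.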
Affiliation(s)
- Zheng-Xin Yang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
| | - Xin-Tian Xie
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
| | - Pei-Lin Kang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
| | - Zhen-Xiong Wang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
| | - Cheng Shang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Shanghai Qi Zhi Institution, Shanghai 200030, China
| | - Zhi-Pan Liu
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Key Laboratory of Synthetic and Self-Assembly Chemistry for Organic Functional Molecules, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China
- Shanghai Qi Zhi Institution, Shanghai 200030, China
|
15
|
Chen G, Jaffrelot Inizan T, Plé T, Lagardère L, Piquemal JP, Maday Y. Advancing Force Fields Parameterization: A Directed Graph Attention Networks Approach. J Chem Theory Comput 2024; 20:5558-5569. [PMID: 38875012 DOI: 10.1021/acs.jctc.3c01421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024]
Abstract
Force fields (FFs) are an established tool for simulating large and complex molecular systems. However, parametrizing FFs is a challenging and time-consuming task that relies on empirical heuristics, experimental data, and computational data. Recent efforts aim to automate the assignment of FF parameters using pre-existing databases and on-the-fly ab initio data. In this study, we propose a graph-based force field (GB-FFs) model to directly derive parameters for the Generalized Amber Force Field (GAFF) from chemical environments and to investigate the influence of functional forms. Our end-to-end parametrization approach predicts parameters by aggregating the basic information in directed molecular graphs, eliminating the need for expert-defined procedures and enhancing the accuracy and transferability of GAFF across a broader range of molecular complexes. Simulation results are compared to the original GAFF parametrization. In practice, our results demonstrate improved transferability of the model, with improved accuracy in modeling intermolecular and torsional interactions as well as improved solvation free energies. The optimization approach developed in this work is fully applicable to other nonpolarizable FFs as well as to polarizable ones.
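For context, GAFF is built on the standard AMBER-style energy expression, so the parameters being predicted are the force constants, equilibrium values, torsional terms, van der Waals coefficients, and charges appearing in it. The generic textbook form is shown below as background only; the specific functional-form variants examined by the authors are not given in the abstract.

```latex
E = \sum_{\text{bonds}} K_r \,(r - r_{eq})^2
  + \sum_{\text{angles}} K_\theta \,(\theta - \theta_{eq})^2
  + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right]
  + \sum_{i<j} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}}\right]
```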
Affiliation(s)
- Gong Chen
- Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
| | - Théo Jaffrelot Inizan
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
| | - Thomas Plé
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
| | - Louis Lagardère
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
| | - Jean-Philip Piquemal
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
| | - Yvon Maday
- Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
|
16
|
Kim J, Chang W, Ji H, Joung I. Quantum-Informed Molecular Representation Learning Enhancing ADMET Property Prediction. J Chem Inf Model 2024; 64:5028-5040. [PMID: 38916580 DOI: 10.1021/acs.jcim.4c00772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
We examined pretraining tasks leveraging abundant labeled data to effectively enhance molecular representation learning in downstream tasks, specifically emphasizing graph transformers to improve the prediction of ADMET properties. Our investigation revealed limitations in previous pretraining tasks and identified more meaningful training targets, ranging from 2D molecular descriptors to extensive quantum chemistry simulations. These data were seamlessly integrated into supervised pretraining tasks. The implementation of our pretraining strategy and multitask learning outperforms conventional methods, achieving state-of-the-art outcomes in 7 out of 22 ADMET tasks within the Therapeutics Data Commons by utilizing a shared encoder across all tasks. Our approach underscores the effectiveness of learning molecular representations and highlights the potential for scalability when leveraging extensive data sets, marking a significant advancement in this domain.
Affiliation(s)
- Jungwoo Kim
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
| | - Woojae Chang
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
| | - Hyunjun Ji
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
| | - InSuk Joung
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
|
17
|
Truong-Quoc C, Lee JY, Kim KS, Kim DN. Prediction of DNA origami shape using graph neural network. Nat Mater 2024; 23:984-992. [PMID: 38486095 DOI: 10.1038/s41563-024-01846-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 02/22/2024] [Indexed: 07/10/2024]
Abstract
Unlike proteins, which have a wealth of validated structural data, experimentally or computationally validated DNA origami datasets are limited. Here we present a graph neural network that can predict the three-dimensional conformation of DNA origami assemblies both rapidly and accurately. We develop a hybrid data-driven and physics-informed approach for model training, designed to minimize not only the data-driven loss but also the physics-informed loss. By employing an ensemble strategy, the model can successfully infer the shape of monomeric DNA origami structures almost in real time. Further refinement of the model in an unsupervised manner enables the analysis of supramolecular assemblies consisting of tens to hundreds of DNA blocks. The proposed model enables an automated inverse design of DNA origami structures for given target shapes. Our approach facilitates the real-time virtual prototyping of DNA origami, broadening its design space.
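The hybrid objective described in this abstract combines a data-driven loss with a physics-informed loss. A minimal sketch of such a combined objective follows; the use of mean squared error for the data term, a single scalar physics penalty, and the `physics_weight` parameter are all assumptions made for illustration, as the abstract does not state how the two terms are defined or balanced.

```python
def hybrid_loss(predicted, reference, physics_penalty, physics_weight=1.0):
    """Data-driven mean squared error plus a weighted physics-informed penalty.

    predicted, reference: equal-length lists of coordinates or other targets.
    physics_penalty: scalar penalty computed elsewhere (hypothetical, e.g. an
    energy-like quantity evaluated on the predicted structure).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    data_loss = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
    return data_loss + physics_weight * physics_penalty


# Made-up numbers: data term = (0 + 4) / 2 = 2.0, physics term = 2.0 * 0.5 = 1.0.
loss = hybrid_loss([1.0, 2.0], [1.0, 4.0], physics_penalty=0.5, physics_weight=2.0)
```

With the data term removed, the same function reduces to a purely physics-driven objective, which is one plausible way to read the unsupervised refinement mentioned in the abstract.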
Affiliation(s)
- Chien Truong-Quoc
- Department of Mechanical Engineering, Seoul National University, Seoul, Korea
| | - Jae Young Lee
- Institute of Advanced Machines and Design, Seoul National University, Seoul, Korea
| | - Kyung Soo Kim
- Department of Mechanical Engineering, Seoul National University, Seoul, Korea
| | - Do-Nyun Kim
- Department of Mechanical Engineering, Seoul National University, Seoul, Korea
- Institute of Advanced Machines and Design, Seoul National University, Seoul, Korea
- Institute of Engineering Research, Seoul National University, Seoul, Korea
|
18
|
Zhang Z, He X, Long D, Luo G, Chen S. Enhancing generalizability and performance in drug-target interaction identification by integrating pharmacophore and pre-trained models. Bioinformatics 2024; 40:i539-i547. [PMID: 38940179 PMCID: PMC11211825 DOI: 10.1093/bioinformatics/btae240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: In drug discovery, it is crucial to assess the drug-target binding affinity (DTA). Although molecular docking is widely used, computational efficiency limits its application in large-scale virtual screening. Deep learning-based methods learn virtual scoring functions from labeled datasets and can quickly predict affinity. However, there are three limitations. First, existing methods only consider the atom-bond graph or one-dimensional sequence representations of compounds, ignoring the information about functional groups (pharmacophores) with specific biological activities. Second, relying on limited labeled datasets fails to learn comprehensive embedding representations of compounds and proteins, resulting in poor generalization performance in complex scenarios. Third, existing feature fusion methods cannot adequately capture contextual interaction information.
RESULTS: Therefore, we propose a novel DTA prediction method named HeteroDTA. Specifically, a multi-view compound feature extraction module is constructed to model the atom-bond graph and pharmacophore graph. The residue concat graph and protein sequence are also utilized to model protein structure and function. Moreover, to enhance the generalization capability and reduce the dependence on task-specific labeled data, pre-trained models are utilized to initialize the atomic features of the compounds and the embedding representations of the protein sequence. A context-aware nonlinear feature fusion method is also proposed to learn interaction patterns between compounds and proteins. Experimental results on public benchmark datasets show that HeteroDTA significantly outperforms existing methods. In addition, HeteroDTA shows excellent generalization performance in cold-start experiments and superiority in the representation learning ability of drug-target pairs. Finally, the effectiveness of HeteroDTA is demonstrated in a real-world drug discovery study.
AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/daydayupzzl/HeteroDTA.
Affiliation(s)
- Zuolong Zhang
- School of Software, Henan University, Kaifeng, Henan Province 475000, China
| | - Xin He
- School of Software, Henan University, Kaifeng, Henan Province 475000, China
- Henan International Joint Laboratory of Intelligent Network Theory and Key Technology, Henan University, Kaifeng, Henan Province 475000, China
| | - Dazhi Long
- Department of Urology, Ji’an Third People’s Hospital, Ji’an, Jiangxi Province 343000, China
| | - Gang Luo
- School of Mathematics and Computer Science, Nanchang University, Nanchang, Jiangxi Province 330031, China
| | - Shengbo Chen
- Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, Henan Province 475000, China
|
19
|
Sela M, Church JR, Schapiro I, Schneidman-Duhovny D. RhoMax: Computational Prediction of Rhodopsin Absorption Maxima Using Geometric Deep Learning. J Chem Inf Model 2024; 64:4630-4639. [PMID: 38829021 PMCID: PMC11200256 DOI: 10.1021/acs.jcim.4c00467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/05/2024]
Abstract
Microbial rhodopsins (MRs) are a diverse and abundant family of photoactive membrane proteins that serve as model systems for biophysical techniques. Optogenetics utilizes genetic engineering to insert specialized proteins into specific neurons or brain regions, allowing for manipulation of their activity through light and enabling the mapping and control of specific brain areas in living organisms. The obstacle of optogenetics lies in the fact that light has a limited ability to penetrate biological tissues, particularly blue light in the visible spectrum. Despite this challenge, most optogenetic systems rely on blue light due to the scarcity of red-shifted opsins. Finding additional red-shifted rhodopsins would represent a major breakthrough in overcoming the challenge of limited light penetration in optogenetics. However, determining the wavelength absorption maxima for rhodopsins based on their protein sequence is a significant hurdle. Current experimental methods are time-consuming, while computational methods lack accuracy. The paper introduces a new computational approach called RhoMax that utilizes structure-based geometric deep learning to predict the absorption wavelength of rhodopsins solely based on their sequences. The method takes advantage of AlphaFold2 for accurate modeling of rhodopsin structures. Once trained on a balanced train set, RhoMax rapidly and precisely predicted the maximum absorption wavelength of more than half of the sequences in our test set with an accuracy of 0.03 eV. By leveraging computational methods for absorption maxima determination, we can drastically reduce the time needed for designing new red-shifted microbial rhodopsins, thereby facilitating advances in the field of optogenetics.
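The abstract reports accuracy in electronvolts while absorption maxima are usually quoted as wavelengths. The two are related by E = hc / λ, with hc ≈ 1239.84 eV·nm. The sketch below converts between them and estimates, to first order, what a 0.03 eV error means in nanometers; the 500 nm wavelength and the function names are illustrative assumptions, not values from the paper.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def wavelength_to_energy_ev(wavelength_nm):
    """Photon energy in eV for a wavelength in nm (E = hc / lambda)."""
    return HC_EV_NM / wavelength_nm


def energy_error_to_wavelength_error_nm(wavelength_nm, energy_error_ev):
    """First-order wavelength uncertainty: d(lambda) = lambda^2 * dE / (hc)."""
    return wavelength_nm ** 2 * energy_error_ev / HC_EV_NM


# Illustrative case: a 0.03 eV error for an absorption maximum near 500 nm.
energy_ev = wavelength_to_energy_ev(500.0)
shift_nm = energy_error_to_wavelength_error_nm(500.0, 0.03)
```

Because the wavelength error scales with λ², the same 0.03 eV tolerance corresponds to a wider window in nanometers for red-shifted rhodopsins than for blue-absorbing ones.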
Affiliation(s)
- Meitar Sela
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Jonathan R. Church
- Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Igor Schapiro
- Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
|
20
|
Li T, Huls NJ, Lu S, Hou P. Unsupervised manifold embedding to encode molecular quantum information for supervised learning of chemical data. Commun Chem 2024; 7:133. [PMID: 38862828 PMCID: PMC11166954 DOI: 10.1038/s42004-024-01217-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 06/03/2024] [Indexed: 06/13/2024] Open
Abstract
Molecular representation is critical in chemical machine learning. It governs the complexity of model development and the fulfillment of training data to avoid either over- or under-fitting. As electronic structures and associated attributes are the root cause for molecular interactions and their manifested properties, we have sought to examine the local electron information on a molecular manifold to understand and predict molecular interactions. Our efforts led to the development of a lower-dimensional representation of a molecular manifold, Manifold Embedding of Molecular Surface (MEMS), to embody surface electronic quantities. By treating a molecular surface as a manifold and computing its embeddings, the embedded electronic attributes retain the chemical intuition of molecular interactions. MEMS can be further featurized as input for chemical learning. Our solubility prediction with MEMS demonstrated the feasibility of both shallow and deep learning by neural networks, suggesting that MEMS is expressive and robust against dimensionality reduction.
Affiliation(s)
- Tonglei Li
- Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, IN 47907, USA
| | - Nicholas J Huls
- Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, IN 47907, USA
| | - Shan Lu
- Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, IN 47907, USA
| | - Peng Hou
- Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, IN 47907, USA
|
21
|
Zhang R, Yuan R, Tian B. PointGAT: A Quantum Chemical Property Prediction Model Integrating Graph Attention and 3D Geometry. J Chem Theory Comput 2024; 20:4115-4128. [PMID: 38727259 DOI: 10.1021/acs.jctc.3c01420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Predicting quantum chemical properties is a fundamental challenge for computational chemistry. While the development of graph neural networks has advanced molecular representation learning and property prediction, their performance could be further enhanced by incorporating three-dimensional (3D) structural geometry into two-dimensional (2D) molecular graph representation. In this study, we introduce the PointGAT model for quantum molecular property prediction, which integrates 3D molecular coordinates with graph-attention modeling. Comparison with other current models in molecular prediction tasks showed that PointGAT could provide higher predictive accuracy in various benchmark data sets from MoleculeNet, including ESOL, FreeSolv, Lipop, HIV, and 6 out of 12 tasks of the QM9 data set. To further examine PointGAT prediction of quantum mechanical (QM) energies, we constructed a C10 data set comprising 11,841 charged and chiral carbocation intermediates with QM energies calculated at the DM21/6-31G*//B3LYP/6-31G* levels. Notably, PointGAT achieved an R² value of 0.950 and an MAE of 1.616 kcal/mol, outperforming even the best-performing graph neural network model with a reduction of 0.216 kcal/mol in MAE and an improvement of 0.050 in R². Additional ablation studies indicated that incorporating molecular geometry into the model resulted in markedly higher predictive accuracy, reducing the MAE value from 1.802 to 1.616 kcal/mol. Moreover, visualization of PointGAT atomic attention weights suggested its predictions were interpretable. Findings in this study support the application of PointGAT as a powerful and versatile tool for quantum chemical property prediction that can facilitate high-accuracy modeling for fundamental exploration of chemical space as well as drug design and molecular engineering.
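The two metrics quoted in this abstract, mean absolute error (MAE) and the coefficient of determination, can be computed as in the sketch below. The toy arrays are made up for illustration; only the 1.802 and 1.616 kcal/mol figures in the last line come from the abstract.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up toy data, not from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)

# MAE reduction reported for the geometry ablation, in kcal/mol.
ablation_gain = 1.802 - 1.616
```

The ablation difference works out to about 0.186 kcal/mol, roughly a 10% relative reduction in MAE from adding molecular geometry.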
Affiliation(s)
- Rong Zhang
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
| | - Rongqing Yuan
- Department of Chemistry, Tsinghua University, Beijing 100084, China
| | - Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
|
22
|
Tang Q, Ratnayake R, Seabra G, Jiang Z, Fang R, Cui L, Ding Y, Kahveci T, Bian J, Li C, Luesch H, Li Y. Morphological profiling for drug discovery in the era of deep learning. Brief Bioinform 2024; 25:bbae284. [PMID: 38886164 PMCID: PMC11182685 DOI: 10.1093/bib/bbae284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 05/13/2024] [Accepted: 06/03/2024] [Indexed: 06/20/2024] Open
Abstract
Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial improvements in analyzing large-scale high-content images at high throughput. These efforts have facilitated understanding of compound mechanism of action, drug repurposing, and characterization of cell morphodynamics under perturbation, ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of the recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place a particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we illuminate the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.
Affiliation(s)
- Qiaosi Tang
- Calico Life Sciences, South San Francisco, CA 94080, United States
- Ranjala Ratnayake
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Gustavo Seabra
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Zhe Jiang
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, United States
- Ruogu Fang
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, United States
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL 32611, United States
- Lina Cui
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Yousong Ding
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Tamer Kahveci
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, United States
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Chenglong Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Hendrik Luesch
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Yanjun Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, United States
23
Shen A, Yuan M, Ma Y, Du J, Wang M. Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction. Brief Bioinform 2024; 25:bbae256. [PMID: 38801702 PMCID: PMC11129775 DOI: 10.1093/bib/bbae256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 04/25/2024] [Accepted: 05/15/2024] [Indexed: 05/29/2024] Open
Abstract
Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on one modality of molecular data, and the complementary information of two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained by a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between these two modalities. Experimental results show that our framework achieves state-of-the-art performance in a series of molecular property prediction tasks, and a detailed ablation study demonstrates efficacy of the multi-modality framework and the masking strategy.
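The non-overlapping masking idea mentioned in this abstract can be illustrated with a small Python sketch. It assumes, purely for illustration, that SMILES tokens and graph nodes are aligned atom by atom so that one index refers to the same atom in both modalities; the function name, the mask ratio, and the sampling scheme are hypothetical and are not the authors' implementation.

```python
import random


def non_overlapping_masks(n_atoms, mask_ratio, seed=0):
    """Pick two disjoint sets of atom indices to mask, one per modality.

    Assumption: index i denotes the same atom in the SMILES token
    sequence and in the molecular graph.
    """
    n_mask = int(n_atoms * mask_ratio)
    if 2 * n_mask > n_atoms:
        raise ValueError("mask_ratio too large for disjoint masks")
    order = list(range(n_atoms))
    random.Random(seed).shuffle(order)
    smiles_masked = set(order[:n_mask])            # hidden in the SMILES view
    graph_masked = set(order[n_mask:2 * n_mask])   # hidden in the graph view
    return smiles_masked, graph_masked


smiles_mask, graph_mask = non_overlapping_masks(n_atoms=10, mask_ratio=0.3)
```

Because the two sets never share an index, every atom hidden in one modality remains visible in the other, so reconstructing it plausibly requires information to flow between the SMILES and graph views.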
Affiliation(s)
- Ao Shen
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Mingzhi Yuan
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Yingfan Ma
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Jie Du
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Manning Wang
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
24
Xiang W, Zhong F, Ni L, Zheng M, Li X, Shi Q, Wang D. Gram matrix: an efficient representation of molecular conformation and learning objective for molecular pretraining. Brief Bioinform 2024; 25:bbae340. [PMID: 38990515 PMCID: PMC11238115 DOI: 10.1093/bib/bbae340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 06/05/2024] [Accepted: 06/28/2024] [Indexed: 07/12/2024] Open
Abstract
Accurate prediction of molecular properties is fundamental in drug discovery and development, providing crucial guidance for effective drug design. A critical factor in achieving accurate molecular property prediction lies in the appropriate representation of molecular structures. Presently, prevalent deep learning-based molecular representations rely on 2D structure information as the primary molecular representation, often overlooking essential three-dimensional (3D) conformational information due to the inherent limitations of 2D structures in conveying atomic spatial relationships. In this study, we propose employing the Gram matrix as a condensed representation of 3D molecular structures and for efficient pretraining objectives. Subsequently, we leverage this matrix to construct a novel molecular representation model, Pre-GTM, which inherently encapsulates 3D information. The model accurately predicts the 3D structure of a molecule by estimating the Gram matrix. Our findings demonstrate that the Pre-GTM model outperforms the baseline Graphormer model and other pretrained models in the QM9 and MoleculeNet quantitative property prediction tasks. The integration of the Gram matrix as a condensed representation of 3D molecular structure, incorporated into the Pre-GTM model, opens up promising avenues for its potential application across various domains of molecular research, including drug design, materials science, and chemical engineering.
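To make concrete why a Gram matrix can stand in for a 3D conformation, the Python sketch below builds the matrix of inner products of centered atomic coordinates, checks that it does not change under rotation, and recovers an interatomic distance from it. The three-atom geometry and all function names are hypothetical; this illustrates the general linear-algebra idea and is not the Pre-GTM implementation.

```python
import math


def center(coords):
    """Translate coordinates so that the centroid sits at the origin."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - centroid[k] for k in range(3)] for p in coords]


def gram_matrix(coords):
    """G[i][j] is the dot product of centered positions i and j."""
    x = center(coords)
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in x] for xi in x]


def distance_from_gram(g, i, j):
    """Recover |r_i - r_j| via d^2 = G_ii + G_jj - 2 G_ij."""
    return math.sqrt(max(g[i][i] + g[j][j] - 2.0 * g[i][j], 0.0))


def rotate_z(coords, angle):
    """Rotate all points about the z axis by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in coords]


# Hypothetical three-atom geometry in angstroms, illustration only.
atoms = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
g = gram_matrix(atoms)
g_rot = gram_matrix(rotate_z(atoms, 0.7))
```

Centering removes the dependence on translation and the inner products remove the dependence on rotation, so the matrix encodes the conformation itself rather than its placement in space.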
Affiliation(s)
- Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
- Lin Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Qian Shi
- Lingang Laboratory, Shanghai 200031, China
25
Wang D, Wang Y, Evans L, Tiwary P. From Latent Dynamics to Meaningful Representations. J Chem Theory Comput 2024; 20:3503-3513. [PMID: 38649368 DOI: 10.1021/acs.jctc.4c00249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
While representation learning has been central to the rise of machine learning and artificial intelligence, a key problem remains in making the learned representations meaningful. For this, the typical approach is to regularize the learned representation through prior probability distributions. However, such priors are usually unavailable or are ad hoc. To deal with this, recent efforts have shifted toward leveraging the insights from physical principles to guide the learning process. In this spirit, we propose a purely dynamics-constrained representation learning framework. Instead of relying on predefined probabilities, we restrict the latent representation to follow overdamped Langevin dynamics with a learnable transition density, a prior driven by statistical mechanics. We show that this is a more natural constraint for representation learning in stochastic dynamical systems, with the crucial ability to uniquely identify the ground truth representation. We validate our framework for different systems including a real-world fluorescent DNA movie data set. We show that our algorithm can uniquely identify orthogonal, isometric, and meaningful latent representations.
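For readers unfamiliar with overdamped Langevin dynamics, the Python sketch below integrates the textbook equation dx = -(1/gamma) U'(x) dt + sqrt(2 kT / gamma) dW with the Euler-Maruyama scheme in one dimension. The harmonic potential, parameter values, and function names are hypothetical; in the framework described above the transition density is learned, whereas this sketch fixes the potential by hand to show only the form of the dynamics.

```python
import math
import random


def langevin_step(x, grad_u, dt, gamma, kT, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics."""
    drift = -grad_u(x) / gamma * dt
    noise = math.sqrt(2.0 * kT * dt / gamma) * rng.gauss(0.0, 1.0)
    return x + drift + noise


def simulate(x0, grad_u, n_steps, dt, gamma=1.0, kT=1.0, seed=0):
    """Return the trajectory [x_0, x_1, ..., x_n_steps]."""
    rng = random.Random(seed)
    traj = [x0]
    for _ in range(n_steps):
        traj.append(langevin_step(traj[-1], grad_u, dt, gamma, kT, rng))
    return traj


def harmonic_grad(x):
    """Gradient of the hypothetical potential U(x) = x**2 / 2."""
    return x


# With kT = 0 the noise term vanishes and x relaxes to the minimum at 0.
cold = simulate(2.0, harmonic_grad, n_steps=200, dt=0.05, kT=0.0)
# With kT > 0 the trajectory fluctuates around the minimum.
warm = simulate(2.0, harmonic_grad, n_steps=200, dt=0.05, kT=1.0)
```

Constraining a latent variable to evolve this way ties its geometry to a drift and a diffusion term, which is the kind of physically motivated prior the abstract contrasts with ad hoc probability distributions.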
Affiliation(s)
- Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Yihang Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Luke Evans
- Department of Mathematics, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
26
Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024; 174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
The prediction of drug-target binding affinity (DTA) plays an important role in drug discovery. Computerized virtual screening techniques have been used for DTA prediction, greatly reducing the time and economic costs of drug discovery. However, these techniques have not succeeded in reversing the low success rate of new drug development. In recent years, the continuous development of deep learning (DL) technology has brought new opportunities for drug discovery through DTA prediction. This shift has moved the prediction of DTA from traditional machine learning methods to DL. The DL frameworks used for DTA prediction include convolutional neural networks (CNN), graph convolutional neural networks (GCN), recurrent neural networks (RNN), and reinforcement learning (RL), among others. This review article summarizes the available literature on DTA prediction using DL models, including DTA quantification metrics and datasets, and DL algorithms used for DTA prediction (including input representation of models, neural network frameworks, evaluation indicators, and model interpretability). In addition, the opportunities, challenges, and prospects of the application of DL frameworks for DTA prediction in the field of drug discovery are discussed.
Affiliation(s)
- Hao Zhang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Xiaoqian Liu
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Wenya Cheng
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Tianshi Wang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Yuanyuan Chen
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
27
Yang Z, Huang T, Pan L, Wang J, Wang L, Ding J, Xiao J. QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning. J Cheminform 2024; 16:48. [PMID: 38685101 PMCID: PMC11059686 DOI: 10.1186/s13321-024-00843-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 04/24/2024] [Indexed: 05/02/2024] Open
Abstract
Previous studies have shown that the three-dimensional (3D) geometric and electronic structures of molecules play a crucial role in determining their key properties and intermolecular interactions. Therefore, it is necessary to establish a quantum chemical (QC) property database containing the most stable 3D geometric conformations and electronic structures of molecules. In this study, a high-quality QC property database, called QuanDB, was developed, which included structurally diverse molecular entities and featured a user-friendly interface. Currently, QuanDB contains 154,610 compounds sourced from public databases and scientific literature, with 10,125 scaffolds. The elemental composition comprises nine elements: H, C, O, N, P, S, F, Cl, and Br. For each molecule, QuanDB provides 53 global and 5 local QC properties and the most stable 3D conformation. These properties are divided into three categories: geometric structure, electronic structure, and thermodynamics. Geometric structure optimization and single point energy calculation at the theoretical level of B3LYP-D3(BJ)/6-311G(d)/SMD/water and B3LYP-D3(BJ)/def2-TZVP/SMD/water, respectively, were applied to ensure highly accurate calculations of QC properties, with the computational cost exceeding 10⁷ core-hours. QuanDB provides high-value geometric and electronic structure information for use in molecular representation models, which are critical for machine-learning-based molecular design, thereby contributing to a comprehensive description of the chemical compound space. As a new high-quality dataset for QC properties, QuanDB is expected to become a benchmark tool for the training and optimization of machine learning models, thus further advancing the development of novel drugs and materials. QuanDB is freely available, without registration, at https://quandb.cmdrg.com/ .
Affiliation(s)
- Zhijiang Yang
- State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Tengxin Huang
- State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Li Pan
- State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Jingjing Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Junhua Xiao
- State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
28
Song L, Zhu H, Wang K, Li M. LGGA-MPP: Local Geometry-Guided Graph Attention for Molecular Property Prediction. J Chem Inf Model 2024; 64:3105-3113. [PMID: 38516950 DOI: 10.1021/acs.jcim.3c02058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Molecular property prediction is a fundamental task of drug discovery. With the rapid development of deep learning, computational approaches for predicting molecular properties are experiencing increasing popularity. However, these existing methods often ignore the 3D information of molecules, which is critical in molecular representation learning. In the past few years, several self-supervised learning (SSL) approaches have been proposed to exploit the geometric information by using pre-training on 3D molecular graphs and fine-tuning on 2D molecular graphs. Most of these approaches are based on the global geometry of molecules, and there is still a challenge in capturing the local structure and local interpretability. To this end, we propose local geometry-guided graph attention (LGGA), which integrates local geometry into the attention mechanism and message-passing of graph neural networks (GNNs). LGGA introduces a novel method to model molecules, enhancing the model's ability to capture intricate local structural details. Experiments on various data sets demonstrate that integrating local geometry contributes significantly to the improved results, and our model outperforms the state-of-the-art methods for molecular property prediction, establishing its potential as a promising tool in drug discovery and related fields.
Affiliation(s)
- Lei Song
- School of Software, XinJiang University, Urumqi 830091, China
- Huimin Zhu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Kaili Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
29
Gallegos M, Isamura BK, Popelier PLA, Martín Pendás Á. An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors. J Chem Inf Model 2024; 64:3059-3079. [PMID: 38498942 PMCID: PMC11040729 DOI: 10.1021/acs.jcim.3c01906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/06/2024] [Accepted: 03/07/2024] [Indexed: 03/20/2024]
Abstract
Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. 
Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. Finally, this work opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.
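The radial Atom Centered Symmetry Function referred to in this abstract has a standard closed form: a sum over neighbors of a Gaussian in the interatomic distance multiplied by a smooth cosine cutoff. The Python sketch below evaluates that form. The helper `params_from_cluster`, which turns the mean and standard deviation of one mixture component into the Gaussian center and width, is a hypothetical mapping added for illustration and is not necessarily the rule used by the authors; the neighbor distances are made up.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff: 1 at r = 0, decaying smoothly to 0 at r = r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_acsf(distances, eta, r_s, r_c):
    """Radial ACSF: sum_j exp(-eta * (r_ij - r_s)**2) * f_c(r_ij)."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)
               for r in distances)


def params_from_cluster(mean, std):
    """Hypothetical mapping from one cluster to (eta, r_s).

    Assumption: center the Gaussian on the cluster mean and match its
    width to the cluster variance, eta = 1 / (2 * std**2).
    """
    return 1.0 / (2.0 * std ** 2), mean


# Made-up neighbor distances (angstroms) around one central atom.
neighbor_distances = [1.0, 1.1, 2.4]
eta, r_s = params_from_cluster(mean=1.05, std=0.1)
value = radial_acsf(neighbor_distances, eta, r_s, r_c=6.0)
```

Each (eta, r_s) pair contributes one entry to the fixed-size descriptor of an atom, so choosing these pairs from clusters found in the data replaces the manual parameter selection the abstract describes as tedious.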
Affiliation(s)
- Miguel Gallegos
- Department of Analytical and Physical Chemistry, University of Oviedo, Oviedo E-33006, Spain
- Paul L. A. Popelier
- Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
- Ángel Martín Pendás
- Department of Analytical and Physical Chemistry, University of Oviedo, Oviedo E-33006, Spain
30
Yao S, Song J, Jia L, Cheng L, Zhong Z, Song M, Feng Z. Fast and effective molecular property prediction with transferability map. Commun Chem 2024; 7:85. [PMID: 38632308 PMCID: PMC11024153 DOI: 10.1038/s42004-024-01169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 04/05/2024] [Indexed: 04/19/2024] Open
Abstract
Effective transfer learning for molecular property prediction has shown considerable strength in addressing insufficient labeled molecules. Many existing methods either disregard the quantitative relationship between source and target properties, risking negative transfer, or require intensive training on target tasks. To quantify transferability concerning task-relatedness, we propose Principal Gradient-based Measurement (PGM) for transferring molecular property prediction ability. First, we design an optimization-free scheme to calculate a principal gradient for approximating the direction of model optimization on a molecular property prediction dataset. We have analyzed the close connection between the principal gradient and model optimization through mathematical proof. PGM measures the transferability as the distance between the principal gradient obtained from the source dataset and that derived from the target dataset. Then, we perform PGM on various molecular property prediction datasets to build a quantitative transferability map for source dataset selection. Finally, we evaluate PGM on multiple combinations of transfer learning tasks across 12 benchmark molecular property prediction datasets and demonstrate that it can serve as fast and effective guidance to improve the performance of a target task. This work contributes to more efficient discovery of drugs, materials, and catalysts by offering a task-relatedness quantification prior to transfer learning and understanding the relationship between chemical properties.
Affiliation(s)
- Shaolun Yao
- Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, Zhejiang University, 310027, Hangzhou, China
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Jie Song
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- School of Software Technology, Zhejiang University, 315048, Ningbo, China
- Lingxiang Jia
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Lechao Cheng
- School of Computer Science and Information Engineering, Hefei University of Technology, 230009, Hefei, China
- Zipeng Zhong
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Mingli Song
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Zunlei Feng
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- School of Software Technology, Zhejiang University, 315048, Ningbo, China
31
Chen J, Schwaller P. Molecular hypergraph neural networks. J Chem Phys 2024; 160:144307. [PMID: 38597317 DOI: 10.1063/5.0193557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 03/14/2024] [Indexed: 04/11/2024] Open
Abstract
Graph neural networks (GNNs) have demonstrated promising performance across various chemistry-related tasks. However, conventional graphs only model the pairwise connectivity in molecules, failing to adequately represent higher order connections, such as multi-center bonds and conjugated structures. To tackle this challenge, we introduce molecular hypergraphs and propose Molecular Hypergraph Neural Networks (MHNNs) to predict the optoelectronic properties of organic semiconductors, where hyperedges represent conjugated structures. A general algorithm is designed for irregular high-order connections, which can efficiently operate on molecular hypergraphs with hyperedges of various orders. The results show that MHNN outperforms all baseline models on most tasks of organic photovoltaic, OCELOT chromophore v1, and PCQM4Mv2 datasets. Notably, MHNN achieves this without any 3D geometric information, surpassing the baseline model that utilizes atom positions. Moreover, MHNN achieves better performance than pretrained GNNs under limited training data, underscoring its excellent data efficiency. This work provides a new strategy for more general molecular representations and property prediction tasks related to high-order connections.
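The difference between pairwise edges and hyperedges can be seen in a small Python sketch. It represents a four-atom conjugated chain with three ordinary bonds plus one hyperedge spanning all four atoms, then runs a single round of mean aggregation from nodes to hyperedges and back. Treating bonds as order-2 hyperedges, using scalar features, and using plain means are simplifying assumptions made here for illustration; none of this is the MHNN architecture itself.

```python
def hyperedge_messages(features, hyperedges):
    """Mean of the member-node features for every hyperedge."""
    return [sum(features[i] for i in edge) / len(edge) for edge in hyperedges]


def update_nodes(features, hyperedges):
    """Each node takes the mean message of the hyperedges containing it."""
    messages = hyperedge_messages(features, hyperedges)
    updated = []
    for node in range(len(features)):
        incident = [m for m, edge in zip(messages, hyperedges) if node in edge]
        updated.append(sum(incident) / len(incident) if incident
                       else features[node])
    return updated


# Hypothetical chain of four conjugated atoms: three bonds (order-2
# hyperedges) plus one hyperedge covering the whole conjugated system.
hyperedges = [(0, 1), (1, 2), (2, 3), (0, 1, 2, 3)]
features = [1.0, 0.0, 0.0, 0.0]  # a signal placed on atom 0 only

new_features = update_nodes(features, hyperedges)
```

After one round the signal from atom 0 already reaches atom 3 through the shared hyperedge, whereas with the three bonds alone it would need three rounds; this is one way to read the claim that hyperedges capture higher-order connections such as conjugation.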
Affiliation(s)
- Junwu Chen
- Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
32
Harnik Y, Milo A. A focus on molecular representation learning for the prediction of chemical properties. Chem Sci 2024; 15:5052-5055. [PMID: 38577350 PMCID: PMC10988574 DOI: 10.1039/d4sc90043j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024] Open
Abstract
Molecular representation learning (MRL) is a specialized field in which deep-learning models condense essential molecular information into a vectorized form. Whereas recent research has predominantly emphasized drug discovery and bioactivity applications, MRL holds significant potential for diverse chemical properties beyond these contexts. The recently published study by King-Smith introduces a novel application of molecular representation training and compellingly demonstrates its value in predicting molecular properties (E. King-Smith, Chem. Sci., 2024, https://doi.org/10.1039/D3SC04928K). In this focus article, we will briefly delve into MRL in chemistry and the significance of King-Smith's work within the dynamic landscape of this evolving field.
Affiliation(s)
- Yonatan Harnik
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
33
Varghese AJ, Bora A, Xu M, Karniadakis GE. TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers. Neural Netw 2024; 172:106086. [PMID: 38159511 DOI: 10.1016/j.neunet.2023.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 12/18/2023] [Accepted: 12/22/2023] [Indexed: 01/03/2024]
Abstract
Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (e.g., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from its current state (t) and previous context (over timestamps [t-1, t-l], where l is the length of the context). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp t. We consider diverse benchmarks with varying levels of "novelty" as measured by the TEA (Temporal Edge Appearance) plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.
Affiliation(s)
- Aniruddha Bora
- Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
- Mengjia Xu
- Department of Data Science, New Jersey Institute of Technology, Newark, NJ 07102, USA; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
- George Em Karniadakis
- School of Engineering, Brown University, Providence, RI 02912, USA; Division of Applied Mathematics, Brown University, Providence, RI 02912, USA; Pacific Northwest National Laboratory, Richland, WA 99354, USA
34
Li Y, Wang W, Liu J, Wu C. Pre-training molecular representation model with spatial geometry for property prediction. Comput Biol Chem 2024; 109:108023. [PMID: 38335852 DOI: 10.1016/j.compbiolchem.2024.108023] [Received: 12/14/2023] [Revised: 01/22/2024] [Accepted: 02/01/2024] [Indexed: 02/12/2024]
Abstract
AI-enhanced bioinformatics and cheminformatics pivots on generating increasingly descriptive and generalized molecular representation. Accurate prediction of molecular properties needs a comprehensive description of molecular geometry. We design a novel Graph Isomorphic Network (GIN) based model integrating a three-level network structure with a dual-level pre-training approach that aligns the characteristics of molecules. In our Spatial Molecular Pre-training (SMPT) Model, the network can learn implicit geometric information in layers from lower to higher according to the dimension. Extensive evaluations against established baseline models validate the enhanced efficacy of SMPT, with notable accomplishments in classification tasks. These results emphasize the importance of spatial geometric information in molecular representation modeling and demonstrate the potential of SMPT as a valuable tool for property prediction.
Affiliation(s)
- Yishui Li
- Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Deya Road, Changsha, 410073, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Deya Road, Changsha, 410073, China.
- Wei Wang
- National SuperComputer Center in Tianjin, TEDA Sixth Street, Tianjin, 300450, China
- Jie Liu
- Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Deya Road, Changsha, 410073, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Deya Road, Changsha, 410073, China
- Chengkun Wu
- Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Deya Road, Changsha, 410073, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Deya Road, Changsha, 410073, China.
35
Chen Y, Zhang L. Hi-GeoMVP: a hierarchical geometry-enhanced deep learning model for drug response prediction. Bioinformatics 2024; 40:btae204. [PMID: 38614131 PMCID: PMC11060866 DOI: 10.1093/bioinformatics/btae204] [Received: 08/03/2023] [Revised: 02/11/2024] [Accepted: 04/11/2024] [Indexed: 04/15/2024] Open
Abstract
MOTIVATION: Personalized cancer treatments require accurate drug response predictions. Existing deep learning methods show promise but higher accuracy is needed to serve the purpose of precision medicine. The prediction accuracy can be improved with not only topology but geometrical information of drugs.
RESULTS: A novel deep learning methodology for drug response prediction is presented, named Hi-GeoMVP. It synthesizes hierarchical drug representation with multi-omics data, leveraging graph neural networks and variational autoencoders for detailed drug and cell line representations. Multi-task learning is employed to make better prediction, while both 2D and 3D molecular representations capture comprehensive drug information. Testing on the GDSC dataset confirms Hi-GeoMVP's enhanced performance, surpassing prior state-of-the-art methods by improving the Pearson correlation coefficient from 0.934 to 0.941 and decreasing the root mean square error from 0.969 to 0.931. In the case of blind test, Hi-GeoMVP demonstrated robustness, outperforming the best previous models with a superior Pearson correlation coefficient in the drug-blind test. These results underscore Hi-GeoMVP's capabilities in drug response prediction, implying its potential for precision medicine.
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/matcyr/Hi-GeoMVP.
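The two metrics quoted in this abstract, the Pearson correlation coefficient and the root mean square error, can be illustrated with a minimal sketch. This is not the Hi-GeoMVP evaluation code; the function names and the toy values are assumptions made purely for illustration.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy drug-response values (hypothetical, not taken from GDSC).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]  # every prediction is offset by +0.5

r = pearson(y_true, y_pred)
err = rmse(y_true, y_pred)
```

In this toy case the predictions are perfectly correlated with the targets yet carry a constant error of 0.5, which is one reason the two metrics are usually reported together: correlation alone does not capture systematic offsets.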
Affiliation(s)
- Yurui Chen
- Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore
- Louxin Zhang
- Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore
36
Wu K, Yang X, Wang Z, Li N, Zhang J, Liu L. Data-balanced transformer for accelerated ionizable lipid nanoparticles screening in mRNA delivery. Brief Bioinform 2024; 25:bbae186. [PMID: 38670158 PMCID: PMC11052633 DOI: 10.1093/bib/bbae186] [Received: 01/15/2024] [Revised: 02/26/2024] [Accepted: 04/05/2024] [Indexed: 04/28/2024] Open
Abstract
Despite the widespread use of ionizable lipid nanoparticles (LNPs) in clinical applications for messenger RNA (mRNA) delivery, the mRNA drug delivery system faces an efficiency challenge in the screening of LNPs. Traditional screening methods often require a substantial amount of experimental time and incur high research and development costs. To accelerate the early development stage of LNPs, we propose TransLNP, a transformer-based transfection prediction model designed to aid in the selection of LNPs for mRNA drug delivery systems. TransLNP uses two types of molecular information to perceive the relationship between structure and transfection efficiency: coarse-grained atomic sequence information and fine-grained atomic spatial relationship information. Due to the scarcity of existing LNP experimental data, we find that pretraining the molecular model is crucial for better understanding the task of predicting LNP properties, which is achieved through reconstructing atomic 3D coordinates and masking atom predictions. In addition, the issue of data imbalance is particularly prominent in the real-world exploration of LNPs. We introduce the BalMol block to solve this problem by smoothing the distribution of labels and molecular features. Our approach outperforms state-of-the-art works in transfection property prediction under both random and scaffold data splitting. Additionally, we establish a relationship between molecular structural similarity and transfection differences, selecting 4267 pairs of molecular transfection cliffs, which are pairs of molecules that exhibit high structural similarity but significant differences in transfection efficiency, thereby revealing the primary source of prediction errors. The code, model and data are made publicly available at https://github.com/wklix/TransLNP.
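The notion of a transfection cliff (two molecules with high structural similarity but a large difference in transfection efficiency) can be sketched as follows. This is a hedged illustration and not the TransLNP procedure: the Tanimoto similarity on sets of fingerprint bits, the two thresholds, the function names, and the toy molecules are all assumptions, since the abstract does not state which similarity measure or cut-offs were used.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_cliffs(molecules, sim_threshold=0.8, diff_threshold=1.0):
    """Return pairs that are structurally similar but differ strongly in label.

    `molecules` is a list of (name, fingerprint_bit_set, label) tuples.
    Both thresholds are illustrative assumptions.
    """
    cliffs = []
    for (name_a, fp_a, y_a), (name_b, fp_b, y_b) in combinations(molecules, 2):
        sim = tanimoto(fp_a, fp_b)
        diff = abs(y_a - y_b)
        if sim >= sim_threshold and diff >= diff_threshold:
            cliffs.append((name_a, name_b, sim, diff))
    return cliffs


# Toy data: (name, set of "on" fingerprint bits, transfection label).
molecules = [
    ("lipid_A", {1, 2, 3, 4, 5}, 0.2),
    ("lipid_B", {1, 2, 3, 4, 5, 6}, 2.1),
    ("lipid_C", {7, 8, 9}, 2.0),
]
cliffs = find_cliffs(molecules)
```

Here only the pair (lipid_A, lipid_B) qualifies: the two share five of six bits but their labels differ by about 1.9. Pairs like this are hard for any structure-based predictor, which matches the abstract's remark that cliffs reveal the primary source of prediction errors.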
Affiliation(s)
- Kun Wu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiulong Yang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Na Li
- National Facility for Protein Science in Shanghai, Zhangjiang Laboratory, Shanghai Advanced Research Institute, Chinese Academy of Sciences
- Jialu Zhang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Lizhuang Liu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
37
Chang J, Ye JC. Bidirectional generation of structure and properties through a single molecular foundation model. Nat Commun 2024; 15:2323. [PMID: 38485914 PMCID: PMC10940637 DOI: 10.1038/s41467-024-46440-3] [Received: 02/12/2023] [Accepted: 02/27/2024] [Indexed: 03/18/2024] Open
Abstract
Recent successes of foundation models in artificial intelligence have prompted the emergence of large-scale chemical pre-trained models. Despite the growing interest in large molecular pre-trained models that provide informative representations for downstream tasks, attempts for multimodal pre-training approaches on the molecule domain were limited. To address this, here we present a multimodal molecular pre-trained model that incorporates the modalities of structure and biochemical properties, drawing inspiration from recent advances in multimodal learning techniques. Our proposed model pipeline of data handling and training objectives aligns the structure/property features in a common embedding space, which enables the model to regard bidirectional information between the molecules' structure and properties. These contributions emerge synergistic knowledge, allowing us to tackle both multimodal and unimodal downstream tasks through a single model. Through extensive experiments, we demonstrate that our model has the capabilities to solve various meaningful chemical challenges, including conditional molecule generation, property prediction, molecule classification, and reaction prediction.
Affiliation(s)
- Jinho Chang
- Graduate School of AI, KAIST, Daejeon, South Korea
- Jong Chul Ye
- Graduate School of AI, KAIST, Daejeon, South Korea.
38
Han J, Kwon Y, Choi YS, Kang S. Improving chemical reaction yield prediction using pre-trained graph neural networks. J Cheminform 2024; 16:25. [PMID: 38429787 PMCID: PMC10905905 DOI: 10.1186/s13321-024-00818-z] [Received: 10/21/2023] [Accepted: 02/19/2024] [Indexed: 03/03/2024] Open
Abstract
Graph neural networks (GNNs) have proven to be effective in the prediction of chemical reaction yields. However, their performance tends to deteriorate when they are trained using an insufficient training dataset in terms of quantity or diversity. A promising solution to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we investigate the effectiveness of GNN pre-training in chemical reaction yield prediction. We present a novel GNN pre-training method for performance improvement. Given a molecular database consisting of a large number of molecules, we calculate molecular descriptors for each molecule and reduce the dimensionality of these descriptors by applying principal component analysis. We define a pre-text task by assigning a vector of principal component scores as the pseudo-label to each molecule in the database. A GNN is then pre-trained to perform the pre-text task of predicting the pseudo-label for the input molecule. For chemical reaction yield prediction, a prediction model is initialized using the pre-trained GNN and then fine-tuned with the training dataset containing chemical reactions and their yields. We demonstrate the effectiveness of the proposed method through experimental evaluation on benchmark datasets.
Affiliation(s)
- Jongmin Han
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon, Republic of Korea
- Youngchun Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Youn-Suk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea.
- Seokho Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon, Republic of Korea.
39
Zhu Y, Chen D, Du Y, Wang Y, Liu Q, Wu S. Molecular Contrastive Pretraining with Collaborative Featurizations. J Chem Inf Model 2024; 64:1112-1122. [PMID: 38315002 DOI: 10.1021/acs.jcim.3c01468] [Indexed: 02/07/2024]
Abstract
Molecular pretraining, which learns molecular representations over massive unlabeled data, has become a prominent paradigm to solve a variety of tasks in computational chemistry and drug discovery. Recently, prosperous progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations with their corresponding neural architectures in molecular pretraining remains largely unexamined. In this paper, through two case studies (chirality classification and aromatic ring counting), we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COllaborative featurizations (MOCO). MOCO comprehensively leverages multiple featurizations that complement each other and outperforms existing state-of-the-art models that solely rely on one or two featurizations on a wide range of molecular property prediction tasks.
Affiliation(s)
- Yanqiao Zhu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, United States
- Dingshuo Chen
- Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Yuanqi Du
- Department of Computer Science, Cornell University, Ithaca, New York 14853, United States
- Yingze Wang
- College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Qiang Liu
- Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Shu Wu
- Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
40
Shi R, Yu G, Huo X, Yang Y. Prediction of chemical reaction yields with large-scale multi-view pre-training. J Cheminform 2024; 16:22. [PMID: 38403627 PMCID: PMC10895839 DOI: 10.1186/s13321-024-00815-2] [Received: 08/22/2023] [Accepted: 02/14/2024] [Indexed: 02/27/2024] Open
Abstract
Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have been recently highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions. Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.
Affiliation(s)
- Runhan Shi
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Gufeng Yu
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaohong Huo
- Shanghai Key Laboratory for Molecular Engineering of Chiral Drugs, Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yang Yang
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China.
41
Huang J, Zhou TP, Sun N, Yu H, Yu X, Liao RZ, Yao W, Dai Z, Wu G, Zhong F. Accessing ladder-shape azetidine-fused indoline pentacycles through intermolecular regiodivergent aza-Paternò-Büchi reactions. Nat Commun 2024; 15:1431. [PMID: 38365864 PMCID: PMC10873392 DOI: 10.1038/s41467-024-45687-0] [Received: 09/05/2023] [Accepted: 01/31/2024] [Indexed: 02/18/2024] Open
Abstract
Small molecules with conformationally rigid, three-dimensional geometry are highly desirable in drug development, toward which a direct, simple-to-complexity synthetic logic is still of considerable challenges. Here, we report intermolecular aza-[2 + 2] photocycloaddition (the aza-Paternò-Büchi reaction) of indole that facilely assembles planar building blocks into ladder-shape azetidine-fused indoline pentacycles with contiguous quaternary carbons, divergent head-to-head/head-to-tail regioselectivity, and absolute exo stereoselectivity. These products exhibit marked three-dimensionality, many of which possess 3D score values distributed in the highest 0.5% region with reference to structures from DrugBank database. Mechanistic studies elucidated the origin of the observed regio- and stereoselectivities, which arise from distortion-controlled C-N coupling scenarios. This study expands the synthetic repertoire of energy transfer catalysis for accessing structurally intriguing architectures with high molecular complexity and underexplored topological chemical space.
Affiliation(s)
- Jianjian Huang
- Hubei Engineering Research Center for Biomaterials and Medical Protective Materials, Hubei Key Laboratory of Bioinorganic Chemistry & Materia Medica, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology (HUST), 1037 Luoyu Road, Wuhan, 430074, China
- Tai-Ping Zhou
- Hubei Engineering Research Center for Biomaterials and Medical Protective Materials, Hubei Key Laboratory of Bioinorganic Chemistry & Materia Medica, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology (HUST), 1037 Luoyu Road, Wuhan, 430074, China
- Ningning Sun
- Hubei Engineering Research Center for Biomaterials and Medical Protective Materials, Hubei Key Laboratory of Bioinorganic Chemistry & Materia Medica, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology (HUST), 1037 Luoyu Road, Wuhan, 430074, China
- Huaibin Yu
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, 450000, China
- Xixiang Yu
- Hubei Engineering Research Center for Biomaterials and Medical Protective Materials, Hubei Key Laboratory of Bioinorganic Chemistry & Materia Medica, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology (HUST), 1037 Luoyu Road, Wuhan, 430074, China
- Rong-Zhen Liao
- Hubei Engineering Research Center for Biomaterials and Medical Protective Materials, Hubei Key Laboratory of Bioinorganic Chemistry & Materia Medica, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology (HUST), 1037 Luoyu Road, Wuhan, 430074, China.
- Weijun Yao
- School of Chemistry and Chemical Engineering, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Zhifeng Dai
- School of Chemistry and Chemical Engineering, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Longgang Institute of Zhejiang Sci-Tech University, Wenzhou, 325802, China
- Guojiao Wu
- Hubei Engineering Research Center for Biomaterials and Medical Protective Materials, Hubei Key Laboratory of Bioinorganic Chemistry & Materia Medica, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology (HUST), 1037 Luoyu Road, Wuhan, 430074, China
- Fangrui Zhong
- Hubei Engineering Research Center for Biomaterials and Medical Protective Materials, Hubei Key Laboratory of Bioinorganic Chemistry & Materia Medica, School of Chemistry and Chemical Engineering, Huazhong University of Science and Technology (HUST), 1037 Luoyu Road, Wuhan, 430074, China.
42
Hong Y, Welch CJ, Piras P, Tang H. Enhanced Structure-Based Prediction of Chiral Stationary Phases for Chromatographic Enantioseparation from 3D Molecular Conformations. Anal Chem 2024. [PMID: 38308813 DOI: 10.1021/acs.analchem.3c04028] [Indexed: 02/05/2024]
Abstract
The accurate prediction of suitable chiral stationary phases (CSPs) for resolving the enantiomers of a given compound poses a significant challenge in chiral chromatography. Previous attempts at developing machine learning models for structure-based CSP prediction have primarily relied on 1D SMILES strings [the simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings] or 2D graphical representations of molecular structures and have met with only limited success. In this study, we apply the recently developed 3D molecular conformation representation learning algorithm, which uses rapid conformational analysis and point clouds of atom positions in the 3D space, enabling efficient chemical structure-based machine learning. By harnessing the power of the rapid 3D molecular representation learning and a data set comprising over 300,000 chromatographic enantioseparation records sourced from the literature, our models afford notable improvements for the chemical structure-based choice of appropriate CSP for enantioseparation, paving the way for more efficient and informed decision-making in the field of chiral chromatography.
Affiliation(s)
- Yuhui Hong
- Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana 47408, United States
- Christopher J Welch
- Indiana Consortium for Analytical Science & Engineering (ICASE), Indianapolis, Indiana 46202, United States
- Patrick Piras
- Aix Marseille Université, CNRS, Centrale Marseille, FSCM, Chiropole, Marseille 13397, France
- Haixu Tang
- Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana 47408, United States
43
Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common representation of molecules, but it ignores the structural information of molecules, while molecular graph representation has a weak ability to express the 3D structure. In this article, we try to make use of the advantages of different types of representations simultaneously for molecular property prediction. Thus, we propose a fusion model named DLF-MFF, which integrates the multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, 2D molecular graph, 3D molecular graph and molecular image. Then, in order to learn molecular features individually, we use four essential deep learning frameworks, which correspond to four distinct molecular representations. The final molecular representation is created by integrating the four feature vectors and feeding them into a prediction layer to predict molecular property. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets consisting of multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on the 6 benchmark datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors from 2500 drugs. We predict the probability of each drug being inferred as a 3CL protease inhibitor and also calculate the binding affinity scores between each drug and 3CL protease. The results show that DLF-MFF produces better performance in the identification of anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and provide valuable insights into drug repurposing for COVID-19.
Affiliation(s)
- Mei Ma
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
44
Chen B, Pan Z, Mou M, Zhou Y, Fu W. Is fragment-based graph a better graph-based molecular representation for drug design? A comparison study of graph-based models. Comput Biol Med 2024; 169:107811. [PMID: 38168647 DOI: 10.1016/j.compbiomed.2023.107811] [Received: 09/27/2023] [Revised: 11/23/2023] [Accepted: 12/03/2023] [Indexed: 01/05/2024]
Abstract
Graph Neural Networks (GNNs) have gained significant traction in various sectors of AI-driven drug design. Over recent years, the integration of fragmentation concepts into GNNs has emerged as a potent strategy to augment the efficacy of molecular generative models. Nonetheless, challenges such as symmetry breaking and potential misrepresentation of intricate cycles and undefined functional groups raise questions about the superiority of fragment-based graph representation over traditional methods. In our research, we undertook a rigorous evaluation, contrasting the predictive prowess of eight models-developed using deep learning algorithms-across 12 benchmark datasets that span a range of properties. These models encompass established methods like GCN, AttentiveFP, and D-MPNN, as well as innovative fragment-based representation techniques. Our results indicate that fragment-based methodologies, notably PharmHGT, significantly improve model performance and interpretability, particularly in scenarios characterized by limited data availability. However, in situations with extensive training, fragment-based molecular graph representations may not necessarily eclipse traditional methods. In summation, we posit that the integration of fragmentation, as an avant-garde technique in drug design, harbors considerable promise for the future of AI-enhanced drug design.
Affiliation(s)
- Baiyu Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China
- Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Yuan Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Wei Fu
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China.
45
Lin CX, Guan Y, Li HD. Artificial intelligence approaches for molecular representation in drug response prediction. Curr Opin Struct Biol 2024; 84:102747. [PMID: 38091924 DOI: 10.1016/j.sbi.2023.102747] [Received: 09/15/2023] [Revised: 11/26/2023] [Accepted: 11/26/2023] [Indexed: 02/09/2024]
Abstract
Drug response prediction is essential for drug development and disease treatment. One key question in predicting drug response is the representation of molecules, which has been greatly advanced by artificial intelligence (AI) techniques in recent years. In this review, we first describe different types of representation methods, pinpointing their key principles and discussing their limitations. Thereafter we discuss potential ways how these methods could be further developed. We expect that this review will provide useful guidance for researchers in the community.
Affiliation(s)
- Cui-Xiang Lin
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan Province, PR China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
- Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, PR China.
46
Ishiai S, Yasuda I, Endo K, Yasuoka K. Graph-Neural-Network-Based Unsupervised Learning of the Temporal Similarity of Structural Features Observed in Molecular Dynamics Simulations. J Chem Theory Comput 2024; 20:819-831. [PMID: 38190503 DOI: 10.1021/acs.jctc.3c00995] [Indexed: 01/10/2024]
Abstract
Classification of molecular structures is a crucial step in molecular dynamics (MD) simulations to detect various structures and phases within systems. Molecular structures, which are commonly identified using order parameters, have recently been identified using machine learning (ML); that is, ML models acquire structural features from labeled crystals or phases via supervised learning. However, these approaches may not identify unlabeled or unknown structures, such as the imperfect crystal structures observed in nonequilibrium systems and interfaces. In this study, we proposed the use of a novel unsupervised learning framework, denoted temporal self-supervised learning (TSSL), to learn structural features and design their parameters. In TSSL, the ML models learn structural similarity via contrastive learning based on minor short-term variations caused by perturbations in MD simulations. This learning framework is applied to a sophisticated architecture of graph neural network models that use bond angle and length data of the neighboring atoms. TSSL successfully classifies water and ice crystals based on high local ordering, and furthermore, it detects imperfect structures typical of interfaces such as the water-ice and ice-vapor interfaces.
Affiliation(s)
- Satoki Ishiai
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
- Ikki Yasuda
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
- Katsuhiro Endo
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
- National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki 305-8568, Japan
- Kenji Yasuoka
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
|
47
|
Wang R, Wang T, Zhuo L, Wei J, Fu X, Zou Q, Yao X. Diff-AMP: tailored designed antimicrobial peptide framework with all-in-one generation, identification, prediction and optimization. Brief Bioinform 2024; 25:bbae078. [PMID: 38446739 PMCID: PMC10939340 DOI: 10.1093/bib/bbae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/25/2024] [Accepted: 02/08/2024] [Indexed: 03/08/2024] Open
Abstract
Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. Firstly, AMP generation overlooks the complex interdependencies among amino acids. Secondly, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have also deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section.
Affiliation(s)
- Rui Wang
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325000 Wenzhou, China
- Tao Wang
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325000 Wenzhou, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325000 Wenzhou, China
- Jinhang Wei
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325000 Wenzhou, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410012 Changsha, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 611730 Chengdu, China
- Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, 999078 Macao, China
|
48
|
Liu Y, Jiang Y, Zhang F, Yang Y. A Novel Multi-Scale Graph Neural Network for Metabolic Pathway Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:178-187. [PMID: 38127612 DOI: 10.1109/tcbb.2023.3345647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Predicting the metabolic pathway classes of compounds in the human body is an important problem in drug research and development. For this purpose, we propose a Multi-Scale Graph Neural Network framework, named MSGNN. The framework includes a subgraph encoder, a feature encoder and a global feature processor, and a graph augmentation strategy is adopted. The subgraph encoder is responsible for extracting the local structural features of the compound, the feature encoder learns the characteristics of the atoms, and the global feature processor processes the information from the pre-training model and the two molecular fingerprints, while the graph augmentation strategy expands the training set through a scientifically reasonable method. The experimental results show that the accuracy, precision, recall and F1 metrics of MSGNN reach 98.17%, 94.18%, 94.43% and 94.30%, respectively, which is superior to comparable models known to us. In addition, the ablation experiment demonstrates the indispensability of the MSGNN modules.
|
49
|
Liu F, Chen J, Li X, Liu R, Zhang Y, Gao C, Shi D. Advances in Development of Selective Antitumor Inhibitors That Target PARP-1. J Med Chem 2023; 66:16464-16483. [PMID: 38088333 DOI: 10.1021/acs.jmedchem.3c00865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Cancer is a major threat to the lives and health of people around the world, and the development of effective antitumor drugs that exhibit fewer toxic effects is an important aspect of cancer treatment. PARP inhibitors are antitumor drugs that target pathways involved in DNA-damage repair. The currently approved PARP inhibitors include olaparib, niraparib, rucaparib, talazoparib, fuzuloparib, and pamiparib. Hematological toxicities associated with the simultaneous inhibition of PARP-1 and PARP-2 have limited the clinical applications of these drugs. The present review introduces the necessity for research on the development of selective PARP-1 inhibitors from the perspective of structural and functional mechanisms of PARP-1 inhibition. A review of recently reported selective PARP-1 inhibitors provides the foundation for exploring novel strategies for designing selective PARP-1 inhibitors from the perspective of structure-activity relationships combined with computer simulations.
Affiliation(s)
- Fang Liu
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237 Shandong P. R. China
- Jiashu Chen
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237 Shandong P. R. China
- Xiangqian Li
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237 Shandong P. R. China
- Ruihua Liu
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237 Shandong P. R. China
- Yiting Zhang
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237 Shandong P. R. China
- Chenxia Gao
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237 Shandong P. R. China
- Dayong Shi
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237 Shandong P. R. China
|
50
|
Schaduangrat N, Homdee N, Shoombuatong W. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERα and ERβ antagonists. Sci Rep 2023; 13:22994. [PMID: 38151513 PMCID: PMC10752908 DOI: 10.1038/s41598-023-50393-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 12/19/2023] [Indexed: 12/29/2023] Open
Abstract
The role of estrogen receptors (ERs) in breast cancer is of great importance in both clinical practice and scientific exploration. However, around 15-30% of those affected do not see benefits from the usual treatments owing to innate resistance mechanisms, while 30-40% will acquire resistance through treatment. In order to address this problem and facilitate community-wide efforts, machine learning (ML)-based approaches are considered one of the most cost-effective and large-scale identification methods. Herein, we propose a new SMILES-based stacked approach, termed StackER, for the accelerated and efficient identification of ERα and ERβ inhibitors. In StackER, we first established an up-to-date dataset consisting of 1,996 and 1,207 compounds for ERα and ERβ, respectively. Using the up-to-date dataset, StackER explored a wide range of different SMILES-based feature descriptors and ML algorithms in order to generate probabilistic features (PFs). Finally, the selected PFs derived from the two-step feature selection strategy were used for the development of an efficient stacked model. Both cross-validation and independent tests showed that StackER surpassed several conventional ML classifiers and the existing method in precisely predicting ERα and ERβ inhibitors. Remarkably, StackER achieved MCC values of 0.829-0.847 and 0.712-0.786 in terms of the cross-validation and independent tests, respectively, which were 5.92-8.29% and 1.59-3.45% higher than the existing method. In addition, StackER was applied to determine features characteristic of ERα and ERβ inhibitors and to identify FDA-approved drugs as potential ERα inhibitors in efforts to facilitate drug repurposing. This innovative stacked method is anticipated to facilitate community-wide efforts in efficiently narrowing down ER inhibitor screening.
Affiliation(s)
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
|