1
Ren JN, Chen Q, Ye HYX, Cao C, Guo YM, Yang JR, Wang H, Khan MZI, Chen JZ. FGTN: Fragment-based graph transformer network for predicting reproductive toxicity. Arch Toxicol 2024; 98:4077-4092. [PMID: 39292235 DOI: 10.1007/s00204-024-03866-4] [Received: 05/10/2024] [Accepted: 09/10/2024] [Indexed: 09/19/2024]
Abstract
Reproductive toxicity is an important issue in chemical safety. Traditional laboratory testing methods are costly and time-consuming and raise ethical concerns. Only a few in silico models have been reported to predict human reproductive toxicity, and none of them makes full use of the topological information of compounds. In addition, most existing atom-based graph neural network methods attribute model predictions to individual nodes or edges rather than to chemically meaningful fragments or substructures. In the current study, we developed a novel fragment-based graph transformer network (FGTN) approach to build a QSAR model of human reproductive toxicity that considers the internal topological structure of compounds. In the FGTN model, a compound is represented as a graph with fragments as nodes and the bonds linking two fragments as edges. A super molecule-level node is further introduced and connected to all fragment nodes by undirected edges, obtaining global molecular features from the fragment embeddings. The FGTN model achieved an accuracy (ACC) of 0.861 and an area under the receiver operating characteristic curve (AUC) of 0.914 on nonredundant blind tests, outperforming traditional fingerprint-based machine learning models and an atom-based GCN model. The FGTN model can attribute toxicity predictions to fragments, generating specific structural alerts for positive compounds. Moreover, FGTN may also be able to distinguish various chemical isomers. We believe that FGTN can be used as a reliable and effective tool for human reproductive toxicity prediction, contributing to the advancement of chemical safety assessment.
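The graph construction described in the abstract (fragments as nodes, inter-fragment bonds as edges, plus one super node linked to every fragment) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `build_fragment_graph`, the input format (fragments as sets of atom indices, bonds as atom-index pairs), and the toy molecule are all assumptions, and the fragmentation step itself is taken as given.

```python
def build_fragment_graph(fragments, bonds):
    """Build a fragment-level graph with an extra super node.

    fragments: list of disjoint sets of atom indices (hypothetical input;
               how the fragments are obtained is outside this sketch).
    bonds:     list of (atom_a, atom_b) pairs for the whole molecule.
    Returns (edges, super_node), where edges are undirected pairs of
    node ids stored as (low, high) tuples.
    """
    # Map every atom to the fragment that contains it.
    atom_to_frag = {}
    for frag_id, atoms in enumerate(fragments):
        for atom in atoms:
            atom_to_frag[atom] = frag_id

    # A bond whose two atoms sit in different fragments becomes a fragment edge.
    edges = set()
    for a, b in bonds:
        fa, fb = atom_to_frag[a], atom_to_frag[b]
        if fa != fb:
            edges.add((min(fa, fb), max(fa, fb)))

    # The super node gets the next free id and is linked to every fragment.
    super_node = len(fragments)
    for frag_id in range(len(fragments)):
        edges.add((frag_id, super_node))

    return sorted(edges), super_node


# Toy example (assumed data): a 6-atom chain split into three fragments.
toy_fragments = [{0, 1, 2}, {3, 4}, {5}]
toy_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
toy_edges, toy_super = build_fragment_graph(toy_fragments, toy_bonds)
```

In this toy case the super node receives id 3 and is adjacent to fragments 0, 1, and 2, so a single message-passing or attention step could, in principle, pool information from every fragment into one molecule-level node.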
Affiliation(s)
- Jia-Nan Ren
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Qiang Chen
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Hong-Yu-Xiang Ye
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Cheng Cao
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
  - Polytechnic Institute, Zhejiang University, 269 Shixiang Rd., Hangzhou, 310015, Zhejiang, China
- Ya-Min Guo
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Jin-Rong Yang
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
  - Polytechnic Institute, Zhejiang University, 269 Shixiang Rd., Hangzhou, 310015, Zhejiang, China
- Hao Wang
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Muhammad Zafar Irshad Khan
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Jian-Zhong Chen
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
2
Chen Y, Wan Z, Li Y, He X, Wei X, Han J. Graph Curvature Flow-Based Masked Attention. J Chem Inf Model 2024; 64:8153-8163. [PMID: 39443864 DOI: 10.1021/acs.jcim.4c01616] [Indexed: 10/25/2024]
Abstract
Graph neural networks (GNNs) have revolutionized drug discovery in chemistry and biology, enhancing efficiency and reducing resource demands. However, classical GNNs often struggle to capture long-range dependencies due to challenges like oversmoothing and oversquashing. Graph Transformers address these issues by employing global self-attention mechanisms that allow direct information exchange between any pair of nodes, enabling the modeling of long-range interactions. Despite this, Graph Transformers often face difficulties in capturing the nuanced structural information on graphs. To overcome these challenges, we introduce the CurvFlow-Transformer, a novel graph Transformer model incorporating a curvature flow-based masked attention mechanism. By leveraging a topologically enhanced mask matrix, the attention layer can effectively detect subtle structural differences within graphs, balancing the focus between global mutual information and local structural details of molecules. The CurvFlow-Transformer demonstrates superior performance on the MoleculeNet data set, surpassing several state-of-the-art models across various tasks. Moreover, the model provides unique insights into the relationship between molecular structure and chemical properties by analyzing the attention heat coefficients of individual atoms.
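As a rough illustration of what masked attention means in general, the sketch below adds a mask matrix to raw attention scores before a row-wise softmax. It does not reproduce the CurvFlow-Transformer: how the curvature flow-based mask is computed is not given in the abstract, so the mask here is a hand-written placeholder, the additive form is an assumption, and the name `masked_attention_weights` is hypothetical.

```python
import math


def masked_attention_weights(scores, mask):
    """Row-wise softmax over (scores + mask).

    scores: square list of lists with raw attention scores.
    mask:   same shape; 0.0 leaves a pair untouched, -inf blocks it,
            and finite values bias it (assumed additive masking).
    Each row is assumed to keep at least one finite entry.
    """
    weights = []
    for score_row, mask_row in zip(scores, mask):
        biased = [s + m for s, m in zip(score_row, mask_row)]
        row_max = max(biased)  # subtract the max for numerical stability
        exps = [math.exp(b - row_max) for b in biased]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


NEG_INF = float("-inf")
# Placeholder data: node 0 may not attend to node 1; node 1 sees both nodes.
demo_scores = [[0.0, 0.0], [0.0, 0.0]]
demo_mask = [[0.0, NEG_INF], [0.0, 0.0]]
demo_weights = masked_attention_weights(demo_scores, demo_mask)
```

With this placeholder mask, node 0 places all of its attention on itself while node 1 splits attention evenly, which is the kind of structural bias a topology-derived mask is meant to inject into otherwise global self-attention.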
Affiliation(s)
- Yili Chen
  - The College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
- Zheng Wan
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Chemistry and Molecular Engineering, East China Normal University, 500 Dongchuan Road, Shanghai 200062, China
- Yangyang Li
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Xiao He
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Chemistry and Molecular Engineering, East China Normal University, 500 Dongchuan Road, Shanghai 200062, China
  - Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing 401120, China
  - New York University-East China Normal University Center for Computational Chemistry, School of Chemistry and Molecular Engineering, New York University Shanghai, Shanghai 200062, China
- Xian Wei
  - MoE Engineering Research Center of Hardware/Software Co-Design Technology and Application, East China Normal University, Zhongshan North Road 3663, Shanghai 200062, China
- Jun Han
  - The College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
  - Quanzhou Institute of Equipment Manufacturing, Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Quanzhou 362216, China
3
Xu Y, Liu X, Xia W, Ge J, Ju CW, Zhang H, Zhang JZH. ChemXTree: A Feature-Enhanced Graph Neural Network-Neural Decision Tree Framework for ADMET Prediction. J Chem Inf Model 2024. [PMID: 39497657 DOI: 10.1021/acs.jcim.4c01186] [Indexed: 11/07/2024]
Abstract
The rapid progression of machine learning, especially deep learning (DL), has catalyzed a new era in drug discovery, introducing innovative approaches for predicting molecular properties. Despite the many methods available for feature representation, efficiently utilizing rich, high-dimensional information remains a significant challenge. Our work introduces ChemXTree, a novel graph-based model that integrates a Gate Modulation Feature Unit (GMFU) and neural decision tree (NDT) in the output layer to address this challenge. Extensive evaluations on benchmark data sets, including MoleculeNet and eight additional drug databases, have demonstrated ChemXTree's superior performance, surpassing or matching the current state-of-the-art models. Visualization techniques clearly demonstrate that ChemXTree significantly improves the separation between substrates and nonsubstrates in the latent space. In summary, ChemXTree demonstrates a promising approach for integrating advanced feature extraction with neural decision trees, offering significant improvements in predictive accuracy for drug discovery tasks and opening new avenues for optimizing molecular properties.
Affiliation(s)
- Yuzhi Xu
  - Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Xinxin Liu
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Wei Xia
  - Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Jiankai Ge
  - Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Cheng-Wei Ju
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60615, United States
- Haiping Zhang
  - Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen 518055, China
- John Z H Zhang
  - Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen 518055, China
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, 200062 Shanghai, China
4
Lin M, Cai J, Wei Y, Peng X, Luo Q, Li B, Chen Y, Wang L. MalariaFlow: A comprehensive deep learning platform for multistage phenotypic antimalarial drug discovery. Eur J Med Chem 2024; 277:116776. [PMID: 39173285 DOI: 10.1016/j.ejmech.2024.116776] [Received: 05/11/2024] [Revised: 07/31/2024] [Accepted: 08/01/2024] [Indexed: 08/24/2024]
Abstract
Malaria remains a significant global health challenge due to the growing drug resistance of Plasmodium parasites and the failure to block transmission within the human host. While machine learning (ML) and deep learning (DL) methods have shown promise in accelerating antimalarial drug discovery, the performance of deep learning models based on molecular graphs and other co-representation approaches warrants further exploration. Current research has overlooked mutant strains of the malaria parasite with varying degrees of sensitivity or resistance, and has not covered the prediction of inhibitory activities across the three major life cycle stages (liver, asexual blood, and gametocyte) within the human host, which is crucial for both treatment and transmission blocking. In this study, we manually curated a benchmark antimalarial activity dataset comprising 407,404 unique compounds and 410,654 bioactivity data points across ten Plasmodium phenotypes and three stages. Performance was systematically compared among two fingerprint-based ML models (RF::Morgan and XGBoost::Morgan), four graph-based DL models (GCN, GAT, MPNN, and Attentive FP), and three co-representation DL models (FP-GNN, HiGNN, and FG-BERT), which revealed that: 1) the FP-GNN model achieved the best predictive performance, outperforming the other methods in distinguishing active and inactive compounds across balanced, more positive, and more negative datasets, with an overall AUROC of 0.900; 2) fingerprint-based ML models outperformed graph-based DL models on large datasets (>1000 compounds), but the three co-representation DL models were able to incorporate domain-specific chemical knowledge to bridge this gap, achieving better predictive performance. These findings provide valuable guidance for selecting appropriate ML and DL methods for antimalarial activity prediction tasks. The interpretability analysis of the FP-GNN model revealed its ability to accurately capture the key structural features responsible for the liver- and blood-stage activities of the known antimalarial drug atovaquone. Finally, we developed a web server, MalariaFlow, incorporating these high-quality models for antimalarial activity prediction, virtual screening, and similarity search. It successfully predicted novel triple-stage antimalarial hits that were validated through experimental testing, demonstrating its effectiveness and value in discovering potential multistage antimalarial drug candidates.
Affiliation(s)
- Mujie Lin
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Junxi Cai
  - School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, 510006, China
- Yuancheng Wei
  - School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Xinru Peng
  - School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Qianhui Luo
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Biaoshun Li
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yihao Chen
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
5
Ahmad W, Chong KT, Tayara H. GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction. J Chem Inf Model 2024; 64:7833-7843. [PMID: 39387596 DOI: 10.1021/acs.jcim.4c00792] [Indexed: 10/15/2024]
Abstract
Aqueous solubility is a critical physicochemical property in drug discovery. Solubility is a key issue in pharmaceutical development because it can limit a drug's absorption capacity. Accurate solubility prediction is crucial for pharmacological, environmental, and drug development studies. This research introduces a novel method for solubility prediction by combining gated graph neural networks (GGNNs) and graph attention neural networks (GATs) with Smiles2Seq encoding. Our methodology involves converting chemical compounds into graph structures with nodes representing atoms and edges indicating chemical bonds. These graphs are then processed by a specialized graph neural network (GNN) architecture. Incorporating attention mechanisms into the GNN allows for capturing subtle structural dependencies, fostering improved solubility predictions. Furthermore, we utilized the Smiles2Seq encoding technique to bridge the semantic gap between molecular structures and their textual representations. Smiles2Seq converts chemical notations into numeric sequences, facilitating the efficient transfer of information into our model. We demonstrate the efficacy of our approach through comprehensive experiments on benchmark solubility data sets, showcasing superior predictive performance compared to traditional methods. Our model outperforms existing solubility prediction models and provides interpretable insights into the molecular features driving solubility behavior. This research signifies an important advancement in solubility prediction, offering potent tools for drug discovery, formulation development, and environmental assessments. The fusion of GGNN and Smiles2Seq encoding establishes a robust framework for accurately forecasting solubility across a wide range of chemical compounds, fostering innovation in domains reliant on solubility data.
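The step of converting SMILES notation into numeric sequences can be pictured with a generic tokenizer such as the one below. The abstract does not specify how Smiles2Seq tokenizes or indexes its input, so this is only an assumed, simplified scheme: character-level tokens, with `Cl` and `Br` kept as two-character tokens, a vocabulary built from the data, and zero padding. All function names are hypothetical.

```python
TWO_CHAR_TOKENS = ("Cl", "Br")  # assumed multi-character atom symbols


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, keeping Cl and Br intact."""
    tokens = []
    i = 0
    while i < len(smiles):
        pair = smiles[i:i + 2]
        if pair in TWO_CHAR_TOKENS:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def build_vocab(smiles_list):
    """Assign an integer id to every token; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for smiles in smiles_list:
        for token in tokenize_smiles(smiles):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(smiles, vocab, max_len):
    """Map a SMILES string to a fixed-length list of token ids."""
    ids = [vocab[token] for token in tokenize_smiles(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Toy data (assumed): ethanol and chloroethane.
demo_vocab = build_vocab(["CCO", "CCl"])
demo_ids = encode("CCl", demo_vocab, max_len=4)
```

Fixed-length integer sequences of this kind are what a sequence encoder would typically consume alongside the graph branch; the real encoding used in the paper may differ in tokenization, vocabulary, and padding.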
Affiliation(s)
- Waqar Ahmad
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea
6
Han Z, Xia Z, Xia J, Tetko IV, Wu S. The state-of-the-art machine learning model for Plasma Protein Binding Prediction: computational modeling with OCHEM and experimental validation. Eur J Pharm Sci 2024; 204:106946. [PMID: 39490636 DOI: 10.1016/j.ejps.2024.106946] [Received: 07/30/2024] [Revised: 10/18/2024] [Accepted: 10/23/2024] [Indexed: 11/05/2024]
Abstract
Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Existing models for predicting PPB often suffer from low prediction accuracy and poor interpretability, especially for high-PPB compounds, and most have not been experimentally validated. Here, we followed a strict data curation protocol and applied consensus modeling to obtain a model with coefficients of determination of 0.90 and 0.91 on the training set and the test set, respectively. This model (available on the OCHEM platform https://ochem.eu/article/29) was further retrospectively validated on a set of 63 poly-fluorinated molecules and prospectively validated on a set of 25 highly diverse compounds, and its performance on both sets was superior to that of previously reported models. Furthermore, we identified the physicochemical and structural characteristics of high- and low-PPB molecules for further structural optimization. Finally, we provide practical and detailed recommendations for structural optimization to decrease the PPB of lead compounds.
Affiliation(s)
- Zunsheng Han
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zhonghua Xia
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Jie Xia
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Igor V Tetko
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
- Song Wu
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
7
García-Ortegón M, Seal S, Rasmussen C, Bender A, Bacallado S. Graph neural processes for molecules: an evaluation on docking scores and strategies to improve generalization. J Cheminform 2024; 16:115. [PMID: 39443970 PMCID: PMC11515514 DOI: 10.1186/s13321-024-00904-2] [Received: 02/29/2024] [Accepted: 09/13/2024] [Indexed: 10/25/2024] Open
Abstract
Neural processes (NPs) are models for meta-learning which output uncertainty estimates. So far, most studies of NPs have focused on low-dimensional datasets of highly-correlated tasks. While these homogeneous datasets are useful for benchmarking, they may not be representative of realistic transfer learning. In particular, applications in scientific research may prove especially challenging due to the potential novelty of meta-testing tasks. Molecular property prediction is one such research area that is characterized by sparse datasets of many functions on a shared molecular space. In this paper, we study the application of graph NPs to molecular property prediction with DOCKSTRING, a diverse dataset of docking scores. Graph NPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as alternative techniques for transfer learning and meta-learning. In order to increase meta-generalization to divergent test functions, we propose fine-tuning strategies that adapt the parameters of NPs. We find that adaptation can substantially increase NPs' regression performance while maintaining good calibration of uncertainty estimates. Finally, we present a Bayesian optimization experiment which showcases the potential advantages of NPs over Gaussian processes in iterative screening. Overall, our results suggest that NPs on molecular graphs hold great potential for molecular property prediction in the low-data setting. SCIENTIFIC CONTRIBUTION: Neural processes are a family of meta-learning algorithms which deal with data scarcity by transferring information across tasks and making probabilistic predictions. We evaluate their performance on regression and optimization molecular tasks using docking scores, finding them to outperform classical single-task and transfer-learning models. We examine the issue of generalization to divergent test tasks, which is a general concern of meta-learning algorithms in science, and propose strategies to alleviate it.
Affiliation(s)
- Miguel García-Ortegón
  - Statistical Laboratory, University of Cambridge, Wilberforce Rd, Cambridge, CB3 0WA, UK
  - Department of Engineering, University of Cambridge, Trumpington St, Cambridge, CB2 1PZ, UK
  - Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
- Srijit Seal
  - Imaging Platform, Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA, 02142, USA
- Carl Rasmussen
  - Department of Engineering, University of Cambridge, Trumpington St, Cambridge, CB2 1PZ, UK
- Andreas Bender
  - Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
- Sergio Bacallado
  - Statistical Laboratory, University of Cambridge, Wilberforce Rd, Cambridge, CB3 0WA, UK
8
Suo Y, Qian X, Xiong Z, Liu X, Wang C, Mu B, Wu X, Lu W, Cui M, Liu J, Chen Y, Zheng M, Lu X. Enhancing the Predictive Power of Machine Learning Models through a Chemical Space Complementary DEL Screening Strategy. J Med Chem 2024. [PMID: 39441849 DOI: 10.1021/acs.jmedchem.4c01416] [Indexed: 10/25/2024]
Abstract
DNA-encoded library (DEL) technology is an effective method for small molecule drug discovery, enabling high-throughput screening against target proteins. DEL screening produces extensive data that can reveal complex patterns not easily recognized by human analysis. Lead compounds from DEL screens often have higher molecular weights, posing challenges for drug development. This study refines traditional DELs by integrating alternative techniques like photocross-linking screening to enhance chemical diversity. Combining these methods improved predictive performance for small molecule identification models. Using this approach, we predicted active small molecules for BRD4 and p300, achieving hit rates of 26.7% and 35.7%, respectively. Notably, the identified compounds exhibit smaller molecular weights and better modification potential compared to traditional DEL molecules. This research demonstrates the synergy between DEL and AI technologies, enhancing drug discovery.
Affiliation(s)
- Yanrui Suo
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 501 Haike Road, Zhang Jiang Hi-Tech Park, Pudong, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Xu Qian
  - DEL Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215125, China
- Zhaoping Xiong
  - Technology Development Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215125, China
- Xiaohong Liu
  - Technology Development Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215125, China
- Chao Wang
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 501 Haike Road, Zhang Jiang Hi-Tech Park, Pudong, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Baiyang Mu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 501 Haike Road, Zhang Jiang Hi-Tech Park, Pudong, Shanghai 201203, China
  - Shandong Second Medical University, Weifang 261053, China
- Xinyuan Wu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 501 Haike Road, Zhang Jiang Hi-Tech Park, Pudong, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Weiwei Lu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 501 Haike Road, Zhang Jiang Hi-Tech Park, Pudong, Shanghai 201203, China
- Meiying Cui
  - DEL Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215125, China
- Jiaxiang Liu
  - DEL Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215125, China
- Yujie Chen
  - DEL Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215125, China
- Mingyue Zheng
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 501 Haike Road, Zhang Jiang Hi-Tech Park, Pudong, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Xiaojie Lu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 501 Haike Road, Zhang Jiang Hi-Tech Park, Pudong, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
9
Collins JW, Ebrahimkhani M, Ramirez D, Deiloff J, Gonzalez M, Abedi M, Philippe-Venec L, Cole BM, Moore B, Nwankwo JO. Attentive graph neural network models for the prediction of blood brain barrier permeability. bioRxiv 2024:2024.10.12.617907. [PMID: 39463958 PMCID: PMC11507759 DOI: 10.1101/2024.10.12.617907] [Indexed: 10/29/2024]
Abstract
The blood brain barrier's (BBB) unique endothelial cells and tight junctions selectively regulate passage of molecules to the central nervous system (CNS) to prevent pathogen entry and maintain neural homeostasis. Treatment of various neurological conditions and neurodegenerative diseases benefits from small molecules capable of BBB penetration (BBBP) to elicit a therapeutic effect. Predicting BBBP often involves in silico assessment of molecular properties such as lipophilicity (log P) and polar surface area (PSA) using the CNS multiparameter optimization (MPO) method. This study curated an open-source dataset to rigorously benchmark machine learning (ML) and neural network (NN) models against each other and against MPO for predicting BBBP. Our analysis demonstrated that AI models, especially attentive NNs using stereochemical features, significantly outperform MPO in predicting BBBP. An attentive graph neural network (GNN), which we refer to as CANDID-CNS™, achieved a 0.23-0.26 higher AUROC score than MPO on full test sets, and a 0.17-0.19 higher score on stereoisomer-filtered subsets. For stereoisomers that differ in BBBP, which MPO cannot distinguish, attentive GNNs classify the molecules correctly, with AUROC and MCC metrics comparable to or better than MPO's AUROC and MCC on less difficult test molecules. These findings suggest that integrating attentive GNN models into pharmaceutical drug discovery processes can substantially improve prediction rates, and thereby reduce the timeline and cost, and increase the probability of success, of designing brain penetrant therapeutics for the treatment of a wide variety of neurological and neurodegenerative diseases.
10
Gao J, Shen Z, Lu Y, Shen L, Zhou B, Xu D, Dai H, Xu L, Che J, Dong X. KnoMol: A Knowledge-Enhanced Graph Transformer for Molecular Property Prediction. J Chem Inf Model 2024; 64:7337-7348. [PMID: 39323109 DOI: 10.1021/acs.jcim.4c01092] [Indexed: 09/27/2024]
Abstract
Molecular property prediction (MPP) techniques are pivotal in reducing drug development costs by preemptively predicting bioactivity and ADMET properties. Despite the application of numerous deep learning approaches, enhancing the representational capacity of these models remains a significant challenge. This paper presents a novel knowledge-based Transformer framework, KnoMol, designed to improve the understanding of molecular structures. KnoMol integrates expert chemical knowledge into the Transformer, emulating the analytical methods of medicinal chemists. Additionally, the multiperspective attention mechanism provides a more precise way to represent ring systems. In the evaluation experiments, KnoMol achieved state-of-the-art performance on both MoleculeNet and small-scale data sets, surpassing existing models in terms of accuracy and generalization. Further research indicated that the incorporation of knowledge significantly reduces KnoMol's reliance on data volumes, offering a solution to the challenge of data scarcity. Moreover, KnoMol identified several new inhibitors of HER2 in a case study, demonstrating its value in real-world applications. Overall, this research not only provides a powerful tool for MPP but also serves as a successful precedent for embedding knowledge into Transformers, with positive implications for computer-aided drug discovery and the development of MPP algorithms.
Affiliation(s)
- Jian Gao
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Center for AI and Intelligent Medicine, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, China
- Zheyuan Shen
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yan Lu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
- Liteng Shen
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Binbin Zhou
  - Department of Computer Science and Computing, Zhejiang University City College, Hangzhou 310015, China
- Donghang Xu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
- Haibin Dai
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Jinxin Che
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaowu Dong
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
|
11
|
Wang K, Huang Y, Wang Y, You Q, Wang L. Recent advances from computer-aided drug design to artificial intelligence drug design. RSC Med Chem 2024:d4md00522h. [PMID: 39493228 PMCID: PMC11523840 DOI: 10.1039/d4md00522h] [Received: 07/08/2024] [Accepted: 10/09/2024] [Indexed: 11/05/2024] Open
Abstract
Computer-aided drug design (CADD), a cornerstone of modern drug discovery, can predict how a molecular structure relates to its activity and interacts with its target using structure-based and ligand-based methods. Fueled by ever-increasing data availability and continuous model optimization, artificial intelligence drug design (AIDD), as an enhanced iteration of CADD, has thrived in the past decade. AIDD demonstrates unprecedented opportunities in protein folding, property prediction, and molecular generation. It can also facilitate target identification, high-throughput screening (HTS), and synthetic route prediction. With AIDD involved, the process of drug discovery is greatly accelerated. Notably, AIDD offers the potential to explore uncharted territories of chemical space beyond current knowledge. In this perspective, we began by briefly outlining the main workflows and components of CADD. Then through showcasing exemplary cases driven by AIDD in recent years, we describe the evolving role of artificial intelligence (AI) in drug discovery from three distinct stages, that is, chemical library screening, linker generation, and de novo molecular generation. In this process, we attempted to draw comparisons between the features of CADD and AIDD.
Affiliation(s)
- Keran Wang
- State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University Nanjing 210009 China
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University Nanjing 210009 China
| | - Yanwen Huang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University Beijing 100191 China
| | - Yan Wang
- Department of Urology, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine Shanghai 201203 China
| | - Qidong You
- State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University Nanjing 210009 China
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University Nanjing 210009 China
| | - Lei Wang
- State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University Nanjing 210009 China
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University Nanjing 210009 China
|
12
|
Srivastava P, Steuer A, Ferri F, Nicoli A, Schultz K, Bej S, Di Pizio A, Wolkenhauer O. Bitter peptide prediction using graph neural networks. J Cheminform 2024; 16:111. [PMID: 39375808 PMCID: PMC11459932 DOI: 10.1186/s13321-024-00909-x] [Received: 10/31/2023] [Accepted: 09/22/2024] [Indexed: 10/09/2024] Open
Abstract
Bitter taste is an unpleasant taste modality that affects food consumption. Bitter peptides are generated during enzymatic processes that produce functional, bioactive protein hydrolysates or during the aging process of fermented products such as cheese, soybean protein, and wine. Understanding the underlying peptide sequences responsible for bitter taste can pave the way for more efficient identification of these peptides. This paper presents BitterPep-GCN, a feature-agnostic graph convolution network for bitter peptide prediction. The graph-based model learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. BitterPep-GCN was benchmarked using BTP640, a publicly available bitter peptide dataset. The latent peptide embeddings generated by the trained model were used to analyze the activity of sequence motifs responsible for the bitter taste of the peptides. Particularly, we calculated the activity for individual amino acids and dipeptide, tripeptide, and tetrapeptide sequence motifs present in the peptides. Our analyses pinpoint specific amino acids, such as F, G, P, and R, as well as sequence motifs, notably tripeptide and tetrapeptide motifs containing FF, as key bitter signatures in peptides. This work not only provides a new predictor of bitter taste for a more efficient identification of bitter peptides in various food products but also gives a hint into the molecular basis of bitterness. Scientific Contribution: Our work provides the first application of Graph Neural Networks for the prediction of peptide bitter taste. The best-developed model, BitterPep-GCN, learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. The embeddings were used to analyze the sequence motifs responsible for the bitter taste.
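The abstract does not define "mixed pooling" precisely. A minimal sketch, assuming it means combining mean pooling and max pooling over the per-residue node embeddings into one peptide-level vector, might look like the following. The function name `mixed_pool` and the toy embeddings are hypothetical and are not taken from BitterPep-GCN.

```python
def mixed_pool(node_embeddings):
    """Concatenate mean-pooled and max-pooled features over all nodes.

    Assumption: "mixed pooling" = [mean pool || max pool]; the paper may
    combine the two differently (for example, by a weighted sum).
    """
    if not node_embeddings:
        raise ValueError("peptide graph has no nodes")
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    mean_part = [sum(vec[d] for vec in node_embeddings) / n for d in range(dim)]
    max_part = [max(vec[d] for vec in node_embeddings) for d in range(dim)]
    return mean_part + max_part


# Hypothetical 2-dimensional embeddings for the three residues of a tripeptide.
peptide = [[1.0, 0.0], [3.0, -2.0], [2.0, 5.0]]
pooled = mixed_pool(peptide)
print(pooled)  # [2.0, 1.0, 3.0, 5.0]
```

The pooled vector has a fixed length regardless of peptide length, which is what lets a single classifier head handle peptides of different sizes.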
Affiliation(s)
- Prashant Srivastava
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
| | - Alexandra Steuer
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Francesco Ferri
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Alessandro Nicoli
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Kristian Schultz
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
| | - Saptarshi Bej
- Indian Institute of Science Education and Research Thiruvananthapuram, Maruthamala P. O, Vithura, 695551, Kerala, India
| | - Antonella Di Pizio
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Olaf Wolkenhauer
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
|
13
|
Spiekermann KA, Dong X, Menon A, Green WH, Pfeifle M, Sandfort F, Welz O, Bergeler M. Accurately Predicting Barrier Heights for Radical Reactions in Solution Using Deep Graph Networks. J Phys Chem A 2024; 128:8384-8403. [PMID: 39298746 DOI: 10.1021/acs.jpca.4c04121] [Indexed: 09/22/2024]
Abstract
Quantitative estimates of reaction barriers and solvent effects are essential for developing kinetic mechanisms and predicting reaction outcomes. Here, we create a new data set of 5,600 unique elementary radical reactions calculated using the M06-2X/def2-QZVP//B3LYP-D3(BJ)/def2-TZVP level of theory. A conformer search is done for each species using TPSS/def2-TZVP. Gibbs free energies of activation and of reaction for these radical reactions in 40 common solvents are obtained using COSMO-RS for solvation effects. These balanced reactions involve the elements H, C, N, O, and S, contain up to 19 heavy atoms, and have atom-mapped SMILES. All transition states are verified by an intrinsic reaction coordinate calculation. We next train a deep graph network to directly estimate the Gibbs free energy of activation and of reaction in both gas and solution phases using only the atom-mapped SMILES of the reactant and product and the SMILES of the solvent. This simple input representation avoids computationally expensive optimizations for the reactant, transition state, and product structures during inference, making our model well-suited for high-throughput predictive chemistry and quickly providing information for (retro-)synthesis planning tools. To properly measure model performance, we report results on both interpolative and extrapolative data splits and also compare to several baseline models. During training and testing, the data set is augmented by including the reverse direction of each reaction and variants with different resonance structures. After data augmentation, we have around 2 million entries to train the model, which achieves a testing set mean absolute error of 1.16 kcal mol⁻¹ for the Gibbs free energy of activation in solution. We anticipate this model will accelerate predictions for high-throughput screening to quickly identify relevant reactions in solution, and our data set will serve as a benchmark for future studies.
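The reverse-direction augmentation mentioned in the abstract can be sketched as follows. The abstract does not say how the authors derive the reverse targets; this sketch assumes the standard thermodynamic relations, in which the reverse barrier is the forward barrier minus the forward reaction free energy, and the reverse reaction free energy is the negated forward value. The field names (`rxn_smiles`, `solvent`, `dg_act`, `dg_rxn`), the example reaction, and the numbers are hypothetical.

```python
def reverse_entry(entry):
    """Build the reverse-direction data point for one reaction entry."""
    reactant, product = entry["rxn_smiles"].split(">>")
    reversed_entry = dict(entry)  # keep other fields, e.g. the solvent SMILES
    reversed_entry["rxn_smiles"] = f"{product}>>{reactant}"
    # Reverse barrier = forward barrier - forward reaction free energy.
    reversed_entry["dg_act"] = entry["dg_act"] - entry["dg_rxn"]
    # The reaction free energy changes sign when the direction flips.
    reversed_entry["dg_rxn"] = -entry["dg_rxn"]
    return reversed_entry


def augment(entries):
    """Return every entry followed by its reverse-direction counterpart."""
    augmented = []
    for entry in entries:
        augmented.append(entry)
        augmented.append(reverse_entry(entry))
    return augmented


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, the metric quoted in the abstract."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical entry: H abstraction from methane by OH in water (kcal/mol).
forward = {
    "rxn_smiles": "[CH4:1].[OH:2]>>[CH3:1].[OH2:2]",
    "solvent": "O",
    "dg_act": 10.0,
    "dg_rxn": -4.0,
}
backward = reverse_entry(forward)
print(backward["dg_act"], backward["dg_rxn"])  # 14.0 4.0
```

Reversing an entry twice returns the original values, which is a quick consistency check on any augmented data set built this way.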
Affiliation(s)
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Xiaorui Dong
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Mark Pfeifle
- BASF Digital Solutions GmbH, Ludwigshafen am Rhein 67061, Germany
| | - Frederik Sandfort
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
| | - Oliver Welz
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
| | - Maike Bergeler
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
|
14
|
Sun M, Fu C, Su H, Xiao R, Shi C, Lu Z, Pu X. Enhancing chemistry-intuitive feature learning to improve prediction performance of optical properties. Chem Sci 2024:d4sc02781g. [PMID: 39381129 PMCID: PMC11457255 DOI: 10.1039/d4sc02781g] [Received: 04/26/2024] [Accepted: 09/22/2024] [Indexed: 10/10/2024] Open
Abstract
Emitters have been widely applied in versatile fields, dependent on their optical properties. Thus, it is of great importance to explore a quick and accurate prediction method for optical properties. To this end, we have developed a state-of-the-art deep learning (DL) framework by enhancing chemistry-intuitive subgraph and edge learning and coupling this with prior domain knowledge for a classic message passing neural network (MPNN) which can better capture the structural features associated with the optical properties from a limited dataset. Benefiting from technical advantages, our model significantly outperforms eight competitive ML models used in five different optical datasets, achieving the highest accuracy to date in predicting four important optical properties (absorption wavelength, emission wavelength, photoluminescence quantum yield and full width at half-maximum), showcasing its robustness and generalization. More importantly, based on our predicted results, one new deep-blue light-emitting molecule PPI-2TPA was successfully synthesized and characterized, which exhibits close consistency with our predictions, clearly confirming the application potential of our model as a quick and reliable prediction tool for the optical properties of diverse emitters in practice.
Affiliation(s)
- Ming Sun
- College of Chemistry, Sichuan University Chengdu 610064 People's Republic of China
| | - Caixia Fu
- College of Chemistry, Sichuan University Chengdu 610064 People's Republic of China
| | - Haoming Su
- College of Chemistry, Sichuan University Chengdu 610064 People's Republic of China
| | - Ruyue Xiao
- College of Chemistry, Sichuan University Chengdu 610064 People's Republic of China
| | - Chaojie Shi
- College of Chemistry, Sichuan University Chengdu 610064 People's Republic of China
| | - Zhiyun Lu
- College of Chemistry, Sichuan University Chengdu 610064 People's Republic of China
| | - Xuemei Pu
- College of Chemistry, Sichuan University Chengdu 610064 People's Republic of China
|
15
|
Jin Y, Wang Z, Dong M, Sun P, Chi W. Data-driven machine learning models for predicting the maximum absorption and emission wavelengths of single benzene fluorophores. Spectrochim Acta A Mol Biomol Spectrosc 2024; 326:125213. [PMID: 39332172 DOI: 10.1016/j.saa.2024.125213] [Received: 06/08/2024] [Revised: 09/18/2024] [Accepted: 09/23/2024] [Indexed: 09/29/2024]
Abstract
Single benzene fluorophores (SBFs) have garnered significant research attention due to their ease of preparation, seamless diffusion into biological samples, and low molecular weight. Accurately predicting the molecular photophysical properties, specifically the maximum absorption and emission wavelengths, is pivotal in advancing functional SBFs. In this study, we introduce a machine-learning model to estimate the maximum absorption and emission wavelengths of SBFs precisely. This model leverages a fully connected neural network and computational chemistry and is tailored to address the challenges associated with a relatively small dataset (81 SBFs). Remarkably, our model (SBFs-ML) demonstrates impressive accuracy, yielding a mean relative error of 1.54% and 2.93% for SBFs' maximum absorption and emission wavelengths, respectively. Importantly, SBFs-ML was built on only three descriptors, resulting in strong interpretability. Experimental results have strongly corroborated these predictions. Our prediction methods are poised to significantly facilitate the efficient design and creation of SBFs.
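The mean relative error quoted above is presumably the average of |predicted − measured| / |measured|, expressed as a percentage; the abstract does not spell out the formula, so this is an assumption. A small sketch with hypothetical wavelengths in nanometres:

```python
def mean_relative_error(measured, predicted):
    """Mean relative error in percent: mean(|pred - meas| / |meas|) * 100."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical absorption maxima (nm): errors of 1 % and 2 % average to about 1.5 %.
measured_nm = [400.0, 500.0]
predicted_nm = [404.0, 490.0]
mre = mean_relative_error(measured_nm, predicted_nm)
print(round(mre, 2))
```

Because the error is normalized by each measured value, a 5 nm miss counts for more at 350 nm than at 600 nm, which is worth keeping in mind when comparing absorption and emission results.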
Affiliation(s)
- Yongshi Jin
- School of Cyberspace Security, Hainan University, Haikou 570228, China; School of Chemistry and Chemical Engineering, Hainan University, Haikou 570228, China
| | - Zhaohe Wang
- School of Cyberspace Security, Hainan University, Haikou 570228, China; School of Chemistry and Chemical Engineering, Hainan University, Haikou 570228, China
| | - Miao Dong
- School of Chemistry and Chemical Engineering, Hainan University, Haikou 570228, China
| | - Pingping Sun
- School of Chemistry and Chemical Engineering, Hainan University, Haikou 570228, China
| | - Weijie Chi
- School of Chemistry and Chemical Engineering, Hainan University, Haikou 570228, China
|
16
|
He G, Liu S, Liu Z, Wang C, Zhang K, Li H. Prototype-based contrastive substructure identification for molecular property prediction. Brief Bioinform 2024; 25:bbae565. [PMID: 39494969 PMCID: PMC11533112 DOI: 10.1093/bib/bbae565] [Received: 04/04/2024] [Revised: 08/11/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024] Open
Abstract
Substructure-based representation learning has emerged as a powerful approach to featurize complex attributed graphs, with promising results in molecular property prediction (MPP). However, existing MPP methods mainly rely on manually defined rules to extract substructures. It remains an open challenge to adaptively identify meaningful substructures from numerous molecular graphs to accommodate MPP tasks. To this end, this paper proposes Prototype-based cOntrastive Substructure IdentificaTion (POSIT), a self-supervised framework to autonomously discover substructural prototypes across graphs so as to guide end-to-end molecular fragmentation. During pre-training, POSIT emphasizes two key aspects of substructure identification: firstly, it imposes a soft connectivity constraint to encourage the generation of topologically meaningful substructures; secondly, it aligns resultant substructures with derived prototypes through a prototype-substructure contrastive clustering objective, ensuring attribute-based similarity within clusters. In the fine-tuning stage, a cross-scale attention mechanism is designed to integrate substructure-level information to enhance molecular representations. The effectiveness of the POSIT framework is demonstrated by experimental results from diverse real-world datasets, covering both classification and regression tasks. Moreover, visualization analysis validates the consistency of chemical priors with identified substructures. The source code is publicly available at https://github.com/VRPharmer/POSIT.
Affiliation(s)
- Gaoqi He
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
| | - Shun Liu
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
| | - Zhuoran Liu
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
| | - Changbo Wang
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
| | - Kai Zhang
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
| | - Honglin Li
- Innovation Center for AI and Drug Discovery, East China Normal University, 200062 Shanghai, China
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, 200237 Shanghai, China
|
17
|
Jiang X, Tan L, Zou Q. DGCL: dual-graph neural networks contrastive learning for molecular property prediction. Brief Bioinform 2024; 25:bbae474. [PMID: 39331017 PMCID: PMC11428321 DOI: 10.1093/bib/bbae474] [Received: 05/17/2024] [Revised: 08/16/2024] [Accepted: 09/13/2024] [Indexed: 09/28/2024] Open
Abstract
In this paper, we propose DGCL, a dual-graph neural networks (GNNs)-based contrastive learning (CL) integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method contains two stages. In the first pretraining stage, we utilize two different GNNs as encoders to construct CL, rather than using the method of generating enhanced graphs as before. Precisely, DGCL aggregates and enhances features of the same molecule by the Graph Isomorphism Network and the Graph Attention Network, with representations extracted from the same molecule serving as positive samples, and others marked as negative ones. In the downstream tasks training stage, features extracted from the two above pretrained graph networks and the meticulously selected MFPs are concatenated to predict molecular properties. Our experiments show that DGCL enhances the performance of existing GNNs by achieving or surpassing the state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance of classification tasks by 3.73% and improves the performance of regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint. The code and datasets of DGCL will be made publicly available.
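The pairing scheme described above, where two encoders embed the same molecule to form a positive pair and all other molecules in the batch act as negatives, can be illustrated with an InfoNCE-style loss. The abstract does not state which contrastive objective DGCL uses, so the loss form, the temperature value, and the function names below are assumptions, and the embeddings are toy vectors rather than real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dual_view_contrastive_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss over a batch of molecules.

    view_a[i] and view_b[i] stand in for the embeddings of molecule i from
    two different encoders (positive pair); view_b[j] with j != i are the
    negatives for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy batch of two molecules: aligned views give a much lower loss than swapped views.
encoder_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = dual_view_contrastive_loss(encoder_a, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = dual_view_contrastive_loss(encoder_a, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)  # True
```

Minimizing such a loss pulls the two encoders' representations of the same molecule together without requiring augmented graphs, which matches the motivation given in the abstract.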
Affiliation(s)
- Xiuyu Jiang
- School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
| | - Liqin Tan
- School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
| | - Qingsong Zou
- School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
|
18
|
Fang J, Tang Y, Gong C, Huang Z, Feng Y, Liu G, Tang Y, Li W. Prediction of Cytochrome P450 Substrates Using the Explainable Multitask Deep Learning Models. Chem Res Toxicol 2024; 37:1535-1548. [PMID: 39196814 DOI: 10.1021/acs.chemrestox.4c00199] [Indexed: 08/30/2024]
Abstract
Cytochromes P450 (P450s or CYPs) are the most important phase I metabolic enzymes in the human body and are responsible for metabolizing ∼75% of the clinically used drugs. P450-mediated metabolism is also closely associated with the formation of toxic metabolites and drug-drug interactions. Therefore, it is of high importance to predict if a compound is the substrate of a given P450 in the early stage of drug development. In this study, we built the multitask learning models to simultaneously predict the substrates of five major drug-metabolizing P450 enzymes, namely, CYP3A4, 2C9, 2C19, 2D6, and 1A2, based on the collected substrate data sets. Compared to the single-task model and conventional machine learning models, the multitask fingerprints and graph neural networks model achieved superior performance with the average AUC values of 90.8% on the test set. Notably, the multitask model demonstrated its good performance on the small amount of substrate data sets such as CYP1A2, 2C9, and 2C19. In addition, the Shapley additive explanation and the attention mechanism were used to reveal specific substructures associated with P450 substrates, which were further confirmed and complemented by the substructure mining tool and the literature.
Affiliation(s)
- Jiaojiao Fang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Yan Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Changda Gong
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Zejun Huang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Yanjun Feng
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
|
19
|
Madushanka A, Laird E, Clark C, Kraka E. SmartCADD: AI-QM Empowered Drug Discovery Platform with Explainability. J Chem Inf Model 2024; 64:6799-6813. [PMID: 39177478 DOI: 10.1021/acs.jcim.4c00720] [Indexed: 08/24/2024]
Abstract
Artificial intelligence (AI) has emerged as a pivotal force in enhancing productivity across various sectors, with its impact being profoundly felt within the pharmaceutical and biotechnology domains. Despite AI's rapid adoption, its integration into scientific research faces resistance due to myriad challenges: the opaqueness of AI models, the intricate nature of their implementation, and the issue of data scarcity. In response to these impediments, we introduce SmartCADD, an innovative, open-source virtual screening platform that combines deep learning, computer-aided drug design (CADD), and quantum mechanics methodologies within a user-friendly Python framework. SmartCADD is engineered to streamline the construction of comprehensive virtual screening workflows that incorporate a variety of formerly independent techniques─spanning ADMET property predictions, de novo 2D and 3D pharmacophore modeling, molecular docking, to the integration of explainable AI mechanisms. This manuscript highlights the foundational principles, key functionalities, and the unique integrative approach of SmartCADD. Furthermore, we demonstrate its efficacy through a case study focused on the identification of promising lead compounds for HIV inhibition. By democratizing access to advanced AI and quantum mechanics tools, SmartCADD stands as a catalyst for progress in pharmaceutical research and development, heralding a new era of innovation and efficiency.
Affiliation(s)
- Ayesh Madushanka
- Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
| | - Eli Laird
- Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States
| | - Corey Clark
- Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States
| | - Elfi Kraka
- Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
|
20
|
Boonyarit B, Yamprasert N, Kaewnuratchadasorn P, Kinchagawat J, Prommin C, Rungrotmongkol T, Nutanong S. GraphEGFR: Multi-task and transfer learning based on molecular graph attention mechanism and fingerprints improving inhibitor bioactivity prediction for EGFR family proteins on data scarcity. J Comput Chem 2024; 45:2001-2023. [PMID: 38713612 DOI: 10.1002/jcc.27388] [Received: 01/08/2024] [Revised: 04/16/2024] [Accepted: 04/19/2024] [Indexed: 05/09/2024]
Abstract
The proteins within the human epidermal growth factor receptor (EGFR) family, members of the tyrosine kinase receptor family, play a pivotal role in the molecular mechanisms driving the development of various tumors. Tyrosine kinase inhibitors, key compounds in targeted therapy, encounter challenges in cancer treatment due to emerging drug resistance mutations. Consequently, machine learning has undergone significant evolution to address the challenges of cancer drug discovery related to EGFR family proteins. However, the application of deep learning in this area is hindered by inherent difficulties associated with small-scale data, particularly the risk of overfitting. Moreover, the design of a model architecture that facilitates learning through multi-task and transfer learning, coupled with appropriate molecular representation, poses substantial challenges. In this study, we introduce GraphEGFR, a deep learning regression model designed to enhance molecular representation and model architecture for predicting the bioactivity of inhibitors against both wild-type and mutant EGFR family proteins. GraphEGFR integrates a graph attention mechanism for molecular graphs with deep and convolutional neural networks for molecular fingerprints. We observed that GraphEGFR models employing multi-task and transfer learning strategies generally achieve predictive performance comparable to existing competitive methods. The integration of molecular graphs and fingerprints adeptly captures relationships between atoms and enables both global and local pattern recognition. We further validated potential multi-targeted inhibitors for wild-type and mutant HER1 kinases, exploring key amino acid residues through molecular dynamics simulations to understand molecular interactions. 
This predictive model offers a robust strategy that could significantly contribute to overcoming the challenges of developing deep learning models for drug discovery with limited data and exploring new frontiers in multi-targeted kinase drug discovery for EGFR family proteins.
Affiliation(s)
- Bundit Boonyarit
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
| | - Nattawin Yamprasert
- School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand
| | | | - Jiramet Kinchagawat
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
| | - Chanatkran Prommin
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
| | - Thanyada Rungrotmongkol
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
- Center of Excellence in Structural and Computational Biology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
| | - Sarana Nutanong
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
| |
Collapse
|
21
|
Liu X, Ai C, Yang H, Dong R, Tang J, Zheng S, Guo F. RetroCaptioner: beyond attention in end-to-end retrosynthesis transformer via contrastively captioned learnable graph representation. Bioinformatics 2024; 40:btae561. [PMID: 39342389 PMCID: PMC11520410 DOI: 10.1093/bioinformatics/btae561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2024] [Revised: 08/28/2024] [Accepted: 09/12/2024] [Indexed: 10/01/2024] Open
Abstract
MOTIVATION Retrosynthesis identifies available precursor molecules for various and novel compounds. With the advancements and practicality of language models, Transformer-based models have increasingly been used to automate this process. However, many existing methods struggle to efficiently capture reaction transformation information, limiting the accuracy and applicability of their predictions. RESULTS We introduce RetroCaptioner, an advanced end-to-end, Transformer-based framework featuring a Contrastive Reaction Center Captioner. This captioner guides the training of dual-view attention models using a contrastive learning approach. It leverages learned molecular graph representations to capture chemically plausible constraints within a single-step learning process. We integrate the single-encoder, dual-encoder, and encoder-decoder paradigms to effectively fuse information from the sequence and graph representations of molecules. This involves modifying the Transformer encoder into a uni-view sequence encoder and a dual-view module. Furthermore, we enhance the captioning of atomic correspondence between SMILES and graphs. Our proposed method, RetroCaptioner, achieved outstanding performance with 67.2% in top-1 and 93.4% in top-10 exact matched accuracy on the USPTO-50k dataset, alongside an exceptional SMILES validity score of 99.4%. In addition, RetroCaptioner has demonstrated its reliability in generating synthetic routes for the drug protokylol. AVAILABILITY AND IMPLEMENTATION The code and data are available at https://github.com/guofei-tju/RetroCaptioner.
Affiliation(s)
- Xiaoyi Liu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 102488, China
  - Ministry of Education, Engineering Research Center for Pharmaceutics of Chinese Materia Medica and New Drug Development, Beijing, 100102, China
- Chengwei Ai
  - Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hongpeng Yang
  - Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, 29208, United States
- Ruihan Dong
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Jijun Tang
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Nanshan, 518055, China
- Shuangjia Zheng
  - Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Fei Guo
  - Computer Science and Engineering, Central South University, Changsha, 410083, China
22
Chen R, Wang Y, Shen Z, Ye C, Guo Y, Lu Y, Ding J, Dong X, Xu D, Zheng X. Discovery of potent CSK inhibitors through integrated virtual screening and molecular dynamic simulation. Arch Pharm (Weinheim) 2024; 357:e2400066. [PMID: 38809025 DOI: 10.1002/ardp.202400066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/23/2024] [Accepted: 05/08/2024] [Indexed: 05/30/2024]
Abstract
Oncogenic overexpression or activation of C-terminal Src kinase (CSK) has been shown to play an important role in triple-negative breast cancer (TNBC) progression, including tumor initiation, growth, metastasis, and drug resistance. This finding has shifted the focus toward CSK as a potential target for novel treatments. However, few inhibitors have so far been designed to target the CSK protein. In response, our research has implemented a comprehensive virtual screening protocol. By integrating energy-based screening methods with AI-driven scoring functions, such as Attentive FP, and employing rigorous rescoring methods like Glide docking and molecular mechanics generalized Born surface area (MM/GBSA), we have systematically sought out inhibitors of CSK. This approach led to the discovery of a compound with potent CSK inhibitory activity, reflected by an IC50 value of 1.6 nM in a homogeneous time-resolved fluorescence (HTRF) bioassay. Subsequently, molecule 2 exhibited strong growth inhibition of MDA-MB-231, Hs578T, and SUM159 cells, at a level comparable to that observed with dasatinib. Treatment with molecule 2 also induced significant G1 phase accumulation and cell apoptosis. Furthermore, we have explored the explicit binding interactions of the compound with CSK using molecular dynamics simulations, providing valuable insights into its mechanism of action.
Affiliation(s)
- Roufen Chen
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, China
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yuchen Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Zheyuan Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Chenyi Ye
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, China
- Yu Guo
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yan Lu
  - Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jianjun Ding
  - School of Food Science and Technology, Jiangnan University, Wuxi, China
- Xiaowu Dong
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Donghang Xu
  - Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaoli Zheng
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, China
23
Liu Y, Zhang R, Yuan Y, Ma J, Li T, Yu Z. A Multi-view Molecular Pre-training with Generative Contrastive Learning. Interdiscip Sci 2024; 16:741-754. [PMID: 38710957 DOI: 10.1007/s12539-024-00632-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 03/20/2024] [Accepted: 04/06/2024] [Indexed: 05/08/2024]
Abstract
Molecular representation learning can preserve meaningful molecular structures as embedding vectors, which is a necessary prerequisite for molecular property prediction. Yet, learning how to accurately represent molecules remains challenging. Previous approaches to learning molecular representations in an end-to-end manner potentially suffered information loss while neglecting the utilization of molecular generative representations. To obtain rich molecular feature information, the pre-training molecular representation model utilized different molecular representations to reduce information loss caused by a single molecular representation. Therefore, we provide the MVGC, a unique multi-view generative contrastive learning pre-training model. Our pre-training framework specifically acquires knowledge of three fundamental feature representations of molecules and effectively integrates them to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks demonstrate that our proposed MVGC model surpasses the majority of state-of-the-art approaches. Moreover, we explore the potential of the MVGC model to learn the representation of molecules with chemical significance.
Affiliation(s)
- Yunwu Liu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Ruisheng Zhang
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Yongna Yuan
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Jun Ma
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Tongfeng Li
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Zhixuan Yu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
24
Lavecchia A. Navigating the frontier of drug-like chemical space with cutting-edge generative AI models. Drug Discov Today 2024; 29:104133. [PMID: 39103144 DOI: 10.1016/j.drudis.2024.104133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 07/20/2024] [Accepted: 07/31/2024] [Indexed: 08/07/2024]
Abstract
Deep generative models (GMs) have transformed the exploration of drug-like chemical space (CS) by generating novel molecules through complex, nontransparent processes, bypassing direct structural similarity. This review examines five key architectures for CS exploration: recurrent neural networks (RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows (NF), and Transformers. It discusses molecular representation choices, training strategies for focused CS exploration, evaluation criteria for CS coverage, and related challenges. Future directions include refining models, exploring new notations, improving benchmarks, and enhancing interpretability to better understand biologically relevant molecular properties.
Affiliation(s)
- Antonio Lavecchia
  - 'Drug Discovery' Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy
25
Liang L, Liu Z, Yang X, Zhang Y, Liu H, Chen Y. Prediction of blood-brain barrier permeability using machine learning approaches based on various molecular representation. Mol Inform 2024; 43:e202300327. [PMID: 38864837 DOI: 10.1002/minf.202300327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 03/18/2024] [Accepted: 04/18/2024] [Indexed: 06/13/2024]
Abstract
The assessment of compound blood-brain barrier (BBB) permeability poses a significant challenge in the discovery of drugs targeting the central nervous system. Conventional experimental approaches to measure BBB permeability are labor-intensive, cost-ineffective, and time-consuming. In this study, we constructed six machine learning classification models by combining various machine learning algorithms and molecular representations. The model based on ExtraTree algorithm and random partitioning strategy obtains the best prediction result, with AUC value of 0.932±0.004 and balanced accuracy (BA) of 0.837±0.010 for the test set. We employed the SHAP method to identify important features associated with BBB permeability. In addition, matched molecular pair (MMP) analysis and representative substructure derivation method were utilized to uncover the transformation rules and distinctive structural features of BBB permeable compounds. The machine learning models proposed in this work can serve as an effective tool for assessing BBB permeability in the drug discovery for central nervous system disease.
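The balanced accuracy (BA) reported above is conventionally defined as the mean of sensitivity and specificity. A minimal sketch of that definition follows, assuming binary labels with 1 for BBB-permeable and 0 for non-permeable compounds; the label encoding and the function name `balanced_accuracy` are illustrative assumptions, not details taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0).

    Assumes binary labels encoded as 1 (positive) and 0 (negative);
    this encoding is an illustrative choice, not the paper's.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")

    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    sensitivity = true_pos / positives
    specificity = true_neg / negatives
    return 0.5 * (sensitivity + specificity)


if __name__ == "__main__":
    # Toy labels only; not data from the study.
    labels = [1, 1, 1, 1, 0, 0]
    preds = [1, 1, 1, 0, 0, 1]
    print(balanced_accuracy(labels, preds))
```

Unlike plain accuracy, this metric weights both classes equally, which is why it is often reported alongside AUC when the permeable and non-permeable classes are imbalanced.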
Affiliation(s)
- Li Liang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Zhiwen Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Xinyi Yang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
26
Maryam, Rehman MU, Hussain I, Tayara H, Chong KT. A graph neural network approach for predicting drug susceptibility in the human microbiome. Comput Biol Med 2024; 179:108729. [PMID: 38955124 DOI: 10.1016/j.compbiomed.2024.108729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 06/04/2024] [Accepted: 06/08/2024] [Indexed: 07/04/2024]
Abstract
Recent studies have illuminated the critical role of the human microbiome in maintaining health and influencing the pharmacological responses of drugs. Clinical trials, encompassing approximately 150 drugs, have unveiled interactions with the gastrointestinal microbiome, resulting in the conversion of these drugs into inactive metabolites. It is imperative to explore the field of pharmacomicrobiomics during the early stages of drug discovery, prior to clinical trials. To achieve this, the utilization of machine learning and deep learning models is highly desirable. In this study, we have proposed graph-based neural network models, namely GCN, GAT, and GINCOV models, utilizing the SMILES dataset of drug microbiome. Our primary objective was to classify the susceptibility of drugs to depletion by gut microbiota. Our results indicate that the GINCOV surpassed the other models, achieving impressive performance metrics, with an accuracy of 93% on the test dataset. This proposed Graph Neural Network (GNN) model offers a rapid and efficient method for screening drugs susceptible to gut microbiota depletion and also encourages the improvement of patient-specific dosage responses and formulations.
Affiliation(s)
- Maryam
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Mobeen Ur Rehman
  - Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, United Arab Emirates
- Irfan Hussain
  - Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, United Arab Emirates
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea; Advances Electronics and Information Research Centre, Jeonbuk National University, Jeonju, 54896, South Korea
27
Zheng T, Mitchell JBO, Dobson S. Revisiting the Application of Machine Learning Approaches in Predicting Aqueous Solubility. ACS OMEGA 2024; 9:35209-35222. [PMID: 39157153 PMCID: PMC11325511 DOI: 10.1021/acsomega.4c06163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 07/19/2024] [Accepted: 07/22/2024] [Indexed: 08/20/2024]
Abstract
The solubility of chemical substances in water is a critical parameter in pharmaceutical development, environmental chemistry, agrochemistry, and other fields; however, accurately predicting it remains a challenge. This study aims to evaluate and compare the effectiveness of some of the most popular machine learning modeling methods and molecular featurization techniques in predicting aqueous solubility. Although these methods were not implemented in a competitive environment, some of their performance surpassed previous benchmarks, offering gradual but significant improvements. Our results show that methods based on graph convolution and graph attention mechanisms demonstrated exceptional predictive abilities with high-quality data sets, albeit with a sensitivity to data noise and errors. In contrast, models leveraging molecular descriptors not only provided better interpretability but also showed more resilience when dealing with inherent noise and errors in data. Our analysis of over 4000 molecular descriptors used in various models identified that approximately 800 of these descriptors make a significant contribution to solubility prediction. These insights offer guidance and direction for future developments in solubility prediction.
Affiliation(s)
- Tianyuan Zheng
  - School of Computer Science, University of St Andrews, St Andrews, Fife KY16 9SX, U.K.
- John B. O. Mitchell
  - EaStCHEM School of Chemistry, University of St Andrews, St Andrews, Fife KY16 9ST, U.K.
- Simon Dobson
  - School of Computer Science, University of St Andrews, St Andrews, Fife KY16 9SX, U.K.
28
Overstreet R, King E, Clopton G, Nguyen J, Ciesielski D. QC-GN²oMS²: a Graph Neural Net for High Resolution Mass Spectra Prediction. J Chem Inf Model 2024; 64:5806-5816. [PMID: 39013165 DOI: 10.1021/acs.jcim.4c00446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024]
Abstract
Predicting the mass spectrum of a molecular ion is often accomplished via three generalized approaches: rules-based methods for bond breaking, deep learning, or quantum chemical (QC) modeling. Rules-based approaches are often limited by the conditions for different chemical subspaces and perform poorly under chemical regimes with few defined rules. QC modeling is theoretically robust but requires significant amounts of computational time to produce a spectrum for a given target. Among deep learning techniques, graph neural networks (GNNs) have performed better than previous work with fingerprint-based neural networks in mass spectra prediction. To explore this technique further, we investigate the effects of including quantum chemically derived information as edge features in the GNN to increase predictive accuracy. The models we investigated include categorical bond order, bond force constants derived from extended tight-binding (xTB) quantum chemistry, and acyclic bond dissociation energies. We evaluated these models against a control GNN with no edge features in the input graphs. Bond dissociation enthalpies yielded the best improvement with a cosine similarity score of 0.462 relative to the baseline model (0.437). In this work we also apply dynamic graph attention which improves performance on benchmark problems and supports the inclusion of edge features. Between implementations, we investigate the nature of the molecular embedding for spectra prediction and discuss the recognition of fragment topographies in distinct chemistries for further development in tandem mass spectrometry prediction.
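The cosine similarity scores quoted above (0.462 versus 0.437) compare a predicted spectrum with a reference spectrum. A minimal sketch of the plain cosine measure follows, assuming both spectra have already been binned onto the same m/z grid so that they are equal-length intensity vectors; the binning and the function name `cosine_similarity` are illustrative assumptions, and the paper may apply additional weighting or preprocessing that is not shown here.

```python
from math import sqrt


def cosine_similarity(predicted, reference):
    """Cosine of the angle between two equal-length intensity vectors.

    Assumes both spectra share the same m/z bins (an illustrative
    assumption). Returns 0.0 if either vector is all zeros.
    """
    if len(predicted) != len(reference):
        raise ValueError("spectra must be binned onto the same grid")

    dot = sum(p * r for p, r in zip(predicted, reference))
    norm_pred = sqrt(sum(p * p for p in predicted))
    norm_ref = sqrt(sum(r * r for r in reference))

    if norm_pred == 0.0 or norm_ref == 0.0:
        return 0.0
    return dot / (norm_pred * norm_ref)


if __name__ == "__main__":
    # Toy intensity vectors only; not spectra from the study.
    predicted = [0.0, 0.2, 0.7, 0.1]
    reference = [0.0, 0.1, 0.8, 0.0]
    print(cosine_similarity(predicted, reference))
```

Because the measure is scale-invariant, spectra normalized to different base-peak intensities yield the same score.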
Affiliation(s)
- Richard Overstreet
  - Signature Science and Technology Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ethan King
  - Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Grady Clopton
  - Department of Chemistry, Tennessee State University, Nashville, Tennessee 37209, United States
- Julia Nguyen
  - Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Danielle Ciesielski
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
29
Zhang Q, Mao D, Tu Y, Wu YY. A New Fingerprint and Graph Hybrid Neural Network for Predicting Molecular Properties. J Chem Inf Model 2024; 64:5853-5866. [PMID: 39052623 DOI: 10.1021/acs.jcim.4c00586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Machine learning plays a role in accelerating drug discovery, and the design of effective machine learning models is crucial for accurately predicting molecular properties. Characterizing molecules typically involves the use of molecular fingerprints and molecular graphs. These are input into a multilayer perceptron (MLP) and variants of graph neural networks, such as graph attention networks (GATs). Due to the diverse types and large dimension of fingerprints, models may contain many features that are relatively irrelevant or redundant; meanwhile, although the GAT excels in handling heterogeneous graph tasks, it lacks the ability to extract collaborative information from neighboring nodes, which is crucial in scenarios where it cannot capture the joint influence of adjacent groups on atoms. To overcome these challenges, we introduce a hybrid model, combining improved GAT and MLP. In GAT, the recurrent neural network is employed to capture collaborative information. To address the dimensionality issue, we propose a feature selection algorithm, which is based on the principle of maximizing relevance while minimizing redundancy. Through experiments on 13 public data sets and 14 breast cell lines, our model demonstrates superior performance compared to state-of-the-art deep learning and traditional machine learning algorithms. Additionally, a series of ablation experiments were conducted to demonstrate the advantages of our improved version, as well as its antinoise capability and interpretability. These results indicate that our model holds promising prospects for practical applications.
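The feature selection principle mentioned above (maximize relevance, minimize redundancy) can be illustrated with a greedy selection loop. The sketch below is a generic version under stated assumptions: it scores relevance and redundancy with absolute Pearson correlation, which is a simplification chosen here for illustration, and the names `pearson` and `mrmr_select` are hypothetical. The abstract does not specify the authors' actual criterion or algorithm.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_select(features, target, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping feature name -> list of values.
    Relevance is |corr(feature, target)|; redundancy is the mean
    |corr(feature, already-selected feature)|. Both choices are
    illustrative assumptions.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: "b" duplicates the information in "a"; "c" is weaker but independent.
    toy = {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],
        "c": [1, -1, 1, -1, 1],
    }
    y = [1.5, 1.5, 3.5, 3.5, 5.5]
    print(mrmr_select(toy, y, 2))
```

In the toy data, the second pick favors the weaker but independent feature over the strongly relevant duplicate, which is the behavior the relevance-redundancy trade-off is meant to produce.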
Affiliation(s)
- Qingtian Zhang
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Dangxin Mao
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yusong Tu
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yuan-Yan Wu
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
30
Chilingaryan G, Tamoyan H, Tevosyan A, Babayan N, Hambardzumyan K, Navoyan Z, Aghajanyan A, Khachatrian H, Khondkaryan L. BartSmiles: Generative Masked Language Models for Molecular Representations. J Chem Inf Model 2024; 64:5832-5843. [PMID: 39054761 DOI: 10.1021/acs.jcim.4c00512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
We discover a robust self-supervised strategy tailored toward molecular representations for generative masked language models through a series of tailored, in-depth ablations. Using this pretraining strategy, we train BARTSmiles, a BART-like model with an order of magnitude more compute than previous self-supervised molecular representations. In-depth evaluations show that BARTSmiles consistently outperforms other self-supervised representations across classification, regression, and generation tasks, setting a new state-of-the-art on eight tasks. We then show that when applied to the molecular domain, the BART objective learns representations that implicitly encode our downstream tasks of interest. For example, by selecting seven neurons from a frozen BARTSmiles, we can obtain a model having performance within two percentage points of the full fine-tuned model on task Clintox. Lastly, we show that standard attribution interpretability methods, when applied to BARTSmiles, highlight certain substructures that chemists use to explain specific properties of molecules. The code and pretrained model are publicly available.
Affiliation(s)
- Ani Tevosyan
  - YerevaNN, Charents str. 20, 0025 Yerevan, Armenia
  - Toxometris.ai, Sarmen str. 7, 0019 Yerevan, Armenia
- Nelly Babayan
  - Institute of Molecular Biology, NAS RA, Hasratyan 7, 0014 Yerevan, Armenia
  - Toxometris.ai, Sarmen str. 7, 0019 Yerevan, Armenia
- Armen Aghajanyan
  - Meta AI Research, 1 Hacker Wy, Menlo Park, California 94025, United States
- Hrant Khachatrian
  - YerevaNN, Charents str. 20, 0025 Yerevan, Armenia
  - Yerevan State University, Alex Manoogian str. 1, 0025 Yerevan, Armenia
- Lusine Khondkaryan
  - Institute of Molecular Biology, NAS RA, Hasratyan 7, 0014 Yerevan, Armenia
  - Toxometris.ai, Sarmen str. 7, 0019 Yerevan, Armenia
31
Lu S, Huang Y, Shen WX, Cao YL, Cai M, Chen Y, Tan Y, Jiang YY, Chen YZ. Raman spectroscopic deep learning with signal aggregated representations for enhanced cell phenotype and signature identification. PNAS NEXUS 2024; 3:pgae268. [PMID: 39192845 PMCID: PMC11348106 DOI: 10.1093/pnasnexus/pgae268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 06/21/2024] [Indexed: 08/29/2024]
Abstract
Feature representation is critical for data learning, particularly in learning spectroscopic data. Machine learning (ML) and deep learning (DL) models learn Raman spectra for rapid, nondestructive, and label-free cell phenotype identification, which facilitate diagnostic, therapeutic, forensic, and microbiological applications. But these are challenged by high-dimensional, unordered, and low-sample spectroscopic data. Here, we introduced novel 2D image-like dual signal and component aggregated representations by restructuring Raman spectra and principal components, which enables spectroscopic DL for enhanced cell phenotype and signature identification. New ConvNet models DSCARNets significantly outperformed the state-of-the-art (SOTA) ML and DL models on six benchmark datasets, mostly with >2% improvement over the SOTA performance of 85-97% accuracies. DSCARNets also performed well on four additional datasets against SOTA models of extremely high performances (>98%) and two datasets without a published supervised phenotype classification model. Explainable DSCARNets identified Raman signatures consistent with experimental indications.
Affiliation(s)
- Songlin Lu
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China
  - Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, 9 Kexue Avenue, Guangming District, Shenzhen 518132, Guangdong, P. R. China
- Yuanfang Huang
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China
- Wan Xiang Shen
  - Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
- Yu Lin Cao
  - Tangyi and Tsinghua Shenzhen International Graduate School Collaborative Program, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China
- Mengna Cai
  - Tangyi and Tsinghua Shenzhen International Graduate School Collaborative Program, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China
- Yan Chen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China
  - Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen 518057, Guangdong, P. R. China
- Ying Tan
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China
  - Institute of Drug Discovery Technology, Ningbo University, 818 Fenghua Road, Ningbo 315211, Zhejiang, P. R. China
- Yu Yang Jiang
  - School of Pharmaceutical Sciences, Tsinghua University, 30 Shuangqing Road, Haidian District, Beijing 100084, P. R. China
- Yu Zong Chen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China
  - Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, 9 Kexue Avenue, Guangming District, Shenzhen 518132, Guangdong, P. R. China
32
Aksamit N, Tchagang A, Li Y, Ombuki-Berman B. Hybrid fragment-SMILES tokenization for ADMET prediction in drug discovery. BMC Bioinformatics 2024; 25:255. [PMID: 39090573 PMCID: PMC11295479 DOI: 10.1186/s12859-024-05861-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 07/10/2024] [Indexed: 08/04/2024] Open
Abstract
BACKGROUND Drug discovery and development is the extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches provide an opportunity to improve each step of the drug discovery and development process, in which the first question faced by us is how a molecule can be informatively represented such that the in-silico solutions are optimized. RESULTS This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, utilizing a Transformer-based model. We investigate the efficacy of hybrid tokenization in improving the performance of ADMET prediction tasks. Our approach leverages MTL-BERT, an encoder-only Transformer model that achieves state-of-the-art ADMET predictions, and contrasts the standard SMILES tokenization with our hybrid method across a spectrum of fragment library cutoffs. CONCLUSION The findings reveal that while an excess of fragments can impede performance, using hybrid tokenization with high frequency fragments enhances results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features within the training of Transformer models for ADMET property prediction.
Affiliation(s)
- Nicholas Aksamit
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
- Alain Tchagang
  - Digital Technologies Research Centre, National Research Council Canada, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
- Yifeng Li
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
  - Department of Biological Sciences, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
- Beatrice Ombuki-Berman
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
|
33
|
Guzman-Pando A, Ramirez-Alonso G, Arzate-Quintana C, Camarillo-Cisneros J. Deep learning algorithms applied to computational chemistry. Mol Divers 2024; 28:2375-2410. [PMID: 38151697 DOI: 10.1007/s11030-023-10771-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 11/14/2023] [Indexed: 12/29/2023]
Abstract
Recently, there has been a significant increase in the use of deep learning techniques in the molecular sciences, which have shown high performance on datasets and the ability to generalize across data. However, no model has achieved perfect performance in solving all problems, and the pros and cons of each approach remain unclear to those new to the field. Therefore, this paper aims to review deep learning algorithms that have been applied to solve molecular challenges in computational chemistry. We propose a comprehensive categorization that encompasses two primary approaches: conventional deep learning and geometric deep learning models. This classification takes into account the distinct techniques employed by the algorithms within each approach. We present an up-to-date analysis of these algorithms, emphasizing their key features and open issues. This includes details of input descriptors, datasets used, open-source code availability, task solutions, and actual research applications, focusing on general applications rather than specific ones such as drug discovery. Furthermore, our report discusses trends and future directions in molecular algorithm design, including the input descriptors used for each deep learning model, GPU usage, training and forward processing time, model parameters, the most commonly used datasets, libraries, and optimization schemes. This information aids in identifying the most suitable algorithms for a given task. It also serves as a reference for the datasets and input data frequently used for each algorithm technique. In addition, it provides insights into the benefits and open issues of each technique, and supports the development of novel computational chemistry systems.
Affiliation(s)
- Abimael Guzman-Pando
  - Computational Chemistry Physics Laboratory, Facultad de Medicina y Ciencias Biomédicas, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
- Graciela Ramirez-Alonso
  - Faculty of Engineering, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
- Carlos Arzate-Quintana
  - Computational Chemistry Physics Laboratory, Facultad de Medicina y Ciencias Biomédicas, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
- Javier Camarillo-Cisneros
  - Computational Chemistry Physics Laboratory, Facultad de Medicina y Ciencias Biomédicas, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
|
34
|
Zhao X, Kong Y, Ji Y, Xin X, Chen L, Chen G, Yu C. Classification models for predicting the bioactivity of pan-TRK inhibitors and SAR analysis. Mol Divers 2024; 28:2077-2097. [PMID: 37910346 DOI: 10.1007/s11030-023-10735-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/22/2023] [Indexed: 11/03/2023]
Abstract
Tropomyosin receptor kinases (TRKs) are important broad-spectrum anticancer targets. The oncogenic rearrangement of the NTRK gene disrupts the extracellular structural domain and epitopes for therapeutic antibodies, making small-molecule inhibitors essential for treating NTRK fusion-driven tumors. In this work, several algorithms were used to construct descriptor-based and nondescriptor-based models, and the models were evaluated by outer 10-fold cross-validation. To find a model with good generalization ability, the dataset was partitioned by random and cluster-splitting methods to construct in- and cross-domain models, respectively. Among the 48 models built, the model with the combination of the deep neural network (DNN) algorithm and extended connectivity fingerprints 4 (ECFP4) descriptors achieved excellent performance in both dataset divisions. The results indicate that the DNN algorithm has a strong generalization prediction ability, and the richness of features plays a vital role in predicting unknown spatial molecules. Additionally, we combined the clustering results and decision tree models of fingerprint descriptors to perform structure-activity relationship analysis. It was found that nitrogen-containing aromatic heterocyclic and benzo heterocyclic structures play a crucial role in enhancing the activity of TRK inhibitors.
Affiliation(s)
- Xiaoman Zhao
  - College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
  - College of Bio engineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
- Yue Kong
  - College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Yueshan Ji
  - College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Xiulan Xin
  - College of Bio engineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
- Liang Chen
  - College of Bio engineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
- Guang Chen
  - College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Changyuan Yu
  - College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
|
35
|
Lv Q, Chen G, Yang Z, Zhong W, Chen CYC. Meta Learning With Graph Attention Networks for Low-Data Drug Discovery. IEEE Trans Neural Netw Learn Syst 2024; 35:11218-11230. [PMID: 37028032 DOI: 10.1109/tnnls.2023.3250324] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/19/2023]
Abstract
Finding candidate molecules with favorable pharmacological activity, low toxicity, and proper pharmacokinetic properties is an important task in drug discovery. Deep neural networks have made impressive progress in accelerating and improving drug discovery. However, these techniques rely on a large amount of labeled data to form accurate predictions of molecular properties. At each stage of the drug discovery pipeline, usually only limited biological data on candidate molecules and their derivatives are available, indicating that the application of deep neural networks for low-data drug discovery is still a formidable challenge. Here, we propose a meta learning architecture with a graph attention network, Meta-GAT, to predict molecular properties in low-data drug discovery. The GAT captures the local effects of atomic groups at the atom level through the triple attentional mechanism and implicitly captures the interactions between different atomic groups at the molecular level. GAT is used to perceive the molecular chemical environment and connectivity, thereby effectively reducing sample complexity. Meta-GAT further develops a meta learning strategy based on bilevel optimization, which transfers meta knowledge from other attribute prediction tasks to low-data target tasks. In summary, our work demonstrates how meta learning can reduce the amount of data required to make meaningful predictions of molecules in low-data scenarios. Meta learning is likely to become the new learning paradigm in low-data drug discovery. The source code is publicly available at: https://github.com/lol88/Meta-GAT.
|
36
|
Hao Y, Li B, Huang D, Wu S, Wang T, Fu L, Liu X. Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery. Int J Mol Sci 2024; 25:8239. [PMID: 39125808 PMCID: PMC11312053 DOI: 10.3390/ijms25158239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 07/26/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue for virtual screening, especially for the discovery of MTDLs.
Affiliation(s)
- Yang Hao
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Bo Li
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Daiyun Huang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - School of Life Sciences, Fudan University, Shanghai 200092, China
- Sijin Wu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Tianjun Wang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Lei Fu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Xin Liu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
|
37
|
Hou L, Xiang H, Zeng X, Cao D, Zeng L, Song B. Attribute-guided prototype network for few-shot molecular property prediction. Brief Bioinform 2024; 25:bbae394. [PMID: 39133096 PMCID: PMC11318080 DOI: 10.1093/bib/bbae394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/08/2024] [Accepted: 07/27/2024] [Indexed: 08/13/2024] Open
Abstract
Molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario, which aims to identify unseen properties with only a few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces a molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.
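For readers unfamiliar with prototype networks in general, a minimal sketch of the classification step is given below: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The 2-D embeddings, labels, and function names are made up for illustration, and the sketch does not include APN's attribute extractor or its attribute-guided attention module.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy 2-way, 2-shot episode with made-up molecule embeddings.
support = {
    "active":   [[0.9, 0.1], [0.7, 0.3]],
    "inactive": [[0.1, 0.8], [0.3, 0.6]],
}
prototypes = build_prototypes(support)
print(classify([0.8, 0.2], prototypes))
```

In a trained model the embeddings would come from a learned encoder, and the distances are typically turned into class probabilities with a softmax over negative distances.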
Affiliation(s)
- Linlin Hou
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Hongxin Xiang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410083, China
- Li Zeng
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Bosheng Song
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
|
38
|
Guasch L, Maeder N, Cumming JG, Kramer C. From mundane to surprising nonadditivity: drivers and impact on ML models. J Comput Aided Mol Des 2024; 38:26. [PMID: 39052103 DOI: 10.1007/s10822-024-00566-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Accepted: 07/16/2024] [Indexed: 07/27/2024]
Abstract
Nonadditivity (NA) in Structure-Activity and Structure-Property Relationship (SAR) data is a rare but very information-rich phenomenon. It can indicate conformational flexibility, structural rearrangements, and errors in assay results and structural assignment. While purely ligand-based conformational causes of NA are rather well understood and mundane, other factors are less so and cause surprising NA that has a huge influence on SAR analysis and ML model performance. Here we report a systematic analysis across a wide range of properties (20 on-target biological activities and 4 physicochemical ADME-related properties) to understand the frequency of the various phenomena that may lead to NA. A set of novel descriptors was developed to characterize double transformation cycles and identify trends in NA. Double transformation cycles were classified into "surprising" and "mundane" categories, with the majority being classed as mundane. We also examined commonalities among surprising cycles, finding LogP differences to have the most significant impact on NA. A distinct behavior of NA for on-target sets compared to ADME sets was observed. Finally, we show that machine learning models struggle with highly nonadditive data, indicating that a better understanding of NA is an important future research direction.
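To make the notion of a double transformation cycle concrete, the sketch below computes nonadditivity for four compounds: a parent, the parent with substituent R1, the parent with R2, and the compound carrying both. It uses the common definition in which NA is the effect of adding R1 in the presence of R2 minus the effect of adding R1 alone; the function name and the pIC50 values are hypothetical, and the paper's descriptors and its "surprising" versus "mundane" classification are not reproduced.

```python
def nonadditivity(p_parent, p_r1, p_r2, p_both):
    """Difference between the effect of adding R1 with and without R2 present.

    All inputs are activities on a log scale (e.g. pIC50).
    Zero means the two substituent effects are perfectly additive.
    """
    effect_r1_alone = p_r1 - p_parent
    effect_r1_with_r2 = p_both - p_r2
    return effect_r1_with_r2 - effect_r1_alone

# Hypothetical pIC50 values for two cycles.
print(nonadditivity(5.0, 6.0, 5.5, 6.5))  # additive cycle
print(nonadditivity(5.0, 6.0, 5.5, 8.0))  # strongly nonadditive cycle
```

Because the result combines four measurements, its experimental uncertainty is larger than that of a single assay value, which is one reason small NA values are usually not interpreted.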
Affiliation(s)
- Laura Guasch
  - Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche AG, Basel, 4070, Switzerland
- Niels Maeder
  - Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche AG, Basel, 4070, Switzerland
- John G Cumming
  - Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche AG, Basel, 4070, Switzerland
- Christian Kramer
  - Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche AG, Basel, 4070, Switzerland
|
39
|
Miao R, Liu D, Mao L, Chen X, Zhang L, Yuan Z, Shi S, Li H, Li S. GR-pKa: a message-passing neural network with retention mechanism for pKa prediction. Brief Bioinform 2024; 25:bbae408. [PMID: 39171986 PMCID: PMC11339865 DOI: 10.1093/bib/bbae408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 07/26/2024] [Accepted: 08/01/2024] [Indexed: 08/23/2024] Open
Abstract
During the drug discovery and design process, the acid-base dissociation constant (pKa) of a molecule is critical because of its role in influencing the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties and biological activity. However, the experimental determination of pKa values is often laborious and complex. Moreover, existing prediction methods exhibit limitations in both the quantity and quality of the training data, as well as in their capacity to handle the complex structural and physicochemical properties of compounds, consequently impeding accuracy and generalization. Therefore, developing a method that can quickly and accurately predict molecular pKa values will to some extent help the structural modification of molecules, and thus assist the development process of new drugs. In this study, we developed a pKa prediction model named GR-pKa (Graph Retention pKa), leveraging a message-passing neural network and employing a multi-fidelity learning strategy to accurately predict molecular pKa values. The GR-pKa model incorporates five quantum mechanical properties related to molecular thermodynamics and dynamics as key features to characterize molecules. Notably, we introduced a novel retention mechanism into the message-passing phase, which significantly improves the model's ability to capture and update molecular information. Our GR-pKa model outperforms several state-of-the-art models in predicting macro-pKa values, achieving a mean absolute error of 0.490, a root mean square error of 0.588, and an R² of 0.937 on the SAMPL7 dataset.
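The three figures quoted at the end of the abstract are standard regression metrics. As a reminder of how they are usually computed, here is a minimal sketch; the function name and the pKa values are made up, and it assumes R² means the coefficient of determination, which the abstract does not state (it could also be a squared correlation coefficient).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Made-up pKa values, not taken from SAMPL7.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

RMSE is never smaller than MAE for the same predictions, so a reported pair such as 0.490 and 0.588 is internally consistent; the gap between them grows as errors become more uneven.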
Affiliation(s)
- Runyu Miao
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Danlin Liu
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
  - School of Computer Science and Technology, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
- Liyun Mao
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Xingyu Chen
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Leihao Zhang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Zhen Yuan
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Shanshan Shi
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
  - Lingang Laboratory, No. 319, Yueyang Road, Xuhui District, Shanghai, 200031, China
- Shiliang Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
  - Department of Pain management, HuaDong Hospital affiliated to Fudan University, No. 221, West Yan'an Road, Jing'an District, Shanghai, 200040, China
|
40
|
Ramani V, Karmakar T. Graph Neural Networks for Predicting Solubility in Diverse Solvents Using MolMerger Incorporating Solute-Solvent Interactions. J Chem Theory Comput 2024. [PMID: 39041858 DOI: 10.1021/acs.jctc.4c00382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024]
Abstract
The prediction of solubility is a complex and challenging physicochemical problem that has tremendous implications for the chemical and pharmaceutical industry. Recent advancements in machine learning methods have opened up considerable scope for reliably predicting the solubility of a large number of molecular systems. However, most of these methods rely on using physical properties obtained from experiments and expensive quantum chemical calculations. Here, we developed a method that utilizes a graphical representation of solute-solvent interactions using "MolMerger," which captures the strongest polar interactions between molecules using Gasteiger charges and creates a graph incorporating the true nature of the system. Using these graphs as input, a neural network learns the correlation between the structural properties of a molecule in the form of node embedding and its physicochemical properties as the output. This approach has been used to calculate molecular solubility by predicting the log solubility values of various organic molecules and pharmaceuticals in diverse sets of solvents.
Affiliation(s)
- Vansh Ramani
  - Department of Chemical Engineering, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
- Tarak Karmakar
  - Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
|
41
|
Chen S, Xie J, Ye R, Xu DD, Yang Y. Structure-aware dual-target drug design through collaborative learning of pharmacophore combination and molecular simulation. Chem Sci 2024; 15:10366-10380. [PMID: 38994407 PMCID: PMC11234869 DOI: 10.1039/d4sc00094c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 06/09/2024] [Indexed: 07/13/2024] Open
Abstract
Dual-target drug design has gained significant attention in the treatment of complex diseases, such as cancers and autoimmune disorders. A widely employed design strategy is combining pharmacophores to leverage the knowledge of structure-activity relationships of both targets. Unfortunately, pharmacophore combination often struggles with long and expensive trial and error, because the protein pockets of the two targets impose complex structural constraints. In this study, we propose AIxFuse, a structure-aware dual-target drug design method that learns pharmacophore fusion patterns to satisfy the dual-target structural constraints simulated by molecular docking. AIxFuse employs two self-play reinforcement learning (RL) agents to learn pharmacophore selection and fusion by comprehensive feedback including dual-target molecular docking scores. Collaboratively, the molecular docking scores are learned by active learning (AL). Through collaborative RL and AL, AIxFuse learns to generate molecules with multiple desired properties. AIxFuse is shown to outperform state-of-the-art methods in generating dual-target drugs against glycogen synthase kinase-3 beta (GSK3β) and c-Jun N-terminal kinase 3 (JNK3). When applied to another task against retinoic acid receptor-related orphan receptor γ-t (RORγt) and dihydroorotate dehydrogenase (DHODH), AIxFuse exhibits consistent performance while compared methods suffer from performance drops, leading to a 5 times higher performance in success rate. Docking studies demonstrate that AIxFuse can generate molecules concurrently satisfying the binding mode required by both targets. Further free energy perturbation calculation indicates that the generated candidates have promising binding free energies against both targets.
Affiliation(s)
- Sheng Chen
  - School of Computer Science and Engineering, Sun Yat-sen University Guangzhou 510006 China
  - AixplorerBio Inc. Jiaxing 314031 China
- Junjie Xie
  - School of Computer Science and Engineering, Sun Yat-sen University Guangzhou 510006 China
  - AixplorerBio Inc. Jiaxing 314031 China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University Guangzhou 510006 China
|
42
|
Guo X, Zhao X, Lu X, Zhao L, Zeng Q, Chen F, Zhang Z, Xu M, Feng S, Fan T, Wei W, Zhang X, Pang J, You X, Song D, Wang Y, Jiang J. A deep learning-driven discovery of berberine derivatives as novel antibacterial against multidrug-resistant Helicobacter pylori. Signal Transduct Target Ther 2024; 9:183. [PMID: 38972904 PMCID: PMC11228022 DOI: 10.1038/s41392-024-01895-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/17/2024] [Accepted: 06/14/2024] [Indexed: 07/09/2024] Open
Abstract
Helicobacter pylori (H. pylori) is currently recognized as the primary carcinogenic pathogen associated with gastric tumorigenesis, and its high prevalence and resistance make it difficult to tackle. A graph neural network-based deep learning model, employing different training sets of 13,638 molecules for pre-training and fine-tuning, aided the prediction and exploration of novel molecules against H. pylori. A positively predicted novel berberine derivative 8 with 3,13-disubstituted alkene exhibited potency against all tested drug-susceptible and resistant H. pylori strains with minimum inhibitory concentrations (MICs) of 0.25-0.5 μg/mL. Pharmacokinetic studies demonstrated an ideal gastric retention of 8, with the stomach concentration significantly higher than its MIC at 24 h post dose. Oral administration of 8 and omeprazole (OPZ) showed a comparable gastric bacterial reduction (2.2-log reduction) to the triple-therapy, namely OPZ + amoxicillin (AMX) + clarithromycin (CLA), without obvious disturbance of the intestinal flora. A combination of OPZ, AMX, CLA, and 8 could further decrease the bacterial load (2.8-log reduction). More importantly, the mono-therapy of 8 exhibited comparable eradication to both triple-therapy (OPZ + AMX + CLA) and quadruple-therapy (OPZ + AMX + CLA + bismuth citrate) groups. SecA and BamD, which play a major role in outer membrane protein (OMP) transport and assembly, were identified and verified as the direct targets of 8 by employing the chemoproteomics technique. In summary, by targeting the relatively conserved OMP transport and assembly system, 8 has the potential to be developed as a novel anti-H. pylori candidate, especially for the eradication of drug-resistant strains.
Collapse
Affiliation(s)
- Xixi Guo
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Xiaosa Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
| | - Xi Lu
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Liping Zhao
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Qingxuan Zeng
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Fenbei Chen
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Zhimeng Zhang
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Mengyi Xu
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Shijiao Feng
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Tianyun Fan
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Wei Wei
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
| | - Xin Zhang
- Department of Pharmacy, Affiliated Hospital of Jining Medical University, Jining Medical University, Jining, 272029, Shandong, China
| | - Jing Pang
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China.
| | - Xuefu You
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China.
| | - Danqing Song
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China.
| | - Yanxiang Wang
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China.
- Institute of Health and Medicine, Hefei Comprehensive National Science Center, Hefei, 230601, Anhui, China.
| | - Jiandong Jiang
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050, Beijing, China
|
43
|
Li Z, Qu N, Zhou J, Sun J, Ren Q, Meng J, Wang G, Wang R, Liu J, Chen Y, Zhang S, Zheng M, Li X. KinomeMETA: a web platform for kinome-wide polypharmacology profiling with meta-learning. Nucleic Acids Res 2024; 52:W489-W497. [PMID: 38752486 PMCID: PMC11223815 DOI: 10.1093/nar/gkae380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 04/21/2024] [Accepted: 04/26/2024] [Indexed: 07/06/2024] Open
Abstract
Kinase-targeted inhibitors hold promise for new therapeutic options, with multi-target inhibitors offering the potential for broader efficacy while minimizing polypharmacology risks. However, comprehensive experimental profiling of kinome-wide activity is expensive, and existing computational approaches often lack scalability or accuracy for understudied kinases. We introduce KinomeMETA, an artificial intelligence (AI)-powered web platform that scalably predicts the polypharmacological effects of small molecules across the kinome. By leveraging a novel meta-learning algorithm, KinomeMETA efficiently utilizes sparse activity data, enabling rapid generalization to new kinase tasks even with limited information. This expands the repertoire of accurately predictable kinases to 661 wild-type and clinically relevant mutant kinases, far exceeding existing methods. Additionally, KinomeMETA lets users customize models with their proprietary data for specific research needs. Case studies demonstrate its ability to discover new active compounds by quickly adapting to small datasets. Overall, KinomeMETA offers enhanced kinome virtual profiling capabilities and is positioned as a powerful tool for developing new kinase inhibitors and advancing kinase research. The KinomeMETA server is freely accessible without registration at https://kinomemeta.alphama.com.cn/.
Affiliation(s)
- Zhaojun Li
- College of Computer and Information Engineering, Dezhou University, Dezhou City 253023, China
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City 215000, China
| | - Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Jingyi Zhou
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lingang Laboratory, Shanghai 200031, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Jingjing Sun
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Qun Ren
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
| | - Jingyi Meng
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
| | - Guangchao Wang
- College of Computer and Information Engineering, Dezhou University, Dezhou City 253023, China
| | - Rongyan Wang
- College of Computer and Information Engineering, Dezhou University, Dezhou City 253023, China
| | - Jin Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Yijie Chen
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
|
44
|
Su W, Li P, Zhong L, Liang W, Li T, Liu J, Ruan T, Jiang G. Occurrence and Distribution of Antibacterial Quaternary Ammonium Compounds in Chinese Estuaries Revealed by Machine Learning-Assisted Mass Spectrometric Analysis. Environ Sci Technol 2024; 58:11707-11717. [PMID: 38871667 DOI: 10.1021/acs.est.4c02380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
Antimicrobial resistance (AMR) undermines the United Nations Sustainable Development Goals of good health and well-being. Antibiotics are known to exacerbate AMR, but nonantibiotic antimicrobials, such as quaternary ammonium compounds (QACs), are now emerging as another significant driver of AMR. However, assessing the AMR risks of QACs in complex environmental matrices remains challenging due to the ambiguity in their chemical structures and antibacterial activity. Using machine learning prediction and high-resolution mass spectrometric analysis, a list of antibacterial QACs (n = 856) from industrial chemical inventories is compiled, which leads to the identification of 50 structurally diverse antibacterial QACs in sediments, including traditional hydrocarbon-based compounds and new subclasses that bear additional functional groups, such as choline, ester, betaine, aryl ether, and pyridine. Urban wastewater, aquaculture, and hospital discharges are the main factors influencing QAC distribution patterns in estuarine sediments. Toxic unit calculations and metagenomic analysis revealed that these QACs can influence antibiotic resistance genes (particularly sulfonamide resistance genes) through cross- and coresistances. Their potential to influence AMR is related to their environmental persistence. These results suggest that controlling the source, preventing the co-use of QACs and sulfonamides, and prioritizing control of highly persistent molecules will lead to global stewardship and sustainable use of QACs.
Affiliation(s)
- Wenyuan Su
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Pengyang Li
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Laijin Zhong
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wenqing Liang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tingyu Li
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiyan Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ting Ruan
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
45
|
Liu J, Gui Y, Rao J, Sun J, Wang G, Ren Q, Qu N, Niu B, Chen Z, Sheng X, Wang Y, Zheng M, Li X. In silico off-target profiling for enhanced drug safety assessment. Acta Pharm Sin B 2024; 14:2927-2941. [PMID: 39027254 PMCID: PMC11252485 DOI: 10.1016/j.apsb.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/21/2024] [Accepted: 02/29/2024] [Indexed: 07/20/2024] Open
Abstract
Ensuring drug safety in the early stages of drug development is crucial to avoid costly failures in subsequent phases. However, the economic burden associated with detecting drug off-targets and potential side effects through in vitro safety screening and animal testing is substantial. Drug off-target interactions, along with the adverse drug reactions they induce, are significant factors affecting drug safety. To assess the liability of candidate drugs, we developed an artificial intelligence model for the precise prediction of compound off-target interactions, leveraging multi-task graph neural networks. The outcomes of off-target predictions can serve as representations for compounds, enabling the differentiation of drugs under various ATC codes and the classification of compound toxicity. Furthermore, the predicted off-target profiles are employed in adverse drug reaction (ADR) enrichment analysis, facilitating the inference of potential ADRs for a drug. Using the withdrawn drug Pergolide as an example, we elucidate the mechanisms underlying ADRs at the target level, contributing to the exploration of the potential clinical relevance of newly predicted off-target interactions. Overall, our work facilitates the early assessment of compound safety/toxicity based on off-target identification, deduces potential ADRs of drugs, and ultimately promotes the secure development of drugs.
Affiliation(s)
- Jin Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
| | - Yike Gui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jingjing Sun
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Gang Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Qun Ren
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Buying Niu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhiyi Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, Hangzhou 330106, China
| | - Xia Sheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Mingyue Zheng
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Nanjing University of Chinese Medicine, Nanjing 210023, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, Hangzhou 330106, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
46
|
Xu S, Shen L, Zhang M, Jiang C, Zhang X, Xu Y, Liu J, Liu X. Surface-based multimodal protein-ligand binding affinity prediction. Bioinformatics 2024; 40:btae413. [PMID: 38905501 DOI: 10.1093/bioinformatics/btae413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 06/23/2024]
Abstract
MOTIVATION: In the field of drug discovery, accurately and effectively predicting the binding affinity between proteins and ligands is crucial for drug screening and optimization. However, current research primarily uses sequence- or structure-based representations to predict protein-ligand binding affinity, with relatively little study of protein surface information, which is crucial for protein-ligand interactions. Moreover, when dealing with multimodal protein information, traditional approaches typically concatenate features from different modalities directly without considering the heterogeneity among them, and so fail to exploit the complementarity between modalities. RESULTS: We introduce a novel multimodal feature extraction (MFE) framework that, for the first time, incorporates information from protein surfaces, 3D structures, and sequences, and uses a cross-attention mechanism for feature alignment between different modalities. Experimental results show that our method achieves state-of-the-art performance in predicting protein-ligand binding affinity. Furthermore, we conduct ablation studies that demonstrate the effectiveness and necessity of protein surface information and multimodal feature alignment within the framework. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/Sultans0fSwing/MFE.
Affiliation(s)
- Shiyu Xu
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Lian Shen
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
| | - Menglong Zhang
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
| | - Changzhi Jiang
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
| | - Xinyi Zhang
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
| | - Yanni Xu
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
| | - Juan Liu
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, China
| | - Xiangrong Liu
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
- Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University, Xiamen 361005, China
|
47
|
Gong C, Feng Y, Zhu J, Liu G, Tang Y, Li W. Evaluation of machine learning models for cytochrome P450 3A4, 2D6, and 2C9 inhibition. J Appl Toxicol 2024; 44:1050-1066. [PMID: 38544296 DOI: 10.1002/jat.4601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 02/26/2024] [Accepted: 03/05/2024] [Indexed: 07/21/2024]
Abstract
Cytochrome P450 (CYP) enzymes are involved in the metabolism of approximately 75% of marketed drugs. Inhibition of the major drug-metabolizing P450s could alter drug metabolism and lead to undesirable drug-drug interactions. Therefore, it is of great significance to explore the inhibition of P450s in drug discovery. Currently, machine learning including deep learning algorithms has been widely used for constructing in silico models for the prediction of P450 inhibition. These models exhibited varying predictive performance depending on the use of machine learning algorithms and molecular representations. This leads to the difficulty in the selection of appropriate models for practical use. In this study, we systematically evaluated the conventional machine learning and deep learning models for three major P450 enzymes, CYP3A4, CYP2D6, and CYP2C9 from several perspectives, such as algorithms, molecular representation, and data partitioning strategies. Our results showed that the XGBoost and CatBoost algorithms coupled with the combined fingerprint/physicochemical descriptor features exhibited the best performance with Area Under Curve (AUC) of 0.92, while the deep learning models were generally inferior to the conventional machine learning models (average AUC reached 0.89) on the same test sets. We also found that data volume and sampling strategy had a minor effect on model performance. We anticipate that these results are helpful for the selection of molecular representations and machine learning/deep learning algorithms in the P450 model construction and the future model development of P450 inhibition.
Affiliation(s)
- Changda Gong
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Yanjun Feng
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Jieyu Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
|
48
|
Swanson K, Walther P, Leitz J, Mukherjee S, Wu JC, Shivnaraine RV, Zou J. ADMET-AI: a machine learning ADMET platform for evaluation of large-scale chemical libraries. Bioinformatics 2024; 40:btae416. [PMID: 38913862 PMCID: PMC11226862 DOI: 10.1093/bioinformatics/btae416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024] Open
Abstract
MOTIVATION: The emergence of large chemical repositories and combinatorial chemical spaces, coupled with high-throughput docking and generative AI, has greatly expanded the chemical diversity of small molecules for drug discovery. Selecting compounds for experimental validation requires filtering these molecules based on favourable druglike properties, such as Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET). RESULTS: We developed ADMET-AI, a machine learning platform that provides fast and accurate ADMET predictions both as a website and as a Python package. ADMET-AI has the highest average rank on the TDC ADMET Leaderboard, and it is currently the fastest web-based ADMET predictor, with a 45% reduction in time compared to the next fastest public ADMET web server. ADMET-AI can also be run locally with predictions for one million molecules taking just 3.1 h. AVAILABILITY AND IMPLEMENTATION: The ADMET-AI platform is freely available both as a web server at admet.ai.greenstonebio.com and as an open-source Python package for local batch prediction at github.com/swansonk14/admet_ai (also archived on Zenodo at doi.org/10.5281/zenodo.10372930). All data and models are archived on Zenodo at doi.org/10.5281/zenodo.10372418.
Affiliation(s)
- Kyle Swanson
- Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA
- Greenstone Biosciences, 3160 Porter Drive, Suite 140, Palo Alto, CA 94304, USA
| | - Parker Walther
- Carleton College, One North College Street, Northfield, MN 55057, USA
| | - Jeremy Leitz
- Greenstone Biosciences, 3160 Porter Drive, Suite 140, Palo Alto, CA 94304, USA
| | - Souhrid Mukherjee
- Greenstone Biosciences, 3160 Porter Drive, Suite 140, Palo Alto, CA 94304, USA
| | - Joseph C Wu
- Stanford Cardiovascular Institute, Stanford University, 265 Campus Drive, Stanford, CA 94305, USA
| | - James Zou
- Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
|
49
|
Han C, Zhang D, Xia S, Zhang Y. Accurate Prediction of NMR Chemical Shifts: Integrating DFT Calculations with Three-Dimensional Graph Neural Networks. J Chem Theory Comput 2024; 20:5250-5258. [PMID: 38842505 PMCID: PMC11209944 DOI: 10.1021/acs.jctc.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/25/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024]
Abstract
Computer prediction of NMR chemical shifts plays an increasingly important role in molecular structure assignment and elucidation for organic molecule studies. Density functional theory (DFT) and gauge-including atomic orbital (GIAO) have established a framework to predict NMR chemical shifts but often at a significant computational expense with a limited prediction accuracy. Recent advancements in deep learning methods, especially graph neural networks (GNNs), have shown promise in improving the accuracy of predicting experimental chemical shifts, either by using 2D molecular topological features or 3D conformational representation. This study presents a new 3D GNN model to predict 1H and 13C chemical shifts, CSTShift, that combines atomic features with DFT-calculated shielding tensor descriptors, capturing both isotropic and anisotropic shielding effects. Utilizing the NMRShiftDB2 data set and conducting DFT optimization and GIAO calculations at the B3LYP/6-31G(d) level, we prepared the NMRShiftDB2-DFT data set of high-quality 3D structures and shielding tensors with corresponding experimentally measured 1H and 13C chemical shifts. The developed CSTShift models achieve the state-of-the-art prediction performance on both the NMRShiftDB2-DFT test data set and external CHESHIRE data set. Further case studies on identifying correct structures from two groups of constitutional isomers show its capability for structure assignment and elucidation. The source code and data are accessible at https://yzhang.hpc.nyu.edu/IMA.
Affiliation(s)
- Chao Han
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Dongdong Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
50
|
Stienstra CMK, Hebert L, Thomas P, Haack A, Guo J, Hopkins WS. Graphormer-IR: Graph Transformers Predict Experimental IR Spectra Using Highly Specialized Attention. J Chem Inf Model 2024; 64:4613-4629. [PMID: 38845400 DOI: 10.1021/acs.jcim.4c00378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Infrared (IR) spectroscopy is an important analytical tool in various chemical and forensic domains, and a great deal of effort has gone into developing in silico methods for predicting experimental spectra. A key challenge in this regard is generating highly accurate spectra quickly to enable real-time feedback between computation and experiment. Here, we employ Graphormer, a graph neural network (GNN) transformer, to predict IR spectra using only simplified molecular-input line-entry system (SMILES) strings. Our data set includes 53,528 high-quality spectra, measured in five different experimental media (i.e., phases), for molecules containing the elements H, C, N, O, F, Si, S, P, Cl, Br, and I. When using only atomic numbers for node encodings, Graphormer-IR achieved a mean test spectral information similarity (SISμ) value of 0.8449 ± 0.0012 (n = 5), which surpasses that of the current state-of-the-art model Chemprop-IR (SISμ = 0.8409 ± 0.0014, n = 5) with only 36% of the encoded information. Augmenting node embeddings with additional node-level descriptors in learned embeddings generated through a multilayer perceptron improves scores to SISμ = 0.8523 ± 0.0006, a total improvement of 19.7σ (t = 19). These improved scores show how Graphormer-IR excels in capturing long-range interactions like hydrogen bonding, anharmonic peak positions in experimental spectra, and stretching frequencies of uncommon functional groups. Scaling our architecture to 210 attention heads demonstrates specialist-like behavior for distinct IR frequencies that improves model performance. Our model utilizes novel architectures, including a global node for phase encoding, learned node feature embeddings, and a one-dimensional (1D) smoothing convolutional neural network (CNN). Graphormer-IR's innovations underscore its value over traditional message-passing neural networks (MPNNs) due to its expressive embeddings and ability to capture long-range intramolecular relationships.
Affiliation(s)
- Cailum M K Stienstra
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Liam Hebert
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Patrick Thomas
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Alexander Haack
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Jason Guo
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - W Scott Hopkins
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Watermine Innovation, Waterloo, Ontario N0B 2T0, Canada
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
|