101
Zhang R, Nolte D, Sanchez-Villalobos C, Ghosh S, Pal R. Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling. Nat Commun 2024; 15:5072. [PMID: 38871711 DOI: 10.1038/s41467-024-49372-0] [Received: 05/13/2023] [Accepted: 06/04/2024] [Indexed: 06/15/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against the predictive performance of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve equal, if not better, performance than the deep learning-based QSAR models and provide better intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.
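The abstract describes TR only as a similarity-based regression framework and does not give its internals, so TR itself is not reproduced here. As a point of reference, the following is a minimal sketch of a much simpler similarity-based regressor: a Tanimoto-weighted k-nearest-neighbour model over binary fingerprints. It is a generic baseline, not the authors' TR; the fingerprint encoding (sets of "on" bit indices), the function names and `k` are illustrative assumptions.

```python
from typing import Sequence


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def similarity_knn_predict(
    query: set[int],
    train_fps: Sequence[set[int]],
    train_y: Sequence[float],
    k: int = 3,
) -> float:
    """Hypothetical baseline: similarity-weighted mean activity of the k most similar training compounds."""
    # Rank training compounds by similarity to the query, most similar first.
    neighbours = sorted(
        ((tanimoto(query, fp), y) for fp, y in zip(train_fps, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    if not neighbours:
        raise ValueError("need at least one training compound")
    total = sum(sim for sim, _ in neighbours)
    if total == 0.0:
        # No bit overlap with any neighbour: fall back to their unweighted mean.
        return sum(y for _, y in neighbours) / len(neighbours)
    return sum(sim * y for sim, y in neighbours) / total
```

For example, with training fingerprints {1, 2, 3}, {1, 2, 4} and {7, 8, 9} (activities 6.0, 6.4 and 4.0), the query {1, 2, 3, 4} with k = 2 has similarity 0.75 to the first two and 0 to the third, so the prediction is their weighted mean, 6.2.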
Affiliation(s)
- Ruibo Zhang
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Daniel Nolte
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Cesar Sanchez-Villalobos
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Souparno Ghosh
- Department of Statistics, University of Nebraska - Lincoln, Lincoln, NE, 68588, USA
- Ranadip Pal
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
102
Duan Y, Yang X, Zeng X, Wang W, Deng Y, Cao D. Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge. J Med Chem 2024; 67:9575-9586. [PMID: 38748846 DOI: 10.1021/acs.jmedchem.4c00692] [Indexed: 06/14/2024]
Abstract
Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
Affiliation(s)
- Yanjing Duan
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Xixi Yang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
103
Wu JN, Wang T, Chen Y, Tang LJ, Wu HL, Yu RQ. t-SMILES: a fragment-based molecular representation framework for de novo ligand design. Nat Commun 2024; 15:4993. [PMID: 38862578 PMCID: PMC11167009 DOI: 10.1038/s41467-024-49388-6] [Received: 07/28/2022] [Accepted: 06/04/2024] [Indexed: 06/13/2024]
Abstract
Effective representation of molecules is a crucial factor affecting the performance of artificial intelligence models. This study introduces a flexible, fragment-based, multiscale molecular representation framework called t-SMILES (tree-based SMILES) with three code algorithms: TSSA (t-SMILES with shared atom), TSDY (t-SMILES with dummy atom but without ID) and TSID (t-SMILES with ID and dummy atom). It describes molecules using SMILES-type strings obtained by performing a breadth-first search on a full binary tree formed from a fragmented molecular graph. Systematic evaluations using JTVAE, BRICS, MMPA, and Scaffold show the feasibility of constructing a multi-code molecular description system, where various descriptions complement each other, enhancing the overall performance. In addition, it can avoid overfitting and achieve higher novelty scores while maintaining reasonable similarity on labeled low-resource datasets, regardless of whether the model is original, data-augmented, or pre-trained then fine-tuned. Furthermore, it significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline models in goal-directed tasks. And it surpasses state-of-the-art fragment, graph and SMILES based approaches on ChEMBL, Zinc, and QM9.
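The serialization step, a breadth-first search over a full binary tree formed from a fragmented molecular graph, can be illustrated with a minimal sketch. The tree here is hand-built rather than produced by JTVAE, BRICS, MMPA or Scaffold fragmentation, and the `Node` class, the empty-node placeholder `&` and the separator `^` are assumptions for illustration. The three t-SMILES codes (TSSA, TSDY, TSID) differ in how fragment connections are marked, which this sketch does not model.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    fragment: str  # SMILES-like fragment string; "" for a node that carries no fragment
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bfs_serialize(root: Node, sep: str = "^", empty: str = "&") -> str:
    """Emit node fragments level by level (breadth-first), left child before right."""
    tokens = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        tokens.append(node.fragment or empty)
        has_left, has_right = node.left is not None, node.right is not None
        if has_left != has_right:
            # In a full binary tree every node has either zero or two children.
            raise ValueError("not a full binary tree")
        if has_left:
            queue.append(node.left)
            queue.append(node.right)
    return sep.join(tokens)


tree = Node("c1ccccc1", Node("CC"), Node("", Node("N"), Node("O")))
print(bfs_serialize(tree))  # c1ccccc1^CC^&^N^O
```

Because the traversal is breadth-first, fragments close to the root appear first in the string, and the full-binary-tree check keeps the level structure unambiguous when the string is decoded.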
Affiliation(s)
- Juan-Ni Wu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Tong Wang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Yue Chen
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Li-Juan Tang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Hai-Long Wu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Ru-Qin Yu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
104
De Carlo A, Ronchi D, Piastra M, Tosca EM, Magni P. Predicting ADMET Properties from Molecule SMILE: A Bottom-Up Approach Using Attention-Based Graph Neural Networks. Pharmaceutics 2024; 16:776. [PMID: 38931898 PMCID: PMC11207804 DOI: 10.3390/pharmaceutics16060776] [Received: 01/03/2024] [Revised: 05/08/2024] [Accepted: 05/30/2024] [Indexed: 06/28/2024]
Abstract
Understanding the pharmacokinetics, safety and efficacy of candidate drugs is crucial for their success. One key aspect is the characterization of absorption, distribution, metabolism, excretion and toxicity (ADMET) properties, which require early assessment in the drug discovery and development process. This study aims to present an innovative approach for predicting ADMET properties using attention-based graph neural networks (GNNs). The model utilizes a graph-based representation of molecules directly derived from Simplified Molecular Input Line Entry System (SMILES) notation. Information is processed sequentially, from substructures to the whole molecule, employing a bottom-up approach. The developed GNN is tested and compared with existing approaches using six benchmark datasets encompassing regression (lipophilicity and aqueous solubility) and classification (CYP2C9, CYP2C19, CYP2D6 and CYP3A4 inhibition) tasks. Results show the effectiveness of our model, which bypasses the computationally expensive retrieval and selection of molecular descriptors. This approach provides a valuable tool for high-throughput screening, facilitating early assessment of ADMET properties and enhancing the likelihood of drug success in the development pipeline.
Affiliation(s)
- Paolo Magni
- Dipartimento di Ingegneria Industriale e dell’Informazione, Università degli Studi di Pavia, 27100 Pavia, Italy; (A.D.C.); (D.R.); (M.P.); (E.M.T.)
105
Qian X, Ju B, Shen P, Yang K, Li L, Liu Q. Meta Learning with Attention Based FP-GNNs for Few-Shot Molecular Property Prediction. ACS Omega 2024; 9:23940-23948. [PMID: 38854580 PMCID: PMC11154901 DOI: 10.1021/acsomega.4c02147] [Received: 03/05/2024] [Revised: 05/09/2024] [Accepted: 05/14/2024] [Indexed: 06/11/2024]
Abstract
Molecular property prediction holds significant importance in drug discovery, enabling the identification of biologically active compounds with favorable drug-like properties. However, the low data problem, arising from the scarcity of labeled data in drug discovery, poses a substantial obstacle for accurate predictions. To address this challenge, we introduce a novel architecture, AttFPGNN-MAML, for few-shot molecular property prediction. The proposed approach incorporates a hybrid feature representation to enrich molecular representations and model intermolecular relationships specific to the task. By leveraging ProtoMAML, a meta-learning strategy, our model is trained and adapted to new tasks. Evaluation on two few-shot data sets, MoleculeNet and FS-Mol, demonstrates our method's superior performance in three out of four tasks and across various support set sizes. These results convincingly validate the effectiveness of our method in the realm of few-shot molecular property prediction. The source code is publicly available at https://github.com/sanomics-lab/AttFPGNN-MAML.
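ProtoMAML combines prototypical networks with MAML: each class prototype is the mean support-set embedding of that class, and, as the method is usually described, the output layer is initialised from the prototypes (weight 2c_k and bias -||c_k||² for prototype c_k) before MAML's inner-loop fine-tuning. This works because -||x - c_k||² = 2c_k·x - ||c_k||² - ||x||², and the last term is the same for every class. The sketch below shows only that prototype step on plain Python lists; the embeddings stand in for an encoder's output (the AttFPGNN hybrid representation is not modelled), and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Class prototype = element-wise mean of that class's support embeddings."""
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in support.items()
    }


def protomaml_init(protos: Dict[str, Vector]) -> Dict[str, Tuple[Vector, float]]:
    """Linear-layer initialisation w_k = 2 c_k, b_k = -||c_k||^2, applied before inner-loop adaptation."""
    return {
        label: ([2.0 * v for v in c], -sum(v * v for v in c))
        for label, c in protos.items()
    }


def predict(query: Vector, layer: Dict[str, Tuple[Vector, float]]) -> str:
    """Pick the class with the largest logit w_k . x + b_k."""
    def logit(label: str) -> float:
        w, b = layer[label]
        return sum(wi * xi for wi, xi in zip(w, query)) + b
    return max(layer, key=logit)


support = {
    "active": [[1.0, 1.0], [3.0, 3.0]],
    "inactive": [[-1.0, -1.0], [-3.0, -1.0]],
}
protos = prototypes(support)  # {'active': [2.0, 2.0], 'inactive': [-2.0, -1.0]}
layer = protomaml_init(protos)
query = [1.5, 2.0]
# Before any fine-tuning, the initialised layer agrees with nearest-prototype classification.
nearest = min(protos, key=lambda lab: math.dist(query, protos[lab]))
print(predict(query, layer), nearest)  # active active
```

In the full method the initialised layer is then fine-tuned on the support set in MAML's inner loop; the sketch stops at initialisation.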
Affiliation(s)
- Xiaoliang Qian
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- Bin Ju
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Ping Shen
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Keda Yang
- Shulan International Medical College, Zhejiang Shuren University, Hangzhou 310015, China
- Li Li
- Department of Hepatobiliary Surgery, The First People’s Hospital of Kunming, Kunming 650034, China
- Qi Liu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
106
Yin Y, Hu H, Yang J, Ye C, Goh WWB, Kong AWK, Wu J. OLB-AC: toward optimizing ligand bioactivities through deep graph learning and activity cliffs. Bioinformatics 2024; 40:btae365. [PMID: 38889277 PMCID: PMC11208724 DOI: 10.1093/bioinformatics/btae365] [Received: 10/19/2023] [Revised: 05/14/2024] [Accepted: 06/14/2024] [Indexed: 06/20/2024]
Abstract
MOTIVATION Deep graph learning (DGL) has been widely employed in the realm of ligand-based virtual screening. Within this field, a key hurdle is the existence of activity cliffs (ACs), where minor chemical alterations can lead to significant changes in bioactivity. In response, several DGL models have been developed to enhance ligand bioactivity prediction in the presence of ACs. Yet, there remains a largely unexplored opportunity within ACs for optimizing ligand bioactivity, making it an area ripe for further investigation. RESULTS We present a novel approach to simultaneously predict and optimize ligand bioactivities through DGL and ACs (OLB-AC). OLB-AC can optimize ligand molecules located near ACs, providing a direct reference for optimizing ligand bioactivities with the matching of original ligands. To accomplish this, a novel attentive graph reconstruction neural network and ligand optimization scheme are proposed. The attentive graph reconstruction neural network reconstructs original ligands and optimizes them through adversarial representations derived from their bioactivity prediction process. Experimental results on nine drug targets reveal that out of the 667 molecules generated through OLB-AC optimization on datasets comprising 974 low-activity, noninhibitor, or highly toxic ligands, 49 are recognized as known highly active, inhibitor, or nontoxic ligands beyond the datasets' scope. Twenty-seven of the 49 matched molecular pairs generated by OLB-AC reveal novel transformations not present in their training sets. The adversarial representations employed for ligand optimization originate from the gradients of bioactivity predictions. Therefore, we also assess OLB-AC's prediction accuracy across 33 different bioactivity datasets. Results show that OLB-AC achieves the best Pearson correlation coefficient (r2) on 27/33 datasets, with an average improvement of 7.2%-22.9% against the state-of-the-art bioactivity prediction methods.
AVAILABILITY AND IMPLEMENTATION The code and dataset developed in this work are available at github.com/Yueming-Yin/OLB-AC.
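The abstract labels its headline metric "Pearson correlation coefficient (r2)". Pearson's r and its square r² are different numbers (r = 0.8 corresponds to r² = 0.64), so when comparing OLB-AC against other published results it is worth checking which of the two a given table reports. A minimal standard-library sketch of both, on invented data:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 5.0, 4.0]
r = pearson_r(measured, predicted)
print(round(r, 4), round(r ** 2, 4))  # 0.7182 0.5158
```

The same predictions therefore score about 0.72 as r but only about 0.52 as r², which is why the two should not be compared directly.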
Affiliation(s)
- Yueming Yin
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- College of Computing and Data Science, Nanyang Technological University, 639798, Singapore
- Haifeng Hu
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Jitao Yang
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Chun Ye
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 637551, Singapore
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 637551, Singapore
- Center for AI in Medicine, Nanyang Technological University, 639798, Singapore
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, U.K.
- Adams Wai-Kin Kong
- College of Computing and Data Science, Nanyang Technological University, 639798, Singapore
- Jiansheng Wu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
107
Zhou Y, Wang Z, Huang Z, Li W, Chen Y, Yu X, Tang Y, Liu G. In silico prediction of ocular toxicity of compounds using explainable machine learning and deep learning approaches. J Appl Toxicol 2024; 44:892-907. [PMID: 38329145 DOI: 10.1002/jat.4586] [Received: 11/28/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 02/09/2024]
Abstract
The accurate identification of chemicals with ocular toxicity is of paramount importance in health hazard assessment. In contemporary chemical toxicology, there is a growing emphasis on refining, reducing, and replacing animal testing in safety evaluations. Therefore, the development of robust computational tools is crucial for regulatory applications. The performance of predictive models is heavily reliant on the quality and quantity of data. In this investigation, we compiled the most extensive dataset (4901 compounds) from governmental GHS-compliant databases and the literature to develop binary classification models of chemical ocular toxicity. We employed 12 molecular representations in conjunction with six machine learning algorithms and two deep learning algorithms to create a series of binary classification models. The findings indicated that the deep learning method GCN outperformed the machine learning models in cross-validation, achieving an AUC of 0.915. However, the top-performing machine learning model (RF-Descriptor) demonstrated excellent performance with an AUC of 0.869 on the test set and was therefore selected as the best model. To enhance model interpretability, we applied SHAP analysis and attention-weight analysis. The two approaches offered visual depictions of the relevance of key descriptors and substructures in predicting ocular toxicity of chemicals. Thus, we successfully struck a delicate balance between data quality and model interpretability, rendering our model valuable for predicting and comprehending potential ocular-toxic compounds in the early stages of drug discovery.
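The AUC values quoted (0.915 for GCN in cross-validation, 0.869 for RF-Descriptor on the test set) are areas under the ROC curve. One way to read that number is as the probability that a randomly chosen positive (ocular-toxic) compound receives a higher score than a randomly chosen negative one, with ties counting half. A minimal pairwise sketch of that definition follows; it is quadratic in the number of compounds and meant for illustration rather than large test sets, and the labels and scores are invented.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]            # 1 = toxic, 0 = non-toxic (made-up)
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.5]
print(roc_auc(labels, scores))  # 7 of 9 pairs ordered correctly, about 0.778
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported 0.869 and 0.915.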
Affiliation(s)
- Yiqing Zhou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Ze Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Zejun Huang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yuanting Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
108
Campana PA, Prasse P, Lienhard M, Thedinga K, Herwig R, Scheffer T. Cancer drug sensitivity estimation using modular deep Graph Neural Networks. NAR Genom Bioinform 2024; 6:lqae043. [PMID: 38680251 PMCID: PMC11055499 DOI: 10.1093/nargab/lqae043] [Received: 11/23/2023] [Revised: 03/01/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are tailored to the transcriptomic profile of a given primary tumor. The SMILES representation of molecules that is used by state-of-the-art drug-sensitivity models is not conducive to generalization by neural networks to new drugs, in part because the distance between atoms does not generally correspond to the distance between their representations in the SMILES strings. Graph-attention networks, on the other hand, are high-capacity models that require large training-data volumes, which are not available for drug-sensitivity estimation. We develop a modular drug-sensitivity graph-attentional neural network. The modular architecture allows us to separately pre-train the graph encoder and graph-attentional pooling layer on related tasks for which more data are available. We observe that this model outperforms reference models for the use cases of precision oncology and drug discovery; in particular, it is better able to predict the specific interaction between drug and cell line that is not explained by the general cytotoxicity of the drug and the overall survivability of the cell line. The complete source code is available at https://zenodo.org/doi/10.5281/zenodo.8020945. All experiments are based on the publicly available GDSC data.
Affiliation(s)
- Pedro A Campana
- University of Potsdam, Department of Computer Science, Potsdam, Germany
- Paul Prasse
- University of Potsdam, Department of Computer Science, Potsdam, Germany
- Matthias Lienhard
- Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Kristina Thedinga
- Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Ralf Herwig
- Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Tobias Scheffer
- University of Potsdam, Department of Computer Science, Potsdam, Germany
109
Zhang VY, O’Connor SL, Welsh WJ, James MH. Machine learning models to predict ligand binding affinity for the orexin 1 receptor. Artificial Intelligence Chemistry 2024; 2:100040. [PMID: 38476266 PMCID: PMC10927255 DOI: 10.1016/j.aichem.2023.100040] [Indexed: 03/14/2024]
Abstract
The orexin 1 receptor (OX1R) is a G-protein coupled receptor that regulates a variety of physiological processes through interactions with the neuropeptides orexin A and B. Selective OX1R antagonists exhibit therapeutic effects in preclinical models of several behavioral disorders, including drug seeking and overeating. However, currently there are no selective OX1R antagonists approved for clinical use, fueling demand for novel compounds that act at this target. In this study, we meticulously curated a dataset comprising over 1300 OX1R ligands using a stringent filter and criteria cascade. Subsequently, we developed highly predictive quantitative structure-activity relationship (QSAR) models employing optimized hyper-parameters for the random forest machine learning algorithm and twelve 2D molecular descriptors selected by recursive feature elimination with 5-fold cross-validation. The predictive capacity of the QSAR model was further assessed using an external test set and an enrichment study, confirming its high predictivity. The practical applicability of our final QSAR model was demonstrated through virtual screening of the DrugBank database. This revealed two FDA-approved drugs (isavuconazole and cabozantinib) as potential OX1R ligands, confirmed by radiolabeled OX1R binding assays. To the best of our knowledge, this study represents the first report of highly predictive QSAR models on a large comprehensive dataset of diverse OX1R ligands, which should prove useful for the discovery and design of new compounds targeting this receptor.
Affiliation(s)
- Vanessa Y. Zhang
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical Health Sciences, Piscataway, NJ, USA
- Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- West Windsor-Plainsboro High School South, West Windsor, NJ, USA
- Shayna L. O’Connor
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical Health Sciences, Piscataway, NJ, USA
- Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- William J. Welsh
- Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical Health Sciences, Piscataway, NJ, USA
- Morgan H. James
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical Health Sciences, Piscataway, NJ, USA
- Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
110
Zhang R, Yuan R, Tian B. PointGAT: A Quantum Chemical Property Prediction Model Integrating Graph Attention and 3D Geometry. J Chem Theory Comput 2024; 20:4115-4128. [PMID: 38727259 DOI: 10.1021/acs.jctc.3c01420] [Indexed: 05/29/2024]
Abstract
Predicting quantum chemical properties is a fundamental challenge for computational chemistry. While the development of graph neural networks has advanced molecular representation learning and property prediction, their performance could be further enhanced by incorporating three-dimensional (3D) structural geometry into two-dimensional (2D) molecular graph representation. In this study, we introduce the PointGAT model for quantum molecular property prediction, which integrates 3D molecular coordinates with graph-attention modeling. Comparison with other current models in molecular prediction tasks showed that PointGAT could provide higher predictive accuracy in various benchmark data sets from MoleculeNet, including ESOL, FreeSolv, Lipop, HIV, and 6 out of 12 tasks of the QM9 data set. To further examine PointGAT prediction of quantum mechanical (QM) energies, we constructed a C10 data set comprising 11,841 charged and chiral carbocation intermediates with QM energies calculated at the DM21/6-31G*//B3LYP/6-31G* levels. Notably, PointGAT achieved an R2 value of 0.950 and an MAE of 1.616 kcal/mol, outperforming even the best-performing graph neural network model with a reduction of 0.216 kcal/mol in MAE and an improvement of 0.050 in R2. Additional ablation studies indicated that incorporating molecular geometry into the model resulted in markedly higher predictive accuracy, reducing the MAE value from 1.802 to 1.616 kcal/mol. Moreover, visualization of PointGAT atomic attention weights suggested its predictions were interpretable. Findings in this study support the application of PointGAT as a powerful and versatile tool for quantum chemical property prediction that can facilitate high-accuracy modeling for fundamental exploration of chemical space as well as drug design and molecular engineering.
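The reported gains can be unpacked with a little arithmetic. Reading the 0.216 kcal/mol MAE reduction and the 0.050 R² improvement as both measured against the same best-performing GNN, that baseline sits at about 1.832 kcal/mol and 0.900; the ablation's drop from 1.802 to 1.616 kcal/mol is 0.186 kcal/mol, roughly 10% of the MAE without geometry. The sketch below shows the standard MAE and coefficient-of-determination formulas and reproduces those differences; the toy predictions are invented.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of y (here kcal/mol)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy energies (invented) to exercise the formulas.
print(mae([1, 2, 3, 4], [1.5, 2, 2.5, 4]), r2_score([1, 2, 3, 4], [1.5, 2, 2.5, 4]))  # 0.25 0.9

# Figures quoted in the PointGAT abstract.
pointgat_mae, pointgat_r2 = 1.616, 0.950
baseline_mae = round(pointgat_mae + 0.216, 3)   # implied best-GNN MAE
baseline_r2 = round(pointgat_r2 - 0.050, 3)     # implied best-GNN R^2
ablation_gain = round(1.802 - pointgat_mae, 3)  # MAE reduction from adding 3D geometry
print(baseline_mae, baseline_r2, ablation_gain, f"{ablation_gain / 1.802:.1%}")  # 1.832 0.9 0.186 10.3%
```

Note that this R² is the coefficient of determination, which in general is not the same quantity as the square of Pearson's r.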
Affiliation(s)
- Rong Zhang
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Rongqing Yuan
- Department of Chemistry, Tsinghua University, Beijing 100084, China
- Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
111
Zhang R, Lin Y, Wu Y, Deng L, Zhang H, Liao M, Peng Y. MvMRL: a multi-view molecular representation learning method for molecular property prediction. Brief Bioinform 2024; 25:bbae298. [PMID: 38920342 PMCID: PMC11200189 DOI: 10.1093/bib/bbae298] [Received: 02/22/2024] [Revised: 05/09/2024] [Accepted: 06/07/2024] [Indexed: 06/27/2024]
Abstract
Effective molecular representation learning is very important for Artificial Intelligence-driven Drug Design because it affects the accuracy and efficiency of molecular property prediction and other molecular modeling tasks. However, previous molecular representation learning studies often suffer from limitations, such as over-reliance on a single molecular representation, failure to fully capture both local and global information in molecular structure, and ineffective integration of multiscale features from different molecular representations. These limitations restrict the complete and accurate representation of molecular structure and properties, ultimately impacting the accuracy of predicting molecular properties. To this end, we propose a novel multi-view molecular representation learning method called MvMRL, which incorporates feature information from multiple molecular representations and effectively captures both local and global information from different views, thus improving molecular property prediction. Specifically, MvMRL consists of four parts: a multiscale CNN-SE Simplified Molecular Input Line Entry System (SMILES) learning component and a multiscale Graph Neural Network encoder to extract local feature information and global feature information from the SMILES view and the molecular graph view, respectively; a Multi-Layer Perceptron network to capture complex non-linear relationship features from the molecular fingerprint view; and a dual cross-attention component to deeply fuse feature information from the multiple views for predicting molecular properties. We evaluate the performance of MvMRL on 11 benchmark datasets, and experimental results show that MvMRL outperforms state-of-the-art methods, indicating its rationality and effectiveness in molecular property prediction. The source code of MvMRL is available at https://github.com/jedison-github/MvMRL.
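The "CNN-SE" component pairs convolutions with squeeze-and-excitation (SE) channel reweighting. The abstract gives no layer sizes, so the following is only a minimal sketch of a generic SE block over a 1D feature map (channels × sequence positions), with biases omitted and hand-picked weights; the function name, the shapes and the reduction ratio r are assumptions, not MvMRL's implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def se_block(x: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, List[float]]:
    """Squeeze-and-excitation over a C x L feature map (biases omitted).

    x  : C channels, each a list of L activations (e.g. along a SMILES sequence)
    w1 : (C // r) x C bottleneck weights; w2 : C x (C // r) expansion weights
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average over positions gives one descriptor per channel.
    z = [sum(channel) / len(channel) for channel in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every position of a channel by that channel's gate.
    return [[g * v for v in channel] for g, channel in zip(gates, x)], gates


x = [[1.0, 3.0], [2.0, 2.0]]           # C = 2 channels, L = 2 positions
out, gates = se_block(x, w1=[[1.0, 0.0]], w2=[[10.0], [-10.0]])
print([round(g, 4) for g in gates])    # channel 0 kept, channel 1 suppressed
```

The gates depend on the whole sequence through the average-pooling step, which is how an SE block lets a convolutional encoder emphasise some feature channels over others.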
Affiliation(s)
- Ru Zhang
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Yanmei Lin
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Center for Applied Mathematics of Guangxi, Nanning Normal University, 508 Xinning Road, Wuming District, Nanning 530100, China
- Yijia Wu
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, 932 Lushan South Road, Changsha 410083, China
- Hao Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518000, China
- Mingzhi Liao
- Center of Bioinformatics, College of Life Sciences, Northwest A&F University, 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Yuzhong Peng
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Guangxi Academy of Sciences, 174 East University Road, Nanning 530007, China
112
Shen A, Yuan M, Ma Y, Du J, Wang M. Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction. Brief Bioinform 2024; 25:bbae256. [PMID: 38801702 PMCID: PMC11129775 DOI: 10.1093/bib/bbae256] [Received: 12/26/2023] [Revised: 04/25/2024] [Accepted: 05/15/2024] [Indexed: 05/29/2024]
Abstract
Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on a single modality of molecular data, and the complementary information of two important modalities, SMILES and graph, has not been fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graphs. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained with a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between the two modalities. Experimental results show that our framework achieves state-of-the-art performance on a series of molecular property prediction tasks, and a detailed ablation study demonstrates the efficacy of the multi-modality framework and the masking strategy.
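The non-overlapping masking idea can be illustrated under a simplifying assumption: that every atom corresponds to exactly one SMILES token and one graph node, so masking can be expressed over atom indices. The sketch below draws two disjoint index sets, one to mask in the graph view and one in the SMILES view, so that whatever is hidden in one modality stays visible in the other. The function name and the 15% default ratio are hypothetical; the paper's actual tokenization and masking rules may differ.

```python
import random

def non_overlapping_masks(n_atoms, mask_ratio=0.15, seed=None):
    """Return two disjoint sets of atom indices: one masked in the graph, one in the SMILES tokens.

    Assumes a one-to-one atom/token/node correspondence (a hypothetical simplification).
    """
    rng = random.Random(seed)
    k = max(1, int(round(n_atoms * mask_ratio)))
    if 2 * k > n_atoms:
        raise ValueError("mask_ratio too large to draw two disjoint masks")
    chosen = rng.sample(range(n_atoms), 2 * k)   # sampling without replacement guarantees disjointness
    graph_mask = set(chosen[:k])
    smiles_mask = set(chosen[k:])
    return graph_mask, smiles_mask

g, s = non_overlapping_masks(20, mask_ratio=0.2, seed=1)
print(len(g), len(s), g & s)  # 4 4 set()
```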
Affiliation(s)
- Ao Shen
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Mingzhi Yuan
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Yingfan Ma
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Jie Du
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Manning Wang
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, 200032, Shanghai, China
113
Xiang W, Zhong F, Ni L, Zheng M, Li X, Shi Q, Wang D. Gram matrix: an efficient representation of molecular conformation and learning objective for molecular pretraining. Brief Bioinform 2024; 25:bbae340. [PMID: 38990515 PMCID: PMC11238115 DOI: 10.1093/bib/bbae340] [Received: 03/10/2024] [Revised: 06/05/2024] [Accepted: 06/28/2024] [Indexed: 07/12/2024]
Abstract
Accurate prediction of molecular properties is fundamental in drug discovery and development, providing crucial guidance for effective drug design. A critical factor in achieving accurate molecular property prediction is the appropriate representation of molecular structures. Currently, prevalent deep learning-based molecular representations rely on 2D structure information as the primary molecular representation, often overlooking essential three-dimensional (3D) conformational information due to the inherent limitations of 2D structures in conveying atomic spatial relationships. In this study, we propose employing the Gram matrix as a condensed representation of 3D molecular structures and as an efficient pretraining objective. We then leverage this matrix to construct a novel molecular representation model, Pre-GTM, which inherently encapsulates 3D information. The model accurately predicts the 3D structure of a molecule by estimating its Gram matrix. Our findings demonstrate that the Pre-GTM model outperforms the baseline Graphormer model and other pretrained models on the QM9 and MoleculeNet quantitative property prediction tasks. Incorporating the Gram matrix into the Pre-GTM model as a condensed representation of 3D molecular structure opens up promising avenues for its application across various domains of molecular research, including drug design, materials science, and chemical engineering.
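Why a Gram matrix can stand in for a conformation follows from basic linear algebra. For mean-centred atom coordinates X (an n x 3 matrix), G = X X^T is unchanged by rotations and translations, squared interatomic distances follow from G_ii + G_jj - 2 G_ij, and coordinates can be recovered up to a rotation or reflection from the top three eigenpairs of G (classical multidimensional scaling). The sketch below demonstrates this on random coordinates; it illustrates the underlying mathematics only and is not the Pre-GTM training pipeline.

```python
import numpy as np

def gram_matrix(coords):
    """Gram matrix of mean-centred coordinates, G = X X^T (invariant to rotation and translation)."""
    x = coords - coords.mean(axis=0)
    return x @ x.T

def coords_from_gram(g, dim=3):
    """Recover coordinates (up to rotation/reflection) from the top `dim` eigenpairs of G."""
    eigvals, eigvecs = np.linalg.eigh(g)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dim]         # indices of the `dim` largest eigenvalues
    vals = np.clip(eigvals[idx], 0.0, None)       # guard against tiny negatives from round-off
    return eigvecs[:, idx] * np.sqrt(vals)

def pairwise_dist(x):
    """Euclidean distance matrix between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(42)
atoms = rng.normal(size=(6, 3))                   # made-up 3D conformer with 6 atoms
g = gram_matrix(atoms)
recovered = coords_from_gram(g)
print(np.allclose(pairwise_dist(atoms), pairwise_dist(recovered)))  # True
```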
Affiliation(s)
- Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
- Lin Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Qian Shi
- Lingang Laboratory, Shanghai 200031, China
114
Liu W, Zhang J, Qiao G, Bian J, Dong B, Li Y. HMMF: a hybrid multi-modal fusion framework for predicting drug side effect frequencies. BMC Bioinformatics 2024; 25:196. [PMID: 38769492 PMCID: PMC11555943 DOI: 10.1186/s12859-024-05806-6] [Received: 01/11/2024] [Accepted: 05/08/2024] [Indexed: 05/22/2024]
Abstract
BACKGROUND The identification of drug side effects plays a critical role in drug repositioning and drug screening. While clinical experiments yield accurate and reliable information about drug-related side effects, they are costly and time-consuming. Computational models have emerged as a promising alternative for predicting the frequencies of drug side effects. However, earlier research has primarily centered on extracting and utilizing representations of drugs, such as molecular structures or interaction graphs, often neglecting the inherent biomedical semantics of drugs and side effects. RESULTS To address this issue, we introduce a hybrid multi-modal fusion framework (HMMF) for predicting drug side effect frequencies. Given the wealth of biological and chemical semantic information related to drugs and side effects, incorporating multi-modal information offers additional, complementary semantics. HMMF uses various encoders to capture the molecular structures, biomedical textual representations, and attribute similarities of both drugs and side effects. It then models drug-side effect interactions using both coarse- and fine-grained fusion strategies, effectively integrating these multi-modal features. CONCLUSIONS HMMF successfully detects previously unrecognized potential side effects, outperforms existing state-of-the-art methods across various evaluation metrics, including root mean squared error and area under the receiver operating characteristic curve, and shows remarkable performance in cold-start scenarios.
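Area under the receiver operating characteristic curve, one of the metrics named above, has a compact interpretation that helps when reading such comparisons: it equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as one half. The quadratic-time sketch below computes it directly from that definition on made-up labels and scores; library implementations use faster sorting-based formulas.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

print(auroc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7]))  # 0.75
```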
Affiliation(s)
- Wuyong Liu
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China
- Jingyu Zhang
- Department of Neurology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, China
- Guanyu Qiao
- Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Jilong Bian
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China
- Benzhi Dong
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China
- Yang Li
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China.
115
Schlosser L, Rana D, Pflüger P, Katzenburg F, Glorius F. EnTdecker - A Machine Learning-Based Platform for Guiding Substrate Discovery in Energy Transfer Catalysis. J Am Chem Soc 2024; 146:13266-13275. [PMID: 38695558 DOI: 10.1021/jacs.4c01352] [Indexed: 05/16/2024]
Abstract
Due to the vastness of chemical space, the discovery of novel substrates in energy transfer (EnT) catalysis remains a daunting task. Experimental and computational strategies to identify compounds that successfully undergo EnT-mediated reactions are limited by their time and cost efficiency. To accelerate the discovery process in EnT catalysis, we herein present the EnTdecker platform, which facilitates large-scale virtual screening of potential substrates using machine learning (ML)-based predictions of their excited state properties. To achieve this, a data set containing more than 34,000 molecules is created, aiming to cover a large fraction of the synthetically relevant compound space for EnT catalysis. Using these data, predictive models are trained, and their aptitude for in-lab application is demonstrated by rediscovering successful substrates from the literature as well as by experimental validation through luminescence-based screening. By reducing the computational effort needed to obtain excited state properties, the EnTdecker platform represents a tool to efficiently guide substrate selection and increase the experimental success rate in EnT catalysis. Moreover, through an easy-to-use web application, EnTdecker is publicly accessible at entdecker.uni-muenster.de.
Affiliation(s)
- Leon Schlosser
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Debanjan Rana
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Philipp Pflüger
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Felix Katzenburg
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Frank Glorius
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
116
Yao R, Shen Z, Xu X, Ling G, Xiang R, Song T, Zhai F, Zhai Y. Knowledge mapping of graph neural networks for drug discovery: a bibliometric and visualized analysis. Front Pharmacol 2024; 15:1393415. [PMID: 38799167 PMCID: PMC11116974 DOI: 10.3389/fphar.2024.1393415] [Received: 02/29/2024] [Accepted: 04/12/2024] [Indexed: 05/29/2024]
Abstract
Introduction In recent years, graph neural networks have been extensively applied to drug discovery research. Although researchers have made significant progress in this field, bibliometric research on it remains limited. The purpose of this study is to conduct a comprehensive bibliometric analysis of graph neural network applications in drug discovery in order to identify current research hotspots and trends and to serve as a reference for future research. Methods Publications from 2017 to 2023 on the application of graph neural networks in drug discovery were collected from the Web of Science Core Collection. Bibliometrix, VOSviewer, and CiteSpace were the main tools used for the bibliometric analysis. Results and Discussion A total of 652 papers from 48 countries/regions were included. Research interest in this field is continuously increasing. China and the United States have a significant advantage in terms of funding, number of publications, and collaborations with other institutions and countries. Although some cooperation networks have formed in this field, extensive worldwide cooperation still needs to be strengthened. The keyword analysis showed that graph neural networks have primarily been applied to drug-target interaction, drug repurposing, and drug-drug interaction, while graph convolutional neural networks and related optimization methods are currently the core algorithms in this field. Data availability and ethical supervision, balancing computing resources, and developing novel graph neural network models with better interpretability are the key technical issues currently faced. This paper analyzes the current state, hotspots, and trends of graph neural network applications in drug discovery through bibliometric approaches, as well as the current issues and challenges in this field. These findings provide researchers with valuable insights into the current status and future directions of this field.
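Keyword maps of the kind produced by VOSviewer and CiteSpace start from co-occurrence counts: two keywords are linked each time they appear in the same publication. The sketch below counts unordered keyword pairs over a few made-up records; the real tools add normalization, thresholds, and clustering on top of such counts, none of which is reproduced here.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair appears together in the same record."""
    counts = Counter()
    for keywords in records:
        unique = sorted(set(k.lower() for k in keywords))  # dedupe and fix pair order
        counts.update(combinations(unique, 2))
    return counts

records = [  # made-up author keyword lists
    ["graph neural network", "drug-target interaction"],
    ["graph neural network", "drug repurposing", "drug-target interaction"],
    ["graph convolutional network", "drug-drug interaction"],
]
co = keyword_cooccurrence(records)
print(co[("drug-target interaction", "graph neural network")])  # 2
```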
Affiliation(s)
- Fei Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
- Yuxuan Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
117
Zhao B, Xu W, Guan J, Zhou S. Molecular property prediction based on graph structure learning. Bioinformatics 2024; 40:btae304. [PMID: 38710497 PMCID: PMC11112045 DOI: 10.1093/bioinformatics/btae304] [Received: 12/27/2023] [Revised: 04/06/2024] [Accepted: 05/03/2024] [Indexed: 05/08/2024]
Abstract
MOTIVATION Molecular property prediction (MPP) is a fundamental but challenging task in the computer-aided drug discovery process. A growing number of recent works employ graph-based models for MPP and have made considerable progress in improving prediction performance. However, current models often ignore relationships between molecules, which could also be helpful for MPP. RESULTS To this end, we propose a graph structure learning (GSL)-based MPP approach called GSL-MPP. Specifically, we first apply a graph neural network (GNN) over molecular graphs to extract molecular representations. Then, using molecular fingerprints, we construct a molecule similarity graph (MSG). Following that, we conduct GSL on the MSG, i.e. molecule-level GSL, to obtain the final molecular embeddings, which fuse the GNN-encoded molecular representations with the relationships among molecules, thereby combining intra-molecule and inter-molecule information. Finally, we use these molecular embeddings to perform MPP. Extensive experiments on 10 benchmark datasets show that our method achieves state-of-the-art performance in most cases, especially on classification tasks. Further visualization studies also demonstrate the quality of the molecular representations learned by our method. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/zby961104/GSL-MPP.
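The abstract says the molecule similarity graph (MSG) is built from molecular fingerprints but does not give the recipe. A common choice, assumed here, is Tanimoto similarity between fingerprint bit sets followed by a k-nearest-neighbour rule. The sketch below builds such a graph for toy fingerprints written as sets of on-bit indices; the choice of k, the tie-breaking, and the helper names are illustrative assumptions, and the learned structure refinement that GSL-MPP applies afterwards is not shown.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def knn_similarity_graph(fps, k=2):
    """Connect each molecule to its k most similar molecules; return an undirected edge set."""
    edges = set()
    for i, fi in enumerate(fps):
        sims = [(tanimoto(fi, fj), j) for j, fj in enumerate(fps) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))   # highest similarity first, ties broken by index
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {7, 8, 10}]   # toy fingerprints
print(round(tanimoto(fps[0], fps[1]), 3))       # 0.6
print(sorted(knn_similarity_graph(fps, k=1)))   # [(0, 1), (2, 3)]
```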
Affiliation(s)
- Bangyi Zhao
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Weixia Xu
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
118
Xue G, Zhong M, Qian T, Li J. PSA-GNN: An augmented GNN framework with priori subgraph knowledge. Neural Netw 2024; 173:106155. [PMID: 38335793 DOI: 10.1016/j.neunet.2024.106155] [Received: 05/18/2023] [Revised: 12/13/2023] [Accepted: 01/29/2024] [Indexed: 02/12/2024]
Abstract
Graph neural networks have become the primary graph representation learning paradigm, in which nodes update their embeddings by iteratively aggregating messages from their neighbors. However, current message-passing GNNs insufficiently exploit higher-order subgraph information beyond 1st-order neighbors. In contrast, long-standing graph research has investigated various subgraphs, such as motifs, cliques, cores, and trusses, that contain structural information important to downstream tasks like node classification and that deserve to be preserved by GNNs. In this work, we propose to use pre-mined subgraphs as prior knowledge to extend the receptive field of GNNs and enhance their expressive power beyond the 1st-order Weisfeiler-Lehman isomorphism test. To this end, we introduce a general framework called PSA-GNN (Priori Subgraph Augmented Graph Neural Network), which augments each GNN layer with a pair of parallel convolution layers based on a bipartite graph between nodes and priori subgraphs. PSA-GNN intrinsically builds a hybrid receptive field by incorporating priori subgraphs as neighbors, while the embeddings and weights of subgraphs are trainable. Moreover, PSA-GNN can purify noisy subgraphs both heuristically before training and deterministically during training, based on a novel metric called homogeneity. Experimental results show that PSA-GNN achieves improved performance compared with state-of-the-art message-passing GNN models.
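The bipartite node-subgraph idea can be illustrated with a deliberately simplified layer. In the sketch below, an incidence matrix records which nodes belong to which pre-mined subgraph; each subgraph embedding is taken as the mean of its member nodes, and each node receives the mean of its subgraphs' embeddings alongside ordinary neighbour aggregation. This is an assumption-laden sketch: in PSA-GNN the subgraph embeddings and weights are trainable, and homogeneity-based purification is omitted here. It mainly shows how a subgraph widens the receptive field: node 0 receives information from node 2, two hops away, within a single layer.

```python
import numpy as np

def row_normalize(m):
    """Divide each row by its sum; rows summing to 0 are left as zeros."""
    s = m.sum(axis=1, keepdims=True)
    return np.divide(m, s, out=np.zeros_like(m, dtype=float), where=s > 0)

def psa_style_layer(h, adj, incidence, w_nbr, w_sub):
    """One simplified layer: mean over graph neighbours plus mean over containing subgraphs.

    h: (n, d) node features; adj: (n, n) adjacency; incidence: (n, s) 0/1 node-in-subgraph matrix.
    """
    nbr = row_normalize(adj) @ h                      # 1st-order neighbour messages
    sub_emb = row_normalize(incidence.T) @ h          # hypothetical: subgraph = mean of member nodes
    sub_msg = row_normalize(incidence) @ sub_emb      # node <- mean of the subgraphs containing it
    return np.maximum(0.0, nbr @ w_nbr + sub_msg @ w_sub)  # ReLU

# Toy graph: a 4-node path 0-1-2-3 with one pre-mined subgraph {0, 1, 2}.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
incidence = np.array([[1], [1], [1], [0]], dtype=float)
h = np.eye(4)  # one-hot node features make information flow easy to read off
out = psa_style_layer(h, adj, incidence, np.eye(4), np.eye(4))
print(out.shape)  # (4, 4)
```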
Affiliation(s)
- Guotong Xue
- School of Computer Science, Wuhan University, Wuhan, China
- Ming Zhong
- School of Computer Science, Wuhan University, Wuhan, China.
- Tieyun Qian
- School of Computer Science, Wuhan University, Wuhan, China
- Jianxin Li
- School of Information Technology, Deakin University, Burwood, Australia
119
Zhang S, Zhao D, Cui Q. Gap-Δenergy, a New Metric of the Bond Energy State, Assisting to Predict Molecular Toxicity. ACS Omega 2024; 9:17839-17847. [PMID: 38680329 PMCID: PMC11044234 DOI: 10.1021/acsomega.3c07682] [Received: 10/06/2023] [Revised: 04/02/2024] [Accepted: 04/05/2024] [Indexed: 05/01/2024]
Abstract
Molecular toxicity is a critical consideration in drug development, so it is important to develop computational models to evaluate the toxicity of small molecules. The accuracy of toxicity prediction depends largely on the quality of molecular representation; however, current methods do not address this issue well. Here, we introduce a new metric, gap-Δenergy, which is designed to quantify the intermolecular bond energy difference with atom distance. We then find significant variations in the gap-Δenergy distribution among different types of molecules and show that this metric is able to distinguish toxic small molecules. We collected data sets of toxic and exogenous small molecules and present a novel index, global toxicity, to evaluate the overall toxicity of molecules. Based on molecular descriptors and the proposed gap-Δenergy metric, we further constructed machine learning models trained on 7816 small molecules. The XGBoost-based model achieved the best performance, with an AUC score of 0.965 and an F1 score of 0.849 on the test set (1954 small molecules), outperforming the model that did not use gap-Δenergy features with a 3.2% increase in sensitivity.
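The test-set metrics quoted above are derived from a confusion matrix. As a reminder of how sensitivity (recall) and F1 relate, the sketch below computes both from made-up counts; it does not reproduce the paper's numbers, and the function name is illustrative.

```python
def sensitivity_and_f1(tp, fp, fn):
    """Sensitivity (recall) and F1 score from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)     # fraction of truly toxic molecules that are flagged
    precision = tp / (tp + fp)       # fraction of flagged molecules that are truly toxic
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return sensitivity, f1

sens, f1 = sensitivity_and_f1(tp=80, fp=10, fn=20)
print(round(sens, 3), round(f1, 3))  # 0.8 0.842
```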
Affiliation(s)
- Senpeng Zhang
- Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, People’s Republic of China
- Dongyu Zhao
- Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, People’s Republic of China
120
Song L, Zhu H, Wang K, Li M. LGGA-MPP: Local Geometry-Guided Graph Attention for Molecular Property Prediction. J Chem Inf Model 2024; 64:3105-3113. [PMID: 38516950 DOI: 10.1021/acs.jcim.3c02058] [Indexed: 03/23/2024]
Abstract
Molecular property prediction is a fundamental task in drug discovery. With the rapid development of deep learning, computational approaches for predicting molecular properties are becoming increasingly popular. However, these existing methods often ignore the 3D information of molecules, which is critical in molecular representation learning. In the past few years, several self-supervised learning (SSL) approaches have been proposed to exploit geometric information by pre-training on 3D molecular graphs and fine-tuning on 2D molecular graphs. Most of these approaches are based on the global geometry of molecules, and capturing local structure and providing local interpretability remain challenging. To this end, we propose local geometry-guided graph attention (LGGA), which integrates local geometry into the attention mechanism and message passing of graph neural networks (GNNs). LGGA introduces a novel method to model molecules, enhancing the model's ability to capture intricate local structural details. Experiments on various data sets demonstrate that integrating local geometry contributes significantly to the improved results and that our model outperforms state-of-the-art methods for molecular property prediction, establishing its potential as a promising tool in drug discovery and related fields.
Affiliation(s)
- Lei Song
- School of Software, XinJiang University, Urumqi 830091, China
- Huimin Zhu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Kaili Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
121
Long TZ, Jiang DJ, Shi SH, Deng YC, Wang WX, Cao DS. Enhancing Multi-species Liver Microsomal Stability Prediction through Artificial Intelligence. J Chem Inf Model 2024; 64:3222-3236. [PMID: 38498003 DOI: 10.1021/acs.jcim.4c00159] [Indexed: 03/19/2024]
Abstract
Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, by constructing consensus models from these individual models, we demonstrated their superior predictive performance compared with existing models of the same type. To explore the similarities and differences in liver microsomal stability across species, we conducted preliminary interpretative explorations of the models and misclassified molecules using Shapley additive explanations (SHAP) and atom heatmap approaches. Additionally, we investigated representative structural modifications and substructures that decrease liver microsomal stability in different species using matched molecular pair analysis (MMPA) and substructure extraction techniques. The established prediction models, together with the insights they provide into liver microsomal stability, will contribute significantly to more efficient exploration of drug candidates for development.
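The Matthews correlation coefficient reported above summarizes all four cells of a binary confusion matrix and ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it informative when stable and unstable compounds are imbalanced. The sketch below evaluates the standard formula on made-up counts.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero (the formula is undefined).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))  # 0.704
```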
Affiliation(s)
- Teng-Zhi Long
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- De-Jun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Shao-Hua Shi
- Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
- You-Chao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Wen-Xuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
122
Chang J, Fan X, Tian B. DeepP450: Predicting Human P450 Activities of Small Molecules by Integrating Pretrained Protein Language Model and Molecular Representation. J Chem Inf Model 2024; 64:3149-3160. [PMID: 38587937 DOI: 10.1021/acs.jcim.4c00115] [Indexed: 04/10/2024]
Abstract
Cytochrome P450 enzymes (CYPs) play a crucial role in Phase I drug metabolism in the human body, and CYP activity toward compounds can significantly affect druggability, making early prediction of CYP activity and substrate identification essential for therapeutic development. Here, we established DeepP450, a deep learning model for assessing potential CYP substrates, by fine-tuning pretrained protein and molecule models and integrating their features with cross-attention and self-attention layers. This model exhibited high prediction accuracy (0.92) on the test set, with area under the receiver operating characteristic curve (AUROC) values ranging from 0.89 to 0.98 in substrate/nonsubstrate predictions across the nine major human CYPs, surpassing current benchmarks for CYP activity prediction. Notably, DeepP450 uses a single model to predict substrates/nonsubstrates for any of the nine CYPs and shows some generalizability to novel compounds and different categories of human CYPs, which could greatly facilitate early stage drug design by helping to avoid CYP-reactive compounds.
Affiliation(s)
- Jiamin Chang
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Xiaoyu Fan
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
123
Vik D, Pii D, Mudaliar C, Nørregaard-Madsen M, Kontijevskis A. Performance and robustness of small molecule retention time prediction with molecular graph neural networks in industrial drug discovery campaigns. Sci Rep 2024; 14:8733. [PMID: 38627535 PMCID: PMC11021461 DOI: 10.1038/s41598-024-59620-4] [Received: 10/16/2023] [Accepted: 04/12/2024] [Indexed: 04/19/2024]
Abstract
This study explores how machine learning can be used to predict chromatographic retention times (RT) for the analysis of small molecules, with the objective of identifying a machine learning framework with the robustness required to support a chemical synthesis production platform. We used internally generated data from high-throughput parallel synthesis in the context of pharmaceutical drug discovery projects. We tested machine learning models from the XGBoost, ChemProp, and DeepChem frameworks, using a dataset of 7552 small molecules. Our findings show that two specific models, AttentiveFP and ChemProp, predicted RT more accurately than XGBoost and a regular neural network. We also assessed how well these models performed over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. In addition, when we applied ChemProp to the publicly available METLIN SMRT dataset, it performed impressively, with an average error of 38.70 s. These results highlight the efficacy of molecular graph neural networks, especially ChemProp, in diverse RT prediction scenarios, thereby enhancing the efficiency of chromatographic analysis.
Affiliation(s)
- Daniel Vik
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark.
- David Pii
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
- Chirag Mudaliar
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark

124
Feng H, Qin L, Zhang B, Zhou J. Prediction and Interpretability of Melting Points of Ionic Liquids Using Graph Neural Networks. ACS Omega 2024; 9:16016-16025. [PMID: 38617653 PMCID: PMC11007696 DOI: 10.1021/acsomega.3c09543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/13/2024] [Accepted: 03/15/2024] [Indexed: 04/16/2024]
Abstract
Ionic liquids (ILs) have wide and promising applications in fields such as chemical engineering, energy, and the environment. However, the melting points (MPs) of ILs are one of the most crucial properties affecting their applications. The MPs of ILs are affected by various factors, and tuning these in a laboratory is time-consuming and costly. Therefore, an accurate and efficient method is required to predict the desired MPs in the design of novel targeted ILs. In this study, three descriptor-based machine learning (DBML) models and eight graph neural network (GNN) models were proposed to predict the MPs of ILs. Fingerprints and molecular graphs were used to represent molecules for the DBML and GNNs, respectively. The GNN models demonstrated performance superior to that of the DBML models. Among all of the examined models, the graph convolutional model exhibited the best performance with high accuracy (root-mean-squared error = 37.06, mean absolute error = 28.79, and correlation coefficient = 0.76). Benefiting from molecular graph representation, we built a GNN-based interpretable model to reveal the atomistic contribution to the MPs of ILs using a data-driven procedure. According to our interpretable model, amino groups, S+, N+, and P+ would increase the MPs of ILs, while the negatively charged halogen atoms, S-, and N- would decrease the MPs of ILs. The results of this study provide new insight into the rapid screening and synthesis of targeted ILs with appropriate MPs.
Affiliation(s)
- Haijun Feng
- School of Computer Sciences, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
- Lanlan Qin
- School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- Bingxuan Zhang
- School of Computer Sciences, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
- Jian Zhou
- School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China

125
Zhang Z, Bian Y, Xie A, Han P, Zhou S. Can Pretrained Models Really Learn Better Molecular Representations for AI-Aided Drug Discovery? J Chem Inf Model 2024; 64:2921-2930. [PMID: 38145387 PMCID: PMC11005046 DOI: 10.1021/acs.jcim.3c01707] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/23/2023] [Revised: 11/29/2023] [Accepted: 11/29/2023] [Indexed: 12/26/2023]
Abstract
Self-supervised pretrained models are becoming increasingly popular in AI-aided drug discovery, and a growing number of them promise to extract better feature representations for molecules. Yet the quality of the learned representations has not been fully explored. In this work, inspired by two phenomena in traditional Quantitative Structure-Activity Relationship analysis, Activity Cliffs (ACs) and Scaffold Hopping (SH), we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by pretrained models and to visualize the relationship between representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, so that the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pretrained models are analyzed. The results indicate that state-of-the-art pretrained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints, but the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pretrained models, and our findings can guide the community to develop better pretraining techniques that regularize the occurrence of ACs and SH.
Affiliation(s)
- Ziqiao Zhang
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Ailin Xie
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Pengju Han
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China

126
Kengkanna A, Ohue M. Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Commun Chem 2024; 7:74. [PMID: 38580841 PMCID: PMC10997661 DOI: 10.1038/s42004-024-01155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/07/2024]
Abstract
Graph Neural Networks (GNNs) excel in compound property and activity prediction, but the choice of molecular graph representation significantly influences model learning and interpretation. Although atom-level molecular graphs resemble natural topology, they overlook key substructures or functional groups, and their interpretation only partially aligns with chemical intuition. Recent research suggests alternative representations that use reduced molecular graphs to integrate higher-level chemical information, and that leverage both representations for modeling. However, there is a lack of studies on the applicability and impact of different molecular graphs on model learning and interpretation. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives. Our findings indicate that multiple graphs improve model performance, but to varying degrees depending on the dataset. Interpretation from multiple graphs in different views provides more comprehensive features and potential substructures consistent with background knowledge. These results help to understand model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives has broad applicability across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
Affiliation(s)
- Apakorn Kengkanna
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan
- Masahito Ohue
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan.

127
Harnik Y, Milo A. A focus on molecular representation learning for the prediction of chemical properties. Chem Sci 2024; 15:5052-5055. [PMID: 38577350 PMCID: PMC10988574 DOI: 10.1039/d4sc90043j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024]
Abstract
Molecular representation learning (MRL) is a specialized field in which deep-learning models condense essential molecular information into a vectorized form. Whereas recent research has predominantly emphasized drug discovery and bioactivity applications, MRL holds significant potential for diverse chemical properties beyond these contexts. The recently published study by King-Smith introduces a novel application of molecular representation training and compellingly demonstrates its value in predicting molecular properties (E. King-Smith, Chem. Sci., 2024, https://doi.org/10.1039/D3SC04928K). In this focus article, we will briefly delve into MRL in chemistry and the significance of King-Smith's work within the dynamic landscape of this evolving field.
Affiliation(s)
- Yonatan Harnik
- Department of Chemistry, Ben-Gurion University of the Negev Beer Sheva 84105 Israel
- Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev Beer Sheva 84105 Israel

128
Li Y, Wang W, Liu J, Wu C. Pre-training molecular representation model with spatial geometry for property prediction. Comput Biol Chem 2024; 109:108023. [PMID: 38335852 DOI: 10.1016/j.compbiolchem.2024.108023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/22/2024] [Accepted: 02/01/2024] [Indexed: 02/12/2024]
Abstract
AI-enhanced bioinformatics and cheminformatics pivot on generating increasingly descriptive and generalized molecular representations. Accurate prediction of molecular properties requires a comprehensive description of molecular geometry. We design a novel Graph Isomorphism Network (GIN)-based model that integrates a three-level network structure with a dual-level pre-training approach that aligns the characteristics of molecules. In our Spatial Molecular Pre-training (SMPT) model, the network learns implicit geometric information layer by layer, from lower to higher dimensions. Extensive evaluations against established baseline models validate the enhanced efficacy of SMPT, with notable accomplishments in classification tasks. These results emphasize the importance of spatial geometric information in molecular representation modeling and demonstrate the potential of SMPT as a valuable tool for property prediction.
Affiliation(s)
- Yishui Li
- Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Deya Road, Changsha, 410073, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Deya Road, Changsha, 410073, China.
- Wei Wang
- National SuperComputer Center in Tianjin, TEDA Sixth Street, Tianjin, 300450, China
- Jie Liu
- Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Deya Road, Changsha, 410073, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Deya Road, Changsha, 410073, China
- Chengkun Wu
- Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Deya Road, Changsha, 410073, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Deya Road, Changsha, 410073, China.

129
Ghandikota SK, Jegga AG. Application of artificial intelligence and machine learning in drug repurposing. Progress in Molecular Biology and Translational Science 2024; 205:171-211. [PMID: 38789178 DOI: 10.1016/bs.pmbts.2024.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
The purpose of drug repurposing is to leverage previously approved drugs for a particular disease indication and apply them to another disease. It can be seen as a faster and more cost-effective approach to drug discovery and a powerful tool for achieving precision medicine. In addition, drug repurposing can be used to identify therapeutic candidates for rare diseases and phenotypic conditions with limited information on disease biology. Machine learning and artificial intelligence (AI) methodologies have enabled the construction of effective, data-driven repurposing pipelines by integrating and analyzing large-scale biomedical data. Recent technological advances, especially in heterogeneous network mining and natural language processing, have opened up exciting new opportunities and analytical strategies for drug repurposing. In this review, we first introduce the challenges in repurposing approaches and highlight some success stories, including those during the COVID-19 pandemic. Next, we review some existing computational frameworks in the literature, organized on the basis of the type of biomedical input data analyzed and the computational algorithms involved. In conclusion, we outline some exciting new directions that drug repurposing research may take, as pioneered by the generative AI revolution.
Affiliation(s)
- Sudhir K Ghandikota
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Anil G Jegga
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States.

130
Xu L, Xia L, Pan S, Li Z. Triple Generative Self-Supervised Learning Method for Molecular Property Prediction. Int J Mol Sci 2024; 25:3794. [PMID: 38612602 PMCID: PMC11012122 DOI: 10.3390/ijms25073794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/17/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024]
Abstract
Molecular property prediction is an important task in drug discovery, and with the help of self-supervised learning methods, its performance can be improved by utilizing large-scale unlabeled datasets. In this paper, we propose a triple generative self-supervised learning method for molecular property prediction, called TGSS. Three encoders, a bidirectional long short-term memory recurrent neural network (BiLSTM), a Transformer, and a graph attention network (GAT), are used to pre-train the model on molecular sequence and graph structure data to extract molecular features. A variational autoencoder (VAE) is used to reconstruct features from the three models. In the downstream task, a feature fusion module is added to assign different weights to each feature, in order to balance the information between different molecular features. In addition, to improve the interpretability of the model, atomic similarity heat maps are introduced to demonstrate the effectiveness and rationality of molecular feature extraction. We demonstrate the accuracy of the proposed method on chemical and biological benchmark datasets through comparative experiments.
Affiliation(s)
- Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China; (L.X.); (L.X.); (S.P.)

131
Zhang C, Xie L, Lu X, Mao R, Xu L, Xu X. Developing an Improved Cycle Architecture for AI-Based Generation of New Structures Aimed at Drug Discovery. Molecules 2024; 29:1499. [PMID: 38611779 PMCID: PMC11013495 DOI: 10.3390/molecules29071499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024]
Abstract
Drug discovery involves a crucial step of optimizing molecules with the desired structural groups. In the domain of computer-aided drug discovery, deep learning has emerged as a prominent technique in molecular modeling. Deep generative models play a crucial role in generating novel molecules during molecular optimization. However, many existing molecular generative models are limited because they process input information only in the forward direction. To overcome this limitation, we propose an improved generative model called BD-CycleGAN, which incorporates BiLSTM (bidirectional long short-term memory) and Mol-CycleGAN (molecular cycle generative adversarial network) to preserve the information of the molecular input. To evaluate the proposed model, we analyze the structural distribution and evaluation metrics of the generated molecules during structural transformation. The results demonstrate that the BD-CycleGAN model achieves a higher success rate and exhibits increased diversity in molecular generation. Furthermore, we demonstrate its application in molecular docking, where it successfully increases the docking score of the generated molecules. The proposed BD-CycleGAN architecture harnesses deep learning to facilitate the generation of molecules with desired structural features, offering promising advancements in drug discovery.
Affiliation(s)
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (C.Z.); (L.X.); (X.L.); (R.M.)
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (C.Z.); (L.X.); (X.L.); (R.M.)

132
Vinh T, Nguyen L, Trinh QH, Nguyen-Vo TH, Nguyen BP. Predicting Cardiotoxicity of Molecules Using Attention-Based Graph Neural Networks. J Chem Inf Model 2024; 64:1816-1827. [PMID: 38438914 DOI: 10.1021/acs.jcim.3c01286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2024]
Abstract
In drug discovery, the search for new and effective medications is often hindered by concerns about toxicity. Numerous promising molecules fail to pass the later phases of drug development due to strict toxicity assessments. This challenge significantly increases the cost, time, and human effort needed to discover new therapeutic molecules. Additionally, a considerable number of drugs already on the market have been withdrawn or re-evaluated because of their unwanted side effects. Among the various types of toxicity, drug-induced heart damage is a severe adverse effect commonly associated with several medications, especially those used in cancer treatments. Although a number of computational approaches have been proposed to identify the cardiotoxicity of molecules, the performance and interpretability of the existing approaches are limited. In our study, we proposed a more effective computational framework to predict the cardiotoxicity of molecules using an attention-based graph neural network. Experimental results indicated that the proposed framework outperformed the other methods. The stability of the model was also confirmed by our experiments. To assist researchers in evaluating the cardiotoxicity of molecules, we have developed an easy-to-use online web server that incorporates our model.
Affiliation(s)
- Tuan Vinh
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322-1007, United States
- Loc Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6012, New Zealand
- Quang H Trinh
- School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Thanh-Hoang Nguyen-Vo
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6012, New Zealand
- School of Innovation, Design and Technology, Wellington Institute of Technology, 21 Kensington Avenue, Lower Hutt 5012, New Zealand
- Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6012, New Zealand

133
Lou S, Yu Z, Huang Z, Wang H, Pan F, Li W, Liu G, Tang Y. In Silico Prediction of Chemical Acute Dermal Toxicity Using Explainable Machine Learning Methods. Chem Res Toxicol 2024; 37:513-524. [PMID: 38380652 DOI: 10.1021/acs.chemrestox.4c00012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Research on acute dermal toxicity has consistently been a crucial component in assessing the potential risks of human exposure to active ingredients in pesticides and related plant protection products. However, it is difficult to identify the acute dermal toxicity of potential compounds directly through animal experiments alone. In our study, we separately integrated 1735 experimental data points from rabbits and 1679 from rats to construct acute dermal toxicity prediction models using machine learning and deep learning algorithms. The best models for the two animal species achieved AUC values of 78.0% and 82.0%, respectively, in 10-fold cross-validation. Additionally, we employed SARpy to extract structural alerts and, in conjunction with Shapley additive explanations and Attentive FP heatmaps, identified important features and structural fragments associated with acute dermal toxicity. This approach offers valuable insights for the detection of positive compounds. Moreover, we developed a standalone software tool to make acute dermal toxicity prediction easier. In summary, our research provides an effective tool for acute dermal toxicity evaluation in pesticide, cosmetic, and drug safety assessment.
Affiliation(s)
- Shang Lou
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zhuohang Yu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zejun Huang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Haoqiang Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Fei Pan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China

134
He Y, Liu K, Liu Y, Han W. Prediction of bitterness based on modular designed graph neural network. Bioinformatics Advances 2024; 4:vbae041. [PMID: 38566918 PMCID: PMC10987211 DOI: 10.1093/bioadv/vbae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/31/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024]
Abstract
Motivation: Bitterness plays a pivotal role in our ability to identify and avoid harmful substances in food. As one of the five basic tastes, it is a critical component of our sensory experience. However, relying on human tasting to discern flavors is costly, making in silico prediction of bitterness a more practical alternative.
Results: In this study, we introduce Graph Neural Networks (GNNs) to bitterness prediction, superseding traditional machine learning techniques. We developed an advanced model, a Hybrid Graph Neural Network (HGNN), which surpassed conventional GNNs in tests on public datasets. Using HGNN and three other GNNs, we designed BitterGNNs, a bitterness predictor that achieved an AUC of 0.87 in both external bitter/non-bitter and bitter/sweet evaluations, outperforming the acclaimed RDKFP-MLP predictor, which had AUC values of 0.86 and 0.85, respectively. We further created a bitterness prediction website and database, TastePD (https://www.tastepd.com/). The GNN-based BitterGNNs predictor offers accurate bitterness predictions, aiding the development of advanced food testing methodologies and deepening our understanding of the origins of bitterness.
Availability and implementation: TastePD is available at https://www.tastepd.com, and all code is at https://github.com/heyigacu/BitterGNN.
Affiliation(s)
- Yi He
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Kaifeng Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Yuyang Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Weiwei Han
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China

135
Lim H, Joo Y, Ha E, Song Y, Yoon S, Shin T. Brain Age Prediction Using Multi-Hop Graph Attention Combined with Convolutional Neural Network. Bioengineering (Basel) 2024; 11:265. [PMID: 38534539 DOI: 10.3390/bioengineering11030265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 03/28/2024]
Abstract
Convolutional neural networks (CNNs) have been used widely to predict biological brain age based on brain magnetic resonance (MR) images. However, CNNs focus mainly on spatially local features and their aggregates and barely on the connective information between distant regions. To overcome this issue, we propose a novel multi-hop graph attention (MGA) module that exploits both the local and global connections of image features when combined with CNNs. After insertion between convolutional layers, MGA first converts the convolution-derived feature map into graph-structured data by using patch embedding and embedding-distance-based scoring. Multi-hop connections between the graph nodes are modeled by using the Markov chain process. After performing multi-hop graph attention, MGA re-converts the graph into an updated feature map and transfers it to the next convolutional layer. We combined the MGA module with sSE (spatial squeeze and excitation)-ResNet18 for our final prediction model (MGA-sSE-ResNet18) and performed various hyperparameter evaluations to identify the optimal parameter combinations. With 2788 three-dimensional T1-weighted MR images of healthy subjects, we verified the effectiveness of MGA-sSE-ResNet18 with comparisons to four established, general-purpose CNNs and two representative brain age prediction models. The proposed model yielded an optimal performance with a mean absolute error of 2.822 years and Pearson's correlation coefficient (PCC) of 0.968, demonstrating the potential of the MGA module to improve the accuracy of brain age prediction.
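The abstract describes the MGA module only at a high level. As a rough illustration, and not the authors' implementation, the NumPy sketch below makes three assumptions: "embedding-distance-based scoring" is taken to mean a softmax over negative Euclidean distances between patch embeddings, the resulting row-stochastic matrix is used as the Markov transition matrix, and multi-hop aggregation averages the features propagated over 1 to k steps. The function names `transition_matrix` and `multi_hop_aggregate` and the `temperature` and `hops` parameters are hypothetical, and the learned attention weights of the actual module are omitted.

```python
import numpy as np


def transition_matrix(embeddings: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Hypothetical: row-stochastic Markov transition matrix over patch embeddings.

    embeddings: (n_patches, dim) array, one row per graph node (patch).
    Closer patches receive higher transition probability (softmax of -distance).
    """
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # (n, n) pairwise Euclidean distances
    scores = -dist / temperature
    scores -= scores.max(axis=1, keepdims=True)     # stabilize the exponent
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def multi_hop_aggregate(embeddings: np.ndarray, hops: int = 3, temperature: float = 1.0) -> np.ndarray:
    """Hypothetical: average node features propagated over 1..hops Markov steps."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    P = transition_matrix(embeddings, temperature)
    step = np.eye(len(embeddings))
    out = np.zeros_like(embeddings, dtype=float)
    for _ in range(hops):
        step = step @ P                             # k-step transition probabilities P^k
        out += step @ embeddings                    # features reached after k hops
    return out / hops


# Usage: 16 patches with 8-dimensional embeddings (random, for illustration only).
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
updated = multi_hop_aggregate(patches, hops=3)
print(updated.shape)  # (16, 8)
```

Because every power of a row-stochastic matrix is itself row-stochastic, each updated feature vector in this sketch is a convex combination of the original patch features; a trainable attention layer would replace this fixed, distance-based weighting.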
Affiliation(s)
- Heejoo Lim
- Division of Mechanical and Biomedical Engineering, Ewha W. University, Seoul 03760, Republic of Korea
- Graduate Program in Smart Factory, Ewha W. University, Seoul 03760, Republic of Korea
- Yoonji Joo
- Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea
- Eunji Ha
- Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea
- Yumi Song
- Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea
- Department of Brain and Cognitive Sciences, Ewha W. University, Seoul 03760, Republic of Korea
- Sujung Yoon
- Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea
- Department of Brain and Cognitive Sciences, Ewha W. University, Seoul 03760, Republic of Korea
- Taehoon Shin
- Division of Mechanical and Biomedical Engineering, Ewha W. University, Seoul 03760, Republic of Korea
- Graduate Program in Smart Factory, Ewha W. University, Seoul 03760, Republic of Korea

136
Hunklinger A, Hartog P, Šícho M, Godin G, Tetko IV. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge. SLAS Discovery: Advancing Life Sciences R & D 2024; 29:100144. [PMID: 38316342 DOI: 10.1016/j.slasd.2024.01.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]
Abstract
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100K compounds. In total, one hundred teams took part in the challenge to predict low-, medium- and high-solubility compounds as measured by a nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus of 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those of individual approaches or of consensus models developed using each individual approach. Combining diverse models allowed us to decrease both the bias and the variance of individual models and to achieve the highest score. The model based on Transformer CNN contributed the best individual score, highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would help contestants better understand and use the challenge data.
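The abstract does not specify how the 28 models were combined. A minimal sketch, assuming the consensus is an unweighted average of per-class probabilities followed by picking the most probable class, is shown below; the function names and the class ordering (0 = low, 1 = medium, 2 = high solubility) are hypothetical and are not taken from OCHEM.

```python
import numpy as np


def consensus_probabilities(model_probs: np.ndarray) -> np.ndarray:
    """Average class probabilities over models (hypothetical unweighted consensus).

    model_probs: (n_models, n_compounds, n_classes); each model's rows sum to 1.
    Returns an (n_compounds, n_classes) array whose rows also sum to 1.
    """
    if model_probs.ndim != 3:
        raise ValueError("expected shape (n_models, n_compounds, n_classes)")
    return model_probs.mean(axis=0)


def consensus_labels(model_probs: np.ndarray) -> np.ndarray:
    """Most probable class per compound (assumed order: 0=low, 1=medium, 2=high)."""
    return consensus_probabilities(model_probs).argmax(axis=1)


# Two hypothetical models scoring two compounds.
probs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],  # model A
    [[0.2, 0.3, 0.5], [0.1, 0.2, 0.7]],  # model B
])
print(consensus_probabilities(probs))
print(consensus_labels(probs))  # [0 2]
```

Averaging tends to cancel errors that are uncorrelated across models, which is one mechanism behind the reduced variance the abstract reports; a weighted average or a stacked meta-model would be alternatives if some members are known to be stronger.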
Affiliation(s)
- Andrea Hunklinger
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Peter Hartog
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Martin Šícho
- Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC Leiden, the Netherlands; CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- Guillaume Godin
- dsm-firmenich SA, Rue de la Bergère 7, CH-1242 Satigny, Switzerland
- Igor V Tetko
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, DE-85716 Unterschleißheim, Germany.
137
Shi W, Lin K, Zhao Y, Li Z, Zhou T. Toward a comprehensive understanding of alicyclic compounds: Bio-effects perspective and deep learning approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 912:168927. [PMID: 38042202 DOI: 10.1016/j.scitotenv.2023.168927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/17/2023] [Accepted: 11/25/2023] [Indexed: 12/04/2023]
Abstract
The escalating use of alicyclic compounds in modern industrial production has led to a rapid increase of these substances in the environment, posing significant health hazards. Addressing this challenge requires a comprehensive understanding of these compounds, which deep learning can help provide. Graph neural networks (GNNs), known for their ability to process graph data with rich relationships, have been employed in various molecular prediction tasks. In this study, alicyclic molecules screened from PCBA, ToxCast, and Tox21 were compiled into two datasets, one for general bioactivity prediction and one for biological-target activity prediction. GNN-based models were trained on both datasets, with Attentive FP and PAGTN achieving the best performance. In addition, alicyclic carbon atoms made the greatest contribution to biological activity, indicating that alicyclic structures significantly affect the contribution of carbon atoms. Moreover, a large number of active molecules appear in other public datasets, indicating that alicyclic compounds deserve more attention in the control of persistent organic pollutants (POPs). This study uncovered deeper structure-activity relationships within these compounds, offering new perspectives and methodologies for academic research in the field.
Affiliation(s)
- Wenjie Shi
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China.
- Kunsen Lin
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China.
- Youcai Zhao
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China
- Zongsheng Li
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China
- Tao Zhou
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China.
138
Huang Z, Lou S, Wang H, Li W, Liu G, Tang Y. AttentiveSkin: To Predict Skin Corrosion/Irritation Potentials of Chemicals via Explainable Machine Learning Methods. Chem Res Toxicol 2024; 37:361-373. [PMID: 38294881 DOI: 10.1021/acs.chemrestox.3c00332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024]
Abstract
Skin Corrosion/Irritation (Corr./Irrit.) has long been a health hazard class in the Globally Harmonized System (GHS). Several in silico models have been built to predict Skin Corr./Irrit. as an alternative to increasingly restricted animal testing. However, current studies are limited by data amount and quality and by model availability. To address these issues, we compiled a traceable consensus GHS data set comprising 731 Corr., 1283 Irrit., and 1205 negative (Neg.) samples from 6 governmental databases and 2 external data sets. Then, a series of binary classifiers was developed with five machine learning (ML) algorithms and six molecular representations. In 10-fold cross-validation, the best Corr. vs Neg. classifier achieved an Area Under the Receiver Operating Characteristic Curve (AUC) of 97.1%, while the best Irrit. vs Neg. classifier achieved an AUC of 84.7%. In external validation against existing in silico tools, our Attentive FP classifiers showed the highest metrics on Corr. vs Neg. and the second-highest accuracy on Irrit. vs Neg. The SHapley Additive exPlanations (SHAP) approach was further applied to identify important molecular features, and the attention weights were visualized to make predictions interpretable. Structural alerts associated with Skin Corr./Irrit. were also identified. The interpretable Attentive FP classifiers were integrated into the software AttentiveSkin at https://github.com/BeeBeeWong/AttentiveSkin. The conventional ML classifiers are also provided on our platform admetSAR at http://lmmd.ecust.edu.cn/admetsar2/. Given the data deficiency and the limited model availability for Skin Corr./Irrit., we believe that our data set and models can facilitate chemical safety assessment and related studies.
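AUC values such as the 97.1% and 84.7% above summarize how well a binary classifier ranks positive samples above negative ones. The minimal Python sketch below computes AUC directly from that definition; it is not the authors' evaluation code, the labels and scores in the example are made up, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class (e.g. Corr.), 0 for the Neg. class.
    Ties count as half a win. This O(P*N) pairwise form equals the normalized
    Mann-Whitney U statistic and is fine for small illustrative sets.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores for two positive and two negative compounds.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Because AUC depends only on the ranking of scores, it is insensitive to the choice of decision threshold, which makes it a convenient metric for comparing classifiers across algorithms and representations.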
Affiliation(s)
- Zejun Huang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Shang Lou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Haoqiang Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
139
Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representations are common but ignore the structural information of molecules, while molecular graph representations have a limited ability to express 3D structure. In this article, we exploit the advantages of different types of representations simultaneously for molecular property prediction. To this end, we propose a fusion model named DLF-MFF, which integrates multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, 2D molecular graphs, 3D molecular graphs, and molecular images. Then, to learn each type of molecular feature individually, we use four deep learning frameworks, one for each molecular representation. The four resulting feature vectors are integrated into the final molecular representation, which is fed into a prediction layer to predict molecular properties. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets covering multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on all 6 datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors among 2500 drugs. We predict the probability that each drug is a 3CL protease inhibitor and also calculate the binding affinity score between each drug and 3CL protease. The results show that DLF-MFF performs better in identifying anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and to provide valuable insights into drug repurposing for COVID-19.
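The integration step can be sketched in Python as late fusion by concatenation followed by a single logistic output unit. The abstract does not specify how the four vectors are integrated or what the prediction layer looks like, so concatenation, the branch names in `BRANCHES`, and the weights below are all assumptions for illustration; in DLF-MFF the embeddings would come from four trained deep networks.

```python
import math

# Hypothetical branch names for the four representations listed in the abstract.
BRANCHES = ("fingerprint", "graph_2d", "graph_3d", "image")


def concat_features(features: dict[str, list[float]]) -> list[float]:
    """Concatenate the per-branch embeddings in a fixed branch order."""
    return [x for name in BRANCHES for x in features[name]]


def _sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_property(fused: list[float], weights: list[float], bias: float) -> float:
    """Score the fused vector with one logistic unit (a stand-in prediction layer)."""
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the fused feature length")
    return _sigmoid(sum(w * x for w, x in zip(weights, fused)) + bias)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real ones would come from four trained networks.
    feats = {
        "fingerprint": [1.0, 0.0],
        "graph_2d": [0.5, 0.5],
        "graph_3d": [0.0, 1.0],
        "image": [0.2, 0.8],
    }
    fused = concat_features(feats)
    print(len(fused))  # 8
    print(predict_property(fused, [0.1] * 8, bias=-0.4))
```

Concatenation keeps every branch's information intact and leaves it to the prediction layer to weigh the representations; attention-based or gated fusion are common alternatives when some branches are noisier than others.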
Affiliation(s)
- Mei Ma
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
140
Jin Q, Xie J, Huang D, Zhao C, He H. MSFF-MA-DDI: Multi-Source Feature Fusion with Multiple Attention blocks for predicting Drug-Drug Interaction events. Comput Biol Chem 2024; 108:108001. [PMID: 38154317 DOI: 10.1016/j.compbiolchem.2023.108001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 11/30/2023] [Accepted: 12/03/2023] [Indexed: 12/30/2023]
Abstract
Interactions among multiple drugs can lead to severe adverse events, causing medical injuries and expenses. Accurate prediction of drug-drug interaction (DDI) events can help clinicians make effective decisions and establish appropriate therapy programs. However, two issues merit further consideration: (i) the global features of drug molecules, not just their local characteristics, should be taken into account; and (ii) the fusion of multi-source features should be studied to capture comprehensive drug features. This study designs a Multi-Source Feature Fusion framework with Multiple Attention blocks, named MSFF-MA-DDI, that uses multimodal data for DDI event prediction. MSFF-MA-DDI can (i) encode global correlations between long-distance atoms in drug molecular sequences through a self-attention layer based on a position embedding block and (ii) fuse drug sequence features with heterogeneous features (chemical substructure, target, and enzyme) through a multi-head attention block to better represent drugs. Experiments on real-world datasets show that MSFF-MA-DDI achieves performance close to or better than that of state-of-the-art models, and it performs best in cold-start scenarios. A case study on nervous system drugs further supports the model's effectiveness. The source code and data are available at https://github.com/BioCenter-SHU/MSFF-MA-DDI.
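The self-attention layer described above lets every position in a drug's sequence attend to every other position, which is how correlations between distant atoms can be captured. Below is a minimal single-head scaled dot-product attention in plain Python, written as a generic illustration rather than the MSFF-MA-DDI code; it omits the learned query/key/value projections, the position embeddings, and the multiple heads the abstract mentions.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: list[list[float]],
    keys: list[list[float]],
    values: list[list[float]],
) -> list[list[float]]:
    """Single-head attention: each query mixes all values, weighted by query-key similarity."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this position to every position, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, so distant positions contribute directly.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Self-attention over three toy token embeddings (queries = keys = values).
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in scaled_dot_product_attention(tokens, tokens, tokens):
        print([round(x, 3) for x in row])
```

A multi-head block runs several such heads on different learned projections of the inputs and concatenates their outputs; cross-attention, as in fusing sequence features with heterogeneous features, draws queries from one source and keys and values from another.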
Affiliation(s)
- Qi Jin
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China.
- Dingkai Huang
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Chang Zhao
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Hongjian He
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
141
Chen B, Pan Z, Mou M, Zhou Y, Fu W. Is fragment-based graph a better graph-based molecular representation for drug design? A comparison study of graph-based models. Comput Biol Med 2024; 169:107811. [PMID: 38168647 DOI: 10.1016/j.compbiomed.2023.107811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 11/23/2023] [Accepted: 12/03/2023] [Indexed: 01/05/2024]
Abstract
Graph Neural Networks (GNNs) have gained significant traction in various areas of AI-driven drug design. In recent years, integrating fragmentation concepts into GNNs has emerged as a potent strategy for improving the efficacy of molecular generative models. Nonetheless, challenges such as symmetry breaking and the potential misrepresentation of intricate cycles and undefined functional groups raise questions about whether fragment-based graph representations are superior to traditional ones. In our research, we rigorously compared the predictive performance of eight deep learning models across 12 benchmark datasets spanning a range of properties. These models include established methods such as GCN, AttentiveFP, and D-MPNN, as well as newer fragment-based representation techniques. Our results indicate that fragment-based methods, notably PharmHGT, significantly improve model performance and interpretability, particularly when data are limited. However, with extensive training data, fragment-based molecular graph representations may not outperform traditional methods. In summary, we suggest that fragmentation, as an emerging technique in drug design, holds considerable promise for the future of AI-enhanced drug design.
Affiliation(s)
- Baiyu Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China
- Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Yuan Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Wei Fu
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China.
142
Yu L, Xu Z, Qiu W, Xiao X. MSDSE: Predicting drug-side effects based on multi-scale features and deep multi-structure neural network. Comput Biol Med 2024; 169:107812. [PMID: 38091725 DOI: 10.1016/j.compbiomed.2023.107812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/10/2023] [Accepted: 12/03/2023] [Indexed: 02/08/2024]
Abstract
Unexpected side effects may arise both during drug research and after marketing. They can cause drug development to fail and even endanger patients' health, so it is essential to identify unknown drug side effects. Most existing in silico methods infer them from drug association or similarity networks while ignoring intrinsic drug attributes; as a result, they can only handle drugs at a mature stage. To support early screening of drug side effects, we propose a multi-structural deep learning framework, MSDSE, which comprehensively considers multi-scale features derived from the drug. MSDSE can jointly learn SMILES sequence-based word embeddings, substructure-based molecular fingerprints, and chemical structure-based graph embeddings. In the preprocessing stage of MSDSE, we project all features into an abstract space of the same dimension. MSDSE builds a bi-level channel strategy, comprising a convolutional neural network module with an Inception structure and a multi-head self-attention module, to learn and integrate multi-modal features from local to global perspectives. Finally, MSDSE treats the prediction of drug side effects as pair-wise learning and outputs the pair-wise probability of each drug-side effect pair through an inner product operation. Evaluated on benchmark datasets, MSDSE outperforms other baseline models. We also conducted an ablation study to justify the feature approach and model structure. Moreover, we selected a subset of the model's predictions for a case study to demonstrate its practical capability. The original data are available at http://github.com/yuliyi/MSDSE.
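The pair-wise scoring step can be sketched in Python: because all features are projected to the same dimension, a drug embedding and a side-effect embedding can be compared by their inner product. Squashing that product through a sigmoid to obtain a probability is an assumption here, since the abstract only names the inner product; the embeddings and side-effect names below are placeholders, not data from the study.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_probability(drug_vec: list[float], effect_vec: list[float]) -> float:
    """Probability-like score for one drug-side effect pair from an inner product."""
    if len(drug_vec) != len(effect_vec):
        raise ValueError("embeddings must share the same dimension")
    return sigmoid(sum(d * e for d, e in zip(drug_vec, effect_vec)))


def score_all_pairs(
    drugs: dict[str, list[float]], effects: dict[str, list[float]]
) -> dict[tuple[str, str], float]:
    """Score every (drug, side effect) combination."""
    return {
        (drug, effect): pair_probability(dv, ev)
        for drug, dv in drugs.items()
        for effect, ev in effects.items()
    }


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; the names are placeholders.
    drugs = {"drug_A": [0.8, -0.2, 0.5], "drug_B": [-0.4, 0.9, 0.1]}
    effects = {"nausea": [1.0, 0.0, 0.6], "rash": [-0.3, 1.2, 0.0]}
    for pair, p in sorted(score_all_pairs(drugs, effects).items()):
        print(pair, round(p, 3))
```

Scoring pairs with an inner product means a new drug can be scored against every known side effect as soon as its embedding is computed from its intrinsic features, which fits the abstract's goal of early-stage screening.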
Affiliation(s)
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Zhaochun Xu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China.
143
Xiong J, Cui R, Li Z, Zhang W, Zhang R, Fu Z, Liu X, Li Z, Chen K, Zheng M. Transfer learning enhanced graph neural network for aldehyde oxidase metabolism prediction and its experimental application. Acta Pharm Sin B 2024; 14:623-634. [PMID: 38322350 PMCID: PMC10840476 DOI: 10.1016/j.apsb.2023.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 09/07/2023] [Accepted: 10/11/2023] [Indexed: 02/08/2024]
Abstract
Aldehyde oxidase (AOX) is a molybdoenzyme that is primarily expressed in the liver and is involved in the metabolism of drugs and other xenobiotics. AOX-mediated metabolism can result in unexpected outcomes, such as the production of toxic metabolites and high metabolic clearance, which can lead to the clinical failure of novel therapeutic agents. Computational models can assist medicinal chemists in rapidly evaluating the AOX metabolic risk of compounds during the early phases of drug discovery and provide valuable clues for manipulating AOX-mediated metabolism liability. In this study, we developed a novel graph neural network called AOMP for predicting AOX-mediated metabolism. AOMP integrated the tasks of metabolic substrate/non-substrate classification and metabolic site prediction, while utilizing transfer learning from ¹³C nuclear magnetic resonance data to enhance its performance on both tasks. AOMP significantly outperformed the benchmark methods in both cross-validation and external testing. Using AOMP, we systematically assessed the AOX-mediated metabolism of common fragments in kinase inhibitors and successfully identified four new scaffolds with AOX metabolism liability, which were validated through in vitro experiments. Furthermore, for the convenience of the community, we established the first online service for AOX metabolism prediction based on AOMP, which is freely available at https://aomp.alphama.com.cn.
Affiliation(s)
- Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Rongrong Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhaojun Li
- College of Computer and Information Engineering, Dezhou University, Dezhou 253023, China
- AI Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215000, China
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Runze Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaohong Liu
- AI Department, Suzhou Alphama Biotechnology Co., Ltd., Suzhou 215000, China
- Zhenghao Li
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
- Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing 210023, China
144
Isert C, Atz K, Riniker S, Schneider G. Exploring protein-ligand binding affinity prediction with electron density-based geometric deep learning. RSC Adv 2024; 14:4492-4502. [PMID: 38312732 PMCID: PMC10835705 DOI: 10.1039/d3ra08650j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 01/19/2024] [Indexed: 02/06/2024]
Abstract
Rational structure-based drug design relies on accurate predictions of protein-ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, some approaches have been criticized for potentially failing to capture the fundamental physical interactions between ligands and their macromolecular targets, or for being susceptible to dataset biases. Herein, we propose to include bond-critical points derived from the electron density of a protein-ligand complex as a fundamental physical representation of protein-ligand interactions. Using a geometric deep learning model, we explore the usefulness of these bond-critical points for predicting absolute binding affinities of protein-ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4–1.8 log units on the PDBbind dataset and 1.0–1.7 log units on the PDE10A dataset. These results show no significant advantage over benchmark methods and suggest that the utility of electron density for deep learning models is context-dependent. The relationship between intermolecular electron density and the corresponding binding affinity was also analyzed, and Pearson correlation coefficients of r > 0.7 were obtained for several macromolecular targets.
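The two metrics reported above, RMSE in log units and the Pearson correlation coefficient r, can be computed as in the following plain-Python sketch. The affinity values in the example are made up for illustration and are not data from the study.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-squared error, in the same units as the inputs (e.g. log units)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up affinities in log units (e.g. pKd); not data from the study.
    measured = [6.2, 7.5, 5.1, 8.0, 6.8]
    predicted = [6.9, 7.1, 6.0, 7.2, 6.5]
    print(f"RMSE = {rmse(measured, predicted):.2f} log units")
    print(f"r = {pearson_r(measured, predicted):.2f}")
```

The two metrics answer different questions: RMSE measures absolute error, while r measures only linear association, so a model can rank complexes well (high r) yet still be offset from the measured values (high RMSE).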
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
- Sereina Riniker
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
- Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
145
Wu J, Chen Y, Wu J, Zhao D, Huang J, Lin M, Wang L. Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors. J Cheminform 2024; 16:13. [PMID: 38291477 PMCID: PMC10829268 DOI: 10.1186/s13321-023-00799-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 12/22/2023] [Indexed: 02/01/2024]
Abstract
Conventional machine learning (ML) and deep learning (DL) play a key role in predicting the selectivity of kinase inhibitors. A number of models built on available datasets can predict the kinase profiles of compounds, but the relative advantages and disadvantages of ML and DL for such tasks remain controversial. In this study, we constructed a comprehensive benchmark dataset of kinase inhibitors comprising 141,086 unique compounds and 216,823 well-defined bioassay data points for 354 kinases. We then systematically compared the performance of 12 ML and DL methods on the kinase profiling prediction task. Extensive experimental results reveal that (1) descriptor-based ML models generally slightly outperform fingerprint-based ML models in predictive performance, and random forest (RF), an ensemble learning approach, displays the best overall predictive performance; (2) single-task graph-based DL models are generally inferior to conventional descriptor- and fingerprint-based ML models, but the corresponding multi-task models generally improve the average accuracy of kinase profile prediction; for example, the multi-task FP-GNN model outperforms the conventional descriptor- and fingerprint-based ML models with an average AUC of 0.807; and (3) fusion models based on voting and stacking methods can further improve performance on the kinase profiling prediction task; specifically, the RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with the highest average AUC of 0.825 on the test sets. These findings can help guide the choice of ML and DL methods for kinase profiling prediction tasks. Finally, an online platform called KIPP (https://kipp.idruglab.cn) and Python software were developed based on the best models to support kinase profiling prediction, as well as various kinase inhibitor identification tasks including virtual screening, compound repositioning, and target fishing.
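Voting fusion, one of the two fusion strategies named above, can be sketched as follows for a single kinase. Soft voting averages the base models' predicted activity probabilities; hard voting takes a strict majority of thresholded calls. The 0.5 threshold, the tie handling, and the unweighted averaging are assumptions of this sketch rather than details from the paper, and stacking, which trains a meta-model on the base predictions, is not shown.

```python
def _check(model_probs: list[list[float]]) -> None:
    """Require at least one model and that all models score the same compounds."""
    if not model_probs or len({len(p) for p in model_probs}) != 1:
        raise ValueError("need >= 1 model, all scoring the same compounds")


def soft_vote(model_probs: list[list[float]]) -> list[float]:
    """Average each compound's predicted probability across models."""
    _check(model_probs)
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def hard_vote(model_probs: list[list[float]], threshold: float = 0.5) -> list[bool]:
    """Strict-majority vote of thresholded predictions; ties count as inactive."""
    _check(model_probs)
    votes = [[p >= threshold for p in probs] for probs in model_probs]
    return [2 * sum(column) > len(column) for column in zip(*votes)]


if __name__ == "__main__":
    # Rows: three hypothetical base models (e.g. RF on different fingerprints or
    # descriptors); columns: predicted activity for three compounds on one kinase.
    probs = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.1, 0.4]]
    print([round(p, 3) for p in soft_vote(probs)])
    print(hard_vote(probs))
```

Soft voting keeps the models' confidence, so a single very confident model can shift the result; hard voting discards confidence and is more robust to one poorly calibrated model.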
Affiliation(s)
- Jiangxia Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yihao Chen
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jingxing Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jindi Huang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- MuJie Lin
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China.
146
Ishiai S, Yasuda I, Endo K, Yasuoka K. Graph-Neural-Network-Based Unsupervised Learning of the Temporal Similarity of Structural Features Observed in Molecular Dynamics Simulations. J Chem Theory Comput 2024; 20:819-831. [PMID: 38190503 DOI: 10.1021/acs.jctc.3c00995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024]
Abstract
Classification of molecular structures is a crucial step in molecular dynamics (MD) simulations for detecting the various structures and phases within a system. Molecular structures are commonly identified using order parameters and, more recently, using machine learning (ML), in which ML models acquire structural features from labeled crystals or phases via supervised learning. However, these approaches may fail to identify unlabeled or unknown structures, such as the imperfect crystal structures observed in nonequilibrium systems and at interfaces. In this study, we propose a novel unsupervised learning framework, termed temporal self-supervised learning (TSSL), to learn structural features and design their parameters. In TSSL, ML models learn structural similarity via contrastive learning based on the minor short-term variations caused by perturbations in MD simulations. This learning framework is applied to a sophisticated graph neural network architecture that uses the bond angles and bond lengths of neighboring atoms. TSSL successfully classifies water and ice crystals based on high local ordering and also detects imperfect structures typical of interfaces, such as the water-ice and ice-vapor interfaces.
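The abstract does not give TSSL's loss function. As a generic illustration of contrastive learning, the Python sketch below computes an InfoNCE-style loss for one anchor embedding, treating the same local structure's embedding a few MD frames later as the positive and other structures' embeddings as negatives. The loss choice, the temperature value, and this pairing scheme are assumptions, not the authors' method.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: list[float],
    positive: list[float],
    negatives: list[list[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE loss for one anchor: low when the positive is the most similar candidate."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # -log softmax(logits)[0], computed with the log-sum-exp trick for stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Hypothetical embeddings: an atom's environment at frame t, the same
    # environment at a nearby frame (positive), and a different environment.
    anchor = [1.0, 0.0]
    later_frame = [0.9, 0.1]
    other_structure = [-1.0, 0.2]
    print(info_nce(anchor, later_frame, [other_structure]))
    print(info_nce(anchor, other_structure, [later_frame]))
```

Minimizing this loss pulls embeddings of the same structure at nearby times together and pushes different structures apart, so no phase labels are needed, which is the property the abstract relies on for detecting unlabeled interfacial structures.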
Affiliation(s)
- Satoki Ishiai
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
- Ikki Yasuda
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
- Katsuhiro Endo
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
- National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki 305-8568, Japan
- Kenji Yasuoka
- Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan
147
Taj F, Stein LD. MMDRP: drug response prediction and biomarker discovery using multi-modal deep learning. Bioinform Adv 2024; 4:vbae010. [PMID: 38371918 PMCID: PMC10872075 DOI: 10.1093/bioadv/vbae010] [Received: 07/21/2023] [Revised: 12/01/2023] [Accepted: 01/16/2024] [Indexed: 02/20/2024]
Abstract
Motivation: A major challenge in cancer care is that patients with similar demographics, tumor types, and medical histories can respond quite differently to the same drug regimens. This difference is largely explained by genetic and other molecular variability among the patients and their cancers. Efforts in the pharmacogenomics field are underway to better understand the relationship between the genomes of a patient's healthy and tumor cells and their response to therapy. To advance this goal, research groups and consortia have undertaken large-scale systematic screening of drug panels across multiple cancer cell lines that have been molecularly profiled by genomics, proteomics, and similar techniques. These large drug screening datasets have been applied to the problem of drug response prediction (DRP), the challenge of predicting the response of a previously untested drug/cell-line combination. Although deep learning algorithms outperform traditional methods, many challenges remain in DRP that limit these models' generalizability and hamper their clinical application.
Results: In this article, we describe a novel algorithm that addresses the major shortcomings of current DRP methods by combining multiple types of cell line characterization data, addressing the skewness of drug response data, and improving the representation of chemical compounds.
Availability and implementation: MMDRP is implemented as an open-source, Python-based, command-line program and is available at https://github.com/LincolnSteinLab/MMDRP.
Affiliation(s)
- Farzan Taj
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada
- Lincoln D Stein
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada
148
Voinarovska V, Kabeshov M, Dudenko D, Genheden S, Tetko IV. When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges. J Chem Inf Model 2024; 64:42-56. [PMID: 38116926 PMCID: PMC10778086 DOI: 10.1021/acs.jcim.3c01524] [Received: 09/21/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023]
Abstract
Machine learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as reaction yield, the feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the many essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. A reliable predictive model could not only help optimize high-throughput experiments but also improve existing retrosynthetic prediction approaches and support many other applications in the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, highlighting their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues of data availability and transferability in the discipline.
Affiliation(s)
- Varvara Voinarovska
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- TUM Graduate School, Faculty of Chemistry, Technical University of Munich, 85748 Garching, Germany
- Mikhail Kabeshov
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Dmytro Dudenko
- Enamine Ltd., 78 Chervonotkatska str., 02094 Kyiv, Ukraine
- Samuel Genheden
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Igor V. Tetko
- Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Institute of Structural Biology, 85764 Neuherberg, Germany
149
Gu Y, Wang Y, Zhu K, Li W, Liu G, Tang Y. DBPP-Predictor: a novel strategy for prediction of chemical drug-likeness based on property profiles. J Cheminform 2024; 16:4. [PMID: 38183072 PMCID: PMC10771006 DOI: 10.1186/s13321-024-00800-9] [Received: 07/27/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024]
Abstract
Evaluation of chemical drug-likeness is essential for discovering high-quality drug candidates while avoiding unwarranted biological and clinical trial costs. A high-quality drug candidate should have promising drug-like properties, including pharmacological activity and suitable physicochemical and ADMET properties. Hence, in silico prediction of chemical drug-likeness has been proposed, although it remains a challenging task. Although several prediction models have been developed to assess chemical drug-likeness, they suffer from drawbacks such as sample dependence and poor interpretability. In this study, we developed a novel strategy, named DBPP-Predictor, to predict chemical drug-likeness based on a property profile representation that integrates physicochemical and ADMET properties. The results demonstrated that DBPP-Predictor exhibited considerable generalization capability, with AUC (area under the curve) values from 0.817 to 0.913 on external validation sets. In the application feasibility analysis, DBPP-Predictor not only showed consistent and reasonable scoring performance across different datasets but also was able to guide structural optimization. Moreover, it offered a new perspective on drug-likeness assessment, showing no significant linear correlation with existing methods. We also developed free standalone software that lets users predict drug-likeness and visualize property profiles for their compounds of interest. In summary, DBPP-Predictor provides a valuable tool for predicting chemical drug-likeness, helping to identify appropriate drug candidates for further development.
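The AUC values quoted above summarize how well a score ranks drug-like compounds above non-drug-like ones. As a hedged illustration of what that metric measures (not the authors' evaluation code), the sketch below computes ROC AUC through its rank-statistic (Mann-Whitney) interpretation: the fraction of positive/negative pairs in which the positive receives the higher score, with ties counting one half. The abstract does not name the curve, so treating it as the ROC curve is an assumption here, and the function name `roc_auc`, the labels, and the scores are hypothetical toy values.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U formulation).

    labels: 1 for drug-like, 0 for non-drug-like (hypothetical convention).
    scores: higher means "more drug-like". O(P * N) loop, fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0  # positive ranked above negative
            elif p == q:
                correct += 0.5  # ties count as half a correct ordering
    return correct / (len(pos) * len(neg))


# Toy example: three drug-like (1) and three non-drug-like (0) compounds
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))
```

Here 8 of the 9 positive/negative pairs are ordered correctly (only 0.4 vs. 0.5 is inverted), so the sketch returns 8/9, about 0.889. Library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently by sorting scores instead of comparing every pair.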
Affiliation(s)
- Yaxin Gu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Keyun Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
150
Arora S, Satija S, Mittal A, Solanki S, Mohanty SK, Srivastava V, Sengupta D, Rout D, Arul Murugan N, Borkar RM, Ahuja G. Unlocking The Mysteries of DNA Adducts with Artificial Intelligence. Chembiochem 2024; 25:e202300577. [PMID: 37874183 DOI: 10.1002/cbic.202300577] [Received: 08/16/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 10/25/2023]
Abstract
The cellular genome is considered a dynamic blueprint of a cell because it encodes genetic information that is altered over time by various endogenous and exogenous insults. The extent of genomic dynamicity is largely controlled by the trade-off between DNA repair processes and the genotoxic potential of the causative agent (genotoxins or potential carcinogens). A subset of genotoxins form DNA adducts by covalently binding to cellular DNA, triggering structural or functional changes that lead to significant alterations in cellular processes via genetic (e.g., mutations) or non-genetic (e.g., epigenetic) routes. Identification, quantification, and characterization of DNA adducts are indispensable for understanding them comprehensively and could expedite ongoing efforts to predict their carcinogenicity and mode of action. In this review, we elaborate on the use of Artificial Intelligence (AI)-based modeling in adduct biology and present multiple computational strategies to advance the decoding of DNA adducts. The proposed AI-based strategies encompass predictive modeling of adduct formation via metabolic activation, identification of novel adducts, prediction of biochemical routes for adduct formation, prediction of adduct half-lives within biological ecosystems, and methods to predict the link between adduct chemistry and its location within genomic DNA. In summary, we discuss some future AI-based approaches in DNA adduct biology.
Affiliation(s)
- Sakshi Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Shiva Satija
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Saveena Solanki
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Vaibhav Srivastava
- Division of Glycoscience, Department of Chemistry, CBH School, Royal Institute of Technology (KTH), AlbaNova University Center, 10691, Stockholm, Sweden
- Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Diptiranjan Rout
- Department of Transfusion Medicine, National Cancer Institute, AIIMS, New Delhi, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110608, India
- Natarajan Arul Murugan
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Roshan M Borkar
- Department of Pharmaceutical Analysis, National Institute of Pharmaceutical Education and Research (NIPER)-Guwahati, Sila Katamur Halugurisuk P.O.: Changsari, Dist, Guwahati, Assam, 781101, India
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India